java.lang.Object
- com.c12e.cortex.profiles.featurecatalog.DefaultFeatureReportCalculator

All Implemented Interfaces: FeatureReportCalculator

public class DefaultFeatureReportCalculator
extends java.lang.Object
implements FeatureReportCalculator

Field Summary

Fields
Modifier and Type	Field	Description
`static java.lang.Integer`	`MIN_SAMPLE_SIZE`
`static java.lang.String`	`PROFILE_ID_FIELD`
`static java.lang.Double`	`SAMPLE_MOE`
`static java.lang.Double`	`SAMPLE_P`
`static java.lang.Double`	`SAMPLE_Z_SCORE`
`static java.util.List<java.lang.Double>`	`SAMPLING_FRACTIONS`
`static java.lang.String`	`TIMESTAMP_FIELD`

Constructor Summary

Constructors
Constructor	Description
`DefaultFeatureReportCalculator()`

Method Summary

Methods
Modifier and Type	Method	Description
`FeatureReport`	`computeDataSourceFeatures(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf, java.lang.Boolean performCalculations)`	Computes the `Features` associated with a DataSource from the given Dataset.
`FeatureReport`	`computeFeatureReport(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sampleDf, boolean performCalculations, java.util.List<org.apache.spark.sql.Row> previewCollection, java.lang.String profileGroup)`	Computes the `Features` associated with a given DataSource and ProfileGroup from a sample of the data.
`FeatureReport`	`computePreviewFeatures(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf)`	Computes the `Features` from an explicit sample of the DataSource.
`FeatureReport`	`computeProfileFeatures(java.lang.String project, java.lang.String sourceName, java.lang.String profileGroup, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf, java.lang.Boolean performCalculations)`	Computes the `Features` associated with a DataSource and specific `ProfileGroup`.
`org.apache.spark.sql.Dataset<org.apache.spark.sql.Row>`	`sample(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df)`	Returns a sample of size `MIN_SAMPLE_SIZE` taken from the dataset.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - SAMPLE_MOE
```
public static final java.lang.Double SAMPLE_MOE
```
  - SAMPLE_Z_SCORE
```
public static final java.lang.Double SAMPLE_Z_SCORE
```
  - SAMPLE_P
```
public static final java.lang.Double SAMPLE_P
```
  - MIN_SAMPLE_SIZE
```
public static final java.lang.Integer MIN_SAMPLE_SIZE
```
  - PROFILE_ID_FIELD
```
public static final java.lang.String PROFILE_ID_FIELD
```
    See Also:
    
    Constant Field Values
  - TIMESTAMP_FIELD
```
public static final java.lang.String TIMESTAMP_FIELD
```
    See Also:
    
    Constant Field Values
  - SAMPLING_FRACTIONS
```
public static final java.util.List<java.lang.Double> SAMPLING_FRACTIONS
```
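The field names `SAMPLE_Z_SCORE`, `SAMPLE_P`, and `SAMPLE_MOE` resemble the inputs of the standard sample-size formula for estimating a proportion, n0 = z^2 * p * (1 - p) / e^2. This page does not list the field values or confirm how the class uses them, so the following is only a sketch of that textbook formula. The class name `SampleSizeSketch`, the method `requiredSampleSize`, and the numeric inputs (z = 1.96, p = 0.5, e = 0.05) are illustrative assumptions, not values taken from this API.

```java
// Hypothetical helper: NOT part of com.c12e.cortex.profiles.featurecatalog.
final class SampleSizeSketch {

    /**
     * Textbook sample size for estimating a proportion:
     * n0 = z^2 * p * (1 - p) / e^2, rounded up to a whole row count.
     *
     * @param zScore        z-score for the desired confidence level
     * @param p             assumed population proportion, in [0, 1]
     * @param marginOfError acceptable margin of error, greater than 0
     */
    static long requiredSampleSize(double zScore, double p, double marginOfError) {
        if (marginOfError <= 0.0) {
            throw new IllegalArgumentException("marginOfError must be > 0");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        double n0 = (zScore * zScore * p * (1.0 - p)) / (marginOfError * marginOfError);
        return (long) Math.ceil(n0);
    }

    public static void main(String[] args) {
        // Assumed inputs: 95% confidence, maximum variance, 5% margin of error.
        System.out.println(requiredSampleSize(1.96, 0.5, 0.05)); // prints 385
    }
}
```

Using p = 0.5 maximizes p * (1 - p), so it gives the most conservative (largest) sample size when the true proportion is unknown.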
- Constructor Detail
  - DefaultFeatureReportCalculator
```
public DefaultFeatureReportCalculator()
```
- Method Detail
  - computeFeatureReport
```
public FeatureReport computeFeatureReport(java.lang.String project,
                                          java.lang.String sourceName,
                                          org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sampleDf,
                                          boolean performCalculations,
                                          java.util.List<org.apache.spark.sql.Row> previewCollection,
                                          java.lang.String profileGroup)
```
    Description copied from interface: FeatureReportCalculator
    
    Computes the Features associated with a given DataSource and ProfileGroup from a sample of the data.
    
    Specified by:
    
    computeFeatureReport in interface FeatureReportCalculator
    
    Parameters:
    
    project - project the DataSource belongs to
    
    sourceName - Cortex DataSource name
    
    sampleDf - source data
    
    performCalculations - whether to perform additional calculations on the source data to fill out feature information. If false, not all properties will be filled.
    
    previewCollection - explicit preview of the data
    
    profileGroup - name of the profile group, may be null
    
    Returns:
    
    FeatureReport feature information with a reference to the sample the features were inferred from
  - computeDataSourceFeatures
```
public FeatureReport computeDataSourceFeatures(java.lang.String project,
                                               java.lang.String sourceName,
                                               org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf,
                                               java.lang.Boolean performCalculations)
```
    Description copied from interface: FeatureReportCalculator
    
    Computes the Features associated with a DataSource from the given Dataset. Features will not be associated with a specific ProfileGroup.
    
    Specified by:
    
    computeDataSourceFeatures in interface FeatureReportCalculator
    
    Parameters:
    
    project - project the DataSource belongs to
    
    sourceName - Cortex DataSource name
    
    sourceDf - source data
    
    performCalculations - perform analytic calculations
    
    Returns:
    
    FeatureReport feature information with a reference to the sample the features were inferred from.
  - computePreviewFeatures
```
public FeatureReport computePreviewFeatures(java.lang.String project,
                                            java.lang.String sourceName,
                                            org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf)
```
    Description copied from interface: FeatureReportCalculator
    
    Computes the Features from an explicit sample of the DataSource. The provided Dataset should already be a sample of the entire dataset, because implementations should use it directly for calculations rather than drawing a sub-sample. Features will not be associated with a specific ProfileGroup.
    
    Specified by:
    
    computePreviewFeatures in interface FeatureReportCalculator
    
    Parameters:
    
    project - project the DataSource belongs to
    
    sourceName - Cortex DataSource name
    
    sourceDf - source data
    
    Returns:
    
    FeatureReport
  - computeProfileFeatures
```
public FeatureReport computeProfileFeatures(java.lang.String project,
                                            java.lang.String sourceName,
                                            java.lang.String profileGroup,
                                            org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf,
                                            java.lang.Boolean performCalculations)
```
    Description copied from interface: FeatureReportCalculator
    
    Computes the Features associated with a DataSource and specific ProfileGroup.
    
    Specified by:
    
    computeProfileFeatures in interface FeatureReportCalculator
    
    Parameters:
    
    project - project the DataSource belongs to
    
    sourceName - DataSource name
    
    profileGroup - profile group name
    
    sourceDf - source data
    
    performCalculations - perform analytic calculations
    
    Returns:
    
    FeatureReport
  - sample
```
public org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sample(org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> df)
```
    Returns a sample of size MIN_SAMPLE_SIZE taken from the dataset. If the dataset is smaller than MIN_SAMPLE_SIZE, the dataset is returned as is.
    
    Parameters:
    
    df - dataset to sample
    
    Returns:
    
    a dataset sample
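
The contract described for `sample` (take a fixed-size sample, or return the input unchanged when it is already small enough) can be illustrated with a plain-Java analogue. This sketch works on a `java.util.List` instead of a Spark `Dataset`, and it does not show how `DefaultFeatureReportCalculator` actually draws its sample. The class `SampleContractSketch`, the explicit `minSampleSize` and `Random` parameters, and the shuffle-based selection are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical analogue of the documented contract: NOT the library implementation.
final class SampleContractSketch {

    /**
     * Returns minSampleSize randomly chosen rows, or the input list itself
     * when it has no more than minSampleSize rows.
     */
    static <T> List<T> sample(List<T> rows, int minSampleSize, Random rng) {
        if (rows.size() <= minSampleSize) {
            return rows; // nothing to cut down: returned as is
        }
        List<T> shuffled = new ArrayList<>(rows); // copy so the input is not reordered
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, minSampleSize));
    }

    public static void main(String[] args) {
        List<Integer> small = Arrays.asList(1, 2);
        List<Integer> large = Arrays.asList(1, 2, 3, 4, 5, 6);
        Random rng = new Random(42);

        System.out.println(sample(small, 3, rng) == small);  // same instance is returned
        System.out.println(sample(large, 3, rng).size());    // sample has exactly 3 rows
    }
}
```

Passing the `Random` in explicitly keeps the sketch reproducible in tests; whether the real method exposes or fixes a seed is not stated on this page.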
