Interface FeatureReportCalculator

  • All Known Implementing Classes:
    DefaultFeatureReportCalculator

    public interface FeatureReportCalculator
    Interface for computing Feature information (see FeatureReport) from a Cortex DataSource and its source data.
    • Method Summary

      Modifier and Type Method Description
      FeatureReport computeDataSourceFeatures​(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf, java.lang.Boolean performCalculations)
      Computes the Features associated with a DataSource from the given Dataset.
      FeatureReport computeFeatureReport​(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sampleDf, boolean performCalculations, java.util.List<org.apache.spark.sql.Row> previewCollection, java.lang.String profileGroup)
      Computes the Features associated with a given DataSource and ProfileGroup from a sample of the data.
      FeatureReport computePreviewFeatures​(java.lang.String project, java.lang.String sourceName, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf)
      Computes the Features from an explicit sample of the DataSource.
      FeatureReport computeProfileFeatures​(java.lang.String project, java.lang.String sourceName, java.lang.String profileGroup, org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf, java.lang.Boolean performCalculations)
      Computes the Features associated with a DataSource and specific ProfileGroup.
    • Method Detail

      • computeFeatureReport

        FeatureReport computeFeatureReport​(java.lang.String project,
                                           java.lang.String sourceName,
                                           org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sampleDf,
                                           boolean performCalculations,
                                           java.util.List<org.apache.spark.sql.Row> previewCollection,
                                           java.lang.String profileGroup)
        Computes the Features associated with a given DataSource and ProfileGroup from a sample of the data.
        Parameters:
        project - project the DataSource belongs to
        sourceName - Cortex DataSource name
        sampleDf - source data
        performCalculations - whether additional calculations should be performed on the source data to fill out feature information. If false, not all properties will be filled in.
        previewCollection - explicit preview of the data
        profileGroup - name of the profile group; may be null
        Returns:
        FeatureReport feature information with a reference to the sample the features were inferred from
      • computeDataSourceFeatures

        FeatureReport computeDataSourceFeatures​(java.lang.String project,
                                                java.lang.String sourceName,
                                                org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf,
                                                java.lang.Boolean performCalculations)
        Computes the Features associated with a DataSource from the given Dataset. Features will not be associated with a specific ProfileGroup.
        Parameters:
        project - project the DataSource belongs to
        sourceName - Cortex DataSource name
        sourceDf - source data
        performCalculations - whether to perform analytic calculations on the source data
        Returns:
        FeatureReport feature information with a reference to the sample the features were inferred from.
      • computePreviewFeatures

        FeatureReport computePreviewFeatures​(java.lang.String project,
                                             java.lang.String sourceName,
                                             org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf)
        Computes the Features from an explicit sample of the DataSource. The provided Dataset should already be a sample of the entire dataset, because implementations should use the given Dataset as-is for calculations rather than taking a sub-sample of it. Features will not be associated with a specific ProfileGroup.
        Parameters:
        project - project the DataSource belongs to
        sourceName - Cortex DataSource name
        sourceDf - source data
        Returns:
        FeatureReport
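        Because computePreviewFeatures is described as using the given Dataset as-is, any reduction of the data has to happen on the caller's side first. With Spark this would typically be done on the Dataset itself (for example via its sample method). The sketch below only illustrates the idea with plain Java collections so that it runs without Spark; PreviewSampleSketch and its sample helper are hypothetical names, not part of this API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of the documented API.
// A List<R> stands in for Dataset<Row> so the sketch needs no Spark dependency.
final class PreviewSampleSketch {
    private PreviewSampleSketch() {}

    // Keeps each row independently with probability `fraction`.
    // A fixed seed makes the selection repeatable across runs.
    static <R> List<R> sample(List<R> rows, double fraction, long seed) {
        if (!(fraction >= 0.0 && fraction <= 1.0)) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        Random rng = new Random(seed);
        List<R> kept = new ArrayList<>();
        for (R row : rows) {
            // nextDouble() returns a value in [0.0, 1.0)
            if (rng.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

        The reduced collection, not the full data, is what a caller would then hand to computePreviewFeatures as sourceDf.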
      • computeProfileFeatures

        FeatureReport computeProfileFeatures​(java.lang.String project,
                                             java.lang.String sourceName,
                                             java.lang.String profileGroup,
                                             org.apache.spark.sql.Dataset<org.apache.spark.sql.Row> sourceDf,
                                             java.lang.Boolean performCalculations)
        Computes the Features associated with a DataSource and specific ProfileGroup.
        Parameters:
        project - project the DataSource belongs to
        sourceName - Cortex DataSource name
        profileGroup - profile group name
        sourceDf - source data
        performCalculations - whether to perform analytic calculations on the source data
        Returns:
        FeatureReport
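
The four signatures above suggest one possible relationship: the three narrower methods could be expressed in terms of computeFeatureReport. The sketch below shows that delegation pattern under stated assumptions and is not the behavior of DefaultFeatureReportCalculator, which this page does not describe. ReportSketch and CalculatorSketch are hypothetical stand-ins for FeatureReport and FeatureReportCalculator; the type parameters D and R replace Dataset<Row> and Row so the code compiles without Spark. Passing an empty preview collection, and passing true for performCalculations in computePreviewFeatures, are guesses made for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for FeatureReport; the real class is not shown on this page.
final class ReportSketch {
    final String profileGroup;   // null when no ProfileGroup is associated
    final boolean calculated;    // whether additional calculations were requested
    final int previewSize;       // number of rows in the explicit preview

    ReportSketch(String profileGroup, boolean calculated, int previewSize) {
        this.profileGroup = profileGroup;
        this.calculated = calculated;
        this.previewSize = previewSize;
    }
}

// Stand-in mirroring the four method shapes documented above.
interface CalculatorSketch<D, R> {
    ReportSketch computeFeatureReport(String project, String sourceName, D sampleDf,
                                      boolean performCalculations, List<R> previewCollection,
                                      String profileGroup);

    // No ProfileGroup: forward null, which the general method documents as allowed.
    default ReportSketch computeDataSourceFeatures(String project, String sourceName,
                                                   D sourceDf, Boolean performCalculations) {
        // Boolean.TRUE.equals(...) avoids a NullPointerException that plain
        // unboxing would throw if the boxed Boolean argument were null.
        return computeFeatureReport(project, sourceName, sourceDf,
                Boolean.TRUE.equals(performCalculations), Collections.emptyList(), null);
    }

    // Assumption: preview features are always calculated from the given sample.
    default ReportSketch computePreviewFeatures(String project, String sourceName, D sourceDf) {
        return computeFeatureReport(project, sourceName, sourceDf,
                true, Collections.emptyList(), null);
    }

    // Same as computeDataSourceFeatures, but tied to a specific ProfileGroup.
    default ReportSketch computeProfileFeatures(String project, String sourceName,
                                                String profileGroup, D sourceDf,
                                                Boolean performCalculations) {
        return computeFeatureReport(project, sourceName, sourceDf,
                Boolean.TRUE.equals(performCalculations), Collections.emptyList(), profileGroup);
    }
}

final class CalculatorSketchDemo {
    public static void main(String[] args) {
        // A lambda supplies the single abstract method; it just records its inputs.
        CalculatorSketch<List<String>, String> calc =
                (project, source, df, perform, preview, group) ->
                        new ReportSketch(group, perform, preview.size());

        ReportSketch report = calc.computeProfileFeatures(
                "demo-project", "demo-source", "demo-group", Arrays.asList("row1"), Boolean.TRUE);
        System.out.println(report.profileGroup + " calculated=" + report.calculated);
    }
}
```

One point the sketch makes concrete: computeFeatureReport takes a primitive boolean, while computeDataSourceFeatures and computeProfileFeatures take java.lang.Boolean, so code that forwards between them needs to decide how a null Boolean is handled. Treating null as false is only the choice made in this sketch.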