Scan Definition Reference
This document provides a detailed description of the scan definition file that serves as the basis for running a scan in Certifai.
Scan Definition Fields
The scan definition is written in YAML and consists of the following sections, each of which is detailed below:
- scan
- model_use_case
- models
- model_secret
- model_headers
- datasets
- dataset_schema
- evaluation
- scoring
To refer to different sections of the scan definition, this doc generally uses the syntax `section.field`. For example, `model_use_case.task_type` refers to the `task_type` field in the `model_use_case` section of the scan definition.
For examples of scan definitions, refer to existing definitions in the Certifai toolkit under `examples/definitions/`.
Scan
The `scan` section of the scan definition contains general information for the scan job.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
output | object (key=`path`) | Optional | An object for specifying where to output the resulting reports of the scan. The path must be a string that specifies either a local path (e.g. `./reports`, `file:/tmp/reports`) OR a supported cloud storage path (e.g. `gs://<some-path>`, `s3://<some-bucket>`). Paths are relative to the location of the scan definition. |
Examples:
```yaml
scan:
  output:
    path: ./reports
```
```yaml
scan:
  output:
    path: "s3://certifai/reports"
```
Notes
There are different methods of specifying the output path to write reports to:

- The `--output` flag in the Scanner CLI
- The value of the `SCAN_RESULTS_DIRECTORY` environment variable
- The output path in the scan definition
- The default value of `./reports` (relative to the scan definition file)
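For instance, the environment-variable method above can be set from a shell before invoking the scanner (the bucket path here is purely illustrative):

```shell
# Point scan reports at cloud storage via the environment variable
# instead of the scan definition (path value is illustrative).
export SCAN_RESULTS_DIRECTORY=s3://certifai/reports
```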
Model Use Case
The `model_use_case` section of the scan definition contains general information about the problem being solved by your machine learning models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_use_case_id | string | Required | A unique identifier to group the scans related to a specific model use case. The value must be unique across organizations so that it can be used in reporting services. The value is used when creating the model use case folder (after being modified to contain filename safe characters) that reports are written to. |
task_type | string | Required | The type of machine learning problem being solved. Must be one of the following: binary-classification , regression , or multiclass-classification |
performance_metrics | list[object] | Optional | List of performance metrics that apply to the model use case. |
atx_performance_metric_name | string | Optional | Name of the performance metric to be used in the ATX score for this use case |
description | string | Optional | Description of the problem being solved |
author | string | Optional | Information about the author of the model use case (e.g. email, name) |
name | string | Required | Name of the model use case - intended to be more human friendly than the model_use_case_id |
Example
```yaml
model_use_case:
  model_use_case_id: c12e/datasciencelab/german_credit
  task_type: binary-classification
  name: 'Banking: Loan Approval'
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person
    who takes a credit loan from a bank. The learning task is to classify each
    person as either a good or bad credit risk according to the set of attributes.

    This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

    The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
  atx_performance_metric_name: Accuracy
  performance_metrics:
    - metric: Accuracy
      name: Accuracy
    - metric: Recall
      name: Recall
    - metric: Precision
      name: Precision
```
Performance Metrics
The `performance_metrics` section of the `model_use_case` lists user-defined metrics that may be added in Certifai to track specific use case concerns.

Defining performance metrics at the model use case level does not mean that a performance evaluation is part of a particular scan. This field only defines the performance metrics that apply to the use case as a whole. To include a performance evaluation as part of your scan, you must list it under the `evaluation.evaluation_types` field of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | A unique name that serves as a reference to this metric, and with which a value is associated in the performance report |
metric | string | Optional | A specifier of the metric to be calculated, if it is to be calculated by Certifai. See notes below for details |
Example
```yaml
performance_metrics:
  - name: Accuracy
    metric: Accuracy
  - name: Micro Recall
    metric: Recall(micro)
  - name: My Bespoke Metric
```
Notes
The `metric` specifier selects a scoring algorithm. It has the general form `<algorithm_family_name>[(variant)]`, consisting of a name for the metric scoring algorithm with an optional minor variant within it. The currently supported set of families is:

- `Accuracy`
- `Precision`
- `Recall`
- `F1`
- `R-squared`
The `Precision`, `Recall`, and `F1` families support the optional variants `micro` and `macro`. Micro variants are equivalent to the undecorated base family for non-multi-label problems, but are included for future use.
`Precision`, `Recall`, and `F1` (along with their variants) are classification metrics and apply to both binary and multi-class cases. In the binary case (`model_use_case.task_type` set to `binary-classification`), the metric is evaluated with respect to the 'true' label being the label specified as the favorable outcome.
`R-squared` is a regression metric, and requires numerical model output. Supported aliases for `R-squared` are: `Rsq`, `R squared`, and `R2`.
To specify a variant, use the syntax `<algorithm_family_name>(<variant>)`. For example, `Precision(micro)` specifies the `micro` variant of `Precision`.
Metric family names and variants are case-insensitive, and Certifai supports aliases for some of the metric families.
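Putting the notes above together, a sketch of a `performance_metrics` list that exercises both a variant and an alias (the metric display names are illustrative):

```yaml
performance_metrics:
  - name: Macro F1          # macro variant of the F1 family
    metric: F1(macro)
  - name: R-squared         # R2 is a supported alias for R-squared
    metric: R2
```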
Models
The `models` section of the scan definition specifies which models may be evaluated as part of the scan. Each model you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_id | string | Required | An alphanumeric string (including underscores), used to uniquely identify a model within the scan |
name | string | Required | Name of the model - intended to be more human friendly than the model_id |
author | string | Optional | Information about the author of this model |
version | string | Optional | Field for tracking version of model |
description | string | Optional | Description of model |
model_id_tag | string | Optional | Tag that can be used to store extra metadata about your trained model, such as an id from an ML pipeline or a git commit hash (which wouldn't be suitable as a model_id ) |
predict_endpoint | string | Optional | An http address pointing to where your ML model is being hosted. See the below notes for details on how Certifai communicates with your model. This field is required unless evaluation.no_model_access is True. See the Evaluation section for more details. |
max_batch_size | int | Optional | The maximum number of input rows (predictions) that the model supports in a single HTTP request. If not specified, the maximum prediction size can be as many entries as are in the dataset |
supports_soft_scoring | boolean | Optional | Whether the model supports confidence-scored predictions (e.g. probabilities). This field is only applicable to classification task types. The default value is false . See notes below for how this affects the model prediction format. |
prediction_value_order | list[any] | Optional | Ordering of class labels for soft scoring predictions returned by the model. This field is only applicable if supports_soft_scoring is true . Assuming the prediction values (i.e. values being returned by the model) are 1 and 2, then a prediction_value_order of [2,1] means that the soft outputs returned by your model contain the scores for the value 2 first |
performance_metric_values | list[object] | Optional | A list of asserted values for the specified metrics that are not to be calculated by Certifai. |
json_strict | boolean | Optional | Whether the model expects data to be encoded in strict JSON (missing values will encode as null ). If false then JavaScript encoding extensions will be used, and missing values will be encoded as NaN . This field is only applicable to remote models. The default value is false . |
Example
```yaml
models:
  - model_id: logit
    name: Logistic Regression
    predict_endpoint: http://127.0.0.1:5111/german_credit_logit/predict
  - model_id: svm
    author: 'info@cognitivescale.com'
    description: Scikit-learn SVC classifier
    name: Support Vector Classifier
    predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
    supports_soft_scoring: true
    json_strict: false
    performance_metric_values:
      - name: My Bespoke Metric
        value: 0.748
```
Notes
Performance metric values are a list of objects, each consisting of a `name` field and a `value` field. The `name` must correspond to a performance metric defined at the `model_use_case` level. The `value` must be a number between 0 and 1 (inclusive).

The full set of metrics is specified in the Model Use Case section of the scan definition. If a metric in the `model_use_case` section has no `metric` value, or if the `evaluation` section specifies no `test_dataset_id`, then each model must provide a value for that metric in this list.
For general information and examples of how Certifai communicates with hosted models refer to the Certifai Reference Model repo.
Model Secrets
Admins can create model secrets in Kubernetes (containing a name and a key, the `model_id` value). Those K8s secrets are referenced from the scan definition by specifying `${model_secret.<model_id value>}` in the model headers.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_secret | string | Optional | The model_secret is created in Kubernetes. After you add the model_secret field in the scan definition, a reference such as ${dotted.field.ref} in the model headers will use the specified field 'dotted.field.ref' taken from the specified secret. |
Example
```yaml
model_secret: <your-k8s-secret-name>
```
See: Model Secrets for more details.
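As a sketch of how the pieces fit together, a secret can be combined with a defined model header that pulls a field from it (the secret name and model id below are hypothetical):

```yaml
model_secret: my-model-secrets        # hypothetical K8s secret name
model_headers:
  defined:
    - model_id: svm                   # hypothetical model id
      name: Authorization
      # resolves to the 'svm' field of the 'my-model-secrets' secret
      value: Bearer ${model_secret.svm}
```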
Model Headers
The `model_headers` section of the scan definition contains information for custom HTTP headers to be used when communicating with your models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
default | list[object] | Optional | A list of model headers that is applied to ALL models in the scan definition. |
defined | list[object] | Optional | A list of model headers that are applied to individual models based on their model_id |
Example
```yaml
model_headers:
  default:
    - name: Content-Type
      value: application/json
    - name: accept
      value: application/json
  defined:
    - model_id: svm
      name: Content-Type
      value: application/json
    - model_id: logit
      name: Authorization
      value: Bearer ${TOKEN}
```
Notes
Model headers must contain `name` and `value` string fields. Defined model headers must also specify a `model_id` field that corresponds to a model in the Models section. The `value` field can contain references to environment variables via the syntax `${<ENV-VARIABLE-NAME>}`.
Datasets
The `datasets` section of the scan definition lists the datasets that may be used as part of the evaluation. Each dataset you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
dataset_id | string | Required | An alphanumeric (including underscores) ID for referencing this dataset |
url | string | Required | A url pointing to the dataset file. Local system files must be prefixed with file: . Supported URL prefixes are file: , s3: , gs: , and abfs: |
file_type | string | Required | One of: csv or json . This selection drives an additional set of fields defined below |
encoding | string | Optional | The file encoding of the dataset. Supported encodings: [ascii, utf-16, utf-16-be, utf-16-le, utf-32, utf-32-be, utf-32-le, utf-7, utf-8, utf-8-sig, latin-1, iso-8859-1, windows-1252] |
description | string | Optional | Description of the dataset |
name | string | Optional | Name of the dataset - intended to be more human friendly than the dataset_id |
CSV Specific Fields
These fields are specific to CSV datasets, and are only referenced if `file_type` is `csv`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
has_header | boolean | Optional | Whether the csv file has column headers. Defaults to True |
delimiter | string | Optional | String to use as separator for the csv file. Defaults to , |
quote_character | string | Optional | A one character string used to denote the start and end of a quoted item. Defaults to " |
escape_character | string | Optional | A one character string used to escape other characters. Defaults to None |
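For example, a sketch of a CSV dataset entry that overrides the delimiter and quoting defaults (the dataset id, url, and delimiter values are illustrative):

```yaml
datasets:
  - dataset_id: eval
    name: Semicolon-delimited evaluation dataset   # illustrative
    file_type: csv
    url: file:./data/eval.csv                      # hypothetical path
    has_header: true
    delimiter: ';'
    quote_character: '"'
```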
JSON Specific Fields
These fields are specific to JSON datasets, and are only referenced if `file_type` is `json`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
orient | string | Optional | One of: records , values , or columns . These values are synonymous with pandas' read_json() orient argument. Defaults to records |
lines | boolean | Optional | Whether the JSON dataset is in json-lines format. Defaults to True |
Example
```yaml
datasets:
  ## json dataset
  - dataset_id: eval
    description: ''
    name: Evaluation dataset
    file_type: json
    url: file:test/data/german_credit_mini_records.json
    lines: true
    orient: records
  ## csv dataset
  - dataset_id: expl
    description: ''
    file_type: csv
    has_header: true
    name: 100 row explanation dataset
    url: file:test/data/german_credit_explan.csv
```
Notes
The Datasets section lists one or more dataset options that may be specified for your scans; in the Evaluation section of the scan definition you specify which datasets to use for a particular scan.
Dataset Schema
The `dataset_schema` section of the scan definition contains schema details about the defined datasets, such as which columns are exposed to the models and what values are allowed for individual features.

A full feature schema is inferred by Certifai from the datasets provided during the scan, so the `feature_schemas` field is entirely optional. However, any specified values are applied as overrides to the inferred schema. If the `outcome_column` and `predicted_outcome_column` are not specified, the dataset is assumed not to contain them.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
outcome_column | string | int | Optional | Name or index of the outcome column in the dataset |
predicted_outcome_column | string | int | Optional | Name or index of the predicted outcome column in the dataset |
hidden_columns | list[string] | Optional | A list of feature names in the dataset NOT to expose to models. To include a one-hot encoded feature as hidden, list only the feature name and NOT the encoded column names |
feature_schemas | list[object] | Optional | A list of schema definitions for features in the dataset. This list is treated as overrides to the auto-inferred schema by the scanner. It is entirely optional and does not have to include all features in the dataset (i.e. it may be sparsely defined). The expected objects in the list are described in the below Feature Schemas section |
defined_feature_order | boolean | Optional | A boolean specifying whether the feature_schemas field specifies the order of the features in the dataset. If set to True , the Feature Schemas section must contain all the features in the dataset. The default value is false for csv files and true for JSON datasets where orient is not set to columns . |
Examples
```yaml
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
  feature_schemas:
    - feature_name: age
    - feature_name: foreign
    - feature_name: purpose
      data_type: categorical
      category_values:
        - 'car (new)'
        - 'car (used)'
        - 'furniture/equipment'
        - 'radio/television'
      one_hot_columns:
        - name: 'purpose_car (new)'
          value: 'car (new)'
        - name: 'purpose_car (used)'
          value: 'car (used)'
        - name: 'purpose_furniture/equipment'
          value: 'furniture/equipment'
        - name: 'purpose_radio/television'
          value: 'radio/television'
  hidden_columns:
    - age
    - foreign
```
```yaml
dataset_schema:
  outcome_column: outcome
  defined_feature_order: true
  feature_schemas:
    - feature_name: checkingstatus
    - feature_name: duration
    - feature_name: history
    - feature_name: purpose
    - feature_name: amount
      min: 10
      max: 1000000000
      spread: 1.42
    - feature_name: savings
    - feature_name: employ
    - feature_name: installment
    - feature_name: status
    - feature_name: others
    - feature_name: residence
    - feature_name: property
    - feature_name: age
    - feature_name: otherplans
    - feature_name: housing
    - feature_name: cards
    - feature_name: job
    - feature_name: liable
    - feature_name: telephone
    - feature_name: foreign
    - feature_name: outcome
```
Notes
ALERT: Setting `defined_feature_order` to false for JSON datasets that do not have an `orient` of `columns` may cause the dataset features to be sent to your model in an incorrect order.

For JSON datasets with an `orient` of `records`, the features found in Feature Schemas provide the order of the dataframe.

For csv datasets and JSON datasets with an `orient` of `values` or `columns`, the names found in the Feature Schemas section specify/override the column names of the dataframe.
Feature Schemas
Each item in the `feature_schemas` list can have the following attributes, with the exact usage depending on the `data_type` of the feature.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | int | Required | Column name or index in the dataset being referred to |
data_type | string | Optional | One of: categorical , numerical-int , or numerical-float |
category_values | list[string | int | boolean] | Optional | The possible values for the category. This attribute is only used when the data_type is categorical |
one_hot_columns | list[object] | Optional | A list of mappings between (one-hot) column names in the dataset and values of the feature. If the schema defined_feature_order is True then this also specifies the column ordering. The set of values must match those specified in category_values |
target_encodings | list[float] | Optional | A list of numeric encoding values for the feature. The values must correspond one-to-one, in the same order, with those specified in category_values |
categorical_type | string | Optional | Data type to interpret the category values as. Must be one of: auto , string , or int . Defaults to auto , in which case the data type will be inferred. Should only be used when data_type is categorical |
min | number | Optional | Minimum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
max | number | Optional | Maximum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
spread | number | Optional | A typical magnitude of change - normally the mean absolute deviation or standard deviation. If not specified, an appropriate spread is estimated from the dataset. Should only be used when data_type is numerical-int or numerical-float |
Examples
Numeric features:
```yaml
feature_schemas:
  - feature_name: age
    data_type: numerical-int
    min: 18
    max: 65
    spread: 1.0
  - feature_name: measure
    data_type: numerical-float
    min: 0.0
    spread: 0.4
```
Categorical features without encodings:
```yaml
feature_schemas:
  - feature_name: age
    data_type: categorical
    category_values:
      - 25
      - 35
      - 45
      - 55
      - 65
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
  - feature_name: attribute
    data_type: categorical
    category_values:
      - true
      - false
```
Categorical feature with one-hot encoding:
```yaml
feature_schemas:
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
    one_hot_columns:
      - name: 'foreign_foreign - yes'
        value: 'foreign - yes'
      - name: 'foreign_foreign - no'
        value: 'foreign - no'
```
Categorical feature with target encoding:
```yaml
- feature_name: foreign
  data_type: categorical
  category_values:
    - "foreign - yes"
    - "foreign - no"
  target_encodings:
    - 0.3
    - 0.48
```
Categorical feature with one-hot encoding that is hidden from the model:
```yaml
dataset_schema:
  hidden_columns:
    - foreign
  feature_schemas:
    - feature_name: foreign
      data_type: categorical
      category_values:
        - "foreign - yes"
        - "foreign - no"
      one_hot_columns:
        - name: 'foreign_foreign - yes'
          value: 'foreign - yes'
        - name: 'foreign_foreign - no'
          value: 'foreign - no'
```
Notes
- One-hot encoding captures the way the dataset is encoded in the dataset source (e.g. a CSV file) and is expected by the model. It does not impact feature semantics apart from the expected encoding.
- Target encodings capture another possible way the dataset could be encoded in the dataset source and expected by the model. Target encoding is a technique for encoding categorical values numerically (usually by some statistical association with the ground truth value of the prediction). It does not impact feature semantics apart from the expected encoding. The list of `target_encodings` values should match the list of `category_values`, providing the corresponding encoding that has been used for each.
- Explanations in reports are surfaced as value-encoded for all categorical features, regardless of whether they are one-hot or target encoded in the dataset. This makes the explanations human-readable even for one-hot encoded datasets.
- The `categorical_type` field can be used to specify the data type for a categorical feature when there is possible ambiguity. For example, the value "01" could be interpreted either as the string `"01"` or as the integer `1`.
Evaluation
The `evaluation` section specifies the details for the evaluation that is run in the scan. Only a single evaluation can be specified in this section, but multiple evaluation types (e.g. `robustness`, `fairness`, `explanation`, `explainability`, `data_statistics`) can be run against all (or some) of the models specified in the Models section of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the evaluation being run |
description | string | Optional | Description of the evaluation |
environment | string | Optional | Field for tracking the environment the evaluation is being run in (e.g. DEV, QA, Compliance) |
evaluation_types | list[string] | Required | The list of evaluations to be performed on the models in this scan. Valid values include: robustness , fairness , explanation , explainability , performance , and data_statistics |
evaluation_dataset_id | string | Required | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for the Robustness, Fairness, and Explainability evaluations |
explanation_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for explanation evaluation, and is only required if explanation is listed as an evaluation_type |
test_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for computing performance metrics, and is only required if the model_use_case section lists performance metrics to be evaluated by Certifai |
reference_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used as the reference for computing data quality metrics and drift metrics for the evaluation dataset. This field is only required if data_statistics is an evaluation_type |
fairness_grouping_features | list[object] | Optional | List of features and groups within those features to use for calculating Fairness. See the Fairness Grouping Features section for details. |
fairness_metrics | list[string] | Optional | List of Fairness metrics to be calculated/applied as part of the Fairness evaluation. Possible values include: (case insensitive): "Demographic_parity" ("Demographic parity", "Demographic"), "Equal_opportunity" ("equal opportunity", "opportunity"), "Equal_odds" ("odds", "equal odds"), "Sufficiency" ("predictive rate parity"), and "Burden". The default value for this field is [burden] . |
primary_fairness_metric | string | Optional | The Fairness metric to use as the Fairness aspect for calculating the ATX score |
explanation_types | list[string] | Optional | List of explanation types to be used in the Explainability and Explanation evaluations. Possible explanation types include (case insensitive): "counterfactual" ("burden") and "shap". The default value is [counterfactual] . |
primary_explanation_type | string | Optional | The explanation type to use as the Explainability aspect for calculating the ATX score |
feature_restrictions | list[object] | Optional | A list of feature restrictions to apply to generated counterfactuals. See the Feature Restrictions section for details. |
hyperparameters | list[object] | Optional | A list of objects with name and value attributes. These parameters can be used to override the default engine hyperparameters. See the Hyperparameters section for details. |
prediction_description | string | Optional | Description of what is being predicted by the models |
prediction_favorability | string | Optional | The prediction format for favorable prediction outcomes in the scan definition. The field has different possible values and behaviors depending on the task type, see task type specific fields below for details. If not specified, the value is inferred based on the prediction information in the scan definition. |
save_counterfactuals | boolean | Optional | Whether to save generated counterfactuals in a separate CSV file for the Robustness, Fairness, and Explainability reports. SHAP explanations will be saved in a separate CSV file as well for the Explainability report. Defaults to false . |
no_model_access | boolean | Optional | Whether the scan does not have direct access to the model for predictions. Only a limited set of evaluations are supported when set to True. See the below notes for further details. Defaults to false . |
monitored_features | list[string | int] | Optional | List of feature names or indices to use for calculating data quality and drift metrics in the data_statistics evaluation type. If no features are specified and model predictions are available, then Certifai computes metrics for only the 50 most important features. Otherwise, metrics are computed for all features. |
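As a sketch, a `data_statistics`-only evaluation that monitors a handful of features against a reference dataset (the dataset ids and feature names are illustrative):

```yaml
evaluation:
  name: Drift check                  # illustrative
  evaluation_types:
    - data_statistics
  evaluation_dataset_id: eval        # current data to check
  reference_dataset_id: ref          # baseline for drift metrics
  monitored_features:
    - age
    - amount
```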
Regression Specific Fields:
Set these attributes for regression use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
regression_boundary_type | string | Optional | Indicates how the boundary is defined. Valid values are absolute and relative (default relative ) |
regression_standard_deviation | number | Optional | Only used with regression_boundary_type of relative . Amount of change of prediction required for the analysis to consider it sufficient to be a counterfactual, in units of the standard deviation of the predicted scores for the entire dataset. If omitted, a value of 0.5 is used |
regression_boundary | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary_percentile . Exact value of outcome that separates favorable from unfavorable |
regression_boundary_percentile | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary . Percentile of outcome value distribution (as empirically measured by the evaluation dataset) that separates favorable from unfavorable |
favorable_outcome_value | string | Optional | Favorable direction for regression task type. Possible values are: increased and decreased . This field must be left empty if the prediction_favorability is none . |
For regression tasks, `prediction_favorability` is either `ordered` or `none`.

- A favorability of `none` specifies that there is no favorable direction (increased or decreased) for the predictions. The `favorable_outcome_value` field is not set in this case.
- A favorability of `ordered` specifies that there is a favorable direction (either increased or decreased) for the predictions. The `favorable_outcome_value` field must be set in this case.
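The regression boundary fields above can be combined as in this sketch, which treats predictions above a fixed threshold as favorable (the threshold value is illustrative):

```yaml
evaluation:
  prediction_favorability: ordered
  favorable_outcome_value: increased
  regression_boundary_type: absolute
  regression_boundary: 50000         # illustrative threshold
```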
Classification Specific Fields:
Set these attributes for classification use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
prediction_values | list[object] | Optional | A list of prediction outcomes that can be returned by the model. The attributes of predictions are described in the Prediction Values section. |
last_favorable_prediction | string | Optional | The value of the last favorable prediction outcome in the prediction_values list. It must correspond to a prediction outcome in the prediction_values list. Applicable only when prediction_favorability is ordered |
favorable_outcome_group_name | string | Optional | A name describing the group of predictions that are favorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
unfavorable_outcome_group_name | string | Optional | A name describing the group of predictions that are unfavorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
For classification tasks, `prediction_favorability` may be one of: `explicit`, `ordered`, or `none`.

- A favorability of `explicit` means that the prediction outcomes in the `prediction_values` list explicitly declare which are favorable, by setting the `favorable` field to true.
- A favorability of `none` means that none of the prediction outcomes in the `prediction_values` list is favorable; therefore, entries in the `prediction_values` list may not have the `favorable` field set to true.
- A favorability of `ordered` specifies that the prediction outcomes in the `prediction_values` list are ordered from most favorable to least favorable. At most two counterfactuals are generated per observation for the `explanation` report (similar to regression) if this is set. The `last_favorable_prediction` field (see the table above) can be used in this case to specify which of the prediction outcomes are deemed favorable.
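An `ordered` classification favorability might be sketched as follows, where every outcome down to `last_favorable_prediction` counts as favorable (the grade names and values are hypothetical):

```yaml
evaluation:
  prediction_favorability: ordered
  # outcomes listed from most to least favorable
  prediction_values:
    - name: Grade A
      value: A
    - name: Grade B
      value: B
    - name: Grade C
      value: C
  last_favorable_prediction: B       # A and B are favorable; C is not
```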
Notes for No Model Access:
Information about no model access scanning in Certifai can be found here.
When `evaluation.no_model_access` is True, Certifai is only able to evaluate your model based on prior model predictions. The model predictions must be included in each dataset. Additionally:

- Only a single entry is allowed under the `models` section.
- The model metadata must be provided because it is used to manage scan results in Certifai.
- The `predicted_outcome_column` must be specified in the `dataset_schema` section.
- Each dataset must have a column matching the `dataset_schema.predicted_outcome_column` field.

The following evaluations are supported when `evaluation.no_model_access` is True:

- `performance`. The test dataset must include the predicted outcome column for the model.
- Non-burden Fairness metrics. The evaluation dataset must include the predicted outcome column for the model.
- The counterfactual search variant of `explanation` evaluations. The explanation dataset must include the predicted outcome column for the model.
- `data_statistics`. The evaluation and reference datasets can optionally include the predicted outcome column for the model.
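A minimal sketch of a no-model-access scan, assuming the datasets already carry a predicted-outcome column (the ids and column names are illustrative):

```yaml
models:
  - model_id: logit                  # only one model entry is allowed
    name: Logistic Regression
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
evaluation:
  no_model_access: true
  evaluation_types:
    - performance
    - fairness                       # non-burden metrics only
  evaluation_dataset_id: eval
  test_dataset_id: test
```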
The following evaluations are not supported when `evaluation.no_model_access` is True:
- Robustness
- Explainability
- Burden based Fairness
- SHAP based Explanations
Examples
```yaml
# binary-classification evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: age
    - name: status
  feature_restrictions:
    - feature_name: age
      restriction_string: no changes
    - feature_name: status
      restriction_string: no changes
  name: Baseline evaluation of 4 models
  prediction_description: Will a loan be granted?
  prediction_favorability: explicit
  prediction_values:
    - favorable: true
      name: Loan Granted
      value: 1
    - favorable: false
      name: Loan Denied
      value: 2
```
```yaml
# multiclass-classification evaluation
evaluation:
  name: Baseline evaluation of 3 models
  description: This evaluation compares the robustness, fairness, performance and explainability for 3 candidate models.
  evaluation_dataset_id: eval
  explanation_dataset_id: explan
  evaluation_types:
    - robustness
    - fairness
    - explainability
    - explanation
    - performance
  prediction_favorability: explicit
  prediction_values:
    - name: "Heart disease not detected"
      value: 0
      favorable: true
    - name: "Stage 1: > 50% diameter narrowing in a major vessel"
      value: 1
      favorable: false
    - name: "Stage 2: > 50% diameter narrowing in a major vessel"
      value: 2
      favorable: false
    - name: "Stage 3: > 50% diameter narrowing in a major vessel"
      value: 3
      favorable: false
    - name: "Stage 4: > 50% diameter narrowing in a major vessel"
      value: 4
      favorable: false
  favorable_outcome_group_name: Heart Disease not detected
  unfavorable_outcome_group_name: Heart Disease detected
  prediction_description: "Indicator of heart disease level (angiographic disease status)"
  fairness_grouping_features:
    - name: sex
  feature_restrictions:
    - feature_name: sex
      restriction_string: no changes
```
```yaml
# regression evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 3 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: Marital Status
    - name: Gender
  favorable_outcome_value: increased
  feature_restrictions:
    - feature_name: Gender
      restriction_string: no changes
  name: Baseline evaluation of 3 models
  prediction_description: Amount of Settled Claim
  prediction_favorability: ordered
  regression_standard_deviation: 0.5
```
Fairness Grouping Features
The fairness grouping features field is only required when `fairness` is listed under the `evaluation.evaluation_types` field. Each fairness grouping feature defined has the following attributes:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | int | Optional | Column name or index in the dataset |
buckets | list[object] | Optional | An optional list of objects defining groups within the named features. If no buckets are specified, then Certifai treats each distinct value for that feature in the dataset as a separate class. The structure for specifying buckets depends on whether the feature is categorical or numerical. Refer to the notes below for details |
Buckets (fields) for categorical features:
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
values | list[string | int | boolean] | Optional | List of category values that belong in the bucket |
Buckets (fields) for numerical features
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
max | number | Optional | The maximum value allowed in the group. Values belong to the group with the lowest upper bound greater than or equal to the value, and exactly one bucket must omit an upper bound to act as a catch-all |
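The bucketing rule above can be sketched in Python. This is an illustrative helper, not part of Certifai's API: a value is placed in the bucket with the lowest `max` that is greater than or equal to it, and the single bucket without a `max` catches all larger values.

```python
def assign_bucket(value, buckets):
    """Return the description of the bucket a numeric value falls into.

    Buckets with a `max` are candidates when value <= max; the value goes
    to the candidate with the lowest upper bound. The single bucket
    without a `max` acts as the catch-all.
    """
    candidates = [b for b in buckets if "max" in b and value <= b["max"]]
    if candidates:
        return min(candidates, key=lambda b: b["max"])["description"]
    # No upper bound matched: fall through to the catch-all bucket
    catch_all = next(b for b in buckets if "max" not in b)
    return catch_all["description"]

# Buckets from the Age example below
buckets = [
    {"description": "<= 40 years old", "max": 40},
    {"description": "> 40 years old"},
]
assert assign_bucket(35, buckets) == "<= 40 years old"
assert assign_bucket(40, buckets) == "<= 40 years old"  # boundary is inclusive
assert assign_bucket(41, buckets) == "> 40 years old"
```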
Example
fairness_grouping_features:- name: gender- name: Age buckets: # buckets for numeric feature - description: "<= 40 years old" max: 40 - description: "> 40 years old"- name: marital-status buckets: # buckets for categorical feature - description: Single values: - marital-status_Never-married - description: Married values: - marital-status_Married-AF-spouse - marital-status_Married-civ-spouse - marital-status_Married-spouse-absent - description: Divorced values: - marital-status_Divorced - description: Widowed values: - marital-status_Widowed - description: Separated values: - marital-status_Separated
Feature Restrictions are specified at the evaluation level and are applied by Certifai to individual features when generating counterfactuals during the Explanation, Explainability, and burden-based Fairness evaluations.

All features are treated as having `no restrictions` by default, meaning that the corresponding feature in the generated counterfactual can take any value as defined in `dataset_schema.feature_schemas`. See the Dataset Schema section for more details.

Note that fairness grouping features implicitly have a restriction string of `no changes` applied during the Fairness evaluation.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | int | Optional | Column name or index in the dataset |
restriction_string | string | Optional | Restriction to apply on the feature. Must be one of: no restrictions , no changes , min/max , percentage , standard deviation , amount , value set , or value map . Defaults to no restrictions if not specified. |
restriction_numerical_percentage | number | Optional | Allowed percentage of change relative to the original feature value. Used in conjunction with restriction_string = percentage |
restriction_numerical_min | number | Optional | Minimum value the feature may take. Used in conjunction with restriction_string = min/max |
restriction_numerical_max | number | Optional | Maximum value the feature may take. Used in conjunction with restriction_string = min/max |
restriction_numerical_direction | string | Optional | Allowed direction of change relative to the original feature value. Only applicable for numeric features. Can be used in conjunction with feature restriction types of no restrictions , min/max , percentage , standard deviation , or amount . Must be one of: any , increase , or decrease . Defaults to any . |
restriction_numerical_amount | number | Optional | Fixed amount of change relative to the original feature value. Used in conjunction with restriction_string = amount . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_numerical_std | number | Optional | Allowed number of standard deviations of change relative to the original feature value. Used in conjunction with restriction_string = standard deviation . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_numerical_tolerance | number | Optional | An additional allowed amount of change for numeric feature values if all feature restrictions cannot be met. Only supported for the counterfactual search variant of the Explanation evaluation. The exact amount of change depends on restriction string. Can be used in conjunction with feature restriction types of no restrictions , min/max , percentage , standard deviation , or amount . |
restriction_value_set | list[string | int | boolean] | Optional | A subset of category values the feature may change to. Only applicable for categorical features and used in conjunction with restriction_string = value set . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_tolerance_value_set | list[string | int | boolean] | Optional | Additional category values any feature may change to, if all feature restrictions cannot be met. Used in conjunction with restriction_string = value set . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_value_map | list[object] | Optional | A list of objects specifying the allowed subset of feature values a given category value may change to. Category values not present in the mapping are allowed to change to any feature value. Only applicable for categorical features and used in conjunction with restriction_string = value map . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_tolerance_value_map | list[object] | Optional | An additional list of objects specifying the allowed subset of feature values a given category value may change to, if all feature restrictions cannot be met. Used in conjunction with restriction_string = value map . Only supported for the counterfactual search variant of the Explanation evaluation. |
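To illustrate how a few of these restrictions constrain a counterfactual value, here is a hypothetical Python checker. It is not Certifai's implementation; it only covers the `no changes`, `min/max`, and `percentage` restriction types combined with `restriction_numerical_direction`, using dictionaries that mirror `feature_restrictions` entries.

```python
def respects_restriction(original, proposed, restriction):
    """Check a proposed counterfactual value against one feature restriction.

    `restriction` mirrors a feature_restrictions entry; only a subset of
    restriction types is handled in this sketch.
    """
    # Direction applies across several numeric restriction types
    direction = restriction.get("restriction_numerical_direction", "any")
    if direction == "increase" and proposed < original:
        return False
    if direction == "decrease" and proposed > original:
        return False

    kind = restriction.get("restriction_string", "no restrictions")
    if kind == "no changes":
        return proposed == original
    if kind == "min/max":
        return (restriction["restriction_numerical_min"] <= proposed
                <= restriction["restriction_numerical_max"])
    if kind == "percentage":
        allowed = abs(original) * restriction["restriction_numerical_percentage"]
        return abs(proposed - original) <= allowed
    return True  # "no restrictions" and unhandled kinds

# term_length may only increase, by at most 25%
r = {"restriction_string": "percentage",
     "restriction_numerical_percentage": 0.25,
     "restriction_numerical_direction": "increase"}
assert respects_restriction(36, 45, r)       # +25% exactly: allowed
assert not respects_restriction(36, 46, r)   # more than +25%: rejected
assert not respects_restriction(36, 30, r)   # decrease: rejected
```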
Examples
```yaml
feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: marital
    restriction_string: no restrictions
```
```yaml
feature_restrictions: # restrictions for numeric features
  - feature_name: age
    restriction_string: 'min/max'
    restriction_numerical_min: 18
    restriction_numerical_max: 65
  - feature_name: term_length
    restriction_string: percentage
    restriction_numerical_percentage: 0.25
    restriction_numerical_direction: increase
  - feature_name: measure1
    restriction_string: standard deviation
    restriction_numerical_std: 0.5
  - feature_name: measure2
    restriction_string: amount
    restriction_numerical_amount: 10.0
    restriction_numerical_tolerance: 5.0
```
```yaml
feature_restrictions: # restrictions for categorical features
  - feature_name: age
    restriction_string: no changes
  - feature_name: grouping1 # values A - B
    restriction_string: value set
    restriction_value_set:
      - A
      - B
      - C
    restriction_tolerance_value_set:
      - D
  - feature_name: grouping2 # values 1 - 5
    restriction_string: value map
    restriction_value_map:
      - value: 1
        allowed_values:
          - 3
          - 5
      - value: 2
        allowed_values:
          - 4
      - value: 4
        allowed_values:
          - 2
    restriction_tolerance_value_map:
      - value: 1
        allowed_values:
          - 2
          - 4
```
Prediction Values
The `prediction_values` field is applicable only to classification (binary or multiclass) tasks.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | A human-readable name for this prediction (e.g. 'Has Diabetes') |
value | any | Optional | The output value from the ML model corresponding to this prediction (e.g. 1/0) |
favorable | boolean | Optional | Specifies whether the prediction is favorable. Defaults to false |
Examples
```yaml
prediction_values:
  - favorable: true
    name: Makes a deposit
    value: 1
  - favorable: false
    name: Does not make a deposit
    value: 0
```
```yaml
prediction_values: # boolean prediction values
  - value: true
    favorable: true
    name: "True"
  - value: false
    favorable: false
    name: "False"
```
Hyperparameters
The `hyperparameters` list can be used to modify the hyperparameters used by the Certifai engine. Each item in the list must have the following fields:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | Name of the hyperparameter |
value | any | Required | Value to apply to the hyperparameter |
Notes
ALERT
Hyperparameter tuning is intended for advanced users and can affect both the quality of the results produced and the overall evaluation time. For a detailed reference of the available engine hyperparameters, refer to the engine hyperparameter section of the Configuration File Reference.
Examples
```yaml
hyperparameters:
  - name: num_counterfactuals
    value: 3
  - name: sampling_boundary
    value: 0.05
```
Scoring
The `scoring` section of the scan definition is used to specify custom weights for computing Explainability scores and ATX scores.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
explainability | list[object] | Optional | A list of custom weights to be used when computing the explainability score. Refer to the Explainability section for details. |
aspect_weights | list[object] | Optional | A list of objects, each specifying name and value attributes. The value is the corresponding weight for the name component. Refer to the Aspect Weights section for details. |
Example
```yaml
scoring:
  explainability:
    - num_features: 1
      value: 100
    - num_features: 2
      value: 80
    - num_features: 3
      value: 50
    - num_features: 4
      value: 20
  aspect_weights:
    - name: "explainability"
      value: 1.0
    - name: "robustness"
      value: 0.5
    - name: "fairness"
      value: 1.0
    - name: "performance"
      value: 0.2
```
Explainability
The explainability weights field is a list of objects, each specifying `num_features` and `value` attributes. The `value` is the weight corresponding to the specified `num_features`.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
num_features | int | Optional | An integer between 1 and 10 |
value | number | Optional | A weight between 0 and 100 setting the corresponding weight for num_features |
The current default value is:
```yaml
explainability:
  - num_features: 1
    value: 100.0
  - num_features: 2
    value: 80.0
  - num_features: 3
    value: 50.0
  - num_features: 4
    value: 20.0
  - num_features: 5
    value: 0.0
  - num_features: 6
    value: 0.0
  - num_features: 7
    value: 0.0
  - num_features: 8
    value: 0.0
  - num_features: 9
    value: 0.0
  - num_features: 10
    value: 0.0
```
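This reference does not spell out how the weights produce a score. One plausible reading, offered purely as an illustrative assumption rather than Certifai's documented formula, is that explanations requiring fewer changed features earn higher weight, and the score averages the weight earned by each explanation:

```python
# Default num_features -> weight mapping from the table above
DEFAULT_WEIGHTS = {1: 100.0, 2: 80.0, 3: 50.0, 4: 20.0,
                   5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0}

def explainability_score(features_changed, weights=DEFAULT_WEIGHTS):
    """Average weight over a batch of counterfactual explanations.

    `features_changed` lists, per explanation, how many features had to
    change; counts above 10 are clamped. Illustrative formula only.
    """
    earned = [weights[min(n, 10)] for n in features_changed]
    return sum(earned) / len(earned)

# Four explanations needing 1, 1, 2, and 3 feature changes
assert explainability_score([1, 1, 2, 3]) == (100.0 + 100.0 + 80.0 + 50.0) / 4
```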
Aspect Weights
The aspect weights section is a list of objects that specifies the weightings used when computing the ATX score.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the ATX aspect; must be one of: explainability , robustness , fairness , or performance |
value | number | Optional | A non-negative numerical weight for the component |
The current default value is:
```yaml
aspect_weights:
  - name: explainability
    value: 1.0
  - name: robustness
    value: 1.0
  - name: fairness
    value: 1.0
  - name: performance
    value: 1.0
```
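The ATX formula itself is not defined in this reference. Assuming, for illustration only, that ATX is a weighted average of the per-aspect scores, the default weights above would combine them like this:

```python
def atx_score(aspect_scores, aspect_weights):
    """Weighted average of per-aspect scores.

    An illustrative formula under the weighted-average assumption,
    not Certifai's documented computation.
    """
    total_weight = sum(aspect_weights[name] for name in aspect_scores)
    weighted = sum(score * aspect_weights[name]
                   for name, score in aspect_scores.items())
    return weighted / total_weight

# With the default weights (all 1.0), ATX is the plain average
weights = {"explainability": 1.0, "robustness": 1.0,
           "fairness": 1.0, "performance": 1.0}
scores = {"explainability": 80.0, "robustness": 60.0,
          "fairness": 90.0, "performance": 70.0}
assert atx_score(scores, weights) == 75.0
```

Under this reading, setting an aspect's weight to 0 removes it from the ATX score entirely, while weights between 0 and 1 merely reduce its influence.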