Scan Definition Reference
This document provides a detailed description of the scan definition file that serves as the basis for running a scan in Certifai.
Scan Definition Fields
The scan definition is written in YAML and consists of the following sections, each of which is detailed below:
- scan
- model_use_case
- models
- model_secret
- model_headers
- datasets
- dataset_schema
- evaluation
- scoring
To refer to different sections of the scan definition, this doc generally uses the syntax `section.field`. For example, `model_use_case.task_type` refers to the `task_type` field in the `model_use_case` section of the scan definition.
For examples of scan definitions, refer to existing definitions in the Certifai toolkit under `examples/definitions/`.
Scan
The `scan` section of the scan definition contains general information for the scan job.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
output | object (key=`path`) | Optional | An object for specifying where to output the resulting reports of the scan. The path must be a string that specifies either a local path (e.g. `./reports`, `file:/tmp/reports`) OR a supported cloud storage path (e.g. `gs://<some-path>`, `s3://<some-bucket>`). Paths are relative to the location of the scan definition. |
Examples:
```yaml
scan:
  output:
    path: ./reports
```
```yaml
scan:
  output:
    path: "s3://certifai/reports"
```
Notes
There are different methods of specifying the output path to write reports to:

- The `--output` flag in the Scanner CLI
- The value of the `SCAN_RESULTS_DIRECTORY` environment variable
- The output path in the scan definition
- The default value of `./reports` (relative to the scan definition file)
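For instance, the environment-variable method above can be set from a shell before invoking the scanner (the bucket path here is purely illustrative):

```shell
# Point scan reports at cloud storage via the environment variable
# instead of the scan definition (path value is illustrative).
export SCAN_RESULTS_DIRECTORY=s3://certifai/reports
```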
Model Use Case
The `model_use_case` section of the scan definition contains general information about the problem being solved by your machine learning models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_use_case_id | string | Required | A unique identifier to group the scans related to a specific model use case. The value must be unique across organizations so that it can be used in reporting services. The value is used when creating the model use case folder (after being modified to contain filename safe characters) that reports are written to. |
task_type | string | Required | The type of machine learning problem being solved. Must be one of the following: binary-classification , regression , or multiclass-classification |
performance_metrics | list[object] | Optional | List of performance metrics that apply to the model use case. |
atx_performance_metric_name | string | Optional | Name of the performance metric to be used in the ATX score for this use case |
description | string | Optional | Description of the problem being solved |
author | string | Optional | Information about the author of the model use case (e.g. email, name) |
name | string | Required | Name of the model use case - intended to be more human friendly than the model_use_case_id |
Example
```yaml
model_use_case:
  model_use_case_id: c12e/datasciencelab/german_credit
  task_type: binary-classification
  name: 'Banking: Loan Approval'
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person
    who takes a credit loan from a bank. The learning task is to classify each
    person as either a good or bad credit risk according to the set of attributes.

    This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

    The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
  atx_performance_metric_name: Accuracy
  performance_metrics:
    - metric: Accuracy
      name: Accuracy
    - metric: Recall
      name: Recall
    - metric: Precision
      name: Precision
```
Performance Metrics
The `performance_metrics` section of the `model_use_case` lists user-defined metrics that may be added in Certifai to track specific use case concerns.

Defining performance metrics at the model use case level does not mean that a performance evaluation is part of a particular scan. This field only defines the performance metrics that apply to the use case as a whole. To include a performance evaluation as part of your scan, you must list it under the `evaluation.evaluation_types` field of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | A unique name that serves as a reference to this metric, and with which a value is associated in the performance report |
metric | string | Optional | A specifier of the metric to be calculated, if it is to be calculated by Certifai. See notes below for details |
Example
```yaml
performance_metrics:
  - name: Accuracy
    metric: Accuracy
  - name: Micro Recall
    metric: Recall(micro)
  - name: My Bespoke Metric
```
Notes
The `metric` specifier selects a scoring algorithm. It has the general form `<algorithm_family_name>[(variant)]`, consisting of a name for the metric scoring algorithm with an optional minor variant within it. The currently supported set of families is:

- `Accuracy`
- `Precision`
- `Recall`
- `F1`
- `R-squared`
The `Precision`, `Recall`, and `F1` families support the optional variants `micro` and `macro`. Micro variants are equivalent to the undecorated base family for non-multi-label problems, but are included for future use.
`Precision`, `Recall`, and `F1` (along with their variants) are classification metrics and apply to both binary and multi-class cases. In the binary case (`model_use_case.task_type` set to `binary-classification`), the metric is evaluated with respect to the 'true' label being the label specified as the favorable outcome.
`R-squared` is a regression metric, and requires numerical model output. Supported aliases for `R-squared` are: `Rsq`, `R squared`, and `R2`.
To specify a variant, use the syntax `<algorithm_family_name>(<variant>)`. For example, `Precision(micro)` specifies the `micro` variant of `Precision`.
Metric family names and variants are case-insensitive, and Certifai supports aliases for some of the metric families.
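Putting the notes above together, a sketch of a `performance_metrics` list that exercises both a variant and an alias (the metric display names are illustrative):

```yaml
performance_metrics:
  - name: Macro F1          # macro variant of the F1 family
    metric: F1(macro)
  - name: R-squared         # R2 is a supported alias for R-squared
    metric: R2
```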
Models
The `models` section of the scan definition specifies which models may be evaluated as part of the scan. Each model you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_id | string | Required | An alphanumeric string (including underscores), used to uniquely identify a model within the scan |
name | string | Required | Name of the model - intended to be more human friendly than the model_id |
author | string | Optional | Information about the author of this model |
version | string | Optional | Field for tracking version of model |
description | string | Optional | Description of model |
model_id_tag | string | Optional | Tag that can be used to store extra metadata about your trained model, such as an id from an ML pipeline or a git commit hash (which wouldn't be suitable as a model_id ) |
predict_endpoint | string | Optional | An http address pointing to where your ML model is being hosted. See the below notes for details on how Certifai communicates with your model. This field is required unless evaluation.no_model_access is True. See the Evaluation section for more details. |
max_batch_size | int | Optional | The maximum number of input rows (predictions) that the model supports in a single HTTP request. If not specified, the maximum prediction size can be as many entries as are in the dataset |
supports_soft_scoring | boolean | Optional | Whether the model supports confidence-scored predictions (e.g. probabilities). This field is only applicable to classification task types. The default value is false . See notes below for how this affects the model prediction format. |
prediction_value_order | list[any] | Optional | Ordering of class labels for soft scoring predictions returned by the model. This field is only applicable if supports_soft_scoring is true . Assuming the prediction values (i.e. values being returned by the model) are 1 and 2, then a prediction_value_order of [2,1] means that the soft outputs returned by your model contain the scores for the value 2 first |
performance_metric_values | list[object] | Optional | A list of asserted values for the specified metrics that are not to be calculated by Certifai. |
json_strict | boolean | Optional | Whether the model expects data to be encoded in strict JSON (missing values will encode as null ). If false then JavaScript encoding extensions will be used, and missing values will be encoded as NaN . This field is only applicable to remote models. The default value is false . |
Example
```yaml
models:
  - model_id: logit
    name: Logistic Regression
    predict_endpoint: http://127.0.0.1:5111/german_credit_logit/predict
  - model_id: svm
    author: 'info@cognitivescale.com'
    description: Scikit-learn SVC classifier
    name: Support Vector Classifier
    predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
    supports_soft_scoring: true
    json_strict: false
    performance_metric_values:
      - name: My Bespoke Metric
        value: 0.748
```
Notes
Performance metric values are a list of objects, each consisting of a `name` field and a `value` field. The `name` must correspond to a performance metric defined at the `model_use_case` level. The `value` must be a number between 0 and 1 (inclusive).

The full set of metrics is specified in the Model Use Case section of the scan definition. If a metric in the `model_use_case` section has no `metric` value, or if the `evaluation` section specifies no `test_dataset_id`, then each model must provide a value for that metric in this list.
For general information and examples of how Certifai communicates with hosted models refer to the Certifai Reference Model repo.
Model Secrets
Admins can create model secrets in Kubernetes (containing a name and a key, the `model_id` value). Those K8s secrets are referenced from the scan definition by specifying `${model_secret.<model_id value>}` in the model headers.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_secret | string | Optional | The model_secret is created in Kubernetes. After you add the model_secret field in the scan definition, a reference such as ${dotted.field.ref} in the model headers will use the specified field 'dotted.field.ref' taken from the specified secret. |
Example
```yaml
model_secret: <your-k8s-secret-name>
```
See: Model Secrets for more details.
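As a sketch of how the pieces fit together, a secret can be combined with a defined model header that pulls a field from it (the secret name and model id below are hypothetical):

```yaml
model_secret: my-model-secrets        # hypothetical K8s secret name
model_headers:
  defined:
    - model_id: svm                   # hypothetical model id
      name: Authorization
      # resolves to the 'svm' field of the 'my-model-secrets' secret
      value: Bearer ${model_secret.svm}
```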
Model Headers
The `model_headers` section of the scan definition contains information for custom HTTP headers to be used when communicating with your models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
default | list[object] | Optional | A list of model headers that is applied to ALL models in the scan definition. |
defined | list[object] | Optional | A list of model headers that are applied to individual models based on their model_id |
Example
```yaml
model_headers:
  default:
    - name: Content-Type
      value: application/json
    - name: accept
      value: application/json
  defined:
    - model_id: svm
      name: Content-Type
      value: application/json
    - model_id: logit
      name: Authorization
      value: Bearer ${TOKEN}
```
Notes
Model headers must contain `name` and `value` string fields. Defined model headers must also specify a `model_id` field that corresponds to a model in the Models section. The `value` field can contain references to environment variables via the syntax `${<ENV-VARIABLE-NAME>}`.
Datasets
The `datasets` section of the scan definition lists the datasets that may be used as part of the evaluation. Each dataset you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
dataset_id | string | Required | An alphanumeric (including underscores) ID for referencing this dataset |
url | string | Required | A url pointing to the dataset file. Local system files must be prefixed with file: . Supported URL prefixes are file: , s3: , gs: , and abfs: |
file_type | string | Required | One of: csv or json . This selection drives an additional set of fields defined below |
encoding | string | Optional | The file encoding of the dataset. Supported encodings: [ascii, utf-16, utf-16-be, utf-16-le, utf-32, utf-32-be, utf-32-le, utf-7, utf-8, utf-8-sig, latin-1, iso-8859-1, windows-1252] |
description | string | Optional | Description of the dataset |
name | string | Optional | Name of the dataset - intended to be more human friendly than the dataset_id |
CSV Specific Fields
These fields are specific to CSV datasets, and are only referenced if `file_type` is `csv`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
has_header | boolean | Optional | Whether the csv file has column headers. Defaults to True |
delimiter | string | Optional | String to use as separator for the csv file. Defaults to , |
quote_character | string | Optional | A one character string used to denote the start and end of a quoted item. Defaults to " |
escape_character | string | Optional | A one character string used to escape other characters. Defaults to None |
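For example, a sketch of a CSV dataset entry that overrides the delimiter and quoting defaults (the dataset id, url, and delimiter values are illustrative):

```yaml
datasets:
  - dataset_id: eval
    name: Semicolon-delimited evaluation dataset   # illustrative
    file_type: csv
    url: file:./data/eval.csv                      # hypothetical path
    has_header: true
    delimiter: ';'
    quote_character: '"'
```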
JSON Specific Fields
These fields are specific to JSON datasets, and are only referenced if `file_type` is `json`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
orient | string | Optional | One of: records , values , or columns . These values are synonymous with pandas' read_json() orient argument. Defaults to records |
lines | boolean | Optional | Whether the JSON dataset is in json-lines format. Defaults to True |
Example
```yaml
datasets:
  ## json dataset
  - dataset_id: eval
    description: ''
    name: Evaluation dataset
    file_type: json
    url: file:test/data/german_credit_mini_records.json
    lines: true
    orient: records
  ## csv dataset
  - dataset_id: expl
    description: ''
    file_type: csv
    has_header: true
    name: 100 row explanation dataset
    url: file:test/data/german_credit_explan.csv
```
Notes
The Datasets section lists one or more dataset options that may be specified for your scans; in the Evaluation section of the scan definition you specify which datasets to use for a particular scan.
Dataset Schema
The `dataset_schema` section of the scan definition contains schema details about the defined datasets, such as which columns are exposed to the models and what values are allowed for individual features.

A full feature schema is inferred by Certifai from the datasets provided during the scan, so the `feature_schemas` field is entirely optional. However, any specified values are applied as overrides to the inferred schema. If the `outcome_column` and `predicted_outcome_column` are not specified, the dataset is assumed not to contain them.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
outcome_column | string | int | Optional | Name or index of the outcome column in the dataset |
predicted_outcome_column | string | int | Optional | Name or index of the predicted outcome column in the dataset |
hidden_columns | list[string] | Optional | A list of feature names in the dataset NOT to expose to models. To include a one-hot encoded feature as hidden, list only the feature name and NOT the encoded column names |
feature_schemas | list[object] | Optional | A list of schema definitions for features in the dataset. This list is treated as overrides to the auto-inferred schema by the scanner. It is entirely optional and does not have to include all features in the dataset (i.e. it may be sparsely defined). The expected objects in the list are described in the below Feature Schemas section |
defined_feature_order | boolean | Optional | A boolean specifying whether the feature_schemas field specifies the order of the features in the dataset. If set to True , the Feature Schemas section must contain all the features in the dataset. The default value is false for csv files and true for JSON datasets where orient is not set to columns . |
Examples
```yaml
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
  feature_schemas:
    - feature_name: age
    - feature_name: foreign
    - feature_name: purpose
      data_type: categorical
      category_values:
        - 'car (new)'
        - 'car (used)'
        - 'furniture/equipment'
        - 'radio/television'
      one_hot_columns:
        - name: 'purpose_car (new)'
          value: 'car (new)'
        - name: 'purpose_car (used)'
          value: 'car (used)'
        - name: 'purpose_furniture/equipment'
          value: 'furniture/equipment'
        - name: 'purpose_radio/television'
          value: 'radio/television'
  hidden_columns:
    - age
    - foreign
```
```yaml
dataset_schema:
  outcome_column: outcome
  defined_feature_order: true
  feature_schemas:
    - feature_name: checkingstatus
    - feature_name: duration
    - feature_name: history
    - feature_name: purpose
    - feature_name: amount
      min: 10
      max: 1000000000
      spread: 1.42
    - feature_name: savings
    - feature_name: employ
    - feature_name: installment
    - feature_name: status
    - feature_name: others
    - feature_name: residence
    - feature_name: property
    - feature_name: age
    - feature_name: otherplans
    - feature_name: housing
    - feature_name: cards
    - feature_name: job
    - feature_name: liable
    - feature_name: telephone
    - feature_name: foreign
    - feature_name: outcome
```
Notes
ALERT: Setting `defined_feature_order` to false for JSON datasets that do not have an `orient` of `columns` may cause the dataset features to be sent to your model in an incorrect order.

For JSON datasets with an `orient` of `records`, the features found in Feature Schemas provide the order of the dataframe.

For csv datasets and JSON datasets with an `orient` of `values` or `columns`, the names found in the Feature Schemas section specify/override the column names of the dataframe.
Feature Schemas
Each item in the `feature_schemas` list can have the following attributes, with the exact usage depending on the `data_type` of the feature.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | int | Required | Column name or index in the dataset being referred to |
data_type | string | Optional | One of: categorical , numerical-int , or numerical-float |
category_values | list[string | int | boolean] | Optional | The possible values for the category. This attribute is only used when the data_type is categorical |
one_hot_columns | list[object] | Optional | A list of mappings between (one-hot) column names in the dataset and values of the feature. If the schema defined_feature_order is True then this also specifies the column ordering. The set of values must match those specified in category_values |
target_encodings | list[float] | Optional | A list of numeric encoding values for the feature. The values must correspond one-to-one, in the same order, with those specified in category_values |
categorical_type | string | Optional | Data type to interpret the category values as. Must be one of: auto , string , or int . Defaults to auto , in which case the data type will be inferred. Should only be used when data_type is categorical |
min | number | Optional | Minimum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
max | number | Optional | Maximum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
spread | number | Optional | A typical magnitude of change - normally the mean absolute deviation or standard deviation. If not specified, an appropriate spread is estimated from the dataset. Should only be used when data_type is numerical-int or numerical-float |
Examples
Numeric features:
```yaml
feature_schemas:
  - feature_name: age
    data_type: numerical-int
    min: 18
    max: 65
    spread: 1.0
  - feature_name: measure
    data_type: numerical-float
    min: 0.0
    spread: 0.4
```
Categorical features without encodings:
```yaml
feature_schemas:
  - feature_name: age
    data_type: categorical
    category_values:
      - 25
      - 35
      - 45
      - 55
      - 65
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
  - feature_name: attribute
    data_type: categorical
    category_values:
      - true
      - false
```
Categorical feature with one-hot encoding:
```yaml
feature_schemas:
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
    one_hot_columns:
      - name: 'foreign_foreign - yes'
        value: 'foreign - yes'
      - name: 'foreign_foreign - no'
        value: 'foreign - no'
```
Categorical feature with target encoding:
```yaml
- feature_name: foreign
  data_type: categorical
  category_values:
    - "foreign - yes"
    - "foreign - no"
  target_encodings:
    - 0.3
    - 0.48
```
Categorical feature with one-hot encoding that is hidden from the model:
```yaml
dataset_schema:
  hidden_columns:
    - foreign
  feature_schemas:
    - feature_name: foreign
      data_type: categorical
      category_values:
        - "foreign - yes"
        - "foreign - no"
      one_hot_columns:
        - name: 'foreign_foreign - yes'
          value: 'foreign - yes'
        - name: 'foreign_foreign - no'
          value: 'foreign - no'
```
Notes
- One-hot encoding captures the way the dataset is encoded in the dataset source (e.g. a CSV file) and is expected by the model. It does not impact feature semantics apart from the expected encoding.
- Target encodings capture another possible way the dataset could be encoded in the dataset source and expected by the model. Target encoding is a technique for encoding categorical values numerically (usually by some statistical association with the ground truth value of the prediction). It does not impact feature semantics apart from the expected encoding. The list of `target_encodings` values should match the list of `category_values`, providing the corresponding encoding that has been used for each.
- Explanations in reports are surfaced as value-encoded for all categorical features, regardless of whether they are one-hot or target encoded in the dataset. This makes the explanations human-readable even for one-hot encoded datasets.
- The `categorical_type` field can be used to specify the data type for a categorical feature when there is possible ambiguity. For example, the value "01" could be interpreted either as the string `"01"` or as the integer `1`.
Evaluation
The `evaluation` section specifies the details for the evaluation that is run in the scan. Only a single evaluation can be specified in this section, but multiple evaluation types (e.g. `robustness`, `fairness`, `explanation`, `explainability`, `data_statistics`) can be run against all (or some) of the models specified in the Models section of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the evaluation being run |
description | string | Optional | Description of the evaluation |
environment | string | Optional | Field for tracking the environment the evaluation is being run in (e.g. DEV, QA, Compliance) |
evaluation_types | list[string] | Required | The list of evaluations to be performed on the models in this scan. Valid values include: robustness , fairness , explanation , explainability , performance , and data_statistics |
evaluation_dataset_id | string | Required | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for the Robustness, Fairness, and Explainability evaluations |
explanation_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for explanation evaluation, and is only required if explanation is listed as an evaluation_type |
test_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for computing performance metrics, and is only required if the model_use_case section lists performance metrics to be evaluated by Certifai |
reference_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used as the reference for computing data quality metrics and drift metrics for the evaluation dataset. This field is only required if data_statistics is an evaluation_type |
fairness_grouping_features | list[object] | Optional | List of features and groups within those features to use for calculating Fairness. See the Fairness Grouping Features section for details. |
fairness_metrics | list[string] | Optional | List of Fairness metrics to be calculated/applied as part of the Fairness evaluation. Possible values include: (case insensitive): "Demographic_parity" ("Demographic parity", "Demographic"), "Equal_opportunity" ("equal opportunity", "opportunity"), "Equal_odds" ("odds", "equal odds"), "Sufficiency" ("predictive rate parity"), and "Burden". The default value for this field is [burden] . |
primary_fairness_metric | string | Optional | The Fairness metric to use as the Fairness aspect for calculating the ATX score |
explanation_types | list[string] | Optional | List of explanation types to be used in the Explainability and Explanation evaluations. Possible explanation types include (case insensitive): "counterfactual" ("burden") and "shap". The default value is [counterfactual] . |
primary_explanation_type | string | Optional | The explanation type to use as the Explainability aspect for calculating the ATX score |
feature_restrictions | list[object] | Optional | A list of feature restrictions to apply to generated counterfactuals. See the Feature Restrictions section for details. |
hyperparameters | list[object] | Optional | A list of objects with name and value attributes. These parameters can be used to override the default engine hyperparameters. See the Hyperparameters section for details. |
prediction_description | string | Optional | Description of what is being predicted by the models |
prediction_favorability | string | Optional | The prediction format for favorable prediction outcomes in the scan definition. The field has different possible values and behaviors depending on the task type, see task type specific fields below for details. If not specified, the value is inferred based on the prediction information in the scan definition. |
save_counterfactuals | boolean | Optional | Whether to save generated counterfactuals in a separate CSV file for the Robustness, Fairness, and Explainability reports. SHAP explanations will be saved in a separate CSV file as well for the Explainability report. Defaults to false . |
no_model_access | boolean | Optional | Whether the scan does not have direct access to the model for predictions. Only a limited set of evaluations are supported when set to True. See the below notes for further details. Defaults to false . |
monitored_features | list[string | int] | Optional | List of feature names or indices to use for calculating data quality and drift metrics in the data_statistics evaluation type. If no features are specified and model predictions are available, then Certifai computes metrics for only the 50 most important features. Otherwise, metrics are computed for all features. |
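As a sketch, a `data_statistics`-only evaluation that monitors a handful of features against a reference dataset (the dataset ids and feature names are illustrative):

```yaml
evaluation:
  name: Drift check                  # illustrative
  evaluation_types:
    - data_statistics
  evaluation_dataset_id: eval        # current data to check
  reference_dataset_id: ref          # baseline for drift metrics
  monitored_features:
    - age
    - amount
```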
Regression Specific Fields:
Set these attributes for regression use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
regression_boundary_type | string | Optional | Indicates how the boundary is defined. Valid values are absolute and relative (default relative ) |
regression_standard_deviation | number | Optional | Only used with regression_boundary_type of relative . Amount of change of prediction required for the analysis to consider it sufficient to be a counterfactual, in units of the standard deviation of the predicted scores for the entire dataset. If omitted, a value of 0.5 is used |
regression_boundary | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary_percentile . Exact value of outcome that separates favorable from unfavorable |
regression_boundary_percentile | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary . Percentile of outcome value distribution (as empirically measured by the evaluation dataset) that separates favorable from unfavorable |
favorable_outcome_value | string | Optional | Favorable direction for regression task type. Possible values are: increased and decreased . This field must be left empty if the prediction_favorability is none . |
For regression tasks, `prediction_favorability` is either `ordered` or `none`.

- A favorability of `none` specifies that there is no favorable direction (increased or decreased) for the predictions. The `favorable_outcome_value` field is not set in this case.
- A favorability of `ordered` specifies that there is a favorable direction (either increased or decreased) for the predictions. The `favorable_outcome_value` field must be set in this case.
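The regression boundary fields above can be combined as in this sketch, which treats predictions above a fixed threshold as favorable (the threshold value is illustrative):

```yaml
evaluation:
  prediction_favorability: ordered
  favorable_outcome_value: increased
  regression_boundary_type: absolute
  regression_boundary: 50000         # illustrative threshold
```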
Classification Specific Fields:
Set these attributes for classification use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
prediction_values | list[object] | Optional | A list of prediction outcomes that can be returned by the model. The attributes of predictions are described in the Prediction Values section. |
last_favorable_prediction | string | Optional | The value of the last favorable prediction outcome in the prediction_values list. It must correspond to a prediction outcome in the prediction_values list. Applicable only when prediction_favorability is ordered |
favorable_outcome_group_name | string | Optional | A name describing the group of predictions that are favorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
unfavorable_outcome_group_name | string | Optional | A name describing the group of predictions that are unfavorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
For classification tasks, `prediction_favorability` may be one of: `explicit`, `ordered`, or `none`.

- A favorability of `explicit` means that the prediction outcomes in the `prediction_values` list explicitly declare which are favorable, by setting the `favorable` field to true.
- A favorability of `none` means that none of the prediction outcomes in the `prediction_values` list is favorable; therefore, entries in the `prediction_values` list may not have the `favorable` field set to true.
- A favorability of `ordered` specifies that the prediction outcomes in the `prediction_values` list are ordered from most favorable to least favorable. At most two counterfactuals are generated per observation for the `explanation` report (similar to regression) if this is set. The `last_favorable_prediction` field (see the table above) can be used in this case to specify which of the prediction outcomes are deemed favorable.
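An `ordered` classification favorability might be sketched as follows, where every outcome down to `last_favorable_prediction` counts as favorable (the grade names and values are hypothetical):

```yaml
evaluation:
  prediction_favorability: ordered
  # outcomes listed from most to least favorable
  prediction_values:
    - name: Grade A
      value: A
    - name: Grade B
      value: B
    - name: Grade C
      value: C
  last_favorable_prediction: B       # A and B are favorable; C is not
```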
Notes for No Model Access:
Information about no model access scanning in Certifai can be found here.
When `evaluation.no_model_access` is True, Certifai is only able to evaluate your model based on prior model predictions. The model predictions must be included in each dataset. Additionally:

- Only a single entry is allowed under the `models` section.
- The model metadata must be provided because it is used to manage scan results in Certifai.
- The `predicted_outcome_column` must be specified in the `dataset_schema` section.
- Each dataset must have a column matching the `dataset_schema.predicted_outcome_column` field.

The following evaluations are supported when `evaluation.no_model_access` is True:

- `performance`. The test dataset must include the predicted outcome column for the model.
- Non-burden Fairness metrics. The evaluation dataset must include the predicted outcome column for the model.
- The counterfactual search variant of `explanation` evaluations. The explanation dataset must include the predicted outcome column for the model.
- `data_statistics`. The evaluation and reference datasets can optionally include the predicted outcome column for the model.
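A minimal sketch of a no-model-access scan, assuming the datasets already carry a predicted-outcome column (the ids and column names are illustrative):

```yaml
models:
  - model_id: logit                  # only one model entry is allowed
    name: Logistic Regression
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
evaluation:
  no_model_access: true
  evaluation_types:
    - performance
    - fairness                       # non-burden metrics only
  evaluation_dataset_id: eval
  test_dataset_id: test
```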
The following evaluations are not supported when `evaluation.no_model_access` is True:
- Robustness
- Explainability
- Burden based Fairness
- SHAP based Explanations
Examples
```yaml
# binary-classification evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: age
    - name: status
  feature_restrictions:
    - feature_name: age
      restriction_string: no changes
    - feature_name: status
      restriction_string: no changes
  name: Baseline evaluation of 4 models
  prediction_description: Will a loan be granted?
  prediction_favorability: explicit
  prediction_values:
    - favorable: true
      name: Loan Granted
      value: 1
    - favorable: false
      name: Loan Denied
      value: 2
```
```yaml
# multiclass-classification evaluation
evaluation:
  name: Baseline evaluation of 3 models
  description: This evaluation compares the robustness, fairness, performance and explainability for 3 candidate models.
  evaluation_dataset_id: eval
  explanation_dataset_id: explan
  evaluation_types:
    - robustness
    - fairness
    - explainability
    - explanation
    - performance
  prediction_favorability: explicit
  prediction_values:
    - name: "Heart disease not detected"
      value: 0
      favorable: true
    - name: "Stage 1: > 50% diameter narrowing in a major vessel"
      value: 1
      favorable: false
    - name: "Stage 2: > 50% diameter narrowing in a major vessel"
      value: 2
      favorable: false
    - name: "Stage 3: > 50% diameter narrowing in a major vessel"
      value: 3
      favorable: false
    - name: "Stage 4: > 50% diameter narrowing in a major vessel"
      value: 4
      favorable: false
  favorable_outcome_group_name: Heart Disease not detected
  unfavorable_outcome_group_name: Heart Disease detected
  prediction_description: "Indicator of heart disease level (angiographic disease status)"
  fairness_grouping_features:
    - name: sex
  feature_restrictions:
    - feature_name: sex
      restriction_string: no changes
```
```yaml
# regression evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 3 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: Marital Status
    - name: Gender
  favorable_outcome_value: increased
  feature_restrictions:
    - feature_name: Gender
      restriction_string: no changes
  name: Baseline evaluation of 3 models
  prediction_description: Amount of Settled Claim
  prediction_favorability: ordered
  regression_standard_deviation: 0.5
```
Fairness Grouping Features
The fairness grouping features field is only required when `fairness` is listed under the `evaluation.evaluation_types` field. Each fairness grouping feature defined has the following attributes:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | int | Optional | Column name or index in the dataset |
buckets | list[object] | Optional | An optional list of objects defining groups within the named features. If no buckets are specified, then Certifai treats each distinct value for that feature in the dataset as a separate class. The structure for specifying buckets depends on whether the feature is categorical or numerical. Refer to the notes below for details |
Buckets (fields) for categorical features:
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
values | list[string | int | boolean] | Optional | List of category values that belong in the bucket |
Buckets (fields) for numerical features
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
max | number | Optional | The maximum value allowed in the group. Values belong to the group with the lowest upper bound greater than or equal to the value, and exactly one bucket must omit an upper bound to act as a catch-all |
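The bucketing rule above can be sketched in Python. This is an illustrative helper, not part of Certifai's API: a value is placed in the bucket with the lowest `max` that is greater than or equal to it, and the single bucket without a `max` catches all larger values.

```python
def assign_bucket(value, buckets):
    """Return the description of the bucket a numeric value falls into.

    Buckets with a `max` are candidates when value <= max; the value goes
    to the candidate with the lowest upper bound. The single bucket
    without a `max` acts as the catch-all.
    """
    candidates = [b for b in buckets if "max" in b and value <= b["max"]]
    if candidates:
        return min(candidates, key=lambda b: b["max"])["description"]
    # No upper bound matched: fall through to the catch-all bucket
    catch_all = next(b for b in buckets if "max" not in b)
    return catch_all["description"]

# Buckets from the Age example below
buckets = [
    {"description": "<= 40 years old", "max": 40},
    {"description": "> 40 years old"},
]
assert assign_bucket(35, buckets) == "<= 40 years old"
assert assign_bucket(40, buckets) == "<= 40 years old"  # boundary is inclusive
assert assign_bucket(41, buckets) == "> 40 years old"
```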
Example
fairness_grouping_features:- name: gender- name: Age buckets: # buckets for numeric feature - description: "<= 40 years old" max: 40 - description: "> 40 years old"- name: marital-status buckets: # buckets for categorical feature - description: Single values: - marital-status_Never-married - description: Married values: - marital-status_Married-AF-spouse - marital-status_Married-civ-spouse - marital-status_Married-spouse-absent - description: Divorced values: - marital-status_Divorced - description: Widowed values: - marital-status_Widowed - description: Separated values: - marital-status_Separated
Feature Restrictions are specified at the evaluation level and are applied by Certifai to individual features when generating counterfactuals during the Explanation, Explainability, and burden-based Fairness evaluations.

All features are treated as having `no restrictions` by default, meaning that the corresponding feature in the generated counterfactual can take any value as defined in `dataset_schema.feature_schemas`. See the Dataset Schema section for more details.

Note that fairness grouping features implicitly have a restriction string of `no changes` applied during the Fairness evaluation.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | int | Optional | Column name or index in the dataset |
restriction_string | string | Optional | Restriction to apply on the feature. Must be one of: no restrictions , no changes , min/max , percentage , standard deviation , amount , value set , or value map . Defaults to no restrictions if not specified. |
restriction_numerical_percentage | number | Optional | Allowed percentage of change relative to the original feature value. Used in conjunction with restriction_string = percentage |
restriction_numerical_min | number | Optional | Minimum value the feature may take. Used in conjunction with restriction_string = min/max |
restriction_numerical_max | number | Optional | Maximum value the feature may take. Used in conjunction with restriction_string = min/max |
restriction_numerical_direction | string | Optional | Allowed direction of change relative to the original feature value. Only applicable for numeric features. Can be used in conjunction with feature restriction types of no restrictions , min/max , percentage , standard deviation , or amount . Must be one of: any , increase , or decrease . Defaults to any . |
restriction_numerical_amount | number | Optional | Fixed amount of change relative to the original feature value. Used in conjunction with restriction_string = amount . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_numerical_std | number | Optional | Allowed number of standard deviations of change relative to the original feature value. Used in conjunction with restriction_string = standard deviation . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_numerical_tolerance | number | Optional | An additional allowed amount of change for numeric feature values if all feature restrictions cannot be met. Only supported for the counterfactual search variant of the Explanation evaluation. The exact amount of change depends on restriction string. Can be used in conjunction with feature restriction types of no restrictions , min/max , percentage , standard deviation , or amount . |
restriction_value_set | list[string | int | boolean] | Optional | A subset of category values the feature may change to. Only applicable for categorical features and used in conjunction with restriction_string = value set . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_tolerance_value_set | list[string | int | boolean] | Optional | Additional category values any feature may change to, if all feature restrictions cannot be met. Used in conjunction with restriction_string = value set . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_value_map | list[object] | Optional | A list of objects specifying the allowed subset of feature values a given category value may change to. Category values not present in the mapping are allowed to change to any feature value. Only applicable for categorical features and used in conjunction with restriction_string = value map . Only supported for the counterfactual search variant of the Explanation evaluation. |
restriction_tolerance_value_map | list[object] | Optional | An additional list of objects specifying the allowed subset of feature values a given category value may change to, if all feature restrictions cannot be met. Used in conjunction with restriction_string = value map . Only supported for the counterfactual search variant of the Explanation evaluation. |
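To illustrate how a few of these restrictions constrain a counterfactual value, here is a hypothetical Python checker. It is not Certifai's implementation; it only covers the `no changes`, `min/max`, and `percentage` restriction types combined with `restriction_numerical_direction`, using dictionaries that mirror `feature_restrictions` entries.

```python
def respects_restriction(original, proposed, restriction):
    """Check a proposed counterfactual value against one feature restriction.

    `restriction` mirrors a feature_restrictions entry; only a subset of
    restriction types is handled in this sketch.
    """
    # Direction applies across several numeric restriction types
    direction = restriction.get("restriction_numerical_direction", "any")
    if direction == "increase" and proposed < original:
        return False
    if direction == "decrease" and proposed > original:
        return False

    kind = restriction.get("restriction_string", "no restrictions")
    if kind == "no changes":
        return proposed == original
    if kind == "min/max":
        return (restriction["restriction_numerical_min"] <= proposed
                <= restriction["restriction_numerical_max"])
    if kind == "percentage":
        allowed = abs(original) * restriction["restriction_numerical_percentage"]
        return abs(proposed - original) <= allowed
    return True  # "no restrictions" and unhandled kinds

# term_length may only increase, by at most 25%
r = {"restriction_string": "percentage",
     "restriction_numerical_percentage": 0.25,
     "restriction_numerical_direction": "increase"}
assert respects_restriction(36, 45, r)       # +25% exactly: allowed
assert not respects_restriction(36, 46, r)   # more than +25%: rejected
assert not respects_restriction(36, 30, r)   # decrease: rejected
```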
Examples
```yaml
feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: marital
    restriction_string: no restrictions
```
```yaml
feature_restrictions: # restrictions for numeric features
  - feature_name: age
    restriction_string: 'min/max'
    restriction_numerical_min: 18
    restriction_numerical_max: 65
  - feature_name: term_length
    restriction_string: percentage
    restriction_numerical_percentage: 0.25
    restriction_numerical_direction: increase
  - feature_name: measure1
    restriction_string: standard deviation
    restriction_numerical_std: 0.5
  - feature_name: measure2
    restriction_string: amount
    restriction_numerical_amount: 10.0
    restriction_numerical_tolerance: 5.0
```
```yaml
feature_restrictions: # restrictions for categorical features
  - feature_name: age
    restriction_string: no changes
  - feature_name: grouping1 # values A - B
    restriction_string: value set
    restriction_value_set:
      - A
      - B
      - C
    restriction_tolerance_value_set:
      - D
  - feature_name: grouping2 # values 1 - 5
    restriction_string: value map
    restriction_value_map:
      - value: 1
        allowed_values:
          - 3
          - 5
      - value: 2
        allowed_values:
          - 4
      - value: 4
        allowed_values:
          - 2
    restriction_tolerance_value_map:
      - value: 1
        allowed_values:
          - 2
          - 4
```
Prediction Values
The `prediction_values` field is applicable only to classification (binary or multiclass) tasks.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | A human-readable name for this prediction (e.g. 'Has Diabetes') |
value | any | Optional | The output value from the ML model corresponding to this prediction (e.g. 1/0) |
favorable | boolean | Optional | Specifies whether the prediction is favorable. Defaults to false |
Examples
```yaml
prediction_values:
  - favorable: true
    name: Makes a deposit
    value: 1
  - favorable: false
    name: Does not make a deposit
    value: 0
```
```yaml
prediction_values: # boolean prediction values
  - value: true
    favorable: true
    name: "True"
  - value: false
    favorable: false
    name: "False"
```
Hyperparameters
The `hyperparameters` list can be used to modify the hyperparameters used by the Certifai engine. Each item in the list must have the following fields:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | Name of the hyperparameter |
value | any | Required | Value to apply to the hyperparameter |
Notes
ALERT
Hyperparameter tuning is intended for advanced users and can affect both the quality of the results produced and the overall evaluation time. For a detailed reference of the available engine hyperparameters, refer to the engine hyperparameter section of the Configuration File Reference.
Examples
```yaml
hyperparameters:
  - name: num_counterfactuals
    value: 3
  - name: sampling_boundary
    value: 0.05
```
Scoring
The `scoring` section of the scan definition is used to specify custom weights for computing Explainability scores and ATX scores.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
explainability | list[object] | Optional | A list of custom weights to be used when computing the explainability score. Refer to the Explainability section for details. |
aspect_weights | list[object] | Optional | A list of objects, each specifying name and value attributes. The value is the corresponding weight for the name component. Refer to the Aspect Weights section for details. |
Example
```yaml
scoring:
  explainability:
    - num_features: 1
      value: 100
    - num_features: 2
      value: 80
    - num_features: 3
      value: 50
    - num_features: 4
      value: 20
  aspect_weights:
    - name: "explainability"
      value: 1.0
    - name: "robustness"
      value: 0.5
    - name: "fairness"
      value: 1.0
    - name: "performance"
      value: 0.2
```
Explainability
The explainability weights field is a list of objects, each specifying `num_features` and `value` attributes. The `value` is the weight corresponding to the specified `num_features`.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
num_features | int | Optional | An integer between 1 and 10 |
value | number | Optional | A weight between 0 and 100 setting the corresponding weight for num_features |
The current default value is:
```yaml
explainability:
  - num_features: 1
    value: 100.0
  - num_features: 2
    value: 80.0
  - num_features: 3
    value: 50.0
  - num_features: 4
    value: 20.0
  - num_features: 5
    value: 0.0
  - num_features: 6
    value: 0.0
  - num_features: 7
    value: 0.0
  - num_features: 8
    value: 0.0
  - num_features: 9
    value: 0.0
  - num_features: 10
    value: 0.0
```
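This reference does not spell out how the weights produce a score. One plausible reading, offered purely as an illustrative assumption rather than Certifai's documented formula, is that explanations requiring fewer changed features earn higher weight, and the score averages the weight earned by each explanation:

```python
# Default num_features -> weight mapping from the table above
DEFAULT_WEIGHTS = {1: 100.0, 2: 80.0, 3: 50.0, 4: 20.0,
                   5: 0.0, 6: 0.0, 7: 0.0, 8: 0.0, 9: 0.0, 10: 0.0}

def explainability_score(features_changed, weights=DEFAULT_WEIGHTS):
    """Average weight over a batch of counterfactual explanations.

    `features_changed` lists, per explanation, how many features had to
    change; counts above 10 are clamped. Illustrative formula only.
    """
    earned = [weights[min(n, 10)] for n in features_changed]
    return sum(earned) / len(earned)

# Four explanations needing 1, 1, 2, and 3 feature changes
assert explainability_score([1, 1, 2, 3]) == (100.0 + 100.0 + 80.0 + 50.0) / 4
```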
Aspect Weights
The aspect weights section is a list of objects that specifies the weightings used when computing the ATX score.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the ATX aspect; must be one of: explainability , robustness , fairness , or performance |
value | number | Optional | A non-negative numerical weight for the component |
The current default value is:
```yaml
aspect_weights:
  - name: explainability
    value: 1.0
  - name: robustness
    value: 1.0
  - name: fairness
    value: 1.0
  - name: performance
    value: 1.0
```
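The ATX formula itself is not defined in this reference. Assuming, for illustration only, that ATX is a weighted average of the per-aspect scores, the default weights above would combine them like this:

```python
def atx_score(aspect_scores, aspect_weights):
    """Weighted average of per-aspect scores.

    An illustrative formula under the weighted-average assumption,
    not Certifai's documented computation.
    """
    total_weight = sum(aspect_weights[name] for name in aspect_scores)
    weighted = sum(score * aspect_weights[name]
                   for name, score in aspect_scores.items())
    return weighted / total_weight

# With the default weights (all 1.0), ATX is the plain average
weights = {"explainability": 1.0, "robustness": 1.0,
           "fairness": 1.0, "performance": 1.0}
scores = {"explainability": 80.0, "robustness": 60.0,
          "fairness": 90.0, "performance": 70.0}
assert atx_score(scores, weights) == 75.0
```

Under this reading, setting an aspect's weight to 0 removes it from the ATX score entirely, while weights between 0 and 1 merely reduce its influence.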