Version: 1.3.13

Scan Definition Reference

This document provides a detailed description of the scan definition file that serves as the basis for running a scan in Certifai.

Scan Definition Fields

The scan definition is written in YAML and consists of the following sections, each of which is detailed below:

  • scan
  • model_use_case
  • models
  • model_secret
  • model_headers
  • datasets
  • dataset_schema
  • evaluation
  • scoring

To refer to different sections of the scan definition, this doc generally uses the syntax section.field. For example, model_use_case.task_type refers to the task_type field in the model_use_case section of the scan definition.

For examples of scan definitions refer to existing definitions in the Certifai toolkit under examples/definitions/.

Scan

The scan section of the scan definition contains general information for the scan job.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| output | object (key=path) | Optional | An object for specifying where to output the resulting reports of the scan. The path must be a string that specifies either a local path (e.g. ./reports, file:/tmp/reports) or a supported cloud storage path (e.g. gs://<some-path>, s3://<some-bucket>). Paths are relative to the location of the scan definition. |

Examples

scan:
  output:
    path: ./reports

scan:
  output:
    path: "s3://certifai/reports"

Notes

There are different methods of specifying the output path to write reports to:

  • Using the --output flag in the Scanner CLI
  • The value of the SCAN_RESULTS_DIRECTORY environment variable
  • The output path in the scan definition
  • Default value of ./reports (relative to scan definition file)

Model Use Case

The model_use_case section of the scan definition contains general information about the problem being solved by your ML models.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| model_use_case_id | string | Required | A unique identifier used to group the scans related to a specific model use case. The value must be unique across organizations so that it can be used in reporting services. The value is used (after being modified to contain filename-safe characters) when creating the model use case folder that reports are written to |
| task_type | string | Required | The type of machine learning problem being solved. Must be one of: binary-classification, regression, or multiclass-classification |
| performance_metrics | list[object] | Optional | List of performance metrics that apply to the model use case |
| atx_performance_metric_name | string | Optional | Name of the performance metric to be used in the ATX score for this use case |
| description | string | Optional | Description of the problem being solved |
| author | string | Optional | Information about the author of the model use case (e.g. email, name) |
| name | string | Required | Name of the model use case - intended to be more human friendly than the model_use_case_id |

Example

model_use_case:
  model_use_case_id: c12e/datasciencelab/german_credit
  task_type: binary-classification
  name: 'Banking: Loan Approval'
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person who
    takes a credit loan from a bank. The learning task is to classify each person
    as either a good or bad credit risk according to the set of attributes.
    This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
    The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29
    '
  atx_performance_metric_name: Accuracy
  performance_metrics:
  - metric: Accuracy
    name: Accuracy
  - metric: Recall
    name: Recall
  - metric: Precision
    name: Precision

Performance Metrics

The performance_metrics section of the model_use_case lists the user-defined metrics that may be added in Certifai to track specific use case concerns.

Defining performance metrics at the model use case level does not mean that a performance evaluation is part of a particular scan. This field is only defining the performance metrics that apply to the use case as a whole. To include a performance evaluation as part of your scan, you must list it under the evaluation.evaluation_types field of the scan definition.
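
For example, a scan that actually runs the performance evaluation against a test dataset (the dataset ID here is hypothetical) would include:

evaluation:
  evaluation_types:
  - performance
  test_dataset_id: test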

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Required | A unique name that serves as a reference to this metric, and with which a value is associated in the performance report |
| metric | string | Optional | A specifier of the metric to be calculated, if it is to be calculated by Certifai. See the notes below for details |

Example

performance_metrics:
- name: Accuracy
  metric: Accuracy
- name: Micro Recall
  metric: Recall(micro)
- name: My Bespoke Metric

Notes

The metric specifier selects a scoring algorithm. It has the general form <algorithm_family_name>[(variant)]: a name for the metric scoring algorithm, optionally followed by a minor variant within that family. The currently supported families are:

  • Accuracy
  • Precision
  • Recall
  • F1
  • R-squared

The Precision, Recall, and F1 families support optional variants micro and macro. Micro variants are equivalent to the undecorated base family for non-multi-label problems, but are included for future use.

Precision, Recall, and F1 (along with their variants) are classification metrics and apply to both binary and multi-class cases.

In the binary case (model_use_case.task_type set to binary-classification) the metric is evaluated with respect to the 'true' label being the label specified as the favorable outcome.

R-squared is a regression metric, and requires numerical model output.

Supported aliases for R-squared are: Rsq, R squared, and R2.

To specify a variant, use the syntax <algorithm_family_name>(<variant>). For example, Precision(micro) specifies the micro variant of Precision.

Metric family names and variants are case-insensitive, and Certifai supports aliases for some of the metric families.
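
For example, the following metric list (the names are hypothetical) uses a variant and relies on case-insensitivity:

performance_metrics:
- name: Macro F1
  metric: F1(macro)          # macro variant of the F1 family
- name: Micro Precision
  metric: precision(micro)   # family names and variants are case-insensitive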

Models

The models section of the scan definition specifies which models may be evaluated as part of the scan. Each model you specify is an object that has the following fields defined:

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| model_id | string | Required | An alphanumeric string (underscores allowed) used to uniquely identify a model within the scan |
| name | string | Required | Name of the model - intended to be more human friendly than the model_id |
| author | string | Optional | Information about the author of this model |
| version | string | Optional | Field for tracking the version of the model |
| description | string | Optional | Description of the model |
| model_id_tag | string | Optional | Tag that can be used to store extra metadata about your trained model, such as an id from an ML pipeline or a git commit hash (which wouldn't be suitable as a model_id) |
| predict_endpoint | string | Optional | An HTTP address pointing to where your ML model is hosted. See the notes below for details on how Certifai communicates with your model. This field is required unless evaluation.no_model_access is True. See the Evaluation section for more details |
| max_batch_size | int | Optional | The maximum number of input rows (predictions) that the model supports in a single HTTP request. If not specified, a single prediction request may contain as many entries as there are in the dataset |
| supports_soft_scoring | boolean | Optional | Whether the model supports confidence-scored predictions (e.g. probabilities). This field is only applicable to classification task types. The default value is false. See the notes below for how this affects the model prediction format |
| prediction_value_order | list[any] | Optional | Ordering of class labels for soft-scoring predictions returned by the model. This field is only applicable if supports_soft_scoring is true. Assuming the prediction values (i.e. values returned by the model) are 1 and 2, a prediction_value_order of [2,1] means that the soft outputs returned by your model contain the scores for the value 2 first |
| performance_metric_values | list[object] | Optional | A list of asserted values for the specified metrics that are not to be calculated by Certifai |
| json_strict | boolean | Optional | Whether the model expects data to be encoded in strict JSON (missing values encode as null). If false, JavaScript encoding extensions are used, and missing values encode as NaN. This field is only applicable to remote models. The default value is false |

Example

models:
- model_id: logit
  name: Logistic Regression
  predict_endpoint: http://127.0.0.1:5111/german_credit_logit/predict
- model_id: svm
  author: 'info@cognitivescale.com'
  description: Scikit-learn SVC classifier
  name: Support Vector Classifier
  predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
  supports_soft_scoring: true
  json_strict: false
  performance_metric_values:
  - name: My Bespoke Metric
    value: 0.748

Notes

Performance metric values are a list of objects, consisting of a name field and a value field. The name must correspond to a performance metric defined at the model_use_case level. The value must be a number between 0 and 1 (inclusive).

The full set of metrics is specified in the Model Use Case section of the scan definition. If a metric in the model_use_case section has no metric specifier, or if the evaluation section specifies no test_dataset_id, then each model must provide a value for that metric in this list.

For general information and examples of how Certifai communicates with hosted models refer to the Certifai Reference Model repo.
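
As a rough illustration only (the authoritative request/response format is documented in the Certifai Reference Model repo, and your deployment may differ), the reference models exchange JSON payloads shaped approximately as follows:

# Request sent to predict_endpoint: one row per prediction, up to max_batch_size rows
{"payload": {"instances": [[25, "car (new)", 1500], [52, "car (used)", 800]]}}

# Response from a hard-scoring model: one prediction per input row
{"payload": {"predictions": [1, 2]}}

For soft-scoring models, the response additionally carries per-class scores for each row, ordered according to prediction_value_order.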

Model Secrets

Admins are able to create model secrets in Kubernetes (each containing a name and a key (model_id value)), and those K8s secrets are referenced through the scan definition by specifying ${model_secret.<model_id value>} in the model headers.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| model_secret | string | Optional | The model_secret is created in Kubernetes. After you add the model_secret field to the scan definition, a reference such as ${dotted.field.ref} in the model headers uses the field 'dotted.field.ref' taken from the specified secret |

Example

model_secret: <your-k8s-secret-name>

See: Model Secrets for more details.
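
For illustration, a hypothetical secret named my-scan-secret with a field svm could be used to populate an authorization header (the secret name, field, and header are made up for this sketch):

model_secret: my-scan-secret
model_headers:
  defined:
  - model_id: svm
    name: Authorization
    value: Bearer ${model_secret.svm}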

Model Headers

The model_headers section of the scan definition contains information for custom HTTP-Headers to be used when communicating with your models.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| default | list[object] | Optional | A list of model headers applied to ALL models in the scan definition |
| defined | list[object] | Optional | A list of model headers applied to individual models based on their model_id |

Example

model_headers:
  default:
  - name: Content-Type
    value: application/json
  - name: accept
    value: application/json
  defined:
  - model_id: svm
    name: Content-Type
    value: application/json
  - model_id: logit
    name: Authorization
    value: Bearer ${TOKEN}

Notes

Model headers must contain name and value string fields. Defined model headers must also specify a model_id field that corresponds to a model in the models section. The value field can contain references to environment variables via the syntax: ${<ENV-VARIABLE-NAME>}.

Datasets

The datasets section of the scan definition lists the datasets that may be used as part of the evaluation. Each dataset you specify is an object that has the following fields defined:

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| dataset_id | string | Required | An alphanumeric (underscores allowed) ID for referencing this dataset |
| url | string | Required | A URL pointing to the dataset file. Local system files must be prefixed with file:. Supported URL prefixes are file:, s3:, gs:, and abfs: |
| file_type | string | Required | One of: csv or json. This selection drives an additional set of fields defined below |
| encoding | string | Optional | The file encoding of the dataset. Supported encodings: ascii, utf-16, utf-16-be, utf-16-le, utf-32, utf-32-be, utf-32-le, utf-7, utf-8, utf-8-sig, latin-1, iso-8859-1, windows-1252 |
| description | string | Optional | Description of the dataset |
| name | string | Optional | Name of the dataset - intended to be more human friendly than the dataset_id |

CSV Specific Fields

These fields are specific to CSV datasets, and are only referenced if file_type is csv.

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| has_header | string | Optional | Whether the csv file has column headers. Defaults to True |
| delimiter | string | Optional | String to use as the separator for the csv file. Defaults to , |
| quote_character | string | Optional | A one-character string used to denote the start and end of a quoted item. Defaults to " |
| escape_character | string | Optional | A one-character string used to escape other characters. Defaults to None |

JSON Specific Fields

These fields are specific to JSON datasets, and are only referenced if file_type is json.

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| orient | string | Optional | One of: records, values, or columns. These values are synonymous with pandas' read_json() orient argument. Defaults to records |
| lines | boolean | Optional | Whether the JSON dataset is in json-lines format. Defaults to True |

Example

datasets:
## json dataset
- dataset_id: eval
  description: ''
  name: Evaluation dataset
  file_type: json
  url: file:test/data/german_credit_mini_records.json
  lines: true
  orient: records
## csv dataset
- dataset_id: expl
  description: ''
  file_type: csv
  has_header: true
  name: 100 row explanation dataset
  url: file:test/data/german_credit_explan.csv

Notes

The Datasets section provides one or more dataset options that may be specified for your scans; in the Evaluation section of the scan definition you specify which dataset to use for a particular scan.

Dataset Schema

The dataset_schema section of the scan definition contains schema details about the defined datasets.

If the outcome_column and predicted_outcome_column are not specified, the dataset is assumed to not contain them.

The scanner infers the feature_schemas field from the evaluation dataset, and the specified feature_schemas input is applied as overrides to the inferred schema.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| outcome_column | string | Optional | Name (or index) of the outcome column in the dataset |
| predicted_outcome_column | string | Optional | Name (or index) of the predicted outcome column in the dataset |
| hidden_columns | list[string] | Optional | A list of feature names (or indices) of columns in the dataset NOT to expose to models. Hidden columns are removed from the dataset entries before they are applied to the scan |
| feature_schemas | list[object] | Optional | A list of schema definitions for features in the dataset. This list is treated as overrides to the schema auto-inferred by the scanner. It is entirely optional and does not have to include all features in the dataset (i.e. it may be sparsely defined). The expected objects in the list are described in the Feature Schemas section below |
| defined_feature_order | boolean | Optional | A boolean specifying whether the feature_schemas field specifies the order of the features in the dataset. If set to True, the Feature Schemas section must contain all the features in the dataset. The default value is false for csv files and true for JSON datasets where orient is not set to columns |

Examples

dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
  feature_schemas:
  - feature_name: age
  - feature_name: foreign
  - feature_name: purpose
    data_type: categorical
    category_values:
    - 'car (new)'
    - 'car (used)'
    - 'furniture/equipment'
    - 'radio/television'
    one_hot_columns:
    - name: 'purpose_car (new)'
      value: 'car (new)'
    - name: 'purpose_car (used)'
      value: 'car (used)'
    - name: 'purpose_furniture/equipment'
      value: 'furniture/equipment'
    - name: 'purpose_radio/television'
      value: 'radio/television'
  hidden_columns:
  - age
  - foreign

dataset_schema:
  outcome_column: outcome
  defined_feature_order: true
  feature_schemas:
  - feature_name: checkingstatus
  - feature_name: duration
  - feature_name: history
  - feature_name: purpose
  - feature_name: amount
    min: 10
    max: 1000000000
    spread: 1.42
  - feature_name: savings
  - feature_name: employ
  - feature_name: installment
  - feature_name: status
  - feature_name: others
  - feature_name: residence
  - feature_name: property
  - feature_name: age
  - feature_name: otherplans
  - feature_name: housing
  - feature_name: cards
  - feature_name: job
  - feature_name: liable
  - feature_name: telephone
  - feature_name: foreign
  - feature_name: outcome

Notes

For JSON datasets with an orient of records, the features found in Feature Schemas provide the order of the dataframe.

For csv datasets and JSON datasets with an orient of values or columns, the names found in the Feature Schemas section specify/override the column names of the dataframe.
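
For example, a sketch (the file and feature names are hypothetical) of a header-less csv whose column names are supplied by the schema:

datasets:
- dataset_id: eval
  file_type: csv
  has_header: false
  url: file:test/data/no_header.csv
dataset_schema:
  defined_feature_order: true
  feature_schemas:   # these names become the dataframe's column names
  - feature_name: age
  - feature_name: amount
  - feature_name: outcome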

Feature Schemas

Each item in the feature_schemas list can have the following attributes.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| feature_name | string | Optional | Name of the column (or index if there are no headers) being referred to |
| data_type | string | Optional | One of: categorical, numerical-int, or numerical-float |
| category_values | list[string \| int] | Optional | The possible values for the category. This attribute is only used when the data_type is categorical |
| one_hot_columns | list[object] | Optional | A list of mappings between (one-hot) column names in the dataset and values of the feature. If the schema's defined_feature_order is True, this also specifies the column ordering. The set of values must match those specified in category_values |
| target_encodings | list[float] | Optional | A list of numeric encoding values for the feature. The values must correspond, in order, to those specified in category_values |
| categorical_type | string | Optional | Data type to interpret the category values as. Must be one of: auto, string, or int. Defaults to auto, in which case the data type is inferred. Should only be used when data_type is categorical |
| min | number | Optional | Minimum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
| max | number | Optional | Maximum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
| spread | number | Optional | A typical magnitude of change - normally the mean absolute deviation or standard deviation. If not specified, an appropriate spread is estimated from the dataset. Should only be used when data_type is numerical-int or numerical-float |

Example

Without one-hot encoding:

feature_schemas:
- feature_name: age
  data_type: numerical-int
  min: 18
  max: 65
  spread: 1.0
- feature_name: foreign
  data_type: categorical
  category_values:
  - "foreign - yes"
  - "foreign - no"

With one-hot encoding:

feature_schemas:
- feature_name: age
  data_type: numerical-int
  min: 18
  max: 65
  spread: 1.0
- feature_name: foreign
  data_type: categorical
  category_values:
  - "foreign - yes"
  - "foreign - no"
  one_hot_columns:
  - name: 'foreign_foreign - yes'
    value: 'foreign - yes'
  - name: 'foreign_foreign - no'
    value: 'foreign - no'

With target encoding:

feature_schemas:
- feature_name: age
  data_type: numerical-int
  min: 18
  max: 65
  spread: 1.0
- feature_name: foreign
  data_type: categorical
  category_values:
  - "foreign - yes"
  - "foreign - no"
  target_encodings:
  - 0.3
  - 0.48

Notes

  • If an override is not applicable to a feature, it is NOT applied. For example, if an override specifies category_values but the feature's data_type is not categorical, the category_values field is not applied to the feature.
  • One-hot encoding captures the way the dataset is encoded in the dataset source (e.g. - CSV file) and is expected by the model. It does not impact feature semantics apart from the expected encoding.
  • Target-encodings capture another possible way the dataset could be encoded in the dataset source (e.g. a CSV file) and expected by the model. The technique of target encoding is a way to encode categorical values numerically (usually by some statistical association with the ground truth value of the prediction). It does not impact feature semantics apart from the expected encoding. The list of target_encodings values should match the list of category_values, providing the corresponding encoding that has been used for each.
  • Explanations in reports are surfaced as value-encoded for all categorical features, regardless of whether or not they are one-hot or target encoded in the dataset. This makes the explanations human-readable even for one-hot encoded datasets.
  • The categorical_type field can be used to specify the data type for a categorical feature when there is possible ambiguity. For example, the value "01" could either be interpreted as the string "01" or as the integer 1 (see the sketch after this list).
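
A sketch of that last case (the feature name is hypothetical), pinning ambiguous values to strings:

feature_schemas:
- feature_name: region_code
  data_type: categorical
  categorical_type: string   # read values such as "01" as strings, not the integer 1
  category_values:
  - '01'
  - '02'
  - '10'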

Evaluation

The evaluation section specifies the details for the evaluation that is run in the scan. Only a single evaluation can be specified in this section, but multiple evaluation types (e.g. robustness, fairness, explanation, explainability) can be run against all (or some) of the models specified in the Models section of the scan definition.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Optional | Name of the evaluation being run |
| description | string | Optional | Description of the evaluation |
| environment | string | Optional | Field for tracking the environment the evaluation is run in (e.g. DEV, QA, Compliance) |
| evaluation_types | list[string] | Required | The list of evaluations to be performed on the models in this scan. Valid values include: robustness, fairness, explanation, explainability, and performance |
| evaluation_dataset_id | string | Required | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for the robustness, fairness, and explainability evaluations |
| explanation_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for the explanation evaluation, and is only required if explanation is listed as an evaluation type |
| test_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for computing performance metrics, and is only required if the model_use_case section lists performance metrics to be evaluated by Certifai |
| fairness_grouping_features | list[object] | Optional | List of features, and groups within those features, to use for calculating fairness. See the Fairness Grouping Features section for details |
| fairness_metrics | list[string] | Optional | List of fairness metrics to be calculated/applied as part of the fairness evaluation. Possible values (case insensitive): "Demographic_parity" ("Demographic parity", "Demographic"), "Equal_opportunity" ("equal opportunity", "opportunity"), "Equal_odds" ("odds", "equal odds"), "Sufficiency" ("predictive rate parity"), and "Burden". The default value is [burden] |
| primary_fairness_metric | string | Optional | The fairness metric to use as the Fairness aspect when calculating the ATX score |
| explanation_types | list[string] | Optional | List of explanation types to be used in the Explainability and Explanation evaluations. Possible values (case insensitive): "counterfactual" ("burden") and "shap". The default value is [counterfactual] |
| primary_explanation_type | string | Optional | The explanation type to use as the Explainability aspect when calculating the ATX score |
| feature_restrictions | list[object] | Optional | A list of feature restrictions to apply to generated counterfactuals. See the Feature Restrictions section for details |
| hyperparameters | list[object] | Optional | A list of objects with name and value attributes. These parameters can be used to override the default engine hyperparameters. See the Hyperparameters section for details |
| prediction_description | string | Optional | Description of what is being predicted by the models |
| prediction_favorability | string | Optional | The prediction format for favorable prediction outcomes in the scan definition. The possible values and behaviors depend on the task type. If not specified, the value is inferred from the prediction information in the scan definition |
| save_counterfactuals | boolean | Optional | Whether to save generated counterfactuals to a separate CSV file for the Robustness, Fairness, and Explainability reports. SHAP explanations are saved to a separate CSV file as well for the Explainability report. Defaults to false |
| no_model_access | boolean | Optional | Whether the scan lacks direct access to the model for predictions. Only a limited set of evaluations is supported when set to True. See the notes below for further details. Defaults to false |

Notes

Prediction Favorability values and behaviors:

  • For Classification, possible values are: explicit, ordered, and none.

    • A favorability of explicit means that the prediction outcomes in the prediction_values list explicitly declare which are favorable, by setting the favorable field to true.
    • A favorability of none means that none of the prediction outcomes in the prediction_values list is favorable; therefore, entries in the prediction_values list may not have the favorable field set to true.
    • A favorability of ordered specifies that the prediction outcomes in the prediction_values list are ordered from most favorable to least favorable. If this is set, at most two counterfactuals are generated per observation for the explanation report (similar to regression). The last_favorable_prediction field can be used in this case to specify which of the prediction outcomes are deemed favorable (see the classification specific fields below, and the sketch after this list).
  • For Regression, possible values are: ordered and none.

    • A favorability of none specifies that there is no favorable direction (increased or decreased) for the predictions. The favorable_outcome_value field is not set in this case.
    • A favorability of ordered specifies that there is a favorable direction (either increased or decreased) for the predictions. The favorable_outcome_value field must be set in this case.
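
As a sketch of the ordered case for classification (the outcome names and values are hypothetical), last_favorable_prediction marks where the ordered prediction_values list stops being favorable:

evaluation:
  prediction_favorability: ordered
  last_favorable_prediction: 2   # values 1 and 2 are favorable; 3 is not
  prediction_values:
  - name: Low risk               # most favorable
    value: 1
  - name: Medium risk
    value: 2
  - name: High risk              # least favorable
    value: 3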

Regression Specific Fields:

Set these attributes for regression use cases.

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| regression_boundary_type | string | Optional | Indicates how the boundary is defined. Valid values are absolute and relative (default relative) |
| regression_standard_deviation | number | Optional | Only used with regression_boundary_type of relative. The amount of change in prediction required for the analysis to consider it sufficient to be a counterfactual, in units of the standard deviation of the predicted scores for the entire dataset. If omitted, a value of 0.5 is used |
| regression_boundary | number | Optional | Only used with regression_boundary_type of absolute; mutually exclusive with regression_boundary_percentile. Exact outcome value that separates favorable from unfavorable |
| regression_boundary_percentile | number | Optional | Only used with regression_boundary_type of absolute; mutually exclusive with regression_boundary. Percentile of the outcome value distribution (as empirically measured on the evaluation dataset) that separates favorable from unfavorable |
| favorable_outcome_value | string | Optional | Favorable direction for the regression task type. Possible values are: increased and decreased. This field must be left empty if prediction_favorability is none |

Classification Specific Fields:

Set these attributes for classification use cases.

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| prediction_values | list[object] | Optional | A list of prediction outcomes that can be returned by the model. The attributes of predictions are described in the Prediction Values section |
| last_favorable_prediction | string | Optional | The value of the last favorable prediction outcome in the prediction_values list. It must correspond to a prediction outcome in the prediction_values list. Applicable only when prediction_favorability is ordered |
| favorable_outcome_group_name | string | Optional | A name describing the group of predictions that are favorable. Only applicable when model_use_case.task_type is multiclass-classification and prediction_favorability is explicit |
| unfavorable_outcome_group_name | string | Optional | A name describing the group of predictions that are unfavorable. Only applicable when model_use_case.task_type is multiclass-classification and prediction_favorability is explicit |

No Model Access:

When evaluation.no_model_access is True, Certifai is only able to evaluate your model based on prior model predictions. The model predictions must be included in each dataset. Additionally:

  • Only a single entry is allowed under the models section.
  • The predicted_outcome_column must be specified in the dataset_schema section.
  • Each dataset must have a column matching the dataset_schema.predicted_outcome_column field.

The following evaluations are supported when evaluation.no_model_access is True:

  • Performance. The test dataset must include the predicted outcome column for the model.
  • Non-burden fairness metrics. The evaluation dataset must include the predicted outcome column for the model.
  • Counterfactual explanations via counterfactual sampling. The explanation dataset must include the predicted outcome column for the model.

The following evaluations are not supported when evaluation.no_model_access is True:

  • Robustness
  • Explainability
  • Burden based Fairness
  • SHAP based Explanations
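
Putting these constraints together, a minimal sketch of a no-model-access evaluation (the IDs and column names are hypothetical):

evaluation:
  no_model_access: true
  evaluation_dataset_id: eval
  test_dataset_id: test
  evaluation_types:
  - fairness
  - performance
  fairness_metrics:
  - demographic_parity     # burden-based fairness is unavailable without model access
  fairness_grouping_features:
  - name: age
models:                    # only a single model entry is allowed; predict_endpoint may be omitted
- model_id: logit
  name: Logistic Regression
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome   # each dataset must contain this column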

Examples

# binary-classification evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations
    for 4 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
  - robustness
  - fairness
  - explanation
  - explainability
  - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
  - name: age
  - name: status
  feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: status
    restriction_string: no changes
  name: Baseline evaluation of 4 models
  prediction_description: Will a loan be granted?
  prediction_favorability: explicit
  prediction_values:
  - favorable: true
    name: Loan Granted
    value: 1
  - favorable: false
    name: Loan Denied
    value: 2

# multiclass-classification evaluation
evaluation:
  name: Baseline evaluation of 3 models
  description: This evaluation compares the robustness, fairness, performance and explainability
    for 3 candidate models.
  evaluation_dataset_id: eval
  explanation_dataset_id: explan
  evaluation_types:
  - robustness
  - fairness
  - explainability
  - explanation
  - performance
  prediction_favorability: explicit
  prediction_values:
  - name: "Heart disease not detected"
    value: 0
    favorable: true
  - name: "Stage 1: > 50% diameter narrowing in a major vessel"
    value: 1
    favorable: false
  - name: "Stage 2: > 50% diameter narrowing in a major vessel"
    value: 2
    favorable: false
  - name: "Stage 3: > 50% diameter narrowing in a major vessel"
    value: 3
    favorable: false
  - name: "Stage 4: > 50% diameter narrowing in a major vessel"
    value: 4
    favorable: false
  favorable_outcome_group_name: Heart Disease not detected
  unfavorable_outcome_group_name: Heart Disease detected
  prediction_description: "Indicator of heart disease level (angiographic disease status)"
  fairness_grouping_features:
  - name: sex
  feature_restrictions:
  - feature_name: sex
    restriction_string: no changes

# regression evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations
    for 3 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
  - robustness
  - fairness
  - explanation
  - explainability
  - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
  - name: Marital Status
  - name: Gender
  favorable_outcome_value: increased
  feature_restrictions:
  - feature_name: Gender
    restriction_string: no changes
  name: Baseline evaluation of 3 models
  prediction_description: Amount of Settled Claim
  prediction_favorability: ordered
  regression_standard_deviation: 0.5

Fairness Grouping Features

The fairness grouping features field is only required when fairness is listed under the evaluation.evaluation_types field. Each fairness grouping feature defined has the following attributes:

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Optional | Name of the feature (or index) in the dataset |
| buckets | list[object] | Optional | An optional list of objects defining groups within the named feature. If no buckets are specified, Certifai treats each distinct value of that feature in the dataset as a separate class. The structure for specifying buckets depends on whether the feature is categorical or numerical. Refer to the notes below for details |

Buckets (fields) for categorical features:

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| description | string | Optional | Description/name of the group |
| values | list[string \| int] | Optional | List of category values that belong in the bucket |

Buckets (fields) for numerical features

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| description | string | Optional | Description/name of the group |
| max | number | Optional | The maximum value allowed in the group. Values belong to the group with the lowest upper bound greater than or equal to the value, and exactly one bucket must omit an upper bound to act as a catch-all |

Example

fairness_grouping_features:
- name: gender
- name: Age
  buckets: # buckets for numeric feature
  - description: "<= 40 years old"
    max: 40
  - description: "> 40 years old"
- name: marital-status
  buckets: # buckets for categorical feature
  - description: Single
    values:
    - marital-status_Never-married
  - description: Married
    values:
    - marital-status_Married-AF-spouse
    - marital-status_Married-civ-spouse
    - marital-status_Married-spouse-absent
  - description: Divorced
    values:
    - marital-status_Divorced
  - description: Widowed
    values:
    - marital-status_Widowed
  - description: Separated
    values:
    - marital-status_Separated

Feature Restrictions

Feature Restrictions are specified at the evaluation level and applied to individual features in the dataset. The specified restrictions are applied by Certifai when generating counterfactuals.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| feature_name | string | Optional | Name (or index) of the feature in the dataset |
| restriction_string | string | Optional | Restrictions to apply for the robustness and fairness reports. Must be one of: no restrictions, no changes, min/max, or percentage. Defaults to no restrictions if none is specified. Note that fairness grouping features implicitly have a restriction string of no changes applied during the fairness evaluation |
| restriction_numerical_percentage | number | Optional | Used in conjunction with restriction_string = percentage |
| restriction_numerical_min | number | Optional | Used in conjunction with restriction_string = min/max |
| restriction_numerical_max | number | Optional | Used in conjunction with restriction_string = min/max |

Example

feature_restrictions:
- feature_name: age
  restriction_string: no changes
- feature_name: marital
  restriction_string: no changes
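
The min/max and percentage restriction types take accompanying numeric fields; a sketch with hypothetical features and bounds:

feature_restrictions:
- feature_name: amount
  restriction_string: min/max
  restriction_numerical_min: 500
  restriction_numerical_max: 10000
- feature_name: income
  restriction_string: percentage
  restriction_numerical_percentage: 10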

Prediction Values

The prediction_values field is applicable only to classification (binary or multiclass) tasks.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Optional | A human-readable name for this prediction (e.g. 'Has Diabetes') |
| value | any | Optional | The output value from the ML model corresponding to this prediction (e.g. 1/0) |
| favorable | boolean | Optional | Whether the prediction is favorable. Defaults to false |

Example

prediction_values:
- favorable: true
  name: Makes a deposit
  value: 1
- favorable: false
  name: Does not make a deposit
  value: 0

Hyperparameters

The hyperparameters list can be used to modify the hyperparameters used by the Certifai engine. Each item in the list must have the following fields:

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Required | Name of the hyperparameter |
| value | any | Required | Value to apply to the hyperparameter |

Notes

For a detailed reference of the available engine hyperparameters, refer to the engine hyperparameter section of the Configuration File Reference.

Example

hyperparameters:
- name: num_counterfactuals
  value: 3
- name: sampling_boundary
  value: 0.05

Scoring

The scoring section of the scan definition is used to specify custom weights for computing Explainability scores and ATX scores.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| explainability | list[object] | Optional | A list of custom weights used when computing the explainability score. Refer to the Explainability section for details |
| aspect_weights | list[object] | Optional | A list of objects, each specifying name and value attributes. The value is the weight for the named component. Refer to the Aspect Weights section for details |

Example

scoring:
  explainability:
  - num_features: 1
    value: 100
  - num_features: 2
    value: 80
  - num_features: 3
    value: 50
  - num_features: 4
    value: 20
  aspect_weights:
  - name: "explainability"
    value: 1.0
  - name: "robustness"
    value: 0.5
  - name: "fairness"
    value: 1.0
  - name: "performance"
    value: 0.2

Explainability

The explainability weights field is a list of objects, each specifying num_features and value attributes. The value is the weight corresponding to the specified num_features.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| num_features | int | Optional | An integer between 1 and 10 |
| value | number | Optional | A weight between 0 and 100 setting the corresponding weight for num_features |

The current default value is:

explainability:
- num_features: 1
  value: 100.0
- num_features: 2
  value: 80.0
- num_features: 3
  value: 50.0
- num_features: 4
  value: 20.0
- num_features: 5
  value: 0.0
- num_features: 6
  value: 0.0
- num_features: 7
  value: 0.0
- num_features: 8
  value: 0.0
- num_features: 9
  value: 0.0
- num_features: 10
  value: 0.0

Aspect Weights

The aspect_weights section is a list of objects that specifies the weightings for computing the ATX score.

Fields

| Name | Type | Required/Optional | Description |
|------|------|-------------------|-------------|
| name | string | Optional | Name of the ATX aspect; must be one of: explainability, robustness, fairness, or performance |
| value | number | Optional | A non-negative numerical weight for the component |

The current default value is:

aspect_weights:
- name: explainability
  value: 1.0
- name: robustness
  value: 1.0
- name: fairness
  value: 1.0
- name: performance
  value: 1.0