Scan Definition Reference
This document provides a detailed description of the scan definition file that serves as the basis for running a scan in Certifai.
Scan Definition Fields
The scan definition is written in YAML and consists of the following sections (each section is detailed below):
- `scan`
- `model_use_case`
- `models`
- `model_secret`
- `model_headers`
- `datasets`
- `dataset_schema`
- `evaluation`
- `scoring`
To refer to different sections of the scan definition, this doc generally uses the syntax `section.field`. For example, `model_use_case.task_type` refers to the `task_type` field in the `model_use_case` section of the scan definition.
For examples of scan definitions, refer to the existing definitions in the Certifai toolkit under `examples/definitions/`.
Scan
The `scan` section of the scan definition contains general information for the scan job.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
output | object (key=`path`) | Optional | An object for specifying where to output the resulting reports of the scan. The path must be a string that specifies either a local path (e.g. `./reports`, `file:/tmp/reports`) OR a supported cloud storage path (e.g. `gs://<some-path>`, `s3://<some-bucket>`). Paths are relative to the location of the scan definition. |
Examples:
```yaml
scan:
  output:
    path: ./reports
```

```yaml
scan:
  output:
    path: "s3://certifai/reports"
```
Notes
There are different methods of specifying the output path to write reports to:
- Using the `--output` flag in the Scanner CLI
- The value of the `SCAN_RESULTS_DIRECTORY` environment variable
- The output path in the scan definition
- The default value of `./reports` (relative to the scan definition file)
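As an illustration of the environment-variable method, the variable can be exported before launching the scanner (the directory shown is purely illustrative):

```shell
# Direct scan reports to a specific directory via the environment
# variable consulted by the scanner (path below is illustrative)
export SCAN_RESULTS_DIRECTORY=/tmp/reports
```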
Model Use Case
The `model_use_case` section of the scan definition contains general information about the problem being solved by your ML models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_use_case_id | string | Required | A unique identifier to group the scans related to a specific model use case. The value must be unique across organizations so that it can be used in reporting services. The value is used when creating the model use case folder (after being modified to contain filename safe characters) that reports are written to. |
task_type | string | Required | The type of machine learning problem being solved. Must be one of the following: binary-classification , regression , or multiclass-classification |
performance_metrics | list[object] | Optional | List of performance metrics that apply to the model use case. |
atx_performance_metric_name | string | Optional | Name of the performance metric to be used in the ATX score for this use case |
description | string | Optional | Description of the problem being solved |
author | string | Optional | Information about the author of the model use case (e.g. email, name) |
name | string | Required | Name of the model use case - intended to be more human friendly than the model_use_case_id |
Example
```yaml
model_use_case:
  model_use_case_id: c12e/datasciencelab/german_credit
  task_type: binary-classification
  name: 'Banking: Loan Approval'
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person
    who takes a credit loan from a bank. The learning task is to classify each
    person as either a good or bad credit risk according to the set of attributes.

    This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

    The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
  atx_performance_metric_name: Accuracy
  performance_metrics:
    - metric: Accuracy
      name: Accuracy
    - metric: Recall
      name: Recall
    - metric: Precision
      name: Precision
```
Performance Metrics
The `performance_metrics` section of the `model_use_case` lists the user-defined metrics that may be added in Certifai to track specific use case concerns.
Defining performance metrics at the model use case level does not mean that a performance evaluation is part of a particular scan; this field only defines the performance metrics that apply to the use case as a whole. To include a performance evaluation as part of your scan, you must list it under the `evaluation.evaluation_types` field of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | A unique name that serves as a reference to this metric, and with which a value is associated in the performance report |
metric | string | Optional | A specifier of the metric to be calculated, if it is to be calculated by Certifai. See notes below for details |
Example
```yaml
performance_metrics:
  - name: Accuracy
    metric: Accuracy
  - name: Micro Recall
    metric: Recall(micro)
  - name: My Bespoke Metric
```
Notes
The `metric` specifier selects a scoring algorithm. It has the general form `<algorithm_family_name>[(variant)]`: a name for the metric scoring algorithm, with an optional minor variant within that family. The currently supported families are:
- `Accuracy`
- `Precision`
- `Recall`
- `F1`
- `R-squared`
The `Precision`, `Recall`, and `F1` families support optional variants `micro` and `macro`. Micro variants are equivalent to the undecorated base family for non-multi-label problems, but are included for future use.
`Precision`, `Recall`, and `F1` (along with their variants) are classification metrics and apply to both binary and multi-class cases. In the binary case (`model_use_case.task_type` set to `binary-classification`), the metric is evaluated with respect to the 'true' label being the label specified as the favorable outcome.
`R-squared` is a regression metric, and requires numerical model output. Supported aliases for `R-squared` are: `Rsq`, `R squared`, and `R2`.
To specify a variant, use the syntax `<algorithm_family_name>(<variant>)`. For example, `Precision(micro)` specifies the `micro` variant of `Precision`.
Metric family names and variants are case-insensitive, and Certifai supports aliases for some of the metric families.
Models
The `models` section of the scan definition specifies which models may be evaluated as part of the scan. Each model you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_id | string | Required | An alphanumeric string (including underscores), used to uniquely identify a model within the scan |
name | string | Required | Name of the model - intended to be more human friendly than the model_id |
author | string | Optional | Information about the author of this model |
version | string | Optional | Field for tracking version of model |
description | string | Optional | Description of model |
model_id_tag | string | Optional | Tag that can be used to store extra metadata about your trained model, such as an id from an ML pipeline or a git commit hash (which wouldn't be suitable as a model_id ) |
predict_endpoint | string | Required | An http address pointing to where your ML model is being hosted. See the below notes for details on how Certifai communicates with your model |
max_batch_size | int | Optional | The maximum number of input rows (predictions) that the model supports in a single HTTP request. If not specified, the maximum prediction size can be as many entries as are in the dataset |
supports_soft_scoring | boolean | Optional | Whether the model supports confidence-scored predictions (e.g. probabilities). This field is only applicable to classification task types. The default value is false . See notes below for how this affects the model prediction format. |
prediction_value_order | list[any] | Optional | Ordering of class labels for soft scoring predictions returned by the model. This field is only applicable if supports_soft_scoring is true . Assuming the prediction values (i.e. values being returned by the model) are 1 and 2, then a prediction_value_order of [2,1] means that the soft outputs returned by your model contain the scores for the value 2 first |
performance_metric_values | list[object] | Optional | A list of asserted values for the specified metrics that are not to be calculated by Certifai. |
json_strict | boolean | Optional | Whether the model expects data to be encoded in strict JSON (missing values will encode as null ). If false then JavaScript encoding extensions will be used, and missing values will be encoded as NaN . This field is only applicable to remote models. The default value is false . |
Example
```yaml
models:
  - model_id: logit
    name: Logistic Regression
    predict_endpoint: http://127.0.0.1:5111/german_credit_logit/predict
  - model_id: svm
    author: 'info@cognitivescale.com'
    description: Scikit-learn SVC classifier
    name: Support Vector Classifier
    predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
    supports_soft_scoring: true
    json_strict: false
    performance_metric_values:
      - name: My Bespoke Metric
        value: 0.748
```
Notes
Performance metric values are a list of objects, each consisting of a `name` field and a `value` field. The `name` must correspond to a performance metric defined at the `model_use_case` level. The `value` must be a number between 0 and 1 (inclusive).
The full set of metrics is specified in the Model Use Case section of the scan definition. If a metric in the `model_use_case` section has no `metric` value, or if the `evaluation` section specifies no `test_dataset_id`, then each model must provide a value for that metric in this list.
For general information and examples of how Certifai communicates with hosted models refer to the Certifai Reference Model repo.
Model Secrets
Admins can create model secrets in Kubernetes (each containing a name and a key, where the key is a `model_id` value). Those K8s secrets are referenced from the scan definition by specifying `${model_secret.<model_id value>}` in the model header.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
model_secret | string | Optional | The model_secret is created in Kubernetes. After you add the model_secret field in the scan definition, a reference such as ${dotted.field.ref} in the model headers will use the specified field 'dotted.field.ref' taken from the specified secret. |
Example
```yaml
model_secret: <your-k8s-secret-name>
```
See: Model Secrets for more details.
Model Headers
The `model_headers` section of the scan definition contains information for custom HTTP headers to be used when communicating with your models.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
default | list[object] | Optional | A list of model headers that is applied to ALL models in the scan definition. |
defined | list[object] | Optional | A list of model headers that are applied to individual models based on their model_id |
Example
```yaml
model_headers:
  default:
    - name: Content-Type
      value: application/json
    - name: accept
      value: application/json
  defined:
    - model_id: svm
      name: Content-Type
      value: application/json
    - model_id: logit
      name: Authorization
      value: Bearer ${TOKEN}
```
Notes
Model headers must contain `name` and `value` string fields. Defined model headers must also specify a `model_id` field that corresponds to a model in the Models section. The `value` field can contain references to environment variables via the syntax `${<ENV-VARIABLE-NAME>}`.
Datasets
The `datasets` section of the scan definition lists the datasets that may be used as part of the evaluation. Each dataset you specify is an object that has the following fields defined:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
dataset_id | string | Required | An alphanumeric (including underscores) ID for referencing this dataset |
url | string | Required | A url pointing to the dataset file. Local system files must be prefixed with file: . Supported URL prefixes are file: , s3: , gs: , and abfs: |
file_type | string | Required | One of: csv or json . This selection drives an additional set of fields defined below |
encoding | string | Optional | The file encoding of the dataset. Supported encodings: [ascii, utf-16, utf-16-be, utf-16-le, utf-32, utf-32-be, utf-32-le, utf-7, utf-8, utf-8-sig, latin-1, iso-8859-1, windows-1252] |
description | string | Optional | Description of the dataset |
name | string | Optional | Name of the dataset - intended to be more human friendly than the dataset_id |
CSV Specific Fields
These fields are specific to CSV datasets, and are only referenced if `file_type` is `csv`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
has_header | string | Optional | Whether the csv file has column headers. Defaults to True |
delimiter | string | Optional | String to use as separator for the csv file. Defaults to , |
quote_character | string | Optional | A one character string used to denote the start and end of a quoted item. Defaults to " |
escape_character | string | Optional | A one character string used to escape other characters. Defaults to None |
JSON Specific Fields
These fields are specific to JSON datasets, and are only referenced if `file_type` is `json`.
Name | Type | Required/ Optional | Description |
---|---|---|---|
orient | string | Optional | One of: records , values , or columns . These values are synonymous with pandas' read_json() orient argument. Defaults to records |
lines | boolean | Optional | Whether the JSON dataset is in json-lines format. Defaults to True |
Example
```yaml
datasets:
  ## json dataset
  - dataset_id: eval
    description: ''
    name: Evaluation dataset
    file_type: json
    url: file:test/data/german_credit_mini_records.json
    lines: true
    orient: records
  ## csv dataset
  - dataset_id: expl
    description: ''
    file_type: csv
    has_header: true
    name: 100 row explanation dataset
    url: file:test/data/german_credit_explan.csv
```
Notes
The Datasets section provides one or more dataset options that may be specified for your scans; in the Evaluation section of the scan definition you specify which dataset to use for a particular scan.
Dataset Schema
The `dataset_schema` section of the scan definition contains schema details about the defined datasets.
If the `outcome_column` and `predicted_outcome_column` are not specified, the dataset is assumed not to contain them.
The scanner infers the `feature_schemas` field from the evaluation dataset, and the specified `feature_schemas` input is applied as overrides to the inferred schema.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
outcome_column | string | Optional | Name (or index) of the outcome column in the dataset |
predicted_outcome_column | string | Optional | Name (or index) of the predicted outcome column in the dataset |
hidden_columns | list[string] | Optional | A list of feature names (or indices) of columns in the dataset NOT to expose to models. Hidden columns are removed from the dataset entries prior to being applied to the scan |
feature_schemas | list[object] | Optional | A list of schema definitions for features in the dataset. This list is treated as overrides to the auto-inferred schema by the scanner. It is entirely optional and does not have to include all features in the dataset (i.e. it may be sparsely defined). The expected objects in the list are described in the below Feature Schemas section |
defined_feature_order | boolean | Optional | A boolean specifying whether the feature_schemas field specifies the order of the features in dataset. If set to True , the Feature Schemas section must contain all the features in the dataset. The default value is false for csv files and true for JSON datasets where orient is not set to columns . |
Examples
```yaml
dataset_schema:
  outcome_column: outcome
  predicted_outcome_column: predicted_outcome
  feature_schemas:
    - feature_name: age
    - feature_name: foreign
    - feature_name: purpose
      data_type: categorical
      category_values:
        - 'car (new)'
        - 'car (used)'
        - 'furniture/equipment'
        - 'radio/television'
      one_hot_columns:
        - name: 'purpose_car (new)'
          value: 'car (new)'
        - name: 'purpose_car (used)'
          value: 'car (used)'
        - name: 'purpose_furniture/equipment'
          value: 'furniture/equipment'
        - name: 'purpose_radio/television'
          value: 'radio/television'
  hidden_columns:
    - age
    - foreign
```
```yaml
dataset_schema:
  outcome_column: outcome
  defined_feature_order: true
  feature_schemas:
    - feature_name: checkingstatus
    - feature_name: duration
    - feature_name: history
    - feature_name: purpose
    - feature_name: amount
      min: 10
      max: 1000000000
      spread: 1.42
    - feature_name: savings
    - feature_name: employ
    - feature_name: installment
    - feature_name: status
    - feature_name: others
    - feature_name: residence
    - feature_name: property
    - feature_name: age
    - feature_name: otherplans
    - feature_name: housing
    - feature_name: cards
    - feature_name: job
    - feature_name: liable
    - feature_name: telephone
    - feature_name: foreign
    - feature_name: outcome
```
Notes
ALERT: Setting `defined_feature_order` to false for JSON datasets whose `orient` is not `columns` may cause the dataset features to be sent to your model in an incorrect order.
For JSON datasets with an `orient` of `records`, the features found in Feature Schemas provide the order of the dataframe.
For CSV datasets and JSON datasets with an `orient` of `values` or `columns`, the names found in the Feature Schemas section specify/override the column names of the dataframe.
Feature Schemas
Each item in the `feature_schemas` list can have the following attributes.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | Optional | Name of the column (or index if there are no headers) being referred to |
data_type | string | Optional | One of: categorical , numerical-int , or numerical-float |
category_values | list[string \| int] | Optional | The possible values for the category. This attribute is only used when the data_type is categorical |
one_hot_columns | list[object] | Optional | A list of mappings between (one-hot) column names in the dataset and values of the feature. If the schema defined_feature_order is True then this also specifies the column ordering. The set of values must match those specified in category_values |
target_encodings | list[float] | Optional | A list of numeric encoding values of the feature. The set of values must be in correspondence to those specified in category_values in the same order |
categorical_type | string | Optional | Data type to interpret the category values as. Must be one of: auto , string , or int . Defaults to auto , in which case the data type will be inferred. Should only be used when data_type is categorical |
min | number | Optional | Minimum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
max | number | Optional | Maximum possible value for the feature. Should only be used when data_type is numerical-int or numerical-float |
spread | number | Optional | A typical magnitude of change - normally the mean absolute deviation or standard deviation. If not specified, an appropriate spread is estimated from the dataset. Should only be used when data_type is numerical-int or numerical-float |
Example
Without one-hot encoding:
```yaml
feature_schemas:
  - feature_name: age
    data_type: numerical-int
    min: 18
    max: 65
    spread: 1.0
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
```
With one-hot encoding:
```yaml
feature_schemas:
  - feature_name: age
    data_type: numerical-int
    min: 18
    max: 65
    spread: 1.0
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
    one_hot_columns:
      - name: 'foreign_foreign - yes'
        value: 'foreign - yes'
      - name: 'foreign_foreign - no'
        value: 'foreign - no'
```
With target encoding:
```yaml
feature_schemas:
  - feature_name: age
    data_type: numerical-int
    min: 18
    max: 65
    spread: 1.0
  - feature_name: foreign
    data_type: categorical
    category_values:
      - "foreign - yes"
      - "foreign - no"
    target_encodings:
      - 0.3
      - 0.48
```
Notes
- If an override is not applicable to a feature, it is NOT applied. For example, if an override specifies `category_values` but the feature's `data_type` is not `categorical`, the `category_values` field is not applied to the feature.
- One-hot encoding captures the way the dataset is encoded in the dataset source (e.g. a CSV file) and is expected by the model. It does not impact feature semantics apart from the expected encoding.
- Target encodings capture another possible way the dataset could be encoded in the dataset source (e.g. a CSV file) and expected by the model. Target encoding is a technique for encoding categorical values numerically (usually by some statistical association with the ground truth value of the prediction). It does not impact feature semantics apart from the expected encoding. The list of `target_encodings` values should match the list of `category_values`, providing the corresponding encoding that has been used for each.
- Explanations in reports are surfaced as value-encoded for all categorical features, regardless of whether they are one-hot or target encoded in the dataset. This makes the explanations human-readable even for one-hot encoded datasets.
- The `categorical_type` field can be used to specify the data type for a categorical feature when there is possible ambiguity. For example, the value "01" could be interpreted either as the string `"01"` or as the integer `1`.
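As a sketch of the ambiguity note above (the feature name `zipcode` is hypothetical), forcing a string interpretation of zero-padded category values might look like:

```yaml
feature_schemas:
  - feature_name: zipcode      # hypothetical feature name
    data_type: categorical
    categorical_type: string   # keep zero-padded values such as "01" as strings
    category_values:
      - "01"
      - "02"
```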
Evaluation
The `evaluation` section specifies the details of the evaluation that is run in the scan. Only a single evaluation can
be specified in this section, but multiple evaluation types (e.g. robustness, fairness, explanation, explainability)
can be run against all (or some) of the models specified in the Models section of the scan definition.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the evaluation being run |
description | string | Optional | Description of the evaluation |
environment | string | Optional | Field for tracking the environment the evaluation is being run in (e.g. DEV, QA, Compliance) |
evaluation_types | list[string] | Required | The list of evaluations to be performed on the models in this scan. Valid values include: robustness , fairness , explanation , explainability , and performance |
evaluation_dataset_id | string | Required | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for the robustness, fairness, and explainability evaluations |
explanation_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for explanation evaluation, and is only required if explanation is listed as an evaluation_type |
test_dataset_id | string | Optional | An ID corresponding to a dataset defined in the Datasets section. This dataset is used for computing performance metrics, and is only required if the model_use_case section lists performance metrics to be evaluated by Certifai |
fairness_grouping_features | list[object] | Optional | List of features and groups within those features to use for calculating fairness. See the Fairness Grouping Features section for details. |
fairness_metrics | list[string] | Optional | List of fairness metrics to be calculated/applied as part of the Fairness evaluation. Possible values include: (case insensitive): "Demographic_parity" ("Demographic parity", "Demographic"), "Equal_opportunity" ("equal opportunity", "opportunity"), "Equal_odds" ("odds", "equal odds"), "Sufficiency" ("predictive rate parity"), and "Burden". The default value for this field is [burden] . |
primary_fairness_metric | string | Optional | The fairness metric to use as the Fairness aspect for calculating the ATX score |
explanation_types | list[string] | Optional | List of explanation types to be used in the Explainability and Explanation evaluations. Possible explanation types include (case insensitive): "counterfactual" ("burden") and "shap". The default value is [counterfactual] . |
primary_explanation_type | string | Optional | The explanation type to use as the Explainability aspect for calculating the ATX score |
feature_restrictions | list[object] | Optional | A list of feature restrictions to apply to generated counterfactuals. See the Feature Restrictions section for details. |
hyperparameters | list[object] | Optional | A list of objects with name and value attributes. These parameters can be used to override the default engine hyperparameters. See the Hyperparameters section for details. |
prediction_description | string | Optional | Description of what is being predicted by the models |
prediction_favorability | string | Optional | The prediction format for favorable prediction outcomes in the scan definition. The field has different possible values and behaviors depending on the task type. If not specified, the value is inferred based on the prediction information in the scan definition. |
save_counterfactuals | boolean | Optional | Whether to save generated counterfactuals in a separate CSV file for the Robustness, Fairness, and Explainability reports. SHAP explanations will be saved in a separate CSV file as well for the Explainability report. Defaults to false . |
Notes
Prediction Favorability values and behaviors:

For classification, possible values are: `explicit`, `ordered`, and `none`.
- A favorability of `explicit` means that the prediction outcomes in the `prediction_values` list explicitly declare which are favorable, by setting the `favorable` field to true.
- A favorability of `none` means that none of the prediction outcomes in the `prediction_values` list is favorable; therefore, entries in the `prediction_values` list may not have the `favorable` field set to true.
- A favorability of `ordered` specifies that the prediction outcomes in the `prediction_values` list are ordered from most favorable to least favorable. At most two counterfactuals are generated per observation for the `explanation` report (similar to regression) if this is set. The `last_favorable_prediction` field can be used in this case to specify which of the prediction outcomes is deemed favorable (see the classification-specific fields below).

For regression, possible values are: `ordered` and `none`.
- A favorability of `none` specifies that there is no favorable direction (increased or decreased) for the predictions. The `favorable_outcome_value` field is not set in this case.
- A favorability of `ordered` specifies that there is a favorable direction (either increased or decreased) for the predictions. The `favorable_outcome_value` field must be set in this case.
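To illustrate the `ordered` classification case, a sketch with hypothetical outcome names and values, where the outcomes are listed from most to least favorable and `last_favorable_prediction` marks the cut-off:

```yaml
prediction_favorability: ordered
prediction_values:            # ordered from most to least favorable
  - name: Low Risk
    value: 1
  - name: Medium Risk
    value: 2
  - name: High Risk
    value: 3
last_favorable_prediction: 2  # values 1 and 2 are deemed favorable
```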
Regression Specific Fields:
Set these attributes for regression use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
regression_boundary_type | string | Optional | Indicates how the boundary is defined. Valid values are absolute and relative (default relative ) |
regression_standard_deviation | number | Optional | Only used with regression_boundary_type of relative . Amount of change of prediction required for the analysis to consider it sufficient to be a counterfactual, in units of the standard deviation of the predicted scores for the entire dataset. If omitted, a value of 0.5 is used |
regression_boundary | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary_percentile . Exact value of outcome that separates favorable from unfavorable |
regression_boundary_percentile | number | Optional | Only used with regression_boundary_type of absolute and mutually exclusive with regression_boundary . Percentile of outcome value distribution (as empirically measured by the evaluation dataset) that separates favorable from unfavorable |
favorable_outcome_value | string | Optional | Favorable direction for regression task type. Possible values are: increased and decreased . This field must be left empty if the prediction_favorability is none . |
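For instance, an absolute-boundary regression configuration might look like the following sketch (the boundary value is illustrative):

```yaml
prediction_favorability: ordered
favorable_outcome_value: increased  # higher predictions are favorable
regression_boundary_type: absolute
regression_boundary: 250000         # illustrative outcome value separating favorable from unfavorable
```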
Classification Specific Fields:
Set these attributes for classification use cases.
Name | Type | Required/ Optional | Description |
---|---|---|---|
prediction_values | list[object] | Optional | A list of prediction outcomes that can be returned by the model. The attributes of predictions are described in the Prediction Values section. |
last_favorable_prediction | string | Optional | The value of the last favorable prediction outcome in the prediction_values list. It must correspond to a prediction outcome in the prediction_values list. Applicable only when prediction_favorability is ordered |
favorable_outcome_group_name | string | Optional | A name describing the group of predictions that are favorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
unfavorable_outcome_group_name | string | Optional | A name describing the group of predictions that are unfavorable. This field is only applicable when the model_use_case.task_type is multiclass-classification and the prediction_favorability is explicit . |
Examples
```yaml
# binary-classification evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: age
    - name: status
  feature_restrictions:
    - feature_name: age
      restriction_string: no changes
    - feature_name: status
      restriction_string: no changes
  name: Baseline evaluation of 4 models
  prediction_description: Will a loan be granted?
  prediction_favorability: explicit
  prediction_values:
    - favorable: true
      name: Loan Granted
      value: 1
    - favorable: false
      name: Loan Denied
      value: 2
```
```yaml
# multiclass-classification evaluation
evaluation:
  name: Baseline evaluation of 3 models
  description: This evaluation compares the robustness, fairness, performance and explainability for 3 candidate models.
  evaluation_dataset_id: eval
  explanation_dataset_id: explan
  evaluation_types:
    - robustness
    - fairness
    - explainability
    - explanation
    - performance
  prediction_favorability: explicit
  prediction_values:
    - name: "Heart disease not detected"
      value: 0
      favorable: true
    - name: "Stage 1: > 50% diameter narrowing in a major vessel"
      value: 1
      favorable: false
    - name: "Stage 2: > 50% diameter narrowing in a major vessel"
      value: 2
      favorable: false
    - name: "Stage 3: > 50% diameter narrowing in a major vessel"
      value: 3
      favorable: false
    - name: "Stage 4: > 50% diameter narrowing in a major vessel"
      value: 4
      favorable: false
  favorable_outcome_group_name: Heart Disease not detected
  unfavorable_outcome_group_name: Heart Disease detected
  prediction_description: "Indicator of heart disease level (angiographic disease status)"
  fairness_grouping_features:
    - name: sex
  feature_restrictions:
    - feature_name: sex
      restriction_string: no changes
```
```yaml
# regression evaluation
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 3 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
    - robustness
    - fairness
    - explanation
    - explainability
    - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: Marital Status
    - name: Gender
  favorable_outcome_value: increased
  feature_restrictions:
    - feature_name: Gender
      restriction_string: no changes
  name: Baseline evaluation of 3 models
  prediction_description: Amount of Settled Claim
  prediction_favorability: ordered
  regression_standard_deviation: 0.5
```
Fairness Grouping Features
The fairness grouping features field is only required when `fairness` is listed under the `evaluation.evaluation_types` field. Each fairness grouping feature defined has the following attributes:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the feature (or index) in the dataset |
buckets | list[object] | Optional | An optional list of objects defining groups within the named features. If no buckets are specified, then Certifai treats each distinct value for that feature in the dataset as a separate class. The structure for specifying buckets depends on whether the feature is categorical or numerical. Refer to the notes below for details |
Buckets (fields) for categorical features:
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
values | list[string | int] | Optional | List of category values that belong in the bucket |
Buckets (fields) for numerical features:
Name | Type | Required/ Optional | Description |
---|---|---|---|
description | string | Optional | Description/name of the group |
max | number | Optional | The maximum value allowed in the group. Values belong to the group with the lowest upper bound greater than or equal to the value, and exactly one bucket must omit an upper bound to act as a catch-all |
Example
```yaml
fairness_grouping_features:
  - name: gender
  - name: Age
    buckets:  # buckets for numeric feature
      - description: "<= 40 years old"
        max: 40
      - description: "> 40 years old"
  - name: marital-status
    buckets:  # buckets for categorical feature
      - description: Single
        values:
          - marital-status_Never-married
      - description: Married
        values:
          - marital-status_Married-AF-spouse
          - marital-status_Married-civ-spouse
          - marital-status_Married-spouse-absent
      - description: Divorced
        values:
          - marital-status_Divorced
      - description: Widowed
        values:
          - marital-status_Widowed
      - description: Separated
        values:
          - marital-status_Separated
```
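The numeric bucketing rule above can be sketched in a few lines of Python. This helper (`assign_bucket` is illustrative only, not part of the Certifai API) assigns a value to the bucket with the lowest `max` that is greater than or equal to the value, and falls through to the single bucket that omits `max`:

```python
def assign_bucket(value, buckets):
    """Return the description of the bucket a numeric value falls into.

    Buckets that declare a 'max' are candidates when value <= max; the
    value goes to the candidate with the lowest upper bound. The single
    bucket that omits 'max' catches everything above all declared bounds.
    """
    bounded = [b for b in buckets if "max" in b and value <= b["max"]]
    if bounded:
        return min(bounded, key=lambda b: b["max"])["description"]
    # exactly one bucket should omit 'max' and act as the catch-all
    return next(b for b in buckets if "max" not in b)["description"]

# the Age buckets from the example above
age_buckets = [
    {"description": "<= 40 years old", "max": 40},
    {"description": "> 40 years old"},
]
```

Note that the boundary value itself (here, 40) lands in the bounded bucket, since membership is "less than or equal to `max`".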
Feature Restrictions
Feature Restrictions are specified at the evaluation level and applied to individual features in the dataset. The specified restrictions are applied by Certifai when generating counterfactuals.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
feature_name | string | Optional | Name (or index) of feature in dataset |
restriction_string | string | Optional | Restrictions to apply for robustness and fairness reports. Must be one of: no restrictions , no changes , min/max , or percentage . Defaults to no restrictions if none is specified. Note that fairness grouping features implicitly have a restriction string of no changes applied during the fairness evaluation |
restriction_numerical_percentage | number | Optional | Used in conjunction with restriction_string = percentage |
restriction_numerical_min | number | Optional | Used in conjunction with restriction_string = min/max |
restriction_numerical_max | number | Optional | Used in conjunction with restriction_string = min/max |
Example
```yaml
feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: marital
    restriction_string: no changes
```
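The restriction types can be illustrated with a small validity check on a candidate counterfactual value. This is a hypothetical helper, not toolkit code, and the semantics are assumed: `percentage` is read as limiting the relative change from the original value, and `min/max` as bounding the absolute value:

```python
def value_allowed(original, candidate, restriction):
    """Check whether a counterfactual value respects a feature restriction.

    Assumed semantics: 'no changes' forbids any change; 'percentage'
    limits the relative change; 'min/max' bounds the absolute value;
    'no restrictions' (the default) allows anything.
    """
    kind = restriction.get("restriction_string", "no restrictions")
    if kind == "no changes":
        return candidate == original
    if kind == "percentage":
        limit = restriction["restriction_numerical_percentage"] / 100.0
        return abs(candidate - original) <= abs(original) * limit
    if kind == "min/max":
        return (restriction["restriction_numerical_min"]
                <= candidate
                <= restriction["restriction_numerical_max"])
    return True  # no restrictions
```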
Prediction Values
The prediction_values
field is applicable only to classification (binary or multiclass) tasks.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | A human-readable name for this prediction (e.g. 'Has Diabetes') |
value | any | Optional | The output value from the ML model corresponding to this prediction (e.g. 1/0) |
favorable | boolean | Optional | Optional field specifying if the prediction is favorable. Defaults to false |
Example
```yaml
prediction_values:
  - favorable: true
    name: Makes a deposit
    value: 1
  - favorable: false
    name: Does not make a deposit
    value: 0
```
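For fairness reporting, the `favorable` flags partition the declared prediction values into a favorable and an unfavorable outcome group. A hypothetical helper (not part of the toolkit) illustrating that split, including the documented default of `favorable: false` when the flag is omitted:

```python
def split_outcomes(prediction_values):
    """Split model output values into (favorable, unfavorable) groups
    based on the 'favorable' flag, which defaults to false."""
    favorable = [p["value"] for p in prediction_values if p.get("favorable", False)]
    unfavorable = [p["value"] for p in prediction_values if not p.get("favorable", False)]
    return favorable, unfavorable

# the deposit example from above
deposit_values = [
    {"favorable": True, "name": "Makes a deposit", "value": 1},
    {"favorable": False, "name": "Does not make a deposit", "value": 0},
]
```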
Hyperparameters
The hyperparameters
list can be used to modify the hyperparameters used by the Certifai engine. Each item in the list must have the following fields:
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Required | Name of the hyperparameter |
value | any | Required | Value to apply to the hyperparameter |
Notes
ALERT
Hyperparameter tuning is intended for advanced users and can affect the quality of results produced and the overall evaluation time. For a detailed reference of available engine hyperparameters, refer to the engine hyperparameter section of the Configuration File Reference.
Example
```yaml
hyperparameters:
  - name: num_counterfactuals
    value: 3
  - name: sampling_boundary
    value: 0.05
```
Scoring
The scoring
section of the scan definition is used to specify custom weights for computing Explainability scores and ATX scores.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
explainability | list[object] | Optional | A list of custom weights to be used when computing the explainability score. Refer to the Explainability section for details |
aspect_weights | list[object] | Optional | A list of objects, each specifying name and value attributes. The value is the weight for the named component. Refer to the Aspect Weights section for details |
Example
```yaml
scoring:
  explainability:
    - num_features: 1
      value: 100
    - num_features: 2
      value: 80
    - num_features: 3
      value: 50
    - num_features: 4
      value: 20
  aspect_weights:
    - name: "explainability"
      value: 1.0
    - name: "robustness"
      value: 0.5
    - name: "fairness"
      value: 1.0
    - name: "performance"
      value: 0.2
```
Explainability
The explainability weights field is a list of objects, each specifying `num_features`
and `value`
attributes. The `value`
is the weight applied for the specified `num_features`
.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
num_features | int | Optional | An integer between 1 and 10 |
value | number | Optional | A weight between 0 and 100 setting the corresponding weight for num_features |
The current default value is:
```yaml
explainability:
  - num_features: 1
    value: 100.0
  - num_features: 2
    value: 80.0
  - num_features: 3
    value: 50.0
  - num_features: 4
    value: 20.0
  - num_features: 5
    value: 0.0
  - num_features: 6
    value: 0.0
  - num_features: 7
    value: 0.0
  - num_features: 8
    value: 0.0
  - num_features: 9
    value: 0.0
  - num_features: 10
    value: 0.0
```
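The weights can be read as a scoring table indexed by explanation size: explanations that change fewer features receive higher weight. The exact engine formula is not documented here; this hedged sketch (with an illustrative function name) only shows the table lookup implied by the defaults:

```python
def explanation_weight(num_changed_features, weights):
    """Look up the score weight for an explanation that changes a given
    number of features; sizes missing from the table score 0."""
    table = {w["num_features"]: w["value"] for w in weights}
    return table.get(num_changed_features, 0.0)

# the non-zero portion of the default weights above
default_weights = [
    {"num_features": 1, "value": 100.0},
    {"num_features": 2, "value": 80.0},
    {"num_features": 3, "value": 50.0},
    {"num_features": 4, "value": 20.0},
]
```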
Aspect Weights
The `aspect_weights` section is a list of objects that specifies the weightings used when computing the ATX score.
Fields
Name | Type | Required/ Optional | Description |
---|---|---|---|
name | string | Optional | Name of the ATX aspect; must be one of: explainability , robustness , fairness , or performance |
value | number | Optional | A non-negative numerical weight for the component |
The current default value is:
```yaml
aspect_weights:
  - name: explainability
    value: 1.0
  - name: robustness
    value: 1.0
  - name: fairness
    value: 1.0
  - name: performance
    value: 1.0
```
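Assuming the ATX score combines the per-aspect scores as a weighted average (a plausible reading of these weights, not a documented formula), the default weights above give each aspect equal influence. A minimal sketch under that assumption:

```python
def atx_score(aspect_scores, aspect_weights):
    """Combine per-aspect scores into one number via a weighted average.

    aspect_scores: dict mapping aspect name -> score
    aspect_weights: list of {'name': ..., 'value': ...} entries
    """
    total = sum(w["value"] for w in aspect_weights)
    return sum(aspect_scores[w["name"]] * w["value"]
               for w in aspect_weights) / total

# the default (equal) weights from above
default_weights = [
    {"name": "explainability", "value": 1.0},
    {"name": "robustness", "value": 1.0},
    {"name": "fairness", "value": 1.0},
    {"name": "performance", "value": 1.0},
]
```

With equal weights the result is just the mean of the four aspect scores; raising one weight pulls the combined score toward that aspect.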