Define and run scans locally
- Download the Certifai Toolkit
- Install the Certifai Toolkit
- Have the following prepared to define your scan:
  - Your model endpoint(s)
  - (Optional) Performance metric(s)
  - A dataset URL
  - For fairness reports: one or more dataset features you wish to measure
  - For evaluations: an additional dataset URL (a subset of the original dataset)
Tip: You may want to use the sample use case components provided in the Certifai Toolkit to run sample scans to familiarize yourself with Certifai.
Scan your own model
The steps in this section show how to create a scan from scratch, using one of the models for the German Credit example in the
models folder of the cortex-certifai-examples repository.
Generate a scan definition
In a different terminal, activate the conda environment you created for Certifai:

```
conda activate certifai
```

Create a new scan definition.

NOTE: If your dataset is being called from a local folder, make sure that you cd into that folder before you run this command.

```
certifai definition-create
```
Enter information as prompted.
The example below creates a scan definition for a trust scan using a German Credit dataset provided in the toolkit:

```
? What type of scan definition would you like to generate? trust scan
? Model use case ID test
? Learning Task Type binary-classification
? Dataset file type csv
? Evaluation Dataset URL (use 'file:' prefix for local system files) file:examples/datasets/german_credit_eval.csv
? CSV dataset headers are present Yes
? Infer one hot encoded columns based on feature names (select None to not infer one-hot encoded columns) None
? Infer the possible values taken by categorical features from the data (only recommended when generating from a large representative dataset) No
Scan definition created at: scan_definition.yaml
The following integer-valued features were inferred to be categoricals:
installment
residence
cards
liable
outcome
The following integer-valued features were inferred to be numeric:
duration
amount
```
The command generates the scan_definition.yaml file using the information you enter and by analyzing the schema of the dataset you provide. The output includes summary information that notes inferences made about features in your dataset.
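Certifai's exact rule for deciding which integer-valued features are categorical is not shown here; as a rough illustration of how such an inference can work, the sketch below (the function name `infer_integer_feature_kinds` and the threshold are hypothetical, not part of the Certifai CLI) treats an integer column as categorical when it takes only a handful of distinct values:

```python
def infer_integer_feature_kinds(columns, max_distinct=5):
    """Toy heuristic: an integer column with few distinct values is
    likely categorical; otherwise treat it as numeric. The threshold
    is illustrative only."""
    kinds = {}
    for name, values in columns.items():
        kinds[name] = "categorical" if len(set(values)) <= max_distinct else "numeric"
    return kinds

# Example: 'installment' takes values 1-4 (categorical);
# 'amount' varies widely (numeric).
columns = {
    "installment": [1, 2, 3, 4, 2, 1, 3],
    "amount": [1169, 5951, 2096, 7882, 4870, 9055, 2835],
}
print(infer_integer_feature_kinds(columns))
# {'installment': 'categorical', 'amount': 'numeric'}
```

If the inference gets a feature wrong, you can always correct it by hand in the generated scan definition.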
The prompts ask for:
- The type of scan definition you want to generate (trust scan or explanation)
- Information about your use case
- The file path to an evaluation dataset to use for the scan
- The method for inferring one-hot encoded columns within your dataset.
- Whether to include assertions about the possible values categorical features may take in the scan, based on those present in the data used
The trust scan type generates scan reports for robustness, fairness, explainability, and performance evaluations in your definition, whereas the explanation type provides only an explanation evaluation.
The penultimate prompt asks how to infer one-hot encoded columns within your dataset.
The possible responses are:
- None: use if your dataset does not include one-hot encoded columns, or if the column names don't match either the feature_value or the feature.value naming convention. After your definition is created, you should manually edit the scan definition file with any one-hot encoding information.
- feature_value (pandas): use if your dataset includes one-hot encoded columns that match the naming convention feature_value: there are multiple columns containing only 0/1 values, and the column names have a common prefix ending with an underscore (_). In this case the feature name is inferred to be the common prefix before the terminating underscore, and the value is the remaining string.
- feature.value: use if your dataset includes one-hot encoded columns that match the naming convention feature.value. Columns are encoded similarly to feature_value (pandas), but with the separating character being a dot (.) rather than an underscore (_).

The one-hot encoding inference assumes that the encoded values do not contain the separating character (_ if you selected feature_value (pandas), . if you selected feature.value). Additionally, there are cases where the inference may result in an incorrect encoding or even fail unexpectedly due to ambiguity. The following message is displayed in this case:

```
Failed to create scan definition - Unable to decode one-hot encoded columns in 'eval' dataset - Column 'user_region' has invalid one-hot encoding (0-hot and multi-hot both present)
```
If you encounter either situation, you must manually edit your scan definition file.
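Certifai's actual decoding logic is not shown in this document; as a sketch of the feature_value (pandas) convention and of the 0-hot/multi-hot check that triggers the error above, the hypothetical helpers below split column names on the last underscore and require exactly one 1 per row across a feature's columns:

```python
def decode_one_hot_columns(columns):
    """Group 0/1 columns named 'feature_value' (common prefix before
    the last underscore) into features. Illustrative only."""
    features = {}
    for col in columns:
        if "_" not in col:
            continue  # not one-hot encoded under this convention
        feature, _value = col.rsplit("_", 1)
        features.setdefault(feature, []).append(col)
    return features

def check_one_hot_rows(rows, cols):
    """A valid one-hot encoding has exactly one 1 per row across a
    feature's columns; 0-hot or multi-hot rows cannot be decoded."""
    for row in rows:
        if sum(row[c] for c in cols) != 1:
            return False
    return True

cols = ["status_A11", "status_A12", "amount"]
print(decode_one_hot_columns(cols))  # {'status': ['status_A11', 'status_A12']}

bad_rows = [{"status_A11": 1, "status_A12": 1},  # multi-hot
            {"status_A11": 0, "status_A12": 0}]  # 0-hot
print(check_one_hot_rows(bad_rows, ["status_A11", "status_A12"]))  # False
```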
The last prompt asks whether to attempt to infer the set of values that may be taken by categorical features from the provided data. This is only recommended if you are generating the definition from a large and fully representative dataset.
Edit the scan definition file in a text editor to add information about the use case. See the comments in the generated definition for guidance on how to update the fields.
For the German Credit example, the minimum changes you need to make to run a scan in this use case are:
- fairness_grouping_features: Change the list of feature names to match the actual features you want to analyze for fairness. In this case, replace the list with status. The feature names are listed in the generated scan definition file.
- outcome_column: If the dataset has a column containing the ground truth, uncomment this field and specify the feature name, which in this case is outcome.
- prediction_values: Change the value fields to the values returned in the prediction, and the name fields to suitable human-readable labels for the values. (In this use case, 1 indicates the loan is granted and 2 indicates it was denied.)
- prediction_description: The guiding question of the use case; that is, what the models are predicting.
- predict_endpoint: Change to the URL for the model; in this case, verify that it is http://localhost:8551/predict.
For the German Credit example, the fields in the definition should look similar to:

```
models:
  - ...
    predict_endpoint: http://localhost:8551/predict
dataset_schema:
  outcome_column: outcome
  feature_schemas:
    - ...
evaluation:
  ...
  fairness_grouping_features:
    - name: 'status'
  ...
  prediction_description: Is a loan granted?
  prediction_values:
    - favorable: true
      name: Loan granted
      value: 1
    - favorable: false
      name: Loan denied
      value: 2
```
You can see a full version of the definition in the toolkit at examples/definitions/german_credit_scanner_definition.yaml. This example has multiple models and multiple fairness grouping features.
Validate the scan definition
Validate that the fields in your scan definition match the expected schema:

```
certifai definition-validate -f scan_definition.yaml
```

If the definition validates successfully, the output is:

```
Successfully validated scan definition
```
If the definition fails validation, the errors are described.
For example, the following is displayed if you do not change the default generated fairness_grouping_features to match the features in your dataset:

```
Validation failed, Validation errors found, Evaluation validation failed:
Unknown feature 'feature 1' specified in fairness_grouping_features,
Unknown feature 'feature 2' specified in fairness_grouping_features
```
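As an illustration of the kind of check definition-validate performs, the sketch below verifies that each fairness grouping feature actually appears in the dataset schema. The function name and error strings mimic, but are not, Certifai's own implementation:

```python
def validate_fairness_features(grouping_features, dataset_columns):
    """Return one error message per grouping feature that is not a
    column of the dataset (illustrative check only)."""
    return [
        f"Unknown feature '{f}' specified in fairness_grouping_features"
        for f in grouping_features
        if f not in dataset_columns
    ]

errors = validate_fairness_features(
    ["feature 1", "status"],
    ["status", "duration", "amount", "outcome"],
)
print(errors)
# ["Unknown feature 'feature 1' specified in fairness_grouping_features"]
```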
Test connecting to the model endpoint
Test that your definition works with your model:

```
certifai definition-test -f scan_definition.yaml
```
For example, if the model server is not running or the URL is incorrect, the test results in connection errors similar to:

```
...
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))
Prediction test failed for 'ml_model' model against dataset 'eval': HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))
```
If you get this error, check the endpoint in the scan definition and check that the model is running, then run the test again.
If you get the following error:

```
ValueError: Model failed to evaluate batch (20).
Prediction test failed for 'ml_model' model against dataset 'eval': Model failed to evaluate batch (20).
```

check that you have updated the outcome_column as described above. This error is returned by the reference model when too many fields are passed in the predict call.
When the test is successful, the output is similar to the following:

```
Prediction test successful for 'ml_model' model against 'eval' dataset
```
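The exact request and response schema Certifai expects from a predict endpoint depends on your model wrapper and toolkit version, so the stub below is only a sketch: it assumes a JSON body with an instances list under a payload key and returns a predictions list containing 1 (granted) or 2 (denied), mirroring the prediction_values above. Both the payload shape and the trivial model are assumptions for illustration:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers["Content-Length"])
        body = json.loads(self.rfile.read(length))
        instances = body["payload"]["instances"]  # assumed request shape
        # Trivial stand-in model: grant (1) every loan.
        response = {"payload": {"predictions": [1 for _ in instances]}}
        data = json.dumps(response).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

if __name__ == "__main__":
    # Serve on the port used by the German Credit example.
    HTTPServer(("localhost", 8551), PredictHandler).serve_forever()
```

With a stub like this running, the definition-test command can reach the endpoint even before your real model server is ready.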
Run your scan
Alert: Windows 10 Users
Before you run a scan, you must disable QuickEdit Mode in your terminal window. Right-click in the terminal window and uncheck the QuickEdit Mode option.
If you do not disable this option, clicking off of your terminal window and back into it causes the window to freeze, and you will not receive your scan results updates.
You are now ready to run your scan:

```
certifai scan -f scan_definition.yaml -o <directory to place reports in>
```
The -o option may be omitted. If it is, the following locations are used for storing scan results, in the order presented:

- If -o is not specified, the results are sent to the SCAN_RESULTS_DIRECTORY if it is defined as a locally-set environment variable.
- If neither the -o option nor the SCAN_RESULTS_DIRECTORY variable has been configured, the results are sent to the output path, if one is specified in the scan definition.
- If neither the -o option nor the SCAN_RESULTS_DIRECTORY variable has been configured and no output path has been defined in the scan definition, then the reports are sent to the default location.
View the scan results visualizations
View your result visualization in your local Console.