Version: next

Define and run scans locally

Prerequisites

- Download the Certifai Toolkit
- Install the Certifai Toolkit

You must have the following prepared to define your scan:

- Your model endpoint(s)
- (Optional) Performance metric(s)
- A dataset URL
- For Fairness reports: one or more dataset features you wish to measure
- For Evaluations: an additional dataset URL (a subset of the original dataset)

Tip: You may want to use the sample use case components provided in the Certifai Toolkit to run sample scans to familiarize yourself with Certifai.

Scan your own model

The steps in this section show how to create a scan from scratch, using one of the models for the German Credit example in the models folder of the cortex-certifai-examples repository.

Generate a scan definition

In a different terminal, activate the conda environment you created for certifai.
```
conda activate certifai
```
Create a new scan definition.
NOTE: If your dataset is being called from a local folder, make sure that you cd into that folder before you run this command.
```
certifai definition-create
```
Enter information as prompted.
The example below creates a scan definition for a trust scan using a German Credit dataset provided in the toolkit:
```
? What type of scan definition would like you to generate?  trust scan
? Model use case ID test
? Learning Task Type  binary-classification
? Dataset file type  csv
? Evaluation Dataset URL (use 'file:' prefix for local system files) file:examples/datasets/german_credit_eval.csv
? CSV dataset headers are present  Yes
? Infer one hot encoded columns based on feature names (select None to not infer one-hot encoded columns)  None
? Infer the possible values taken by categorical features from the data (only recommended when generating from a large representative dataset) No
Scan definition created at: scan_definition.yaml

The following integer-valued features were inferred to be categoricals:
    installment
    residence
    cards
    liable
    outcome
The following integer-valued features were inferred to be numeric:
    duration
    amount
```
The command generates the scan_definition.yaml file from the information you enter and from an analysis of the schema of the dataset you provide. The output includes a summary of the inferences made about features in your dataset.
The prompts ask for:
- The type of scan definition you want to generate (trust scan)
- Information about your use case
- The file path to an evaluation dataset to use for the scan
- The method for inferring one-hot encoded columns within your dataset.
- Whether to include assertions about the possible values categorical features may take in the scan, based on those present in the data used
The trust scan type includes robustness, fairness, explainability, and performance evaluations in your definition, whereas the explanation type includes only an explanation evaluation.
The penultimate prompt asks how to infer one-hot encoded columns within your dataset.
The possible responses are:
- None: use if your dataset does not include one-hot encoded columns or if the column names don't match either the feature_value (pandas) or feature.value naming convention. After your definition is created, you should manually edit the scan definition file to add any one-hot encoding information.
- feature_value (pandas): use if your dataset includes one-hot encoded columns that match the naming convention feature_value: multiple columns contain only 0/1 values, and their names share a common prefix ending with an underscore (_). In this case the feature name is inferred to be the common prefix before the terminating underscore, and the value is the remaining string.
- feature.value: use if your dataset includes one-hot encoded columns that match the naming convention feature.value, where columns are encoded as for feature_value (pandas) but the separating character is a dot (.) rather than an underscore (_).
  NOTE
  The one-hot encoding inference assumes that the encoded values do not contain the separating character, _ if you selected feature_value (pandas) or . if you selected feature.value. Additionally, there are possible cases where the inference may result in an incorrect encoding or even fail unexpectedly due to ambiguity. The following message is displayed in this case:
```
Failed to create scan definition - Unable to decode one-hot encoded columns in 'eval' dataset  - Column 'user_region' has invalid one-hot encoding (0-hot and multi-hot both present)
```
  If you encounter either situation, you must manually edit your scan definition file.
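
The naming conventions above can be illustrated with a minimal sketch. This is not Certifai's implementation; the function names, the sample column names, and the sample rows are hypothetical. It assumes, as the note states, that encoded values never contain the separator, so each column name is split at its last separator. It then reports how many columns in each group are "hot" per row, which is the kind of ambiguity the error message describes.

```python
from collections import defaultdict


def infer_one_hot_groups(columns, sep="_"):
    """Group column names by the prefix before the last separator.

    Hypothetical helper: returns {feature: [(column, value), ...]} for
    prefixes shared by more than one column.
    """
    groups = defaultdict(list)
    for col in columns:
        if sep not in col:
            continue
        feature, value = col.rsplit(sep, 1)  # value must not contain sep
        if feature and value:
            groups[feature].append((col, value))
    return {f: cols for f, cols in groups.items() if len(cols) > 1}


def hot_counts(rows, group_columns):
    """Return the set of per-row counts of 1s across one group's columns."""
    return {sum(int(row[c]) for c in group_columns) for row in rows}


# Hypothetical data: 'user_region' has a multi-hot row and a 0-hot row.
columns = ["age", "user_region_north", "user_region_south",
           "status_single", "status_married"]
rows = [
    {"age": 30, "user_region_north": 1, "user_region_south": 0,
     "status_single": 1, "status_married": 0},
    {"age": 41, "user_region_north": 1, "user_region_south": 1,
     "status_single": 0, "status_married": 1},
    {"age": 25, "user_region_north": 0, "user_region_south": 0,
     "status_single": 1, "status_married": 0},
]

groups = infer_one_hot_groups(columns)
for feature, cols in groups.items():
    counts = hot_counts(rows, [c for c, _ in cols])
    verdict = "valid" if counts == {1} else f"ambiguous, hot counts {sorted(counts)}"
    print(feature, [v for _, v in cols], verdict)
```

Running a check like this on your own dataset before choosing an inference option may help you decide whether to select None and describe the encoding manually.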
The last prompt asks whether to attempt to infer the set of values that may be taken by categorical features from the provided data. This is only recommended if you are generating the definition from a large and fully representative dataset.
Edit the scan definition file in a text editor to add information about the use case. See the comments in the generated definition for guidance on how to update the fields.
For the German Credit example, the minimum changes you need to make to run a scan in this use case are:

- fairness_grouping_features: Change the list of feature names to match the actual features you want to analyze for fairness. In this case, replace the list with status. The feature names are listed in the generated feature_schemas section.
- outcome_column: If the dataset has a column containing the ground truth, uncomment this field and specify the feature name, which in this case is outcome.
- prediction_values: Change the value fields to the values returned in the prediction, and the name fields to suitable human-readable labels for the values. (In this use case, 1 indicates the loan is granted and 2 indicates it was denied.)
- prediction_description: The guiding question for the models in the use case, that is, what the models are predicting.
- predict_endpoint: Change to the URL for the model. In this case, verify that it is http://127.0.0.1:8551/predict.

For the German Credit example, the fields in the definition should look similar to:

models:
- ...
  predict_endpoint: http://localhost:8551/predict
dataset_schema:
  outcome_column: outcome
  feature_schemas:
  - ...
evaluation:
  ...
  fairness_grouping_features:
  - name: 'status'
  ...
  prediction_description: Is a loan granted?
  prediction_values:
  - favorable: true
    name: Loan granted
    value: 1
  - favorable: false
    name: Loan denied
    value: 2

You can see a full version of the definition in the toolkit folder in examples/definitions/german_credit_scanner_definition.yaml. This example has multiple models and multiple fairness grouping features.

Validate the scan definition

Validate that the fields in your scan definition match the expected schema:

certifai definition-validate -f scan_definition.yaml

If the definition validates successfully, the output is:

Successfully validated scan definition

If the definition fails validation, the errors are described.

For example, the following is displayed if you do not change the default generated fairness_grouping_features to match the features in your dataset:

Validation failed, Validation errors found, Evaluation validation failed:
Unknown feature 'feature 1' specified in fairness_grouping_features,
Unknown feature 'feature 2' specified in fairness_grouping_features
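
As a rough illustration of what this validation error means, the sketch below compares the names under fairness_grouping_features with the header row of a dataset. It is not part of the Certifai CLI; the helper name is hypothetical, and the inline CSV is a small stand-in for your evaluation dataset.

```python
import csv
import io


def unknown_fairness_features(fairness_grouping_features, header):
    """Return grouping feature names that are not dataset columns.

    Hypothetical helper: 'fairness_grouping_features' is a list of
    {'name': ...} entries, as in the scan definition.
    """
    known = set(header)
    return [f["name"] for f in fairness_grouping_features
            if f["name"] not in known]


# Stand-in for the header of the evaluation dataset (assumed columns).
sample_csv = io.StringIO("status,duration,amount,outcome\n1,6,1169,1\n")
header = next(csv.reader(sample_csv))

generated = [{"name": "feature 1"}, {"name": "feature 2"}]  # unedited defaults
edited = [{"name": "status"}]                               # after editing

print(unknown_fairness_features(generated, header))
print(unknown_fairness_features(edited, header))
```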

Test connecting to the model endpoint

Test that your definition works with your model.

certifai definition-test -f scan_definition.yaml

For example, if the model server is not running or the URL is incorrect, the test results in connection errors similar to:

...
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

Prediction test failed for 'ml_model' model against dataset 'eval': HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

If you get this error, check the endpoint in the scan definition and check that the model is running, then run the test again.
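
If it is unclear whether the problem is the URL or the model server, a quick TCP check can help narrow it down. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and it only checks that something is listening on the host and port, not that the model answers predictions correctly.

```python
import socket
from urllib.parse import urlparse


def endpoint_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    # Fall back to the scheme's default port when none is given.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


# The endpoint used in the German Credit example.
url = "http://127.0.0.1:8551/predict"
print(url, "is reachable" if endpoint_reachable(url) else "is not reachable")
```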

If you get the following error:

ValueError: Model failed to evaluate batch (20).
Prediction test failed for 'ml_model' model against dataset 'eval': Model failed to evaluate batch (20).

check that you have updated the outcome_column as described above. The above error is returned by the reference model when too many fields are passed in the predict call.

When the test is successful, the output is similar to the following:

Prediction test successful for 'ml_model' model against 'eval' dataset

Run your scan

Alert: Windows 10 Users

Before you run a scan, you must disable QuickEdit Mode in your terminal window. Right-click in the terminal window and uncheck the QuickEdit Mode option.

If you do not disable this option, clicking away from your terminal window and then back into it causes the window to freeze, and you will not receive scan result updates.

You are now ready to run your scan:

certifai scan -f scan_definition.yaml -o <directory to place reports in>

NOTE: -o may be omitted. If it is, the location for storing scan results is chosen in the following order:

- If SCAN_RESULTS_DIRECTORY is defined as a locally set environment variable, the results are sent to that directory.
- If SCAN_RESULTS_DIRECTORY is not set, the results are sent to the output path, if one is specified in the scan definition.
- If neither SCAN_RESULTS_DIRECTORY nor an output path in the scan definition is set, the reports are sent to the default location, ./reports.
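
The precedence described above can be summarized in a short sketch. This only models the documented order and is not Certifai's code; the function and parameter names are hypothetical.

```python
import os


def resolve_output_dir(cli_output=None, definition_output=None, environ=None):
    """Pick the report directory using the documented order of precedence.

    cli_output: value passed with -o, if any.
    definition_output: output path from the scan definition, if any.
    environ: mapping of environment variables (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    if cli_output:
        return cli_output
    env_dir = environ.get("SCAN_RESULTS_DIRECTORY")
    if env_dir:
        return env_dir
    if definition_output:
        return definition_output
    return "./reports"


# Example: no -o, no environment variable, no path in the definition.
print(resolve_output_dir(environ={}))
# Example: the environment variable wins over the scan definition.
print(resolve_output_dir(definition_output="def_reports",
                         environ={"SCAN_RESULTS_DIRECTORY": "env_reports"}))
```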

View the scan results visualizations

View your result visualization in your local Console.
