Version: 1.3.14

Define and run scans locally

Prerequisites

  1. Download the Certifai toolkit
  2. Install the Certifai Toolkit
  3. You must have the following prepared to define your scan:
  • Model endpoint(s)
    • (Optional) Performance metric(s)
  • A dataset URL
    • For Fairness reports: one or more dataset features you wish to measure
    • For Evaluations: an additional dataset URL (a subset of the original dataset)

Tip: You may want to use the sample use case components provided in the Certifai Toolkit to run sample scans to familiarize yourself with Certifai.

Scan your own model

The steps in this section show how to create a scan from scratch, using one of the models for the German Credit example in the models folder of the cortex-certifai-examples repository.

Generate a scan definition

  1. In a terminal separate from the one where your model is running, activate the conda environment you created for Certifai.

    conda activate certifai
  2. Create a new scan definition.

    NOTE: If your dataset is read from a local folder, make sure that you cd into that folder before you run this command so that relative file: paths resolve correctly.

    certifai definition-create
  3. Enter information as prompted.

    The example below creates a scan definition for a trust scan using a German Credit dataset provided in the toolkit:

    ? What type of scan definition would you like to generate? trust scan
    ? Model use case ID test
    ? Learning Task Type binary-classification
    ? Dataset file type csv
    ? Evaluation Dataset URL (use 'file:' prefix for local system files) file:examples/datasets/german_credit_eval.csv
    ? CSV dataset headers are present Yes
    ? Infer one hot encoded columns based on feature names (select None to not infer one-hot encoded columns) None
    ? Infer the possible values taken by categorical features from the data (only recommended when generating from a large representative dataset) No
    Scan definition created at: scan_definition.yaml
    The following integer-valued features were inferred to be categoricals:
    installment
    residence
    cards
    liable
    outcome
    The following integer-valued features were inferred to be numeric:
    duration
    amount

    The command generates the scan_definition.yaml file from the information you enter and from an analysis of the schema of the dataset you provide. The output also includes a summary of the inferences made about the features in your dataset.

    The prompts ask for:

    • The type of scan definition you want to generate (trust scan)
    • Information about your use case
    • The file path to an evaluation dataset to use for the scan
    • The method for inferring one-hot encoded columns within your dataset
    • Whether to include assertions about the possible values categorical features may take in the scan, based on those present in the data used

    The trust scan type adds robustness, fairness, explainability, and performance evaluations to your definition, whereas the explanation type adds only an explanation evaluation.

    The penultimate prompt asks how to infer one-hot encoded columns within your dataset.

    The possible responses are:

    • None: use if your dataset does not include one-hot encoded columns, or if the column names don't match either the feature_value (pandas) or feature.value naming convention. After your definition is created, you should manually edit the scan definition file to add any one-hot encoding information.

    • feature_value (pandas): use if your dataset includes one-hot encoded columns that match the feature_value naming convention: multiple columns containing only 0/1 values whose names share a common prefix ending with an underscore (_). In this case the feature name is inferred to be the common prefix before the terminating underscore, and the value is the remaining string (see the pandas example after this list).

    • feature.value: use if your dataset includes one-hot encoded columns that match the feature.value naming convention. Columns are encoded as for feature_value (pandas), but with a dot (.) rather than an underscore (_) as the separating character.
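
    For illustration, the sketch below shows how these naming conventions typically arise when a dataset is one-hot encoded with pandas. The purpose column is a hypothetical example, not a feature taken from the toolkit datasets:

        # Illustrative sketch: how pandas get_dummies yields column names matching
        # the "feature_value (pandas)" and "feature.value" conventions.
        import pandas as pd

        df = pd.DataFrame({"purpose": ["car", "education", "car"]})

        # Default separator "_" -> columns purpose_car, purpose_education with 0/1 values
        print(pd.get_dummies(df, columns=["purpose"], dtype=int).columns.tolist())

        # Separator "." -> columns purpose.car, purpose.education
        print(pd.get_dummies(df, columns=["purpose"], prefix_sep=".", dtype=int).columns.tolist())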

    The last prompt asks whether to infer the set of values that categorical features may take from the provided data. This is only recommended if you are generating the definition from a large and fully representative dataset.
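
    If you are unsure whether your data is representative enough to answer Yes, a quick check of the distinct values in each categorical column can help. The sketch below assumes the toolkit's German Credit evaluation dataset and its status column:

        # Sketch only: list the distinct values a categorical column takes in the
        # evaluation dataset before letting Certifai infer its full set of values.
        import pandas as pd

        df = pd.read_csv("examples/datasets/german_credit_eval.csv")
        print(sorted(df["status"].unique()))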

  4. Edit the scan definition file in a text editor to add information about the use case. See the comments in the generated definition for guidance on how to update the fields.

    For the German Credit example, the minimum changes you need to make to run a scan in this use case are:

  • fairness_grouping_features: Change the list of feature names to match the actual features you want to analyze for fairness. In this case, replace the list with status. The feature names are listed in the generated feature_schemas section.

  • outcome_column: If the dataset has a column containing the ground truth, uncomment this field and specify the feature name, which in this case is outcome.

  • prediction_values: Change the value fields to the values returned in the prediction, and the name fields to be suitable human-readable labels for the values. (In this use case, 1 indicates the loan is granted and 2 indicates it was denied.)

  • prediction_description: A human-readable description of what the models are predicting; the guiding question of the use case.

  • predict_endpoint: Change to the URL for the model; in this case, verify that it is http://127.0.0.1:8551/predict.

    For the German Credit example, the fields in the definition should look similar to:

    models:
      - ...
        predict_endpoint: http://localhost:8551/predict
    dataset_schema:
      outcome_column: outcome
      feature_schemas:
        - ...
    evaluation:
      ...
      fairness_grouping_features:
        - name: 'status'
      ...
      prediction_description: Is a loan granted?
      prediction_values:
        - favorable: true
          name: Loan granted
          value: 1
        - favorable: false
          name: Loan denied
          value: 2

You can see a full version of the definition in the toolkit folder at examples/definitions/german_credit_scanner_definition.yaml. This example has multiple models and multiple fairness grouping features.

Validate the scan definition

Validate that the fields in your scan definition match the expected schema:

certifai definition-validate -f scan_definition.yaml

If the definition validates successfully, the output is:

Successfully validated scan definition

If the definition fails validation, the errors are described.

For example, the following is displayed if you do not change the default generated fairness_grouping_features to match the features in your dataset:

Validation failed, Validation errors found, Evaluation validation failed:
Unknown feature 'feature 1' specified in fairness_grouping_features,
Unknown feature 'feature 2' specified in fairness_grouping_features

Test connecting to the model endpoint

Test that your definition works with your model.

certifai definition-test -f scan_definition.yaml

For example, if the model server is not running or the URL is incorrect, the test results in connection errors similar to:

...
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))
Prediction test failed for 'ml_model' model against dataset 'eval': HTTPConnectionPool(host='localhost', port=8551): Max retries exceeded with url: /predict (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fce99421fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

If you get this error, check the endpoint in the scan definition and check that the model is running, then run the test again.
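
Before re-running the test, you can also confirm that something is listening on the endpoint's host and port. The snippet below is only a connectivity check; it assumes the example endpoint http://127.0.0.1:8551/predict and does not exercise the model's request format:

    # Connectivity check only: verifies that a server is accepting connections on the
    # host/port used by predict_endpoint before re-running `certifai definition-test`.
    import socket

    host, port = "127.0.0.1", 8551
    try:
        with socket.create_connection((host, port), timeout=3):
            print(f"A server is listening on {host}:{port}")
    except OSError as err:
        print(f"Could not connect to {host}:{port}: {err}")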

If you get the following error:

ValueError: Model failed to evaluate batch (20).
Prediction test failed for 'ml_model' model against dataset 'eval': Model failed to evaluate batch (20).

check that you have updated the outcome_column as described above. The above error is returned by the reference model when too many fields are passed in the predict call.

When the test is successful, the output is similar to the following:

Prediction test successful for 'ml_model' model against 'eval' dataset

Run your scan

You are now ready to run your scan:

certifai scan -f scan_definition.yaml -o <directory to place reports in>

NOTE: -o may be omitted. If it is, the following locations are used for storing scan results, in the order of precedence shown (the precedence is also sketched in code after this list):

  • If -o is not specified, the results are written to SCAN_RESULTS_DIRECTORY, if that environment variable is set locally.
  • If neither the -o option nor SCAN_RESULTS_DIRECTORY has been configured, the results are written to the output path, if one is specified in the scan definition.
  • If neither the -o option nor SCAN_RESULTS_DIRECTORY has been configured and no output path is defined in the scan definition, the reports are written to the default location, ./reports.
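
The following sketch only illustrates the precedence described above; it is not the CLI's actual implementation, and the helper name resolve_output_dir is hypothetical:

    # Illustrative sketch of the output-location precedence described above
    # (not the actual CLI code): -o flag, then SCAN_RESULTS_DIRECTORY,
    # then the definition's output path, then ./reports.
    import os

    def resolve_output_dir(cli_output=None, definition_output=None):
        return (cli_output
                or os.environ.get("SCAN_RESULTS_DIRECTORY")
                or definition_output
                or "./reports")

    print(resolve_output_dir())  # "./reports" unless SCAN_RESULTS_DIRECTORY is set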

View the scan results visualizations

View your scan results visualizations in your local Console.