Version: 1.3.14.1

Scan Verification

Follow the steps below to run a verification scan job using the Certifai CLI on the Enterprise edition of Cortex Certifai.

Prerequisites

  • A Kubernetes cluster with the Cortex Certifai Operator installed (see OpenShift Setup).
  • The Certifai Reference Models are enabled in your Kubernetes cluster. They can be disabled at any time to save resources.
  • You have the following information, provided by your administrator and based on the operator instance installation:
    • Console URL and login credentials (the URL can be found in the RHOS platform: in the left navigation panel, click Network -> Routes)
    • Project name
    • Login token
    • The Google Cloud Storage, Azure Blob storage, or S3-compatible storage location where Cortex Certifai scan results will be stored
  • The Certifai toolkit has been downloaded and installed locally.
    • The following datasets have been downloaded with the Toolkit and are stored in the certifai_toolkit/examples/datasets folder on your local drive:
      • german_credit_explan.csv
      • german_credit_eval.csv
      • german_credit_test.csv
  • You have imported the configuration into your Certifai toolkit.

Upload the dataset files to remote storage

To be used in a remote scan, datasets must be in a remote storage location that is accessible with the storage credentials configured for your installation.

  1. Locate the following datasets that were included in the Toolkit download, under certifai_toolkit/examples/datasets:

    • german_credit_explan.csv
    • german_credit_eval.csv
    • german_credit_test.csv
  2. Upload the datasets to a remote storage location accessible from your cluster, such as the same bucket/blob that the Certifai Console is configured to read reports from.

    For example, if the Certifai Console in your cluster was configured with a scan-dir (Scan Directory) of: s3://certifai-test01/reports, you can create a datasets folder within the certifai-test01 bucket and upload the datasets there, e.g. s3://certifai-test01/datasets.
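
The mapping from local dataset files to remote URLs can be sketched in Python. This is a minimal sketch, not part of the Certifai toolkit: the helper name plan_uploads is hypothetical, and the directory and prefix are the example values used above. It only computes where each file should land; the upload itself is done with whatever tool your storage provider offers.

```python
from pathlib import PurePosixPath

# Dataset file names shipped with the toolkit (from the list above).
DATASETS = [
    "german_credit_explan.csv",
    "german_credit_eval.csv",
    "german_credit_test.csv",
]


def plan_uploads(local_dir, remote_prefix):
    """Return (local_path, remote_url) pairs for the three datasets.

    Hypothetical helper for illustration: it does not upload anything.
    """
    prefix = remote_prefix.rstrip("/")  # tolerate a trailing slash
    return [
        (str(PurePosixPath(local_dir) / name), f"{prefix}/{name}")
        for name in DATASETS
    ]


# Example values taken from the text above; replace them with your own.
for local_path, remote_url in plan_uploads(
    "certifai_toolkit/examples/datasets", "s3://certifai-test01/datasets"
):
    print(f"{local_path} -> {remote_url}")
```

The remote URLs produced here are the same values that the dataset url entries in the scan definition need to point to.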

Create the scan definition file

  1. Copy and paste the following YAML into a text editor and save the file as german_credit_remote_scanner_definition.yaml on your local drive.

    model_use_case:
      atx_performance_metric_name: Accuracy
      author: info@cognitivescale.com
      description: 'In this use case, each entry in the dataset represents a person who
        takes a credit loan from a bank. The learning task is to classify each person
        as either a good or bad credit risk according to the set of attributes.
        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
        The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      performance_metrics:
      - metric: Accuracy
        name: Accuracy
      - metric: Recall
        name: Recall
      - metric: Precision
        name: Precision
      task_type: binary-classification
    evaluation:
      description: This evaluation evaluates the robustness, accuracy, fairness and explanations
        for a single candidate models.
      evaluation_dataset_id: eval
      evaluation_types:
      - robustness
      - fairness
      - explanation
      - explainability
      - performance
      explanation_dataset_id: explan
      test_dataset_id: test
      fairness_grouping_features:
      - name: age
      - name: status
      feature_restrictions:
      - feature_name: age
        restriction_string: no changes
      - feature_name: status
        restriction_string: no changes
      name: Baseline evaluation
      prediction_description: Will a loan be granted?
      prediction_favorability: explicit
      prediction_values:
      - favorable: true
        name: Loan Granted
        value: 1
      - favorable: false
        name: Loan Denied
        value: 2
    models:
    - author: ''
      description: Scikit-learn DecisionTreeClassifier using entropy criterion
      model_id: dtree
      name: Decision Tree
      predict_endpoint: http://certifai-ref-models.<FILL_ME>.svc.cluster.local:5111/german_credit_dtree/predict
    datasets:
    - dataset_id: eval
      description: 1000 row representative sample of the full dataset
      file_type: csv
      has_header: true
      name: Evaluation dataset
      url: <FILL_ME>/german_credit_eval.csv
    - dataset_id: explan
      description: ''
      file_type: csv
      has_header: true
      name: 100 row explanation dataset
      url: <FILL_ME>/german_credit_explan.csv
    - dataset_id: test
      description: 301 row test dataset
      file_type: csv
      has_header: true
      name: Test dataset
      url: <FILL_ME>/german_credit_test.csv
    dataset_schema:
      feature_schemas:
      - feature_name: age
      - feature_name: status
      - feature_name: foreign
      outcome_column: outcome

  2. Edit the scan definition in your text editor as described below:
  • The url for each entry under the datasets section must be updated to refer to the remote storage used earlier. (NOTE: There are 3 instances of this that must be modified in the file.) Replace the <FILL_ME> text in each url with your remote storage path. For example, if you uploaded your datasets to s3://certifai-test01/datasets/, the urls should be:

    url: s3://certifai-test01/datasets/german_credit_eval.csv
    url: s3://certifai-test01/datasets/german_credit_explan.csv
    url: s3://certifai-test01/datasets/german_credit_test.csv
  • The predict_endpoint under the models section needs to be updated to refer to the certifai-ref-models service running in your cluster. Replace the <FILL_ME> text with your cluster namespace. For example, if your cluster namespace is certifai-test01, then the predict_endpoint should be:

    predict_endpoint: http://certifai-ref-models.certifai-test01.svc.cluster.local:5111/german_credit_dtree/predict

    NOTE: The certifai-reference-models need to be enabled within your cluster to run this scan.
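
The two substitutions described above can also be scripted. The following is a minimal sketch in Python, not a Certifai feature: the function name fill_scan_definition is hypothetical, and the namespace and storage path are the example values from this section. It works line by line on the text, so it assumes each <FILL_ME> sits on a predict_endpoint or url line as in the YAML above.

```python
PLACEHOLDER = "<FILL_ME>"


def fill_scan_definition(text, namespace, storage_prefix):
    """Replace <FILL_ME> on predict_endpoint and url lines.

    Hypothetical helper: predict_endpoint lines receive the cluster
    namespace, dataset url lines receive the remote storage path.
    """
    prefix = storage_prefix.rstrip("/")  # the template already adds "/"
    filled = []
    for line in text.splitlines():
        key = line.strip().lstrip("- ")  # ignore indentation and list dashes
        if key.startswith("predict_endpoint:"):
            line = line.replace(PLACEHOLDER, namespace)
        elif key.startswith("url:"):
            line = line.replace(PLACEHOLDER, prefix)
        filled.append(line)
    return "\n".join(filled)


# A short excerpt of the scan definition, used here only as sample input.
sample = """\
models:
- model_id: dtree
  predict_endpoint: http://certifai-ref-models.<FILL_ME>.svc.cluster.local:5111/german_credit_dtree/predict
datasets:
- dataset_id: eval
  url: <FILL_ME>/german_credit_eval.csv
"""

result = fill_scan_definition(
    sample, "certifai-test01", "s3://certifai-test01/datasets/"
)
print(result)
```

To apply this to the real file, read german_credit_remote_scanner_definition.yaml as text, pass it through the function with your own namespace and storage path, and write the result back.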

Run the remote scan job

  1. In your terminal or PowerShell window run the following command to start your scan:

    certifai remote scan -f german_credit_remote_scanner_definition.yaml
  2. Optionally, you can manage the remote job through the CLI.

  3. Verify reports have been added to the use case in the remote Console.

    • Navigate to the remote Certifai Console. You may have to log in, depending on your cluster's configuration.
    • Refresh the remote Certifai Console and verify the new scan results are displayed.
    • In the row of the Use Case (Banking: Loan Approval) click the menu icon on the far right and select SCAN DETAILS.
    • A scan with the name and date of this process is listed when the scan report is complete.
    • Click VIEW to see the report visualizations.
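
Before starting the scan in step 1, it can help to confirm that no <FILL_ME> placeholder is left in the definition file. The sketch below is one way to do that in Python and then launch the same certifai remote scan -f command shown in step 1. The helper names (unfilled_lines, scan_command, run_scan) are hypothetical, and the sketch assumes the certifai executable is on your PATH.

```python
import subprocess

PLACEHOLDER = "<FILL_ME>"


def unfilled_lines(text):
    """Return the 1-based line numbers that still contain <FILL_ME>."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PLACEHOLDER in line
    ]


def scan_command(definition_path):
    """Build the scan command from step 1 as an argument list."""
    return ["certifai", "remote", "scan", "-f", definition_path]


def run_scan(definition_path):
    """Refuse to start the scan while placeholders remain, then run it."""
    with open(definition_path, encoding="utf-8") as handle:
        missing = unfilled_lines(handle.read())
    if missing:
        raise ValueError(f"{PLACEHOLDER} still present on lines {missing}")
    # check=True raises CalledProcessError if the CLI exits with an error.
    return subprocess.run(scan_command(definition_path), check=True)
```

Calling run_scan("german_credit_remote_scanner_definition.yaml") from the folder that contains the file runs the check and then the scan.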