Version: 1.3.14

Scan Verification

Follow the steps below to run a verification scan job using the Certifai CLI on the Enterprise edition of Cortex Certifai.

Prerequisites

A Kubernetes cluster with the Cortex Certifai Operator installed. (See OpenShift Setup)
The Certifai Reference Models are enabled in your Kubernetes cluster. (They can be disabled at any time to save resources.)
You have the following information, provided by your administrator and based on the operator instance installation:
- Console URL and login credentials (The URL can be found in the RHOS platform: in the left navigation panel, click Network -> Routes.)
- Project name
- Login token
- The Google Cloud Storage, Azure Blob Storage, or S3-compatible storage location where Cortex Certifai scan results will be stored
The Certifai toolkit has been downloaded and installed locally
- The following datasets have been downloaded with the Toolkit and are stored in the certifai_toolkit/examples/datasets folder in your local drive.
  - german_credit_explan.csv
  - german_credit_eval.csv
  - german_credit_test.csv
You have imported the configuration into your Certifai toolkit.

Upload the dataset files to remote storage

To be used in a remote scan, datasets must be located in remote storage that is accessible with the storage credentials configured for your installation.

Locate the following datasets that were included in the Toolkit download, under certifai_toolkit/examples/datasets:
- german_credit_explan.csv
- german_credit_eval.csv
- german_credit_test.csv
Upload the datasets to a remote storage location accessible from your cluster, such as the same bucket/blob that the Certifai Console is configured to read reports from.
For example, if the Certifai Console in your cluster was configured with a scan-dir (Scan Directory) of: s3://certifai-test01/reports, you can create a datasets folder within the certifai-test01 bucket and upload the datasets there, e.g. s3://certifai-test01/datasets.
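The resulting URL layout can be sketched in Python. This is a minimal illustration only: `dataset_urls` is a hypothetical helper, not part of the Certifai toolkit, and the base location is the example value from the text above. It shows which URL each uploaded file should end up with; the upload itself is done with whatever tool your storage provider offers.
```python
# Hypothetical helper (not part of the Certifai toolkit): derive the remote
# URL of each example dataset from the location the files were uploaded to.
DATASET_FILES = [
    "german_credit_eval.csv",
    "german_credit_explan.csv",
    "german_credit_test.csv",
]


def dataset_urls(base_location):
    """Return {filename: url} for the example datasets under base_location."""
    base = base_location.rstrip("/")  # tolerate a trailing slash
    return {name: f"{base}/{name}" for name in DATASET_FILES}


# Example location taken from the text above; substitute your own.
for url in dataset_urls("s3://certifai-test01/datasets").values():
    print(url)
```
These are the values that later go into the url fields of the scan definition.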

Create the scan definition file

Copy and paste the following YAML into a text editor and save the file as german_credit_remote_scanner_definition.yaml on your local drive.

model_use_case:
  atx_performance_metric_name: Accuracy
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person who
    takes a credit loan from a bank. The learning task is to classify each person
    as either a good or bad credit risk according to the set of attributes.

    This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

    The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'

  model_use_case_id: c12e/datasciencelab/german_credit
  name: 'Banking: Loan Approval'
  performance_metrics:
  - metric: Accuracy
    name: Accuracy
  - metric: Recall
    name: Recall
  - metric: Precision
    name: Precision
  task_type: binary-classification

evaluation:
  description: This evaluation evaluates the robustness, accuracy, fairness and explanations
    for a single candidate model.
  evaluation_dataset_id: eval
  evaluation_types:
  - robustness
  - fairness
  - explanation
  - explainability
  - performance
  explanation_dataset_id: explan
  test_dataset_id: test
  fairness_grouping_features:
    - name: age
    - name: status
  feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: status
    restriction_string: no changes
  name: Baseline evaluation
  prediction_description: Will a loan be granted?
  prediction_favorability: explicit
  prediction_values:
  - favorable: true
    name: Loan Granted
    value: 1
  - favorable: false
    name: Loan Denied
    value: 2

models:
- author: ''
  description: Scikit-learn DecisionTreeClassifier using entropy criterion
  model_id: dtree
  name: Decision Tree
  predict_endpoint: http://certifai-ref-models.<FILL_ME>.svc.cluster.local:5111/german_credit_dtree/predict


datasets:
- dataset_id: eval
  description: 1000 row representative sample of the full dataset
  file_type: csv
  has_header: true
  name: Evaluation dataset
  url: <FILL_ME>/german_credit_eval.csv
- dataset_id: explan
  description: ''
  file_type: csv
  has_header: true
  name: 100 row explanation dataset
  url: <FILL_ME>/german_credit_explan.csv
- dataset_id: test
  description: 301 row test dataset
  file_type: csv
  has_header: true
  name: Test dataset
  url: <FILL_ME>/german_credit_test.csv

dataset_schema:
  feature_schemas:
  - feature_name: age
  - feature_name: status
  - feature_name: foreign
  outcome_column: outcome
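
In the definition above, the evaluation section refers to datasets by ID (eval, explan, test), and each ID must match a dataset_id entry in the datasets section. If you adapt the file, a quick consistency check may help. The sketch below is a text-based check under stated assumptions: `undefined_dataset_refs` is a hypothetical helper, not a Certifai function, and it assumes each key sits on its own line as in the definition above rather than parsing YAML in full.
```python
import re

# The three keys under `evaluation:` that each name a dataset_id.
REFERENCE_KEYS = (
    "evaluation_dataset_id",
    "explanation_dataset_id",
    "test_dataset_id",
)


def undefined_dataset_refs(definition_text):
    """Return dataset IDs that are referenced but never defined."""
    # IDs declared in the `datasets:` list, e.g. "- dataset_id: eval".
    defined = set(
        re.findall(r"^[ \t]*-?[ \t]*dataset_id:[ \t]*(\S+)", definition_text, re.M)
    )
    missing = []
    for key in REFERENCE_KEYS:
        match = re.search(rf"^[ \t]*{key}:[ \t]*(\S+)", definition_text, re.M)
        if match and match.group(1) not in defined:
            missing.append(match.group(1))
    return missing
```
An empty list means every referenced ID has a matching entry under datasets.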

Edit the scan definition in your text editor as described below:

The url for each entry under the datasets section needs to be updated to refer to the remote storage used earlier. (NOTE: There are 3 instances of this that must be modified within the file.) Replace each <FILL_ME> with your remote storage path. For example, if you uploaded your datasets to s3://certifai-test01/datasets/, the URLs should be:
```
url: s3://certifai-test01/datasets/german_credit_eval.csv
```
```
url: s3://certifai-test01/datasets/german_credit_explan.csv
```
```
url: s3://certifai-test01/datasets/german_credit_test.csv
```
The predict_endpoint under the models section needs to be updated to refer to the certifai-ref-models service running in your cluster. Replace the <FILL_ME> text with your cluster namespace. For example, if your cluster namespace is certifai-test01, then the predict_endpoint should be:
```
predict_endpoint: http://certifai-ref-models.certifai-test01.svc.cluster.local:5111/german_credit_dtree/predict
```
NOTE: The certifai-reference-models need to be enabled within your cluster to run this scan.
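The two kinds of replacement described above can also be scripted. The following is a minimal sketch, assuming the placeholders appear only on url and predict_endpoint lines as in the definition above; `fill_scan_definition` is a hypothetical helper, not part of the Certifai toolkit.
```python
PLACEHOLDER = "<FILL_ME>"


def fill_scan_definition(text, namespace, dataset_base):
    """Replace each <FILL_ME> according to the key of the line it appears on."""
    base = dataset_base.rstrip("/")  # avoid a double slash in dataset URLs
    filled = []
    for line in text.splitlines():
        key = line.strip().split(":", 1)[0]
        if key == "predict_endpoint":
            # The endpoint placeholder stands for the cluster namespace.
            line = line.replace(PLACEHOLDER, namespace)
        elif key == "url":
            # Dataset placeholders stand for the remote storage path.
            line = line.replace(PLACEHOLDER, base)
        filled.append(line)
    result = "\n".join(filled) + "\n"
    if PLACEHOLDER in result:
        raise ValueError("an unfilled <FILL_ME> placeholder remains")
    return result
```
To use it, read german_credit_remote_scanner_definition.yaml as text, pass it through the function with your namespace and storage path, and write the result back. The function raises an error rather than returning a file that still contains a placeholder.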

Run the remote scan job

In your terminal or PowerShell window, run the following command to start your scan:
```
certifai remote scan -f german_credit_remote_scanner_definition.yaml
```
NOTE: Please be patient. It may take up to 120 minutes for a job to run through to completion, depending on your cluster resources and configuration.
Optionally, you can manage the remote job through the CLI.
Verify that reports have been added to the use case in the remote Console:
- Navigate to the remote Certifai Console. You may have to log in, depending on your cluster's configuration.
- Refresh the remote Certifai Console and verify the new scan results are displayed.
- In the row of the Use Case (Banking: Loan Approval) click the menu icon on the far right and select SCAN DETAILS.
- A scan with the name and date of this process is listed when the scan report is complete.
- Click VIEW to see the report visualizations.
