Scan Verification
Follow the steps below to run a verification scan job using the Certifai CLI on the Enterprise edition of Cortex Certifai.
Prerequisites
- A Kubernetes cluster with the Cortex Certifai Operator installed. (See OpenShift Setup)
- The Certifai Reference Models are enabled in your Kubernetes cluster. (They can be disabled at any time to save resources.)
- You have the following information, provided by your administrator and based on the operator instance installation:
  - Console URL and login credentials (These can be found in the RHOS platform: in the left navigation panel, click Network -> Routes.)
  - Project name
  - Login token
- Google Cloud Storage, Azure Blob storage, or S3-compatible storage where Cortex Certifai scan results will be stored
- The Certifai toolkit has been downloaded and installed locally
- The following datasets have been downloaded with the Toolkit and are stored in the certifai_toolkit/examples/datasets folder on your local drive:
    german_credit_explan.csv
    german_credit_eval.csv
    german_credit_test.csv
- You have imported the configuration into your Certifai toolkit.
Upload the dataset files to remote storage
To be used in a remote scan, datasets must be stored in a remote location that is accessible with the storage credentials configured for your installation.
Locate the following datasets that were included in the Toolkit download, under certifai_toolkit/examples/datasets:
    german_credit_explan.csv
    german_credit_eval.csv
    german_credit_test.csv
Upload the datasets to a remote storage location accessible from your cluster, such as the same bucket/blob that the Certifai Console is configured to read reports from.
For example, if the Certifai Console in your cluster was configured with a scan-dir (Scan Directory) of s3://certifai-test01/reports, you can create a datasets folder within the certifai-test01 bucket and upload the datasets there, e.g. s3://certifai-test01/datasets.
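The naming convention in the example above can be sketched in a few lines of Python. This is only an illustration: the helper name `dataset_urls` and the default folder name are assumptions made here, not part of the Certifai toolkit, and the sketch assumes a `scheme://bucket/path` style scan directory such as the S3 example. It computes the target URLs only; the upload itself is done with your storage provider's own tools.

```python
from typing import List
from urllib.parse import urlsplit

# Dataset files shipped under certifai_toolkit/examples/datasets.
DATASET_FILES = [
    "german_credit_explan.csv",
    "german_credit_eval.csv",
    "german_credit_test.csv",
]


def dataset_urls(scan_dir: str, folder: str = "datasets") -> List[str]:
    """Return one URL per dataset file, in `folder` at the root of the scan-dir bucket.

    Hypothetical helper: assumes scan_dir looks like s3://bucket/reports.
    """
    parts = urlsplit(scan_dir)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"expected a URL such as s3://bucket/reports, got {scan_dir!r}")
    base = f"{parts.scheme}://{parts.netloc}/{folder}"
    return [f"{base}/{name}" for name in DATASET_FILES]


urls = dataset_urls("s3://certifai-test01/reports")
for url in urls:
    print(url)
```

The resulting URLs are the values that later go into the url fields of the scan definition.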
Create the scan definition file
- Copy and paste the following YAML into a text editor and save the file as german_credit_remote_scanner_definition.yaml on your local drive.
    model_use_case:
      atx_performance_metric_name: Accuracy
      author: info@cognitivescale.com
      description: 'In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.

        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

        The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      performance_metrics:
      - metric: Accuracy
        name: Accuracy
      - metric: Recall
        name: Recall
      - metric: Precision
        name: Precision
      task_type: binary-classification
    evaluation:
      description: This evaluation evaluates the robustness, accuracy, fairness and explanations for a single candidate models.
      evaluation_dataset_id: eval
      evaluation_types:
      - robustness
      - fairness
      - explanation
      - explainability
      - performance
      explanation_dataset_id: explan
      test_dataset_id: test
      fairness_grouping_features:
      - name: age
      - name: status
      feature_restrictions:
      - feature_name: age
        restriction_string: no changes
      - feature_name: status
        restriction_string: no changes
      name: Baseline evaluation
      prediction_description: Will a loan be granted?
      prediction_favorability: explicit
      prediction_values:
      - favorable: true
        name: Loan Granted
        value: 1
      - favorable: false
        name: Loan Denied
        value: 2
    models:
    - author: ''
      description: Scikit-learn DecisionTreeClassifier using entropy criterion
      model_id: dtree
      name: Decision Tree
      predict_endpoint: http://certifai-ref-models.<FILL_ME>.svc.cluster.local:5111/german_credit_dtree/predict
    datasets:
    - dataset_id: eval
      description: 1000 row representative sample of the full dataset
      file_type: csv
      has_header: true
      name: Evaluation dataset
      url: <FILL_ME>/german_credit_eval.csv
    - dataset_id: explan
      description: ''
      file_type: csv
      has_header: true
      name: 100 row explanation dataset
      url: <FILL_ME>/german_credit_explan.csv
    - dataset_id: test
      description: 301 row test dataset
      file_type: csv
      has_header: true
      name: Test dataset
      url: <FILL_ME>/german_credit_test.csv
    dataset_schema:
      feature_schemas:
      - feature_name: age
      - feature_name: status
      - feature_name: foreign
      outcome_column: outcome
- Edit the scan definition in your text editor as described below:
  - The url for each entry under the datasets section needs to be updated to refer to the remote storage used earlier. (NOTE: There are 3 instances of this that must be modified within the file.) Replace each <FILL_ME> with your remote storage path. For example, if you uploaded your datasets to s3://certifai-test01/datasets/, the urls should be:
        url: s3://certifai-test01/datasets/german_credit_eval.csv
        url: s3://certifai-test01/datasets/german_credit_explan.csv
        url: s3://certifai-test01/datasets/german_credit_test.csv
  - The predict_endpoint under the models section needs to be updated to refer to the certifai-ref-models service running in your cluster. Replace the <FILL_ME> text with your cluster namespace. For example, if your cluster namespace is certifai-test01, then the predict_endpoint should be:
        predict_endpoint: http://certifai-ref-models.certifai-test01.svc.cluster.local:5111/german_credit_dtree/predict
    NOTE: The certifai-reference-models need to be enabled within your cluster to run this scan.
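If you would rather not edit the four placeholders by hand, the substitution can be scripted. The following is a minimal sketch, not a Certifai feature: the function name `fill_scan_definition` is hypothetical, and it relies only on the placeholder layout shown in the scan definition above (one `<FILL_ME>` inside the certifai-ref-models host name and one after each `url:` key). The sample text is an abbreviated excerpt of the definition, used here only to demonstrate the replacement.

```python
import re


def fill_scan_definition(text: str, dataset_base: str, namespace: str) -> str:
    """Replace the <FILL_ME> placeholders in a scan definition (hypothetical helper)."""
    base = dataset_base.rstrip("/")
    # Namespace placeholder: the one inside the certifai-ref-models host name.
    text = text.replace(
        "certifai-ref-models.<FILL_ME>.svc.cluster.local",
        f"certifai-ref-models.{namespace}.svc.cluster.local",
    )
    # Storage placeholders: the ones that directly follow a "url:" key.
    text = re.sub(r"(url:\s*)<FILL_ME>/", lambda m: f"{m.group(1)}{base}/", text)
    # Fail loudly if anything was missed rather than submitting a broken definition.
    if "<FILL_ME>" in text:
        raise ValueError("an unfilled <FILL_ME> placeholder remains")
    return text


# Abbreviated excerpt of the scan definition, for demonstration only.
sample = """\
models:
- model_id: dtree
  predict_endpoint: http://certifai-ref-models.<FILL_ME>.svc.cluster.local:5111/german_credit_dtree/predict
datasets:
- dataset_id: eval
  url: <FILL_ME>/german_credit_eval.csv
- dataset_id: explan
  url: <FILL_ME>/german_credit_explan.csv
- dataset_id: test
  url: <FILL_ME>/german_credit_test.csv
"""

filled = fill_scan_definition(sample, "s3://certifai-test01/datasets/", "certifai-test01")
print(filled)
```

In practice you would read german_credit_remote_scanner_definition.yaml from disk, pass its contents through the function, and write the result back. Working on the raw text rather than parsing the YAML keeps the sketch free of third-party dependencies and leaves the rest of the file byte-for-byte unchanged.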
Run the remote scan job
In your terminal or PowerShell window, run the following command to start your scan:
    certifai remote scan -f german_credit_remote_scanner_definition.yaml
NOTE: Please be patient. It may take up to 120 minutes for a job to run through to completion, depending on your cluster resources and configuration.
Optionally, you can manage the remote job through the CLI.
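If the scan is started from a script rather than typed interactively, the same command can be launched from Python. This is a sketch under stated assumptions: the helper names `build_scan_command` and `run_scan` are hypothetical, and the only CLI invocation used is the `certifai remote scan -f` command shown above. The example only builds and prints the command; calling `run_scan` requires the certifai CLI to be installed and on your PATH.

```python
import shutil
import subprocess
from typing import List


def build_scan_command(definition_path: str) -> List[str]:
    """Build the argument list for the remote scan command shown above."""
    return ["certifai", "remote", "scan", "-f", definition_path]


def run_scan(definition_path: str) -> int:
    """Run the scan command and return its exit code (hypothetical helper)."""
    if shutil.which("certifai") is None:
        raise FileNotFoundError("the certifai CLI was not found on PATH")
    # check=False: return the exit code and let the caller decide how to react.
    return subprocess.run(build_scan_command(definition_path), check=False).returncode


command = build_scan_command("german_credit_remote_scanner_definition.yaml")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems if the definition file path contains spaces.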
Verify reports have been added to the use case in the remote Console.
- Navigate to the remote Certifai Console. You may have to log in, depending on your cluster's configuration.
- Refresh the remote Certifai Console and verify the new scan results are displayed.
- In the row of the Use Case (Banking: Loan Approval), click the menu icon on the far right and select SCAN DETAILS.
- A scan with the name and date of this process is listed when the scan report is complete.
- Click VIEW to see the report visualizations.