AWS: Run remote scans
Follow the steps below to run a scan job in Certifai Pro on AWS.
Prerequisites
- You have downloaded the certifai-kubeconfig.json file.
- You have imported the configuration into your Certifai toolkit.
- You have created a folder named certifai_assets on your local drive, where you store scan definition files and datasets for easy access.
- The following datasets were downloaded with the Toolkit and are stored in the certifai_toolkit/examples/datasets folder on your local drive:
  - german_credit_explan.csv
  - german_credit_eval.csv
- The following scan definition file was downloaded with the Toolkit and is stored in the certifai_toolkit/examples/definitions folder:
  - german_credit_scanner_definition.yaml
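The file-related prerequisites above can be checked before you start. The following is a minimal sketch, not part of the Certifai toolkit: the function name missing_prerequisites is hypothetical, and it assumes that certifai_toolkit and certifai_assets are ordinary folders whose locations you pass in.

```python
from pathlib import Path
from typing import List

# Files the prerequisites say ship with the toolkit (paths relative to certifai_toolkit).
REQUIRED_TOOLKIT_FILES = [
    "examples/datasets/german_credit_explan.csv",
    "examples/datasets/german_credit_eval.csv",
    "examples/definitions/german_credit_scanner_definition.yaml",
]


def missing_prerequisites(toolkit_dir: Path, assets_dir: Path) -> List[str]:
    """Return the paths that are expected by the prerequisites but not found."""
    missing = [
        str(toolkit_dir / rel)
        for rel in REQUIRED_TOOLKIT_FILES
        if not (toolkit_dir / rel).is_file()
    ]
    # The certifai_assets folder must exist as a directory.
    if not assets_dir.is_dir():
        missing.append(str(assets_dir))
    return missing


# Example call; adjust both paths to where the folders live on your drive.
for path in missing_prerequisites(Path("certifai_toolkit"), Path("certifai_assets")):
    print("missing:", path)
```

An empty result means the files and folder named in the prerequisites are in place. The kubeconfig import is not covered by this check.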
Define scan config files and move to blob storage
Copy and paste the german_credit_scanner_definition.yaml file into a text editor window where you can make changes. Save this file to a folder named definitions that you must create inside your certifai_assets folder.

model_use_case:
  atx_performance_metric_name: Accuracy
  author: info@cognitivescale.com
  description: 'In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes. This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit. The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.'
  model_use_case_id: c12e/datasciencelab/german_credit
  name: 'Banking: Loan Approval'
  performance_metrics:
  - metric: Accuracy
    name: Accuracy
  task_type: binary-classification
evaluation:
  description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
  evaluation_dataset_id: eval
  evaluation_types:
  - fairness
  explanation_dataset_id: explan
  test_dataset_id: eval
  fairness_grouping_features:
  - name: age
  - name: status
  - name: foreign
  feature_restrictions:
  - feature_name: age
    restriction_string: no changes
  - feature_name: status
    restriction_string: no changes
  - feature_name: foreign
    restriction_string: no changes
  name: Baseline evaluation of 4 models
  prediction_description: Will a loan be granted?
  prediction_values:
  - favorable: true
    name: Loan Granted
    value: 1
  - favorable: false
    name: Loan Denied
    value: 2
models:
- author: ''
  description: Scikit-learn LogisticRegression classifier using lbfgs solver
  model_id: svm
  name: Logistic Regression
  predict_endpoint: http://certifai-ref-models.certifai.svc.cluster.local:5111/german_credit_logit/predict
datasets:
- dataset_id: eval
  description: 1000 row representative sample of the full dataset
  file_type: csv
  has_header: true
  name: Evaluation dataset
  url: s3://<scan-directory-name>/datasets/german_credit_eval.csv
- dataset_id: explan
  description: ''
  file_type: csv
  has_header: true
  name: 100 row explanation dataset
  url: s3://<scan-directory-name>/datasets/german_credit_explan.csv
dataset_schema:
  feature_schemas:
  - feature_name: age
  - feature_name: status
  - feature_name: foreign
  outcome_column: outcome

Edit the following fields in the text editor window:
datasets: url:
(NOTE: This field appears twice in the file, and both instances must be modified. Change <scan-directory-name> in the example URLs below to match the Scan Directory Name that was created during Console configuration.)
url: s3://<scan-directory-name>/datasets/german_credit_explan.csv
and
url: s3://<scan-directory-name>/datasets/german_credit_eval.csv
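If you would rather script this edit than make it by hand, the substitution can be sketched as below. This is an illustration under assumptions: the helper names and the bucket name my-scan-bucket are hypothetical, and the sketch uses plain text replacement rather than a YAML parser so that the rest of the file is left untouched.

```python
from pathlib import Path

PLACEHOLDER = "<scan-directory-name>"


def set_scan_directory(definition_text: str, scan_directory: str) -> str:
    """Return the definition text with every placeholder replaced."""
    if PLACEHOLDER not in definition_text:
        raise ValueError("placeholder not found; the file may already be edited")
    return definition_text.replace(PLACEHOLDER, scan_directory)


def update_definition_file(path: Path, scan_directory: str) -> None:
    """Rewrite the definition file in place with the real Scan Directory Name."""
    path.write_text(set_scan_directory(path.read_text(), scan_directory))


# Demonstration on the two url lines from the definition file.
sample = (
    "url: s3://<scan-directory-name>/datasets/german_credit_eval.csv\n"
    "url: s3://<scan-directory-name>/datasets/german_credit_explan.csv\n"
)
updated = set_scan_directory(sample, "my-scan-bucket")
print(updated)
```

To apply it to the real file, call update_definition_file with the path to your copy in certifai_assets/definitions and your own Scan Directory Name.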
Save the edited job definition file in the certifai_assets/definitions folder.
Copy the following datasets from certifai_toolkit/examples/datasets to a folder named datasets that you must create inside your certifai_assets folder (created as a prerequisite).
Move the datasets to your S3 storage bucket (Scan Directory):
- dataset: german_credit_explan.csv, which was included with the toolkit (certifai_toolkit/examples/datasets)
- dataset: german_credit_eval.csv, which was included with the toolkit (certifai_toolkit/examples/datasets)
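This page does not prescribe a tool for moving the datasets into the bucket. One option is the AWS CLI, and the sketch below builds the corresponding aws s3 cp commands. It assumes the AWS CLI is installed and configured with access to your bucket. The function name upload_commands and the bucket name my-scan-bucket are hypothetical, and the destination paths mirror the url fields in the scan definition.

```python
from typing import List

DATASETS = ["german_credit_explan.csv", "german_credit_eval.csv"]


def upload_commands(assets_dir: str, scan_directory: str) -> List[List[str]]:
    """Build one 'aws s3 cp' command per dataset, matching the definition's url fields."""
    commands = []
    for name in DATASETS:
        local = f"{assets_dir}/datasets/{name}"
        remote = f"s3://{scan_directory}/datasets/{name}"
        commands.append(["aws", "s3", "cp", local, remote])
    return commands


commands = upload_commands("certifai_assets", "my-scan-bucket")
for cmd in commands:
    # Print only; pass each list to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(cmd))
```

Uploading through the S3 web console works equally well, provided the files end up at the paths named in the url fields.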
Run the remote scan job
- In a new terminal or PowerShell window, run the following command to start your scan job:
certifai remote scan -m svm -o s3://<scan-directory-name> -f s3://<scan-directory-name>/certifai_assets/definitions/german_credit_scanner_definition_reduced_aws_fulldataset.yaml
Info
Please be patient. It may take up to 120 minutes for a job to run through to completion depending upon your connection speed.
Optionally, you can manage the remote job through the CLI.
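When the scan is started from a script, the command shown above can be assembled from its parts. This is a sketch only: the function name remote_scan_command and the bucket name my-scan-bucket are hypothetical, and the -m, -o and -f arguments are copied from the command on this page rather than taken from a CLI reference.

```python
from typing import List


def remote_scan_command(scan_directory: str, definition_key: str,
                        model_id: str = "svm") -> List[str]:
    """Assemble the 'certifai remote scan' arguments used on this page."""
    bucket_url = f"s3://{scan_directory}"
    return [
        "certifai", "remote", "scan",
        "-m", model_id,                          # model_id from the scan definition
        "-o", bucket_url,                        # Scan Directory used for output
        "-f", f"{bucket_url}/{definition_key}",  # scan definition stored in the bucket
    ]


cmd = remote_scan_command(
    "my-scan-bucket",
    "certifai_assets/definitions/german_credit_scanner_definition_reduced_aws_fulldataset.yaml",
)
# Print only; use subprocess.run(cmd, check=True) to start the job.
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems if a bucket or file name contains unusual characters.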
- Verify reports have been added to the Use Case in the remote Console.
  - a. In a browser window (Chrome is recommended), enter https://<Public IP address of your Certifai VM>. (A warning message may be displayed telling you that the connection is not private. Click the link that exposes the Advanced settings, then click the link at the bottom that says "Proceed to <IP address>".)
  - b. Log in using the password that was created during Console configuration. (NOTE: Do NOT change the user name from certifai.)
  - c. In the row of the Use Case (Banking: Loan Approval), click the menu icon on the far right and select SCAN DETAILS.
  - d. A scan with the name and date of this process is listed when the scan report is complete.
  - e. Click VIEW to see the report visualizations.