Version: 1.3.14.1

GCP: Run remote scans

Follow the steps below to run a scan job in Certifai Pro on GCP.

Prerequisites

  • You have downloaded the certifai-kubeconfig.json file.
  • You have imported the configuration into your Certifai toolkit.
  • A folder named certifai_assets has been created on your local drive, where you store scan definition files and datasets for easy access.
  • The following datasets have been downloaded with the Toolkit and are stored in the certifai_toolkit/examples/datasets folder on your local drive:
    • german_credit_explan.csv
    • german_credit_eval.csv
  • The following Scan Definition file has been downloaded with the Toolkit and is stored in the certifai_toolkit/examples/definitions folder:
    • german_credit_scanner_definition.yaml
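
The file-based prerequisites above can be checked from a terminal before you start. The following is a minimal sketch, assuming the Toolkit was unpacked to certifai_toolkit and the assets folder is certifai_assets in the current directory. The TOOLKIT_DIR and ASSETS_DIR variables and the check_prereqs function are hypothetical names used only for illustration, and the sketch does not check the kubeconfig import.

```shell
#!/bin/sh
# Assumed locations (hypothetical defaults); override via environment variables.
TOOLKIT_DIR="${TOOLKIT_DIR:-certifai_toolkit}"
ASSETS_DIR="${ASSETS_DIR:-certifai_assets}"

# Print the status of each expected path and return the number of missing paths.
check_prereqs() {
  cp_toolkit="$1"
  cp_assets="$2"
  cp_missing=0
  for cp_path in \
    "$cp_assets" \
    "$cp_toolkit/examples/datasets/german_credit_explan.csv" \
    "$cp_toolkit/examples/datasets/german_credit_eval.csv" \
    "$cp_toolkit/examples/definitions/german_credit_scanner_definition.yaml"
  do
    if [ -e "$cp_path" ]; then
      echo "ok       $cp_path"
    else
      echo "MISSING  $cp_path"
      cp_missing=$((cp_missing + 1))
    fi
  done
  return "$cp_missing"
}

check_prereqs "$TOOLKIT_DIR" "$ASSETS_DIR" || echo "One or more prerequisites are missing."
```

Run it from the directory that contains both folders, or set TOOLKIT_DIR and ASSETS_DIR to their actual locations first.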

Define scan config files and move to GCS bucket

  1. Copy and paste the german_credit_scanner_definition.yaml file into a text editor window where you can make changes.

  2. Save this file to a folder named definitions that you must create inside your certifai_assets folder.

    model_use_case:
      atx_performance_metric_name: Accuracy
      author: info@cognitivescale.com
      description: 'In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.
        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit. The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      performance_metrics:
      - metric: Accuracy
        name: Accuracy
      task_type: binary-classification
    evaluation:
      description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
      evaluation_dataset_id: eval
      evaluation_types:
      - fairness
      explanation_dataset_id: explan
      test_dataset_id: eval
      fairness_grouping_features:
      - name: age
      - name: status
      - name: foreign
      feature_restrictions:
      - feature_name: age
        restriction_string: no changes
      - feature_name: status
        restriction_string: no changes
      - feature_name: foreign
        restriction_string: no changes
      name: Baseline evaluation of 4 models
      prediction_description: Will a loan be granted?
      prediction_values:
      - favorable: true
        name: Loan Granted
        value: 1
      - favorable: false
        name: Loan Denied
        value: 2
    models:
    - author: ''
      description: Scikit-learn LogisticRegression classifier using lbfgs solver
      model_id: svm
      name: Logistic Regression
      predict_endpoint: http://certifai-ref-models.certifai.svc.cluster.local:5111/german_credit_logit/predict
    datasets:
    - dataset_id: eval
      description: 1000 row representative sample of the full dataset
      file_type: csv
      has_header: true
      name: Evaluation dataset
      url: gs://<scan-directory-name>/datasets/german_credit_eval.csv
    - dataset_id: explan
      description: ''
      file_type: csv
      has_header: true
      name: 100 row explanation dataset
      url: gs://<scan-directory-name>/datasets/german_credit_explan.csv
    dataset_schema:
      feature_schemas:
      - feature_name: age
      - feature_name: status
      - feature_name: foreign
      outcome_column: outcome
  3. Edit the following fields in the text editor window:

    • datasets: url: (NOTE: This field appears twice in the file, and both instances must be modified.) Change <scan-directory-name> in the example URLs below to match the Scan Directory Name that was created during Console configuration.

        url: gs://<scan-directory-name>/datasets/german_credit_explan.csv

      and

        url: gs://<scan-directory-name>/datasets/german_credit_eval.csv

  4. Save the edited file in the certifai_assets/definitions folder, alongside the job definition file.

  5. Copy the datasets listed below from certifai_toolkit/examples/datasets to a folder named datasets that you must create inside your certifai_assets folder (created as a prerequisite).

  6. Move the datasets to your GCP storage bucket (Scan Directory):

    • dataset: german_credit_explan.csv that was included with the toolkit (certifai_toolkit/examples/datasets)
    • dataset: german_credit_eval.csv that was included with the toolkit (certifai_toolkit/examples/datasets)
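
The edit and upload steps above can also be scripted. The following is a minimal sketch, assuming gsutil is installed and authenticated against the project that owns the bucket, and that the definition file and datasets are already in certifai_assets. The SCAN_DIR, ASSETS_DIR, and DRY_RUN variables and the function names are hypothetical and not part of Certifai. By default the sketch only prints the gsutil commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Assumptions (hypothetical defaults): replace with your own values.
SCAN_DIR="${SCAN_DIR:-my-scan-directory}"    # Scan Directory Name from Console configuration
ASSETS_DIR="${ASSETS_DIR:-certifai_assets}"  # local assets folder
DRY_RUN="${DRY_RUN:-1}"                      # 1 = only print the gsutil commands

# Replace every <scan-directory-name> placeholder in file $1 with name $2.
fill_scan_dir() {
  sed "s|<scan-directory-name>|$2|g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Copy both datasets to gs://<scan-directory-name>/datasets/ with gsutil.
upload_datasets() {
  for ud_file in german_credit_explan.csv german_credit_eval.csv; do
    run gsutil cp "$ASSETS_DIR/datasets/$ud_file" "gs://$SCAN_DIR/datasets/$ud_file"
  done
}

DEFINITION="$ASSETS_DIR/definitions/german_credit_scanner_definition.yaml"
if [ -f "$DEFINITION" ]; then
  fill_scan_dir "$DEFINITION" "$SCAN_DIR"
fi
upload_datasets
```

Note that gsutil cp leaves the local copies in place. The scan command in the next section reads the definition from gs://<scan-directory-name>/certifai_assets/definitions/, so the edited definition file presumably needs to be copied to that path in the same way.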

Run the remote scan job

  1. In a new terminal or PowerShell window, run the following command to start your scan job:

    certifai remote scan -m svm -o gs://<scan-directory-name> -f gs://<scan-directory-name>/certifai_assets/definitions/german_credit_scanner_definition.yaml

  2. Optionally, you can manage the remote job through the CLI.

  3. Verify that reports have been added to the Use Case in the remote Console.

    • a. In a browser window (Chrome is recommended), enter https://<Public IP address of your Certifai VM>. (A warning message may be displayed telling you that the connection is not private. If so, click the link that exposes the Advanced settings, then click the link at the bottom that says "Proceed to <IP address>".)
    • b. Log in using the password that was created during Console configuration. (NOTE: Do NOT change the user name from certifai.)
    • c. In the row of the Use Case (Banking: Loan Approval), click the menu icon on the far right and select SCAN DETAILS.
    • d. A scan with the name and date of this process is listed when the scan report is complete.
    • e. Click VIEW to see the report visualizations.
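
The scan command from step 1 of this section contains the <scan-directory-name> placeholder twice, which is easy to mistype. As a small sketch, the command can be assembled from a single variable and printed for review before you run it. SCAN_DIR and build_scan_cmd are hypothetical names; the certifai arguments are exactly those shown in step 1.

```shell
#!/bin/sh
# Hypothetical default; set SCAN_DIR to your Scan Directory Name.
SCAN_DIR="${SCAN_DIR:-my-scan-directory}"

# Print the scan command for scan directory $1 and model id $2.
build_scan_cmd() {
  bs_base="gs://$1"
  echo "certifai remote scan -m $2 -o $bs_base -f $bs_base/certifai_assets/definitions/german_credit_scanner_definition.yaml"
}

build_scan_cmd "$SCAN_DIR" svm
```

Copy the printed line into your terminal once the bucket name looks correct.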