Version: 1.3.14

Azure: Run remote scans

Follow the steps below to run a scan job in Certifai Pro on Azure.

Prerequisites

  • You have downloaded the certifai-kubeconfig.json file.

  • You have imported the configuration into your Certifai toolkit.

  • You have created a folder named certifai_assets on your local drive, where you store scan definition files and datasets for easy access.

  • You have the following datasets from the certifai_toolkit/examples folder on your local drive (these are created when you download and install the toolkit):

    • german_credit_explan.csv
    • german_credit_eval.csv
  • You have downloaded a copy of the example scan definition file, german_credit_scanner_definition.yaml, to use as a template.

  • Copy and paste the contents below into a text editor, where you can edit them and save the file to your local drive.

    model_use_case:
      atx_performance_metric_name: Accuracy
      author: info@cognitivescale.com
      description: 'In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.
        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit. The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29.'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      performance_metrics:
      - metric: Accuracy
        name: Accuracy
      task_type: binary-classification
    evaluation:
      description: This evaluation compares the robustness, accuracy, fairness and explanations for 4 candidate models.
      evaluation_dataset_id: eval
      evaluation_types:
      - fairness
      explanation_dataset_id: explan
      test_dataset_id: eval
      fairness_grouping_features:
      - name: age
      - name: status
      - name: foreign
      feature_restrictions:
      - feature_name: age
        restriction_string: no changes
      - feature_name: status
        restriction_string: no changes
      - feature_name: foreign
        restriction_string: no changes
      name: Baseline evaluation of 4 models
      prediction_description: Will a loan be granted?
      prediction_values:
      - favorable: true
        name: Loan Granted
        value: 1
      - favorable: false
        name: Loan Denied
        value: 2
    models:
    - author: ''
      description: Scikit-learn LogisticRegression classifier using lbfgs solver
      model_id: svm
      name: Logistic Regression
      predict_endpoint: http://certifai-ref-models.certifai.svc.cluster.local:5111/german_credit_logit/predict
    datasets:
    - dataset_id: eval
      description: 1000 row representative sample of the full dataset
      file_type: csv
      has_header: true
      name: Evaluation dataset
      url: abfs://<scan-directory-name>/datasets/german_credit_eval.csv
    - dataset_id: explan
      description: ''
      file_type: csv
      has_header: true
      name: 100 row explanation dataset
      url: abfs://<scan-directory-name>/datasets/german_credit_explan.csv
    dataset_schema:
      feature_schemas:
      - feature_name: age
      - feature_name: status
      - feature_name: foreign
      outcome_column: outcome
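
If you prefer to script the local setup, the folder layout and dataset copy described in these prerequisites (and in the matching steps below) can be sketched as a small bash function. This is a sketch, not part of the Certifai toolkit: the function name prepare_certifai_assets is hypothetical, and it assumes the toolkit's example datasets are located in certifai_toolkit/examples/datasets.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Certifai toolkit).
# Creates certifai_assets/definitions and certifai_assets/datasets, then copies
# the two example datasets referenced by german_credit_scanner_definition.yaml.
# Assumption: the datasets live in <toolkit-dir>/examples/datasets.
prepare_certifai_assets() {
  local toolkit_dir="$1"   # e.g. ./certifai_toolkit
  local assets_dir="$2"    # e.g. ./certifai_assets
  local src="$toolkit_dir/examples/datasets"
  local f

  # Folders used throughout this guide
  mkdir -p "$assets_dir/definitions" "$assets_dir/datasets" || return 1

  for f in german_credit_explan.csv german_credit_eval.csv; do
    # Stop early with a clear message if a dataset is missing
    if [ ! -f "$src/$f" ]; then
      echo "missing dataset: $src/$f" >&2
      return 1
    fi
    cp "$src/$f" "$assets_dir/datasets/" || return 1
  done
}
```

For example, source the file and run prepare_certifai_assets ./certifai_toolkit ./certifai_assets from the folder that contains both directories.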

Define scan configuration files and move them to blob storage

  1. Save this file in a folder named definitions, which you must create inside your certifai_assets folder (created as a prerequisite).

  2. Open the example scan definition file, german_credit_scanner_definition.yaml, in a text editor and edit the following field:

    • datasets: url: Replace <scan-directory-name> in each URL with the Scan Directory Name that was created during Console configuration. (NOTE: This field appears twice in the file, and both instances must be modified.)

        url: abfs://<scan-directory-name>/datasets/german_credit_explan.csv

      and

        url: abfs://<scan-directory-name>/datasets/german_credit_eval.csv

  3. Save the edited file in the certifai_assets/definitions folder.

  4. Copy the datasets german_credit_explan.csv and german_credit_eval.csv from certifai_toolkit/examples/datasets to a folder named datasets, which you must create inside your certifai_assets folder.

  5. Move the datasets to your Azure blob storage container (Scan Directory). There are several ways to move these files; one of them is described below. (NOTE: You will perform this operation twice, once for each of the required files.)

    • dataset: german_credit_explan.csv that was included with the toolkit (certifai_toolkit/examples/datasets)

    • dataset: german_credit_eval.csv that was included with the toolkit (certifai_toolkit/examples/datasets)

      • a. Change your terminal or PowerShell context to the folder where the file is located.
      • b. Copy and paste the command below into a text editor.
      • c. Replace the placeholders with the details of the file you are moving to blob storage.
      • d. Copy and paste the command from the text editor into the terminal or PowerShell window and run it.
      • e. In the Azure portal, go to the blob storage container and verify that the file has been uploaded.
az storage blob upload \
--account-name <storage-account> \
--container-name <scan-directory-name> \
--name <folder/file-name> \
--file <file-name> \
--auth-mode key \
--account-key <access-key>

Example:

az storage blob upload \
--account-name mscottblob \
--container-name scans-rc2 \
--name datasets/german_credit_eval.csv \
--file german_credit_eval.csv \
--auth-mode key \
--account-key abcd
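
The edit and upload steps above can also be scripted. The bash sketch below assumes GNU or BSD sed and an installed Azure CLI (az); the function names set_scan_directory, upload_datasets and list_datasets are hypothetical, and the storage account, container (Scan Directory Name) and access key are placeholders you supply. Setting AZ_CMD=echo prints the az commands instead of running them, which works as a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Certifai toolkit).
# AZ_CMD defaults to "az"; set AZ_CMD=echo to print commands instead (dry run).

# Replace every <scan-directory-name> placeholder in the scan definition.
# Azure container names use only lowercase letters, digits and hyphens,
# so they are safe inside this sed expression.
set_scan_directory() {
  local def_file="$1" scan_dir="$2"
  # -i.bak works with GNU and BSD sed and keeps the original as <file>.bak
  sed -i.bak "s|<scan-directory-name>|${scan_dir}|g" "$def_file" || return 1
  # Print the rewritten dataset URLs for review (non-zero exit if none found)
  grep -n 'url: abfs://' "$def_file"
}

# Upload both datasets to the datasets/ folder of the Scan Directory container,
# matching the abfs://<scan-directory-name>/datasets/... URLs in the definition.
upload_datasets() {
  local account="$1" container="$2" key="$3" dataset_dir="$4"
  local f
  for f in german_credit_explan.csv german_credit_eval.csv; do
    ${AZ_CMD:-az} storage blob upload \
      --account-name "$account" \
      --container-name "$container" \
      --name "datasets/$f" \
      --file "$dataset_dir/$f" \
      --auth-mode key \
      --account-key "$key" || return 1
  done
}

# Alternative to checking the portal: list the blobs under datasets/.
list_datasets() {
  local account="$1" container="$2" key="$3"
  ${AZ_CMD:-az} storage blob list \
    --account-name "$account" \
    --container-name "$container" \
    --prefix datasets/ \
    --auth-mode key \
    --account-key "$key" \
    --output table
}
```

For example: set_scan_directory certifai_assets/definitions/german_credit_scanner_definition.yaml <scan-directory-name>, then upload_datasets <storage-account> <scan-directory-name> <access-key> certifai_assets/datasets.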

Run the remote scan job

  1. In a new terminal or PowerShell window, run the following command to start your job:

    certifai remote scan -m svm -o abfs://<scan-directory-name> -f abfs://certifai_assets/definitions/german_credit_scanner_definition.yaml
  2. Optionally, you can manage the remote job through the CLI.

  3. Verify that reports have been added to the Use Case in the remote Console.

    • a. In a browser window (Chrome is recommended), enter https://<Public IP address of your Certifai VM>. (A warning may say that the connection is not private. Click the link that shows the Advanced settings, then click the link at the bottom that says "Proceed to <IP address>".)
    • b. Log in using the password that was created during Console configuration. (NOTE: Do NOT change the user name from certifai.)
    • c. In the row of the Use Case (Banking: Loan Approval) click the menu icon on the far right and select SCAN DETAILS.
    • d. When the scan report is complete, a scan with the name and date of this run is listed.
    • e. Click VIEW to see the report visualizations.
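
If you run scans repeatedly, the command from step 1 can be wrapped so the Scan Directory Name is filled in once. This is a hypothetical bash helper (run_remote_scan is not a Certifai command); it uses only the -m, -o and -f options shown in this guide, defaults -m to svm (the model_id in the example definition), and takes the definition URL as an argument, so pass the same value you would use with -f. Setting CERTIFAI_CMD=echo prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "certifai remote scan" (not a Certifai command).
# Uses only the -m, -o and -f options shown in this guide.
# CERTIFAI_CMD defaults to "certifai"; set CERTIFAI_CMD=echo for a dry run.
run_remote_scan() {
  local scan_dir="$1" definition_url="$2" model_id="${3:-svm}"
  if [ -z "$scan_dir" ] || [ -z "$definition_url" ]; then
    echo "usage: run_remote_scan <scan-directory-name> <definition-url> [model-id]" >&2
    return 2
  fi
  # Reject a definition URL that still contains the unedited placeholder
  case "$definition_url" in
    *"<scan-directory-name>"*)
      echo "replace <scan-directory-name> in the definition URL first" >&2
      return 2 ;;
  esac
  ${CERTIFAI_CMD:-certifai} remote scan \
    -m "$model_id" \
    -o "abfs://$scan_dir" \
    -f "$definition_url"
}
```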