Version: 1.3.16

Define performance metrics in a scan definition

Follow the instructions below to learn how to define performance metrics in a scan definition.

Prerequisites

  1. You have accepted the license agreement and downloaded the Toolkit .zip file to your local system.
  2. You have installed the reference models package to your local system.

Tutorial instructions

Learn more about how Certifai uses Performance Metrics or view the current list of supported metrics here.

  1. Set your working directory to the folder where your Certifai Toolkit was unzipped.

    cd <toolkit-location>
  2. Activate the virtual environment you created for certifai when installing the Certifai CLI.

    conda activate certifai
  3. Copy and save the following starter scan definition into a text file named performance_tutorial_scan_definition.yaml in the examples/definitions directory.

    scan:
      output:
        path: ../performance_tutorial_reports
    model_use_case:
      description: 'In this use case, each entry in the dataset represents a person who
        takes a credit loan from a bank. The learning task is to classify each person
        as either a good or bad credit risk according to the set of attributes.
        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
        The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      task_type: binary-classification
    evaluation:
      description: Example evaluation against a single model.
      evaluation_dataset_id: eval
      evaluation_types:
        - robustness
      name: Example German Credit evaluation
      prediction_description: Will a loan be granted?
      prediction_favorability: explicit
      prediction_values:
        - favorable: true
          name: Loan Granted
          value: 1
        - favorable: false
          name: Loan Denied
          value: 2
    models:
      - model_id: svm
        description: Scikit-learn SVC classifier
        name: Support Vector Classifier
        predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
    datasets:
      - dataset_id: eval
        description: 1000 row representative sample of the full dataset
        file_type: csv
        has_header: true
        name: Evaluation dataset
        url: file:../datasets/german_credit_eval.csv
    dataset_schema:
      outcome_column: outcome

    The above scan definition is a simplified version of the german_credit_scanner_definition.yaml file provided in the toolkit. It contains the minimal information needed to perform a robustness evaluation on a single model (made available by the reference model server).

  4. Add Accuracy, Recall, and a "Custom Metric" as performance metrics to the above scan definition, with Accuracy selected as the ATX performance metric.

    Performance metrics are specified at the model_use_case level of a scan definition, under the performance_metrics field. Each metric can have a metric field that specifies a supported metric for Certifai to calculate, and a name field to identify the metric.

    Update the model_use_case section to contain the following list of performance_metrics:

    performance_metrics:
      - metric: Accuracy
        name: Accuracy
      - metric: Recall
        name: Recall
      - name: "Custom Metric"

    Note: The "Custom Metric" in the above list does not have a metric field because it is not a known metric that Certifai can calculate.

    Next, set the atx_performance_metric_name to Accuracy by adding the following to the model_use_case section:

    atx_performance_metric_name: Accuracy

    The model_use_case section should look like the following before proceeding to the next step:

    model_use_case:
      description: 'In this use case, each entry in the dataset represents a person who
        takes a credit loan from a bank. The learning task is to classify each person
        as either a good or bad credit risk according to the set of attributes.
        This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
        The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
      model_use_case_id: c12e/datasciencelab/german_credit
      name: 'Banking: Loan Approval'
      task_type: binary-classification
      performance_metrics:
        - metric: Accuracy
          name: Accuracy
        - metric: Recall
          name: Recall
        - name: "Custom Metric"
      atx_performance_metric_name: Accuracy
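
    If you want to double-check this section before moving on, a quick sanity check like the following can help. This is an optional, illustrative Python snippet (not part of the Certifai CLI); it assumes PyYAML is installed and that the definition is saved at the path used in this tutorial.

    import yaml

    # Load the scan definition and confirm the ATX metric name matches one of
    # the declared performance_metrics.
    with open("examples/definitions/performance_tutorial_scan_definition.yaml") as f:
        definition = yaml.safe_load(f)

    muc = definition["model_use_case"]
    declared = {m["name"] for m in muc["performance_metrics"]}
    assert muc["atx_performance_metric_name"] in declared, "ATX metric not declared"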
  5. To have Certifai calculate metrics as part of the Performance evaluation, add a test dataset under the datasets section of the scan definition.

    The test dataset must contain the held-out portion of the train/test split you performed on your original dataset. Note that certain metrics require the test dataset to contain an outcome column.
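
    If you are preparing your own data rather than using the provided files, a split along the following lines produces a suitable test CSV. This is a minimal sketch using pandas and scikit-learn; the file and column names are illustrative, not part of the toolkit.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical full dataset with an 'outcome' column of ground-truth labels.
    df = pd.read_csv("my_full_dataset.csv")
    train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)

    # Keep the outcome column so Certifai can compute metrics such as Accuracy.
    test_df.to_csv("my_test_dataset.csv", index=False)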

    A test dataset is included in the toolkit for the German Credit (Banking Loan Approval) use case at examples/datasets/german_credit_test.csv. Add the following dataset information to your datasets section of the scan definition.

    - dataset_id: test
      description: 2741 row test dataset
      file_type: csv
      has_header: true
      name: Test dataset
      url: file:../datasets/german_credit_test.csv

    The datasets section should look like the following before proceeding to the next step:

    datasets:
      - dataset_id: eval
        description: 1000 row representative sample of the full dataset
        file_type: csv
        has_header: true
        name: Evaluation dataset
        url: file:../datasets/german_credit_eval.csv
      - dataset_id: test
        description: 2741 row test dataset
        file_type: csv
        has_header: true
        name: Test dataset
        url: file:../datasets/german_credit_test.csv

    Note: The dataset paths in the above YAML are relative to the location of the scan definition. The paths assume your YAML file is located at <toolkit-location>/examples/definitions; adjust them if you saved your scan definition in a different location.
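
    If you did save the definition elsewhere, an optional check like the one below resolves each file: URL against the definition's folder and reports whether the file exists. This is illustrative Python only (requires PyYAML and Python 3.9+); it simply mirrors the relative-path rule described in the note above.

    from pathlib import Path
    import yaml

    definition_path = Path("examples/definitions/performance_tutorial_scan_definition.yaml")
    definition = yaml.safe_load(definition_path.read_text())

    for ds in definition["datasets"]:
        relative = ds["url"].removeprefix("file:")  # e.g. ../datasets/german_credit_eval.csv
        resolved = (definition_path.parent / relative).resolve()
        print(ds["dataset_id"], resolved, "OK" if resolved.exists() else "MISSING")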

  6. Update the evaluation section of the scan definition by adding "performance" as an evaluation_type and setting the test_dataset_id field. The test_dataset_id should match the dataset_id of the dataset added in the previous step, in this case test.

    The evaluation section should look like the following before proceeding to the next step:

    evaluation:
      description: Example evaluation running performance and robustness reports for a sample model.
      evaluation_dataset_id: eval
      test_dataset_id: test
      evaluation_types:
        - robustness
        - performance
      name: Example Evaluation calculating performance metrics
      prediction_description: Will a loan be granted?
      prediction_favorability: explicit
      prediction_values:
        - favorable: true
          name: Loan Granted
          value: 1
        - favorable: false
          name: Loan Denied
          value: 2
  7. Certifai can now calculate Accuracy and Recall based on the scan definition. However, because a "Custom Metric" has been added to the scan definition, you must specify a value for that metric for each model.

    Under the svm model in the models section, add the following:

    performance_metric_values:
      - name: "Custom Metric"
        value: 0.784

    The performance_metric_values field is a list, where each entry contains a name field that must match the name of a metric defined in the model_use_case (see step 4) and a value field between 0 and 1. Each value in this list acts as a default that is used when Certifai cannot calculate the metric itself.
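
    The value for a custom metric is computed outside of Certifai. As an illustration only (the metric choice, labels, and predictions below are made up for this example), you might compute an F1 score offline with scikit-learn and record the rounded result as the value:

    from sklearn.metrics import f1_score

    # Illustrative ground truth and predictions using the tutorial's label
    # encoding (1 = Loan Granted, 2 = Loan Denied).
    y_true = [1, 2, 1, 1, 2, 1, 2, 1]
    y_pred = [1, 2, 2, 1, 2, 1, 1, 1]

    print(round(f1_score(y_true, y_pred, pos_label=1), 3))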

    The models section should look like the following before proceeding to the next step:

    models:
      - model_id: svm
        description: Scikit-learn SVC classifier
        name: Support Vector Classifier
        predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
        performance_metric_values:
          - name: "Custom Metric"
            value: 0.784
  8. Open a new terminal and activate the virtual environment where you installed the reference model server.

    conda activate certifai-reference-models

    Then start the reference model server.

    startCertifaiModelServer
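
    Before proceeding, you can optionally confirm that the server is listening. The snippet below is an illustrative Python connectivity check only; an HTTP error response (such as 405 for a GET against a predict route) still shows the server is up.

    import urllib.error
    import urllib.request

    url = "http://127.0.0.1:5111/german_credit_svm/predict"
    try:
        urllib.request.urlopen(url, timeout=5)
        print("reference model server is reachable")
    except urllib.error.HTTPError as err:
        print(f"reference model server is reachable (HTTP {err.code})")
    except OSError as err:
        print(f"reference model server is not reachable: {err}")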
  9. (Optional) Validate and test your scan definition before running a scan. Switch back to the original terminal you were using for this tutorial, and save the scan definition you have been working on.

    Validate that the scan definition is syntactically correct. If you encounter any errors, make sure that you have correctly followed the steps above and updated your scan definition accordingly. If validation succeeds, continue to testing your definition.

    certifai definition-validate -f examples/definitions/performance_tutorial_scan_definition.yaml

    Test that the scan definition correctly connects to the model hosted by the reference model server. If you encounter any errors, make sure that the reference model server is running and that the model definition matches the result of step 7. If the test succeeds, continue to the next step.

    certifai definition-test -f examples/definitions/performance_tutorial_scan_definition.yaml
  10. Run the scan:

    certifai scan -f examples/definitions/performance_tutorial_scan_definition.yaml

    Note: The scan may take a few minutes to run to completion.

    After the scan completes, you should see output similar to the following:

    ...
    Scan Completed
    ====== Report Summary ======
    Total number of evaluations performed: 3
    Number of successful reports: 3
    Number of failed reports: 0
  11. Start the Certifai Console and navigate to the performance results for this scan.

    certifai console examples/performance_tutorial_reports

    The Console is available at: http://localhost:8000. Copy this URL into a browser to view your scan result visualizations.

  12. The Console opens on the Use Case list page. Click the menu icon on the far right of the row with the model use case ID c12e_datasciencelab_german_credit, then click the Scan List button to view the list of scans for that model use case.

    [Image: Use Case List]

  13. From the Scan List page, find the row with the Scan ID of the scan you ran in step 10. Then click the menu icon on the far right of the row and click the Results button.

    [Image: Scan List]

  14. Toggle to the Evaluation view (at the top right), click the Performance graph, and scroll down to the Performance Metrics section of the results page.

    [Image: Results Page]

    For more information about navigating the Console, refer to the Console documentation.

  15. Hover your mouse over the individual graphs to view the corresponding performance metric value generated by Certifai.

    Note: The results in your Console view may differ slightly from the images provided. The explanations below correspond to the results of the scan at the time of writing this tutorial.

    [Image: Performance Results]

    According to the results, the Support Vector Classifier model has an Accuracy of 79.67, a Recall of 95.52, and a "Custom Metric" value of 78.40, which matches the value set in step 7. The overall performance score for the model is 79% because Accuracy was specified as the ATX performance metric in step 4.
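
    For context, the Accuracy and Recall figures follow the standard definitions: accuracy is the fraction of correct predictions, and recall is the fraction of actual favorable outcomes the model identified. The sketch below shows the conventional calculation with scikit-learn; it is not Certifai's internal implementation, and the label arrays are illustrative.

    from sklearn.metrics import accuracy_score, recall_score

    # Illustrative labels using the tutorial's encoding (1 = Loan Granted,
    # 2 = Loan Denied).
    y_true = [1, 1, 1, 2, 2, 1, 2, 1]
    y_pred = [1, 1, 2, 2, 1, 1, 2, 1]

    print(accuracy_score(y_true, y_pred))             # fraction of correct predictions
    print(recall_score(y_true, y_pred, pos_label=1))  # recall of the favorable class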