Define performance metrics in a scan definition
Follow the instructions below to learn how to define performance metrics in a scan definition.
Prerequisites
- You have accepted the license agreement and downloaded the Toolkit .zip file to your local system.
- You have installed the reference models package to your local system.
Alert: Windows 10 Users
Before you run your scan, you must disable QuickEdit Mode in your terminal window. Right-click in the terminal window and uncheck the QuickEdit Mode option.
If you do not disable this option, clicking off of your terminal window and back into it causes the window to freeze, and you will not receive scan results updates.
Tutorial instructions
Learn more about how Certifai uses Performance Metrics or view the current list of supported metrics here.
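As background, the two built-in metrics used in this tutorial are the standard classification metrics. The sketch below is illustrative only (it is not part of the Certifai toolkit); the label values 1 and 2 follow the `prediction_values` in the scan definition, where 1 (Loan Granted) is the favorable outcome:

```python
# Illustrative computation of the two built-in metrics used in this
# tutorial. Labels follow the scan definition: 1 = Loan Granted
# (favorable), 2 = Loan Denied.
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the outcome column."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives the model correctly identified."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn)

y_true = [1, 1, 1, 2, 2]   # hypothetical outcome column values
y_pred = [1, 1, 2, 2, 1]   # hypothetical model predictions
print(accuracy(y_true, y_pred))  # 0.6
print(recall(y_true, y_pred))    # 0.6666666666666666
```

Certifai computes these for you during the Performance evaluation; the sketch only shows what the reported numbers mean.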
1. Set your working directory to the folder where your Certifai Toolkit was unzipped.

   ```
   cd <toolkit-location>
   ```

2. Activate the virtual environment you created for `certifai` when installing the Certifai CLI.

   ```
   conda activate certifai
   ```

3. Copy and save the following starter scan definition into a text file named `performance_tutorial_scan_definition.yaml` in the `examples/definitions` directory.

   ```yaml
   scan:
     output:
       path: ../performance_tutorial_reports
   model_use_case:
     description: 'In this use case, each entry in the dataset represents a person who
       takes a credit loan from a bank. The learning task is to classify each person
       as either a good or bad credit risk according to the set of attributes.
       This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
       The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
     model_use_case_id: c12e/datasciencelab/german_credit
     name: 'Banking: Loan Approval'
     task_type: binary-classification
   evaluation:
     description: Example evaluation against a single model.
     evaluation_dataset_id: eval
     evaluation_types:
       - robustness
     name: Example German Credit evaluation
     prediction_description: Will a loan be granted?
     prediction_favorability: explicit
     prediction_values:
       - favorable: true
         name: Loan Granted
         value: 1
       - favorable: false
         name: Loan Denied
         value: 2
   models:
     - model_id: svm
       description: Scikit-learn SVC classifier
       name: Support Vector Classifier
       predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
   datasets:
     - dataset_id: eval
       description: 1000 row representative sample of the full dataset
       file_type: csv
       has_header: true
       name: Evaluation dataset
       url: file:../datasets/german_credit_eval.csv
   dataset_schema:
     outcome_column: outcome
   ```

   The above scan definition is a simplified version of the `german_credit_scanner_definition.yaml` file provided in the toolkit. This definition contains only the minimal information needed to perform a robustness evaluation on a single model (made available by the reference model server).

4. Add `Accuracy`, `Recall`, and a `"Custom Metric"` as performance metrics to the above scan definition, with `Accuracy` selected as the ATX performance metric.

   Performance metrics are specified at the `model_use_case` level of a scan definition, under the `performance_metrics` field. Each metric can have a `metric` field that specifies a supported metric for Certifai to calculate, and a `name` field to identify the metric.

   Update the `model_use_case` section to contain the following list of `performance_metrics`:

   ```yaml
   performance_metrics:
     - metric: Accuracy
       name: Accuracy
     - metric: Recall
       name: Recall
     - name: "Custom Metric"
   ```

   Note: The `"Custom Metric"` in the above list does not have a `metric` field because it is not a known metric that Certifai can calculate.

   Next, set the `atx_performance_metric_name` to `Accuracy` by adding the following to the `model_use_case` section:

   ```yaml
   atx_performance_metric_name: Accuracy
   ```

   The `model_use_case` section should look like the following before proceeding to the next step:

   ```yaml
   model_use_case:
     description: 'In this use case, each entry in the dataset represents a person who
       takes a credit loan from a bank. The learning task is to classify each person
       as either a good or bad credit risk according to the set of attributes.
       This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit
       The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29'
     model_use_case_id: c12e/datasciencelab/german_credit
     name: 'Banking: Loan Approval'
     task_type: binary-classification
     performance_metrics:
       - metric: Accuracy
         name: Accuracy
       - metric: Recall
         name: Recall
       - name: "Custom Metric"
     atx_performance_metric_name: Accuracy
   ```

5. To have Certifai calculate metrics as part of the Performance evaluation, add a test dataset under the `datasets` section of the scan definition.

   The test dataset must contain the testing data from the test/train split you performed on your starting dataset. Note that certain metrics require the test dataset to contain an outcome column.
   A test dataset is included in the toolkit for the German Credit (Banking: Loan Approval) use case at `examples/datasets/german_credit_test.csv`. Add the following dataset information to the `datasets` section of the scan definition:

   ```yaml
   - dataset_id: test
     description: 2741 row test dataset
     file_type: csv
     has_header: true
     name: Test dataset
     url: file:../datasets/german_credit_test.csv
   ```

   The `datasets` section should look like the following before proceeding to the next step:

   ```yaml
   datasets:
     - dataset_id: eval
       description: 1000 row representative sample of the full dataset
       file_type: csv
       has_header: true
       name: Evaluation dataset
       url: file:../datasets/german_credit_eval.csv
     - dataset_id: test
       description: 2741 row test dataset
       file_type: csv
       has_header: true
       name: Test dataset
       url: file:../datasets/german_credit_test.csv
   ```

   Note: The dataset paths in the above YAML are relative to the location of the scan definition. The paths assume your YAML file is located at `<toolkit-location>/examples/definitions`. You may have to adjust the dataset paths if you saved your scan definition in a different location.

6. Update the `evaluation` section of the scan definition by adding `performance` as an `evaluation_type` and setting the `test_dataset_id` field. The `test_dataset_id` must match the `dataset_id` of the dataset added in the previous step, in this case `test`.

   The `evaluation` section should look like the following before proceeding to the next step:

   ```yaml
   evaluation:
     description: Example evaluation running performance and robustness report for a sample model.
     evaluation_dataset_id: eval
     test_dataset_id: test
     evaluation_types:
       - robustness
       - performance
     name: Example Evaluation calculating performance metrics
     prediction_description: Will a loan be granted?
     prediction_favorability: explicit
     prediction_values:
       - favorable: true
         name: Loan Granted
         value: 1
       - favorable: false
         name: Loan Denied
         value: 2
   ```

7. Certifai can now calculate `Accuracy` and `Recall` based on the scan definition. However, because a `"Custom Metric"` has been added to the scan definition, you must specify a metric value for each model.

   Under the `svm` model in the `models` section, add the following:

   ```yaml
   performance_metric_values:
     - name: "Custom Metric"
       value: 0.784
   ```

   The `performance_metric_values` field is a list, where each entry contains a `name` field that must match the name of a metric defined in the `model_use_case` (see step 4) and a `value` field between 0 and 1. Each metric in this list acts as a default if the metric cannot be calculated by Certifai.
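The constraints just described (each name must match a declared metric, each value must lie in [0, 1]) can be expressed as a small check. This is an illustrative sketch only, not part of the Certifai CLI:

```python
# Hypothetical validation of the performance_metric_values constraints:
# every entry must name a metric declared in the model use case, and its
# value must fall within [0, 1].
def check_metric_values(use_case_metrics, metric_values):
    known = {m["name"] for m in use_case_metrics}
    for entry in metric_values:
        if entry["name"] not in known:
            raise ValueError(f"unknown metric {entry['name']!r}")
        if not 0 <= entry["value"] <= 1:
            raise ValueError(f"value for {entry['name']!r} outside [0, 1]")

# Mirrors the tutorial's configuration: passes silently.
check_metric_values(
    [{"metric": "Accuracy", "name": "Accuracy"},
     {"metric": "Recall", "name": "Recall"},
     {"name": "Custom Metric"}],
    [{"name": "Custom Metric", "value": 0.784}],
)
```

In practice `certifai definition-validate` (used later in this tutorial) performs the authoritative checks on the scan definition.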
   The `models` section should look like the following before proceeding to the next step:

   ```yaml
   models:
     - model_id: svm
       description: Scikit-learn SVC classifier
       name: Support Vector Classifier
       predict_endpoint: http://127.0.0.1:5111/german_credit_svm/predict
       performance_metric_values:
         - name: "Custom Metric"
           value: 0.784
   ```

8. Open a new terminal and activate the virtual environment where you installed the reference model server.

   ```
   conda activate certifai-reference-models
   ```

   Then start the reference model server:

   ```
   startCertifaiModelServer
   ```

9. (Optional) Validate and test your scan definition before running a scan. Make sure to switch back to the original terminal you were using for this tutorial, and save the scan definition you have been working on.

   Validate that the scan definition is syntactically correct. If you encounter any errors, make sure that you have correctly followed the steps above and updated your scan definition. If the validation is successful, continue to testing your definition.

   ```
   certifai definition-validate -f examples/definitions/performance_tutorial_scan_definition.yaml
   ```

   Test that the scan definition correctly connects to the model hosted by the reference model server. If you encounter any errors, make sure that the reference model server is running and the model definition matches the result of step 7. If the test is successful, continue to the next step.

   ```
   certifai definition-test -f examples/definitions/performance_tutorial_scan_definition.yaml
   ```

10. Run the scan:

    ```
    certifai scan -f examples/definitions/performance_tutorial_scan_definition.yaml
    ```

    Note: The scan may take a few minutes to run to completion.
    After the scan completes, you should see output similar to the following:

    ```
    ...
    Scan Completed
    ====== Report Summary ======
    Total number of evaluations performed: 3
    Number of successful reports: 3
    Number of failed reports: 0
    ```

11. Start the Certifai Console and navigate to the performance results for this scan.

    ```
    certifai console examples/performance_tutorial_reports
    ```

    The Console is available at `http://localhost:8000`. Copy this URL into a browser to view your scan result visualizations.

12. The Console opens on the Use Case list page. Click the menu icon on the far right of the row with the model use case id `c12e_datasciencelab_german_credit`. Then click the `Scan List` button to view the list of scans for the model use case.

13. From the Scan List page, find the row with the Scan ID of the scan you ran in step 10. Then click the menu icon on the far right of the row and click the `Results` button.

14. Toggle to the Evaluation view (at the top right), click the `Performance` graph, and scroll down to the `Performance Metrics` section of the results page.

    For more information on navigating the Console, see here.
Hover your mouse over the individual graphs to view the corresponding performance metric value generated by Certifai.
Note: The results in your Console view may differ slightly from the images provided. The explanations below correspond to the results of the scan at the time of writing this tutorial.
According to the results, the Support Vector Classifier model has an Accuracy of 79.67, a Recall of 95.52, and a "Custom Metric" value of 78.40, which matches the value set in step 7. The overall performance score for the model is 79% because Accuracy was specified as the ATX performance metric in step 4.