Version: 1.3.15

Get Started in Databricks Notebooks

Prerequisites

Install the Certifai Toolkit Libraries on a Databricks Cluster

Certifai Toolkit packages can be installed on a Databricks cluster in three ways, using:

  • Workspace libraries
  • Cluster libraries
  • Notebook-scoped libraries

The following instructions describe how to install the Certifai Toolkit as workspace libraries.

  1. Set your current working directory to the folder where you unzipped the contents of the Certifai Toolkit .zip during the download process.

    cd <path-to-folder-where-toolkit-was-unzipped>

    Example:

    cd certifai
  2. Install the Certifai packages as workspace libraries.

    • Upload the Certifai packages in the packages/wheels/all folder. When creating each library select a Library Source of Upload and a Library Type of Python whl.

      After uploading each library to your workspace, follow the Databricks instructions to install the library via the cluster UI or library UI.

      (Optional) If you intend to install Certifai as Notebook-scoped Libraries, then save the path to the uploaded wheel file.

      Example:

      dbfs:/FileStore/jars/8efcd310_4731_b2e3_a9e5_6c2856caeb14/cortex-certifai-common-1.3.15-py3-none-any.whl
    • Install the Certifai Engine package specific to the Python version of your Databricks cluster. MAKE SURE TO USE THE SAME MINOR VERSION OF PYTHON that is installed on your Databricks cluster.

      To check the Python version on your cluster, you can either look up the cluster's runtime version or run !python --version in a cell of a Notebook attached to the cluster.

      After uploading the library to your workspace, follow the Databricks instructions to install the library via the cluster UI or library UI.

      EXAMPLE: If Python 3.8 is installed on your Databricks cluster, then upload ONLY the engine package at:

      packages/wheels/python3.8/cortex-certifai-engine-1.3.15-58-g69e639b8-py3.8.13-none-any.whl
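To confirm the cluster's Python version before choosing an engine wheel, you can also inspect `sys.version_info` from a notebook cell. This is a small sketch; the `packages/wheels/python<version>/` layout matches the Toolkit folder structure described above:

```python
import sys

# The engine wheel must match the cluster's minor Python version (e.g. 3.8).
minor_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Choose the engine wheel under packages/wheels/python{minor_version}/")
```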
  3. Verify the Certifai packages have been installed on your cluster. Create a new Notebook attached to your cluster, add a cell with the following code, and verify that all packages installed in the previous step are listed:

    EXAMPLE:

    %pip list | grep cortex-certifai

    Output:

    cortex-certifai-client 1.3.15
    cortex-certifai-common 1.3.15
    cortex-certifai-engine 1.3.15
    cortex-certifai-scanner 1.3.15
    cortex-certifai-connectors 1.3.15
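The same check can be done programmatically with the standard library's `importlib.metadata` (a sketch; any package that is not installed is reported as NOT INSTALLED):

```python
from importlib.metadata import version, PackageNotFoundError

packages = [
    "cortex-certifai-client",
    "cortex-certifai-common",
    "cortex-certifai-engine",
    "cortex-certifai-scanner",
    "cortex-certifai-connectors",
]

# Print the installed version of each Certifai package, or flag it as missing.
for pkg in packages:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "NOT INSTALLED")
```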

Special Setup Instructions for Optional Dependencies

If you would like to perform shap-based Explainability or Explanation analyses, you must install shap>=0.31.0.

shap can be installed as a cluster library from PyPI. When creating the library, select a Library Source of PyPI and enter a package name of shap.

Alternatively, shap can be installed as a notebook-scoped library along with the cortex-certifai-engine package. You must know the path in dbfs:/FileStore/jars/ to the cortex-certifai-engine package to perform the installation.

EXAMPLE: (Include the following as a cell within a notebook)

%pip install /dbfs/FileStore/jars/31a669a4_2255_4407_945b_ace9db0c11d3/cortex-certifai-engine-1.3.15-58-g69e639b8-py3.8.13-none-any.whl[shap]
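After installation, you can verify the minimum version requirement from a notebook cell. This sketch uses `importlib.metadata` so it works without importing shap itself; the `meets_minimum` helper is ours, not part of the Toolkit:

```python
from importlib.metadata import version, PackageNotFoundError

def meets_minimum(pkg: str, minimum: tuple) -> bool:
    """Return True if the installed version of pkg is at least `minimum`."""
    try:
        installed = tuple(int(p) for p in version(pkg).split(".")[:3])
    except (PackageNotFoundError, ValueError):
        return False
    return installed >= minimum

# Certifai's shap-based analyses require shap>=0.31.0.
print("shap OK:", meets_minimum("shap", (0, 31, 0)))
```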

Run the Example Notebook

Prerequisites

Follow the installation instructions above to download the Certifai Toolkit and install it on a Databricks cluster on which to run the Certifai sample notebooks provided in the toolkit.

Import the Notebook

  1. Set your current working directory to the folder where you unzipped the contents of the Certifai Toolkit .zip during the download process.

    cd <path-to-folder-where-toolkit-was-unzipped>

    EXAMPLE

    cd certifai
  2. Create a table in Databricks by importing the CSV file through the Databricks UI. Select a Data Source of Upload File and upload the datasets/german_credit_eval.csv file from the Toolkit. Save the name of the new table, along with the /FileStore/tables path to the uploaded CSV file.

    EXAMPLE: The datasets/german_credit_eval.csv file may create a table named german_credit_eval_1_csv, and the path to the uploaded CSV file would be: /dbfs/FileStore/tables/german_credit_eval-1.csv

  3. Import the Jupyter notebook included in the Toolkit at notebook/BringingInYourOwnModel.ipynb into Databricks. Attach the notebook to your running cluster.

  4. Update the value of the all_data_file variable in Cmd (6) to refer to the path where the german_credit_eval.csv file was uploaded in the previous step.

    EXAMPLE: (After updating the cell)

    all_data_file = "/dbfs/FileStore/tables/german_credit_eval-2.csv"
    df = pd.read_csv(all_data_file)
    df.head()
  5. Update the final cell in the notebook to save the Scan Definition (YAML) file to the Databricks FileStore.

    EXAMPLE: (After updating the cell)

    scan_file = "/dbfs/FileStore/my_scan_definition.yaml"
    with open(scan_file, "w") as f:
        scan.save(f)
    print(f"Saved template to: {scan_file}")
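Because /dbfs/FileStore is mounted as an ordinary filesystem path on the driver, you can confirm the Scan Definition was written using standard Python (the path here matches the example above):

```python
import os

scan_file = "/dbfs/FileStore/my_scan_definition.yaml"
# True with a file size in bytes if the save succeeded; False otherwise.
exists = os.path.exists(scan_file)
print("Saved:", exists, "size:", os.path.getsize(scan_file) if exists else 0)
```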
  6. Run the cells of the notebook.

  7. (Optional) Save the scan reports to the Databricks FileStore when running the Certifai Scan, and download the reports to your local machine to visualize the scan results. Scan Reports cannot be visualized from within a Databricks notebook.

    EXAMPLE: When creating the Certifai scan, set the output_path parameter to a string path starting with /dbfs/FileStore/.

    scan = CertifaiScanBuilder.create('test_use_case',
                                      prediction_task=task,
                                      output_path='/dbfs/FileStore/scan_reports')
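Once the scan completes, you can list the report files written under the output path from a notebook cell before downloading them to your local machine. This is a sketch: the exact file layout under scan_reports depends on the scan configuration, so it simply walks everything under the directory:

```python
import glob

# List everything Certifai wrote under the output_path used above.
for path in glob.glob("/dbfs/FileStore/scan_reports/**/*", recursive=True):
    print(path)
```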