Get Started in Databricks Notebooks
Prerequisites
- Download the Certifai toolkit
- Certifai Toolkit version 1.3.15 or later
- A Databricks cluster with runtime supporting either Python 3.7 or 3.8
ALERT
Prior to Certifai 1.3.15, the Certifai Toolkit did not include wheel
distributions of the client packages. The following instructions assume the .whl
files are packaged in the toolkit.
If this is not the case, you should update to the latest toolkit.
Install the Certifai Toolkit Libraries on a Databricks Cluster
Certifai Toolkit packages can be installed on a Databricks cluster in three ways: as workspace libraries, as cluster libraries, or as notebook-scoped libraries.
The following instructions install the Certifai toolkit as workspace libraries.
Set your current working directory to the folder where you unzipped the contents of the Certifai Toolkit .zip during the download process.

```
cd <path-to-folder-where-toolkit-was-unzipped>
```

Example:

```
cd certifai
```

Install the Certifai packages as workspace libraries.
Upload the Certifai packages in the `packages/wheels/all` folder. When creating each library, select a Library Source of `Upload` and a Library Type of `Python whl`. After uploading the library to your workspace, follow the Databricks instructions to install the library via the cluster UI or library UI.
(Optional) If you intend to install Certifai as notebook-scoped libraries, then save the path to the uploaded wheel file.

Example:

```
dbfs:/FileStore/jars/8efcd310_4731_b2e3_a9e5_6c2856caeb14/cortex-certifai-common-1.3.15-py3-none-any.whl
```

Install the Certifai Engine package specific to the Python version for your Databricks cluster. MAKE SURE TO USE THE SAME MINOR VERSION OF PYTHON that is installed on your Databricks cluster.
To check the Python version on your cluster, you can either locate the cluster's runtime or evaluate

```
!python --version
```

within a cell of a notebook attached to the cluster. After uploading the library to your workspace, follow the Databricks instructions to install the library via the cluster UI or library UI.
EXAMPLE: If Python 3.8 is installed on your Databricks cluster, then upload ONLY the engine package at:

```
packages/wheels/python3.8/cortex-certifai-engine-1.3.15-58-g69e639b8-py3.8.13-none-any.whl
```
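To confirm the minor version programmatically, a cell like the following (a minimal sketch using only the standard library) prints the version tag to match against the engine wheel's filename:

```python
import sys

# Report the notebook interpreter's major.minor version; the engine wheel's
# filename must carry the same minor version (e.g. py3.8 for Python 3.8).
py_tag = f"python{sys.version_info.major}.{sys.version_info.minor}"
print(py_tag)
```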
Verify the Certifai packages have been installed on your cluster. Create a new notebook attached to your cluster, add a cell with the following code, and verify all packages installed in the previous step are listed:

EXAMPLE:

```
%pip list | grep cortex-certifai
```

Output:

```
cortex-certifai-client      1.3.15
cortex-certifai-common      1.3.15
cortex-certifai-engine      1.3.15
cortex-certifai-scanner     1.3.15
cortex-certifai-connectors  1.3.15
```
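The same check can be done in pure Python with the standard library's package metadata (available on Python 3.8+; on Python 3.7 the `importlib_metadata` backport would be needed instead). A sketch:

```python
from importlib.metadata import distributions

# Collect installed distributions whose names start with "cortex-certifai".
# On a correctly configured cluster this should include the client, common,
# connectors, engine, and scanner packages.
certifai_pkgs = {
    dist.metadata["Name"]: dist.version
    for dist in distributions()
    if (dist.metadata["Name"] or "").startswith("cortex-certifai")
}
for name, version in sorted(certifai_pkgs.items()):
    print(name, version)
```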
ALERT
It is only necessary to install the Certifai client, common, connectors, engine, and scanner packages. The `cortex-certifai-console` and `cortex-certifai-model-sdk` packages run web servers and are not appropriate for use in a notebook context.
Special Setup Instructions for Optional Dependencies
If you would like to perform SHAP-based Explainability or Explanation analyses, you must install `shap>=0.31.0`.

`shap` can be installed as a cluster library from PyPI. When creating the library, select a Library Source of `PyPI` and enter a package name of `shap`.
Alternatively, `shap` can be installed as a notebook-scoped library with the `cortex-certifai-engine` package. You must know the path in `dbfs:/FileStore/jars/` to the `cortex-certifai-engine` package to perform the installation.

EXAMPLE: (Include the following as a cell within a notebook)

```
%pip install /dbfs/FileStore/jars/31a669a4_2255_4407_945b_ace9db0c11d3/cortex-certifai-engine-1.3.15-58-g69e639b8-py3.8.13-none-any.whl[shap]
```
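If you did not record the wheel path when uploading, a cell like the following can search for it. This is a hypothetical helper, not part of the Certifai toolkit; it assumes the Databricks convention of mounting DBFS at `/dbfs` and placing uploaded libraries under `FileStore/jars/<random-id>/` (outside Databricks the list is simply empty):

```python
import glob

# Search DBFS for the uploaded cortex-certifai-engine wheel; the containing
# directory name is a random identifier assigned by Databricks at upload time.
wheels = glob.glob("/dbfs/FileStore/jars/*/cortex-certifai-engine-*.whl")
for wheel in wheels:
    print(wheel)
```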
Run the Example Notebook
Prerequisites
Follow the instructions in the preceding sections to download the toolkit and install Certifai on a Databricks cluster before running the Certifai sample notebooks provided in the toolkit.
Import the Notebook
Set your current working directory to the folder where you unzipped the contents of the Certifai Toolkit .zip during the download process.

```
cd <path-to-folder-where-toolkit-was-unzipped>
```

EXAMPLE:

```
cd certifai
```

Create a table in Databricks by importing the CSV file through the Databricks UI. Select a Data Source of `Upload File` and upload the `datasets/german_credit_eval.csv` file from the Toolkit. Save the name of the new table, along with its `/FileStore/tables` path.

EXAMPLE: The `datasets/german_credit_eval.csv` file may be saved as `/FileStore/tables/german_credit_eval_1_csv`, and the path to the CSV file would be `/dbfs/FileStore/tables/german_credit_eval-1.csv`.
Import the Jupyter notebook included in the Toolkit at `notebook/BringingInYourOwnModel.ipynb` into Databricks. Attach the notebook to your running cluster.

Update the value of the `all_data_file` variable in Cmd (6) to refer to the path where the `german_credit_eval.csv` file was uploaded in the previous step.

EXAMPLE: (After updating the cell)

```
all_data_file = "/dbfs/FileStore/tables/german_credit_eval-2.csv"
df = pd.read_csv(all_data_file)
df.head()
```

Update the final cell in the notebook to save the Scan Definition (YAML) file to the Databricks FileStore.
EXAMPLE: (After updating the cell)

```
scan_file = "/dbfs/FileStore/my_scan_definition.yaml"
with open(scan_file, "w") as f:
    scan.save(f)
print(f"Saved template to: {scan_file}")
```

Run the cells of the notebook.
(Optional) Save the scan reports to the Databricks FileStore when running the Certifai Scan, and download the reports to your local machine to visualize the scan results. Scan Reports cannot be visualized from within a Databricks notebook.
EXAMPLE: When creating the Certifai scan, set the `output_path` parameter to a string path starting with `/dbfs/FileStore/`.

```
scan = CertifaiScanBuilder.create('test_use_case',
                                  prediction_task=task,
                                  output_path='/dbfs/FileStore/scan_reports')
```
WARNING
Certain Databricks Runtimes do NOT support programmatically writing to the local file system. If your cluster runtime does not allow writing files, then avoid saving scan reports as files by setting `write_reports=False` as a parameter when programmatically running a scan in a notebook.
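Whether the runtime allows local writes can be probed before the scan runs. The helper below is a hypothetical sketch (not part of the Certifai API): it attempts to create a temporary file under the given path and reports success, which can inform whether to pass `write_reports=False`:

```python
import tempfile


def local_writes_allowed(path):
    """Heuristic check: return True if a file can be created under `path`.

    Hypothetical helper for deciding whether to pass write_reports=False
    on a runtime that blocks local file-system writes. A nonexistent or
    read-only path yields False.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False
```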