Version: 1.3.13

Configuration Files

This section describes technical runtime configuration options for the scanner that can be set by toolkit users.

Scanner configuration file location and structure

The configuration file is located in the .certifai folder of your home directory, and is named certifai_conf.ini. This file is auto-created when Certifai is first run.

If this folder does not already exist, run the command:

certifai --help

This creates the folder and configuration file.

The certifai_conf.ini file is in the format used by the Python configparser module, which is similar to Windows ini files. You can edit it using your favorite text editor.

Certifai runs out of the box with a set of recommended configuration options. The configuration file is provided to support users who need to tune the scaling or behavior of the algorithm.

The file has two main sections (algorithm_options and algo_hyper_params, described below) that provide these advanced configuration options. Each section contains commented-out options that show the default values. To set an option, uncomment it and set its value.
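Uncommenting matters because configparser skips lines that start with `#`. A commented-out option is therefore simply absent, and the default applies. The minimal sketch below uses only the standard library to show this. The helper name `read_workers_num` is hypothetical and not part of Certifai, and the fallback value is the documented default for workers_num.

```python
import configparser
from pathlib import Path

# Default location described above: ~/.certifai/certifai_conf.ini
CONF_PATH = Path.home() / ".certifai" / "certifai_conf.ini"

# Documented default for workers_num, used when the option is commented out.
DEFAULT_WORKERS_NUM = 3


def read_workers_num(text: str) -> int:
    """Hypothetical helper (not part of Certifai): effective workers_num for a config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Lines starting with '#' are comments, so a commented-out option is missing
    # and getint() returns the fallback instead.
    return parser.getint("algorithm_options", "workers_num", fallback=DEFAULT_WORKERS_NUM)


default_text = """\
[algorithm_options]
# workers_num: sets the number of worker processes that run the genetic algorithm
# workers_num = 3
"""

# The same section after uncommenting the option and setting a new value.
edited_text = """\
[algorithm_options]
# workers_num: sets the number of worker processes that run the genetic algorithm
workers_num = 4
"""

print(read_workers_num(default_text))
print(read_workers_num(edited_text))
```

To inspect your own file, you could call `read_workers_num(CONF_PATH.read_text())`.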

General runtime options

General options apply globally to all scans and notebook or script analyses run on your machine. They are intended to reflect your available compute resources, and their defaults suit a typical high-end laptop or personal workstation.

To override these defaults, modify the parameters in the algorithm_options section of the ~/.certifai/certifai_conf.ini file. By default, this section looks like this:

[algorithm_options]
# workers_num: sets the number of worker processes that run the genetic algorithm
# workers_num = 3

Available algorithm options

workers_num

Specifies the number of worker processes used by the Certifai genetic algorithm (GA). These processes run in parallel and make concurrent calls to the model. For models provided locally in notebooks (as opposed to via an HTTP endpoint), each worker process has its own copy of the model. Certifai's processing within each worker is single-threaded, but a local in-process model may spawn multiple threads per worker, depending on the particular model being analyzed. A value of -1 disables Certifai's use of multiprocessing entirely. Default is 3.

Certifai Engine Hyperparameters

This section documents the Certifai hyperparameters exposed to users of the system. It is NOT recommended to change the default hyperparameter settings for production use without first validating the changes for your environment in a non-production setting.

How to set a hyperparameter value

This section describes the mechanisms you can use to override the default value of one or more hyperparameters.

Via the global Certifai configuration file

Values specified in the ~/.certifai/certifai_conf.ini file override the engine defaults and apply to all scans or notebooks run on your machine. By default, the hyperparameter section of this file looks like this:

# Algorithm Hyper parameters
[algo_hyper_params]
# population: This is the size of the GA data point population evolved for each counterfactual. Use population = 4000 or higher for production; 500 is OK for testing
# population = 4000
# sampling_boundary: the boundary for the confidence interval of the proxy statistic used to
# determine early stopping. Interpretation is that early stopping will occur when there is a 95%
# confidence of being within this proportion of the true value (so 10% here)
# sampling_boundary = 0.10
# sampling_min_n: the minimum number of samples before we can early stop (needed before a
# statistical assessment can be made)
# sampling_min_n = 100
# num_counterfactuals: the number of counterfactuals that are returned for each counterfactual type.
# num_counterfactuals = 2

An override may be provided in this section for any engine hyperparameter.

Via the scan definition

When running scans via the Certifai scanner, local overrides for a particular analysis may be provided within the scan definition for that analysis. This is documented in the main Certifai scan definition reference guide.

Local overrides in the scan definition supersede the global overrides from the config file for a particular scan.

Via the CLI

Local overrides may also be provided when running a scan via the CLI. The Certifai Scanner supports a --config flag (-c for short) that allows configuration values to be specified for a given scan. The expected format for these overrides is section.key=value, where section and key are a section name and an option name from the configuration file.

For example, the GA population can be changed with:

certifai scan -f scan_definition.yaml --config algo_hyper_params.population=1000

These local overrides supersede the global overrides from the config file, but not any overrides within the scan definition.
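One way to picture these precedence rules is as a layered merge in which later layers win. The sketch below makes two assumptions: each layer is a flat `{(section, key): value}` dictionary, and the function name `effective_settings` is hypothetical. Neither reflects Certifai's internal implementation.

```python
from typing import Any, Dict, Tuple

Setting = Tuple[str, str]  # (section, key), e.g. ("algo_hyper_params", "population")


def effective_settings(
    engine_defaults: Dict[Setting, Any],
    config_file: Dict[Setting, Any],
    cli_overrides: Dict[Setting, Any],
    scan_definition: Dict[Setting, Any],
) -> Dict[Setting, Any]:
    """Hypothetical helper: merge override layers from lowest to highest precedence."""
    merged: Dict[Setting, Any] = {}
    # Later updates overwrite earlier ones, so this order encodes the precedence:
    # engine defaults < config file < CLI --config < scan definition.
    for layer in (engine_defaults, config_file, cli_overrides, scan_definition):
        merged.update(layer)
    return merged


defaults = {("algo_hyper_params", "population"): 4000,
            ("algo_hyper_params", "sampling_min_n"): 100}
from_file = {("algo_hyper_params", "population"): 2000}
from_cli = {("algo_hyper_params", "population"): 1000,
            ("algo_hyper_params", "sampling_min_n"): 200}
from_scan = {("algo_hyper_params", "sampling_min_n"): 150}

result = effective_settings(defaults, from_file, from_cli, from_scan)
print(result[("algo_hyper_params", "population")])      # 1000: CLI beats config file
print(result[("algo_hyper_params", "sampling_min_n")])  # 150: scan definition beats CLI
```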

Multiple overrides can be specified by using the --config flag multiple times, for example:

certifai scan -c algo_hyper_params.population=1000 -c algorithm_options.workers_num=4
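The section.key=value format is simple to assemble and check programmatically. The sketch below uses a hypothetical `parse_config_overrides` helper. It splits each override on the first `=` and then on the first `.`, and builds the argument list for the multi-override command shown above. It only illustrates the documented format and does not reproduce how the scanner itself parses or converts these values, so values remain strings here.

```python
from typing import Dict, List


def parse_config_overrides(items: List[str]) -> Dict[str, Dict[str, str]]:
    """Hypothetical helper: turn ["section.key=value", ...] into {section: {key: value}}."""
    parsed: Dict[str, Dict[str, str]] = {}
    for item in items:
        name, sep, value = item.partition("=")
        section, dot, key = name.partition(".")
        if not sep or not dot or not section or not key:
            raise ValueError(f"expected section.key=value, got {item!r}")
        parsed.setdefault(section, {})[key] = value
    return parsed


overrides = ["algo_hyper_params.population=1000", "algorithm_options.workers_num=4"]

# Each override is passed with its own -c (--config) flag.
argv = ["certifai", "scan"]
for item in overrides:
    argv += ["-c", item]

print(argv)
print(parse_config_overrides(overrides))
```

On a machine with Certifai installed, a list like `argv` could be passed to `subprocess.run(argv, check=True)`.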

Via the API in Python notebook or script usage

Hyperparameters may be set on the CertifaiScanBuilder by setting a value for the hyper_parameter_overrides property. The value should be a dictionary of the form: {param-name: value}. Refer to the Certifai API reference for more details.
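The override value is a plain dictionary keyed by hyperparameter name. The sketch below builds such a dictionary and checks its keys against the hyperparameters documented in this section before the dictionary is handed to the builder. Both `KNOWN_HYPERPARAMETERS` and `make_overrides` are hypothetical. Creating a `CertifaiScanBuilder` instance is not shown, so the final assignment is left as a comment; see the Certifai API reference for the actual builder usage.

```python
from typing import Any, Dict

# Hypothetical validation list: the hyperparameters documented in this section.
KNOWN_HYPERPARAMETERS = {"population", "sampling_boundary", "sampling_min_n", "num_counterfactuals"}


def make_overrides(**params: Any) -> Dict[str, Any]:
    """Hypothetical helper: return a {param-name: value} dict, rejecting undocumented names."""
    unknown = set(params) - KNOWN_HYPERPARAMETERS
    if unknown:
        raise KeyError(f"unknown hyperparameter(s): {sorted(unknown)}")
    return dict(params)


# Example values for a quick test run (population = 500 is described as OK for testing).
overrides = make_overrides(population=500, sampling_boundary=0.2)
print(overrides)

# With an existing builder instance (construction not shown here):
# builder.hyper_parameter_overrides = overrides
```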

Available hyperparameters

population

Specifies the size of the evolved data point population used by the Certifai GA. Default is 4000. Larger values place more load on the model(s) being analyzed and typically increase runtime. A larger population may provide better convergence towards the optimal counterfactuals in higher-dimensional cases.

sampling_boundary

When estimating population statistics (typically the mean burden for either the entire dataset or a sub-population of it), this value sets the error bound, at 95% confidence, below which early stopping occurs. The bound is expressed as a proportion of the true value, so the default value of 0.1 corresponds to 10%. Smaller values lead to more precise estimation (and typically tighter overall confidence bounds on output scores) at the cost of increased runtime.

sampling_min_n

The minimum sub-sample size evaluated before error statistics are measured. Early stopping never occurs before counterfactuals have been evaluated for this many data points. Default value is 100.
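The combined effect of sampling_boundary and sampling_min_n can be illustrated with a normal-approximation confidence interval for a mean. The sketch below stops once at least `sampling_min_n` values have been seen and the 95% confidence half-width is within `sampling_boundary` times the current mean. This relative bound follows the "proportion of the true value" wording in the configuration file comments. The sketch illustrates the idea under stated assumptions and is not Certifai's actual estimator. The burden values are synthetic and `should_stop_early` is a hypothetical name.

```python
import math
import random
import statistics
from typing import Sequence

Z_95 = 1.96  # two-sided 95% quantile of the normal distribution (normal approximation assumed)


def should_stop_early(values: Sequence[float],
                      sampling_boundary: float = 0.10,
                      sampling_min_n: int = 100) -> bool:
    """Hypothetical stopping rule: relative 95% CI half-width <= sampling_boundary."""
    n = len(values)
    if n < max(sampling_min_n, 2):
        return False  # never stop before sampling_min_n points (stdev also needs 2)
    mean = statistics.fmean(values)
    if mean == 0:
        return False  # a relative bound is undefined for a zero mean
    half_width = Z_95 * statistics.stdev(values) / math.sqrt(n)
    return half_width <= sampling_boundary * abs(mean)


# Synthetic per-data-point "burden" values, one per evaluated data point.
rng = random.Random(0)
burdens = [rng.expovariate(1.0) for _ in range(2000)]

for n in range(1, len(burdens) + 1):
    if should_stop_early(burdens[:n]):
        print(f"stopped after {n} data points")
        break
```

Lowering `sampling_boundary` in this sketch increases the number of points needed before stopping, which mirrors the runtime trade-off described above.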

num_counterfactuals

Applies only to explanations, and governs the maximum number of counterfactuals derived for each data point being explained. Default value is 1. Higher values provide more diverse explanations (the counterfactuals are optimized for diversity) at a low additional computational cost.

Log configuration file location and structure

The log configuration file is located in the .certifai folder of your home directory, and is named log_conf.ini.

If this folder does not already exist, run the command:

certifai --help

This creates the folder and log configuration file.

The log_conf.ini file is in the configuration file format used by the Python logging module (logging.config.fileConfig).

The logging configuration looks similar to:

[loggers]
keys=root
[handlers]
keys=file_handler,stream_handler
[formatters]
keys=log_formatter
[logger_root]
level=INFO
handlers=file_handler,stream_handler
[handler_file_handler]
class=handlers.RotatingFileHandler
formatter=log_formatter
args=(os.path.join(os.path.expanduser("~"), '.certifai', 'certifai.log'), 'a', 10000000, 100)
[formatter_log_formatter]
format=%(asctime)s %(name)-6s %(levelname)-8s %(message)s
[handler_stream_handler]
class=StreamHandler
level=WARN
formatter=log_formatter
args=(sys.stderr,)

You can edit this file using your favorite text editor. Editing fields other than level is not recommended. Valid values for level are ERROR, WARN, INFO and DEBUG.

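To change the level of detail that is sent to the terminal, edit the level field under handler_stream_handler. For example, the following causes detailed diagnostic messages to be output to the terminal, provided the level under logger_root is also set to DEBUG. The root logger discards messages below its own level (INFO by default) before any handler sees them.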

[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=log_formatter
args=(sys.stderr,)

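To change the level of detail written to the ~/.certifai/certifai.log file, edit the level field under logger_root. The file handler has no level of its own, so it records every message that passes the root logger's level.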
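The standard logging module can be used to confirm how the two level fields interact. The sketch below mirrors the structure of the default configuration, with in-memory streams standing in for certifai.log and the terminal so that the example has no side effects. The root logger's level is applied first, and the stream handler's level then filters further. The logger name `certifai.demo` and the helper `emit_samples` are hypothetical.

```python
import io
import logging


def emit_samples(root_level: str, stream_level: str):
    """Hypothetical helper: return (file_output, terminal_output) for one message per level."""
    file_buf, term_buf = io.StringIO(), io.StringIO()
    fmt = logging.Formatter("%(levelname)s %(message)s")

    # Stands in for the RotatingFileHandler: like the default config, it sets no level.
    file_handler = logging.StreamHandler(file_buf)
    file_handler.setFormatter(fmt)
    # Stands in for the stderr StreamHandler, which has its own level.
    term_handler = logging.StreamHandler(term_buf)
    term_handler.setLevel(stream_level)
    term_handler.setFormatter(fmt)

    root = logging.getLogger()
    saved_level, saved_handlers = root.level, root.handlers[:]
    root.handlers = [file_handler, term_handler]
    root.setLevel(root_level)
    try:
        log = logging.getLogger("certifai.demo")  # hypothetical child logger (level NOTSET)
        log.debug("debug detail")
        log.info("progress")
        log.warning("warning")
    finally:
        # Restore the previous root logger state.
        root.handlers = saved_handlers
        root.setLevel(saved_level)
    return file_buf.getvalue(), term_buf.getvalue()


for root_level, stream_level in [("INFO", "WARN"), ("INFO", "DEBUG"), ("DEBUG", "DEBUG")]:
    to_file, to_terminal = emit_samples(root_level, stream_level)
    print(f"logger_root={root_level}, handler_stream_handler={stream_level}")
    print("  certifai.log:", to_file.strip().splitlines())
    print("  terminal:    ", to_terminal.strip().splitlines())
```

With logger_root at INFO, setting the stream handler to DEBUG adds INFO messages to the terminal but still no DEBUG messages. DEBUG output appears only when logger_root is also set to DEBUG.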