Configuration Files
This section describes technical runtime configuration options for the scanner that can be set by toolkit users.
Scanner configuration file location and structure
The configuration file is located in the .certifai folder of your home directory and is named certifai_conf.ini. This file is auto-created when Certifai is first run.
NOTE
The certifai_conf.ini file (and the .certifai folder that contains it) may be designated as "hidden" by your operating system.
If this folder does not already exist, run the command:
certifai --help
This creates the folder and configuration file.
The certifai_conf.ini file is in the format used by the Python configparser module, which is similar to Windows INI files. You can edit it using your favorite text editor.
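The following minimal sketch shows what the configparser format means in practice. It reads certifai_conf.ini with Python's standard configparser module. The load_conf helper is hypothetical. The interpolation=None setting is an assumption about how to parse the file safely; it does not describe how Certifai itself reads the file. Commented-out options are not parsed, so only options you have uncommented appear.

```python
import configparser
from pathlib import Path

# Default location described above.
CONF_PATH = Path.home() / ".certifai" / "certifai_conf.ini"


def load_conf(text=None, path=CONF_PATH):
    """Parse certifai_conf.ini, or an in-memory string, with configparser.

    Hypothetical helper. interpolation=None (an assumption) keeps any
    literal '%' characters in values from being treated as placeholders.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if text is not None:
        parser.read_string(text)
    else:
        parser.read(path, encoding="utf-8")  # a missing file is silently skipped
    return parser


if __name__ == "__main__":
    conf = load_conf()
    for section in conf.sections():
        # Commented-out options are not parsed, so only uncommented ones print.
        print(section, dict(conf[section]))
```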
Certifai runs out of the box with a set of recommended configuration options. The configuration file is provided to support users who need to tune the scaling or behavior of the algorithm.
The file has two main sections that provide these advanced configuration options. Each section contains commented-out options that show the default values. To set an option, uncomment it and set its value.
General runtime options
General options apply globally to all scans and notebook or script analyses run on your machine. They are intended to reflect the available compute resources, and their defaults suit a typical high-end laptop or personal workstation.
To override these defaults, modify the parameters in the algorithm_options section of the ~/.certifai/certifai_conf.ini file. By default, this section looks like this:
[algorithm_options]
# workers_num: sets the number of worker processes that run the genetic algorithm
# workers_num = 3
Available algorithm options
workers_num
Specifies the number of worker processes used by the Certifai genetic algorithm (GA). These processes operate in parallel and make concurrent calls to the model. For models provided locally in notebooks (as opposed to via an HTTP endpoint), each worker process has its own copy of the model. The Certifai processing in each worker is single-threaded. However, if a worker calls a local in-process model, the model itself may spawn multiple threads per worker (this depends on the particular model being analyzed). A value of -1 disables multiprocessing in Certifai entirely. Default is 3.
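The sketch below illustrates the documented meaning of workers_num: a positive value requests that many worker processes, and -1 turns multiprocessing off. It is not Certifai's implementation. The score() function is a hypothetical stand-in for a model call, and multiprocessing.Pool is used only to illustrate parallel workers.

```python
import configparser
from multiprocessing import Pool

DEFAULT_WORKERS = 3  # documented default for workers_num


def workers_num(conf):
    """Read workers_num, falling back to the default if it is commented out."""
    return conf.getint("algorithm_options", "workers_num", fallback=DEFAULT_WORKERS)


def score(point):
    # Hypothetical stand-in for a call to the model being analyzed.
    return point * point


def run(points, n_workers):
    if n_workers == -1:
        # -1: no multiprocessing; all calls happen in the current process.
        return [score(p) for p in points]
    # Otherwise, each of the n_workers processes makes its own model calls.
    with Pool(processes=n_workers) as pool:
        return pool.map(score, points)
```

For example, run(points, workers_num(conf)) evaluates serially in the calling process when workers_num = -1 is set.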
Certifai Engine Hyperparameters
This section documents the Certifai hyperparameters exposed to users of the system. It is NOT recommended to make changes to the default hyperparameter settings in production usage without specific validation for your environment in a non-production setting.
How to set a hyperparameter value
This section describes the mechanisms you can use to override the default value of any hyperparameter.
Via the global Certifai configuration file
Values specified in the ~/.certifai/certifai_conf.ini file override the engine defaults and apply to all scans or notebooks run on your machine. For hyperparameters, the file contains a section that, by default, looks like this:
# Algorithm Hyper parameters
[algo_hyper_params]
# population: This is the size of the GA data point population evolved for each counterfactual. Use population = 4000 or higher for production; 500 is OK for testing
# population = 4000
# sampling_boundary: the boundary for the confidence interval of the proxy statistic used to
# determine early stopping. Interpretation is that early stopping will occur when there is a 95%
# confidence of being within this proportion of the true value (so 10% here)
# sampling_boundary = 0.10
# sampling_min_n: the minimum number of samples before we can early stop (needed before a
# statistical assessment can be made)
# sampling_min_n = 100
# num_counterfactuals: the number of counterfactuals that are returned for each counterfactual type.
# num_counterfactuals = 2
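The following is a minimal sketch of reading the [algo_hyper_params] section with configparser's typed getters. It falls back to the documented defaults when an option is still commented out. The read_hyper_params name is hypothetical, and the sketch covers only population, sampling_boundary, and sampling_min_n.

```python
import configparser

SECTION = "algo_hyper_params"


def read_hyper_params(conf):
    """Return typed hyperparameter values, using defaults for commented-out options."""
    return {
        "population": conf.getint(SECTION, "population", fallback=4000),
        "sampling_boundary": conf.getfloat(SECTION, "sampling_boundary", fallback=0.10),
        "sampling_min_n": conf.getint(SECTION, "sampling_min_n", fallback=100),
    }
```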
An override may be provided in this section for any engine hyperparameter.
Via the scan definition
When running scans via the Certifai scanner, local overrides for a particular analysis may be provided within the scan definition for that analysis. This is documented in the main Certifai scan definition reference guide.
Local overrides in the scan definition supersede the global overrides from the config file for a particular scan.
Via the CLI
Local overrides may also be provided when running a scan via the CLI. The Certifai Scanner supports a --config flag (-c for short) that allows configuration values to be specified for a given scan. The expected format for these overrides is section.key=value, where section and key refer to a section name and a key within that section of the configuration file.
For example, the following command changes the GA population:
certifai scan -f scan_definition.yaml --config algo_hyper_params.population=1000
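To make the section.key=value format concrete, here is a minimal sketch of how such strings could be split. The parse_override and parse_overrides helpers are hypothetical and are not part of the Certifai CLI. Values are kept as strings, as they would be when read from an INI file.

```python
from typing import Dict, Iterable, Tuple


def parse_override(spec: str) -> Tuple[str, str, str]:
    """Split 'section.key=value' into (section, key, value)."""
    target, sep, value = spec.partition("=")
    section, dot, key = target.partition(".")
    if not sep or not dot or not section or not key:
        raise ValueError(f"expected section.key=value, got {spec!r}")
    # Splitting on the first '=' keeps dots in the value, such as 0.05, intact.
    return section.strip(), key.strip(), value.strip()


def parse_overrides(specs: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group several --config values by section; later values win."""
    result: Dict[str, Dict[str, str]] = {}
    for spec in specs:
        section, key, value = parse_override(spec)
        result.setdefault(section, {})[key] = value
    return result
```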
These local overrides supersede the global overrides from the config file, but not any overrides within the scan definition.
Multiple overrides can be specified by using the --config flag multiple times, for example:
certifai scan -c algo_hyper_params.population=1000 -c algorithm_options.workers_num=4
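The precedence order runs from engine defaults to the configuration file, then CLI overrides, then the scan definition. This order can be modelled with collections.ChainMap, which returns the value from the first mapping that contains a key. The following is an illustrative sketch only. The name effective_params is hypothetical, and the defaults dictionary holds the documented default values for three hyperparameters.

```python
from collections import ChainMap

# Documented engine defaults for three of the hyperparameters.
ENGINE_DEFAULTS = {"population": 4000, "sampling_boundary": 0.1, "sampling_min_n": 100}


def effective_params(config_file, cli, scan_definition):
    """Resolve overrides; mappings listed first take precedence."""
    return dict(ChainMap(scan_definition, cli, config_file, ENGINE_DEFAULTS))


params = effective_params(
    config_file={"population": 2000},
    cli={"population": 1000, "sampling_min_n": 50},
    scan_definition={"sampling_min_n": 200},
)
# population comes from the CLI (1000), sampling_min_n from the scan
# definition (200), and sampling_boundary from the engine defaults (0.1).
```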
Via the API in Python notebook or script usage
Hyperparameters may be set on the CertifaiScanBuilder by setting a value for the hyper_parameter_overrides property. The value should be a dictionary of the form {param-name: value}. Refer to the Certifai API reference for more details.
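For example, you might build the dictionary for hyper_parameter_overrides as in the following sketch. The make_overrides helper and its name check are hypothetical conveniences, not part of the Certifai API. The sketch does not show how to obtain the CertifaiScanBuilder instance, so the property assignment is left as a comment.

```python
# Hyperparameter names documented in this section.
KNOWN_HYPERPARAMETERS = {"population", "sampling_boundary", "sampling_min_n", "num_counterfactuals"}


def make_overrides(**params):
    """Return a {param-name: value} dict, rejecting misspelled names early."""
    unknown = set(params) - KNOWN_HYPERPARAMETERS
    if unknown:
        raise ValueError(f"unknown hyperparameters: {sorted(unknown)}")
    return dict(params)


overrides = make_overrides(population=1000, sampling_boundary=0.05)
# Hypothetical usage; see the Certifai API reference for creating the builder:
# scan_builder.hyper_parameter_overrides = overrides
```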
Available Hyperparameters
population
Specifies the size of the evolved data point population used by the Certifai GA. Default is 4000. Larger values increase the load on the model(s) being analyzed and typically increase runtime. A larger population may provide better convergence toward the optimal counterfactuals in higher-dimensional cases.
sampling_boundary
When estimating population statistics (typically the mean burden for either the entire dataset or a sub-population of it), this value sets the 95% confidence error below which early stopping occurs. Default value is 0.1. Smaller values lead to more precise estimates (and typically tighter overall confidence bounds on output scores) at the cost of increased runtime.
sampling_min_n
Minimum sub-sample size evaluated before error statistics are measured. Early stopping never occurs before counterfactuals have been evaluated for this many data points. Default value is 100.
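A simple stopping rule can illustrate how sampling_boundary and sampling_min_n work together. This sketch assumes a normal-approximation 95% confidence interval for a mean. It treats sampling_boundary as a bound on the interval's half-width relative to the mean, following the wording of the configuration file comments. Certifai's actual proxy statistic and interval may differ. The function name should_stop and the burden values are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def should_stop(samples, sampling_boundary=0.1, sampling_min_n=100):
    """Return True once the estimated mean is precise enough to stop early."""
    n = len(samples)
    if n < sampling_min_n:
        return False  # too few samples for a statistical assessment
    mean = statistics.mean(samples)
    if mean == 0:
        return False  # a relative error is undefined for a zero mean
    half_width = Z_95 * statistics.stdev(samples) / math.sqrt(n)
    return half_width / abs(mean) <= sampling_boundary


burdens = [1.0, 2.0] * 50  # 100 illustrative per-point values
print(should_stop(burdens))                          # True
print(should_stop(burdens, sampling_boundary=0.05))  # False
```

Here the relative half-width is about 0.066. It passes the default boundary of 0.1 but not a tighter boundary of 0.05, which matches the documented trade-off of precision against runtime.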
num_counterfactuals
Applies only to explanations and governs the maximum number of counterfactuals derived for each data point being explained. Default value is 1. Higher values provide multiple, diverse explanations (they are optimized for diversity) at a low additional computation cost.
Log configuration file location and structure
The log configuration file is located in the .certifai folder of your home directory and is named log_conf.ini.
If this folder does not already exist, run the command:
certifai --help
This creates the folder and log configuration file.
The log_conf.ini file is in the configuration file format used by the Python logging module (the format read by logging.config.fileConfig).
The logging configuration looks similar to:
[loggers]
keys=root
[handlers]
keys=file_handler,stream_handler
[formatters]
keys=log_formatter
[logger_root]
level=INFO
handlers=file_handler,stream_handler
[handler_file_handler]
class=handlers.RotatingFileHandler
formatter=log_formatter
args=(os.path.join(os.path.expanduser("~"), '.certifai', 'certifai.log'), 'a', 10000000, 100)
[formatter_log_formatter]
format=%(asctime)s %(name)-6s %(levelname)-8s %(message)s
[handler_stream_handler]
class=StreamHandler
level=WARN
formatter=log_formatter
args=(sys.stderr,)
You can edit this file using your favorite text editor. Editing fields other than level is not recommended. Valid values for level are ERROR, WARN, INFO, and DEBUG.
To change the level of detail sent to the terminal, edit the level field under handler_stream_handler. For example, the following causes detailed diagnostic messages to be output to the terminal:
[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=log_formatter
args=(sys.stderr,)
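If you prefer to change the level from a script rather than a text editor, the following sketch does so with Python's configparser. The set_log_level helper is hypothetical. Setting interpolation=None keeps the %(...)s placeholders in the formatter section from being treated as interpolation syntax. ConfigParser.write() does not preserve comments, so back up the file first if it contains any.

```python
import configparser
from pathlib import Path

LOG_CONF = Path.home() / ".certifai" / "log_conf.ini"
VALID_LEVELS = {"ERROR", "WARN", "INFO", "DEBUG"}


def set_log_level(section, level, path=LOG_CONF):
    """Set the 'level' field of one section in log_conf.ini (hypothetical helper)."""
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names exactly as written
    with open(path, encoding="utf-8") as f:
        parser.read_file(f)
    parser[section]["level"] = level
    with open(path, "w", encoding="utf-8") as f:
        parser.write(f)  # note: comments in the original file are not kept
```

For example, set_log_level("handler_stream_handler", "DEBUG") makes the same change as the manual edit described above.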
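To change the level of detail used in the .certifai/certifai.log file, edit the level field under logger_root. In the standard Python logging model, the root logger filters messages before passing them to any handler, so this level also limits what reaches the terminal. For example, DEBUG messages are output to the terminal only if level under logger_root is also set to DEBUG.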
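The following standard-library sketch shows why the logger_root level also matters for the terminal. It loads a trimmed-down, hypothetical variant of log_conf.ini with logging.config.fileConfig, the loader for this file format. The variant has a stream handler only, so no log file is created.

```python
import io
import logging
import logging.config

# Hypothetical trimmed-down variant of log_conf.ini: stream handler only.
CONF = """\
[loggers]
keys=root

[handlers]
keys=stream_handler

[formatters]
keys=log_formatter

[logger_root]
level=INFO
handlers=stream_handler

[handler_stream_handler]
class=StreamHandler
level=DEBUG
formatter=log_formatter
args=(sys.stderr,)

[formatter_log_formatter]
format=%(asctime)s %(name)-6s %(levelname)-8s %(message)s
"""

logging.config.fileConfig(io.StringIO(CONF))
root = logging.getLogger()
handler = root.handlers[0]

# The stream handler itself would accept DEBUG records...
print(logging.getLevelName(handler.level))  # DEBUG
# ...but the root logger, at INFO, discards them before they reach it.
print(root.isEnabledFor(logging.DEBUG))     # False
```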