Version: 1.3.12

Fairness

What is fairness?

Fairness is one of the optional output reports for each scan in Certifai. Certifai scans analyze one or more models using the same dataset.

Fairness is a measure of the disparity in the amount of change required to alter the outcome across the categorical groups defined by a fairness grouping feature.

Fairness is a particular concern in AI systems because bias exhibited by predictive models can render models untrustworthy and unfair to one or more target groups.

For example, different models can exhibit any number of biases with respect to features such as gender, age, or education level.

Features targeted for fairness evaluation may be numeric (e.g. age) or non-numeric (e.g. marital status).

Example: In the use case of binary classification models that predict whether a loan applicant will be granted or denied a loan, Certifai users might want to determine which model shows a higher level of fairness among male, female, and self-assigning applicants. In this case, the user would run a scan choosing "sex" as one of the fairness grouping features.
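To make the idea of a grouping feature concrete, here is a small, hypothetical sketch in Python. The column names and values are illustrative assumptions about a loan dataset and are not part of the Certifai API; each distinct value of the chosen feature ("sex" in this example) defines one group that the scan compares.

    import pandas as pd

    # Hypothetical loan-application data; column names and values are assumptions.
    applications = pd.DataFrame({
        "sex": ["male", "female", "female", "male", "female"],
        "age": [34, 29, 51, 42, 38],
        "loan_granted": [1, 0, 1, 0, 1],
    })

    # Choosing "sex" as a fairness grouping feature means each distinct value of
    # that column ("male", "female", ...) becomes a group whose outcomes and
    # burdens the scan compares.
    for group, members in applications.groupby("sex"):
        print(group, "favorable-outcome rate:", members["loan_granted"].mean())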

What is Burden and how is it measured in Certifai?

Fairness Burden scores are derived by using counterfactuals to find the degree to which the values of an input data point must change to obtain a favorable prediction from the model.

The Burden is the average amount of change required for members of feature groups to achieve a preferred result (e.g. loan granted).

A lower number indicates that less change is required for that group to have a favorable outcome.

If the burden for one group is very high and the burden for another group is low, the model may be unfair to the group with the higher burden.

The objective is to have group burdens as close to one another as possible, which indicates that the model is treating members of the selected groups fairly.

It is important to note that burden is not comparable across features; burden scores are only designed to compare the groups within a single feature (e.g., you cannot compare fairness between age and marital status).
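As a rough illustration (a minimal sketch, not Certifai's implementation), the group burden described above can be computed as the average amount of change needed per group member, where members who already receive the favorable outcome contribute zero:

    import numpy as np

    def group_burden(counterfactual_distances, favorable_mask):
        """Average change required for a group to reach the favorable outcome.

        counterfactual_distances: per-instance amount of change needed to flip
            the prediction to the favorable outcome (illustrative units).
        favorable_mask: True where the model already predicts the favorable outcome.
        """
        distances = np.where(favorable_mask, 0.0, counterfactual_distances)
        return float(distances.mean())

    # Illustrative values only: compare the burdens of two groups of one feature.
    male_burden = group_burden(np.array([0.0, 0.4, 0.1]), np.array([True, False, False]))
    female_burden = group_burden(np.array([0.9, 0.7, 0.0]), np.array([False, False, True]))
    print(male_burden, female_burden)  # lower means less change is required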

How is fairness calculated?

Here are the brief steps for calculating fairness:

  1. Calculate the burden for each group.
  2. Compute the Gini index over the group burdens for each fairness feature.
  3. Convert the result to a fairness percentage, (1 - GiniIndex).

If all feature group burdens are equal, you get a 100% fair system; if only one group has a non-zero burden and the others have none, you get a fairness score of 0.
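A minimal sketch of steps 2 and 3, assuming a Gini index over the group burdens that is rescaled by n/(n-1) so that the two extremes described above (equal burdens, and a single group carrying all of the burden) map to 100% and 0% respectively; Certifai's exact normalization may differ:

    import numpy as np

    def fairness_score(burdens):
        """Fairness percentage for one feature, computed from its group burdens."""
        x = np.asarray(burdens, dtype=float)
        n = len(x)
        if n < 2 or x.sum() == 0:
            return 100.0  # no disparity to measure
        # Mean absolute difference between every pair of group burdens.
        mad = np.abs(x[:, None] - x[None, :]).mean()
        gini = mad / (2 * x.mean())   # standard Gini index, at most (n - 1) / n
        gini *= n / (n - 1)           # assumed rescaling so the worst case is exactly 1
        return 100.0 * (1 - gini)

    print(fairness_score([0.3, 0.3, 0.3]))  # equal burdens -> 100.0
    print(fairness_score([0.6, 0.0, 0.0]))  # one group bears all burden -> 0.0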

Example: In a binary classification problem where the positive class is the desired outcome:

  • If the instance is classified as positive, there is 0 burden.
  • If the instance is classified as negative, burden is a monotonic function, f(), of the distance to the decision boundary, as this distance indicates how much work needs to be done (i.e. the burden) to flip to the desired outcome.
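Continuing the sketch above, the per-instance burden for this binary case might look as follows; the choice of f() here is an illustrative assumption, the text only requires it to be monotonic in the distance:

    def instance_burden(predicted_positive, distance_to_boundary, f=lambda d: d):
        """Per-instance burden: 0 when the favorable class is already predicted,
        otherwise a monotonic function f() of the distance to the decision boundary."""
        return 0.0 if predicted_positive else float(f(distance_to_boundary))

    print(instance_burden(True, 0.8))   # favorable prediction, burden is 0
    print(instance_burden(False, 0.8))  # unfavorable, burden grows with distance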

What is the importance of fairness?

Based on fairness scores, the various Certifai users might take the following actions:

  • Data scientists might attempt to retrain any of the models with datasets that contain more equal distributions of age and status, and then compare the results again.

  • Business decision-makers might select the fairest model for production deployment of their loan application.

  • Compliance officers might reject the use of models whose overall scores or feature category differences don’t meet a certain threshold for fairness, since fairness is a critical aspect of trusted AI.

How is fairness displayed in Certifai?

The Fairness by Group visualizations are displayed in the Console:

  • An overall view that shows the average fairness score for each model.
  • A model view that shows a histogram of the fairness scores for each grouping feature in the model.

At the top of the page, the favorable outcome for the scan is identified (for example, "Loan Granted" for the Banking:Loan Approval sample use case).

Each of the fairness features defined for the scan is displayed in a separate graph. In the Banking:Loan Approval use case, the features are status and age.