NOTE: An earlier draft of this document circulated in December 2017, if you’re looking for that version or have cited it we maintain a copy here. Enormous thanks to everyone who commented on or contributed to that document. This document is a work in progress and comments are highly appreciated. Also, for now it looks best on a computer. There isn’t really any good reason for that and we hope to improve the experience on smaller screens soon. —Shira (sam942@mail.harvard.edu, @shiraamitchell) and Jackie (dont@email.me, @hatfinisher)

# Introduction

How do you know when you’re being fair? How do you know when you’re being treated unfairly? To answer these questions we can look to ethical, philosophical, political, and legal conversations, where fairness has been articulated and argued for centuries, often in a highly formalized setting. In this document we are concerned with new entries to the topic of fairness by statisticians, machine learning researchers, and other quantitative experts. We refer to this as the quantitative fairness conversation. We hope to survey a few of the available quantitative articulations of fairness, and to illustrate some of their tensions and limitations.

### Fairness and Civil Rights

The 20th century was marked by a number of legal and legislative attempts to rectify injustice, inequality, and polarization along social axes such as race, gender, and class. These projects inevitably faced issues in articulating and formalizing principles of fairness and we would like to single out an example in a US political context (we largely work in this context without comment and regard it as a serious omission).

The Civil Rights Act of 1964 and the Fair Housing Act of 1968 outlined theories of procedural unfairness under which a person may bring suit. The exact jurisprudential details are beyond our expertise, but it is helpful to focus on Title VII of the Civil Rights Act. Title VII has been interpreted to provide recourse against employers in two settings [Barocas and Selbst]:

• Disparate treatment: the formal use of group membership or intent to treat (members of) groups differently.
• Disparate impact: facially neutral procedures that have a disproportionately adverse outcome in some groups.

To give an example of the concepts in the fair housing setting, imagine two cases of redlining. Suppose a lender denies loans to black applicants on the basis of their race. This would qualify as disparate treatment. Now consider a lender with no intention to consider race, but who uses neighborhood average income as a signal of creditworthiness. If black applicants live in comparatively poor neighborhoods, fewer black applicants receive loans. This might be an example of disparate impact.

Since we are not lawyers, we prefer not to say too much about the legal interpretations of these criteria. In some ways, what matters is that there are several: the relevant principle of fairness may be an ensemble of more fundamental principles. This raises a number of questions such as:

• Incompleteness: is my ensemble complete? Perhaps there’s a notion I’ve missed.
• Error: does my ensemble contain errors? Perhaps one of my notions is completely wrong in certain situations where I never thought to apply it.
• Unexpected Consequences: does my ensemble have unexpected entailments? Sometimes a compact set of assumptions can lead to surprising and counterintuitive results.
• Mutual Incompatibility: is it possible to satisfy them all? Perhaps there is a contradiction between my notions of fairness.
• Evaluation of Tradeoffs: in situations where not all notions of fairness can be satisfied, what’s the right way of trading off violations of each?
• Communication: is there any compact way of communicating my notions of fairness?

In the emerging field of quantitative fairness, we will meet all of these concerns again. We hope that our work can be part of a conversation that makes these concerns explicit.

Before we proceed to our exposition it may be helpful to explicitly address the most straightforward definition of fairness in a decision procedure:

• Fairness Through Unawareness: the decision procedure is not allowed to use group membership.

This principle is routinely applied. For example, there are many situations in housing or employment where certain information is regarded as inadmissible to collect or consider. But as we have seen in the example of redlining, it provides no guarantees against disparate impact. Moreover, it builds in no protection against disparate treatment via proxy variables.
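To make the proxy concern concrete, here is a minimal simulation of the redlining example (all numbers invented for illustration): the lender’s rule never sees race, yet approval rates differ by race because neighborhood income acts as a proxy.

```python
import random

random.seed(0)

# Hypothetical redlining sketch: the decision rule never sees race,
# but neighborhood income (a proxy correlated with race) drives it.
# All numbers are invented for illustration.
applicants = []
for _ in range(10_000):
    group = random.choice(["black", "non-black"])
    mean_income = 40_000 if group == "black" else 60_000  # assumed income gap
    income = random.gauss(mean_income, 15_000)
    applicants.append((group, income))

def approval_rate(group):
    # The "unaware" rule: approve on income alone, never on group.
    decisions = [income > 50_000 for g, income in applicants if g == group]
    return sum(decisions) / len(decisions)

print(approval_rate("black"))      # substantially lower
print(approval_rate("non-black"))  # substantially higher
```

The decision rule satisfies fairness through unawareness by construction, yet produces a large gap in approval rates.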

Below, we survey quantitative definitions of fairness in the statistics and machine learning literature, organized as:

• assertions of metric parity
• assertions of conditional independence
• assertions of the absence of causal relationships

After this survey of definitions we discuss some of the ramifications of the zoo of conflicting definitions, and proceed to a number of applications and directions. Our approach is necessarily slanted by the authors’ background and expertise, noticeably omitting much discussion of either the social science literature, political philosophy, or activist conversations. It is our hope that this survey can animate the quantitative reader to take an interest in those conversations, which are in no way eclipsed by mathematical developments.

# Setup

We are interested in evaluating both human and automated procedures (“algorithms”) according to several definitions of fairness, to be introduced below. We narrow our discussion to procedures that are motivated by some unknown, future ground truth. Consider a procedure that is applied to a population of individuals, e.g. loan applicants or criminal defendants. Each of these individuals has a binary true status $Y$, e.g. loan default or criminal re-offense. When we apply the procedure to an individual, their true status is unknown. The procedure may output three different things:

• Decision: a binary decision $d$ that attempts to approximate $Y$.
• Probability: a real number $S$ that can be interpreted as an approximation to the probability that $Y=1$.
• Score: a real number $S$ without necessarily having any probabilistic interpretation, with some approximately monotone but unspecified relationship to probability.

As an example, if an individual is being considered for a loan, a decision tells you to approve or deny the loan, a probability might estimate the individual’s probability of default, while a score might be their credit score. Another example of a score is the fitted values from a logistic regression that we use for ranking without actually believing it as a model.

We can move among these three types of output: thresholding a probability (or a score) yields a decision, and calibrating a score yields a probability.
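A minimal sketch of these conversions, with made-up calibration parameters:

```python
import math

# Hypothetical conversions among the three output types.
def decision_from_probability(p, threshold=0.5):
    """Threshold a probability (or any monotone score) to get a decision."""
    return 1 if p >= threshold else 0

def probability_from_score(score, a=-6.0, b=0.01):
    """Map a raw score to (0, 1) with a logistic curve. In practice a and b
    would come from a calibration procedure; these values are made up."""
    return 1 / (1 + math.exp(-(a + b * score)))

p = probability_from_score(700)   # e.g. a credit score
d = decision_from_probability(p)
print(p, d)
```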

For each individual, we can define the variables:

• $Y$ - true status. When we apply the procedure to an individual, their true status is unknown, but the procedure is hopefully based on the true statuses of other individuals from the same population (training data). It is also known as the target variable, dependent variable, outcome of interest, class label, or ground truth.
• $A$ - group membership, where the groups are based on legal or political concerns. In our examples, this is race, gender, or class.
• $X$ - covariates. These are the variables available to the procedure, also known as features, independent variables, or inputs.
• $d$ - decision. A binary decision output.
• $S$ - score. A score (or probability) output.

# Definitions: Metric Parities

In machine learning and statistics, a classifier or model is a procedure that attempts to approximate the true status $Y$ based on past data (training data). The first set of fairness definitions we will consider are those that assert equal metrics of the classifier across groups. Different domains might emphasize different metrics. For example, in a medical test we might be highly concerned with false negative diagnosis (in which case a dangerous condition could go untreated), and with false positive diagnosis (which could result in an unnecessary and dangerous medical intervention). The plurality of metrics gives rise to a plurality of fairness definitions.

## Fairness for Decisions

For binary decision procedures, we can summarize a procedure with the confusion matrix, which illustrates match and mismatch between decision ($d$) and true status ($Y$). Its margins are fractions of data, expressed as probabilities. For example, $P[Y=1]$ is the fraction of individuals in the data that have positive status, and $P[Y=1~\vert~d=1]$ is the fraction of those with a positive decision who are actually positive.

### Confusion Matrix

|  | Positive Status $Y = 1$ | Negative Status $Y = 0$ | Row margins |
|---|---|---|---|
| Positive Decision $d = 1$ | True Positive (TP) | False Positive (FP) | Positive Predictive Value (PPV), aka precision: $P[Y = 1 \vert d = 1]$ <br> False Discovery Rate (FDR): $P[Y = 0 \vert d = 1]$ |
| Negative Decision $d = 0$ | False Negative (FN) | True Negative (TN) | False Omission Rate (FOR): $P[Y = 1 \vert d = 0]$ <br> Negative Predictive Value (NPV): $P[Y = 0 \vert d = 0]$ |
| Column margins | True Positive Rate (TPR), aka recall, aka sensitivity: $P[d = 1 \vert Y = 1]$ <br> False Negative Rate (FNR): $P[d = 0 \vert Y = 1]$ | False Positive Rate (FPR): $P[d = 1 \vert Y = 0]$ <br> True Negative Rate (TNR), aka specificity: $P[d = 0 \vert Y = 0]$ | Prevalence ("base rate"): $P[Y = 1]$ <br> Positive Decision Rate: $P[d = 1]$ <br> Accuracy: $P[d = Y]$ |

For any box in the confusion matrix involving the decision $d$, we can define fairness as equality of that quantity across groups. We note that pairs of metrics that sum to 1 (e.g. false negative rate and true positive rate) yield equivalent definitions of fairness.

Here we consider three definitions based on equal classifier metrics across groups:

• Equal False Negative Rates: the fraction of positives which are marked negative in each group agree.
• Equal False Positive Rates: the fraction of negatives which are marked positive in each group agree.
• Equal Positive Predictive Values: the fraction of those marked positive which are actually positive in each group agree.

It is useful to take a moment to consider that these definitions are genuinely different, both mathematically and as principles of fairness.

We can also consider a definition that is not based on how well we approximate true status $Y$:

• Statistical Parity (equal positive decision rates): the fraction marked positive in each group should agree.
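The decision-based definitions above can be checked mechanically from labeled data. Here is a minimal sketch (the record format and toy data are invented); note that in the toy data the two groups satisfy statistical parity while having different false negative rates, illustrating that the definitions are genuinely different.

```python
from collections import defaultdict

# A minimal sketch of group-wise confusion-matrix metrics. The record
# format (group, true_status, decision) and the toy data are invented.
# Assumes each group has positives, negatives, and positive decisions,
# so no division-by-zero guards are included.
def group_metrics(records):
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, y, d in records:
        key = ("tp" if y else "fp") if d else ("fn" if y else "tn")
        counts[group][key] += 1
    metrics = {}
    for group, c in counts.items():
        metrics[group] = {
            "fnr": c["fn"] / (c["fn"] + c["tp"]),               # P[d=0 | Y=1]
            "fpr": c["fp"] / (c["fp"] + c["tn"]),               # P[d=1 | Y=0]
            "ppv": c["tp"] / (c["tp"] + c["fp"]),               # P[Y=1 | d=1]
            "positive_rate": (c["tp"] + c["fp"]) / sum(c.values()),  # P[d=1]
        }
    return metrics

toy = [("a", 1, 1), ("a", 1, 0), ("a", 0, 1), ("a", 0, 0),
       ("b", 1, 1), ("b", 1, 1), ("b", 0, 0), ("b", 0, 0)]
print(group_metrics(toy))
```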

## Fairness for Scores

For score outputs, which we’ll have much less to say about, we can consider initial definitions of fairness based on equal metrics across groups; natural examples include equal average scores among those with positive status and equal average scores among those with negative status (balance for the positive and negative classes [Kleinberg et al.]).

As with binary decisions, there is no single metric. And as with binary decisions, definitions based on equality of metrics are different, both mathematically and as principles of fairness.

## Data: Sampling and Measurement

Suppose we wish to evaluate the fairness of a procedure for the population to which it is applied (e.g. defendants in a particular jurisdiction). Suppose we have access to a sample of data from this population. We should ask two questions:

• is the sample representative of the population?
• are the measured values the true values?

We now describe how the very general statistical considerations of sampling and measurement can compromise assessments of fairness.

### Sampling

To generalize from a sample to a larger population, we assume that those in the sample are similar to those not in the sample. Suppose we are interested in hiring rates in a population of people with mostly low incomes. But our sample consists only of people with higher incomes. If income is related to hiring, the hiring rate in our sample will be a biased estimate of the hiring rate in the population. We can attempt to adjust for this issue in many ways but cannot ignore it (see, for example, Chapter 8 of BDA3).
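A toy simulation of this hiring example (all rates invented): the population is mostly low-income, the sample contains only high earners, and the sample hiring rate badly overestimates the population rate.

```python
import random

random.seed(1)

# Toy population: mostly low incomes; hiring probability depends on income.
# All numbers are invented for illustration.
population = []
for _ in range(50_000):
    income = random.gauss(30_000, 10_000)
    p_hire = 0.6 if income >= 50_000 else 0.2  # assumed income-hiring link
    population.append((income, random.random() < p_hire))

def hiring_rate(people):
    return sum(hired for _, hired in people) / len(people)

sample = [p for p in population if p[0] >= 50_000]  # high earners only

print(hiring_rate(population))
print(hiring_rate(sample))  # badly overestimates the population rate
```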

### Measurement

Similarly compromising are issues of measurement. Suppose we wish to evaluate the false positive rate of a criminal justice risk assessment tool to predict offense. This requires the true status, offense. Suppose that instead of offense, our data only include arrests. We can regard arrest as a noisy measurement of offense. In doing so, we would need to take seriously not only the difference between arrest and offense, but the fact that this difference may be much worse for some groups. For example, if black people are arrested more often for crimes they did not commit, our evaluation of fairness is severely compromised [Lum].
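A toy simulation of this measurement problem (all rates invented): if a tool’s decisions track arrests, its false positive rate evaluated against arrest looks perfect, while its false positive rate against true offense does not.

```python
import random

random.seed(2)

# Toy illustration of evaluating a false positive rate against "arrest"
# rather than the true status "offense". All rates are invented.
people = []
for _ in range(100_000):
    group = random.choice(["black", "white"])
    offense = random.random() < 0.3            # same true rate in both groups
    # Assumed differential measurement error: non-offenders in one group
    # are wrongly arrested more often.
    wrongful = 0.15 if group == "black" else 0.02
    arrest = offense or (random.random() < wrongful)
    decision = arrest  # imagine a tool whose decisions track arrests
    people.append((group, offense, arrest, decision))

def fpr(records, truth_index):
    # False positive rate of the decision, measured against the status
    # stored at truth_index (1 = offense, 2 = arrest).
    negatives = [r for r in records if not r[truth_index]]
    return sum(r[3] for r in negatives) / len(negatives)

black = [r for r in people if r[0] == "black"]
print(fpr(black, 2))  # against arrest: looks like zero
print(fpr(black, 1))  # against true offense: far from zero
```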

# Definitions: Conditional Independence

Some notions of fairness can be phrased naturally and compactly in terms of conditional independence. Unfortunately, these compact statements suffer from a few conceptual issues:

• Conditional independence statements frequently have non-obvious entailments.
• Conditional independence statements which are very different may be almost identical verbally and errors are frequent when they are expressed informally.
• It is often difficult, especially verbally, to distinguish between probabilistic/statistical/observational and causal dependence. We’ll talk a bit more about this later.

However, none of these constitute substantial objections to conditional independence as a framework; we simply want to point to some sharp corners.

The conditions are most naturally phrased in terms of Decisions, Groups, and Data. Decision and Group are just what we’ve been calling $d$ and $A$ above, but Data deserves a bit more explanation. It includes variables we use for fairness checks. It need not be exactly the covariates used by the procedure ($X$), but could be a subset of them, or the true status ($Y$), or even empty ($\varnothing$).

There are three types of conditional independence that we can imagine.

• $\textrm{Decision} \perp \textrm{Group} ~\vert~ \textrm{Data}$
• $\textrm{Data} \perp \textrm{Group} ~\vert~ \textrm{Decision}$
• $\textrm{Decision} \perp \textrm{Data} ~\vert~ \textrm{Group}$

We consider these in an example.

### Interpreting Conditional Independence

• $\textrm{Data}$ - SAT score
• $\textrm{Decision}$ - college admission
• $\textrm{Group}$ - gender

### Interpreting $\textrm{Decision} \perp \textrm{Group} ~\vert~ \textrm{Data}$

This implies that at any fixed SAT score, the admissions decision will be independent of gender. This matches an intuition of avoiding “disparate treatment” but if the groups have different distributions of SAT scores it may produce a “disparate impact” in that one group is admitted at a much higher rate than the other.
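A toy simulation of this tension (the score distributions are invented): the admission rule uses only SAT score, so $\textrm{Decision} \perp \textrm{Group} ~\vert~ \textrm{Data}$ holds by construction, yet admission rates differ across groups.

```python
import random

random.seed(3)

# Toy admissions: the decision depends only on SAT score, so within any
# fixed score it is independent of gender; but the groups have different
# score distributions, so admission rates differ. Numbers are invented.
students = []
for _ in range(20_000):
    gender = random.choice(["m", "w"])
    sat = random.gauss(1050 if gender == "m" else 1100, 150)  # assumed gap
    admitted = sat > 1200  # the same rule for everyone
    students.append((gender, admitted))

def admit_rate(g):
    rows = [a for gg, a in students if gg == g]
    return sum(rows) / len(rows)

print(admit_rate("m"), admit_rate("w"))  # unequal despite a gender-blind rule
```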

### Interpreting $\textrm{Data} \perp \textrm{Group} ~\vert~ \textrm{Decision}$

This implies that the distribution of SAT scores among admitted students of each gender is the same. Suppose we violate this principle and admitted women have lower scores than admitted men. In a sexist environment, this can be pretext for resentment.

### Interpreting $\textrm{Decision} \perp \textrm{Data} ~\vert~ \textrm{Group}$

This implies that within each group, SAT score is independent of the admission decision. This might be a reasonable demand if we felt that the SAT ought to be irrelevant to admissions. But it is not a natural definition of fairness with respect to Group.

Here then, are a few definitions of fairness, organized by the type of conditional independence and by the choice of Data.

| Data | $\textrm{Decision} \perp \textrm{Group} ~\vert~ \textrm{Data}$ | $\textrm{Data} \perp \textrm{Group} ~\vert~ \textrm{Decision}$ |
|---|---|---|
| Status $Y$ | $d \perp A ~\vert~ Y$: Equalized Odds [Hardt et al.], Conditional Procedure Accuracy Equality [Berk et al.]; equivalently, Equal False Positive Rates and Equal False Negative Rates | $Y \perp A ~\vert~ d$: Conditional Use Accuracy Equality [Berk et al.]; equivalently, Equal Positive Predictive Values and Equal Negative Predictive Values |
| (A subset of) Covariates $X$ | $d \perp A ~\vert~ X$: Conditional Statistical Parity | $X \perp A ~\vert~ d$: NAME??? |
| Nothing $\varnothing$ | $d \perp A$: Statistical Parity | Vacuous condition |

First, we note that for each of these definitions involving a decision, one can consider a definition involving a score $S$. Second, and more importantly, we point out that there can be mathematical and moral tension across both rows and columns of this table.

We pause to consider an extended example of law school admissions, illustrating the interpretation of a few of these conditional independence statements.

### Seven Forms of Conditional Independence

Suppose we are selecting incoming law students for an honors program, and wish to select those students who will be successful, defined by some grade threshold. Consider several choices of data on which to base fairness checks:

• $Y$ - actual success
• $X^{\textrm{construct}}$ - an idealized form of qualification, not assumed measurable
• $\textrm{LSAT}$ - LSAT score
• $\varnothing$ - no data at all

We define

• $d$ - decision: admission to honors program
• $A$ - racial group

For each type of data we can attempt to interpret the two kinds of fairness we saw in our chart, namely $\textrm{Decision} \perp \textrm{Group} ~\vert~ \textrm{Data}$ and $\textrm{Data} \perp \textrm{Group} ~\vert~ \textrm{Decision}$. This gives seven conditional independence statements all of which seem to be intensely politicized in a way we cannot resolve here. Consider some examples:

1. $d \perp A ~\vert~ Y$ - for successful students, selection is independent of race.
2. $d \perp A ~\vert~ X^{\textrm{construct}}$ - any relationship between race and decision is explainable by (possibly unmeasured) qualification.
3. $\textrm{LSAT} \perp A ~\vert~ d$ - the distribution of LSAT scores among accepted students is identical across race groups.
4. $d \perp A ~\vert~ \varnothing$ - the rate of acceptance is identical across race groups.

In discussions of fair hiring and admissions one can encounter all these notions. Conversations can easily fall into the mire of assuming or imposing one or another condition as though it were an obviously correct formalization of fairness, treating violations of that principle as moral or conceptual gotchas. The frustrating and repetitive stalemate of these conversations likely reflects both confusion and fundamentally different ideas of fairness.

## Individual Fairness

Consider conditional statistical parity that conditions on all the covariates $X$. This definition is easily satisfied by the definition from the introduction:

• Fairness Through Unawareness: the decision procedure is not allowed to use group membership. Symbolically, this means that $d_i=d_j$ if $X_i=X_j$ for individuals $i,j$.

In fact, if the decision is a deterministic function of $X$, the two definitions are equivalent. An approximate version of fairness through unawareness is:

• Individual Fairness: similar individuals receive similar decisions. Symbolically, this means that $d_i \approx d_j$ if $X_i \approx X_j$ for individuals $i,j$.

We note that Friedler et al. define individual fairness using $X^{\textrm{construct}}$, the desired (but not perfectly observed) covariates at decision time.
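One way to operationalize individual fairness is a Lipschitz-style check: scores should not differ between individuals by more than a constant times the distance between their covariates. A minimal sketch, with an invented scoring rule, metric, and constants:

```python
# A minimal sketch of a Lipschitz-style individual fairness check.
# The scoring rule, distance, and constants are invented for illustration.
def score(x):
    return 0.3 * x[0] + 0.7 * x[1]  # hypothetical decision score

def l1(a, b):
    return sum(abs(u - v) for u, v in zip(a, b))

def individually_fair(xs, distance, tolerance, lipschitz=1.0):
    """Check |score(xi) - score(xj)| <= lipschitz * d(xi, xj) + tolerance
    over all pairs of individuals."""
    for i, xi in enumerate(xs):
        for xj in xs[i + 1:]:
            if abs(score(xi) - score(xj)) > lipschitz * distance(xi, xj) + tolerance:
                return False
    return True

xs = [(0.10, 0.20), (0.11, 0.21), (0.90, 0.80)]
print(individually_fair(xs, l1, tolerance=1e-9))  # True
```

Note that the check depends entirely on the chosen distance over covariates, which is itself a substantive fairness judgment.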

# Definitions: Causality

So far, we’ve discussed definitions of fairness only in probabilistic/statistical/observational terms. But both moral and legal notions of fairness are often phrased in causal language. And as you may have heard before, “correlation does not imply causation.” So what is to be done?

Thankfully, there are several approaches to studying causality. We cannot give an exhaustive introduction here and refer the interested reader to Pearl 2009, Hernan and Robins, and Imbens and Rubin. Nonetheless, for the reader who has not seen this sort of thing before we would like to give some idea.

Arguably the simplest approach to causality is to translate causal statements into counterfactuals:

• “Was I not hired because I was black?” $\leadsto$ “Would I have been hired if I were non-black?”
• “Is there an effect of race on hiring?” $\leadsto$ “Would the rate of hiring be the same if everyone were black? if no one were?”

We can introduce the notation $d(\textrm{black}), d(\textrm{non-black})$ for the decision if the individual had been black or non-black, respectively. We can state the following three definitions of fairness in terms of these counterfactuals:

• Individual Counterfactual Fairness: $d_i(\textrm{black}) = d_i(\textrm{non-black})$ for individual $i$
• Counterfactual Parity: $E[d(\textrm{black})] = E[d(\textrm{non-black})]$
• Conditional Counterfactual Parity: $E[d(\textrm{black}) ~\vert~ \textrm{Data}] = E[d(\textrm{non-black}) ~\vert~ \textrm{Data}]$. Conditioning on a lot of Data approaches individual counterfactual fairness. [Kusner et al.]

In the hiring example, individual counterfactual fairness can be regarded as a negative answer to each individual’s question “would the decision have been different if I were not black?” Counterfactual parity provides a negative answer to the question “would the rates of hiring be different if everyone were black?” Finally, conditional counterfactual parity answers the same question as counterfactual parity, but now stratified by some factors, e.g. the applicant’s education.
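These definitions can be illustrated with a toy potential-outcomes simulation (all quantities invented), in which we simply posit both counterfactual decisions for each individual. In it, individual counterfactual fairness fails for some individuals, and counterfactual parity fails in aggregate.

```python
import random

random.seed(4)

# Toy potential-outcomes sketch: for each individual we posit both
# counterfactual decisions d(black) and d(non-black). In reality these
# are never both observed; all numbers here are invented.
individuals = []
for _ in range(10_000):
    qualification = random.random()
    d_black = qualification > 0.6      # a hypothetical stricter bar
    d_nonblack = qualification > 0.5
    individuals.append((d_black, d_nonblack))

# Individual counterfactual fairness: d(black) == d(non-black) for everyone.
individually_fair = all(db == dn for db, dn in individuals)

# Counterfactual parity: E[d(black)] == E[d(non-black)].
rate_black = sum(db for db, _ in individuals) / len(individuals)
rate_nonblack = sum(dn for _, dn in individuals) / len(individuals)
print(individually_fair, rate_black, rate_nonblack)
```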