articles causal inference in civil rights litigation

VOLUME 122
DECEMBER 2008
NUMBER 2
© 2008 by The Harvard Law Review Association
ARTICLES
CAUSAL INFERENCE IN CIVIL RIGHTS LITIGATION
D. James Greiner
TABLE OF CONTENTS
I.
REGRESSION’S DIFFICULTIES AND THE
ABSENCE OF A CAUSAL INFERENCE FRAMEWORK ...................................................540
A. What Is Regression? .........................................................................................................540
B. Problems with Regression: Bias, Ill-Posed Questions, and Specification Issues .....543
1. Bias of the Analyst......................................................................................................544
2. Ill-Posed Questions......................................................................................................544
3. Nature of the Model ...................................................................................................545
4. One Regression, or Two?............................................................................................555
C. The Fundamental Problem: Absence of a Causal Framework ....................................556
II.
POTENTIAL OUTCOMES: DEFINING CAUSAL EFFECTS AND BEYOND .................557
A. Primitive Concepts: Treatment, Units, and the Fundamental Problem.....................558
B. Additional Units and the Non-Interference Assumption.............................................560
C. Donating Values: Filling in the Missing Counterfactuals ...........................................562
D. Randomization: Balance in Background Variables ......................................................563
E. Observational Studies: Challenges and Some Ways To Address Them ......................565
F. The Need for Lots of Background Variables..................................................................573
III. POTENTIAL OUTCOMES IN THE CIVIL RIGHTS LITIGATION SETTING ................575
A. In General ..........................................................................................................................576
1. Identifying Primitive Concepts..................................................................................576
2. A Tug-of-War: The Need for Background Variables
Versus the Desire To Detect Discrimination............................................................581
B. Specific Contexts...............................................................................................................583
1. Capital Punishment.....................................................................................................583
2. Employment Discrimination......................................................................................588
3. Causation and Section 2 of the Voting Rights Act .................................................590
IV. CONCLUSION .........................................................................................................................597
533
CAUSAL INFERENCE IN CIVIL RIGHTS LITIGATION
D. James Greiner∗
Civil rights litigation often concerns the causal effect of some characteristic on decisions
made by a governmental or socioeconomic actor. An analyst may be interested, for
example, in the effect of victim race on jury imposition of the death penalty, in the effect
of applicant gender on a firm’s hiring decisions, or in the effect of candidate ethnicity on
election results. For the past thirty years, such analyses have primarily been accomplished via a statistical technique known as regression. But as it has been used in civil
rights litigation, regression suffers from several shortcomings: it facilitates biased, resultoriented thinking by expert witnesses; it encourages judges and litigators to believe that
all questions are equally answerable; and it gives the wrong answer in situations in
which such might be avoided. These difficulties, and several others, all stem from the
fact that regression does not begin with a paradigm for defining causal effects and for
drawing causal inferences. This Article argues for a wholesale change in thinking in this
area, from a focus on regression coefficients to an explicit framework of causation called
“potential outcomes.” The potential outcomes paradigm of causal inference, which (for
lawyers) may be analogized to but-for causation with a renewed emphasis on time,
addresses many of the shortcomings of regression as the latter is currently used in civil
rights litigation, and it does so within a framework courts, litigators, and juries can
understand. This Article explains regression and the potential outcomes paradigm and
discusses the latter’s application in the death penalty, employment discrimination, and
redistricting settings.
“If they can get you asking the wrong questions, they don’t have to worry
about answers.”
— Thomas Pynchon1
I
n modern litigation, courts, attorneys, and expert witnesses use statistics in the hope of shedding light on questions of causation. This
is particularly true in the civil rights context, where repetition of similar events makes the use of data analysis techniques attractive. The
dialogue between law and quantitative methods in the civil rights area
has lasted for decades, but few would characterize the relationship as
happy. The disquiet is evident on both sides. As early as 1980, legal
commentators concluded that many courts disregarded a substantial
portion of statistical analyses they encountered in the employment dis-
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
∗ Assistant Professor of Law, Harvard Law School. Thanks to Sergio Campos, Heather
Gerken, Don Greiner, Ellen Greiner, Sam Gross, Sam Hirsch, Dan Ho, Louis Kaplow, Jennifer
Lewis, Andrew Martin, Chris Robinson, Matthew Stephenson, Cora True-Frost, Adrian Vermeule, and an anonymous referee for helpful comments and suggestions. No one listed endorses
this Article, and all mistakes are my own.
1 THOMAS PYNCHON, GRAVITY’S RAINBOW 251 (1973).
534
2008]
CAUSAL INFERENCE
535
crimination context.2 Twenty-five years later, a survey of Title VII3
cases demonstrated that little has changed.4
On the quantitative end, innovation has stagnated. During the
decade or so that the Supreme Court was placing its imprimatur on
statistics in general and regression in particular as appropriate forms
of evidence in Title VII cases,5 the academy was responding with
scholarly examinations of quantitative issues arising in employment
discrimination,6 capital punishment,7 redistricting,8 and other contexts.9 Quantitative analysts were convening panels and holding symposia to make recommendations to improve judicial understanding
and use of statistical methods in litigation.10 Those recommendations
were ignored,11 however, and a perusal of the hornbooks and looseleafs
discussing the use of statistics as evidence in civil rights litigation sug–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
2 See, e.g., Michael O. Finkelstein, The Judicial Reception of Multiple Regression Studies in
Race and Sex Discrimination Cases, 80 COLUM. L. REV. 737, 737 & n.5 (1980) (collecting cases in
which judges ignored or expressed discomfort with statistical analyses).
3 Civil Rights Act of 1964, tit. VII, 42 U.S.C. §§ 2000e to 2000e-17 (2000).
4 Robert L. Nelson & Eric Bennett, Judicial Treatment of Statistical Evidence in Cases Alleging Racial Discrimination in Employment: Now (2000–2002) and Then (1980–82), at 3 (Mar. 20,
2003) (unpublished manuscript, on file with the Harvard Law School Library) (“[Courts] typically
are skeptical of efforts by plaintiffs to use statistics to prove discrimination, although they also
very frequently chide plaintiffs if they offer no statistical proof to bolster claims of disparate treatment.”). There have been occasional exceptions. E.g., Vuyanich v. Republic Nat’l Bank of Dallas, 505 F. Supp. 224, 261–79 (N.D. Tex. 1980) (performing a rigorous analysis of the parties’
mathematical models), vacated, 723 F.2d 1195 (5th Cir. 1984).
5 See, e.g., Bazemore v. Friday, 478 U.S. 385, 398–404 (1986); Hazelwood Sch. Dist. v. United
States, 433 U.S. 299, 306–13 (1977); Int’l Bhd. of Teamsters v. United States, 431 U.S. 324, 337–40
(1977).
6 See, e.g., Delores A. Conway & Harry V. Roberts, Regression Analyses in Employment Discrimination Cases, in STATISTICS AND THE LAW 107 (Morris H. DeGroot et al. eds., 1986);
Finkelstein, supra note 2; Franklin M. Fisher, Multiple Regression in Legal Proceedings, 80
COLUM. L. REV. 702 (1980).
7 See, e.g., DAVID C. BALDUS ET AL., EQUAL JUSTICE AND THE DEATH PENALTY: A
LEGAL AND EMPIRICAL ANALYSIS (1990).
8 See, e.g., Richard L. Engstrom & Michael D. McDonald, Quantitative Evidence in Vote Dilution Litigation: Political Participation and Polarized Voting, 17 URB. LAW. 369 (1985); Bernard
Grofman et al., The “Totality of Circumstances Test” in Section 2 of the 1982 Extension of the Voting Rights Act: A Social Science Perspective, 7 LAW & POL’Y 199 (1985).
9 See, e.g., DAVID C. BALDUS & JAMES W.L. COLE, STATISTICAL PROOF OF DISCRIMINATION (1980) (discussing quantitative issues in a range of legal areas); STATISTICS AND THE
LAW, supra note 6 (discussing antitrust, school finance, environmental regulation).
10 See, e.g., AM. STATISTICAL ASS’N COMM. ON LAW AND JUSTICE STATISTICS, PROCEEDINGS OF THE SECOND WORKSHOP ON LAW AND JUSTICE STATISTICS (Alan E. Gelfand ed., 1983); PANEL ON STATISTICAL ASSESSMENTS AS EVIDENCE IN THE COURTS,
NAT’L RESEARCH COUNCIL, THE EVOLVING ROLE OF STATISTICAL ASSESSMENTS AS
EVIDENCE IN THE COURTS 1–16 (Stephen E. Fienberg ed., 1989); Discussion Meeting, The Role
of the Statistician as an Expert Witness, 145 J. ROYAL STAT. SOC’Y SER. A 395 (1982).
11 See Nelson & Bennett, supra note 4, at 4 (“The courts and the legal profession more broadly
have not adopted the suggestions of the Panel on Statistical Assessments as Evidence in the
Courts.”).
536
HARVARD LAW REVIEW
[Vol. 122:533
gests that the field is fixated on methods introduced decades ago, particularly regression,12 despite judicial dissatisfaction.
Judicial reluctance to engage, though not excusable, is perhaps understandable, but the lack of progress on the quantitative end is more
puzzling. In the general area of quantitative analysis of socioeconomic
phenomena (at least, outside the law), the Earth has moved in the past
two decades. The development of fast, inexpensive computers, as well
as statistical techniques (such as hierarchical and Bayesian modeling)
that take advantage of intense computation,13 has contributed to a
revolution in the sophistication and complexity of the analysis of political, economic, and sociological data.
Meanwhile, in the more specific area of drawing inferences of causation, a more fundamental and less technical change has occurred.
Previously, research into causation in social science, especially in observational studies, depended on the same framework still used in civil
rights litigation today: a near-fetishlike focus on regression coefficients.14 In language similar to that currently in statistics-anddiscrimination hornbooks and looseleafs,15 scholarly publications in
nonlegal quantitative fields would periodically intone that regression
coefficients demonstrate correlation only and that correlation does not
equal causation. They would nevertheless proceed to use causal language, speaking in terms of the “effects” of variables after “controlling
for” potential confounders; the “effects” were regression coefficients,
and one controlled for potential confounders by including them in the
right-hand side of a regression equation.16
More recently, however, much cutting-edge causal research outside
the law has moved away from regression coefficients toward what has
become known as the “potential outcomes” framework.17 As discussed
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
12 For example, in employment, see RAMONA L. PAETZOLD & STEVEN L. WILLBORN, THE
STATISTICS OF DISCRIMINATION ch. 6 (2006); and WALTER B. CONNOLLY, JR. ET AL., USE
OF STATISTICS IN EQUAL EMPLOYMENT OPPORTUNITY LITIGATION § 11.03 & apps. D–E
(2008). In redistricting, see the discussion in Appendix A of D. James Greiner, Ecological Inference in Voting Rights Act Disputes: Where Are We Now, and Where Do We Want To Be?, 47
JURIMETRICS J. 115, 155–59 (2007).
13 See, e.g., Joseph B. Kadane & George G. Woodworth, Hierarchical Models for Employment
Decisions, 22 J. BUS. & ECON. STAT. 182 (2004).
14 See William A. Darity, Jr. & Patrick L. Mason, Evidence on Discrimination in Employment:
Codes of Color, Codes of Gender, J. ECON. PERSP., Spring 1998, at 63, 73–76 (collecting studies);
Christopher Winship & Michael Sobel, Causal Inference in Sociological Studies, in HANDBOOK
OF DATA ANALYSIS 481, 481 (Melissa Hardy & Alan Bryman eds., 2004).
15 See, e.g., CONNOLLY, JR. ET AL., supra note 12, § 11.03, at 11-8.2; PAETZOLD & WILLBORN, supra note 12, ch. 6, § 6.03, at 14–15.
16 See infra note 28 and accompanying text.
17 See, e.g., Christopher Winship & Stephen L. Morgan, The Estimation of Causal Effects from
Observational Data, 25 ANN. REV. SOC. 659, 662 (1999); see also PANEL ON METHODS FOR ASSESSING DISCRIMINATION, NAT’L RESEARCH COUNCIL, MEASURING RACIAL DISCRIMINATION 78 (Rebecca M. Blank et al. eds., 2004).
2008]
CAUSAL INFERENCE
537
in greater detail below, the potential outcomes framework begins with
a definition of a “causal effect,” not as a regression coefficient, but
rather as a comparison of outcomes that would occur with, versus
without, some “treatment.” One can think in terms of a pill purporting
to reduce blood pressure. Because for any particular “unit” (for example, a patient), an analyst can only observe one potential outcome (for
example, blood pressure when the patient takes the pill), the challenge
in causal inference is to fill in the outcome value that would have occurred had the unit done other than it actually did (for example, blood
pressure if the patient had taken a placebo).
Judges, litigators, and expert witnesses rarely take notice of, much
less adopt in some way, paradigmatic shifts in the social sciences, and
often with good reason. The near-total isolation of discrimination litigation from a potential outcomes understanding of causation is nevertheless both surprising and unhealthy. It is surprising because the
fundamentals of the potential outcomes framework are familiar, perhaps even instinctive, to any survivor of first year torts; as the above
blood-pressure pill analogy illustrates, the simplest form of the paradigm may be thought of as but-for causation with a special focus on
time.18 The isolation is unhealthy because the framework can at least
begin to address many problems festering in the application of statistical analysis to civil rights issues, and it can do so in a way accessible to
adjudicators. For example, a complaint among judges is that the opinions of expert witnesses appear to be for sale.19 In the potential outcomes framework, however, the primary focus is on reproducing an
imagined randomized experiment, so most of the hard quantitative
work should be accomplished before the analyst knows what answer
the study will produce. In other words, the framework allows an ex–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
The potential outcomes paradigm has also been labeled a “counterfactual” definition of causality, PANEL ON METHODS FOR ASSESSING DISCRIMINATION, supra, at 78, although this
term has been used to describe other frameworks. As is true of most good ideas, ownership of the
paradigm is disputed. Credit has been given, variously, to Professor Donald Rubin, see Paul W.
Holland, Statistics and Causal Inference, 81 J. AM. STAT. ASS’N 945, 946 (1986), Professor Jerzy
Neyman, see David A. Freedman, Graphical Models for Causation, and the Identification Problem, 28 EVALUATION REV. 267, 287 (2004), and multiple others, see James J. Heckman, Rejoinder: Response to Sobel, 35 SOC. METHODOLOGY 135, 138–39 (2005) (crediting various econometricians); Michael E. Sobel, Discussion: “The Scientific Model of Causality,” 35 SOC.
METHODOLOGY 99, 99–101 (2005) (spreading credit among various thinkers).
18 Part of the explanation for the disconnect between civil rights litigation and the potential
outcomes framework may be the fact that several influential quantitative analysts who developed,
articulated, or used the paradigm currently deny that any discussion of the causal effects of race,
gender, or other immutable characteristics is well-defined. See infra note 94.
19 See Samuel R. Gross, Expert Evidence, 1991 WIS. L. REV. 1113, 1114 (collecting references).
Even if witnesses are not for sale, the practice of shopping for an expert with a favorable opinion
is well known, e.g., Paul Meier, Damned Liars and Expert Witnesses, 81 J. AM. STAT. ASS’N. 269,
273–75 (1986), and is in fact encouraged by legal rules allowing parties to hire both consulting and
testifying experts while shrouding the former’s analyses behind the work product privilege.
538
HARVARD LAW REVIEW
[Vol. 122:533
pert witness to commit to the most important aspects of a study before
she knows whether it will favor the plaintiff or the defendant. Judges
may respond better to experts who testify (truthfully) that they committed to their analyses before knowing the results.
There are other benefits to using the potential outcomes framework
in civil rights litigation, particularly as compared to the current practice of addressing almost any problem with regression. Regression can
give the wrong answer, or contradictory answers, to questions lawyers
and judges care about; for example, it may imply that a firm is discriminating against men when in fact it is discriminating against
women, or it may suggest that the firm is discriminating against both
men and women at the same time. In other situations, regression (particularly in more advanced forms) provides answers — such as conclusions in terms of logarithms of odds ratios — that are difficult for
judges and juries to interpret. The potential outcomes paradigm suffers less from these difficulties. Because the framework begins with
the inquiry into what would have happened had the perception of
some characteristic (for example, race or gender) been different at a defined time point, courts, juries, litigators, and experts need focus less
attention on comprehending statistical minutiae.
Further, regression provides no framework within which to assess
whether sharp causal questions, questions linked to information in
available data, have been articulated. Before using any statistical
technique, a quantitative analyst must articulate a question by translating the applicable legal standards and the background facts into an
inquiry focusing on one or more quantities of interest, and must then
assess whether available data have any information on these quantities. It is hubris to suppose that our present knowledge of the world is
such that we can analyze any situation with currently available quantitative methods, and it is downright foolish to suppose that we can
tackle any problem with one set of techniques. There are some situations as to which, at present, we can say nothing useful. In contrast to
regression, the potential outcomes understanding of causal inference
helps analysts understand when they are unable to translate facts and
law into a sharp causal question linked to available data. Section
III.B.3 provides an example of such a situation.
Finally, and most importantly, the potential outcomes framework
makes clear that critical choices required to draw causal inferences in
civil rights litigation are not mathematical. Instead, they depend on
decisions about the law and on an understanding of how the world
works. These are matters about which judges, lawyers, and laypeople
can speak as intelligently as those trained in quantitative methods.
2008]
CAUSAL INFERENCE
539
This Article explains the potential outcomes paradigm for causation and applies it to civil rights litigation.20 Part I begins with the
difficulties regression as currently used in civil rights litigation encounters. It shows that, at bottom, these problems stem from the fact that
regression does not begin with any definition of a causal effect, much
less one that would lead to the near-exclusive focus on coefficients
characteristic of most modern expert and judicial analysis. Part II articulates and develops the potential outcomes framework for causal inference. It begins by defining a causal effect in terms of some specific
changed circumstance (that is, a counterfactual). It then argues that,
in most instances, an analyst addressing causal inference questions in
the context of civil rights litigation should draw inferences using the
template of a randomized experiment, despite the fact that the data
available are not randomized but rather observational. Part III applies the potential outcomes paradigm to the civil rights litigation setting generally as well as to three particular areas in which causal issues arise: imposition of capital punishment, private employment
discrimination class actions, and lawsuits alleging vote dilution under
section 2 of the Voting Rights Act. It argues that well-posed questions
linked to accessible data are available in the first two of these areas,
but that in the latter no such question is available. Part IV concludes.
To illustrate some of the issues discussed, this Article makes frequent reference to a running example of a fictional employer sued for
salary discrimination under Title VII by a class of female employees.
Hypothetically, the employer’s computer records include information
on each employee’s gender, educational achievement (on some scale),
date of hire (which allows calculation of how long each employee has
worked at the firm), and current job level (again, on some scale), as
well as salary. Simulated data assist in the illustration.21
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
20 There are only a handful of references to the potential outcomes framework in any sort of
legal setting. For example, a recent report by a National Academy of Sciences panel on measuring racial discrimination outlined some of the basics. See PANEL ON METHODS FOR ASSESSING DISCRIMINATION, supra note 17, at 77–89. Perhaps because it concerned itself with measuring the effects of discrimination generally, however, the panel struggled with the issue of the
timing of treatment assignment, and it dedicated little attention to litigation. For other examples,
see Lee Epstein et al., The Supreme Court During Crisis: How War Affects Only Non-War Cases,
80 N.Y.U. L. REV. 1, 65–74 (2005), and Daniel E. Ho, Scholarship Comment, Why Affirmative
Action Does Not Cause Black Students To Fail the Bar, 114 YALE L.J. 1997 (2005).
21 This Article discusses only issues of intentional discrimination — disparate treatment as
opposed to disparate impact. In addition, this Article is limited to issues of gender, race, and ethnicity; it uses the term “race” as a shorthand for both of the latter two characteristics.
540
HARVARD LAW REVIEW
[Vol. 122:533
I. REGRESSION’S DIFFICULTIES AND THE
ABSENCE OF A CAUSAL INFERENCE FRAMEWORK
This Part briefly summarizes the regression technique as it is typically used in civil rights disputes,22 then discusses the difficulties it encounters as a method for drawing causal inferences in this context. It
demonstrates that all difficulties arise from the fact that regression
does not begin with a coherent framework for causal inference.
A. What Is Regression?
Regression has been explained many times; this section is as brief
as possible.23 Moreover, this Article will use exactly one technical/mathematical symbol: β, the universal representation of a regression coefficient.
To understand regression, begin with the running example identified above, and focus on the question of whether salary levels at the
hypothetical firm reflect gender bias. In the overwhelming majority of
employment discrimination cases,24 the analyst would propose the following, “simple” model:
Simple Model:
Salary = β0 + βG*(gender) + βJL*(job level) + βYE*(years educ.) +
βYW*(years worked) + error25
One part of the appeal of this model is its simplicity and interpretability. β0 can be understood as some baseline salary level common to
all employees. βG (“G” for gender) is the addition or subtraction in salary associated with being a woman; by assumption, this amount is
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
22 “Regression” here refers to regression as it is most commonly used in civil rights litigation
and as it is most typically presented in corresponding hornbooks and generic “quantitative methods for lawyers” volumes. There are variants to this most common form, such as the “PetersBelson” approach illustrated in Figure 3 and accompanying text, infra, that represent improvements. The point in Part I is that these improvements, while useful if employed as part of a coherent framework for defining and estimating causal effects, are insufficient to save regression
when divorced from such a framework, which is typically the case in modern civil rights
litigation.
23 The references in note 12, supra, include more expansive explanations for lawyers.
24 See, e.g., Thomas J. Campbell, Regression Analysis in Title VII Cases: Minimum Standards,
Comparable Worth, and Other Issues Where Law and Statistics Meet, 36 STAN. L. REV. 1299,
1312 (1984); Arthur S. Goldberger, Comment, 3 STAT. SCI. 165, 165 (1988).
25 Gender is 0 for men, 1 for women. Note that the “error” here is not a mistake of any kind.
Instead, it is the difference between what the equation above predicts and what is actually observed; the term “residual” is also commonly used. Statisticians do not expect predictions from a
model to be perfect (in fact, they would not know what to do if the predictions were perfect). The
idea is that the “error,” the difference between what is predicted and what is observed, is due to
random variation.
2008]
CAUSAL INFERENCE
541
constant for all women. βJL (“JL” for job level) is the salary amount associated with each additional unit of job level. βYE is the amount associated with each additional year of education, and βYW the amount associated with each additional year worked. The question of interest
under this model is whether βG is negative and statistically significant,
meaning negative in a way that is unlikely to be due to chance. If it is,
the appealing thing about regression is that βG immediately provides a
rough measure of how much the defendant firm owes each member of
the (female) class, because by assumption βG represents the constant
amount of salary deduction associated with being a woman.26 Most
introductory statistics books discuss the math used to implement this
type of regression. The math can generate what are called “point estimates” for the βs, which are single numbers representing some kind
of best guess for each, and estimated “standard errors” or “standard
deviations,” which are measures of uncertainty about the point estimates. Analysts can use these figures to produce, for example, an approximate interval within which the true βG is likely (in some sense) to
be found.27
Some vocabulary: the quantity on the left-hand side of the equation
(salary, above) is often called the “response” or the “dependent variable,” while the observed quantities on the right-hand side (gender, job
level, years of education, years worked) are called “explanatory” or
“independent” variables. Variables included on the right-hand side of
a regression equation are said to be “adjusted for” or “controlled for”;
for example, a litigator might ask an expert witness: “Did your analysis
control for years of education?” As mentioned above, the βs are often
referred to as “effects.” Note the causal connotation in the phrases
“dependent variable,” “independent variable,” “explanatory variable,”
“control for,” “response,” and “effect.”28
A brief word about implementing this model is necessary. Any elementary statistical package or spreadsheet program will produce point
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
26 Even if such a figure may not serve as a basis for an award to each plaintiff, see Int’l Bhd.
of Teamsters v. United States, 431 U.S. 324, 361, 371–76 (1977); Franks v. Bowman Transp. Co.,
424 U.S. 747, 769 (1976), its availability should greatly increase the likelihood of settlement.
27 Because this Article focuses on other, more intuitive concepts, it deliberately finesses most
technical issues, such as the frequentist definition of confidence intervals, normality assumptions
for small samples, the central limit theorem for large samples, etc. Note also that the description
above corresponds to a frequentist framework of statistics; for some causal inference questions, it
may be advisable to be Bayesian.
28 The following statement is typical: “A common method for assessing statistics in employment discrimination cases is regression analysis, which isolates the impact of certain variables in
effecting a particular outcome. Multiple regression analysis . . . is a method of examining the
effect of particular independent or explanatory variables on a dependent variable.” CONNOLLY
ET AL., supra note 12, § 11.03, at 11-4 (footnotes omitted). In the death penalty context, see
BALDUS ET AL., supra note 7, at 164–66 (1990) (“After adjusting simultaneously for [certain]
variables . . . the average race-of-victim effect” was estimated to be a stated quantity.).
542
HARVARD LAW REVIEW
[Vol. 122:533
estimates and standard errors for the βs. Whenever possible, however,
a conscientious analyst checks whether a model under consideration
fits the data, that is, whether the mathematical assumptions necessary
to implement this model (not discussed here) are reasonable for the
dataset at issue. When a diagnostic shows a lack of fit (that is, that
the mathematical assumptions are unlikely), a conscientious analyst alters the model.29 In regression, the alteration usually consists of adding combinations or different forms of the independent variables to the
right-hand side of the regression equation, such as two variables multiplied together or the same variable squared. Intuitively, these
changes represent a belief that a one-unit change in an independent
variable is not associated (even approximately) with a constant change
in the response.
To illustrate, in the salary example above: after trying the Simple
Model and becoming less than satisfied, an analyst might hypothesize
that the longer people work, the more they get paid, but only up to a
certain point. After a certain number of years, salary might tend to
stabilize, or even decrease. If so, the analyst might add the variable
(years worked squared) to the above regression equation,30 producing
the following, “revised” model:
Revised Model 1:
Salary = β0 + βG*(gender) + βJL*(job level) + βYE*(years educ.) +
βYW*(years worked) + βYW2*(years worked squared) + error
One of the maddening aspects of statistics in general, and regression in particular, is that when one adds a variable in this way, the results of the revised model may look nothing like those of the simple
model. Recall that in the running salary discrimination example, our
focus is on βG. It may well happen that in a regression in which each
variable appears plainly (Simple Model, above), the estimate for βG
might be positive, statistically significant, and thus favorable (in litigation terms) to the defendant firm; but if an analyst adds the term
(years worked squared) to the equation (Revised Model 1, above) the
estimate for βG is negative, statistically significant, and thus favorable
(in litigation terms) to a putative female class. Although the intuition
behind this phenomenon is not easy to explain in lay terms, one could
imagine that adding a variable includes a different kind of information. If it is not duplicative of the information already in the model
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
29
30
For an example, see infra pp. 549–50.
Adding (years worked squared) to the equation allows the relationship between salary and
years worked to differ for new versus experienced employees.
2008]
CAUSAL INFERENCE
543
(from the other variables previously included there), then this new information could change the overall results a great deal.
Two observations on model-checking are required, the first a general point, the second specific to regression. First, assessing how well
any statistical model fits a dataset requires the exercise of judgment.
Few hard and fast rules are available because whether the model is
adequate for a dataset depends on — among other things — the question being asked, the nature of the lack of fit, and how bad the indicators are. Some commonly used diagnostics consist of simple graphs,
with the analyst making a heuristic judgment about whether the shape
of the plot is in some sense “good enough.” Second, and more importantly, when using regression, the analyst cannot examine the fit of the
model without first implementing that model, that is, directing software to produce estimates of the βs that are the quantities of interest
(particularly βG in the above example). Thus, an expert witness in a
salary discrimination case almost unavoidably sees the litigation “answer” a particular regression would produce before she assesses the
goodness of fit.
B. Problems with Regression:
Bias, Ill-Posed Questions, and Specification Issues
This section discusses a variety of problems with the regression
technique as it has been used in the civil rights context during the past
three decades. A fundamental theme of this Article is that these difficulties all stem at least in part from a common source, namely, the lack
of a framework for causal inference. A “framework for causal inference” means a structure that compels articulation of a sharp question
of interest in terms of a causal effect to be measured, links this question to a set of quantities that can be reasonably well estimated from
available data, and provides guidance for the choice of methods to accomplish this estimation. Each of these three elements is essential.
The first, the insistence on question definition, is the reason why the
civil rights legal community (in the persons of judges, triers of fact,
and litigators) consults quantitative analysts on causal questions. If
the analysts are to serve their intended purpose, the vague feeling on
the part of those in the legal business that “the numbers must have
some information here” must be honed by an insistence on a definition
of a causal effect and an initial stab at the question to be addressed.
Otherwise, the conversation between the legal and quantitative communities can result in the latter giving answers to questions the former
did not really intend to ask. The second element, how to link this
question to one or more quantities of interest, permits identification of
the causal effect that should be the goal of the investigation, the precise number or set of numbers that is the target of the inference. With
this in mind, the analyst can begin to assess whether the available data
544
HARVARD LAW REVIEW
[Vol. 122:533
(taking into particular account how they were collected) contain information about the desired target. The third step, guidance on methods, makes the framework useful.
The fact that regression as currently employed is not a causal inference framework has resulted in various problems. These problems are
listed in order of their rough importance, where “importance” refers to
the threat each poses to litigation’s truth-discovery function.
1. Bias of the Analyst. — “Bias” is often used in its statistical
sense, where it has a precise, mathematical definition. Here, in contrast, “bias” means good old-fashioned bias, the human tendency to favor one side or another more than what an impartial assessment of
available information would warrant.
Over twenty years ago, J. Morgan Kousser, after years of experience testifying in voting rights cases, wrote an article entitled, Are Expert Witnesses Whores?31 If some of them are, then regression provides ample opportunity for the fallen to ply their trade. The problem
stems from three facts articulated above: first, an analyst fitting a regression sees the litigation answer before she assesses goodness of fit;
second, deciding whether a model is adequate for the data requires
judgment; and third, adding or removing variables from a regression
can result in wholesale changes to the results.
The ease of stating this difficulty contrasts with the difficulty in
finding a solution. Model-checking is typically a multi-stage process:
the analyst implements a first model, assesses fit, is less than perfectly
satisfied, implements a second model, assesses fit, compares the fit of
the first model to that of the second, implements a third, etc.32 A
model’s fit is never perfect. At each stage of this process of exploration
and assessment, the substantive result, the litigation answer, stares the
analyst in the face. Only the superhuman can completely disregard
the temptation to lean towards a result favorable to a chosen side, consciously or no.33
2. Ill-Posed Questions. — Given the stranglehold regression currently enjoys on quantitative proof in civil rights litigation, it is hard to
shake the belief that one can measure the causal effect of any variable
by including it on the right-hand side of the equals sign in a regression
equation. Such overconfidence is unfortunate. There are certain matters about which, at present, we cannot even articulate sharp, answer–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
31 J. Morgan Kousser, Are Expert Witnesses Whores? Reflections on Objectivity in Scholarship
and Expert Witnessing, PUB. HISTORIAN, Winter 1984, at 5.
32 See Daniel E. Ho et al., Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference, 15 POL. ANALYSIS 199, 199–200 (2007).
33 See Franklin M. Fisher, Statisticians, Econometricians, and Adversary Proceedings, 81 J.
AM. STAT. ASS’N 277, 285–86 (1986). See generally Joseph B. Kadane, Ethical Issues in Being an
Expert Witness, 4 LAW, PROBABILITY & RISK 21 (2005).
2008]
CAUSAL INFERENCE
545
able quantitative questions about which available data provide information. Legislators and judges might be able to describe processes,
including what they suspect to be causal processes, that they would
like to understand better. But before statistical techniques can be
brought to bear, these vagaries must be translated into specific questions asked in terms of quantities of interest, and these questions must
be examined to assess whether they are, at the present stage of our
knowledge of causation and with available data, answerable.
Little in this translation process depends on an understanding of
mathematical obscurities, particularly in the civil rights context. Perhaps for that reason, regression provides little assistance in the translation. One can put virtually anything on the left-hand side of the
equals sign in a regression equation, place virtually anything else on
the right-hand side, and obtain a result (including the statistical significance of coefficients) from a computer. But without a well-posed
question, this result is of no value. The result is simply a number, or
more accurately, a range of numbers; what does this range of numbers
mean?34 Section III.B.3 discusses this issue further in the context of
section 2 of the Voting Rights Act.
3. Nature of the Model. — The following difficulties with regression have a common theme; namely, they concern the form of the
model used, also known as the “specification.” The issues discussed
here are often treated mathematically and with technical terms. That
is a mistake. Resolving these issues requires fundamental, and fundamentally legal, choices that can and should be understood intuitively.
To illustrate, this section makes frequent reference to the running
example of a class action alleging discriminatory pay on the basis of
gender.
(a) Which Variables? — Should analysts include any and all variables (squares, products of variables, etc.) in regression equations? If
not, what principles should guide their choices? Unfortunately, courts
cannot simply leave to the experts the task of choosing among variables. For example, in salary discrimination cases, courts must often
decide whether a variable measuring job level or job type should be
included in the right-hand side of a regression equation. It may be
that women (say) are concentrated in lower-level jobs, which pay less.
Typically, defendants present regressions controlling for job level, that
is, including it as an independent variable on the right-hand side of the
equation, while plaintiffs present regressions excluding this variable.
Plaintiffs argue that the job level measure is tainted in that the defen–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
34 Cf. DOUGLAS ADAMS, THE HITCHHIKER’S GUIDE TO THE GALAXY 179–80 (1979)
(“‘The Answer to the Great Question . . . Of Life, the Universe and Everything . . . Is . . . Fortytwo,’ said Deep Thought . . . .”).
546
HARVARD LAW REVIEW
[Vol. 122:533
dant is assigning women to lower-level (and thus lower-paying) jobs on
the basis of gender, and thus including job level in a regression “controls away” discrimination. After disputing the allegations of taint, defendants respond that including job level improves the explanatory
power (the “fit”) of the model. The statistical significance of the gender regression coefficient (βG) may depend on whether the job level
variable is included. If the job level variable is included, the gender
coefficient is not significant, while if the job level variable is excluded,
the gender coefficient is significant. In other words, the “answer” of
the study for the purposes of litigation depends on whether the job
level variable is included. The question of whether to include job level
(or other potentially tainted variables, such as performance evaluations
or disciplinary action)35 can even be case-dispositive, and the situation
has arisen so often in employment discrimination cases that it has become the subject of its own mini-literature.36
Meanwhile, courts have struggled. Some have stumbled upon what
is part of the right answer, that is, that job level (or another similar
variable) should be excluded if it is tainted by discrimination.37 But
they have allocated the burden of proving or disproving taint inconsistently.38 Other courts have focused on a debate between “human capital” and “establishment-oriented” theories of labor economics, which is
said to inform this question.39 Further, at least one prominent commentator argues that concededly tainted variables should be included
in regressions despite the fact that they represent the mechanism for a
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
35 See, e.g., Segar v. Smith, 738 F.2d 1249, 1276 (D.C. Cir. 1984) (approving plaintiffs’ study’s
decision to disregard “specialized experience” requirement for promotion as too subjective);
CONNOLLY ET AL., supra note 12, app. D at A-89, A-97.
36 See, e.g., Srijati Ananda & Kevin Gilmartin, Inclusion of Potentially Tainted Variables in
Regression Analyses for Employment Discrimination Cases, 13 INDUS. REL. L.J. 121 (1992); Walter Fogel, Class Pay Discrimination and Multiple Regression Proofs, 65 NEB. L. REV. 289, 303
(1986); Ramona L. Paetzold, Multicollinearity and the Use of Regression Analyses in Discrimination Litigation, 10 BEHAV. SCI. & L. 207 (1992); cf. Finkelstein, supra note 2, at 750–51 (discussing
a case in which the court applied a regression analysis to determine that scores on a test used to
screen job applicants were poorly correlated with later job performance).
37 E.g., Ottaviani v. State Univ. of N.Y. at New Paltz, 875 F.2d 365, 374–75 (2d Cir. 1989);
Craik v. Minn. State Univ. Bd., 731 F.2d 465, 479–80 (8th Cir. 1984); Sobel v. Yeshiva Univ., 566
F. Supp. 1166, 1180–81 (S.D.N.Y. 1983), rev’d and remanded for reconsideration, 797 F.2d 1478
(2d Cir. 1986); Trout v. Hidalgo, 517 F. Supp. 873, 886 n.47 (D.D.C. 1981); see also PAETZOLD &
WILLBORN, supra note 12, ch. 6, § 6.13, at 34–35 & nn.3–4 (collecting cases).
38 See Ananda & Gilmartin, supra note 36, at 143–46 (collecting cases). Compare, e.g., Trout,
517 F. Supp. at 886 n.47 (burden on defendant to prove lack of taint), with Coates v. Johnson &
Johnson, 756 F.2d 524, 544 (7th Cir. 1985) (burden on plaintiffs to prove taint).
39 A nice summary for lawyers of these two theories, together with an application of both to
the tainted variables question, appears in Fogel, supra note 36, at 304–07. See also Vuyanich v.
Republic Nat’l Bank of Dallas, 505 F. Supp. 224, 266 (N.D. Tex. 1980) (discussing human capital
theory), vacated, 723 F.2d 1195 (5th Cir. 1984).
2008]
CAUSAL INFERENCE
547
defendant firm’s discrimination, if they improve fit and if a specific,
technical circumstance is present.40
The job level issue in the employment discrimination context is
part of a larger debate in the legal community about potentially
tainted variables, which is itself part of a wider discussion on which
variables to include on the right-hand side of a regression equation (or
other statistical model), which in turn is a subset of a broader debate
on identifying variables for use in a statistical model generally. An example of a dispute at the lowest level of generality occurred recently
on the subject of whether affirmative action programs at law schools
cause lower bar passage rates for African American students. Here,
the allegedly tainted variable was lower grade point average, the value
of which is determined after the application of an affirmative action
program.41
Stepping back from the question of tainted variables, the question
of which variables to include on the right-hand side of a regression
equation is ubiquitous but is especially problematic in the civil rights
litigation context. In the vote dilution context, for example, Justice
Thomas has suggested that regressions used to assess whether voting
patterns in a jurisdiction are racially polarized should include potentially explanatory variables other than race.42 Theoretically, such
variables might include party affiliation or educational differences.
Similarly, returning to the employment context, how should analysts
treat variables such as gender in a race discrimination case? On the
one hand, it is unlawful for an employer to make most employment
decisions even partially on the basis of gender; on the other hand, gender may be a powerful explanatory variable (even in the absence of sex
discrimination) for the dependent variable being studied, and a lawsuit
alleging only race discrimination cannot result in an award of compensation to a class of women.
Finally, variable selection issues are not limited to the right-hand
side of a regression equation but extend to the identification of the dependent variable (on the left-hand side, the “response”). An example
of the confusion existing in this area is the debate over “reverse regres–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
40 Paetzold, supra note 36, at 227. The specific technical circumstance is lack of multicollinearity. Speaking non-technically, multicollinearity occurs when two explanatory variables contain
the same information; an extreme example is including both time in years and time in weeks as
explanatory variables. See FRED L. RAMSEY & DANIEL W. SCHAFER, THE STATISTICAL
SLEUTH: A COURSE IN METHODS OF DATA ANALYSIS 347 (2d ed. 2002). Multicollinearity can
cause estimates of regression coefficients to have a great deal of uncertainty in them.
As discussed at p. 564, infra, Paetzold is wrong here. The issue is not multicollinearity but
statistical bias, particularly post-treatment adjustment bias.
41 Compare Richard H. Sander, A Systemic Analysis of Affirmative Action in American Law
Schools, 57 STAN. L. REV. 367, 442–45 (2004), with Ho, supra note 20, at 2002–03.
42 Holder v. Hall, 512 U.S. 874, 904 & n.13 (1994) (Thomas, J., concurring in the judgment).
548
HARVARD LAW REVIEW
[Vol. 122:533
sion” in employment discrimination cases. Instinctively, most analysts
take salary, promotion success, hiring success, and similar variables to
be the response in regression equations, and they put observable qualification measures (for example, years worked, years of education, and
test scores) on the right-hand side. But a group of academics has proposed another procedure, namely, considering a qualification measure
as the response and making salary, for example, an independent variable.43 In other words, one predicts qualifications from salary instead
of predicting salary from qualifications. Advocates of reverse regression argue that before adjudicating a case, courts must choose between
asking (i) whether the salaries for men and women of equal qualifications should be the same, or (ii) whether the qualifications of men and
women of equal salaries should be the same. And if the focus of
causal analysis in civil rights litigation is on regression coefficients, reverse regression has surface appeal because, depending on the assumptions one makes, it might be possible to use either “forward” or “reverse” regression to estimate coefficients. Alas, the choice of whether
to predict salary from qualifications or qualifications from salary can
be litigation-dispositive because reverse regression invariably provides
less “evidence” of discrimination than do more traditional models.44
(b) Constant Additive Effect, Problems with the Legally “Prohibited” Variable. — In the salary regression equation example above, is it
reasonable to believe that the effect on salary of being a woman is the
same for a plaintiff with a high job level, perhaps an upper-level manager, as it is for a plaintiff with a low job level, perhaps a file clerk?
To clarify, suppose the former’s current salary is $300,000, while the
latter’s is $15,000, and the regression equation produces an estimated
βG (found to be statistically significant) of -$1500.00. Is the -$1500.00
figure plausible for either plaintiff? With respect to the upper level
manager, the figure is so small relative to overall salary that one might
question whether even a misogynistic firm would bother to discriminate at this level. With respect to the file clerk, the figure is so large
relative to salary that one might question whether a misogynistic but
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
43 See, e.g., Conway & Roberts, supra note 6, at 114–29; Delores A. Conway & Harry V. Roberts, Reverse Regression, Fairness, and Employment Discrimination, 1 J. BUS. & ECON. STAT. 75,
78–79 (1983) [hereinafter Conway & Roberts, Reverse Regression]; Arthur S. Goldberger, Comment, in STATISTICS AND THE LAW, supra note 6, at 182, 182 [hereinafter Goldberger, Comment]; Stephan Michelson, Comment, in STATISTICS AND THE LAW, supra note 6, at 169, 174–
77. Reverse regression, unfortunately, refuses to die. See, e.g., United States v. Delaware, 93 Fair
Empl. Prac. Cas. (BNA) 1248, 1266–69 (D. Del. 2004). The reference to the reverse regression
debate is not intended to lend credence to the technique, which suffers from a host of statistical
problems, only some of which have been explored in the literature thus far. See, e.g., Arthur S.
Goldberger, Redirecting Reverse Regression, 2 J. BUS. & ECON. STAT. 114 (1984); John J. Miller,
Some Observations, a Suggestion, and Some Comments on the Conway-Roberts Article, 2 J. BUS.
& ECON. STAT. 123 (1984).
44 Goldberger, Comment, supra note 43, at 182.
2008]
CAUSAL INFERENCE
549
minimally rational firm could hope to avoid detection of discrimination. Yet the assumption of the Simple Model above is that the effect
on salary of being a woman is the same for both the manager and
the file clerk, that is, that the effect at issue is constant across all
individuals.
So what? The problem is that this assumption of a constant, additive effect for all women can mask discrimination when it is present
and can give a false impression of discrimination when none exists.
For an example of how the former might occur, imagine a sexist but
somewhat economically rational firm assigning salaries to men and
women occupying similar positions. Some women are so productive
that the firm, despite its sexism, wants to retain their services, so it
provides salaries equal to those of similarly qualified men (that is, the
effect here is $0). Some men are so unproductive that the firm wants
to ease them out, so it provides salaries designed to induce them to
look for other jobs; it also, of course, does the same to unproductive
women (that is, the effect here is also $0). In the medium range of
productivity, however, the firm feels able to express its sexist nature,
providing salaries to members of both genders designed to keep most
of them working but still paying women less than equally productive
men (that is, the effect is $X). In such a situation, the simple regression model with a constant additive effect will produce one estimate of
the effect on salary of being a woman, a single number for all women,
that constitutes a complicated kind of average of (a) the $0 for the
highly productive, (b) the $X for those of middling productivity, and
(c) the $0 for the unproductive. Recalling that there is some uncertainty in all statistical estimation, the averaging of the $0, $X, and $0
amounts might well be statistically indistinguishable from an overall
$0 differential (that is, not statistically significant).45
From the point of view of an economist attempting to understand
how a firm functions, as opposed to an expert witness attempting to
produce an analysis understandable and useful to judges and juries,
relaxing this constancy assumption is easy. Returning to the running
salary discrimination example, and recalling the Simple Model, above,
one potential fix is to create a new variable by multiplying gender and
job level together; the resulting model is still readable.
Revised Model 2:
Salary = β0 + βG*(gender) + βJL*(job level) + βYE*(years educ.)
+ βYW*(years worked) + βGJL*(gender)*(job level) + error
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
45
See infra note 50 and accompanying text.
550
HARVARD LAW REVIEW
[Vol. 122:533
The new variable is the (gender)*(job level) term (and the subscript
“GJL” stands for gender-job-level). Because of it, the effect on salary
associated with being female now depends on job level. For a
woman46 at job level 1 (whatever that means), the effect is βG*1 +
βGJL*1, while for a woman at job level 2, the effect is βG*1 + βGJL*2.
In terms of usefulness in a discrimination case, however, the model
has grown complicated. In the original model, we could pose a single,
simple question: is the estimate for βG, the only coefficient associated
with gender, negative (and statistically significant)? In the revised
model, we now have two coefficients associated with gender, βG and
βGJL; upon which should the analysis focus? What if one is statistically
significant and the other is not? What if one is negative (and thus favorable to the litigation proof of a female class) while the other is positive (and thus adverse to such proof)?47 Notice that the problem
grows as we add more gender-specific terms to the regression, such as
the multiplication of gender and years worked, gender and years of
education, etc.
Experimental, experiential, and judicial evidence all suggest that
the problems articulated above are, unfortunately, not hypothetical.
An example of experimental evidence is a recently published paper in
which two authors studied the callback rate for (fictitious) resumes
randomized to have either African American–sounding (such as Lakisha) or Caucasian-sounding (such as Emily) names.48 The two authors
found that an African American–sounding name lowered the callback
rate, but they also found that the size of the negative effect varied according to whether the fictional resume was “high” or “low” quality.49
In other words, the name-race effect was not constant. On the experiential front, a pioneering expert witness in employment discrimination
cases has written, “The simple regression model assuming a constant
shift [β] is unrealistic. . . . Virtually all highly (low) qualified employees
will (not) be promoted. It is in the middle range of qualification levels
that discrimination is most likely to appear.”50 For evidence of confusion in the judiciary, see Penk v. Oregon State Board of Higher Educa–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
46
47
Recall that the gender variable was 1 for women and 0 for men.
See Conway & Roberts, Reverse Regression, supra note 43, at 78 (“If [interaction effects] are
not slight, the problems of interpretation become severe not only for statisticians but for judges.
Pronounced interaction effects could lead to data patterns in which any concept of simple discrimination or unfairness against females [or] minorities is lost.”).
48 Marianne Bertrand & Sendhil Mullainathan, Are Emily and Greg More Employable than
Lakisha and Jamal? A Field Experiment on Labor Market Discrimination, 94 AM. ECON. REV.
991 (2004).
49 Id. at 992 (“[H]aving a higher-quality resume has a smaller [positive] effect [on callback rate]
for African-Americans.”); id. at 1000–01 (“Most strikingly, African-Americans experience much
less of an increase in callback rate for similar improvements in their credentials.”).
50 Joseph L. Gastwirth, Comment, 3 STAT. SCI. 175, 176 (1988).
2008]
CAUSAL INFERENCE
551
tion,51 where a court downgraded the evidentiary weight of the plaintiffs’ regression coefficients because of the inclusion of a variable
analogous to (gender)*(job level) in the example.52
(c) Problems with Variables Other than the Legally “Prohibited”
Variable. — The previous section focused on the difficulties that arise
when the effect associated with gender is not the same at all job levels.
But this problem might also arise with other variables as well; for example, the effect associated with an additional year of education might
not be the same at all job levels. Or, as outlined above in the discussion of model fitting, it may be that salary tends to increase with years
worked only up to a point, and that after that point, additional years
worked are associated with a lower salary.
One might think that if we are focused on the effect associated
with gender, problems with the other variables would matter less.
Alas, such is not the case. To see why, return to the salary discrimination running example, and consider Figure 1, which plots the salaries
of 100 employees (50 male, 50 female) on the Y-axis against the years
worked on the X-axis. Because salary is on the Y-axis, higher is
“good” in the sense of more money. It appears that men, represented
by the black squares, do better than women, represented by the grey
diamonds, for any reasonable range of years worked. Take, for example, the period from five to ten years worked, as marked by the two
vertical, dotted lines. In this range, most of the squares are above
most of the diamonds, meaning men tend to have higher salaries than
women when both have worked between five and ten years.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
51
52
48 Fair Empl. Prac. Cas. (BNA) 1724 (D. Or. 1985).
Id. at 1849–51.
552
HARVARD LAW REVIEW
[Vol. 122:533
FIGURE 1. MALE AND FEMALE SALARIES
VERSUS YEARS WORKED
100
Salary (in thousands)
90
80
70
60
Women
Men
50
40
0
5
10
15
20
Years Worked
Salaries Versus Years Worked, Male and Female, for a Hypothetical Firm
(Simulated Data): At any given level of Years Worked, it appears that men
generally have higher salaries than women. But in a simple regression
approach, represented by the two parallel lines, the line for women is
above the line for men, suggesting discrimination against men! Under this
framework, the amount of “damages” for each man equals the (constant)
distance between the two lines. The explanation for the simple regression
model’s incorrect result is (i) a curve in the data (at least for men) that is
not reflected in the regression equation, and (ii) the fact that men have a
wider range of years worked than women.
What is the problem? The problem is that if analysts use the typical model in such cases, they would conclude that there is statistically
significant salary discrimination in favor of women, that is, that men
should be suing the firm. In other words, this is another instance in
which the usual regression gives the wrong answer. The “usual regression” here means the following, reduced version of the Simple Model,
above:
Salary = β0 + βG*(gender) + βYW*(years worked) + error
Geometrically, this model corresponds to two parallel lines, one for
men, the other for women. The lines are the “best fitting”; stated
loosely, they are the lines that stay as close to as many points as possible. Both lines appear in Figure 1, above, and there, one can see that
the dotted line for women is above the solid line for men. Up means
higher salary, so the fact that the female line is above the male line
2008]
CAUSAL INFERENCE
553
means that the regression model “thinks” that, given years worked,
women are earning more than men. How did that happen? Notice
that at the far left and far right parts of Figure 1, there are a fair
number of squares but no diamonds. Notice also that most of these
squares at the far left and right are below the points in the middle of
the graph. The male line has to account for those lower-salaried men
near the edges of the graph. In other words, the lower-earning men at
the right- and left-hand portions of the plot “pull” the male line down.
But there are few if any women who have worked fewer than five or
more than fifteen years,53 so there is nothing to “pull” the female line
down (assuming, which one should not, that female salaries would fall
off in a pattern similar to the men). In the big picture, the analyst has
tried to fit a single, straight line to all the male data, when the male
pattern is in fact curved. That resulted in the wrong answer for the
question of interest, which was whether the firm discriminates against
women (and, incidentally, a statistically significant wrong answer).
Again, one might ask, what is the problem? If analysts can see this
sort of thing in a graph, they will adjust the model. There are two responses. The first is that when analysts try to include additional variables such as years of education and job level, they can no longer plot
all variables on a flat piece of paper, or even on a hologram. Unfortunately, the world we live in has only three spatial dimensions, not
enough to visualize non-trivial data. In many (perhaps most) cases of
interest, then, an analyst would not be able to see this problem. The
second issue is that even with only one variable (as in the example
above), this problem can be difficult to see. The upside-down U-curve
in the above graph was constructed to make the problem obvious, but
it will not ordinarily be so clear.54
Finally, notice that in Figure 1 matters might not have gone amiss
if the analyst had confined herself to the range of years worked for
which there existed an appropriate number of both male and female
observations, say between five and fourteen years. Part II returns to
this thought.
(d) Differing Variable Types. — In the running gender example
used thus far, the variable on the left-hand side of the regression equation is salary. Suppose salary takes on values from, say, $15,000 to
$300,000. Statistically, regression depends on the assumption that sal–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
53 There are several reasons, some gender-neutral and some not, why men and women might
have different patterns of a variable such as years worked. For example, a years-worked measurement might reflect maternity leave time. Educational achievement patterns might reflect early
childhood socialization that steered women away from certain degrees or fields.
54 It is likely that this problem infects employment data, as it does data from so many other
fields. See Kevin M. Murphy & Finis Welch, Empirical Age-Earnings Profiles, 8 J. LAB. ECON.
202 (1990).
554
HARVARD LAW REVIEW
[Vol. 122:533
ary can be any value at all, but both negative values and unreasonably
high figures have such low probability that analysts typically do not
bother with this annoyance. Sometimes, however, analysts must deal
with variables where the annoyance cannot be ignored. For example,
suppose that instead of salary, we want to analyze for gender bias a set
of promotions recently awarded by a firm. Here, the variable of promotion success may be only two values, 0 (failure) and 1 (success).
Treating a 0–1 variable as though it could take on any value at all often goes badly.
Again, the problem is not a shortage of statistical techniques; the
terms “logistic regression” or “probit regression” may be familiar. Both
are methods that focus on the probability that an applicant receives a
promotion, and they often work well for 0-1 response variables. These
models may be conceptualized as follows.
Promotion probability = Fancy Math(β0 + βG*(gender) + βJL*(job
level) + βYE*(years educ.) + βYW*(years
work))
One can again ask the question of whether βG is non-zero and statistically significant. Luckily, for many models of this type, it is still the
case that a negative estimate for βG is good for the female plaintiffs’
lawsuit. But the fancy math55 involved in the above model is a substantial drawback in terms of both intuitive understanding of the
model and its usefulness in litigation; βG is no longer the salary addition or subtraction associated with being a woman, as it was with the
Simple Model, so there is no intuitively obvious measure of the relevant effect. In logistic regression, for example, βG would represent the
logarithm of an odds ratio, not an intuitive concept. Courts and commentators have disparaged or even rejected alternative models of discrimination for precisely these reasons.56
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
55 See RAMSEY & SCHAFER, supra note 40, at 580–637, for an accessible explanation of the
principles involved in logistic and probit regression.
56 See, e.g., Craik v. Minn. State Univ. Bd., 731 F.2d 465, 476 n.14 (8th Cir. 1984); Penk, 48
Fair Empl. Prac. Cas. (BNA) at 1860 (“Because the regression analysis is a logistic regression
analysis, the sex coefficients in [an expert report] represent the logarithm of the ratio of the odds
of a man being in the higher group to the odds of a woman being in the higher group after all
qualifications represented as variables in the equation are accounted for. These sex coefficients
are not readily understandable, and plaintiffs did not make them any more so in their rebuttal
report.” (citations omitted)); BALDUS ET AL., supra note 7, at 70 n.34 (“The key measure[] of the
impact of a given variable [is] the logistic-regression coefficient, which is difficult to interpret in
its own right . . . .”); Campbell, supra note 24, at 1313 n.43 (An alternative technique “would not
directly yield dollar estimates of different treatment, however, so continued use of regression models is preferable.”). As before, this particular issue can be solved, even within the regression context, if one is willing to abandon a focus on regression coefficients. See, e.g., Joseph L. Gastwirth
& Samuel W. Greenhouse, Biostatistical Concepts and Methods in the Legal Setting, 14 STAT.
2008]
CAUSAL INFERENCE
555
4. One Regression, or Two? — In an effort to address some of the
issues identified in the previous sections, some expert witnesses and
commentators have proposed running separate regressions, one for
members of the plaintiff class and one for those outside the protected
group. The idea, which is absolutely correct, is that a single regression
with a constant additive treatment effect is a blunt instrument, a
model ordinarily too clumsy to capture the nuances of the data, and
that running separate regressions for protected and unprotected groups
allows for greater flexibility.57 Continuing with the example of a gender discrimination lawsuit focusing on salary, an analyst might state
the following two equations:
SalaryW = βW0 + βWJL*(job level) + βWYE*(years educ.) + βWYW*(years
work) + errorW
SalaryM = βM0 + βMJL*(job level) + βMYE*(years educ.) + βMYW*(years
work) + errorM
The symbols W and M refer, of course, to women and men; note that
there is no βG*(gender) term in either equation because, for each equation, there is only one gender.
An analyst following the two-regression approach divides the data
into a male dataset and a female dataset, then uses the now-standard
regression techniques twice, once for each dataset. Now what? For an
analyst in a quantitative culture such as ours that fixates on regression
coefficients, one initial instinct is to compare the coefficients from the
two regressions. Suppose the analyst finds that the estimate for βWYE,
which corresponds to the amount associated with a one-year increase
in education level among women, is lower than the estimated βMYE,
which corresponds to the amount associated with a one-year increase
in education level for men. Assuming this difference is statistically
significant,58 the analyst might consider it evidence that the defendant
firm is valuing years of education less for women than for men, which
might support an inference of gender discrimination.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
MED. 1641, 1646–51 (1995). But the fact that the players in civil rights litigation have not
adopted such solutions suggests something about the way that regression, when used without reference to a causal inference framework, leads to an unhealthy focus on coefficients.
57 See, e.g., Ottaviani v. State Univ. of N.Y. at New Paltz, 875 F.2d 365, 374 (2d Cir. 1989);
PAETZOLD & WILLBORN, supra note 12, ch. 6, § 6.10.
58 I have not personally run across a proposal to test for significance in this setting, but constructing one does not seem too difficult. One could, for example, examine the posterior distributions of the two coefficients after fitting the regression models with Bayesian techniques. See, e.g.,
ANDREW GELMAN ET AL., BAYESIAN DATA ANALYSIS 353–85 (2d. ed. 2004).
556
HARVARD LAW REVIEW
[Vol. 122:533
As courts and commentators have recognized,59 however, the problem with this focus on regression coefficients is that it often yields contradictory results. Suppose the analyst finds that βWYE is significantly
lower than βMYE, but simultaneously finds that the estimate for βWYW is
significantly higher than βMYW. In other words, it appears that the firm
values additional years of education for men greater than for women,
but simultaneously values additional years worked for women greater
than for men. Perhaps both men and women should sue the firm.
There are methods to address this issue even wholly within the regression context,60 but they require the abandonment of a focus on regression coefficients.
C. The Fundamental Problem: Absence of a Causal Framework
In the gender discrimination context, should experts, litigators, juries, and judges care whether a defendant firm pays men more than
equally qualified women, or whether men are less qualified than
equally salaried women, as the debate over reverse regression might
suggest? Should the court be forced to pick between human capital
and establishment-oriented theories of labor economics, as the debate
over including a job level variable in a regression might suggest? The
first inquiry is bizarre because it seems as though a defendant is liable
in either case, while the second seems removed from the legal task of
assessing whether discrimination is present. In the voting rights context, should courts, in attempting some sort of causal inquiry, focus on
the races of candidates, the races of voters, or both? What sort of evidence from statistical models, regression or otherwise, would convince
courts that something causally connected to race is driving voting
patterns?
Few would doubt that litigation often turns on the esoterica of specialized fields, and when legal questions depend on exploring the unfamiliar, courtroom actors learn what they must. An oft-cited danger
in such situations is that generalist judges and lay juries will fail to
understand what they see and hear, even with material presented by
able litigators and articulate, chaste experts. But in my view, an underappreciated danger in such contexts is that of undue immersion
such that legal arbiters or triers of fact lose track of what it is that they
need to decide, of the questions they need to answer.
In the civil rights context, when experts analyze data containing information about repetitive events, such as imposition of capital pun–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
59 See, e.g., Vuyanich v. Republic Nat’l Bank of Dallas, 505 F. Supp. 224, 278 (N.D. Tex.
1980), vacated, 723 F.2d 1195 (5th Cir. 1984); PAETZOLD & WILLBORN, supra note 12, ch. 6,
§ 6.10, at 27–28 & n.7.
60 See infra note 83 and accompanying text.
2008]
CAUSAL INFERENCE
557
ishment, employment decisions, and elections, what’s the question?
The discussion above has demonstrated that, all too often, the quantitative question has been the significance of regression coefficients
(usually a single regression coefficient). Is that the right question?
The argument in favor of coefficients is that they capture a particular
kind of conditional association, and that a legal decisionmaker can,
perhaps in conjunction with other evidence, use the existence of the
association to support an inference of causation. But the previous sections have demonstrated that this alone will not suffice because regression is a mathematical and statistical technique, not a causal inference
framework. The ill-posed questions discussed in section I.B.2 demonstrate that this technique is insufficiently rich to allow (or insist on)
definition of a causal effect and articulation of a sharp causal question
of interest. The modeling issues that arise, discussed in sections I.B.3
and I.B.4, suggest problems regarding the linking of the causal question to a quantity that becomes the target of estimation. And regression’s lack of guidance on the choices one must make during implementation raises the possibility that bias of the analyst, discussed in
section I.B.1, will overwhelm any information the data contain.
The result is that while regression might capture a particular kind
of conditional association, without a theory to specify whether a particular association is legally relevant, analysts lack a principled basis to
decide something as fundamental as the “outcome of interest,” much
less the comparatively mundane matters of what variables to include
in a regression, in what form, and whether to use one regression or
two. We should tolerate this situation only if we must. In other
words, we should continue using regression in the way that we have
used it thus far only if we cannot come up with something better.
II. POTENTIAL OUTCOMES:
DEFINING CAUSAL EFFECTS AND BEYOND
This Part articulates the potential outcomes understanding of
causal inference, a general framework that defines a causal effect and
provides guidance on how to use data to draw inferences about the effects associated with an identified cause.61 A thumbnail sketch of the
framework is as follows. An analyst begins by identifying “units” that
can be subjected to different “treatments” and by picking an “outcome”
variable. It may help to think of hospital patients taking a pill or not
taking a pill when the hospital is measuring blood pressure levels. A
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
61 For technical primers on potential outcomes, see Guido W. Imbens & Donald B. Rubin,
Rubin Causal Model, in NEW PALGRAVE DICTIONARY OF ECONOMICS 255–60 (Steven N.
Durlauf & Lawrence E. Blume eds., 2d ed. 2008); and Paul W. Holland, Rejoinder, 81 J. AM.
STAT. ASS’N 968 (1986), in addition to the sources cited supra note 17.
558
HARVARD LAW REVIEW
[Vol. 122:533
causal effect is defined as a comparison between (i) the value a particular unit would have for the outcome variable after the application
of a treatment at a given time, and (ii) the value that same unit would
have for the same outcome variable after application of some alternative treatment at that same given time. Unfortunately, for any single
unit (call it “Unit A”), an analyst can only apply one treatment at a
given time, so only the outcome associated with the treatment actually
received can be observed. Thus, the analyst must search for a way to
fill in Unit A’s missing (counterfactual) outcome value, the one corresponding to the alternative treatment. One way to accomplish this
task is to find a unit (call it “Unit B”) that received the alternative
treatment; Unit B can donate its value as Unit A’s missing counterfactual outcome. For such a donation to make sense, Unit A and Unit B
must be similar to one another; if they are dissimilar, any difference in
their potential outcome values could be due to dissimilarity rather than
the differing treatments to which they were exposed. The best way to
assure similarity over a set of units is via a randomized experiment because randomization assures that all variables other than the treatment
are statistically similar; thus, in causal inference, a randomized experiment is the gold standard. In the civil rights context, a randomized experiment is impossible, so analysts should use observational
data to recreate the circumstances of a randomized experiment to the
extent possible. At bottom, this process of recreating a randomized
experiment typically involves searching for units that are similar to
one another in all observable ways (as measured before treatment, not
after) except treatment and ignoring the data from units that have no
counterparts.
A. Primitive Concepts:
Treatment, Units, and the Fundamental Problem
A potential outcomes understanding of causation begins with identification of four fundamental concepts: units, a treatment, the timing
of treatment assignment, and an outcome of interest. The units are the
things upon which a treatment operates. The nature of the causal inquiry is to discern how the treatment affects the value of a specified
outcome. Assume for the moment that there are precisely two forms of
treatment, denoted “M” and “F” (for perceived male and perceived female).62 There are then two potential outcomes for a single unit: one
would occur if the unit receives treatment M at a particular moment,
while the other would occur if the unit receives treatment F at the
same moment. The causal effect for a single unit involves some com–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
62 See infra notes 93–97 and accompanying text for a discussion of the use of “perceived” as
opposed to “actual” gender here.
2008]
CAUSAL INFERENCE
559
parison of the value of the outcome under treatment M to the value of
the outcome under treatment F; one typical comparison is the difference between the two quantities (that is, one minus the other). The
role of time in these definitions is critical. The analyst must specify
the timing of treatment assignment. The very concept of a “treatment”
refers to something applied at a particular time.
For a single unit called “Unit 1” that receives treatment M, the concepts in these definitions can be represented effectively in the following
chart, which also introduces some abbreviations (“xx” if something is
observed, “??” if something is not observed). The chart will grow horizontally and vertically as additional concepts are defined.
TABLE 1. CAUSAL INFERENCE TABLE FOR A SINGLE UNIT,
ASSUMING EXACTLY ONE FORM OF TREATMENT
Unit #
Treatment
1
M
Outcome (if treatment
M received)
some observed value = xx
Outcome (if treatment F
received)
missing/unobserved = ??
Again, the causal effect of the treatment on Unit 1 is some kind of
comparison between the outcome under treatment M and the outcome
under treatment F (at the same moment in time), often Outcome(M)
subtracted from Outcome(F). Immediately, what has been called the
“fundamental problem of causal inference”63 becomes evident: it is not
possible to observe directly the causal effect for any single unit. Even
in the simplest possible case — a single unit and a treatment that can
take on only two values64 — one of the potential outcomes is missing.
Assumptions and more information are needed for quantitative techniques to become relevant.
To make these concepts more concrete, return to the running example of a salary discrimination case focusing on gender and define
the treatment as being perceived as male, “M,” versus being perceived
as female, “F,” at the same moment in time.65 Then Table 1, above,
for a unit assigned treatment F looks like the following.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
63
64
Holland, supra note 17, at 947.
If the treatment could take on, say, four values, the chart would have a potential outcome
column for each treatment value (and thus four potential outcome columns).
65 A critical assumption here (stated somewhat simplistically) is that there is only one form of
“M” and one form of “F.” This assumption would be violated if, for example, it were nonsensical
to speak about a unit’s being perceived as male without discussing how “manly” the unit is. In
that case, the analyst would need to divide Table 2’s “Salary(M)” column into separate columns
labeled, say, “Salary(Very Manly)” and “Salary(Moderately Manly).”
The assumption that there is only one form of each treatment, together with the requirement of replication across different units, may be seen as the potential outcomes framework’s defense against the theoretical and philosophical attack that all forms of counterfactual reasoning
560
HARVARD LAW REVIEW
[Vol. 122:533
TABLE 2. CAUSAL INFERENCE TABLE FOR A SINGLE UNIT,
SALARY DISCRIMINATION, GENDER
Unit #
Treatment
Salary(F)
Salary(M)
1
F
some observed value = “xx”
missing/unobserved = ??
At this point, those with minimal training in tort law may feel like
they are in familiar territory. Tables 1 and 2 can be thought of as representations of but-for causation, the most basic causation concept in
law.66 Individual tort cases typically involve a comparison of what actually did occur with what would have occurred had some specific act
or omission not taken place. In such a case, one can think of the
treatment as either the act or omission or the absence of the act or
omission. The trier of fact in individual cases uses the evidence presented at trial and its own understanding of how the world works to
fill in the missing potential outcome and, subject to other relevant legal principles, decides the case accordingly. The critical concept implicit in but-for causation as applied in ordinary tort cases, but often
lost in the civil rights context, is time.67 In tort cases, the focus on a
particular act or omission typically makes it easy to identify a specific
time at which the treatment (the allegedly tortious act or omission) occurred. In the salary discrimination example in Table 2, however, the
problem can be more difficult, as discussed below.
B. Additional Units and the Non-Interference Assumption
As noted above, the fundamental problem of causal inference is
that for any individual unit at least one of the potential outcomes is
missing, and this problem makes additional information and assumptions necessary before statistical techniques become useful. Ordinarily,
the additional information comes in the form of observations of other
units, and an oft-made assumption is that the units do not interfere
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
are impossibly non-specific and inherently value-laden. See, e.g., Robert N. Strassfeld, If . . . :
Counterfactuals in the Law, 60 GEO. WASH. L. REV. 339 (1992) (supporting the use of counterfactuals); see also H.L.A. HART & TONY HONORÉ, CAUSATION IN THE LAW (2d ed. 1985). In the
potential outcomes framework, we imagine each unit’s situation to be exactly as it was except for
a single, sharply defined change as of a particular moment in time. Replication assures that puzzles posed by, for example, an unusual novus actus interveniens (superseding event), see HART &
HONORÉ, supra, at xlv, should not affect overall estimates because such bizarre chains of events
typically do not happen repeatedly.
66 See Strassfeld, supra note 65, at 345–46.
67 See Gastwirth, supra note 50, at 176 (“The failure of some courts to realize the importance
of time and the need for exchangeability, in my opinion, has led to far more ‘legal mischief’ than
some of the technical issues concerning regression analysis that have dominated the statistical literature.” (citations omitted)).
2008]
561
CAUSAL INFERENCE
with one another.68 In other words, with a set of multiple observations, the potential outcomes for one unit do not depend on the treatment a different unit receives.69 If so, then an expanded version of
Table 2, above, for a dataset with three units receiving treatment F
(perceived as female) and five receiving treatment M (perceived as
male) appears below.70
TABLE 3. CAUSAL INFERENCE TABLE FOR MULTIPLE UNITS,
SALARY DISCRIMINATION, GENDER, ASSUMING NONINTERFERENCE AND EXACTLY ONE FORM OF TREATMENT
Unit #
1
2
3
4
5
6
7
8
Treatment
F
F
F
M
M
M
M
M
Salary(M)
??
??
??
47,100
98,900
97,700
95,300
81,300
Salary(F)
91,200
92,400
94,100
??
??
??
??
??
Again, notice that half of the information we would like to have is
missing. Specifically, what is missing is the counterfactual for each
unit, that is, the outcome under the treatment that was not applied to
that unit. To proceed, there must be some way to fill in all of the missing values from Table 3.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
68 If the units do interfere with each other, then the treatment applied to one unit might affect
the other’s potential outcomes. For example, if a firm is applying a hard gender quota to hiring
decisions, the fact that Unit 1, a person who in fact received treatment perceived male, is hired
may decrease the probability that Unit 2, who also in fact received treatment perceived male, is
hired. Thus, we need to know the treatment both units received in order to define the potential
outcomes of either. Pictorially, the chart for two units, where both in fact received treatment perceived male (T = M), but where either could have received the alternative treatment perceived
female (T = F), and where O stands for “outcome” (here, whether the unit is hired), would be as
follows.
Unit #
Treatment
O(T1=M, T2=M)
O(T1=M, T2=F)
O(T1=F, T2=M)
O(T1=F, T2=F)
1
M
xx
??
??
??
2
M
xx
??
??
??
69 In some situations, this non-interference assumption is hard to recognize and assess. For
example, if one attempts to draw causal inferences about judicial behavior from appellate decisions, and in doing so one treats cases as units, then the non-interference assumption means that
precedent has no relevance or force in judicial decisionmaking. See, e.g., Epstein et al., supra note
20, at 65–69. Litigators might find such an assumption unsettling.
70 These values, as well as those in subsequent tables in this section, come from some of the
artificial data graphed in Figure 1.
562
HARVARD LAW REVIEW
[Vol. 122:533
C. Donating Values: Filling in the Missing Counterfactuals
With the problem defined in this manner, one instinctive way to fill
in Unit 1’s missing salary under treatment M is to substitute an observed value from a unit that actually did receive treatment M, that is,
an observed salary from Units 4–8. Lacking a better idea, the analyst
might choose among Units 4–8 at random with equal probability (1/5)
and “impute” the chosen unit’s salary as Unit 1’s salary under treatment M (then do the same for Units 2–3). For Units 4–8, the analyst
does the opposite, that is, picks a unit from 1–3 with probability 1/3
and imputes the corresponding salary. In essence, observations “donate” observed outcomes to one another to fill in each unit’s missing
potential outcomes; one can also refer to “imputation” of potential outcomes. An example of one iteration of this procedure produces the following table, with imputed values in bold italic type.
TABLE 4. CAUSAL INFERENCE TABLE FOR MULTIPLE
UNITS, SALARY DISCRIMINATION, GENDER, VALUES
IMPUTED FOR MISSING POTENTIAL OUTCOMES
Unit #
1
2
3
4
5
6
7
8
Treatment
F
F
F
M
M
M
M
M
Salary(M)
81,300
47,100
95,300
47,100
98,900
97,700
95,300
81,300
Salary(F)
91,200
92,400
94,100
91,200
94,100
91,200
92,400
94,100
With the missing potential outcomes filled in, calculating any quantity of interest is straightforward. For example, to estimate the average difference in salary caused by the perceived gender treatment, an
analyst can calculate the average in the Salary(F) column minus the
average in the Salary(M) column.71
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
71 There are various ways to generate uncertainty measures. For example, an analyst can repeat this (random) imputation procedure a great many times, calculating the average of the Salary(F) column minus the average of the Salary(M) column each time, to obtain a distribution of
the possible values of average Salary(F) minus the average Salary(M).
The idea of donating values to fill in missing potential outcomes is the instinct behind socalled “audit studies” popularized in the law, primarily in the fair housing context. An unfortunate fact of life is that the researcher can only randomize which of a pair of testers visits a vendor,
or perhaps the order in which two testers visit the vendor, as opposed to randomizing a single
tester’s race or gender. That fact of life is the source of the instinct to make the testers resemble
each other in as many ways as possible except for race or gender.
2008]
CAUSAL INFERENCE
563
D. Randomization: Balance in Background Variables
From the description above, it should be clear this procedure will
produce misleading results without another critical assumption: the
units perceived male cannot be systematically different from the units
perceived female in some relevant way the analyst does not observe.72
If, for example, Units 1 and 8 differ in a salary-relevant way other
than the treatment given to each, then this business of using Unit 8’s
observed salary for what Unit 1’s salary would have been had Unit 1
been perceived male is a bad idea. Any causal effect that an analyst
attributes to the treatment might really be due instead to the other difference between Units 1 and 8.
The best way to assure73 that the units assigned M are not systematically different from the units assigned F is to assign the treatment
(M or F) randomly. This is what makes random assignment such a
powerful procedure in causal inference. Random assignment assures
that, in the absence of bad luck, units who receive one treatment
are not systematically different from those who receive the other
treatment.
More specifically, randomization assures that, with an important
set of exceptions discussed immediately below, any variable that might
affect the potential outcomes will look approximately the same for the
group assigned treatment M as it does for the group assigned treatment F. Here, “look approximately the same” means that the distribution of the variable will be the same in the M and the F groups, that
is, the same among men and women. In the salary discrimination
running example, if it were possible (see below) to randomize the perceived gender treatment, the randomization would assure that the pattern of years of education values for men would look roughly the same
as the pattern of years of education values for women. Because of this
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
The literature reporting the results of, assessing, and criticizing audit studies is vast. For
overviews and summaries, see, for example, A NATIONAL REPORT CARD ON DISCRIMINATION
IN AMERICA: THE ROLE OF TESTING (Michael Fix & Margery Austin Turner eds., 1998);
CLEAR AND CONVINCING EVIDENCE: MEASUREMENT OF DISCRIMINATION IN AMERICA
(Michael Fix & Raymond J. Struyk eds., 1993); and HARRY CROSS ET AL., URBAN INST., EMPLOYER HIRING PRACTICES: DIFFERENTIAL TREATMENT OF HISPANIC AND ANGLO JOB
SEEKERS (1990). For criticisms, see, for example, Bertrand & Mullainathan, supra note 48, at
993–94; and James J. Heckman, Detecting Discrimination, J. ECON. PERSP., Spring 1998, at 101,
104–11.
72 In more precise terms, the assumption is that the potential outcome vectors (the two salaries
for each unit in the example above) for the persons perceived as male (treatment M) are not systematically different from the potential outcome vectors for the persons perceived as female
(treatment F).
73 Technically, “assure” is too strong. When there is a sufficient number of units, random assignment makes systematic differences between treated and control units unlikely in a way analysts can quantify and handle.
564
HARVARD LAW REVIEW
[Vol. 122:533
similarity, something quantitative analysts call “balance,”74 it is
unlikely that any systematic difference in salary between men and
women is due to disparate lengths of time in school. Note that this
balance would occur even if no one measured the years of education
variable. That, again, is the power of randomization; it even balances
variables that analysts do not see.
The previous paragraph explained that randomization balances
variables that might have a role in determining the potential outcomes,
subject to an important set of exceptions: those variables that are
themselves affected by the treatment. This is as it should be, as analysts do not ordinarily want balance in variables affected by treatment.75 An example (standard in the quantitative literature) demonstrates why: Suppose an analyst is assessing the effect of smoking on
death, and suppose it were possible to randomize some people to
smoke and some to avoid smoking. Would the analyst expect randomization to assure that the percentage of people who contract lung
cancer be similar among smokers and non-smokers? The answer is
“No,” if incidence of lung cancer is itself affected by the treatment received (smoking or non-smoking). Nor should the analyst desire balance on this variable, particularly if lung cancer is a means by which
smoking (the treatment) induces death (the outcome of interest).
Speaking in commonly used terms that are dangerously unclear, an
analyst assessing the effect of smoking on death should not “control
away” the “effect” of lung cancer.76
Thus, in a randomized experiment, the gold standard in causal inference, a clear distinction exists between variables that cannot be affected by the treatment and variables that might be so affected. Randomization can, and should, balance the former. Randomization does
not, and should not, balance the latter. The importance of time, specifically the time at which treatment is assigned and applied, is again
evident. If a variable is measured before the assignment of treatment,
then that variable cannot be affected by treatment. If the variable is
measured after application of treatment, an analyst must think care–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
74 See, e.g., Paul R. Rosenbaum & Donald B. Rubin, Reducing Bias in Observational Studies
Using Subclassification on the Propensity Score, 79 J. AM. STAT. ASS’N 516, 517–19 (1984).
75 See Paul R. Rosenbaum, The Consequences of Adjustment for a Concomitant Variable That
Has Been Affected by the Treatment, 147 J. ROYAL STAT. SOC’Y SER. A 656 (1984); see also PAUL
R. ROSENBAUM, OBSERVATIONAL STUDIES 73–74 (2d ed. 2002).
76 Although harder to visualize, the reverse can (and does) also happen. That is, in the smoking, lung cancer, and death hypothetical, one danger is that incorrectly “controlling for” or balancing on lung cancer (an intermediate outcome, a variable that might itself have been affected by
treatment) might make it appear that no causal effect exists when in fact such an effect does exist.
In some situations, however, incorrectly “controlling for” or balancing an intermediate outcome
can make it appear as though a causal effect exists when in fact there is no such effect. See, e.g.,
Ho, supra note 20, at 2000.
2008]
CAUSAL INFERENCE
565
fully as to whether the treatment could affect it. To return again to
the salary discrimination running example, suppose that the years of
education variable is “measured,” in the sense of recorded in the employer’s computer database, after the employer finds out whether the
potential employee is male or female, that is, after treatment. It might
be plausible to believe nevertheless that the years of education variable
is unaffected by perceived gender. An analyst might assume reasonably that the process of recording the number of years an employee
spent in school is a mechanical matter of transferring a value written
on a resume or a job application to a computer file, and that there is
no bias in this process related to perceived gender. If so, then an analyst should desire and expect that randomization would balance years
of education. In contrast, and as explained in greater detail below, it is
not clear that a similar assumption would always be plausible with respect to a job level variable; a firm might assign persons to jobs on the
basis of perceived gender.
Finally, a word about vocabulary: variables that cannot be affected
by treatment are often called “covariates,” while variables that are affected by treatment are called “intermediate outcomes.” Thus, the
analyst’s job is to separate covariates from intermediate outcomes and
to examine the latter with particular care.
E. Observational Studies:
Challenges and Some Ways To Address Them
Experiments that randomize race and gender are not possible. In
the civil rights context, courts are typically limited to observational
data — that is, data that are not gathered from randomized experiments. In observational studies, the analyst controls neither the treatment each unit receives nor the time at which the unit receives the
treatment. This lack of control means that the analyst cannot depend
on randomization either to distinguish covariates from intermediate
outcomes or to balance the covariates. For these reasons, even though
potential outcome tables, such as Table 4, can (and should) be constructed in observational studies, it is not immediately clear how an
analyst should identify units to donate outcome values to one another
because it is not clear whether the “donor” unit is sufficiently similar to
the “donee” unit to make the transfer of values plausible. Systematic
differences of the type discussed in section II.D may exist.
A fundamental premise of the potential outcomes framework as
applied to observational studies is that because randomization is unavailable either to distinguish covariates from intermediate outcomes
or to balance covariates, analysts must do so themselves. In other
words, the goal in an observational study is to recreate to the extent
possible a randomized experiment. To accomplish this goal, the analyst must (i) sharply identify the units, the treatment, the timing of
566
HARVARD LAW REVIEW
[Vol. 122:533
treatment assignment, and the outcome of interest; (ii) assess the plausibility of underlying assumptions, such as units not interfering with
one another; (iii) separate covariates from intermediate outcomes; (iv)
examine the distributions of the covariates in the treatment groups to
see whether they are balanced; and if not, (v) balance these distributions. The first four steps are clear enough. The last, balancing the
covariate distributions, means looking for subsets of the data in which
the two treatment groups have similar patterns of covariate values.
Units with covariate values in one treatment group that lack rough
analogues in the other treatment group are “discarded” in the sense
that they are not used for inference.
To understand this balancing process, return to the data from Figure 1, above, which depict salaries versus years worked for men and
women and are thus a continuation of the running salary discrimination example. Recall that Figure 1 demonstrates a problem with the
regression model standard among employment discrimination experts,
as represented by the two parallel lines that stretch across the entirety
of the graph. Figure 1 shows that even though a visual inspection of
the data depicts men being paid more than women at any level of
years worked in which both genders are represented, the standard regression model produces a statistically significant result in women’s
favor, a result adverse to a female class’s litigation proof. Graphically,
the line for women is above the line for men.
Figure 2 reproduces the Figure 1 data with a critical difference:
The outcomes of interest (salaries) have been removed. The search for
subsets of the data that might be reasonably analogized to data from a
randomized experiment involves looking for regions of the yearsworked variable in which both treatment types are well represented.
In Figure 2, the vertical lines at values of five, ten, and fourteen years
worked divide the data into Regions 1–4, as labeled on the graph. The
data in Regions 1 and 4 do not look as though they came from a randomized experiment; only one of the thirty or more units in these regions received treatment F (perceived female).77 It would be unwise to
attempt inferences in these two regions. In Regions 2 and 3, however,
the years-worked covariate appears reasonably balanced — that is,
there appears to be a sufficient number of units assigned to each
treatment to allow inference to proceed. Notice that the analyst can
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
77 A randomized experiment need not have a .5 probability of assigning each of two treatments
to produce the favorable properties discussed above. If, however, the probability of one of the
two (or more) assignments gets too low, it becomes difficult to draw inferences about the effect
caused by that treatment without a significant amount of data. With extremely low treatment
probabilities, there often are not enough units with a particular treatment to discern patterns.
Figure 2 again provides an example. How does salary vary by years worked for units perceived
as female in Region 1? With only one unit, it is difficult to say.
2008]
567
CAUSAL INFERENCE
(and should) identify these regions without using the outcome (salary)
data.
To be perfectly clear: the implication of applying the potential outcomes framework to these data is that estimation should not be attempted for employees with fewer than five or greater than (about)
fourteen years worked. The analyst discards data, and discarding data
ordinarily carries a cost, namely, loss of precision. In other words, the
resultant causal estimates typically have wider confidence intervals
and are thus less likely to be deemed statistically significant. That is
the price to be paid for relaxing the implausible assumptions regression
makes.78
FIGURE 2. MALE AND FEMALE YEARS WORKED ONLY
Region 2
Region 1
Region 3
Region 4
Men
Women
0
5
10
15
20
Years Worked
Data From Figure 1 Without the Salary Variable: The analyst can isolate
subsets of the data that can be analogized to data from a randomized experiment, such as Regions 2 and 3, above. This process can (and should)
take place without reference to the outcome (salary) values.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
78 If an analyst perfectly specified a regression equation over the whole range of the data, discarding units in Regions 1 and 4 (and the corresponding loss of precision) would not be necessary.
As discussed above, however, claims of perfect specification are difficult to credit. See supra section I.B.
To the extent that it counsels discarding uninformative data and thus suffering a loss of precision, the potential outcomes framework might be deemed to favor defendants. In contrast, the
implication that intermediate outcomes not be included in balancing (corresponding roughly to
the assertion that tainted variables not be included on the right hand side of a regression equation), see supra notes 75–76 and accompanying text, is ordinarily deemed to favor plaintiffs.
568
HARVARD LAW REVIEW
[Vol. 122:533
Having identified these subsets of data,79 one can proceed in a variety of ways; two are illustrated here. One method is to fit regression
lines in the subsets of the data.80 Figure 3 shows four such lines, one
treatment-F and one treatment-M line for the data in each of Regions
2 and 3. Notice that the two male (solid) lines largely remain above
their female (dashed) counterparts in both regions, suggesting that being perceived female is causing a lower salary. These four regression
lines here are worlds apart from the two regression lines in Figure 1 in
critical ways. The analyst here is not assuming a constant additive
treatment effect even within Region 2 or Region 3, in contrast to the
regression equation example discussed in section I.B.3.b; visually, this
is clear from the fact that the lines in Figure 3 are not parallel. The
separate, non-parallel lines for units receiving treatments M and F allow for more complex patterns in the data. Moreover, the Figure 3 regression lines are fit over smaller ranges of the data than their Figure 1
counterparts. The technical assumptions of regression81 (some of
which this Article does not discuss) are more plausible over the smaller
ranges of data represented by the Figure 3 regions than over the whole
range of data, as in Figure 1.82
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
79 Exactly how to divide the data into subsets or blocks is a matter of current research. There
is some guidance on the separate issue of how many blocks to define. See, e.g., W.G. Cochran,
The Effectiveness of Adjustment by Subclassification in Removing Bias in Observational Studies,
24 BIOMETRICS 295 (1968).
80 This use of regression lines demonstrates that the potential outcomes framework does not
suggest wholesale abandonment of the regression technique, but rather judicious use of it in a
way that does not purport to interpret the coefficients causally. See generally Paul W. Holland,
The Causal Interpretation of Regression Coefficients, in STOCHASTIC CAUSALITY 173 (Maria
Carla Galavotti et al. eds., 2001) (clarifying under what assumptions it might be plausible to have
a causal interpretation of regression coefficients).
81 See, e.g., Yulia Gel et al., The Importance of Checking the Assumptions Underlying Statistical Analysis: Graphical Methods for Assessing Normality, 46 JURIMETRICS 3 (2005).
82 The desire to avoid using a single model (or regression line) over a large range of the data is
the reason why Regions 2 and 3 are treated separately instead of combined into a single region.
2008]
569
CAUSAL INFERENCE
FIGURE 3. MALE AND FEMALE SALARIES VERSUS YEARS
WORKED
Salary (in thousands)
100
95
90
Men
85
Region 2
Region 3
Women
80
5
10
Years Worked
Subset of Figure 1 Data Concentrating on Regions 2 and 3: This graph focuses on areas of the data in which an analyst may reasonably attempt
causal inference (Regions 2 and 3) and ignores those where inference
should not be attempted (Regions 1 and 4). The solid lines represent the
best-fitting lines in each region for men. The dashed grey lines are the
best-fitting lines in each region for women. Note that here, in contrast to
Figure 1, the female lines are below the male lines for almost the entire
graph, conforming to the intuition that women (not men) suffer from salary discrimination in this data set. The figure also shows how to use the
lines to predict the salary a woman would have received had she been
male. Specifically, a dashed line rises up from the diamond at Years
Worked = 10.5 and then moves over to Salary = $99,700. Thus, the prediction for the woman in Region 3 with the lowest seniority, the left-most
diamond in that region, is that she would have earned $99,700 had she
been a man.
Nor do separate regression lines for M and F units pose a problem
for inferences in the potential outcomes framework, as they did with
ordinary regression, as discussed in section I.B.4. The potential outcomes framework makes clear that the goal of the analysis is not to estimate regression coefficients, but rather to fill in missing potential
outcome values. To see how an analyst might use the separate regression lines to do so, consider the unit assigned treatment F that has the
lowest value for years worked in Region 3 — that is, the left-most
570
HARVARD LAW REVIEW
[Vol. 122:533
diamond in Region 3 of Figure 3, with about 10.5 years worked. An
analyst might fill in the missing potential outcome for that unit, the
salary that would have been observed had that unit received treatment
M, by putting her finger on the corresponding diamond and traveling
straight up until the finger intersects the male line. Then, the analyst
would move her finger straight to the left until it intersects with the yaxis. The corresponding number on the y-axis constitutes a predicted
value for the counterfactual salary outcome for this unit. Figure 3
shows, with a dashed line, this tracing process. For the unit under
consideration, the model predicts that had she been perceived as male,
she would have earned approximately $99,700, as opposed to the approximately $93,000 she did earn. The analyst repeats the process for
each unit until the potential outcomes table is full — that is, all female
units (in Regions 2 and 3) have an imputed value in the “Salary(M)”
column, and all male units (in Regions 2 and 3) have an imputed value
in the “Salary(F)” column.83
Another way of filling in the missing values of the potential outcomes table is a species of a process called “matching.” To understand
this process, abandon Figure 2 (and its Regions), and consider Table 5,
below, which, unlike Table 3, includes an additional column to show
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
83 In more technical and precise language, one estimates the regression coefficients associated
with the M units, then combines these estimates with the F covariate values to predict counterfactual potential outcome values for the F units. To predict the counterfactual values for the units
that received treatment M, reverse the process. There are several ways to generate uncertainty
estimates. The easiest and most intuitive is to specify a prior distribution on the regression coefficients and use Bayesian simulation techniques. See, e.g., GELMAN ET AL., supra note 58, at 353–
85. In other words, one fills in missing potential outcome values over and over again with a certain amount of random noise included each time. That generates a distribution of the quantity of
interest.
This general approach of marrying coefficients from a model fit to control group data with
treatment group covariates to generate a counterfactual outcome for the treated data points is often referred to as the “Peters-Belson” method after the two authors who independently proposed
it. See William A. Belson, A Technique for Studying the Effects of a Television Broadcast, 5 APPLIED STAT. 195 (1956); Charles C. Peters, A Method of Matching Groups for Experiment with
No Loss of Population, 34 J. EDUC. RES. 606 (1941). As the dates of these references indicate, it
was proposed prior to what some, see sources cited supra note 17, consider the articulation of the
potential outcomes framework. By now it should be clear that while the Peters-Belson method is
a useful refinement of the regression technique, it is not a substitute for a causal inference framework. In particular, nothing in the Peters-Belson method suggests separating covariates from intermediate outcomes or balancing covariates.
The discussion above is intended only to illustrate the Peters-Belson or two-regression technique (as applied within the potential outcomes framework) conceptually. An analyst actually
applying the technique would need to account for at least two sources of uncertainty: the uncertainty in the regression coefficients (by drawing from their posterior) and the uncertainty in the
counterfactual salary (by drawing from its posterior conditional on a draw of the regression coefficients). Thus, the predicted counterfactual salaries would never be exactly where the fingertracing technique above would put them, and in any event the regression lines used to do the tracing would change from simulation iteration to iteration.
2008]
571
CAUSAL INFERENCE
each unit’s number of years worked. Pretend for a moment that only
these eight observations exist.
TABLE 5. TABLE 3, REPRINTED WITH AN ADDITIONAL
COLUMN FOR YEARS WORKED AND SHADING
TO EMPHASIZE THE MATCHING OF UNITS
Unit #
1
2
3
4
5
6
7
8
Years Worked
6.2
9.0
13.2
0.5
9.3
5.7
12.6
19.1
Treatment
F
F
F
M
M
M
M
M
Salary(M)
??
??
??
47,100
98,900
97,700
95,300
81,300
Salary(F)
91,200
92,400
94,100
??
??
??
??
??
Focus for the moment on Unit 1. To proceed with a causal inference, an analyst needs a value for what this unit’s salary would have
been had it received treatment M. Of the five units in the table that
actually did receive treatment M, the one that looks the most like Unit
1 is Unit 6 because among those five units, Unit 6’s 5.7 years worked
is the most similar to Unit 1’s 6.2. Thus, it makes sense for Unit 6 to
donate its observed salary value to be Unit 1’s missing, counterfactual
value. Units 1 and 6 are a reasonably close “match.” One can proceed
to find reasonable matches for Units 2, 3, 5, and 7 (the pairings of 2
and 5 and of 3 and 7 seem intuitive). Units 4 and 8 present a problem:
both received treatment M, and both have values of years worked that
are fairly far removed from the values represented in any unit that received treatment F. Under a potential outcomes framework, causal inference is probably not advisable with respect to these units. The best
procedure is to ignore them; this is the matching equivalent of ignoring
the data in Regions 1 and 4 of Figure 2, above. Overall, the intuition
with matching is that one has created a mini-randomized experiment
in each pair of units, with the pairs identified according to their values
on some other important variable.84
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
84 Years ago, employment discrimination expert witnesses debated the merits of a primitive
form of what is labeled above as “matching.” These experts called the technique “cohort analysis.” Compare Carl C. Hoffman & Dana Quade, Regression and Discrimination: A Case of Lack
of Fit, 11 SOC. METHODS & RES. 407, 426–38 (1983) (favoring this technique), with Michelson,
supra note 43, at 169, 177–78 (noting the inadequacies of cohort analysis). The Hoffman and
Quade article is extraordinary in its lucid discussion of the problem of finding similar units for
appropriate comparison, its concomitant emphasis on balancing covariate distributions, and its
articulation of the failings of regression, particularly the issue of extrapolating across wide ranges
572
HARVARD LAW REVIEW
[Vol. 122:533
The result of this matching process is conceptually the same as the
result of the regression-lines approach — namely, a completed potential outcomes table for those units as to which an analogy to a randomized experiment makes sense. For the subset of units represented in
Tables 3 and 5, the result might resemble Table 6, where imputed salaries are shown in italics.
TABLE 6. POTENTIAL OUTCOMES TABLE, FULLY IMPUTED
Unit #
Years Worked
Treatment
Salary(M)
Salary(F)
1
6.2
F
97,700
91,200
2
9.0
F
98,900
92,400
3
13.2
F
95,300
94,100
5
9.3
M
98,900
92,400
6
5.7
M
97,700
91,200
7
12.6
M
95,300
94,100
With the potential outcomes table fully imputed, any quantity of
interest can be calculated. One obvious candidate for calculation is
the average causal effect of gender on salary levels for all units.85 In a
salary discrimination lawsuit, however, the real target of inference is
the members of the plaintiff class, so a more interesting quantity might
be the average causal effect for women — that is, for those units that
received treatment F. From Table 6, that quantity would be 1/3 *
((91,200-97,700) + (92,400-98,900) + (94,100-95,300)) = -4733. Thus, in
the running example of salary discrimination, after an analysis of the
data from Table 3, an expert witness could testify as follows: For
members of the plaintiff class as to which a causal conclusion could be
made, the fact that the defendant firm perceived them to be female
caused the firm to reduce their salaries by approximately $4700 on average. Notice, moreover, that the expert (with a larger data set) could
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
of the covariate space. See Hoffman & Quade, supra. It appears that the article was ignored,
which is a pity.
Other commentators appear to be feeling their way toward something like matching, although they appear to appreciate fully neither the issues at stake nor the power of the technique
in drawing inferences of causation. See, e.g., CONNOLLY ET AL., supra note 12, § 11.10[1], at 11–
25. As demonstrated by the cases Michelson collects, supra, courts largely rejected cohort analysis
because (supposedly) (i) it was unsupported in the literature and (ii) it had low power to detect
discrimination. In my view, the former problem stemmed from the fact that the technique was
never linked to an explicit definition of and framework for causal inference, while the latter resulted from difficulties encountered when experts attempted to achieve exact or nearly exact
matches in the presence of multiple covariates and limited data. Multiple covariates and limited
data resulted in few pairs being deemed permissible matches, and thus the cohort analysis had
low power to detect discrimination. As this Article demonstrates, both problems can now be addressed. The explicit causal paradigm is, of course, the potential outcomes framework, while the
problem of achieving balance on multiple covariates is discussed infra section II.F.
85 See supra note 71 and accompanying text.
2008]
CAUSAL INFERENCE
573
also make calculations specific to subsets of the plaintiff class, such as
those at the low end or the high end of years worked, by isolating the
relevant units in the potential outcomes table and doing similar arithmetic. Such flexibility was not available from the simple regression
model discussed in section I.B.3.b.86
The two methods briefly outlined above, separate regression lines
and matching, should provide roughly similar completed potential outcomes tables and thus roughly similar substantive results. The reason
is that the analyst has done the hard work of isolating subsets of the
data to treat them as mini-randomized experiments first, and with
data that came from a randomized experiment, the statistical technique used during analysis matters less.87 If the results obtained by
these methods differ substantially, the analyst should be wary; it would
appear that even after attempting to make the data look as though
they came from a randomized experiment, the results are still sensitive
to modeling assumptions.
F. The Need for Lots of Background Variables
The previous sections explain that obtaining balance in covariates
is a key part of causal inference and that randomized experiments balance covariates regardless of whether the covariates are measured by
the experimenter or are unobserved. In observational studies, however, the analyst must balance by isolating subsets of data with similar
covariate values. These facts lead to an important difficulty with observational studies, which in turn leads to a critical (often the most
critical) assumption. The basic problem is that analysts cannot balance variables that they do not observe. To draw causal inferences
from an observational study, an analyst must assume that she has
measured all of the really “important” covariates — that is, the covariates that affect both the probability of receiving a particular treatment
and the values of the potential outcomes.88
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
86 Again, certain technical details in this discussion, such as how to generate estimates of uncertainty, are deliberately suppressed. Note that active research is ongoing on this specific subject. E.g., Alberto Abadie & Guido W. Imbens, On the Failure of the Bootstrap for Matching Estimators (May 2006) (unpublished manuscript), available at http://elsa.berkeley.edu/~imbens/
bootstrap.pdf.
87 See, e.g., Robert J. LaLonde, Evaluating the Econometric Evaluations of Training Programs
with Experimental Data, 76 AM. ECON. REV. 604 (1986).
88 This assumption is also present in regression as used in modern civil rights litigation. In
fact, the role of this particular assumption in an observational study is not new or unique to the
potential outcomes framework. The problem goes by various names in various fields; “selection
bias” and “presence of confounders” are two that are popular in econometrics. One may also
analogize this issue to the problem of “omitted variable bias,” although this term is more commonly used in conjunction with a particular model (such as regression), while the concept here is
more general.
574
HARVARD LAW REVIEW
[Vol. 122:533
In the salary employment discrimination example, for instance, an
analyst never has direct measurements of variables such as “motivation” or “competence” (assuming such concepts to be well defined).
Their absence could cause difficulties if, for example, (i) these variables
are distributed differently among persons perceived male versus persons perceived female, and (ii) the defendant firm bases its salary decisions on something related to these variables that the analyst does not
observe. In a courtroom setting, these difficulties should not be overstated; defendants will invariably claim that missing variables explain
any observed disparity, and judges who uncritically accept such explanations render discrimination law unenforceable, at least via the class
action device. Nevertheless, the fact that analysts cannot balance
what they do not see suggests that they should attempt to see as many
of the variables the defendant firm used (or claims to have used) in its
decisionmaking as possible.89 If measurements of the variables that
the defendant firm actually used are unavailable, the analyst might resort to related covariates. The hope is that by achieving balance in the
related covariates, the analyst achieves balance on the important
variables.
This discussion leads to a final principle that must be explained before one can apply the potential outcomes framework to civil rights
data sets: how to balance multiple covariates simultaneously. To understand the problem, consider the matching technique outlined in the
previous section and recall the discussion of how in Table 5 the values
of the years-worked variable suggested matching Unit 1 to Unit 6.
Imagine now that a second variable, such as years of education, was
measured and that with respect to the years-of-education variable,
Unit 1’s closest potential match was Unit 5. To which of the two
should Unit 1 be matched: Unit 6 (its closest fit on years worked) or
Unit 5 (its closest fit on years of education)? Moving away from the
matching technique specifically, how does one achieve balance on several covariates at once?
The potential outcomes framework literature has focused on this
problem for decades, and a variety of answers now exist. One popular
option is to estimate something called a “propensity score,” which is
the probability that a particular unit receives a particular treatment.
The analyst uses the observed covariates, together with a statistical
model or technique, to estimate a probability for each unit that it received one of the two treatments. The idea, again, is to recreate a
randomized experiment by imagining that treatment application was
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
89 Quantitative techniques are available to assess the sensitivity of causal inferences to unmeasured covariates. The classic example in the observational study context is Jerome Cornfield
et al., Smoking and Lung Cancer: Recent Evidence and a Discussion of Some Questions, 22 J.
NAT’L CANCER INST. 173 (1959).
2008]
CAUSAL INFERENCE
575
in fact random, but that the analyst “lost” the probabilities used in the
random assignment. If lost probabilities can be recovered with reasonable accuracy, and if certain technical assumptions are met, then
effective balancing of the propensity scores will induce effective balancing of the observed covariates.90 Note that an analyst can use a
variety of statistical methods to check whether such balance has in
fact been achieved, and if it has not, the analyst can modify the original propensity score model.91 Again, the process is somewhat technical, but the point is that by estimating propensity scores, analysts can
reduce the problem of simultaneously balancing many covariates down
to a problem of balancing one variable. In this sense, the propensity
score acts as a one-dimensional summary of the covariates.92 Nothing
in this process of balancing multiple covariates — the hard work of
causal inference in an observational study — requires the use of the
outcome variable (for example, salary in the running gender discrimination hypothetical).
III. POTENTIAL OUTCOMES IN THE
CIVIL RIGHTS LITIGATION SETTING
This Part demonstrates how the potential outcomes definition of
and framework for causation may be applied in the civil rights litigation context, particularly in the areas of capital punishment, employment discrimination, and vote dilution. In doing so, this Part demonstrates that the potential outcomes framework addresses problems
discussed in section I.B that regression could not. In particular, this
Part argues that one can often identify primitives (units, treatment,
timing of treatment assignment, and outcome) and articulate coherent
questions linked to available data in the capital punishment and employment discrimination contexts, but that matters can become more
difficult in the vote dilution setting.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
90 See, e.g., Paul R. Rosenbaum & Donald B. Rubin, The Central Role of the Propensity Score
in Observational Studies for Causal Effects, 70 BIOMETRIKA 41, 41 (1983).
91 See, e.g., Kosuke Imai et al., Misunderstandings Between Experimentalists and Observationalists About Causal Inference, 171 J. ROYAL STAT. SOC’Y SER. A 481, 498 (2008). In addition
to the propensity score–based methods, other algorithms are discussed in, for instance, Alexis
Diamond & Jasjeet S. Sekhon, Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies (June 12, 2008) (unpublished manuscript), available at http://sekhon.berkeley.edu/papers/GenMatch.pdf.
92 The literature that discusses or uses propensity score methods is extensive. The articles collected in MATCHED SAMPLING FOR CAUSAL EFFECTS (Donald B. Rubin ed., 2006) are a good
place to start. The articles collected there also demonstrate the use of the propensity score as a
one-dimensional summary of the covariates for purposes of blocking. See generally supra note 79.
576
HARVARD LAW REVIEW
[Vol. 122:533
A. In General
1. Identifying Primitive Concepts. — In the civil rights context, as
in any other area of causal inference, an analyst’s first task is to identify the units, the treatment, the timing of treatment assignment, and
the outcome variable. In most instances, the units are individual people or cases: criminal defendants or victims in the capital punishment
context, employees or applicants in employment discrimination. The
outcome of interest is something assigned by an institutional or socioeconomic actor: the imposition of capital punishment by juries or salary received from a defendant employer. The treatment is ordinarily
the institutional or socioeconomic actor’s perception of a prohibited
personal characteristic (gender, race, national origin).
Why identify the treatment as perceived gender (or race), as opposed to self-identified gender (or race), or actual gender (or race), assuming that there is general agreement on the meaning of actual gender (or race)?93 There are a variety of reasons, many of them
technical.94 For the purposes of this discussion, one is sufficient: time.
As explained previously, one must specify the time at which treatment
was applied.95 Furthermore, and as noted above, in the civil rights
context we typically wish to assess whether some actor’s decisions as
to a group of individuals were affected by some characteristic common
to the group.96 The law’s focus on the specific actor’s decisionmaking
requires, indeed compels, the analyst to regard as “given” the charac–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
93 In fact, in the racial context in particular, scholars have difficulty agreeing on an “objective”
definition. Thinkers in a wide variety of fields believe that race is a social construct, not a biological set of characteristics. See, e.g., PAUL W. HOLLAND, EDUCATIONAL TESTING SERVICE,
CAUSATION AND RACE 3 (2003); IAN HANEY LÓPEZ, WHITE BY LAW: THE LEGAL CONSTRUCTION OF RACE xxi (10th anniversary ed., rev. 2006); Am. Ass’n of Physical Anthropologists, AAPA Statement on Biological Aspects of Race, 101 AM. J. PHYSICAL ANTHROPOLOGY
569, 569–70 (1996), available at http://www.physanth.org/positions/race.html; Am. Anthropological Ass’n, Statement on “Race” (May 17, 1998), http://www.aaanet.org/stmts/racepp.htm; see also
Saint Francis Coll. v. Al-Khazraji, 481 U.S. 604, 610 n.4 (1987). But see 18 U.S.C. § 1093(6)
(2006) (defining “racial group” in terms of “physical characteristics or biological descent”).
94 A primary issue with defining the prohibited characteristic variable is that some influential
quantitative analysts, including several involved in the development and articulation of the potential outcomes framework (see supra note 17), view the study of causation as well-defined only
when the conceptualized “treatment” is an intervention (actual or hypothetical, depending on the
analyst), and that immutable characteristics (being immutable) are not subject to change or manipulation. See, e.g., RICHARD A. BERK, REGRESSION ANALYSIS: A CONSTRUCTIVE CRITIQUE 90–92 (2004); HOLLAND, supra note 93, at 3; Donald B. Rubin, Bayesian Inference for
Causality: The Importance of Randomization, in AM. STAT. ASS’N, PROCEEDINGS OF THE SOCIAL STATISTICS SECTION 233, 233 (1975). Don Rubin and I examine these and other concerns
in a forthcoming paper, D. James Greiner & Donald B. Rubin, Potential Outcomes and Causal
Effects of Immutable Characteristics (July 12, 2008) (unpublished manuscript, on file with the
Harvard Law School Library).
95 See supra section II.A.
96 See supra section I.B.3.a.
2008]
CAUSAL INFERENCE
577
teristics of individuals that were in place prior to the individuals’ interaction with the actor. The only way to do that is to define the
treatment as taking place at some moment of perception by the actor
of the characteristic common to the group.
A hypothetical example may clarify matters. In an employment
discrimination disparate treatment case focusing on hiring, a class of
African American applicants claims intentional discrimination, and the
plaintiffs demonstrate that the defendant hired a lower percentage of
black applicants than white applicants. In response, the employer
proves that it hired based on education, and that members of the African American class had lower educational achievement levels. At rebuttal, the plaintiffs demonstrate that their concededly lower education
achievement levels were the result of societal and governmental race
discrimination, including (to make this hypothetical perfectly clear) a
system of de jure segregated schools. What result? Assuming no
doubt as to these proofs and no other relevant circumstances, the result is a ruling for the employer. In legal terms, Title VII compels the
defendant firm to avoid its own discrimination, not to take remedial
action based on someone else’s. In statistical terms, the employer
is entitled to condition on the educational achievements of all applicants, to take these achievements as it finds them without asking what
led up to them. And in terms of the potential outcomes framework,
the law compels analysts to treat educational achievement as a covariate, something unaffected by treatment. Thus, for timing purposes,
what matters is not actual race or actual gender (which is usually conceptualized as being set before birth), even if there were a general consensus on what those terms mean. Rather, what matters is the state of
the world as of some moment of the social or governmental actor’s
perception.
This discussion demonstrates that an analyst should ordinarily consider treatment assigned as of the moment in which the social or governmental actor first perceives the prohibited characteristic of interest.
Ordinarily, the actor’s perception of a unit’s gender or race will not
change over time; that is, the treatment remains stable as of the moment of first interaction. Moreover, it is usually unsafe to assume that
if the socioeconomic or governmental actor does discriminate, it delays
or limits its discrimination in some way. For example, in the salary
discrimination running hypothetical, if an analyst is assessing whether
the defendant firm awards employees perceived female lower salaries
than employees perceived male, does it make sense for the analyst to
assume that the firm evaluates performance without regard to gen-
578
HARVARD LAW REVIEW
[Vol. 122:533
der?97 Such evaluations can include a heavy dose of subjectivity and
are determined after the evaluator perceives employee gender. Thus,
the fact that this kind of variable may be affected by the prohibited
characteristic under study suggests that it may be an intermediate outcome as defined in the previous section — a variable whose value is
affected by treatment.
The issue of sharply identifying the timing of treatment assignment
dovetails with another of critical importance, namely, precisely identifying the institutional or socioeconomic actor whose behavior is to be
studied. Much depends on the analyst’s exercise of judgment here.
For example, in the capital punishment context, should the analyst assess the effect of (say, the victim’s) perceived race on the criminal justice system as a whole, on the prosecutor’s charging decisions, on jury
behavior, or on the decisions of appellate courts? If an analyst attempts to assess the effect of race on the whole system, can she safely
treat any characteristics of the case as unaffected by the treatment? In
less technical terms, would it be safe to assume in such a study that
police investigate homicides of African Americans or Hispanics in the
same way as homicides of whites? If such assumptions should be
viewed with skepticism,98 then variables such as the heinousness of an
offense or even the number of victims in a particular criminal “occurrence” may be intermediate outcomes. If the analyst attempts to study
the effect of the victim’s perceived race on the whole system, there
may be few variables that could safely be assumed to be unaffected by
treatment.
In this regard, the use of years worked as a balancing variable in
this Article’s salary discrimination running example is deliberately provocative. Buried inside a variable such as years worked are a host of
decisions that might well be characterized as intermediate outcomes.
Some employees may have been fired, perhaps discriminatorily, or may
have chosen to leave rather than face the unpleasant task of fighting a
Title VII lawsuit. Certainly, if the analyst chooses the firm (as opposed
to one of its employees) as the institutional actor to study, and conceptualizes the timing of treatment assignment as occurring at the moment of first perception (meaning, probably, at the time of an employ–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
97 It is possible that a firm does limit its discrimination to certain areas of activity. Irrational
prejudices are, alas, irrational, and they may operate only in certain socioeconomic spaces or at
certain times. My point here is that it will ordinarily be unwise to assume that such discrimination is not taking place.
98 See Samuel R. Gross & Robert Mauro, Patterns of Death: An Analysis of Racial Disparities
in Capital Sentencing and Homicide Victimization, 37 STAN. L. REV. 27, 44 (1984) (discussing
empirical evidence that “official descriptions of homicides by prosecutors were affected by racial
considerations”).
2008]
CAUSAL INFERENCE
579
ment application), years worked is post-treatment and thus may be an
intermediate outcome as opposed to a covariate.
The discussion above demonstrates the delicate balance an analyst
attempting causal inference in the civil rights context must strike
among competing concerns such as the nature of the problem to be
analyzed and the plausibility of the assumptions necessary to proceed.
Much depends on identification of the question. A fundamental aim of
this Article is to persuade (i) that these choices are at least as important, perhaps more so, to the answers produced than are disagreements
about models, statistical techniques, or computer software; and (ii) that
none of these choices is mathematical or technical. No such decision is
beyond the ken of the average layperson. Previous paragraphs suggest
that it might be difficult, without strong and potentially risky assumptions, to assess the effect of the victim’s perceived race on the entirety
of the system that administers capital punishment. But difficulty is
not the only concern. If an analyst can identify primitives (units,
treatment, treatment timing, and outcome of interest) and articulate
the assumptions that would be required to proceed, perhaps the risky
assumptions are worth making, so long as the analyst states them
clearly.
Having identified the units, the treatment, the timing of treatment
assignment, and the potential outcomes of interest, the analyst should
inquire as to the nature of the data available. This is not the same as
examining the data. The analyst should do the hard work of (i) classifying variables into covariates versus intermediate outcomes and (ii)
balancing covariates, all without having access to the outcome measurements. In other words, particularly in a litigation setting, an expert
witness should specifically request that she not be provided with the
outcome data at first. This way of proceeding decreases the danger
identified in section I.B.1 as a principal failing of regression, namely,
improperly favoring one side or the other by snooping for a preferred
result during the model fitting process, because without access to the
outcome data, the analyst has less idea what the litigation answer will
be as she works.99
This is a situation in which early intervention and ongoing oversight by the trial judge (or, more likely, a magistrate or a special master) can encourage both sides’ experts to commit to models before results are available, making settlement more likely. For example, the
court in a large-scale Title VII class action can begin by prohibiting
transmission of data to any testifying expert until, through discovery
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
99 There are, concededly, some limited ways for the fallen to cheat, but they are both harder to
implement and easier to detect than the model-snooping that is so easy to accomplish when checking the fit of regression equations. See supra section I.B.1.
580
HARVARD LAW REVIEW
[Vol. 122:533
and any necessary rulings, available variables have been identified and
the process of their generation understood. Next, the court could, to
the extent consistent with the role of a jury (if any),100 adjudicate disputes as to which variables are pre- and post-treatment, and as to the
post-treatment variables, which ones are unlikely to have been affected
by treatment. Then, the court might allow transmission of the data
corresponding to pre-treatment variables, post-treatment variables unaffected by treatment, and the treatment itself to the parties’ experts.
The experts could implement the balancing process and disclose to
each other which subsets of data each deems sufficiently similar to a
randomized experiment to allow causal inference to proceed. The
point is to require experts to commit, in a way difficult to disavow
later, to critical parts of their analyses before knowing what the results
will be. Finally, upon being satisfied that both sides have committed
in this way, the court could order the release of the outcome data. The
hope is that, because both sides have done the hard work of balancing
covariates before knowing the results, their conclusions will be similar
to one another, easing settlement. At a minimum, important stages of
the analysis will be less susceptible to biasing by experts, some of
whom may be fighting a subconscious battle with themselves.
If this procedure cannot be implemented, civil rights litigants might
nevertheless wish to ask their testifying expert to proceed via potential
outcomes as opposed to regression. As noted above, a trier of fact may
(and should) give greater weight to an expert who truthfully testifies
that she committed to an analysis before knowing its substantive results. In this way, one might hope that the incentive structure of litigation could make the use of potential outcomes self-executing.
As suggested above, classifying variables as covariates or intermediate outcomes — unaffected by treatment versus affected by treatment — requires the analyst to understand the data-generating process
thoroughly. This classification process benefits from a sharp definition
of the treatment, particularly its timing. For this reason, the analyst
using potential outcomes has a better idea of which variables to include in balancing. The basic rule is that any variable measured prior
to the application of treatment is a covariate that should be included
in balancing. Anything measured contemporaneously with treatment
application or assessed afterward is suspect and should be treated as a
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
100 The ideas articulated in this paragraph and the next seem consistent with the trial judge’s
inherent power to control pretrial proceedings as well as his or her gatekeeper role with respect to
expert testimony. See Daubert v. Merrell Dow Pharm., Inc., 509 U.S. 579, 592–93 (1993); see also
D.H. Kaye, The Dynamics of Daubert: Methodology, Conclusions, and Fit in Statistical and
Econometric Studies, 87 VA. L. REV. 1933, 2008 (2001) (arguing that the fact that a statistical
study relies on regression is an insufficient basis upon which to conclude that the study passes
muster under Daubert’s interpretation of the Federal Rules of Evidence).
2008]
CAUSAL INFERENCE
581
covariate only if the analyst may reasonably conclude that the variable’s value is unaffected by treatment. In the salary discrimination
running example, the variable for years of education is a reasonable
candidate for inclusion in balancing; absent other evidence, it might be
plausible to assume that this variable’s values were recorded in an essentially mechanical process from an employment application or resume. Other variables are more suspect, particularly those with inherent subjectivity, such as performance evaluations. Should a dispute
arise as to whether to balance a variable whose values are determined
after treatment application, the burden of persuasion should be on the
party seeking to categorize such a variable as a covariate instead of as
an intermediate outcome.
Thus, within the potential outcomes framework, an analyst finds
guidance on the issue of which variables to include, while the balancing process (which, again, should take place without reference to
measured outcomes) provides information on which observations
should be the focus of analysis. Courts find guidance on how to assign
burdens of proof. The potential outcomes paradigm addresses problems that regression (as commonly used in modern civil rights litigation) could not.
After achieving acceptable covariate balance, the analyst can use a
variety of techniques, such as matching or separate regressions for
treated and control groups, to fill in the counterfactual outcome for
each unit. Regression coefficients themselves (if regressions are used at
all) are of little interest. The outcomes the model predicts, not the estimated coefficients, are what matter. Thus, expert witnesses, litigators, and courts need no longer attempt to shoehorn data into models
with poor fits but with easily interpretable coefficients, such as the
constant additive effect model described in section I.B.3.b. Courts and
litigators may focus less on grasping or explaining logarithms of odds
ratios, per section I.B.3.d. Instead, they can concentrate on figuring
out what would have occurred “but for” the identified treatment, a
task litigators, judges, and juries can understand.
2. A Tug-of-War: The Need for Background Variables Versus the
Desire To Detect Discrimination. — In drawing inferences of causation in the civil rights context, one issue arises often enough to deserve
special consideration. An example may illustrate. In the employment
discrimination context, if an analyst chooses the employer as the institutional actor whose behavior is to be assessed (a reasonable choice
given that the employer as a firm, and not an individual officer of the
firm, is the defendant in the lawsuit), then the analyst should ordinarily conceive of treatment (perceived gender, race, ethnicity, etc.) as assigned at the time of employees’ job applications. Many application
forms include race and gender boxes, and certainly if any sort of interview is required for the job, the prohibited characteristic of interest
will be apparent at that time. The treatment presumptively occurs
582
HARVARD LAW REVIEW
[Vol. 122:533
quite early in the relationship between the units and the actor of interest. Recall that a critical assumption of an observational study is that
the analyst has measured a sufficient number of the important covariates (pretreatment variables) so that, after achieving balance in the covariates, members of one treatment group are not systematically different from members of the other. Recall also that analysts should not
balance on variables whose values are affected by treatment. The
problem in conceiving of treatment as being assigned at the time of a
job application is that a great many variables as to which measurements may be available (for example, tasks or projects accomplished,
types of experience gained, or nature of training received, not to mention performance evaluations, disciplinary actions, initial job levels,
bonuses, etc.) have values that are determined after treatment. In fact,
much of the information in a class member’s employment file, including even the fact that she was hired, is determined after treatment.
Thus, few variables may be presumptively considered covariates.
With fewer covariates, the assumption that the analyst has measured a
sufficient number of the important ones becomes strained.
This concern suggests conceptualizing treatment as occurring later
in the employment relationship, if doing so would be plausible. For
example, if a Title VII disparate treatment class action focuses on the
class members’ failure to achieve a particular kind of promotion, and
if promotion decisions were made by an official within the defendant
firm who had little prior interaction with employees eligible for promotion, the analyst might identify this official as the socioeconomic actor
whose behavior is under study, and thus conceptualize treatment as
having occurred when this official perceived the gender (or other prohibited characteristic) of the promotion applicants. That choice would
allow more variables, such as performance evaluations, job assignments, etc., to be considered covariates and thus be subject to balancing. But this course of action has costs. One of these costs is that any
discrimination perpetrated by the defendant firm before the promotion
application stage, such as while evaluating performance, could go undetected. Balancing on the performance evaluation variables would
mask the discrimination in the same way that balancing on lung cancer could mask the effect of smoking on death.
The problem described here is a general one that occurs whenever
one attempts to draw inferences of causation in a civil rights context in
which the interaction between potential plaintiffs and the institutional
actor of interest extends over a period of time. A tug-of-war exists between (a) the desire to include in balancing as many variables as possible to make plausible the assumption that one has taken care of the
important ones, which counsels in favor of considering treatment to
have been assigned late in the interaction between the units and the
institutional actor of interest; and (b) the desire to understand and
model properly the fullness of the relationship between the institu-
2008]
CAUSAL INFERENCE
583
tional actor and the units, including by assessing the possibility of discrimination early in that relationship, which counsels in favor of conceptualizing treatment to have been assigned as early as possible.
One way to address this tug-of-war is by making assumptions, supported by reasonable judgment, about what variables have been affected by the treatment. Here, the analyst identifies variables whose
values are (i) determined after treatment, but are (ii) less likely to have
been affected by treatment, and (iii) likely to be related (strongly related, one hopes) to the values of important unobserved covariates.
The analyst includes these post-treatment variables in the balancing
process, the hope being that in doing so, she balances important pretreatment covariates while not inducing bias (in a statistical sense) in
the results. In the aforementioned promotions example, perhaps investigation into the defendant firm’s operations reveals that employees receive assignments based only on their current workload and the type
of tasks that require completion. Or it may be that the training programs a firm offers are ordinarily open to all employees who wish to
undertake them. Or it could be that a part of the process of evaluating
employee performance is mechanical — for example, the number of
DVD packages processed per minute.101 In that case, the analyst
might attempt to resolve the tug-of-war articulated above by balancing
on variables measuring types of job experience, number and nature of
training programs, and the mechanically reported aspects of performance. The analyst should, however, be particularly wary of variables
that inherently involve a partially subjective judgment, such as overall
job evaluation.102
B. Specific Contexts
1. Capital Punishment. — Drawing causal inferences about the
role of race in the administration of the death penalty poses several
challenges, some familiar, some unique to the capital setting. A logical
place to begin is with the identification of the treatment. The literature in this area suggests that the treatment should be defined in terms
of perceived race of both the victim and the defendant. For example,
in a jurisdiction made up almost exclusively of Caucasians and African
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
101 See Susan Sheehan, Tear, Slap, Clack, NEW YORKER, Aug. 28, 2006, at 26 (describing a
Netflix facility). Unfortunately, some forms of discrimination, such as perpetration of a hostile
working environment, can affect an employee’s performance even as measured by the most neutral of criteria or by reference to the most mechanical tasks. This is a problem, and it haunts any
analysis framework (potential outcomes or otherwise). One would expect that the actions taken to
communicate a message of inferiority or stigma to certain employees might leave a nonquantitative evidentiary trail.
102 Courts have on occasion recognized the danger here. See, e.g., Broderick v. Ruder, 685 F.
Supp. 1269, 1280 n.10 (D.D.C. 1988); Weiss v. United States, 595 F. Supp. 1050, 1057 (E.D. Va.
1984).
584
HARVARD LAW REVIEW
[Vol. 122:533
Americans, the analyst could divide cases into four treatment categories: white defendant, black victim; white defendant, white victim;
black defendant, black victim; black defendant, white victim.103 Each
unit, or case, would then have four potential outcomes, only one of
which is observed. Although in a real study an analyst would probably want to have four treatment groups, for the sake of illustrating
principles suppose for the moment that the analyst is interested in
race-of-the-victim effects only, such that there are two treatments, “B”
for victim perceived black and “W” for victim perceived white.
Next, the analyst must choose the institutional actor whose behavior is to be studied. If the analyst follows the presumption that treatment is administered as of the moment the institutional actor first perceives the victim’s race, then much depends on this choice. Suppose
the analyst identifies the actor as the criminal justice system. From
case files, the available variables will be those that the police, the
prosecutor, and other official sources have uncovered and recorded after they perceive the victim’s race, that is, after treatment application.
These post-treatment variables must be assessed carefully to see
whether it is reasonable to assume that the treatment does not affect
their values. Consider something as simple as whether the murder
was for hire. The value the analyst sees for this variable is not
whether, in some abstract sense of “truth,” the murder was for hire,
but rather the value determined by a police investigation. It seems
odd, if one is assessing the race effects in the entire criminal justice
system, to assume that police investigate homicides of whites with
equal vigor as homicides of blacks. Thus, the analyst must think carefully about which variables, if any, may be deemed unaffected by
treatment if the institutional actor of interest is the criminal justice system. On the other hand, if the analyst chooses the jury as the actor
whose behavior is to be assessed, then the facts as discovered by the
police (and the defense investigation) may be taken as given on the
principle that any discrimination by the police is not chargeable to the
jury. But among other things, any racial bias in police investigations
(for example) will go undetected. The tug-of-war identified in section
III.A.2 is fully apparent here.
For the sake of illustrating other questions that arise, suppose for
the remainder of this section that the analyst chooses the jury as the
actor of interest. The remaining “primitives” are the timing of treatment application and the outcome. The first should be easy to identify; as articulated above, the presumption should be that treatment is
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
103 If there are a small number of cases fitting a treatment category, the analyst can either remove units in that treatment category from the analysis or collapse it together with another category (thus redefining the treatments). To reiterate a point made several times in this Article, this
choice is not mathematical.
2008]
CAUSAL INFERENCE
585
applied when the jury first perceives the victim’s race. To be safe, an
analyst might conceptualize the perception as having occurred at the
earliest possible moment, perhaps during jury selection.104 Identifying
an outcome to study can be more tricky. After Furman v. Georgia,105
most capital systems include two phases, guilt and penalty. A great
deal of scholarly attention has been focused on the latter, perhaps because the guilt phase of a capital trial is similar to trial of lesser offenses. The penalty phase of capital trials, in contrast, includes elaborate and unusual procedural safeguards designed, in theory at least, to
prevent arbitrary administration of the law’s severest punishment.106
Thus, assume for the moment that the question of interest is the effect
of the perceived race of the victim on the probability that a jury will
sentence a defendant convicted of a death-eligible crime to die.
With these sharp definitions of the primitives, much is clarified, including the fact that at least one critical issue remains: the relationship
of when the jury perceives the race of the victim to the jury’s decisions
on guilt and punishment. The analyst may be interested in the penalty
phase alone, but, per the discussion above, treatment is administered
earlier, before the jury renders a verdict on guilt. Is it safe to assume
that the treatment had no effect on the jury’s guilt or innocence decision? Probably not. In other words, if we believe it worthwhile to
study whether the victim’s perceived race has an effect on a jury’s
punishment decision, it seems odd to assume that the victim’s perceived race had no effect on the same jury’s prior verdict of guilt or
innocence of a death-eligible offense.
To clarify, suppose, in a particular case, the treatment assigned is
“W,” that is, that the jury perceives the race of the victim to be white;
that in the first stage of proceedings, the jury convicts the defendant of
capital murder; and that in the second stage of the proceedings, the
jury sentences the defendant to die. Does it make any sense to talk
about what the jury’s decision in that same case at the second stage of
the trial would have been under treatment “B” if, had the jury perceived the victim to be black, it would have acquitted (or found the
defendant guilty of only a non-capital offense)?107 Here, a focus on a
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
104 Cf. Turner v. Murray, 476 U.S. 28, 36–37 (1986) (holding that a defendant in a cross-racial
capital trial has a constitutional right to inform the venire of the race of the victim and question
potential jurors about racial prejudices).
105 408 U.S. 238 (1972).
106 See Turner, 476 U.S. at 33–39.
107 The fundamental issue here is that the characteristics of one phase of a capital trial might
induce outcome-determinative alterations in a jury’s behavior at the other phase. See Woodson v.
North Carolina, 428 U.S. 280, 302–03 (1976) (plurality opinion) (holding North Carolina’s mandatory death penalty statute unconstitutional in part because juries would exercise unbridled discretion in refusing to convict defendants of death-eligible offenses to avoid the mandatory capital
punishment); see also Roberts v. Louisiana, 428 U.S. 325, 334–35 (1976).
586
HARVARD LAW REVIEW
[Vol. 122:533
variable occurring after treatment, the phase one verdict, is necessary
because unless that variable takes a certain value (conviction of a
death-eligible crime), the outcome of interest (the choice between a
sentence of death or life imprisonment) has no defined value. Recall
that under the potential outcomes framework, a causal effect is defined
in terms of a comparison of potential outcomes of the variable of interest under alternative forms of treatment. To answer the question of
interest, that is, the effect of the victim’s perceived race on the jury’s
penalty phase decisionmaking, an analyst must isolate those units
(cases) in which a jury would convict the defendant of a death-eligible
offense under both treatment B and treatment W. Only then does it
make sense to compare the outcomes of the penalty phase under the
two treatment assignments.
In table form, the problem can be illustrated as follows:
Unit #
1
2
3
4
5
6
TABLE 7. POTENTIAL OUTCOMES TABLE
WITH A CRUCIAL INTERMEDIATE OUTCOME108
Death if
Death if
Treatment Conviction if Conviction if
Treatment
Treatment
Treatment =
Treatment =
(Victim’s
= White
= Black
White
Black
Perceived
Race)
White
??
No
??
Undef.
White
??
Yes
??
Yes
White
??
Yes
??
Yes
Black
No
??
Undef.
??
Black
Yes
??
No
??
Black
Yes
??
No
??
In Table 7, it is clear that Units 1 and 4 do not belong in the study.
Neither was convicted of a death-eligible offense, so the analyst knows
that because of the potential outcome under one treatment (the one actually received), a comparison of punishment under treatment “B” to
punishment under treatment “W” makes no sense. But the analyst
cannot stop there. Suppose it were the case that missing ?? values in
Table 7 were in truth filled in as per Table 8. These values would
never be directly observed, so Table 8 is for illustration. Filled-in values appear in italics.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
108
“Undef.” is an abbreviation for “undefined,” both here and in later tables.
2008]
587
CAUSAL INFERENCE
TABLE 8: POTENTIAL OUTCOMES TABLE WITH A
CRUCIAL INTERMEDIATE OUTCOME, ALL VALUES FILLED IN
Unit #
1
2
3
4
5
6
Treatment
(Victim’s
Perceived
Race)
White
White
White
Black
Black
Black
Conviction if
Treatment =
Black
Conviction if
Treatment =
White
Death if
Treatment
= Black
Death if
Treatment
= White
No
No
Yes
No
Yes
Yes
No
Yes
Yes
Yes
No
Yes
Undef.
Undef.
No
Undef.
No
No
Undef.
Yes
Yes
Yes
Undef.
Yes
Now, it is clear that Units 2 and 5 also have no place in the study,
for the same reason that Units 1 and 4 do not belong. For each of
these units, the outcome of interest is undefined under one of the two
treatments. The only units for which a causal effect of perceived victim race on penalty is defined are Units 3 and 6. Because the analyst
observes Table 7 with its missing values, not the fully filled-in Table 8,
his or her task is to use statistical techniques to fill in the missing values in the two “Conviction if . . .” columns in Table 7, isolate the units
for which both “Conviction if . . .” columns have the value “Yes,” fill in
the missing values in the two “Death if . . .” columns, and then (say)
calculate an average death penalty rate using the now-completed potential outcomes columns. There are various methods the analyst
might use to accomplish the first and third steps, and a choice among
them might require some familiarity with sophisticated mathematical
principles. But nothing in understanding the problem requires anything technical.109
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
109 The capital punishment problem outlined above is structurally identical to a set of statistical
issues in the biomedical context that are referred to, alas, as “censoring by death” or “truncation
due to death.” See, e.g., Junni L. Zhang & Donald B. Rubin, Estimation of Causal Effects via
Principal Stratification When Some Outcomes Are Truncated by “Death,” 28 J. EDUC. & BEHAV.
STAT. 353, 354 (2003); see also Constantine E. Frangakis & Donald B. Rubin, Principal Stratification in Causal Inference, 58 BIOMETRICS 21 (2002). As before, see supra note 17, ownership
over the recognition of this problem is disputed. Compare Zhang & Rubin, supra, and Frangakis
& Rubin, supra, with James Robins & Sander Greenland, Estimability and Estimation of Expected Years of Life Lost Due to a Hazardous Exposure, 10 STAT. MED. 79 (1991).
To my knowledge, no study has done what is required to make valid causal inferences about
jury decisions at the penalty phase, that is, to remove units from the study for which a treatment
effect is undefined. To the contrary, the issue articulated above usually has either gone unrecognized or has been presumed to be amenable to solution via modeling, such as with a properly
specified regression. In fact, regression cannot solve this problem because the issue is one of data
collection and question identification, not modeling technique. Note that the Baldus study,
BALDUS ET AL., supra note 7, suffered from this problem, along with the difficulty of including
588
HARVARD LAW REVIEW
[Vol. 122:533
2. Employment Discrimination. — Employment discrimination examples appear throughout this Article to illustrate basic concepts.
Moreover, the number and kind of employment decisions subject to
challenge under antidiscrimination laws render complete coverage in
this area in a few pages impossible. Thus, this section is limited to a
few general principles, as well as to identification of issues that require
further research.
First, the problem of data with undefined outcomes due to crucial
intermediate variables appears in employment discrimination analysis
on a larger scale than in the death penalty context above. Technically,
it does not make sense to discuss whether an employee actually perceived male would have received a promotion had that employee been
perceived female if, had the employee been perceived female, he/she
would not have been hired in the first place. If an analyst assessing a
promotion discrimination claim examines all units currently eligible to
receive the promotion, she is using a dataset that is contaminated by
units as to which a counterfactual comparison makes no perfect theoretical sense. A second issue that stands out as troublesome in the employment discrimination context is the tug-of-war identified above,
that is, the problem of what variables to include in a balancing process
given that gender and race are perceived (treatment is applied) so early
in the employee-defendant relationship. Much here may depend on
whether the law chooses to characterize events as separate or related.
In the capital punishment context, the Supreme Court has recognized
a strong connection between the guilt and penalty phases of a capital
trial,110 triggering the problem illustrated by Tables 7–8, above. Perhaps in certain employment discrimination contexts, hiring and promotion constitute legally separate transactions or occurrences.111
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
intermediate outcomes in the right-hand side of its regression equations. See Greiner & Rubin,
supra note 94, manuscript at 23.
110 See supra note 107.
111 See Gen. Tel. Co. of the Sw. v. Falcon, 457 U.S. 147, 157–59 (1982) (holding that the class
action representativeness requirement prevented a plaintiff who alleged that he personally suffered race discrimination in promotion from being the named plaintiff for a class that alleged race
discrimination in hiring). However, as Judge Goldberg has pointed out, in certain contexts, such
as when the hiring and promotion decisionmakers are the same, conceptualizing hiring and promotion as separate events makes little sense. See Watson v. Fort Worth Bank & Trust, 798 F.2d
791, 803–06 (5th Cir. 1986) (Goldberg, J., dissenting), vacated, 487 U.S. 977 (1988); see also Stuart
v. Roache, 951 F.2d 446, 452 (1st Cir. 1991) (collecting cases in which courts have permitted employment discrimination plaintiffs to link hiring and promotion). My thanks to an anonymous
reviewer for referring me to both opinions.
For a discussion of estimation issues that arise when related employment processes are analyzed separately, see Joseph L. Gastwirth, Statistical Evidence in Discrimination Cases, 160 J.
ROYAL STAT. SOC’Y SER. A 289, 293–95 (1997). The point made in the text of this Article is different in that it goes to the definition of a causal effect as opposed to estimation, but this difference just demonstrates that the issue of whether, when, and how to atomize long-term relationships among units and decisionmakers has multiple facets.
2008]
CAUSAL INFERENCE
589
This discussion demonstrates that difficult, fundamental issues may
arise over and over in categories of employment discrimination cases.
Regression as used in modern civil rights litigation does not avoid
these issues; it masks them. Thus, expert witnesses, litigators, and
courts will be required to make delicate judgments. It will not do to
render the employment discrimination laws wholly inapplicable to entire classes of lawsuits because we currently lack a statistical framework that will satisfy all skeptics.112 Nor will it do to allow defendants to prevail in any cases in which they point out that the ideal
data are not available; the ideal data are never available. Assumptions
that might make the quantitative analyst “blush”113 outside the litigation setting, particularly with respect to the characteristics of persons a
firm does not hire (as to which covariate information may not be
available or recoverable), must be seriously assessed and explored.
Obviously, a prerequisite to such a process of exploration and assessment is that the assumptions be stated clearly. Moreover, experts, litigators, and courts need not be satisfied in all instances with the information available in a defendant’s computer files. In some cases, paper
records can be searched and coded; in others, surveys may be administered to reconstruct lost or previously unavailable information.
The key point here is that in most employment discrimination contexts, we can identify the units, the treatment, a timing of treatment
application, the outcome of interest, and the variables that might be
subject to balancing. We can assess critical assumptions, such as non–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
112
Consider the following passage:
Scientifically, the strength of the case against smoking rests not so much on the P-values,
but more on the size of the effect, on its coherence and on extensive replication both
with the original research design and with many other designs. . . .
....
The results of the studies on smoking are generally coherent in the following ways:
(i) There is a dose-response relationship: persons who smoke more heavily have greater
risks of disease than those who smoke less. (ii) The risk from smoking increases with the
duration of exposure. (iii) Among those who quit smoking, excess risk decreases with
time after exposure stopped.
David Freedman, From Association to Causation: Some Remarks on the History of Statistics, 14
STAT. SCI. 243, 253 (1999). Requiring this level of proof before reaching a conclusion is a respectable viewpoint for a scientist, one who is interested in investigating and establishing universal
truths. It will not do for a court, which must either (i) work toward finding the best possible answer in a particular dispute under a tight (by social science standards) timeline, or (ii) always rule
against the party with the burden of proof. See Joseph L. Gastwirth, Some Issues Arising in the
Presentation of Statistical Testimony, 4 L., PROBABILITY & RISK 5, 5–7 (2005); Steven L. Willborn, A Lawyer’s View of the Statistical Expert, 4 L., PROBABILITY & RISK 25, 26 (2005).
113 As Judge Higginbotham has noted:
In countless areas of the law weighty legal conclusions frequently rest on methodologies
that would make scientists blush. The use of such blunt instruments in examining complex phenomena and corresponding reliance on inference owes not so much to a lack of
technical sophistication among judges, although this is often true, but to an awareness
that greater certitude frequently may be purchased only at the expense of other values.
League of United Latin Am. Citizens v. Clements, 999 F.2d 831, 860 (5th Cir. 1993) (en banc).
590
HARVARD LAW REVIEW
[Vol. 122:533
interference among units. Some uncomfortable additional assumptions
may be necessary to proceed in actual cases, but at least we can ask
coherent questions linked to available data. As the next section demonstrates, such is not always the case in civil rights litigation.
3. Causation and Section 2 of the Voting Rights Act. — Since
Thornburg v. Gingles,114 it has been clear that section 2 of the Voting
Rights Act prohibits dilution of minority voting strength115 and that an
essential part of a plaintiff’s case in a section 2 lawsuit is proof of racial bloc voting,116 which until recently was attempted primarily via
regression.117 Among the innumerable unresolved issues in this area is
whether a plaintiff’s prima facie case requires a causal element of
some kind, either as part of the proof of racial bloc voting or later at a
“totality-of-circumstances” stage.118 Courts adjudicating this issue describe the causal element in various ways. Examples include (i) a focus on “the reasons why white voters reject[] minority candidates”;119
(ii) “an inquiry into the cause for the correlation [between voter race
and voter preferences] (to determine, for example, whether [the correlation] might be the product of similar socioeconomic interests rather
than some other factor related to race)”;120 (iii) a need to eliminate
party affiliation or party preferences as a cause of voting patterns;121
and (iv) whether any vote dilution observed is “being caused by the interaction of racial bias in the voting community and the challenged
scheme.”122
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
114
115
478 U.S. 30 (1986).
See id.; see also Holder v. Hall, 512 U.S. 874, 885 (1994) (O’Connor, J., concurring in part
and concurring in the judgment); id. at 961–63 (separate opinion of Stevens, J.).
116 This discussion assumes knowledge of basic section 2 law, including the three Gingles prerequisites, the totality-of-circumstances test, and the so-called “Senate Report” factors. See Voting
Rights Act Amendments of 1982, Pub. L. No. 97-205, § 3, 96 Stat. 131, 134 (codified at 42 U.S.C.
§ 1973(b) (2000)); Clark v. Calhoun County, Miss., 21 F.3d 92 (5th Cir. 1994); Colleton County
Council v. McConnell, 201 F. Supp. 2d 618, 632 n.13 (D.S.C. 2002); S. REP. NO. 97-417, at 28–29
(1982).
117 See Greiner, supra note 12, at app. A; see also League of United Latin Am. Citizens v. Perry,
126 S. Ct. 2594, 2657 (2006) (Roberts, C.J., concurring in part, concurring in the judgment in part,
and dissenting in part) (“At trials, assumptions and assertions give way to facts. In voting rights
cases, that is typically done through regression analyses of past voting records.”).
118 See supra note 116.
119 Gingles, 478 U.S. at 100 (O’Connor, J., concurring in the judgment).
120 Holder, 512 U.S. at 904 (Thomas, J., concurring in the judgment). Notice how quickly doctrinal confusion in this area emerges. Usually, for socioeconomic circumstances to provide an alternative explanation for voting patterns that are related to race, these circumstances would need
to be themselves related to race. But the relation of socioeconomic circumstances and race is a
Senate Report factor tending to support a finding of vote dilution, at least when (as is usually the
case in this country) minority race is related to less advantageous socioeconomic conditions.
121 See, e.g., League of United Latin Am. Citizens v. Clements, 999 F.2d 831, 859 (5th Cir. 1993)
(en banc); Baird v. Consol. Indianapolis, 976 F.2d 357, 361 (7th Cir. 1992).
122 Nipper v. Smith, 39 F.3d 1494, 1497 (11th Cir. 1994) (en banc).
2008]
CAUSAL INFERENCE
591
The judiciary has spilled a fair amount of ink on this subject, with
the prevailing view being that section 2 does contemplate some sort of
causal inquiry.123 Much of the debate has been doctrinal and is thus
beyond the scope of this Article, but one can learn important things
about the nature of causal inference by examining how courts actually
apply the concept of causation in the section 2 context. That examination should begin with two preliminary observations. First, the causal
inquiry, however defined, appears to matter little in actual cases unless
the factual record demonstrates that candidates of minority race have
enjoyed some measure of electoral success.124 Second, the only addi–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
123 The debate has concerned at least three broad issues. First, there is the threshold question
of whether section 2 vote dilution plaintiffs must prove any sort of causation. This issue was one
(among several) over which the Eleventh Circuit, sitting en banc, divided equally in Solomon v.
Liberty County, Florida, 899 F.2d 1012 (11th Cir. 1990), and later resolved for circuit purposes in
Nipper, 39 F.3d 1494. The en banc Fifth Circuit also split, although a majority supported the
view that section 2 requires causal proof. League of United Latin Am. Citizens, 999 F.2d at 856;
id. at 901–12 (King, J., dissenting). In addition, note that the Supreme Court recently found a
section 2 violation without mentioning causation. See League of United Latin Am. Citizens v.
Perry, 126 S. Ct. 2594 (2006).
Second, courts have disagreed on whether the causal element of proof, whatever it means, is
(i) part of the third Gingles prerequisite (white bloc voting usually sufficient to defeat the minority’s preferred candidate); (ii) part of the totality of circumstances; or (iii) synonymous with a section 2 vote dilution claim, when joined with a potentially dilutive electoral structure such as an
at-large districting scheme or a gerrymander. The Fifth Circuit appears to favor the first choice.
See League of United Latin Am. Citizens, 999 F.2d at 856–58. The First, Second, Fourth, and
Seventh Circuits appear to prefer the second. See United States v. Charleston County, S.C., 365
F.3d 341, 347–48 (4th Cir. 2004); Goosby v. Town Board, 180 F.3d 476, 493 (2d Cir. 1999); Milwaukee Branch of NAACP v. Thompson, 116 F.3d 1194, 1199–1200 (7th Cir. 1997); Vecinos de
Barrio Uno v. City of Holyoke, 72 F.3d 973, 983 & n.4 (1st Cir. 1995). Note that the NAACP opinion is not perfectly clear on this point. Finally, the Eleventh Circuit appears to espouse the third
choice. See Nipper, 39 F.3d at 1514–15. Unfortunately, this matter is not one of mere doctrinal
nicety, because most courts at least give lip service to the proposition that “it will be only the very
unusual case in which the plaintiffs can establish the existence of the three Gingles factors but still
have failed to establish a violation of § 2 under the totality of circumstances.” Clark v. Calhoun
County, Miss., 21 F.3d 92, 97 (5th Cir. 1994) (quoting Jenkins v. Red Clay Consol. Sch. Dist. Bd. of
Educ., 4 F.3d 1103, 1135 (3d Cir. 1993)); see also NAACP v. City of Niagara Falls, N.Y., 65 F.3d
1002, 1019 n.21 (2d Cir. 1995).
Third, the courts have divided on which party has the burden of proof on causation. The
Fifth Circuit essentially places the burden on the plaintiffs, where it remains at all times. See
League of United Latin Am. Citizens, 999 F.2d at 856–58. Other courts, while purporting to eschew formal burden-shifting, have stated that if plaintiffs prove the three Gingles prerequisites
and the defendant is mute on the issue, plaintiffs are deemed to have prevailed on this question.
See Vecinos de Barrio Uno, 72 F.3d at 983 & n.4; Nipper, 39 F.3d at 1524 & n.61, 1525 n.64.
Other judges would adopt formal burden-shifting. See Goosby, 180 F.3d at 502–03 (Leval, J.,
concurring); League of United Latin Am. Citizens, 999 F.2d at 902–04, 910–12 (King, J.,
dissenting).
124 As this Article does not attempt a comprehensive examination of all section 2 cases over a
particular time period, support for the statement above comes from a review of several circuit
court cases grappling with the causation issue, as well as some subsequent rulings. In particular,
consider the following three illustrations. First, in League of United Latin American Citizens v.
Clements, the case in which the en banc Fifth Circuit first adopted a causation element in the section 2 context, the court relied exclusively on the success of Republican candidates of minority
592
HARVARD LAW REVIEW
[Vol. 122:533
tional evidence occasionally brought to the table to address the
causation issue, apart from that which addresses the laundry list of
factors courts would otherwise consult at the section 2 totality-ofcircumstances stage, is whether white voter support for a minoritypreferred candidate is generally lower if the candidate is of minority
race or ethnicity (as opposed to white).125 These two facts suggest
that, to the extent courts really do focus on causation when adjudicating section 2 cases (as opposed to when articulating section 2 doctrine),
they are attempting to draw an inference about the effect of the race of
the candidate, or (worse yet) the race of voters, on election results.
A potential outcomes understanding of causation pays dividends in
this context by demonstrating that neither inquiry can be coherently
linked to available data. Consider an attempt to assess the effect of
the race of the candidate on election results. One can identify the
treatment as (perceived) candidate race and, at least initially, the outcome of interest as election wins and losses or perhaps vote tallies. If
vote tallies or election results are the outcomes of interest, then the institutional or socioeconomic actor whose behavior we are interested in
assessing is the set of persons eligible to vote in each election. So far
so good. Identifying the units presents more difficulties; if we consider
units to be electoral contests, the fact that candidates usually run more
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
race to hold that party, not race, best “explained” observed voting patterns. 999 F.2d at 877–80.
Later, however, in Clark v. Calhoun County, Mississippi, the Fifth Circuit reversed a district court
holding in favor of a section 2 defendant without mentioning causation, although the court did
examine success of candidates of minority race, as 42 U.S.C. § 1973(b) compelled it to do. 88 F.3d
1393, 1400 (5th Cir. 1996). Both opinions were authored by Judge Higginbotham. See also Jones
v. City of Lubbock, 730 F.2d 233, 235 (5th Cir. 1984) (Higginbotham, J., concurring in denial of
petition for rehearing en banc). Second, in Nipper v. Smith, the Eleventh Circuit spent more than
fifteen pages justifying its pronouncements that section 2 included a causal element. 39 F.3d at
1509–27. In strongly implying that the requisite causation had been proven, however, the en banc
court simply looked to the Gingles factors, weighed the totality of circumstances, and examined
special circumstances of the case (there, the unusual considerations attendant to a vote dilution
claim when the offices at issue are judicial). Id. All of these steps are ones the court would have
taken in a section 2 discussion lacking any focus on causation. Third, after the First Circuit held
in Vecinos de Barrio Uno, 72 F.3d at 981, that section 2 includes some causal element, a threejudge court sitting in Massachusetts found a section 2 violation in a districting scheme for the
state legislature. Black Political Task Force v. Galvin, 300 F. Supp. 2d 291 (D. Mass. 2004). The
three-judge court’s analysis of causation consisted of the following phrase: “We have also inquired into causation where appropriate . . . .” Id. at 313. Judge Selya authored both Vecinos de
Barrio Uno and Galvin.
Again, identifying the important role the success of candidates of minority race apparently
plays in the causation inquiry does little to clear up the doctrinal confusion in this area. Courts
would examine the success of minority candidates as a factor in the section 2 context apart from
any causal focus. 42 U.S.C. § 1973(b) (2000).
125 See, e.g., Charleston County, 365 F.3d at 350 (noting that voting polarization increased when
a black candidate ran against a white candidate); Old Person v. Cooney, 230 F.3d 1113, 1128 (9th
Cir. 2000); see also Goosby, 180 F.3d at 495–98 (rejecting a party-not-race argument primarily because of the defendant jurisdiction’s candidate-slating process, which is itself the fourth Senate
Report factor).
2008]
CAUSAL INFERENCE
593
than once renders the ordinary assumption that units do not interfere
with one another hogwash. In other words, to maintain the noninterference assumption, we would have to assume away things like
the incumbency effect and the coattails phenomenon.126 If we consider
units to be candidates, it becomes hard to define the outcome of interest; which of several potential contests in which a single candidate
runs should one examine, or should one combine them in some way
over time?
The problem of sharply defining units, however, pales in comparison to the issue of identifying the timing of the treatment administration. The reasoning courts employ in their discussions of causation
demonstrates their desire to conceptualize the effect of perceived candidate race as of election day. This desire is evident from the fact that
courts contemplate separating the effect of candidate race from the effect of funding levels, party loyalty, incumbency, use of the media, and
a laundry list of other factors,127 some of which continue to change up
until voters enter polling booths. Courts (and some commentators) anticipate accomplishing this task by using “multi-variate mathematical
inquiry,”128 adding variables to regression equations,129 or employing
“properly specified models,”130 which are all complicated ways of saying that certain variables should be included on the right-hand side of
a regression equation.
We are in trouble here. Recall the presumption that treatment
should be conceptualized as having been administered as of the moment of first interaction between the unit (here, the candidate, or the
candidate in a given election) and the socioeconomic or institutional
actor of interest (here, the members of the public eligible to vote). The
public perceives a candidate’s race long before election day. Mean–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
126 This is another example demonstrating that the assumption of non-interference among units
can be difficult to recognize and, when recognized, can be fairly startling. See supra note 69.
127 See, e.g., Vecinos de Barrio Uno, 72 F.3d at 983 n.4 (“[F]actors might include . . . lack of
funds, want of campaign experience, the unattractiveness of particular candidates, or the universal popularity of an opponent.”); City of Lubbock, 730 F.2d at 235 (Higginbotham, J., concurring
in denial of petition for rehearing en banc) (“[O]ther explanatory factors [might be] . . . campaign
expenditure, party identification, income, media use measured by cost, religion, name identification, or distance that a candidate lived from any particular precinct.”); Black Political Task Force,
300 F. Supp. 2d at 313–15 (finding that incumbency played a significant role in redistricting).
128 See City of Lubbock, 730 F.2d at 234; see also League of United Latin Am. Citizens, 999 F.2d
at 859 (“detailed multivariate analysis”).
129 See Holder v. Hall, 512 U.S. 874, 904 n.13 (1994) (Thomas, J., concurring in the judgment);
see also Rodriguez v. Pataki, 308 F. Supp. 2d 346, 433–34 (S.D.N.Y. 2004).
130 See Charles S. Bullock, III, Misinformation and Misperceptions: A Little Knowledge Can
Be Dangerous, 72 SOC. SCI. Q. 834, 838 (1991); see also Richard L. Engstrom & Michael D.
McDonald, Definitions, Measurements, and Statistics: Weeding Wildgen’s Thicket, 20 URB. LAW.
175, 177 (1988) (referring to “multivariate causal analysis”). But see Bernard Grofman, Multivariate Methods and the Analysis of Racially Polarized Voting: Pitfalls in the Use of Social Science by the Courts, 72 SOC. SCI. Q. 826, 828 (1991).
594
HARVARD LAW REVIEW
[Vol. 122:533
while, courts have fits. For example, they alternately consider the
same circumstance, minority candidates’ difficulty in raising funds, as
favoring both section 2 plaintiffs and section 2 defendants.131 The fact
is, both positions make sense, depending on the timing of treatment
application. If one considers the treatment of perceived race as being
administered on election day, difficulty in raising funds is a covariate
that should be subject to balancing, which will often favor the defendant. If one considers the treatment as administered earlier, perhaps
at some moment of perception by a part of the voting polity, difficulty
in raising funds is an intermediate outcome that should not be subject
to balancing, which will often benefit the plaintiff.
Is there something special about the electoral process that would allow treatment assignment to be conceptualized later than the moment
potential voters first observe a candidate’s race, such as on the day of
the election? In a word, no. To the contrary, what we think we know
about elections suggests that departing from the ordinary presumption
about the timing of treatment application is more dangerous here than
elsewhere. There is no point in rehashing arguments already made to
the effect that perceived race affects campaign contributions, party
loyalty, incumbency, access to the media, and most if not all of the
laundry list of other factors courts point to as potential nonracial
causes for election results. It is worth pausing to point out that one of
the dangers here is the same as the danger in the smoking, lung cancer,
and death example articulated above. By analogy to the courts’ section 2 reasoning, if a statistician looks for an effect of smoking on
death rates after balancing on the occurrence of lung cancer, diabetes,
heart disease, throat ailments, or other similar variables, she will
probably end up concluding that smoking is a healthy habit that all of
us should make a part of our daily lives.
The real point here is that the courts are the ones making strong
and implausible assumptions, not section 2 litigants who argue that
race affects election results by way of affecting the alternative variables upon which courts would focus. That is, by conceptualizing
treatment application as occurring on election day, courts are assuming
that a candidate’s race has no effect on his or her ability to raise funds,
command party discipline and loyalty, campaign, or gain access to the
media, all this while deeming it worthwhile to study whether candidate race has an effect on votes cast. It is one thing to say that candidate race may have no effect on these other variables; it is another to
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
131 Compare, e.g., Vecinos de Barrio Uno, 72 F.3d at 983 n.4 (listing “lack of funds” as an alternative, non-racial explanation for voting patterns, suggesting exoneration of a section 2 defendant), with United States v. Charleston County, S.C., 365 F.3d 341, 352 (4th Cir. 2004) (stating
that lack of funds makes it difficult for minority candidates to compete in districts of larger size,
such as multi-member districts).
2008]
CAUSAL INFERENCE
595
assume that no such effects exist, while simultaneously deeming it useful to study whether candidate race affects voter behavior. One would
not, without justification based in evidence, assume that smoking has
no effect on risk of lung cancer, diabetes, and heart disease, and nevertheless deem it worthwhile to study whether smoking has an effect on
risk of death.
Judges, or at least some judges, understand some of these principles.132 What they do not appear to understand, and what is a fundamental lesson of this Article, is that no multivariate mathematical
inquiry, no regression equation, no properly specified model, no
amount of common sense and intuitive assessment, and no (to borrow
a favorite judicial phrase) examination of all the facts and circumstances can solve this problem. The difficulty stems from a failure to
identify a plausible time for treatment assignment, and thus a failure
to articulate a coherent question linked to available data.
Is it possible to articulate a coherent causal question in the section
2 context by conceptualizing treatment as administered at a time earlier than election day? I have been unable to do so. One might try to
focus on the moment a potential candidate announces his or her candidacy. But commentators and courts know that potential candidates
choose the offices they run for only after serious thought as to their
prospects for winning,133 and such a calculation must include the racial preferences of voters, party officials, and potential contributors, all
of which potential candidates probably know better than do Article III
judges.134 It would be unwise to assume, in a study designed to assess
the effects of candidate race, that no such strategic thinking occurs.
Worse, race may affect a candidate’s behavior years before she contemplates running for office, perhaps at the candidate’s initial choice
of party affiliation. Again, one need not be certain that such an effect
in fact exists. One must, however, ask whether it is plausible (or
worthwhile) to assume that it does not exist and simultaneously study
whether race affects voting behavior on election day.
At the most fundamental level, the difficulties here are the result of
three overarching factors. The first is the tug-of-war articulated in
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
132 See in particular the cogent discussion in United States v. Charleston County, S.C., 316 F.
Supp. 2d 268, 300–03 (D.S.C. 2003), and the more economic-based examination in Milwaukee
Branch of NAACP v. Thompson, 116 F.3d 1194, 1198–200 (7th Cir. 1997).
133 See, e.g., GARY C. JACOBSON & SAMUEL KERNELL, STRATEGY AND CHOICE IN CONGRESSIONAL ELECTIONS 19, 35 (2d ed. 1983); V.O. KEY, JR., THE RESPONSIBLE ELECTORATE 7 (1966). Both the Charleston County and NAACP courts explicitly recognized this fact; in
the former case, there was testimony to this effect. See NAACP, 116 F.3d at 1198; Charleston
County, 316 F. Supp. 2d at 278 n.13.
134 Putting aside questions of institutional competence, potential witnesses hoping to run for
office in the future have powerful incentives not to testify truthfully and completely on such
subjects.
596
HARVARD LAW REVIEW
[Vol. 122:533
section III.A.2. The long (often decades-long) time period between the
moment of first candidate-electorate interaction and vote tallies renders the tug-of-war tension particularly evident.
The second overarching factor is the nature of the institutional or
socioeconomic actor whose behavior we are attempting to assess. “The
electorate” (more accurately, the potential electorate) is a complicated
mass of individuals whose choices and motivations change over time
and whose actions even at a particular time point are difficult to understand. The employment discrimination context was hard enough,
but there we had the advantage of not being concerned about the potential for someone else’s discrimination; only the employer’s behavior
mattered. In the section 2 context, by contrast, judges must assess voters’ “opportunity . . . to participate in the political process and to elect
representatives of their choice,” and applicable law directs them to
consider the “totality of circumstances.”135 The entire political system
is fair game.136
The third overarching factor is strategic thinking on the part of
persons contemplating a run for political office. In terms of the vocabulary used in this Article, treatment assignment has occurred before
treatment application — the potential candidate knows what the electorate will perceive his or her race to be before she enters the electoral
contest. The previous portions of this Article have used the terms
“treatment assignment” and “treatment application” interchangeably
because in many situations the two occur (or can be conceptualized as
occurring) simultaneously. That will not do here. Potential political
candidates are hyper-strategic actors.137
The above discussion has doctrinal implications: until resolved, the
problems discussed here are a reason to interpret section 2 as not including a causal element. If courts feel that other doctrinal considerations, considerations beyond the scope of this Article, compel some
kind of causal inquiry in the section 2 context, they should at least define sharply (i) the units, (ii) the treatment, (iii) the timing of treatment
administration, (iv) the outcomes of interest, and (v) the relationships
the units have to one another (for example, how to handle interference
among units caused by the incumbency and coattail effects). They
should then articulate how available data have any information on the
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
135
136
Voting Rights Act of 1965, 42 U.S.C. § 1973(b) (2000).
The above discussion should make clear that matters are far worse if one shifts the treatment definition from the race of the candidate to the races of the voters. Which voters? When is
treatment applied?
137 Recently, scholars have begun to examine the challenges for causal inference that occur
when units take action between treatment assignment and treatment application, anticipating the
results of that application. See, e.g., Gerard J. van den Berg, An Economic Analysis of Exclusion
Restrictions for Instrumental Variable Estimation (Inst. for the Study of Labor, Discussion Paper
No. 2585, 2007), available at http://ssrn.com/abstract=964965.
2008]
CAUSAL INFERENCE
597
questions identified. Perhaps courts will succeed in doing what I have
been unable to do. Until they do so, they are interpreting the statute
to include a nonsensical question.
IV. CONCLUSION
The potential outcomes framework is not so much a solution to
problems of causal inference in the civil rights context as it is a coherent structure within which to attempt to solve problems. Its adoption
would constitute progress, not a final answer. The framework provides a way to reduce expert witness bias, assess whether questions in
litigation are at the present time answerable, and identify unrealistic
assumptions, all within a definition of causation accessible to judges
and juries. The paradigm helps clarify the data that must be collected
to implement studies with believable conclusions, and it does a better
job of forcing quantitative analysts to articulate the assumptions required to use data to provide answers to questions of interest.
Adoption of the potential outcomes framework would make analyses better, not easier. One of the attractive, but dangerous, characteristics of regression as used in modern civil rights litigation is that it is so
simple to implement. At least, the initial steps of regression are easy to
implement; the process of assessing fit, attempting different equations,
etc., requires more effort. Comparatively speaking, however, much of
the process of implementing a regression as it is used in modern civil
rights litigation does not require as much effort as would implementing
the potential outcomes framework. Implementing a potential outcomes causal analysis in the civil rights context requires work before
the analyst sits down in front of the computer: work in identifying
units, treatment, timing of treatment assignment, and outcomes of interest, along with work in maximizing the plausibility of initial assumptions. If these initial stages go well, more hard work is required
to identify and balance covariates. Only then, after all of the “design”
phase of the study is over, does the “analysis” phase become simpler.
Reasoned judgment is required at every step. Fortunately, the higherquality result is worth the extra work.
Much remains to be done in this area. This Article focuses on gender, race, and national origin discrimination and shows that many
hurdles still exist. Age claims are another matter. Here, an initial
challenge is to identify the treatment — the counterfactual.138 But the
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
138 For example, imagine a 60-year-old employee with thirty years of experience in a given field
and an expectation of being able to work for another eight years. Should this employee be compared for age discrimination purposes to (i) a 40-year-old worker with ten years of experience and
the expectation of eight more years of work, (ii) a 40-year-old worker with ten years of experience
and twenty-eight more expected work years, (iii) a 40-year-old with thirty years of experience and
eight more expected work years, (iv) a 60-year-old worker with thirty years experience and
598
HARVARD LAW REVIEW
[Vol. 122:533
benefits of the potential outcomes framework are evident: experts, litigators, and courts are forced to articulate their questions clearly, which
requires close consideration of statutory objectives and policy goals.
The results should be not just more accurate quantitative answers, but
better law.
–––––––––––––––––––––––––––––––––––––––––––––––––––––––––––––
twenty-eight more expected work years, or (v) something else? Finding actual employees with
some of these profiles to donate outcome values could prove difficult.