
Public Opinion Quarterly, Vol. 74, No. 2, Summer 2010, pp. 286–318
RACE AND TURNOUT IN U.S. ELECTIONS
EXPOSING HIDDEN EFFECTS
BENJAMIN J. DEUFEL
ORIT KEDAR*
BENJAMIN J. DEUFEL directs quantitative analysis for the Financial Services Practice at the Corporate Executive Board, Arlington, VA, USA. ORIT KEDAR is an Associate Professor in the
Department of Political Science at the Massachusetts Institute of Technology, Cambridge,
MA, USA, and the Hebrew University of Jerusalem, Jerusalem, Israel. Benjamin Deufel benefited from financial support by the Jacob K. Javits Fellowship Program of the U.S. Department of
Education and the Multidisciplinary Program in Inequality and Social Policy at Harvard University, sponsored by the National Science Foundation. Orit Kedar benefited from the V.O.K.
Fellowship and Dellon Fellowship, Harvard University. The authors would like to thank the Center for American Political Studies at Harvard University for a seed grant. For helpful comments
and suggestions, they thank Chris Achen, Barry Burden, Don Green, Jonathan Katz, Gary King,
Matthew Lebo, Skip Lupia, Jonathan Nagler, Ken Scheve, Nick Valentino, Lynn Vavreck, and
Jonathan Wand. They also thank Greg Distelhorst and Mike Sances for superb research assistance. Accompanying materials can be found on the authors’ Web site at http://web.mit.edu/
okedar/www/. *Address correspondence to Orit Kedar, Massachusetts Institute of Technology,
Department of Political Science, 77 Massachusetts Ave., E53-429, Cambridge, MA 02139,
USA; e-mail: [email protected].
doi: 10.1093/poq/nfq017
Advance Access publication April 22, 2010
© The Author 2010. Published by Oxford University Press on behalf of the American Association for Public Opinion Research.
All rights reserved. For permissions, please e-mail: [email protected]
Abstract We demonstrate that the use of self-reported turnout data
often results in misleading inferences about racial differences in turnout.
We theorize about the mechanism driving report of turnout and, utilizing ANES turnout data in presidential elections from 1976 to 1988 (all
years for which comparable validated data are available), we empirically model report of turnout as well as the relationship between reported
and actual turnout. We apply the model to the two subsequent presidential elections in which validated data are not available, 1992 and 1996.
Our findings suggest that African Americans turned out almost 20 percentage points less than did Whites in the 1992 and 1996 U.S.
presidential elections—almost double the gap that the self-reported data
indicate. In contrast with previous research, we show that racial differences in factors predicting turnout make African Americans less likely
to vote compared to Whites and thus increase their probability of overreporting. At the same time, when controlling for this effect, other
things equal, African Americans overreport electoral participation more
than Whites.
Introduction
The question “Who votes?” has been the focus of numerous studies. Out of
both positive and normative motivations, political scientists have examined
voter turnout over time, across electoral institutions, and across segments of
the population. For the most part, however, the secret ballot prevents political
scientists from observing what we seek to explain; we know who reports voting in surveys, but we do not know who actually votes.
The vast majority of studies simply assume that the missing piece—
individual turnout—is identical to the observed piece—self-reported turnout. In the absence of any further information, this is, of course, a
reasonable assumption. However, auxiliary information reveals not only
that a substantial proportion of American survey respondents report turning out when they do not, but also that their propensity to misreport
turnout is related to their propensity to vote, which may lead to mistaken
inferences. Two steps are thus required in order to explain who votes.
First, we need to comprehend the mechanism that drives the observed
piece—reported turnout. And second, the relationship between the observed and unobserved, reported and actual turnout, should be modeled.
Although much energy has been focused on the former, the latter has
been largely ignored.
In this study, we take up this challenge, focusing on racial differences in
turnout. Numerous studies provide evidence that African Americans overreport turnout at higher rates than Whites. Building on these studies, we
develop a theoretical account that explicitly models both the reporting process and its relationship to actual turnout. We employ this theory to model
the relationship between turnout and report of turnout in those years in
which the two are available at the individual level. We contend that African
Americans overreport their turnout more than Whites for two reasons. First,
African Americans overreport more because they simply have more of an
opportunity to do so. Group differences in other predictors of turnout, such
as socioeconomic status, cause African Americans to be less likely to turn
out compared to Whites, and a lower propensity to vote actually makes
African Americans more likely to overreport. Furthermore, holding constant other predictors of turnout, African Americans still overreport more
than Whites. We argue that, because of the importance of race in American
politics and elections in particular, the social desirability of voting is higher
for African Americans.
After establishing our theory empirically, we employ it to deflate self-reported figures in the 1990s where validated data are not available. Our
findings suggest that throughout the 1980s and 1990s, the overreporting
bias in self-reports masks a large turnout gap between Whites and African
Americans. On the other hand, this gap narrows considerably after accounting for other predictors of turnout.
Finally, a side benefit of our venture to expose hidden effects is methodological. Modeling the relationship between validated and self-reported
turnout in the 1970s and 1980s, we produce a function that probabilistically deflates reported turnout. We test the performance of our deflating
function and show that our algorithm predicts turnout more accurately
than self-reports.
WHAT WE KNOW (AND DON’T KNOW) ABOUT OVERREPORTING
Studies of voter turnout are in agreement that misreporting of turnout exists
and is almost entirely in one direction. The fraction of respondents voting but
reporting they did not (“underreporters”) is negligible and, importantly, these
discrepancies are usually random (Silver, Anderson, and Abramson 1986;
Presser and Traugott 1992). Overreporting of turnout is different: it is of substantial magnitude and, most students of voting behavior agree, it is
systematically related to voters’ characteristics. However, several issues make
it difficult to develop theories about the relationship between self-reports (R)
and verified turnout (V). The appropriate quantities of interest, the mechanism
that generates overreporting, and the substantive effects of overreporting are
all disputed in the literature. While these issues are mostly empirical, they
muddle the theoretical waters. Before we outline our theory, we need to first
wrestle with these issues.
The first hurdle is determining the population of interest, as choices on
this front may change inferences about the nature of overreporting. Focusing
on overreporting as their ultimate variable of interest, some (e.g., Silver et
al., 1986) argue that overreporting should be calculated among nonvoters
only, the actual group at risk of overreporting. Another advantage of this
procedure is that it controls for turnout rate. Given our motivation, however,
a different strategy is in order.
Because our goal is to explain voter turnout rather than overreporting, we
estimate the hypothetical probability of overreporting for all respondents. In line
with standard treatment of binary variables (Greene 1993, pp. 636–43), we
acknowledge that every observed binary variable (here, both validated turnout and reported turnout) is a realization of an unobserved continuous
proclivity (a probability of actually turning out, and a probability of reporting turning out). Therefore, every individual has both an underlying
unobserved probability, and a realization—the observed binary outcome.
Whether they turned out or not, it is possible for an individual to have a
probability of turning out of, say, 0.65, and a probability of reporting turnout of, say, 0.85. Thus, the relevant group by which we calculate
overreporting of turnout is the general voting-eligible population, and in
the sample, the entire group of respondents eligible to vote.
The second hurdle is understanding the sources of systematic components
of overreporting. The standard argument, perhaps most clearly represented in
Silver et al. (1986), employs the concept of social desirability. The basic idea
is that respondents report their behavior untruthfully because they wish to portray themselves as engaged in a socially desirable activity. This wish is
unevenly felt across respondents; the very factors that make respondents likely to vote also reinforce their desire to portray themselves as voters, regardless
of their actual behavior.1 Thus, those who are more likely to vote are more
likely to misreport when they did not actually participate.2

1. In the only comparative study of overreporting we are aware of, Karp and Brockington (2005) make a distinction between social desirability and opportunity of overreporting. Examining overreporting in Britain, New Zealand, Norway, Sweden, and the United States, the authors show that where turnout is high, social desirability is higher as well, yet high turnout also leaves fewer nonvoters with the opportunity to overreport their participation.
2. Bernstein, Chadha, and Montjoy (2001) propose an alternative possibility. They argue that respondents misreport out of guilt from failure to fulfill a social obligation.
3. A review of the APSR, AJPS, JOP, and Political Behavior between 1993 and 2008 for studies estimating turnout in U.S. presidential elections, employing individual-level data, and having race on the right-hand side yielded 13 studies; among them, five found no effect of race on turnout, two found mixed effects, and among the six that found an effect, five found that African Americans were more likely to participate than Whites.
This argument implies that the use of self-reported (R) data will lead to
overestimation of partial correlations with regard to turnout (Cassel 2003).
For example, if education makes people both more likely to vote and more
likely to exaggerate the extent of their participation, use of R will result in
overestimation of the effect of education on the propensity to turn out. Indeed,
Presser and Traugott (1992) show that, using R, political scientists overestimate partial correlations with regard to turnout. Nonetheless, Sigelman (1982)
argues that substantive conclusions about the predictors of voting, despite biased coefficients, are mostly unchanged by the use of validated as opposed to
reported voting data.
Racial effects are a seeming exception to this general pattern. Repeated
studies have shown that African Americans are more likely to overreport than
Whites, despite possibly being less likely to turn out (Sigelman 1982; Hill and
Hurley 1984; Bernstein, Chadha, and Montjoy 2001). In a series of articles,
Abramson and Claggett (1984, 1986, 1989, 1991) find that although self-reported data indicate no racial difference in turnout after accounting for
education and region, use of validated data reveals otherwise. In other words,
African Americans overreport at higher rates, yet are less likely to turn out
compared to Whites. These two gaps in opposite directions mask each other,
leading researchers using R to underestimate the effect of race on turnout and
often to conclude that there is no relationship, or even that African Americans turn
out at higher rates than Whites do.3
While much attention has been dedicated to empirical descriptions of this
pattern, to our knowledge there has been no attempt to model how the relationship between turnout and reported turnout varies by race. Doing so allows
us to understand how differential overreporting affects our inference of racial
differences in turnout. It is to this task that we now turn.
Explaining the Reporting Gap: Two Mechanisms
We expect the impact of social desirability to be nonlinear, where overreporting first increases and then decreases with propensity to vote. Those with the
highest likelihood of voting have low levels of overreporting because they often actually do turn out. Those respondents who are moderately likely to turn
out are those who overreport at the highest rates. They turn out at the polls less
often than do those most likely to vote, which gives them more of an opportunity to overreport (for similar intuition, see also McDonald 2003, p. 185).
These mid-range potential voters feel a greater need to report socially desirable
behavior than those with a low likelihood of voting, and thus their level of
overreporting is also higher than that of those who vote at low rates. In sum,
opportunity to overreport, along with social desirability, produces a nonlinear
relationship between propensity to vote and overreporting.
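One simple way to express this intuition is sketched below. It is our illustration rather than a quantity estimated in the article; the deflating function estimated later uses a more flexible quadratic with a free intercept and race interactions.

```latex
% Illustrative form of the overreporting gap g(p) at latent propensity to vote p.
% It is zero for sure voters (p = 1), who cannot overreport, small for the least
% likely voters, and largest in the middle range, combining the opportunity and
% social desirability mechanisms described in the text.
\Pr(R = 1 \mid p) = p + g(p), \qquad g(p) = \gamma\, p\,(1 - p), \quad \gamma > 0.
```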
Figure 1 graphically illustrates this relationship. Let us begin by examining
curve I only. The top panel (panel A) presents the probability of turning out
on the horizontal axis against the probability of overreporting on the vertical
axis. This curve graphically presents the social desirability argument we discussed above. It asserts that as the probability of voting increases, the
tendency to overreport, measured vertically, first increases and then decreases.
(If one voted for sure, she has no chance of overreporting, and if we assume
that people do not underreport, someone with zero probability of reporting
having voted has an identical probability of actually voting.) If curve I holds
for everyone in the population, the effect of race on reporting should diminish
as more factors that account for the tendency to turn out are included in the
analysis. In other words, any potential racial difference in overreporting is a
product of opportunity and social desirability (on-the-curve effect).
Panel B (still curve I) presents the latent propensity of turning out on the
horizontal axis and the respondent’s latent propensity to report having voted
on the vertical axis. The diagonal represents the relationship between the two
had there been no overreporting. The vertical difference between the curve
and the 45-degree line is the overreporting gap presented in the first panel
on the vertical axis. Suppose that on average Whites score higher on predictors of turnout (e.g., homeownership) such that they are located around point
A and that African Americans are located on average at a point like C or E.
While an E-type voter is both less likely to vote and less likely to overreport
than an A-type, C is less likely to vote yet more likely to overreport than A.
This latter possibility is consistent with the empirical findings that African
Americans may be more likely to overreport but less likely to vote. In sum,
the account in curve I alone suggests that African Americans may overreport
more than Whites because they are less likely to vote, not because of any
special propensity to overreport relative to Whites—they are on a different
point on the same overreporting function (curve I).

Figure 1. (a) Possible Relationships between Voting and Overreporting. (b) Possible Relationships between Voting and Report of Voting

4. For discussion of a related issue, the relationship between group consciousness and participation, see Verba, Schlozman, and Brady (1995, pp. 355–56), and their discussion of the departure of their findings from those of Verba and Nie (1972).
5. See Dawson (1994, pp. 140–41) for discussion of the effect of Jackson's candidacy and his treatment by the party. See also Tate (1994) and Kinder, Mendelberg, Dawson, et al. (1989).
However, previous studies also note that overreporting by African Americans may have an additional source (Bernstein et al., 2001). Because most
African Americans effectively gained the right to vote in the 1960s after a
hard-fought struggle that drew heavily upon group resources, other things
equal, voting may be more of a socially desirable act for African Americans
than for Whites. Higher rates of disenfranchisement among African Americans compared to Whites (Uggen and Manza 2002) potentially serve as an additional source of social desirability, reducing the likelihood that those who did
not turn out will report so to the pollster. More generally, though, American
electoral politics is often about race. Electoral districts are drawn explicitly
with race in mind, and race is important for electoral mobilization and demobilization (see, for example, Rosenstone and Hansen 1993; Dawson 1994;
Tate 1994; and Kinder and Sanders 1996).
According to this argument, African Americans have a different tendency
to overreport, controlling for predictors of turnout. Therefore, unlike the previous “on-the-curve” explanation, curve I represents overreporting for Whites
and curve II represents overreporting for African Americans. For example,
take two voters of different races but identical in their likelihood of voting—
E is White and D is African American. Although they are equally likely to vote,
D is more likely to overreport than E. Technically, African Americans are located on a different reporting curve.
We allow for both accounts and, therefore, for both on-the-curve and off-the-curve effects to affect racial differences in reporting. For example, perhaps
the average White is at point A and the average African American is at point
B. In this case, African Americans would be more likely to overreport because
of the relationship between race and overreporting and because of differences
in the values of predictors of turnout between Whites and African Americans.
A side issue we need to address is the consistency of the relationship over
time. The relationship may change systematically as we move away from the
height of civil rights movement activity,4 or idiosyncratically, with events that
alter the social desirability of voting by race.5 This highlights the need to establish whether any of our expectations are met in the data before we can
make inferences about race and turnout. Analysis of survey and validation
data will allow us to determine the extent of racial differences in reporting
and voting, and the degree to which the two mechanisms account for gaps
in reporting of turnout.
REPORTED AND OFFICIAL TURNOUT: HOW BIG IS THE PROBLEM?
6. We do not use the 1964 validation study since the validation procedure in that year was
changed in the subsequent studies.
7. The respondent was included in our validation sample if her validation record indicated that
she voted (coded as 1), did not vote, or no record of voting or registering was found, or if the
respondent reported that she had not registered or SDR (coded as 0), but not if one of the latter
four was found and the status of the office voting records was such that some or none of the
records were inaccessible. The respondent was included in our reported sample if she reported
that she had voted or that she did not vote.
8. In 1976 and 1980, the ANES attempted to validate each self-report, but in 1984 and 1988, self-reports were validated only if the respondent indicated that she was registered or had voted in
localities without registration requirements. In 1980–88 (and for some respondents in 1976), the
ANES also validated those who completed the pre-election interview but not the post-election one
(leaving such respondents without R data), leading to the possibility of a V sample size greater
than that of R (as is indeed the case in 1980, 1984, and 1988).
9. On this point, see also Clausen (1968).
10. In most official measures, the denominator is the Voting Age Population (VAP) as reported
by the Bureau of the Census in their Current Population Reports, Series P-25. VAP includes all
persons over the age of 18, including those ineligible to vote in federal elections, such as legal and
illegal aliens, convicted felons, and individuals legally declared non compos mentis. The VAP is
therefore considerably larger than the pool of potential voters (McDonald and Popkin 2001, although see Burden 2000). For related articles, see Burden (2003), Martinez (2003), and
McDonald (2003).
The American National Election Studies (ANES) conducted vote validation
studies in the 1964, 1972–80, and 1984–90 election studies. In this article, we
utilize validated data from the four most recent presidential elections where
these data are available: 1976, 1980, 1984, and 1988.6 In appendix A, we
justify the quality of the validation procedure and offer evidence that it provides an accurate means to address our research questions, and in appendices
B and C, we report the question wording and sampling procedures along with
response rates, respectively. Table 1 shows the official turnout rate, the proportion of respondents reporting having voted (R), and the proportion
validated as voting (V), with the latter two quantities broken down by race.7
Each row presents these quantities for one of the four presidential elections:
1976–88.8
The first obvious point is that in the full sample, V is closer to figures of
official turnout than R is in all years.9 Although at first glance there is still
roughly a 10-percentage-point difference between validated turnout in the full
sample and official turnout figures, this is not surprising, both because the
latter includes in its denominator many who are ineligible to vote (McDonald
and Popkin 2001)10 and because of potential systematic nonresponse to the
ANES. Examination of the differences between validated and reported turnout
over time in the full sample (as well as the gap between the two measures and
official turnout) suggests no apparent secular trend across the four elections.
The difference between R and V is around 10 percentage points in each election. Among African Americans, however, the gap is considerably greater
than among Whites, falling between roughly 14 and 18 percentage points
across the four elections.11 Finally, note that at first glance there seems to
be no trend in overreporting of African Americans over time.12

While these figures expose a potential reporting gap, they do not provide a
theoretical framework for understanding the sources of racial differences in
reported or validated turnout. In particular, the differences do not tell us
whether the gap is simply a reflection of African Americans' lower propensity
to vote given other predictors of turnout (an on-the-curve effect) or whether
African Americans are more likely to overreport, even accounting for predictors of turnout (an off-the-curve effect related to differing social desirability),
or both. To get at this issue, we must model the relationship between reported
and actual turnout.

11. In a thorough analysis combining data from both presidential and congressional elections, Belli, Traugott, and Beckmann (2001) find that non-Whites overreport at higher rates than Whites do.
12. The inferences we make here are identical if one looks at only those respondents who have both reported and validated data.
Table 1. Self-Reported Turnout, Validated Turnout, and Official Turnout by Year (95-percent confidence intervals in parentheses)

1976 (official turnout*: 53.6)
  Full sample**:       R 73.3 (71.3, 75.3), n = 1,872;  V 64.7 (62.5, 66.9), n = 1,826
  Whites:              R 73.9 (72.8, 76.0), n = 1,684;  V 66.1 (63.8, 68.4), n = 1,639
  African Americans:   R 66.1 (58.9, 73.2), n = 171;    V 52.1 (44.5, 59.7), n = 169

1980 (official turnout*: 52.6)
  Full sample**:       R 71.7 (69.4, 74.1), n = 1,390;  V 58.9 (56.4, 61.4), n = 1,453
  Whites:              R 72.3 (69.8, 74.9), n = 1,222;  V 60.0 (57.3, 62.7), n = 1,288
  African Americans:   R 66.7 (59.4, 73.9), n = 165;    V 49.4 (41.6, 57.2), n = 162

1984 (official turnout*: 53.1)
  Full sample**:       R 74.1 (72.1, 76.0), n = 1,945;  V 63.4 (61.3, 65.5), n = 2,109
  Whites:              R 75.2 (73.1, 77.2), n = 1,719;  V 65.6 (63.4, 67.7), n = 1,856
  African Americans:   R 65.6 (59.2, 72.0), n = 215;    V 47.3 (41.0, 53.6), n = 243

1988 (official turnout*: 50.2)
  Full sample**:       R 70.4 (68.2, 72.6), n = 1,713;  V 63.2 (60.9, 65.4), n = 1,756
  Whites:              R 71.9 (69.7, 74.2), n = 1,493;  V 65.7 (63.3, 68.1), n = 1,527
  African Americans:   R 59.7 (53.1, 66.3), n = 216;    V 45.8 (39.2, 52.3), n = 225

NOTE.—R denotes self-reported turnout and V validated turnout. Cell entries include rate, 95-percent confidence interval, and number of respondents, respectively. The R sample includes respondents who have reported data, while the V sample includes all those with validated data. This table does not use the post-stratification weights provided by the ANES.
*Official turnout figures are from the Federal Election Commission, http://www.fec.gov/pages/tonote.htm.
**The full sample includes respondents who indicated they were either African American or White.
UNMASKING RACIAL EFFECTS: REVEALING THE REPORTING GAP
We now turn to examining whether our theory holds empirically. What is the
underlying relationship between reported turnout and actual turnout? To capture this relationship, in each of the four years we first estimate a logistic
model of turnout and a model of reporting turnout using validated turnout
(V) and self-reported turnout (R), respectively, as dependent variables. We rely
on an established empirical and theoretical literature in choosing our explanatory variables for these models (Rosenstone and Hansen 1993; Verba et al.,
1995). These variables capture resources, in addition to social and political
experiences and attachments that raise the benefits and lower the costs of voting, as well as standard demographic controls. In particular, the model
includes party contact, church attendance, party attachment, age, education,
income, homeownership, race, and gender. The estimated coefficients of the eight regressions are too numerous to discuss in detail here; they can be found in table D2 (appendix D). Nonetheless, some effects hold throughout the eight
models, reflecting the systemic effects discussed in the literature. Resources
such as education and income make voting more likely, as does community
embeddedness measured as homeownership. By the same token, young adults
(usually more mobile and less immersed in a stable community) are less likely
to vote. Attachment to a party, voter mobilization, and church attendance all
have a positive effect on the likelihood of turning out. Finally, once controlling for these effects, living in the South has a partial negative effect on one’s
likelihood of turning out.
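As a concrete illustration, the two first-stage models can be estimated along the lines of the minimal sketch below. The data set, variable names, and coefficients are synthetic stand-ins of ours, not the ANES codes or the authors' code.

```python
# Sketch of the first-stage estimation described above: separate logit models
# for validated turnout (V) and reported turnout (R) on the same covariates,
# then predicted probabilities for each respondent. Synthetic data stand in
# for the ANES extract; all variable names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000
anes = pd.DataFrame({
    "education": rng.integers(0, 5, n),
    "income": rng.integers(1, 6, n),
    "age": rng.integers(18, 90, n),
    "homeowner": rng.integers(0, 2, n),
    "church_attendance": rng.integers(0, 5, n),
    "party_contact": rng.integers(0, 2, n),
    "black": rng.integers(0, 2, n),
    "south": rng.integers(0, 2, n),
})
# Synthetic outcomes: voting rises with resources; reporting never falls below voting.
latent = -2 + 0.4 * anes["education"] + 0.2 * anes["income"] + 0.02 * anes["age"]
p_vote_true = 1 / (1 + np.exp(-latent))
anes["validated_vote"] = rng.binomial(1, p_vote_true)
anes["reported_vote"] = np.maximum(anes["validated_vote"], rng.binomial(1, 0.25, size=n))

rhs = ("education + income + age + homeowner + church_attendance "
       "+ party_contact + black + south")
turnout_model = smf.logit(f"validated_vote ~ {rhs}", data=anes).fit(disp=0)
report_model = smf.logit(f"reported_vote ~ {rhs}", data=anes).fit(disp=0)

# Predicted probabilities used in the next stage, which maps voting to reporting.
anes["p_vote"] = turnout_model.predict(anes)
anes["p_report"] = report_model.predict(anes)
```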
Next, we map the relationship between voting and reported voting. We pool
the resulting predicted probabilities over the four years and model the relationship between voting and reporting as a quadratic function, allowing it
to vary by race and year (including all relevant interaction effects for maximum flexibility).13 The quadratic specification and the interactions allow us to
examine if the relationship expected between overreporting and turnout (as
specified in figure 1) is found in the data and whether it varies by race and
over time.
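One way to write such a pooled specification is sketched below, in our notation and purely for illustration; it is written in the same direction as the deflating function estimated later (predicted validated turnout as a quadratic in predicted reported turnout), and the exact estimated specification and coefficients appear in table E1 of appendix E.

```latex
% Quadratic relationship between the first-stage predicted probabilities,
% allowed to differ by race and by year. Treating 1976 as the base year is
% our illustrative choice, not necessarily the estimated parameterization.
\hat{\Pr}(V_{it} = 1) =
  \sum_{k=0}^{2} \Big( \delta_{k}
    + \delta_{k}^{r}\,\mathrm{race}_{i}
    + \sum_{y \in \{1980,\,1984,\,1988\}}
        \big[ \delta_{ky}\,\mathrm{year}_{ty}
            + \delta_{ky}^{r}\,\mathrm{race}_{i}\,\mathrm{year}_{ty} \big]
  \Big)\, \hat{\Pr}(R_{it} = 1)^{k}
```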
Figure 2 presents this relationship for all four years by race. On the horizontal axis are the probabilities of voting using validated data, and on the
vertical axis are predicted probabilities of reporting having voted. As in
figure 1b, the 45-degree line in figure 2 represents a hypothetical relationship
with no reporting gap. The four dashed lines present the relationship for
Whites, while the four solid lines present the relationship for African Americans in the four years. The vertical gaps between the diagonal line and each
of the eight lines are the respective overreporting gaps. The figure illuminates
several aspects of the relationship between reported and actual turnout. Examine the relationship for Whites first. The pattern is consistent with the theory
specified above: the line is, in general, above the diagonal, and the relationship is nonlinear; as the probability of voting increases, the probability of
overreporting one's likelihood of voting first increases and then declines.
Individuals in the middle range have a higher likelihood of overreporting
than either those who are unlikely to vote or those who almost surely vote.
Consistent with our expectation, the relationship between turnout and report
of turnout for Whites is constant over time. The functions for 1976, 1980,
and 1984 are almost identical, and the data expose no trend in the relationship. However, the overreporting function for 1988 is a curious exception to
this stable pattern.

The relationship for African Americans tells an even clearer story. Here,
too, overreporting is in quadratic relation to voting, but the magnitude of
overreporting is greater; the vertical gaps between the diagonal and each of
the functions reach almost 20 percentage points. Overreporting is not orthogonal to race. Although the coefficients bear large standard errors (partly
because of the many interaction terms included), the point estimates (presented in figure 2) consistently suggest that African Americans' reporting
function is different from Whites' (off-the-curve effect). Furthermore, the nature of overreporting by African Americans appears roughly similar across all
four elections. Based on this consistency, in the next stage we eliminate the
year variables, resulting in reduced estimation uncertainty and similar results
(column 2 in table E1).

In summary, we have presented evidence for two mechanisms at play. First
is a quadratic relationship between voting and report of voting, supporting the
importance of opportunity to overreport and social desirability. Second is a
racial difference, which may be attributed to an additional social desirability
variable related to racial history. Both effects are constant over time.

Figure 2. Discounting Functions: 1976–1988.

13. The full specification and coefficients are presented in table E1 of appendix E.

Implications of the Reporting Gap

So far we have established that there is a nonlinear tendency to overreport,
and that this tendency differs for African Americans and for Whites. Our next
step is to analyze the consequences of these patterns for inference about turnout. To accomplish this task, we rely on three models of voter turnout. The
first two models come from Abramson and Claggett's studies of racial differences in overreporting and turnout (1984, 1986, 1989, 1991).14 Their first
("thin") model has race only on the right-hand side. Their second ("thick")
model controls for education and region (South). Finally, based on recent advances in the study of turnout (Rosenstone and Hansen 1993; Verba et al.
1995), we extend our analysis to a third ("full") model that is identical to
the model of voting turnout we used to construct figure 2 (for estimates, see
tables D1 and D2 in appendix D). To reiterate, in addition to region and education, this model also includes variables that capture resources, social and
political experience, and attachment that raise the benefits and lower the costs
of voting, as well as standard demographic controls.15

14. Abramson and Claggett (1984, 1986, 1989, 1991) code the race variable as a dichotomous (-1, 1) variable, while we employ the more common dummy approach (0, 1). Despite this difference, our results are very similar.
15. Likelihood Ratio Tests comparing the "full" model to the "thick" and the "thin" models are statistically significant in all years.
16. Results for 1976 and 1980 are similar to those reported below.
While the inclusion of these particular variables is not novel, in the context
of our unpacking of overreporting, a comparison of the three models sheds
light on our theoretical argument. Our theory implies that effects relating to
opportunity and social desirability account for part of the racial reporting
gap. According to this argument, as we move from the thin to the thick to
the full model and control for more of the determinants of turnout, and hence
opportunity to overreport, using R, the reporting gap between the two racial
groups should decline. In other words, if the gap in overreporting diminishes as we account for more factors that disproportionately reduce African Americans' likelihood of voting, then we may infer that African Americans overreport more because they vote less (on-the-curve effect). However, if the gap
between reported and actual data holds even as we control for such factors,
then it is an indication that African Americans indeed have a different reporting
function, likely because of additional social desirability, as we discuss above.
We focus below on 1984 and 1988, the two most recent presidential elections in which validated data are available.16 Table 2 presents the results for
1984. The first and second rows in each section present the mean predicted
probabilities for African Americans and Whites, respectively, with the values
of other explanatory variables held at their mean. The third entry is the mean
difference in turnout between the two racial groups. A positive number indicates that Whites are estimated to turn out at higher rates than African
Americans. As the table shows, in 1984, R and V (in the first and second columns, respectively) lead to different inferences about racial gaps in turnout. In
model 1, both R and V suggest that African Americans are less likely to turn
out, yet using V, the racial difference is substantially greater (10 percentage
points using R as opposed to 19 using V). In other words, validated data
reveal a large effect of race on turnout, an effect that is masked by self-reported data.
The results of model 2 make an even stronger case: using R, there is no
statistically significant racial gap in turnout, and the predicted probabilities
for both races are inflated (73 percentage points for African Americans,
and 76 for Whites). V, however, exposes a different picture. First note that
the predicted probability of African Americans turning out is substantially
lower (57 percentage points) and significantly different from R. The figure
for Whites is lower as well (69 percentage points). Most importantly, while
using R alone there is no statistically significant effect of race on turnout, V
reveals an effect of 12 percentage points—reporting is correlated with race.
Finally, the "full" model shows the same trend, although the results fall
slightly short of standard levels of statistical significance. As in the previous
two models, the point estimates indicate that the racial gap—hidden by self-reported figures—is exposed by the use of validated figures (a gap of six percentage points).

Table 3 repeats this exercise in 1988. The results are consistent with those of
1984, and we describe them here only briefly. Within each model, the racial
difference exposed by V is substantively greater than the difference suggested
by R. Even in the "full" model, use of validated data suggests an 18-percent
turnout gap while self-reported data suggest no difference. The findings using
validated data are consistent with the findings of Dawson (1994) and Kinder
and Sanders (1996) previously noted.

In sum, with regard to our substantive issue of interest—racial differences
in turnout—a strong case can be made that the use of self-reported data can
lead to mistaken inferences. Both opportunity and social desirability (on-the-curve effects) and an additional desirability related to race (off-the-curve effect) lead to biased inferences. As we control for opportunity to overreport,
the gap declines, yet because of differing social desirability, some racial differences remain. Having revealed these effects, in the next section we devise a
remedy for the biases produced by conventional investigation.
Table 2. Predicted Probabilities of Turning Out by Race: 1984 Presidential Elections (95-percent confidence intervals in parentheses)

Columns, left to right: self-reported turnout (R); validated turnout (V); discounted self-reported turnout (1984–1984); deflated self-reported turnout (1976–1984, out of sample).

Model 1 ("thin")
  African Americans    0.663 (0.595, 0.730)   0.503 (0.429, 0.573)   0.495 (0.354, 0.622)   0.466 (0.338, 0.578)
  Whites               0.757 (0.735, 0.777)   0.686 (0.662, 0.709)   0.666 (0.619, 0.708)   0.668 (0.622, 0.713)
  Difference           0.093 (0.019, 0.166)   0.183 (0.110, 0.260)   0.171 (0.040, 0.323)   0.202 (0.081, 0.331)

Model 2 ("thick")
  African Americans    0.731 (0.665, 0.792)   0.566 (0.484, 0.634)   0.565 (0.431, 0.688)   0.539 (0.414, 0.656)
  Whites               0.763 (0.740, 0.786)   0.687 (0.661, 0.710)   0.672 (0.631, 0.713)   0.675 (0.633, 0.717)
  Difference           0.032 (-0.035, 0.102)  0.121 (0.046, 0.203)   0.107 (-0.016, 0.243)  0.136 (0.012, 0.272)

Model 3 ("full")
  African Americans    0.822 (0.760, 0.872)   0.651 (0.564, 0.727)   0.649 (0.533, 0.764)   0.670 (0.551, 0.775)
  Whites               0.805 (0.781, 0.828)   0.710 (0.682, 0.737)   0.720 (0.673, 0.762)   0.718 (0.676, 0.758)
  Difference           -0.017 (-0.072, 0.044) 0.059 (-0.022, 0.148)  0.071 (-0.051, 0.190)  0.048 (-0.069, 0.168)

NOTE.—Entries in the table are predicted probabilities of turning out. The difference rows report the White rate minus the African American rate. All other variables in the model are held at their respective means.
Model 1: Race.
Model 2: Race, education, and region (South vs. other).
Model 3: Full model; see specification in table D2.
For all models, see N and coefficients in tables D1 and D2.
Table 3. Predicted Probabilities of Turning Out by Race: 1988 Presidential Elections (95-percent confidence intervals in parentheses)

Columns, left to right: self-reported turnout (R); validated turnout (V); deflated self-reported turnout (1988–1988); deflated self-reported turnout (1976–1988, out of sample).

Model 1 ("thin")
  African Americans    0.596 (0.528, 0.664)   0.465 (0.386, 0.541)   0.405 (0.283, 0.523)   0.431 (0.304, 0.562)
  Whites               0.715 (0.690, 0.739)   0.672 (0.645, 0.697)   0.654 (0.610, 0.697)   0.620 (0.573, 0.666)
  Difference           0.119 (0.046, 0.193)   0.207 (0.125, 0.287)   0.250 (0.112, 0.387)   0.190 (0.048, 0.331)

Model 2 ("thick")
  African Americans    0.664 (0.587, 0.727)   0.480 (0.389, 0.560)   0.464 (0.339, 0.587)   0.497 (0.363, 0.637)
  Whites               0.728 (0.703, 0.753)   0.679 (0.650, 0.706)   0.665 (0.622, 0.709)   0.635 (0.584, 0.680)
  Difference           0.064 (-0.010, 0.143)  0.199 (0.114, 0.293)   0.201 (0.072, 0.340)   0.138 (-0.008, 0.273)

Model 3 ("full")
  African Americans    0.765 (0.670, 0.839)   0.530 (0.437, 0.614)   0.562 (0.422, 0.690)   0.600 (0.454, 0.733)
  Whites               0.775 (0.746, 0.803)   0.704 (0.674, 0.733)   0.708 (0.663, 0.747)   0.686 (0.638, 0.732)
  Difference           0.010 (-0.070, 0.105)  0.175 (0.082, 0.275)   0.146 (0.006, 0.297)   0.086 (-0.058, 0.237)

NOTE.—Entries in the table are predicted probabilities of turning out. The difference rows report the White rate minus the African American rate. All other variables in the model are held at their respective means.
Model 1: Race.
Model 2: Race, education, and region (South vs. other).
Model 3: Full model; see specification in table D2.
For all models, see N and coefficients in tables D1 and D2.
Table 4. The Challenge of Partial Observation of Turnout

Data available / period    t      t + 1
Self-reported              +      +
Validated                  +      ?
DEFLATING SELF-REPORTS
Table 4 summarizes the challenge we face. In some periods (t) both self-reported and verified data are observed, while in others (t + 1) only self-reported
data are available. While the literature on overreporting bias is extensive,
almost no attention has been given to potential solutions. Proposed ex ante measurement improvements (Belli et al. 1999; Duff, Hammer, Park, et al. 2007; Holbrook and Krosnick forthcoming), which address issues such as poor memory and face-saving response options, produce more accurate reports, but the problem of bias in already collected data remains.
Our proposed algorithm is simple. Based on our theory and empirical
evidence for the joint roles of opportunity and desirability, we model the relationship between voting and reporting of voting using all years in which
reported and validated data from presidential elections are available (1976,
1980, 1984, and 1988). We then use the estimated relationship to deflate self-reported turnout data at time t + 1 (1992 and 1996), where validated data are not
available. We now turn to presenting our algorithm in greater detail.
The Deflating Algorithm
STEP 1: MODEL THE RELATIONSHIP BETWEEN VOTING AND REPORTING OF VOTE AT TIME t

1.1. We estimate a model of reported turnout at time t:

R^t = f(X_1^t, \beta_1^t),     (1)

where X_1^t is a vector of voter characteristics, \beta_1^t is a vector of coefficients, and f(\cdot) is a logistic function.17 We then calculate the predicted probability of reported voting for each individual, \hat{\Pr}(R^t = 1).

1.2. Similarly, we estimate a model of validated turnout at time t:

V^t = f(X_2^t, \beta_2^t),     (2)

and calculate the predicted probability of turning out for each individual, \hat{\Pr}(V^t = 1).18

1.3. The deflating function: We model the relationship between voting (V) and reporting of vote (R), where the dependent variable is the estimated probability of actually voting calculated from equation (2). Given the racial differences in voting and report of voting established above, we let actual voting be a quadratic function of self-reported voting and allow the relationship to vary interactively by race. The right-hand side, then, is a vector of covariates such that

\hat{\Pr}(V^t = 1) = \delta_0^t + \delta_1^t race^t + \delta_2^t \hat{\Pr}(R^t = 1) + \delta_3^t \hat{\Pr}^2(R^t = 1) + \delta_4^t [race^t \cdot \hat{\Pr}(R^t = 1)] + \delta_5^t [race^t \cdot \hat{\Pr}^2(R^t = 1)].     (3)

Equation (3) reflects how social desirability and opportunity affect racial differences in overreporting. The coefficients \delta_0, \delta_2, and \delta_3 represent the quadratic relationship between voting and report of voting among Whites. The coefficients \delta_1, \delta_4, and \delta_5 represent the difference between the curve for Whites and the curve for African Americans. The additional effect of social desirability (off-the-curve) is captured by the latter set of coefficients. The estimated coefficients are reported in the second column of table E1. We will next use these coefficients to deflate reported turnout.

17. In our case, based on the consistent relationships observed over time, we pool data from 1976, 1980, 1984, and 1988.
18. In our case, we use the same functional form and vector of voter characteristics as in equation (1), reported in table D2.

STEP 2: DEFLATE REPORTED DATA AT t + 1

In this stage, we first estimate the probability of turning out using reported data (the only data we have at t + 1). We then employ our deflator to deflate the estimated probabilities.

2.1. Similar to step 1.1, we estimate a model of reported turnout at time t + 1:

R^{t+1} = f(X_3^{t+1}, \beta_3^{t+1})     (4)

(in our case, 1992 or 1996). Here, too, we use a logistic function. We then compute predicted probabilities of reported turnout for two hypothetical individuals, White, \hat{\Pr}(R^{t+1} | W = 1), and African American, \hat{\Pr}(R^{t+1} | Af.A. = 1), with all other variables held at their mean (six hypothetical individuals altogether). We repeat this step for three different model specifications corresponding with the models above ("thin," "thick," and "full"). The estimates are reported in tables D1 and D2.

2.2. We apply the deflating function to the predicted probabilities generated in the previous step. For example, for a White individual:

D^{t+1} = \hat{\delta}_0^t + \hat{\delta}_1^t (race = W)^{t+1} + \hat{\delta}_2^t \hat{\Pr}(R^{t+1} | W) + \hat{\delta}_3^t \hat{\Pr}^2(R^{t+1} | W) + \hat{\delta}_4^t [(race = W)^{t+1} \cdot \hat{\Pr}(R^{t+1} | W)] + \hat{\delta}_5^t [(race = W)^{t+1} \cdot \hat{\Pr}^2(R^{t+1} | W)].     (5)

The outcome D^{t+1} is the estimated individual's deflated probability of turning out.
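A minimal sketch of this two-step procedure is given below. It is our illustration, not the authors' code: synthetic values stand in for the first-stage predicted probabilities, the variable names are hypothetical, and the deflating regression is estimated here by least squares purely for illustration.

```python
# Illustrative sketch of steps 1.3 and 2: fit the deflating function on years
# with validated data, then apply it to a year where only self-reports exist.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
pooled = pd.DataFrame({"black": rng.integers(0, 2, n)})
pooled["p_report"] = rng.uniform(0.2, 0.95, n)        # Pr-hat(R = 1) at time t
pooled["p_vote"] = np.clip(                           # Pr-hat(V = 1) at time t
    pooled["p_report"] - 0.08 - 0.07 * pooled["black"] + rng.normal(0, 0.04, n),
    0.01, 0.99)

# Step 1.3: quadratic in Pr-hat(R = 1), interacted with race, as in equation (3).
deflator = smf.ols(
    "p_vote ~ black + p_report + I(p_report ** 2)"
    " + black:p_report + black:I(p_report ** 2)",
    data=pooled,
).fit()

# Step 2: at t + 1 only self-reports are available. Suppose a logit of R on the
# usual covariates gives these predicted report probabilities for a hypothetical
# White and African American respondent (other variables at their means);
# equation (5) then deflates them.
profiles = pd.DataFrame({"black": [0, 1], "p_report": [0.78, 0.70]})
profiles["deflated_turnout"] = deflator.predict(profiles)
print(profiles)
```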
Multiple-Step Estimation
The logic of our algorithm is straightforward. It is important to keep in mind, however, that each step produces a layer of uncertainty that should be taken into account. The first source is the uncertainty around the Maximum Likelihood estimates at time t (\hat{\beta}_1^t and \hat{\beta}_2^t), which in turn produce the predicted probabilities for the two variables V and R (steps 1.1 and 1.2). We assume that \beta_1^t \sim \text{Multivariate Normal}(\hat{\beta}_1^t, \Sigma_{\hat{\beta}_1^t}), and drawing randomly from this distribution, we get a sampling distribution of \hat{\beta}_1^t (in this step, as in all other steps described below, we draw 1,000 times). We repeat the same procedure for \hat{\beta}_2^t, producing 1,000 predicted probabilities of turning out and 1,000 predicted probabilities of reporting to have turned out for each individual. Therefore, in step 1.3, we estimate 1,000 deflating functions. Similarly, we draw on the respective distribution of the coefficients on R at time t + 1 (\hat{\beta}_3^{t+1}) and calculate predicted probabilities of the two hypothetical respondents to be deflated. Finally, we apply the sets of coefficients of the deflating function, \hat{\delta}, to these 1,000 predicted probabilities and obtain deflated probabilities. The results we present are the mean of those deflated probabilities.19

19. We compute 95-percent confidence intervals by sorting the deflated probabilities and taking the 25th and 975th probabilities.
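The simulation can be sketched as follows; this is our illustration, and `fitted_result` stands for any of the logit or deflating-function fits discussed above.

```python
# Illustrative sketch of the multiple-step uncertainty propagation: draw
# coefficient vectors from their estimated multivariate normal distribution,
# recompute the downstream quantities for each draw, and summarize.
import numpy as np

def simulate_coefficients(fitted_result, n_draws=1000, seed=0):
    """Draw coefficient vectors from MVN(beta_hat, Cov(beta_hat))."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(
        mean=np.asarray(fitted_result.params),
        cov=np.asarray(fitted_result.cov_params()),
        size=n_draws,
    )

def percentile_interval(simulated_quantities):
    """95-percent interval: with 1,000 sorted draws, the 25th and 975th values
    (that is, the 2.5th and 97.5th percentiles)."""
    return np.percentile(simulated_quantities, [2.5, 97.5])

# For each of the 1,000 draws one would recompute predicted probabilities,
# re-estimate the deflating function, and deflate; the reported point estimate
# is the mean of the resulting deflated probabilities.
```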
Evaluating Our Algorithm
To evaluate our algorithm, we pose two questions. First, we examine whether
the inferences we make about racial differences in turnout using our method
come closer to V than those we would make using R. Our measure of turnout
ought to produce quantities of interest that are closer to those produced by
the use of V than those produced by the use of R, conditioning on the same
model. Focusing on racial differences in turnout, we turn back to tables 2
and 3. Recall that the use of R coefficients often led to underestimated effects
of race on turnout. We perform both in-sample and out-of-sample tests to
examine the performance of our measure. For the in-sample tests, we deflate
self-reports using the deflating function produced by validated data from
1984. For our out-of-sample tests, we estimate our deflating function in
1976 and use it to deflate self-reported data in 1984. These tests are particularly demanding for two reasons. First, we use auxiliary information from
1976 to correct bias in data produced eight and 12 years later. Second, we
know from figure 2 that Whites overreported in a unique manner in 1988
compared to other years. Results of these estimations are presented in the
third (in-sample) and fourth (out-of-sample) columns of table 2. We then
repeat this exercise for 1988 (table 3).
Tables 2 and 3 reveal that using information from 1976 to deflate
self-reports in 1984 and 1988 produces substantially more accurate estimates
of racial differences in turnout than the use of R does. As expected, the results
for 1984 are somewhat more accurate than those of 1988, but in all cases the
point estimates using the deflated data (D) are considerably closer to V than
those produced by R (the multiple stages of estimation acknowledge the uncertainty in these point estimates, leading to a few cases where the D
confidence intervals overlap with zero). Notice in particular the figures for
African American turnout produced by our algorithm in the “full” model
for 1988. While self-reports suggest no racial gap in turnout, our algorithm
exposes the particularly low turnout of African Americans, consistent with
previously cited accounts of alienation felt among the African American population in 1988. In sum, while keeping in mind the caveat that the same
sampling frame was used both in 1976 and in 1984 and 1988, our method
produces point estimates superior to those produced by R, although, as expected, a fair amount of noise created in the process increases uncertainty.

Second, we examine how well our model fits the data. In particular, we
evaluate how predicted probabilities of turning out based on D produced
by the out-of-sample procedure match observed data of turnout based on
validated figures. Figure 3 presents this evaluation for 1984 and 1988 in
the top and bottom panels, respectively. To evaluate our model fit, for
each of three measures, V, R, and D, we sort our observations by predicted probability of turning out produced by the respective measure
(based on estimates in table D2), group them into 20 equal-length intervals, and plot on the horizontal axis the average predicted probability for
that group, and on the vertical axis the actual observed fraction of votes in
that group as indicated by validated data. A perfect fit (on average) will
result in all points aligned right on the 45-degree line.

Figure 3. Model Fit Based on Self-Reported, Validated, and Deflated Data for 1984 (Panel a) and 1988 (Panel b)

As can be seen for 1984, results produced by the self-reported model do
not fit the data well and overestimate the observed vote by up to 20 percentage points. Consistent with figure 2, the inaccuracy is largest in the middle
range. It is reassuring that our model for validated data produces a good fit—
predicted probabilities are aligned close to the 45-degree line, and importantly,
in cases where our predictions diverge from the line, they reveal no systematic
bias but rather are scattered equally above and below the line. Finally, results
produced by the deflated data are tightly clustered around the 45-degree line
with no systematic deviation.

Results for 1988 are similar. Recall that this is our hardest case: the out-of-sample test is based on data 12 years prior to the election, and the reporting
pattern in 1988 is different from that in other years. And although our prediction
does involve more noise than in 1984, here too, while the prediction based on
self-reports differs systematically from the observed vote, the prediction based on
our algorithm is in line with observed patterns.
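The fit check just described can be sketched as follows; this is our illustration, with inputs being any vector of predicted probabilities and the corresponding validated turnout indicators.

```python
# Illustrative sketch of the model-fit evaluation behind figure 3: group
# observations into 20 equal-length intervals of predicted probability and
# compare each interval's mean prediction with its observed validated turnout
# rate; points near the 45-degree line indicate a good fit.
import numpy as np
import pandas as pd

def calibration_table(predicted, validated, n_bins=20):
    """Mean predicted probability vs. observed turnout within each bin."""
    df = pd.DataFrame({"p": np.asarray(predicted), "v": np.asarray(validated)})
    df["bin"] = pd.cut(df["p"], bins=np.linspace(0, 1, n_bins + 1))
    grouped = df.groupby("bin", observed=True)
    return pd.DataFrame({
        "mean_predicted": grouped["p"].mean(),
        "observed_turnout": grouped["v"].mean(),
        "n": grouped.size(),
    })
```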
Exposing Hidden Effects in the 1990s
Having established its validity, we reach the final stage of our analysis, applying our algorithm to deflate self-reports in the 1992 and 1996 presidential
elections where validated data are unavailable. We do so exactly as we describe above: we estimate a turnout model based on self-reports for 1992
(and 1996), calculate the predicted probability of turning out for each individual, and apply our deflator to those predicted probabilities.
Table 5 compares the results based on self-reports to those based on the
deflated self-reports produced by our algorithm for each of the three models,
as above. The table presents two main findings. First, it shows that predicted
probabilities produced by R are inflated compared to D across models in both
elections. More importantly, our use of D exposes a larger racial gap in turnout compared to the use of R. In both 1992 and 1996, the use of R suggests
that while Whites turn out significantly more than African Americans in the
"thin" model (a difference of about 10 percentage points), the effect washes
out when controls are introduced in the "thick" and in the "full" models. However, similar to our findings in the 1980s, in the case of D, the point estimates
of the gap between African Americans and Whites are considerably higher in
each model. In the thin model, the racial gap in turnout increases to about 20
percentage points in both elections. In the thick model, we expose a difference
of 12 percentage points in 1996. Finally, results of model 2 in 1992 and the
full model in both 1992 and 1996 expose differences that, although close to a
traditional threshold of statistical significance, are all centered at a much
greater gap than self-reported data suggest. That racial differences decline
as we move from the "thin" to the "full" model is not surprising. While in
the thin model the gap is a function of both opportunity and desirability effects (on-the-curve and off-the-curve effects), adding background variables
reduces the role of opportunity effects relative to those caused by desirability.

How far in time can we extrapolate? Although we establish the constancy
of the relationship between reported and validated turnout between the 1970s
and the 1980s and test the performance of our algorithm between the earliest
and latest available data points, we extrapolate only to the two elections immediately following the latest validated data. Our approach can be applied to
elections after 1996, but absent further validation data, we take caution and
stop in 1996. We do not argue that the same results we find in the 1980s and
1990s hold indefinitely. Rather, our exercise exposes the perils of using self-reported turnout data for inference about racial (and perhaps other) differences in turnout. We discuss this point further in the concluding section.
Table 5. Predicted Probabilities of Turning Out by Race: 1990s Presidential Elections (95-percent confidence intervals in parentheses)

Columns, left to right: 1992 self-reported turnout; 1992 deflated self-reported turnout; 1996 self-reported turnout; 1996 deflated self-reported turnout.

Model 1 ("thin")
  African Americans    0.670 (0.607, 0.725)   0.483 (0.399, 0.559)   0.676 (0.594, 0.745)   0.490 (0.396, 0.581)
  Whites               0.773 (0.753, 0.793)   0.685 (0.656, 0.712)   0.776 (0.753, 0.799)   0.687 (0.658, 0.716)
  Difference           0.103 (0.044, 0.165)   0.202 (0.121, 0.287)   0.100 (0.026, 0.182)   0.197 (0.106, 0.294)

Model 2 ("thick")
  African Americans    0.790 (0.734, 0.841)   0.609 (0.513, 0.693)   0.760 (0.687, 0.822)   0.576 (0.488, 0.659)
  Whites               0.792 (0.769, 0.813)   0.703 (0.673, 0.730)   0.787 (0.763, 0.810)   0.699 (0.667, 0.729)
  Difference           0.001 (-0.052, 0.061)  0.094 (0.001, 0.197)   0.027 (-0.036, 0.101)  0.122 (0.038, 0.216)

Model 3 ("full")
  African Americans    0.854 (0.804, 0.900)   0.680 (0.603, 0.752)   0.845 (0.780, 0.897)   0.671 (0.582, 0.746)
  Whites               0.836 (0.813, 0.857)   0.750 (0.720, 0.777)   0.837 (0.810, 0.861)   0.751 (0.720, 0.780)
  Difference           -0.018 (-0.064, 0.032) 0.069 (0.000, 0.146)   -0.008 (-0.066, 0.058) 0.080 (0.003, 0.166)

NOTE.—Entries in the table are predicted probabilities of turning out. The difference rows report the White rate minus the African American rate. All other variables in the model are held at their respective means.
Model 1: Race.
Model 2: Race, education, and region (South vs. other).
Model 3: Full model; see specification in table D2.
For all models, see N and coefficients in tables D1 and D2.
Conclusion
Sophisticated survey methodology can shed light on important puzzles in
political behavior, yet in some circumstances survey responses should not
be taken at face value. Electoral participation is a prime example.
This study explicitly analyzes how reporting of turnout is related to actual
turnout, and thus, who really votes. We demonstrated that racial differences in
turnout are underestimated because of both differing opportunities to overreport and differences in social desirability between African Americans and
Whites. With this insight we provided a means to assess the impact of race
on turnout in years where validation data are unavailable. Finally, through the
mechanism we develop, we inferred that the large racial gap in turnout existed
in the 1970s and 1980s and was likely present in the 1990s.
With an African American leading a major party for the first time and
African Americans turning out in record numbers (Ansolabehere and Stewart
2009, p. 6), the 2008 U.S. presidential elections presented a new reality to the
American public. Yet, other things equal, social desirability may be an even
Race and Turnout in U.S. Elections
309
more powerful source of overreporting among African Americans who did not
turn out in elections in which a Black candidate competes. While we demonstrated that the underlying patterns of overreporting on which we base our
method hold during the period in which we are able to test our assumptions,
we are, of course, unable to argue that they hold forever after. The 2008 U.S.
presidential elections call us to collect new validated data that will allow us to
continue to uncover the important and sometimes hidden relationship between
race and turnout.
Appendix A: Validated or “Validated”?

The arguments presented in this article rest on the assumption that the validation data are an accurate representation of actual voting. In this section, we address possible concerns about this assumption. In a National Election Studies Technical Report, Presser, Traugott, and Traugott (1990) suggest that the racial gap in turnout revealed by the validated data is merely an artifact of poor recordkeeping in areas where African Americans reside. In other words, since African Americans' voting records are poorly kept, validated figures mistakenly suggest that they vote at substantially lower rates compared to Whites. While no data are pure of measurement error, we doubt this conclusion for three reasons.

The authors argue that African Americans are more likely to live in areas where record quality and access to records were poorer (table 6, p. 28), and that across both races, the proportion validated as voting (given a positive self-report and validation as registered to vote) is lower in areas where record quality is lower. (The authors combine six variables of record management into an additive index of record quality (see footnote 6 in the report) and then break the index into three categories.) Focusing on those living in areas of high record quality alone, African Americans are only six percent less likely to be validated as voting than Whites, a third of the racial difference overall.

While we do not contest this argument, it does not necessarily undermine our finding of a racial gap in overreporting. First, although of varying magnitudes, racial differences in overreporting are found within each level of recordkeeping quality (28 percent, 11 percent, and six percent, depending on record quality). In other words, since we are not aware of an argument suggesting that the records for African Americans are more difficult to find than those for Whites within districts where record quality as a whole is similar, racial differences in overreporting remain when record quality is held constant.

In addition, although we do not reanalyze the data here, the nature of the relationships the authors detect both across and within types of record quality actually complements our argument rather than undercutting it. Presser et al. limit their analysis to respondents who both registered and reported having voted. In our framework, these are respondents who are relatively likely to vote, located at the middle and upper sections of figures 1 and 2. Since the authors reveal that districts with low-quality records are also areas where turnout is generally lower, such as the South and central cities, respondents in these districts are probably relatively less likely to vote among a sample of people with a generally high propensity of turning out. In our framework, these voters are located in the middle (as opposed to the uppermost section) of figures 1 and 2. Therefore, the racial gap the authors find within categories of record quality is analogous to our differential desirability (off-the-curve) effect: this is the racial gap controlling for propensity to vote. As we showed, this gap is large for voters in the middle section (poor record quality in Presser et al.'s sample) and smaller for voters who are highly likely to vote (high record quality). Conversely, the effect the authors find across record quality is analogous to our opportunity effect (on-the-curve): rates of overreporting are lower where voters are likely to vote (areas with high record quality).

Indeed, utilizing the same data but including nonregistered as well as registered voters, Abramson and Claggett (1992) conclude that racial differences in overreporting are not products of varying record quality. Utilizing the same objective measures of record quality as Presser et al., they find no evidence in 1986 or 1988 that African Americans are more likely to live in areas with poor recordkeeping. Finally, conducting a thorough analysis of record quality, Cassel (2004) concludes that “validation error from poor-quality or inaccessible voting records—located particularly in central city, African American, and Southern communities—does not bias the effects of related turnout predictors” (p. 107). In sum, available evidence supports our assumption regarding the quality of validated data.
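The opportunity (on-the-curve) and desirability (off-the-curve) language used here and in the main text can be summarized with a simple accounting identity. Assuming, purely for illustration, that underreporting is negligible, so that validated voters essentially always report voting, the probability of reporting turnout decomposes as

\Pr(R = 1) = \Pr(V = 1) + \Pr(R = 1 \mid V = 0)\,\Pr(V = 0),

where R denotes reported and V validated turnout. In this shorthand, which is our gloss on the argument rather than the authors' formal model, the opportunity channel runs through \Pr(V = 0): a group that is less likely to vote has more members at risk of overreporting. The desirability channel runs through \Pr(R = 1 \mid V = 0): conditional on not voting, some groups are more likely to claim otherwise. Adding background variables, as in the thick and full models, narrows group differences in \Pr(V = 0) and so isolates the desirability component.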
Appendix B: Question Wording and Variable Names

All variables and codes are from the American National Election Study 1948–2004 Cumulative file.

VCF9151 Vote Validation: Turnout (Self-Report)
1. Yes, voted
5. No, did not vote

VCF9152 Vote Validation: Attempt to Validate Registration
1. Yes, attempted
2. No: R says is not registered (1984–1990); R says not required/DK/NA if registered (1984–1990)
3. No: No respondent named and/or insufficient address (all years); registered out of area (except 1984, 1990); Washington, D.C. [Note: Code 2 has priority]
4. No: Same-day registration (1964, 1980)
5. No: Records not sent out due to: Office error (1980); No Post IW (1964, some cases in 1976)
7. No: Office refuses all access to registration records

VCF9155 Vote Validation: Vote Validated
1. Yes
3. Registration record was found; no record of R voting
5. No registration record found; no record of voting found

VCF9153 Vote Validation: Office Voting Records
1. Office records appear to be adequate; no information about deficiencies
2. Some office voting records not accessible
3. R's name unknown and/or insufficient address and/or registered out of area** (not 1984, 1986: see code 0)
5. No office voting records accessible

VCF0703 Did R Register and Vote?
“In talking to people about the election, we [1972 and later: often] find that a lot of people weren't able to vote because they weren't registered or they were sick or they just didn't have time.” (1978 and later: “How about you, did you vote in the elections this November?”)
(1) No, did not vote
(2) Yes, voted

VCF0106 Race
1948–1998: Interviewer Observation of Race
1972 and later: “In addition to being American, what do you consider your main ethnic group or nationality group?”
(1) White
(2) Black
(3) Other

VCF0108 Hispanic
“In addition to being American, what do you consider your main ethnic group or nationality group?” (If no Hispanic group mentioned:) “Are you of Spanish or Hispanic origin or descent?” (If yes:) “Please look at the booklet and tell me which category best describes your Hispanic origin.”
(1) Yes, R is Hispanic
(2) No, R is not Hispanic
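As an illustration of how these codes translate into analysis variables, the sketch below constructs self-reported, validated, and overreport indicators from a cumulative-file extract. It is a minimal sketch under our own assumptions: the file name, the variable names we create, and the decision to treat any code not listed above as missing are ours, not part of the ANES documentation reproduced here.

import pandas as pd

# Hypothetical extract of the cumulative file containing the VCF variables above.
anes = pd.read_csv("anes_cumulative_extract.csv")

def recode(series, mapping):
    """Map the listed codes to 0/1; any other code becomes missing."""
    return series.map(mapping)

# Self-reported turnout (VCF9151): 1 = yes, voted; 5 = no, did not vote.
anes["reported_vote"] = recode(anes["VCF9151"], {1: 1, 5: 0})

# Validated turnout (VCF9155): 1 = record of voting found; 3 = registration
# record found but no record of voting; 5 = no registration record found.
anes["validated_vote"] = recode(anes["VCF9155"], {1: 1, 3: 0, 5: 0})

# Overreport: reports voting but is not validated as voting. Rows with an
# unmapped code on either variable are left missing.
valid = anes["reported_vote"].notna() & anes["validated_vote"].notna()
anes["overreport"] = pd.NA
anes.loc[valid, "overreport"] = (
    (anes.loc[valid, "reported_vote"] == 1)
    & (anes.loc[valid, "validated_vote"] == 0)
).astype(int)

# Race (VCF0106): 1 = White, 2 = Black; code 3 (Other) becomes missing here.
# How Hispanic respondents (VCF0108) are treated is a separate choice not shown.
anes["african_american"] = recode(anes["VCF0106"], {1: 0, 2: 1})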
Appendix C: Sampling Procedure and Response Rates

SAMPLING PROCEDURE

The ANES defines their sample universe as “all U.S. households (including civilian households on military bases) in the 48 coterminous states and the District of Columbia,” and employs a multi-stage area probability design, in four stages. In stage one, primary geographical areas including Standard Metropolitan Statistical Areas (SMSAs), counties, and county groups are selected, followed by area segments such as housing unit clusters, followed by housing units, and finally all eligible members of the housing unit. Selection is random at each stage. (This description is paraphrased from, and is described in greater detail on, the ANES Web site at http://www.electionstudies.org.)

RESPONSE RATE

The ANES calculates response rate as the number of interviews divided by the number of all eligible citizens from a random sample of households. Below, we reproduce part of a table of response rates from the ANES Web site for the survey years relevant to this study. Full details about the ANES methodology for constructing response rates are available at http://www.electionstudies.org.

Table C1. Response Rates and Refusal Rates for the 1976–1996 Pre- and Post-Election Studies (Presidential Election Years Only)

Year    Response rate   Refusal rate   Number of interviews   Sample N   Dates conducted (pre)   Dates conducted (post)
1996#   59.8            20.8           398                    666        9/3-11/4                11/6-12/31
1992#   74.0            22.2           1126                   1522       9/1-11/2                11/4-1/13
1988    70.5            20.7           2040                   2893       9/6-11/7                11/9-1/24
1984    72.1            20.8           2257                   3131       9/4-11/6                11/8-1/25
1980    71.8            -              1614                   2249       9/7-11/3                11/5-2/7
1976+   70.4            -              2248                   3191       9/7-11/1                11/3-1/30

+ Using unweighted Ns
- Unknown
# Only includes fresh cross-section sample
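The response-rate definition above can be checked directly against Table C1: for each year, the number of interviews divided by the sample N reproduces the reported response rate. A quick consistency check, with the tuples simply restating the table's columns:

# (year, interviews, sample_n, reported_response_rate) as given in Table C1
rows = [
    ("1996", 398, 666, 59.8),
    ("1992", 1126, 1522, 74.0),
    ("1988", 2040, 2893, 70.5),
    ("1984", 2257, 3131, 72.1),
    ("1980", 1614, 2249, 71.8),
    ("1976", 2248, 3191, 70.4),
]

for year, interviews, sample_n, reported in rows:
    computed = 100 * interviews / sample_n
    # e.g., 1996: 100 * 398 / 666 = 59.76, matching the reported 59.8
    print(f"{year}: computed {computed:.1f}%, reported {reported}%")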
Appendix D: Model Specifications
Table D1. Estimated Turnout Based on Self-Reported and Discounted Data for Models 1 and 2 (Thin and Thick) (standard errors in parentheses)

[Table D1 reports, for each election-year column (1984, 1988, 1992, and 1996 appear in the column headers), the coefficient estimates with standard errors, N, and log likelihood for Model 1 (thin: African American and a constant) and Model 2 (thick: African American, education, South, and a constant), estimated on self-reported turnout and, where available, on validated turnout. p < .05 = *.]
Table D2. Estimated Turnout Based on Self-Reported and Validated Data, Model 3 (Full) (standard errors in parentheses)

[Table D2 reports coefficient estimates (with standard errors), N, and the log likelihood for the full model, estimated on validated and self-reported turnout in 1976, 1980, 1984, and 1988, and on self-reported turnout in 1992 and 1996. Explanatory variables: African American; education; age 17–24, 25–34, 35–44, 55–64, and 65 and over; South; male; family income; homeowner; church attendance; strength of party identification (PID); contact; and a constant. p < .05 = *.]
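To make the specifications behind Tables D1 and D2 concrete, the sketch below lays out how the thin, thick, and full models could be fit to a recoded cumulative-file extract. It is illustrative only: the variable names are ours, the probit link is an assumption (the tables report coefficients and log likelihoods without restating the link), and age 45–54 appears to be the omitted reference category implied by the listed age dummies.

import statsmodels.formula.api as smf

# Right-hand sides follow the rows of Tables D1 and D2; variable names are ours.
THIN = "african_american"
THICK = "african_american + education + south"
FULL = ("african_american + education + age_17_24 + age_25_34 + age_35_44"
        " + age_55_64 + age_65_up + south + male + family_income + homeowner"
        " + church_attendance + pid_strength + contact")

def fit_turnout_model(outcome, rhs, data):
    # Probit link assumed for illustration; a logit would be set up the same way.
    return smf.probit(f"{outcome} ~ {rhs}", data=data).fit(disp=0)

# Example usage on a hypothetical single-year extract `year_data` containing
# the recoded variables from Appendix B:
#   full_validated = fit_turnout_model("validated_vote", FULL, year_data)
#   full_reported  = fit_turnout_model("reported_vote",  FULL, year_data)
#   print(full_validated.summary())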
Appendix E: The Relationship Between Validated and
Self-Reported Data
Table E1. Deflated Functions (standard errors in parentheses)

Variable                                Varying over time and race   Varying over race only
African American                        0.022 (0.053)                -0.011 (0.019)
Pr(R)                                   0.385* (0.052)               0.420* (0.189)
Pr(R) squared                           0.454* (0.040)               0.391* (0.020)
Year dummy:
  1980                                  0.005 (0.024)                –
  1984                                  -0.016 (0.023)               –
  1988                                  0.010 (0.020)                –
Interaction effects:
  Af. Am. * Pr(R)                       -0.273 (0.178)               -0.280* (0.069)
  Af. Am. * Pr(R) squared               0.200 (0.141)                0.221* (0.057)
  1980 * Af. Am.                        -0.022 (0.070)               –
  1984 * Af. Am.                        -0.031 (0.066)               –
  1988 * Af. Am.                        -0.012 (0.060)               –
  1980 * Pr(R)                          -0.006 (0.078)               –
  1984 * Pr(R)                          0.056 (0.075)                –
  1988 * Pr(R)                          0.212* (0.067)               –
  1980 * Pr(R) squared                  -0.037 (0.061)               –
  1984 * Pr(R) squared                  -0.041 (0.057)               –
  1988 * Pr(R) squared                  -0.252* (0.052)              –
  1980 * Pr(R) * Af. Am.                0.042 (0.238)                –
  1984 * Pr(R) * Af. Am.                -0.053 (0.226)               –
  1988 * Pr(R) * Af. Am.                -0.249 (0.206)               –
  1980 * Pr(R) squared * Af. Am.        -0.012 (0.190)               –
  1984 * Pr(R) squared * Af. Am.        0.073 (0.181)                –
  1988 * Pr(R) squared * Af. Am.        0.231 (0.167)                –
Constant                                0.113* (0.016)               0.126* (0.008)
N                                       6108                         6108

p < .05 = *
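Read as a regression of validated on predicted reported turnout, the "varying over race only" column of Table E1 implies a simple deflation rule: given a respondent's predicted probability of reporting turnout, Pr(R), the discounted probability is a race-specific quadratic in Pr(R). The sketch below evaluates that quadratic using the column's coefficients; the functional form (a linear-in-parameters quadratic with race interactions) is our reading of the table's rows, not a restatement of the authors' exact estimator.

def deflate(pr_r, african_american):
    """Discounted turnout probability implied by the race-only column of
    Table E1: constant + race shift + a quadratic in Pr(R) whose slope and
    curvature are allowed to differ by race. Coefficients are taken from
    the table."""
    b_const, b_black = 0.126, -0.011
    b_pr, b_pr2 = 0.420, 0.391
    b_black_pr, b_black_pr2 = -0.280, 0.221
    a = 1 if african_american else 0
    return (b_const + b_black * a
            + (b_pr + b_black_pr * a) * pr_r
            + (b_pr2 + b_black_pr2 * a) * pr_r ** 2)

# Example: at Pr(R) = 0.9, the implied discounted turnout is noticeably lower
# for African American respondents than for White respondents.
print(deflate(0.9, african_american=False))  # about 0.82
print(deflate(0.9, african_american=True))   # about 0.74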
References
Abramson, Paul, and William Claggett. 1984. “Race-Related Differences in Self-Reported and
Validated Turnout.” Journal of Politics 46(3):719–38.
———. 1986. “Race-Related Differences in Self-Reported and Validated Turnout in 1984.”
Journal of Politics 48(2):412–22.
———. 1989. “Race-Related Differences in Self-Reported and Validated Turnout in 1986.”
Journal of Politics 51(2):397–408.
———. 1991. “Racial Differences in Self-Reported and Validated Turnout in the 1988 Presidential
Election.” Journal of Politics 53(1):186–97.
———. 1992. “The Quality of Record Keeping and Racial Differences in Validated Turnout.”
Journal of Politics 54(3):871–80.
Ansolabehere, Stephen, and Charles Stewart III. 2009. “Amazing Race: How Post-Racial Was Obama's Victory?” Boston Review 34(1) (January/February).
Belli, Robert F., Michael W. Traugott, and Matthew N. Beckmann. 2001. “What Leads to Voting
Overreports? Contrasts of Overreporters to Validated Voters and Admitted Nonvoters in the
American National Election Studies.” Journal of Official Statistics 17(4):479–98.
Belli, Robert F., Michael W. Traugott, Margaret Young, and Katherine A. McGonagle. 1999. “Reducing Vote Overreporting in Surveys.” Public Opinion Quarterly 63:90–108.
Bernstein, Robert, Anita Chadha, and Robert Montjoy. 2001. “Overreporting Voting: Why It Happens and Why It Matters.” Public Opinion Quarterly 65(1):22–44.
Burden, Barry C. 2000. “Voter Turnout and the National Election Studies.” Political Analysis 8
(4):389–90.
———. 2003. “Internal and External Effects on the Accuracy of NES Turnout: Reply.” Political
Analysis 11(2):193–95.
Cassel, Carol A. 2003. “Overreporting and Electoral Participation Research.” American Politics
Research 31(3):81–92.
———. 2004. “Voting Records and Validated Voting Studies.” Public Opinion Quarterly 68
(1):102–8.
Clausen, Aage R. 1968. “Response Validity: Vote Report.” Public Opinion Quarterly 32
(4):588–606.
Dawson, Michael C. 1994. Behind the Mule. Princeton, NJ: Princeton University Press.
Duff, Brian, Michael J. Hanmer, Won-Ho Park, and Ismail K. White. 2007. “Good Excuses: Understanding Who Votes with an Improved Turnout Question.” Public Opinion Quarterly 71
(1):67–90.
Greene, William H. 1993. Econometric Analysis. 2nd ed. Upper Saddle River, NJ: Prentice
Hall International.
Hill, Kim Quaile, and Patricia A. Hurley. 1984. “Nonvoters in Voters’ Clothing: The Impact of
Voting Behavior Misreporting on Voting Behavior Research.” Social Science Quarterly 65
(1):199–206.
Holbrook, Allyson L., and Jon A. Krosnick. Forthcoming. “Social Desirability Bias in Voter Turnout Reports: Tests Using the Item Count Technique.” Public Opinion Quarterly.
Karp, Jeffrey A., and David Brockington. 2005. “Social Desirability and Response Validity: A
Comparative Analysis of Overreporting Voter Turnout in Five Countries.” Journal of Politics
67(3):825–40.
Kinder, Donald R., Tali Mendelberg, Michael C. Dawson, Lynn M. Sanders, Steven J. Rosenstone, Jocelyn Sargent, and Cathy Cohen. 1989. Race and the 1988 American Presidential
Election. Paper presented at the Annual Meeting of the American Political Science Association,
Atlanta, GA, USA.
Kinder, Donald R., and Lynn M. Sanders. 1996. Divided by Color: Racial Politics and Democratic Ideals. Chicago, IL: University of Chicago Press.
Martinez, Michael D. 2003. “Comment on 'Voter Turnout and the National Election Studies'.”
Political Analysis 11(2):187–92.
McDonald, Michael P. 2003. “On the Overreport Bias of the National Election Study Turnout
Rate.” Political Analysis 11(2):180–86.
McDonald, Michael P., and Samuel L. Popkin. 2001. “The Myth of the Vanishing Voter.” American Political Science Review 95(4):963–74.
Miller, Warren E., and the National Election Studies. 1999. American National Election Studies Cumulative Data File, 1948–1998 [Computer file]. 10th ICPSR version. Ann Arbor, MI: University of Michigan, Center for Political Studies [producer]; Ann Arbor, MI: Inter-University Consortium for Political and Social Research [distributor].
Presser, Stanley, and Michael Traugott. 1992. “Little White Lies and Social Science Models: Correlated Response Errors in a Panel Study of Voting.” Public Opinion Quarterly 56(1):77–86.
Presser, Stanley, Michael W. Traugott, and Santa Traugott. 1990. Vote 'Over' Reporting in Surveys: The Records or the Respondent. Paper prepared for the International Conference on
Measurement Errors, Tucson, AZ, USA.
Rosenstone, Steven J., and John Mark Hansen. 1993. Mobilization, Participation, and Democracy
in America. New York: Macmillan.
Sigelman, Lee. 1982. “The Nonvoting Voter in Voting Research.” American Journal of Political
Science 26(1):47–56.
Silver, Brian D., Barbara A. Anderson, and Paul R. Abramson. 1986. “Who Overreports Voting?”
American Political Science Review 80(2):613–24.
Tate, Katherine. 1994. From Protest to Politics: The New Black Voters in American Elections.
Cambridge, MA: Harvard University Press.
Uggen, Christopher, and Jeff Manza. 2002. “Democratic Contraction? Political Consequences
of Felon Disenfranchisement in the United States.” American Sociological Review 67
(6):777–803.
Verba, Sidney, and Norman H. Nie. 1972. Participation in America: Political Democracy and
Social Equality. New York: Harper and Row.
Verba, Sidney, Kay Lehman Schlozman, and Henry E. Brady. 1995. Voice and Equality. Cambridge, MA: Harvard University Press.