Prejudice Matters in Elections?: An Estimator for Binary Outcomes

Prejudice Matters in Elections?: An Estimator for
Binary Outcomes with Sample-Selection
Jin-Young Choi1
(Oct. 2015)
Abstract
In this paper, we propose a new semiparametric estimator for binary-outcome
selection models that does not impose any distributional assumptions, and only imposes
an index assumption on the selection equation. We adopt the idea in Lewbel (2000)
of using a special regressor to transform the binary Y in a way that is linear in the
latent index, and then remove the selection correction term by differencing as in the
case where Y is linear.
We apply our estimator to US presidential election data in 2008 and 2012 to assess
the impacts of racism (a variable that measures prejudice) on the election of Barack
Obama. When we control for the sample-selection problem, our results show that
prejudice does not significantly affect support for Obama among white Democrats, but
has a positive effect among white Republicans. And this pattern is found to be strong
and significant in the South.
Keywords: Sample-Selection, Binary, Semiparametric estimator, Elections, Prejudice.
JEL codes: C14, C35, D72.
1
Economics and Business, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 4, 60629, Frankfurt am Main,
Germany, Tel. (+49) 69-798-34757, [email protected]. I am very grateful to Arthur Lewbel and Myoung-jae Lee
for their comments. Also I wish to extend thanks to participants of seminars in Korea U and Sogang U for their helpful
comments. All errors are my own, and comments are very welcome.
1 Introduction
Sample-selection models consist of a selection equation and an outcome equation.
For a continuous outcome/response variable, many semiparametric estimators have
been proposed in the literature, in addition to the fully parametric maximum likelihood estimator (MLE) and the nearly parametric Heckman (1979) two-stage estimator.
Semiparametric estimators differ in their assumptions. Some but not all require an exclusion restriction, typically a regressor that is included in the selection equation but
excluded from the outcome equation. Semiparametric estimators also vary in whether
they allow for unknown forms of heteroskedasticity or not, and some identify the outcome equation intercept while others do not.
Newey et al. (1990) adapt Robinson’s (1988) two-stage approach for sample-selection models, while Ahn and Powell (1993) and Powell (1987, 2001) use pairwise
differencing methods. These estimators require an exclusion restriction. Donald (1995)
imposes normality but allows for a general form of heteroskedasticity and does not require an exclusion restriction. Chen’s (1999) estimator imposes an error symmetry
restriction and allows error terms to depend on the regressors only through the absolute value of a linear index, to obtain an estimator that does not require an exclusion
restriction and includes identification of the intercept. Chen and Zhou (2010) propose
a symmetry-based estimator allowing an unknown form of heteroskedasticity without
normality, but with an exclusion restriction that the heteroskedasticity function depends only on a subset of regressors. Lewbel (2007) introduces a GMM-type estimator
using a density weighting idea, which allows an unknown form of heteroskedasticity
and identifies the outcome equation intercept, but requires a “special regressor” that
is excluded from the heteroskedasticity function.
When the response variable is binary, estimating the sample-selection model becomes much more difficult; we call such a model a “binary-outcome selection model”. The
most popular estimator for binary-outcome selection models is probably MLE, assuming joint normality of the two equation-error terms and independence between the errors
and the regressors. This estimator is widely available in popular econometric software
packages such as Stata. However, MLE runs the risk of misspecification of normality
or violation of the independence assumption. Also, MLE requires estimating the correlation coefficient between the two error terms, which is often only weakly identified
because both errors are latent, thereby leading to numerical difficulties in practice. For
binary-outcome selection models, there is no direct analogue to Heckman’s (1979)
two-stage estimator, although it is possible to add the usual selection correction term
(the “inverse Mills ratio”) into the latent response equation.
Semiparametric estimators for binary-outcome selection models are relatively scarce.
Klein et al. (2015) propose a quasi-MLE under a double linear index assumption. They
require each index to contain at least one continuous regressor and they require an exclusion restriction that the selection equation contains a continuous regressor excluded
from the outcome equation. Escanciano et al. (2012) assume a double index model
for the outcome equation: one linear and the other unknown. They do not require
an exclusion restriction, but a continuous regressor should be included in the outcome
equation. Their results apply to a more general class of models that include binary-outcome selection models as a special case.
In this paper, we propose a new semiparametric estimator for binary-outcome selection models that does not impose any distributional assumptions, and only imposes
an index assumption on the selection equation. The estimator, however, does require a
continuous special regressor as in Lewbel (2000, 2007) that satisfies a support restriction in the outcome equation, and a variable (which can be discretely distributed) that
satisfies an exclusion restriction. Unlike most parametric and semiparametric estimators for this problem, including Klein et al. (2015), Escanciano et al. (2012), and MLE,
our estimator for the outcome equation has a closed-form expression and therefore does
not require numerical optimization (our selection equation can be estimated in a variety of ways, some of which may entail numerical optimization, but this equation is still
not the typical source of numerical problems such as those that arise from estimating
the features of the joint distribution of the errors in the two equations).
For the sake of comparison, we conduct simulations using models that are favourable and unfavourable
to MLE. We find that our estimator performs well in most cases. Its performance is robust to heterogeneity (caused by regressors) and non-normality of the error
terms, while that of MLE is not. Also, it is computationally less time-demanding than MLE
because it does not require estimation of the correlation coefficient and has a closed-form solution in the second stage.
We apply our estimator to US presidential election data in 2008 and 2012 to
assess the impacts of racism (specifically a variable that measures prejudice) on the
election of Barack Obama. We find evidence that prejudice does not significantly affect
support for Obama among white Democrats, but has a positive effect among white
Republicans. Also, this pattern is shown to be strong and significant in the South.
These results could be interpreted as follows. Since white Republicans are known to
have a high level of negative prejudice against blacks ‘as a group’, this might put pressure
on them: if Obama were not elected, they would more easily be judged as racist, and
it would hurt their political decency. They might want to disprove this stigma, and the
prejudice might lead some of them to vote for Obama who otherwise would not have
voted Democrat, even if they still had negative feelings about blacks. On the other
hand, white Democrats are known to be favourable to blacks ‘as a group’, and thus might
feel no pressure to alter their decision due to prejudice.
We also find that our results differ slightly from those of MLE; the signs of the significant coefficients match, but their magnitudes do not. When an (arbitrary)
known form of heteroskedastic errors is allowed, MLE’s results change considerably.
Thus, we cannot rule out the possibility of its inconsistency. Also, the correlation
coefficient in MLE, which indicates the presence of the sample-selection problem, is found to be
close to zero, in contrast to other research in political science.
The rest of this paper is organized as follows. Section 2 introduces our estimator.
Section 3 does a preliminary empirical analysis after examining the relevant racism and
own-race favour literature. Section 4 presents our main empirical findings. Finally,
Section 5 is our conclusion.
2 Estimator
Let $1[A] = 1$ if $A$ holds and $0$ otherwise. Our binary-outcome selection model is
$$Y = 1[W + X'\beta_0 + U > 0], \quad Z = (R, X')', \quad Y \text{ is observed only if } D = 1; \tag{1.1}$$
$$P(D = 1|Z) = \Phi(Z'\alpha_0) \quad \text{for a function } \Phi(\cdot) \text{ and a parameter } \alpha_0; \tag{1.2}$$
$$E(U|Z, D = 1) = E(U|Z'\alpha_0, D = 1) \equiv g(Z'\alpha_0) \quad \text{for a function } g(\cdot), \tag{1.3}$$
where $R$ is a scalar regressor and $(D_i, W_i, Z_i, D_i Y_i)$ is observed, $i = 1, ..., N$,
where $W$ is a special regressor as in Lewbel (2000), $X$ and $Z$ are $k_x \times 1$ and $k_z \times 1$
regressor vectors with the first component being 1, $U$ is an error term, and $\beta_0$ and $\alpha_0$
are parameter vectors. We can replace the linear indices $X'\beta_0$ and $Z'\alpha_0$ with nonlinear
ones, but we will stick to linear indices for simplicity.
In (1.1), the coefficient of $W$ is normalized to one, which is arranged by dividing
both sides of the inequality by the slope of $W$; the sign of $W$ can be assumed to
be known without loss of generality (and converted to positive by replacing $W$ with
$-W$ if necessary) since it can be estimated at a rate faster than $\sqrt{N}$. The selection
equation is assumed to satisfy the single index assumption in (1.2). Another single
index assumption is imposed on $U$ in (1.3) because $U$ is assumed to depend on $Z$ only
through $Z'\alpha_0$. The model has an exclusion restriction that $R$ is excluded from the
outcome equation and included in the selection equation, and an inclusion restriction
that $W$ appears in the outcome equation. Although $W$ does not appear in the selection
equation, $W$ does not have to be excluded from the selection equation because the
selection equation may be taken as a ‘reduced form’ for $E(D|Z)$ that obeys the index
restriction in (1.2).
Since $g(Z'\alpha_0)$ will be removed eventually by a differencing argument, we will not
need to specify the function $g(\cdot)$. We also do not need to specify the functional form
of $\Phi(\cdot)$ since there already exist semiparametric estimators, e.g. Han (1987), Powell
et al. (1989), Sherman (1993), Klein and Spady (1993) or Ichimura (1993), allowing
for an unknown $\Phi(\cdot)$. Our estimator allows this level of generality, though in the later
empirical analysis probit will be used for simplicity, in which case $\Phi(\cdot)$ is just the
standard normal distribution function.
To provide some intuition for our proposed estimator, suppose that $Y$ were continuous so that we could postulate $Y = W + X'\beta_0 + U$. Then adding and subtracting
$E(U|Z'\alpha_0, D = 1)$ from $Y = W + X'\beta_0 + U$ would yield
$$Y = W + X'\beta_0 + g(Z'\alpha_0) + V \quad \text{where } V \equiv U - E(U|Z'\alpha_0, D = 1). \tag{1.4}$$
Many of the semiparametric estimators in the literature (including most of those discussed in the previous section) use this equation to estimate $\beta_0$ by differencing out
the ‘selection correction term’ $g(Z'\alpha_0)$ in some way. However, when $Y$ is binary as
in (1.1), $g(Z'\alpha_0)$ appears inside of the $1[\cdot]$ function, which makes removing $g(Z'\alpha_0)$
by differencing the model infeasible. To overcome this problem, we adopt the idea in
Lewbel (2000) of using a special regressor to transform the binary $Y$ in a way that is
linear in the latent index $X'\beta_0 + g(Z'\alpha_0)$. We can then remove $g(Z'\alpha_0)$ by differencing
as in the case where $Y$ is linear.
Applying Lewbel (2000) requires a special regressor $W$ that satisfies the following
assumptions (letting $F$ denote a distribution function):
(i): $U \perp W \,|\, Z$ ($\Longrightarrow U|(W, Z) \sim U|Z$), which follows the same distribution as $U|Z'\alpha_0$;
(ii): $F_{W|Z,D=1}$ is absolutely continuous with density $f_{W|Z,D=1}$;
(iii): the support of $W|Z, D = 1$ is $[W_l, W_h]$, which includes
the support of $-X'\beta_0 - U$, where $-\infty \le W_l < 0 < W_h \le \infty$.
Define a transformed response:
$$\tilde{Y}_z \equiv \frac{Y - 1[W > 0]}{f_{W|Z,D=1}(W)}.$$
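Then, the following theorem, which is the key for our estimator, holds:
Theorem 1 Under the model (1.1)-(1.3) and assumptions (i)-(iii), it holds that
$$E(\tilde{Y}_z|Z, D = 1) = X'\beta_0 + g(Z'\alpha_0). \tag{1.5}$$
Proof. Observe:
$$\begin{aligned}
E(\tilde{Y}_z|Z, D = 1) &= E\left[ E\left\{ \frac{Y - 1[W > 0]}{f_{W|Z,D=1}(W)} \,\Big|\, W, Z, D = 1 \right\} \Big|\, Z, D = 1 \right] \\
&= E\left[ \frac{E\{Y - 1[W > 0] \,|\, W, Z, D = 1\}}{f_{W|Z,D=1}(W)} \,\Big|\, Z, D = 1 \right] \\
&= \int_{W_l}^{W_h} \frac{E\{Y - 1[W > 0] \,|\, W = w, Z, D = 1\}}{f_{W|Z,D=1}(w)} f_{W|Z,D=1}(w)\, dw \\
&= \int_{W_l}^{W_h} E\{1[w + X'\beta_0 + U > 0] - 1[w > 0] \,|\, W = w, Z, D = 1\}\, dw \\
&= \int_{W_l}^{W_h} \int (1[w + X'\beta_0 + u > 0] - 1[w > 0])\, dF_{U|Z,D=1}(u)\, dw \\
&= \int \int_{W_l}^{W_h} (1[w > -X'\beta_0 - u] - 1[w > 0])\, dw\, dF_{U|Z,D=1}(u).
\end{aligned}$$
The inner integrand depends on $-X'\beta_0 - u$: it is zero when $-X'\beta_0 - u = 0$, and it is
also zero when $-X'\beta_0 - u \neq 0$ except in the following cases:
if $-X'\beta_0 - u < 0$, then the inner integrand is $1$ when $-X'\beta_0 - u < w < 0$;
if $-X'\beta_0 - u > 0$, then the inner integrand is $-1$ when $0 < w < -X'\beta_0 - u$.
Thus,
$$\begin{aligned}
E(\tilde{Y}_z|Z, D = 1) &= \int \left( 1[-X'\beta_0 - u < 0] \int_{-X'\beta_0 - u}^{0} dw - 1[-X'\beta_0 - u > 0] \int_{0}^{-X'\beta_0 - u} dw \right) dF_{U|Z,D=1}(u) \\
&= \int (X'\beta_0 + u)\, dF_{U|Z,D=1}(u) = X'\beta_0 + g(Z'\alpha_0).
\end{aligned}$$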
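As an informal numerical check of Theorem 1 (not part of the original derivation), the following Python sketch averages the transformed response for a fixed value of the index $X'\beta_0$. The uniform distributions for $W$ and $U$, the sample size, and the function name `transformed_response_mean` are illustrative assumptions chosen so that the density of $W$ is known and the support condition (iii) holds.

```python
import numpy as np

def transformed_response_mean(x_beta, n=400_000, half_width=6.0, seed=0):
    """Monte Carlo average of (Y - 1[W > 0]) / f_W(W) for a fixed index x_beta.

    Toy design (an assumption, not the paper's): W ~ Uniform[-half_width, half_width]
    independent of U ~ Uniform[-1, 2], so E(U) = 0.5 and the support of W
    covers the support of -(x_beta + U) for moderate x_beta.
    """
    rng = np.random.default_rng(seed)
    w = rng.uniform(-half_width, half_width, size=n)
    u = rng.uniform(-1.0, 2.0, size=n)
    y = (w + x_beta + u > 0).astype(float)      # binary outcome
    density = 1.0 / (2.0 * half_width)          # known density of W here
    y_tilde = (y - (w > 0).astype(float)) / density
    return y_tilde.mean()

if __name__ == "__main__":
    # Theorem 1 suggests the average should be close to x_beta + E(U).
    for x_beta in (1.0, -1.0):
        print(x_beta, transformed_response_mean(x_beta))
```

In this toy design there is no selection, so the term $g(\cdot)$ reduces to the constant $E(U) = 0.5$; the averages should therefore lie near $X'\beta_0 + 0.5$ up to simulation noise.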
The selection correction term $g(Z'\alpha_0)$ in (1.5) can be removed using one of the
differencing ideas applied to (1.4) in the literature. In this paper, we use the approach
of Newey et al. (1990), applied to Lewbel’s (2000) special regressor transformation.
For this we need to define another transformed response variable:
$$\tilde{Y}_{z\alpha} \equiv \frac{Y - 1[W > 0]}{f_{W|Z'\alpha_0,D=1}(W)}.$$
The following lemma then gives a linear expression for the outcome equation.
Lemma 2 Under the model (1.1)-(1.3) and assumptions (i)-(iii), it holds that
$$E\{\tilde{Y}_z - E(\tilde{Y}_{z\alpha}|Z'\alpha_0, D = 1) \,|\, Z, D = 1\} = \{X - E(X|Z'\alpha_0, D = 1)\}'\beta_0. \tag{1.6}$$
Proof. Following the proof for Theorem 1, we obtain
$$E(\tilde{Y}_{z\alpha}|Z'\alpha_0, D = 1) = E(X|Z'\alpha_0, D = 1)'\beta_0 + g(Z'\alpha_0).$$
Subtract this from $E(\tilde{Y}_z|Z, D = 1) = X'\beta_0 + g(Z'\alpha_0)$ to remove $g(Z'\alpha_0)$:
$$E(\tilde{Y}_z|Z, D = 1) - E(\tilde{Y}_{z\alpha}|Z'\alpha_0, D = 1) = \{X - E(X|Z'\alpha_0, D = 1)\}'\beta_0,$$
which can be rewritten as (1.6).
Using (1.6), we can estimate $\beta_0$ by the least squares estimator (LSE) of
$$D\{\tilde{Y}_z - E(\tilde{Y}_{z\alpha}|Z'\alpha_0, D = 1)\} \quad \text{on} \quad D\{X - E(X|Z'\alpha_0, D = 1)\}$$
under the non-singularity of
$$E[\, D\{X - E(X|Z'\alpha_0, D = 1)\}\{X - E(X|Z'\alpha_0, D = 1)\}' \,].$$
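As the intercept in $\beta_0$ is not identified in the LSE, let $\beta$ denote the slopes in $\beta_0$. Then
our estimator for $\beta$ is
$$\hat{\beta} = \left[ \sum_{i=1}^{N} D_i \{X_i - \hat{E}(X_i|Z_i'\hat{\alpha}_0, D_i = 1)\}\{X_i - \hat{E}(X_i|Z_i'\hat{\alpha}_0, D_i = 1)\}' \right]^{-1} \left[ \sum_{i=1}^{N} D_i \{X_i - \hat{E}(X_i|Z_i'\hat{\alpha}_0, D_i = 1)\}\{\hat{Y}_{zi} - \hat{E}(\hat{Y}_{z\alpha i}|Z_i'\hat{\alpha}_0, D_i = 1)\} \right] \tag{1.7}$$
where $\hat{\alpha}_0$ and $\hat{E}(\cdot)$ denote estimators for $\alpha_0$ and $E(\cdot)$, and
$$\hat{Y}_{zi} \equiv \frac{Y_i - 1[W_i > 0]}{\hat{f}_{W|Z_i,D_i=1}(W_i)}, \qquad \hat{Y}_{z\alpha i} \equiv \frac{Y_i - 1[W_i > 0]}{\hat{f}_{W|Z_i'\hat{\alpha}_0,D_i=1}(W_i)}.$$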
Taken together, these equations provide a simple multistage estimator $\hat{\beta}$ for $\beta$.
The first stage is obtaining $\hat{\alpha}_0$ and the second stage is estimating the $\hat{f}$’s using $\hat{\alpha}_0$. The
next step is constructing $\hat{Y}_z$, $\hat{Y}_{z\alpha}$, and $\hat{E}(\cdot)$ using $\hat{\alpha}_0$ and the $\hat{f}$’s, and the final stage is
then calculating $\hat{\beta}$. The first stage estimation of $\hat{\alpha}_0$ can be done using a variety of
estimators. Semiparametric $\sqrt{N}$-consistent estimators that could be used for $\hat{\alpha}_0$ include
those described by Han (1987), Powell et al. (1989), Sherman (1993), Klein and Spady
(1993) or Ichimura (1993). As for $\hat{Y}_z$ and $\hat{Y}_{z\alpha}$, Dong and Lewbel (2015) showed various
ways to obtain them. Given $\hat{\alpha}_0$, we can obtain $\hat{Y}_z$, $\hat{Y}_{z\alpha}$ and $\hat{E}(\cdot)$ using kernel
density and regression estimators.
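The final, closed-form stage can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this paper: the function names `nw_regression` and `lbs_second_stage` are hypothetical, a Gaussian kernel with a rule-of-thumb bandwidth is assumed for $\hat{E}(\cdot)$, the inputs are taken to contain selected observations ($D_i = 1$) only, and the transformed responses and the estimated index are assumed to have been built in the earlier stages.

```python
import numpy as np

def nw_regression(target, index, bandwidth):
    """Nadaraya-Watson estimate of E(target | index) at each sample point,
    using a Gaussian kernel (an illustrative choice)."""
    diffs = (index[:, None] - index[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ target

def lbs_second_stage(y_z, y_za, x, index):
    """Closed-form slope estimate in the spirit of (1.7).

    y_z, y_za : transformed responses (length n), selected observations only
    x         : (n, k) array of slope regressors, without an intercept column
    index     : estimated selection index Z'alpha_hat (length n)
    """
    n = len(index)
    h = index.std() * n ** (-1 / 5)               # rule-of-thumb bandwidth
    x_res = x - nw_regression(x, index, h)        # X - E_hat(X | index)
    y_res = y_z - nw_regression(y_za, index, h)   # Y_z - E_hat(Y_za | index)
    # Least squares of y_res on x_res; no numerical search is needed.
    return np.linalg.solve(x_res.T @ x_res, x_res.T @ y_res)

if __name__ == "__main__":
    # Synthetic check: a partially linear model with slope 0.5.
    rng = np.random.default_rng(1)
    n = 1500
    x = rng.normal(size=(n, 1))
    idx = x[:, 0] + rng.normal(size=n)
    y = 0.5 * x[:, 0] + np.sin(idx) + 0.5 * rng.normal(size=n)
    print(lbs_second_stage(y, y, x, idx))
```

Because the slope estimate is obtained from a single linear solve, repeated evaluation (for example inside a resampling loop) is inexpensive.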
If $\alpha_0$ were known, then the resulting estimator $\hat{\beta}$ would have the same structure
as the special regressor estimator in Lewbel (2000); given this, it could be shown to
be $\sqrt{N}$-consistent and asymptotically normal under the same regularity conditions
provided there. Here $\alpha_0$ has to be estimated, which complicates derivation of the
limiting distribution of $\hat{\beta}$. However, the estimator still takes the form of a standard
multistep estimator where some of the steps are standard nonparametric regression
and density estimators, $\hat{f}$ and $\hat{E}(\cdot)$, yielding $\sqrt{N}$-consistent and asymptotically normal
estimates $\hat{\beta}$ under standard conditions. Explicit formulas for the limiting variance
of $\hat{\beta}$ will be complicated given the number of nuisance estimators involved ($\hat{\alpha}_0$, $\hat{f}$,
$\hat{Y}_z$, $\hat{Y}_{z\alpha}$ and $\hat{E}(\cdot)$), and will depend on details regarding the chosen estimator $\hat{\alpha}_0$. We
therefore instead use a nonparametric bootstrap to obtain confidence intervals in our
later empirical application. See, e.g., Chen et al. (2003) for one set of regularity
conditions that suffice to rationalize use of a bootstrap in a multistep estimator like
ours. Note that our estimator is particularly suitable for bootstrapping because it does
not require any numerical searches or optimizations.
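A generic nonparametric bootstrap of this kind can be sketched in Python as below. The percentile construction of the interval, the number of replications, and the function name `bootstrap_ci` are assumptions made for illustration; the text above does not specify which bootstrap interval is used.

```python
import numpy as np

def bootstrap_ci(estimator, data, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for `estimator`, resampling rows of `data`
    with replacement. `estimator` maps a data array to a scalar or a vector."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    draws = np.array([
        estimator(data[rng.integers(0, n, size=n)])  # one bootstrap replication
        for _ in range(n_boot)
    ])
    alpha = 1.0 - level
    return np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)

if __name__ == "__main__":
    # Toy usage with the sample mean standing in for a multistep estimator.
    rng = np.random.default_rng(42)
    sample = rng.normal(loc=1.0, size=400)
    print(bootstrap_ci(np.mean, sample))
```

For a multistep estimator, every stage (first-stage index, densities, and the closed-form last step) would be recomputed inside `estimator` on each resample.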
3 Simulation Studies
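To simplify presentation, call our semiparametric estimator “LBS” (Lewbelian Binary-outcome Selection model estimator). The simulation design is as follows:
$$\begin{aligned}
y_i^* &= 1 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_w w_i + u_i, \\
d_i &= 1[1 + x_{1i} + x_{2i} + r_i + w_i + e_i > \tau_1], \\
y_i &= 1[y_i^* > \tau_2]\, d_i, \qquad COR(e_i, u_i) \simeq 0.5, \\
& x_{1i}, x_{2i}, r_i^* \sim N(0, 1), \qquad r_i = r_i^* \text{ or } 1[r_i^* > 0], \\
& u_i \perp (x_i, w_i), \qquad e_i \perp (x_i, w_i, r_i), \qquad \beta_1, \beta_2 = 1, \quad \beta_w = 2.
\end{aligned}$$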
The regressor values $x_{1i}$, $x_{2i}$, $w_i$ and $r_i^*$ for the binary instrument $r_i$ were generated
as i.i.d. standard normal. The error term $u_i$ in the outcome equation and the error
term $e_i$ in the selection equation were generated from a joint distribution with non-zero
correlation between $u_i$ and $e_i$. The special regressor $w_i$ was included in the selection
equation because it is not required to be excluded from this equation, but not used as
a regressor in $Z$ for $Z'\alpha_0$.
We set the cutoff values $\tau_1$ and $\tau_2$ to retain $P(D = 1) \simeq 0.65$
and $P(Y = 1|D = 1) \simeq 0.77$, similar to the actual data we use. Also the parameter
values ($\beta_1$ and $\beta_2$) are set to 1, except the coefficient of the special regressor ($\beta_w = 2$),
so that the true identified coefficients of regressors are $0.5\ (= \beta_1/\beta_w = \beta_2/\beta_w)$. Note that the
intercept is not identified in LBS, because our estimator is based on the approach of
Newey et al. (1990).
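For concreteness, the jointly normal version of this design can be generated with a short Python sketch. The cutoff arguments `tau1` and `tau2` are placeholders set to zero because the cutoff values are not reported above; the function name `simulate_design1` and the seed handling are likewise illustrative assumptions.

```python
import numpy as np

def simulate_design1(n, tau1=0.0, tau2=0.0, rho=0.5, binary_r=False, seed=0):
    """Draw one sample from the jointly normal simulation design.

    tau1, tau2 are placeholder cutoffs (the paper's values are not reported);
    binary_r=True replaces the excluded variable r by 1[r* > 0].
    """
    rng = np.random.default_rng(seed)
    x1, x2, r_star, w = rng.normal(size=(4, n))      # i.i.d. standard normal
    cov = np.array([[1.0, rho], [rho, 1.0]])
    e, u = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    r = (r_star > 0).astype(float) if binary_r else r_star
    y_star = 1 + 1.0 * x1 + 1.0 * x2 + 2.0 * w + u   # beta_1 = beta_2 = 1, beta_w = 2
    d = (1 + x1 + x2 + r + w + e > tau1).astype(int)  # selection indicator
    y = (y_star > tau2).astype(int) * d               # outcome, zero when unselected
    return {"y": y, "d": d, "x1": x1, "x2": x2, "r": r, "w": w, "e": e, "u": u}

if __name__ == "__main__":
    sample = simulate_design1(2000)
    print(sample["d"].mean(), sample["y"][sample["d"] == 1].mean())
```

In practice the two cutoffs would be tuned until the printed selection rate and conditional outcome rate match the targets stated above.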
Probit MLE was used to estimate $Z'\alpha_0$ in the first stage, and following a suggestion
in Dong and Lewbel (2015), we estimated using $f_{\varepsilon_1|Z,D=1}$ and $f_{\varepsilon_2|Z'\alpha_0,D=1}$ instead of
$f_{W|Z,D=1}$ and $f_{W|Z'\alpha_0,D=1}$, where $\varepsilon_1$ and $\varepsilon_2$ are the error terms for the linear models
for $W|Z$ and $W|Z'\alpha_0$. Since the density functions $f_{\varepsilon_1|Z,D=1}$ and $f_{\varepsilon_2|Z'\alpha_0,D=1}$ are in the
denominator of $\tilde{Y}_z$ and $\tilde{Y}_{z\alpha}$, we chose trimming values within $[0.01, 0.1]$ minimizing the
mean squared error (MSE). For a bandwidth of nonparametric estimation for $f(\cdot|S)$
and $E(\cdot|S)$, we used the rule of thumb bandwidth $h = SD(S) N^{-1/5}$. Mean bias (Mn-Bias)
and root MSE are reported, and all performance measures were calculated from
300 repetitions.
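The bandwidth rule and the role of trimming can be illustrated with the following Python sketch. Two points are assumptions rather than statements of the procedure used here: trimming is implemented as flooring the estimated density at the trimming value (dropping low-density observations would be another reading), and a Gaussian kernel is used. The function names are hypothetical.

```python
import numpy as np

def rule_of_thumb_bandwidth(s):
    """h = SD(S) * N^(-1/5), as stated in the text."""
    s = np.asarray(s, dtype=float)
    return s.std() * len(s) ** (-1 / 5)

def trimmed_kernel_density(resid, trim=0.01):
    """Gaussian-kernel density estimate of the residuals at the sample points,
    floored at `trim` so that it can safely appear in a denominator."""
    resid = np.asarray(resid, dtype=float)
    h = rule_of_thumb_bandwidth(resid)
    z = (resid[:, None] - resid[None, :]) / h
    dens = np.exp(-0.5 * z ** 2).sum(axis=1) / (len(resid) * h * np.sqrt(2 * np.pi))
    return np.maximum(dens, trim)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.normal(size=500)   # stand-in for residuals of W on Z
    print(rule_of_thumb_bandwidth(eps), trimmed_kernel_density(eps).min())
```

The floor matters most in the tails of $W$, where a small estimated density would otherwise inflate the transformed response.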
For the sake of comparison, we present simulation results of MLE for a binary
response with sample-selection, along with results of LBS. The error terms $u_i$ and $e_i$
were generated from the jointly normal distribution with correlation $\rho = 0.5$, which
is favourable to MLE; otherwise MLE does not converge well. We examine various
simulation designs from (1) to (4) below to verify performance of LBS compared to
MLE. Since our main interests are slope coefficients and differences between $\beta_1$ and $\beta_2$
are not noticeable, only LBS’s results of $\beta_1/\beta_w$, and MLE’s results of $\beta_1/\sigma_u$, $\beta_w/\sigma_u$,
and $\beta_1/\beta_w$ are presented, where $\sigma_u$ denotes the standard deviation of the outcome equation
error $u$, which equals 1 for the jointly normal distribution.
The first panel of Table 1 presents simulation results of the simulation design (1),
which is favourable to MLE, assuming the joint normal distribution of the two error
terms. Under the design favourable to MLE, LBS performs well even in the small
sample and its RMSE improves as the sample size increases. For MLE, biases of
the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are relatively large in magnitude, but the bias of
the estimator for $\beta_1/\beta_w$ is close to zero. This is because the biases for $\beta_1/\sigma_u$ and
$\beta_w/\sigma_u$ appear in the same direction; the biases cancel out in the ratio form. ‘FP’
in the last column denotes the percentage of failures to converge for MLE. Even in the
favourable design, MLE failed to converge 4.0% of the time out of 300 repetitions in
the small sample ($N = 500$), but the percentage of failures decreases fast as the sample
size increases. Under the design favourable to MLE, both LBS and MLE perform well,
but MLE is better than LBS in terms of RMSE, due to its efficiency.
Table 1. Simulation Results for LBS and MLE

              LBS              MLE
              β1/βw            β1/σu            βw/σu            β1/βw
   N       Bias    RMSE     Bias    RMSE     Bias    RMSE     Bias    RMSE      FP

(1) (e_i, u_i) ~ JN
  500    -0.021   0.129    0.020   0.177    0.051   0.237   -0.003   0.069    4.0%
 1000    -0.004   0.088    0.002   0.120    0.022   0.161   -0.005   0.047    1.0%
 2000     0.000   0.063    0.009   0.081    0.010   0.117    0.002   0.031    0.0%

(2) (e_i, u_i) ~ Beta
  500    -0.019   0.112    0.539   0.587    1.094   1.170   -0.005   0.056   12.0%
 1000    -0.014   0.071    0.526   0.551    1.056   1.089   -0.001   0.037    1.7%
 2000     0.000   0.049    0.510   0.522    1.022   1.038    0.000   0.026    0.0%

(3) SD(u|z_i) = |0.5 x_1i + 0.5 x_2i + 0.5 r_i|, where z_i ≡ (x_1i, x_2i, r_i)
  500    -0.021   0.110    0.348   0.460    0.803   0.893   -0.016   0.065   91.7%
 1000    -0.014   0.081    0.184   0.259    0.620   0.676   -0.047   0.061   90.3%
 2000    -0.003   0.060    0.128   0.184    0.529   0.570   -0.053   0.062   88.0%

(4) SD(u|z_i, w_i) = |0.5 x_1i + 0.5 x_2i + 0.5 r_i + w_i|
  500     0.016   0.117    0.117   0.461    0.253   0.824    0.000   0.076   31.3%
 1000     0.030   0.087   -0.091   0.300   -0.074   0.494   -0.023   0.063   24.7%
 2000     0.030   0.062   -0.165   0.242   -0.199   0.355   -0.034   0.055    5.3%
In the second panel, we examine a case where the normality assumption does not
hold. The two error terms $e_i$ and $u_i$ are generated from a beta distribution ($Beta(2, 2)$)
on the interval $[-1.5, 1.5]$. For MLE, the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are biased,
but there is no bias for $\beta_1/\beta_w$ for the same reason as described above. For LBS, the
results are not much different from those of the design (1).
Next, the effect of heteroskedasticity of the error term $u_i$ ($SD(u|z_i) = |0.5 z_i|$)
is investigated. For LBS, the performance does not change much, but for MLE, as
expected, biases of the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are substantially large in magnitude,
and the bias of the estimator for $\beta_1/\beta_w$ does not disappear even if the sample
size increases. The percentage of failure is shown to be very high (91.7% in the small
sample), and it is not much diminished as the sample size increases.
For LBS, heteroskedasticity with $Z'\alpha_0$ is allowed, but not with $W$. Thus, when
heteroskedasticity with $W$, along with $Z'\alpha_0$, is introduced, LBS is biased and the
size of the bias is similar to that of MLE, as shown in the fourth panel. All estimates of
MLE are biased in the presence of heteroskedasticity with any of the regressors.
Additionally, we consider two cases: where the excluded variable $r_i$ is binary, and
where the correlation $\rho$ is close to 0 or 1. These results are presented in the appendix.
The main advantage of LBS compared to other semiparametric estimators for LDV
models is that it allows the excluded variable $r_i$ to be binary. When the binary $r_i$ is used
instead of the continuous $r_i$, the performance of LBS becomes slightly worse than that
of the design (1), but it improves as the sample size increases. In the case where
the correlation $\rho$ is close to 0 or 1, while LBS remains the same, MLE becomes poor
in estimating $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ as the correlation $\rho$ is close to 1. However, the ratio
$\beta_1/\beta_w$ is estimated correctly in MLE.
In summary, even under the simulation design favourable to MLE, LBS performs
fairly well in various cases. It is computationally less time-demanding than MLE
because it has a closed-form solution in the second stage, and the first stage could be
done by simple probit. If the presence of heteroskedasticity is considered, LBS would
be more appropriate for binary-outcome selection models. While MLE works better in
terms of RMSE and its estimation of $\beta_1/\beta_w$ seems fine, it frequently fails to converge in many
cases and we could not be sure which reasons cause this failure. In MLE, estimating the
correlation parameter $\rho$ is computationally troublesome, and the grid search over $\rho$ makes
the maximization procedure time-demanding. In LBS, normalization of the parameters by
dividing through with $\beta_w$ could be seen as a disadvantage. However, all parameters in
binary models are identified only up to scale, so that it is actually not a disadvantage
for LBS. Also, MLE’s performance in estimating $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ is worse than LBS’s,
and its performance in estimating $\beta_1/\beta_w$ is rather comparable to LBS’s.
4 Empirical Analysis
Many studies in various disciplines of social science have provided evidence of prejudice/discrimination
against blacks: Heckman and Siegelman (1992), Heckman (1998),
Raphael et al. (2000), Bertrand and Mullainathan (2004), Stoll et al. (2004), Charles
and Guryan (2008) in the labour market; Dovidio and Gaertner (2000), Hodson et al.
(2002) in schools; Cunningham (2010) in sports; Knowles et al. (2001), Coker (2003),
Hodson et al. (2005) in the justice system. Given the evidence of negative prejudice against blacks, the 2008 US presidential election, where the first black candidate
Barack Obama was nominated, might be a good natural experiment to study the effect
of prejudice on a social decision.
Many studies analyzed the impact of prejudice on Obama’s bid for presidency
and whether white voters discriminated against the black candidate (e.g. Steele 2008;
Thernstrom 2008; Hutchings 2009; Mas and Moretti 2009; Lewis-Beck et al. 2010;
Piston 2010; Redlawsk et al. 2010; Tesler and Sears 2010; Ehrlinger et al. 2011;
Highton 2011; Kinder and Dale-Riddle 2011; Schaffner 2011). Many studies agree that
white prejudice (partially) affected the election outcomes. However, there is still a wide
diversity of opinion about the ways in which Obama overcame the adverse effects of the
prejudice. Most studies focused on which groups contributed to the victories
of Obama and ignored the sample-selection problem, which results in inconsistent estimates
regarding the impact of prejudice. We, however, focus on the prejudice issue where
the selection problem is critical: whether voters with negative feelings against blacks
are more likely to participate in voting. Our empirical analysis, both the preliminary
analysis using time-series data and the main analysis using individual data, is intended
to shed some light on the effect of prejudice, taking advantage of the unique opportunity
that the 2008 and 2012 US presidential elections present in terms of race.
4.1 Preliminary Analysis With Time-Series: Race and Party
We carried out a preliminary data analysis using the aggregate time-series data
over 1980-2012 (9 elections) obtained from the American National Election Studies
(ANES)2 that is designed to be representative at the national level, not at state nor
local levels. Our main empirical analysis in the next section is based on individual
survey data that are also from ANES. Let “Black”, “White”, and “Hisp” stand for
blacks, whites, and Hispanics respectively.
The left panel of Figure 1 shows that the voting share over the period 1980–2012
has been dominated by whites. The voting share of Hispanics has increased gradually
since 1980 while that of blacks has been stable at around 11–13%, except in the 2004
election. There is a local peak of whites’ voting share in 2008 when Barack Obama ran
for president for the …rst time, followed by a dip in 2012 when he ran for the second
time.
The second panel of Figure 1 presents the turn-out rate by race over 1980–2012.
The turn-out rate of whites has remained consistently above 70% and hit 80% in
2004 but dropped slightly in 2008 and 2012. The turn-out rate of blacks has increased
gradually since 1988; it caught up to the turn-out rate of whites in 2008 and even
overtook it in 2012.
The right panel of Figure 1 shows the Democrat-candidate-supporting (DCS) probability among the voters by race over the period 1980–2012. Due to the popularity of
Bill Clinton when he ran for the second time, it reached a local peak for all races in
1996. After Clinton, the DCS probability dipped during George Bush’s era and peaked
in the 2008 election to reach about 98% for blacks and 76% for Hispanics. Blacks
and Hispanics strongly supported Obama in both 2008 and 2012. The DCS probability of
whites in 2008 slightly increased compared to the previous election and dropped again
in 2012, but the amount of change seems negligible.
Figure 1: Voting Behavior by Race
2
See the ANES website for detailed information on the study: http://www.electionstudies.org/studypages/cdf/cdf.htm
Even if blacks and Hispanics strongly supported Obama, their voting share was not
big enough to give Obama an absolute chance of winning. Whites’ voting behaviour
would play a huge role for Obama, so that overcoming the negative prejudice that
whites might have against blacks might have been a critical issue for him. It will
be interesting to know in which direction whites’ voting behaviour contributed to the
results of the elections in 2008 and 2012, and especially how their perceived negative
prejudice against blacks affected the election results.
In addition to race, party affiliation is another important factor for explaining a
voting decision. Let “Demo” and “Rep” denote respondents who identify as Democrat
or Republican, respectively. Those who did not belong to either party are identified
as “Independent”, which is not presented in the figures because the party affiliation is
exclusive.
Figure 2: Voting Behavior for Black and White by Party
The left panel of Figure 2 shows the share of voters ($D = 1$) by race and party
affiliation. The share of black Democrats decreased from 12.7% in 2004 to 11.3% in
2008 and 11.6% in 2012. The share of white Democrats increased from about 29.6%
in 2004 to 32% in 2008 and then decreased to 28.6% in 2012. It implies that the white
Democrats were temporarily more likely to participate in voting in the 2008 election,
and their voting decision might have played a crucial role in that election. In contrast,
the share of white Republicans decreased from 40% in 2004 to 39.1% in 2008, and then
further decreased to 37.1% in 2012.
In the right panel of Figure 2, the black Democrats’ DCS probability increased
from 92% in 2004 to 99.8% in 2008, and then dropped to 97.7% in 2012. The black
Republicans’ DCS probability changed dramatically from 45.9% in 2004 to 92.4% in
2008, and then dropped to 47% in 2012. It demonstrates that blacks strongly supported
Obama in the 2008 election regardless of their party affiliation, but not in the 2012
election. Although this characterizes the 2008 and 2012 elections well, the small voting
share of blacks did not make a big change in the overall election picture.
The white Democrats’ DCS probability declined from 88.2% in 2004 to 86.9% in
2008 and then went up to 88.1% in 2012. And the white Republicans’ DCS probability increased from 7.1% in 2004 to 8.4% in 2008 and then declined to 6% in 2012.
Even if the size of the change was small, it is puzzling. If anything, one would expect a small decrease for the white Republicans and a small increase for the white
Democrats because white Democrats (but not white Republicans) are known to be relatively favourable to blacks. One possible explanation is that prejudice against blacks
might have affected white Republicans and white Democrats differently. The white
Republicans might have tried to avoid the inevitability of being seen as racists if they
did not vote for Obama. Even if they had other strong reasons not to vote for Obama,
because they are known to have a high level of prejudice against blacks as ‘a group’,
they could easily be judged as racists if Obama were not elected. Thus, by voting for
Obama, some of them might have been trying to escape the stigma of racism (Steele,
2008). On the other hand, the white Democrats are known to be favourable to blacks
as ‘a group’, so they were less likely to be judged as racists even if Obama were not
elected. In this case, the prejudice might affect the DCS probability of white Democrats
negatively.
We analyzed the DCS probability for Hispanics and other races by party affiliation,
and found similar results to those of blacks, which are presented in Figure 5 of the
appendix. Hispanics and other races also showed unusually strong support for Obama
in 2008 and 2012 regardless of their party affiliation, but their total contributions are
negligible due to their small voting share.
4.2
Preliminary Analysis With 2008 Data: Prejudice
One of the key variables in our analysis is “Prejudice”, which measures racial
prejudice against blacks on a scale from 0 to 1. “Prejudice” is constructed by subtracting
the score given to whites (by the respondent) from the score given to blacks; the
score is about blacks/whites being perceived as lazy and unintelligent, and this way
of measuring prejudice follows Kinder and Mendelberg (1995), Hutchings (2009) and
Piston (2010). The prejudice variable lies between 0 and 1, with “0” being the most
positive for blacks (i.e. blacks seen as diligent and intelligent, with whites seen as lazy
and unintelligent) and “1” being the most negative for blacks. Unfortunately, the
prejudice variable is available only in 2008, so predicted prejudice values have to be
estimated for 2012 in the next section. Figure 3 shows the histogram of “Prejudice” in
2008 by different groups.
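The construction described above can be sketched in Python. This is only an illustration under stated assumptions: the text does not give the rating scale or the exact rescaling, so the 1-to-7 scale (higher means more "lazy" or "unintelligent"), the averaging over traits, and the function name `prejudice_index` are all hypothetical choices made here. The sketch only shows how a black-minus-white score difference can be mapped to [0, 1] with 0.5 as the neutral point.

```python
def prejudice_index(black_scores, white_scores, lo=1.0, hi=7.0):
    """Map stereotype ratings to [0, 1]; 0.5 means both groups are rated alike.

    black_scores / white_scores: ratings on negative traits (e.g. lazy,
    unintelligent), one per trait, on an assumed lo..hi scale where a higher
    value is a more negative rating. The scale is an assumption, not taken
    from the paper.
    """
    if not black_scores or len(black_scores) != len(white_scores):
        raise ValueError("need the same non-zero number of ratings per group")
    span = hi - lo
    mean_black = sum(black_scores) / len(black_scores)
    mean_white = sum(white_scores) / len(white_scores)
    diff = mean_black - mean_white      # lies in [-span, span]
    return (diff + span) / (2.0 * span)  # rescaled to [0, 1]


neutral = prejudice_index([4, 5], [4, 5])        # same ratings for both groups
most_negative = prejudice_index([7, 7], [1, 1])  # blacks rated worst, whites best
most_positive = prejudice_index([1, 1], [7, 7])  # blacks rated best, whites worst
```

Under this rescaling a respondent who rates both groups identically scores 0.5, which matches the neutral point used in the discussion of Figure 3.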
For blacks, “Prejudice” is symmetrically distributed around 0.5, with 0.5 being
neutral (no prejudice). For whites, however, it is skewed to the right, even though it
peaks at neutral. By party affiliation, Republicans have a relatively high level of
prejudice compared to Democrats, whose distribution is almost symmetrical, and their
proportion of being neutral is also 10% lower than that of Democrats. However, the
prejudice of white Democrats alone might differ from this because most blacks
are Democrats.
To see the difference in the level of prejudice by group, we categorize respondents
according to education level, income level, region, and age. “College” denotes college
graduates, and “HIncome” denotes household income above the upper
33rd percentile. “Over50” indicates the group aged over 50. The regional dummy
variable “South” follows the region codes of the U.S. Census (see footnote 3). Forty-five
percent of “College” have no prejudice and only 18% have a high level of prejudice
(higher than 0.6), while 31% of “Non-College” (College = 0) have no prejudice and 29%
have a high level of prejudice. In the other categories, i.e. “South” vs. “Non-South”,
“HIncome” vs. “Non-HIncome”, and “Over50” vs. “Non-Over50”, there are no significant
differences in the density of “Prejudice”.
3 South: AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV.
Figure 3: Prejudice by Group
Figure 4 shows the density of “Prejudice” for whites only by party affiliation.
White Republicans have a substantially higher level of prejudice than white Democrats.
Even though the density for white Democrats is slightly skewed to the right, the
proportion of those being neutral is 12% higher than that of white Republicans.
Figure 4: Prejudice of Whites by Party
Even though “Prejudice” is self-reported, which results in underestimating real prejudice
for whites, Figures 3 and 4 clearly show that whites have a relatively high level of
prejudice against blacks, but that the levels of prejudice differ by
party affiliation.
A number of interesting conclusions emerged from the preliminary data analysis
based on the ANES time-series data and “Prejudice” in 2008. First, non-whites gave
strong support to Obama in 2008 and 2012, regardless of their party affiliation. Second,
there was a 1.3 percentage-point drop in the DCS probability of the white Democrats
and a 1.3 percentage-point increase for the white Republicans in 2008. This is an
interesting finding in that Democrats are known to be favourable to blacks, but
Republicans are not. This is verified by the “Prejudice” variable in the 2008 data, which
shows that white Republicans have a higher level of prejudice than white Democrats.
To investigate the second issue more precisely, in the next section we turn to
an estimation approach using micro-level data in 2008 and 2012. We focus on whether
the prejudice affected the 2008 and 2012 election outcomes differently by race and
party affiliation, controlling for the sample-selection problem. As most support for Obama
came from whites and there is little variation in non-whites’ voting behaviour, our
estimation analysis focuses on whites’ voting behaviour.
If our interest were only to see which group contributed to the victories of Obama in
2008 and 2012, there would be no reason to bother with the sample-selection problem,
namely that we can only observe voting decisions for those who voted. However, in the
main empirical analysis that follows, we focus on the prejudice issue, where the selection
problem is critical: whether voters with negative feelings against blacks are more likely
to participate in voting (in the 2008 election, white Democrats were more likely to
participate in voting). That is, the question is whether the prejudice has an effect in the
population, not just in the selected sample of voters, which calls for methods controlling
for the selection problem.
Before estimation, it is helpful to consider two possible interpretations
of the results we might obtain. If the prejudice matters for whites’ voting decisions,
it might work differently by party affiliation. For white Republicans, if they behaved
following their prejudice, it would work as a strong reason not to vote for Obama (race
and party affiliation), and their decision could be rationalized by party affiliation, not
race. In this case, it would amplify the negative effect of Republican on voting for
Obama, and the effect of the interaction between the prejudice and (white) Republican
would be negative. However, since they are known to have a high level of prejudice ‘as
a group’, it might put pressure on them and they might want to disprove this stigma.
If Obama were not elected, they would more easily be judged as racist and it would
hurt their political decency. Thus, the prejudice might lead some of them to vote
for Obama even if they still had negative feelings about blacks. And the higher their
prejudice, the more likely they might have been to vote for Obama to avoid
such blame. If this argument holds, the interaction effect between the prejudice and
(white) Republican would turn out to be positive.
Unlike Republicans, if white Democrats did not vote for Obama because
of prejudice, their decision could not be rationalized by anything other than race and
they would easily be judged as racist. In that case, they would behave against their
prejudice, so that the prejudice might be irrelevant or have a small positive effect. On
the other hand, it is also possible that because the voting decision is not disclosed and
they are known to be favourable to blacks ‘as a group’, they would not be blamed even
if Obama were not elected. Thus, the negative prejudice might affect their decision,
subconsciously leading to their not voting for Obama. If this argument holds, the
interaction effect between the prejudice and (white) Democrats would be negative. If
voters consider Obama simply as a presidential candidate, not as a ‘black’ candidate, the
effect of the prejudice should be insignificant regardless of party affiliation.
4.3
Main Empirical Analysis
4.3.1
Variables and Descriptive Statistics
The individual ANES data for our main empirical analysis are repeated cross-sections.
These data include individual sample weights (see footnote 4) to retain the
representativeness of the samples over time. Although the weight is used for the
preliminary time-series data analysis, it is not used for our main empirical analysis; a
weight based on regressors does not affect the estimators’ consistency.
Let Y = 1 denote voting for Barack Obama (in the 2008 and 2012 elections), and
D = 1 denote participation in voting. Y is observed only for those who voted (D = 1).
Table 2 presents the descriptive statistics of the variables used; both unweighted and
weighted numbers are displayed, with the weighted numbers in ( ). As described in the
previous section, “College” takes 1 for college graduates, “HIncome” indicates household
income above the upper 33rd percentile, and “Over50” is a cohort dummy
for age over 50. The regional dummy variables “West”, “N. Central” (North Central),
“N. East” (Northeast), and “South” follow the region codes of the U.S. Census (see
footnote 5). “NoCareWhoWin” indicates not caring about who will win the presidential
election; this is used as the exclusion restriction variable R. “ThmLiberal” is the feeling
“thermometer” toward Liberals; it ranges from 0 to 100, with 50 being neutral, and this
variable is used as the special regressor W. Table 2 shows that blacks and Hispanics are
indeed over-represented in the sample, and as a consequence, Democrats are
over-represented whereas Republicans are under-represented. We exclude the individuals
with missing values in the variables used in our analysis.
4 For details, see the documentation for the weight in ANES at http://www.electionstudies.org/studypages/cdf/cdf.htm.
5 Northeast: CT, ME, MA, NH, NJ, NY, PA, RI, VT; North Central: IL, IN, IA, KS, MI, MN,
MO, NE, ND, OH, SD, WI; South: AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC, OK, SC, TN,
TX, VA, WV; West: AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY.
Table 2. Unweighted (Weighted) Descriptive Statistics
                    2008                       2012
Variable      Mean             SD        Mean             SD
Y|D=1         0.654 (0.544)              0.678 (0.524)
D             0.781 (0.788)              0.742 (0.763)
Black         0.235 (0.115)              0.250 (0.118)
White         0.539 (0.761)              0.461 (0.725)
Hisp          0.187 (0.076)              0.224 (0.100)
Over50        0.402 (0.395)              0.310 (0.359)
West          0.253 (0.212)              0.242 (0.222)
N.Central     0.173 (0.215)              0.186 (0.231)
N.East        0.108 (0.142)              0.157 (0.183)
South         0.466 (0.431)              0.415 (0.364)
College       0.234 (0.296)              0.266 (0.319)
HIncome       0.245 (0.314)              0.212 (0.345)
Demo          0.592 (0.505)              0.629 (0.498)
Rep           0.309 (0.397)              0.295 (0.427)
NoCareWW      0.173 (0.185)              0.852 (0.859)
ThmLiberal    56.90 (54.36)    20.8      54.72 (51.85)    21.2
Prejudice     0.540 (0.546)    0.12
N             1614                       1534
In Table 2, the weighted means of most variables are similar in 2008
and 2012, except “NoCareWhoWin”, which is only 0.185 in 2008 but 0.859 in 2012.
The 2008 election evidently received huge attention because it featured the first black
candidate nominated for president, so most voters cared about who would win the
presidential election; this was not the case in 2012.
4.3.2
Estimation Results with 2008 and 2012 data
Recall the model (1.1) and (1.2) in the estimator section:
$$Y^* = \beta_w W + X'\beta_x + U, \qquad Y = 1[Y^* > 0], \qquad Y \text{ is observed only if } D = 1; \tag{1.1}$$
$$P(D = 1 \mid Z) = G(Z'\alpha_0) \text{ for a function } G(\cdot) \text{ and a parameter } \alpha_0, \qquad Z = (R, X')', \tag{1.2}$$
where $R$ is a scalar regressor.
As $X$, we control for several variables described in the previous section. For LBS, we
need to estimate $f_{W|Z,D=1}$ and $f_{W|Z'\alpha_0,D=1}$. Following a suggestion in Dong
and Lewbel (2015), we estimated $f_{\varepsilon_1|Z,D=1}$ and $f_{\varepsilon_2|Z'\alpha_0,D=1}$
instead of $f_{W|Z,D=1}$ and $f_{W|Z'\alpha_0,D=1}$, where $\varepsilon_1$ and $\varepsilon_2$
are the error terms of the linear models for $W|Z$ and $W|Z'\alpha_0$. We used
a cross-validation method to choose the bandwidth for $\hat{f}$ and $\hat{E}$. The 90% asymptotic
statistical significance is shown with ‘*’ along with the 90% confidence interval (CI)
in ( ); i.e. if the nonparametric (re-sampling) bootstrap 90% CI excludes 0, then
‘*’ is attached. The identified parameter is a ratio $\tilde{\beta} \equiv \beta_x/\beta_w$, which is an
effect relative to the effect of the special regressor $W$, and the sign of the identified
parameter is not changed because the sign of $\beta_w$ is presumably positive. In the
estimation, we rescaled “ThmLiberal” to range from 0 to 1; otherwise the identified
parameters are too big.
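The significance rule described above (attach a mark when the bootstrap 90% CI excludes 0) can be sketched in Python. This is a minimal illustration, assuming a percentile bootstrap; the text does not say which bootstrap CI construction was used. The names `bootstrap_ci` and `is_significant` and the toy data are hypothetical, and the estimator is a plain sample mean standing in for LBS.

```python
import numpy as np


def bootstrap_ci(data, estimator, n_boot=999, level=0.90, seed=0):
    """Percentile bootstrap CI for a scalar estimator (assumed construction)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        draws[b] = estimator(data[idx])
    alpha = 1.0 - level
    lower, upper = np.quantile(draws, [alpha / 2.0, 1.0 - alpha / 2.0])
    return float(lower), float(upper)


def is_significant(ci):
    """True when the interval excludes zero, i.e. the mark would be attached."""
    lower, upper = ci
    return not (lower <= 0.0 <= upper)


# Toy data only: 200 draws centred at 5, so the CI should sit well away from 0.
toy = np.random.default_rng(1).normal(loc=5.0, scale=1.0, size=200)
toy_ci = bootstrap_ci(toy, np.mean)
```

Applied to intervals reported in the text, `is_significant((0.69, 5.18))` is true while `is_significant((-0.44, 0.97))` is false.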
Since the ANES data are repeated cross-sections, we pooled the 2008 and 2012 data and
treated them as cross-sectional data, including the time dummy “Year2012”. One of
the difficulties of our data is that the key variable “Prejudice” is observed only in 2008.
We assumed a linear model for “Prejudice” in 2008 and estimated it (see footnote 6) to
calculate its predicted values in 2012. The fitted values were used for 2008, as well as
2012, instead of the true values.
6 As covariates, we included the following variables: age, female, regions, three income levels,
employment status, education, thermometers for race, party, and sexual minorities, economic
expectations, and opinions about blacks. Only 2008 variables were used in estimating the regression
model of “Prejudice”, and the 2012 variables were plugged into the prediction function to calculate
the predicted values of “Prejudice” for 2012. Since some of the covariates were not used in the
regression models of Y, the predicted prejudice can be seen as an IV. Detailed definitions of the
variables and a data description are available upon request.
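The prediction step described above (fit a linear model on 2008, then plug in 2012 covariates) can be sketched as follows. This is an illustrative sketch on synthetic data, not the author's specification: the three covariates, the coefficient values, and the names `fit_ols` and `predict` are hypothetical, and ordinary least squares via NumPy is assumed as the fitting method.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y on [1, X]; returns intercept first."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Linear prediction from coefficients estimated by fit_ols."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef


# Synthetic stand-ins for the 2008 and 2012 covariates (assumption, not ANES data).
rng = np.random.default_rng(0)
true_coef = np.array([0.5, 0.05, -0.03, 0.02])
X08 = rng.normal(size=(500, 3))
prej08 = true_coef[0] + X08 @ true_coef[1:] + rng.normal(scale=0.01, size=500)
X12 = rng.normal(size=(400, 3))

coef = fit_ols(X08, prej08)        # estimated with 2008 data only
prej08_hat = predict(coef, X08)    # fitted values replace observed 2008 values
prej12_hat = predict(coef, X12)    # predicted values for 2012
```

Using fitted values in both years, as the text describes, keeps the regressor defined the same way across the pooled sample.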
Table 3. Results Using LBS for 2008/2012 US Presidential Election
Variable          (1)               (2)               (3)               (4)                          (5)
White             -0.218*           0.092             0.273             0.289                        0.327
                  (-0.27, -0.13)    (-0.44, 0.97)     (-0.37, 0.74)     (-0.39, 0.69)                (-0.54, 0.74)
Wht×Rep           0.308*            -2.034*           -1.912*           -1.614*                      -1.281
                  (0.19, 0.40)      (-2.93, -0.12)    (-2.66, -0.3)     (-2.49, -0.05)               (-2.17, 0.42)
Wht×Rep×Prej                        3.715*            3.495*            3.000*                       2.422
                                    (0.69, 5.18)      (0.85, 4.71)      (0.38, 4.53)                 (-0.48, 3.84)
Prej×Rep                            -2.930*           -2.476*           -2.064*                      -1.742
                                    (-3.92, -0.14)    (-3.32, -0.24)    (-3.4, -0.08)                (-3.01, 0.35)
Prej×White                          -0.486            -0.824            -0.881                       -0.969
                                    (-2.05, 0.38)     (-1.61, 0.30)     (-1.61, 0.34)                (-1.7, 0.57)
Prejudice         0.290             0.457             0.307             0.032                        -0.060
                  (-0.02, 0.46)     (-0.13, 0.82)     (-0.27, 0.55)     (-0.40, 0.32)                (-0.45, 0.29)
Republican        -0.432*           1.359             1.062             0.814                        0.636
                  (-0.51, -0.3)     (-0.28, 2.03)     (-0.22, 1.67)     (-0.36, 1.65)                (-0.66, 1.43)
Year2012          0.019             0.037             0.029             0.026                        0.019
                  (-0.06, 0.09)     (-0.08, 0.10)     (-0.42, 0.08)     (-0.02, 0.07)                (-0.02, 0.06)
Controlled Var.   -                 -                 (HIncome)         (HIncome, College, Over50)   (HIncome, College, Over50, Region)
Table 3 reports estimation results using LBS with different covariates controlled
for. In column (1), the effect of “White” and the effect of “Republican” are shown
to be significantly negative, which implies that whites and Republicans generally do
not vote for Obama. The coefficient of “White” could be interpreted as the effect
of white Democrats by controlling for “White × Republican”, in that the majority of
non-Republicans among whites are Democrats. Thus, it demonstrates that white
Democrats are less likely to vote for Obama relative to non-whites. The effect of
“Prejudice” is found to be almost zero and insignificant in column (1). However, once
interaction terms among “Prejudice”, “White” and “Republican” are included, the
effect of “Prejudice” turns out to be positive for white Republicans. The coefficient
of “White × Republican × Prejudice” is significantly positive in columns (2) to (4) with
different covariates controlled.
To see the effects accurately, the predicted willingness to vote for Obama ($Y^*$) for
white Republicans in column (2) is:
$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.092 - 2.034 + 3.715p - 2.93p - 0.486p + 0.457p + 1.359 + 0.037y,$$
where $\tilde{\beta}_0$ is a non-identified intercept. If a white Republican has a strongly negative
feeling about blacks ($p = 1$), her/his willingness to vote for Obama ironically increases
by 0.756 (= 3.715 − 2.93 − 0.486 + 0.457), compared to a white Republican with a
strongly positive feeling about blacks ($p = 0$). Interestingly, white Republicans with
higher prejudice are more willing to vote for Obama. Since this effect is relative to that of
“ThmLiberal”, it implies that the size of the effect of the prejudice is roughly three quarters of
the effect of the feeling toward Liberals.
On the other hand, the predicted willingness for white Democrats to vote for
Obama can be written as:
$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.092 - 0.486p + 0.457p + 0.037y.$$
If a white Democrat has a strongly negative feeling about blacks ($p = 1$), then her/his
willingness to vote for Obama decreases by 0.029, compared to a white Democrat
having a strongly positive feeling about blacks ($p = 0$). However, the size of the effect
is almost zero and none of the coefficients for white Democrats are significant.
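The arithmetic behind these prejudice effects can be checked with a short Python sketch. The coefficient values are the column (2) and column (3) estimates quoted in the text; the dictionary keys and the function name `prejudice_effect` are hypothetical labels introduced here for illustration.

```python
# Point estimates quoted in the text for columns (2) and (3) of Table 3.
COEF = {
    "col2": {"wht_rep_prej": 3.715, "prej_rep": -2.930,
             "prej_white": -0.486, "prejudice": 0.457},
    "col3": {"wht_rep_prej": 3.495, "prej_rep": -2.476,
             "prej_white": -0.824, "prejudice": 0.307},
}


def prejudice_effect(c, republican):
    """Change in predicted willingness for a white voter when p goes from 0 to 1.

    Sums every coefficient that multiplies p: the two terms shared by all
    whites, plus the two Republican interaction terms when republican is True.
    """
    effect = c["prej_white"] + c["prejudice"]
    if republican:
        effect += c["wht_rep_prej"] + c["prej_rep"]
    return effect


rep_col2 = round(prejudice_effect(COEF["col2"], republican=True), 3)
dem_col2 = round(prejudice_effect(COEF["col2"], republican=False), 3)
rep_col3 = round(prejudice_effect(COEF["col3"], republican=True), 3)
dem_col3 = round(prejudice_effect(COEF["col3"], republican=False), 3)
```

The effects are relative to the normalized coefficient of the special regressor, so they are comparable across columns only in that relative sense.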
After controlling for income level in column (3), the predicted willingness to vote
for Obama for white Republicans is
$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 1.912 + 3.495p - 2.476p - 0.824p + 0.307p + 1.062 + 0.029y.$$
If a white Republican changes her/his prejudice level from zero to one, then her/his
willingness to vote for Obama increases by 0.502. The predicted willingness to vote for
Obama for white Democrats can be written as:
$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 0.824p + 0.307p + 0.029y.$$
If a white Democrat changes her/his prejudice level from zero to one, then her/his
willingness to vote for Obama decreases by 0.517, but still none of the coefficients for
white Democrats are significant.
Given a level of prejudice $p$, for whites the predicted willingness to vote for Obama
can be written as:
$$E(Y^* \mid White = 1, Rep = r, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 1.912r + 3.495\,r p - 2.476\,r p - 0.824p + 0.307p + 1.062r + 0.029y.$$
If party affiliation changes from Democrat to Republican, willingness to vote for Obama
changes by $-0.85 + 1.019p$, which implies that the effect of Republican changes from
negative to positive as the level of prejudice increases.
For white Republicans, this positive prejudice effect is highly counterintuitive. It
could be seen as evidence for our second argument about white Republicans’ voting
behaviour: the more prejudice white Republicans had, the more likely they were to
vote for Obama in order to avoid being judged as racist. Since it was the first election
where a black candidate ran for president, white Republicans might have been under
strong pressure since they were known to have a high level of prejudice ‘as a group’.
If Obama were not elected, then it could have been attributed to white Republicans
and they would easily have been judged as racist. They might have struggled between
party identity and being blamed for being racist, and it might have pushed them to
vote for Obama reluctantly in order to escape the stigma of racism (Steele, 2008).
Thus, prejudice ultimately played a crucial role in reducing the negative effect of being
a white Republican.
On the other hand, for white Democrats the prejudice seems to have affected their
voting decisions differently. Since they are known to be favourable to blacks ‘as a
group’, they would not be blamed even if Obama were not elected. Thus, the negative
prejudice might affect their decision, subconsciously leading to their not voting for
Obama. However, this effect is shown to be insignificant.
As other variables, such as “College”, “Over50”, and regional variables, are controlled
for, the effect of “Prejudice” for white Republicans gradually decreases. Once
regional variables are controlled for, all coefficients become insignificant. This may be
because the effect of “Prejudice” for whites by party affiliation differs by region.
To address this issue, we estimate the same model used in column (2) for each
region, i.e. “Northeast”, “North Central”, “South”, and “West”, and the results are
presented in Table 4.
In Table 4, the effects of “Prejudice” for white Democrats and white Republicans
are shown to be insignificant, except in the South. The predicted willingness for white
Republicans in the South to vote for Obama is
$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 3.788 + 6.509p - 4.46p - 1.313p + 0.73p + 2.228 + 0.124y.$$
If a white Republican changes her/his prejudice level from zero to one, then her/his
willingness to vote for Obama increases by 1.466. On the other hand, the predicted
willingness for white Democrats in the South to vote for Obama is
$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 1.313p + 0.73p + 0.124y.$$
If a white Democrat changes her/his prejudice level from zero to one, then her/his
willingness to vote for Obama decreases by 0.583. However, none of the coefficients
for white Democrats are significant. In the South, the effect of “Prejudice” for white
Republicans is strongly positive and higher than at the national level, whereas the effect
of “Prejudice” for white Democrats is shown to be negative. It implies that white
Republicans in the South are more likely to vote for Obama the higher their level
of prejudice, which makes sense in that they might be under considerable pressure from
the label ‘South’ in addition to the label ‘Republican’, both of which are associated with a
high level of prejudice.
Given the level of prejudice $p$, for whites in the South the predicted willingness to
vote for Obama can be written as:
$$E(Y^* \mid White = 1, Rep = r, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 3.788r + 6.509\,r p - 4.46\,r p - 1.313p + 0.73p + 2.228r + 0.124y.$$
If party affiliation changes from Democrat to Republican, then willingness to vote for
Obama changes by $-1.56 + 2.049p$, which implies that if the prejudice level is higher
than 0.76, then the effect of “Republican” becomes positive.
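The party-switch effect and the prejudice level at which it changes sign can be reproduced with a few lines of Python. The coefficients are the South estimates and the national column (3) estimates quoted in the text; the dictionary keys and the function names `republican_effect` and `crossover` are hypothetical labels used only for this sketch.

```python
# Coefficients on the terms that involve Rep, as quoted in the text.
SOUTH = {"wht_rep": -3.788, "republican": 2.228,
         "wht_rep_prej": 6.509, "prej_rep": -4.46}
NATIONAL_COL3 = {"wht_rep": -1.912, "republican": 1.062,
                 "wht_rep_prej": 3.495, "prej_rep": -2.476}


def republican_effect(c, p):
    """Change in predicted willingness for a white voter when Rep goes 0 -> 1."""
    intercept = c["wht_rep"] + c["republican"]   # terms multiplying r only
    slope = c["wht_rep_prej"] + c["prej_rep"]    # terms multiplying r * p
    return intercept + slope * p


def crossover(c):
    """Prejudice level at which the Republican effect switches sign."""
    intercept = c["wht_rep"] + c["republican"]
    slope = c["wht_rep_prej"] + c["prej_rep"]
    return -intercept / slope


south_at_zero = round(republican_effect(SOUTH, 0.0), 3)
south_slope = round(republican_effect(SOUTH, 1.0) - republican_effect(SOUTH, 0.0), 3)
south_cross = round(crossover(SOUTH), 2)
national_cross = round(crossover(NATIONAL_COL3), 2)
```

Both crossover points lie inside the [0, 1] range of “Prejudice”, so the sign change occurs at prejudice levels that are attainable in the data.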
In other regions, the conclusion does not differ much from that on the national
level, but we cannot confirm this conclusion because of the insignificant coefficients.
These insignificant results might be due to the small sample sizes.
Table 4. Results Using LBS for Regions
Variable        N.Central         Northeast         South             West
White           -0.259            -0.898            0.513             -0.023
                (-1.25, 1.02)     (-2.36, 0.13)     (-0.47, 1.65)     (-1.26, 1.29)
Wht×Rep         0.411             1.419             -3.788*           -2.303
                (-3.68, 4.56)     (-5.4, 7.04)      (-4.95, -0.48)    (-4.21, 0.42)
Wht×Rep×Prej    -0.478            -3.046            6.509*            3.81
                (-8.18, 6.35)     (-13.5, 8.93)     (1.31, 8.54)      (-1.04, 7.16)
Prej×Rep        0.587             2.217             -4.46*            -3.432
                (-6.34, 7.97)     (-9.37, 12.8)     (-5.67, -0.08)    (-6.22, 0.44)
Prej×White      -0.12             1.263             -1.313            0.048
                (-2.13, 2.00)     (-0.72, 4.09)     (-3.37, 0.37)     (-2.37, 2.38)
Prejudice       0.281             0.206             0.73              0.338
                (-0.43, 0.75)     (-0.62, 0.77)     (-0.03, 0.86)     (-0.07, 0.68)
Republican      -0.798            -1.251            2.228             1.704
                (-5.22, 3.31)     (-6.97, 5.85)     (-0.44, 2.93)     (-0.62, 3.44)
Year2012        -0.023            -0.051            0.124             -0.093
                (-0.11, 0.15)     (-0.24, 0.08)     (-0.03, 0.21)     (-0.23, 0.08)
Sample Size     550               406               1366              764
4.4
Comparison to MLE
For the sake of comparison, estimation results of MLE for both selection and
outcome equations are presented in Table 5, along with the results of LBS. We estimate
two different MLEs: the conventional MLE with homoskedastic errors (MLE_ho) and
another MLE with an (arbitrary) known form of heteroskedastic errors (MLE_he), with
$SD(u|Z)$ specified as a linear index in $\tilde{Z} \equiv$ (Prejudice, Republican,
Prejudice × Republican)$'$. Since we use probit for the D equation estimation of LBS, its
results are almost the same as the selection-equation estimates in the MLE: because the
correlation estimate in the bottom row shows that ‘$H_0: \rho = 0$’ cannot be rejected, the
selection-equation estimate in MLE is almost the same as the probit for the D equation.
To simplify the comparison, we present the MLE estimates of $\tilde{\beta} \equiv \beta_x/\beta_w$,
which is a coefficient normalized by the slope of the special regressor. The 90% asymptotic
statistical significance is shown for MLE with ‘*’ along with the 90% confidence interval (CI).
In MLE_ho, the first column, presenting results for the D equation, shows that
the effect of “Republican” on voting for whites is $0.849 - 0.638p$. It implies that white
Republicans are more likely to participate in voting, but this effect decreases as they
have a higher level of prejudice. The prejudice turns out to have a negative effect
on voting for whites. Applying probit to the D equation and doing the likelihood
ratio test for zero slopes for the four prejudice variables, we reject the null with
p-value 0.00: prejudice does matter in the decision to vote or not to vote. As expected,
“NoCareWhoWin” has a strong effect on voting and its effect significantly differs between
the 2008 and 2012 data; ironically, people who did not care who would win the presidential
election were more likely to participate in voting in 2012 (its marginal effect in 2012
turns out to be positive).
Turning to $\tilde{\beta}$ in MLE_ho, the signs of the significant coefficients
match between MLE_ho and LBS, but the sizes of the coefficients differ from LBS.
“White × Republican”, “White × Republican × Prejudice”, and “Prejudice × Republican”
are significant in both MLE_ho and LBS, and “White”, “Prejudice × White”, and
“Republican” are insignificant in both MLE_ho and LBS. However, “Prejudice” is
substantially different, which might be due to the presence of heteroskedastic
errors. As mentioned in the simulation section, if the SD of the error term u is a function
of regressors, then the identified coefficients of MLE are biased.
Table 5. Comparison Results of MLE
Variable
MLE_ho
in D
Intercept
White
White Rep
2.329
Prejudice Wht
Prejudice
-0.027
(-0.79, 1.32)
(-1.76, 1.70)
3.527
-5.049
4.411
8.071
-5.465
-1.504
(-2.44, 1.31)
(-4.56, 1.55)
-2.678
-1.274
2.199
-0.014
-2.535
-4.215
4.301
-2.913
-0.801
(-0.07, 0.20)
3.532
-5.055
4.406
-0.587
-2.465
-2.676
-1.274
-7.719
Y_e
Y_e
0.494
0.092
2.006
(-0.44, 0.97)
-2.604
(-9.44, -6.0)
14.56
4.911
(11.7, 17.4)
-6.109
-2.061
-1.933
-1.408
-0.040
-0.001
(-0.07, 0.06)
1.000
2.964
(1.51, 2.24)
(1.92, 4.01)
0.008
-0.014
(-0.32, 0.33)
(-0.09, 0.07)
0.037
(-0.08, 0.10)
(1.92, 2.35)
1.876
1.359
(-0.28, 2.03)
2.132
(1.92, 2.35)
0.457
(-0.13, 0.82)
(-1.23, 0.99)
-0.004
-0.486
(-2.05, 0.38)
(-8.3, -0.05)
-0.120
-2.930
(-3.92, -0.14)
(-11.3, -0.11)
-4.176
3.715
(0.69, 5.18)
(-6.71, -5.51)
-5.728
-2.034
(-2.93, -0.12)
(-1.14, -0.85)
2.132
ThmLiberal
(-1.56, 4.49)
LBS
-0.995
(-1.14, -0.85)
NCWW Y12
(-1.88, 2.43)
(-1.43, -1.12)
-0.995
5.946
1.463
(-3.8, -1.56)
0.035
in Y
0.279
(-3.56, -1.37)
0.663
u
(4.02, 7.88)
(-4.43, 3.26)
-2.247
=
(1.73, 2.91)
(2.61, 6.2)
(-1.52, 4.01)
0.065
2.323
(-6.01, -4.1)
(-6.44, -1.99)
1.244
in D
(3.01, 4.05)
(2.18, 13.96)
-0.565
-2.475
Y_e
(-8.20, -1.32)
(-10.1, -0.77)
(-1.43, -1.12)
NoCareWW
-4.756
(1.35, 7.47)
(-4.50, -0.86)
Year2012
4.126
0.267
(-3.55, -1.40)
Republican
in Y
(2.89, 5.36)
(-9.01, -1.09)
Prejudice Rep
u
(1.74, 2.92)
(1.19, 5.86)
Wht Rep Prej
=
MLE_he
1.000
We examine an arbitrary form of heteroskedastic errors in MLE, which is a function
of “Prejudice” and “Republican”, and present its results in the next three columns of
Table 5, denoted by MLE_he. When this specific form of heteroskedastic errors is
allowed, MLE’s results for $\tilde{\beta}$ change considerably. Based on MLE’s results, if a white
Republican changes her/his prejudice level from zero to one, then her/his willingness
to vote for Obama is shown to decrease by 1.66 in MLE_ho, but by 0.491 in MLE_he.
Thus, we cannot eliminate the possibility of inconsistency of MLE due to
heteroskedasticity. Additionally, the zero correlation coefficient in both MLEs, which
indicates no sample-selection problem, is in contrast to other literature in political science.
Did the prejudice against blacks play a role in the two presidential elections?
We conclude yes, because the variables interacting with “Prejudice” are found to be
significant in most cases. It is shown that “Prejudice” did not have a significant effect
for non-whites and white Democrats, but it did for white Republicans. They were less
likely to vote for Obama, but this negative effect decreased as their level of prejudice
increased. Also, this effect is found to be strong in the South, which is known to
be unfavourable to blacks. The results might be interpreted as follows. First, the
white Republicans might have been stigmatized as being prejudiced and longed
for ways to disprove the stigma. Thus, on the margin, they might have voted more
for Obama to avoid the stigma of racism. The white Democrats, however, might have
been less stigmatized and thus under no pressure to hide their prejudice; the prejudice
would then have affected their voting decision negatively, but this effect is found to be
insignificant.
5
Conclusions
In this paper, we proposed a new semiparametric estimator for binary-outcome
selection models. Unlike the MLE, our estimator does not require any distributional
assumption and allows heteroskedasticity with a linear index form. Our estimator, called
“LBS (Lewbelian Binary-outcome Selection estimator)”, is a multi-stage estimator in
need of a preliminary estimator for the selection equation as well as some conditional
means of regressors nonparametrically estimated. Unlike most parametric and
semiparametric estimators for this problem, our estimator for the outcome equation has a
closed-form expression and therefore does not require numerical optimization. LBS,
however, needs a regressor included in the selection equation but excluded from the
outcome equation, and a special regressor in the outcome equation à la Lewbel (2000);
this explains “Lewbelian” in the term LBS.
We applied the LBS to US presidential election data in 2008 and 2012, when
Obama won. Our main empirical focus was whether there were negative prejudice
components against blacks in the elections, which was suggested by both the literature
and our preliminary data analysis. When we controlled for the sample-selection problem,
our results showed that white Republicans voted more for Obama on the margin to avoid
the internal stigma of thinking themselves racists, as they have a high level of prejudice,
while white Democrats were either not affected or, if anything, negatively affected by the
prejudice they had. And this pattern was shown to be strong and significant in the South.
References
Ahn, H. and J. L. Powell, 1993, Semiparametric estimation of censored selection
models with a nonparametric selection mechanism, Journal of Econometrics, 58, 3-29.
Bertrand, M. and S. Mullainathan, 2004, Are Emily and Greg more employable
than Lakisha and Jamal? A …eld experiment on labor market discrimination, American
Economic Review, 94, 991-1013.
Charles, K. K. and J. Guryan, 2008, Prejudice and Wages: An Empirical Assessment of Becker’s The Economics of Discrimination, Journal of Political Economy, 116,
773-809.
Chen, S., 1999, Distribution-free estimation of the random coe¢ cient dummy endogenous variable model, Journal of Econometrics, 91, 171-199.
Chen, X., O. Lindon, and I. Van Keilegom, 2003, Estimation of semiparametric
models when the criterion function is not smooth, Econometrica, 71, 1591-1608.
Chen, S. and Y. Zhou, 2010, Semiparametric and nonparametric estimation of
sample selection models under symmetry, Journal of Econometrics, 157, 143-150.
Coker, D., 2003, Foreword: Addressing the real world of racial injustice in the
criminal justice system, Journal of Criminal Law and Criminology, 93, 827-880.
Cunningham, G.B., 2010, Understanding the under-representation of African American coaches: A multilevel perspective, Sport Management Review, 13, 395-406.
Donald, S. G., 1995, Two-step estimation of heteroscedastic sample selection models, Journal of Econometrics, 65, 347-380.
Dong, Y., and A. Lewbel, 2015, A simple estimator for binary choice models with
endogenous regressors, Econometric Reviews, 34, 82-105.
Dovidio, J. F. and S. L. Gaertner, 2000, Aversive racism and selection decisions:
1989 and 1999, Psychological Science, 11, 315-319.
Ehrlinger, J., E. A. Plant, R. P. Eibach, C. J. Columb, J. L. Goplen, J. W. Kunstman, and D. A. Butz, 2011, How exposure to the confederate ‡ag a¤ects willingness
to vote for Barack Obama, Political Psychology, 32, 131-146.
36
Escanciano, J. C., D. Jacho-Chavez, and A. Lewbel, 2012, Identi…cation and estimation of semiparametric two-step models, Unpublished manuscript.
Han, A. K., 1987, Non-parametric analysis of a generalized regression model, Journal of Econometrics, 35, 303-316.
Heckman, J. J., 1979, Sample selection bias as a speci…cation error, Econometrica,
47, 153-161.
Heckman, J. J., 1998, Detecting discrimination, Journal of Economic Perspectives,
12, 101-116.
Heckman, J. J. and P. Siegelman, 1992, The urban institute audit studies: Their
methods and …ndings, in clear and convincing evidence: measurement of …scrimination
in America, edited by Fix, M. and R. J. Struyk, Washington, DC: Urban Institute
Press.
Hodson, G., J. F. Dovidio, and S. L. Gaertner, 2002, Processes in racial discrimination: Di¤erential weighting of con‡icting information, Personality and Social
Psychology Bulletin, 28, 460-471.
Hodson G., H. Hooper, J. F. Dovidio, and S. L. Gaertner, 2005, Aversive racism
in Britain: The use of inadmissible evidence in legal decisions, European Journal of
Social Psychology, 35, 437-448.
Hutchings, V. L., 2009, Change or more of the same? Evaluating racial attitudes
in the Obama era, Public Opinion Quarterly, 73, 917-942.
Ichimura, H., 1993, Semiparametric least squares (SLS) and weighted SLS estimator of single-index models, Journal of Econometrics, 58, 71-120.
Kinder, D. R. and A. Dale-Riddle, 2011, The end of race?, New Haven: Yale
University Press.
Kinder, D. R. and T. Mendelberg, 1995, Cracks in American apartheid: The
political impact of prejudice among desegregated Whites, Journal of Politics, 57, 402-424.
Klein, R. W., and R. H. Spady, 1993, An efficient semiparametric estimator for
discrete choice models, Econometrica, 61, 387-421.
Klein, R. W., C. Shen, and F. Vella, 2015, Semiparametric selection models with
binary outcomes, Journal of Econometrics, 185, 82-94.
Knowles, E. D., B. S. Lowery, and R. L. Schaumberg, 2010, Racial prejudice
predicts opposition to Obama and his health care reform plan, Journal of Experimental
Social Psychology, 46, 420-423.
Lewbel, A., 2000, Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables, Journal of Econometrics, 97, 145-177.
Lewbel, A., 2007, Endogenous selection or treatment model estimation, Journal
of Econometrics, 141, 777-806.
Lewis-Beck, M., C. Tien, and R. Nadeau, 2010, Obama’s missed landslide: A
racial cost?, Political Science and Politics, 43, 69-76.
Mas, A. and E. Moretti, 2009, Racial bias in the 2008 presidential election,
American Economic Review: Papers & Proceedings, 99, 323-329.
Newey, W., J. L. Powell, and J. R. Walker, 1990, Semiparametric estimation of
selection models: Some empirical results, American Economic Review, 80, 324-328.
Piston, S., 2010, How explicit racial prejudice hurt Obama in the 2008 election,
Political Behavior, 32, 431-451.
Powell, J. L., 1987, Semiparametric estimation of bivariate latent variable models,
Department of Economics, University of Wisconsin-Madison, SSRI Working Paper No.
8704.
Powell, J. L., 2001, Semiparametric estimation of censored selection models, in
Nonlinear statistical modeling, edited by C. Hsiao, K. Morimune, and J. L. Powell,
Cambridge University Press.
Powell, J. L., J. H. Stock, and T. S. Stoker, 1989, Semiparametric estimation of
index coefficients, Econometrica, 57, 1403-1430.
Raphael, S., M. A. Stoll, and H. J. Holzer, 2000, Are suburban firms more likely
to discriminate against African Americans?, Journal of Urban Economics, 48, 485-508.
Redlawsk, D. P., C. J. Tolbert, and W. Franko, 2010, Voters, emotions, and race
in 2008: Obama as the first black president, Political Research Quarterly, 63, 875-889.
Robinson, P., 1988, Root-N-consistent semiparametric regression, Econometrica,
56, 931-954.
Schaffner, B. F., 2011, Racial salience and the Obama vote, Political Psychology,
32, 963-88.
Sherman, R. P., 1993, The limiting distribution of the maximum rank correlation
estimator, Econometrica, 61, 123-37.
Steele, S., 2008, Obama’s post-racial promise, The Los Angeles Times.
Stoll, M. A., S. Raphael, and H. J. Holzer, 2004, Black job applicants and the
hiring officer’s race, Industrial and Labor Relations Review, 57, 267-287.
Tesler, M. and D. O. Sears, 2010, Obama’s race: The 2008 election and the dream
of a post-racial America, University of Chicago Press.
Thernstrom, A., 2008, Great black hope? The reality of president-elect Obama,
National Review Online.
Appendix I
Table 1 (continued). Simulation Results for LBS and MLE

                          LBS                                             MLE
                          1/w             1/u             w/u             1/w
                    N     Bias    RMSE    Bias    RMSE    Bias    RMSE    Bias    RMSE    FP
(5) ri = 1[ri > 0]  500   -0.040  0.173   -0.008  0.177   0.012   0.211   -0.006  0.066   25.7%
                    1000  -0.015  0.119   -0.010  0.107   -0.010  0.144   -0.008  0.050   9.3%
                    2000  -0.004  0.082   -0.003  0.078   -0.015  0.098   0.002   0.033   1.3%
(6) = 0.05          500   -0.005  0.139   0.062   0.204   0.085   0.274   0.009   0.073   4.0%
                    1000  0.005   0.093   0.022   0.130   0.056   0.189   -0.003  0.041   0.0%
                    2000  0.002   0.070   0.019   0.091   0.029   0.130   0.002   0.034   0.0%
(7) = 0.95          500   -0.015  0.131   -0.141  0.201   -0.270  0.342   -0.006  0.078   19.7%
                    1000  -0.006  0.090   -0.149  0.176   -0.293  0.322   -0.002  0.048   6.3%
                    2000  -0.003  0.069   -0.166  0.177   -0.331  0.344   -0.001  0.034   0.7%
Figure 5: Voting Behavior for Hispanic and Other by Party