Prejudice Matters in Elections?: An Estimator for Binary Outcomes with Sample-Selection

Jin-Young Choi¹ (Oct. 2015)

Abstract

In this paper, we propose a new semiparametric estimator for binary-outcome selection models that does not impose any distributional assumptions, and only imposes an index assumption on the selection equation. We adopt the idea in Lewbel (2000) of using a special regressor to transform the binary Y in a way that is linear in the latent index, and then remove the selection correction term by differencing as in the case where Y is linear. We apply our estimator to US presidential election data in 2008 and 2012 to assess the impacts of racism (a variable that measures prejudice) on the election of Barack Obama. When we control for the sample-selection problem, our results show that prejudice does not significantly affect support for Obama among white Democrats, but has a positive effect among white Republicans. This pattern is found to be strong and significant in the South.

Keywords: Sample-Selection, Binary, Semiparametric estimator, Elections, Prejudice.
JEL codes: C14, C35, D72.

¹ Economics and Business, Goethe University Frankfurt, Theodor-W.-Adorno-Platz 4, 60629, Frankfurt am Main, Germany, Tel. (+49) 69-798-34757, [email protected]. I am very grateful to Arthur Lewbel and Myoung-jae Lee for their comments. I also wish to extend thanks to participants of seminars at Korea U and Sogang U for their helpful comments. All errors are my own, and comments are very welcome.

1 Introduction

Sample-selection models consist of a selection equation and an outcome equation. For a continuous outcome/response variable, many semiparametric estimators have been proposed in the literature, in addition to the fully parametric maximum likelihood estimator (MLE) and the nearly parametric Heckman (1979) two-stage estimator. Semiparametric estimators differ in their assumptions.
Some but not all require an exclusion restriction, typically a regressor that is included in the selection equation but excluded from the outcome equation. Semiparametric estimators also vary in whether they allow for unknown forms of heteroskedasticity or not, and some identify the outcome equation intercept while others do not. Newey et al. (1990) adapt Robinson's (1988) two-stage approach for sample-selection models, while Ahn and Powell (1993) and Powell (1987, 2001) use pairwise differencing methods. These estimators require an exclusion restriction. Donald (1995) imposes normality but allows for a general form of heteroskedasticity and does not require an exclusion restriction. Chen's (1999) estimator imposes an error symmetry restriction and allows error terms to depend on the regressors only through the absolute value of a linear index, to obtain an estimator that does not require an exclusion restriction and includes identification of the intercept. Chen and Zhou (2010) propose a symmetry-based estimator allowing an unknown form of heteroskedasticity without normality, but with an exclusion restriction that the heteroskedasticity function depends only on a subset of regressors. Lewbel (2007) introduces a GMM-type estimator using a density weighting idea, which allows an unknown form of heteroskedasticity and identifies the outcome equation intercept, but requires a "special regressor" that is excluded from the heteroskedasticity function.

When the response variable is binary, estimating the sample-selection model becomes much more difficult: call such a model "a binary-outcome selection model". The most popular estimator for binary-outcome selection models is probably MLE, assuming joint normality of the two equation-error terms and independence between the errors and the regressors. This estimator is widely available in popular econometric software packages such as Stata.
However, MLE runs the risk of misspecification of normality or violation of the independence assumption. Also, MLE requires estimating the correlation coefficient between the two error terms, which is often only weakly identified because both errors are latent, thereby leading to numerical difficulties in practice. For binary-outcome selection models, there is no direct analogue to Heckman's (1979) two-stage estimator, although it is possible to add the usual selection correction term (the "inverse Mills ratio") into the latent response equation.

Semiparametric estimators for binary-outcome selection models are relatively scarce. Klein et al. (2015) propose a quasi-MLE under a double linear index assumption. They require each index to contain at least one continuous regressor and they require an exclusion restriction that the selection equation contains a continuous regressor excluded from the outcome equation. Escanciano et al. (2012) assume a double index model for the outcome equation: one linear and the other unknown. They do not require an exclusion restriction, but a continuous regressor should be included in the outcome equation. Their results apply to a more general class of models that include binary-outcome selection models as a special case.

In this paper, we propose a new semiparametric estimator for binary-outcome selection models that does not impose any distributional assumptions, and only imposes an index assumption on the selection equation. The estimator, however, does require a continuous special regressor as in Lewbel (2000, 2007) that satisfies a support restriction in the outcome equation, and a variable (which can be discretely distributed) that satisfies an exclusion restriction. Unlike most parametric and semiparametric estimators for this problem, including Klein et al. (2015), Escanciano et al.
(2012), and MLE, our estimator for the outcome equation has a closed-form expression and therefore does not require numerical optimization (our selection equation can be estimated in a variety of ways, some of which may entail numerical optimization, but this equation is still not the typical source of numerical problems such as those that arise from estimating the features of the joint distribution of the errors in the two equations).

For the sake of comparison, we conduct simulations using models that are favourable and unfavourable to MLE. We find that our estimator performs well in most cases. Its performance is robust to heterogeneity (caused by regressors) and non-normality of the error terms, while that of MLE is not. Also, it is computationally less time-demanding than MLE because it does not require estimation of the correlation coefficient and has a closed-form solution in the second stage.

We apply our estimator to US presidential election data in 2008 and 2012 to assess the impacts of racism (specifically a variable that measures prejudice) on the election of Barack Obama. We find evidence that prejudice does not significantly affect support for Obama among white Democrats, but has a positive effect among white Republicans. Also, this pattern is shown to be strong and significant in the South. These results could be interpreted as follows. Since white Republicans are known to have a high level of negative prejudice against blacks 'as a group', this might put pressure on them: if Obama were not elected, they would more easily be judged as racist and it would hurt their political decency. They might want to disprove this stigma, and the prejudice might lead some of them to vote for Obama who otherwise would not have voted Democrat, even if they still had negative feelings about blacks. On the other hand, white Democrats are known to be favourable to blacks 'as a group', and thus might feel no pressure to alter their decision due to prejudice.
We also find that our results slightly differ from those of MLE; the signs of significant coefficients are matched, but their magnitudes are not. When an (arbitrary) known form of heteroskedastic errors is allowed, MLE's results change considerably. Thus, we cannot eliminate the possibility of its inconsistency. Also, the correlation coefficient in MLE indicating presence of the sample-selection problem is found to be close to zero, in contrast to other research in political science.

The rest of this paper is organized as follows. Section 2 introduces our estimator. Section 3 does a preliminary empirical analysis after examining the relevant racism and own-race favour literature. Section 4 presents our main empirical findings. Finally, Section 5 is our conclusion.

2 Estimator

Let $1[A] = 1$ if $A$ holds and 0 otherwise. Our binary-outcome selection model is

$$Y = 1[W + X'\beta_0 + U > 0], \quad Z = (R, X')', \quad Y \text{ is observed only if } D = 1; \tag{1.1}$$

$$P(D = 1 \mid Z) = \Phi(Z'\alpha_0) \quad \text{for a function } \Phi(\cdot) \text{ and a parameter } \alpha_0; \tag{1.2}$$

$$E(U \mid Z, D = 1) = E(U \mid Z'\alpha_0, D = 1) \equiv g(Z'\alpha_0) \quad \text{for a function } g(\cdot) \tag{1.3}$$

where $R$ is a scalar regressor, $(D_i, W_i, Z_i, D_i Y_i)$ is observed, $i = 1, \ldots, N$, $W$ is a special regressor as in Lewbel (2000), $X$ and $Z$ are $k_x \times 1$ and $k_z \times 1$ regressor vectors with the first component being 1, $U$ is an error term, and $\alpha_0$ and $\beta_0$ are parameter vectors. We can replace the linear indices $X'\beta_0$ and $Z'\alpha_0$ with nonlinear ones, but we will stick to linear indices for simplicity.

In (1.1), the coefficient of $W$ is normalized to one, which is arranged by dividing both sides of the inequality by the slope of $W$; the sign of the slope of $W$ can be assumed to be known without loss of generality (and converted to positive by replacing $W$ with $-W$ if necessary) since it can be estimated at a rate faster than $\sqrt{N}$. The selection equation is assumed to satisfy the single index assumption in (1.2). Another single index assumption is imposed on $U$ in (1.3) because $U$ is assumed to depend on $Z$ only through $Z'\alpha_0$.
The model has an exclusion restriction that $R$ is excluded from the outcome equation and included in the selection equation, and an inclusion restriction that $W$ appears in the outcome equation. Although $W$ does not appear in the selection equation, $W$ does not have to be excluded from the selection equation because the selection equation may be taken as a 'reduced form' for $E(D \mid Z)$ that obeys the index restriction in (1.2).

Since $g(Z'\alpha_0)$ will be removed eventually by a differencing argument, we will not need to specify the function $g(\cdot)$. We also do not need to specify the functional form of $\Phi(\cdot)$ since there already exist semiparametric estimators, e.g. Han (1987), Powell et al. (1989), Sherman (1993), Klein and Spady (1993) or Ichimura (1993), allowing for an unknown $\Phi(\cdot)$. Our estimator allows this level of generality, though in the later empirical analysis probit will be used for simplicity, in which case $\Phi(\cdot)$ is just the standard normal distribution function.

To provide some intuition for our proposed estimator, suppose that $Y$ were continuous so that we could postulate $Y = W + X'\beta_0 + U$. Then adding and subtracting $E(U \mid Z'\alpha_0, D = 1)$ from $Y = W + X'\beta_0 + U$ would yield

$$Y = W + X'\beta_0 + g(Z'\alpha_0) + V \quad \text{where} \quad V \equiv U - E(U \mid Z'\alpha_0, D = 1). \tag{1.4}$$

Many of the semiparametric estimators in the literature (including most of those discussed in the previous section) use this equation to estimate $\beta_0$ by differencing out the 'selection correction term' $g(Z'\alpha_0)$ in some way. However, when $Y$ is binary as in (1.1), $g(Z'\alpha_0)$ appears inside of the $1[\cdot]$ function, which makes removing $g(Z'\alpha_0)$ by differencing the model infeasible. To overcome this problem, we adopt the idea in Lewbel (2000) of using a special regressor to transform the binary $Y$ in a way that is linear in the latent index $X'\beta_0 + g(Z'\alpha_0)$. We can then remove $g(Z'\alpha_0)$ by differencing as in the case where $Y$ is linear.
Applying Lewbel (2000) requires a special regressor $W$ that satisfies the following assumptions (letting $F$ denote a distribution function):

(i) $U \perp W \mid Z$ ($\Longrightarrow$ $U \mid (W, Z) \sim U \mid Z$, which follows the same distribution as $U \mid Z'\alpha_0$);
(ii) $F_{W|Z,D=1}$ is absolutely continuous with density $f_{W|Z,D=1}$;
(iii) the support of $W \mid Z, D = 1$ is $[W_l, W_h]$, which includes the support of $-X'\beta_0 - U$, where $-\infty \le W_l < 0 < W_h \le \infty$.

Define a transformed response:

$$\tilde{Y}_z \equiv \frac{Y - 1[W > 0]}{f_{W|Z,D=1}(W)}.$$

Then the following theorem, which is the key for our estimator, holds.

Theorem 1 Under the model (1.1)-(1.3) and assumptions (i)-(iii), it holds that

$$E(\tilde{Y}_z \mid Z, D = 1) = X'\beta_0 + g(Z'\alpha_0). \tag{1.5}$$

Proof. Observe:

$$
\begin{aligned}
E(\tilde{Y}_z \mid Z, D = 1)
&= E\left[ E\left\{ \frac{Y - 1[W > 0]}{f_{W|Z,D=1}(W)} \,\Big|\, W, Z, D = 1 \right\} \Big|\, Z, D = 1 \right] \\
&= E\left[ \frac{E\{ Y - 1[W > 0] \mid W, Z, D = 1 \}}{f_{W|Z,D=1}(W)} \,\Big|\, Z, D = 1 \right] \\
&= \int_{W_l}^{W_h} \frac{E\{ Y - 1[W > 0] \mid W = w, Z, D = 1 \}}{f_{W|Z,D=1}(w)} \, f_{W|Z,D=1}(w) \, dw \\
&= \int_{W_l}^{W_h} E\{ 1[w + X'\beta_0 + U > 0] - 1[w > 0] \mid W = w, Z, D = 1 \} \, dw \\
&= \int_{W_l}^{W_h} \int \left( 1[w + X'\beta_0 + u > 0] - 1[w > 0] \right) dF_{U|Z,D=1}(u) \, dw \\
&= \int \int_{W_l}^{W_h} \left( 1[w > -X'\beta_0 - u] - 1[w > 0] \right) dw \, dF_{U|Z,D=1}(u).
\end{aligned}
$$

The inner integrand depends on $-X'\beta_0 - u$: it is zero when $-X'\beta_0 - u = 0$, and it is also zero when $-X'\beta_0 - u \neq 0$ except in the following cases:

if $-X'\beta_0 - u < 0$, then the inner integrand is $1$ when $-X'\beta_0 - u < w < 0$;
if $-X'\beta_0 - u > 0$, then the inner integrand is $-1$ when $0 < w < -X'\beta_0 - u$.

Thus,

$$
\begin{aligned}
E(\tilde{Y}_z \mid Z, D = 1)
&= \int \left( 1[-X'\beta_0 - u < 0] \int_{-X'\beta_0 - u}^{0} dw \; - \; 1[-X'\beta_0 - u > 0] \int_{0}^{-X'\beta_0 - u} dw \right) dF_{U|Z,D=1}(u) \\
&= \int (X'\beta_0 + u) \, dF_{U|Z,D=1}(u) = X'\beta_0 + g(Z'\alpha_0).
\end{aligned}
$$

The selection correction term $g(Z'\alpha_0)$ in (1.5) can be removed using one of the differencing ideas applied to (1.4) in the literature. In this paper, we use the approach of Newey et al. (1990), applied to Lewbel's (2000) special regressor transformation. For this we need to define another transformed response variable:

$$\tilde{Y}_{z\alpha} \equiv \frac{Y - 1[W > 0]}{f_{W|Z'\alpha_0,D=1}(W)}.$$

The following lemma then gives a linear expression for the outcome equation.
Lemma 2 Under the model (1.1)-(1.3) and assumptions (i)-(iii), it holds that

$$E\{ \tilde{Y}_z - E(\tilde{Y}_{z\alpha} \mid Z'\alpha_0, D = 1) \mid Z, D = 1 \} = \{ X - E(X \mid Z'\alpha_0, D = 1) \}'\beta_0. \tag{1.6}$$

Proof. Following the proof for Theorem 1, we obtain

$$E(\tilde{Y}_{z\alpha} \mid Z'\alpha_0, D = 1) = E(X \mid Z'\alpha_0, D = 1)'\beta_0 + g(Z'\alpha_0).$$

Subtract this from $E(\tilde{Y}_z \mid Z, D = 1) = X'\beta_0 + g(Z'\alpha_0)$ to remove $g(Z'\alpha_0)$:

$$E(\tilde{Y}_z \mid Z, D = 1) - E(\tilde{Y}_{z\alpha} \mid Z'\alpha_0, D = 1) = \{ X - E(X \mid Z'\alpha_0, D = 1) \}'\beta_0,$$

which can be rewritten as (1.6).

Using (1.6), we can estimate $\beta_0$ by the least squares estimator (LSE) of $D\{ \tilde{Y}_z - E(\tilde{Y}_{z\alpha} \mid Z'\alpha_0, D = 1) \}$ on $D\{ X - E(X \mid Z'\alpha_0, D = 1) \}$ under the non-singularity of

$$E[\, D \{ X - E(X \mid Z'\alpha_0, D = 1) \} \{ X - E(X \mid Z'\alpha_0, D = 1) \}' \,].$$

As the intercept in $\beta_0$ is not identified in the LSE, let $\beta$ denote the slopes in $\beta_0$. Then our estimator for $\beta$ is

$$
\hat{\beta} = \left[ \sum_{i=1}^{N} D_i \{ X_i - \hat{E}(X_i \mid Z_i'\hat{\alpha}_0, D_i = 1) \} \{ X_i - \hat{E}(X_i \mid Z_i'\hat{\alpha}_0, D_i = 1) \}' \right]^{-1}
\left[ \sum_{i=1}^{N} D_i \{ X_i - \hat{E}(X_i \mid Z_i'\hat{\alpha}_0, D_i = 1) \} \{ \hat{Y}_{zi} - \hat{E}(\hat{Y}_{z\alpha i} \mid Z_i'\hat{\alpha}_0, D_i = 1) \} \right] \tag{1.7}
$$

where $\hat{\alpha}_0$ and $\hat{E}(\cdot)$ denote estimators for $\alpha_0$ and $E(\cdot)$, and

$$\hat{Y}_{zi} \equiv \frac{Y_i - 1[W_i > 0]}{\hat{f}_{W|Z_i,D_i=1}(W_i)} \quad \text{and} \quad \hat{Y}_{z\alpha i} \equiv \frac{Y_i - 1[W_i > 0]}{\hat{f}_{W|Z_i'\hat{\alpha}_0,D_i=1}(W_i)}.$$

Taken together, these equations provide a simple multistage estimator $\hat{\beta}$ for $\beta$. The first stage is obtaining $\hat{\alpha}_0$ and the second stage is estimating the $\hat{f}$'s using $\hat{\alpha}_0$. The next step is constructing $\hat{Y}_z$, $\hat{Y}_{z\alpha}$, and $\hat{E}(\cdot)$ using $\hat{\alpha}_0$ and the $\hat{f}$'s, and the final stage is then calculating $\hat{\beta}$. The first-stage estimation of $\hat{\alpha}_0$ can be done using a variety of estimators. Semiparametric $\sqrt{N}$-consistent estimators that could be used for $\hat{\alpha}_0$ include those described by Han (1987), Powell et al. (1989), Sherman (1993), Klein and Spady (1993) or Ichimura (1993). As for $\hat{Y}_z$ and $\hat{Y}_{z\alpha}$, Dong and Lewbel (2015) showed various ways to obtain them. Given $\hat{\alpha}_0$, we can obtain $\hat{Y}_z$, $\hat{Y}_{z\alpha}$ and $\hat{E}(\cdot)$ using kernel density and regression estimators.
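The stages just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's own implementation: the function names (`nw_mean`, `resid_density`, `lbs_slopes`) are hypothetical; the conditional densities of $W$ are approximated by the density of the residual from a linear regression of $W$ on the conditioning variables; small density values are floored at an arbitrary constant rather than trimmed; the bandwidth is a simple rule of thumb; and the first-stage index is passed in by the user instead of being estimated. The toy data at the end, including its zero cutoffs, are placeholders.

```python
import numpy as np

def gauss(t):
    """Standard normal kernel."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)

def rot_bandwidth(s):
    """Rule-of-thumb bandwidth h = SD(S) * N^(-1/5) (an assumed choice)."""
    return np.std(s) * len(s) ** (-0.2)

def nw_mean(target, s):
    """Nadaraya-Watson estimate of E(target | s) at each sample point."""
    k = gauss((s[:, None] - s[None, :]) / rot_bandwidth(s))
    weights = k / k.sum(axis=1, keepdims=True)   # each row sums to one
    return weights @ target

def resid_density(w, regressors):
    """Density of W given the regressors, evaluated at each W_i, assuming
    W = a + regressors'b + eps with eps independent of the regressors."""
    design = np.column_stack([np.ones(len(w)), regressors])
    coef, *_ = np.linalg.lstsq(design, w, rcond=None)
    eps = w - design @ coef
    h = rot_bandwidth(eps)
    return gauss((eps[:, None] - eps[None, :]) / h).mean(axis=1) / h

def lbs_slopes(y, w, x, z, index, floor=0.05):
    """Closed-form final stage, as in (1.7), on the selected sample (D = 1).
    x: non-constant outcome regressors; z: selection regressors;
    index: a first-stage estimate of the selection index."""
    num = y - (w > 0).astype(float)
    y_z = num / np.maximum(resid_density(w, z), floor)        # transformed response given Z
    y_za = num / np.maximum(resid_density(w, index), floor)   # transformed response given the index
    x_dev = x - nw_mean(x, index)
    y_dev = y_z - nw_mean(y_za, index)
    return np.linalg.solve(x_dev.T @ x_dev, x_dev.T @ y_dev)

# Toy data (placeholder design): outcome slopes relative to W are 0.5.
rng = np.random.default_rng(0)
n = 1500
x1, x2, r, w = rng.standard_normal((4, n))
e = rng.standard_normal(n)
u = 0.5 * e + np.sqrt(0.75) * rng.standard_normal(n)    # COR(e, u) = 0.5
d = 1.0 + x1 + x2 + r + w + e > 0.0                     # selection indicator
y = (1.0 + x1 + x2 + 2.0 * w + u > 0.0).astype(float)   # used only where d is True
x = np.column_stack([x1, x2])
z = np.column_stack([r, x1, x2])
index = x1 + x2 + r              # stand-in for an estimated first-stage index
b = lbs_slopes(y[d], w[d], x[d], z[d], index[d])
```

Because the last step is an ordinary least squares solve, the whole routine can be re-run cheaply on resampled data, which is the property that makes bootstrap inference convenient here.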
If $\alpha_0$ were known, then the resulting estimator $\hat{\beta}$ would have the same structure as the special regressor estimator in Lewbel (2000); given this, it could be shown to be $\sqrt{N}$-consistent and asymptotically normal under the same regularity conditions provided there. Here $\alpha_0$ has to be estimated, which complicates derivation of the limiting distribution of $\hat{\beta}$. However, the estimator still takes the form of a standard multistep estimator where some of the steps are standard nonparametric regression and density estimators, $\hat{f}$ and $\hat{E}(\cdot)$, yielding $\sqrt{N}$-consistent and asymptotically normal estimates for $\beta$ under standard conditions. Explicit formulas for the limiting variance of $\hat{\beta}$ will be complicated given the number of nuisance estimators involved ($\hat{\alpha}_0$, $\hat{f}$, $\hat{Y}_z$, $\hat{Y}_{z\alpha}$ and $\hat{E}(\cdot)$), and will depend on details regarding the chosen estimator $\hat{\alpha}_0$. We therefore instead use a nonparametric bootstrap to obtain confidence intervals in our later empirical application. See, e.g., Chen et al. (2003) for one set of regularity conditions that suffice to rationalize use of a bootstrap in a multistep estimator like ours. Note that our estimator is particularly suitable for bootstrapping because it does not require any numerical searches or optimizations.

3 Simulation Studies

To simplify presentation, call our semiparametric estimator "LBS (Lewbelian Binary-outcome Selection model estimator)". The simulation design is as follows:

$$
\begin{aligned}
& y_i^* = 1 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_w w_i + u_i, \qquad d_i = 1[1 + x_{1i} + x_{2i} + r_i + w_i + e_i > \tau_1], \\
& y_i = 1[y_i^* > \tau_2]\, d_i, \qquad COR(e_i, u_i) = 0.5, \\
& x_{1i}, x_{2i}, r_i^* \sim N(0, 1), \qquad r_i = r_i^* \ \text{or} \ 1[r_i^* > 0], \\
& u_i \perp (x_i, w_i), \qquad e_i \perp (x_i, w_i, r_i), \qquad \beta_1 = \beta_2 = 1, \quad \beta_w = 2.
\end{aligned}
$$

The regressor values $x_{1i}$, $x_{2i}$, $w_i$ and $r_i^*$ (the latter underlying the binary instrument $r_i$) were generated as i.i.d. standard normal. The error term $u_i$ in the outcome equation and the error term $e_i$ in the selection equation were generated from a joint distribution with non-zero correlation between $u_i$ and $e_i$.
The special regressor $w_i$ was included in the selection equation because it is not required to be excluded from this equation, but it was not used as a regressor in $Z$ for $Z'\alpha_0$. We set the cutoff values ($\tau_1$ and $\tau_2$) to retain $P(D = 1) \simeq 0.65$ and $P(Y = 1 \mid D = 1) \simeq 0.77$, similar to the actual data we use. Also, the parameters are set to 1, except the coefficient of the special regressor ($\beta_w = 2$), so that the true identified coefficients of the regressors are $0.5$ ($= \beta_1/\beta_w = \beta_2/\beta_w$). Note that the intercept is not identified in LBS, because our estimator is based on the approach of Newey et al. (1990).

Probit MLE was used to estimate $Z'\alpha_0$ in the first stage, and following a suggestion in Dong and Lewbel (2015), we estimated using $f_{\varepsilon_1|Z,D=1}$ and $f_{\varepsilon_2|Z'\alpha_0,D=1}$ instead of $f_{W|Z,D=1}$ and $f_{W|Z'\alpha_0,D=1}$, where $\varepsilon_1$ and $\varepsilon_2$ are the error terms of the linear models for $W \mid Z$ and $W \mid Z'\alpha_0$. Since the density functions $f_{\varepsilon_1|Z,D=1}$ and $f_{\varepsilon_2|Z'\alpha_0,D=1}$ are in the denominators of $\tilde{Y}_z$ and $\tilde{Y}_{z\alpha}$, we chose trimming values within $[0.01, 0.1]$ minimizing the mean squared error (MSE). For the bandwidth of the nonparametric estimation of $f(\cdot \mid S)$ and $E(\cdot \mid S)$, we used the rule-of-thumb bandwidth $h = SD(S) N^{-1/5}$. Mean bias (Mn-Bias) and root MSE are reported, and all performance measures were calculated from 300 repetitions.

For the sake of comparison, we present simulation results of MLE for a binary response with sample-selection, along with results of LBS. The error terms $u_i$ and $e_i$ were generated from the jointly normal distribution with correlation $\rho = 0.5$, which is favourable to MLE; otherwise MLE does not converge well. We examine various simulation designs from (1) to (4) below to verify the performance of LBS compared to MLE.
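For readers who want to reproduce the flavour of this design, a data-generating routine for the jointly normal case might look like the sketch below. The names `simulate_design1`, `bias_and_rmse`, `tau1` and `tau2` are hypothetical, and the default cutoffs of zero are placeholders: the text says only that the cutoffs were tuned to match target selection and outcome shares, so a user would adjust them by inspecting the two shares computed at the end.

```python
import numpy as np

def simulate_design1(n, rho=0.5, tau1=0.0, tau2=0.0, binary_r=False, seed=0):
    """One sample with jointly normal errors (e, u) of correlation rho.
    tau1 and tau2 are placeholder cutoffs to be tuned by the user."""
    rng = np.random.default_rng(seed)
    x1, x2, w, r_star = rng.standard_normal((4, n))
    # Excluded variable: continuous, or dichotomized when binary_r is True.
    r = (r_star > 0).astype(float) if binary_r else r_star
    cov = [[1.0, rho], [rho, 1.0]]
    e, u = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    y_star = 1.0 + 1.0 * x1 + 1.0 * x2 + 2.0 * w + u     # beta_1 = beta_2 = 1, beta_w = 2
    d = (1.0 + x1 + x2 + r + w + e > tau1).astype(int)   # selection indicator
    y = (y_star > tau2).astype(int) * d                  # zero whenever d = 0
    return {"y": y, "d": d, "x1": x1, "x2": x2, "w": w, "r": r, "e": e, "u": u}

def bias_and_rmse(estimates, truth):
    """Mean bias and root MSE over Monte Carlo repetitions."""
    err = np.asarray(estimates, dtype=float) - truth
    return err.mean(), np.sqrt((err**2).mean())

sample = simulate_design1(20000)
share_selected = sample["d"].mean()                               # compare with the target P(D = 1)
share_y1_selected = sample["y"][sample["d"] == 1].mean()          # compare with the target P(Y = 1 | D = 1)
```

In a full Monte Carlo study one would call `simulate_design1` with a different seed per repetition, collect the slope estimates, and pass them to `bias_and_rmse` with the true normalized value 0.5.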
Since our main interests are the slope coefficients, and the differences between $\beta_1$ and $\beta_2$ are not noticeable, only LBS's results for $\beta_1/\beta_w$ and MLE's results for $\beta_1/\sigma_u$, $\beta_w/\sigma_u$, and $\beta_1/\beta_w$ are presented, where $\sigma_u$ denotes the standard deviation of the outcome equation error $u$, which equals 1 for the jointly normal distribution.

The first panel of Table 1 presents simulation results for design (1), which is favourable to MLE because it assumes the joint normal distribution of the two error terms. Under the design favourable to MLE, LBS performs well even in the small sample and its RMSE improves as the sample size increases. For MLE, the biases of the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are relatively large in magnitude, but the bias of the estimator for $\beta_1/\beta_w$ is close to zero. This is because the biases for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ appear in the same direction; the biases cancel out in the ratio form. 'FP' in the last column denotes the percentage of failures to converge for MLE. Even in the favourable design, MLE failed to converge in 4.0% of the 300 repetitions in the small sample ($N = 500$), but the percentage of failure decreases fast as the sample size increases. Under the design favourable to MLE, both LBS and MLE perform well, but MLE is better than LBS in terms of RMSE, due to its efficiency.

Table 1. Simulation Results for LBS and MLE

| N | LBS $\beta_1/\beta_w$ Bias | LBS $\beta_1/\beta_w$ RMSE | MLE $\beta_1/\sigma_u$ Bias | MLE $\beta_1/\sigma_u$ RMSE | MLE $\beta_w/\sigma_u$ Bias | MLE $\beta_w/\sigma_u$ RMSE | MLE $\beta_1/\beta_w$ Bias | MLE $\beta_1/\beta_w$ RMSE | FP |
|---|---|---|---|---|---|---|---|---|---|
| (1) $(e_i, u_i) \sim$ JN | | | | | | | | | |
| 500 | -0.021 | 0.129 | 0.020 | 0.177 | 0.051 | 0.237 | -0.003 | 0.069 | 4.0% |
| 1000 | -0.004 | 0.088 | 0.002 | 0.120 | 0.022 | 0.161 | -0.005 | 0.047 | 1.0% |
| 2000 | 0.000 | 0.063 | 0.009 | 0.081 | 0.010 | 0.117 | 0.002 | 0.031 | 0.0% |
| (2) $(e_i, u_i) \sim$ Beta | | | | | | | | | |
| 500 | -0.019 | 0.112 | 0.539 | 0.587 | 1.094 | 1.170 | -0.005 | 0.056 | 12.0% |
| 1000 | -0.014 | 0.071 | 0.526 | 0.551 | 1.056 | 1.089 | -0.001 | 0.037 | 1.7% |
| 2000 | 0.000 | 0.049 | 0.510 | 0.522 | 1.022 | 1.038 | 0.000 | 0.026 | 0.0% |
| (3) $SD(u \mid z_i) = \lvert \gamma_z' z_i \rvert$ where $z_i \equiv (x_{1i}, x_{2i}, r_i)'$, $\gamma_z \equiv (0.5, 0.5, 0.5)'$ | | | | | | | | | |
| 500 | -0.021 | 0.110 | 0.348 | 0.460 | 0.803 | 0.893 | -0.016 | 0.065 | 91.7% |
| 1000 | -0.014 | 0.081 | 0.184 | 0.259 | 0.620 | 0.676 | -0.047 | 0.061 | 90.3% |
| 2000 | -0.003 | 0.060 | 0.128 | 0.184 | 0.529 | 0.570 | -0.053 | 0.062 | 88.0% |
| (4) $SD(u \mid z_i, w_i) = \lvert \gamma_z' z_i + w_i \rvert$ | | | | | | | | | |
| 500 | 0.016 | 0.117 | 0.117 | 0.461 | 0.253 | 0.824 | 0.000 | 0.076 | 31.3% |
| 1000 | 0.030 | 0.087 | -0.091 | 0.300 | -0.074 | 0.494 | -0.023 | 0.063 | 24.7% |
| 2000 | 0.030 | 0.062 | -0.165 | 0.242 | -0.199 | 0.355 | -0.034 | 0.055 | 5.3% |

In the second panel, we examine a case where the normality assumption does not hold. The two error terms $e_i$ and $u_i$ are generated from a beta distribution (Beta(2, 2)) on the interval $[-1.5, 1.5]$. For MLE, the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are biased, but there is no bias for $\beta_1/\beta_w$, for the same reason as described above. For LBS, the results are not much different from those of design (1).

Next, the effect of heteroskedasticity of the error term $u_i$ ($SD(u \mid z_i) = \lvert 0.5 z_i \rvert$) is investigated. For LBS, the performance does not change much, but for MLE, as expected, the biases of the estimators for $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ are substantially large in magnitude, and the bias of the estimator for $\beta_1/\beta_w$ does not disappear even if the sample size increases. The percentage of failure is shown to be very high (91.7% in the small sample), and it is not much diminished as the sample size increases. For LBS, heteroskedasticity with $Z'\alpha_0$ is allowed, but not with $W$.
Thus, when heteroskedasticity with $W$, along with $Z'\alpha_0$, is introduced, LBS is biased and the size of the bias is similar to that of MLE, as shown in the fourth panel. All estimates of MLE are biased in the presence of heteroskedasticity with any of the regressors.

Additionally, we consider two cases: where the excluded variable $r_i$ is binary, and where the correlation $\rho$ is close to 0 or 1. These results are presented in the appendix. The main advantage of LBS compared to other semiparametric estimators for LDV models is that it allows the excluded variable $r_i$ to be binary. When the binary $r_i$ is used instead of the continuous $r_i$, the performance of LBS becomes slightly worse than that in design (1), but it improves as the sample size increases. In the case where the correlation $\rho$ is close to 0 or 1, LBS remains the same, while MLE becomes poor in estimating $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ as the correlation approaches 1. However, the ratio $\beta_1/\beta_w$ is estimated correctly in MLE.

In summary, even under the simulation design favourable to MLE, LBS performs fairly well in various cases. It is computationally less time-demanding than MLE because it has a closed-form solution in the second stage, and the first stage can be done by simple probit. If the presence of heteroskedasticity is considered, LBS would be more appropriate for binary-outcome selection models. While MLE works better in terms of RMSE and its estimation of $\beta_1/\beta_w$ seems fine, it frequently fails to converge, and we could not be sure which reasons cause this failure. In MLE, estimating the correlation parameter is computationally troublesome, and a grid search over $\rho$ makes the maximization procedure time-demanding. In LBS, normalization of the parameters by dividing through by $\beta_w$ could be seen as a disadvantage. However, all parameters in binary models are identified only up to scale, so that it is actually not a disadvantage for LBS.
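The scale normalization can be made explicit for the simulation design. As a short worked illustration (writing the outcome cutoff as $\tau_2$, a notation assumed here for the sketch), dividing the latent outcome inequality by $\beta_w > 0$ leaves the event unchanged and puts a unit coefficient on the special regressor:

```latex
\begin{aligned}
1\big[\, 1 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_w w_i + u_i > \tau_2 \,\big]
  &= 1\Big[\, w_i + \tfrac{1 - \tau_2}{\beta_w}
      + \tfrac{\beta_1}{\beta_w}\, x_{1i}
      + \tfrac{\beta_2}{\beta_w}\, x_{2i}
      + \tfrac{u_i}{\beta_w} > 0 \,\Big], \\
\frac{\beta_1}{\beta_w} &= \frac{\beta_2}{\beta_w} = \frac{1}{2} = 0.5
  \qquad (\beta_1 = \beta_2 = 1,\ \beta_w = 2).
\end{aligned}
```

A probit-type MLE normalizes by the error scale instead and so reports $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$; their ratio is again $\beta_1/\beta_w$, which is the quantity on which the two estimators can be compared directly.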
Also, MLE's performance in estimating $\beta_1/\sigma_u$ and $\beta_w/\sigma_u$ is worse than LBS's, and its performance in estimating $\beta_1/\beta_w$ is rather comparable to LBS's.

4 Empirical Analysis

Many studies in various disciplines of social science have provided evidence of prejudice/discrimination against blacks: Heckman and Siegelman (1992), Heckman (1998), Raphael et al. (2000), Bertrand and Mullainathan (2004), Stoll et al. (2004), Charles and Guryan (2008) in the labour market; Dovidio and Gaertner (2000), Hodson et al. (2002) in schools; Cunningham (2010) in sports; Knowles et al. (2001), Coker (2003), Hodson et al. (2005) in the justice system. Given the evidence of negative prejudice against blacks, the 2008 US presidential election, where the first black candidate Barack Obama was nominated, might be a good natural experiment to study the effect of prejudice on a social decision.

Many studies have analyzed the impact of prejudice on Obama's bid for the presidency and whether white voters discriminated against the black candidate (e.g., Steele 2008; Thernstrom 2008; Hutchings 2009; Mas and Moretti 2009; Lewis-Beck et al. 2010; Piston 2010; Redlawsk et al. 2010; Tesler and Sears 2010; Ehrlinger et al. 2011; Highton 2011; Kinder and Dale-Riddle 2011; Schaffner 2011). Many studies agree that white prejudice (partially) affected the election outcomes. However, there is still a wide diversity of opinion about the ways in which Obama overcame the adverse effects of the prejudice. Most studies focussed on which groups contributed to the victories of Obama and ignored the sample-selection problem, which results in inconsistent estimates regarding the impact of prejudice. We, however, focus on the prejudice issue where the selection problem is critical: whether voters with negative feelings against blacks are more likely to participate in voting.
Our empirical analysis, both the preliminary analysis using time-series data and the main analysis using individual data, is intended to shed some light on the effect of prejudice, taking advantage of the unique opportunity that the 2008 and 2012 US presidential elections present in terms of race.

4.1 Preliminary Analysis With Time-Series: Race and Party

We carried out a preliminary data analysis using the aggregate time-series data over 1980–2012 (9 elections) obtained from the American National Election Studies (ANES)², which is designed to be representative at the national level, not at state or local levels. Our main empirical analysis in the next section is based on individual survey data that are also from ANES.

Let "Black", "White", and "Hisp" stand for blacks, whites, and Hispanics respectively. The left panel of Figure 1 shows that the voting share over the period 1980–2012 has been dominated by whites. The voting share of Hispanics has increased gradually since 1980 while that of blacks has been stable at around 11–13%, except in the 2004 election. There is a local peak of whites' voting share in 2008 when Barack Obama ran for president for the first time, followed by a dip in 2012 when he ran for the second time.

The second panel of Figure 1 presents the turn-out rate by race over 1980–2012. The turn-out rate of whites has remained consistently above 70% and hit 80% in 2004 but dropped slightly in 2008 and 2012. The turn-out rate of blacks has increased gradually since 1988; it caught up to the turn-out rate of whites in 2008 and even overtook it in 2012.

The right panel of Figure 1 shows the Democrat-candidate-supporting (DCS) probability among the voters by race over the period 1980–2012. Due to the popularity of Bill Clinton when he ran for the second time, it reached a local peak for all races in 1996.
After Clinton, the DCS probability dipped during George Bush's era and peaked in the 2008 election to reach about 98% for blacks and 76% for Hispanics. Blacks and Hispanics strongly supported Obama in both 2008 and 2012. The DCS probability of whites in 2008 slightly increased compared to the previous election and dropped again in 2012, but the amount of change seems negligible. Even if blacks and Hispanics strongly supported Obama, their voting share was not big enough to give Obama an absolute chance of winning. Whites' voting behaviour would play a huge role for Obama, so that overcoming the negative prejudice that whites might have against blacks might have been a critical issue for him. It will be interesting to know in which direction whites' voting behaviour contributed to the results of the elections in 2008 and 2012, and especially how their perceived negative prejudice against blacks affected the election results.

Figure 1: Voting Behavior by Race

² See the ANES website for detailed information on the study: http://www.electionstudies.org/studypages/cdf/cdf.htm

In addition to race, party affiliation is another important factor for explaining a voting decision. Let "Demo" and "Rep" denote respondents who identify as Democrat or Republican, respectively. Those who did not belong to either party are identified as "Independent", which is not presented in the figures because the party affiliation is exclusive.

Figure 2: Voting Behavior for Black and White by Party

The left panel of Figure 2 shows the share of voters ($D = 1$) by race and party affiliation. The share of black Democrats decreased from 12.7% in 2004 to 11.3% in 2008 and 11.6% in 2012. The share of white Democrats increased from about 29.6% in 2004 to 32% in 2008 and then decreased to 28.6% in 2012. This implies that white Democrats were temporarily more likely to participate in voting in the 2008 election, and their voting decision might have played a crucial role in that election.
In contrast, the share of white Republicans decreased from 40% in 2004 to 39.1% in 2008, and then further decreased to 37.1% in 2012.

In the right panel of Figure 2, the black Democrats' DCS probability increased from 92% in 2004 to 99.8% in 2008, and then dropped to 97.7% in 2012. The black Republicans' DCS probability changed dramatically from 45.9% in 2004 to 92.4% in 2008, and then dropped to 47% in 2012. This demonstrates that blacks strongly supported Obama in the 2008 election regardless of their party affiliation, but not in the 2012 election. Although this characterizes the 2008 and 2012 elections well, the small voting share of blacks did not make a big change in the overall election picture.

The white Democrats' DCS probability declined from 88.2% in 2004 to 86.9% in 2008 and then went up to 88.1% in 2012. The white Republicans' DCS probability increased from 7.1% in 2004 to 8.4% in 2008 and then declined to 6% in 2012. Even if the size of the change was small, it is puzzling. If anything, one would expect a small decrease for the white Republicans and a small increase for the white Democrats because white Democrats (but not white Republicans) are known to be relatively favourable to blacks.

One possible explanation is that prejudice against blacks might have affected white Republicans and white Democrats differently. The white Republicans might have tried to avoid the inevitability of being seen as racists if they did not vote for Obama. Even if they had other strong reasons not to vote for Obama, because they are known to have a high level of prejudice against blacks as 'a group', they could easily be judged as racists if Obama were not elected. Thus, by voting for Obama, some of them might have been trying to escape the stigma of racism (Steele, 2008). On the other hand, the white Democrats are known to be favourable to blacks as 'a group', so they were less likely to be judged as racists even if Obama were not elected.
In this case, the prejudice might affect the DCS probability of white Democrats negatively.

We analyzed the DCS probability for Hispanics and other races by party affiliation, and found similar results to those of blacks, which are presented in Figure 5 of the appendix. Hispanics and other races also showed unusually strong support for Obama in 2008 and 2012 regardless of their party affiliation, but their total contributions are negligible due to their small voting share.

4.2 Preliminary Analysis With 2008 Data: Prejudice

One of the key variables in our analysis is "Prejudice", which measures racial prejudice against blacks and ranges from 0 to 1. "Prejudice" is constructed by subtracting the score given to whites (by the respondent) from the score given to blacks; the score is about blacks/whites being perceived as lazy and unintelligent, and this way of measuring prejudice follows Kinder and Mendelberg (1995), Hutchings (2009) and Piston (2010). The prejudice variable falls between 0 and 1, "0" being the most positive for blacks (i.e. blacks seen as diligent and intelligent, with whites seen as lazy and unintelligent) and "1" being the most negative for blacks. Unfortunately, the prejudice variable is available only in 2008, so we estimate predicted prejudice values for 2012 in the next section.

Figure 3 shows the histogram of "Prejudice" in 2008 by different groups. For blacks, "Prejudice" is symmetrically distributed around 0.5, with 0.5 being neutral (no prejudice). However, for whites it is skewed to the right, even though it peaks at being neutral. By party affiliation, Republicans have a relatively high level of prejudice compared to Democrats, whose distribution is almost symmetrical, and their proportion of being neutral is also 10% lower than that of Democrats. However, the prejudice of white Democrats only might be different from this because most blacks identify as Democrats.
To see the difference in the level of prejudice by group, we categorize respondents according to education level, income level, region, and age. “College” denotes college graduates, and “HIncome” denotes household income in the top 33 percent. “Over50” indicates a group with age over 50. The regional dummy variable “South” represents the region codes of the U.S. Census³. Forty-five percent of “College” have no prejudice and only 18% have a high level of prejudice (higher than 0.6), while 31% of “Non-College” (College = 0) have no prejudice and 29% have a high level of prejudice. In other categories, i.e. “South” vs. “Non-South”, “HIncome” vs. “Non-HIncome”, and “Over50” vs. “Non-Over50”, there are no significant differences in the density of “Prejudice”.

³ South: AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV.

Figure 3: Prejudice by Group

Figure 4 shows the density of “Prejudice” for whites only by party affiliation. White Republicans have a substantially higher level of prejudice than white Democrats. Even though the density for white Democrats is slightly skewed to the right, the proportion of those being neutral is 12% higher than that of white Republicans. Although “Prejudice” is self-reported, which likely understates the real prejudice of whites, Figures 3 and 4 clearly show that whites have a relatively high level of prejudice against blacks, and that the level of prejudice differs by party affiliation.

A number of interesting conclusions emerged from the preliminary data analysis based on the ANES time-series data and “Prejudice” in 2008. First, non-whites sent strong support to Obama in 2008 and 2012, regardless of their party affiliation. Second, there was a 1.3 percentage point drop of the DCS probability in the white Democrats and a 1.3 percentage point increase in the white Republicans in 2008. This is an interesting finding in that Democrats are known to be favourable to blacks, but Republicans are not.
This is verified by the “Prejudice” variable in the 2008 data: white Republicans have a higher level of prejudice than white Democrats. To investigate the second issue more precisely, in the next section we turn to an estimation approach using micro-level data for 2008 and 2012. We focus on whether prejudice affected the 2008 and 2012 election outcomes differently by race and party affiliation, controlling for the sample-selection problem. As most support for Obama came from whites and there is little variation in non-whites' voting behaviour, our estimation analysis focuses on whites' voting behaviour.

Figure 4: Prejudice of Whites by Party

If our interest were only in which group contributed to the victories of Obama in 2008 and 2012, there would be no reason to bother with the sample-selection problem, namely that we can only observe voting decisions for those who voted. However, in the main empirical analysis that follows, we focus on the prejudice issue, where the selection problem is critical: whether voters with negative feelings against blacks are more likely to participate in voting (in the 2008 election, white Democrats were more likely to participate in voting). That is, we ask whether prejudice has an effect in the population, not just in the selected sample of voters, which calls for methods that control for the selection problem.

Before estimation, it is helpful to consider two possible interpretations of the results we might obtain. If prejudice matters for whites' voting decisions, it might work differently by party affiliation. For white Republicans, if they behaved following their prejudice, it would work as a strong reason not to vote for Obama (race and party affiliation), and their decision could be rationalized by party affiliation, not race. In this case, it would amplify the negative effect of Republican on voting for Obama, and the effect of the interaction between prejudice and (white) Republican would be negative.
However, since they are known to have a high level of prejudice ‘as a group’, this might put pressure on them and they might want to disprove the stigma. If Obama were not elected, they would more easily be judged as racist and it would hurt their political decency. Thus, prejudice might lead some of them to vote for Obama even if they still had negative feelings about blacks. And the higher their prejudice, the more likely they might have been to vote for Obama to avoid such blame. If this argument holds, the interaction effect between prejudice and (white) Republican would turn out to be positive.

Unlike Republicans, if white Democrats did not vote for Obama because of prejudice, their decision could not be rationalized by anything other than race and they would easily be judged as racist. In that case, they would behave against their prejudice, so that prejudice might be irrelevant or have a small positive effect. On the other hand, it is also possible that, because the voting decision is not disclosed and they are known to be favourable to blacks ‘as a group’, they would not be blamed even if Obama were not elected. Thus, the negative prejudice might affect their decision, subconsciously leading them not to vote for Obama. If this argument holds, the interaction effect between prejudice and (white) Democrats would be negative. If voters consider Obama simply as a presidential candidate, not as a ‘black’, the effect of prejudice should be insignificant regardless of party affiliation.

4.3 Main Empirical Analysis

4.3.1 Variables and Descriptive Statistics

The individual ANES data for our main empirical analysis are repeated cross-sections. These data include individual sample weights⁴ to retain the representativeness of the samples over time. Although the weight is used for the preliminary time-series data analysis, it is not used for our main empirical analysis; a weight based on regressors does not affect the estimators' consistency.
Let Y = 1 denote voting for Barack Obama (in the 2008 and 2012 elections), and D = 1 denote participation in voting. Y is observed only for those who voted (D = 1). Table 2 presents the descriptive statistics of the variables used; both unweighted and weighted numbers are displayed, with the weighted numbers in ( ). As described in the previous section, “College” takes 1 for college graduates, “HIncome” takes 1 for household income in the top 33 percent, and “Over50” is a cohort dummy for age over 50. The regional dummy variables “West”, “N. Central” (North Central), “N. East” (Northeast), and “South” represent the region codes of the U.S. Census⁵. “NoCareWhoWin” indicates not caring about who will win the presidential election, and is used as the exclusion restriction variable R. “ThmLiberal” is the feeling “thermometer” toward Liberals; it ranges over 0 to 100, with 50 being neutral, and is used as the special regressor W. Table 2 shows that blacks and Hispanics are indeed over-represented in the sample and, as a consequence, Democrats are over-represented whereas Republicans are under-represented. We exclude the individuals with missing values in the variables used in our analysis.

⁴ For details, see the documentation for the weight in ANES at http://www.electionstudies.org/studypages/cdf/cdf.htm.

⁵ Northeast: CT, ME, MA, NH, NJ, NY, PA, RI, VT; North Central: IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI; South: AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC, OK, SC, TN, TX, VA, WV; West: AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY.

Table 2. Unweighted (Weighted) Descriptive Statistics

Variable       2008 Mean        2008 SD    2012 Mean        2012 SD
Y | D = 1      0.654 (0.544)               0.678 (0.524)
D              0.781 (0.788)               0.742 (0.763)
Black          0.235 (0.115)               0.250 (0.118)
White          0.539 (0.761)               0.461 (0.725)
Hisp           0.187 (0.076)               0.224 (0.100)
Over50         0.402 (0.395)               0.310 (0.359)
West           0.253 (0.212)               0.242 (0.222)
N.Central      0.173 (0.215)               0.186 (0.231)
N.East         0.108 (0.142)               0.157 (0.183)
South          0.466 (0.431)               0.415 (0.364)
College        0.234 (0.296)               0.266 (0.319)
HIncome        0.245 (0.314)               0.212 (0.345)
Demo           0.592 (0.505)               0.629 (0.498)
Rep            0.309 (0.397)               0.295 (0.427)
NoCareWW       0.173 (0.185)               0.852 (0.859)
ThmLiberal     56.90 (54.36)    20.8       54.72 (51.85)    21.2
Prejudice      0.540 (0.546)    0.12
N              1614                        1534

In Table 2, the weighted means of most variables are similar in 2008 and 2012, except “NoCareWhoWin”, which is only 0.185 in 2008 but 0.859 in 2012. Evidently, the 2008 election received huge attention because of the first black candidate nominated for president, and most voters cared about who would win the presidential election; this was not the case in 2012.

4.3.2 Estimation Results with 2008 and 2012 data

Recall the model (1.1) and (1.2) in the estimator section. In (1.1), the latent outcome $Y^*$ is linear in the special regressor $W$, the regressors $X$, and the error $U$, with $Y = 1[Y^* > 0]$, and $Y$ is observed only if $D = 1$. In (1.2), $P(D = 1 \mid Z)$ is a function of a linear index in $Z = (R, X')'$, where $R$ is a scalar regressor. As $X$, we control for several variables described in the previous section. For LBS, we need to estimate the density of $W$ conditional on $Z$ and $D = 1$, $f_{W \mid Z, D=1}$, and the density of $W$ conditional on the selection index and $D = 1$. Following a suggestion in Dong and Lewbel (2015), we instead estimated the corresponding densities of $\varepsilon_1$ and $\varepsilon_2$, where $\varepsilon_1$ and $\varepsilon_2$ are the error terms of linear models for $W$ given $Z$ and for $W$ given the selection index, respectively. We used a cross-validation method to choose the bandwidth for $\hat{f}$ and $\hat{E}$. The 90% asymptotic statistical significance is shown with ‘*’ along with the 90% confidence interval (CI) in ( ); i.e. if the nonparametric (re-sampling) bootstrap 90% CI excludes 0, then ‘*’ is attached.
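The starring rule above (attach ‘*’ when the bootstrap 90% CI excludes 0) can be illustrated with a short sketch. It rests on several assumptions that are not in the text: a percentile-type interval, 999 resamples, and a sample mean standing in for the LBS estimator, which is not implemented here. The names `bootstrap_ci`, `is_starred`, and `sample_mean` are hypothetical.

```python
import random


def sample_mean(xs):
    """Stand-in estimator; the paper's LBS estimator is NOT implemented here."""
    return sum(xs) / len(xs)


def bootstrap_ci(data, estimator, n_boot=999, level=0.90, seed=0):
    """Percentile bootstrap CI: resample rows with replacement, re-estimate,
    and read off the lower and upper quantiles of the replicates."""
    rng = random.Random(seed)
    n = len(data)
    reps = sorted(
        estimator([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = reps[int(round(tail * (n_boot - 1)))]
    upper = reps[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


def is_starred(ci):
    """True when the interval excludes zero, i.e. the estimate gets a '*'."""
    lower, upper = ci
    return lower > 0 or upper < 0


if __name__ == "__main__":
    toy = [1.0 + 0.1 * i for i in range(20)]   # illustrative data only
    ci = bootstrap_ci(toy, sample_mean)
    print(ci, is_starred(ci))
```

Resampling whole observations keeps the dependence between the selection and outcome stages intact within each replicate, which is why a re-sampling bootstrap is convenient for multi-stage estimators.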
The identified parameter is the ratio $\tilde{\beta} \equiv \beta_x/\beta_w$, i.e. an effect relative to the effect of the special regressor $W$; the sign of the identified parameter is unchanged because the coefficient of $W$ is presumed to be positive. In the estimation, we rescaled “ThmLiberal” to range over 0 to 1, as otherwise the identified parameters are too big. Since the ANES data are repeated cross-sections, we pooled the 2008 and 2012 data and treated them as cross-sectional data, including the time dummy “Year2012”. One of the difficulties of our data is that the key variable “Prejudice” is observed only in 2008. We assumed a linear model for “Prejudice” in 2008 and estimated it⁶ to calculate its predicted values in 2012. The fitted values were used for 2008, as well as 2012, instead of the true values.

⁶ As covariates, we included the following variables: age, female, regions, three income levels, employment status, education, thermometers for race, party, and sexual minority, economic expectations, and opinions about blacks. Only 2008 variables were used in estimating the regression model of “Prejudice”, and the 2012 variables were plugged into the prediction function to calculate the predicted values of “Prejudice” for 2012. Since some of the covariates were not used in the regression models of Y, the predicted prejudice can be seen as an IV. Detailed definitions of variables and data descriptions are available upon request.

Table 3. Results Using LBS for 2008/2012 US Presidential Election

Variable          (1)               (2)               (3)               (4)               (5)
White             -0.218*           0.092             0.273             0.289             0.327
                  (-0.27, -0.13)    (-0.44, 0.97)     (-0.37, 0.74)     (-0.39, 0.69)     (-0.54, 0.74)
Wht×Rep           0.308*            -2.034*           -1.912*           -1.614*           -1.281
                  (0.19, 0.40)      (-2.93, -0.12)    (-2.66, -0.3)     (-2.49, -0.05)    (-2.17, 0.42)
Wht×Rep×Prej                        3.715*            3.495*            3.000*            2.422
                                    (0.69, 5.18)      (0.85, 4.71)      (0.38, 4.53)      (-0.48, 3.84)
Prej×Rep                            -2.930*           -2.476*           -2.064*           -1.742
                                    (-3.92, -0.14)    (-3.32, -0.24)    (-3.4, -0.08)     (-3.01, 0.35)
Prej×White                          -0.486            -0.824            -0.881            -0.969
                                    (-2.05, 0.38)     (-1.61, 0.30)     (-1.61, 0.34)     (-1.7, 0.57)
Prejudice         0.290             0.457             0.307             0.032             -0.060
                  (-0.02, 0.46)     (-0.13, 0.82)     (-0.27, 0.55)     (-0.40, 0.32)     (-0.45, 0.29)
Republican        -0.432*           1.359             1.062             0.814             0.636
                  (-0.51, -0.3)     (-0.28, 2.03)     (-0.22, 1.67)     (-0.36, 1.65)     (-0.66, 1.43)
Year2012          0.019             0.037             0.029             0.026             0.019
                  (-0.06, 0.09)     (-0.08, 0.10)     (-0.42, 0.08)     (-0.02, 0.07)     (-0.02, 0.06)

Controlled Var.: (1) none; (2) none; (3) HIncome; (4) HIncome, College, Over50; (5) HIncome, College, Over50, Region.

Table 3 reports estimation results using LBS with different covariates controlled for. In column (1), the effect of “White” and the effect of “Republican” are significantly negative, which implies that whites and Republicans generally do not vote for Obama. The coefficient of “White” could be interpreted as the effect of white Democrats, once “White×Republican” is controlled for, in that the majority of non-Republicans among whites are Democrats. Thus, it demonstrates that white Democrats are less likely to vote for Obama relative to non-whites. The effect of “Prejudice” is found to be small and insignificant in column (1). However, once interaction terms among “Prejudice”, “White” and “Republican” are included, the effect of “Prejudice” turns out to be positive for white Republicans. The coefficient of “White×Republican×Prejudice” is significantly positive in columns (2) to (4) with different covariates controlled. To see the effects accurately, the predicted willingness to vote for Obama ($Y^*$) for white Republicans in column (2) is:

$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.092 - 2.034 + 3.715p - 2.93p - 0.486p + 0.457p + 1.359 + 0.037y,$$

where $\tilde{\beta}_0$ is a non-identified intercept.
If a white Republican has a strongly negative feeling about blacks (p = 1), her/his willingness to vote for Obama ironically increases by 0.756 (= 3.715 - 2.93 - 0.486 + 0.457), compared to a white Republican with a strongly positive feeling about blacks (p = 0). Interestingly, white Republicans with higher prejudice are more willing to vote for Obama. Since this effect is relative to that of “ThmLiberal”, it implies that the size of the effect of prejudice is about three quarters of the effect of the feeling toward Liberals. On the other hand, the predicted willingness for white Democrats to vote for Obama can be written as:

$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.092 - 0.486p + 0.457p + 0.037y.$$

If a white Democrat has a strongly negative feeling about blacks (p = 1), then her/his willingness to vote for Obama decreases by 0.029, compared to a white Democrat having a strongly positive feeling about blacks (p = 0). However, the size of the effect is almost zero and none of the coefficients for white Democrats are significant.

After controlling for income level in column (3), the predicted willingness to vote for Obama for white Republicans is

$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 1.912 + 3.495p - 2.476p - 0.824p + 0.307p + 1.062 + 0.029y.$$

If a white Republican changes her/his prejudice level from zero to one, then her/his willingness to vote for Obama increases by 0.502. The predicted willingness to vote for Obama for white Democrats can be written as:

$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 0.824p + 0.307p + 0.029y.$$

If a white Democrat changes her/his prejudice level from zero to one, then her/his willingness to vote for Obama decreases by 0.517, but still none of the coefficients for white Democrats are significant.
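The prejudice effects quoted above are simply the sums of the coefficients that multiply p in each predicted-willingness equation. A small sketch makes the arithmetic reproducible; the point estimates are transcribed from columns (2) and (3) of Table 3, while the dictionary keys and the function name `prejudice_effect` are labels introduced here for illustration only.

```python
# Coefficients on terms involving Prejudice, from Table 3 columns (2) and (3).
COLUMN_2 = {
    "wht_rep_prej": 3.715,   # White x Republican x Prejudice
    "prej_rep": -2.930,      # Prejudice x Republican
    "prej_white": -0.486,    # Prejudice x White
    "prejudice": 0.457,      # Prejudice
}
COLUMN_3 = {
    "wht_rep_prej": 3.495,
    "prej_rep": -2.476,
    "prej_white": -0.824,
    "prejudice": 0.307,
}


def prejudice_effect(coef, republican):
    """Change in predicted willingness for a white voter as p goes from 0 to 1."""
    effect = coef["prejudice"] + coef["prej_white"]
    if republican:
        effect += coef["wht_rep_prej"] + coef["prej_rep"]
    return effect


for label, coef in (("column (2)", COLUMN_2), ("column (3)", COLUMN_3)):
    rep = prejudice_effect(coef, republican=True)
    dem = prejudice_effect(coef, republican=False)
    print(f"{label}: white Republicans {rep:+.3f}, white Democrats {dem:+.3f}")
```

Run as is, this reproduces the figures in the text: +0.756 and -0.029 for column (2), and +0.502 and -0.517 for column (3). These are effects on the latent index relative to the special regressor, not changes in voting probability.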
Given a level of prejudice p, for whites the predicted willingness to vote for Obama can be written as:

$$E(Y^* \mid White = 1, Rep = r, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.273 - 1.912r + 3.495rp - 2.476rp - 0.824p + 0.307p + 1.062r + 0.029y.$$

If party affiliation changes from Democrat to Republican, willingness to vote for Obama changes by $-0.85 + 1.019p$, which implies that the effect of Republican changes from negative to positive as the level of prejudice increases (the sign switches at about p = 0.83).

For white Republicans, this positive prejudice effect is highly counterintuitive. It could be seen as evidence of our second argument about white Republicans' voting behaviour: the more prejudice white Republicans had, the more likely they were to vote for Obama in order to avoid being judged as racist. Since it was the first election where a black candidate ran for president, white Republicans might have been under strong pressure, since they were known to have a high level of prejudice ‘as a group’. If Obama were not elected, then it could have been attributed to white Republicans and they would easily have been judged as racist. They might have struggled between party identity and being blamed for being racist, and this might have pushed them to vote for Obama reluctantly in order to escape the stigma of racism (Steele, 2008). Thus, prejudice ultimately played a crucial role in reducing the negative effect of white Republicans.

On the other hand, for white Democrats prejudice seems to have affected their voting decisions differently. Since they are known to be favourable to blacks ‘as a group’, they would not be blamed even if Obama were not elected. Thus, the negative prejudice might affect their decision, subconsciously leading them not to vote for Obama. However, this effect is shown to be insignificant. As other variables, such as “College”, “Over50”, and regional variables, are controlled for, the effect of “Prejudice” for white Republicans gradually decreases.
Once regional variables are controlled for, all coefficients become insignificant. This may be because the effect of “Prejudice” for whites by party affiliation differs by region. To address this issue, we estimate the same model used in column (2) for each region, i.e. “Northeast”, “North Central”, “South”, and “West”, and the results are presented in Table 4. In Table 4, the effects of “Prejudice” for white Democrats and white Republicans are shown to be insignificant, except in the South. The predicted willingness for white Republicans in the South to vote for Obama is

$$E(Y^* \mid White = 1, Rep = 1, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 3.788 + 6.509p - 4.46p - 1.313p + 0.73p + 2.228 + 0.124y.$$

If a white Republican changes her/his prejudice level from zero to one, then her/his willingness to vote for Obama increases by 1.466. On the other hand, the predicted willingness for white Democrats in the South to vote for Obama is

$$E(Y^* \mid White = 1, Rep = 0, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 1.313p + 0.73p + 0.124y.$$

If a white Democrat changes her/his prejudice level from zero to one, then her/his willingness to vote for Obama decreases by 0.583. However, none of the coefficients for white Democrats are significant. In the South, the effect of “Prejudice” for white Republicans is strongly positive and higher than its national level, whereas the effect of “Prejudice” for white Democrats is shown to be negative. This implies that white Republicans in the South are more likely to vote for Obama as they have a higher level of prejudice, and it makes sense in that they might be under much pressure from the label ‘South’, in addition to the label ‘Republican’, both of which are known to be associated with a high level of prejudice.
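As a check on the South figures above, the terms in p can be collected by hand from the two equations. Subtracting the white Democrat equation from the white Republican equation also gives the party gap at a given prejudice level; the symbol $\Delta(p)$ is notation introduced here for that gap and does not appear elsewhere in the paper.

```latex
\begin{align*}
\text{White Republicans, } p: 0 \to 1: \quad & 6.509 - 4.46 - 1.313 + 0.73 = 1.466, \\
\text{White Democrats, } p: 0 \to 1: \quad & -1.313 + 0.73 = -0.583, \\
\Delta(p) \equiv E(Y^* \mid Rep = 1, p) - E(Y^* \mid Rep = 0, p)
  &= (-3.788 + 2.228) + (6.509 - 4.46)\,p \\
  &= -1.56 + 2.049\,p, \\
\Delta(p) = 0 \iff p &= 1.56 / 2.049 \approx 0.76 .
\end{align*}
```

So, in the South, the Republican effect on the latent index is negative for prejudice levels below roughly 0.76 and positive above it.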
Given the level of prejudice p, for whites in the South the predicted willingness to vote for Obama can be written as:

$$E(Y^* \mid White = 1, Rep = r, Prejud = p, Year12 = y, W = w) = \tilde{\beta}_0 + w + 0.513 - 3.788r + 6.509rp - 4.46rp - 1.313p + 0.73p + 2.228r + 0.124y.$$

If party affiliation changes from Democrat to Republican, then willingness to vote for Obama changes by $-1.56 + 2.049p$, which implies that if the prejudice level is higher than 0.76, then the effect of “Republican” becomes positive. In other regions, the conclusion does not differ much from that on the national level, but we cannot confirm this conclusion because of the insignificant coefficients. These insignificant results might be due to the small sample sizes.

Table 4. Results Using LBS for Regions

Variable          N.Central         Northeast         South             West
White             -0.259            -0.898            0.513             -0.023
                  (-1.25, 1.02)     (-2.36, 0.13)     (-0.47, 1.65)     (-1.26, 1.29)
Wht×Rep           0.411             1.419             -3.788*           -2.303
                  (-3.68, 4.56)     (-5.4, 7.04)      (-4.95, -0.48)    (-4.21, 0.42)
Wht×Rep×Prej      -0.478            -3.046            6.509*            3.81
                  (-8.18, 6.35)     (-13.5, 8.93)     (1.31, 8.54)      (-1.04, 7.16)
Prej×Rep          0.587             2.217             -4.46*            -3.432
                  (-6.34, 7.97)     (-9.37, 12.8)     (-5.67, -0.08)    (-6.22, 0.44)
Prej×White        -0.12             1.263             -1.313            0.048
                  (-2.13, 2.00)     (-0.72, 4.09)     (-3.37, 0.37)     (-2.37, 2.38)
Prejudice         0.281             0.206             0.73              0.338
                  (-0.43, 0.75)     (-0.62, 0.77)     (-0.03, 0.86)     (-0.07, 0.68)
Republican        -0.798            -1.251            2.228             1.704
                  (-5.22, 3.31)     (-6.97, 5.85)     (-0.44, 2.93)     (-0.62, 3.44)
Year2012          -0.023            -0.051            0.124             -0.093
                  (-0.11, 0.15)     (-0.24, 0.08)     (-0.03, 0.21)     (-0.23, 0.08)
Sample Size       550               406               1366              764

4.4 Comparison to MLE

For the sake of comparison, estimation results of MLE for both the selection and outcome equations are presented in Table 5, along with the results of LBS. We estimate two different MLEs: the conventional MLE with homoskedastic errors (MLE_ho) and another MLE with an (arbitrary) known form of heteroskedastic errors (MLE_he), in which $SD(u \mid Z)$ is a linear index in $\tilde{Z} \equiv (Prejudice, Republican, Prejudice \times Republican)'$.
Since we use probit for the D equation estimation of LBS, the results are almost the same as the selection-equation estimates in the MLE: the bottom row shows that the null hypothesis of zero correlation between the two errors cannot be rejected, so the selection-equation estimates in MLE are almost the same as the probit for the D equation. To simplify the comparison, we present the MLE estimates of $\tilde{\beta} \equiv \beta_x/\beta_w$, i.e. coefficients normalized by the slope of the special regressor. The 90% asymptotic statistical significance is shown for MLE with ‘*’ along with the 90% confidence interval (CI).

In MLE_ho, the first column, presenting results for the D equation, shows that the effect of “Republican” on voting for whites is $0.849 - 0.638p$. This implies that white Republicans are more likely to participate in voting, but this effect decreases as they have a higher level of prejudice. Prejudice turns out to have a negative effect on voting for whites. Applying probit to the D equation and doing the likelihood ratio test for zero slopes for the four prejudice variables, we reject the null with p-value 0.00: prejudice does matter in the decision to vote or not to vote. As expected, “NoCareWhoWin” has a strong effect on voting and its effect differs significantly between the 2008 and 2012 data; ironically, people who did not care who would win the presidential election are more likely to participate in voting in 2012 (its marginal effect in 2012 turns out to be positive).

Turning to $\tilde{\beta}$ in MLE_ho, the signs of the significant coefficients match in MLE_ho and LBS, but the sizes of the coefficients differ from LBS. “White×Republican”, “White×Republican×Prejudice”, and “Prejudice×Republican” are significant in both MLE_ho and LBS, and “White”, “Prejudice×White”, and “Republican” are insignificant in both MLE_ho and LBS. However, “Prejudice” is substantially different, which might be due to the presence of heteroskedastic errors. As mentioned in the simulation section, if the SD of the error term u is a function of regressors, then the identified coefficients of MLE are biased.

Table 5. Comparison Results of MLE

Variable MLE_ho in D Intercept White White Rep 2.329 Prejudice Wht Prejudice -0.027 (-0.79, 1.32) (-1.76, 1.70) 3.527 -5.049 4.411 8.071 -5.465 -1.504 (-2.44, 1.31) (-4.56, 1.55) -2.678 -1.274 2.199 -0.014 -2.535 -4.215 4.301 -2.913 -0.801 (-0.07, 0.20) 3.532 -5.055 4.406 -0.587 -2.465 -2.676 -1.274 -7.719 Y_e Y_e 0.494 0.092 2.006 (-0.44, 0.97) -2.604 (-9.44, -6.0) 14.56 4.911 (11.7, 17.4) -6.109 -2.061 -1.933 -1.408 -0.040 -0.001 (-0.07, 0.06) 1.000 2.964 (1.51, 2.24) (1.92, 4.01) 0.008 -0.014 (-0.32, 0.33) (-0.09, 0.07) 0.037 (-0.08, 0.10) (1.92, 2.35) 1.876 1.359 (-0.28, 2.03) 2.132 (1.92, 2.35) 0.457 (-0.13, 0.82) (-1.23, 0.99) -0.004 -0.486 (-2.05, 0.38) (-8.3, -0.05) -0.120 -2.930 (-3.92, -0.14) (-11.3, -0.11) -4.176 3.715 (0.69, 5.18) (-6.71, -5.51) -5.728 -2.034 (-2.93, -0.12) (-1.14, -0.85) 2.132 ThmLiberal (-1.56, 4.49) LBS -0.995 (-1.14, -0.85) NCWW Y12 (-1.88, 2.43) (-1.43, -1.12) -0.995 5.946 1.463 (-3.8, -1.56) 0.035 in Y 0.279 (-3.56, -1.37) 0.663 u (4.02, 7.88) (-4.43, 3.26) -2.247 = (1.73, 2.91) (2.61, 6.2) (-1.52, 4.01) 0.065 2.323 (-6.01, -4.1) (-6.44, -1.99) 1.244 in D (3.01, 4.05) (2.18, 13.96) -0.565 -2.475 Y_e (-8.20, -1.32) (-10.1, -0.77) (-1.43, -1.12) NoCareWW -4.756 (1.35, 7.47) (-4.50, -0.86) Year2012 4.126 0.267 (-3.55, -1.40) Republican in Y (2.89, 5.36) (-9.01, -1.09) Prejudice Rep u (1.74, 2.92) (1.19, 5.86) Wht Rep Prej = MLE_he 1.000

We examine an arbitrary form of heteroskedastic errors in MLE which is a function of “Prejudice” and “Republican”, and present its results in the next three columns of Table 5, denoted by MLE_he. When the specific form of heteroskedastic errors is allowed, MLE's results for $\tilde{\beta}$ change considerably. Based on MLE's results, if a white Republican changes her/his prejudice level from zero to one, then her/his willingness to vote for Obama is shown to decrease by 1.66 in MLE_ho, but by 0.491 in MLE_he.
Thus, we cannot eliminate the possibility of inconsistency of MLE due to heteroskedasticity. Additionally, the zero correlation coefficient in both MLEs, which indicates no sample-selection problem, contrasts with the political science literature.

Did the prejudice against blacks play a role in the two presidential elections? We conclude yes, because the variables interacting with “Prejudice” are found to be significant in most cases. “Prejudice” did not have a significant effect for non-whites and white Democrats, but it did for white Republicans. They were less likely to vote for Obama, but this negative effect weakened as their level of prejudice rose. This effect is also found to be strong in the South, which is known to be unfavourable to blacks. The results might be interpreted as follows. White Republicans might have been stigmatized as being prejudiced and longed for ways to disprove the stigma; thus, on the margin, they might have voted more for Obama to avoid the stigma of racism. White Democrats, however, might have been less stigmatized and thus under no pressure to hide their prejudice; the prejudice would then have affected their voting decision negatively, but this effect is found to be insignificant.

5 Conclusions

In this paper, we proposed a new semiparametric estimator for binary-outcome selection models. Unlike the MLE, our estimator does not require any distributional assumption and allows heteroskedasticity with a linear index form. Our estimator, called “LBS” (Lewbelian Binary-outcome Selection estimator), is a multi-stage estimator that needs a preliminary estimator for the selection equation as well as nonparametric estimates of some conditional means of regressors. Unlike most parametric and semiparametric estimators for this problem, our estimator for the outcome equation has a closed-form expression and therefore does not require numerical optimization.
LBS, however, needs a regressor included in the selection equation but excluded from the outcome equation, and a special regressor in the outcome equation à la Lewbel (2000); this explains “Lewbelian” in the term LBS.

We applied the LBS to US presidential election data in 2008 and 2012, where Obama won. Our main empirical focus was whether there were negative prejudice components against blacks in the elections, which was suggested by both the literature and our preliminary data analysis. When we controlled for the sample-selection problem, our results showed that white Republicans voted more on the margin to avoid the internal stigma of thinking themselves racists, as they have a high level of prejudice, while white Democrats were not affected, or if anything negatively affected, by the prejudice they had. This pattern was shown to be strong and significant in the South.

References

Ahn, H. and J. L. Powell, 1993, Semiparametric estimation of censored selection models with a nonparametric selection mechanism, Journal of Econometrics, 58, 3-29.

Bertrand, M. and S. Mullainathan, 2004, Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination, American Economic Review, 94, 991-1013.

Charles, K. K. and J. Guryan, 2008, Prejudice and Wages: An Empirical Assessment of Becker's The Economics of Discrimination, Journal of Political Economy, 116, 773-809.

Chen, S., 1999, Distribution-free estimation of the random coefficient dummy endogenous variable model, Journal of Econometrics, 91, 171-199.

Chen, X., O. Linton, and I. Van Keilegom, 2003, Estimation of semiparametric models when the criterion function is not smooth, Econometrica, 71, 1591-1608.

Chen, S. and Y. Zhou, 2010, Semiparametric and nonparametric estimation of sample selection models under symmetry, Journal of Econometrics, 157, 143-150.

Coker, D., 2003, Foreword: Addressing the real world of racial injustice in the criminal justice system, Journal of Criminal Law and Criminology, 93, 827-880.

Cunningham, G. B., 2010, Understanding the under-representation of African American coaches: A multilevel perspective, Sport Management Review, 13, 395-406.

Donald, S. G., 1995, Two-step estimation of heteroscedastic sample selection models, Journal of Econometrics, 65, 347-380.

Dong, Y. and A. Lewbel, 2015, A simple estimator for binary choice models with endogenous regressors, Econometric Reviews, 34, 82-105.

Dovidio, J. F. and S. L. Gaertner, 2000, Aversive racism and selection decisions: 1989 and 1999, Psychological Science, 11, 315-319.

Ehrlinger, J., E. A. Plant, R. P. Eibach, C. J. Columb, J. L. Goplen, J. W. Kunstman, and D. A. Butz, 2011, How exposure to the Confederate flag affects willingness to vote for Barack Obama, Political Psychology, 32, 131-146.

Escanciano, J. C., D. Jacho-Chavez, and A. Lewbel, 2012, Identification and estimation of semiparametric two-step models, Unpublished manuscript.

Han, A. K., 1987, Non-parametric analysis of a generalized regression model, Journal of Econometrics, 35, 303-316.

Heckman, J. J., 1979, Sample selection bias as a specification error, Econometrica, 47, 153-161.

Heckman, J. J., 1998, Detecting discrimination, Journal of Economic Perspectives, 12, 101-116.

Heckman, J. J. and P. Siegelman, 1992, The Urban Institute audit studies: Their methods and findings, in Clear and Convincing Evidence: Measurement of Discrimination in America, edited by M. Fix and R. J. Struyk, Washington, DC: Urban Institute Press.

Hodson, G., J. F. Dovidio, and S. L. Gaertner, 2002, Processes in racial discrimination: Differential weighting of conflicting information, Personality and Social Psychology Bulletin, 28, 460-471.

Hodson, G., H. Hooper, J. F. Dovidio, and S. L. Gaertner, 2005, Aversive racism in Britain: The use of inadmissible evidence in legal decisions, European Journal of Social Psychology, 35, 437-448.

Hutchings, V. L., 2009, Change or more of the same? Evaluating racial attitudes in the Obama era, Public Opinion Quarterly, 73, 917-942.

Ichimura, H., 1993, Semiparametric least squares (SLS) and weighted SLS estimator of single-index models, Journal of Econometrics, 58, 71-120.

Kinder, D. R. and A. Dale-Riddle, 2011, The End of Race?, New Haven: Yale University Press.

Kinder, D. R. and T. Mendelberg, 1995, Cracks in American apartheid: The political impact of prejudice among desegregated Whites, Journal of Politics, 57, 402-424.

Klein, R. W. and R. H. Spady, 1993, An efficient semiparametric estimator for discrete choice models, Econometrica, 61, 387-421.

Klein, R. W., C. Shen, and F. Vella, 2015, Semiparametric selection models with binary outcomes, Journal of Econometrics, 185, 82-94.

Knowles, D. E., B. S. Lowery, and R. L. Schaumberg, 2001, Racial prejudice predicts opposition to Obama and his health care reform plan, Journal of Experimental Social Psychology, 46, 420-423.

Lewbel, A., 2000, Semiparametric qualitative response model estimation with unknown heteroscedasticity or instrumental variables, Journal of Econometrics, 97, 145-177.

Lewbel, A., 2007, Endogenous selection or treatment model estimation, Journal of Econometrics, 141, 777-806.

Lewis-Beck, M., C. Tien, and R. Nadeau, 2010, Obama's missed landslide: A racial cost?, Political Science and Politics, 43, 69-76.

Mas, A. and E. Moretti, 2009, Racial Bias in the 2008 Presidential Election, American Economic Review: Papers & Proceedings, 99, 323-329.

Newey, W., J. L. Powell, and J. R. Walker, 1990, Semiparametric estimation of selection models: Some empirical results, American Economic Review, 80, 324-328.

Piston, S., 2010, How explicit racial prejudice hurt Obama in the 2008 election, Political Behavior, 32, 431-451.

Powell, J. L., 1987, Semiparametric estimation of bivariate latent variable models, Department of Economics, University of Wisconsin-Madison, SSRI Working Paper No. 8704.

Powell, J. L., 2001, Semiparametric estimation of censored selection models, in Nonlinear Statistical Modeling, edited by C. Hsiao, K. Morimune, and J. L. Powell, Cambridge University Press.

Powell, J. L., J. H. Stock, and T. M. Stoker, 1989, Semiparametric estimation of index coefficients, Econometrica, 57, 1403-1430.

Raphael, S., M. A. Stoll, and H. J. Holzer, 2000, Are suburban firms more likely to discriminate against African Americans?, Journal of Urban Economics, 48, 485-508.

Redlawsk, D. P., C. J. Tolbert, and W. Franko, 2010, Voters, emotions, and race in 2008: Obama as the first black president, Political Research Quarterly, 63, 875-889.

Robinson, P., 1988, Root-N-consistent semiparametric regression, Econometrica, 56, 931-954.

Schaffner, B. F., 2011, Racial salience and the Obama vote, Political Psychology, 32, 963-88.

Sherman, R. P., 1993, The limiting distribution of the maximum rank correlation estimator, Econometrica, 61, 123-37.

Steele, S., 2008, Obama's post-racial promise, The Los Angeles Times.

Stoll, M. A., S. Raphael, and H. J. Holzer, 2004, Black job applicants and the hiring officer's race, Industrial and Labor Relations Review, 57, 267-287.

Tesler, M. and D. O. Sears, 2010, Obama's race: The 2008 election and the dream of a post-racial America, University of Chicago Press.

Thernstrom, A., 2008, Great black hope? The reality of president-elect Obama, National Review Online.

Appendix I

Table 1 Continued.
Simulation Results for LBS and MLE

            LBS                                MLE
            1/w               1/u               w/u               1/w
   N     Bias     RMSE     Bias     RMSE     Bias     RMSE     Bias     RMSE      FP

(5) ri = 1[ri > 0]
 500   -0.040    0.173   -0.008    0.177    0.012    0.211   -0.006    0.066   25.7%
1000   -0.015    0.119   -0.010    0.107   -0.010    0.144   -0.008    0.050    9.3%
2000   -0.004    0.082   -0.003    0.078   -0.015    0.098    0.002    0.033    1.3%

(6) = 0.05
 500   -0.005    0.139    0.062    0.204    0.085    0.274    0.009    0.073    4.0%
1000    0.005    0.093    0.022    0.130    0.056    0.189   -0.003    0.041    0.0%
2000    0.002    0.070    0.019    0.091    0.029    0.130    0.002    0.034    0.0%

(7) = 0.95
 500   -0.015    0.131   -0.141    0.201   -0.270    0.342   -0.006    0.078   19.7%
1000   -0.006    0.090   -0.149    0.176   -0.293    0.322   -0.002    0.048    6.3%
2000   -0.003    0.069   -0.166    0.177   -0.331    0.344   -0.001    0.034    0.7%

Figure 5: Voting Behavior for Hispanic and Other by Party