Estimating Candidate Positions in a Polarized Congress∗
Chris Tausanovitch†
Department of Political Science
UCLA
Christopher Warshaw‡
Department of Political Science
Massachusetts Institute of Technology
February 13, 2016
Word Count: 9,650
Abstract: In order to test theories of legislative polarization, representation, and accountability, it is crucial to have accurate measures of candidates’ policy positions. To address
this challenge, scholars have developed a variety of innovative measurement models based on
survey data, campaign finance contributions, and social networks. But there has not been
a comprehensive evaluation of these methods that examines their accuracy and usefulness
for testing important theories. In this paper, we find that each of these measurement models accurately estimates the political party of legislative candidates, but they do poorly at
distinguishing the ideological extremity of candidates within each party. As a result, they
fall short when it comes to facilitating empirical analysis of theories of representation and
spatial voting. More generally, our findings suggest that even with large amounts of data
and advanced statistical models it is very difficult to predict candidates’ policy positions.
This has profound implications for democratic governance.
∗
We are grateful for feedback about this project from Gregory Huber, Seth Hill, Howard Rosenthal,
Adam Bonica, Walter Stone, Boris Shor, Nolan McCarty, Jon Rogowski, Pablo Barbera, Adam Ramey and
participants at the 2015 American Political Science Association Conference.
†
Assistant Professor, Department of Political Science, UCLA, [email protected].
‡
Assistant Professor, Department of Political Science, Massachusetts Institute of Technology, [email protected].
Thanks to a large body of innovative work, political scientists have access to rigorous measures of the policy positions of incumbent legislators based on their roll call positions (e.g.,
Poole and Rosenthal, 2011; Clinton, Jackman, and Rivers, 2004; Groseclose, Levitt, and Snyder, 1999). This work has spawned a large literature on the causes and effects of legislators’
ideological extremity. However, the seminal work in the field was generally limited to incumbents, leaving scholars with little information on the policy positions of non-incumbents
running for Congress. Over the past decade, scholars have increasingly focused on developing comparable measures of incumbent and non-incumbent candidates’ policy positions
in order to test theories of political economy and representation. Are ideologically extreme
candidates punished at the ballot box (Black, 1948; Downs, 1957; Enelow and Hinich, 1984;
Hall, 2015)? How much does the ideological leaning of a district influence the ideological
positions of candidates that run for Congress (Ansolabehere, Snyder Jr, and Stewart, 2001)?
How much does the available pool of candidates affect the degree of legislative polarization
(Thomsen, 2014)? Does variation in electoral rules in primaries affect the spatial positions
of candidates that run for office (Kousser, Phillips, and Shor, 2015; Ahler, Citrin, and Lenz,
Forthcoming; Rogowski and Langella, 2014)? Do ideologically extreme candidates raise less
money than centrist candidates (Ensley, 2009)?
In order to address these important theoretical questions, scholars have developed a
variety of innovative measurement models to estimate candidates’ ideological positions. One
approach is to estimate candidate positions based on their political positions outside of
Congress, such as their roll call votes in state legislatures (Shor, Berry, and McCarty, 2010;
Shor and McCarty, 2011) or responses to surveys (Montagnes and Rogowski, 2014). Another
approach is to use data on the perceptions of voters (Aldrich and McKelvey, 1977; Hare et al.,
2014; Ramey, 2016) or experts (Joesten and Stone, 2014; Maestas, Buttice, and Stone, 2014;
Stone and Simas, 2010) about candidates’ ideological positions. A third approach is to
assume that some set of behavior by citizens or donors is based on their implicit perceptions
of candidates’ positions. For instance, Bonica (2013b, 2014) and Hall and Snyder (2015)
estimate the ideology of political figures using the composition of their campaign donors based
on the plausible assumption that donors give to candidates who are similar to themselves
politically. Following similar logic, Barberá (2015) estimates positions using the constellation
of individuals who follow a candidate on the social media service Twitter.
Despite their importance, little effort has been put into comparing these methods and
rigorously evaluating their accuracy and usefulness for testing theories of interest. In this
paper, we test the accuracy of six prominent measures of candidate positions by examining
their convergent validity (Adcock and Collier, 2001).1 In other words, we examine the relationship between each measure and candidates’ roll call behavior.2 Specifically, we examine
how well they predict legislators’ DW-Nominate scores from 2000 to 2012
within each party. We focus on this period because many empirical studies focus on recent
congresses, and these congresses may be particularly difficult to predict because they are so
polarized.
Our findings indicate that all six measures correctly classify candidates into the appropriate party. However, none of these measures provides accurate estimates of candidates’ roll
call positions within their party. The low within-party accuracy of these measures can be
seen by looking at prominent individual members of Congress. For instance, Republican congressman Dave Reichert’s DW-Nominate score places him among the most liberal members
of his party in 2010. However, measures of his spatial position based on survey respondents
(Ramey, 2016) and his campaign finance (CF) contributions (Bonica, 2014) place him in the
most conservative tercile of his party. Peter King’s roll call record also places him among
the most liberal members of his party. His Twitter score, however, places him in the more
conservative half of the Republican party. On the Democratic side, Henry Waxman’s DW-Nominate score places him among the most liberal group of Democrats. But his CF-Score
1
We also attempted to evaluate the validity of the measures of candidate ideology in Bond and Messing
(2015). However, the authors of this study were unable to share replication data due to Facebook’s privacy
policy.
2
This approach is supported by the validation strategy used by the underlying papers for these models.
Indeed, each of the underlying papers for these models uses roll call behavior as a benchmark for its results.
places him among the most conservative tercile of Democrats.
We find that none of the existing methods explain more than half of the variation in
incumbent House members’ contemporaneous DW-Nominate scores within each party. They
perform even more poorly at predicting the roll call behavior of non-incumbent candidates
that go on to win election and serve in Congress. Overall, these existing methods improve
only marginally on candidates’ party identification for predicting their roll call behavior in
Congress.3
The fact that sophisticated models fail (so far) to predict legislative behavior is an interesting finding in itself, as it bodes poorly for the ability of average citizens to predict the
behavior of the candidates they must choose between. Moreover, the modest within-party
relationship between these measures and candidates’ roll call positions is problematic because
most of the fundamental questions about political economy, polarization, and representation
that we wish to answer involve comparisons of candidates within their parties.
In the penultimate section of the paper, we show that the measurement error in the non-roll call based measures of candidate positions leads to inferential errors for two important
questions in political science. First, we compare changes in polarization over time across
DW-Nominate and other time-varying measures of candidate positions (NPAT-scores and
CF-scores). These measures tell dramatically different stories about the relative
changes in polarization over the past decade. The substantively different trends in polarization across models indicate that it is unlikely that these models are actually measuring
the same latent quantity. Moreover, the variation in polarization between CF-scores and
DW-Nominate calls into question the use of CF-scores to examine polarization outside
of Congress (e.g., Rogowski and Langella, 2014). Next, we examine how inferences about
the effect of legislators’ ideology on distributive spending vary across measures. In line with
Alexander, Berry, and Howell (2016), we show that legislators with extreme DW-Nominate
scores get less distributive spending. However, there is no significant relationship between
3
It is important to note that we focus our analysis on recent Congresses. It is possible that these measures
perform better in earlier, less polarized, Congresses.
other measures of ideology and distributive spending, which suggests that their usage as
a proxy for legislators’ roll call behavior could lead to incorrect inferences on fundamental
questions in political economy and the study of legislatures.
While the measures we evaluate in this paper perform poorly at predicting legislators’
roll call positions, they do have a number of other valuable uses. Because each of the
measures we evaluate accurately classifies candidates into the correct party, they could be
extremely valuable for contexts where partisanship is not readily available. For instance,
these measures could be used to impute the partisanship of candidates running in nonpartisan elections (de Benedictis-Kessner and Warshaw, 2015). They could also be used to
impute the partisanship of voters when survey data is unavailable (Hill and Huber, 2015).
In addition, the mismatch between these measures and legislators’ roll call behavior raises
a host of interesting questions. For example, the mismatch between survey respondents’
perceptions and candidates’ actual roll call positions suggests the need for new research
to determine whether legislators are strategically manipulating their positions in campaigns
(see, e.g., Cormack, 2015; Henderson, 2013). These measures also have a number of potential
applications for specific substantive questions that are unrelated to legislative behavior, such
as the campaign finance behavior of lawyers (Bonica and Sen, 2015) and physicians (Bonica,
Rosenthal, and Rothman, 2014).
The paper proceeds as follows. First, we discuss background theories and literature on
the task of estimating candidate positions. Next, we discuss the benchmark model, DW-Nominate, that we use to evaluate measures of candidate positions. We also discuss the
measurement models of candidate positions that we evaluate in this paper in more detail.
In the following section, we discuss our validation strategy. Then, we evaluate the various
measures of candidate positions using a variety of different approaches. The penultimate
section examines variation in substantive inferences across different measures of candidate
positions for the study of polarization and distributive spending. The final section briefly
concludes.
Background
Measuring the ideological preferences and behavior of political officeholders and candidates
is central to the study of American Politics. Most of the canonical work on measuring
candidates’ ideology has focused on incumbent legislators’ roll call behavior. Indeed, roll
call positions are the gold standard for measuring legislators’ political positions. However,
an important challenge is that roll call behavior is only available for incumbents. In order
to test theories of representation, polarization, and accountability, scholars need measures of
the ideological positions of both incumbents and non-incumbents. For example, in order to
examine whether polarization is increasing, it is important to know whether the ideological
positions of Democratic and Republican candidates in each district are diverging over time
(Ansolabehere, Snyder Jr, and Stewart, 2001). To examine theories of spatial voting, we need
to know whether voters are more likely to vote for the more spatially proximate candidate,
which requires measures of the ideological positions of both Democratic and Republican
candidates (e.g., Jessee, 2012; Joesten and Stone, 2014; Shor and Rogowski, 2015).
In order to address these important substantive questions, a large body of methodological work has been done in recent years to measure the ideological positions of incumbents
and non-incumbents on a common scale. For instance, Bonica (2013b, 2014) estimates the
ideology of both incumbent and non-incumbent candidates based on the composition of their
campaign donors. The primary validation metric for models of candidates’ spatial positions
is typically the proportion of the variation in legislators’ roll call behavior that they explain
(e.g., Barberá 2015, Bonica 2014, Hare et al. 2014, Joesten and Stone 2014). Indeed, several
papers highlight that they successfully predict 90% or more of the variation in incumbent legislators’ ideal points on roll call votes. However, there are two problems with this validation
strategy.
First, a good measure of candidate ideology should also be able to outperform measures
that are much simpler and more parsimonious. In recent years, over 90% of the variation
in DW-Nominate scores can be predicted by the party identification of the legislator. Polarization in Congress has been on the rise since the 1970s (Poole and Rosenthal, 2011). As
the parties have become more extreme and more homogeneous, across-party prediction of
DW-Nominate scores has become easier and within-party prediction more difficult. Thus,
many measures are able to report very high correlations with DW-Nominate because they
have very high correlations with party ID.
The problem with such a measure is not just that it might as well be replaced with party
identification. Understanding within-party variation in preferences is vitally important for
understanding polarization and spatial voting. Polarization is a process by which extreme
legislators are replacing moderates within each party. In order to identify instances of this
process, we need measures of preferences that can accurately identify which candidates in
nomination contests are more extreme than other candidates within their party. Likewise,
spatial voting involves judgments about which candidates are closer in some sense to particular voters, which requires accurate measures of the spatial location of candidates within
their party.
Second, existing papers generally focus on their ability to estimate the positions of incumbent legislators. But we already have good estimates of incumbent legislators’ behavior
based on their roll call positions. Thus, the most common use of the estimates from the
recent wave of models is to provide estimates of non-incumbents’ spatial positions. Few of
the existing papers validate their measures of non-incumbents’ positions against their future
roll call positions.4
There are a variety of reasons to think that pre-election measures of candidates’ ideology
may not be accurate predictors of their future roll call records. Although candidates make
commitments and promises during their campaigns, these commitments are rarely enforceable (Alesina, 1988). Incumbent legislators are widely believed to be in a highly advantageous
position to win reelection (Gelman and King, 1990; Lee, Moretti, and Butler, 2004), so punishing legislators for unkept promises may be difficult, and may even risk electing a legislator from the opposite party. The quirks of political geography are also important in shaping candidates’ support bases. Social media commentators, donors, and the public are limited in their choice of viable candidates to support in any particular district. Information gleaned from these relationships may reflect the limited choice set rather than true similarity. As a result, we should not assume that measures based on these sources will ultimately reflect actual legislative behavior.
4
An exception is Bonica (2014, 371), which validates campaign-finance (CF) scores against candidates’ future DW-Nominate scores. Also, Bonica (2013b, 298-299) validates CF-scores for non-incumbents against the same candidates’ future CF-scores, but does not validate them against future roll call behavior.
Benchmark Model of Legislator Policy Positions
As our benchmark metric, we use candidates’ DW-Nominate scores (Poole and Rosenthal,
2011), which are estimated using actual roll call votes cast in Congress. DW-Nominate is a
measure of legislators’ induced preferences: their preferred choices given the combination of
their own personal beliefs and the incentives they face. DW-Nominate is the gold standard
for measuring legislator preferences, broadly construed, because legislative action, not public
communication, is what is implicated in most theories of spatial voting, representation, and
polarization. If elections are a meaningful constraint, they must constrain what legislators
do, not just what legislators say during the campaign.5 Moreover, Nominate scores are also
the benchmark used by nearly all of the existing measurement models of candidate positions
that we assess below (Barberá 2015, 82, Bonica 2014, 370-371, Hare et al. 2014, 769-770,
Joesten and Stone 2014, 745).
A legislator’s Nominate score is a measure of the ideal point, xi , of legislator i. In
considering a bill, j, legislators choose the outcome that gives them greater utility: either
the status quo, aj or the policy that would be enacted if the bill were passed, bj . Their
utility for any outcome is a function of the distance between their ideal point, xi , and the
5
Of course, it need not be the case that a legislator’s DW-Nominate score agrees with the image that she
tries to portray of herself, or her own “true” preferences. Indeed, there is research showing that legislators
often try to give an impression of themselves that does not reflect their voting records (Cormack, 2015;
Henderson, 2013).
outcome in question, aj or bj, plus a random error that represents idiosyncratic or random
features of the legislator’s utility. If the status quo point is ‘closer’ to what the legislator
wants, then she votes nay. If the bill is closer, she votes yea. The only exception is if the
random shock to her utility is large enough to make her prefer the more distant option. This
will be more likely when the legislator is close to indifferent between the two options. If we
make a few simplifying assumptions, we can write the probability that a legislator votes in
favor of a bill (yea) as follows:6

P(y_{ij} = \mathrm{Yea}) = P\big((x_i - a_j)^2 - (x_i - b_j)^2 + \epsilon_{ij} > 0\big) \quad (1)
The probability of a vote against (nay) is one minus the probability of a vote in favor.
The likelihood of the model is simply the product of the likelihoods of every vote. This
model is often referred to as the quadratic utility item response model. The “ideal point”
summarizes a legislator’s preferences in the sense that legislators will tend to prefer bills that
are closer to their ideal points on average. Observing only the matrix y of vote choices, we can estimate the latent x’s that underlie those choices.
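To make the quadratic-utility choice probability concrete, the following is a minimal numerical sketch. It is illustrative only, not the Nominate estimator itself: we assume a standard normal error (Nominate’s functional form differs), and the function name p_yea and the sigma argument are our own.

```python
import math

def p_yea(x_i, a_j, b_j, sigma=1.0):
    """P(legislator i votes yea on bill j) under quadratic spatial utility.

    Yea iff the bill is closer than the status quo, up to a random shock:
    (x_i - a_j)^2 - (x_i - b_j)^2 + eps > 0, with eps ~ N(0, sigma^2)."""
    delta = (x_i - a_j) ** 2 - (x_i - b_j) ** 2
    return 0.5 * (1.0 + math.erf(delta / (sigma * math.sqrt(2.0))))

# A legislator midway between status quo (a_j) and bill (b_j) is indifferent:
assert abs(p_yea(0.0, -1.0, 1.0) - 0.5) < 1e-12
# When the bill is much closer than the status quo, a yea vote is near-certain:
assert p_yea(0.9, -1.0, 1.0) > 0.95
```

As the text notes, misclassification is most likely for near-indifferent legislators, where the probability is close to one half.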
Alternative Measures of Candidate Positions
In recent years, scholars have developed three broad groups of measurement models to estimate the spatial locations of both incumbents and non-incumbents based on some set of
information other than roll call votes in Congress. These measurement models all assume
that some observed behavior is generated by unobserved, latent preferences. Thanks to the
finding by Poole and Rosenthal that in recent congresses one-dimensional summaries of voting are almost as good as much higher dimensional summaries, all of the measures under
6
Poole and Rosenthal (2011) put flesh on this model by assuming a normal curve as the shape of the utility
functions, and errors ij that are logistically distributed. A much simpler formula results if we use quadratic
utility with normal errors (Shor and McCarty, 2011). Clinton, Jackman, and Rivers (2004) show that the
results of this model are almost identical to the results of Nominate.
study are unidimensional.7 So in each case, the ideology of a given individual is summarized
by a single number, which we will denote as the variable xi where i indexes candidates.
The choices in question often have features that are taken into account as well; choices will be indexed by j. In order to compare the models, we harmonize the notation, departing where necessary from that used by the original authors.
Models of Ideology Based on Political Positions Outside Congress
One potential approach for measuring the ideology of candidates is to use information from
their political positions outside of Congress. For instance, we could estimate the ideology
of state legislators that run for Congress based on their roll call votes in state legislatures.
Shor and McCarty (2011) use a spatial utility model similar to equation 1 to estimate state
legislators’ ideal points based on their roll call voting records from the mid-1990s to 2012.
They bridge together the ideal points of state legislators in different states using surveys of
legislators from 1996 to 2009. In total, they estimate the positions of 18,000 state legislators.8
Of course, only a fraction of these state legislators become candidates for Congress, and even
fewer win election to Congress. Moreover, a changing constituency in Congress may lead
candidates to adapt their behavior (Stratmann, 2000).
Another approach is to use only candidates’ responses to questionnaires about their
positions. The most widely used questionnaire is the National Political Awareness Test
(NPAT) survey conducted by Project Vote Smart. This is the survey that Shor and McCarty
(2011) use to link legislators from different states. Ansolabehere, Snyder Jr, and Stewart
(2001) use factor analysis to estimate candidates’ spatial positions based on the NPAT survey.
More recently, Montagnes and Rogowski (2014), Shor and Rogowski (2015), and others use a
7
Technically, the DW-Nominate and W-Nominate scores are two dimensional, but almost all the information in recent congresses is supplied by the first dimension. All of the other measures are explicitly
one-dimensional.
8
It is important to note that these measures are important in their own right for the study of polarization,
representation, and accountability in state legislatures, regardless of their ability to predict congressional
candidates’ positions.
spatial utility model similar to equation 1 to estimate candidates’ ideal points based on their
NPAT responses. These estimates have been widely used in the applied, empirical literature
for studies on polarization, spatial voting, elections, and other topics.
Models of Ideology Based on Perceptions of Candidate Positions
Rather than using roll call votes, another approach is to estimate candidate positions from
survey respondents’ or experts’ explicit perceptions of candidates’ ideological positions. This
approach has the benefit of providing estimates for candidates that did not serve in the state
legislature. Indeed, conceptually one could imagine survey respondents or experts rating
thousands of candidates for all levels of office.
Stone and Simas (2010) and Joesten and Stone (2014) pioneered the use of experts to
rate candidates’ ideological positions. These studies survey a sample of state legislators
and party convention delegates and ask them to place their congressional candidates on a
7-point scale.9 These “expert informants” can label candidates as either very liberal, liberal,
somewhat liberal, moderate, somewhat conservative, conservative, or very conservative. The
resulting scores are adjusted by subtracting/adding the average difference between partisans
and independents. Averaging responses is a sensible approach if we assume that errors in
perceptions are symmetrically distributed.
Although Joesten and Stone (2014) correct for the average “bias” from partisanship, they
do not attempt to correct for the fact that individuals often use scales differently. For
instance, some individuals may think that “very liberal” is an appropriate term for anyone
who is not a Republican whereas others may reserve the term for revolutionary socialists.
When individuals are asked to rate a variety of politicians and political entities, their own
tendencies in the use of the scale can be accounted for. This observation led Aldrich and
McKelvey (1977) to the following model:
9
Maestas, Buttice, and Stone (2014) improve on the measurement model in Joesten and Stone (2014).
However, we will focus here on Joesten and Stone (2014) for simplicity.
\tilde{x}_{ij} = w_j (x_i - c_j) + \epsilon_{ij} \quad (2)

x̃ij is person j’s placement of candidate i. wj and cj are coefficients that capture person
j’s individual use of the scale, which can be estimated because each person places multiple
candidates and political entities. xi is again the actual, latent position or preferences of
candidate i. Hare et al. (2014) and Ramey (2016) use a Bayesian variant of this model to
estimate candidate locations based on the perceptions of survey respondents.10
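The logic of the Aldrich-McKelvey correction can be illustrated with a toy alternating-least-squares fit of equation 2. This is a sketch under simplifying assumptions, not the Bayesian implementation used by Hare et al. (2014) or Ramey (2016); the function name, identification choices, and iteration count are ours.

```python
import numpy as np

def aldrich_mckelvey(placements, n_iter=50):
    """Toy alternating-least-squares fit of x~_ij = w_j (x_i - c_j) + e_ij.

    placements: (n_candidates, n_raters) array of raw scale placements.
    Returns candidate positions, standardized each iteration (the model is
    identified only up to an affine transformation, and up to sign)."""
    n_cand, _ = placements.shape
    x = placements.mean(axis=1)                  # crude initial positions
    for _ in range(n_iter):
        x = (x - x.mean()) / x.std()             # pin down location and scale
        # Per-rater regression: placement_ij = intercept_j + slope_j * x_i,
        # where intercept_j = -w_j c_j and slope_j = w_j.
        X = np.column_stack([np.ones(n_cand), x])
        coef, *_ = np.linalg.lstsq(X, placements, rcond=None)
        intercepts, slopes = coef[0], coef[1]
        # Least-squares update of each candidate's position given the raters.
        x = ((placements - intercepts) * slopes).sum(axis=1) / (slopes**2).sum()
    return (x - x.mean()) / x.std()

# Recover simulated positions despite rater-specific shifts and stretches:
rng = np.random.default_rng(0)
true_x = rng.normal(size=40)
w, c = rng.uniform(0.5, 1.5, size=15), rng.normal(size=15)
obs = w * (true_x[:, None] - c) + 0.1 * rng.normal(size=(40, 15))
est = aldrich_mckelvey(obs)
assert abs(np.corrcoef(est, true_x)[0, 1]) > 0.9
```

The key point is that each rater’s shift and stretch parameters are estimable only because every rater places multiple candidates on the same scale.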
Models of Ideology Based on Spatial Models of Citizen Behavior
Another approach is to measure candidates’ ideology based on the idea that some set of
behavior by voters or citizens is driven by a spatial model which is a function of candidate positions. For instance, we could assume that citizens donate to spatially proximate
candidates. Likewise, we could assume that social network users follow spatially proximate
candidates on Facebook and Twitter.
In Barberá (2015), the choice of Twitter users whether or not to follow political candidates is assumed to be a function of the policy distance between the Twitter user and
the candidate.11 The Twitter user follows the candidate if the utility of doing so is greater
than some threshold, t, where utility is once again quadratic. Barberá uses a logistically distributed random error, which is very similar to the normal distribution. So the probability
that user j follows candidate i is:
P(y_{ij} = \mathrm{Follow}) = P\big(-(x_i - \theta_j)^2 + \epsilon_{ij} > t\big) \quad (3)
10
Ramey (2016) allows the variance of the error to have a candidate-specific component, and we follow this
specification. There are many possible extensions. For instance, Hare et al. (2014) allow the error variance
to have both a candidate-specific and a rater-specific component.
11
Twitter is a social media platform that allows users to send brief messages to other users who choose to
receive these messages or “follow” them.
In order to allow for arbitrary levels of sensitivity to this distance, Barberá (2015) adds a
scaling parameter, γ, as well as two different intercepts, recognizing that any given user can
only follow so many accounts, and that many candidates have limited name recognition and
thus few followers. αi captures candidate i’s overall popularity with users, and βj captures
user j’s propensity for following people on Twitter. These intercepts are arbitrarily scaled, so
we can replace our threshold t with an arbitrary fixed number, in this case 0. The following
specification results:
P(y_{ij} = \mathrm{Follow}) = P\big(\alpha_i + \beta_j - \gamma (x_i - \theta_j)^2 + \epsilon_{ij} > 0\big) \quad (4)
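Because the error in equation 4 is logistically distributed, the follow probability reduces to a logit in the linear predictor. A minimal sketch (the function name and example values are hypothetical):

```python
import math

def p_follow(alpha_i, beta_j, gamma, x_i, theta_j):
    """Follow probability under equation 4 with logistic errors:
    a logit of popularity + activity - scaled ideological distance."""
    eta = alpha_i + beta_j - gamma * (x_i - theta_j) ** 2
    return 1.0 / (1.0 + math.exp(-eta))

# Holding popularity and activity fixed, ideological proximity raises the
# probability that user j follows candidate i:
near = p_follow(alpha_i=0.0, beta_j=0.0, gamma=1.0, x_i=0.2, theta_j=0.0)
far = p_follow(alpha_i=0.0, beta_j=0.0, gamma=1.0, x_i=2.0, theta_j=0.0)
assert near > far
```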
Bonica (2014) uses a similar model to estimate candidates’ ideology based on their campaign contributors. The main difference between Barberá (2015)’s model and the model in
Bonica (2014) is that when it comes to campaign contributions, donors must choose both
who to give to and how much to give. Bonica recodes all contribution amounts as categories
of $100s of dollars, and uses correspondence analysis to recover ideal points.12
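For intuition, first-dimension correspondence analysis can be computed directly from an SVD of the standardized residual matrix. This generic sketch is not Bonica’s implementation, which recodes amounts into $100 categories and operates at vastly larger scale; the function name and toy data are ours.

```python
import numpy as np

def correspondence_scores(counts):
    """First-dimension correspondence analysis of a donors-by-candidates
    count matrix; returns one score per candidate (column)."""
    P = counts / counts.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)                  # row, column masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # standardized residuals
    U, d, Vt = np.linalg.svd(S, full_matrices=False)
    return Vt[0] / np.sqrt(c)                            # column coordinates

# Two donor blocs giving to disjoint candidate sets separate by sign:
donations = np.array([
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
])
s = correspondence_scores(donations)
assert s[0] * s[1] > 0 and s[2] * s[3] > 0 and s[0] * s[2] < 0
```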
Validation Approach
We use a multi-faceted approach to evaluate the various measures of candidate positions.
First, we evaluate how much of the variation in incumbents’ contemporaneous DW-Nominate
score that each measure explains. This is similar to the approach used to validate previous
measures of candidate ideology (e.g., Joesten and Stone, 2014; Hare et al., 2014). However,
we focus on the within-party explanatory power of each measure. We choose to focus on
DW-Nominate scores because they are a simple and accurate summary of the entire roll call
voting record. In Appendix B we show that our results are very similar if we instead focus
12
The correspondence analysis in Bonica (2014) is meant to approximate an IRT model similar to the one in
Barberá (2015). It builds off of an earlier paper, Bonica (2013b), which actually estimates such a model.
However, due to the very large size of the donation data, Bonica (2014) opts for this simpler method.
on fitting individual votes.
Next, we evaluate how much each measure improves upon a much simpler model that
simply assigns each candidate the mean ideal point of legislators in their party (xparty ).
To do this, we linearly map each non-roll call based measure of candidates’ positions into
Nominate’s space xj . Then, we calculate the average improvement for each measurement
model compared to a model that assumes each candidate takes the mean ideological position
of other candidates in their party. This can be thought of as an R^2 statistic, but based on
absolute rather than squared errors:

1 - \mathrm{mean}(|x_{\mathrm{Nominate}} - x_j|) \,/\, \mathrm{mean}(|x_{\mathrm{Nominate}} - x_{\mathrm{party}}|) \quad (5)
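A sketch of this statistic, assuming the measure has already been linearly mapped into Nominate’s space; the function name and toy data are illustrative:

```python
import numpy as np

def improvement_over_party(nominate, measure, party):
    """Equation 5: proportional reduction in mean absolute error relative
    to assigning every candidate their party's mean DW-Nominate score.
    `measure` must already be linearly mapped into Nominate's space."""
    nominate = np.asarray(nominate, dtype=float)
    measure = np.asarray(measure, dtype=float)
    party = np.asarray(party)
    party_mean = np.array([nominate[party == p].mean() for p in party])
    mae_measure = np.abs(nominate - measure).mean()
    mae_party = np.abs(nominate - party_mean).mean()
    return 1.0 - mae_measure / mae_party

# A perfect measure scores 1; the party-mean baseline itself scores 0.
nom = np.array([-0.5, -0.4, -0.3, 0.3, 0.4, 0.5])
party = np.array([0, 0, 0, 1, 1, 1])
baseline = np.array([-0.4] * 3 + [0.4] * 3)
assert abs(improvement_over_party(nom, nom, party) - 1.0) < 1e-12
assert abs(improvement_over_party(nom, baseline, party)) < 1e-12
```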
Finally, we evaluate the percentage of candidates that each method is able to correctly
classify into the proper tercile of their Nominate score for their party. This provides a
simple metric for the degree of confidence we should have in each model’s ability to roughly
differentiate between moderates and extremists in each party (see, e.g., Hall, 2015).
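The tercile-classification metric can be sketched as follows; for the within-party version, simply subset to one party before calling. The function name is ours.

```python
import numpy as np

def tercile_accuracy(nominate, measure):
    """Share of candidates a measure places in the correct tercile of
    their DW-Nominate score."""
    def terciles(v):
        v = np.asarray(v, dtype=float)
        cuts = np.quantile(v, [1 / 3, 2 / 3])
        return np.searchsorted(cuts, v, side="right")  # tercile index 0, 1, 2
    return (terciles(nominate) == terciles(measure)).mean()

# Any monotone transform of a measure preserves tercile membership:
x = np.linspace(-1.0, 1.0, 9)
assert tercile_accuracy(x, x ** 3) == 1.0
```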
None of these statistics is sufficient to show that the measure in question is useful for
measuring the spatial positions of non-incumbents. After all, our goal is not to obtain external measures of incumbents’ ideal points; it is to obtain external measures of non-incumbents’ ideal points in order to evaluate theories of polarization, spatial voting, and so on. So, next, we repeat these evaluations for non-incumbents who go on to win their election.
Thus, our final evaluation examines how well each measure of candidate position predicts
the future positions of non-incumbent House candidates who go on to win the election and
compile a legislative record. This is a more difficult test, but a crucial one for validating
these measures with respect to non-incumbents. Indeed, the most valuable use of non-roll
call based measures of candidate positions is to predict how non-incumbents will vote after
they take office.
Before we proceed to evaluate each of the estimates of candidate position, it is important
Table 1: Validation Statistics for Simulations of DW-Nominate in the U.S. House and Senate

                                                      DW-Nominate Simulations
Observations                                          1642
Perc. Classified into Correct Party                   0.999

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Perc. Variation Explained                             0.971
Perc. Correctly Classified (Terciles)                 0.866
Prop. improvement on party baseline                   0.923

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Perc. Variation Explained                             0.944
Perc. Correctly Classified (Terciles)                 0.837
Prop. improvement on party baseline                   0.923
to first examine how well we could possibly do. After all, if the roll call voting record itself
does not locate candidates very precisely, then we should not expect measures of candidate
positions to accurately predict DW-Nominate scores. In order to conduct this evaluation,
we randomly generate new DW-Nominate scores using the estimated scores and their bootstrapped standard errors, assuming that the scores are normally distributed. After generating
these scores, we examine all of the above statistics. We limit ourselves to the 110th, 111th,
and 112th sessions of Congress (2007-2012) in order to take account of the fact that recent
congresses may be particularly hard to predict because they are particularly polarized.
This exercise shows that DW-Nominate is very precisely measured. After 10,000 simulations, the simulated ideal points for Republicans explained 94% of the variance in the
actual scores, and 97% for Democrats. Between 84% and 87% of all legislators were
placed in the correct tercile within their party, and the simulated value reduced error by
92% over using the party mean. Not surprisingly, virtually every simulation classified each
legislator into the correct party. This is the upper limit of predictive accuracy for a measure
of candidate positions – we should not expect to do better in predicting a legislator’s roll
call voting behavior than we can do with the actual votes themselves.
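A parametric simulation of this kind can be sketched as follows; the Normal draws use each legislator's point estimate and bootstrapped standard error, as described above (function and variable names here are illustrative, not the paper's code):

```python
import random

def simulate_scores(estimates, std_errors, n_sims=10000, seed=1):
    """Draw simulated DW-Nominate scores for each legislator from a Normal
    distribution centered at the point estimate with the bootstrapped SE."""
    rng = random.Random(seed)
    return [[rng.gauss(est, se) for est, se in zip(estimates, std_errors)]
            for _ in range(n_sims)]

def mean_abs_error(estimates, sims):
    """Average absolute deviation of the simulated draws from the point
    estimates, a rough summary of how precisely the scores are measured."""
    total = sum(abs(s - e) for draw in sims for s, e in zip(draw, estimates))
    return total / (len(sims) * len(estimates))
```

Each simulated vector would then be run through the same validation statistics (variance explained, tercile classification, and so on) and the results averaged across the 10,000 draws.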
Data
In the next section, we evaluate six different models of candidates’ ideology (Table 2). Wherever possible, we use estimates that are supplied in the data from the corresponding paper
for each model. We compare each measure with DW-Nominate scores that are based on
92,186 roll call votes in Congress taken from 1789 to 2014, identifying the positions of 11,976
legislators.13
Table 2: Methods for estimating candidate preferences

Paper                        Data                               Statistical Model

Benchmark Model
Poole and Rosenthal (2011)   Congressional roll call votes      Spatial choice model (dynamic)

Models of Ideology based on Candidates' Political Positions Outside Congress
Shor and McCarty (2011)      State legislature roll call votes  Spatial choice model
Shor and Rogowski (2015)     NPAT Responses                     Spatial choice model

Models of Ideology based on Perceptions of Candidate Position
Ramey (2016)                 Survey respondent perceptions      Measurement error model
Joesten and Stone (2014)     Expert perceptions                 Party-adjusted average

Models of Ideology based on Spatial Model of Citizen Behavior
Barberá (2015)               Followers on Twitter               Spatial choice model
Bonica (2014)                Campaign contributions             Correspondence analysis
1. State Legislative Ideal Points: Shor and McCarty (2011) use a model similar to equation
1 to estimate state legislators’ ideal points based on their roll call voting records from
the mid-1990s to 2012. We downloaded Shor and McCarty’s data from the Dataverse
(Shor and McCarty, 2014), and manually matched the estimates of state legislators’
ideal points to the ICPSR numbers that Poole and Rosenthal use to index their
DW-Nominate scores.14
13
More recent years tend to have more votes than earlier years.
2. NPAT scores: Montagnes and Rogowski (2014) use a model similar to equation 1 to
estimate candidates’ ideal points based on their NPAT responses from the mid-1990s
to 2006. Jon Rogowski generously shared an expanded version of the data used in this
paper.
3. Aldrich-McKelvey estimates based on constituent perceptions: Based on a model similar to equation 2, Hare et al. (2014) and Ramey (2016) estimate the positions of
Congressional candidates using survey responses from the Cooperative Congressional
Election Study (CCES). In our evaluation, we focus on the estimates from Ramey
(2016), which uses 109,935 survey responses from 2010 and 2012 to estimate the positions of House and Senate candidates.15
4. Expert Ratings: Based on a model similar to equation 2, Maestas, Buttice, and Stone
(2014) use data from 726 experts and over 4,000 survey respondents in 155 districts
in 2010, for an average of about 30 raters per district. Each rater evaluates both the
current incumbent’s ideology as well as the Democratic and Republican candidates’
ideology. We downloaded the replication data from the Dataverse (Maestas, Buttice,
and Stone, 2013).16
5. Twitter scores: Barberá (2015) uses a model similar to equation 4 to estimate the
latent ideology of several hundred House and Senate candidates using data on 301,537
Twitter users from November of 2012.17
14
Because state legislative ideal points are only available before legislators take office, we use them in the
validation below for non-incumbents.
15
We downloaded the replication data for Ramey (2016) from the dataverse, and used this to analyze the
ability of Aldrich-McKelvey scores to predict contemporaneous roll call positions. However, the replication
data for Ramey (2016) does not include estimates for non-incumbent candidates. So we used what we
believe to be the same data, from the 2010 and 2012 Cooperative Congressional Election Studies, and the
same method, to compute our own estimates based on an identical measurement model.
16
We use the inclc_pc09 variable for incumbent placements, dlc_pc10 for Democratic candidates' placements,
and rlc_pc10 for Republican candidates' placements.
17
We downloaded the replication data from the Dataverse (Barbera, 2014).
6. Campaign Finance (CF) Scores: Bonica (2014) uses correspondence analysis to estimate the ideology of virtually every House and Senate candidate between 1980 and
2012 based on over 100 million contributions to political campaigns from 1979 to 2012.
Correspondence analysis is meant to approximate a model similar to equation 4. We
downloaded each congressional candidates’ dynamic and static CF-Score data from
Adam Bonica’s DIME website (Bonica, 2013a).18
Importantly, none of the estimates that we are testing takes congressional roll call
votes into account as a source of information. It would not necessarily be wrong to do so.19
However, the fact that these models do not use roll call votes ensures that they are at least
plausibly exogenous with respect to DW-Nominate scores.
Validation Results
In this section, we discuss the results of our evaluation of these measures of candidate positions for the period from 2000 to 2012. We focus on this period because many empirical
studies focus on recent congresses, and recent congresses may be particularly hard to predict
because they are so polarized.
U.S. House
Table 3 evaluates each model’s ability to provide accurate estimates of the partisan affiliation
and ideological positions of incumbents in the U.S. House between 2000 and 2012. The first row
indicates the number of observations available for each model.20 It is important to note that
the sample size of CF-Scores is several times the available sample from the other models.
The second row indicates the percentage of candidates that each model correctly classifies
18
We use the dynamic CF-Scores in each of the analyses that follow. However the results are very similar
using static CF-Scores.
19
In fact, in many contexts it may make sense to leverage this information. See Groseclose and Milyo (2005)
and McCarty, Poole, and Rosenthal (2006) for examples.
20
Note that each observation represents legislators’ estimates for a particular Congress.
Table 3: Validation Statistics for Various Measurement Models Against Contemporaneous Nominate Scores in the U.S. House

                                         NPAT       Survey
Name                                     Scores     Respond's  Experts  Twitter  CF-Score
Available Congresses:                    (106-109)  (112)      (111)    (112)    (106-112)
Observations                             546        427        159      144      3453
Perc. Classified into Correct Party      0.923      1.000      1.000    1.000    0.983

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Perc. Variation Explained                0.545      0.483      0.601    0.462    0.203
Perc. Correctly Classified (Terciles)    0.602      0.577      0.630    0.660    0.460
Prop. improvement on party baseline      0.311      0.256      0.386    0.286    0.104

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Perc. Variation Explained                0.368      0.183      0.252    0.118    0.279
Perc. Correctly Classified (Terciles)    0.493      0.568      0.540    0.670    0.568
Prop. improvement on party baseline      0.225      0.124      0.117    0.086    0.166
into the correct party based on a simple logistic regression model. It shows that all of the
models do extremely well at classifying candidates into the correct party.
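The party-classification statistic can be reproduced with a one-covariate logistic regression of the party indicator on the score. The gradient-descent implementation below is a self-contained sketch rather than the actual estimation routine:

```python
import math

def fit_logistic(scores, is_republican, lr=0.5, n_iter=3000):
    """Logistic regression of a party indicator (1 = Republican) on a single
    candidate-position score, fit by batch gradient descent.
    Returns (intercept, slope)."""
    b0, b1 = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(scores, is_republican):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y
            g1 += (p - y) * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def pct_correct_party(scores, is_republican):
    """Share of candidates whose party is predicted correctly at a 0.5 cutoff."""
    b0, b1 = fit_logistic(scores, is_republican)
    correct = sum((1.0 / (1.0 + math.exp(-(b0 + b1 * x))) >= 0.5) == bool(y)
                  for x, y in zip(scores, is_republican))
    return correct / len(scores)
```

Because the two parties are so far apart on every measure, nearly any monotone score yields classification rates close to one, which is exactly the pattern in the first panel of Table 3.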
The next two panels of Table 3 present each of our validation statistics for the within-party relationship between the estimates from each model and DW-Nominate scores, broken
down by party. The first column examines the ability of the estimates of the ideal points
of candidates based on Project Vote Smart’s NPAT survey to accurately place incumbents’
positions in Congress. NPAT scores explain about 54% of the variation in DW-Nominate
scores for Democratic members, and 37% of the variation for Republican members. They
also correctly classify 60% of Democrats and 49% of Republicans into the correct tercile
of the distribution of DW-Nominate scores. The model reduces overall error by 31% for
Democrats and 23% for Republicans over a model that assumes that all Democrats and all
Republicans are the same. Clearly, NPAT scores are better than this much simpler model,
but not by very much.
The other measures do not fare much better. No model of candidate positions explains
more than half of the variation in incumbents’ DW-Nominate scores across both parties.21
Moreover, no model correctly classifies at least two-thirds of the members of both parties
into the correct tercile. In addition, each model only modestly improves over a much simpler
model that assumes that all Democrats and all Republicans are the same.
These contemporaneous relationships between each measure and DW-Nominate are shown
visually in Figures 1 and 2. Figure 1 shows these relationships for Democrats, and Figure 2
shows them for Republicans. Each panel contains a scatterplot of individual measurements
as well as a loess line to allow a more flexible comparison between the measure and the true
value of DW-Nominate.
As a baseline, in the upper-left panel in each Figure we compare the Nokken-Poole scores
for each representative that served in the House since 2000 with their lifetime Nominate
scores (Nokken and Poole, 2004). Nokken-Poole scores are based on the same data as DW-Nominate but use a different measurement model. The difference between these two sets of
scores reflects the degree of difference we would expect due to arbitrary modeling choices
alone. We find that they are closely related. The Nokken-Poole scores explain about 88%
of the variation in Nominate scores for Democrats, and 79% for Republicans. Besides a few
outlying values, the relationship between Nokken-Poole and Nominate scores is surprisingly
linear. In contrast, the other panels in both figures are extremely noisy. This is not surprising,
as these figures simply graph the data that underlies the statistics in Table 3.
So far, we have leveraged all the data we have from 2000 to 2012 to examine contemporaneous predictive accuracy. However, our desired purpose is to use these measures to
assess a counterfactual: what DW-Nominate score would a candidate have if they were a
sitting legislator? Unfortunately we cannot answer this question directly, but we can come
closer. For many of our data points, we observe a candidate who wins their election and
subsequently gets assigned a DW-Nominate score. We can repeat the statistics from Table
21
In the online appendix, we compare our results to the validation results reported in each paper. Overall,
our results regarding the amount of within party variation explained by each model are very similar to the
results reported in the source papers.
[Figure 1 contains six scatterplots of DW-Nominate scores (y-axis) against each measure (x-axis), each with a loess line: Nokken-Poole Score (R² = 0.90), NPAT Score (R² = 0.54), Aldrich-McKelvey Score, 112th Congress (R² = 0.48), Twitter Score, 112th Congress (R² = 0.46), Expert Assessment Score, 111th Congress (R² = 0.60), and CF Score (R² = 0.20).]
Figure 1: The relationship between DW-Nominate and various measures of candidate positions
for Democrats in the House between 2000 and 2012
3, but this time each measure is taken from a candidate for the House of Representatives
who has not previously held office. Their DW-Nominate score is the score they receive in
the next Congress after they win election. Table 4 shows these predictive results.
The first thing to note about Table 4 is that each model has far fewer observations (row
1) than in Table 3. This is because most of the non-incumbents for whom we
[Figure 2 contains six scatterplots of DW-Nominate scores (y-axis) against each measure (x-axis), each with a loess line: Nokken-Poole Score (R² = 0.82), NPAT Score (R² = 0.37), Aldrich-McKelvey Score, 112th Congress (R² = 0.18), Twitter Score, 112th Congress (R² = 0.12), Expert Assessment Score, 111th Congress (R² = 0.25), and CF Score (R² = 0.28).]
Figure 2: The relationship between DW-Nominate and various measures of candidate positions
for Republicans in the House between 2000 and 2012
have measures did not win their subsequent election and vote in the House. We have too
few Twitter scores to consider them, and too few Expert scores for the Democrats (hence
the “NA” values, meaning “Not Applicable”).
For the observations we do have, the results are much weaker than they were for the
contemporaneous comparisons. Only one measure (NPAT scores) explains more than half
Table 4: Validation Statistics for Various Measurement Models for Non-Incumbents Against Future Nominate Scores in the U.S. House

                                         State Leg.  NPAT    Survey
Name                                     Ideal Pts.  Scores  Respond's  Experts  CF-Score
Observations                             94          35      83         39       301
Percent Classified into Correct Party    0.968       0.914   1.000      1.000    0.993

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Percent Variation Explained              0.387       0.629   0.255      NA       0.048
Percent Correctly Classified (Terciles)  0.552       0.714   0.567      NA       0.567
Prop. improvement on party baseline      0.251       0.494   0.142      NA       0.011

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Percent Variation Explained              0.173       0.275   0.033      0.026    0.220
Percent Correctly Classified (Terciles)  0.446       0.556   0.632      0.703    0.534
Prop. improvement on party baseline      0.080       0.222   0.017      -0.007   0.119
of the variance in Democratic DW-Nominate scores and none of the models explain half of
the variance in Republican DW-Nominate scores. Moreover, only one measure reduces the
variance over a naive model that assumes that all Democrats and all Republicans are the
same by more than 25% in any case. Making matters worse, measures that perform well
for one group generally perform relatively worse for the other. For instance, although CF
Scores explain more variance in the Republican positions than any other measure, they are
the worst measure for Democrats.
While it may be the case that these measures have better than zero predictive accuracy
for some parties in some years, with current data we can say very little about the conditions
under which we expect them to perform well for the House of Representatives. Average
accuracy is very low. In fact, no model performs much better than a model that assumes
one ideal point per party.
U.S. Senate
One possibility is that these measures perform poorly for the House of Representatives because it is inherently more difficult to predict the voting records of House members. House
members tend to have lower visibility to donors, members of the public, and experts. Some
House candidates are political novices, and may not have formed their own views on a variety
of issues. The experience of operating in a chamber where majority party control is the norm
may alter candidate positions once they begin serving.
In contrast, the United States Senate is a much more visible body, and candidates for the
Senate tend to have longer experience in the public eye. Once elected, Senators participate
in a legislative body that is noted for its individualism rather than overbearing party control.
For these reasons we might expect our measures to have better accuracy in the Senate than
in the House of Representatives.
The disadvantage of the Senate is a greatly reduced sample size. There are fewer total
Senators (100 instead of 435), fewer Senatorial elections (each Senator is up for election every
six years instead of every two), and lower turnover. We lack enough data from two of the models
(NPAT and Experts) to test these models at all. For the other measures, we have lower
sample sizes for the contemporaneous comparison. For the predictive comparison involving
candidates who win, we will not be able to test the Twitter-based measure either.
Table 5 shows the contemporaneous comparison for the Senate. In most cases, the fit
is substantially higher for these measures than in the case of the House of Representatives,
particularly for Republican legislators. Twitter scores perform particularly well across all
three statistics for both parties. However, the overall predictive power of these measures is
still limited.
Table 6 repeats the analysis above using the candidate scores for candidates who have
not yet held Senate seats and their later DW-Nominate scores as Senators. Once again, the
fit is generally lower than it was for the contemporaneous comparison. However, Aldrich-McKelvey scores from survey respondents perform the best. These scores explain 45% of
Table 5: Validation Statistics for Various Measurement Models for Incumbents Against Contemporaneous Nominate Scores in the U.S. Senate

                                         Survey       Twitter
Name                                     Respondents  Scores   CF-Score
Available Congresses:                    (112)        (112)    (106-112)
Observations                             103          77       710
Percent Classified into Correct Party    0.971        0.974    0.969

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Percent Variation Explained              0.325        0.722    0.325
Percent Correctly Classified (Terciles)  0.463        0.677    0.463
Prop. improvement on party baseline      0.267        0.430    0.124

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Percent Variation Explained              0.384        0.624    0.240
Percent Correctly Classified (Terciles)  0.729        0.719    0.729
Prop. improvement on party baseline      0.257        0.368    0.106
the variation in Democratic DW-Nominate scores and 64% of the variation in Republican
DW-Nominate scores.
Applications
In this section, we show how the choice of measurement strategy for candidates' positions
dramatically affects substantive inferences in two important areas: polarization and the
allocation of distributive spending.
Polarization
There is a vast literature that examines changes in polarization over time among legislators
and candidates. In their authoritative study, McCarty, Poole, and Rosenthal (2006) show
that legislators’ roll call records have polarized asymmetrically, with virtually all of the
polarization occurring among Republicans. In line with this finding, the upper-left panel
of Figure 3 shows that between 2000 and 2012, virtually all of the polarization in DW-
Table 6: Validation Statistics for Various Measurement Models for Non-Incumbents Against Future Nominate Scores in the U.S. Senate

                                         State Legislative  Survey
Name                                     Ideal Pts.         Respondents  CF-Score
Observations                             14                 53           71
Percent Classified into Correct Party    0.929              1.000        0.972

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Percent Variation Explained              0.230              0.449        0.325
Percent Correctly Classified (Terciles)  0.375              0.333        0.333
Prop. improvement on party baseline      0.163              0.287        0.098

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Percent Variation Explained              0.436              0.640        0.476
Percent Correctly Classified (Terciles)  0.667              0.778        0.778
Prop. improvement on party baseline      0.223              0.380        0.285
Nominate scores occurred among Republicans. The middle and upper-right panels show the
analogous change in NPAT scores for incumbents and all candidates (i.e., both winners
and losers) between 2000 and 2006. Like DW-Nominate scores, NPAT scores also polarize
asymmetrically. However, virtually all of the polarization in NPAT scores occurs among
Democrats. Finally, the lower panel shows the change in polarization in CF-Scores for
incumbents and all candidates. A number of recent empirical studies have used CF-Scores
to examine the causal factors for polarization in state legislatures and Congress (e.g., Ahler,
Citrin, and Lenz, Forthcoming; Rogowski and Langella, 2014; Thomsen, 2014). Figure 3
shows that unlike DW-Nominate scores, CF-Scores polarized among both Democrats and
Republicans.
Overall, these plots indicate that DW-Nominate scores, NPAT scores, and CF-Scores
each tell a different story regarding the relative changes in polarization over the past decade.
It is possible that each of the stories is substantively interesting. For instance, it is possible
that the composition of Democratic donors is polarizing, while Democrats’ roll-call behavior
is staying constant. But the substantively different trends in polarization across models are
further evidence that it is unlikely that these models are actually measuring the same latent
[Figure 3 contains five panels plotting yearly mean party positions from 2000 to 2012: Polarization in DW-Nominate Scores (Incumbents); Polarization in NPAT-Scores (Incumbents); Polarization in NPAT-Scores (All Candidates); Polarization in CF-Scores (Incumbents); Polarization in CF-Scores (All Candidates).]
Figure 3: The evolution of DW-Nominate and various measures of candidate positions for
Democrats and Republicans in the House between 2000 and 2012. Blue dots show the mean spatial
position of Democrats and red dots show the mean spatial position of Republicans.
quantity. This suggests that scholars should use caution in using non-roll call based measures
of candidate ideology to make inferences about changes in polarization in Congress or state
legislatures.
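Each panel in Figure 3 summarizes polarization as the distance between the two party means in a given year. A minimal sketch of that summary (with hypothetical input records rather than the actual score files) is:

```python
def party_means_by_year(records):
    """records: iterable of (year, party, score) tuples, with party 'D' or 'R'.
    Returns {year: (dem_mean, rep_mean)}."""
    sums = {}
    for year, party, score in records:
        total, count = sums.get((year, party), (0.0, 0))
        sums[(year, party)] = (total + score, count + 1)
    out = {}
    for year in sorted({y for y, _ in sums}):
        d_tot, d_n = sums[(year, "D")]
        r_tot, r_n = sums[(year, "R")]
        out[year] = (d_tot / d_n, r_tot / r_n)
    return out

def polarization(records):
    """Yearly gap between the mean Republican and mean Democratic positions."""
    return {year: r - d
            for year, (d, r) in party_means_by_year(records).items()}
```

Computing this gap separately for DW-Nominate scores, NPAT scores, and CF-Scores is what reveals the divergent trends discussed above.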
Allocation of Distributive Spending
An important question in the field of legislative politics is the degree to which legislators’
ideology influences the amount of distributive spending that their district receives (e.g., Ferejohn, 1974; Cann and Sidman, 2011). Alexander, Berry, and Howell (2016) persuasively show
that moderate legislators get more non-formula (e.g., flexible, non-mandatory) discretionary
spending than ideologically extreme legislators. The logic is that moderate legislators near the median
receive payoffs in exchange for their support on closely contested bills (Snyder, 1991; Dekel, Jackson,
and Wolinsky, 2008).
Alexander, Berry, and Howell (2016) use a nuanced identification strategy with county-by-member fixed effects and other time-varying controls. However, their basic result also
appears in a much simpler cross-sectional regression.22 Indeed, Table 7 shows that a one-standard-deviation increase in the extremity of legislators' DW-Nominate scores in 2008 is
associated with a 3.7% decrease in non-formula spending.23
Table 7

                           Dependent variable: Log(Non-Formula Grants)
                           (1)         (2)         (3)         (4)         (5)
DW-Nominate             −0.040∗∗
                         (0.018)
CF-Scores                           −0.011
                                     (0.013)
NPAT-Scores                                     −0.053
                                                 (0.032)
Aldrich-McKelvey Scores                                     −0.008
                                                             (0.027)
Twitter Scores                                                           0.005
                                                                        (0.029)
Republican              −0.014      −0.064∗∗    −0.015      −0.058      −0.062
                         (0.035)     (0.026)     (0.067)     (0.054)     (0.056)
Constant                21.934∗∗∗   21.956∗∗∗   21.398∗∗∗   21.936∗∗∗   21.947∗∗∗
                         (0.021)     (0.018)     (0.040)     (0.029)     (0.040)
Observations             420         420         99          290         120
R2                       0.031       0.021       0.029       0.021       0.021
Adjusted R2              0.026       0.016       0.009       0.014       0.004
Note: ∗ p<0.1; ∗∗ p<0.05; ∗∗∗ p<0.01
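The cross-sectional specification behind Table 7 is a simple OLS of logged non-formula grants on an ideology measure plus a Republican indicator. A sketch on simulated data follows; all variable names and values are illustrative, not the replication data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 420  # districts, as in column (1) of Table 7

# Hypothetical data: a folded (extremity) ideology score and a party dummy.
republican = rng.integers(0, 2, size=n)
extremity = rng.uniform(0, 1, size=n)
log_grants = 21.9 - 0.04 * extremity - 0.01 * republican + rng.normal(0, 0.2, n)

# OLS: log(grants) ~ extremity + republican + constant
X = np.column_stack([extremity, republican, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, log_grants, rcond=None)
print(beta)  # [slope on extremity, slope on republican, intercept]
```

Because the outcome is logged, a coefficient of roughly −0.04 on the extremity score corresponds to about a 4% decrease in grants per one-unit increase in the score.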
In contrast, none of the other measures of candidate positions have a statistically significant association with the distribution of non-formula grants. This further suggests that
these measures may not be capturing the same latent quantity as DW-Nominate. Moreover,
their usage as a proxy for legislators’ roll call behavior could lead to incorrect inferences on
22. We downloaded their data from the Harvard Dataverse, http://dx.doi.org/10.7910/DVN/VR12G4, and matched it with the various measures of candidate positions that we evaluate in this paper.
23. This is substantively consistent with the “7.2% decrease in outlays associated with a one standard-deviation increase in a member’s ideological distance from the median voter” that Alexander, Berry, and Howell (2016, 223) report in their paper.
fundamental questions in political economy and the study of legislatures.
Conclusion
Despite the development of a variety of innovative strategies for measuring the political positions of candidates for Congress, existing measures have only limited power to predict the
voting records that candidates establish once elected. Even contemporaneous measures, which use data on legislators as they are currently serving in Congress, typically
fail to explain even half the variance in legislator voting, and usually closer to a third. The
performance of these measures varies across parties and time, with no measure clearly dominant. As a result, using these measures of candidate positions could lead to inferential
errors in substantive research. For instance, we have shown that different measures of candidate positions lead to dramatically different inferences about both polarization and the link
between legislator ideology and distributive spending. Our findings matter not just
for academic research, but for our understanding of democracy. Prospective voting requires
voters, not just political scientists, to know what candidates will do if elected.
While these measures perform poorly at predicting legislators’ roll call positions, they
do have a number of other valuable uses. They could be used to impute the partisanship of
candidates (de Benedictis-Kessner and Warshaw, 2015) and voters (Hill and Huber, 2015)
when other information on their partisanship is not readily available. Moreover, they could
be used to examine potential explanations for the mismatch between survey respondents’
perceptions and candidates’ actual roll call positions (see, e.g., Cormack, 2015; Grimmer,
2013; Henderson, 2013). These measures also have a number of potential applications for
specific substantive questions outside the realm of legislative behavior. For instance, CFScores could be used to examine the campaign finance behavior of bureaucrats (Bonica
et al., 2015) and Barberá (2015)’s measures of candidates’ Twitter followers could be used
to examine the effect of candidates’ roll call positions on their followings on social networks.
There are a variety of reasons why constituents' implicit (e.g., campaign finance donations
or Twitter followings) and explicit (e.g., survey responses) perceptions of candidates' ideology
are only weakly associated with candidates' roll call behavior inside Congress. Although candidates make commitments and promises during their campaigns, these commitments are rarely enforceable (Alesina, 1988). Candidates also have a variety of incentives
to distort their positions during the campaign, which may weaken the relationship between
candidates' campaign platforms and their roll call positions (Rogowski, 2015). The
ability of constituent perceptions to predict roll call behavior may be further distorted by
political geography: social media commentators, donors, and the public are limited
in their choice of viable candidates to support in any particular district, so information gleaned
from these relationships may reflect the limited choice set rather than true similarity.
Finally, a variety of other factors could influence candidates' roll call votes (e.g.,
lobbying, agenda control, party leaders).
It is also important to note that our findings do not imply that it is impossible to find
a better measure of candidates’ spatial positions. On the contrary, we encourage future
researchers to look for better data sources and modeling strategies in order to more accurately
measure the positions of candidates (e.g., Bonica, 2016). However, we would also encourage
future research to measure success by a high standard. Cross-party correlation coefficients
are a poor way to evaluate the accuracy of a measure. Instead we would encourage scholars
to use within-party measures, with variance explained being the easiest to interpret. It is
also important to evaluate the performance of new measures in different time periods, as
measures that appear to predict well in one period may vary substantially in their usefulness
in another, particularly in the current, more polarized era.
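The within-party evaluation standard recommended here can be sketched directly: compute the squared correlation (the univariate R²) between a candidate measure and DW-Nominate separately within each party. The scores below are fabricated purely for illustration:

```python
import numpy as np

def within_party_r2(measure, dw_nominate, party):
    """R-squared of a univariate regression of DW-Nominate on a
    candidate measure, computed separately within each party."""
    out = {}
    for p in np.unique(party):
        m = measure[party == p]
        y = dw_nominate[party == p]
        r = np.corrcoef(m, y)[0, 1]
        out[p] = r ** 2  # univariate R^2 is the squared correlation
    return out

# Toy illustration with made-up scores for four legislators per party.
party = np.array(["D"] * 4 + ["R"] * 4)
measure = np.array([-0.9, -0.6, -0.4, -0.2, 0.2, 0.4, 0.7, 0.9])
dw = np.array([-0.5, -0.45, -0.3, -0.25, 0.3, 0.28, 0.45, 0.5])
r2 = within_party_r2(measure, dw, party)
print(r2)
```

A measure that merely separates the parties would score near zero on this metric even though its cross-party correlation with DW-Nominate is very high.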
For the time being, however, our findings call into question the usefulness of these measures for examining questions that depend on the relative spatial distance between candidates, such as tests of spatial voting theories or the causes of Congressional polarization.24
At the very least, empirical papers that use these measures to study the causes and effects of
candidate positions in Congress should validate their usage, and demonstrate the robustness
of their findings using different measures of candidates' positions.

24. Whether or not these measures are useful depends on the application in question. Even relatively weak proxy measures can sometimes produce orderings that are correct a substantial fraction of the time. However, comparisons of relative distances can be highly inaccurate.
References
Adcock, Robert, and David Collier. 2001. “Measurement Validity: A Shared Standard for
Qualitative and Quantitative Research.” American Political Science Review 95(3): 529–
546.
Ahler, Douglas J, Jack Citrin, and Gabriel S Lenz. Forthcoming. “Do Open Primaries
Improve Representation? An Experimental Test of California’s 2012 Top-Two Primary.”
Legislative Studies Quarterly .
Aldrich, John H, and Richard D McKelvey. 1977. “A Method of Scaling with Applications to
the 1968 and 1972 Presidential Elections.” The American Political Science Review 71(1):
111–130.
Alesina, Alberto. 1988. “Credibility and Policy Convergence in a Two-Party System with
Rational Voters.” American Economic Review 78(4): 796–805.
Alexander, Dan, Christopher R Berry, and William G Howell. 2016. “Distributive Politics
and Legislator Ideology.” The Journal of Politics 78(1): 000–000.
Ansolabehere, Stephen, James M Snyder Jr, and Charles Stewart. 2001. “Candidate Positioning in US House Elections.” American Journal of Political Science 45(1): 136–159.
Barberá, Pablo. 2014. “Replication data for: Birds of the Same Feather Tweet Together:
Bayesian Ideal Point Estimation Using Twitter Data.” http://dx.doi.org/10.7910/DVN/26589.
Barberá, Pablo. 2015. “Birds of the same feather tweet together: Bayesian ideal point
estimation using Twitter data.” Political Analysis 23(1): 76–91.
Black, Duncan. 1948. “On the Rationale of Group Decision-Making.” The Journal of Political
Economy 56(1): 23–34.
Bond, Robert, and Solomon Messing. 2015. “Quantifying Social Media’s Political Space:
Estimating Ideology from Publicly Revealed Preferences on Facebook.” American Political
Science Review 109(01): 62–78.
Bonica, Adam. 2013a. “Database on Ideology, Money in Politics, and Elections: Public
version 1.0.” http://data.stanford.edu/dime.
Bonica, Adam. 2013b. “Ideology and Interests in the Political Marketplace.” American
Journal of Political Science 57(2): 294–311.
Bonica, Adam. 2014. “Mapping the Ideological Marketplace.” American Journal of Political
Science 58(2): 367–386.
Bonica, Adam. 2016. “Inferring Roll-Call Scores from Campaign Contributions Using Supervised Machine Learning.” Unpublished manuscript. Available for download at http:
//papers.ssrn.com/sol3/papers.cfm?abstract_id=2732913.
Bonica, Adam, and Maya Sen. 2015. “A Common-Space Scaling of the American Judiciary and Legal Profession.” Unpublished manuscript. Available for download at http:
//scholar.harvard.edu/msen/judges-scaling.
Bonica, Adam, Chen Jowei, Johnson Tim et al. 2015. “Senate Gate-Keeping, Presidential
Staffing of Inferior Offices, and the Ideological Composition of Appointments to the Public
Bureaucracy.” Quarterly Journal of Political Science 10(1): 5–40.
Bonica, Adam, Howard Rosenthal, and David J Rothman. 2014. “The political polarization of
physicians in the United States: an analysis of campaign contributions to federal elections,
1991 through 2012.” JAMA internal medicine 174(8): 1308–1317.
Cann, Damon M, and Andrew H Sidman. 2011. “Exchange theory, political parties, and the
allocation of federal distributive benefits in the House of Representatives.” The Journal of
Politics 73(04): 1128–1141.
Clinton, Joshua, Simon Jackman, and Douglas Rivers. 2004. “The Statistical Analysis of
Roll Call Data.” American Political Science Review 98(2): 355–370.
Cormack, Lindsey. 2015. “Extremity in Congress: Communications versus Votes.” Unpublished manuscript. Available for download at personal.stevens.edu/~lcormack/
extreme_comms_votes.pdf.
de Benedictis-Kessner, Justin, and Christopher Warshaw. 2015. “Mayoral Partisanship and
Municipal Fiscal Policy.” Unpublished manuscript. Available for download at http://
cwarshaw.scripts.mit.edu/papers/CitiesMayors_160120.pdf.
Dekel, Eddie, Matthew O Jackson, and Asher Wolinsky. 2008. “Vote buying: General elections.” Journal of Political Economy 116(2): 351–380.
Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.
Enelow, James M, and Melvin J Hinich. 1984. The Spatial Theory of Voting: An
Introduction. Cambridge University Press.
Ensley, Michael J. 2009. “Individual Campaign Contributions and Candidate Ideology.”
Public Choice 138(1-2): 221–238.
Ferejohn, John A. 1974. Pork barrel politics: Rivers and harbors legislation, 1947-1968.
Stanford University Press.
Gelman, Andrew, and Gary King. 1990. “Estimating incumbency advantage without bias.”
American Journal of Political Science 34(4): 1142–1164.
Grimmer, Justin. 2013. Representational Style in Congress: What Legislators Say and Why
It Matters. Cambridge University Press.
Groseclose, Tim, and Jeffrey Milyo. 2005. “A measure of media bias.” The Quarterly Journal
of Economics pp. 1191–1237.
Groseclose, Tim, Steven D Levitt, and James M Snyder. 1999. “Comparing interest group
scores across time and chambers: Adjusted ADA scores for the US Congress.” American
Political Science Review 93(01): 33–50.
Hall, Andrew, and James Snyder. 2015. “Candidate Ideology and Electoral Success.” Unpublished manuscript. Available for download at https://dl.dropboxusercontent.com/u/
11481940/Hall_Snyder_Ideology.pdf.
Hall, Andrew B. 2015. “What Happens When Extremists Win Primaries?” American
Political Science Review 109(01): 18–42.
Hare, Christopher, David A Armstrong, Ryan Bakker, Royce Carroll, and Keith T Poole.
2014. “Using Bayesian Aldrich-McKelvey Scaling to Study Citizens’ Ideological Preferences
and Perceptions.” American Journal of Political Science .
Henderson, John Arthur. 2013. “Downs’ Revenge: Elections, Responsibility and the Rise
of Congressional Polarization.” Unpublished PhD Dissertation. Available for download at
http://gradworks.umi.com/36/16/3616463.html.
Hill, Seth, and Greg Huber. 2015. “Representativeness and Motivations of Contemporary
Contributors to Political Campaigns: Results from Merged Survey and Administrative
Records.” Unpublished manuscript. Available for download at http://www.sethjhill.
com/HillHuberDonorate_062515.pdf.
Jessee, Stephen A. 2012. Ideology and Spatial Voting in American Elections. New York,
NY: Cambridge University Press.
Joesten, Danielle A, and Walter J Stone. 2014. “Reassessing Proximity Voting: Expertise,
Party, and Choice in Congressional Elections.” The Journal of Politics 76(3): 740–753.
Kousser, Thad, Justin Phillips, and Boris Shor. 2015. “Reform and Representation: A New
Method Applied to Recent Electoral Changes.” Unpublished manuscript. Available for
download at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2260083.
Lee, David S, Enrico Moretti, and Matthew J Butler. 2004. “Do voters affect or elect policies?
Evidence from the US House.” The Quarterly Journal of Economics 119(3): 807–859.
Maestas, Cherie D., Matthew K. Buttice, and Walter J. Stone. 2013. “Replication data for:
Extracting Wisdom from Experts and Small Crowds: Strategies for Improving Informant-based Measures of Political Concepts.” http://dx.doi.org/10.7910/DVN/23170.
Maestas, Cherie D, Matthew K Buttice, and Walter J Stone. 2014. “Extracting Wisdom
from Experts and Small Crowds: Strategies for Improving Informant-based Measures of
Political Concepts.” Political Analysis 22(3): 354–373.
McCarty, Nolan M, Keith T Poole, and Howard Rosenthal. 2006. Polarized America: The
Dance of Ideology and Unequal Riches. Cambridge, MA: MIT Press.
Montagnes, B Pablo, and Jon C Rogowski. 2014. “Testing Core Predictions of Spatial Models:
Platform Moderation and Challenger Success.” Political Science Research and Methods
Forthcoming.
Nokken, Timothy P, and Keith T Poole. 2004. “Congressional party defection in American
history.” Legislative Studies Quarterly 29(4): 545–568.
Poole, Keith T. 2005. Spatial models of parliamentary voting. Cambridge University Press.
Poole, Keith T, and Howard L Rosenthal. 2011. Ideology and Congress. Transaction Publishers.
Ramey, Adam. 2016. “Vox Populi, Vox Dei? Crowdsourced Ideal Point Estimation.” Journal
of Politics 78(1).
Rogowski, Jon C. 2015. “Faithful Agents? Electoral Platforms and Legislative Behavior.”
Unpublished Manuscript. Available for download at https://pages.wustl.edu/files/
pages/imce/rogowski/measure_accountability_10-14.pdf.
Rogowski, Jon C, and Stephanie Langella. 2014. “Primary Systems and Candidate Ideology
Evidence From Federal and State Legislative Elections.” American Politics Research 43(5):
846–871.
Shor, Boris, and Jon C Rogowski. 2015. “Ideology and the US Congressional Vote.” Unpublished manuscript. Available for download at http://papers.ssrn.com/sol3/papers.
cfm?abstract_id=2650028.
Shor, Boris, and Nolan McCarty. 2011. “The Ideological Mapping of American Legislatures.”
American Political Science Review 105(03): 530–551.
Shor, Boris, and Nolan McCarty. 2014. “Individual State Legislator Shor-McCarty Ideology
Data, July 2014 update.” http://dx.doi.org/10.7910/DVN/26805.
Shor, Boris, Christopher Berry, and Nolan McCarty. 2010. “A Bridge to Somewhere:
Mapping State and Congressional Ideology on a Cross-institutional Common Space.”
Legislative Studies Quarterly 35(3): 417–448.
Snyder, James M. 1991. “On Buying Legislatures.” Economics & Politics 3(2): 93–109.
Stone, Walter J, and Elizabeth N Simas. 2010. “Candidate valence and ideological positions
in US House elections.” American Journal of Political Science 54(2): 371–388.
Stratmann, Thomas. 2000. “Congressional voting over legislative careers: Shifting positions
and changing constraints.” American Political Science Review 94(03): 665–676.
Thomsen, Danielle M. 2014. “Ideological Moderates Won’t Run: How Party Fit Matters for
Partisan Polarization in Congress.” The Journal of Politics 76(03): 786–797.
Appendix A: Comparison of our Results with the Validation Metrics in the Source Papers
In this section, we compare our results to the results reported in the original papers that we
evaluate. Table 8 shows the percentage of the variation in incumbents’ DW-Nominate scores
explained by each model that we report in the main text. It also shows the percentage of the
variation in DW-Nominate scores that each of the source papers reports that their model
explains.
Table 8: Validation Statistics for Various Measurement Models Against Contemporaneous Nominate Scores in the U.S. House - Comparison with Results in Source Papers

                                        State Leg.   NPAT     Survey
Name                                    Ideal Pts.   Scores   Respond's   Experts   Twitter   CF-Score

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
% Variation Explained (our analysis)    0.466        0.545    0.483       0.601     0.462     0.203
% Variation Explained (source paper)    NA           NA       0.573       0.518     0.519     0.314

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
% Variation Explained (our analysis)    0.284        0.368    0.183       0.252     0.118     0.279
% Variation Explained (source paper)    NA           NA       0.289       0.313     0.123     0.436
Overall, our results regarding the amount of within-party variation explained by each
model are very similar to the results reported in the source papers.25 The only notable
differences between our results and those in the source papers are for CF-Scores (Bonica,
2014) and the Survey-based scores (Ramey, 2016). This difference for CF-Scores likely
stems from the fact that we focus on the 107-113 Congresses, while Bonica (2014, 370-371)
focuses on the 96-112 Congresses. There may have been a tighter relationship between CF-Scores and DW-Nominate scores in earlier Congresses. The difference for the survey-based
scores appears to stem from the fact that Ramey (2016) uses W-Nominate rather than DW-Nominate to validate his estimates, and there is a slightly higher correlation between the
survey scores and W-Nominate than there is with DW-Nominate.

25. Shor and McCarty (2011) and Shor and Rogowski (2015) do not report correlations with DW-Nominate scores in their papers. Also, note that the results from Barberá (2015) that we report in Table 8 were calculated from his replication data based on members of the U.S. House in the 112th Congress. The correlations reported in his paper are somewhat higher, but they include members of both the U.S. House and Senate.
Appendix B: Vote-by-Vote Statistics
In the main body of this paper, we focus on the fit of each measure to DW-Nominate scores.
This is for the simple reason that DW-Nominate recovers a latent dimension which best
summarizes roll call voting behavior, according to a likelihood model. Methods based on
other likelihood models, such as Clinton, Jackman, and Rivers’s (2004) IDEAL, or methods
which explicitly maximize classification, such as Poole’s (2005) Optimal Classification, lead
to an extremely similar result.
Nonetheless, it is possible to examine the fit of each measure to the votes themselves, and
to examine the degree to which these measures lead to correct classifications. To do this, we run
univariate logistic regressions for every roll call vote in the House of Representatives from 2003 to
2012. For each measure, we calculate predicted votes and compare them to the actual votes.
We consider three statistics. The first is Poole and Rosenthal's (2011) Average Proportional
Reduction in Error (APRE). The APRE calculates the overall reduction in error as compared
to a naive model that assumes all votes are cast with the majority. It is calculated as follows:
$$\mathrm{APRE} = \frac{\sum_{\text{votes}} \left(\text{Minority Vote} - \text{Classification Errors}\right)}{\sum_{\text{votes}} \text{Minority Vote}}$$
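As an illustration, the APRE can be computed from per-vote tallies of minority-side voters and classification errors. This is a minimal sketch with invented counts:

```python
def apre(votes):
    """Average Proportional Reduction in Error.  `votes` is a list of
    (minority_vote_count, classification_errors) pairs, one per roll call."""
    minority = sum(m for m, _ in votes)
    errors = sum(e for _, e in votes)
    return (minority - errors) / minority

# Toy example: three votes with 40, 10, and 50 minority-side voters and
# 10, 5, and 25 classification errors respectively.
print(apre([(40, 10), (10, 5), (50, 25)]))  # (100 - 40) / 100 = 0.6
```

An APRE of 0 means the measure does no better than always predicting the majority side; an APRE of 1 means perfect classification.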
The second measure is the Percent Correctly Predicted (PCP) which is simply the percent
of all non-missing votes that are correctly predicted. The third measure is the “Improvement
over Party.” Like the APRE, the “Improvement over Party” measures the reduction in error
above and beyond a naive model, but in this case the naive predictions are the predicted
values from a logistic regression on a dummy variable for the party identification of the
legislator. “Improvement over Party” is the percent reduction in error where the error from
the party model is in the denominator.
39
$$\text{Improvement over Party} = \frac{\sum_{\text{votes}} \left(\text{Party Model Errors} - \text{Errors From This Model}\right)}{\sum_{\text{votes}} \text{Party Model Errors}}$$
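Similarly, the PCP and Improvement over Party statistics can be sketched from per-vote counts (again with invented numbers):

```python
def improvement_over_party(party_errors, model_errors):
    """Percent reduction in classification error relative to a party-only
    model.  Each argument is a list of per-vote error counts."""
    return (sum(party_errors) - sum(model_errors)) / sum(party_errors)

def pcp(correct, total):
    """Percent Correctly Predicted across all non-missing votes."""
    return correct / total

# Toy example: the party-only model makes 200 errors across two votes;
# a candidate measure makes 172, a 14% improvement over party.
print(improvement_over_party([120, 80], [100, 72]))  # 0.14
print(pcp(929, 1000))
```

Note that Improvement over Party is negative whenever a measure classifies votes worse than the party dummy alone, as happens for several measures in Table 9.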
Table 9 shows the results of these analyses for each measure discussed in the paper. It
also shows the number of votes analyzed, which varies due to the availability of the measures
in question. Party and DW-Nominate scores are included as separate measures.
Unsurprisingly, the results of this exercise mirror the results from the rest of the paper.
DW-Nominate is a measure based on the underlying votes. Despite the very high importance
of party in recent years, DW-Nominate scores substantially improve the classification of
votes. This is why we use DW-Nominate scores as a general measure of legislator behavior.
However, it is notable that significant idiosyncratic error remains.
Most of the remaining measures vary significantly in their explanatory power, which is
often close to 0, and sometimes even negative. No measure besides DW-Nominate consistently reduces error above and beyond party. Twitter scores and survey-based Aldrich-McKelvey scores explain 8.9% and 8.6% of the variation left unexplained by party in the one
congress where they are available. This is still only 60% of the reduction in error achieved
by DW-Nominate. These are contemporaneous comparisons, and so they likely overestimate
the predictive power of these measures.
Table 9: Vote-by-vote Statistics for Each Measure

                       APRE    PCP     Votes      Improvement over Party
108th House, 2003-2004
  Party                0.729   0.917   406,387     0
  Dynamic CF-Score     0.732   0.918   389,117     0.011
  Static CF-Score      0.734   0.918   389,117     0.018
  NPAT                 0.696   0.91    115,780    -0.122
  State Leg.           0.72    0.924    62,640    -0.033
  DW-Nominate          0.768   0.929   406,387     0.144
109th House, 2005-2006
  Party                0.693   0.907   432,589     0
  Dynamic CF-Score     0.686   0.906   420,391    -0.023
  Static CF-Score      0.695   0.909   420,391     0.007
  NPAT                 0.649   0.904    97,381    -0.143
  State Leg.           0.666   0.908    74,989    -0.088
  DW-Nominate          0.731   0.919   432,589     0.124
110th House, 2007-2008
  Party                0.773   0.926   641,222     0
  Dynamic CF-Score     0.772   0.925   626,308    -0.004
  Static CF-Score      0.774   0.926   626,308     0.004
  State Leg.           0.776   0.924   120,105     0.013
  DW-Nominate          0.815   0.94    641,222     0.185
111th House, 2009-2010
  Party                0.75    0.931   502,225     0
  Dynamic CF-Score     0.758   0.933   484,762     0.032
  Static CF-Score      0.754   0.932   484,762     0.016
  Experts              0.723   0.934   169,947    -0.108
  State Leg.           0.744   0.923   101,641    -0.024
  DW-Nominate          0.796   0.943   502,225     0.184
112th House, 2011-2012
  Party                0.743   0.909   635,443     0
  Dynamic CF-Score     0.753   0.913   616,114     0.039
  Static CF-Score      0.748   0.911   616,114     0.019
  Twitter              0.766   0.924   214,589     0.089
  Survey               0.765   0.917   628,296     0.086
  State Leg.           0.724   0.914   149,986    -0.074
  DW-Nominate          0.779   0.922   635,443     0.14