Estimating Candidate Positions in a Polarized Congress∗

Chris Tausanovitch†
Department of Political Science, UCLA

Christopher Warshaw‡
Department of Political Science, Massachusetts Institute of Technology

February 13, 2016

Word Count: 9,650

Abstract: In order to test theories of legislative polarization, representation, and accountability, it is crucial to have accurate measures of candidates' policy positions. To address this challenge, scholars have developed a variety of innovative measurement models based on survey data, campaign finance contributions, and social networks. But there has not been a comprehensive evaluation of these methods that examines their accuracy and usefulness for testing important theories. In this paper, we find that each of these measurement models accurately estimates the political party of legislative candidates, but they do poorly at distinguishing the ideological extremity of candidates within each party. As a result, they fall short when it comes to facilitating empirical analysis of theories of representation and spatial voting. More generally, our findings suggest that even with large amounts of data and advanced statistical models it is very difficult to predict candidates' policy positions. This has profound implications for democratic governance.

∗ We are grateful for feedback about this project from Gregory Huber, Seth Hill, Howard Rosenthal, Adam Bonica, Walter Stone, Boris Shor, Nolan McCarty, Jon Rogowski, Pablo Barbera, Adam Ramey and participants at the 2015 American Political Science Association Conference.
† Assistant Professor, Department of Political Science, UCLA, [email protected].
‡ Assistant Professor, Department of Political Science, Massachusetts Institute of Technology, [email protected].
Thanks to a large body of innovative work, political scientists have access to rigorous measures of the policy positions of incumbent legislators based on their roll call positions (e.g., Poole and Rosenthal, 2011; Clinton, Jackman, and Rivers, 2004; Groseclose, Levitt, and Snyder, 1999). This work has spawned a large literature on the causes and effects of legislators' ideological extremity. However, the seminal work in the field was generally limited to incumbents, leaving scholars with little information on the policy positions of non-incumbents running for Congress.

Over the past decade, scholars have increasingly focused on developing comparable measures of incumbent and non-incumbent candidates' policy positions in order to test theories of political economy and representation. Are ideologically extreme candidates punished at the ballot box (Black, 1948; Downs, 1957; Enelow and Hinich, 1984; Hall, 2015)? How much does the ideological leaning of a district influence the ideological positions of candidates that run for Congress (Ansolabehere, Snyder Jr, and Stewart, 2001)? How much does the available pool of candidates affect the degree of legislative polarization (Thomsen, 2014)? Does variation in electoral rules in primaries affect the spatial positions of candidates that run for office (Kousser, Phillips, and Shor, 2015; Ahler, Citrin, and Lenz, Forthcoming; Rogowski and Langella, 2014)? Do ideologically extreme candidates raise less money than centrist candidates (Ensley, 2009)?

In order to address these important theoretical questions, scholars have developed a variety of innovative measurement models to estimate candidates' ideological positions. One approach is to estimate candidate positions based on their political positions outside of Congress, such as their roll call votes in state legislatures (Shor, Berry, and McCarty, 2010; Shor and McCarty, 2011) or responses to surveys (Montagnes and Rogowski, 2014).
Another approach is to use data on the perceptions of voters (Aldrich and McKelvey, 1977; Hare et al., 2014; Ramey, 2016) or experts (Joesten and Stone, 2014; Maestas, Buttice, and Stone, 2014; Stone and Simas, 2010) about candidates' ideological positions. A third approach is to assume that some set of behavior by citizens or donors is based on their implicit perceptions of candidates' positions. For instance, Bonica (2013b, 2014) and Hall and Snyder (2015) estimate the ideology of political figures using the composition of their campaign donors, based on the plausible assumption that donors give to candidates who are similar to themselves politically. Following similar logic, Barberá (2015) estimates positions using the constellation of individuals who follow a candidate on the social media service Twitter.

Despite their importance, little effort has been put into comparing these methods and rigorously evaluating their accuracy and usefulness for testing theories of interest. In this paper, we test the accuracy of six prominent measures of candidate positions by examining their convergent validity (Adcock and Collier, 2001).1 In other words, we examine the relationship between each measure and candidates' roll call behavior.2 Specifically, we examine how well they predict legislators' DW-Nominate scores between 2000 and 2012 within each party. We focus on this period because many empirical studies focus on recent congresses, and these congresses may be particularly difficult to predict because they are so polarized. Our findings indicate that all six measures correctly classify candidates into the appropriate party. However, none of these measures provides accurate estimates of candidates' roll call positions within their party. The low within-party accuracy of these measures can be seen by looking at prominent individual members of Congress.
For instance, Republican congressman Dave Reichert's DW-Nominate score places him among the most liberal members of his party in 2010. However, measures of his spatial position based on survey respondents (Ramey, 2016) and his campaign finance (CF) contributions (Bonica, 2014) place him in the most conservative tercile of his party. Peter King's roll call record also places him among the most liberal members of his party. His Twitter score, however, places him in the more conservative half of the Republican party. On the Democratic side, Henry Waxman's DW-Nominate score places him among the most liberal group of Democrats. But his CF-score places him among the most conservative tercile of Democrats. We find that none of the existing methods explains more than half of the variation in incumbent House members' contemporaneous DW-Nominate scores within each party. They perform even more poorly at predicting the roll call behavior of non-incumbent candidates who go on to win election and serve in Congress. Overall, these existing methods improve only marginally on candidates' party identification for predicting their roll call behavior in Congress.3 The fact that sophisticated models fail (so far) to predict legislative behavior is an interesting finding in itself, as it bodes poorly for the ability of average citizens to predict the behavior of the candidates they must choose between.

1 We also attempted to evaluate the validity of the measures of candidate ideology in Bond and Messing (2015). However, the authors of this study were unable to share replication data due to Facebook's privacy policy.
2 This approach is supported by the validation strategy used by the underlying papers for these models. Indeed, each of these papers uses roll call behavior as a benchmark for its results.
Moreover, the modest within-party relationship between these measures and candidates' roll call positions is problematic because most of the fundamental questions about political economy, polarization, and representation that we wish to answer involve comparisons of candidates within their parties.

In the penultimate section of the paper, we show that the measurement error in the non-roll-call-based measures of candidate positions leads to inferential errors for two important questions in political science. First, we compare changes in polarization over time across DW-Nominate and other time-varying measures of candidate positions (NPAT scores and CF-scores). These measures tell dramatically different stories about the relative changes in polarization over the past decade. The substantively different trends in polarization across models indicate that it is unlikely that these models are actually measuring the same latent quantity. Moreover, the divergence in polarization trends between CF-scores and DW-Nominate calls into question the use of CF-scores to examine polarization outside of Congress (e.g., Rogowski and Langella, 2014). Next, we examine how inferences about the effect of legislators' ideology on distributive spending vary across measures. In line with Alexander, Berry, and Howell (2016), we show that legislators with extreme DW-Nominate scores get less distributive spending. However, there is no significant relationship between other measures of ideology and distributive spending, which suggests that their usage as a proxy for legislators' roll call behavior could lead to incorrect inferences on fundamental questions in political economy and the study of legislatures.

3 It is important to note that we focus our analysis on recent Congresses. It is possible that these measures perform better in earlier, less polarized, Congresses.
While the measures we evaluate in this paper perform poorly at predicting legislators' roll call positions, they do have a number of other valuable uses. Because each of the measures we evaluate accurately classifies candidates into the correct party, they could be extremely valuable in contexts where partisanship is not readily available. For instance, these measures could be used to impute the partisanship of candidates running in nonpartisan elections (de Benedictis-Kessner and Warshaw, 2015). They could also be used to impute the partisanship of voters when survey data are unavailable (Hill and Huber, 2015). In addition, the mismatch between these measures and legislators' roll call behavior raises a host of interesting questions. For example, the mismatch between survey respondents' perceptions and candidates' actual roll call positions suggests the need for new research to determine whether legislators are strategically manipulating their positions in campaigns (see, e.g., Cormack, 2015; Henderson, 2013). These measures also have a number of potential applications for specific substantive questions that are unrelated to legislative behavior, such as the campaign finance behavior of lawyers (Bonica and Sen, 2015) and physicians (Bonica, Rosenthal, and Rothman, 2014).

The paper proceeds as follows. First, we discuss background theories and literature on the task of estimating candidate positions. Next, we discuss the benchmark model, DW-Nominate, that we use to evaluate measures of candidate positions. We also discuss the measurement models of candidate positions that we evaluate in this paper in more detail. In the following section, we discuss our validation strategy. Then, we evaluate the various measures of candidate positions using a variety of different approaches. The penultimate section examines variation in substantive inferences across different measures of candidate positions for the study of polarization and distributive spending.
The final section briefly concludes.

Background

Measuring the ideological preferences and behavior of political officeholders and candidates is central to the study of American Politics. Most of the canonical work on measuring candidates' ideology has focused on incumbent legislators' roll call behavior. Indeed, roll call positions are the gold standard for measuring legislators' political positions. However, an important challenge is that roll call behavior is only available for incumbents. In order to test theories of representation, polarization, and accountability, scholars need measures of the ideological positions of both incumbents and non-incumbents. For example, in order to examine whether polarization is increasing, it is important to know whether the ideological positions of Democratic and Republican candidates in each district are diverging over time (Ansolabehere, Snyder Jr, and Stewart, 2001). To examine theories of spatial voting, we need to know whether voters are more likely to vote for the more spatially proximate candidate, which requires measures of the ideological positions of both Democratic and Republican candidates (e.g., Jessee, 2012; Joesten and Stone, 2014; Shor and Rogowski, 2015).

In order to address these important substantive questions, a large body of methodological work has been done in recent years to measure the ideological positions of incumbents and non-incumbents on a common scale. For instance, Bonica (2013b, 2014) estimates the ideology of both incumbent and non-incumbent candidates based on the composition of their campaign donors. The primary validation metric for models of candidates' spatial positions is typically the proportion of the variation in legislators' roll call behavior that they explain (e.g., Barberá 2015, Bonica 2014, Hare et al. 2014, Joesten and Stone 2014).
Indeed, several papers highlight that they successfully predict 90% or more of the variation in incumbent legislators' ideal points on roll call votes. However, there are two problems with this validation strategy. First, a good measure of candidate ideology should be able to outperform measures that are much simpler and more parsimonious. In recent years, over 90% of the variation in DW-Nominate scores can be predicted by the party identification of the legislator. Polarization in Congress has been on the rise since the 1970s (Poole and Rosenthal, 2011). As the parties have become more extreme and more homogeneous, across-party prediction of DW-Nominate scores has become easier and within-party prediction more difficult. Thus, many measures are able to report very high correlations with DW-Nominate because they have very high correlations with party ID. The problem with such a measure is not just that it might as well be replaced with party identification. Understanding within-party variation in preferences is vitally important for understanding polarization and spatial voting. Polarization is a process by which extreme legislators replace moderates within each party. In order to identify instances of this process, we need measures of preferences that can accurately identify which candidates in nomination contests are more extreme than others within their party. Likewise, spatial voting involves judgments about which candidates are closer in some sense to particular voters, which requires accurate measures of the spatial location of candidates within their party.

Second, existing papers generally focus on their ability to estimate the positions of incumbent legislators. But we already have good estimates of incumbent legislators' behavior based on their roll call positions. Thus, the most common use of the estimates from the recent wave of models is to provide estimates of non-incumbents' spatial positions.
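The earlier point, that a measure can correlate strongly with DW-Nominate while carrying little within-party information, can be illustrated with a short simulation. Everything below is invented for illustration (the party means, spreads, and noise levels are assumptions, not estimates from real data): a measure that recovers only party still fits a polarized set of ideal points well overall, even though its within-party explanatory power is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical polarized chamber: Democrats centered at -0.4,
# Republicans at +0.4, with modest within-party spread.
n = 435
party = rng.integers(0, 2, n)                  # 0 = Dem, 1 = Rep
party_mean = np.where(party == 1, 0.4, -0.4)
ideal = party_mean + rng.normal(0, 0.1, n)     # "true" ideal points

# A measure that captures party perfectly but carries no within-party
# signal: its noise is independent of `ideal`.
measure = party_mean + rng.normal(0, 0.15, n)

def r_squared(y, yhat):
    """Squared correlation between two vectors."""
    return np.corrcoef(y, yhat)[0, 1] ** 2

print("overall R^2:", round(r_squared(ideal, measure), 2))  # high
for p, label in [(0, "Democrats"), (1, "Republicans")]:
    mask = party == p
    print(label, "within-party R^2:",
          round(r_squared(ideal[mask], measure[mask]), 2))  # near zero
```

The overall fit is driven entirely by the gap between the parties; conditioning on party removes it.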
Few of the existing papers validate their measures of non-incumbents' positions against their future roll call positions.4 There are a variety of reasons to think that pre-election measures of candidates' ideology may not be accurate predictors of their future roll call records. Although candidates make commitments and promises during their campaigns, these commitments are rarely enforceable (Alesina, 1988). Incumbent legislators are widely believed to be in a highly advantageous position to win reelection (Gelman and King, 1990; Lee, Moretti, and Butler, 2004), so punishing legislators for unkept promises may be difficult, and may even risk electing a legislator from the opposite party. The quirks of political geography are also important in shaping candidates' support bases. Social media commentators, donors, and the public are limited in the choice of viable candidates to support in any particular district. Information gleaned from these relationships may be a feature of the limited choice set rather than true similarity. As a result, we should not assume that measures based on these sources will ultimately reflect actual legislative behavior.

4 An exception is Bonica (2014, 371), which validates campaign-finance (CF) scores against candidates' future DW-Nominate scores. Also, Bonica (2013b, 298-299) validates CF scores for non-incumbents against the same candidate's future CF-score. But it does not validate them against candidates' future roll call behavior.

Benchmark Model of Legislator Policy Positions

As our benchmark metric, we use candidates' DW-Nominate scores (Poole and Rosenthal, 2011), which are estimated using actual roll call votes cast in Congress. DW-Nominate is a measure of legislators' induced preferences: their preferred choices given the combination of their own personal beliefs and the incentives they face.
DW-Nominate is the gold standard for measuring legislator preferences, broadly construed, because legislative action, not public communication, is what is implicated in most theories of spatial voting, representation, and polarization. If elections are a meaningful constraint, they must constrain what legislators do, not just what legislators say during the campaign.5 Moreover, Nominate scores are also the benchmark used by nearly all of the existing measurement models of candidate positions that we assess below (Barberá 2015, 82, Bonica 2014, 370-371, Hare et al. 2014, 769-770, Joesten and Stone 2014, 745).

A legislator's Nominate score is a measure of the ideal point, x_i, of legislator i. In considering a bill, j, legislators choose the outcome that gives them greater utility: either the status quo, a_j, or the policy that would be enacted if the bill were passed, b_j. Their utility for any outcome is a function of the distance between their ideal point, x_i, and the outcome in question, a_j or b_j, plus a random error that represents idiosyncratic or random features of the legislator's utility. If the status quo point is "closer" to what the legislator wants, then she votes nay. If the bill is closer, she votes yea. The only exception is when the random shock to her utility is large enough to make her prefer the more distant option, which is more likely when the legislator is close to indifferent between the two options.

5 Of course, it need not be the case that a legislator's DW-Nominate score agrees with the image that she tries to portray of herself, or her own "true" preferences. Indeed, there is research showing that legislators often try to give an impression of themselves that does not reflect their voting records (Cormack, 2015; Henderson, 2013).
If we make a few simplifying assumptions, we can write the probability that legislator i votes in favor of bill j (yea) as follows:6

P(y_ij = yea) = P((x_i − a_j)^2 − (x_i − b_j)^2 + ε_ij > 0)    (1)

The probability of a vote against (nay) is one minus the probability of a vote in favor. The likelihood of the model is simply the product of the likelihoods of every vote. This model is often referred to as the quadratic utility item response model. The "ideal point" summarizes a legislator's preferences in the sense that legislators will tend to prefer bills that are closer to their ideal points on average. Observing only the matrix y of vote choices, we can estimate the latent x's that underlie those choices.

6 Poole and Rosenthal (2011) put flesh on this model by assuming a normal curve as the shape of the utility functions, and errors ε_ij that are logistically distributed. A much simpler formula results if we use quadratic utility with normal errors (Shor and McCarty, 2011). Clinton, Jackman, and Rivers (2004) show that the results of this model are almost identical to the results of Nominate.

Alternative Measures of Candidate Positions

In recent years, scholars have developed three broad groups of measurement models to estimate the spatial locations of both incumbents and non-incumbents based on some set of information other than roll call votes in Congress. These measurement models all assume that some observed behavior is generated by unobserved, latent preferences. Thanks to Poole and Rosenthal's finding that in recent congresses one-dimensional summaries of voting are almost as good as much higher-dimensional summaries, all of the measures under study are unidimensional.7 So in each case, the ideology of a given individual is summarized by a single number, which we denote x_i, where i indexes candidates. The choices in question often have features that are taken into account as well; choices are indexed by j.
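A minimal sketch of the choice probability in equation 1, using quadratic utility with a standard normal error (the simpler probit-style formulation noted in the text). The ideal point and outcome locations below are made-up values for illustration only:

```python
import math

def p_yea(x_i, a_j, b_j, sigma=1.0):
    """Probability that a legislator with ideal point x_i votes yea on a
    bill moving policy from status quo a_j to b_j, under quadratic utility
    with a normal error: yea is more likely when b_j is closer than a_j."""
    z = ((x_i - a_j) ** 2 - (x_i - b_j) ** 2) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF

# Made-up values: a mildly conservative legislator (x_i = 0.3) choosing
# between a status quo at 0.0 and a bill at 0.5.
print(p_yea(0.3, 0.0, 0.5))    # bill is closer, so P(yea) > 0.5
print(p_yea(0.25, 0.0, 0.5))   # exactly indifferent: P(yea) = 0.5
```

As the text notes, the error term matters most near indifference: when the legislator sits exactly between the two outcomes, the probability of a yea is one half.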
In order to contrast the models, we harmonize the notation, departing from that used by the original authors.

Models of Ideology Based on Political Positions Outside Congress

One potential approach for measuring the ideology of candidates is to use information from their political positions outside of Congress. For instance, we could estimate the ideology of state legislators who run for Congress based on their roll call votes in state legislatures. Shor and McCarty (2011) use a spatial utility model similar to equation 1 to estimate state legislators' ideal points based on their roll call voting records from the mid-1990s to 2012. They bridge together the ideal points of state legislators in different states using surveys of legislators from 1996 to 2009. In total, they estimate the positions of 18,000 state legislators.8 Of course, only a fraction of these state legislators become candidates for Congress, and even fewer win election to Congress. Moreover, a changing constituency in Congress may lead candidates to adapt their behavior (Stratmann, 2000).

Another approach is to use candidates' own responses to questionnaires about their positions. The most widely used questionnaire is the National Political Awareness Test (NPAT) survey conducted by Project Vote Smart. This is the survey that Shor and McCarty (2011) use to link legislators from different states. Ansolabehere, Snyder Jr, and Stewart (2001) use factor analysis to estimate candidates' spatial positions based on the NPAT survey. More recently, Montagnes and Rogowski (2014), Shor and Rogowski (2015), and others use a spatial utility model similar to equation 1 to estimate candidates' ideal points based on their NPAT responses. These estimates have been widely used in the applied, empirical literature for studies on polarization, spatial voting, elections, and other topics.

7 Technically, the DW-Nominate and W-Nominate scores are two-dimensional, but almost all the information in recent congresses is supplied by the first dimension. All of the other measures are explicitly one-dimensional.
8 It is important to note that these measures are important in their own right for the study of polarization, representation, and accountability in state legislatures, regardless of their ability to predict congressional candidates' positions.

Models of Ideology Based on Perceptions of Candidate Positions

Rather than using roll call votes, another approach is to estimate candidate positions from survey respondents' or experts' explicit perceptions of candidates' ideological positions. This approach has the benefit of providing estimates for candidates who did not serve in the state legislature. Indeed, one could imagine survey respondents or experts rating thousands of candidates for all levels of office. Stone and Simas (2010) and Joesten and Stone (2014) pioneered the use of experts to rate candidates' ideological positions. These studies survey a sample of state legislators and party convention delegates and ask them to place their congressional candidates on a 7-point scale.9 These "expert informants" can label candidates as very liberal, liberal, somewhat liberal, moderate, somewhat conservative, conservative, or very conservative. The resulting scores are adjusted by subtracting or adding the average difference between partisans and independents. Averaging responses is a sensible approach if we assume that errors in perceptions are symmetrically distributed. Although Joesten and Stone (2014) correct for the average "bias" from partisanship, they do not attempt to correct for the fact that individuals often use scales differently.
For instance, some individuals may think that "very liberal" is an appropriate term for anyone who is not a Republican, whereas others may reserve the term for revolutionary socialists. When individuals are asked to rate a variety of politicians and political entities, their own tendencies in the use of the scale can be accounted for. This observation led Aldrich and McKelvey (1977) to the following model:

x̃_ij = w_j(x_i − c_j) + ε_ij    (2)

Here x̃_ij is person j's placement of candidate i. w_j and c_j are coefficients that capture person j's individual use of the scale, which can be estimated because each person places multiple candidates and political entities. x_i is again the actual, latent position of candidate i. Hare et al. (2014) and Ramey (2016) use a Bayesian variant of this model to estimate candidate locations based on the perceptions of survey respondents.10

9 Maestas, Buttice, and Stone (2014) improve on the measurement model in Joesten and Stone (2014). However, we focus here on Joesten and Stone (2014) for simplicity.

Models of Ideology Based on Spatial Models of Citizen Behavior

Another approach is to measure candidates' ideology based on the idea that some set of behavior by voters or citizens is driven by a spatial model that is a function of candidate positions. For instance, we could assume that citizens donate to spatially proximate candidates. Likewise, we could assume that social network users follow spatially proximate candidates on Facebook and Twitter. In Barberá (2015), the choice of Twitter users whether or not to follow political candidates is assumed to be a function of the policy distance between the Twitter user and the candidate.11 The Twitter user follows the candidate if the utility of doing so is greater than some threshold, t, where utility is once again quadratic. Barberá uses a logistically distributed random error, which is very similar to the normal distribution.
So the probability that user j follows candidate i is:

P(y_ij = Follow) = P(−(x_i − θ_j)^2 + ε_ij > t)    (3)

In order to allow for arbitrary levels of sensitivity to this distance, Barberá (2015) adds a scaling parameter, γ, as well as two different intercepts, recognizing that any given user can only follow so many accounts and that many candidates have limited name recognition and thus few followers. α_i captures candidate i's overall popularity with users, and β_j captures user j's propensity for following people on Twitter. These intercepts are arbitrarily scaled, so we can replace the threshold t with an arbitrary fixed number, in this case 0. The following specification results:

P(y_ij = Follow) = P(α_i + β_j − γ(x_i − θ_j)^2 + ε_ij > 0)    (4)

Bonica (2014) uses a similar model to estimate candidates' ideology based on their campaign contributors. The main difference between Barberá (2015)'s model and the model in Bonica (2014) is that when it comes to campaign contributions, donors must choose both whom to give to and how much to give. Bonica recodes all contribution amounts into categories of hundreds of dollars and uses correspondence analysis to recover ideal points.12

10 Ramey (2016) allows the variance of the error to have a candidate-specific component, and we follow this specification. There are many possible extensions. For instance, Hare et al. (2014) allow the error variance to have both a candidate-specific and a rater-specific component.
11 Twitter is a social media platform that allows users to send brief messages to other users who choose to receive these messages or "follow" them.

Validation Approach

We use a multi-faceted approach to evaluate the various measures of candidate positions. First, we evaluate how much of the variation in incumbents' contemporaneous DW-Nominate scores each measure explains. This is similar to the approach used to validate previous measures of candidate ideology (e.g., Joesten and Stone, 2014; Hare et al., 2014).
However, we focus on the within-party explanatory power of each measure. We choose to focus on DW-Nominate scores because they are a simple and accurate summary of the entire roll call voting record. In Appendix B we show that our results are very similar if we instead focus on fitting individual votes.

12 The correspondence analysis in Bonica (2014) is meant to approximate an IRT model similar to the one in Barberá (2015). It builds off of an earlier paper, Bonica (2013b), which actually estimates such a model. However, due to the very large size of the donation data, Bonica (2014) opts for this simpler method.

Next, we evaluate how much each measure improves upon a much simpler model that simply assigns each candidate the mean ideal point of legislators in their party (x_party). To do this, we linearly map each non-roll-call-based measure of candidates' positions into Nominate's space, yielding x_j. Then, we calculate the average improvement for each measurement model compared to a model that assumes each candidate takes the mean ideological position of other candidates in their party. This can be thought of as an R^2 statistic, but based on absolute rather than squared errors:

1 − mean(|x_Nominate − x_j|) / mean(|x_Nominate − x_party|)    (5)

Finally, we evaluate the percentage of candidates that each method is able to correctly classify into the proper tercile of Nominate scores within their party. This provides a simple metric for the degree of confidence we should have in each model's ability to roughly differentiate between moderates and extremists in each party (see, e.g., Hall, 2015).

None of these statistics is sufficient to show that the measure in question is useful for measuring the spatial positions of non-incumbents. After all, our goal is not to obtain external measures of incumbents' ideal points; it is to obtain external measures of non-incumbents' ideal points in order to evaluate theories of polarization, spatial voting, and so on.
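The two within-party statistics described above, the improvement-on-party-baseline statistic in equation 5 and the tercile classification rate, can be sketched as follows. This sketch assumes the external measure has already been linearly mapped into Nominate's space, and the simulated data at the bottom are purely illustrative:

```python
import numpy as np

def improvement_on_baseline(x_nominate, x_measure, party):
    """Equation 5: 1 - MAE(measure) / MAE(party-mean baseline)."""
    x_nominate = np.asarray(x_nominate, float)
    x_measure = np.asarray(x_measure, float)
    party = np.asarray(party)
    # Baseline: every candidate is assigned their party's mean position.
    baseline = np.array([x_nominate[party == p].mean() for p in party])
    return 1 - (np.abs(x_nominate - x_measure).mean()
                / np.abs(x_nominate - baseline).mean())

def tercile_accuracy(x_nominate, x_measure, party):
    """Share of candidates placed in the correct within-party tercile."""
    x_nominate, x_measure = np.asarray(x_nominate), np.asarray(x_measure)
    party = np.asarray(party)
    correct = 0
    for p in np.unique(party):
        a, b = x_nominate[party == p], x_measure[party == p]
        cuts_a = np.quantile(a, [1/3, 2/3])
        cuts_b = np.quantile(b, [1/3, 2/3])
        correct += (np.digitize(a, cuts_a) == np.digitize(b, cuts_b)).sum()
    return correct / len(party)

# Illustrative check with simulated data (nothing here is estimated).
rng = np.random.default_rng(1)
party = np.repeat([0, 1], 100)
x_true = np.where(party == 1, 0.4, -0.4) + rng.normal(0, 0.15, 200)
x_hat = x_true + rng.normal(0, 0.1, 200)   # a fairly informative measure
print(improvement_on_baseline(x_true, x_hat, party))
print(tercile_accuracy(x_true, x_hat, party))
```

Because each vector is cut at its own within-party tercile points, the classification statistic is invariant to the linear mapping into Nominate's space.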
So, next, we repeat these evaluations for non-incumbents who go on to win their elections. Thus, our final evaluation examines how well each measure of candidate positions predicts the future positions of non-incumbent House candidates who go on to win the election and compile a legislative record. This is a more difficult test, but a crucial one for validating these measures with respect to non-incumbents. Indeed, the most valuable use of non-roll-call-based measures of candidate positions is to predict how non-incumbents will vote after they take office.

Before we proceed to evaluate each of the estimates of candidate positions, it is important to first examine how well we could possibly do. After all, if the roll call voting record itself does not locate candidates very precisely, then we should not expect measures of candidate positions to accurately predict DW-Nominate scores. In order to conduct this evaluation, we randomly generate new DW-Nominate scores using the estimated scores and their bootstrapped standard errors, assuming that the scores are normally distributed. After generating these scores, we examine all of the above statistics. We limit ourselves to the 110th, 111th, and 112th sessions of Congress (2007-2012) in order to take account of the fact that recent congresses may be particularly hard to predict because they are particularly polarized.

Table 1: Validation Statistics for Simulations of DW-Nominate in the U.S. House and Senate

                                                   DW-Nominate Simulations
Observations                                       1642
Perc. Classified into Correct Party                0.999

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
Perc. Variation Explained                          0.971
Perc. Correctly Classified (Terciles)              0.866
Prop. Improvement on Party Baseline                0.923

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
Perc. Variation Explained                          0.944
Perc. Correctly Classified (Terciles)              0.837
Prop. Improvement on Party Baseline                0.923
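The upper-bound exercise just described can be sketched as follows: re-draw each score from a normal distribution centered on its estimate with its bootstrapped standard error, then ask how well the re-draws track the originals. The scores and standard errors below are invented placeholders, not the actual DW-Nominate estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Placeholder inputs: one party's estimated scores and bootstrapped SEs.
scores = rng.normal(0.4, 0.15, 200)
ses = np.full(scores.shape, 0.03)

def mean_simulated_r2(scores, ses, n_sims=1000, rng=rng):
    """Average R^2 between the estimated scores and normal re-draws,
    mimicking the simulation exercise described in the text."""
    r2s = []
    for _ in range(n_sims):
        sim = rng.normal(scores, ses)   # one simulated set of scores
        r2s.append(np.corrcoef(scores, sim)[0, 1] ** 2)
    return float(np.mean(r2s))

print(mean_simulated_r2(scores, ses))   # near 1 when SEs are small
```

When the standard errors are small relative to the within-party spread of the scores, the simulated R^2 approaches 1, which is why this exercise yields such a high ceiling on predictive accuracy.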
This exercise shows that DW-Nominate is very precisely measured. After 10,000 simulations, the simulated ideal points for Republicans explained 94% of the variance in the actual scores, and this statistic was 97% for Democrats. Between 84% and 87% of all legislators were placed in the correct tercile within their party, and the simulated values reduced error by 92% over using the party mean. Not surprisingly, virtually every simulation classified each legislator into the correct party. This is the upper limit of predictive accuracy for a measure of candidate positions: we should not expect to do better in predicting a legislator's roll call voting behavior than we can do with the actual votes themselves.

Data

In the next section, we evaluate six different models of candidates' ideology (Table 2). Wherever possible, we use estimates that are supplied in the data from the corresponding paper for each model. We compare each measure with DW-Nominate scores that are based on 92,186 roll call votes in Congress taken from 1789 to 2014, identifying the positions of 11,976 legislators.13

Table 2: Methods for estimating candidate preferences

Paper                           Data                                Statistical Model
Benchmark Model
  Poole and Rosenthal (2011)    Congressional roll call votes       Spatial choice model (dynamic)
Models of Ideology based on Candidates' Political Positions Outside Congress
  Shor and McCarty (2011)       State legislature roll call votes   Spatial choice model
  Shor and Rogowski (2015)      NPAT responses                      Spatial choice model
Models of Ideology based on Perceptions of Candidate Position
  Ramey (2016)                  Survey respondent perceptions       Measurement error model
  Joesten and Stone (2014)      Expert perceptions                  Party-adjusted average
Models of Ideology based on Spatial Model of Citizen Behavior
  Barberá (2015)                Followers on Twitter                Spatial choice model
  Bonica (2014)                 Campaign contributions              Correspondence analysis

1.
State Legislative Ideal Points: Shor and McCarty (2011) use a model similar to equation 1 to estimate state legislators' ideal points based on their roll call voting records from the mid-1990s to 2012. We downloaded Shor and McCarty's data from the Dataverse (Shor and McCarty, 2014), and manually matched the estimates of state legislators' ideal points to the ICPSR numbers that Poole and Rosenthal use to index their DW-Nominate scores.14

13 More recent years tend to have more votes than earlier years.

2. NPAT scores: Montagnes and Rogowski (2014) use a model similar to equation 1 to estimate candidates' ideal points based on their NPAT responses from the mid-1990s to 2006. Jon Rogowski generously shared an expanded version of the data used in this paper.

3. Aldrich-McKelvey estimates based on constituent perceptions: Based on a model similar to equation 2, Hare et al. (2014) and Ramey (2016) estimate the positions of congressional candidates using survey responses from the Cooperative Congressional Election Study (CCES). In our evaluation, we focus on the estimates from Ramey (2016), which uses 109,935 survey responses from 2010 and 2012 to estimate the positions of House and Senate candidates.15

4. Expert Ratings: Based on a model similar to equation 2, Maestas, Buttice, and Stone (2014) use data from 726 experts and over 4,000 survey respondents in 155 districts in 2010, for an average of about 30 raters per district. Each rater evaluates both the current incumbent's ideology and the Democratic and Republican candidates' ideology. We downloaded the replication data from the Dataverse (Maestas, Buttice, and Stone, 2013).16

5.
Twitter scores: Barberá (2015) uses a model similar to equation 4 to estimate the latent ideology of several hundred House and Senate candidates using data on 301,537 Twitter users from November of 2012.17

14 Because state legislative ideal points are only available before legislators take office, we use them in the validation below for non-incumbents.
15 We downloaded the replication data for Ramey (2016) from the Dataverse, and used it to analyze the ability of Aldrich-McKelvey scores to predict contemporaneous roll call positions. However, the replication data for Ramey (2016) does not include estimates for non-incumbent candidates. So we used what we believe to be the same data, from the 2010 and 2012 Cooperative Congressional Election Studies, and the same method, to compute our own estimates based on an identical measurement model.
16 We use the inclc pc09 variable for incumbent placements, dlc pc10 for Democratic candidates' placements, and rlc pc10 for Republican candidates' placements.
17 We downloaded the replication data from the Dataverse (Barbera, 2014).

6. Campaign Finance (CF) Scores: Bonica (2014) uses correspondence analysis to estimate the ideology of virtually every House and Senate candidate between 1980 and 2012 based on over 100 million contributions to political campaigns from 1979 to 2012. Correspondence analysis is meant to approximate a model similar to equation 4. We downloaded each congressional candidate's dynamic and static CF-Score data from Adam Bonica's DIME website (Bonica, 2013a).18

Importantly, none of the estimates that we are testing takes congressional roll call votes into account as a source of information. It would not necessarily be wrong to do so.19 However, the fact that these models do not use roll call votes ensures that they are at least plausibly exogenous with respect to DW-Nominate scores.
Validation Results

In this section, we discuss the results of our evaluation of these measures of candidate positions for the period between 2000 and 2012. We focus on this period because many empirical studies focus on recent congresses, and recent congresses may be particularly hard to predict because they are so polarized.

U.S. House

Table 3 evaluates each model's ability to provide accurate estimates of the partisan affiliation and ideological positions of incumbents in the U.S. House between 2000 and 2012. The first row indicates the number of observations available for each model.20 It is important to note that the sample size of CF-Scores is several times the available sample from the other models. The second row indicates the percentage of candidates that each model correctly classifies into the correct party based on a simple logistic regression model. It shows that all of the models do extremely well at classifying candidates into the correct party.

18 We use the dynamic CF-Scores in each of the analyses that follow. However, the results are very similar using static CF-Scores.
19 In fact, in many contexts it may make sense to leverage this information. See Groseclose and Milyo (2005) and McCarty, Poole, and Rosenthal (2006) for examples.
20 Note that each observation represents legislators' estimates for a particular Congress.

Table 3: Validation Statistics for Various Measurement Models Against Contemporaneous Nominate Scores in the U.S. House

                                          NPAT      Survey      Experts   Twitter   CF-Score
                                          Scores    Respond's
Available Congresses                      106-109   112         111       112       106-112
Observations                              546       427         159       144       3453
Perc. Classified into Correct Party       0.923     1.000       1.000     1.000     0.983
Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
  Perc. Variation Explained               0.545     0.483       0.601     0.462     0.203
  Perc. Correctly Classified (Terciles)   0.602     0.577       0.630     0.660     0.460
  Prop. improvement on party baseline     0.311     0.256       0.386     0.286     0.104
Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
  Perc. Variation Explained               0.368     0.183       0.252     0.118     0.279
  Perc. Correctly Classified (Terciles)   0.493     0.568       0.540     0.670     0.568
  Prop. improvement on party baseline     0.225     0.124       0.117     0.086     0.166

The next two panels of Table 3 present each of our validation statistics of the within-party relationship between the estimates from each model and DW-Nominate scores, broken down by party. The first column examines the ability of the estimates of the ideal points of candidates based on Project Vote Smart's NPAT survey to accurately place incumbents' positions in Congress. NPAT scores explain about 54% of the variation in DW-Nominate scores for Democratic members, and 37% of the variation for Republican members. They also correctly classify 60% of Democrats and 49% of Republicans into the correct tercile of the distribution of DW-Nominate scores. The model reduces overall error by 31% for Democrats and 23% for Republicans over a model that assumes that all Democrats and all Republicans are the same. Clearly, NPAT scores are better than this much simpler model, but not by very much.

The other measures do not fare much better. No model of candidate positions explains more than half of the variation in incumbents' DW-Nominate scores across both parties.21 Moreover, no model correctly classifies at least two thirds of the members of both parties into the correct tercile, and each model only modestly improves over a much simpler model that assumes that all Democrats and all Republicans are the same.

These contemporaneous relationships between each measure and DW-Nominate are shown visually in Figures 1 and 2. Figure 1 shows these relationships for Democrats, and Figure 2 shows them for Republicans. Each panel contains a scatterplot of individual measurements as well as a loess line to allow a more flexible comparison between the measure and the true value of DW-Nominate.
As a baseline, in the upper-left panel of each figure we compare the Nokken-Poole scores for each representative who served in the House since 2000 with their lifetime Nominate scores (Nokken and Poole, 2004). Nokken-Poole scores are based on the same data as DW-Nominate but use a different measurement model. The difference between these two sets of scores reflects the degree of difference we would expect due to arbitrary modeling choices alone. We find that they are closely related. The Nokken-Poole scores explain about 88% of the variation in Nominate scores for Democrats, and 79% for Republicans. Besides a few outlying values, the relationship between Nokken-Poole and Nominate scores is surprisingly linear. In contrast, the other panels in both figures are extremely noisy. This is not surprising, as these figures simply graph the data that underlie the statistics in Table 3.

So far, we have leveraged all the data we have from 2000 to 2012 to examine contemporaneous predictive accuracy. However, our desired purpose is to use these measures to assess a counterfactual: what DW-Nominate score would a candidate have if they were a sitting legislator? Unfortunately, we cannot answer this question directly, but we can come closer. For many of our data points, we observe a candidate who wins their election and subsequently gets assigned a DW-Nominate score. We can repeat the statistics from Table 3, but this time each measure is taken from a candidate for the House of Representatives who has not previously held office. Their DW-Nominate score is the score they receive in the next Congress after they win election.

21 In the online appendix, we compare our results to the validation results reported in each paper. Overall, our results regarding the amount of within-party variation explained by each model are very similar to the results reported in the source papers.

Figure 1: The relationship between DW-Nominate and various measures of candidate positions for Democrats in the House between 2000 and 2012. Panels plot DW-Nominate scores against Nokken-Poole scores (R² = 0.90), Twitter scores (0.46), NPAT scores (0.54), Aldrich-McKelvey scores (0.48), expert assessment scores (0.60), and CF scores (0.20).

Table 4 shows these predictive results. The first thing to note about Table 4 is that each model has far fewer observations (row 1) than in Table 3.
This is due to the fact that most of the non-incumbents for whom we have measures did not win their subsequent election and vote in the House. We have too few Twitter scores to consider them, and too few Expert scores for the Democrats (hence the "NA" values, meaning "Not Applicable").

Figure 2: The relationship between DW-Nominate and various measures of candidate positions for Republicans in the House between 2000 and 2012. Panels plot DW-Nominate scores against Nokken-Poole scores (R² = 0.82), Twitter scores (0.12), NPAT scores (0.37), Aldrich-McKelvey scores (0.18), expert assessment scores (0.25), and CF scores (0.28).
For the observations we do have, the results are much weaker than they were for the contemporaneous comparisons.

Table 4: Validation Statistics for Various Measurement Models for Non-Incumbents Against Future Nominate Scores in the U.S. House

                                            State Leg.   NPAT     Survey      Experts   CF-Score
                                            Ideal Pts.   Scores   Respond's
Observations                                94           35       83          39        301
Percent Classified into Correct Party       0.968        0.914    1.000       1.000     0.993
Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
  Percent Variation Explained               0.387        0.629    0.255       NA        0.048
  Percent Correctly Classified (Terciles)   0.552        0.714    0.567       NA        0.567
  Prop. improvement on party baseline       0.251        0.494    0.142       NA        0.011
Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
  Percent Variation Explained               0.173        0.275    0.033       0.026     0.220
  Percent Correctly Classified (Terciles)   0.446        0.556    0.632       0.703     0.534
  Prop. improvement on party baseline       0.080        0.222    0.017       -0.007    0.119

Only one measure (NPAT scores) explains more than half of the variance in Democratic DW-Nominate scores, and none of the models explains half of the variance in Republican DW-Nominate scores. Moreover, only one measure reduces the error over a naive model that assumes that all Democrats and all Republicans are the same by more than 25% in any case. Making matters worse, measures that perform well for one group generally perform relatively worse for the other. For instance, although CF-Scores explain more variance in Republican positions than nearly any other measure, they are the worst measure for Democrats. While it may be the case that these measures have better-than-zero predictive accuracy for some parties in some years, with current data we can say very little about the conditions under which we expect them to perform well for the House of Representatives. Average accuracy is very low. In fact, no model performs much better than a model that assumes one ideal point per party.

U.S. Senate

One possibility is that these measures perform poorly for the House of Representatives because it is inherently more difficult to predict the voting records of House members. House members tend to have lower visibility to donors, members of the public, and experts. Some House candidates are political novices, and may not have formed their own views on a variety of issues. The experience of operating in a chamber where majority party control is the norm may alter candidates' positions once they begin serving. In contrast, the United States Senate is a much more visible body, and candidates for the Senate tend to have longer experience in the public eye. Once elected, Senators participate in a legislative body that is noted for its individualism rather than overbearing party control. For these reasons, we might expect our measures to have better accuracy in the Senate than in the House of Representatives.

The disadvantage of the Senate is a greatly reduced sample size. There are fewer total Senators (100 instead of 435), fewer Senate elections (each Senator faces election every six years instead of every two), and lower turnover. We lack enough data from two of the models (NPAT and Experts) to test them at all. For the other measures, we have smaller samples for the contemporaneous comparison, and for the predictive comparison involving candidates who win, we cannot test the Twitter-based measure either.

Table 5 shows the contemporaneous comparison for the Senate. In most cases, the fit is substantially higher for these measures than in the case of the House of Representatives, particularly for Republican legislators. Twitter scores perform particularly well across all three statistics for both parties. However, the overall predictive power of these measures is still limited. Table 6 repeats the analysis above using the candidate scores for candidates who have not yet held Senate seats and their later DW-Nominate scores as Senators.
Once again, the fit is generally lower than it was for the contemporaneous comparison. However, Aldrich-McKelvey scores from survey respondents perform the best. These scores explain 45% of the variation in Democratic DW-Nominate scores and 64% of the variation in Republican DW-Nominate scores.

Table 5: Validation Statistics for Various Measurement Models for Incumbents Against Contemporaneous Nominate Scores in the U.S. Senate

                                            Survey        Twitter   CF-Score
                                            Respondents   Scores
Available Congresses                        112           112       106-112
Observations                                103           77        710
Percent Classified into Correct Party       0.971         0.974     0.969
Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
  Percent Variation Explained               0.325         0.722     0.325
  Percent Correctly Classified (Terciles)   0.463         0.677     0.463
  Prop. improvement on party baseline       0.267         0.430     0.124
Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
  Percent Variation Explained               0.384         0.624     0.240
  Percent Correctly Classified (Terciles)   0.729         0.719     0.729
  Prop. improvement on party baseline       0.257         0.368     0.106

Table 6: Validation Statistics for Various Measurement Models for Non-Incumbents Against Future Nominate Scores in the U.S. Senate

                                            State Legislative   Survey        CF-Score
                                            Ideal Pts.          Respondents
Observations                                14                  53            71
Percent Classified into Correct Party       0.929               1.000         0.972
Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
  Percent Variation Explained               0.230               0.449         0.325
  Percent Correctly Classified (Terciles)   0.375               0.333         0.333
  Prop. improvement on party baseline       0.163               0.287         0.098
Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
  Percent Variation Explained               0.436               0.640         0.476
  Percent Correctly Classified (Terciles)   0.667               0.778         0.778
  Prop. improvement on party baseline       0.223               0.380         0.285

Applications

In this section, we show how the choice of measurement strategy for candidate positions dramatically affects substantive inferences in two important areas: polarization and the allocation of distributive spending.

Polarization

There is a vast literature that examines changes in polarization over time among legislators and candidates. In their authoritative study, McCarty, Poole, and Rosenthal (2006) show that legislators' roll call records have polarized asymmetrically, with virtually all of the polarization occurring among Republicans. In line with this finding, the upper-left panel of Figure 3 shows that between 2000 and 2012, virtually all of the polarization in DW-Nominate scores occurred among Republicans. The middle and upper-right panels show the analogous change in NPAT scores for incumbents and all candidates (i.e., both winners and losers) between 2000 and 2006. Like DW-Nominate scores, NPAT scores also polarize asymmetrically. However, virtually all of the polarization in NPAT scores occurs among Democrats. Finally, the lower panels show the change in polarization in CF-Scores for incumbents and all candidates. A number of recent empirical studies have used CF-Scores to examine the causal factors behind polarization in state legislatures and Congress (e.g., Ahler, Citrin, and Lenz, Forthcoming; Rogowski and Langella, 2014; Thomsen, 2014). Figure 3 shows that, unlike DW-Nominate scores, CF-Scores polarized among both Democrats and Republicans.

Overall, these plots indicate that DW-Nominate scores, NPAT scores, and CF-Scores each tell a different story about the relative changes in polarization over the past decade. It is possible that each of the stories is substantively interesting. For instance, it is possible that the composition of Democratic donors is polarizing, while Democrats' roll-call behavior is staying constant.
But the substantively different trends in polarization across models are further evidence that it is unlikely that these models are actually measuring the same latent quantity. This suggests that scholars should use caution in using non-roll-call-based measures of candidate ideology to make inferences about changes in polarization in Congress or state legislatures.

Figure 3: The evolution of DW-Nominate and various measures of candidate positions for Democrats and Republicans in the House between 2000 and 2012. Blue dots show the mean spatial position of Democrats and red dots show the mean spatial position of Republicans. Panels show polarization in DW-Nominate scores (incumbents), NPAT scores (incumbents and all candidates), and CF-Scores (incumbents and all candidates).

Allocation of Distributive Spending

An important question in the field of legislative politics is the degree to which legislators' ideology influences the amount of distributive spending that their districts receive (e.g., Ferejohn, 1974; Cann and Sidman, 2011). Alexander, Berry, and Howell (2016) persuasively show that moderate legislators get more non-formula (i.e., flexible, non-mandatory) discretionary spending than extremist legislators. The logic is that moderate legislators near the median receive pay-offs in exchange for their support on close bills (Snyder, 1991; Dekel, Jackson, and Wolinsky, 2008).
Alexander, Berry, and Howell (2016) use a nuanced identification strategy with county-by-member fixed effects and other time-varying controls. However, their basic result also appears in a much simpler cross-sectional regression.22 Indeed, Table 7 shows that a one-standard-deviation increase in the extremity of legislators' DW-Nominate scores in 2008 is associated with a 3.7% decrease in non-formula spending.23

Table 7

                                   Dependent variable: Log(Non-Formula Grants)
                              (1)         (2)         (3)         (4)         (5)
DW-Nominate               −0.040**
                           (0.018)
CF-Scores                             −0.011
                                      (0.013)
NPAT-Scores                                       −0.053
                                                  (0.032)
Aldrich-McKelvey Scores                                       −0.008
                                                              (0.027)
Twitter Scores                                                             0.005
                                                                          (0.029)
Republican                −0.014      −0.064**    −0.015      −0.058      −0.062
                           (0.035)     (0.026)     (0.067)     (0.054)     (0.056)
Constant                  21.934***   21.956***   21.398***   21.936***   21.947***
                           (0.021)     (0.018)     (0.040)     (0.029)     (0.040)
Observations                 420         420          99         290         120
R2                          0.031       0.021       0.029       0.021       0.021
Adjusted R2                 0.026       0.016       0.009       0.014       0.004

Note: *p<0.1; **p<0.05; ***p<0.01

In contrast, none of the other measures of candidate positions has a statistically significant association with the distribution of non-formula grants. This further suggests that these measures may not be capturing the same latent quantity as DW-Nominate. Moreover, their usage as a proxy for legislators' roll call behavior could lead to incorrect inferences on fundamental questions in political economy and the study of legislatures.

22 We downloaded their data from the Harvard Dataverse, http://dx.doi.org/10.7910/DVN/VR12G4, and matched it with the various measures of candidate positions that we evaluate in this paper.

23 This is substantively consistent with the "7.2% decrease in outlays associated with a one standard-deviation increase in a member's ideological distance from the median voter" that Alexander, Berry, and Howell (2016, 223) report in their paper.
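The cross-sectional specification in column (1) above is just an OLS regression of logged non-formula grants on a legislator's ideological extremity and a party indicator. A minimal sketch of that functional form, using simulated data (the variable names and the data-generating parameters below are our own illustrative assumptions, not the authors' replication code):

```python
import numpy as np

def ols(y, X):
    """OLS coefficients (intercept first) via least squares."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 420  # matches the number of observations in column (1)

# Hypothetical simulated data: extremity plays the role of |DW-Nominate|,
# and the outcome is log(non-formula grants) with a small negative effect.
republican = rng.integers(0, 2, n).astype(float)
extremity = np.abs(rng.normal(0.4, 0.2, n))
log_grants = 21.9 - 0.04 * extremity - 0.01 * republican + rng.normal(0.0, 0.2, n)

beta = ols(log_grants, np.column_stack([extremity, republican]))
# With a logged outcome, a coefficient b on extremity implies roughly a
# 100 * b percent change in grants per one-unit increase in extremity.
```

On real data one would of course use a full regression package with appropriate standard errors; this sketch only illustrates the shape of the specification.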
Conclusion

Despite the development of a variety of innovative strategies for measuring the political positions of candidates for Congress, existing measures have only limited power to predict the voting records that candidates establish once elected. Even contemporaneous measures, which use data on legislators as they are currently serving in Congress, typically fail to explain even half of the variance in legislator voting, and usually explain closer to a third. The performance of these measures varies across parties and time, with no measure clearly dominant. As a result, using these measures of candidate positions could lead to inferential errors in substantive research. For instance, we have shown that different measures of candidate positions lead to dramatically different inferences about both polarization and the link between legislator ideology and distributive spending. Our findings have profound implications not just for academic research, but also for our understanding of democracy. Prospective voting requires voters, not just political scientists, to know what candidates will do if elected.

While these measures perform poorly at predicting legislators' roll call positions, they do have a number of other valuable uses. They could be used to impute the partisanship of candidates (de Benedictis-Kessner and Warshaw, 2015) and voters (Hill and Huber, 2015) when other information on their partisanship is not readily available. Moreover, they could be used to examine potential explanations for the mismatch between survey respondents' perceptions and candidates' actual roll call positions (see, e.g., Cormack, 2015; Grimmer, 2013; Henderson, 2013). These measures also have a number of potential applications to specific substantive questions outside the realm of legislative behavior.
For instance, CF-Scores could be used to examine the campaign finance behavior of bureaucrats (Bonica et al., 2015), and Barberá's (2015) measures based on candidates' Twitter followers could be used to examine the effect of candidates' roll call positions on their followings on social networks.

There are a variety of reasons that constituents' implicit (e.g., campaign finance donations or Twitter follows) and explicit (e.g., survey responses) perceptions of candidates' ideology are both only weakly associated with candidates' roll call behavior inside of Congress. Although candidates make commitments and promises during their campaigns, these commitments are rarely enforceable (Alesina, 1988). Moreover, candidates have a variety of reasons to distort their positions during the campaign, which may weaken the relationship between candidates' campaign platforms and their roll call positions (Rogowski, 2015). The ability of constituent perceptions to predict roll call behavior may be further distorted by political geography. Indeed, social media commentators, donors, and the public are limited in the choice of viable candidates to support in any particular district. Information gleaned from these relationships may be a feature of the limited choice set rather than true similarity. Finally, a variety of other factors could influence candidates' roll call votes (e.g., lobbying, agenda control, party leaders, etc.).

It is also important to note that our findings do not imply that it is impossible to find a better measure of candidates' spatial positions. On the contrary, we encourage future researchers to look for better data sources and modeling strategies in order to more accurately measure the positions of candidates (e.g., Bonica, 2016). However, we would also encourage future research to measure success by a high standard. Cross-party correlation coefficients are a poor way to evaluate the accuracy of a measure.
Instead, we would encourage scholars to use within-party measures of fit, with variance explained being the easiest to interpret. It is also important to evaluate the performance of new measures in different time periods, since measures that appear to predict well may vary substantially in their usefulness, particularly in the current, more polarized era. For the time being, however, our findings call into question the usefulness of these measures for examining questions that depend on the relative spatial distance between candidates, such as tests of spatial voting theories or the causes of Congressional polarization.24 At the very least, empirical papers that use these measures to study the causes and effects of candidate positions in Congress should validate their usage and demonstrate the robustness of their findings using different measures of candidates' positions.

24 Whether or not these measures are useful depends on the application in question. Even relatively weak proxy measures can sometimes produce orderings that are correct a substantial fraction of the time. However, comparisons of relative distances can be highly inaccurate.

References

Adcock, Robert, and David Collier. 2001. "Measurement Validity: A Shared Standard for Qualitative and Quantitative Research." American Political Science Review 95(3): 529–546.

Ahler, Douglas J, Jack Citrin, and Gabriel S Lenz. Forthcoming. "Do Open Primaries Improve Representation? An Experimental Test of California's 2012 Top-Two Primary." Legislative Studies Quarterly.

Aldrich, John H, and Richard D McKelvey. 1977. "A Method of Scaling with Applications to the 1968 and 1972 Presidential Elections." The American Political Science Review 71(1): 111–130.

Alesina, Alberto. 1988. "Credibility and Policy Convergence in a Two-Party System with Rational Voters." American Economic Review 78(4): 796–805.

Alexander, Dan, Christopher R Berry, and William G Howell. 2016.
"Distributive Politics and Legislator Ideology." The Journal of Politics 78(1): 000–000.

Ansolabehere, Stephen, James M Snyder Jr, and Charles Stewart. 2001. "Candidate Positioning in US House Elections." American Journal of Political Science 45(1): 136–159.

Barberá, Pablo. 2014. "Replication data for: Birds of the Same Feather Tweet Together: Bayesian Ideal Point Estimation Using Twitter Data." http://dx.doi.org/10.7910/DVN/26589.

Barberá, Pablo. 2015. "Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data." Political Analysis 23(1): 76–91.

Black, Duncan. 1948. "On the Rationale of Group Decision-Making." The Journal of Political Economy 56(1): 23–34.

Bond, Robert, and Solomon Messing. 2015. "Quantifying Social Media's Political Space: Estimating Ideology from Publicly Revealed Preferences on Facebook." American Political Science Review 109(1): 62–78.

Bonica, Adam. 2013a. "Database on Ideology, Money in Politics, and Elections: Public version 1.0." http://data.stanford.edu/dime.

Bonica, Adam. 2013b. "Ideology and Interests in the Political Marketplace." American Journal of Political Science 57(2): 294–311.

Bonica, Adam. 2014. "Mapping the Ideological Marketplace." American Journal of Political Science 58(2): 367–386.

Bonica, Adam. 2016. "Inferring Roll-Call Scores from Campaign Contributions Using Supervised Machine Learning." Unpublished manuscript. Available for download at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2732913.

Bonica, Adam, and Maya Sen. 2015. "A Common-Space Scaling of the American Judiciary and Legal Profession." Unpublished manuscript. Available for download at http://scholar.harvard.edu/msen/judges-scaling.

Bonica, Adam, Jowei Chen, and Tim Johnson. 2015. "Senate Gate-Keeping, Presidential Staffing of Inferior Offices, and the Ideological Composition of Appointments to the Public Bureaucracy." Quarterly Journal of Political Science 10(1): 5–40.
Bonica, Adam, Howard Rosenthal, and David J Rothman. 2014. "The political polarization of physicians in the United States: an analysis of campaign contributions to federal elections, 1991 through 2012." JAMA Internal Medicine 174(8): 1308–1317.

Cann, Damon M, and Andrew H Sidman. 2011. "Exchange theory, political parties, and the allocation of federal distributive benefits in the House of Representatives." The Journal of Politics 73(4): 1128–1141.

Clinton, Joshua, Simon Jackman, and Douglas Rivers. 2004. "The Statistical Analysis of Roll Call Data." American Political Science Review 98(2): 355–370.

Cormack, Lindsey. 2015. "Extremity in Congress: Communications versus Votes." Unpublished manuscript. Available for download at personal.stevens.edu/~lcormack/extreme_comms_votes.pdf.

de Benedictis-Kessner, Justin, and Christopher Warshaw. 2015. "Mayoral Partisanship and Municipal Fiscal Policy." Unpublished manuscript. Available for download at http://cwarshaw.scripts.mit.edu/papers/CitiesMayors_160120.pdf.

Dekel, Eddie, Matthew O Jackson, and Asher Wolinsky. 2008. "Vote buying: General elections." Journal of Political Economy 116(2): 351–380.

Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper & Row.

Enelow, James M, and Melvin J Hinich. 1984. The Spatial Theory of Voting: An Introduction. Cambridge University Press.

Ensley, Michael J. 2009. "Individual Campaign Contributions and Candidate Ideology." Public Choice 138(1-2): 221–238.

Ferejohn, John A. 1974. Pork Barrel Politics: Rivers and Harbors Legislation, 1947-1968. Stanford University Press.

Gelman, Andrew, and Gary King. 1990. "Estimating incumbency advantage without bias." American Journal of Political Science 34(4): 1142–1164.

Grimmer, Justin. 2013. Representational Style in Congress: What Legislators Say and Why It Matters. Cambridge University Press.

Groseclose, Tim, and Jeffrey Milyo. 2005. "A measure of media bias." The Quarterly Journal of Economics pp. 1191–1237.
Groseclose, Tim, Steven D Levitt, and James M Snyder. 1999. "Comparing interest group scores across time and chambers: Adjusted ADA scores for the US Congress." American Political Science Review 93(1): 33–50.

Hall, Andrew, and James Snyder. 2015. "Candidate Ideology and Electoral Success." Unpublished manuscript. Available for download at https://dl.dropboxusercontent.com/u/11481940/Hall_Snyder_Ideology.pdf.

Hall, Andrew B. 2015. "What Happens When Extremists Win Primaries?" American Political Science Review 109(1): 18–42.

Hare, Christopher, David A Armstrong, Ryan Bakker, Royce Carroll, and Keith T Poole. 2014. "Using Bayesian Aldrich-McKelvey Scaling to Study Citizens' Ideological Preferences and Perceptions." American Journal of Political Science.

Henderson, John Arthur. 2013. "Downs' Revenge: Elections, Responsibility and the Rise of Congressional Polarization." Unpublished PhD dissertation. Available for download at http://gradworks.umi.com/36/16/3616463.html.

Hill, Seth, and Greg Huber. 2015. "Representativeness and Motivations of Contemporary Contributors to Political Campaigns: Results from Merged Survey and Administrative Records." Unpublished manuscript. Available for download at http://www.sethjhill.com/HillHuberDonorate_062515.pdf.

Jessee, Stephen A. 2012. Ideology and Spatial Voting in American Elections. New York, NY: Cambridge University Press.

Joesten, Danielle A, and Walter J Stone. 2014. "Reassessing Proximity Voting: Expertise, Party, and Choice in Congressional Elections." The Journal of Politics 76(3): 740–753.

Kousser, Thad, Justin Phillips, and Boris Shor. 2015. "Reform and Representation: A New Method Applied to Recent Electoral Changes." Unpublished manuscript. Available for download at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2260083.

Lee, David S, Enrico Moretti, and Matthew J Butler. 2004. "Do voters affect or elect policies? Evidence from the US House." The Quarterly Journal of Economics 119(3): 807–859.
Maestas, Cherie D., Matthew K. Buttice, and Walter J. Stone. 2013. "Replication data for: Extracting Wisdom from Experts and Small Crowds: Strategies for Improving Informant-based Measures of Political Concepts." http://dx.doi.org/10.7910/DVN/23170.

Maestas, Cherie D, Matthew K Buttice, and Walter J Stone. 2014. "Extracting Wisdom from Experts and Small Crowds: Strategies for Improving Informant-based Measures of Political Concepts." Political Analysis 22(3): 354–373.

McCarty, Nolan M, Keith T Poole, and Howard Rosenthal. 2006. Polarized America: The Dance of Ideology and Unequal Riches. Cambridge, MA: MIT Press.

Montagnes, B Pablo, and Jon C Rogowski. 2014. "Testing Core Predictions of Spatial Models: Platform Moderation and Challenger Success." Political Science Research and Methods. Forthcoming.

Nokken, Timothy P, and Keith T Poole. 2004. "Congressional party defection in American history." Legislative Studies Quarterly 29(4): 545–568.

Poole, Keith T. 2005. Spatial Models of Parliamentary Voting. Cambridge University Press.

Poole, Keith T, and Howard L Rosenthal. 2011. Ideology and Congress. Transaction Publishers.

Ramey, Adam. 2016. "Vox Populi, Vox Dei? Crowdsourced Ideal Point Estimation." Journal of Politics 78(1).

Rogowski, Jon C. 2015. "Faithful Agents? Electoral Platforms and Legislative Behavior." Unpublished manuscript. Available for download at https://pages.wustl.edu/files/pages/imce/rogowski/measure_accountability_10-14.pdf.

Rogowski, Jon C, and Stephanie Langella. 2014. "Primary Systems and Candidate Ideology: Evidence From Federal and State Legislative Elections." American Politics Research 43(5): 846–871.

Shor, Boris, and Jon C Rogowski. 2015. "Ideology and the US Congressional Vote." Unpublished manuscript. Available for download at http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2650028.

Shor, Boris, and Nolan McCarty. 2011. "The Ideological Mapping of American Legislatures." American Political Science Review 105(3): 530–551.
Shor, Boris, and Nolan McCarty. 2014. "Individual State Legislator Shor-McCarty Ideology Data, July 2014 update." http://dx.doi.org/10.7910/DVN/26805.

Shor, Boris, Christopher Berry, and Nolan McCarty. 2010. "A Bridge to Somewhere: Mapping State and Congressional Ideology on a Cross-institutional Common Space." Legislative Studies Quarterly 35(3): 417–448.

Snyder, James M. 1991. "On Buying Legislatures." Economics & Politics 3(2): 93–109.

Stone, Walter J, and Elizabeth N Simas. 2010. "Candidate valence and ideological positions in US House elections." American Journal of Political Science 54(2): 371–388.

Stratmann, Thomas. 2000. "Congressional voting over legislative careers: Shifting positions and changing constraints." American Political Science Review 94(3): 665–676.

Thomsen, Danielle M. 2014. "Ideological Moderates Won't Run: How Party Fit Matters for Partisan Polarization in Congress." The Journal of Politics 76(3): 786–797.

Appendix A: Comparison of our Results with the Validation Metrics in the Source Papers

In this section, we compare our results to the results reported in the original papers that we evaluate. Table 8 shows the percentage of the variation in incumbents' DW-Nominate scores explained by each model that we report in the main text. It also shows the percentage of the variation in DW-Nominate scores that each of the source papers reports that their model explains.

Table 8: Validation Statistics for Various Measurement Models Against Contemporaneous Nominate Scores in the U.S. House - Comparison with Results in Source Papers
                                        State Leg.   NPAT     Survey      Experts   Twitter   CF-Score
                                        Ideal Pts.   Scores   Respond's

Success in Explaining Within-Party Variation in DW-Nominate Scores for Democrats
% Variation Explained (our analysis)      0.466      0.545     0.483       0.601     0.462     0.203
% Variation Explained (source paper)        NA         NA      0.573       0.518     0.519     0.314

Success in Explaining Within-Party Variation in DW-Nominate Scores for Republicans
% Variation Explained (our analysis)      0.284      0.368     0.183       0.252     0.118     0.279
% Variation Explained (source paper)        NA         NA      0.289       0.313     0.123     0.436

Overall, our results regarding the amount of within-party variation explained by each model are very similar to the results reported in the source papers.25 The only notable differences between our results and those in the source papers are for CF-Scores (Bonica, 2014) and the survey-based scores (Ramey, 2016). The difference for CF-Scores likely stems from the fact that we focus on the 107th-113th Congresses, while Bonica (2014, 370-371) focuses on the 96th-112th Congresses. There may have been a tighter relationship between CF-Scores and DW-Nominate scores in earlier Congresses. The difference for the survey-based scores appears to stem from the fact that Ramey (2016) uses W-Nominate rather than DW-Nominate to validate his estimates, and there is a slightly higher correlation between the survey scores and W-Nominate than there is with DW-Nominate.

25 Shor and McCarty (2011) and Shor and Rogowski (2015) do not report correlations with DW-Nominate scores in their papers. Also, note that the results from Barberá (2015) that we report in Table 8 were calculated from his replication data based on members of the U.S. House in the 112th Congress. The correlations reported in the paper are somewhat higher, but they include both members of the U.S. House and Senate.

Appendix B: Vote-by-Vote Statistics

In the main body of this paper, we focus on the fit of each measure to DW-Nominate scores.
This is for the simple reason that DW-Nominate recovers the latent dimension that best summarizes roll call voting behavior, according to a likelihood model. Methods based on other likelihood models, such as Clinton, Jackman, and Rivers's (2004) IDEAL, or methods that explicitly maximize classification, such as Poole's (2005) Optimal Classification, lead to extremely similar results. Nonetheless, it is possible to examine the fit of each measure to the votes themselves and the degree to which these measures lead to correct classifications. To do this, we run univariate logistic regressions for every roll call vote in the House of Representatives from 2003 to 2012. For each measure, we calculate predicted votes and compare them to the actual votes.

We consider three measures of fit. The first is Poole and Rosenthal's (2011) Average Proportional Reduction in Error (APRE). The APRE calculates the overall reduction in error relative to a naive model that assumes all votes are cast with the majority. It is calculated as follows:

\[
\text{APRE} = \frac{\sum_{\text{votes}} \left( \text{Minority Vote} - \text{Classification Errors} \right)}{\sum_{\text{votes}} \text{Minority Vote}}
\]

The second measure is the Percent Correctly Predicted (PCP), which is simply the percent of all non-missing votes that are correctly predicted. The third measure is the "Improvement over Party." Like the APRE, the "Improvement over Party" measures the reduction in error above and beyond a naive model, but in this case the naive predictions are the predicted values from a logistic regression on a dummy variable for the party identification of the legislator. "Improvement over Party" is the percent reduction in error, where the error from the party model is in the denominator:

\[
\text{Improvement over Party} = \frac{\sum_{\text{votes}} \left( \text{Party Model Errors} - \text{Errors From This Model} \right)}{\sum_{\text{votes}} \text{Party Model Errors}}
\]

Table 9 shows the results of these analyses for each measure discussed in the paper. It also shows the number of votes analyzed, which varies due to the availability of the measures in question.
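The two summary statistics above can be computed directly from vote-level predictions. A minimal sketch (function names are ours; the inputs are 0/1 vote arrays, one per roll call):

```python
import numpy as np

def apre(actual, predicted):
    """Average Proportional Reduction in Error over a set of roll calls.

    The baseline error on each roll call is the size of the minority bloc,
    i.e. the errors made by always predicting the majority outcome.
    """
    minority = sum(min(int(a.sum()), len(a) - int(a.sum())) for a in actual)
    errors = sum(int((a != p).sum()) for a, p in zip(actual, predicted))
    return (minority - errors) / minority

def improvement_over_party(actual, model_pred, party_pred):
    """Percent reduction in classification error relative to a party-only model."""
    party_err = sum(int((a != p).sum()) for a, p in zip(actual, party_pred))
    model_err = sum(int((a != p).sum()) for a, p in zip(actual, model_pred))
    return (party_err - model_err) / party_err

# Toy example: one roll call with a 3-2 split.
votes = [np.array([1, 1, 1, 0, 0])]
majority = [np.array([1, 1, 1, 1, 1])]  # naive majority prediction: APRE = 0
perfect = [np.array([1, 1, 1, 0, 0])]   # perfect prediction: APRE = 1
```

Predicting with the majority on every roll call yields an APRE of exactly zero, which is why the APRE isolates how much a measure improves on the trivial baseline.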
Party and DW-Nominate scores are included as separate measures. Unsurprisingly, the results of this exercise mirror the results from the rest of the paper. DW-Nominate is a measure based on the underlying votes. Despite the very high importance of party in recent years, DW-Nominate scores substantially improve the classification of votes. This is why we use DW-Nominate scores as a general measure of legislator behavior. However, it is notable that significant idiosyncratic error remains. Most of the remaining measures vary significantly in their explanatory power, which is often close to zero and sometimes even negative. No measure besides DW-Nominate consistently reduces error above and beyond party. Twitter scores and survey-based Aldrich-McKelvey scores explain 8.9% and 8.6%, respectively, of the variation left unexplained by party in the one Congress where they are available. This is still only about 60% of the reduction in error achieved by DW-Nominate. These are contemporaneous comparisons, and so they likely overestimate the predictive power of these measures.

Table 9: Vote-by-vote Statistics for Each Measure

                        APRE    PCP     Votes      Improvement over Party
108th House, 2003-2004
Party                   0.729   0.917   406,387     0
Dynamic CF-Score        0.732   0.918   389,117     0.011
Static CF-Score         0.734   0.918   389,117     0.018
NPAT                    0.696   0.910   115,780    -0.122
State Leg.              0.720   0.924    62,640    -0.033
DW-Nominate             0.768   0.929   406,387     0.144
109th House, 2005-2006
Party                   0.693   0.907   432,589     0
Dynamic CF-Score        0.686   0.906   420,391    -0.023
Static CF-Score         0.695   0.909   420,391     0.007
NPAT                    0.649   0.904    97,381    -0.143
State Leg.              0.666   0.908    74,989    -0.088
DW-Nominate             0.731   0.919   432,589     0.124
110th House, 2007-2008
Party                   0.773   0.926   641,222     0
Dynamic CF-Score        0.772   0.925   626,308    -0.004
Static CF-Score         0.774   0.926   626,308     0.004
State Leg.              0.776   0.924   120,105     0.013
DW-Nominate             0.815   0.940   641,222     0.185
111th House, 2009-2010
Party                   0.750   0.931   502,225     0
Dynamic CF-Score        0.758   0.933   484,762     0.032
Static CF-Score         0.754   0.932   484,762     0.016
Experts                 0.723   0.934   169,947    -0.108
State Leg.              0.744   0.923   101,641    -0.024
DW-Nominate             0.796   0.943   502,225     0.184
112th House, 2011-2012
Party                   0.743   0.909   635,443     0
Dynamic CF-Score        0.753   0.913   616,114     0.039
Static CF-Score         0.748   0.911   616,114     0.019
Twitter                 0.766   0.924   214,589     0.089
Survey                  0.765   0.917   628,296     0.086
State Leg.              0.724   0.914   149,986    -0.074
DW-Nominate             0.779   0.922   635,443     0.140
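The vote-by-vote exercise behind Table 9 — fitting a univariate logistic regression for each roll call on a single measure and then counting correct predictions — can be sketched as follows. This is a simplified illustration with simulated, perfectly separable cutting-line votes (real roll call data would be far messier), and the gradient-ascent fitter stands in for whatever logistic routine one would actually use:

```python
import numpy as np

def fit_logit(x, y, iters=300, lr=0.5):
    """Univariate logistic regression (intercept + slope) via gradient ascent."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ w, -30.0, 30.0)))
        w += lr * X.T @ (y - p) / len(y)
    return w

def percent_correctly_predicted(measure, votes):
    """PCP: share of all votes correctly classified by per-roll-call logits.

    measure: 1-D array of legislator scores; votes: (legislators, roll calls)."""
    correct, total = 0, 0
    for j in range(votes.shape[1]):
        y = votes[:, j]
        w = fit_logit(measure, y)
        pred = (w[0] + w[1] * measure > 0).astype(int)  # p > 0.5 boundary
        correct += int((pred == y).sum())
        total += len(y)
    return correct / total

# Hypothetical data: 100 legislators on a line, 3 cutting-line votes.
measure = np.linspace(-1, 1, 100)
votes = np.column_stack([(measure > c).astype(int) for c in (-0.5, 0.0, 0.5)])
pcp = percent_correctly_predicted(measure, votes)
```

Because the simulated votes are perfectly separable on the measure, PCP here is near one; with real measures the gap between a measure's PCP and the party-only PCP is what the "Improvement over Party" column summarizes.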