Parliamentary Polarization: Cohort Effects and Ideological Dynamics in the UK House of Commons, 1935–2013∗

Andrew Peterson†    Arthur Spirling‡

Abstract

We consider the causes of changing levels of elite polarization in the House of Commons, a challenging but important task for understanding the development of UK politics. Making use of a new dataset of 3.5 million speeches over the period 1935–2013, we provide a new measurement strategy that takes relative distinctiveness in spoken contributions between Labour and Conservative members as evidence of polarization. In the aggregate, this yields a long-term overview of parliament that is remarkably consistent with well-known historical accounts. We show that there are three structural breaks in the data, and that polarization in the post-Blair period has declined to levels not seen since the mid-1960s. Using the individual-level estimates we then derive and validate, we explore ‘cohort effects’—by which new, incoming generations of MPs are a possible cause of changes in elite polarization. We show such effects are real and surprisingly large.

∗ First version: February 16, 2016. This version: October 1, 2016. We thank Chris Kam, Ben Lauderdale and Gaurav Sood for helpful comments on an earlier draft. Kaspar Beelen provided invaluable research advice and assistance with data.
† PhD Candidate, Department of Politics, New York University. [email protected]
‡ Associate Professor of Politics and Data Science, New York University. [email protected]

1 Introduction

The study of party systems and their effects is at the core of political science research (e.g. Sartori, 1976; Powell, 1982). Commensurate with this long-standing interest, in recent times scholars have shifted to examining the causes and consequences of ‘polarization’ in both the Comparative (e.g. Iversen and Soskice, 2015) and Americanist literatures (e.g. Barber and McCarty, 2015).
Not least because it represents one of the oldest, most imitated and most stable systems of governance (Rhodes and Weller, 2005), Britain’s Westminster polity has attracted a great deal of attention in this context. This work runs the gamut, and includes both long-range historical, qualitative accounts of the (purported) ‘post-war consensus’ among leaders (e.g. Kavanagh and Morris, 1994), along with focussed statistical studies of voter behavior in modern times (e.g. Green, 2007). Despite this broad-ranging research agenda, we have little systematic knowledge of elite—that is, Member of Parliament (MP)—behavior, with respect to its relationship with polarization of the House of Commons, for any period. Of course, scholars have made great strides in studying MP roll call activities (e.g. Hanretty, Lauderdale and Vivyan, 2016) and MP beliefs (e.g. Kam, 2009) but, for the reasons we explain below, they have not been able to examine the changing nature of inter-party and intra-party ideological cohesion in the Commons over the long term. This means, for example, that it is impossible to test important historical theories of aggregate change, such as comparing contemporary fractiousness to its level in the post-Second World War period (see, e.g., Seldon, 1994). Because we cannot measure elite polarization in a reproducible, quantitative way, we can say almost nothing about what drives it over time, be it new leaders, the latent characteristics of new cohorts of MPs, both, or neither. This may be compared to historical work on cohesion in Westminster systems where such issues have been considered in some detail (see, e.g., Eggers and Spirling, 2016; Godbout and Hoyland, 2016). This situation is also in stark contrast to research on the United States, where scholars have readily available, valid and reliable measures of polarization, and consequently paint an increasingly complete picture of its development and correlates in the long term (e.g.
McCarty, Poole and Rosenthal, 2006). The current paper improves on this state of affairs for scholars of British politics specifically, and students of Comparative Politics more generally. In particular, we introduce new methods applied to a massive data set of 3.5 million Commons speeches, in order to provide both aggregate and individual estimates of polarization for every single parliamentary session and every member over the period 1935–2013. Our central logic is to conceive of MPs from different parties as being more or less distinguishable over time, in terms of what they choose to say. More specifically, when Labour MPs cannot easily be told apart from Conservative MPs, we are in a world of relatively low polarization. By contrast, when it is straightforward to discriminate between partisans based on their utterances—say, with regards to the topics they raise, or the way they express themselves—we are in a more polarized era. Because assessing such things is simply beyond the scope of human hand coding, we use fast, accurate ‘supervised learning’ techniques—with party membership as the ‘label’—to place the speeches in a continuous Labour–Conservative space. From there, we aggregate in a simple but valid way to both the legislators themselves and the parliamentary year as a whole. Though not completely without precedent in political science (e.g. Diermeier et al., 2012), our measurement strategy allows us to shed light on long-term Westminster patterns of behavior in a fundamentally new way. By providing estimates for individuals over their careers, we can undertake regressions in a cross-section time-series format, which help us to understand whether new cohorts of MPs do or do not affect the nature of Commons competition. That is, we can determine whether MPs entering in different ‘generations’ push Commons life towards more different or more similar party ideological positions.
Further, we can contrast such effects to those coming from other sources, such as new leadership (or its associated time trends). Unlike all previous approaches for assessing the ideological makeup of the chamber, we are not reliant on roll calls and thus avoid the problems they entail (e.g. Spirling and McLean, 2007). Nor are we bound by data availability to a short time period, or to incomplete coverage of MPs (e.g. Kellermann, 2012). To preview our findings, at the aggregate level, we find strong evidence of a ‘post-war consensus’, followed by a distinctive break in 1979. This is in line with some qualitative accounts, though by no means all of them (e.g. Pimlott, 1988). More interestingly, we provide evidence that Blair’s election as Labour leader, and the 2001 general election in particular, mark the beginning of a sharp overall decline in polarization. This continues into the most recent period of coalition government (that is, from 2010 until the 2015 general election). At a more micro-level, in contrast to earlier accounts focussing on roll calls in the 19th Century (Eggers and Spirling, 2016; Godbout and Hoyland, 2016), we show that cohorts are a consistently statistically significant predictor of member polarization, though the direction of the effect changes over time. In particular, new junior cohorts of MPs were more polarized than senior colleagues immediately after the Second World War, and during the advent of the first Thatcher administration. However, they were significantly less polarized than longer-serving MPs in the post-New Labour era. These effects are robust to controlling for the party of the MP (that is, this is not simply an effect of the partisan identity of the government), and to time dummies. In fact, we find that it was only during the Thatcher era that MPs became progressively more polarized as time passed: prior to and post-Thatcher, there are no statistically significant time effects.
This means that, generally, MPs are as polarized ‘as they’ll ever be’ when they enter the Commons for the first time. Overall, we demonstrate that somewhere between a sixth and just over a third of all the variation in speech polarization stems from the cohort in which an MP entered parliament. Though not the direct focus of our paper, this has interesting implications for the nature of Westminster political competition: rather than thinking about individual parliamentary leaders and their associated periods of office as more tribalistic or harmonious than others, we should perhaps pay at least as much attention to their colleagues on the benches beside and behind them.

2 Literature and Orientation

The polarization of American politics has garnered a great deal of scholarly consideration (e.g. Layman, Carsey and Horowitz, 2006). In that literature, the fundamental quantitative evidence for this phenomenon is the increasing distance between the Democratic and Republican parties in terms of their roll call vote behavior in Congress. Inspection of post-war NOMINATE scores (Poole and Rosenthal, 1997), for example, suggests that on the main economic dimension of politics, the legislative parties are increasingly cohesive, with fewer conservative Democrats or liberal Republicans between them. Both the causes and consequences of this change are active areas of research (see Barber and McCarty, 2015, for an overview).

2.1 Evidence of the Post-War consensus in the UK

Changes to polarization in the UK have not received similar levels of attention in political science and there is, in general, little statistical work on ideological partisanship for the House of Commons. On the qualitative side, political historians have engaged in a long-standing debate on the purported ‘post-war consensus’ in Britain (see Fraser, 2000, for an overview).
In essence this is a debate about existence: on one side, scholars such as Addison (1994), Kavanagh and Morris (1994) and Seldon (1994) argue that the period between the Second World War and the 1980s was marked by the implementation of very similar policies—such as the goal of full employment, a mixed economy with large amounts of nationalization, and the importance of the Atlantic Alliance—regardless of which party (Labour or Conservative) was in power. On the other hand, researchers like Pimlott (1988) claim that the parties were different in many areas of ideology and policy; and, in any case, voters in the electorate were certainly divided along class and party lines. As is obvious from this brief summary, these authors may be speaking past each other in the sense that the level of analysis—elite versus ‘ordinary’ voters—seems to differ between accounts (see Fraser, 2000, 349–350 on definitional issues). Indeed, to the extent that quantitative methods have been used to study polarization, they have tended to focus almost exclusively on electoral behavior. In this vein, for example, Adams, Green and Milazzo (2012) show that between 1987 and 2001, there was a marked decline in partisan sorting—that is, the match between voter policy positions and party choice—although the variance of policy positions per se has not changed much.1 In common with other work on this topic, for reasons we explain below, these authors take as given—rather than provide or cite quantitative evidence that demonstrates—that the various purported eras of consensus or disharmony are exactly that. When quantitative scholars have turned to Commons behavior itself with ideological eras in mind, they have tended to focus on the (fractious) politics within single parties and the ‘rebellions’ these induce (e.g. Cowley, 2002), rather than the causes and effects of overall levels of polarization.
Part of the reason that polarization at the voter level has tended to receive the lion’s share of quantitative effort is that elites, and elite behavior, are more difficult to study with systematic statistical techniques. For example, using roll calls to infer relative partisan difference is extremely problematic in Westminster countries: first, as in many parliamentary systems, the parties tend to vote extremely cohesively, regardless of era. This means that we do not have any variance in ‘reaching across the aisle’ politics (there is consistently none) as might be the case in the United States. Second, there is reason to believe that even when we see departures from the norm of cohesion, the presence of “government vs opposition” motivations means that it is generally not possible to draw sensible conclusions about the relationship between preferences and behavior (Spirling and McLean, 2007). Scholars have attempted to get around these issues either with surveys of member positions (e.g. Kam, 2009) or by modeling something other than divisions; Kellermann (2012), for example, considers the (co)signing of ‘Early Day Motions’ (EDMs) as indicative of MP ideology. These techniques are helpful in the modern period, but of limited utility for the current problem. On the former, it is only relatively recently that surveys were administered in anything like a comprehensive fashion. On the latter, it is not trivial to find (pre-war) historical data on EDMs and, in any case, they yield inferences only for those who chose to sign them. Finally, scholars have invented ways to locate parliamentary parties relative to one another from speech (Slapin and Proksch, 2008), but these typically require strong dimensional assumptions (i.e. that there is one dimension to political conflict) and are not amenable to estimating member positions, which is a key part of our analysis here.

1 See Ford and Goodwin (2014) for a complementary discussion of newer ideological movements in Britain.
Recently, Lauderdale and Herzog (2016) have built on such efforts to obtain individual estimates, but there the focus is on the underlying ‘political disagreement’ dimension (such as ‘government vs opposition’) which, as noted above, need not be the same as the ‘sincere’ left-right ideological continuum we are interested in.

2.2 Cohorts versus Eras

As we will describe in more detail, the data we will use to draw conclusions about polarization are text.2 In particular, they are debates from Hansard. By applying the machine learning techniques below, we will be able to draw conclusions about the aggregate amount of polarization in parliament at any given time, and comment directly on the plausibility of any ‘post-war consensus’ and its subsequent development. Furthermore, we will be able to discern what predicts changes in polarization. At the most basic level, a natural way to think about polarization is that it is driven by new members arriving in the House of Commons. That is, when general elections introduce new MPs to parliament, those MPs bring latent characteristics in terms of, say, their position on some underlying left-right continuum of politics. If new Conservative MPs tend to be to the right of current ones, while new Labour MPs tend to be to the left of their colleagues, we would expect average polarization to increase. The opposite would hold if more moderate MPs join the chamber. Of course, even in the presence of new latent types, there are reasons to believe that polarization might not change at all. We know that Westminster is a highly structured place where, for example, deviance from the party (i.e. leader) line is harshly punished such that, in equilibrium, behavior is extremely predictable regardless of constituency or personality type (e.g. Kam, 2009).

2 We note that scholars of rhetoric have studied the post-war consensus in terms of elite expression (see, e.g., Toye, 2012), though not in a systematic, quantitative way.
With this in mind, there is reason to believe we may see no ‘cohort effects’ whatsoever; indeed, this is exactly what Eggers and Spirling (2016) and Godbout and Hoyland (2016) find for 19th Century roll call behavior in the UK and Canada respectively. If change does not originate with new cohorts of MPs, it might come from MPs (of whatever vintage) facing new incentives at different times. In particular, one can imagine that fresh leaders affect their followers by introducing new ideological arguments and divides, or by bringing new items on to the agenda (see Eggers and Spirling, 2016; Godbout and Hoyland, 2016, for a review of similar arguments). Such ‘inducement’ stories imply that what matters is the (leadership) ‘era’ itself, rather than the cohorts of the MPs within it. This is, to be sure, the default position of the qualitative literature on the origins of the post-war consensus (e.g. Addison, 1994), the role of the prime minister in general (e.g. King, 2007), and accounts of specific individuals in that position (e.g. Gamble, 1994). In this telling, a variable pertaining to the session number will be significant at all times; that is, what matters is the specific time period (coterminous with the identity of a particular set of leaders). In the extreme version, such a predictor will ‘crowd out’ any explanatory power from replacement. All in all, the ‘textbook’ theory of Westminster systems (e.g. Lijphart, 1999; Powell, 2000)—and the empirical evidence regarding them (Eggers and Spirling, 2016)—yields priors that cohorts don’t matter very much for polarization. With that in mind, finding cohort effects on relative partisanship would be interesting per se. It also suggests something very different to other theories from a policy perspective: for example, it implies that if a less polarized environment is desired, tinkering with selection and screening mechanisms for candidates in the constituencies may be part of the solution.
It is the possibility of such an effect that we focus on below.

3 Data: 3.5 million speeches over 78 years

The official Hansard record of British parliamentary debates has existed since the 19th Century in paper form. Following digitization efforts by the “Digging into Linked Parliamentary Data” project, almost all volumes to the present day now exist in electronic form.3 This data has been extensively cleaned and matched with disambiguated meta-data on member names, ministerial roles and party identifications.4 The data is for the period 1935–2013. This comfortably covers the post-Second World War era at Westminster, and thus is optimal for our purposes here. More specifically, we focus only on Labour and Conservative members. Bearing in mind that they consistently held between 85 percent and (almost) all of the parliamentary seats—and had duopolistic control of Prime Ministerial office—during this time, this partisan restriction makes life considerably easier without much loss of analytical power. The relevant time series unit is the parliamentary ‘session’, a period lasting approximately a year (unless a general election intervenes). Thus we are working with 3,573,778 speeches over 78 sessions, given by 3,167 unique members, including a total of 5,085 ministerial roles. We do not work with all the speeches available. In particular, we do not use any speech containing fewer than 40 characters, and we drop any tokens which consist only of numbers or symbols. This is to avoid utterances (such as ‘hear-hear’) which are very common, but contain little substantive content. The data is remarkably well balanced in terms of partisan contributions, which is a testament to the dominance of the two ‘big’ British parties at this time.

3 See http://schema.politicalmashup.nl/
4 We obtained xml copies of the records from Kaspar Beelen, a team member of the project. See Beelen et al. (2016) for details.
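The two filtering rules just described (dropping speeches under 40 characters, and dropping tokens made up only of numbers or symbols) can be sketched as follows; the function names and toy inputs are ours for illustration, not from the authors' replication code.

```python
import re

HAS_LETTER = re.compile(r"[A-Za-z]")

def keep_speech(text, min_chars=40):
    """Discard very short utterances (e.g. 'hear-hear') with little content."""
    return len(text) >= min_chars

def clean_tokens(tokens):
    """Drop tokens that consist only of numbers or symbols."""
    return [t for t in tokens if HAS_LETTER.search(t)]

speeches = [
    "Hear, hear!",
    "The Minister of Labour left for the important Conference of the I.L.O. "
    "before the negotiations were completed.",
]
kept = [s for s in speeches if keep_speech(s)]        # short utterance removed
tokens = clean_tokens(["poll", "tax", "1979", "--"])  # numeric/symbol tokens removed
```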
Thus, the Conservative party gave an average of 21,805 speeches per session, while the Labour party gave slightly more (23,432). Overall, each member gave an average of 1,128 speeches in their parliamentary career, with a mean of 82 speeches in each session. Broken down by party, Tories gave an average of 83 speeches, while Labour members gave 81 speeches per session. The average Conservative speech was 1,023 characters, and the average Labour speech was 1,103 characters. This balance is comforting; in any case (see below), where there is asymmetry in representation we use class weights to ensure that the classifier cannot increase accuracy simply by predicting the more common class. The data also shows encouraging consistency over time. In particular, we inspected the number of speeches per member and note that, though exhibiting some periodicity, the mean, median and other percentiles of this statistic are stable. We also looked at the mean length of each member’s speeches over time, which drifts upwards after the 1930s but is stationary thereafter.5 In terms of representing the texts themselves, we assume that the standard ‘bag of words’ vector space model is appropriate, with some pre-processing: we treat each speech as a series of token-specific (i.e. word-specific) frequencies that have been normalized by their maximum absolute value, which allows us to maintain the data in sparse format. We make no attempt to retain word order. We begin by fixing a vocabulary across all sessions6 in which we drop any word that does not appear in at least 200 speeches in the entire dataset, which leaves 24,726 words. We do not stem words, remove stopwords, or otherwise limit tokens, relying instead on the regularization process to drop unimportant terms.

3.1 Other Variables of Interest

In what follows below, we control for party with the binary variable ‘Conservative’ taking the value 1 if the member is from the Tory party (zero for Labour).
We have a ‘Session’ variable that records the session number (starting at 1 for our first observed session in 1935). Finally, and most importantly, we have a ‘New Cohort’ variable, which is an indicator taking the value 1 if the member in question entered the Commons for the first time after the relevant break date, and zero if the member entered prior to this time. To clarify, note that we take the session in which a member’s first speech appears in Hansard as his or her first session. Since one’s maiden speech typically occurs immediately after entering the House for the first time, this provides an excellent proxy for a legislator’s cohort. In order to reduce the role played by differences in the choice of subject matter of members of the two parties, we include dummy variables for each of the topics mentioned in a given legislative session. Rather than running a topic model (see, e.g., Quinn et al., 2010), we simply use the debate descriptors that are included in the speeches dataset for each speech. These include topics such as ‘Energy Prices’, ‘Welfare Reform (Sick and Disabled People)’, and ‘Family Taxation’.

5 See Supporting Information A for graphical evidence.
6 One advantage of fixing the vocabulary is that it ensures that our measure is not subject to the bias identified by Gentzkow, Shapiro and Taddy (2015). See Supporting Information B for more details.

4 Methods: Parliamentary Polarization

As we explained in Section 2, for reasons of data availability and basic behavior, it is difficult to accurately model the polarization of the House of Commons and its members over time. While the historical coverage of the debates is impressive, the question now is how best to use them. The intuition of our approach is simple: if, by studying their speeches, MPs of different parties cannot be easily distinguished from one another, we are in a world in which Labour and Conservative legislators are not very different.
This is evidence of low levels of parliamentary polarization. By contrast, if the speeches are very different by party, then polarization must be higher.7

4.1 Intuition: Attlee vs Eden, Thatcher vs Kinnock

To make this intuition more concrete, consider the (real) Hansard exchange between a Prime Minister and Leader of the Opposition displayed in Figure 1. Of course, the use of ‘she’ by the Leader of the Opposition suggests this is from the Thatcher period—which is indeed the case. Notice though, that the language used by the Prime Minister is very ‘Conservative’ in nature: she talks of having ended socialism, and the merits of a ‘capital-owning democracy’. This clearly connotes the Tory policy of the 1980s. Meanwhile, Neil Kinnock (then Labour leader) talks of the ‘poll tax’, a term used to criticise the government’s ‘Community Charge’ policy of the time. In contrast, consider Figure 2 where we report part of an exchange on industrial unrest at the London docks in 1948.

Prime Minister: I am happy that my successor will carry on the excellent policies that have finished with the decline of socialism, brought great prosperity to this country, raised Britain’s standing in the world and brought about a truly capital-owning democracy.

Leader of Oppn: If the Prime Minister thinks that nothing should be changed, can she tell us why on earth all those now competing for her job are desperately wriggling around trying to find a way out of the poll tax trap?

Figure 1: Thatcher–Kinnock exchange in House of Commons, Nov 27, 1990.

7 Notice that the comparison is with respect to the relative accuracy of the same technique (which could be a human or a machine) in discriminating between the parties, applied again and again over time: that is, the issue is not that the performance of the approach is ‘bad’ (or ‘good’) in absolute classification terms during certain eras. Indeed, the absolute performance is essentially irrelevant for current purposes.
Here, Eden is the Opposition spokesman, while Attlee is the Prime Minister. Surprisingly, given that the issue has obvious political overtones—in the sense that the Labour party is part of the trade union movement—Eden does not use a particularly partisan attacking strategy. The point here is that Thatcher and Kinnock are much more clearly polarized in their speeches than are Eden and Attlee: for the former exchange, it is obvious which speaker is Conservative and which is Labour—this is not true for the latter. Further, one can imagine that by looking for certain terms associated with a given party, we could accurately classify politicians from the 1980s; but by the same token, merely obtaining discriminating terms might be difficult for the 1940s. The idea is that if we could train a computer to find such terms if they exist—and record how helpful those terms are in classifying MPs—then we will have an automated way to measure polarization.

Deputy Leader of Oppn: May I ask the Prime Minister if he has been made aware that there is considerable concern that the Minister of Labour should have left the country at this particular moment to go to a conference where the Permanent Secretary of the Ministry of Labour already is, and whether it would not have been possible to retain the Minister of Labour here until these difficult negotiations were completed?

Prime Minister: At the time my right hon. Friend the Minister of Labour left for the important Conference of the I.L.O. it was thought that the matter had been settled. The matter is, of course, in hand with the Parliamentary Secretary, and I can assure the right hon. Gentleman that everything will be done. It is, of course, unfortunate that the Minister of Labour should be absent, but these things threaten at times and one can never quite tell whether they are coming off or not.

Figure 2: Eden–Attlee exchange in House of Commons, June 22, 1948.
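The idea of training a computer to find discriminating terms can be sketched with scikit-learn (which we use below): fit a linear classifier on a handful of labeled toy ‘speeches’ and predict the party of a new utterance. The texts and labels here are our own illustrations, not Hansard data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled speeches; the party label is the supervision signal.
speeches = [
    "socialism has ended and a capital owning democracy has prospered",  # Con
    "prosperity through enterprise not through socialism",               # Con
    "the poll tax is a trap and a failure",                              # Lab
    "working people deserve fair wages not a poll tax",                  # Lab
]
parties = ["Con", "Con", "Lab", "Lab"]

vec = CountVectorizer()            # bag-of-words counts, word order discarded
X = vec.fit_transform(speeches)
clf = LogisticRegression().fit(X, parties)

# A new utterance full of 'Labour-associated' terms should be classified as such.
pred = clf.predict(vec.transform(["the poll tax is a failure"]))[0]
```

With distinctive vocabularies, as in the 1980s exchanges, this classification is easy; when the parties speak alike, as in 1948, it is hard, and that difficulty is itself the quantity of interest.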
4.2 Intuition: ‘polarizing’ words

The examples from 1948 and 1990 above make it clear that, in some periods at least, politicians use words and phrases that connote strict ideological differences. But in our set-up, and in contrast to Americanist efforts, parliamentary polarization is not defined only in terms of the drifting apart of individual ideal points by party. That is, our approach is considerably broader: whatever causes speeches to differ by party—including the ideal points of their members, the topics they choose to debate, their stylistic choices—will affect what we denote as (aggregate) ‘polarization’. In that sense, our measure is of ‘partisanship’ in a more general sense than previously considered. To see this idea in action, we consider the contents of two important post-war debates wherein deeply entrenched party political positions were not necessarily couched in explicitly ideological terms. First, consider the March 28, 1979 discussion preceding the vote of no confidence in James Callaghan’s government (as introduced by then leader of the opposition, Margaret Thatcher). In Table 1 we report a few of the ‘most Conservative’ words in that debate—in the sense that their use is most predictive of the speaker being a Tory MP—followed by some of those most likely to be used by Labour MPs.8 In context, the record shows Margaret Thatcher arguing that Callaghan ought to “seek a fresh mandate from the people” in view of the Commons’ lack of support. Meanwhile her Conservative colleagues such as Michael Shersby barracked the Cabinet as “tottering from one crisis to another”, not least due to its “incomes policy”, while Willie Whitelaw talked of “clear and unanswerable exposure of the Government’s failure”. On the government backbenches, by contrast, Eric Heffer argued that the date of the election was irrelevant and that “Whenever we go to the country, the real issue is the basic difference between the philosophy of” the two parties.
The second example is taken from the Treaty of Maastricht (Social Protocol) debate of July 22, 1993. Again, we report some key discriminating words in Table 1. In this case, a now Tory Prime Minister (John Major) was under pressure in the House of Commons to keep his fractious party together to pass legislation on European integration. Now we see that the ‘most Conservative’ words include “Monklands”, a reference to the constituency of the leader of the Opposition, John Smith, as the Prime Minister attempted to respond to the latter’s amendment on the day’s business. There is a similar story behind the presence of “Ashdown”, who was leader of the Liberal Democrats at the time. We see Major referring to “nonsense from Brussels” and to the support that his position has from the “Confederation of British Industry”. On the Labour side, John Smith talked of the fact that other “right-wing Governments” of Europe had adopted the measure, which would, for example, extend “equal rights and adequate provisions for maternity leave”. So, while some discriminating words are obviously ‘ideological’, many are not, and could in principle have been spoken by either side (depending on their procedural role at the time).

                      1979 No Confidence                1993 Social Chapter
‘Most Conservative’   fresh incomes crisis failure      Monklands Ashdown Confederation Brussels
‘Most Labour’         listen complain wanted whenever   exists wing lock equal

Table 1: Some of the ‘most Conservative’ and ‘most Labour’ words from the 1979 No Confidence debate, and the 1993 Maastricht (Social Protocol) debate.

8 We estimate ideological terms by starting with the coefficients for each word from each session’s logistic regression model, then multiplying these coefficients by the inverse frequency of the word in that session (with Laplace smoothing) in order to identify words that are most distinctive of speech patterns.
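The word-scoring recipe in footnote 8 can be sketched as follows: each word's logistic regression coefficient is multiplied by its inverse Laplace-smoothed frequency in the session, so that distinctive but rarer words rank highly. The coefficients and counts below are hypothetical, not estimates from the actual debates.

```python
import numpy as np

def discriminating_words(coefs, counts, vocab, top_k=2):
    """Score each word as coefficient * inverse Laplace-smoothed frequency.
    Positive scores lean towards the class coded positive (here, Conservative)."""
    n = counts.sum()
    freq = (counts + 1) / (n + len(vocab))   # Laplace-smoothed frequency
    scores = coefs / freq                    # i.e. coefs * (1 / freq)
    order = np.argsort(scores)
    most_labour = [vocab[i] for i in order[:top_k]]              # most negative
    most_conservative = [vocab[i] for i in order[::-1][:top_k]]  # most positive
    return most_conservative, most_labour

# Hypothetical session-model coefficients and word counts.
vocab = ["socialism", "poll", "brussels", "equal"]
coefs = np.array([1.2, -0.8, 0.9, -1.1])
counts = np.array([50, 40, 10, 20])
cons, lab = discriminating_words(coefs, counts, vocab)
```

Note how ‘brussels’, with a moderate coefficient but low frequency, outranks the more common ‘socialism’ once the inverse-frequency weighting is applied.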
Readers may have concerns that this is too far a departure from the usual, narrow, understanding of polarization for the term to have validity here. We have several defences. First, as we will see below, our method produces an aggregate measure that accords with priors from qualitative accounts of British consensus and polarization. In that sense, though we allow for more ‘inputs’, our outputted measure ‘works’ in a validity sense. Second, in a Westminster context where governments have almost full control of the agenda and the Speaker prohibits non-germane contributions, the nature of debate—in terms of topics covered—is essentially fixed in a given session. That is, it simply cannot be the case that, simultaneously, one side talks about utterly different issues to the other, but would otherwise agree on the substantive positions were they forced to engage on similar matters.

4.3 Machine Learning Polarization

As the intuition above makes clear, our machine learning approach aims to capture the extent to which it is possible to distinguish between members of the two parties based on their speeches. We do this by using various supervised algorithms to predict the party affiliation of the speaker of each speech in a legislative session. That is, we have labeled data (the party identifications) and we seek to ‘learn’ the relationship between the speech information and the labels. We can report both an overall accuracy for our classifier, and provide estimates for any given MP in terms of their probability of being in one of the two (Conservative, Labour) classes, given their speeches and the relationships observed in the data. Obviously, we do not intend to capture the full substantive ‘meaning’ of the speeches, and we do not seek to identify the issue positions of individual legislators.
Rather, we are successful to the extent that our approach does not miss differences in how legislators of the two parties express themselves; we do not need a complete model of semantic content to do this, and our approach is similar in that sense to Diermeier et al. (2012). As usual with machine learning approaches, we seek to balance strong predictive power against other concerns such as simplicity, reproducibility, overfitting, and computational time (see Hastie, Tibshirani and Friedman, 2009, for discussion of these issues). We choose four cutting-edge algorithms that embody these features to varying extents. These are:

• the perceptron algorithm (see Freund and Schapire, 1999), a simple linear classifier with no regularization penalty and a fixed learning rate. This is trained by stochastic gradient descent, and is thus a special case of the second classifier;

• a stochastic gradient descent (SGD) classifier, which updates parameters on batches of randomly selected subsets of the data (for an overview see Bottou, 2004);

• the ‘passive aggressive’ classifier with hinge loss, which updates parameters by seeking in each step a hyperplane that is close to the existing solution but which aggressively modifies parameters in order to correctly classify at least one additional example (Crammer et al., 2006);

• logistic regression with an L2 penalty, with regularization parameter C = 1000/(# training speeches) ≈ 0.2, fit using stochastic average gradient (SAG) descent (see Schmidt, Roux and Bach, 2013).

Within each legislative session, we run all four algorithms, then select the algorithm with the highest accuracy as the representative of that session. All four algorithms are implemented using Scikit-Learn (Pedregosa et al., 2011) in the Python language. For each classifier we also average the classification accuracy over a stratified 10-fold cross-validation.
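A minimal sketch of this per-session pipeline in Scikit-Learn, run on stand-in data rather than the actual speech matrices (all data and settings below are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import (LogisticRegression, PassiveAggressiveClassifier,
                                  Perceptron, SGDClassifier)
from sklearn.model_selection import StratifiedKFold, cross_val_score

# stand-in for one session's speech features (X) and party labels (y)
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

C = 1000 / len(y)  # regularization strength scaled by number of training speeches
models = {
    "perceptron": Perceptron(random_state=0),
    "sgd": SGDClassifier(loss="hinge", class_weight="balanced", random_state=0),
    "passive_aggressive": PassiveAggressiveClassifier(loss="hinge", random_state=0),
    # SAG solver for the L2-penalized logistic regression
    "logit_sag": LogisticRegression(penalty="l2", C=C, solver="sag",
                                    class_weight="balanced", max_iter=1000),
}

# mean accuracy over a stratified 10-fold cross-validation, per classifier
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
accuracy = {name: cross_val_score(m, X, y, cv=cv, scoring="accuracy").mean()
            for name, m in models.items()}
best = max(accuracy, key=accuracy.get)  # the session's representative classifier
```

With two classes, Scikit-Learn's `class_weight="balanced"` computes weights of n/(2·np), the inverse-frequency party weighting discussed below.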
In practice, though different in nature, the algorithms perform extremely similarly on average, which suggests there is little model dependence in our findings (see Supporting Information C). Different legislative sessions have different numbers of members and speeches by one party or the other. In principle, this could cause an algorithm to increase its reported accuracy by simply favoring one party.9 One option is to drop speeches to keep the two parties balanced, though we prefer not to throw out data if avoidable. Instead, we use class (party) weights inversely proportional to the class (party) frequencies, i.e. n/(2·np), where n is the total number of speeches and np is the number of speeches by members of that party. That is, we essentially weight up the speeches of the less commonly observed party in a given session. For results in which we report the importance of individual words in each session’s models, we focus on the stochastic gradient descent classifier and the stochastic average gradient descent logistic regression, which generally had the best performance.

9 By way of a pathological example, suppose that in some session 95% of speeches are made by Conservative members, while only 5% come from Labour MPs. In such a world, any algorithm reporting that all speeches were predicted to be Tory would appear to do very well simply as an artifact of the data.

4.4 Member-level estimates

Notice that our estimates are at the speech level: that is, given its features, and given the relationship we ‘learn’ between the features and the party of the person who said it, the speech itself is allocated some probability of being (from a) Conservative. We do this in a principled way across the stratified ten folds (i.e. we avoid overfitting), but the important point is that this information then allows us to estimate the position of any given MP.
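The speech-to-member aggregation can be sketched as follows, using out-of-fold probabilities so that no speech is scored by a model trained on it; the data, member identifiers, and classifier settings are invented stand-ins:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# stand-in data: rows are speeches; mp_ids records which member gave each one
X, y = make_classification(n_samples=300, n_features=20, random_state=1)
rng = np.random.default_rng(1)
mp_ids = rng.integers(0, 30, size=len(y))  # 30 hypothetical members

clf = LogisticRegression(solver="sag", max_iter=1000)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)

# out-of-fold probability that each speech is 'Conservative' (class 1)
proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]

# a member's estimate is simply the mean of their speech-level probabilities
mp_score = {mp: proba[mp_ids == mp].mean() for mp in np.unique(mp_ids)}
```

Because every speech probability lies on the 0–1 interval, each member's mean does too, which is what allows the scores to be compared across the chamber.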
In particular, we simply assign an MP the average position of their speeches (where the score for each speech is the probability that it was given by a Conservative member, and is thus on the 0–1 interval) in a given session.10 To see how this works in practice, consider Figure 3. There we examine the parliamentary session beginning in November 1984, and within that, PM Margaret Thatcher’s speeches. In the top panel, each black circle is a speech given by Thatcher, plotted at the position (obtained by plugging its characteristics into the SAG learner) on the interval. The dark histogram simply summarizes that information: unsurprisingly, Thatcher gives mostly more ‘Conservative’ speeches. Indeed, her mean speech is around 0.85. The broken line is the density plot for all speeches, for all MPs, during this time: evidently, she is exemplary of Conservative speech during this period. In the bottom plot, we report the empirical cumulative distribution function for the House of Commons, and the enlarged square represents our estimate for her, i.e. the mean of her speeches. It is plotted along with all the Conservative MPs (blue squares) and all the Labour MPs (pink circles)—which are also simply the means of the various MPs’ speeches. We perform this process for every MP, for every session, and thus have estimates for our entire period across the chamber.

10 We can also obtain the variance of those estimates, if required. We also note that unlike the probability estimates of a naive Bayes classifier, which can be bi-modal, taking on predominantly values near 0 and 1, our estimates are unimodal. This suggests that members with few speeches will not be given extreme values simply due to the sparsity of the data.

Figure 3: Example of individual MP estimates: Margaret Thatcher in the November 1984 session.
Top panel: each black circle is a speech given by Thatcher, and its position (obtained by plugging its characteristics into the SAG learner) on the Labour–Conservative interval. The dark histogram represents the density of those speeches. The broken line is the density plot for all speeches, for all MPs. Bottom panel: empirical CDF of the House of Commons with the Thatcher estimate (i.e. her speech mean) highlighted.

For the purposes of our regressions below, we convert all estimated scores from 0–1 within the House to 0–1 within the relevant party. This is important because it then allows us to treat MPs from different parties with the same score as being ‘equally’ polarized (albeit in different directions). To clarify, each Labour MP now receives a score between 0 and 1, with 1 being most extremely ‘Labour’ at that time; each Conservative MP receives a score between 0 and 1, with 1 being most extremely ‘Conservative’ at that time. This is accomplished very simply: we subtract all Labour MP positions from 1, and keep the Tory estimates as they are.11

4.5 Caveats and Validation

While our classification approach is fast and useful, it is important to be intellectually honest about what it can and cannot do. Ultimately, our measure of polarization is about overlap in speech: that is, we can tell how similar the members of different parties are in terms of what they say. This may hide ‘true’ heterogeneity in a given party, especially if whipping (of speech) is strong. Alternatively, parties might be quite distinct on average, but contain a wide diversity of voiced opinions, making them overlap. Consequently, it is important both to assess whether our estimates are valid (i.e. make sense) for given individuals, and to examine the cases where the classifier does poorly, as this provides information regarding the scope of our claims. In Supporting Information D we examine these issues in some detail for a well-studied recent period of Westminster history.
The upshot is that we are confident we are measuring something meaningful for what follows.

5 Results

The results of our main supervised learning analysis can be seen in Figure 4. There, for each of our parliamentary sessions, we report the (mean) accuracy of the algorithm that performs best in separating Conservative from Labour MPs. We also superimpose structural break estimates, which we will describe in more detail below. For now focussing on the points, recall that when the accuracy is high, we are claiming that politics is polarized and divisive. When accuracy is low, the parties are not easily told apart, and thus parliamentary life may be described as more consensual.

11 As an example, Diane Abbott—a purportedly left-wing Labour MP—is estimated to be at 0.28 on the House 0–1 scale in 2001. We convert her score to 1 − 0.28 = 0.72 within her own party for that session. Meanwhile, Tory MP Peter Ainsworth is estimated to be at 0.60 on the original scale, which is then used directly in the regressions. This would imply that Abbott is more extreme relative to her own party than Ainsworth is relative to his, which makes substantive sense.

5.1 Aggregate Polarization: Flat, Up and Down

Our immediate observation from the figure is just how closely it accords with our priors for the period. In particular, a description of the time series would be as follows: in the 1930s, polarization drops rapidly, reaching a nadir in the years of the Second World War. This, of course, makes sense given the (Churchill-led) coalition government of that time. Soon after, when elections begin in earnest with the 1945 Labour landslide, polarization ticks up. It then enters a long period of approximate stasis between circa 1945 and circa 1979, with small movements around the mean, though it is gradually sloping upwards. From the first session of 1979, i.e.
the session in which Margaret Thatcher assumed the premiership, polarization jumps and reaches its zenith around the session corresponding to 1987. It then falls, gradually at first and then more quickly, as Tony Blair becomes leader of Labour after 1994. By the sessions around 2001, polarization is falling sharply, with the end of Gordon Brown’s government and the beginning of the Conservative–Liberal Democrat coalition marking a further decline. This overall pattern—of relative similarity of the two major parties during the 1950s, 1960s and 1970s, ended by Thatcher’s government—is almost entirely in keeping with most qualitative, substantive accounts of the era under study (Addison, 1994; Kavanagh and Morris, 1994; Fraser, 2000). From a validity perspective, this is good news, and we make the observation that, contrary to e.g. Pimlott (1988), in parliament at least there was indeed a relative post-war consensus in terms of debate. Furthermore, there is prima facie evidence, post-Blair, of a new consensus in line with earlier conjectures from Seldon (1994) and others. Taking the estimates literally, the contemporary polarization of the House of Commons is on a par with that of the mid-1960s, a period thought to mark the high watermark of agreement in policy between the parties. We now push this analysis further by being more formal about the time series. In particular, we consider structural breaks in the sense of Bai and Perron (2003) as implemented by Zeileis et al. (2002).12 Analysis using standard defaults suggests that there are three break points in the data, and thus four segments that differ in terms of their mean polarization. These are the vertical green lines in Figure 4 and correspond to the following dates: September 1948, November 1978 and June 2001. In the case of the first two dates, as visual inspection suggests, the mean level of polarization increased (confirmed by t-test, p < 0.01), and after the third it decreased (p < 0.01).
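The paper runs the Bai–Perron procedure through the R package strucchange; the least-squares idea behind it, choosing the break dates that minimize within-segment squared deviations from segment means, can be sketched in a few lines. The series below is simulated, not the actual polarization data:

```python
import itertools
import numpy as np

def segment_ssr(i, j, cs, cs2):
    """Sum of squared deviations from the mean on y[i:j], via prefix sums."""
    n = j - i
    s = cs[j] - cs[i]
    return (cs2[j] - cs2[i]) - s * s / n

def best_breaks(y, n_breaks=3, min_seg=5):
    """Exhaustively pick the break dates minimizing total within-segment SSR,
    the least-squares criterion behind Bai-Perron mean-shift models."""
    T = len(y)
    cs = np.concatenate([[0.0], np.cumsum(y)])
    cs2 = np.concatenate([[0.0], np.cumsum(y ** 2)])
    best, best_cost = None, np.inf
    for bks in itertools.combinations(range(min_seg, T - min_seg + 1), n_breaks):
        bounds = (0,) + bks + (T,)
        if any(b - a < min_seg for a, b in zip(bounds, bounds[1:])):
            continue  # enforce a minimum segment length
        cost = sum(segment_ssr(a, b, cs, cs2) for a, b in zip(bounds, bounds[1:]))
        if cost < best_cost:
            best, best_cost = bks, cost
    return best

# simulated 80-session accuracy series with mean shifts at t = 20, 45, 65
rng = np.random.default_rng(0)
y = np.concatenate([rng.normal(m, 0.02, n) for m, n in
                    [(0.55, 20), (0.65, 25), (0.80, 20), (0.60, 15)]])
breaks = best_breaks(y, n_breaks=3)
```

Production implementations such as strucchange use dynamic programming rather than exhaustive search, and add formal inference on the number of breaks; this sketch only conveys the segmentation criterion.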
Notice that by segmenting our data systematically, we have imposed sufficient structure to allow some relatively simple tests of cohort effects. That is, we now have a set of time series whose clearly defined aggregate changes may be decomposed into various sources of variation. To clarify, for each change point, we are defining the relevant sub-series of the data as the two segments joined by that break point. That is, our first subseries is all data between the start of the observations (March 1935) and the second break in 1978 (with the first break somewhere in between). The second subseries is all data between the first break in 1948 and the third in 2001 (with the second break somewhere in between). The final subseries runs from the second break in 1978 until the end of the data (with the third break somewhere in between).

12 We give more details in Supporting Information E.

Figure 4: Estimates of parliamentary polarization, by session. Estimated change points are [green] vertical lines.

5.2 Estimating Possible Cohort Effects

For each sub-data set, our dependent variable is the set of polarization scores of the individual MPs, estimated in the way we described above, with the important adjustment we mentioned: we subtract Labour estimates from 1 to make them comparable in ‘extremism’ terms to the estimates for the Conservatives. We keep our models simple for interpretation purposes, and there are just three variables on the ‘right hand side’ (other than the intercept). To recall, these are: ‘Conservative’, taking the value 1 if the member is from the Tory party (zero for Labour); ‘Session’, which records the session number (starting at 1 for our first observed session in 1935); and ‘New Cohort’, which is an indicator taking the value 1 if the member in question entered the Commons for the first time after the relevant break date, and zero if the member entered prior to this time.
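The specification just described, with standard errors clustered at the MP level as discussed shortly, can be sketched on simulated data; all coefficients, cluster sizes and variable constructions below are invented for illustration:

```python
import numpy as np

def ols_cluster(X, y, groups):
    """OLS point estimates with cluster-robust (Liang-Zeger) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        score = X[groups == g].T @ resid[groups == g]  # per-cluster score
        meat += np.outer(score, score)
    V = XtX_inv @ meat @ XtX_inv                       # sandwich estimator
    return beta, np.sqrt(np.diag(V))

# simulated MP-session panel echoing the three-regressor specification
rng = np.random.default_rng(0)
n_mp, n_sess = 200, 10
mp = np.repeat(np.arange(n_mp), n_sess)                # cluster id: the MP
session = np.tile(np.arange(1.0, n_sess + 1), n_mp)
new_cohort = (mp < 80).astype(float)
conservative = (mp % 2 == 0).astype(float)
mp_shock = rng.normal(0, 0.02, n_mp)[mp]               # within-MP correlation
y = (0.6 + 0.02 * new_cohort + 0.001 * session + 0.03 * conservative
     + mp_shock + rng.normal(0, 0.05, n_mp * n_sess))

X = np.column_stack([np.ones_like(y), new_cohort, session, conservative])
beta, se = ols_cluster(X, y, mp)   # beta roughly (0.6, 0.02, 0.001, 0.03)
```

Clustering matters here because the repeated session-to-session observations for a member are correlated; ordinary standard errors would overstate precision.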
For example, suppose we were studying the third segment of the data, which covers the Thatcher break point: Conservative members entering at the 1979 general election would receive Conservative = 1 and New Cohort = 1, while ‘Session’ would vary from 46 to 68—for all members—depending on the given time period. Two points should be made before proceeding to the results: first, we have repeated (i.e. session-to-session) observations for each member, so we cluster our standard errors at the MP level. Second, we cannot simultaneously cater for member fixed effects and cohort effects, because the latter is unchanging for an MP (they enter at a particular time, and this is constant over their career). This is not a problem per se: if we see a non-zero effect for the ‘New Cohort’ variable we have evidence for the idea that replacement matters—new waves of MPs are behaving differently to older waves.13

Our linear model results appear in Table 2, where each column corresponds to the relevant subpart of the data. Thus, ‘Breakpoint 1’ refers to the time series that begins at the start of the data and runs to the second (Thatcher) breakpoint, with the first breakpoint (September 1948) in between. The other columns are similarly defined. We see, immediately, that there is indeed a cohort effect for all the periods. That is, controlling for session number and party membership, MPs entering the Commons after the relevant break are different in some way from their older colleagues. The directions are perhaps as expected across the models: MPs entering after 1948 were more polarized (on average) than longer serving members, as were

13 To be clear, within each segment, we are fitting a regression of the following form:

polarization_i = β0 + β1 New Cohort_i + β2 Session_i + β3 Conservative_i + ε_i
Here polarization_i is the polarization score of the ith member (the mean of their speeches, converted to a point on the 0–1 interval as noted above), the independent variables are as described in the text, and ε_i is an error term.

                Breakpoint 1           Breakpoint 2      Breakpoint 3
                (Postwar Consensus)    (Thatcher)        (Blair Effect)
(Intercept)      0.602* (0.002)         0.547* (0.004)    0.632* (0.018)
New Cohort       0.023* (0.003)         0.016* (0.004)   −0.045* (0.007)
Session          0.000 (0.000)          0.001* (0.000)    0.000 (0.000)
Conservative    −0.022* (0.002)         0.039* (0.002)    0.051* (0.005)
N               24048                  30739             19248
R²               0.026                  0.092             0.053
adj. R²          0.026                  0.092             0.053

Standard errors, clustered by MP, in parentheses. * indicates significance at p < 0.05.

Table 2: Linear regression of individual polarization estimates on cohort indicator, session number and party identification variables. Each column refers to a different segment of the data, demarcated by the relevant change point.

MPs entering at or after the 1979 election. By contrast, the new cohort arriving in or after 2001 exhibited less (average) polarization than their forebears. An obvious next question is whether members of a certain party were consistently more polarized. Reading across the columns of the ‘Conservative’ row suggests this is false: it is true that Tories were on average more polarized for the 1979 and 2001 periods, but this is not true for the break in 1948 that became the post-war consensus. We can also see that, whatever the cohort effects, there is not much evidence that, once a given segment has begun, members become consistently more or less polarized. The coefficients (and their associated significance) on ‘Session’ suggest that there are no pure ‘time’ effects for the first or third subset of the data. Interestingly though, there is something of an effect for the Thatcher period (that is, the subset of the data which includes Thatcher’s premiership towards its middle).
In particular, members of parliament of this time became more polarized with every session, regardless of their party or cohort. Put otherwise, the Thatcher period saw partisan divides deepen year-on-year, even among those who had entered parliament at the same time in the same party. Given our interest in party-specific explanations, readers may wonder about models that include an interaction between party membership and cohort (both binary variables). Inevitably, this gives rise to a more involved model which is harder to interpret. For completeness though, we provide exactly such results in Supporting Information F. Generally, the interaction effects yield models that do not fit the data much better than our simpler efforts, so we focus on the more parsimonious case in what follows.

5.3 Relative Effect Sizes

How large are the cohort effects relative to the other variables in the model? To assess this, we apply the ‘relative importance’ method of Lindeman, Merenda and Gold (1980) as implemented by Gromping (2006). An overview of such an approach for social science can be found in Johnson and Lebreton (2004), but the key idea is that the R² of a given linear model is decomposed into the contribution of each regressor to the model fit, such that the sum of the contributions is the R² of the original specification. The essence of the estimation is that a sequence of linear models is fit to the data, each with a different number and permutation of the relevant variables. At each step, the increase (or decrease) of the R² is recorded, and ultimately assigned to a given variable by averaging over the total number of models in which it was included. In Table 3 we report these results for the three sets of regressions. In each cell, we give the variable’s contribution to R², and in brackets the proportion of the same for the model in question.
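The decomposition just described, which the paper computes with the R package relaimpo, averages each regressor's marginal R² gain over all possible orders of entry into the model. An illustrative reimplementation on simulated data:

```python
import itertools
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

def lmg(X, y, names):
    """Average each variable's marginal R^2 contribution over every
    possible order of entry into the model (Lindeman-Merenda-Gold)."""
    k = X.shape[1]
    contrib = dict.fromkeys(names, 0.0)
    perms = list(itertools.permutations(range(k)))
    for perm in perms:
        included, prev = [], 0.0
        for j in perm:
            included.append(j)
            cur = r_squared(X[:, included], y)
            contrib[names[j]] += cur - prev   # marginal gain credited to j
            prev = cur
    return {name: total / len(perms) for name, total in contrib.items()}

# simulated data echoing the three-regressor specification
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([rng.integers(0, 2, n),    # a 'New Cohort'-style dummy
                     rng.uniform(1, 30, n),    # a 'Session'-style counter
                     rng.integers(0, 2, n)])   # a 'Conservative'-style dummy
y = 0.6 + 0.02 * X[:, 0] + 0.001 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(0, 0.05, n)
shares = lmg(X, y, ["New Cohort", "Session", "Conservative"])
# by construction, the shares sum exactly to the full model's R^2
```

The telescoping within each ordering guarantees the additivity property noted in the text: the per-variable contributions sum to the R² of the original specification.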
We note immediately that the cohort variable explains somewhere between 14 and 37 percent of the variation in polarization over time. The party effect is considerably larger, running from 34 percent of the variation for the Thatcher period to a high of some 81 percent during the last phase of the data. Meanwhile, the only period in which the session variable contributes the plurality of the explanatory power of the model is around the second (Thatcher) break point; elsewhere, it contributes little to the model overall. All in all, we conclude that cohort effects are important both statistically and substantively for explaining changes in parliamentary polarization.

              Breakpoint 1    Breakpoint 2    Breakpoint 3
New Cohort    0.0096 [0.37]   0.0194 [0.21]   0.0072 [0.14]
Session       0.0047 [0.18]   0.0415 [0.45]   0.0026 [0.05]
Conservative  0.0118 [0.45]   0.0309 [0.34]   0.0435 [0.81]

Table 3: Relative importance of predictors for the three models: contribution to model R² [proportion of R² in brackets].

5.4 Robustness

Our approach uses an estimated dependent variable, which introduces heteroscedasticity insofar as we have more accurate point predictions for some MPs relative to others. As noted, we excluded very short speeches, but in any case the great majority of our observations involve more than ten speeches. Still, we verify the robustness of our results with HC3 (Efron) standard errors in the sense recommended by Lewis and Linzer (2005). See Supporting Information G for more details. A different philosophical concern with our approach arises from the fact that the breakpoints could be confounded with major replacement, implying that a cohort effect cannot be identified separately from simply serving at different times. In Supporting Information H we examine the robustness of our results using only data after each change point.
6 Discussion

In the American context, legislative polarization has inspired much popular and academic concern (although see Fiorina, Abrams and Pope, 2011, for an alternative account). Broadly, the argument is that recent ‘gridlock’, and the commensurate threats to “shut down” the federal government, have negative consequences for everything from the economy to the United States’ standing in the world. In Westminster systems, which entail a “fusion” (Bagehot, 1873/2016, 48) of the executive and legislature along with large, obedient majorities for those in office, polarization of elites along party lines is less likely to lead to dysfunction. This is a fortiori true when legislative leaders seek to court the median voter of the country as a whole, which arguably describes UK politics in recent times (Adams, Green and Milazzo, 2012). In fact, scholars of British politics have gone so far as to suggest that public disengagement—signalled by low turnout—may be the consequence of too little polarization and differentiation between the ‘main’ parties (Ford, 2015). Of course, this is not to say that polarization in the Westminster system—or any parliamentary system—is innocuous: the literature on government formation, for example, suggests ideologically distant parties may make successful coalition agreements less likely (e.g. Warwick, 2005). Whatever the diagnosis, it is clear that interest in elite polarization (or lack thereof) is front and center for political scientists regardless of their country of study. Yet, as we noted above, the literature to date has developed asymmetrically: while we know a great deal about measuring and analyzing legislative polarization in the United States, we have made little progress for Westminster systems, the UK perhaps most obviously. It was partly the task of measurement that this paper set out to solve. We argued that, if polarization has an effect on legislators, we should see it in debates.
In particular, we claimed that when Labour and Conservative MPs make speeches that are distinctively different from each other—as in the mid-1980s—this is prima facie evidence of polarization. By contrast, when the speeches are very similar ‘across the aisle’—as they were in the 1950s and 1960s—there is less polarization. Our machine learning algorithms produced estimates that accord with our qualitative priors about different periods, and we were able to discern three structural breaks: just after the Second World War, the Thatcher era, and the Blair era. Perhaps more importantly, using individual speeches, we were able to calculate MP-level estimates on the Labour–Conservative continuum and use them in regression analyses. It is, inevitably, difficult to make causal statements from observational data, but we would nonetheless claim evidence for the notion that ‘cohort effects’ matter. That is, there is a systematic difference between MPs, depending on the generation in which they entered the House of Commons. In terms of regression variance explained, this variable contributes somewhere between a non-trivial 14 and 37 percent, over time. Furthermore, even when we restrict our analysis to comparing cohorts after a given structural break, the generations generally remain statistically significantly different. This is important because it rules out the notion that the effect is a function of electoral success or career aspirations (i.e. we are comparing everyone who won (re-)election and served after a given time). Our methods, and the way we employ them on the data, leave open several interesting avenues of research, especially regarding the causal mechanisms behind the findings. For example, it is possible that the cohort effect comes from different relative ‘susceptibilities’ to (new) leaders, though we noted above that it remains potent even when we control for party membership (and party leaders do not typically change simultaneously).
It is also possible that any differences come from evolving elite recruitment efforts by parties, and thus that the main part of the data generating process predates MPs’ time serving in the Commons by some years. Second, we have said nothing about smaller parties—such as the Liberal Democrats or Scottish Nationalists—who have played key roles in the Commons in recent times. Going forward, including such MPs may be fruitful, not least to paint a more complete picture of contemporary UK politics. In the context of our classification approach, this will involve using versions of the techniques to predict membership of one of multiple classes. Such an innovation is likely to be especially helpful for studying systems outside Westminster, such as Germany, Ireland or Canada, where the norm is several ‘medium sized’ parties which routinely form coalitions with small partners. In such situations, multiclass approaches can tell us more about party overlap and how this might affect the stability of those coalitions. Finally, though we used speeches, we have not said much about how they may relate to differing policy outcomes. So, while it is one thing to demonstrate that elite polarization varied over time, it is quite another to show that this polarization had real consequences for the Acts passed and the citizens they affect (see Jennings and John, 2009). We leave these questions for future work.

References

Adams, James, Jane Green and Caitlin Milazzo. 2012. “Has the British Public Depolarized Along With Political Elites? An American Perspective on British Public Opinion.” Comparative Political Studies 45(4):507–530.

Addison, Paul. 1994. The Road to 1945: British Politics and the Second World War. London: Pimlico.

Bagehot, Walter. 1873/2016. The English Constitution (2nd Edn). Accessed April 1, 2016: http://socserv.mcmaster.ca/econ/ugcm/3ll3/bagehot/constitution.pdf.

Bai, Jushan and Pierre Perron. 2003.
“Computation and Analysis of Multiple Structural Change Models.” Journal of Applied Econometrics 18:1–22.

Barber, Michael and Nolan McCarty. 2015. Causes and Consequences of Polarization. In Solutions to Polarization in America, ed. Nathaniel Persily. Cambridge: Cambridge University Press pp. 15–59.

Beelen, Kaspar, Tim Alberdingk Thijm, Christopher Cochrane, Kees Halvemaan, Graeme Hirst, Mike Kimmins, Sander Lijbrink, Maarten Marx, Nona Naderi, Roman Polyanovsky, Ludovic Rheault and Tanya Whyte. 2016. “The Digitization of the Canadian Parliamentary Debates.” Working Paper, University of Toronto.

Bottou, Léon. 2004. Stochastic learning. In Advanced Lectures on Machine Learning. Springer pp. 146–168.

Cowley, Philip. 2002. Revolts and Rebellions: Parliamentary Voting Under Blair. London: Politico’s.

Crammer, Koby, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz and Yoram Singer. 2006. “Online Passive-Aggressive Algorithms.” Journal of Machine Learning Research 7(1):551–585.

Diermeier, Daniel, Jean-François Godbout, Bei Yu and Stefan Kaufmann. 2012. “Language and Ideology in Congress.” British Journal of Political Science 42:31–55.

Eggers, Andrew C. and Arthur Spirling. 2016. “Party Cohesion in Westminster Systems: Inducements, Replacement and Discipline in the House of Commons, 1836–1910.” British Journal of Political Science FirstView:1–23.

Fiorina, Morris, Samuel Abrams and Jeremy Pope. 2011. Culture War? The Myth of a Polarized America. New York: Longman.

Ford, Robert. 2015. “In Britain, polarization could be the solution”. In Political Polarization in American Politics, ed. Daniel Hopkins and John Sides. New York: Bloomsbury Academic pp. 126–136.

Ford, Robert and Matthew Goodwin. 2014. Revolt on the Right: Explaining Support for the Radical Right in Britain. New York: Routledge.

Fraser, Duncan. 2000. “The Postwar Consensus: A Debate Not Long Enough.” Parliamentary Affairs 53(2):347–362.

Freund, Yoav and Robert E. Schapire. 1999.
“Large Margin Classification Using the Perceptron Algorithm.” Machine Learning 37(3):277–296.

Gamble, Andrew. 1994. The Free Market and the Strong State: The Politics of Thatcherism. New York: NYU Press.

Gentzkow, Matthew and Jesse M. Shapiro. 2010. “What drives media slant? Evidence from US daily newspapers.” Econometrica 78(1):35–71.

Gentzkow, Matthew, Jesse M. Shapiro and Matt Taddy. 2015. “Measuring Polarization in High-dimensional Data: Method and Application to Congressional Speech.” NBER Working Paper.

Godbout, Jean-François and Bjørn Høyland. 2016. “Unity in Diversity? The Development of Political Parties in the Parliament of Canada, 1867–2011.” British Journal of Political Science FirstView:1–25.

Green, Jane. 2007. “When Voters and Parties Agree: Valence Issues and Party Competition.” Political Studies 55(3):629–655.

Gromping, Ulrike. 2006. “Relative Importance for Linear Regression in R: The Package relaimpo.” Journal of Statistical Software 17(1):1–27.

Hanretty, Chris, Benjamin Lauderdale and Nick Vivyan. 2016. “Dyadic Representation in a Westminster System.” Legislative Studies Quarterly pp. 1–33.

Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.

Iversen, Torben and David Soskice. 2015. “Information, Inequality, and Mass Polarization: Ideology in Advanced Democracies.” Comparative Political Studies 48(13):1781–1813.

Jennings, Will and Peter John. 2009. “The dynamics of political attention: public opinion and the Queen’s Speech in the United Kingdom.” American Journal of Political Science 53(4):838–854.

Jensen, Jacob, Suresh Naidu, Ethan Kaplan, Laurence Wilse-Samson, David Gergen, Michael Zuckerman and Arthur Spirling. 2012. “Political polarization and the dynamics of political language: Evidence from 130 years of partisan speech [with comments and discussion].” Brookings Papers on Economic Activity pp. 1–81.

Johnson, Jeff and James Lebreton.
2004. “History and Use of Relative Importance Indices in Organizational Research.” Organizational Research Methods 7(3):238–257.

Kam, Christopher J. 2009. Party Discipline and Parliamentary Politics. Cambridge: Cambridge University Press.

Kavanagh, Dennis and Peter Morris. 1994. Consensus Politics from Attlee to Major. Hoboken: Wiley Blackwell.

Kellermann, Michael. 2012. “Estimating Ideal Points in the British House of Commons Using Early Day Motions.” American Journal of Political Science 56(3):757–771.

King, Anthony. 2007. The British Constitution. Oxford: Oxford University Press.

Lauderdale, Benjamin and Alexander Herzog. 2016. “Measuring Political Positions from Legislative Speech.” Political Analysis 24(2):1–21.

Layman, Geoffrey, Thomas Carsey and Juliana Horowitz. 2006. “Party Polarization in American Politics: Characteristics, Causes, and Consequences.” Annual Review of Political Science 9:83–110.

Lewis, Jeffrey and Drew Linzer. 2005. “Estimating Regression Models in Which the Dependent Variable Is Based on Estimates.” Political Analysis 13(4):345–364.

Lijphart, Arend. 1999. Patterns of Democracy: Government Forms and Performance in Thirty-Six Countries. New Haven, CT: Yale University Press.

Lindeman, Richard, Peter Merenda and Ruth Gold. 1980. Introduction to Bivariate and Multivariate Analysis. Glenview, IL: Scott, Foresman.

McCarty, Nolan, Keith Poole and Howard Rosenthal. 2006. Polarized America: The Dance of Ideology and Unequal Riches. Cambridge, MA: MIT Press.

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot and E. Duchesnay. 2011. “Scikit-learn: Machine Learning in Python.” Journal of Machine Learning Research 12:2825–2830.

Pimlott, Ben. 1988. The Myth of Consensus. In The Making of Britain: Echoes of Greatness, ed. Lesley Smith. Basingstoke: Macmillan pp. 129–142.

Poole, Keith and Howard Rosenthal. 1997.
Congress: A Political-Economic History of Roll Call Voting. New York: Oxford University Press.
Powell, G. Bingham. 2000. Elections as Instruments of Democracy: Majoritarian and Proportional Visions. New Haven: Yale University Press.
Powell, G. Bingham. 1982. Contemporary Democracies: Participation, Stability and Violence. Cambridge, MA: Harvard University Press.
Quinn, Kevin, Burt Monroe, Michael Colaresi, Michael H. Crespin and Dragomir Radev. 2010. “How to Analyze Political Attention with Minimal Assumptions and Costs.” American Journal of Political Science 54:209–228.
Rhodes, Rod and Patrick Weller. 2005. Westminster Transplanted and Westminster Implanted: Exploring Political Change. In Westminster Legacies: Democracy and Responsible Government in Asia and the Pacific, ed. Haig Patapan, John Wanna and Patrick Weller. Sydney: University of New South Wales Press.
Sartori, Giovanni. 1976. Parties and Party Systems. New York: Cambridge University Press.
Schmidt, Mark, Nicolas Le Roux and Francis Bach. 2013. “Minimizing Finite Sums with the Stochastic Average Gradient.” arXiv preprint arXiv:1309.2388.
Seldon, Anthony. 1994. “The Consensus Debate.” Parliamentary Affairs 47(4):501–514.
Slapin, Jonathan B. and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52(3):705–722.
Spirling, Arthur and Iain McLean. 2007. “UK OC OK?” Political Analysis 15(1):85–96.
Spirling, Arthur and Kevin Quinn. 2010. “Identifying Intraparty Voting Blocs in the UK House of Commons.” Journal of the American Statistical Association 105(490):447–457.
Toye, Richard. 2012. “From ‘Consensus’ to ‘Common Ground’: The Rhetoric of the Postwar Settlement and its Collapse.” Journal of Contemporary History 48(1):3–23.
Warwick, Paul V. 2005. “Do Policy Horizons Structure the Formation of Parliamentary Governments? The Evidence from an Expert Survey.” American Journal of Political Science 49(2):373–387.
Zeileis, Achim, Friedrich Leisch, Kurt Hornik and Christian Kleiber. 2002. “strucchange: An R Package for Testing for Structural Change in Linear Regression Models.” Journal of Statistical Software 7(2):1–38.

Supporting Information A Temporal Stability of the Data

As we discuss in Section 3, our results are unlikely to be a spurious artifact of long-term trends in how speeches are made in Parliament. In particular, while there is some variation from one session to another in the number and length of speeches given by members, there are no general trends that align with our findings about polarization and cohorts. Consider first the number of speeches made by each member per session, presented in Figure 5. While there is some local cyclicality related to electoral periods (with a higher mean number of speeches given in 1979, when Thatcher was elected, for example), overall there is no detectable trend.

[Figure 5: Number of Speeches By Member Per Session. Lines show the 5th percentile, median, mean, and 95th percentile of the per-member speech count for each session, 1939–2008.]

In addition to the number of speeches given, we might be concerned that there are differences in the length of speeches, which could reflect differences in cohorts or in the procedural roles played by different members. The evidence suggests this is not the case: the mean length of speeches by different Members of Parliament remains essentially constant throughout the period of our study. We present the mean and the 5th, 50th, and 95th percentiles of the mean length of speeches in Figure 6. While there is a slight increase in the mean length in the post-war period and a slight decrease in recent years, this is minor and does not match the trends we identify in our polarization measure.
[Figure 6: Mean Length of Speeches By Member Per Session. Lines show the 5th percentile, median, mean, and 95th percentile of mean speech length for each session, 1939–2008.]

Supporting Information B Measurement Concerns

Gentzkow, Shapiro and Taddy (2015) show that two recent text-based measures of polarization in speech (Gentzkow and Shapiro, 2010; Jensen et al., 2012) can be biased by changes in the size of the vocabulary. Such a critique is of particular relevance to our findings because their revised measure identifies significant polarization in recent years in the U.S. case. However, since we fix the vocabulary across all Parliamentary sessions, we have little reason to think this would affect our results. Their diagnostic, which compares results when party labels are randomly assigned by member, nonetheless provides a way to examine whether our results may be the product of some other, similar spurious relationship. In particular, we would be concerned if the trend line from the randomized labels closely tracked the trend of our measure (compare Gentzkow, Shapiro and Taddy, 2015, Figures 2 and 3). This is not the case for our results, as is clear from comparing our estimates (in red) to those from 10 runs with randomized party labels (Figure 7). While there is some variation in the estimates generated from random labels, it does not match our results, and it differs from them quite substantially at points, such as in suggesting high polarization during the World War II era.

[Figure 7: Estimates of parliamentary polarization, by session. The accuracy using real party labels (our polarization measure) is in red, while 10 runs with party labels randomized by speaker are presented in grey.]
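As a rough illustration (not the authors' own code), the randomization check can be sketched with scikit-learn. The speeches, speakers, and party assignments below are toy stand-ins; the key point is that placebo labels are permuted across speakers, not across individual speeches, so each member keeps their own volume of speech.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Toy stand-ins for one session's speeches (hypothetical data).
speeches = [
    "we must cut taxes", "free markets work",           # speaker a
    "strong state and order", "sound money matters",    # speaker b
    "protect the workers", "trade unions matter",       # speaker c
    "nationalise key industry", "fund public housing",  # speaker d
]
speakers = ["a", "a", "b", "b", "c", "c", "d", "d"]
party_of = {"a": "Con", "b": "Con", "c": "Lab", "d": "Lab"}

X = CountVectorizer().fit_transform(speeches)

def session_accuracy(labels):
    # Hinge-loss SGD classifier, one of the four algorithms used in the paper;
    # cross-validated accuracy stands in for the session-level measure.
    clf = SGDClassifier(loss="hinge", random_state=0)
    return cross_val_score(clf, X, labels, cv=2).mean()

# Accuracy with real party labels (the polarization measure).
real = session_accuracy([party_of[s] for s in speakers])

# Placebo: permute party labels across *speakers*, then relabel speeches.
shuffled = rng.permutation(list(party_of.values()))
placebo_map = dict(zip(party_of.keys(), shuffled))
placebo = session_accuracy([placebo_map[s] for s in speakers])
print(real, placebo)
```

Repeating the placebo draw many times per session would trace out the grey band in Figure 7; with uninformative labels, accuracy should hover near chance.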
Supporting Information C Machine Algorithms Produce Similar Results

Recall that we use four machine learning algorithms: perceptron and passive aggressive classifiers, a stochastic gradient descent classifier using a hinge loss, and logistic regression using stochastic average gradient descent. When we inspect their mean accuracy rates over time, we see that they perform almost identically. This is shown in Figure 8, where the lines each correspond to a different classifier and, importantly, are barely distinguishable from one another.

[Figure 8: Estimates of parliamentary polarization, by session, by algorithm. Legend abbreviations are logistic regression using stochastic average gradient descent (SAG), stochastic gradient descent classifier (SGD), perceptron (PCPT), and passive aggressive (passAg). Performance is essentially identical across algorithms.]

Supporting Information D Validation and Misclassification

We want to establish that our speech-based metric measures something useful and meaningful, like ideology. To do so, we looked in some detail at the period 1997–2001 for the parliamentary Labour party, which has been studied extensively both quantitatively (Spirling and Quinn, 2010) and qualitatively (Cowley, 2002). In particular, we study the first full year of the Labour government, 1998. If the individual-level measures are valid, it should be the case that cabinet ministers and loyal backbenchers (generally New Labour types) appear relatively distant from more rebellious MPs who routinely defied the whip in roll call voting. In Table 4 we report evidence in line with that requirement. In particular, of the 302 Labour members for whom we have (well defined) estimates, we see that cabinet members, such as Alan Howarth and Tony Blair, are at one end of the spectrum, while serial rebels like Tony Benn and Jeremy Corbyn are at the other. In between them are independent-minded non-cabinet members such as Keith Vaz and Roger Berry.
Name              Estimated Position   Rank
Alan Howarth      0.12                 1
David Blunkett    0.16                 11
Tony Blair        0.17                 14
Robin Cook        0.18                 20
Keith Vaz         0.21                 67
Rosie Winterton   0.23                 87
Roger Berry       0.23                 89
Dennis Skinner    0.27                 170
Jeremy Corbyn     0.31                 213
Tam Dalyell       0.37                 278
Tony Benn         0.39                 288

Table 4: Some Labour MPs from ‘most Labour’ to ‘least Labour’ in 1998.

Notice that the classifier performs in a particular way: it is generally more successful at classifying core government personnel (in the sense that they are estimated to be most ‘Labour-ish’) than the rebels. This poses no problem per se for our regression results, insofar as what matters for them is the relative position of MPs within their own party, i.e. whether they are far from or close to others, not their specific values on the scale.

Supporting Information E Structural Break Details

As noted above, we looked for structural breaks in the sense of Bai and Perron (2003), as implemented by Zeileis et al. (2002). Using standard defaults, which in this case means a minimal segment of 0.15 of all the data, this dynamic programming method seeks out (multiple) points where the regression coefficients (in our case, the intercept) shift in value. In Table 5 we give the Bayesian Information Criterion statistic for each number of break points (m), up to 6. By this goodness-of-fit measure, the regression with three break points is optimal (its BIC is lowest).

m     0         1         2         3         4         5         6
BIC   -302.01   -371.39   -408.57   -413.78   -408.50   -400.05   -389.74

Table 5: Structural breaks in polarization time series.

Supporting Information F Models with Interactions

It is straightforward to include an interaction effect in our linear models; that is, to include a term calculated as Conservative×New Cohort. Noting that both components are binary, such a variable allows for the possibility that there are asymmetric effects of party for those in different cohorts.
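As a sketch of what such a specification looks like in practice (using simulated data and hypothetical variable names, not our actual estimates), the interaction model could be fit with statsmodels; the formula's product term expands automatically into main effects plus the interaction:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

# Simulated stand-in for the MP-session polarization estimates.
df = pd.DataFrame({
    "polarization": rng.normal(0.6, 0.05, n),
    "new_cohort": rng.integers(0, 2, n),    # 1 = first elected after the break
    "conservative": rng.integers(0, 2, n),  # 1 = Conservative MP
    "session": rng.integers(1, 20, n),
})

# Both components are binary, so new_cohort:conservative lets the
# cohort effect differ by party (and vice versa).
fit = smf.ols("polarization ~ new_cohort * conservative + session", data=df).fit()
print(fit.params)
```

The fitted parameters then contain an intercept, the two main effects, the `new_cohort:conservative` interaction, and the session trend, matching the rows of the interaction regressions reported below.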
Since our focus here is on the relative model fit of such efforts compared to our main (non-interaction) regressions, and on the relative point estimates, we do not correct the standard errors (with the exception of one variable in one regression, clustering makes no difference at p < 0.05 anyway).

Table 6: Re-estimating our linear models to allow for interaction effects between party and cohort.

                          Breakpoint 1   Breakpoint 2   Breakpoint 3
(Intercept)                   0.59*          0.55*          0.63*
                             (0.00)         (0.00)         (0.01)
New Cohort                    0.04*         −0.01*         −0.02*
                             (0.00)         (0.00)         (0.01)
Conservative                 −0.01*          0.03*          0.05*
                             (0.00)         (0.00)         (0.00)
Session                       0.00*          0.00           0.00
                             (0.00)         (0.00)         (0.00)
New Cohort×Conservative      −0.03*          0.04*         −0.04*
                             (0.00)         (0.00)         (0.01)
N                             24048          30739          19248
R2                            0.034          0.099          0.055
adj. R2                       0.034          0.099          0.055

Standard errors in parentheses. * indicates significance at p < 0.05.

From the perspective of the adjusted R2, the model with the interaction adds essentially nothing over our original specifications in the case of Breakpoint 2 (original adjusted R2 = 0.092) and Breakpoint 3 (0.053). In the case of the first breakpoint, the fit is slightly better with the interaction term, moving from an adjusted R2 of 0.026 to 0.034. To get a sense of the substantive implications of the new term, in Table 7 we present the predicted values of the three possibilities (New Cohort-Labour, Old Cohort-Conservative, New Cohort-Conservative) relative to Labour members not in the new intake. That is, Labour members who take a ‘New Cohort’ value of zero are our baseline (of zero), and for any other group we calculate the point estimate relative, over or under, to that set of people.
Table 7: Estimated polarization score of various groups relative to non-New Cohort Labour members.

                         Breakpoint 1   Breakpoint 2   Breakpoint 3
Old, Labour (baseline)       0.000          0.000          0.000
New, Labour                  0.040         −0.006         −0.023
Old, Conservative           −0.006          0.029          0.054
New, Conservative           −0.000          0.064         −0.007

Running through these estimates chronologically, we note that incoming Labour MPs had higher average polarization than their senior colleagues around the time of the postwar consensus, while Tories of all vintages were less polarized. For the Thatcher era, both sets of Conservatives had higher average polarization scores than senior Labour MPs (perhaps reflecting the idea that the ‘Thatcher effect’ was profound for all Tory MPs). Finally, for the Blair era, we see that incoming Conservative and incoming Labour MPs had predicted values of polarization slightly lower than incumbent Labour members.

Supporting Information G Robustness I: Efron/HC3 Correction

A natural concern with modeling approaches such as ours is that the dependent variable is itself estimated, which introduces heteroskedasticity in the sense that we presumably have more accurate estimates for some MPs than for others. Partly to ameliorate this possible problem, we excluded very short speeches, as we explained above. Further, over 82 percent of our observations (at the member-session level) involve MPs making at least 10 speeches. In addition, state-of-the-art advice on such matters is to use White or Efron heteroskedasticity-robust standard errors in estimation (Lewis and Linzer, 2005). In the above analysis we clustered on MP, but it is no problem to switch the specification to HC3 (Efron), and we do just that below. Importantly, the results remain the same: the relevant variables are statistically significant as before.

                 Breakpoint 1        Breakpoint 2   Breakpoint 3
                 Postwar Consensus   Thatcher       Blair Effect
(Intercept)          0.602*             0.547*         0.632*
                    (0.002)            (0.002)        (0.007)
New Cohort           0.023*             0.016*        −0.045*
                    (0.002)            (0.002)        (0.004)
Session              0.000*             0.001          0.000
                    (0.000)            (0.000)        (0.000)
Conservative        −0.022*             0.039*         0.051*
                    (0.001)            (0.001)        (0.002)
N                    24048              30739          19248
R2                   0.026              0.092          0.053
adj. R2              0.026              0.092          0.053

Standard errors, Efron consistent, in parentheses. * indicates significance at p < 0.05.

Table 8: Linear regression of individual polarization estimates on cohort indicator, session number and party identification variables. Each column refers to a different segment of the data, demarcated by the relevant change point. This specification uses Efron HC3 standard errors.

Supporting Information H Robustness II: Post-breakpoint(s) Only

In our main regressions above, for any given change point, we compare the MPs of different cohorts across all the relevant sub-data. So, for example, in the case of the Thatcher break point, we are comparing the behavior of those elected prior to the change point with the behavior of those same people after the breakpoint, and with the behavior of the new cohort (after the break point). This may be problematic if there is large-scale replacement at or near the breakpoints. At the extreme, one can imagine comparing an ‘old cohort’, all of whom served prior to the breakpoint, with a ‘new cohort’, all of whom serve after it. In that situation, any cohort effect is clearly confounded by serving at different times. With this in mind, we re-specify our models using just the data after each break. This inevitably reduces the number of observations available, but it forces the regression to compare ‘like with like’ in terms of service. We report the results of such regressions below. There is some sign flipping in the coefficients but, importantly, cohort remains a statistically significant variable in all specifications.

                 Breakpoint 1   Breakpoint 2   Breakpoint 3
(Intercept)          0.61*          0.48*          0.98*
                    (0.00)         (0.02)         (0.07)
New Cohort           0.02*          0.01*          0.02*
                    (0.00)         (0.00)         (0.01)
Session             −0.00*          0.00          −0.00*
                    (0.00)         (0.00)         (0.00)
Conservative        −0.02*          0.12*         −0.10*
                    (0.00)         (0.00)         (0.00)
N                    17455          13284          5964
R2                   0.02           0.25           0.17
adj. R2              0.02           0.25           0.17

MP-clustered standard errors in parentheses. * indicates significance at p < 0.05.

Table 9: Checking robustness by restricting the data to all observations occurring after the relevant breakpoint.
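As a sketch of these two robustness variants (simulated data and hypothetical variable names, not our actual estimates), the HC3-corrected and post-break, MP-clustered specifications could be fit with statsmodels as follows:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400

# Simulated stand-in for the MP-session data.
df = pd.DataFrame({
    "polarization": rng.normal(0.6, 0.05, n),
    "new_cohort": rng.integers(0, 2, n),
    "conservative": rng.integers(0, 2, n),
    "session": rng.integers(1, 40, n),
    "mp_id": rng.integers(0, 80, n),  # hypothetical MP identifier
})

formula = "polarization ~ new_cohort + session + conservative"

# Robustness I: Efron/HC3 heteroskedasticity-robust standard errors.
fit_hc3 = smf.ols(formula, data=df).fit(cov_type="HC3")

# Robustness II: keep only observations after the break, with standard
# errors clustered on MP.
breakpoint_session = 20  # hypothetical break location
post = df[df["session"] > breakpoint_session]
fit_post = smf.ols(formula, data=post).fit(
    cov_type="cluster", cov_kwds={"groups": post["mp_id"]})
```

Only the covariance estimator and the sample change between the two fits; the point estimates under HC3 are identical to OLS, which is why the coefficient columns match the main specification.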