Parliamentary Polarization: Cohort Effects and Ideological Dynamics in the UK House of Commons, 1935–2013∗
Andrew Peterson†
Arthur Spirling‡
Abstract
We consider the causes of changing levels of elite polarization in the House of Commons,
a challenging but important task for understanding the development of UK politics.
Making use of a new dataset of 3.5 million speeches over the period 1935–2013, we
provide a new measurement strategy that takes relative distinctiveness in spoken contributions between Labour and Conservative members as evidence of polarization. In
the aggregate, this yields a long-term overview of parliament that is remarkably consistent with well-known historical accounts. We show that there are three structural
breaks in the data, and that polarization in the post-Blair period has declined to levels
not seen since the mid-1960s. Using the individual level estimates we then derive and
validate, we explore ‘cohort effects’—by which new, incoming generations of MPs are
a possible cause of changes in elite polarization. We show such effects are real and
surprisingly large.
∗ First version: February 16, 2016. This version: October 1, 2016. We thank Chris Kam, Ben Lauderdale and Gaurav Sood for helpful comments on an earlier draft. Kaspar Beelen provided invaluable research advice and assistance with data.
† PhD Candidate, Department of Politics, New York University. [email protected]
‡ Associate Professor of Politics and Data Science, New York University. [email protected]
1 Introduction
The study of party systems and their effects is at the core of political science research
(e.g. Sartori, 1976; Powell, 1982). Commensurate with this long-standing interest, in recent
times scholars have shifted to examining the causes and consequences of ‘polarization’ in
both the Comparative (e.g. Iversen and Soskice, 2015) and Americanist literatures (e.g. Barber and McCarty, 2015). Not least because it represents one of the oldest, most imitated
and most stable systems of governance (Rhodes and Weller, 2005), Britain’s Westminster
polity has attracted a great deal of attention in this context. This work runs the gamut,
and includes both long-range historical, qualitative accounts of the (purported) 'post-war
consensus’ among leaders (e.g. Kavanagh and Morris, 1994), along with focussed statistical
studies of voter behavior in modern times (e.g. Green, 2007).
Despite this broad-ranging research agenda, we have little systematic knowledge of elite—
that is, Member of Parliament (MP)—behavior, with respect to its relationship with polarization of the House of Commons, for any period. Of course, scholars have made great
strides in studying MP roll call activities (e.g. Hanretty, Lauderdale and Vivyan, 2016) and
MP beliefs (e.g. Kam, 2009) but, for the reasons we explain below, they have not been able
to examine the changing nature of inter-party and intra-party ideological cohesion in the
Commons over the long term. This means, for example, that it is impossible to test important historical theories of aggregate change, such as comparing contemporary fractiousness
to its level in the post-Second World War period (see, e.g., Seldon, 1994). Because we
cannot measure elite polarization in a reproducible, quantitative way, we can say almost
nothing about what drives it over time, be it new leaders, or the latent characteristics of
new cohorts of MPs, both, or neither. This may be compared to historical work on cohesion
in Westminster systems where such issues have been considered in some detail (see, e.g.,
Eggers and Spirling, 2016; Godbout and Hoyland, 2016). This situation is also in stark
contrast to research on the United States, where scholars have readily available, valid and
reliable measures of polarization, and consequently paint an increasingly complete picture
of its development and correlates in the long term (e.g. McCarty, Poole and Rosenthal, 2006).
The current paper improves on this state of affairs for scholars of British politics specifically, and students of Comparative Politics, more generally. In particular, we introduce
new methods applied to a massive data set of 3.5 million Commons speeches, in order to
provide both aggregate and individual estimates of polarization for every single parliamentary session and every member of the period 1935–2013. Our central logic is to conceive
of MPs from different parties as being more or less distinguishable over time, in terms of
what they choose to say. More specifically, when Labour MPs cannot easily be told apart
from Conservative MPs, we are in a world of relatively low polarization. By contrast, when
it is straightforward to discriminate between partisans based on their utterances—say, with
regards to the topics they raise, or the way they express themselves—we are in a more polarized era. Because assessing such things is simply beyond the scope of human hand coding,
we use fast, accurate ‘supervised learning’ techniques—with the party membership as the
‘label’—to place the speeches in a continuous Labour–Conservative space. From there, we
aggregate in a simple but valid way to both the legislators themselves and the parliamentary
year as a whole. Though not completely without precedent in political science (e.g. Diermeier
et al., 2012), our measurement strategy allows us to shed light on long-term Westminster
patterns of behavior in a fundamentally new way. By providing estimates for the individuals over their careers, we can undertake regressions in a cross-section time-series format,
which help us to understand whether new cohorts of MPs do or do not affect the nature
of Commons competition. That is, we can determine whether MPs entering in different
‘generations’ push Commons life towards more different or more similar party ideological
positions. Further, we can contrast such effects to those coming from other sources such
as new leadership (or its associated time trends). Unlike all previous approaches for assessing the ideological makeup of the chamber, we are not reliant on roll calls and thus avoid
the problems they entail (e.g. Spirling and McLean, 2007). Nor are we bound by data
availability to a short time period, or to incomplete coverage of MPs (e.g. Kellermann, 2012).
To preview our findings, at the aggregate level, we find strong evidence of a ‘post-war consensus’, followed by a distinctive break in 1979. This is in line with some qualitative accounts,
though by no means all of them (e.g. Pimlott, 1988). More interestingly, we provide evidence that Blair’s election as Labour leader, and the 2001 general election in particular,
are the beginning of a sharp overall decline in polarization. This continues into the most
recent period of coalition government (that is, from 2010 until the 2015 general election).
At a more micro-level, in contrast to earlier accounts focussing on roll calls in the 19th
Century (Eggers and Spirling, 2016; Godbout and Hoyland, 2016) we show that cohorts are
a consistently statistically significant predictor of member polarization, though the direction of the effect changes over time. In particular, new junior cohorts of MPs were more
polarized than senior colleagues immediately after the Second World War, and during the
advent of the first Thatcher administration. However, they were significantly less polarized
than longer serving MPs in the post-New Labour era. These effects are robust to controlling
for the party of the MP (that is, this is not simply an effect of the partisan identity of the
government), and time dummies. In fact, we find that it was only during the Thatcher era
that MPs became progressively more polarized as time passed: prior to and post-Thatcher,
there are no statistically significant time effects. This means that, generally, MPs are as
polarized ‘as they’ll ever be’ when they enter the Commons for the first time. Overall, we
demonstrate that somewhere between a sixth and just over a third of all the variation in
speech polarization stems from the cohort in which an MP entered parliament. Though not
the direct focus of our paper, this has interesting implications for the nature of Westminster
political competition: rather than thinking about individual parliamentary leaders and their
associated periods of office as more tribalistic or harmonious than others, we should perhaps
pay at least as much attention to their colleagues on the benches beside and behind them.
2 Literature and Orientation
The polarization of American politics has garnered a great deal of scholarly consideration
(see, e.g., Layman, Carsey and Horowitz, 2006). In that literature, the fundamental quantitative evidence for this phenomenon is the increasing distance between the Democratic
and Republican parties in terms of their roll call vote behavior in Congress. Inspection of
post-war NOMINATE scores (Poole and Rosenthal, 1997), for example, suggests that on the
main economic dimension of politics, the legislative parties are increasingly cohesive, with
fewer conservative Democrats or liberal Republicans between them. Both the causes and
consequences of this change are active areas of research (see Barber and McCarty, 2015, for
an overview).
2.1 Evidence of the Post-War Consensus in the UK
Changes to polarization in the UK have not received similar levels of attention in political
science and there is, in general, little statistical work on ideological partisanship for the
House of Commons. On the qualitative side, political historians have engaged in a longstanding debate on the purported 'post-war consensus' in Britain (see Fraser, 2000, for an
overview). In essence this is about existence: on one side, scholars such as Addison (1994),
Kavanagh and Morris (1994) and Seldon (1994) argue that the period between the Second
World War and the 1980s was marked by the implementation of very similar policies—such
as the goal of full employment, a mixed economy with large amounts of nationalization, and
the importance of the Atlantic Alliance—regardless of which party (Labour or Conservative)
was in power. On the other hand, researchers like Pimlott (1988) claim that the parties were
different in many areas of ideology and policy; and, in any case, voters in the electorate were certainly divided along class and party lines.
As is obvious from this brief summary, these authors may be speaking past each other
in the sense that the level of analysis—elite versus 'ordinary' voters—seems to differ between accounts (see Fraser, 2000, 349–350 on definitional issues). Indeed, to the extent
that quantitative methods have been used to study polarization, they have tended to focus
almost exclusively on electoral behavior. In this vein, for example, Adams, Green and Milazzo (2012) show that between 1987 and 2001, there has been a marked decline in partisan
sorting—that is, the match between voter policy positions and party choice—although the
variance of policy positions per se has not much changed.1 In common with
other work on this topic, for reasons we explain below, these authors take as given—rather
than provide or cite quantitative evidence that demonstrates— that the various purported
eras of consensus or disharmony are exactly that. When quantitative scholars have turned
to Commons behavior itself with ideological eras in mind, they have tended to focus on the
(fractious) politics within single parties and the ‘rebellions’ these induce (e.g. Cowley, 2002),
rather than the causes and effects of overall levels of polarization.
Part of the reason that polarization at the voter-level has tended to have the lion’s share of
quantitative effort is that elites, and elite behavior, are more difficult to study with systematic statistical techniques. For example, using roll calls to infer relative partisan difference
is extremely problematic in Westminster countries: first, as in many parliamentary systems,
the parties tend to vote extremely cohesively, regardless of era. This means that we do
1 See Ford and Goodwin (2014) for a complementary discussion of newer ideological movements in Britain.
not have any variance in ‘reaching across the aisle’ politics (there is consistently none) as
might be the case in the United States. Second, there is reason to believe that even when
we see departures from the norm of cohesion, the presence of “government vs opposition”
motivations means that it is generally not possible to draw sensible conclusions about the
relationship between preferences and behavior (Spirling and McLean, 2007). Scholars have
attempted to get around these issues either with surveys of member positions (e.g. Kam,
2009) or by modeling something other than divisions; Kellermann (2012), for example, considers the (co)signing of ‘Early Day Motions’ (EDMs) as indicative of MP ideology. These
techniques are helpful in the modern period, but of limited utility for the current problem.
On the former, it is only relatively recently that surveys were administered in anything like
a comprehensive fashion. On the latter, it is not trivial to find (pre-war) historical data
on EDMs and, in any case, they yield inferences only for those who chose to sign them.
Finally, scholars have invented ways to locate parliamentary parties relative to one another
from speech (Slapin and Proksch, 2008), but these typically require strong dimensional assumptions (i.e. that there is one dimension to political conflict) and are not amenable to
estimating member positions, which is a key part of our analysis here. Recently, Lauderdale
and Herzog (2016) have built on such efforts to obtain individual estimates, but there the
focus is on the underlying 'political disagreement' dimension (such as 'government vs opposition'), which, as noted above, need not be the same as the 'sincere' left-right ideological
continuum we are interested in.
2.2 Cohorts versus Eras
As we will describe in more detail, the data we will use to draw conclusions about polarization are text.2 In particular, they are debates from Hansard. By applying the machine
2 We note that scholars of rhetoric have studied the post-war consensus in terms of elite expression (see, e.g., Toye, 2012), though not in a systematic, quantitative way.
learning techniques below, we will be able to draw conclusions about the aggregate amount
of polarization in parliament at any given time, and comment directly on the plausibility
of any 'post-war consensus' and its subsequent development. Furthermore, we will be able to
discern what predicts changes in polarization.
At the most basic level, a natural way to think about polarization is that it is driven by new
members arriving in the House of Commons. That is, when general elections introduce new
MPs to parliament, those MPs bring latent characteristics in terms of, say, their position
on some underlying left-right continuum of politics. If new Conservative MPs tend to be to
the right of current ones, while new Labour MPs tend to be to the left of their colleagues,
we would expect average polarization to increase. The opposite would hold if more moderate MPs join the chamber. Of course, even in the presence of new latent types, there are
reasons to believe that polarization might not change at all. We know that Westminster is
a highly structured place where, for example, deviance from the party (i.e. leader) line is
harshly punished such that, in equilibrium, behavior is extremely predictable regardless of
constituency or personality type (e.g. Kam, 2009). With this in mind, there is reason to
believe we may see no ‘cohort effects’ whatsoever; indeed, this is exactly what Eggers and
Spirling (2016) and Godbout and Hoyland (2016) find for 19th Century roll call behavior in
the UK and Canada respectively.
If change does not originate with new cohorts of MPs, it might come from MPs (of whatever
vintage) facing new incentives at different times. In particular, one can imagine that fresh
leaders affect their followers by introducing new ideological arguments and divides, or by
bringing new items on to the agenda (see Eggers and Spirling, 2016; Godbout and Hoyland,
2016, for a review of similar arguments). Such ‘inducement’ stories imply that what matters
is the (leadership) ‘era’ itself, rather than the cohorts of the MPs within it. This is, to be
sure, the default position of the qualitative literature on the origins of the post-war consensus (e.g. Addison, 1994), the role of the prime minister in general (e.g. King, 2007), and
accounts of specific individuals in that position (e.g. Gamble, 1994). In this telling, a variable
pertaining to the session number will be significant at all times; that is, what matters is the
specific time period (coterminous with the identity of a particular set of leaders). In the
extreme version, such a predictor will ‘crowd out’ any explanatory power from replacement.
All in all, the 'textbook' theory of Westminster systems (e.g. Lijphart, 1999; Powell, 2000)—and the empirical evidence regarding them (Eggers and Spirling, 2016)—yields priors that cohorts don't
matter very much for polarization. With that in mind, finding cohort effects on relative
partisanship would be interesting per se. It also suggests something very different to other
theories from a policy perspective: for example, it implies that if a less polarized environment is desired, tinkering with selection and screening mechanisms for candidates in the
constituencies may be part of the solution. It is the possibility of such an effect that we
focus on below.
3 Data: 3.5 million speeches over 78 years
The official Hansard record of British parliamentary debates has existed since the 19th Century in paper form. Following digitization efforts by the “Digging into Linked Parliamentary
Data” project, almost all volumes to the present day now exist in electronic form.3 This
data has been extensively cleaned and matched with meta-data on member names, ministerial roles and party identifications, all of which have been disambiguated.4
3 See http://schema.politicalmashup.nl/
4 We obtained xml copies of the records from Kaspar Beelen, a team member of the project. See Beelen et al. (2016) for details.
The data is for the period 1935–2013. This comfortably covers the post-Second World
War era at Westminster, and thus is optimal for our purposes here. More specifically, we focus only on Labour and Conservative members. Bearing in mind that they consistently held
between 85 percent and (almost) all the parliamentary seats—and had duopolistic control of
Prime Ministerial office—during this time, this partisan restriction makes life considerably
easier without much loss of analytical power.
The relevant time series unit is the parliamentary ‘session’, a period lasting approximately
a year (unless a general election intervenes). Thus we are working with 3,573,778 speeches
over 78 sessions, given by 3,167 unique members, including a total of 5,085 ministerial roles.
We do not work with all the speeches available. In particular, we do not use any speech
containing fewer than 40 characters, and we drop any tokens which consist only of numbers
or symbols. This is to avoid utterances (such as ‘hear-hear’) which are very common, but
contain little substantive content.
The data is remarkably well balanced in terms of partisan contributions, which is a testament
to the dominance of the two ‘big’ British parties at this time. Thus, the Conservative party
gave an average of 21,805 speeches per session, while the Labour party gave slightly more
(23,432). Overall, each member gave an average of 1,128 speeches in their parliamentary
career, with a mean of 82 speeches in each session. Broken down by party, Tories gave
an average of 83 speeches, while Labour members gave 81 speeches per session. The average
Conservative speech was 1,023 characters, and the average Labour speech was 1,103 characters. This balance is comforting; in any case (see below), where there is asymmetry in representation we use class weights to ensure that the classifier will not increase accuracy by predicting
the more common class. The data also shows encouraging consistency over time. In particular, we inspected the number of speeches per member and note that, though exhibiting
some periodicity, the mean, median and other percentiles of this statistic are stable. We
also looked at the mean length of each member's speeches over time, which shows a general drift upwards after the 1930s but is stationary thereafter.5
In terms of representing the texts themselves, we assume that the standard ‘bag of words’
vector space model is appropriate, with some pre-processing: we treat each speech as a series of token-specific (i.e. word-specific) frequencies that have been normalized by their
maximum absolute value, which allows us to maintain the data in sparse format. We make
no attempt to retain word order. We begin by fixing a vocabulary across all sessions6 in
which we drop any word that does not appear in 200 speeches in the entire dataset, which
leaves 24,726 words. We do not stem or stop words, or otherwise limit tokens, relying instead
on the regularization process to drop unimportant terms.
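To make this representation concrete, it can be reproduced with standard tools. The following is a minimal sketch using scikit-learn, in which the speeches list is a hypothetical stand-in for the cleaned Hansard corpus and the token filter is a simple approximation of the rules described above:

# A minimal sketch of the document representation described in the text.
# `speeches` (a list of raw speech strings) is a hypothetical stand-in for
# the cleaned Hansard corpus; the thresholds mirror those given in the text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import MaxAbsScaler

# Keep only speeches of at least 40 characters.
speeches = [s for s in speeches if len(s) >= 40]

# Fix a single vocabulary across all sessions: a word must appear in at
# least 200 speeches to be retained. The token pattern keeps alphabetic
# tokens only, which (approximately) drops purely numeric/symbolic tokens.
vectorizer = CountVectorizer(min_df=200, token_pattern=r"(?u)\b[^\W\d_]+\b")
X = vectorizer.fit_transform(speeches)  # sparse speech-by-word count matrix

# Normalize each token's counts by its maximum absolute value; this keeps
# the matrix sparse, as noted in the text.
X = MaxAbsScaler().fit_transform(X)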
3.1 Other Variables of Interest
In what follows below, we control for party with the binary variable ‘Conservative’ taking
the value 1 if the member is from the Tory party (zero for Labour). We have a ‘Session’
variable that records the session number (starting at 1 for our first observed session in 1935).
Finally, and most importantly, we have a ‘New Cohort’ variable which is an indicator taking
the value 1 if the member in question entered the Commons for the first time after the
relevant break date, and zero if the member entered prior to this time. To clarify, note that
we take the session in which a member's first speech appears in Hansard as his or her first
session. Since one’s maiden speech typically occurs immediately after entering the House for
the first time, this provides an excellent proxy for a legislator’s cohort.
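As an illustration, this proxy reduces to a simple group-by; a sketch over a hypothetical speech-level data frame with mp_id and session columns:

# Sketch: derive each MP's entry cohort from the session of their first
# recorded speech. `df` (one row per speech, with hypothetical columns
# `mp_id` and `session`) stands in for the Hansard speech data.
import pandas as pd

first = df.groupby("mp_id")["session"].min().rename("first_session")
df = df.merge(first, on="mp_id")

# 'New Cohort' = 1 if the member's first speech falls at or after the
# relevant break (cf. the 1979 example in Section 5.2), and 0 otherwise.
BREAK_SESSION = 46  # hypothetical session number of a given break point
df["new_cohort"] = (df["first_session"] >= BREAK_SESSION).astype(int)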
5
See Supporting Information A for graphical evidence.
One advantage of fixing the vocabulary is that it ensures that our measure is not subject to the bias
identified by Gentzkow, Shapiro and Taddy (2015). See Supporting Information B for more details.
6
In order to reduce the role played by differences in the choice of subject matter of members
of the two parties, we include dummy variables for each of the topics mentioned in a given
legislative session. Rather than running a topic model (see, e.g., Quinn et al., 2010), we
simply use the debate descriptors that are included in the speeches dataset for each speech.
These include topics such as 'Energy Prices', 'Welfare Reform (Sick and Disabled People)',
and ‘Family Taxation’.
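Constructing these controls is mechanical; a short sketch, again over the hypothetical speech-level frame (debate_title is an assumed column name for the descriptor):

# Sketch: indicator variables from the debate descriptors attached to each
# speech; `debate_title` is a hypothetical column holding descriptors such
# as 'Energy Prices' or 'Family Taxation'.
import pandas as pd

topic_dummies = pd.get_dummies(df["debate_title"], prefix="topic")
df = pd.concat([df, topic_dummies], axis=1)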
4 Methods: Parliamentary Polarization
As we explained in Section 2, for data availability and basic behavior reasons, it is difficult
to accurately model the polarization of the House of Commons and its members over time.
While the historical coverage of the debates is impressive, the question now is how to best
use them. The intuition of our approach is simple: if by studying their speeches, MPs of
different parties cannot be easily distinguished from one another, we are in a world in which
Labour and Conservative legislators are not very different. This is evidence of low levels of
parliamentary polarization. By contrast, if the speeches are very different by party, then
polarization must be higher.7
4.1 Intuition: Attlee vs Eden, Thatcher vs Kinnock
To make this intuition more concrete, consider the (real) Hansard exchange between a Prime
Minister and Leader of the Opposition displayed in Figure 1. Of course, the use of ‘she’ by
the Leader of the Opposition suggests this is from the Thatcher period—which is indeed the
case. Notice though, that the language used by the Prime Minister is very ‘Conservative’ in
7 Notice that the comparison is with respect to the relative accuracy of the same technique (which could be a human or a machine) in discriminating between the parties, applied again and again over time: that is, the issue is not that the performance of the approach is 'bad' (or 'good') in absolute classification terms during certain eras. Indeed, the absolute performance is essentially irrelevant for current purposes.
Prime Minister: I am happy that my successor will carry on
the excellent policies that have finished with the decline of
socialism, brought great prosperity to this country, raised
Britain’s standing in the world and brought about a truly
capital-owning democracy.
Leader of Oppn: If the Prime Minister thinks that nothing should
be changed, can she tell us why on earth all those now competing
for her job are desperately wriggling around trying to find a way
out of the poll tax trap?
Figure 1: Thatcher–Kinnock exchange in House of Commons, Nov 27, 1990.
nature: she talks of having ended socialism, and the merits of a ‘capital-owning democracy’.
This clearly connotes the Tory policy of the 1980s. Meanwhile, Neil Kinnock (then Labour
leader) talks of the ‘poll tax’, a term used to criticise the government’s ‘Community Charge’
policy of the time. In contrast, consider Figure 2 where we report part of an exchange on industrial unrest at the London docks in 1948. Here, Eden is the Opposition spokesman, while
Attlee is the Prime Minister. Surprisingly, given that the issue has obvious political overtones—
in the sense that the Labour party is part of the trade union movement—Eden does not use
a particularly partisan attacking strategy.
The point here is that Thatcher and Kinnock are much more clearly polarized in their
speeches than are Eden and Attlee: for the former exchange, it is obvious which is Conservative and which is Labour—this is not true for the latter. Further, one can imagine
that by looking for certain terms associated with a given party, we could accurately classify
politicians from the 1980s; but by the same token, merely obtaining discriminating terms
might be difficult for the 1940s. The idea is that if we could train a computer to find such
terms if they exist—and record how helpful those terms are in classifying MPs—then we will
have an automated way to measure polarization.
Deputy Leader of Oppn: May I ask the Prime Minister if he has
been made aware that there is considerable concern that the
Minister of Labour should have left the country at this particular
moment to go to a conference where the Permanent Secretary of
the Ministry of Labour already is, and whether it would not have
been possible to retain the Minister of Labour here until these
difficult negotiations were completed?
Prime Minister: At the time my right hon. Friend the Minister
of Labour left for the important Conference of the I.L.O. it
was thought that the matter had been settled. The matter is, of
course, in hand with the Parliamentary Secretary, and I can assure
the right hon. Gentleman that everything will be done. It is, of
course, unfortunate that the Minister of Labour should be absent,
but these things threaten at times and one can never quite tell
whether they are coming off or not.
Figure 2: Eden–Attlee exchange in House of Commons, June 22, 1948.
4.2 Intuition: 'polarizing' words
The examples from 1948 and 1990 above make it clear that, in some periods at least, politicians use words and phrases that connote strict ideological differences. But in our setup, and
in contrast to Americanist efforts, parliamentary polarization is not defined only in terms of
the drifting apart of individual ideal points by party. That is, our approach is considerably
broader: whatever causes speeches to differ by party—including the ideal points of their
members, the topics they choose to debate, their stylistic choices—will affect what we denote
as (aggregate) ‘polarization’. In that sense, our measure is of ‘partisanship’ in a more general
sense than previously considered.
To see this idea in action, we consider the contents of two important post-war debates
wherein deeply entrenched party political positions were not necessarily couched in explicitly
ideological terms. The first is the March 28, 1979 discussion preceding the vote of no confidence in James Callaghan's government (as introduced by the then leader of the opposition, Margaret
Thatcher). In Table 1 we report a few of the ‘most Conservative’ words in that debate—in
the sense that their use is most predictive of the speaker being a Tory MP—followed by some
most likely used by Labour MPs.8 In context, the record shows Margaret Thatcher arguing
that Callaghan ought to “seek a fresh mandate from the people” in view of the Commons’
lack of support. Meanwhile her Conservative colleagues such as Michael Shersby barracked
the Cabinet as “tottering from one crisis to another”, not least due to its “incomes policy”,
while Willie Whitelaw talked of “clear and unanswerable exposure of the Government’s failure”. On the government backbenches, by contrast, Eric Heffer argued that the date of the
election was irrelevant and that “Whenever we go to the country, the real issue is the basic
difference between the philosophy of” the two parties. The second example is taken from
the Treaty of Maastricht (Social Protocol) debate of July 22, 1993. Again, we report some
key discriminating words in Table 1. In this case, a Tory Prime Minister (John Major) was now under pressure in the House of Commons to keep his fractious party together to pass
legislation on European integration. Now we see that the ‘most Conservative’ words include
“Monklands”, a reference to the leader of the Opposition, John Smith’s constituency, as the
Prime Minister attempted to respond to the latter’s amendment on the day’s business. There
is a similar story behind the presence of “Ashdown” who was leader of the Liberal Democrats
at the time. We see Major referring to “nonsense from Brussels” and to the support that his
position has from the “Confederation of British Industry”. On the Labour side, John Smith
talked of the fact that other “right-wing Governments” of Europe had adopted the measure,
which would, for example, extend “equal rights and adequate provisions for maternity leave”.
So, while some discriminating words are obviously ‘ideological’, many are not, and could
in principle have been spoken by either side (depending on their procedural role at the
8 We estimate ideological terms by starting with the coefficients for each word from each session's logistic regression model, then multiplying these coefficients by the inverse frequency in that session (with Laplace smoothing) in order to identify words that are most distinctive of speech patterns.
                       1979 No Confidence                   1993 Social Chapter
'Most Conservative'    fresh, incomes, crisis, failure      Monklands, Ashdown, Confederation, Brussels
'Most Labour'          listen, complain, wanted, whenever   exists, wing, lock, equal

Table 1: Some of the 'most Conservative' and 'most Labour' words from the 1979 No Confidence debate, and the 1993 Maastricht (Social Protocol) debate.
time). Readers may have concerns that this is too far a departure from the usual, narrow,
understanding of polarization for the term to have validity here. We have several defences.
First, as we will see below, our method produces an aggregate measure that accords with priors from qualitative accounts of British consensus and polarization. In that sense, though we
allow for more 'inputs', our output measure 'works' in a validity sense. Second, in a Westminster context where governments have almost full control of the agenda and the Speaker
prohibits non-germane contributions, the nature of debate—in terms of topics covered—is
essentially fixed in a given session. That is, it simply cannot be the case that, simultaneously, one side talks about utterly different issues to the other, but would otherwise agree
on the substantive positions were they forced to engage on similar matters.
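The term scoring behind Table 1 (footnote 8) is also easy to sketch. Here clf is a fitted per-session logistic regression, counts a vector of per-word frequencies for that session, and vectorizer the object from the earlier snippet; all are hypothetical continuations, and we assume Conservative is coded as the positive class:

# Sketch of the term scoring in footnote 8: per-session logistic regression
# coefficients, scaled by the (Laplace-smoothed) inverse frequency of each
# word in that session. `clf` and `counts` are hypothetical stand-ins.
import numpy as np

coefs = clf.coef_.ravel()          # one coefficient per vocabulary word
scores = coefs / (counts + 1.0)    # Laplace-smoothed inverse-frequency scaling

vocab = np.array(vectorizer.get_feature_names_out())
most_conservative = vocab[np.argsort(scores)[-8:]]  # largest positive scores
most_labour = vocab[np.argsort(scores)[:8]]         # most negative scores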
4.3 Machine Learning Polarization
As the intuition above makes clear, our machine learning approach aims to capture the extent to which it is possible to distinguish between members of the two parties based on their
speeches. We do this by using various supervised algorithms to predict the party affiliation
of the speaker of each speech in a legislative session. That is, we have labeled data (the party
identifications) and we seek to ‘learn’ the relationship between the speech information and
16
the labels. We can report both an overall accuracy for our classifier, and provide estimates
for any given MP in terms of their probability of being in one of the two (Conservative,
Labour) classes, given their speeches and the relationships observed in the data. Obviously,
we do not intend to capture the full substantive ‘meaning’ of the speeches, and we do not
seek to identify the issue positions of individual legislators. Rather, we are successful to
the extent that our approach does not miss differences in how legislators of the two parties
express themselves; we do not need a complete model of semantic content to do this, and
our approach is similar in that sense to Diermeier et al. (2012).
As usual with machine learning approaches, we seek to balance strong predictive power
against other concerns such as simplicity, reproducibility, overfitting, and computational
time (see Hastie, Tibshirani and Friedman, 2009, for discussion of these issues). We choose
four cutting edge algorithms that embody all these features to varying extents. These are:
• the perceptron algorithm (see Freund and Schapire, 1999), a simple linear classifier
with no regularization penalty and a fixed learning rate. This is trained by stochastic
gradient descent, and is thus a special case of the second classifier:
• a stochastic gradient descent (SGD) classifier, which updates parameters on batches
of randomly selected subsets of the data (for an overview see Bottou, 2004).
• the ‘passive aggressive’ classifier with hinge-loss, which updates parameters by seeking
in each step a hyperplane that is close to the existing solution but which aggressively modifies parameters in order to correctly classify at least one additional example
(Crammer et al., 2006).
• logistic regression with an L2 penalty, with regularization parameter C = 1000/(# training speeches) ≈ 0.2, fit using stochastic average gradient descent (see Schmidt, Roux and Bach, 2013).
Within each legislative session, we run all four algorithms, then select the algorithm with the
highest accuracy as the representative of that session. All four algorithms are implemented
using Scikit-Learn (Pedregosa et al., 2011) in the Python language. For each classifier we
also average the classification accuracy over a stratified 10-fold cross-validation. In practice,
though different in nature, the algorithms perform extremely similarly, on average, which
suggests there is little model dependence to our findings (see Supporting Information C).
Different legislative sessions have different numbers of members and speeches by one party
or the other. In principle, this could cause an algorithm to increase its reported accuracy
by simply favoring one party.9 One option is to drop speeches to keep the two parties balanced, though we prefer to not throw out data if avoidable. Instead, we use class (party)
weights inversely proportional to the class (party) frequencies, i.e. n/(2·n_p), where n is the total number of speeches and n_p is the number of speeches by members of that party. That is, we
essentially weight up the speeches of the less commonly observed party in a given session.
For results in which we report the importance of individual words in each session's models, we focus on the stochastic gradient descent classifier and stochastic average gradient descent logistic regression, which generally had the best performance.
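To make the pipeline concrete, the following is a minimal sketch of the per-session model selection, assuming the document-term matrix X and 0/1 party labels y from the earlier snippets; hyperparameters are illustrative except where the text specifies them, and scikit-learn's class_weight='balanced' option implements exactly the n/(2·n_p) weighting described above for the two-class case:

# Sketch of the per-session classifier horse-race: four linear learners,
# stratified 10-fold cross-validation, balanced class (party) weights.
# `X` and `y` are hypothetical continuations of the vectorization snippet,
# restricted to one session's speeches.
from sklearn.linear_model import (Perceptron, SGDClassifier,
                                  PassiveAggressiveClassifier,
                                  LogisticRegression)
from sklearn.model_selection import StratifiedKFold, cross_val_score

n_train = X.shape[0] * 9 // 10  # approximate size of each training fold
classifiers = {
    "perceptron": Perceptron(class_weight="balanced"),
    "sgd": SGDClassifier(loss="hinge", class_weight="balanced"),
    "passive_aggressive": PassiveAggressiveClassifier(
        loss="hinge", class_weight="balanced"),
    # L2-penalized logistic regression with C = 1000 / (# training speeches),
    # fit by stochastic average gradient (SAG) descent.
    "logit_sag": LogisticRegression(penalty="l2", C=1000.0 / n_train,
                                    solver="sag", class_weight="balanced"),
}

cv = StratifiedKFold(n_splits=10)
accuracies = {name: cross_val_score(clf, X, y, cv=cv).mean()
              for name, clf in classifiers.items()}

# The session's reported polarization is the best classifier's mean accuracy.
best_name = max(accuracies, key=accuracies.get)
session_polarization = accuracies[best_name]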
4.4 Member-level estimates
Notice that our estimates are at the speech level: that is, given its features, and given the
relationship we ‘learn’ between the features and the party of the person who said it, the
speech itself is allocated some probability of being (from a) Conservative. We do this in a
9 By way of a pathological example, suppose that in some session 95% of speeches are made by Conservative members, while only 5% come from Labour MPs. In such a world, any algorithm reporting that all speeches were predicted to be Tory would appear to do very well simply as an artifact of the data.
principled way across the stratified ten folds (i.e. we avoid overfitting), but the important
point is that this information then allows us to estimate the position of any given MP. In
particular, we simply assign an MP the average position of their speeches (where the score
for each speech is the probability that it is given by a Conservative member, and is thus on
the 0–1 interval) in a given session.10 To see how this works in practice, consider Figure 3.
There we consider the parliamentary session beginning in November 1984, and within that,
PM Margaret Thatcher’s speeches. In the top panel, each black circle is a speech given by
Thatcher, and its position (obtained by plugging its characteristics into the SAG learner)
on the interval. The dark histogram simply summarizes that information: unsurprisingly,
Thatcher gives mostly more ‘Conservative’ speeches. Indeed, her mean speech is around 0.85.
The broken line is the density plot for all speeches, for all MPs, during this time: evidently,
she is exemplary of Conservative speech during this period. In the bottom plot, we report
the empirical cumulative distribution function for the House of Commons, and the enlarged
square represents our estimate for her, i.e. the mean of her speeches. It is plotted along with
all the Conservative MPs (blue squares) and all the Labour MPs (pink circles)—which are
also simply the means of the various MPs’ speeches. We perform this process for every MP,
for every session, and thus have estimates for our entire period across the chamber.
10 We can also obtain the variance of those estimates, if required. We also note that, unlike the probability estimates of a naive Bayes classifier, which can be bi-modal, taking on predominantly values near 0 and 1, our estimates are unimodal. This suggests that members with few speeches will not be given extreme values simply due to the sparsity of the data.
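A sketch of the aggregation, continuing with the hypothetical objects from the snippets above; cross-validated probabilities are used so that no speech is scored by a model trained on it:

# Sketch: out-of-fold speech-level probabilities, then MP-level means.
# `X`, `y`, `cv`, `classifiers` and `df` continue the hypothetical objects
# from earlier snippets; `df` has one row per speech.
from sklearn.model_selection import cross_val_predict

logit = classifiers["logit_sag"]
# Probability that each speech was given by a Conservative (class 1),
# estimated out-of-fold across the ten stratified splits.
df["p_conservative"] = cross_val_predict(logit, X, y, cv=cv,
                                         method="predict_proba")[:, 1]

# An MP's position in a session is simply the mean of their speech scores.
mp_scores = df.groupby(["mp_id", "session"])["p_conservative"].mean()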
Figure 3: Example of individual MP estimates: Margaret Thatcher in November 1984 session. Top panel: each black circle is a speech given by Thatcher, and its position (obtained
by plugging its characteristics into the SAG learner) on the Labour-Conservative interval.
Dark histogram represents density of those speeches. Broken line is the density plot for all
speeches, for all MPs. Bottom panel: empirical CDF of House of Commons with Thatcher
estimate (i.e. her speech mean) highlighted.
For the purposes of our regressions below, we convert all estimated scores from 0–1 within
the House, to 0–1 within the relevant party. This is important because it then allows us to
treat MPs from different parties with the same score as being ‘equally’ polarized (albeit in
different directions). To clarify, each Labour MP now receives a score between 0 and 1, with 1
being most extremely ‘Labour’ at that time; each Conservative MP receives a score between
0 and 1, with 1 being most extremely ‘Conservative’ at that time. This is accomplished very
simply: we subtract all Labour MP positions from 1, and keep the Tory estimates as they are.11
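A sketch of the conversion, over a hypothetical MP-session frame (mp_df) with party and score columns built from the means above:

# Sketch: convert House-wide 0-1 scores to within-party 'extremism' scores:
# Labour positions are subtracted from 1; Tory positions are kept as they are.
import numpy as np

mp_df["polarization"] = np.where(mp_df["party"] == "Labour",
                                 1.0 - mp_df["score"],
                                 mp_df["score"])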
4.5 Caveats and Validation
While our classification approach is fast and useful, it is important to be intellectually honest
about what it can and cannot do. Ultimately, our measure of polarization is about overlap
in speech: that is, we can tell how similar the members of different parties are in terms of
what they say. This may hide ‘true’ heterogeneity in a given party, especially if whipping
(of speech) is strong. Alternatively, parties might be quite distinct on average, but contain
a wide diversity of voiced opinions, making them overlap. Consequently, it is important to
assess both whether our estimates are valid (i.e. make sense) for given individuals, and to
examine the cases where the classifier does poorly as this provides information regarding the
scope of our claims. In Supporting Information D we examine these issues in some detail for
a well studied recent period of Westminster history. The upshot is that we are confident we
are measuring something meaningful for what follows.
5 Results
The results of our main supervised learning analysis can be seen in Figure 4. There, for each
of our parliamentary sessions, we report the (mean) accuracy of the algorithm that performs
best in separating Conservative from Labour MPs. We also superimpose structural break
estimates, which we will describe in more detail below. For now, focussing on the points,
recall that when the accuracy is high, we are claiming that politics is polarized and divisive.
When accuracy is low, the parties are not easily told apart, and thus parliamentary life may
11 As an example, Diane Abbott—a purportedly left-wing Labour MP—is estimated to be at 0.28 on the House 0–1 scale in 2001. We convert her score to 1 − 0.28 = 0.72 within her own party for that session. Meanwhile, Tory MP Peter Ainsworth is estimated to be 0.60 on the original scale, which is then used directly in the regressions. This would imply that Abbott is more extreme relative to her own party than Ainsworth is relative to his, which makes substantive sense.
be described as more consensual.
5.1 Aggregate Polarization: Flat, Up and Down
Our immediate observation from the figure is just how closely it accords with our priors for
the period. In particular, a description of the time series would be as follows: in the 1930s,
polarization drops rapidly, reaching a nadir in the years of the Second World War. This,
of course, makes sense given the (Churchill-led) coalition government of that time. Soon
after, when elections begin in earnest with the 1945 Labour landslide, polarization ticks up.
It then enters a long period of approximate stasis between circa 1945 and circa 1979, with
small movements around the mean, though it is gradually sloping upwards. From the first
session of 1979, i.e. the session in which Margaret Thatcher assumed the premiership, polarization jumps and reaches its zenith around the session corresponding to 1987. It then falls,
gradually at first and then more quickly, as Tony Blair becomes leader of Labour after 1994.
By the sessions around 2001, polarization is falling sharply, with the end of Gordon Brown’s
government and the beginning of the Conservative-Liberal Democrat coalition marking a
further decline.
This overall pattern—of relative similarity of the two major parties during the 1950s, 1960s
and 1970s, ended by Thatcher’s government—is almost entirely in keeping with most qualitative, substantive accounts of the era under study (Addison, 1994; Kavanagh and Morris,
1994; Fraser, 2000). From a validity perspective, this is good news and we make the observation that, contrary to e.g. Pimlott (1988), in parliament at least, there was indeed a
relative post-war consensus in terms of debate. Furthermore, there is prima facie evidence,
post-Blair, of a new consensus in line with earlier conjectures from Seldon (1994) and others.
Taking the estimates literally, the contemporary polarization of the House of Commons is on
a par with that of the mid-1960s, a period thought to mark the high watermark of agreement
in policy between the parties. We now push this analysis further by being more formal about
the time series. In particular, we consider structural breaks in the sense of Bai and Perron
(2003) as implemented by Zeileis et al. (2002).12 Analysis using standard defaults suggests
that there are three break points in the data, and thus four segments that differ in terms
of their mean polarization. These are the vertical green lines in Figure 4 and correspond to
the following dates: September 1948, November 1978 and June 2001. In the case of the first
two dates, as visual inspection suggests, the mean level of polarization increased (confirmed
by t-test, p < 0.01), and after the third it decreased (p < 0.01). Notice that by segmenting
our data systematically, we have imposed sufficient structure to allow some relatively simple
tests of cohort effects. That is, we now have a set of time series whose clearly defined aggregate changes may be decomposed into various sources of variation. To clarify, for
each change point, we are defining the relevant sub-series of the data as the two segments
joined by that break point. That is, our first subseries is all data between the start of the
observations (March 1935) and the second break in 1978 (with the first break somewhere in
between). The second subseries is all data between the first break in 1948 and the third in 2001 (with the second break somewhere in between). The final time subseries runs from the second break in 1978 until the end of the data (with the third break somewhere in between).
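The original analysis uses the Bai and Perron (2003) procedure as implemented in R's strucchange package (Zeileis et al., 2002); purely for illustration, a similar mean-shift segmentation can be sketched in Python with the ruptures change-point library, which we use here as a stand-in rather than the authors' implementation:

# Illustrative sketch: segmenting the session-level polarization series
# into four mean-shifted regimes (three breaks). This uses the `ruptures`
# library as a stand-in for the Bai-Perron procedure in R's strucchange.
# `polarization_by_session` is a hypothetical 1-D array of the 78 per-
# session accuracy estimates plotted in Figure 4.
import numpy as np
import ruptures as rpt

signal = np.asarray(polarization_by_session).reshape(-1, 1)

# Dynamic programming search for the best 3 breaks under an L2 (mean-shift)
# cost, i.e. segments that differ in mean polarization.
algo = rpt.Dynp(model="l2", min_size=5).fit(signal)
break_indices = algo.predict(n_bkps=3)  # indices where segments end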
5.2 Estimating Possible Cohort Effects
For each sub-data set, our dependent variable is the polarization score of the individual MP, estimated in the way we described above, with the important adjustment we mentioned: we subtract Labour estimates from 1 to make them comparable in
‘extremism’ terms to the estimates for the Conservatives. We keep our models simple for
interpretation purposes, and there are just three variables on the ‘right hand side’ (other
than the intercept). To recall, these are: ‘Conservative’ taking the value 1 if the member
12 We give more details in Supporting Information E.
Figure 4: Estimates of parliamentary polarization, by session. Estimated change points are green vertical lines.
is from the Tory party (zero for Labour); ‘Session’ variable that records the session number
(starting at 1 for our first observed session in 1935); and 'New Cohort', which is an indicator taking the value 1 if the member in question entered the Commons for the first time after the
relevant break date, and zero if the member entered prior to this time. For example, suppose we were studying the third segment of the data, which covers the Thatcher break point:
Conservative members entering at the 1979 general election would receive Conservative=1,
New Cohort=1 while ‘Session’ would vary from 46 to 68—for all members—depending on
the given time period. Two points should be made before proceeding to the results: first,
we have repeated (i.e. session to session) observations for each member, so we cluster our
standard errors at the MP level. Second, we cannot simultaneously cater for member fixed
effects and cohort effects, because the latter is unchanging for an MP (they enter at a particular time, and this is constant over their career). This is not a problem per se: if we see a
non-zero effect for the ‘New Cohort’ variable we have evidence for the idea that replacement
matters—new waves of MPs are behaving differently to older waves.13 Our linear model
results appear in Table 2, where each column corresponds to the relevant subpart of the
data. Thus, ‘Breakpoint 1’ refers to the time series that begins at the start of the data, and
then runs to the second (Thatcher) breakpoint, with the first breakpoint (September 1948)
in between. The other columns are similarly defined. We see, immediately, that there is
indeed a cohort effect for all the periods. That is, controlling for session number and party
membership, MPs entering the Commons after the relevant break are different in some way
from their older colleagues. The directions are perhaps as expected across the models: MPs
entering after 1948 were more polarized (on average) than longer serving members, as were
13 To be clear, within each segment, we are fitting a regression of the following form:

polarization_i = β0 + β1 New Cohort_i + β2 Session_i + β3 Conservative_i + ε_i.

Here polarization_i is the polarization score of the ith member (the mean of their speeches, converted to a point on the 0–1 interval as noted above), the independent variables are as described in the text, and ε_i is an error term.
                 Breakpoint 1        Breakpoint 2    Breakpoint 3
                 Postwar Consensus   Thatcher        Blair Effect
(Intercept)      0.602* (0.002)      0.547* (0.004)  0.632* (0.018)
New Cohort       0.023* (0.003)      0.016* (0.004)  −0.045* (0.007)
Session          0.000 (0.000)       0.001* (0.000)  0.000 (0.000)
Conservative     −0.022* (0.002)     0.039* (0.002)  0.051* (0.005)
N                24048               30739           19248
R2               0.026               0.092           0.053
adj. R2          0.026               0.092           0.053

Standard errors, clustered by MP, in parentheses. * indicates significance at p < 0.05.

Table 2: Linear regression of individual polarization estimates on cohort indicator, session number and party identification variables. Each column refers to a different segment of the data, demarcated by the relevant change point.
MPs entering at or after the 1979 election. By contrast, the new cohort arriving in or after
2001 exhibited less (average) polarization than their forebears.
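The specification in footnote 13, with standard errors clustered at the MP level, can be reproduced with statsmodels; a minimal sketch over the hypothetical MP-session frame used above, restricted to one segment:

# Sketch: within-segment linear model of MP polarization with standard
# errors clustered by MP. `seg` is a hypothetical slice of the MP-session
# frame covering one of the three sub-series.
import statsmodels.formula.api as smf

model = smf.ols("polarization ~ new_cohort + session + conservative",
                data=seg)
results = model.fit(cov_type="cluster", cov_kwds={"groups": seg["mp_id"]})
print(results.summary())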
An obvious next question is whether members of a certain party were consistently more
polarized. Reading across the columns of the ‘Conservative’ row suggests this is false: it
is true that Tories were on average more polarized for the 1979 and 2001 periods, but this
is not true for the break in 1948 that became the post-war consensus. We can also see
that, whatever the cohort effects, there is not much evidence that once a given segment has
occurred, members become consistently more or less polarized. The coefficients (and their
associated significance) on ‘Session’ suggest that there are no pure ‘time’ effects for the
first or third subset of the data. Interestingly though, there is something of an effect for the
Thatcher period (that is, the subset of the data which includes Thatcher’s premiership towards its middle). In particular, members of parliament of this time became more polarized
with every session, regardless of their party or cohort. Put otherwise, the Thatcher period
saw a deepening of partisan divides from members opposite year-on-year, even among those who had entered parliament at the same time in the same party.
Given our interest in party-specific explanations, readers may wonder about models that
include an interaction between party membership and cohort (both binary variables). Inevitably, this gives rise to a more involved model which is harder to interpret. For completeness though, we provide exactly such results in Supporting Information F. Generally, the
interaction effects yield models that do not fit the data much better than our simpler efforts,
so we focus on the more parsimonious case in what follows.
5.3 Relative Effect Sizes
How large are the cohort effects relative to the other variables in the model? To assess
this, we apply the ‘relative importance’ method of Lindeman, Merenda and Gold (1980)
as implemented by Gromping (2006). An overview of such an approach for social science
can be found in Johnson and Lebreton (2004), but the key idea is that the R2 of a given
linear model is decomposed into the contribution of each regressor to the model fit, such
that the sum of the contributions is the R2 for the original specification. The essence of the
estimation is that a sequence of linear models is fit to the data, each with a different number
and permutation of the relevant variables. At each step, the increase (or decrease) of the R2
is recorded, and ultimately assigned to a given variable by averaging over the total number
of models in which it was included. In Table 3 we report these results for the three
sets of regressions. In each cell, we give the variable contribution to R2 , and in parentheses
the proportion of the same for the model in question. We note immediately that the cohort
variable explains somewhere between 14 and 37 percent of the variation of polarization over
time. The party effect is considerably larger, running from 34 percent of the variation for the
Thatcher period to a high of some 81 percent during the last phase of the data. Meanwhile,
variable         Breakpoint 1     Breakpoint 2     Breakpoint 3
New Cohort       0.0096 [0.37]    0.0194 [0.21]    0.0072 [0.14]
Session          0.0047 [0.18]    0.0415 [0.45]    0.0026 [0.05]
Conservative     0.0118 [0.45]    0.0309 [0.34]    0.0435 [0.81]

Table 3: Relative Importance of predictors for the three models: contribution to model R2, with the proportion of model R2 in brackets.
the only period in which the session variable contributes the plurality of the explanatory
power of the model is around the second (Thatcher) break point; elsewhere, it contributes
little to the model, overall. All in all, we conclude that cohort effects are important both
statistically and substantively for explaining changes in parliamentary polarization.
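The LMG decomposition is usually run via R's relaimpo package (Gromping, 2006); for illustration, with only three regressors it can be sketched directly by averaging R2 increments over all orderings, again with hypothetical column names:

# Illustrative sketch of the LMG relative-importance decomposition:
# average each regressor's increment to R^2 over all orderings in which
# it can enter the model. With three regressors there are 3! = 6 orderings.
from itertools import permutations
import statsmodels.formula.api as smf

predictors = ["new_cohort", "session", "conservative"]

def r2(cols, data):
    if not cols:
        return 0.0  # intercept-only baseline
    return smf.ols("polarization ~ " + " + ".join(cols), data=data).fit().rsquared

def lmg(data):
    contrib = {p: 0.0 for p in predictors}
    orderings = list(permutations(predictors))
    for order in orderings:
        for k, p in enumerate(order):
            # Increment to R^2 from adding p after the k regressors before it.
            contrib[p] += r2(list(order[:k + 1]), data) - r2(list(order[:k]), data)
    return {p: v / len(orderings) for p, v in contrib.items()}

# lmg(seg) returns each variable's contribution; by construction the values
# sum to the full model's R^2, as in Table 3.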
5.4 Robustness
Our approach uses an estimated dependent variable, which introduces heteroscedasticity insofar as we have more accurate point predictions for some MPs relative to others. As noted, we
excluded very short speeches, but in any case the great majority of our observations involve
more than ten speeches. Still, we verify the robustness of our results with HC3 (Efron)
standard errors in the sense recommended by Lewis and Linzer (2005). See Supporting Information G for more details. A different philosophical concern with our approach arises
from the fact that the breakpoints could be confounded with major replacement, implying
that a cohort effect cannot be identified separately from simply serving at different times.
In Supporting Information H we examine the robustness of our results using only data after
each change point.
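In statsmodels (continuing the regression sketch above), the HC3 correction is a one-line change to the fit call:

# Sketch: refit the segment model with HC3 (Efron) heteroscedasticity-
# robust standard errors.
results_hc3 = model.fit(cov_type="HC3")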
6 Discussion
In the American context, legislative polarization has inspired much popular and academic
concern (although see Fiorina, Abrams and Pope, 2011, for an alternative account). Broadly,
the argument is that recent ‘gridlock’, and the commensurate threats to “shut down” the federal government, have negative consequences for everything from the economy to the United
States’ standing in the world. In Westminster systems, which entail a “fusion” (Bagehot,
1873/2016, 48) of the executive and legislature along with large, obedient majorities for
those in office, polarization of elites along party lines is less likely to lead to dysfunction.
This is a fortiori true when legislative leaders seek to court the median voter of the country
as a whole, which arguably describes UK politics in recent times (Adams, Green and Milazzo, 2012). In fact, scholars of British politics have gone so far as to suggest that public
disengagement—signalled by low turnout—may be the consequence of too little polarization
and differentiation between the ‘main’ parties (Ford, 2015). Of course, this is not to say
that polarization in the Westminster system—or any parliamentary system—is innocuous:
the literature on government formation, for example, suggests ideologically distant parties
may make successful coalition agreements less likely (e.g. Warwick, 2005). Whatever the
diagnosis, it is clear that interest in elite polarization (or lack thereof) is front and center
for political scientists regardless of their country of study. Yet, as we noted above, the literature to date has developed asymmetrically: while we know a great deal about measuring
and analyzing legislative polarization in the United States, we have made little progress for
Westminster systems, the UK perhaps most obviously.
It was partly the task of measurement that this paper set out to solve. We argued that,
if polarization has an effect on legislators, we should see it in debates. In particular, we
claimed that when Labour and Conservatives make speeches that are distinctively different
from each other—as in the mid-1980s—this is prima facie evidence of polarization. By contrast, when the speeches are very similar ‘across the aisle’—as they were in the 1950s and
1960s—there is less polarization. Our machine learning algorithms produced estimates that
accord with our qualitative priors about different periods, and we were able to discern three
structural breaks: just after the Second World War, the Thatcher era, and the Blair era.
Perhaps more importantly, using individual speeches, we were able to calculate MP-level
estimates on the Labour–Conservative continuum and use them in regression analyses. It is,
inevitably, difficult to make causal statements from observational data, but we nonetheless
would claim evidence for the notion that ‘cohort effects’ matter. That is, there is a systematic difference between MPs, depending on the generation in which they enter the House of
Commons. In terms of regression variance explained, this variable contributes a non-trivial 14 to 37 percent, depending on the period. Furthermore, even when we restrict
our analysis to comparing cohorts after a given structural break, the generations generally
remain statistically significantly different. This is important because it rules out the notion
that the effect is a function of electoral success or career aspirations (i.e. we are comparing
everyone who won (re-)election and served after a given time).
Our methods, and the way we deploy them on the data, leave open several interesting avenues
of research, especially regarding the causal mechanisms behind the findings. For example,
it is possible that the cohort effect comes from different relative ‘susceptibilities’ to (new)
leaders, though we noted above that it remains potent even when we control for party membership (and party leaders do not typically change simultaneously). It is also possible that
any differences come from evolving elite recruitment efforts by parties, such that the main part of the data generating process predates MPs' service in the Commons by some years. Second, we have said nothing about smaller parties—such as the Liberal Democrats or the Scottish Nationalists—who have played key roles in the Commons in recent times. Going
forward, including such MPs may be fruitful not least to paint a more complete picture of
contemporary UK politics. In the context of our classification approach, this will involve
using versions of the techniques to predict membership of one of multiple classes. Such an
innovation is likely to be especially helpful for studying systems outside Westminster, such
as Germany, Ireland or Canada, where the norm is several ‘medium sized’ parties which
routinely form coalitions with small partners. In such situations, multiclass approaches can
tell us more about party overlap and how this might affect the stability of those coalitions.
Finally, though we used speeches, we have not said much about how they may relate to
differing policy outcomes. So, while it is one thing to demonstrate that elite polarization varied over time, it is quite another to show that this polarization had real consequences for
the Acts passed and the citizens they affect (see Jennings and John, 2009). We leave these
questions for future work.
References
Adams, James, Jane Green and Caitlin Milazzo. 2012. “Has the British Public Depolarized Along With Political Elites? An American Perspective on British Public Opinion.”
Comparative Political Studies 45(4):507–530.
Addison, Paul. 1994. The Road to 1945: British Politics and the Second World War. London:
Pimlico.
Bagehot, Walter. 1873/2016. The English Constitution (2nd Edn). Accessed April 1, 2016: http://socserv.mcmaster.ca/econ/ugcm/3ll3/bagehot/constitution.pdf.
Bai, Jushan and Pierre Perron. 2003. “Computation and Analysis of Multiple Structural
Change Models.” Journal of Applied Econometrics 18:1–22.
Barber, Michael and Nolan McCarty. 2015. Causes and Consequences of Polarization. In
Solutions to Polarization in America, ed. Nathaniel Persily. Cambridge: Cambridge University Press pp. 15–59.
Beelen, Kaspar, Tim Alberdingk Thijm, Christopher Cochrane, Kees Halvemaan, Graeme
Hirst, Mike Kimmins, Sander Lijbrink, Maarten Marx, Nona Naderi, Roman Polyanovsky,
Ludovic Rheault and Tanya Whyte. 2016. “The Digitization of the Canadian Parliamentary Debates.” Working Paper, University of Toronto.
Bottou, Léon. 2004. Stochastic learning. In Advanced lectures on machine learning. Springer
pp. 146–168.
Cowley, Philip. 2002. Revolts and Rebellions: Parliamentary Voting Under Blair. London:
Politico’s.
Crammer, Koby, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz and Yoram Singer. 2006.
“Online Passive-Aggressive Algorithms.” Journal of Machine Learning Research 7(1):551–
585.
Diermeier, Daniel, Jean-François Godbout, Bei Yu and Stefan Kaufmann. 2012. “Language
and Ideology in Congress.” British Journal of Political Science 42:31–55.
Eggers, Andrew C. and Arthur Spirling. 2016. “Party Cohesion in Westminster Systems:
Inducements, Replacement and Discipline in the House of Commons, 1836–1910.” British
Journal of Political Science FirstView:1–23.
Fiorina, Morris, Samuel Abrams and Jeremy Pope. 2011. Culture War? The Myth of a
Polarized America. New York: Longman.
Ford, Robert. 2015. “In Britain, polarization could be the solution.” In Political Polarization in American Politics, ed. Daniel Hopkins and John Sides. New York: Bloomsbury
Academic pp. 126–136.
Ford, Robert and Matthew Goodwin. 2014. Revolt on the Right: Explaining Support for the
Radical Right in Britain. New York: Routledge.
Fraser, Duncan. 2000. “The Postwar Consensus: A Debate Not Long Enough.” Parliamentary Affairs 53(2):347–362.
Freund, Yoav and Robert E. Schapire. 1999. “Large Margin Classification Using the Perceptron Algorithm.” Machine Learning 37(3):277–296.
Gamble, Andrew. 1994. The Free Market and the Strong State: The politics of Thatcherism.
New York: NYU Press.
Gentzkow, Matthew and Jesse M Shapiro. 2010. “What drives media slant? Evidence from
US daily newspapers.” Econometrica 78(1):35–71.
Gentzkow, Matthew, Jesse M Shapiro and Matt Taddy. 2015. “Measuring Polarization
in High-dimensional Data: Method and Application to Congressional Speech.” NBER
Working Paper.
Godbout, Jean-François and Bjørn Høyland. 2016. “Unity in Diversity? The Development of Political Parties in the Parliament of Canada, 1867–2011.” British Journal of Political
Science FirstView:1–25.
Green, Jane. 2007. “When Voters and Parties Agree: Valence Issues and Party Competition.”
Political Studies 55(3):629–655.
Grömping, Ulrike. 2006. “Relative Importance for Linear Regression in R: The Package
relaimpo.” Journal of Statistical Software 17(1):1–27.
Hanretty, Chris, Benjamin Lauderdale and Nick Vivyan. 2016. “Dyadic Representation in a
Westminster System.” Legislative Studies Quarterly pp. 1–33.
Hastie, Trevor, Robert Tibshirani and Jerome Friedman. 2009. The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. New York: Springer.
Iversen, Torben and David Soskice. 2015. “Information, Inequality, and Mass Polarization:
Ideology in Advanced Democracies.” Comparative Political Studies 48(13):1781–1813.
Jennings, Will and Peter John. 2009. “The dynamics of political attention: public opinion
and the Queen’s Speech in the United Kingdom.” American Journal of Political Science
53(4):838–854.
Jensen, Jacob, Suresh Naidu, Ethan Kaplan, Laurence Wilse-Samson, David Gergen, Michael
Zuckerman and Arthur Spirling. 2012. “Political polarization and the dynamics of political
language: Evidence from 130 years of partisan speech [with comments and discussion].”
Brookings Papers on Economic Activity pp. 1–81.
Johnson, Jeff and James Lebreton. 2004. “History and Use of Relative Importance Indices
in Organizational Research.” Organizational Research Methods 7(3):238–257.
Kam, Christopher J. 2009. Party Discipline and Parliamentary Politics. Cambridge: Cambridge University Press.
Kavanagh, Dennis and Peter Morris. 1994. Consensus Politics from Attlee to Major. Hoboken: Wiley Blackwell.
Kellermann, Michael. 2012. “Estimating Ideal Points in the British House of Commons Using
Early Day Motions.” American Journal of Political Science 56(3):757–771.
King, Anthony. 2007. The British Constitution. Oxford: Oxford University Press.
Lauderdale, Benjamin and Alexander Herzog. 2016. “Measuring Political Positions from
Legislative Speech.” Political Analysis 24(2):1–21.
Layman, Geoffrey, Thomas Carsey and Juliana Horowitz. 2006. “Party Polarization in
American Politics: Characteristics, Causes, and Consequences.” Annual Review of Political Science 9:83–110.
Lewis, Jeffrey and Drew Linzer. 2005. “Estimating Regression Models in Which the Dependent Variable Is Based on Estimates.” Political Analysis 13(4):345–364.
Lijphart, Arend. 1999. Patterns of Democracy: Government Forms and Performance in
Thirty-Six Countries. New Haven, CT: Yale University Press.
Lindeman, Richard, Peter Merenda and Ruth Gold. 1980. Introduction to Bivariate and
Multivariate Analysis. Glenview, IL: Scott, Foresman.
McCarty, Nolan, Keith Poole and Howard Rosenthal. 2006. Polarized America: The Dance
of Ideology and Unequal Riches. Cambridge, MA: MIT Press.
Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel,
P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M.
Brucher, M. Perrot and E. Duchesnay. 2011. “Scikit-learn: Machine Learning in Python.”
Journal of Machine Learning Research 12:2825–2830.
Pimlott, Ben. 1988. The Myth of Consensus. In The Making of Britain: Echoes of Greatness,
ed. Lesley Smith. Basingstoke: Macmillan pp. 129–142.
Poole, Keith and Howard Rosenthal. 1997. Congress: A Political-Economic History of Roll
Call Voting. New York: Oxford University Press.
Powell, Bingham. 2000. Elections as Instruments of Democracy: Majoritarian and Proportional Visions. New Haven: Yale University Press.
Powell, G. Bingham. 1982. Contemporary democracies: Participation, stability and violence.
Cambridge, MA: Harvard University Press.
Quinn, Kevin, Burt Monroe, Michael Colaresi, Michael H. Crespin and Dragomir Radev.
2010. “How to Analyze Political Attention with Minimal Assumptions and Costs.” American Journal of Political Science 54:209–228.
Rhodes, Rod and Patrick Weller. 2005. Westminster Transplanted and Westminster Implanted: Exploring Political Change. In Westminster Legacies: Democracy and Responsible Government in Asia and the Pacific, ed. Haig Patapan, John Wanna and Patrick
Weller. University of New South Wales: University of New South Wales Press.
Sartori, Giovanni. 1976. Parties and Party Systems. New York: Cambridge University Press.
Schmidt, Mark, Nicolas Le Roux and Francis Bach. 2013. “Minimizing finite sums with the
stochastic average gradient.” arXiv preprint arXiv:1309.2388.
Seldon, Anthony. 1994. “The Consensus Debate.” Parliamentary Affairs 47(4):501–514.
Slapin, Jonathan B. and Sven-Oliver Proksch. 2008. “A Scaling Model for Estimating Time-Series Party Positions from Texts.” American Journal of Political Science 52(3):705–722.
Spirling, Arthur and Iain McLean. 2007. “UK OC OK?” Political Analysis 15(1):85–96.
Spirling, Arthur and Kevin Quinn. 2010. “Identifying intraparty voting blocs in the UK
House of Commons.” Journal of the American Statistical Association 105(490):447–457.
Toye, Richard. 2012. “From ‘Consensus’ to ‘Common Ground’: The Rhetoric of the Postwar
Settlement and its Collapse.” Journal of Contemporary History 48(1):3–23.
Warwick, Paul V. 2005. “Do Policy Horizons Structure the Formation of Parliamentary
Governments?: The Evidence from an Expert Survey.” American Journal of Political
Science 49(2):373–387.
Zeileis, Achim, Friedrich Leisch, Kurt Hornik and Christian Kleiber. 2002. “strucchange:
An R Package for Testing for Structural Change in Linear Regression Models.” Journal of
Statistical Software 7(2):1–38.
Supporting Information A  Temporal Stability of the Data
As we discuss in Section 3, our results are unlikely to be the spurious product of artificial long-term trends in how speeches are made in Parliament. In particular, while there is some variation from one session to another in the number and length of speeches given by members, there are no general trends that align with our findings about polarization and cohorts. Consider first the number of speeches made by each member per session, presented in Figure 5. While there is some local cyclicality related to electoral periods (with a higher mean number of speeches given in 1979, when Thatcher was elected, for example), overall there is no detectable trend.
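For concreteness, the per-session summary underlying Figure 5 can be computed along the following lines. This is a minimal sketch in which the `speeches` DataFrame and its column names are illustrative stand-ins, not our actual variable names.

import pandas as pd

# Toy stand-in: one row per speech, with session and speaker columns
speeches = pd.DataFrame({
    "session": [1997, 1997, 1997, 1998, 1998],
    "mp":      ["A", "A", "B", "A", "B"],
})

# Count speeches per member within each session
per_mp = speeches.groupby(["session", "mp"]).size().rename("n_speeches")

# Distribution of per-member counts by session (the quantities in Figure 5)
summary = per_mp.groupby("session").agg(
    mean="mean",
    median="median",
    p05=lambda s: s.quantile(0.05),
    p95=lambda s: s.quantile(0.95),
)
print(summary)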
Figure 5: Number of Speeches By Member Per Session.
In addition to the number of speeches given, we might be concerned that there are differences in the length of speeches, which could reflect differences in cohorts or in the procedural roles played by different members. The evidence suggests this is not the case, however: the mean length of speeches by different members of Parliament remains roughly constant throughout the period of our study. We present the mean and the 5th, 50th, and 95th percentiles of the mean length of speeches in Figure 6. While there is a slight increase in mean length in the post-war period and a slight decrease in recent years, these movements are minor and do not match the trends we identify in our polarization measure.
Figure 6: Mean Length of Speeches By Member Per Session.
Supporting Information B  Measurement Concerns
Gentzkow, Shapiro and Taddy (2015) show that two recent text-based measures of polarization in speech (Gentzkow and Shapiro, 2010; Jensen et al., 2012) can be biased by changes in the size of the vocabulary. Such a critique could be of particular interest for our findings, since they argue that their revised measure identifies significant polarization in recent years in the U.S. case. However, because we fix the vocabulary across all parliamentary sessions, we have little reason to think this would affect our results. Their diagnostic approach, which involves comparing the results when party labels are randomly assigned by member, nonetheless provides a way to examine whether our results may be the product of some other, similar spurious relationship. In particular, we would be concerned if the trend line from the
randomized labels closely tracked the trend of our measure (compare Gentzkow, Shapiro and Taddy, 2015, Figures 2 and 3). This is not the case for our results, as is clear from comparing our estimates (in red) to those from 10 runs with randomized party labels (Figure 7). While there is some variation in the estimates generated from the random labels, it does not match our results, and differs from them quite substantially at points, such as in suggesting high polarization during the World War II era.
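The placebo check itself is straightforward to sketch. Below is a minimal illustration of the randomization step, not our production code: the feature matrix `X`, the per-speech speaker identifiers, and the speaker-to-party mapping are hypothetical inputs, and a call of this sort would be made once per session.

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

def placebo_accuracy(X, speaker_of_speech, party_of_speaker, seed=0):
    """Randomly reassign party labels across speakers (not speeches),
    then measure how well a classifier recovers the fake labels.
    Near-chance accuracy indicates the real measure is not an artifact
    of the pipeline."""
    rng = np.random.default_rng(seed)
    speakers = list(party_of_speaker)
    shuffled = rng.permutation([party_of_speaker[s] for s in speakers])
    fake = dict(zip(speakers, shuffled))
    y_fake = np.array([fake[s] for s in speaker_of_speech])
    clf = SGDClassifier(loss="hinge")  # one of our four classifiers
    return cross_val_score(clf, X, y_fake, cv=5).mean()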
Figure 7: Estimates of parliamentary polarization, by session, by algorithm. The accuracy
using real party labels (our polarization measure) is in red, while 10 runs with party labels
randomized by speaker are presented in grey.
Supporting Information C  Machine Learning Algorithms Produce Similar Results
Recall that we use four machine learning algorithms: perceptron and passive aggressive classifiers, a stochastic gradient descent classifier using a hinge loss, and logistic regression using stochastic average gradient descent. When we inspect their mean accuracy
rates over time, we see they perform almost identically. This is shown in Figure 8, where
the lines each correspond to a different classifier and, importantly, are barely distinguishable
from one another.
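All four algorithms are available in scikit-learn (Pedregosa et al., 2011). The following minimal sketch shows how they might be instantiated and compared on a single session; the toy matrices below stand in for the real term-count features and party labels.

import numpy as np
from sklearn.linear_model import (LogisticRegression,
                                  PassiveAggressiveClassifier,
                                  Perceptron, SGDClassifier)

# Toy stand-ins for one session's features (term counts) and party labels
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 50)), rng.integers(0, 2, 200)
X_test,  y_test  = rng.random((50, 50)),  rng.integers(0, 2, 50)

# The four classifiers, keyed by the legend abbreviations in Figure 8
classifiers = {
    "SAG":    LogisticRegression(solver="sag"),  # logistic regression via SAG
    "SGD":    SGDClassifier(loss="hinge"),       # SGD with hinge loss
    "PCPT":   Perceptron(),
    "passAg": PassiveAggressiveClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(name, clf.score(X_test, y_test))  # held-out accuracy per algorithm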
Figure 8: Estimates of parliamentary polarization, by session, by algorithm. Legend abbreviations: logistic regression using stochastic average gradient descent (SAG); stochastic gradient descent classifier (SGD); perceptron (PCPT); passive aggressive (passAg). Notice that performance is essentially identical across algorithms.
Supporting Information D  Validation and Misclassification
We wish to establish that our speech-based metric measures something useful and meaningful, like ideology. To do this, we looked in some detail at the period 1997–2001 for the parliamentary Labour party, which has been studied extensively both quantitatively (Spirling and Quinn, 2010) and qualitatively (Cowley, 2002). In particular, we study the first full year of the Labour government, 1998. If the individual-level measures are valid, it should be the case that cabinet ministers and loyal backbenchers—generally New Labour types—appear relatively distant from more rebellious MPs who routinely defied the whip in roll call voting. In Table 4 we report evidence in line with that requirement. In particular, of the 302 Labour members for whom we have (well-defined) estimates, we see that cabinet members, such as Alan Howarth and Tony Blair, are at one end of the spectrum, while serial rebels like Tony Benn and Jeremy Corbyn are at the other. In between them are independent-minded non-cabinet members such as Keith Vaz and Roger Berry.
Name               Estimated Position   Rank
Alan Howarth              0.12             1
David Blunkett            0.16            11
Tony Blair                0.17            14
Robin Cook                0.18            20
Keith Vaz                 0.21            67
Rosie Winterton           0.23            87
Roger Berry               0.23            89
Dennis Skinner            0.27           170
Jeremy Corbyn             0.31           213
Tam Dalyell               0.37           278
Tony Benn                 0.39           288
Table 4: Some Labour MPs from ‘most Labour’ to ‘least Labour’ in 1998.
Notice that the classifier performs in a particular way: it is generally more successful at classifying core government personnel (in the sense that they are estimated to be most 'Labour-ish') than the rebels. This poses no problem per se for our regression results, insofar as what matters for them is the relative position of MPs within their own party—i.e. whether they are far from or close to others—not their specific values on the scale.
Supporting Information E  Structural Break Details
As noted above, we looked for structural breaks in the sense of Bai and Perron (2003), as implemented by Zeileis et al. (2002). Using standard defaults, which in this case means a minimal segment length of 15 percent of the data, this dynamic programming method seeks out (multiple) points at which the regression coefficients (in our case, the intercept) shift in value.
In Table 5 we give the Bayesian Information Criterion statistic for each number of break
points (m), up to 6. By this goodness-of-fit measure, the regression with three break points
is optimal (BIC is lowest).
m        0         1         2         3         4         5         6
BIC   -302.01   -371.39   -408.57   -413.78   -408.50   -400.05   -389.74
Table 5: Structural breaks in polarization time series.
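For intuition, the heart of the procedure is a dynamic program that, for a given m, chooses break dates minimizing the total within-segment sum of squared errors of the intercept-only model, with the BIC then selecting m. The sketch below is an illustrative reimplementation of that logic under one common BIC definition, not the strucchange code we actually used; the toy series is a hypothetical stand-in for the per-session polarization estimates.

import numpy as np

def breakpoints(y, m, h):
    """Best m breaks for an intercept-only model: minimize total
    within-segment SSE, with each segment at least h observations long."""
    n = len(y)
    cost = np.full((n + 1, n + 1), np.inf)  # cost[i, j]: SSE of y[i:j]
    for i in range(n):
        s = ss = 0.0
        for j in range(i + 1, n + 1):
            s += y[j - 1]
            ss += y[j - 1] ** 2
            if j - i >= h:
                cost[i, j] = ss - s * s / (j - i)
    dp = np.full((m + 1, n + 1), np.inf)    # dp[k, j]: k+1 segments on y[:j]
    arg = np.zeros((m + 1, n + 1), dtype=int)
    dp[0] = cost[0]
    for k in range(1, m + 1):
        for j in range(1, n + 1):
            cand = dp[k - 1, :j] + cost[:j, j]
            i = int(np.argmin(cand))
            dp[k, j], arg[k, j] = cand[i], i
    breaks, j = [], n                        # backtrack the break dates
    for k in range(m, 0, -1):
        j = arg[k, j]
        breaks.append(j)
    return sorted(breaks), dp[m, n]

def bic(y, m, h):
    """n*log(SSE/n) plus a penalty counting the m+1 segment means and
    m break dates as parameters (one common BIC variant)."""
    n = len(y)
    _, sse = breakpoints(y, m, h)
    return n * np.log(sse / n) + (2 * m + 1) * np.log(n)

# Toy series with two level shifts; h is roughly 15 percent of the series
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.55, 0.02, 25),
                    rng.normal(0.65, 0.02, 25),
                    rng.normal(0.58, 0.02, 25)])
h = int(0.15 * len(y))
best_m = min(range(5), key=lambda m: bic(y, m, h))
print(best_m, breakpoints(y, best_m, h)[0])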
Supporting Information F  Models with Interactions
It is straightforward to include an interaction effect in our linear models; that is, to include a term calculated as Conservative×New Cohort. Since both components are binary, such a variable allows for the possibility that the effect of party is asymmetric across cohorts. Because our focus here is on the relative model fit of such specifications compared to our main (non-interaction) regressions, and on the relative point estimates, we do not correct the standard errors (with the exception of one variable in one regression, clustering makes no difference at p < 0.05 anyway).
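In formula-based software, the interaction model is a one-line change. A minimal sketch using Python's statsmodels follows; the data frame and its column names are illustrative stand-ins, and `*` in the formula expands to both main effects plus their product.

import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in: one row per MP-session within a breakpoint window
df = pd.DataFrame({
    "score":        [0.59, 0.62, 0.55, 0.58, 0.61, 0.57, 0.60, 0.56],
    "new_cohort":   [0, 0, 1, 1, 0, 1, 0, 1],
    "conservative": [0, 1, 0, 1, 0, 1, 1, 0],
    "session":      [1, 1, 1, 1, 2, 2, 2, 2],
})

# 'new_cohort * conservative' fits both main effects and the interaction
model = smf.ols("score ~ new_cohort * conservative + session", data=df).fit()
print(model.params["new_cohort:conservative"])  # the interaction estimate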
Table 6: Re-estimating our linear models to allow for interaction effects between party and
cohort.
                            Breakpoint 1   Breakpoint 2   Breakpoint 3
(Intercept)                     0.59*          0.55*          0.63*
                               (0.00)         (0.00)         (0.01)
New Cohort                      0.04*         −0.01*         −0.02*
                               (0.00)         (0.00)         (0.01)
Conservative                   −0.01*          0.03*          0.05*
                               (0.00)         (0.00)         (0.00)
Session                         0.00*          0.00           0.00
                               (0.00)         (0.00)         (0.00)
New Cohort×Conservative        −0.03*          0.04*         −0.04*
                               (0.00)         (0.00)         (0.01)
N                               24048          30739          19248
R2                              0.034          0.099          0.055
adj. R2                         0.034          0.099          0.055

Standard errors in parentheses; * indicates significance at p < 0.05.
From the perspective of the adjusted R2, the model with the interaction adds essentially nothing over our original specifications in the case of Breakpoint 2 (original adjusted R2 = 0.092) and Breakpoint 3 (0.053). In the case of the first breakpoint, the fit is slightly better with the interaction term, moving from an adjusted R2 of 0.026 to 0.034.
To get a sense of the substantive implications of the new term, in Table 7 we present the predicted values for the three possibilities (New Cohort–Labour, Old Cohort–Conservative, New Cohort–Conservative) relative to Labour members not in the new intake. That is, Labour members who take a 'New Cohort' value of zero are our baseline (of zero), and the entry for any other group is its point estimate—over or under—relative to that set of people.
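Concretely, each entry in Table 7 is a sum of the relevant coefficients from Table 6. For the new-intake Conservatives in the Breakpoint 2 (Thatcher) segment, for instance, the prediction relative to the baseline is

    New Cohort + Conservative + New Cohort×Conservative ≈ −0.01 + 0.03 + 0.04 ≈ 0.06,

which matches the 0.064 in Table 7 up to the rounding of the coefficients reported in Table 6.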
Table 7: Estimated polarization score of various groups relative to non-New Cohort Labour members.
                          Breakpoint 1   Breakpoint 2   Breakpoint 3
Old, Labour (baseline)        0.000          0.000          0.000
New, Labour                   0.040         −0.006         −0.023
Old, Conservative            −0.006          0.029          0.054
New, Conservative            −0.000          0.064         −0.007
Running through these estimates chronologically, we note that incoming Labour MPs had higher average polarization than their senior colleagues around the time of the postwar consensus, while Tories of all vintages were less polarized. For the Thatcher era, both sets of Conservatives had higher average polarization scores than senior Labour MPs (perhaps reflecting the idea that the 'Thatcher effect' was profound for all Tory MPs). Finally, for the Blair era, we see that incoming Conservative and incoming Labour MPs had predicted polarization values slightly lower than those of incumbent Labour members.
Supporting Information G  Robustness I: Efron/HC3 Correction
A natural concern with modeling approaches such as ours is that the dependent variable is itself estimated, which introduces heteroskedasticity in the sense that we have more accurate estimates for some MPs than for others. Partly to ameliorate this possible problem, we excluded very short speeches, as explained above; further, over 82 percent of our observations (at the member-session level) involve MPs making at least 10 speeches. In addition, state-of-the-art advice on such matters is to use White or Efron heteroskedasticity-robust standard errors in estimation (Lewis and Linzer, 2005). In the above analysis we clustered on MP, but it is no problem to switch the specification to HC3 (Efron), and we do just that below. Importantly, the results remain the same: the relevant variables are statistically significant as before.
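Switching covariance estimators is a one-argument change in most modern software. A minimal statsmodels sketch follows; as before, the data frame and its column names are illustrative stand-ins for our data.

import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in: one row per MP-session, with an MP identifier for clustering
df = pd.DataFrame({
    "score":        [0.60, 0.63, 0.55, 0.58, 0.61, 0.57,
                     0.59, 0.62, 0.56, 0.64, 0.58, 0.60],
    "new_cohort":   [0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0],
    "conservative": [0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    "session":      [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "mp_id":        [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6],
})

model = smf.ols("score ~ new_cohort + session + conservative", data=df)
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": df["mp_id"]})
hc3 = model.fit(cov_type="HC3")  # the Efron/HC3 correction reported in Table 8
print(hc3.bse)                   # HC3 standard errors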
                   Breakpoint 1        Breakpoint 2   Breakpoint 3
                   Postwar Consensus   Thatcher       Blair Effect
(Intercept)            0.602*             0.547*         0.632*
                      (0.002)            (0.002)        (0.007)
New Cohort             0.023*             0.016*        −0.045*
                      (0.002)            (0.002)        (0.004)
Session                0.000*             0.001          0.000
                      (0.000)            (0.000)        (0.000)
Conservative          −0.022*             0.039*         0.051*
                      (0.001)            (0.001)        (0.002)
N                      24048              30739          19248
R2                     0.026              0.092          0.053
adj. R2                0.026              0.092          0.053

Standard errors, Efron consistent, in parentheses; * indicates significance at p < 0.05.
Table 8: Linear Regression of individual polarization estimates on cohort indicator, session
number and party identification variables. Each column refers to a different segment of the
data, demarcated by the relevant change point. This specification uses Efron HC3 standard
errors.
Supporting Information H  Robustness II: Post-breakpoint(s) only
In our main regressions above, for any given change point, we compare the MPs of different cohorts across all the relevant sub-data. So, for example, in the case of the Thatcher break point, we are comparing the behavior of those elected prior to the change point with the behavior of those same people after the breakpoint, and with the behavior of the new cohort (after the break point). This may be problematic if there is large-scale replacement at or near the breakpoints. At the extreme, one can imagine comparing an 'old cohort', all of whom served prior to the breakpoint, with a 'new cohort', all of whom serve after it. In that situation, any cohort effect is clearly confounded with serving at different times. With this in mind, we re-specify our models using just the data after each break. This inevitably reduces the number of observations available, but it forces the regression to compare 'like with like' in terms of service. We report the results of such regressions below. There is some sign flipping in the coefficients but, importantly, cohort remains a statistically significant variable in all specifications.
                 Breakpoint 1   Breakpoint 2   Breakpoint 3
(Intercept)          0.61*          0.48*          0.98*
                    (0.00)         (0.02)         (0.07)
New Cohort           0.02*          0.01*          0.02*
                    (0.00)         (0.00)         (0.01)
Session             −0.00*          0.00          −0.00*
                    (0.00)         (0.00)         (0.00)
Conservative        −0.02*          0.12*         −0.10*
                    (0.00)         (0.00)         (0.00)
N                    17455          13284           5964
R2                   0.02           0.25           0.17
adj. R2              0.02           0.25           0.17

MP-clustered standard errors in parentheses; * indicates significance at p < 0.05.
Table 9: Checking robustness by restricting the data to observations occurring after the relevant breakpoint.