A Measurement Model for Synthesizing Multiple
Comparative Indicators: The Case of Judicial
Independence
Drew A. Linzer
Assistant Professor
Department of Political Science
Emory University
Tarbutton Hall
1555 Dickey Drive
Atlanta, GA 30322
[email protected]
phone: 404-727-0697
fax: 404-727-4586
Jeffrey K. Staton
Associate Professor
Department of Political Science
Emory University
Tarbutton Hall
1555 Dickey Drive
Atlanta, GA 30322
[email protected]
phone: 404-727-6559
fax: 404-727-4586
Word count: 8418
Abstract
We introduce a statistical measurement model for uncovering latent concepts commonly encountered in time series, cross-sectional analyses in comparative politics and
international relations. Our approach addresses unique challenges that arise in these
data: temporal dependence and boundedness in the latent quantity, and substantial
missing data and measurement error in the observable indicators. Unlike in traditional
scaling applications, comparative indicators are not produced by the units themselves
but rather by independent teams of scholars seeking to measure the concept using
more or less similar information. Our model accommodates these differences, enabling
scholars to estimate latent variables in a manner directly suited for comparative TSCS
research. The resulting measures match conceptual definitions with both greater validity and improved reliability. We apply the model to generate a new measure of judicial
independence, which is available for 50 years and 200 states, and which solves several
vexing problems that undermine the utility of existing indicators.
Introduction
The concept of judicial independence has been invoked as a key causal variable across comparative political science for outcomes ranging from regime stability to economic development and the protection of human rights (e.g., Gibler and Randazzo, 2011; La Porta et al.,
2004; North and Weingast, 1989). Researchers have also sought to understand the determinants of judicial independence (e.g., Ginsburg, 2003; Helmke, 2005; Vanberg, 2005; Hayo
and Voigt, 2007; Clark, 2010; Carrubba, Gabel and Hankla, 2008); and, in light of its importance to various aspects of human welfare, the international community spends considerable
resources each year promoting judicial reform and tracking its success (Carothers, 2006).
Despite this attention, researchers have continued to face vexing measurement challenges
that complicate efforts to answer even very basic questions. The key obstacle is that judicial
independence—like many concepts in comparative politics and international relations—is
not directly observable. It is latent.1
One response to this challenge relies on elements of judicial behavior (e.g., the choice of
a peak court to invalidate a law), which under the right conditions, and under a particular
theoretical logic, can reflect a form of independence.2 Although plausible for a limited set
1
We seek to measure a behavioral concept of independence, though the point is no less
true for de jure concepts, which attempt to capture the set of institutions that incentivize
independent judging. Although constitutional terms such as fixed tenure are typically directly
observable, the incentives these rules are supposed to create are latent.
2
For example, Ríos-Figueroa (2007) uses decisions against the PRI in Mexico to evaluate
the “effectiveness” of the Mexican Supreme Court. To be effective, a court must first be
willing to challenge rulers, which depends on having an “independent” view of the record.
Similarly, Helmke (2005) uses Supreme Court decisions against the sitting government in
Argentina to evaluate the extent to which judges’ decisions are disconnected from government
preferences (i.e. independent) over case outcomes. Of course, simply observing a court strike
down a law does not imply that the court is acting “independently.” Whittington (2005)
of states, the approach suffers from the fact that there is nothing close to a representative
sample of judicial decisions for all countries, much less over time and for many courts in
a state’s system.3 Recognizing this limitation, scholars looking for global measures of judicial
independence have relied upon expert assessments or non-judicial proxies, which are available
for many states and years. Yet using a single expert assessment or proxy can be hard to
justify when plausible alternatives exist. In addition, these variables contain significant
measurement error (Haggard, MacIntyre and Tiede, 2008), and many only offer incomplete
coverage of countries and years. These limitations are likely to bias estimates of causal
effects, and inflate confidence about the robustness of researchers’ findings (Ríos-Figueroa
and Staton, Forthcoming). Using multiple indicators, on the other hand, raises questions
about how to aggregate them sensibly and without substantial loss of information.
Whether we care to explain judicial independence, track its evolution over time, or evaluate its effects on human welfare or state choices in the international system, we must do
a better job measuring it. In this paper, we propose a solution to the measurement problem using a statistical measurement model that systematically combines information from
multiple observable indicators into a single unified measure. Rather than attempt to find
the perfect indicator, we argue that there are compelling reasons to draw out the underlying
commonalities in all of them. We introduce a dynamic heteroscedastic graded response item
response theory (IRT) model for time series, cross-sectional (TSCS) data that is applicable
reminds us that leaders can prefer to use their courts to strike down policies that they
themselves did not wish to pass in the first place. This does not imply, however, that
decisions say nothing about independence. Armed with a strong theoretical argument that
identifies conditions under which judges ought to confront strong pressures that undermine
independence, observing decisions can be informative.
3
The National High Courts Database (Haynie et al., 2007) has expanded significantly the
field’s ability to conduct studies of judicial decision-making in a truly comparative setting,
but it only contains data on eleven states.
to a wide variety of latent concepts in the study of international relations and comparative
politics—e.g., level of democracy, degree of corruption, or military resolve.4 As when seeking
to estimate any latent variable, we appeal to features of the world that we can observe to
infer characteristics we cannot (Treier and Jackman, 2008, 203).
We demonstrate that there are core differences between the assumptions underlying traditional IRT models, and the process by which indicators of latent concepts in comparative
TSCS research emerge, which together necessitate a new approach. Our model draws upon
the unique features of TSCS data to generate latent variable measures that are superior
to those from current methods. Since consecutive years’ observations are not independent,
the model constrains the latent variable to trend smoothly within countries over time. The
model is sensitive enough to uncover both gradual changes and abrupt shifts in the latent
characteristic from year to year. It also seamlessly accommodates indicators with varying
amounts of data missingness and measurement reliability. The model further estimates the
difficulty of measuring the latent concept in each country, and by so doing, raises questions
about the usefulness of the latent concept in all locations. Finally, it constrains estimates
of the latent variable to lie on a bounded interval. This not only reflects the idea that
many comparative concepts themselves have natural boundaries; but also that countries at
the extremes of the scale should be the easiest ones to measure. Models that assume an
unbounded scale place the greatest estimation uncertainty around countries at the ends of
the scale, which we consider unrealistic in a comparative setting.
The model that we describe will enable researchers to uncover the common element in
any series of indicators of a latent variable of interest, assess which of these factors are
more or less informative, and characterize the uncertainty in the resulting inferences. Yet
as we emphasize, the successful use of our method requires that researchers first present a
theoretically valid conceptualization of how the latent variable is expected to manifest, and
4
The term graded response in the context of item response theory refers to models for
outcome variables that are measured at the ordinal level (Johnson and Albert, 1999, ch. 7).
why. As such, our model does not replace careful theoretical argumentation; in fact, it
depends upon it.
We illustrate with application to the measurement of judicial independence, constructing
a unified measure that is available for 200 states over 50 years. We also provide estimates
of uncertainty for each latent value of judicial independence. The resulting measure is both
more complete and more theoretically valid than any existing, individual indicator.5 Our
model further permits the calibration of extant indicators, so that a researcher who wishes to
use any one of them—rather than our scale—will be able to assess those ratings in relation
to the others.
Measuring Latent Concepts
There are three primary methodological options for quantifying a concept that is not directly
observable. First, we can appeal to a single proxy operationalization, as when democracy is
measured by a regime score (e.g., Przeworski et al., 2000) or when the strength of a party is
measured by its age (e.g., Stokes, 2001). As we discuss above, when multiple, plausibly good
alternatives exist, the choice is not obvious. And unless we believe that measurement error
is extremely small or that a particular indicator matches our theoretical needs particularly
well, asking which indicator to use poses the wrong question.
A second approach involves using a deterministic set of rules for systematically combining information from related variables into a single category or scale. One simple but often
effective version of this technique is to average together a battery of distinct measures that
are each believed to be manifestations of the underlying, latent concept (e.g., Ansolabehere,
5
In a supplementary appendix, we describe some current and potential research uses
for the new measure. We also illustrate how our measurement model might be fruitfully
applied to another latent concept, regime type, which highlights advantages of our approach
in comparison to the scaling model proposed by Pemstein, Meserve and Melton (2010).
Rodden and Snyder, 2008). A variety of methods for eliciting and pooling expert ratings
are also available (O’Hagan et al., 2006). The benefit to this approach is that the resulting
scale will be directly interpretable as a measure of the latent concept—assuming that the
aggregation procedure rests on a solid theoretical foundation. In certain applications, however, the association between the latent variable and each of the manifest indicators is not
sufficiently evident a priori to use such descriptive techniques. This may be because some
manifest variables are better or worse indicators of the underlying scale; or because there is
a large amount of measurement error or stochastic noise in the observed indicators. We may
also lack sufficient data to reliably implement the aggregation procedure for the entire set of
observations.
We advocate a third option: a statistical measurement model that allows the data to
“reveal” their own underlying structure, including the relative importance of each observed
indicator to the latent quantity. Measurement models specify a mathematical function linking the latent concept to the manifest variables probabilistically, with parameters estimated
from the data. Scaling models such as the Rasch or item response model are examples
of measurement models that infer a latent continuum from multiple categorical variables
(van der Linden and Hambleton, 1997; Fox, 2010). Principal component and factor analytic models perform dimensional reduction, converting a series of continuous variables into
a smaller number of scales that preserve common information (Joliffe, 2002; Quinn, 2004;
Arel-Bundock and Mebane, 2011). These methods offer certain advantages over more basic
descriptive approaches, but they must also be justified and interpreted with care.
Latent Concepts in TSCS Data
Comparative time series cross-sectional data, in which the same set of countries is measured repeatedly over time, share a set of features that complicate the use of standard
measurement models. To date, most IRT models in political science have been used for
estimating the latent ideologies of legislators, judges, or political parties from recorded votes
(e.g., Bailey, 2001; Martin and Quinn, 2002; Clinton, Jackman and Rivers, 2004b; Voeten,
2007; Shor and McCarty, 2011), or of voters from survey responses (e.g., Jessee, 2009; Bafumi
and Herron, 2010). In these cases, the observable manifestations of the latent characteristic
are produced directly by the units of study: votes are cast; questions are answered. Although
researchers may have an idea of what the latent dimension underlying these outcomes represents, this meaning is ultimately not revealed until after fitting the model and interpreting
the results. How each of the indicators will relate to the underlying dimension—in either
magnitude or direction—is also not necessarily known in advance. In comparative research,
by contrast, the indicators of the latent concept are not produced by the units themselves,
but rather by teams of scholars attempting to assess the underlying concept. The meaning
of the latent dimension is fixed a priori with reference to some theory or definition, and the
indicators are selected on the basis of that theory.6
A number of complications emerge from this important distinction in the data-generating
process. Most alarmingly, individual researchers need not share the same conceptual definition of the latent concept. Even if they do, they may differ over what should constitute
valid indicators. Some research teams may simply be better at measuring certain concepts,
perhaps especially so in particular regions or political contexts. And of course, some states
may be more difficult to measure than others. On the other hand, because the meaning of
the scale is known, researchers will often have a relatively clear idea of which countries belong
at the top or bottom of the scale, in a way they might not when trying to determine who
is the most “conservative” or “liberal” senator, for example (Clinton, Jackman and Rivers,
2004a).
Latent concepts in comparative research, measured by state and year, are also likely
to change in a relatively consistent manner over time. Manifest variables, in contrast, will
6
For an example of this research design in the context of measuring regime type, see
Treier and Jackman (2008) and Pemstein, Meserve and Melton (2010).
contain a greater amount of noise, and be more prone to idiosyncratic yearly fluctuations.
Applying measurement models that assume independent and identically distributed observations within countries ignores the temporal element of these data and does not necessarily
provide for a smoothed latent trend. Simply averaging together multiple indicators will also
not produce smooth year-to-year trends in the latent variable, unless the researcher has
access to a large number of manifest variables that are consistently observed across most
countries and years.
Another consideration is that many latent variables in comparative politics and international relations research have natural lower and upper bounds. Income cannot be more
equally distributed than on a perfectly equal basis, nor can it be less equally distributed
than when a single person controls all of a state’s resources. The ratio between the size of
the winning coalition and the selectorate must lie on the unit interval (Bueno de Mesquita
et al., 2003). A central bank or a judiciary can only be so dependent or independent of a
sitting government. At some point, we should treat bankers and judges as either the government itself or, well, themselves. Moreover, as we show, even if the underlying concept is
unbounded, bounding the latent variable may do little harm to the estimates and produce
more sensible estimates of uncertainty.
It is also unlikely that countries are equally easy to measure. One possibility is that
certain countries (perhaps during particular periods) are simply more poorly understood.
This may be because the set of people knowledgeable about State A (e.g., Russia) is far
larger than that for State B (e.g., Suriname). It may be because some countries collect and make
available more (and potentially more detailed) information than others. Or it may be that a
concept is more relevant to politics in one place than another. For example, it may be easier
to measure “military resolve” in Israel or the United States than in Costa Rica, since the
former countries are commonly involved in international conflicts, whereas Costa Rica does
not have an army.
Finally, from a practical perspective, the availability of manifest indicators of various
latent concepts in comparative data is highly uneven. There are typically few theoretically
relevant manifest variables per latent variable to begin with—perhaps no more than ten
or fifteen. Those that do exist rarely span the entire range of countries or years under
investigation. Missing data in most comparative indicators is extensive (Honaker and King,
2010). With such sparse data, a measurement model is preferred that will be robust to the
presence of intervals when data are limited or nonexistent. Our approach addresses each of
these issues.
Model Specification
We describe a latent variable measurement model for comparative TSCS data that estimates
a smoothly trending value along a unidimensional, bounded interval scale. It therefore allows
for a more precise match between conceptual definitions and their empirical manifestations.
The latent variable $x_{kt}$ varies across both countries $k = 1 \ldots K$ and years $t = 1 \ldots T$.
Although $x$ is unobserved, we assume there to be a series of $R$ observed variables $y^r$, also
measured at the country-year level, that can be taken as indicators, or ratings, of the latent
concept $x$. Individually, each $y^r$ is an imperfect and incomplete measure of the latent concept;
but together, the $y^r$ are able to reveal variation in the level of $x$ across countries and over
time. Our model provides a statistical mechanism for combining the observed $y^r_{kt}$ to produce
reliable estimates of the underlying $x_{kt}$.
The indicators $y^r$ that are used to assess $x$ are chosen by the analyst on the basis of
a prior theoretical expectation about the implications of larger or smaller values of $x_{kt}$ in
the world. Clearly, if we have access to multiple indicators that are specifically designed to
capture a particular latent concept, then we are well-advised to use them. But we might
also consider proxy indicators, especially those that our theories suggest are manifestations
of the underlying concept. For example, states with higher levels of democracy may exhibit
more frequent leadership change. States with relatively high levels of social capital may have
larger participation (per capita) in amateur sports clubs. States with lower levels of judicial
independence might exhibit politically motivated purges or a pattern of decision-making
that is highly sensitive to government preferences. In none of these cases are the observable
manifestations of the concept equivalent to the concept itself; however, we can have strong
theoretical or conceptual justifications for including them nonetheless. We assume only that
$x$ and $y^r$ are positively associated (otherwise, treating $y^r$ as a manifestation of $x$ is likely
incorrect); and that 1) certain indicators may be better or worse measures of the latent
variable, and 2) certain countries may be more or less reliably measured by those indicators.
This could be due to the inherent difficulty of learning about $y^r$ in country $k$, or the possibility
that the theory relating $x$ to $y^r$ does not apply equally well in all countries. We require no
other micro-level assumptions about the actual process by which $x_{kt}$ results in $y^r_{kt}$.
It is most common in quantitative comparative research for the $y^r$ to be measured as
ordinal-level scales. When manifest variables are recorded at the interval level, they are
often bounded (from above, below, or both). As noted by Pemstein, Meserve and Melton
(2010, 433), “although these scores take on many values and thus resemble interval scales,
they do not necessarily provide interval-level information” about the latent variable. The
main concern is that the relationship between a continuous $y^r$ and the latent $x$ may not be
linear. Following Pemstein, Meserve and Melton (2010), we therefore convert any continuous
$y^r$ into ordinal rankings. Variables containing more observations can be partitioned more
finely into larger numbers of discrete categories, with minimal loss of information. This also
preserves a consistent interpretation of model parameters across the set of manifest variables.
The results of our analysis are robust to the categorization rule.
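The conversion of a continuous indicator into ordinal categories can be sketched as follows. This is a minimal illustration that assumes equal-frequency (quantile) cut points; the rule itself, the function name `to_ordinal`, and the example values are hypothetical and are not the categorization used in this paper.

```python
import numpy as np

def to_ordinal(values, n_categories):
    """Map a continuous indicator to ordinal categories 1..n_categories.

    Cut points are empirical quantiles of the observed (non-missing) values;
    missing values (NaN) stay missing. The quantile rule is an assumption
    made for illustration only.
    """
    values = np.asarray(values, dtype=float)
    mask = ~np.isnan(values)
    # Interior quantile levels, e.g. 1/3 and 2/3 for three categories.
    probs = np.linspace(0, 1, n_categories + 1)[1:-1]
    cuts = np.quantile(values[mask], probs)
    out = np.full(values.shape, np.nan)
    # searchsorted gives 0..n_categories-1; shift to 1-based categories.
    out[mask] = np.searchsorted(cuts, values[mask], side="right") + 1
    return out

# Hypothetical bounded indicator with two missing country-years.
cim = np.array([0.15, 0.42, np.nan, 0.77, 0.91, 0.63, np.nan, 0.88])
ordinal_cim = to_ordinal(cim, n_categories=3)
```

Because the mapping is monotone, the ordering of country-years on the original indicator is preserved, which is the only information the graded response model uses.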
To link the latent $x_{kt}$ to the manifest $y^r_{kt}$, we specify a heteroscedastic graded response
IRT model. A series of item coefficients, $\beta_r$, capture the reliability, or discrimination, of
indicator $r$ as a measure of $x$. Treier and Jackman (2008, 205) describe this parameter as
“the extent to which variation in the scores on the latent concepts generates different response
probabilities” in the outcome variables. Larger estimates of $\beta_r$ reveal a closer relationship
between $x$ and $y^r$. The inverse of this parameter has an equally intuitive interpretation as
the “personal error variance” of rater $r$ (Pemstein, Meserve and Melton, 2010, 431). A more
“noisy” relationship between $x$ and $y^r$ is indicated by estimates of $\beta_r$ that are closer to zero.
A second set of coefficients, $\gamma_k$, reflect the overall reliability of indicators $y^r$ in country $k$.
The $\gamma_k$ act as a country-level multiplier for the $\beta_r$. If raters evaluate the manifest variables in
an inconsistent manner in country $k$, or if those variables are all more weakly associated with
the latent variable itself in that country, then smaller values of $\gamma_k$ attenuate the effects of $\beta_r$.7
Countries in which the latent variable tracks more closely with the manifest variables will
have larger $\gamma_k$. The idea is similar to the heteroscedastic IRT model developed by Lauderdale
(2010), which applies when certain observational units respond more “unpredictably” to
measurement by various manifest indicators. Since we assume that $x$ and $y^r$ are positively
associated, we restrict both $\beta_r \geq 0$ and $\gamma_k \geq 0$. This also serves as an identifying restriction
for the rotational invariance of the underlying scale.
Denote as $M_r$ the total number of outcome categories for the $r$th manifest variable.
Also let $\tau_{rm}$ represent the threshold values for item $r$ in the graded response model, with
$m = 1 \ldots M_r$. The $\tau_{rm}$ divide adjacent ratings on the latent scale, subject to the constraint
that $\tau_{rm} > \tau_{r(m-1)}$. Then the link function is written:
$$\Pr(y^r_{kt} = m) = \mathrm{logit}^{-1}\left[\beta_r \gamma_k (\tau_{rm} - x_{kt})\right] - \mathrm{logit}^{-1}\left[\beta_r \gamma_k (\tau_{r(m-1)} - x_{kt})\right]. \tag{1}$$
7
An alternative specification might also consider the possibility that it is years, rather
than countries, that are better- or worse-measured by the manifest variables. Generalizing
the model to include both country- and year-reliability parameters would require a large
amount of additional data to produce meaningful estimates; as such, we do not consider this
extension.
As $x_{kt}$ increases, so does the probability of observing larger-numbered outcomes $m$ on the
manifest $y^r_{kt}$. The only observed values in equation 1 are the $y^r_{kt}$ on the left-hand side: all
other parameters are estimated by the model. We fix $\tau_{r0} = -\infty$ and $\tau_{rM_r} = \infty$, and estimate
the remaining $M_r - 1$ threshold parameters for each $y^r$. The estimated threshold levels $\tau_{rm}$
will align the observed ratings across the manifest variables, and the spacing between the
thresholds for each variable can help indicate which category distinctions are more or less
substantively meaningful, relative to the other $y^r$.
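To make the link function concrete, the following sketch evaluates equation 1 for a single indicator. The function names and all parameter values (the thresholds, the item coefficient 4.0, and the country coefficient 1.5) are hypothetical choices for illustration, not estimates from the model.

```python
import numpy as np

def inv_logit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-z))

def category_probs(x, beta, gamma, taus):
    """Probabilities of categories 1..M under equation 1.

    taus holds the M - 1 interior thresholds in increasing order. The outer
    thresholds are fixed at -infinity and +infinity, so the cumulative
    probabilities are padded with 0 and 1.
    """
    taus = np.asarray(taus, dtype=float)
    interior = inv_logit(beta * gamma * (taus - x))  # Pr(y <= m), m = 1..M-1
    cdf = np.concatenate(([0.0], interior, [1.0]))
    return np.diff(cdf)

taus = [0.3, 0.7]  # hypothetical thresholds for a three-category rating
low = category_probs(0.1, beta=4.0, gamma=1.5, taus=taus)
high = category_probs(0.9, beta=4.0, gamma=1.5, taus=taus)
```

In this sketch a country-year with a low latent value places most of its probability on the lowest category, and one with a high latent value places most of it on the highest category, which is the monotonic relationship the text describes.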
To estimate $x_{kt}$ and other auxiliary parameters of interest, we adopt a fully Bayesian
approach. Theoretical considerations suggest that $x$ is naturally bounded by a logical minimum and maximum value. Since $x$ is a conceptual variable, we arbitrarily place its lower
bound at zero (“none” of the latent characteristic) and its upper bound at one (“all” of the
latent characteristic). We achieve a smooth trend in our estimate of the latent variable by
assuming it to follow a Bayesian random walk prior process, a form of dynamic latent trait
model (e.g., Martin and Quinn, 2002; Rosas, 2009). Within country $k$, we assume that the
latent value $x$ in year $t$ has a Normal, but bounded, prior distribution that is centered at
the previous value of the latent variable in year $t - 1$:
$$x_{kt} \sim N(x_{k(t-1)}, \sigma_k^2)\, I(0, 1). \tag{2}$$
The notation $I(0, 1)$ indicates that $x_{kt}$ cannot fall outside the unit interval.8 For year $t = 1$ we
assume a non-informative Normal prior, also truncated at zero and one. The variance
parameters $\sigma_k^2$, which are estimated separately for each country, capture the amount of
temporal variation in $x_{kt}$. In countries where $x_{kt}$ is relatively unchanged from year-to-year—
typically due to countries remaining at the maximum or minimum level of $x$ for the entire
8
We prefer this specification to a Beta prior distribution (which is also bounded by zero
and one) because it is characterized by mean and variance parameters that are directly
interpretable as the quantities of interest: the latent value x, and its year-to-year variability.
period of observation—values of $\sigma_k^2$ will be close to zero. In countries where $x_{kt}$ experiences
more substantial or rapid yearly changes, $\sigma_k^2$ can be larger.9 Letting $\sigma_k^2$ vary by country
ensures that countries where $x_{kt}$ varies greatly are not over-smoothed by comparison to
countries where $x_{kt}$ is more stable. The $\sigma_k$ are each assigned vague uniform priors.
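The role of the country-specific variance can be illustrated by simulating from the prior in equation 2. This is only a sketch of the prior process, not of the estimation procedure: it assumes rejection sampling for the truncation, uses a uniform draw as a stand-in for the vague first-year prior, and the function name, seed, and values of sigma are hypothetical.

```python
import numpy as np

def simulate_latent_path(n_years, sigma, rng, x0=None):
    """Draw one path from a Normal random walk truncated to (0, 1)."""
    x = np.empty(n_years)
    # First year: a uniform draw stands in for the vague truncated prior.
    x[0] = rng.uniform(0.0, 1.0) if x0 is None else x0
    for t in range(1, n_years):
        # Rejection sampling: redraw until the proposal lies inside (0, 1).
        while True:
            draw = rng.normal(x[t - 1], sigma)
            if 0.0 < draw < 1.0:
                x[t] = draw
                break
    return x

rng = np.random.default_rng(0)
stable = simulate_latent_path(50, sigma=0.01, rng=rng, x0=0.9)
volatile = simulate_latent_path(50, sigma=0.10, rng=rng, x0=0.5)
```

A small sigma yields a nearly flat series, as for a country that stays near one end of the scale, while a larger sigma permits the sharper year-to-year movements that would otherwise be smoothed away.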
The Bayesian specification allows us to seamlessly handle the frequent occurrence of missing data (Jackman, 2000). In each country-year, up to $R$ manifest ratings $y^r_{kt}$ are observed.
Country-years with greater numbers of observed $y^r_{kt}$ will have more information from which
to update $x_{kt}$, based on the estimated $\beta_r$ and $\gamma_k$. When many $y^r_{kt}$ are missing, the posterior
estimate of $x_{kt}$ will be closer to its prior distribution. In cases where every $y^r$ is missing for a
given country-year, the random walk process bridges the gap by connecting estimates of $x_{kt}$
from the last year in which any $y^r$ were observed to those in the next year with an observed
value of $y^r$.
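The way missing ratings drop out of the update can be sketched for a single country-year on a grid of latent values. This is a deliberately simplified illustration: it treats the prior for that year as given, holds the item and country coefficients fixed at hypothetical values, and uses a grid approximation rather than the sampler a full Bayesian fit would require.

```python
import numpy as np

def inv_logit(z):
    """Inverse logit (logistic) function."""
    return 1.0 / (1.0 + np.exp(-z))

def rating_likelihood(y, grid, beta, gamma, taus):
    """Pr(y | x) from equation 1 at every x in grid, for a rating y in 1..M."""
    taus = np.asarray(taus, dtype=float)
    # Cumulative probabilities Pr(rating <= m) at the interior thresholds.
    interior = inv_logit(beta * gamma * (taus[None, :] - grid[:, None]))
    n = grid.shape[0]
    cdf = np.hstack([np.zeros((n, 1)), interior, np.ones((n, 1))])
    return cdf[:, y] - cdf[:, y - 1]

def grid_posterior(ratings, prior, grid, betas, gamma, taus_list):
    """Posterior over grid; ratings coded None contribute no likelihood."""
    post = np.array(prior, dtype=float)
    for y, beta, taus in zip(ratings, betas, taus_list):
        if y is None:  # missing indicator: skip, leaving the prior untouched
            continue
        post = post * rating_likelihood(y, grid, beta, gamma, taus)
    return post / post.sum()

grid = np.linspace(0.01, 0.99, 99)
prior = np.full(grid.shape, 1.0 / grid.size)
betas = [4.0, 2.0]               # hypothetical item coefficients
taus_list = [[0.3, 0.7], [0.5]]  # hypothetical thresholds per indicator

none_observed = grid_posterior([None, None], prior, grid, betas, 1.5, taus_list)
one_observed = grid_posterior([3, None], prior, grid, betas, 1.5, taus_list)
```

With both ratings missing the posterior equals the prior, while a single top-category rating shifts the posterior mass toward higher latent values.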
We finalize the model by placing prior distributions over $\beta_r$, $\gamma_k$, and $\tau_{rm}$. The threshold
parameters are assigned vague normal priors. To satisfy the constraint that $\tau_{rm}$ are increasing
over $m$, we re-sort the thresholds in each iteration of the estimation algorithm, as described
by Curtis (2010). For the item- and country-parameters $\beta_r$ and $\gamma_k$, we assume moderately
informative half-normal priors with variance 0.1. This ensures that both coefficient vectors
are strictly positive. When there are relatively few manifest variables, as in most comparative
research, sensibly chosen prior distributions can prevent coefficient estimates from increasing
indefinitely (Bailey, 2001). We find that any priors more diffuse than this may allow estimates
of $\beta_r$ and $\gamma_k$ to grow unrealistically large.
9
If there is reason to believe that a country has experienced a sudden, “known” break in
the time series—as when transitioning from dictatorship to democracy, for example—it may
be preferable to divide the country into two periods, estimate the latent variable separately
for each sub-unit, then reassemble those x values following estimation.
Measuring Judicial Independence
We consider a behavioral, or de facto, concept of judicial independence. A judge is independent insofar as her decisions reflect her evaluation of the legal record (autonomous
decision-making) and insofar as those decisions are respected by government officials who
disagree with them (effective decision-making). In aggregate, a judiciary’s independence
increases in the independence of its judges.10
The measurement model we develop is particularly appropriate in this context. As we
have already noted, extant indicators suffer from related patterns of missing data and measurement error, which our approach addresses. The concept is clearly bounded. And importantly, theories of judicial independence require that our measures are sensitive to both
abrupt and more gradual change. Indeed, one set of models of judicial independence suggests that the concept should evolve slowly, as judges carefully manage their case loads and
decisions (e.g., Helmke, 2005; Ginsburg, 2003). Yet another set of models suggests that independence should grow or shrink abruptly, either because of shifts in exogenous political
conditions (e.g., Ríos-Figueroa, 2007) or because critical pieces of new information radically
transform a court’s authority (e.g., Carrubba, 2009). To evaluate these claims, we need an
approach that is capable of revealing both types of change.
Indicators of Judicial Independence
To measure latent judicial independence as we conceive it, we make use of eight manifest
indicators that have previously been used to assess de facto judicial independence (Table 1).
In a review, Ríos-Figueroa and Staton (Forthcoming, 28) suggest that five of these variables
10
This reflects a common definition, though admittedly not the only one. Specifically, it
draws on the Cameron (2002) power concept of judicial independence. We are comfortable
labeling this concept judicial power, as well. A typical alternative merely requires autonomy;
see Ríos-Figueroa and Staton (Forthcoming) for a general conceptual discussion.
Variable      Measurement level      Years available  Percent missing  Source
Tate-Keith    ordinal; 3 categories  1990-2004        66%              Tate and Keith (2009)
Howard-Carey  ordinal; 3 categories  1992-1999        83%              Howard and Carey (2004)
CIRI          ordinal; 3 categories  1981-2009        38%              Cingranelli and Richards (2010)
XCONST        ordinal; 7 categories  1960-2008        19%              Marshall and Jaggers (2010)
CIM           interval; 0–1          1960-2000        37%              Clague et al. (1999)
Feld-Voigt    interval; 0–1          1980-2003        77%              Feld and Voigt (2003)
PRS           interval; 0–6          1984-2008        60%              Ríos-Figueroa and Staton (Forthcoming)
Fraser        interval; 0–10         1995, 2000-2005  93%              Ríos-Figueroa and Staton (Forthcoming)
Table 1: Eight variables used to scale latent judicial independence, and their availability.
are designed precisely to capture the concept of judicial independence that we have described.
For two of them, Fraser and XCONST, it was unclear which concept of judicial independence
was being measured. It also appears that the Howard-Carey indicator is designed to capture
“autonomy.” But, since an independent court under our concept must be autonomous,
Howard-Carey should provide relevant information.
Tate-Keith, Howard-Carey, and CIRI are all drawn from the U.S. State Department
Human Rights Country Reports. PRS, Feld-Voigt, and Fraser draw on international surveys
of country experts. Polity IV’s measure of executive constraints, XCONST, is coded by the
project’s team of experts. Finally, the Contract Intensive Money score (CIM) reflects the
proportion of money that is held in banking institutions.11 The logic of this proxy measure
is that individuals are more likely to keep their financial assets in banks when they believe
that a state’s institutions designed to protect property rights are credible. The judiciary is
central among institutions designed to do so.
Tate-Keith, Howard-Carey, CIRI, and Feld-Voigt each explicitly attempt to provide a
score for the independence of a state’s judiciary. In contrast, XCONST, Fraser, and PRS
are hybrid measures, reflecting a team’s evaluation of the judiciary along with other features
11
Specifically, CIM is “the ratio of non-currency money to the total money supply, or
(M2-C)/M2 where M2 is a broad definition of money supply and C is currency held outside
of banks” (Clague et al., 1999, 188).
of the legal system. For example, XCONST is designed to broadly reflect “constraints” on
government authority. PRS is a measure of law and order. Fraser provides information on the
rule of law, impartiality and the protection of property rights, as well as the independence of
the judiciary. On its own, each of the hybrid measures might be a less reliable indicator of
the underlying concept. However, because the underlying concept is latent and closely related to
the features of these measures other than judicial independence, their inclusion is reasonable.
We investigate 200 countries over the 50-year interval from 1960 to 2009. Because many
countries were not in existence for the entire study period, there are a total of 8,197 potentially observable country-years in the data set. Between the eight indicators, data coverage
is most consistent in the mid-1990s, and at least five variables are available in each year
between 1984 and 2004. If every manifest variable had been measured in every country-year,
there would be a total of 65,576 observed values of $y^r_{kt}$. In actuality, we observe only 26,939,
for an overall missingness rate of 59%. This missingness is distributed unevenly across the
eight indicators, with Fraser missing fully 93% of the possible country-years, and XCONST
observed for all but 19%.
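The overall missingness rate follows directly from the counts reported above. As a quick arithmetic check (the variable names are ours, chosen only for illustration):

```python
# Counts reported in the text.
country_years = 8197   # potentially observable country-years
n_indicators = 8       # manifest variables
n_observed = 26939     # indicator values actually observed

# Number of values that would exist if every indicator covered every country-year.
possible = country_years * n_indicators

# Share of those potential values that are missing.
missing_rate = 1 - n_observed / possible

print(possible)                # 65576
print(round(missing_rate, 2))  # 0.59
```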
The first four indicators were coded as ordinal level variables by their original authors.
We convert CIM into an eight-category measure with the first category for values lower than
0.3. The remaining seven categories bin observations into increments of 0.1 on the original
scale, up to 1. This variable is highly left-skewed, as only 1.6% of observed values fall into the
first group. Another 2.7% are in category two, and 4.6% are in category three. Feld-Voigt
and Fraser we divide into six categories of equal width from their minimum to maximum
value. Feld-Voigt is recorded as a single value that we assume to have held constant from
1980 to 2003. PRS is already nearly categorical (most values are integers from 1 to 6), so to
generate an ordinal measure, we round each rating up to the nearest whole number.
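The recoding rules just described can be sketched in a few lines. This is a minimal illustration rather than the code used for the analysis: the function names are hypothetical, and the treatment of values that fall exactly on a bin boundary is an assumption (each bin is taken to include its lower edge), since the text specifies only that values lower than 0.3 belong to the first CIM category.

```python
import bisect
import math

# Lower edges of CIM categories 2 through 8; values below 0.3 are category 1.
CIM_EDGES = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

def cim_category(value):
    """Map a CIM score in [0, 1] to one of eight ordered categories.
    ASSUMPTION: a value equal to a bin edge goes to the higher category."""
    return bisect.bisect_right(CIM_EDGES, value) + 1

def equal_width_category(value, lo, hi, n_bins=6):
    """Map a value in [lo, hi] to one of n_bins equal-width categories,
    where lo and hi are the indicator's observed minimum and maximum."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    index = int((value - lo) / (hi - lo) * n_bins)
    # The maximum itself would land in bin n_bins; fold it into the top bin.
    return min(index, n_bins - 1) + 1

def prs_category(value):
    """Round a PRS rating up to the nearest whole number."""
    return math.ceil(value)
```

For instance, `cim_category(0.65)` falls in the fifth category, and `equal_width_category` would be applied to Feld-Voigt and Fraser with each indicator's own observed range.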
Estimation and Model Fit
We estimate the full Bayesian model using a Markov chain Monte Carlo sampling procedure,
implemented in the WinBUGS and R software packages (Lunn et al., 2000; Sturtz, Ligges
and Gelman, 2005; R Development Core Team, 2011).12 Our analysis is based upon the
posterior distribution of parameters $x_{kt}$, $\beta_r$, $\gamma_k$, $\tau_{rm}$, and $\sigma^2_k$.
To evaluate the fit of the model, we compare the observed distribution of each indicator
variable $y^r$ to the predicted distributions based upon the model estimates. For each country
and year in which $y^r_{kt}$ is observed, we resample values of each parameter estimate from their
joint posterior distribution, and enter them into the data generating process in equation 1.
This produces one set of predicted values for each $y^r_{kt}$. For example, XCONST is observed
for 6,678 country-years, so we generate 6,678 hypothetical values of XCONST based upon
the fitted model. We then tabulate the frequency with which each rating is predicted for
each outcome variable, and repeat this simulation a large number of times. This produces a
posterior predictive distribution over the possible outcomes of all eight manifest variables.
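The simulation just described can be sketched as follows. This is a sketch under stated assumptions, not the WinBUGS/R implementation: equation 1 is not reproduced here, so the code assumes a generic ordered-logit (graded response) form, P(y > m) = logistic(β γ (x − τ_m)), with thresholds on the latent scale. The function names and the layout of a posterior draw are hypothetical.

```python
import math
import random

def category_probs(x, beta, gamma, taus):
    """Category probabilities under an ASSUMED ordered-logit form:
    P(y > m) = logistic(beta * gamma * (x - tau_m)), taus ascending."""
    exceed = [1.0 / (1.0 + math.exp(-beta * gamma * (x - t))) for t in taus]
    upper = [1.0] + exceed   # P(y >= m) for each category m
    lower = exceed + [0.0]   # P(y >= m + 1) for each category m
    return [u - l for u, l in zip(upper, lower)]

def simulate_frequencies(draw, rng):
    """One replicated data set for one indicator: predict a rating for
    every observed country-year from a single posterior draw, then
    tabulate how often each rating occurs."""
    n_cat = len(draw["taus"]) + 1
    counts = [0] * n_cat
    for x, gamma in zip(draw["x"], draw["gamma"]):
        probs = category_probs(x, draw["beta"], gamma, draw["taus"])
        rating = rng.choices(range(n_cat), weights=probs)[0]
        counts[rating] += 1
    return counts

def predictive_interval(replicates, category, level=0.95):
    """Central interval of the replicated frequency of one category."""
    values = sorted(rep[category] for rep in replicates)
    tail = (1.0 - level) / 2.0
    lo = values[math.floor(tail * (len(values) - 1))]
    hi = values[math.ceil((1.0 - tail) * (len(values) - 1))]
    return lo, hi

# Hypothetical posterior draws for one three-category indicator
# observed in 200 country-years (values invented for illustration).
rng = random.Random(0)
draws = [
    {"x": [rng.random() for _ in range(200)],
     "gamma": [1.0] * 200,
     "beta": 5.0,
     "taus": [0.3, 0.7]}
    for _ in range(100)
]
replicates = [simulate_frequencies(d, rng) for d in draws]
print(predictive_interval(replicates, category=0))
```

The model check then asks whether the observed frequency of each rating lies inside the corresponding interval.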
If the model fits the data, there should be no systematic discrepancies between the
observed $y^r$ and the posterior predictive distribution. This is what we find (Figure 1).
Of the 42 possible ratings, the observed rating frequency falls within the interval spanned
by 95% of the posterior predicted frequencies for 41, or 98%, of the ratings. The lone
exception is a slight under-prediction of the lowest level of XCONST. However, there is no
general indication that the model is prone to over- or under-predicting extreme ratings on
the manifest variables, a potential concern given the bounded nature of the latent variable.
Of the eight manifest variables in our analysis, CIM is arguably the most likely to report
information on a distinct concept. As we note above, to use CIM as a proxy for judicial
independence requires a model of how mass financial behavior responds to the performance
12
Convergence is assessed by visual inspection of a series of three chains for adequate
mixing and values of the Gelman-Rubin statistic R̂ ≈ 1 (Gelman and Rubin, 1992; Cowles
and Carlin, 1996; Brooks and Gelman, 1998; Gelman et al., 2004).
[Figure 1: eight panels (Tate-Keith, Howard-Carey, CIRI, XCONST, CIM, Feld-Voigt, PRS, Fraser), each plotting Rating Frequency against Rating Level.]
Figure 1: Comparison of the observed distributions of the manifest variables (gray bars) to
simulated values from the fitted model. Points denote the mean predicted frequencies; vertical
lines span 95% of posterior predicted values.
of the judiciary. This model may not be correct. We investigate the sensitivity of our
latent variable estimates to the information contained in the CIM variable by re-fitting
the model excluding CIM. Estimates of x̂kt from this specification are highly similar to
estimates from the original model. The correlation between the two sets of estimates is
above 0.95 in more than half of the countries, and 0.97 across the complete set of country-years. In future applications where missing data is especially pervasive, including additional
manifest variables can help guard against any single indicator having too great an influence
on estimates of the latent variable.
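A comparison of this kind reduces to correlating two sets of estimates, once within each country and once over the pooled country-years. The sketch below shows one way to compute both; the function names and the dictionary layout are hypothetical, and the example values are invented.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences.
    Undefined (division by zero) if either sequence is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def compare_fits(full, reduced):
    """full, reduced: dicts mapping country -> list of yearly estimates
    from the model with and without one indicator.
    Returns per-country correlations and the pooled correlation."""
    by_country = {k: pearson(full[k], reduced[k]) for k in full}
    pooled_full = [v for k in full for v in full[k]]
    pooled_reduced = [v for k in full for v in reduced[k]]
    return by_country, pearson(pooled_full, pooled_reduced)

# Invented estimates for two hypothetical countries.
full = {"A": [0.1, 0.2, 0.3], "B": [0.9, 0.8, 0.85]}
reduced = {"A": [0.12, 0.22, 0.32], "B": [0.88, 0.79, 0.86]}
by_country, pooled = compare_fits(full, reduced)
```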
A Latent Measure of Judicial Independence
The model produces estimated levels of judicial independence for every country and year in
our data set.13 As an initial validation of our results, we plot the estimates for the most
13
An electronic file containing all 8,197 of these values is available from the authors by
request.
[Figure 2: dot plot with one row per country, listed from Iraq to Luxembourg, showing estimated judicial independence on a horizontal axis running from 0.0 to 1.0.]
Figure 2: Estimates of judicial independence in 191 countries in 2009. Error bars indicate
80% posterior credible intervals.
recent year, 2009, along with an associated measure of uncertainty (Figure 2). The ordering
predictably places countries such as North Korea, Cuba, and Iraq at the low end of the scale,
and countries like Finland, Australia, and Japan at the top.
The countries that are estimated to have the highest and lowest levels of judicial independence are also estimated with the smallest amount of posterior uncertainty. This is
just as we should expect: the countries that are more difficult to rank are those towards
the center of the scale. However, this result is precisely the opposite of what is found by
standard IRT models that assume that the latent variable is unbounded. In the analysis of
Pemstein, Meserve and Melton (2010), for example, the level of democracy in countries at
the absolute top and bottom of the scale is estimated with the greatest amount of uncertainty. This is attributable to the “truncation inherent in the individual component scales”
(p. 440). Once countries reach the limits of the scales, there is no more information from
which to distinguish one highly democratic country from another. But in our analysis, when
countries consistently demonstrate features that reveal very low or very high levels of judicial
independence, the estimator can reliably place those countries at exactly the most extreme
position on the latent scale.
Temporal trends
Theories of judicial independence commonly make predictions about changes over time;
however, the coarseness of the extant indicators, as well as the prevalence of missing data,
has largely prohibited quantitative scholars from even simply describing temporal variation
in the concept. On some theoretical accounts, major political events should cause an abrupt
change in judicial independence (e.g., North and Weingast, 1989; Ginsburg, 2003), whereas
on other accounts, judicial independence should evolve more or less smoothly over time (e.g.,
Carrubba, 2009; Helmke, 2005). One of the most significant contributions of our model is
its ability to estimate any number of different trends in the latent variable over time, based
upon the highly flexible random walk prior process. The practical benefit to analysts is the
[Figure 3: eight panels (top row: Australia, Spain, Thailand, Argentina; bottom row: Cuba, Venezuela, Chile, Egypt), each plotting judicial independence (0.0 to 1.0) against year (1960 to 2010).]
Figure 3: Trends in judicial independence in eight countries. Points denote the observed
data, re-scaled to the unit range, and jittered for improved visibility. For indicators that were
originally continuous, we plot the original values rather than the categorical $y^r$. More darkly
shaded points indicate manifest variables with larger item discrimination, $\beta_r$. Seven of the
eight $y^r$ are shown as circles, but we plot CIM using triangles to make it stand out. The thick
black line represents estimates of the latent $x_{kt}$, with an 80% posterior credible interval. The
thinner, jagged line shows the result of re-scaling each of the original eight indicators to the
unit range, and then taking the average of the observed values in each country-year.
possibility of conducting within-country, temporal comparisons across relatively long time
intervals. Our measure is sensitive both to large changes in judicial independence and to
more gradual or subtle shifts over time.
To illustrate, we select eight countries with a variety of temporal patterns of judicial
independence, and plot the estimated xkt , surrounded by an 80% posterior credible interval
(Figure 3).14 The observed data, re-scaled to the unit interval, are shown as points. We
also compare our estimates to a naive descriptive estimate obtained by averaging together
the available manifest indicators for each country-year. In each case, the assumptions of the
model lend considerable statistical power to the estimator, filtering away a large amount of
14
Plots showing the complete set of estimated time series are attached to the back of this
paper.
measurement error and stochastic noise, and allowing us to uncover small or higher-order
trends with much greater reliability.
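The naive comparison series is simple to state in code: re-scale each indicator to the unit range and average whatever is observed in a given country-year. The sketch below uses hypothetical function names and invented values; it is meant only to show why this average jumps when a discrepant indicator enters a sparsely observed series.

```python
def rescale(value, lo, hi):
    """Rescale a raw indicator value to the unit range [0, 1]."""
    return (value - lo) / (hi - lo)

def naive_average(observed, ranges):
    """Mean of the re-scaled indicators available in one country-year.

    observed: dict indicator -> raw value, or None when missing.
    ranges:   dict indicator -> (lo, hi) bounds of the raw scale.
    Returns None when no indicator is observed."""
    scaled = [rescale(v, *ranges[name])
              for name, v in observed.items() if v is not None]
    return sum(scaled) / len(scaled) if scaled else None

# Scale bounds assumed for illustration; the ratings below are invented.
ranges = {"XCONST": (1, 7), "PRS": (0, 6)}

# Only one indicator observed, at the bottom of its scale.
before = naive_average({"XCONST": 1, "PRS": None}, ranges)

# A second indicator enters at the top of its scale.
after = naive_average({"XCONST": 1, "PRS": 6}, ranges)

print(before, after)  # 0.0 0.5
```

With only two indicators available, a single high rating moves the average by half the scale, whereas the measurement model weighs that rating against its discrimination and the remaining evidence.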
The model reveals a number of temporal trends that should be familiar to judicial scholars. The first column highlights temporally invariant patterns that emerge at the top and
bottom of the latent scale over the entire period of observation. For Australia, the observed
indicators cluster tightly together at the top of the scale, reflecting the general consensus
that the Australian judiciary, at least relative to the world, was considerably independent
over the last 50 years. The Cuban series suggests precisely the opposite. As the time period
covers the entire history of the Castro brothers’ regime, it would have been surprising to
estimate anything other than what we see in Figure 3. Yet, Cuba also allows us to consider
a consequence of simply averaging the indicators. The light grey points near the top of the
scale reflect the PRS measure, which picks up orderly societies as much as societies in which
judiciaries are independent. This is, however, the only indicator that suggests anything
other than low latent independence. When data are scarce, the mean will be overly sensitive
to individual, discrepant indicators—here, spiking upwards as soon as PRS enters in 1984.
Importantly, PRS exhibits relatively low discrimination, and since it is in great disagreement
with the remaining indicators, the model largely ignores its claims about Cuba. This is not
to say that PRS is a fundamentally invalid indicator or necessarily “wrong” about Cuba. It
may be quite right about Cuba’s law and order status, especially its order status. And
we would not necessarily want to always ignore the information PRS provides about judicial
independence. In many settings, the PRS is likely to reveal valid information about the concept. The Cuban case demonstrates how the model allows us to include indicators that are
often right without giving too much weight to them in contexts where the evidence strongly
points to their being wrong.
It is more common for judicial independence to vary over time, in response to significant domestic political events and shifts in government policy. The second column displays
cases in which there is a prolonged, unidirectional upward or downward trend. The change
is abrupt in Spain, responding very strongly to its transition to democracy in the period
immediately following Franco’s death. This change likely reflects explicit efforts of reformers
to change the nature of constitutional control in Spain via a system of centralized constitutional review (Guillen Lopez, 2007). But it also reflects the fact that both Spanish legal
culture and Spain’s judiciary were not entirely aligned with the Franco regime. There was
clearly a segment of the Spanish judiciary willing to place limits on the state, if given the
chance (Larkins, 1996, 612). If these accounts are correct, then we should have observed an
abrupt change in Spanish judicial independence upon the transition to democracy.
Venezuela reflects a different path. The Chavez period has been associated with a gradual erosion of judicial independence, through court packing at various levels and targeted
purges (Taylor, 2009). This change is picked up by the latent measure. Critically, the average suggests a massive drop in 1989; however, all that has “happened” is that additional
indicators become available at the bottom of the scale. The Venezuelan panel reveals another
subtle feature of the model. Note that while the average tracks the two (and only) observed
indicators during the 1960s and 1970s, the smoother is consistently lower. The reason is
that observed values between 0.7 and 0.8 on the re-scaled XCONST and CIM measures are
associated with middling scores for the other indicators. For this reason, the model estimates
independence in Venezuela to be closer to the center of the scale during this period.
The final set of examples shows states in which judicial independence has turned, sometimes multiple times—and demonstrates the capacity of the model to identify non-monotonic
patterns in judicial independence. The Chilean series reflects constraints on the judiciary
imposed by the Pinochet regime, which were largely removed after the transition (Scribner,
2011).15 The Thai panel mirrors this pattern. The 1997 Thai constitutional reform was
designed to confirm democratic changes in the regime following the Black May Uprising of
1992, and gave new authority to the judiciary to investigate allegations of political corruption.
The estimates reflect an increase in judicial independence beginning at the end of
the 1980s and peaking in the 1990s. Ginsburg (2009, 96) argues, however, that what gains
might have been made in the period following the 1997 reform were undermined by Prime
Minister Thaksin, who came to power in 2001. He writes, “Gradually, Thaksin began to
influence all the independent political institutions, including the Constitutional Court and
those designed to prevent corruption.” As in Cuba, using only the average trend produces a
series of misleading peaks and valleys, due to the relative paucity of data prior to 1980.
15
But see Hilbink (2007), who argues that the Chilean judiciary’s independence was not
compromised during this period.
The model can also distinguish more subtle trends in judicial independence, as in Egypt.
The Egyptian series suggests a rise beginning around 1980, followed by a fall starting in
the late 1990s. About this period, Brown (2002, 151-152) describes, “After 1979 (especially
after the mid-1980s when the new appointment procedure had begun to seriously affect
the composition of the [Constitutional] Court), the Court rapidly distinguished itself as the
boldest and most independent judicial actor in Arab history.” Yet in the late 1990s, “The
presidency of the Supreme Constitutional Court fell vacant with the retirement of the activist
‘Awad al-Morr. The vacancy was used to pressure the Court into accepting a diminution in
its authority to issue retroactive judgments.”
Argentina’s panel is highly informative and speaks directly to the validity of the measure.
We would expect a measure of governance in Argentina to be highly unstable: observing the
peaks and valleys in Figure 3 is not surprising. And in light of theories like Helmke’s (2005),
we would expect to observe changes in the series as regimes de-stabilize. It is particularly
interesting that the estimate detects an upward change in judicial independence beginning
in 1980, prior to the fall of the junta. This is consistent with Helmke’s argument, but it is also
important to note that the measure responds to features of Argentine judicial politics that are
independent of regime change. Chávez, Ferejohn and Weingast (2011) argue that patterns
of judicial independence in Argentina have tracked the fragmentation of government closely,
precisely because it was difficult to discipline the court absent inter-party coordination.
Specifically, they argue that since government was divided during the Alfonsín era (1983-1989),
the judiciary found space to “challenge the executive” (p. 237). Yet, during the
Menem period (1989-1997), where government was unified, judicial independence was deeply
compromised. Our estimates support these claims. There is a pronounced increase in judicial
independence beginning in the early 1980s, followed by an abrupt change at the end of the
decade, precisely when the President began to pack the Supreme Court.
Assessing the indicators
The model returns information not only about latent judicial independence, but also about the indicators themselves. Indeed, inferences about the latent variable depend upon the relationship
between $x$ and each of the manifest $y^r$ in country $k$. This is captured in our model by the
discrimination parameters $\beta_r$, threshold values $\tau_{rm}$, and country-reliability parameters $\gamma_k$.
Estimates of βr range from 2.8 for Feld-Voigt to 10.9 for XCONST, with posterior standard
deviations in the range of approximately 0.1 to 0.2. As is apparent in Figure 3, indicators
with larger discrimination parameters exhibit greater “pull” on the latent variable.
But the relative positioning of the threshold estimates for each indicator is also crucial—
especially for scaling xkt during intervals when data are sparse. The thresholds reveal how
outcomes on each manifest variable align with one another along the latent scale. Different
indicators are better at distinguishing values of xkt at different levels of the latent variable
(Figure 4). The three-category indicators Tate-Keith, Howard-Carey, and, to a lesser extent,
CIRI, all partition the scale in a relatively similar fashion: country-years rated 1 (below the
lowest threshold) on one variable are likely to be rated 1 on the others as well. By comparison,
ratings of 1 on Tate-Keith and Howard-Carey roughly correspond to ratings of 3 or less on
XCONST; and, continuing to read across Figure 4, ratings of up to 5 or 6 on CIM. Because
CIM is so left-skewed, a country-year can have a relatively high observed value (say, 5 out
of 8) and still belong at the low end of the latent scale.
The consequences of this indicator realignment are apparent in Figure 3. CIM, which is
plotted using triangular points, is consistently above the smoothed latent trend. While the
[Figure 4: threshold value (vertical axis, −0.5 to 1.5) for each of the eight indicators: Tate-Keith, Howard-Carey, CIRI, XCONST, CIM, Feld-Voigt, PRS, Fraser.]
Figure 4: Threshold estimates τ̂rm for eight indicators of judicial independence. Shading
denotes areas of 95% highest posterior density.
trend line for the mean is fooled by these “large” values of CIM, our model recognizes that
even when CIM is as high as 0.7 (corresponding to category 5), judicial independence should
still be considered low. Similarly, when CIM is above 0.9 (category 8), the country-year
is very likely to be scaled as close to 1. Country-years manifesting the highest-category
outcomes on Feld-Voigt, PRS, and Fraser are even more likely to appear at the top of the
latent scale.
Estimates of the country-level γk parameters suggest that some countries are, as expected,
less conducive than others to scaling judicial independence based on the chosen $y^r$. Following
equation 1, larger values of γk amplify the effects of all eight βr in country k, leading the
trend in xkt to respond more strongly to the observed data. Where γk is smaller, the data
are less informative about the latent value and as a result, the prior receives greater weight
and the trend changes more gradually over time. In Table 2, we list the ten countries with
the highest and lowest estimated γk . At the low end, countries such as Swaziland, Lesotho,
and Qatar are especially poorly measured: the manifest indicators do not trend together in
a manner that is systematically similar to other countries in the sample. At the high end,
as expected, we observe a series of developed countries including the USA, Sweden, Ireland,
Australia, and New Zealand.
Ten lowest                   γk    80% interval    Ten highest                γk    80% interval
Swaziland                    0.28  (0.23, 0.34)    Cyprus                     1.87  (1.68, 2.08)
Lesotho                      0.37  (0.29, 0.45)    Sweden                     1.81  (1.59, 2.03)
Qatar                        0.53  (0.46, 0.59)    Ireland                    1.75  (1.52, 1.98)
Georgia                      0.53  (0.44, 0.63)    United States of America   1.74  (1.52, 1.95)
Kuwait                       0.58  (0.51, 0.64)    Cameroon                   1.73  (1.56, 1.90)
Bahrain                      0.58  (0.50, 0.65)    New Zealand                1.72  (1.50, 1.94)
Central African Republic     0.59  (0.50, 0.68)    Poland                     1.70  (1.52, 1.89)
German Democratic Republic   0.62  (0.44, 0.81)    Denmark                    1.69  (1.48, 1.91)
Republic of Vietnam          0.63  (0.36, 0.89)    Tanzania                   1.68  (1.52, 1.85)
Czech Republic               0.67  (0.54, 0.81)    Australia                  1.66  (1.43, 1.93)
Table 2: Ranking of the ten best- and worst-measured countries, by estimated γk. Parentheses indicate 80% posterior credible intervals.
When γk approaches 2, the product βr γk in equation 1 can generate coefficients of 15 or
more on certain items. It may nevertheless be the case that judicial independence in country
k remains relatively consistent over the entire study period. Large values of γk only create
the potential for substantively meaningful effects on xkt , in countries where the latent trend
actually varies over time. A more consistent feature of γk is its association with the size of the
posterior credible interval surrounding estimates of xkt : smaller γk reveal greater posterior
uncertainty in the value of the latent variable. In Kuwait, for example, a variety of different
indicators simultaneously rate the country at both the maximum and minimum of the scale
for nearly the entire study period. The result is a nearly flat, and fairly uncertain, trend.
We compare this to Mexico, which with γk = 1.54 is among the “best-measured” countries
in our sample. It is reasonable that for a country adjacent to the United States, about which
much is known, both conceptual clarity and measurement quality would be high. The trend
in judicial independence in Mexico is nevertheless relatively flat over the observed period.
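To give a sense of scale for the product of the discrimination and country-reliability parameters, the sketch below multiplies the point estimates reported in the text and in Table 2. The function name is hypothetical, and plugging in point estimates only approximates the posterior of the product, so the results are rough magnitudes rather than model output.

```python
def effective_discrimination(beta_r, gamma_k):
    """Coefficient on the latent variable for item r in country k,
    taken here as the simple product of the two point estimates."""
    return beta_r * gamma_k

# Point estimates reported in the text and in Table 2.
beta = {"Feld-Voigt": 2.8, "XCONST": 10.9}
gamma = {"Swaziland": 0.28, "Mexico": 1.54, "Cyprus": 1.87}

for country, g in gamma.items():
    for item, b in beta.items():
        print(f"{country:10s} {item:10s} {effective_discrimination(b, g):6.2f}")
```

On these values the XCONST coefficient is roughly 20.4 in Cyprus but only about 3.1 in Swaziland, which is why the same observed rating moves the latent estimate far more in the former than in the latter.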
From a theoretical perspective, countries with small γk are interesting for other substantive and diagnostic reasons. These estimates can direct researchers to the countries or
regions that may require greater attention to be paid to reducing measurement error. Alternatively, they may indicate countries where existing theories about the consequences of the
latent variable are failing to apply, helping to guide the development of new theories that
generalize more broadly around the world. Many of the most poorly measured countries in
our analysis, for example, are in the Middle East and central Asia. Why this should be the
case is not immediately apparent, but warrants a closer examination.
Conclusion
Latent concepts pervade comparative politics and international relations. The most common
approach to measuring them has involved the use of single proxy instruments, where scholars
at best consider the robustness of findings to alternative proxies. Researchers have recently
approached the problem through the application of measurement models designed for the
task. In this paper, we develop a dynamic heteroscedastic graded response IRT model
designed for the kind of time series, cross-sectional data common to studies of comparative
politics and international relations, and apply it to the problem of measuring latent judicial
independence.
We emphasize two features of our approach which represent substantive advancements
over existing latent variable models. The first is the assumption that latent judicial independence follows a Bayesian random walk prior process, which permits us to smooth our
estimates over time. As many concepts in international relations and comparative politics,
measured in TSCS data, change gradually over time, this assumption lends a great deal
of statistical power. In a context where different indicators generally agree but measurement error is significant, smoothing allows us to cut through considerable noise. Second, by
bounding the latent variable, we obtain uncertainty estimates that make sense. We should
be more confident about the placement of states at either end of the scale than we are about
states in the middle. Bounding the latent variable produces this effect.
There are a number of implications for the study of judicial independence. Clearly,
comparative judicial scholars have been limited by extant indicators of the concept. Related
patterns of measurement error and missing data have not only complicated analyses, but
they have rendered some kinds of studies simply impossible to conduct with anything but a
rough proxy. The sheer increase in data that we provide addresses these practical problems.
We are now in a position to trace judicial independence systematically over a relatively long
time period. Importantly, the estimates are neither too sensitive to severe changes in one or
two indicators nor are they completely insensitive to massive changes in political context.
We hope that this feature of the measure will allow scholars to more precisely evaluate claims
about changes over time, claims that suggest both dramatic and subtle change.
It is also transparent that no theory of judicial independence that anticipates only one
kind of development over time can explain what we observe. Trends around the globe simply
fit multiple patterns. Judicial scholars have always known this, but our estimates confirm
the point clearly. Some states are highly stable. Some are highly volatile. Others exhibit
change, but in one direction, while still others experience considerable backslides or single
changes for the better. We hope that our measure will contribute to the further development
of theories that are conditional and sensitive to context.
Another beneficial feature of the model is its ability to speak to the difficulty of measuring
particular countries. We hope that our findings on this score will influence how research
teams evaluate their measurement efforts, perhaps by re-thinking why we seem to get some—
but not all—countries consistently correct. The findings also offer a tool for funders to
evaluate whether to encourage particular work in the states where we seem to be doing a
relatively mediocre job.
Finally, we believe it is important to stress that scholars who choose to neither use our
measure, nor estimate their own, will do well to at least consider averaging the scores to
which they have access. Figure 3 certainly suggests that an average is far more unstable
than the estimate we provide, and there are years where it seems artificially high or low.
That said, it does a reasonably good job of aggregating the scores. In a pinch, averaging
these series is better than selecting one indicator on more or less arbitrary grounds.
References
Ansolabehere, Stephen, Jonathan Rodden and James M. Snyder. 2008. “The Strength of
Issues: Using Multiple Measures to Gauge Preference Stability, Ideological Constraint,
and Issue Voting.” American Political Science Review 102(2):215–232.
Arel-Bundock, Vincent and Walter R. Mebane, Jr. 2011. “Measurement Error, Missing
Values and Latent Structure in Governance Indicators.” Paper presented at the Annual
Meeting of the American Political Science Association, Seattle, WA.
Bafumi, Joseph and Michael C. Herron. 2010. “Leapfrog Representation and Extremism: A
Study of American Voters and their Members in Congress.” American Political Science
Review 104(3):519–542.
Bailey, Michael. 2001. “Ideal Point Estimation with a Small Number of Votes: A Random-Effects Approach.” Political Analysis 9(3):192–210.
Brooks, Stephen P. and Andrew Gelman. 1998. “General Methods for Monitoring Convergence of Iterative Simulations.” Journal of Computational and Graphical Statistics
7(4):434–455.
Brown, Nathan J. 2002. Constitutions in a nonconstitutional world: Arab basic laws and the
prospects for accountable government. State University of New York Press.
Bueno de Mesquita, Bruce, Alastair Smith, Randolph M. Siverson and James Morrow. 2003.
The Logic of Political Survival. MIT Press.
Cameron, Charles M. 2002. Judicial Independence: How Can You Tell It When You See
it? And, Who Cares? In Judicial Independence at the Crossroads: An Interdisciplinary
Approach, ed. Steven B. Burbank and Barry Friedman. New York: Sage Publications Inc.
pp. 134–147.
Carothers, Thomas. 2006. Promoting the rule of law abroad: in search of knowledge. Carnegie
Endowment for International Peace.
Carrubba, Clifford J. 2009. “A Model of the Endogenous Development of Judicial Institutions
in Federal and International Systems.” Journal of Politics 71(1):1–15.
Carrubba, Clifford J., Matthew Gabel and Charles Hankla. 2008. “Judicial Behavior under
Political Constraints: Evidence from the European Court of Justice.” American Political
Science Review 102(4):435–452.
Chávez, Rebecca Bill, John A. Ferejohn and Barry R. Weingast. 2011. A Theory of the Politically Independent Judiciary: A Comparative Study of the United States and Argentina.
In Courts in Latin America. New York: Cambridge University Press.
Cingranelli, David L. and David L. Richards. 2010. “The Cingranelli Richards
(CIRI) Human Rights Database Coding Manual.” Available online at:
http://ciri.binghamton.edu/documentation.asp.
Clague, Christopher, Philip Keefer, Stephen Knack and Mancur Olson. 1999. “Contract-Intensive Money: Contract Enforcement, Property Rights, and Economic Performance.”
Journal of Economic Growth 4(2):185–211.
Clark, Tom S. 2010. The Limits of Judicial Independence. New York: Cambridge University
Press.
Clinton, Joshua D., Simon Jackman and Doug Rivers. 2004a. “‘The Most Liberal Senator’? Analyzing and Interpreting Congressional Roll Calls.” Political Science and Politics
37(4):805–811.
Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004b. “The Statistical Analysis of
Roll Call Data.” American Political Science Review 98(2):355–370.
Conrad, Courtenay R. and Emily Hencken Ritter. Forthcoming. “Treaties, Tenure, and
Torture: The Conflicting Democratic Effects of International Law.” Journal of Politics.
Cowles, Mary Kathryn and Bradley P. Carlin. 1996. “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review.” Journal of the American Statistical Association 91(434):883–904.
Curtis, S. McKay. 2010. “BUGS Code for Item Response Theory.” Journal of Statistical
Software 36(1).
Escriba i Folch, Abel and Joseph Wright. 2012. “Human Rights Prosecutions and Autocratic
Survival.” Paper prepared for presentation at the American Political Science Association
Meetings.
Feld, Lars P. and Stefan Voigt. 2003. “Economic Growth and Judicial Independence: Cross-country Evidence Using a New Set of Indicators.” European Journal of Political Economy
19(3):497–527.
Fox, Jean-Paul. 2010. Bayesian Item Response Modeling: Theory and Applications. New
York: Springer.
Gelman, Andrew and Donald B. Rubin. 1992. “Inference from Iterative Simulation using
Multiple Sequences.” Statistical Science 7(4):457–511.
Gelman, Andrew, John B. Carlin, Hal S. Stern and Donald B. Rubin. 2004. Bayesian Data
Analysis; Second Edition. Chapman & Hall/CRC.
Gibler, D.M. and K.A. Randazzo. 2011. “Testing the Effects of Independent Judiciaries
on the Likelihood of Democratic Backsliding.” American Journal of Political Science
55(3):696–709.
Ginsburg, Tom. 2003. Judicial Review in New Democracies: Constitutional Courts in Asian
Cases. New York: Cambridge University Press.
Ginsburg, Tom. 2009. “Constitutional afterlife: The continuing impact of Thailand’s postpolitical constitution.” International journal of constitutional law 7(1):83.
Guillen Lopez, E. 2007. “Judicial Review in Spain: The Constitutional Court.” Loyola of
Los Angeles Law Review 41:529.
Haggard, Stephan, Andrew MacIntyre and Lydia Tiede. 2008. “The Rule of Law and Economic Development.” Annual Review of Political Science 11:205–234.
Haynie, Stacia L., Reginald S. Sheehan, Donald R. Songer and C. Neal Tate. 2007. “National
High Courts Database.”
URL: http://artsandsciences.sc.edu/poli/juri/highcts.htm
Hayo, Bernd and Stefan Voigt. 2007. “Explaining de facto judicial independence.” International Review of Law and Economics 27(3):269–290.
Helmke, Gretchen. 2005. Courts under Constraints. Cambridge: Cambridge University
Press.
Hilbink, Lisa. 2007. Judges beyond politics in democracy and dictatorship: lessons from
Chile. Cambridge University Press.
Honaker, James and Gary King. 2010. “What to do about Missing Values in Time Series
Cross-Section Data.” American Journal of Political Science 54(3):561–581.
Howard, Robert M. and Henry F. Carey. 2004. “Is an Independent Judiciary Necessary for
Democracy?” Judicature 87(6):284–290.
Jackman, Simon. 2000. “Estimation and Inference Are Missing Data Problems: Unifying
Social Science Statistics via Bayesian Simulation.” Political Analysis 8(4):307–332.
Jessee, Stephen A. 2009. “Spatial Voting in the 2004 Presidential Election.” American
Political Science Review 103(1):59–81.
Johnson, Valen E. and James H. Albert. 1999. Ordinal data modeling. New York: Springer-Verlag.
Jolliffe, I.T. 2002. Principal Component Analysis, Second edition. New York: Springer-Verlag.
La Porta, Rafael, Florencio López de Silanes, Cristian Pop-Eleches and Andrei Shleifer. 2004.
“Judicial Checks and Balances.” Journal of Political Economy 112(2):445–470.
Larkins, Christopher M. 1996. “Judicial independence and democratization: A theoretical
and conceptual analysis.” The American Journal of Comparative Law 44(4):605–626.
Lauderdale, Benjamin E. 2010. “Unpredictable Voters in Ideal Point Estimation.” Political
Analysis 18(2):151–171.
Lunn, D.J., A. Thomas, N. Best and D. Spiegelhalter. 2000. “WinBUGS–a Bayesian modelling framework: concepts, structure, and extensibility.” Statistics and Computing 10:325–337.
Marshall, Monty and Keith Jaggers. 2010. “Polity IV Project: Political Regime Characteristics and Transitions, 1800–2004.”
Martin, Andrew and Kevin Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain
Monte Carlo for the U.S. Supreme Court, 1953-1999.” Political Analysis 10:405–419.
Melton, James and Tom Ginsburg. 2012. “Does De Jure Judicial Independence Really Matter? A Reevaluation of Explanations of Judicial Independence.” Paper prepared for presentation at the American Political Science Association Meetings, August 30 – September
2, 2012.
North, Douglass and Barry Weingast. 1989. “Constitutions and Commitment: The Evolution
of Institutions Governing Public Choice in 17th Century England.” Journal of Economic
History 49(4):803–832.
O’Hagan, Anthony, Caitlin E. Buck, Alireza Daneshkhah, J. Richard Eiser, Paul H. Garthwaite, David J. Jenkinson, Jeremy E. Oakley and Tim Rakow. 2006. Uncertain Judgements: Eliciting Experts’ Probabilities. West Sussex: Wiley.
Pemstein, Daniel, Stephen A. Meserve and James Melton. 2010. “Democratic Compromise:
A Latent Variable Analysis of Ten Measures of Regime Type.” Political Analysis 18:426–449.
Przeworski, Adam, Michael Alvarez, José Antonio Cheibub and Fernando Limongi. 2000.
Democracy and Development: Political Institutions and Well-Being in the World, 1950–1990.
New York: Cambridge University Press.
Quinn, Kevin M. 2004. “Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses.” Political Analysis 12(4):338–353.
R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
http://www.R-project.org.
Ríos-Figueroa, J. 2007. “Fragmentation of Power and the Emergence of an Effective Judiciary
in Mexico, 1994–2002.” Latin American Politics & Society 49:31–57.
Ríos-Figueroa, Julio and Jeffrey K. Staton. Forthcoming. “An Evaluation of Cross-National
Measures of Judicial Independence.” Journal of Law, Economics and Organization.
Rosas, Guillermo. 2009. “Dynamic latent trait models: An application to Latin American
banking crises.” Electoral Studies 28(3):375–387.
Scribner, Druscilla. 2011. Courts, Power, and Rights in Argentina and Chile. In Courts in
Latin America. New York: Cambridge University Press.
Shor, Boris and Nolan McCarty. 2011. “The Ideological Mapping of American Legislatures.”
American Political Science Review 105(3):530–551.
Staton, Jeffrey K., Christopher Reenock and Marius Radean. Forthcoming. “Judicial Institutions and Democratic Survival.” Journal of Politics.
Stokes, Susan C. 2001. Mandates and Democracy: Neoliberalism by Surprise in Latin America. New York: Cambridge University Press.
Sturtz, Sibylle, Uwe Ligges and Andrew Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12(3):1–16.
Tate, Neal C. and Linda Camp Keith. 2009. “Conceptualizing and Operationalizing Judicial
Independence Globally.” Working Paper.
Taylor, Matthew M. 2009. “A Model of Judicial Independence with Illustration from Chavez’s
Venezuela.” Paper presented at the annual American Political Science Association Meeting, Toronto, Canada.
Treier, Shawn and Simon Jackman. 2008. “Democracy as a Latent Variable.” American
Journal of Political Science 52(1):201–217.
van der Linden, Wim J. and Ronald K. Hambleton. 1997. Handbook of modern item response
theory. New York: Springer.
Vanberg, Georg. 2005. The Politics of Constitutional Review in Germany. New York: Cambridge University Press.
Voeten, E. 2007. “The Politics of International Judicial Appointments: Evidence from the
European Court of Human Rights.” International Organization 61:669–701.
Whittington, Keith E. 2005. “‘Interpose your friendly hand’: Political Supports for the
Exercise of Judicial Review by the United States Supreme Court.” American Political
Science Review 99(4):583–596.
[Figure: country-level time-series panels, one per state. Horizontal axis: year, 1960–2010; vertical axis: 0.0–1.0. Panel titles, in extracted order:
United States of America, Canada, Bahamas, Cuba, Haiti, Dominican Republic, Jamaica, Trinidad and Tobago, Grenada, St. Lucia, St. Vincent and the Grenadines, Antigua & Barbuda, Mexico, Belize, Guatemala, Nicaragua, Costa Rica, Panama, Guyana, Suriname, Ecuador, Bolivia, Paraguay, Chile, United Kingdom, Ireland, Netherlands, Belgium, Luxembourg, Uruguay, Argentina, Brazil, Peru, Venezuela, Colombia, El Salvador, Honduras, St. Kitts and Nevis, Dominica, Barbados;
France, Monaco, Liechtenstein, Switzerland, Spain, Andorra, Portugal, Germany, German Federal Republic, Poland, Austria, Hungary, Czechoslovakia, Slovakia, Italy, San Marino, Macedonia, Croatia, Yugoslavia, Slovenia, Greece, Cyprus, Romania, Soviet Union, Russia, Lithuania, Ukraine, Belarus, Georgia, Armenia, Latvia, Estonia, Moldova, Bulgaria, Kosovo, Bosnia and Herzegovina, Albania, Malta, Czech Republic, German Democratic Republic;
Azerbaijan, Finland, Sweden, Norway, Denmark, Iceland, Cape Verde, Sao Tome and Principe, Guinea-Bissau, Gambia, Mali, Senegal, Niger, Ivory Coast, Guinea, Sierra Leone, Ghana, Togo, Gabon, Central African Republic, Chad, Congo, Uganda, Kenya, Tanzania, Somalia, Djibouti, Ethiopia, Angola, Eritrea, Rwanda, Burundi, Democratic Republic of the Congo, Nigeria, Cameroon, Liberia, Burkina Faso, Mauritania, Benin, Equatorial Guinea;
Mozambique, Zambia, Zimbabwe, Malawi, South Africa, Namibia, Lesotho, Botswana, Swaziland, Comoros, Mauritius, Seychelles, Tunisia, Libya, Sudan, Iraq, Egypt, Syria, Israel, Saudi Arabia, Yemen Arab Republic, Yemen, Kuwait, Bahrain, Qatar, Afghanistan, Turkmenistan, Tajikistan, Uzbekistan, Kyrgyzstan, Oman, United Arab Emirates, Yemen People's Republic, Jordan, Lebanon, Turkey, Iran, Algeria, Morocco, Madagascar;
Kazakhstan, China, Hong Kong, Mongolia, Taiwan, North Korea, South Korea, Japan, Pakistan, Bangladesh, Myanmar, Nepal, Thailand, Cambodia, Malaysia, Singapore, Indonesia, Australia, Papua New Guinea, Solomon Islands, Kiribati, Tuvalu, Nauru, Marshall Islands, Palau, Samoa, Federated States of Micronesia, Tonga, Fiji, Vanuatu, New Zealand, Philippines, Brunei, Vietnam, Republic of Vietnam, Laos, Maldives, Sri Lanka, Bhutan, India.]
Supplementary Appendix
Substantive Applications
Our measure of judicial independence has a range of potential uses in applied research. It
has already been employed in a series of projects that demonstrate its predictive validity.
Conrad and Ritter (Forthcoming) consider a model of international treaty effects on a state’s
human rights behavior. They investigate whether ratification of the international Convention Against Torture (CAT) influences the subsequent behavior of state leaders. Using our
measure, they find that leaders are increasingly likely to ratify the CAT as judicial independence increases, but that this effect is attenuated for leaders with strong tenure, who are
most likely to violate the substantive provisions of the CAT and thus face potential legal
consequences.
In the field of judicial politics, Melton and Ginsburg (2012) use the measure to reconsider findings in Hayo and Voigt (2007), showing a weak but positive relationship between
institutions thought to promote judicial independence and de facto judicial independence
itself. They find that only selection and removal institutions are strongly related to de facto
independence. Finally, in comparative politics, Staton, Reenock and Radean (Forthcoming)
evaluate the role of judicial independence in regime survival, finding that democratic states
are more stable in the presence of independent courts, but only at a sufficient level of
development, where the enforcement in the legal system of core democratic understandings
is most critical. Similarly, Escriba i Folch and Wright (2012) find that autocratic regimes
are increasingly likely to transition to democracy—and to do so non-violently—as judicial
independence increases.
Extending the Model: Measuring Democracy
The method that we have described for synthesizing multiple TSCS indicators can be used
to scale any number of latent variables in comparative research. To illustrate, we extend the
analysis of Pemstein, Meserve and Melton (2010), who employ a standard graded response
IRT model to generate a measure of regime type—the Unified Democracy Score, or UDS—
from a series of expert ratings. We fit our model to the same set of ten indicators, and
compare the resulting latent variable estimates.16 The comparison serves multiple purposes.
First, it enables us to investigate the substantive added-value of the unique elements of our
model. Second, it provides validation of our measurement approach versus existing latent
variable models. And third, it highlights the conditions under which existing methods may
fall short, and where the assumptions built into our model will have the greatest impact.
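One simple way to compare two sets of country-year estimates is to compute their correlation across all country-years and, separately, within each country. The sketch below, in Python, is only illustrative: the function names and the toy data are hypothetical, and nothing here reproduces the estimates discussed in this appendix.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; None if undefined."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # no variation in one series, so correlation is undefined
    return sxy / sqrt(sxx * syy)


def compare_series(rows):
    """Return (pooled correlation, median within-country correlation).

    `rows` holds (country, estimate_a, estimate_b) tuples, one per
    country-year; countries with an undefined correlation are skipped.
    """
    rows = list(rows)
    pooled = pearson([a for _, a, _ in rows], [b for _, _, b in rows])
    by_country = defaultdict(list)
    for country, a, b in rows:
        by_country[country].append((a, b))
    within = []
    for pairs in by_country.values():
        r = pearson([a for a, _ in pairs], [b for _, b in pairs])
        if r is not None:
            within.append(r)
    return pooled, (median(within) if within else None)


# Toy data: a bounded series (0 to 1) and an unbounded series for two countries.
toy_rows = [
    ("A", 0.1, -1.0), ("A", 0.2, -0.5), ("A", 0.3, 0.0),
    ("B", 0.8, 1.0), ("B", 0.9, 1.5), ("B", 0.7, 2.0),
]
pooled_r, median_r = compare_series(toy_rows)
print(pooled_r, median_r)
```

In the toy data the pooled correlation is high because most of the variation lies between the two countries, while the within-country correlations are mixed; reporting both summaries guards against reading too much into the pooled figure alone.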
We estimate levels of democracy for 194 countries from 1946 to 2008, for a total of 9,146
country-years. The posterior predictive distribution again indicates a satisfactory fit of the
model to the data. As an initial validation, the correlation between our estimates of the
latent variable and the UDS is 0.98 overall, and 0.92 in the median country. There are no
systematic outliers, nor is there a pronounced non-linear relationship (Figure 5). However, a
closer inspection reveals some important distinctions between the two series. At the extremes
of the UDS measure, there is a gap between the very highest and lowest country-years (at
2 and -2, respectively) and the remainder of the data. This is a result of how the standard
graded response model separates and spaces apart cases once they approach the edges of the
unbounded scale. By imposing bounds on the latent variable, our model compresses these
16 Details of the original ratings, including their sources, components, and availability, are
given in Pemstein, Meserve and Melton (2010, Table 1). We include all countries with at
least one rating for at least 15 years, and follow the same procedure as the original authors
for converting continuous manifest variables into ordinal-level indicators.
[Figure 5: scatterplot. Axis label: “Dynamic Heteroscedastic Bounded GRM”, with an upper tick at 1.0. Point markers omitted.]
●
●
●●
●
●
●
●
●
●
●
●
●●●
●●
●
●
●
●
●●
●
●
●
●
●● ●●
● ●●
●●●●
●
●●●●
●
●
●●
●
●
●
●●
● ●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●●●● ●●
●
●
●●
●
●
●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
● ●
●
● ● ●
●
●
●
●●
●
●
●●
●
●
●●
●
●
●
●
● ● ● ● ●●
●
●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
● ●●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●● ●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●
●
●●
●●●●●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
● ●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●
●● ●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
● ●
● ●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●●
●
●
●●
●
●
●
●
●
●●●
●
●●
●●
●
●
●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●● ● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●
●●
●●
●● ●
●
●
●
●●
●
●●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●●● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●●●
●
●
●
●●
●
●
●● ●
●
● ●●
●
●
●
●
●
●
●
●
●
●
● ●● ●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●●
●●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
● ●
●●
●
●
●
●●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●●
●
●●
●
●
●
●
● ●
●
●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●● ●
●● ●●
●●●●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●● ●
●
●
●
●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
● ●
●
●
●●●●●
●
●
●
●
●●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●●
●● ●
●●
●
●●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
●
●●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●●●●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
● ●
●
●●●● ●●
●
●●●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●●
●
●
●
● ●
●
●
●
● ●
●●
●●
●
●
●
●●
●
●●
●
●
●
●
●
●
●●●
●
●●●●
●
●●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●
●
●
●
●
●
●
●
●●●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
● ●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●●●
●●
●
●
●
●
●●●
●
●●
●
●
●
●
●
●●
●
●
●
●
●
●●●
●
●
●
●● ●
●
●
●
●
●●
●
●
●●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●● ●
●
●●
●
●●
●●
●
●●
●
●
● ●
●
●●
●
●●● ●●
● ●●
●
●
●
●
●
●
●
● ●
●●
●
●
●
●● ●
●
●●
●
● ●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●
●●
●
●
●
● ●
●
●
●
●
●
●●
●
●
●
●
●
●
●
●
●
●
●●
●
●
●●
●
●
●
●
0.8
0.6
0.4
0.2
0.0
●
●
●●
●
●
●
●●
●
●
●●
●
●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
●
●
●
●
● ●
●
●
●
●
−2
−1
0
1
2
Unified Democracy Score
Figure 5: Estimated levels of democracy, by country and year, using the dynamic, bounded
IRT model described in this paper, versus the Pemstein, Meserve and Melton (2010) UDS.
country-years at zero and one, which we consider to be a more valid representation of the
underlying political reality.
Other differences between our estimates and the UDS become apparent when we examine
the time trends in individual countries (Figure 6). The most obvious consequence of our
modeling assumptions is the greater smoothness of the estimated series. In contrast, the
UDS estimates treat each country-year as independent, making them more susceptible to
random yearly fluctuations in the observed indicators, and highly sensitive to the particular
set of ratings that happen to be observed in a given country-year. The same two problems
arose when considering the choice of the mean as a measure of the latent variable (e.g.,
Figure 3). Indeed, Figure 6 demonstrates that the graded response model-based UDS tracks
extremely closely with an alternative measure based simply upon rescaling each indicator to
the unit range, and taking the average.
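The rescale-and-average measure mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy ratings, and the choice to rescale by each indicator's observed minimum and maximum (rather than its theoretical range) are all assumptions made for the example. It also shows how the mean shifts when one rating is unobserved in a given country-year.

```python
def rescale_to_unit(values):
    """Min-max rescale one indicator to [0, 1]; missing values (None) stay None.

    Assumption: the observed minimum and maximum define the range.
    """
    observed = [v for v in values if v is not None]
    lo, hi = min(observed), max(observed)
    if hi == lo:
        raise ValueError("indicator has no variation to rescale")
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


def mean_of_available(indicators):
    """Average the rescaled indicators for each country-year,
    using only the indicators observed in that country-year."""
    rescaled = [rescale_to_unit(ind) for ind in indicators]
    n_obs = len(rescaled[0])
    means = []
    for i in range(n_obs):
        row = [ind[i] for ind in rescaled if ind[i] is not None]
        means.append(sum(row) / len(row) if row else None)
    return means


# Hypothetical ratings for four country-years on two differently scaled indicators.
indicator_a = [-10, 0, 10, 10]      # illustrative -10..10 scale
indicator_b_full = [1, 4, 4, 7]     # illustrative 1..7 scale, fully observed
indicator_b_gap = [1, 4, None, 7]   # same ratings, third country-year missing

full = mean_of_available([indicator_a, indicator_b_full])
gap = mean_of_available([indicator_a, indicator_b_gap])
print(full)  # third country-year averages both indicators
print(gap)   # third country-year now rests on indicator_a alone
```

In this toy example the third country-year moves from 0.75 to 1.0 solely because one rating is missing, which is the sensitivity to the observed set of ratings described above.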
In linking consecutive years within each country, our model remains able to detect both
gradual and rapid adjustments in countries’ level of democracy. The model consistently rates
Switzerland as highly democratic, despite a single anomalous series that is present prior to
1970. In Portugal, the latent variable estimates from our model transition from autocratic
[Figure 6 plot omitted. Panels: Switzerland, Portugal, Syria, Pakistan. Each panel's vertical axis runs from 0.0 to 1.0 and its horizontal axis from 1950 to 2010.]
Figure 6: Comparing three estimators of countries’ level of democracy. The thick black line
represents estimates based upon the model described in this paper. The thick gray line shows
the UDS estimates produced by the model of Pemstein, Meserve and Melton (2010), re-scaled
to the unit range. The thin line indicates the mean of the original set of democracy ratings,
also re-scaled to the unit range.
to democratic just as quickly as the UDS measure. Early and late in the series, our estimates
for Portugal are also closer to the boundaries of the scale than the UDS. The trend in Syria,
on the other hand, shows a much more gradual and much less erratic decline than would be
implied by either the mean or the UDS measure. And where multiple
swings and reversals do occur over time, as in Pakistan, the model captures each of those
patterns as well.
As researchers begin to draw upon larger numbers of manifest variables, and with lower
rates of missingness, the divergence between estimates based upon our approach, the standard graded response model, and even the simple average, will become less pronounced.
Already in this example, the correlation between both model-based measures and the mean
of the observed ratings is 0.96. But the purpose of estimating a measurement model is not
purely to generate estimates of the latent variable. By specifying a complete measurement
model, researchers can further evaluate the reliability of each indicator, and, country by
country, of the measurement process itself.