A Measurement Model for Synthesizing Multiple Comparative Indicators: The Case of Judicial Independence

Drew A. Linzer
Assistant Professor
Department of Political Science, Emory University
Tarbutton Hall, 1555 Dickey Drive, Atlanta, GA 30322
phone: 404-727-0697, fax: 404-727-4586

Jeffrey K. Staton
Associate Professor
Department of Political Science, Emory University
Tarbutton Hall, 1555 Dickey Drive, Atlanta, GA 30322
phone: 404-727-6559, fax: 404-727-4586

Word count: 8418

Abstract

We introduce a statistical measurement model for uncovering latent concepts commonly encountered in time-series cross-sectional (TSCS) analyses in comparative politics and international relations. Our approach addresses unique challenges that arise in these data: temporal dependence and boundedness in the latent quantity, and substantial missing data and measurement error in the observable indicators. Unlike in traditional scaling applications, comparative indicators are not produced by the units themselves but rather by independent teams of scholars seeking to measure the concept using more or less similar information. Our model accommodates these differences, enabling scholars to estimate latent variables in a manner directly suited for comparative TSCS research. The resulting measures match conceptual definitions with both greater validity and improved reliability. We apply the model to generate a new measure of judicial independence, which is available for 50 years and 200 states, and which solves several vexing problems that undermine the utility of existing indicators.

Introduction

The concept of judicial independence has been invoked as a key causal variable across comparative political science for outcomes ranging from regime stability to economic development and the protection of human rights (e.g., Gibler and Randazzo, 2011; La Porta et al., 2004; North and Weingast, 1989).
Researchers have also sought to understand the determinants of judicial independence (e.g., Ginsburg, 2003; Helmke, 2005; Vanberg, 2005; Hayo and Voigt, 2007; Clark, 2010; Carrubba, Gabel and Hankla, 2008); and, in light of its importance to various aspects of human welfare, the international community spends considerable resources each year promoting judicial reform and tracking its success (Carothers, 2006).

Despite this attention, researchers have continued to face vexing measurement challenges that complicate efforts to answer even very basic questions. The key obstacle is that judicial independence, like many concepts in comparative politics and international relations, is not directly observable. It is latent.1

One response to this challenge relies on elements of judicial behavior (e.g., the choice of a peak court to invalidate a law), which under the right conditions, and under a particular theoretical logic, can reflect a form of independence.2 Although plausible for a limited set of states, the approach suffers from the fact that there is nothing close to a representative sample of judicial decisions for all countries, much less over time and for many courts in a state’s system.3

1 We seek to measure a behavioral concept of independence, though the point is no less true for de jure concepts, which attempt to capture the set of institutions that incentivize independent judging. Although constitutional terms like fixed tenure are typically directly observable, the incentives these rules are supposed to create are latent.

2 For example, Ríos-Figueroa (2007) uses decisions against the PRI in Mexico to evaluate the “effectiveness” of the Mexican Supreme Court. To be effective, a court must first be willing to challenge rulers, which depends on having an “independent” view of the record. Similarly, Helmke (2005) uses Supreme Court decisions against the sitting government in Argentina to evaluate the extent to which judges’ decisions are disconnected from government preferences (i.e., independent) over case outcomes. Of course, simply observing a court strike down a law does not imply that the court is acting “independently.” Whittington (2005) reminds us that leaders can prefer to use their courts to strike down policies that they themselves did not wish to pass in the first place. This does not imply, however, that decisions say nothing about independence. Given a strong theoretical argument that identifies conditions under which judges ought to confront strong pressures that undermine independence, observed decisions can be informative.

3 The National High Courts Database (Haynie et al., 2007) has expanded significantly the field’s ability to conduct studies of judicial decision-making in a truly comparative setting, but it only contains data on eleven states.

Recognizing this limitation, scholars looking for global measures of judicial independence have relied upon expert assessments or non-judicial proxies, which are available for many states and years. Yet using a single expert assessment or proxy can be hard to justify when plausible alternatives exist. In addition, these variables contain significant measurement error (Haggard, MacIntyre and Tiede, 2008), and many only offer incomplete coverage of countries and years. These limitations are likely to bias estimates of causal effects, and inflate confidence about the robustness of researchers’ findings (Ríos-Figueroa and Staton, Forthcoming). Using multiple indicators, on the other hand, raises questions about how to aggregate them sensibly and without substantial loss of information.

Whether we care to explain judicial independence, track its evolution over time, or evaluate its effects on human welfare or state choices in the international system, we must do a better job measuring it. In this paper, we propose a solution to the measurement problem using a statistical measurement model that systematically combines information from multiple observable indicators into a single unified measure. Rather than attempt to find the perfect indicator, we argue that there are compelling reasons to draw out the underlying commonalities in all of them.

We introduce a dynamic heteroscedastic graded response item response theory (IRT) model for time-series cross-sectional (TSCS) data that is applicable to a wide variety of latent concepts in the study of international relations and comparative politics, e.g., level of democracy, degree of corruption, or military resolve.4 As when seeking to estimate any latent variable, we appeal to features of the world that we can observe to infer characteristics we cannot (Treier and Jackman, 2008, 203). We demonstrate that there are core differences between the assumptions underlying traditional IRT models and the process by which indicators of latent concepts in comparative TSCS research emerge, and that these differences necessitate a new approach.

Our model draws upon the unique features of TSCS data to generate latent variable measures that are superior to those from current methods. Since consecutive years’ observations are not independent, the model constrains the latent variable to trend smoothly within countries over time. The model is sensitive enough to uncover both gradual changes and abrupt shifts in the latent characteristic from year to year. It also seamlessly accommodates indicators with varying amounts of data missingness and measurement reliability. The model further estimates the difficulty of measuring the latent concept in each country, and by so doing, raises questions about the usefulness of the latent concept in all locations. Finally, it constrains estimates of the latent variable to lie on a bounded interval.
This reflects not only the idea that many comparative concepts themselves have natural boundaries, but also the idea that countries at the extremes of the scale should be the easiest ones to measure. Models that assume an unbounded scale place the greatest estimation uncertainty around countries at the ends of the scale, which we consider unrealistic in a comparative setting.

The model that we describe will enable researchers to uncover the common element in any series of indicators of a latent variable of interest, assess which of these factors are more or less informative, and characterize the uncertainty in the resulting inferences. Yet as we emphasize, the successful use of our method requires that researchers first present a theoretically valid conceptualization of how the latent variable is expected to manifest, and why. As such, our model does not replace careful theoretical argumentation; it depends upon it.

4 The term graded response in the context of item response theory refers to models for outcome variables that are measured at the ordinal level (Johnson and Albert, 1999, ch. 7).

We illustrate with an application to the measurement of judicial independence, constructing a unified measure that is available for 200 states over 50 years. We also provide estimates of uncertainty for each latent value of judicial independence. The resulting measure is both more complete and more theoretically valid than any existing, individual indicator.5 Our model further permits the calibration of extant indicators, so that a researcher who wishes to use any one of them (rather than our scale) will be able to assess those ratings in relation to the others.

Measuring Latent Concepts

There are three primary methodological options for quantifying a concept that is not directly observable.
First, we can appeal to a single proxy operationalization, as when democracy is measured by a regime score (e.g., Przeworski et al., 2000) or when the strength of a party is measured by its age (e.g., Stokes, 2001). As we discuss above, when multiple, plausibly good alternatives exist, the choice is not obvious. And unless we believe that measurement error is extremely small or that a particular indicator matches our theoretical needs particularly well, asking which indicator to use poses the wrong question.

5 In a supplementary appendix, we describe some current and potential research uses for the new measure. We also illustrate how our measurement model might be fruitfully applied to another latent concept, regime type, which highlights advantages of our approach in comparison to the scaling model proposed by Pemstein, Meserve and Melton (2010).

A second approach involves using a deterministic set of rules for systematically combining information from related variables into a single category or scale. One simple but often effective version of this technique is to average together a battery of distinct measures that are each believed to be manifestations of the underlying, latent concept (e.g., Ansolabehere, Rodden and Snyder, 2008). A variety of methods for eliciting and pooling expert ratings are also available (O’Hagan et al., 2006). The benefit to this approach is that the resulting scale will be directly interpretable as a measure of the latent concept, assuming that the aggregation procedure rests on a solid theoretical foundation.

In certain applications, however, the association between the latent variable and each of the manifest indicators is not sufficiently evident a priori to use such descriptive techniques. This may be because some manifest variables are better or worse indicators of the underlying scale; or because there is a large amount of measurement error or stochastic noise in the observed indicators.
We may also lack sufficient data to reliably implement the aggregation procedure for the entire set of observations.

We advocate a third option: a statistical measurement model that allows the data to “reveal” their own underlying structure, including the relative importance of each observed indicator to the latent quantity. Measurement models specify a mathematical function linking the latent concept to the manifest variables probabilistically, with parameters estimated from the data. Scaling models such as the Rasch or item response model are examples of measurement models that infer a latent continuum from multiple categorical variables (van der Linden and Hambleton, 1997; Fox, 2010). Principal component and factor analytic models perform dimensional reduction, converting a series of continuous variables into a smaller number of scales that preserve common information (Joliffe, 2002; Quinn, 2004; Arel-Bundock and Mebane, 2011). These methods offer certain advantages over more basic descriptive approaches, but they must also be justified and interpreted with care.

Latent Concepts in TSCS Data

Comparative time-series cross-sectional data (in which the same set of countries are measured repeatedly over time) share a set of features that complicate the use of standard measurement models. To date, most IRT models in political science have been used for estimating the latent ideologies of legislators, judges, or political parties from recorded votes (e.g., Bailey, 2001; Martin and Quinn, 2002; Clinton, Jackman and Rivers, 2004b; Voeten, 2007; Shor and McCarty, 2011), or of voters from survey responses (e.g., Jessee, 2009; Bafumi and Herron, 2010). In these cases, the observable manifestations of the latent characteristic are produced directly by the units of study: votes are cast; questions are answered.
Although researchers may have an idea of what the latent dimension underlying these outcomes represents, this meaning is ultimately not revealed until after fitting the model and interpreting the results. How each of the indicators will relate to the underlying dimension, in either magnitude or direction, is also not necessarily known in advance.

In comparative research, by contrast, the indicators of the latent concept are not produced by the units themselves, but rather by teams of scholars attempting to assess the underlying concept. The meaning of the latent dimension is fixed a priori with reference to some theory or definition, and the indicators are selected on the basis of that theory.6

6 For an example of this research design in the context of measuring regime type, see Treier and Jackman (2008) and Pemstein, Meserve and Melton (2010).

A number of complications emerge from this important distinction in the data-generating process. Most alarmingly, individual researchers need not share the same conceptual definition of the latent concept. Even if they do, they may differ over what should constitute valid indicators. Some research teams may simply be better at measuring certain concepts, perhaps especially so in particular regions or political contexts. And of course, some states may be more difficult to measure than others. On the other hand, because the meaning of the scale is known, researchers will often have a relatively clear idea of which countries belong at the top or bottom of the scale, in a way they might not when trying to determine who is the most “conservative” or “liberal” senator, for example (Clinton, Jackman and Rivers, 2004a).

Latent concepts in comparative research, measured by state and year, are also likely to change in a relatively consistent manner over time. Manifest variables, in contrast, will contain a greater amount of noise, and be more prone to idiosyncratic yearly fluctuations.
Applying measurement models that assume independent and identically distributed observations within countries ignores the temporal element of these data and does not necessarily provide for a smoothed latent trend. Simply averaging together multiple indicators will also not produce smooth year-to-year trends in the latent variable, unless the researcher has access to a large number of manifest variables that are consistently observed across most countries and years. Another consideration is that many latent variables in comparative politics and international relations research have natural lower and upper bounds. Income cannot be more equally distributed than on a perfectly equal basis, nor can it be less equally distributed than when a single person controls all of a state’s resources. The ratio between the size of the winning coalition and the selectorate must lie on the unit interval (Bueno de Mesquita et al., 2003). A central bank or a judiciary can only be so dependent or independent of a sitting government. At some point, we should treat bankers and judges as either the government itself or, well, themselves. Moreover, as we show, even if the underlying concept is unbounded, bounding the latent variable may do little harm to the estimates and produce more sensible estimates of uncertainty. It is also unlikely that countries are equally easy to measure. One possibility is that certain countries (perhaps during particular periods) are simply more poorly-understood. This may be because the set of knowledgeable people about State A (e.g., Russia) is far larger than State B (e.g., Suriname). It may be because some countries collect and make available more (and potentially more detailed) information than others. Or it may be that a concept is more relevant to politics in one place than another. 
For example, it may be easier to measure “military resolve” in Israel or the United States than in Costa Rica, since the former countries are commonly involved in international conflicts, whereas Costa Rica does not have an army.

Finally, from a practical perspective, the availability of manifest indicators of various latent concepts in comparative data is highly uneven. There are typically few theoretically relevant manifest variables per latent variable to begin with, perhaps no more than ten or fifteen. Those that do exist rarely span the entire range of countries or years under investigation. Missing data in most comparative indicators is extensive (Honaker and King, 2010). With such sparse data, a measurement model is preferred that will be robust to the presence of intervals when data are limited or nonexistent. Our approach addresses each of these issues.

Model Specification

We describe a latent variable measurement model for comparative TSCS data that estimates a smoothly trending value along a unidimensional, bounded interval scale. It therefore allows for a more precise match between conceptual definitions and their empirical manifestations. The latent variable $x_{kt}$ varies across both countries $k = 1, \ldots, K$ and years $t = 1, \ldots, T$. Although $x$ is unobserved, we assume there to be a series of $R$ observed variables $y^r$, also measured at the country-year level, that can be taken as indicators, or ratings, of the latent concept $x$. Individually, each $y^r$ is an imperfect and incomplete measure of the latent concept; but together, the $y^r$ are able to reveal variation in the level of $x$ across countries and over time. Our model provides a statistical mechanism for combining the observed $y^r_{kt}$ to produce reliable estimates of the underlying $x_{kt}$.

The indicators $y^r$ that are used to assess $x$ are chosen by the analyst on the basis of a prior theoretical expectation about the implications of larger or smaller values of $x_{kt}$ in the world.
Clearly, if we have access to multiple indicators that are specifically designed to capture a particular latent concept, then we are well-advised to use them. But we might also consider proxy indicators, especially those that our theories suggest are manifestations of the underlying concept. For example, states with higher levels of democracy may exhibit more frequent leadership change. States with relatively high levels of social capital may have larger participation (per capita) in amateur sports clubs. States with lower levels of judicial independence might exhibit politically motivated purges or a pattern of decision-making that is highly sensitive to government preferences. In none of these cases are the observable manifestations of the concept equivalent to the concept itself; however, we can have strong theoretical or conceptual justifications for including them nonetheless.

We assume only that $x$ and $y^r$ are positively associated (otherwise, treating $y^r$ as a manifestation of $x$ is likely incorrect); and that 1) certain indicators may be better or worse measures of the latent variable, and 2) certain countries may be more or less reliably measured by those indicators. This could be due to the inherent difficulty of learning about $y^r$ in country $k$, or the possibility that the theory relating $x$ to $y^r$ does not apply equally well in all countries. We require no other micro-level assumptions about the actual process by which $x_{kt}$ results in $y^r_{kt}$.

It is most common in quantitative comparative research for the $y^r$ to be measured as ordinal-level scales. When manifest variables are recorded at the interval level, they are often bounded from above, below, or both. As noted by Pemstein, Meserve and Melton (2010, 433), “although these scores take on many values and thus resemble interval scales, they do not necessarily provide interval-level information” about the latent variable.
The main concern is that the relationship between a continuous $y^r$ and the latent $x$ may not be linear. Following Pemstein, Meserve and Melton (2010), we therefore convert any continuous $y^r$ into ordinal rankings. Variables containing more observations can be partitioned more finely into larger numbers of discrete categories, with minimal loss of information. This also preserves a consistent interpretation of model parameters across the set of manifest variables. The results of our analysis are robust to the categorization rule.

To link the latent $x_{kt}$ to the manifest $y^r_{kt}$, we specify a heteroscedastic graded response IRT model. A series of item coefficients, $\beta_r$, capture the reliability, or discrimination, of indicator $r$ as a measure of $x$. Treier and Jackman (2008, 205) describe this parameter as “the extent to which variation in the scores on the latent concepts generates different response probabilities” in the outcome variables. Larger estimates of $\beta_r$ reveal a closer relationship between $x$ and $y^r$. The inverse of this parameter has an equally intuitive interpretation as the “personal error variance” of rater $r$ (Pemstein, Meserve and Melton, 2010, 431). A more “noisy” relationship between $x$ and $y^r$ is indicated by estimates of $\beta_r$ that are closer to zero.

A second set of coefficients, $\gamma_k$, reflect the overall reliability of indicators $y^r$ in country $k$. The $\gamma_k$ act as a country-level multiplier for the $\beta_r$. If raters evaluate the manifest variables in an inconsistent manner in country $k$, or if those variables are all more weakly associated with the latent variable itself in that country, then smaller values of $\gamma_k$ attenuate the effects of $\beta_r$.7 Countries in which the latent variable tracks more closely with the manifest variables will have larger $\gamma_k$. The idea is similar to the heteroscedastic IRT model developed by Lauderdale (2010), which applies when certain observational units respond more “unpredictably” to measurement by various manifest indicators.
Since we assume that $x$ and $y^r$ are positively associated, we restrict both $\beta_r \geq 0$ and $\gamma_k \geq 0$. This also serves as an identifying restriction for the rotational invariance of the underlying scale.

Denote as $M_r$ the total number of outcome categories for the $r$th manifest variable. Also let $\tau_{rm}$ represent the threshold values for item $r$ in the graded response model, with $m = 1, \ldots, M_r$. The $\tau_{rm}$ divide adjacent ratings on the latent scale, subject to the constraint that $\tau_{rm} > \tau_{r(m-1)}$. Then the link function is written:

$$\Pr(y^r_{kt} = m) = \mathrm{logit}^{-1}\left[\beta_r \gamma_k (\tau_{rm} - x_{kt})\right] - \mathrm{logit}^{-1}\left[\beta_r \gamma_k (\tau_{r(m-1)} - x_{kt})\right]. \tag{1}$$

As $x_{kt}$ increases, so does the probability of observing larger-numbered outcomes $m$ on the manifest $y^r_{kt}$. The only observed values in equation 1 are the $y^r_{kt}$ on the left-hand side: all other parameters are estimated by the model. We fix $\tau_{r0} = -\infty$ and $\tau_{rM_r} = \infty$, and estimate the remaining $M_r - 1$ threshold parameters for each $y^r$. The estimated threshold levels $\tau_{rm}$ will align the observed ratings across the manifest variables, and the spacing between the thresholds for each variable can help indicate which category distinctions are more or less substantively meaningful, relative to the other $y^r$.

7 An alternative specification might also consider the possibility that it is years, rather than countries, that are better- or worse-measured by the manifest variables. Generalizing the model to include both country- and year-reliability parameters would require a large amount of additional data to produce meaningful estimates; as such, we do not consider this extension.

To estimate $x_{kt}$ and other auxiliary parameters of interest, we adopt a fully Bayesian approach. Theoretical considerations suggest that $x$ is naturally bounded by a logical minimum and maximum value. Since $x$ is a conceptual variable, we arbitrarily place its lower bound at zero (“none” of the latent characteristic) and its upper bound at one (“all” of the latent characteristic).
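As a minimal sketch of the link function in equation 1, the Python snippet below computes the category probabilities for a single rating. The function names and the example parameter values are illustrative assumptions only; they are not the estimation code used in this paper, which relies on WinBUGS and R.

```python
import math


def inv_logit(z):
    """Inverse logit, branching on the sign of z to avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def category_probs(x, beta, gamma, taus):
    """Pr(y = m) for m = 1..M under equation 1.

    x     : latent value for one country-year
    beta  : item discrimination (>= 0)
    gamma : country reliability multiplier (>= 0)
    taus  : the M - 1 interior thresholds, in increasing order
    """
    # The fixed outer thresholds tau_0 = -inf and tau_M = +inf contribute
    # cumulative values of exactly 0 and 1.
    cum = [0.0] + [inv_logit(beta * gamma * (t - x)) for t in taus] + [1.0]
    return [cum[m] - cum[m - 1] for m in range(1, len(cum))]


# Hypothetical three-category indicator with thresholds at 0.2 and 0.6.
low = category_probs(x=0.1, beta=2.0, gamma=1.5, taus=[0.2, 0.6])
high = category_probs(x=0.9, beta=2.0, gamma=1.5, taus=[0.2, 0.6])
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

In this example, moving the latent value from 0.1 to 0.9 shifts probability away from the lowest category and toward the highest, which is the monotonic behavior that equation 1 is meant to encode.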
We achieve a smooth trend in our estimate of the latent variable by assuming it to follow a Bayesian random walk prior process, a form of dynamic latent trait model (e.g., Martin and Quinn, 2002; Rosas, 2009). Within country $k$, we assume that the latent value $x$ in year $t$ has a Normal, but bounded, prior distribution that is centered at the previous value of the latent variable in year $t - 1$:

$$x_{kt} \sim N(x_{k(t-1)}, \sigma_k^2)\, I(0, 1). \tag{2}$$

The notation $I(0, 1)$ indicates that $x_{kt}$ cannot exceed the unit interval.8 For year $t = 1$ we assume a non-informative Normal prior, also truncated beyond zero and one. The variance parameters $\sigma_k^2$, which are estimated separately for each country, capture the amount of temporal variation in $x_{kt}$. In countries where $x_{kt}$ is relatively unchanged from year to year (typically due to countries remaining at the maximum or minimum level of $x$ for the entire period of observation), values of $\sigma_k^2$ will be close to zero. In countries where $x_{kt}$ experiences more substantial or rapid yearly changes, $\sigma_k^2$ can be larger.9 Letting $\sigma_k^2$ vary by country ensures that countries where $x_{kt}$ varies greatly are not over-smoothed by comparison to countries where $x_{kt}$ is more stable. The $\sigma_k$ are each assigned vague uniform priors.

8 We prefer this specification to a Beta prior distribution (which is also bounded by zero and one) because it is characterized by mean and variance parameters that are directly interpretable as the quantities of interest: the latent value $x$, and its year-to-year variability.

The Bayesian specification allows us to seamlessly handle the frequent occurrence of missing data (Jackman, 2000). In each country-year, up to $R$ manifest ratings $y^r_{kt}$ are observed. Country-years with greater numbers of observed $y^r_{kt}$ will have more information from which to update $x_{kt}$, based on the estimated $\beta_r$ and $\gamma_k$. When many $y^r_{kt}$ are missing, the posterior estimate of $x_{kt}$ will be closer to its prior distribution.
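The random walk prior in equation 2 can be simulated directly, which may help build intuition for the role of the country-specific standard deviation. The sketch below draws from the truncated Normal by simple rejection sampling; the function names, seeds, and standard deviations are illustrative assumptions, and the simulation shows draws from the prior only, not posterior estimates.

```python
import random


def truncated_normal(mean, sd, rng, lower=0.0, upper=1.0):
    """Draw from N(mean, sd^2) restricted to [lower, upper] by rejection."""
    while True:
        draw = rng.gauss(mean, sd)
        if lower <= draw <= upper:
            return draw


def simulate_prior_path(x_first, sigma_k, n_years, seed=0):
    """Simulate x_k1..x_kT, each year centered on the previous year's value."""
    rng = random.Random(seed)
    path = [x_first]
    for _ in range(n_years - 1):
        path.append(truncated_normal(path[-1], sigma_k, rng))
    return path


def mean_abs_change(path):
    """Average absolute year-to-year change along a path."""
    steps = [abs(b - a) for a, b in zip(path, path[1:])]
    return sum(steps) / len(steps)


# Two hypothetical countries: one stable near the top of the scale,
# one with much larger yearly movement.
stable = simulate_prior_path(x_first=0.9, sigma_k=0.01, n_years=50, seed=1)
volatile = simulate_prior_path(x_first=0.5, sigma_k=0.10, n_years=50, seed=1)
print(round(mean_abs_change(stable), 3), round(mean_abs_change(volatile), 3))
```

A small standard deviation keeps the simulated path close to its previous value, while a larger one permits abrupt shifts; in both cases every value stays inside the unit interval. Because each year depends only on the previous one, the same mechanism carries the latent value across years in which no indicator is observed.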
In cases where every $y^r$ is missing for a given country-year, the random walk process bridges the gap by connecting estimates of $x_{kt}$ from the last year in which any $y^r$ were observed to those in the next year with an observed value of $y^r$.

We finalize the model by placing prior distributions over $\beta_r$, $\gamma_k$, and $\tau_{rm}$. The threshold parameters are assigned vague normal priors. To satisfy the constraint that $\tau_{rm}$ are increasing over $m$, we re-sort the thresholds in each iteration of the estimation algorithm, as described by Curtis (2010). For the item- and country-parameters $\beta_r$ and $\gamma_k$, we assume moderately informative half-normal priors with variance 0.1. This ensures that both coefficient vectors are strictly positive. When there are relatively few manifest variables, as in most comparative research, sensibly chosen prior distributions can prevent coefficient estimates from increasing indefinitely (Bailey, 2001). We find that any priors more diffuse than this may allow estimates of $\beta_r$ and $\gamma_k$ to grow unrealistically large.

9 If there is reason to believe that a country has experienced a sudden, “known” break in the time series (as when transitioning from dictatorship to democracy, for example), it may be preferable to divide the country into two periods, estimate the latent variable separately for each sub-unit, then reassemble those $x$ values following estimation.

Measuring Judicial Independence

We consider a behavioral, or de facto, concept of judicial independence. A judge is independent insofar as her decisions reflect her evaluation of the legal record (autonomous decision-making) and insofar as those decisions are respected by government officials who disagree with them (effective decision-making). In aggregate, a judiciary’s independence increases in the independence of its judges.10 The measurement model we develop is particularly appropriate in this context.
As we have already noted, extant indicators suffer from related patterns of missing data and measurement error, which our approach addresses. The concept is clearly bounded. And importantly, theories of judicial independence require that our measures are sensitive to both abrupt and more gradual change. Indeed, one set of models of judicial independence suggests that the concept should evolve slowly, as judges carefully manage their case loads and decisions (e.g., Helmke, 2005; Ginsburg, 2003). Yet another set of models suggests that independence should grow or shrink abruptly, either because of shifts in exogenous political conditions (e.g., Ríos-Figueroa, 2007) or because critical pieces of new information radically transform a court’s authority (e.g., Carrubba, 2009). To evaluate these claims, we need an approach that is capable of revealing both types of change.

10 This reflects a common definition, though admittedly not the only one. Specifically, it draws on the Cameron (2002) power concept of judicial independence. We are comfortable labeling this concept judicial power, as well. A typical alternative merely requires autonomy; see Ríos-Figueroa and Staton (Forthcoming) for a general conceptual discussion.

Indicators of Judicial Independence

To measure latent judicial independence as we conceive it, we make use of eight manifest indicators that have previously been used to assess de facto judicial independence (Table 1). In a review, Ríos-Figueroa and Staton (Forthcoming, 28) suggest that five of these variables are designed precisely to capture the concept of judicial independence that we have described. For two of them, Fraser and XCONST, it was unclear which concept of judicial independence was being measured. It also appears that the Howard-Carey indicator is designed to capture “autonomy.” But, since an independent court under our concept must be autonomous, Howard-Carey should provide relevant information.

Variable       Measurement level       Years available    Percent missing   Source
Tate-Keith     ordinal; 3 categories   1990-2004          66%               Tate and Keith (2009)
Howard-Carey   ordinal; 3 categories   1992-1999          83%               Howard and Carey (2004)
CIRI           ordinal; 3 categories   1981-2009          38%               Cingranelli and Richards (2010)
XCONST         ordinal; 7 categories   1960-2008          19%               Marshall and Jaggers (2010)
CIM            interval; 0–1           1960-2000          37%               Clague et al. (1999)
Feld-Voigt     interval; 0–1           1980-2003          77%               Feld and Voigt (2003)
PRS            interval; 0–6           1984-2008          60%               Ríos-Figueroa and Staton (Forthcoming)
Fraser         interval; 0–10          1995, 2000-2005    93%               Ríos-Figueroa and Staton (Forthcoming)

Table 1: Eight variables used to scale latent judicial independence, and their availability.

Tate-Keith, Howard-Carey, and CIRI are all drawn from the U.S. State Department Human Rights Country Reports. PRS, Feld-Voigt, and Fraser draw on international surveys of country experts. Polity IV’s measure of executive constraints, XCONST, is coded by the project’s team of experts. Finally, the Contract Intensive Money score (CIM) reflects the proportion of money that is held in banking institutions.11 The logic of this proxy measure is that individuals are more likely to keep their financial assets in banks when they believe that a state’s institutions designed to protect property rights are credible. The judiciary is central among institutions designed to do so.

Tate-Keith, Howard-Carey, CIRI, and Feld-Voigt each explicitly attempt to provide a score for the independence of a state’s judiciary.
In contrast, XCONST, Fraser, and PRS are hybrid measures, reflecting a team’s evaluation of the judiciary along with other features of the legal system. For example, XCONST is designed to broadly reflect “constraints” on government authority. PRS is a measure of law and order. Fraser provides information on the rule of law, impartiality and the protection of property rights, as well as the independence of the judiciary. On its own, each hybrid measure might be a less reliable indicator of the underlying concept. However, because the concept is latent and closely related to the non-judicial features that these measures also capture, their inclusion is reasonable.

11 Specifically, CIM is “the ratio of non-currency money to the total money supply, or (M2-C)/M2 where M2 is a broad definition of money supply and C is currency held outside of banks” (Clague et al., 1999, 188).

We investigate 200 countries over the 50-year interval from 1960 to 2009. Because many countries were not in existence for the entire study period, there are a total of 8,197 potentially observable country-years in the data set. Among the eight indicators, data coverage is most consistent in the mid-1990s, and at least five variables are available in each year between 1984 and 2004. If every manifest variable had been measured in every country-year, there would be a total of 65,576 observed values of $y^{r}_{kt}$. In actuality, we observe only 26,939, for an overall missingness rate of 59%. This missingness is distributed unevenly across the eight indicators, with Fraser missing fully 93% of the possible country-years, and XCONST observed for all but 19%.

The first four indicators were coded as ordinal-level variables by their original authors. We convert CIM into an eight-category measure with the first category for values lower than 0.3. The remaining seven categories bin observations into increments of 0.1 on the original scale, up to 1.
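The coverage arithmetic and the CIM recoding described above can be checked with a short sketch. The function name `cim_category` is hypothetical, and the treatment of values that fall exactly on a bin edge (assigned here to the higher category) is an assumption; the text does not say how such ties were handled.

```python
from bisect import bisect_right

# Figures reported in the text.
COUNTRY_YEARS = 8197
N_INDICATORS = 8
OBSERVED_VALUES = 26939

possible_values = COUNTRY_YEARS * N_INDICATORS        # 65,576 potential observations
missing_rate = 1 - OBSERVED_VALUES / possible_values  # roughly 0.59

# Lower edges of CIM categories 2 through 8 (width 0.1, starting at 0.3).
CIM_EDGES = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def cim_category(cim: float) -> int:
    """Map a CIM value in [0, 1] to an ordinal category from 1 to 8.

    Assumption: a value exactly on a bin edge goes to the higher category.
    """
    if not 0.0 <= cim <= 1.0:
        raise ValueError("CIM must lie in [0, 1]")
    return 1 + bisect_right(CIM_EDGES, cim)


if __name__ == "__main__":
    print(possible_values, round(missing_rate, 2))
    print([cim_category(v) for v in (0.25, 0.35, 0.65, 0.95)])
```

Values below 0.3 land in category 1, and each further step of 0.1 moves the observation up one category, so only values of 0.9 or above reach category 8.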
This variable is highly left-skewed, as only 1.6% of observed values fall into the first group. Another 2.7% are in category two, and 4.6% are in category three. We divide Feld-Voigt and Fraser into six categories of equal width from their minimum to maximum values. Feld-Voigt is recorded as a single value that we assume to have held constant from 1980 to 2003. PRS is already nearly categorical (most values are integers from 1 to 6), so to generate an ordinal measure, we round each rating up to the nearest whole number.

Estimation and Model Fit

We estimate the full Bayesian model using a Markov chain Monte Carlo sampling procedure, implemented in the WinBUGS and R software packages (Lunn et al., 2000; Sturtz, Ligges and Gelman, 2005; R Development Core Team, 2011).12 Our analysis is based upon the posterior distribution of parameters $x_{kt}$, $\beta_r$, $\gamma_k$, $\tau_{rm}$, and $\sigma_k^2$.

To evaluate the fit of the model, we compare the observed distribution of each indicator variable $y^{r}$ to the predicted distributions based upon the model estimates. For each country and year in which $y^{r}_{kt}$ is observed, we resample values of each parameter from their joint posterior distribution, and enter them into the data generating process in equation 1. This produces one set of predicted values for each $y^{r}_{kt}$. For example, XCONST is observed for 6,678 country-years, so we generate 6,678 hypothetical values of XCONST based upon the fitted model. We then tabulate the frequency with which each rating is predicted for each outcome variable, and repeat this simulation a large number of times. This produces a posterior predictive distribution over the possible outcomes of all eight manifest variables. If the model fits the data, there should be no systematic discrepancies between the observed $y^{r}$ and the posterior predictive distribution. This is what we find (Figure 1).
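The comparison step of this check can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the simulated rating frequencies have already been produced by some fitted model, and the function names `central_interval` and `ratings_outside_interval` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def central_interval(draws: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Return endpoints of an interval spanning at least `mass` of the sorted draws."""
    ordered = sorted(draws)
    last = len(ordered) - 1
    lower = ordered[math.floor((1 - mass) / 2 * last)]
    upper = ordered[math.ceil((1 + mass) / 2 * last)]
    return lower, upper


def ratings_outside_interval(
    observed: Dict[int, int],
    simulated: List[Dict[int, int]],
    mass: float = 0.95,
) -> List[int]:
    """List the rating levels whose observed frequency falls outside the
    central interval of the simulated (posterior predictive) frequencies."""
    flagged = []
    for rating, count in observed.items():
        # One simulated frequency per replication; absent ratings count as zero.
        draws = [replication.get(rating, 0) for replication in simulated]
        lower, upper = central_interval(draws, mass)
        if not lower <= count <= upper:
            flagged.append(rating)
    return flagged
```

Applied to each of the eight indicators in turn, an empty list would mean that every observed rating frequency lies inside its predictive interval.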
Of the 42 possible ratings, the observed rating frequency falls within the interval spanned by 95% of the posterior predicted frequencies for 41, or 98%, of the ratings. The lone exception is a slight under-prediction of the lowest level of XCONST. However, there is no general indication that the model is prone to over- or under-predicting extreme ratings on the manifest variables, a potential concern given the bounded nature of the latent variable.

12 Convergence is assessed by visual inspection of a series of three chains for adequate mixing and values of the Gelman-Rubin statistic $\hat{R} \approx 1$ (Gelman and Rubin, 1992; Cowles and Carlin, 1996; Brooks and Gelman, 1998; Gelman et al., 2004).

[Figure 1 panels: Tate-Keith, Howard-Carey, CIRI, XCONST, CIM, Feld-Voigt, PRS, Fraser. Horizontal axis: Rating Level. Vertical axis: Rating Frequency.]

Figure 1: Comparison of the observed distributions of the manifest variables (gray bars) to simulated values from the fitted model. Points denote the mean predicted frequencies; vertical lines span 95% of posterior predicted values.

Of the eight manifest variables in our analysis, CIM is arguably the most likely to report information on a distinct concept. As we note above, to use CIM as a proxy for judicial independence requires a model of how mass financial behavior responds to the performance of the judiciary. This model may not be correct.
We investigate the sensitivity of our latent variable estimates to the information contained in the CIM variable by re-fitting the model excluding CIM. Estimates of $\hat{x}_{kt}$ from this specification are highly similar to estimates from the original model. The correlation between the two sets of estimates is above 0.95 in more than half of the countries, and 0.97 across the complete set of country-years. In future applications where missing data are especially pervasive, including additional manifest variables can help guard against any single indicator having too great an influence on estimates of the latent variable.

A Latent Measure of Judicial Independence

The model produces estimated levels of judicial independence for every country and year in our data set.13 As an initial validation of our results, we plot the estimates for the most recent year, 2009, along with an associated measure of uncertainty (Figure 2). The ordering predictably places countries such as North Korea, Cuba, and Iraq at the low end of the scale, and countries like Finland, Australia, and Japan at the top. The countries that are estimated to have the highest and lowest levels of judicial independence are also estimated with the smallest amount of posterior uncertainty. This is just as we should expect: the countries that are more difficult to rank are those towards the center of the scale.

13 An electronic file containing all 8,197 of these values is available from the authors by request.

[Figure 2: countries listed along the vertical axis; horizontal axis runs from 0.0 to 1.0.]

Figure 2: Estimates of judicial independence in 191 countries in 2009. Error bars indicate 80% posterior credible intervals.
However, this result is precisely the opposite of what is found by standard IRT models that assume that the latent variable is unbounded. In the analysis of Pemstein, Meserve and Melton (2010), for example, the levels of democracy in countries at the absolute top and bottom of the scale are estimated with the greatest amount of uncertainty. This is attributable to the “truncation inherent in the individual component scales” (p. 440). Once countries reach the limits of the scales, there is no more information from which to distinguish one highly democratic country from another. But in our analysis, when countries consistently demonstrate features that reveal very low or very high levels of judicial independence, the estimator can reliably place those countries at the most extreme positions on the latent scale.

Temporal trends

Theories of judicial independence commonly make predictions about changes over time; however, the coarseness of the extant indicators, as well as the prevalence of missing data, has largely prevented quantitative scholars from even simply describing temporal variation in the concept. On some theoretical accounts, major political events should cause an abrupt change in judicial independence (e.g., North and Weingast, 1989; Ginsburg, 2003), whereas on other accounts, judicial independence should evolve more or less smoothly over time (e.g., Carrubba, 2009; Helmke, 2005). One of the most significant contributions of our model is its ability to estimate any number of different trends in the latent variable over time, based upon the highly flexible random walk prior process.
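To see why a random walk prior can accommodate both smooth and abrupt trajectories, the following sketch simulates paths from a bounded random walk. It assumes Gaussian steps with a country-specific standard deviation `sigma`, kept inside the unit interval by redrawing any step that leaves it; the exact prior is defined earlier in the paper, so this form, along with the function names, is an illustrative assumption only.

```python
import random
from typing import List


def truncated_step(previous: float, sigma: float, rng: random.Random) -> float:
    """Draw from Normal(previous, sigma), redrawing until the value is in [0, 1]."""
    while True:
        candidate = rng.gauss(previous, sigma)
        if 0.0 <= candidate <= 1.0:
            return candidate


def simulate_bounded_walk(start: float, sigma: float, n_years: int, seed: int = 0) -> List[float]:
    """Simulate a latent path of length n_years under the assumed bounded random walk."""
    if not 0.0 <= start <= 1.0:
        raise ValueError("start must lie in [0, 1]")
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_years - 1):
        path.append(truncated_step(path[-1], sigma, rng))
    return path


def mean_absolute_step(path: List[float]) -> float:
    """Average year-to-year movement along a path."""
    return sum(abs(b - a) for a, b in zip(path, path[1:])) / (len(path) - 1)


if __name__ == "__main__":
    smooth = simulate_bounded_walk(0.5, sigma=0.02, n_years=50)
    volatile = simulate_bounded_walk(0.5, sigma=0.20, n_years=50)
    print(mean_absolute_step(smooth), mean_absolute_step(volatile))
```

A small `sigma` yields the slow drift predicted by gradualist accounts, while a large `sigma` permits sharp year-to-year jumps; under this assumed form, estimating the variance separately for each country lets the data decide which pattern applies.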
The practical benefit to analysts is the possibility of conducting within-country, temporal comparisons across relatively long time intervals. Our measure is sensitive both to large changes in judicial independence and to more gradual or subtle shifts over time.

[Figure 3 panels: Australia, Spain, Thailand, Argentina, Cuba, Venezuela, Chile, Egypt. Horizontal axis: year, 1960 to 2010. Vertical axis: 0.0 to 1.0.]

Figure 3: Trends in judicial independence in eight countries. Points denote the observed data, re-scaled to the unit range, and jittered for improved visibility. For indicators that were originally continuous, we plot the original values rather than the categorical $y^{r}$. More darkly shaded points indicate manifest variables with larger item discrimination, $\beta_r$. Seven of the eight $y^{r}$ are shown as circles, but we plot CIM using triangles to make it stand out. The thick black line represents estimates of the latent $x_{kt}$, with an 80% posterior credible interval. The thinner, jagged line shows the result of re-scaling each of the original eight indicators to the unit range, and then taking the average of the observed values in each country-year.

To illustrate, we select eight countries with a variety of temporal patterns of judicial independence, and plot the estimated $x_{kt}$, surrounded by an 80% posterior credible interval (Figure 3).14 The observed data, re-scaled to the unit interval, are shown as points. We also compare our estimates to a naive descriptive estimate obtained by averaging together the available manifest indicators for each country-year.
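The naive benchmark is simple enough to state in code. The sketch below rescales each available indicator to the unit range and averages whatever is observed in a given country-year. The scale endpoints in `RANGES` and the example values are assumptions for illustration, not data from the paper.

```python
from typing import Dict, Optional, Tuple

# Assumed scale endpoints for two of the eight indicators (illustrative only).
RANGES: Dict[str, Tuple[float, float]] = {
    "XCONST": (1.0, 7.0),
    "PRS": (0.0, 6.0),
}


def rescale(value: float, low: float, high: float) -> float:
    """Map a raw score onto the unit range."""
    return (value - low) / (high - low)


def naive_average(
    row: Dict[str, Optional[float]],
    ranges: Dict[str, Tuple[float, float]],
) -> Optional[float]:
    """Average the rescaled indicators observed in one country-year.

    Missing indicators (None) are skipped; returns None if nothing is observed.
    """
    scaled = [
        rescale(value, *ranges[name])
        for name, value in row.items()
        if value is not None
    ]
    return sum(scaled) / len(scaled) if scaled else None


if __name__ == "__main__":
    # Hypothetical country: one low indicator, then a discrepant one enters.
    print(naive_average({"XCONST": 1, "PRS": None}, RANGES))  # only XCONST observed
    print(naive_average({"XCONST": 1, "PRS": 5}, RANGES))     # PRS pulls the mean up
```

Because the average weights every observed indicator equally, the arrival of a single discrepant series moves it sharply when few other indicators are available, which is the behavior the comparison in Figure 3 is meant to expose.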
In each case, the assumptions of the model lend considerable statistical power to the estimator, filtering away a large amount of measurement error and stochastic noise, and allowing us to uncover small or higher-order trends with much greater reliability.

14 Plots showing the complete set of estimated time series are attached to the back of this paper.

The model reveals a number of temporal trends that should be familiar to judicial scholars. The first column highlights temporally invariant patterns that emerge at the top and bottom of the latent scale over the entire period of observation. For Australia, the observed indicators cluster tightly together at the top of the scale, reflecting the general consensus that the Australian judiciary, at least relative to the world, was considerably independent over the last 50 years. The Cuban series suggests precisely the opposite. As the time period covers the entire history of the Castro brothers’ regime, it would have been surprising to estimate anything other than what we see in Figure 3. Yet, Cuba also allows us to consider a consequence of simply averaging the indicators. The light grey points near the top of the scale reflect the PRS measure, which picks up orderly societies as much as societies in which judiciaries are independent. This is, however, the only indicator that suggests anything other than low latent independence. When data are scarce, the mean will be overly sensitive to individual, discrepant indicators; here, it spikes upward as soon as PRS enters in 1984. Importantly, PRS exhibits relatively low discrimination, and since it is in great disagreement with the remaining indicators, the model largely ignores its claims about Cuba. This is not to say that PRS is a fundamentally invalid indicator or necessarily “wrong” about Cuba. It may be quite right about Cuba’s law and order status, especially its order status.
And we would not necessarily want to always ignore the information PRS provides about judicial independence. In many settings, PRS is likely to reveal valid information about the concept. The Cuban case demonstrates how the model allows us to include indicators that are often right without giving too much weight to them in contexts where the evidence strongly points to their being wrong.

It is more common for judicial independence to vary over time, in response to significant domestic political events and shifts in government policy. The second column displays cases in which there is a prolonged, unidirectional upward or downward trend. The change is abrupt in Spain, responding very strongly to its transition to democracy in the period immediately following Franco’s death. This change likely reflects explicit efforts of reformers to change the nature of constitutional control in Spain via a system of centralized constitutional review (Guillen Lopez, 2007). But it also reflects the fact that neither Spanish legal culture nor Spain’s judiciary was entirely aligned with the Franco regime. There was clearly a segment of the Spanish judiciary willing to place limits on the state, if given the chance (Larkins, 1996, 612). If these accounts are correct, then we should have observed an abrupt change in Spanish judicial independence upon the transition to democracy.

Venezuela reflects a different path. The Chávez period has been associated with a gradual erosion of judicial independence, through court packing at various levels and targeted purges (Taylor, 2009). This change is picked up by the latent measure. Critically, the average suggests a massive drop in 1989; however, all that has “happened” is that additional indicators become available at the bottom of the scale. The Venezuelan panel reveals another subtle feature of the model.
Note that while the average tracks the only two indicators observed during the 1960s and 1970s, the smoothed estimate is consistently lower. The reason is that observed values between 0.7 and 0.8 on the re-scaled XCONST and CIM measures are associated with middling scores for the other indicators. For this reason, the model estimates independence in Venezuela to be closer to the center of the scale during this period.

The final set of examples shows states in which judicial independence has reversed direction, sometimes multiple times, and demonstrates the capacity of the model to identify non-monotonic patterns in judicial independence. The Chilean series reflects constraints on the judiciary imposed by the Pinochet regime, which were largely removed after the transition (Scribner, 2011).15 The Thai panel mirrors this pattern. The 1997 Thai constitutional reform was designed to confirm democratic changes in the regime following the Black May Uprising of 1992, and gave new authority to the judiciary to investigate allegations of political corruption. The estimates reflect an increase in judicial independence beginning at the end of the 1980s and peaking in the 1990s. Ginsburg (2009, 96) argues, however, that what gains might have been made in the period following the 1997 reform were undermined by Prime Minister Thaksin, who came to power in 2001. He writes, “Gradually, Thaksin began to influence all the independent political institutions, including the Constitutional Court and those designed to prevent corruption.” As in Cuba, using only the average trend produces a series of misleading peaks and valleys, due to the relative paucity of data prior to 1980.

15 But see Hilbink (2007), who argues that the Chilean judiciary’s independence was not compromised during this period.

The model can also distinguish more subtle trends in judicial independence, as in Egypt. The Egyptian series suggests a rise beginning around 1980, followed by a fall starting in the late 1990s.
Of this period, Brown (2002, 151-152) writes, “After 1979 (especially after the mid-1980s when the new appointment procedure had begun to seriously affect the composition of the [Constitutional] Court), the Court rapidly distinguished itself as the boldest and most independent judicial actor in Arab history.” Yet in the late 1990s, “The presidency of the Supreme Constitutional Court fell vacant with the retirement of the activist ‘Awad al-Morr. The vacancy was used to pressure the Court into accepting a diminution in its authority to issue retroactive judgments.”

Argentina’s panel is highly informative and speaks directly to the validity of the measure. We would expect a measure of governance in Argentina to be highly unstable: observing the peaks and valleys in Figure 3 is not surprising. And in light of theories like Helmke’s (2005), we would expect to observe changes in the series as regimes de-stabilize. It is particularly interesting that the estimate detects an upward change in judicial independence beginning in 1980, prior to the fall of the junta. This is consistent with Helmke’s argument, but it is also important to note that the measure responds to features of Argentine judicial politics that are independent of regime change. Chávez, Ferejohn and Weingast (2011) argue that patterns of judicial independence in Argentina have tracked the fragmentation of government closely, precisely because it was difficult to discipline the court absent inter-party coordination. Specifically, they argue that since government was divided during the Alfonsín era (1983-1989), the judiciary found space to “challenge the executive” (p. 237). Yet, during the Menem period (1989-1997), when government was unified, judicial independence was deeply compromised. Our estimates support these claims.
There is a pronounced increase in judicial independence beginning in the early 1980s, followed by an abrupt change at the end of the decade, precisely when the President began to pack the Supreme Court.

Assessing the indicators

The model returns information not only about latent judicial independence, but also about the indicators themselves. Indeed, inferences about the latent variable depend upon the relationship between $x$ and each of the manifest $y^{r}$ in country $k$. This is captured in our model by the discrimination parameters $\beta_r$, threshold values $\tau_{rm}$, and country-reliability parameters $\gamma_k$. Estimates of $\beta_r$ range from 2.8 for Feld-Voigt to 10.9 for XCONST, with posterior standard deviations in the range of approximately 0.1 to 0.2. As is apparent in Figure 3, indicators with larger discrimination parameters exhibit greater “pull” on the latent variable.

But the relative positioning of the threshold estimates for each indicator is also crucial, especially for scaling $x_{kt}$ during intervals when data are sparse. The thresholds reveal how outcomes on each manifest variable align with one another along the latent scale. Different indicators are better at distinguishing values of $x_{kt}$ at different levels of the latent variable (Figure 4). The three-category indicators Tate-Keith, Howard-Carey, and, to a lesser extent, CIRI, all partition the scale in a relatively similar fashion: country-years rated 1 (below the lowest threshold) on one variable are likely to be rated 1 on the others as well. By comparison, ratings of 1 on Tate-Keith and Howard-Carey roughly correspond to ratings of 3 or less on XCONST; and, continuing to read across Figure 4, ratings of up to 5 or 6 on CIM. Because CIM is so left-skewed, a country-year can have a relatively high observed value (say, 5 out of 8) and still belong at the low end of the latent scale. The consequences of this indicator realignment are apparent in Figure 3.
CIM, which is plotted using triangular points, is consistently above the smoothed latent trend. While the trend line for the mean is fooled by these “large” values of CIM, our model recognizes that even when CIM is as high as 0.7 (corresponding to category 5), judicial independence should still be considered low. Similarly, when CIM is above 0.9 (category 8), the country-year is very likely to be scaled as close to 1. Country-years manifesting the highest-category outcomes on Feld-Voigt, PRS, and Fraser are even more likely to appear at the top of the latent scale.

[Figure 4 axes: vertical axis, Threshold value (−0.5 to 1.5); horizontal axis, the eight indicators: Tate-Keith, Howard-Carey, CIRI, XCONST, CIM, Feld-Voigt, PRS, Fraser.]

Figure 4: Threshold estimates $\hat{\tau}_{rm}$ for eight indicators of judicial independence. Shading denotes areas of 95% highest posterior density.

Estimates of the country-level $\gamma_k$ parameters suggest that some countries are, as expected, less conducive than others to scaling judicial independence based on the chosen $y^{r}$. Following equation 1, larger values of $\gamma_k$ amplify the effects of all eight $\beta_r$ in country $k$, leading the trend in $x_{kt}$ to respond more strongly to the observed data. Where $\gamma_k$ is smaller, the data are less informative about the latent value and as a result, the prior receives greater weight and the trend changes more gradually over time. In Table 2, we list the ten countries with the highest and lowest estimated $\gamma_k$. At the low end, countries such as Swaziland, Lesotho, and Qatar are especially poorly measured: the manifest indicators do not trend together in a manner that is systematically similar to other countries in the sample. At the high end, as expected, we observe a series of developed countries including the USA, Sweden, Ireland, Australia, and New Zealand.
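How discrimination, thresholds, and country reliability interact can be illustrated with a small sketch. It assumes a cumulative-logit graded response form in which the probability of a rating at or below level m is logistic(beta * gamma * (tau_m - x)). This is an assumption made for illustration and may differ from the paper's equation 1; the threshold and parameter values used below are likewise hypothetical.

```python
import math
from typing import List, Sequence


def logistic(z: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probabilities(
    x: float, beta: float, gamma: float, thresholds: Sequence[float]
) -> List[float]:
    """Probability of each rating 1..M under the assumed form
    P(y <= m) = logistic(beta * gamma * (tau_m - x)).

    `thresholds` holds the M - 1 increasing cut points on the latent scale.
    """
    cumulative = [logistic(beta * gamma * (tau - x)) for tau in thresholds]
    cumulative.append(1.0)  # every rating is at or below the top category
    probabilities = [cumulative[0]]
    for m in range(1, len(cumulative)):
        probabilities.append(cumulative[m] - cumulative[m - 1])
    return probabilities


if __name__ == "__main__":
    cuts = [0.3, 0.6]  # hypothetical thresholds for a three-category indicator
    for gamma in (0.3, 1.8):
        probs = category_probabilities(x=0.9, beta=5.0, gamma=gamma, thresholds=cuts)
        print(gamma, [round(p, 3) for p in probs])
```

Under this assumed form, a larger `gamma` concentrates probability on the rating implied by the latent value, so observed ratings are more informative about `x`; a smaller `gamma` spreads probability across ratings, which leaves more weight on the prior.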
Ten lowest                    γk     80% interval      Ten highest                 γk     80% interval
Swaziland                     0.28   (0.23, 0.34)      Cyprus                      1.87   (1.68, 2.08)
Lesotho                       0.37   (0.29, 0.45)      Sweden                      1.81   (1.59, 2.03)
Qatar                         0.53   (0.46, 0.59)      Ireland                     1.75   (1.52, 1.98)
Georgia                       0.53   (0.44, 0.63)      United States of America    1.74   (1.52, 1.95)
Kuwait                        0.58   (0.51, 0.64)      Cameroon                    1.73   (1.56, 1.90)
Bahrain                       0.58   (0.50, 0.65)      New Zealand                 1.72   (1.50, 1.94)
Central African Republic      0.59   (0.50, 0.68)      Poland                      1.70   (1.52, 1.89)
German Democratic Republic    0.62   (0.44, 0.81)      Denmark                     1.69   (1.48, 1.91)
Republic of Vietnam           0.63   (0.36, 0.89)      Tanzania                    1.68   (1.52, 1.85)
Czech Republic                0.67   (0.54, 0.81)      Australia                   1.66   (1.43, 1.93)

Table 2: Ranking of the ten best- and worst-measured countries, by estimated $\gamma_k$. Parentheses indicate 80% posterior credible intervals.

When $\gamma_k$ approaches 2, the product $\beta_r \gamma_k$ in equation 1 can generate coefficients of 15 or more on certain items. It may nevertheless be the case that judicial independence in country $k$ remains relatively consistent over the entire study period. Large values of $\gamma_k$ only create the potential for substantively meaningful effects on $x_{kt}$, in countries where the latent trend actually varies over time.

A more consistent feature of $\gamma_k$ is its association with the size of the posterior credible interval surrounding estimates of $x_{kt}$: smaller values of $\gamma_k$ are associated with greater posterior uncertainty in the value of the latent variable. In Kuwait, for example, a variety of different indicators simultaneously rate the country at both the maximum and minimum of the scale for nearly the entire study period. The result is a nearly flat, and fairly uncertain, trend. We compare this to Mexico, which with $\gamma_k = 1.54$ is among the “best-measured” countries in our sample. It is reasonable that for a country adjacent to the United States, about which much is known, both conceptual clarity and measurement quality would be high. The trend in judicial independence in Mexico is nevertheless relatively flat over the observed period.
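The size of the combined coefficient can be checked directly from the reported point estimates. The sketch below multiplies the smallest and largest discrimination estimates (2.8 for Feld-Voigt, 10.9 for XCONST) by the reliability estimates for three countries named in the text. Multiplying point estimates ignores posterior uncertainty, so the results are rough illustrations rather than model output.

```python
from typing import Dict, Tuple

# Point estimates reported in the text.
BETA_FELD_VOIGT = 2.8   # smallest discrimination estimate
BETA_XCONST = 10.9      # largest discrimination estimate
GAMMA: Dict[str, float] = {"Swaziland": 0.28, "Mexico": 1.54, "Cyprus": 1.87}


def combined_coefficients(gamma: float) -> Tuple[float, float]:
    """Return (beta * gamma) for the least and most discriminating indicators."""
    return BETA_FELD_VOIGT * gamma, BETA_XCONST * gamma


if __name__ == "__main__":
    for country, gamma in GAMMA.items():
        low, high = combined_coefficients(gamma)
        print(f"{country}: {low:.2f} to {high:.2f}")
```

For Cyprus the XCONST coefficient is about 20.4, and for Mexico about 16.8, both above 15; for Swaziland it is only about 3.1, which is why the same indicators move the latent trend so much less there.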
From a theoretical perspective, countries with small $\gamma_k$ are interesting for other substantive and diagnostic reasons. These estimates can direct researchers to the countries or regions where greater attention should be paid to reducing measurement error. Alternatively, they may indicate countries where existing theories about the consequences of the latent variable are failing to apply, helping to guide the development of new theories that generalize more broadly around the world. Many of the most poorly measured countries in our analysis, for example, are in the Middle East and central Asia. Why this should be the case is not immediately apparent, but warrants a closer examination.

Conclusion

Latent concepts pervade comparative politics and international relations. The most common approach to measuring them has involved the use of single proxy instruments, where scholars at best consider the robustness of findings to alternative proxies. Researchers have recently approached the problem through the application of measurement models designed for the task. In this paper, we develop a dynamic heteroscedastic graded response IRT model designed for the kind of time series, cross-sectional data common to studies of comparative politics and international relations, and apply it to the problem of measuring latent judicial independence.

We emphasize two features of our approach which represent substantive advancements over existing latent variable models. The first is the assumption that latent judicial independence follows a Bayesian random walk prior process, which permits us to smooth our estimates over time. As many concepts in international relations and comparative politics, measured in TSCS data, change gradually over time, this assumption lends a great deal of statistical power. In a context where different indicators generally agree but measurement error is significant, smoothing allows us to cut through considerable noise.
Second, by bounding the latent variable, we obtain uncertainty estimates that make sense. We should be more confident about the placement of states at either end of the scale than we are about states in the middle. Bounding the latent variable produces this effect.

There are a number of implications for the study of judicial independence. Clearly, comparative judicial scholars have been limited by extant indicators of the concept. Related patterns of measurement error and missing data have not only complicated analyses, but they have rendered some kinds of studies simply impossible to conduct with anything but a rough proxy. The sheer increase in data that we provide addresses these practical problems. We are now in a position to trace judicial independence systematically over a relatively long time period. Importantly, the estimates are neither too sensitive to severe changes in one or two indicators nor completely insensitive to massive changes in political context. We hope that this feature of the measure will allow scholars to more precisely evaluate claims about changes over time, claims that suggest both dramatic and subtle change.

It is also clear that no theory of judicial independence that anticipates only one kind of development over time can explain what we observe. Trends around the globe simply fit multiple patterns. Judicial scholars have always known this, but our estimates confirm the point clearly. Some states are highly stable. Some are highly volatile. Others exhibit change, but in one direction, while still others experience considerable backslides or single changes for the better. We hope that our measure will contribute to the further development of theories that are conditional and sensitive to context.

Another beneficial feature of the model is its ability to speak to the difficulty of measuring particular countries.
We hope that our findings on this score will influence how research teams evaluate their measurement efforts, perhaps by re-thinking why we seem to get some, but not all, countries consistently correct. The findings also offer a tool for funders to evaluate whether to encourage particular work in the states where we seem to be doing a relatively mediocre job.

Finally, we believe it is important to stress that scholars who choose neither to use our measure nor to estimate their own would do well at least to consider averaging the scores to which they have access. Figure 3 certainly suggests that an average is far more unstable than the estimate we provide, and there are years where it seems artificially high or low. That said, it does a reasonably good job of aggregating the scores. In a pinch, averaging these series is better than selecting one indicator on more or less arbitrary grounds.

References

Ansolabehere, Stephen, Jonathan Rodden and James M. Snyder. 2008. “The Strength of Issues: Using Multiple Measures to Gauge Preference Stability, Ideological Constraint, and Issue Voting.” American Political Science Review 102(2):215–232.
Arel-Bundock, Vincent and Walter R. Mebane, Jr. 2011. “Measurement Error, Missing Values and Latent Structure in Governance Indicators.” Paper presented at the Annual Meeting of the American Political Science Association, Seattle, WA.
Bafumi, Joseph and Michael C. Herron. 2010. “Leapfrog Representation and Extremism: A Study of American Voters and their Members in Congress.” American Political Science Review 104(3):519–542.
Bailey, Michael. 2001. “Ideal Point Estimation with a Small Number of Votes: A Random-Effects Approach.” Political Analysis 9(3):192–210.
Brooks, Stephen P. and Andrew Gelman. 1998. “General Methods for Monitoring Convergence of Iterative Simulations.” Journal of Computational and Graphical Statistics 7(4):434–455.
Brown, Nathan J. 2002. Constitutions in a nonconstitutional world: Arab basic laws and the prospects for accountable government. State University of New York Press.
Bueno de Mesquita, Bruce, Alastair Smith, Randolph M. Siverson and James Morrow. 2003. The Logic of Political Survival. MIT Press.
Cameron, Charles M. 2002. Judicial Independence: How Can You Tell It When You See it? And, Who Cares? In Judicial Independence at the Crossroads: An Interdisciplinary Approach, ed. Steven B. Burbank and Barry Friedman. New York: Sage Publications Inc. pp. 134–147.
Carothers, Thomas. 2006. Promoting the rule of law abroad: in search of knowledge. Carnegie Endowment for International Peace.
Carrubba, Clifford J. 2009. “A Model of the Endogenous Development of Judicial Institutions in Federal and International Systems.” Journal of Politics 71(1):1–15.
Carrubba, Clifford J., Matthew Gabel and Charles Hankla. 2008. “Judicial Behavior under Political Constraints: Evidence from the European Court of Justice.” American Political Science Review 102(4):435–452.
Chávez, Rebecca Bill, John A. Ferejohn and Barry R. Weingast. 2011. A Theory of the Politically Independent Judiciary: A Comparative Study of the United States and Argentina. In Courts in Latin America. New York: Cambridge University Press.
Cingranelli, David L. and David L. Richards. 2010. “The Cingranelli Richards (CIRI) Human Rights Database Coding Manual.” Available online at: http://ciri.binghamton.edu/documentation.asp.
Clague, Christopher, Philip Keefer, Stephen Knack and Mancur Olson. 1999. “Contract-Intensive Money: Contract Enforcement, Property Rights, and Economic Performance.” Journal of Economic Growth 4(2):185–211.
Clark, Tom S. 2010. The Limits of Judicial Independence. New York: Cambridge University Press.
Clinton, Joshua D., Simon Jackman and Doug Rivers. 2004a. “‘The Most Liberal Senator’? Analyzing and Interpreting Congressional Roll Calls.” Political Science and Politics 37(4):805–811.
Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004b. “The Statistical Analysis of Roll Call Data.” American Political Science Review 98(2):355–370.
Conrad, Courtenay R. and Emily Hencken Ritter. Forthcoming. “Treaties, Tenure, and Torture: The Conflicting Democratic Effects of International Law.” Journal of Politics.
Cowles, Mary Kathryn and Bradley P. Carlin. 1996. “Markov Chain Monte Carlo Convergence Diagnostics: A Comparative Review.” Journal of the American Statistical Association 91(434):883–904.
Curtis, S. McKay. 2010. “BUGS Code for Item Response Theory.” Journal of Statistical Software 36(1).
Escriba i Folch, Abel and Joseph Wright. 2012. “Human Rights Prosecutions and Autocratic Survival.” Paper prepared for presentation at the American Political Science Association Meetings.
Feld, Lars P. and Stefan Voigt. 2003. “Economic Growth and Judicial Independence: Cross-country Evidence Using a New Set of Indicators.” European Journal of Political Economy 19(3):497–527.
Fox, Jean-Paul. 2010. Bayesian Item Response Modeling: Theory and Applications. New York: Springer.
Gelman, Andrew and Donald B. Rubin. 1992. “Inference from Iterative Simulation using Multiple Sequences.” Statistical Science 7(4):457–511.
Gelman, Andrew, John B. Carlin, Hal S. Stern and Donald B. Rubin. 2004. Bayesian Data Analysis, Second Edition. Chapman & Hall/CRC.
Gibler, D.M. and K.A. Randazzo. 2011. “Testing the Effects of Independent Judiciaries on the Likelihood of Democratic Backsliding.” American Journal of Political Science 55(3):696–709.
Ginsburg, Tom. 2003. Judicial Review in New Democracies: Constitutional Courts in Asian Cases. New York: Cambridge University Press.
Ginsburg, Tom. 2009. “Constitutional afterlife: The continuing impact of Thailand’s postpolitical constitution.” International Journal of Constitutional Law 7(1):83.
Guillen Lopez, E. 2007. “Judicial Review in Spain: The Constitutional Court.” Loyola of Los Angeles Law Review 41:529.
Haggard, Stephan, Andrew MacIntyre and Lydia Tiede. 2008. “The Rule of Law and Economic Development.” Annual Review of Political Science 11:205–234.
Haynie, Stacia L., Reginald S. Sheehan, Donald R. Songer and C. Neal Tate. 2007. “National High Courts Database.” URL: http://artsandsciences.sc.edu/poli/juri/highcts.htm
Hayo, Bernd and Stefan Voigt. 2007. “Explaining de facto judicial independence.” International Review of Law and Economics 27(3):269–290.
Helmke, Gretchen. 2005. Courts under Constraints. Cambridge: Cambridge University Press.
Hilbink, Lisa. 2007. Judges beyond politics in democracy and dictatorship: lessons from Chile. Cambridge University Press.
Honaker, James and Gary King. 2010. “What to do about Missing Values in Time Series Cross-Section Data.” American Journal of Political Science 54(3):561–581.
Howard, Robert M. and Henry F. Carey. 2004. “Is an Independent Judiciary Necessary for Democracy?” Judicature 87(6):284–290.
Jackman, Simon. 2000. “Estimation and Inference Are Missing Data Problems: Unifying Social Science Statistics via Bayesian Simulation.” Political Analysis 8(4):307–332.
Jessee, Stephen A. 2009. “Spatial Voting in the 2004 Presidential Election.” American Political Science Review 103(1):59–81.
Johnson, Valen E. and James H. Albert. 1999. Ordinal data modeling. New York: Springer-Verlag.
Jolliffe, I.T. 2002. Principal Component Analysis, Second edition. New York: Springer-Verlag.
La Porta, Rafael, Florencio López de Silanes, Cristian Pop-Eleches and Andrei Shleifer. 2004. “Judicial Checks and Balances.” Journal of Political Economy 112(2):445–470.
Larkins, Christopher M. 1996. “Judicial independence and democratization: A theoretical and conceptual analysis.” The American Journal of Comparative Law 44(4):605–626.
Lauderdale, Benjamin E. 2010. “Unpredictable Voters in Ideal Point Estimation.” Political Analysis 18(2):151–171.
Lunn, D.J., A. Thomas, N. Best and D. Spiegelhalter. 2000. “WinBUGS–a Bayesian modelling framework: concepts, structure, and extensibility.” Statistics and Computing 10:325–337.
Marshall, Monty and Keith Jaggers. 2010. “Polity IV Project: Political Regime Characteristics and Transitions, 1800–2004.”
Martin, Andrew and Kevin Quinn. 2002. “Dynamic Ideal Point Estimation via Markov Chain Monte Carlo for the U.S. Supreme Court, 1953–1999.” Political Analysis 10:405–419.
Melton, James and Tom Ginsburg. 2012. “Does De Jure Judicial Independence Really Matter? A Reevaluation of Explanations of Judicial Independence.” Paper prepared for presentation at the American Political Science Association Meetings, August 30 – September 2, 2012.
North, Douglass and Barry Weingast. 1989. “Constitutions and Commitment: The Evolution of Institutions Governing Public Choice in 17th Century England.” Journal of Economic History 49(4):803–832.
O’Hagan, Anthony, Caitlin E. Buck, Alireza Daneshkhah, J. Richard Eiser, Paul H. Garthwaite, David J. Jenkinson, Jeremy E. Oakley and Tim Rakow. 2006. Uncertain Judgements: Eliciting Experts’ Probabilities. West Sussex: Wiley.
Pemstein, Daniel, Stephen A. Meserve and James Melton. 2010. “Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type.” Political Analysis 18:426–449.
Przeworski, Adam, Michael Alvarez, José Antonio Cheibub and Fernando Limongi. 2000. Democracy and Development: Political Institutions and Well-Being in the World, 1950–1990. New York: Cambridge University Press.
Quinn, Kevin M. 2004. “Bayesian Factor Analysis for Mixed Ordinal and Continuous Responses.” Political Analysis 12(4):338–353.
R Development Core Team. 2011. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org.
Ríos-Figueroa, J. 2007. “Fragmentation of Power and the Emergence of an Effective Judiciary in Mexico, 1994–2002.” Latin American Politics & Society 49:31–57.
Ríos-Figueroa, Julio and Jeffrey K. Staton. Forthcoming. “An Evaluation of Cross-National Measures of Judicial Independence.” Journal of Law, Economics and Organization.
Rosas, Guillermo. 2009. “Dynamic latent trait models: An application to Latin American banking crises.” Electoral Studies 28(3):375–387.
Scribner, Druscilla. 2011. Courts, Power, and Rights in Argentina and Chile. In Courts in Latin America. New York: Cambridge University Press.
Shor, Boris and Nolan McCarty. 2011. “The Ideological Mapping of American Legislatures.” American Political Science Review 105(3):530–551.
Staton, Jeffrey K., Christopher Reenock and Marius Radean. Forthcoming. “Judicial Institutions and Democratic Survival.” Journal of Politics.
Stokes, Susan C. 2001. Mandates and Democracy: Neoliberalism by Surprise in Latin America. New York: Cambridge University Press.
Sturtz, Sibylle, Uwe Ligges and Andrew Gelman. 2005. “R2WinBUGS: A Package for Running WinBUGS from R.” Journal of Statistical Software 12(3):1–16.
Tate, Neal C. and Linda Camp Keith. 2009. “Conceptualizing and Operationalizing Judicial Independence Globally.” Working Paper.
Taylor, Matthew M. 2009. “A Model of Judicial Independence with Illustration from Chavez’s Venezuela.” Paper presented at the annual American Political Science Association Meeting, Toronto, Canada.
Treier, Shawn and Simon Jackman. 2008. “Democracy as a Latent Variable.” American Journal of Political Science 52(1):201–217.
van der Linden, Wim J. and Ronald K. Hambleton. 1997. Handbook of modern item response theory. New York: Springer.
Vanberg, Georg. 2005. The Politics of Constitutional Review in Germany. New York: Cambridge University Press.
Voeten, E. 2007. “The Politics of International Judicial Appointments: Evidence from the European Court of Human Rights.” International Organization 61:669–701.
Whittington, Keith E. 2005. “‘Interpose your friendly hand’: Political Supports for the Exercise of Judicial Review by the United States Supreme Court.” American Political Science Review 99(4):583–596.
[Figure: small-multiple panels, one per country, each with a vertical axis running from 0.0 to 1.0 and a horizontal axis running from 1960 to 2010. Panel titles proceed by region, from the Americas (United States of America, Canada, Bahamas, Cuba, Haiti, ...) through Europe, Africa, the Middle East, and Asia to Oceania (..., Tonga, Federated States of Micronesia, Samoa).]

Supplementary Appendix

Substantive Applications

Our measure of judicial independence has a range of potential uses in applied research. It has already been employed in a series of projects that demonstrate its predictive validity. Conrad and Ritter (Forthcoming) consider a model of international treaty effects on a state’s human rights behavior. They investigate whether ratification of the international Convention Against Torture (CAT) influences the subsequent behavior of state leaders. Using our measure, they find that leaders are increasingly likely to ratify the CAT as judicial independence increases, but that this effect is attenuated for leaders with strong tenure, who are most likely to violate the substantive provisions of the CAT and thus face potential legal consequences. In the field of judicial politics, Melton and Ginsburg (2012) use the measure to reconsider findings in Hayo and Voigt (2007), showing a weak but positive relationship between institutions thought to promote judicial independence and de facto judicial independence itself. They find that only selection and removal institutions are strongly related to de facto independence.
Finally, in comparative politics, Staton, Reenock and Radean (Forthcoming) evaluate the role of judicial independence in regime survival, finding that democratic states are more stable in the presence of independent courts, but only above a sufficient level of development, where the legal system's enforcement of core democratic understandings is most critical. Similarly, Escriba i Folch and Wright (2012) find that autocratic regimes are increasingly likely to transition to democracy, and to do so non-violently, as judicial independence increases.

Extending the Model: Measuring Democracy

The method that we have described for synthesizing multiple TSCS indicators can be used to scale any number of latent variables in comparative research. To illustrate, we extend the analysis of Pemstein, Meserve and Melton (2010), who employ a standard graded response IRT model to generate a measure of regime type, the Unified Democracy Score (UDS), from a series of expert ratings. We fit our model to the same set of ten indicators, and compare the resulting latent variable estimates.16

The comparison serves multiple purposes. First, it enables us to investigate the substantive added value of the unique elements of our model. Second, it validates our measurement approach against existing latent variable models. And third, it highlights the conditions under which existing methods may fall short, and where the assumptions built into our model will have the greatest impact.

We estimate levels of democracy for 194 countries from 1946 to 2008, for a total of 9,146 country-years. The posterior predictive distribution again indicates a satisfactory fit of the model to the data. As an initial validation, the correlation between our estimates of the latent variable and the UDS is 0.98 overall, and 0.92 in the median country. There are no systematic outliers, nor is there a pronounced non-linear relationship (Figure 5).
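The two summary correlations reported above (one pooled over all country-years, one taken as the median of the within-country correlations) can be computed along the following lines. This is a minimal sketch rather than the authors' code: the function names, the tuple layout, and the toy numbers are all assumptions made for illustration, and the rule for skipping very short or constant series is likewise an assumption.

```python
from collections import defaultdict
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def compare_series(rows):
    """rows: iterable of (country, year, bounded_estimate, uds) tuples.

    Returns (pooled correlation, median within-country correlation).
    Countries with fewer than 3 years, or with no variation in either
    series, are skipped (an assumption of this sketch).
    """
    rows = list(rows)
    pooled = pearson([r[2] for r in rows], [r[3] for r in rows])
    by_country = defaultdict(list)
    for country, _year, est, uds in rows:
        by_country[country].append((est, uds))
    within = []
    for pairs in by_country.values():
        xs, ys = zip(*pairs)
        if len(pairs) >= 3 and len(set(xs)) > 1 and len(set(ys)) > 1:
            within.append(pearson(xs, ys))
    return pooled, median(within)


# Hypothetical toy data: two countries, three years each.
toy_rows = [
    ("A", 2000, 0.10, -1.5), ("A", 2001, 0.20, -1.2), ("A", 2002, 0.15, -1.4),
    ("B", 2000, 0.80, 1.0), ("B", 2001, 0.90, 1.3), ("B", 2002, 0.85, 1.1),
]
pooled_r, median_r = compare_series(toy_rows)
print(pooled_r, median_r)
```

The pooled figure is dominated by differences between countries, while the within-country median reflects how closely the two measures track each other over time in a typical country, which is why the two numbers can differ.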
However, a closer inspection reveals some important distinctions between the two series. At the extremes of the UDS measure, there is a gap between the very highest and lowest country-years (at 2 and -2, respectively) and the remainder of the data. This is a result of how the standard graded response model separates and spaces apart cases once they approach the edges of the unbounded scale. By imposing bounds on the latent variable, our model compresses these country-years at zero and one, which we consider to be a more valid representation of the underlying political reality.

16 Details of the original ratings, including their sources, components, and availability, are given in Pemstein, Meserve and Melton (2010, Table 1). We include all countries with at least one rating for at least 15 years, and follow the same procedure as the original authors for converting continuous manifest variables into ordinal-level indicators.

[Figure 5: scatterplot of country-years. Vertical axis: Dynamic Heteroscedastic Bounded GRM, 0.0 to 1.0. Horizontal axis: Unified Democracy Score, −2 to 2.]
Figure 5: Estimated levels of democracy, by country and year, using the dynamic, bounded IRT model described in this paper, versus the Pemstein, Meserve and Melton (2010) UDS.

Other differences between our estimates and the UDS become apparent when we examine the time trends in individual countries (Figure 6). The most obvious consequence of our modeling assumptions is the greater smoothness of the estimated series. In contrast, the UDS estimates treat each country-year as independent, making them more susceptible to random yearly fluctuations in the observed indicators, and highly sensitive to the particular set of ratings that happen to be observed in a given country-year. The same two problems arose when considering the choice of the mean as a measure of the latent variable (e.g., Figure 3).
Indeed, Figure 6 demonstrates that the graded response model-based UDS tracks extremely closely with an alternative measure based simply upon rescaling each indicator to the unit range, and taking the average.

[Figure 6: Comparing three estimators of countries’ level of democracy. Panels: Switzerland, Portugal, Syria, Pakistan; horizontal axis 1950 to 2010, vertical axis 0.0 to 1.0. The thick black line represents estimates based upon the model described in this paper. The thick gray line shows the UDS estimates produced by the model of Pemstein, Meserve and Melton (2010), re-scaled to the unit range. The thin line indicates the mean of the original set of democracy ratings, also re-scaled to the unit range.]

In linking consecutive years within each country, our model remains able to detect both gradual and rapid adjustments in countries’ level of democracy. The model consistently rates Switzerland as highly democratic, despite a single anomalous series that is present prior to 1970. In Portugal, the latent variable estimates from our model transition from autocratic to democratic just as quickly as the UDS measure. Early and late in the series, our estimates for Portugal are also closer to the boundaries of the scale than the UDS. The trend in Syria, on the other hand, demonstrates a much more gradual decline, and in a much less erratic manner than would be implied by either the mean or the UDS measure. And where multiple swings and reversals do occur over time, as in Pakistan, the model captures each of those patterns as well.

As researchers begin to draw upon larger numbers of manifest variables, and with lower rates of missingness, the divergence between estimates based upon our approach, the standard graded response model, and even the simple average, will become less pronounced.
Already in this example, the correlation between both model-based measures and the mean of the observed ratings is 0.96. But the purpose of estimating a measurement model is not purely to generate estimates of the latent variable. By specifying a complete measurement model, researchers can further evaluate the reliability of each indicator, and, country by country, of the measurement process itself.
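
The simple benchmark used in this comparison, the mean of the observed ratings after each has been rescaled to the unit range, can be illustrated with a minimal sketch. It is not the code used to produce the figures. The function names, the indicator names, and the data are hypothetical. The sketch also assumes that each indicator is rescaled using the endpoints of its published scale, that higher values indicate more democracy, and that missing ratings are simply dropped from the average.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Rating = Optional[float]  # None marks a rating that is missing for that country-year


def rescale_to_unit(values: Sequence[Rating], lo: float, hi: float) -> List[Rating]:
    """Map ratings from the scale [lo, hi] onto [0, 1], leaving missing values as None."""
    if hi <= lo:
        raise ValueError("scale maximum must exceed scale minimum")
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


def rescaled_mean(ratings: Dict[str, Sequence[Rating]],
                  scales: Dict[str, Tuple[float, float]]) -> List[Rating]:
    """Average the rescaled indicators that are observed in each country-year."""
    rescaled = [rescale_to_unit(vals, *scales[name]) for name, vals in ratings.items()]
    means: List[Rating] = []
    for column in zip(*rescaled):  # one tuple of ratings per country-year
        observed = [v for v in column if v is not None]
        # With no observed ratings the mean is undefined, so it stays missing.
        means.append(sum(observed) / len(observed) if observed else None)
    return means


# Hypothetical data: two indicators observed over four country-years.
ratings = {"indicator_a": [-10, 0, 10, None], "indicator_b": [1, 4, None, None]}
scales = {"indicator_a": (-10, 10), "indicator_b": (1, 7)}
print(rescaled_mean(ratings, scales))
```

In the third country-year of this toy example the average rests on a single indicator, and in the fourth it is undefined. This is the sensitivity to the particular set of observed ratings that the discussion above attributes to the mean.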