How do judges learn from precedent?
Scott Baker and Anup Malani1
Introduction
Federal appellate judges cite cases by sister circuits. Why is that? Common wisdom holds that judges
look to out-of-circuit cases for credible legal arguments -- as persuasive precedent. But hard-core
adherents of the attitudinal model (Segal and Spaeth 2002) – and certain cynical realists and critical legal
studies scholars2 – might argue that judges decide in accordance with their policy or political
preferences and cite cases only to cover up that fact or to legitimize acting out their preferences.
In this paper we address that dispute in the process of asking some more basic questions about how
judges learn from prior, persuasive precedent. We inquire whether judges put any weight on non-binding precedents or make decisions based only on the case before them and/or their own preferences;
whether judges weight all precedents the same or make adjustments based on inferences about how
confident prior judges were in their opinions or about the quality of those judges; and whether judges’
opinions convey to future jurists all the information that their authors gleaned from the case before
them, i.e., whether judges have full information about everything prior judges learned.
These are important questions because the manner in which judges learn not only affects how we
model them and predict their behavior, but also affects their legitimacy and how much authority we
ought to give them. For example, if judges place weight on prior non-binding precedent, then it
becomes less plausible that they are the purely political actors that the strongest version of the
attitudinal model suggests. Alternatively, suppose we find that judges are subject to information
cascades. They rely on prior opinions that do not convey all the information those opinion authors had.
In that case, we may want either to limit the jurisdiction of judges or do the opposite and empower
them. Specifically, we may want to allow judges even more leeway not to publish. In so doing, judges
can self-censor and avoid sparking a cascade when they think a decision is based on a weak evidentiary
basis – i.e., it could have easily been decided the other way.
Our approach first presents a number of models of judicial learning – many taken from the existing
literature -- that offer different answers to these questions. We next test their conflicting predictions in
order to discriminate between them. For our tests, we employ nearly 1000 sequences of federal
appellate cases that address a common legal issue, e.g., whether the Family Educational Rights and Privacy Act allows private rights of action or the Coal Act permits successor liability. Each sequence contains at
least one circuit split. We code the different decisions made by the circuit courts in a sequence as A, B,
C, etc., depending on whether the circuit agreed or disagreed with the circuits that considered the issue
1 Washington University and University of Chicago, respectively. The authors thank Kaushik Vasudevan, Kevin Jiang, Sarah Wilbanks, Bridget Widdowson and Ray Mao for outstanding research assistance. We also thank Charles Barzun, William Hubbard, Richard McAdams, Tom Miles, Bruce Owen, workshop participants at Washington University School of Law, the Max Planck Institute in Bonn, Northwestern Law School, and ZTH, and participants at the Harvard Conference on Blinding, the L&E Theory Conference at Yale, the American Law & Economics Association, and the Center for Empirical Legal Studies for helpful comments.
2 For a nice overview of legal realism and a response to its crudest characterization, see Leiter 2003.
previously. We end up with a sequence like ABA, AAABB, ABBC, etc. A typical sequence might look as
follows:
Circuit             Decision
Ninth Circuit       A
Eleventh Circuit    A
Fifth Circuit       B
Seventh Circuit     B
Fourth Circuit      A
For each legal question the dataset contains the decision reached by each circuit and the order in which
the circuits reached those decisions. In the example above, the dataset reveals that the Fourth Circuit
decided the issue as an “A.” We also know that the Fourth Circuit had access to four other decisions.
Those prior decisions split: 2 As (circuits that agreed with the Fourth Circuit) and 2 Bs (circuits that
reached the opposite result). Finally, we know the order of the past decisions was 2 As followed by 2 Bs.
Depending on how, if at all, judges learn, they might pay attention to both the number of decisions on
each side of an issue and the order of those decisions.
These data are ideal for our purpose. They take advantage of the sequencing of cases to understand
how judges learn from prior cases. Moreover, because they contain splits, they present judges with the
flexibility to weight prior cases in different ways, giving us a lot of leverage to rule in or out different
models of judicial learning.
Our first model posits that judges do not rely at all on prior precedents. This model replicates the hard-core attitudinal model, where judges decide based on their personal preferences, with no deference to
the opinions or decisions of other courts. This assumption implies that decisions on the same legal issue
should be independent of one another. We test this assumption by looking for runs – sequences of two
or more consecutive identical decisions (e.g., two or more As in a row) – in sequences of cases
addressing a common legal issue. We use bootstrap methods to generate an empirical distribution of
runs assuming that decision order is random and compare the actual number and length of runs on our
data to this empirical distribution. We are able strongly to reject that the decisions in our data are
independent of one another.
After rejecting the strong political model, we next consider models of judges who learn from prior
opinions. Such models vary on two dimensions. The first is whether the quality of new information
made available to each judge during adjudication – including information from lower court decisions, the factual record, the litigants and their lawyers – and the quality of each judge herself is constant or varies across cases, i.e., whether cases are of “variable quality”. The second dimension is whether
judicial opinions reveal all the new, relevant information that their authors gleaned during adjudication,
i.e., whether judges consulting prior cases have “full information” about the underlying reasons the
prior judges decided the way they did.
The structure of our analysis replicates models by Daughety and Reinganum (1999), who suggest that
judges may not have full information and may thus be subject to information cascades, and Talley (1999),
who suggests that judges do have full information and are thus protected from cascades. Variable
quality adds a twist to the debate because it suggests that judges can mitigate the impact of cascades, to
the extent they exist, by self-censoring – not publishing – opinions that are of low quality and that
conform to the cascade (Baker and Malani 2014).
Turning to empirics, suppose that case quality does indeed vary from judge to judge, from circuit to
circuit. What should we expect to see in the circuit split data? Take a judge consulting prior precedent
that contained a balanced history – a number of pairs of opposite decisions, like AB or ABAB. If case
quality varies, this judge should be more likely than not to agree with the last case in the sequence. The
reason is the inference about what the prior judges must have known when they decided their own
cases. If the immediate predecessor judge disagreed with the majority of earlier cases, then she must
have a great deal of confidence in the information from her case – otherwise, why create a split?
Understanding as much, the judge examining a balanced history would weigh the decision by his
immediate predecessor more. Of course, such inferences are ruled out if, by assumption, all prior cases
are of the same quality. When we test this non-parametrically in our sequences, we find that judges are
indeed more likely to follow the last case in a balanced history, suggesting case quality is variable.
We find it more challenging to test whether prior opinions convey to judges full information from prior
cases. The reason is that information can be conveyed in a number of ways. At one extreme, a judicial
opinion might simply report all the relevant information the judge gleaned from their own case, the
litigation at hand. At the other extreme, an opinion might report a posterior belief about what the
correct answer is. That belief would reflect all the information the judge gleaned from his own case and
combine it with all the information that judge culled from the prior cases. In the latter setting, we
obtain a potentially testable prediction: judges’ opinions should depend on the immediately prior
opinion in a sequence, but not on earlier ones. I.e., opinions should follow an AR(1) process, but not an
AR(k), k>1, process. Even this prediction is a challenge to test because, while it is possible to code
decisions, it is harder to code – quantify – opinions, or the value of the posterior belief each future judge
could have extracted from those opinions. Nevertheless we implement empirical tests using vector
autoregressions on both decisions and dissents and with conditional tetrachoric correlations amongst
decisions. The first of these tests suggests that judges do indeed have full information. The second test
is still under construction.
The remainder of our paper is organized into two halves. The first presents our various models of
judicial learning and the testable predictions from each. The second presents our empirical tests of each
prediction.
I. Models of judicial learning
This section lays out a general model of judicial behavior. The model embeds, as special cases, different
specific models of how judges learn from prior courts addressing the same legal question, i.e., non-binding precedents. The models will not take a position on what was learned but focus on whether
anything was learned and the nature of information conveyed by prior courts.
As noted, we always start with a legal question. Legal questions include, for example, whether the
Affordable Care Act permits payment of health insurance premium subsidies to individuals who buy
insurance on a federally-run exchange rather than a state-run exchange or whether Food and Drug
Administration approval of a drug as safe and effective preempts state products liability suits against the
drug. We assume legal questions have a binary answer, 𝐴 or 𝐵 -- “yes” or “no.”
The legal question is presented to an arbitrary number of federal appellate courts. The issue is always
one of first impression in that circuit: there is no prior precedent from the circuit itself or from the
Supreme Court. Whenever the circuit is not in the first position (i.e., is not the first appellate court to
consider the issue), the judges have access to persuasive precedent from sister circuits. Judges observe
the opinions and decisions of those circuits. The question is what impact, if any, do prior decisions and
opinions have on the judge’s decision. If two sister circuits decided the legal question the same way,
does that increase the chance that the third circuit to consider the question will follow the trend? If two
sister circuits gave conflicting answers, is the third circuit in line more likely to follow the first or second
mover?
We construct a model of judicial learning in which we assume that the order in which the courts hear
the case is random. While this may not be the case – e.g., litigants may choose to target some courts before others, or certain courts may be more likely to sidestep novel questions than others – we test the assumption in our data and find it to be reasonable (more on this later). The position or
slot in which a court appears in a sequence is indexed by 𝑚. Thus, the circuit that decides after two
sister circuits have already considered the issue sits in position “3”, so 𝑚 = 3.
Entire circuits (unless meeting en banc) do not hear cases; cases are decided by three-judge panels.
However, we conflate the panel and the circuit in our analysis. Random assignment of cases to panels in
a circuit3 supports our assumption. While there may be important within-panel effects (e.g., Sunstein,
Schkade & Ellman 2004), we also ignore that in our models to keep them tractable.4
After laying out the models from the existing literature, we ask what predictions those models would
have for the data on sequential decisionmaking by judges. After that, we test the predictions from each
model against the data.
A. No learning
We start with a crude model of judicial behavior: one where policy preferences alone drive decisions.
Legal scholars and some political scientists are skeptical of this model for lots of good reasons (Lax 2011;
Cross & Tiller 2006; Epstein & Knight 2013; Epstein, Landes, and Posner 2013). In this context, however,
it could be an accurate account of decisionmaking. Our data only include cases of first impression in
each circuit. The question we ask is whether the judge looks to and learns from the prior persuasive
precedent or instead decides based on her own instincts and the record in the case at hand. In other
words, are the decisions in the circuit split sequence on the same legal question correlated over time or
independent? If judges only consider their preferences, then we would straightforwardly get the
following prediction.
3 While Chilton & Levy (2014) show that assignment is not perfectly random, it is largely random.
4 That said, in some of our empirical analysis, we will use the existence of dissents to majority opinions to indicate a less persuasive majority opinion. Moreover, in regression analyses we conducted but do not report, we include the party of the President that appointed the judge who authored the majority opinion as a regressor.
Prediction 1: When presented with a case of first impression, if judges are guided solely by policy
preferences, the history of prior case dispositions should have no effect on the judge’s resolution of the
issue.
Of course, it is possible that decisions could be independent even if judges do not simply vote their
preferences. For example, a judge who looks at the facts and arguments in the case before her and thinks she is a better judge than anyone else may make a decision that is independent of prior
decisions. We cannot rule this out. But even if the driver of the decision is not preferences, we would
uncover that the judge did not dig into the prior cases for information, i.e., did not respect
persuasive precedent.
B. Herding with constant quality cases
Given a sequence of past decisions reaching the same result, might judges be tempted to ignore their
own instincts and follow the trend? As is well-known, such herding can lead to clustering on the wrong
outcome (Banerjee 1992, Bikhchandani et al. 1992). As noted in the introduction, other scholars have
suggested that the federal circuit courts ruling on the same legal issue also might herd (Daughety and
Reinganum 1999, Talley 1999).
To see why, consider a typical model of this sort. As noted, each legal issue has two possible answers.
The judges provide a decision -- A or B. Assume that there is a correct legal answer (perhaps the one the
Supreme Court would give). The true state or correct legal answer is either “𝛼” or “𝛽.” Greek letters,
here, distinguish the true state of the world (the right answer) from the decisions rendered by the
judges (A or B).
As for payoffs, suppose that the judge decides 𝐴 if she believes that the correct legal answer is more
likely 𝛼 than 𝛽 and 𝐵 otherwise. Judges start with an uninformative prior about the correct rule (i.e.,
𝑃𝑟(𝛼) = 𝑃𝑟(𝛽) = 1/2).
Before issuing a decision, the judge hears the argument, reads the briefs, and consults with the other
panel members. These actions are “private,” meaning they are not easily and costlessly observed by
future judges. Assume that, taken together, these actions lead to a random draw of a “private” signal,
suggesting what to do -- which decision to make. The information contained in the private signal is
probative, but not conclusive: the signal might be right or might be wrong.
The possible values of the signal are 𝑠 ∈ {𝑎, 𝑏}. The signal draws are (conditionally) independent. The
probability the signal identifies the correct outcome is 𝜋 = Pr(𝑠 = 𝑎|𝛼) = Pr(𝑠 = 𝑏|𝛽) ∈ (.5,1]. For
now assume judges receive signals with identical quality or precision – i.e., the value of 𝜋 is the same
for each judge on every circuit.
In addition to their own private signal, judges also observe opinions by prior judges – the persuasive
precedent. The prior opinions only contain information about the prior decision rendered: A or B, and
not the signal that the prior judges received.
With this framework in hand, take a sequence of three decisions by different circuits. Given that judges
start with uninformative priors, the first judge always follows her own private signal. If the signal is 𝑎, she decides 𝐴. If the signal is 𝑏, she decides 𝐵.
The second judge observes the decision of the first judge. From this observation, the second judge can
back out the private signal the first judge received. If the first judge decided A, she must have received a
signal of “a.” If she decided B, she must have received a signal “b.” Suppose that the first judge decides
𝐴. The second judge’s decision will follow if she receives a private signal 𝑎. On the other hand, the
second judge will split and issue a decision 𝐵 upon receiving signal 𝑏.5
Move now to the third judge. For this judge, the inference is more complicated. Suppose that the third judge observes the prior two circuits decide the case as 𝐴, i.e., he sees a history 𝐻 = 𝐴𝐴. This judge
knows that both judge 1 and judge 2 obtained private signals 𝑎. As a result, even if the third judge’s
private signal suggests B is more likely correct, she will rationally ignore her own conflicting private
signal. She will go with the herd, deciding 𝐴 as well.6 The reason: the information contained in two 𝑎
signals outweighs the information contained in one 𝑏 signal.
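To make the mechanics concrete, the following minimal sketch (our own illustration, not code from the paper) computes the posterior that drives the cascade, under the model's assumptions of a uniform prior and conditionally independent signals of common precision 𝜋; the function name and the example value of 𝜋 are ours:

```python
# A minimal sketch (illustration only) of the constant-quality herding logic:
# compute Pr(alpha | signals) under a uniform prior and conditionally
# independent signals of common precision pi, as in the model above.

def posterior_alpha(signals, pi):
    """Pr(alpha | signals), where signals is a string of 'a'/'b' draws."""
    n_a = signals.count("a")
    n_b = signals.count("b")
    like_alpha = pi ** n_a * (1 - pi) ** n_b   # Pr(signals | alpha)
    like_beta = (1 - pi) ** n_a * pi ** n_b    # Pr(signals | beta)
    return like_alpha / (like_alpha + like_beta)

# Judge 3 infers signals 'a', 'a' from H = AA; even with a private 'b' draw
# the posterior still favors alpha, so she decides A and the cascade begins.
print(posterior_alpha("aab", pi=0.7))  # 0.7 > 0.5, so decide A
```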
In this setting, the fourth judge in line learns nothing from the third judge’s decision. The fourth judge
knows that the third judge decides 𝐴 whether she received an 𝑎 or 𝑏 private signal. So the fourth judge
sits in the same position as the third judge in terms of the information available in the case law. She
therefore makes the same choice, no matter the value of her own private signal. She also decides 𝐴. In
this framework, consecutive identical decisions spark an information cascade. The prediction from the
herding model follows:
Prediction 2: In a herding model, the history of decisions generally matters. For example, the probability
of observing an outcome A following two A’s, i.e., 𝐻 = 𝐴𝐴, is one.
Prediction 2 shows that any history with two A decisions in a row determines all future decisions. But
what about splits, one A decision followed by a B decision? To analyze that circumstance, suppose that
judge 3 observes her sister circuits split over the issue; e.g., 𝐻 = 𝐴𝐵. From the decisions, the third judge
can make two inferences: (1) the prior judge deciding “A” received an “a” signal and (2) the prior judge
deciding “B” received a “b” signal. The third judge also knows that these signals were of identical
quality. As a result, the conflicting prior decisions cancel out and, in effect, provide the third judge with
no information whatsoever. The third judge is in the same position as judge 1. As a result, she decides
5 We assume that if the judge believes state 𝛼 and state 𝛽 are equally likely after receiving their signal, they issue the decision that accords with the value of the private signal.
6 An application of Bayes rule confirms the claim in the text. Suppose that the first two judges decided 𝐴 and the third judge received signal 𝑠 = 𝑏. In that case, Bayes rule implies that Pr(𝛼|𝑠 = 𝑏, 𝐻 = 𝐴𝐴) equals

$$\frac{\Pr(s=b, H=AA \mid \alpha)\Pr(\alpha)}{\Pr(s=b, H=AA \mid \alpha)\Pr(\alpha) + \Pr(s=b, H=AA \mid \beta)\Pr(\beta)}.$$

Because the signals are conditionally independent and the prior is uninformative, this expression is

$$\Pr(\alpha \mid s=b, H=AA) = \frac{(1-\pi)\pi^2}{(1-\pi)\pi^2 + (1-\pi)^2\pi}.$$

By contrast,

$$\Pr(\beta \mid s=b, H=AA) = \frac{(1-\pi)^2\pi}{(1-\pi)\pi^2 + (1-\pi)^2\pi}.$$

Since 𝜋 > 1/2, it follows that Pr(𝛼|𝑠 = 𝑏, 𝐻 = 𝐴𝐴) > Pr(𝛽|𝑠 = 𝑏, 𝐻 = 𝐴𝐴).
based solely on her private signal. This logic applies to any judge who consults a set of prior precedent
that is (1) balanced and (2) does not contain two consecutive identical decisions. This yields the
following prediction.
Prediction 3: In a herding model, a judge consulting a balanced history that does not contain two
consecutive identical decisions will choose A if 𝑠 = 𝑎.
To test this prediction, we need to know the probability that a judge receives a private signal 𝑎. But that
requires knowing the correct answer to the legal question – something we don’t know and are not going
to figure out. All is not lost, however. Given that the order in which judges hear a case is random, the
decision that comes first in a sequence clarifies which signal has expected probability 0.5 or greater.
Thus, any signal that differs from the first decision should on average have probability 0.5 or lower.
Thus, we can test the modified version of the preceding prediction.
Prediction 3’: In the herding model, the probability that a judge facing a balanced history will choose the
last decision in the history is at most 0.5.
C. Herding with variable quality cases
It seems unrealistic to suppose that all judges receive the same quality of information from the
underlying litigation. Some sets of facts are more informative than others. Lawyers vary in quality. Some
judges make better inferences than others. To capture these differences, make two changes to the
model of the preceding section. First, we assume that the precision of the private signal can take one of
two values. With probability 𝑝 the private signal is barely informative. It has a probative value
𝜋 = 𝜋𝐿 = 0.5 + 𝜀, where 𝜀 > 0. With probability 1 − 𝑝, the signal is highly informative, taking a value
𝜋 = 𝜋𝐻 > 𝜋𝐿 . Second, assume that judges do not know the quality of the signal (high or low) received by
prior judges.
The analysis proceeds in much the same way as before. Take three circuits deciding a legal question in
turn. The first judge receives her signal. Because the signal is informative (even if only slightly), she
decides the case in accordance with her signal. If the signal says “a”, she decides A.
The second judge receives a new private signal. If that signal matches the decision rendered by the first
judge, he decides the case the same way. If, however, the signal differs from the decision, the second
judge might or might not create a split.
Suppose that the first judge decided the issue as 𝐴 and the second judge obtained a conflicting signal 𝑏,
but the quality of that signal was high, 𝜋𝐻 . The second judge will follow her own conflicting high quality
signal rather than defer to the judgment of the first judge. The second judge knows that the first judge
received an 𝑎 signal because the first judge’s decision reveals as much. The second judge is unsure,
however, whether the first judge’s decision was based on a good or bad signal – based on a good or bad
record, strong or flimsy reasons. The expected quality of the first judge’s signal is
𝐸[𝜋] = 𝑝𝜋𝐿 + (1 − 𝑝)𝜋𝐻 .
By contrast, the second judge knows for sure that her own conflicting signal has good quality,
𝜋𝐻 > 𝐸[𝜋]. As a result, she will follow her own conflicting and higher quality signal.
More interestingly, if the second judge’s conflicting signal was of low quality, she prefers to follow the first
judge. In that case, the second judge knows her information is lousy; she also knows the first judge’s
information could be lousy or good. Rather than rely on lousy information for sure, the second judge
defers to the first judge’s decision.
As before, the third judge is the one who must make the most difficult inferences from prior precedent.
Consider the decision of a third judge following a split, e.g., following 𝐻 = 𝐴𝐵. In creating the split, the
second judge signaled that she must have made the decision based on a strong record or strong reasons.
Otherwise, she wouldn't have been so bold. The first judge, by contrast, could have made her decision
on a weak or a strong record. The order of the decisions allows the third judge to unwind which prior
precedent is based on better information. Before considering her own private signal, the third judge is
more likely to believe that judge 2 got the decision right as opposed to judge 1.
Now suppose that the third judge draws the 𝑠 = 𝑎 signal, but that signal is not terribly informative.
Maybe the lawyers did not present a compelling case for deciding the case as A. What should the third judge do? Should he follow his own signal rendered from a lousy record or defer to the decision-making of the second judge – a decision he knows was made on the best possible information? The
following result identifies when a judge following a split will ignore the information contained in his own
private signal and defer to the judge who created the split.
Remark 1: If 𝑝 > 2 − (1/𝜋𝐻 ), a judge who receives a weak private signal that conflicts with the decision
of the second judge will ignore his own private signal and defer to the decision made by the second
judge.
If the third judge draws a high quality signal, he always follows that signal. Combined with Remark 1, this insight allows us to compute the probability of observing the sequence of decisions 𝐴𝐵𝐵.
That is to say, the decision sequence where the first two judges split and the third judge follows the
second judge's position. We can also compute the probability of observing the sequence 𝐴𝐵𝐴 -- the
path where the first two judges split and the third judge follows the first judge's position. Because the second judge must be quite confident to create a split, the third judge will always find that decision more persuasive. The first path is more likely to occur. Formally, we have:
Prediction 4: A judge consulting a precedent history 𝐻 = 𝐴𝐵 will be more likely to decide 𝐵 than 𝐴 if
𝑝 > 2 − (1/𝜋𝐻 ).
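As a numerical illustration of Remark 1 and Prediction 4, a short sketch (ours; the function name and example precisions are hypothetical) computes the deference threshold for several values of 𝜋𝐻:

```python
# A small sketch illustrating the Remark 1 threshold (illustration only):
# above p = 2 - 1/pi_H, a judge facing H = AB with a weak conflicting signal
# defers to the split-creating second judge, making ABB more likely than ABA.

def deference_threshold(pi_h):
    """Threshold on p (the chance of a low-quality signal) from Remark 1."""
    return 2 - 1 / pi_h

for pi_h in (0.6, 0.75, 0.9):
    print(f"pi_H = {pi_h}: defer when p > {deference_threshold(pi_h):.3f}")
# pi_H = 0.75, e.g., gives a threshold of p > 0.667.
```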
D. Judges with perfect information
Up until this point, we assumed that judges only observed prior decisions, not the underlying
information – the private signal – that influenced those decisions. Suppose instead that the judges
observed everything from the past. This is the sort of judge Talley (1999) has in mind. The judges report
their private signals along with any decision. The signals are more informative than the decisions.
Assume again the signals are of constant quality.
Does such a model provide any different predictions from the ones outlined above? Take judge 3.
Suppose he received signal 𝑏. Suppose judge 1 revealed that he received the signal “a” and judge 2 revealed that he received the signal “a.” Judge 3’s belief that the true state is 𝛽 and, as a result, he should decide 𝐵 is given by:

$$\Pr(\beta \mid b,a,a) = \frac{(1-\pi)^2\pi}{(1-\pi)^2\pi + (1-\pi)\pi^2}.$$

Consider next judge 4. Suppose he observed signal 𝑎. His belief that the true state is 𝛽 and, as a result, he should decide 𝐵 is given by:

$$\Pr(\beta \mid a,b,a,a) = \frac{(1-\pi)^3\pi}{(1-\pi)^3\pi + \pi^3(1-\pi)},$$

or

$$\Pr(\beta \mid a,b,a,a) = \frac{(1-\pi)\Pr(\beta \mid b,a,a)}{(1-\pi)\Pr(\beta \mid b,a,a) + \pi\,\Pr(\alpha \mid b,a,a)}.$$
If information is fully revealed, judge 4’s estimate of the correct legal answer only depends on what
judge 3 reports (his posterior) and the information judge 4 receives from the litigation in his own circuit.
All this, of course, is a well-understood property of Bayes rule. We rehearse it here because it yields another – and different – prediction for the empirical results.
Prediction 5: If opinions convey information fully, only the immediately prior decision should influence the subsequent decision.
In terms of the example, judge 4 need not look any further than the decision and opinion of judge 3. He
need not consult the decisions of judge 2 or judge 1, because judge 3’s decision already embeds that information. As a result, the decisions should follow an AR(1) process.7
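The recursion is easy to verify numerically. The sketch below (our own check, under the model's uniform prior and common precision 𝜋) computes judge 4's posterior both directly from the full signal history and recursively from judge 3's reported posterior:

```python
# A sketch (illustration only) verifying the AR(1) property of the
# full-information posterior: judge 4 can recover his posterior from judge
# 3's report alone, without consulting judges 1 and 2.

def posterior_beta(signals, pi):
    """Pr(beta | signals) under a uniform prior and common precision pi."""
    n_a, n_b = signals.count("a"), signals.count("b")
    like_beta = (1 - pi) ** n_a * pi ** n_b
    like_alpha = pi ** n_a * (1 - pi) ** n_b
    return like_beta / (like_beta + like_alpha)

pi = 0.7
p3 = posterior_beta("baa", pi)        # judge 3's posterior: signals b, a, a
direct = posterior_beta("abaa", pi)   # judge 4: own signal a plus full history
recursive = (1 - pi) * p3 / ((1 - pi) * p3 + pi * (1 - p3))
print(direct, recursive)              # identical: only judge 3's report matters
```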
II. Tests of models
We now test the several alternative models presented in the last section. The models are defined by
their assumptions about information and learning, so we will test implications of these assumptions to
discriminate between the models. The following table lists the predictions from each of these
assumptions.
Table 1. Predictions from judicial learning models.

Model/Assumption                                        Prediction
Judges do not learn from prior cases                    Prediction 1: Decisions within a sequence are
                                                        independent of one another
Judges do not have full information on signals in       Prediction 2: There are no splits observed after
prior cases and all signals have the same quality       the second slot in the sequence of cases
The quality of private signals varies across cases,     Predictions 3' and 4: In cases with a balanced
thus judges weight prior cases differently              history, decisions are more likely to follow the
                                                        immediately prior decision than not
Judges have full information on signals in prior        Prediction 5: Conditional on the opinion in the
cases                                                   immediately prior case, judicial opinions do not
                                                        rely on opinions from earlier cases
7 Other, more complicated models of judicial learning also imply that judges need only consult a single prior opinion – the one that is most informative about the location of the efficient legal rule. See Baker & Mezzetti 2012.
A. Data
As noted, the data we employ for our tests are outcomes from nearly 1000 sequences of federal
appellate cases that answer a common legal question and result in a circuit split, defined as conflicting
answers to that question amongst the circuit courts.
The sequences were located by searching backwards from the January 2015 issue of U.S. Law Week (USLW).8 We read each edition’s “Circuit Split Roundup,” if present. Each roundup lists several cases that
generate or contribute to circuit splits. The roundup also includes the question addressed by the case,
the other cases in the sequence that also addressed that question, and the answer provided by at least
one case in the sequence. Usually the roundup references another article in USLW that elaborates on a
split. This referenced article lists the answer given by each case in a sequence of cases addressing a
common question. In situations where there is no referenced article, we had a research assistant read
each case in the sequence and hand code the answer it provided. Overall, we coded 977 sequences;9 of
these we borrowed USLW coding for decisions in 713 sequences and hand-coded outcomes in 264 other
sequences. Because USLW only reports on splits, we do not have any sequences in which all courts
make the same decision about a legal question.
While decisions varied depending on the question asked, we relabeled the answer given by the first
court in a sequence to address a question “A”. We labelled the first answer that differed from the prior one “B”, the second answer different from all prior ones “C”, and so on.10 Thus,
as described above, our data take the form of sequences like AAB, ABBAB, AAAABC, etc.11
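A minimal sketch of this relabeling scheme (an assumed implementation; the function name and inputs are hypothetical) is:

```python
# A sketch (assumed implementation) of the relabeling described above: the
# first answer in a sequence becomes "A", the first divergent answer "B",
# the next new answer "C", and so on.

def relabel(decisions):
    """Map raw answers, in decision order, to 'A', 'B', 'C', ..."""
    labels = {}
    out = []
    for d in decisions:
        if d not in labels:
            labels[d] = chr(ord("A") + len(labels))
        out.append(labels[d])
    return "".join(out)

print(relabel(["yes", "yes", "no", "yes"]))    # "AABA"
print(relabel(["no", "yes", "yes", "maybe"]))  # "ABBC"
```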
We gathered a number of covariates for each of the cases in our data. We obtain these from Google
Scholar searches on each case. From Google Scholar, we are able to scrape for each case the circuit and date of decision. A very small number of cases in the data (37 of 3841 total cases) are from state supreme courts addressing federal questions. We leave these cases in the data, but the results are robust to their exclusion since there are so few. For most cases we are also able to obtain the names of each judge on a panel and are in the process of scraping whether the decision was
unanimous or with dissent. We use the judges’ names to extract biographical data, such as appointing
8 USLW is available online after mid-2007. Prior to that it is only available in paper form. However, the format of the online and paper versions is similar, and so the data should not vary depending on whether USLW was online or paper-only. We verified this by comparing summary statistics for the online and paper-only data subsamples. Sequence lengths are slightly shorter (3.736 v. 4.0002) and the fraction of splits truncated because of a C decision is lower (0.0116 v. 0.015) for splits reported in online versions, with both differences significant at the 95% confidence level. Differences in other covariates are not significant.
9 Specifically, we gathered 1001 splits, but one of them started with a Supreme Court decision and was followed by a divergent circuit court case. We dropped that observation. We have 23 sequences for which we are still recording decisions.
10 For certain statistical tests, it is helpful to randomly assign A or B with equal probability to the first decision. When we assign B to the first decision, the first different decision is assigned A. We will clarify when we do that.
11 Because some of our tests work better when there are only two states of the world, we sometimes artificially terminate each sequence just before any C decision appears. Thus AAAABC, for example, would become AAAAB. We will clarify when this is the case.
President (and his party), from the Federal Judicial Center (FJC) database available at http://www.fjc.gov/history/home.nsf/page/export.html.12
Table 2 provides a statistical description of the case sequences in our data. To check if the manner in
which we code decisions had an effect on the data, we provide summary statistics separately by whether decisions were coded by USLW or our research assistant.13 Our own hand-coding generates longer sequences – by number of decisions – but shorter sequences – by duration – and a smaller fraction that are truncated because of a C decision. In any case, only the difference in the
fraction truncated is statistically significant. To test whether differences in data gathering methodology have any effect on outcomes, we run each empirical test once on the whole sample and once on just the
subsample of decisions coded by USLW. We report any instances in which the two samples give
qualitatively different results.
Sequences can (and do) have different lengths. This can happen for multiple reasons. First, we chose to
stop observing a sequence on the date it was reported in USLW. The reason for doing so is that we wanted
the method for coding a decision to be consistent within sequences. Second, the sequence may be
terminated because the Supreme Court took up the legal question addressed in the sequence and
resolved it, though we do not have the Supreme Court decision in the sequence.14 Although these
truncations may potentially cause selection in the distributions of A’s and B’s over positions in a
sequence, we find little evidence of this. First, we regressed an indicator for A decisions on sequence length and found that the coefficients on sequence length (coef=0.019, se=1.21, 0.772) and length squared (-0.000, 0.14) were small and insignificant. Second, as a precaution, we either include sequence fixed or random effects or include sequence length and length squared as regressors in our regression analysis where possible.
B. Random order of courts in a sequence
A critical assumption of all our models is that courts or circuits are chosen in random order to answer a
legal question. If this assumption were violated, then it could be that judges do not consider outcomes
in prior cases but decisions are not independently distributed over time because, e.g., litigants seek out
courts where outcomes are likely to be similar to those in prior cases. Likewise, it is possible that judges
do consider outcomes in prior cases but that decisions look independently distributed because, e.g.,
litigants seek out courts where outcomes are likely to differ from prior cases. In other words, random
12 This matching is imperfect, in some cases because our records from Google Scholar are imperfect and in other cases because data on the deciding judges are not in the FJC data. For example, some cases are decided by state supreme courts and those judges are not in the FJC data, which only contain data on federal judges. Where the matching is incomplete, our regression sample size may be lower than the total number of sequences or cases in the data.
13 Appendix Table 8 and Table 9 do the same for the distribution of cases across circuits and time.
14 There are two exceptions, both fully reported by USLW. One sequence started with a Supreme Court case and had 1 divergent circuit court case; we dropped this observation. Another sequence ended with a Supreme Court case; we kept the sequence but dropped the Supreme Court case because the latter resolved the circuit split. Many other sequences with splits may have ultimately been resolved by the Supreme Court, but we purposely did not observe that resolution to maintain a simple and consistent sampling strategy.
court ordering makes it more likely that correlation in case outcomes identifies judicial learning rather
than other mechanisms that may influence judicial decisions.
If court order in each sequence were random, then the probability of observing court 𝑘 in slot 𝑚 in a
sequence would be the same as the probability of observing court 𝑘 across all slots – anywhere – in that
sequence. In other words, if court order were random, you would be no more likely to see the 9th circuit
in the first slot in a sequence than in the third or sixth slot. To test our assumption of random order, we conduct a non-parametric, bootstrap-derived test of whether the probability of observing a decision from court 𝑘 in slot 𝑚 is significantly different from the probability of observing court 𝑘 in any slot.
To implement the test, we note that each sequence of cases is actually a sequence of courts that hear a
legal question. For each of these sequences of courts, we draw a new sequence of courts (without
replacement), i.e., we randomly reorder the courts in each sequence.15 We do this 5000 times for each
sequence, resulting in 5000 drawn court sequences for each actual sequence. We assemble the n-th
draw on each sequence into a data set representing the n-th draw on the set of all sequences, resulting
in 5000 draws on the entire set of sequences. Then, for each court and slot combination, we look at the
fraction of draws on the set of all sequences that have fewer cases from that court in that slot. If the
fraction is below 0.025 or above 0.975, then we can reject random order with 95% confidence in a two-sided test.
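The following sketch (an assumed implementation, not the authors' code; the `sequences` input is hypothetical) outlines the permutation logic:

```python
# A sketch (assumed implementation) of the random-order test: compare actual
# court-slot counts against 5000 within-sequence reshuffles. `sequences` is
# a list of court sequences, e.g., [[9, 11, 5, 7, 4], [3, 6, 8], ...].

import random
from collections import Counter

def court_slot_counts(seqs):
    """Count occurrences of each (court, slot) pair."""
    c = Counter()
    for seq in seqs:
        for slot, court in enumerate(seq, start=1):
            c[(court, slot)] += 1
    return c

def random_order_test(sequences, n_draws=5000, seed=1):
    rng = random.Random(seed)
    actual = court_slot_counts(sequences)
    below = Counter()  # draws with fewer cases than actual, per (court, slot)
    for _ in range(n_draws):
        shuffled = [rng.sample(seq, len(seq)) for seq in sequences]
        drawn = court_slot_counts(shuffled)
        for key, n in actual.items():
            if drawn[key] < n:
                below[key] += 1
    # Reject random order for a pair if its fraction is < 0.025 or > 0.975.
    return {key: below[key] / n_draws for key in actual}
```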
Table 3 reports the fraction of draws that have fewer cases from court k in slot m than the actual data.
No court-slot combinations in the actual data are significantly more or less likely than from randomly
drawn data.16 Thus we cannot reject random ordering with 95% confidence. This implies that it is
reasonable to use our data to test our models of learning.
C. Prediction 1: Decisions are independent of one another
We first test the prediction that decisions are independent. Following Smeeton & Cox (2003), our test
employs a non-parametric, bootstrap-derived runs test much like the test for random ordering of
courts.17 Runs are a sequence of 2 or more of the same opinion. E.g., the sequence AAABA has 1 run –
AAA and the length of this run is 3. Independent decisions have runs; however, dependent decisions
may have more or less runs and generally have longer runs. One can test if decisions are independent or
15 E.g., if the actual sequence of courts that hears a legal question is 3-6-8-9-11, where each element is the circuit name, one would draw 5 courts. There would be an equal probability of drawing 3 or 6 or 8 or 9 or 11 for the first court. If you drew, e.g., 8, then there would be an equal probability of drawing 3 or 6 or 9 or 11 for the second court. And so on. One would continue until the last court was drawn. None of the courts drawn would be other than 3 or 6 or 8 or 9 or 11; and no circuit would be drawn twice – so this is like sampling without replacement.
16 The closest combinations to significance are the 9th and Federal Circuits in the 9th slot, which is more likely in the actual data than in 10.7% of draws, implying only 21% significance in a 2-sided test. Moreover, with a multiple testing adjustment, we cannot even reject that the 9th and Federal Circuits in the 9th slot are random at 21% confidence.
17 Because we have a large number of small sequences of variable length, we cannot employ, e.g., the Wald-Wolfowitz approximation (Wald and Wolfowitz 1940) for the distribution of runs in a sequence. Even exact distributions are not helpful because many of the sequences are so small that there are fewer than 20 combinations and so we can never achieve 95% confidence that the data are not independent.
not by testing whether data on actual decision sequences have significantly more runs than data with independent decision sequences.
In order to implement our runs test, we randomly reorder the decisions (A, B, C, D, etc.) in each sequence in the actual data.18 After reordering, we count the total number of runs and the average
length of runs across the reordered sequences. We repeat this 5000 times to generate the empirical
distribution of number and average length of runs if decisions were independent of one another. The
mean and standard deviation of the number of runs in the empirical distribution are 745.9 and 13.4, respectively. We are able to reject that the actual data, with 785 runs, are independent with a p-value of 0.0002. The mean and standard deviation of the average length of runs in the empirical distribution are 2.60 and 0.023, respectively. We are able to reject that the actual data, with runs of average length 2.67,
are independent with a p-value of <0.0001. Thus, we are able to reject in our data that the decisions in
a sequence are independent of one another.19
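A sketch of the runs test (an assumed implementation; sequence strings and parameter values are illustrative) is below:

```python
# A sketch (assumed implementation) of the bootstrap runs test: runs are
# maximal stretches of 2+ identical decisions; within-sequence reshuffles
# mimic independent decisions.

import random

def run_lengths(seq):
    """Lengths of all runs (2 or more identical consecutive decisions)."""
    out, i = [], 0
    while i < len(seq):
        j = i
        while j < len(seq) and seq[j] == seq[i]:
            j += 1
        if j - i >= 2:
            out.append(j - i)
        i = j
    return out

def runs_test(sequences, n_draws=5000, seed=1):
    rng = random.Random(seed)
    actual = [r for s in sequences for r in run_lengths(s)]
    n_actual, mean_actual = len(actual), sum(actual) / len(actual)
    more = longer = 0
    for _ in range(n_draws):
        drawn = ["".join(rng.sample(s, len(s))) for s in sequences]
        rr = [r for s in drawn for r in run_lengths(s)]
        more += len(rr) >= n_actual
        longer += (sum(rr) / len(rr) if rr else 0) >= mean_actual
    # One-sided p-values: share of independent draws at least as extreme.
    return more / n_draws, longer / n_draws

print(runs_test(["AABAB", "AAAB", "ABBBA"], n_draws=1000))
```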
D. Prediction 2: No splits after slot 2.
18 Random reordering is the same as sampling without replacement. E.g., if the actual sequence is AABAB, one would draw 5 decisions; there would be a 3/5 probability the first decision was A; if it was, then there would be a ½ probability the second would be an A; if it was a B there would be a 2/3 probability that the next decision would be an A, and so on, until one drew a reordered sequence of 5 cases.
Notice that we only reorder cases within each sequence, not cases across sequences. Because the legal questions that the sequences address are different, the probability of the different possible answers to each question varies, i.e., the probability of observing the answer arbitrarily labeled A varies across sequences. Reordering decisions only within sequences preserves the number of A’s (and thus the probability of A decisions) in each sequence, allowing the probability of A decisions to vary across sequences.
19 Our test is unconditional on any covariates. Because we only reorder cases within a sequence, this approach allows the probability of A decisions to vary across sequences, equivalent to a random effects model. However, we do not allow correlations across sequence-level means (i.e., fixed effects), let alone condition on any other covariate, e.g., sequence length, circuit dummies, political party of the authoring judge, etc. Conducting a runs test controlling for covariates, e.g., by regressing the slot 𝑚 decision on decisions in previous slots and covariates as in Klaasen & Magnus (2001), is challenging because of the dynamic nature of the model we wish to test and the need to include random or fixed effects to allow the probability of A decisions to vary across sequences. With short length 𝑇 (in our case average sequence length is 3.97 in the censored data), the random or fixed effect generates endogeneity (Nickell 1981). One can use IV methods as in Anderson & Hsiao (1981) and GMM methods as in Holtz-Eakin, Newey and Rosen (1988) and Arellano & Bond (1991) to address the problem, but the challenge is that the error term has to have an AR(𝑝) structure with finite 𝑝. However, because the private information model implies that a decision can depend on all prior decisions in a sequence, it is not possible to assume 𝑝 is finite. Moreover, even if 𝑝 were finite, the longer it is, the fewer the IVs that are available, since they must extend beyond the range of the error term’s AR structure. Given the short average length of our sequences, this too is a critical problem.
We have devised a way to address the lack-of-IVs problem with a double difference method. First, we difference over slots within a sequence to eliminate fixed or random effects. Second, and more creatively, we match sequences with identical histories and then difference across matched sequences, which eliminates the lagged decisions from the regression equation. Only differences in covariates across matched sequences remain as regressors, and we can estimate the coefficients on those regressors. We do not present our results because they do not address our core concern: do current decisions depend on past decisions? Those regressors were differenced out.
The learning model with private information and constant quality signals predicts that there should be no first splits observed in slot 3 or later (i.e., following any two consecutive A decisions, every subsequent decision should be an A). In our data we do observe first splits in slots 3 or later. To determine if this is just random error, we employ Fisher exact tests on the probability of first observing a split in slot 𝑚 or later, with 𝑚 > 2. As the first panel in Table 6 shows, the probability of a first split is significantly different from zero for 𝑚 from 3 to 8.
An issue with this test is that, because we identify our sequences from USLW’s circuit split roundup, our sample only includes sequences with a split. This means that the probability of a first split in slot 𝑚 or later is lower in the set of all sequences than in a set of sequences with splits at some point. That said, it is also the case that if we sampled cases without splits, we would have a larger sample size, and so our estimates of first split probabilities would be more precise, and thus perhaps still significantly different from 0. One way to address this concern is to sample actual sequences without splits. That is quite costly, and there is no obvious, non-arbitrary method of selecting non-split sequences. Our solution is to generate pseudo sequences without splits, since we know the decisions in all cases in those sequences. First, we randomly draw a number for sequence length from the set of actual sequence lengths in our data. Second, we add to the data a sequence with that length but with only A decisions in each slot. We repeat this enough times to generate augmented samples where sequences with splits are 50%, 20%, 10%, 5% and 1% of the total sample of real plus pseudo sequences. For each augmented sample, we again conduct Fisher tests of the probability of observing a first split in slot 𝑚 or later, with 𝑚 > 2. As Table 6 shows, the results do not change. There are a statistically significant number of splits after slot 2.
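The augmentation step itself is straightforward; a sketch (assumed implementation, toy inputs) follows:

```python
# A sketch (assumed implementation) of the pseudo-sequence augmentation: pad
# the split-only sample with no-split (all-A) sequences whose lengths are
# drawn from the empirical length distribution, until splits make up a
# target share of the augmented sample.

import random

def augment_with_no_splits(split_seqs, split_share, seed=1):
    rng = random.Random(seed)
    lengths = [len(s) for s in split_seqs]
    n_pseudo = round(len(split_seqs) * (1 - split_share) / split_share)
    pseudo = ["A" * rng.choice(lengths) for _ in range(n_pseudo)]
    return split_seqs + pseudo

# E.g., a 10% split share; the exact first-split tests are then run on the
# augmented sample, as in the text.
augmented = augment_with_no_splits(["AAB", "ABBAB", "AAAAB"], split_share=0.10)
print(len(augmented))  # 3 real + 27 pseudo = 30 sequences
```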
E. Predictions 3’ v. 4: Balanced history test
Our next test examines the prediction that, if the private signals that courts receive vary in quality,
judges in cases with a balanced history will place more weight on the opinion in the immediately prior
case than those in earlier cases. The econometrician can test this prediction with data on decisions by
odd-slot courts so long as the odd-slot judge has an equal probability of receiving an a or b signal. If we select a large enough sample of cases with only two possible outcomes and with balanced histories, the law of large numbers gives us confidence that this is the case in our sample. So we begin by assembling samples of 312, 80 and 24 sequences that never had a C decision and with balanced histories of length 2, 4 and 6, respectively.20
In these samples we conduct unconditional (Fisher) exact tests of the hypothesis that the probability of a court with a balanced history issuing a decision that is the same as the decision in the immediately prior case is 0.5. As reported in Table 4, we find that the probability of following is greater than 0.5 for each of these balanced history lengths. However, only cases with balanced histories of length 2 have a probability of following that is significantly different from 0.5. Those with balanced histories of length 4
20 This is meant to focus the sample on sequences that answer questions with only 2 possible answers. Of course, it is possible that even these sequences contain cases where judges receive a 𝑐 signal. But that simply means our test is conservative. If the probability of receiving an 𝑎 or 𝑏 signal is less than one, say 𝑞 < 1, then the law of large numbers implies the expected probability of an 𝑎 (or 𝑏) signal in our balanced history sample is 𝑞/2 < ½. So if we find that in fact courts follow the immediately prior case more than ½ the time (which we do), then it is certainly the case that they do so more than 𝑞/2 the time.
are marginally insignificant at the 90% level. The insignificance of results for cases with balanced history lengths greater than 2 may be due either to sampling error with small sample sizes or to less confidence that there was an equal probability of the court receiving an a or b signal in these small samples.
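For the length-2 case, the exact test can be sketched as follows (an assumed implementation; we use scipy's exact binomial test as a stand-in for the exact test reported in the text, and the sample is toy data):

```python
# A sketch (assumed implementation) of the balanced-history test: among
# sequences whose first k slots are balanced, test whether the next court
# follows the immediately prior decision with probability 0.5.

from scipy.stats import binomtest

def follows_last(seq, k):
    """Does the decision in slot k+1 match the decision in slot k,
    given a balanced history in the first k slots?"""
    hist = seq[:k]
    assert hist.count("A") == hist.count("B") and len(seq) > k
    return seq[k] == seq[k - 1]

sample = ["ABB", "ABA", "ABB", "BAA", "BAA", "ABB"]  # toy data, k = 2
n_follow = sum(follows_last(s, 2) for s in sample)
print(binomtest(n_follow, n=len(sample), p=0.5, alternative="greater"))
```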
To address the sample size problem and to check robustness we also conducted regression analysis to
test the balanced history prediction. This allows us to pool the 2, 4, and 6 length balanced history
sequences to improve sample size. To address the fact that some sequences can be found in the
regression sample multiple times because they were balanced after 2, then 4 and then even perhaps 6
cases, we weighted cases in inverse proportion to how often their sequence appeared in our regression
sample. We included a constant and indicators for cases with 2, 4, and 6 length balanced histories and
relevant covariates. As Table 5 shows, we find that the probability of following is significantly greater
than 0.5 for each case history when we include no covariates. The results are robust to the inclusion of different sets of covariates, including party affiliation and dissents, and largely robust even to circuit dummies.
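A sketch of this pooled, weighted regression (assumed implementation; the data frame below is toy data) is:

```python
# A sketch (assumed implementation) of the pooled balanced-history
# regression: rows are cases with balanced histories, weighted by the
# inverse of how often their sequence appears in the sample.

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "seq_id":   [1, 1, 2, 3, 3, 4, 5],
    "hist_len": [2, 4, 2, 2, 4, 2, 2],
    "follow":   [1, 1, 1, 0, 1, 1, 0],
})
df["weight"] = 1.0 / df.groupby("seq_id")["seq_id"].transform("size")

# With history-length dummies and no constant, each coefficient is the
# weighted probability of following for that history length (test vs. 0.5).
X = pd.get_dummies(df["hist_len"], prefix="len").astype(float)
wls = sm.WLS(df["follow"], X, weights=df["weight"]).fit()
print(wls.params)
```

Note the design choice of dropping the constant: the length dummies then exhaust the sample, so each coefficient can be compared directly against 0.5 (the text's specification with a constant and indicators is a reparameterization of the same design).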
F. Prediction 5: AR(p) structure of opinions.
We test whether judicial opinions convey all the information from the case they decide as well as from
all prior cases on the same legal question by testing whether opinions are autocorrelated. If opinions
convey full information in the manner just mentioned, then it should be that opinions have an AR(1)
structure and no more. Specifically, it must be that the opinion in slot 𝑚 is uncorrelated with the opinion in slot 𝑚 − 2 conditional on the opinion in slot 𝑚 − 1.
The challenge with executing this test is that, while we can certainly read each opinion, it is difficult to quantify the information – specifically the judge’s posterior belief that the right answer to a question is, e.g., A – in an opinion. All we observe is whether the decision is, e.g., A, whether there was a dissent, or how many decisions preceded the current one. This means we have multiple problems. First, we may observe something with fewer dimensions than the information (moments of the judge’s posterior) in opinions. E.g., maybe the decision measures the mean of the posterior while the existence of a dissent tells us about the mean (relative to 0.5) and variance of the posterior. Moreover, even when we have measures of certain moments, there is measurement error. E.g., the decision only tells us if the mean of the posterior on A is greater than 0.5, not what the actual mean is.
We propose to address the problem in two ways. First, we estimate vector autoregressions on a vector that includes binary variables indicating whether the decision in a case is A or not and whether there was a dissent or not. (To keep matters simple, we employ a sample in which sequences are censored just before a C is observed.) VARs partly address the dimensionality problem because they include multiple observed proxies for the posterior beliefs reflected in opinions. However, there remains a risk of measurement error, which prevents identification.21 Therefore, instead of focusing on coefficients and their significance, we focus on goodness of fit. Although there is no natural goodness of fit measure for VARs, there are indirect measures: various information criteria built upon the J-statistic (a test of the null that the number of moment restrictions is correct) and used for model selection. Table 7 presents the model selection statistics. The various information criteria (Bayesian,
21 Indeed, our analysis of the probability limit of the coefficient from an AR(1) regression involving just decisions implies that we cannot even sign the bias on the OLS coefficient.
Akaike, and Hannan-Quinn) suggest the 1-lag model is the most economical fit. The conclusion is reinforced by the J-statistic, which fails to reject the null of correct moment restrictions even for the 1-lag model, though it also fails to do so for higher-lag models. Thus, we tentatively conclude that the VAR analysis does not reject the hypothesis that opinions convey full information.
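A sketch of the lag-selection step (assumed implementation; the toy series below does not reproduce the paper's pooling of many short sequences or its J-statistic-based criteria, for which statsmodels' standard information criteria stand in) is:

```python
# A sketch (assumed implementation) of VAR lag selection on a two-variable
# series: a decision indicator (A = 1) and a dissent indicator per slot.

import numpy as np
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(0)
# Toy stacked data: one row per case, columns = (decision, dissent).
y = np.column_stack([
    rng.integers(0, 2, size=200),  # decision indicator
    rng.integers(0, 2, size=200),  # dissent indicator
]).astype(float)

sel = VAR(y).select_order(maxlags=4)
print(sel.summary())  # AIC / BIC / HQIC; the paper reports 1 lag as best fit
```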
Our conclusion is tentative because it is possible that measurement error corrupts the model selection
criteria. The only way that would not be the case is if, e.g., measurement error in the slot m decision is
uncorrelated with the measurement error in the slot 𝑚 − 𝑘 decisions, 𝑘 > 0. Although that does not
appear to be an unreasonable assumption, we have no way to validate that.
To address measurement error more directly, we approach the AR test a second way, with tetrachoric correlations between the slot 𝑚 and the slot 𝑚 − 2 decisions, conditioned on the slot 𝑚 − 1 decision.
Note that whether a judge decides A depends, e.g., on her unobserved posterior. If the mean of her
posterior on A is > 0.5, then she decides A; otherwise she does not. In other words, we have a latent
variable model with a 0.5 cutoff. The tetrachoric correlation between two indicators based on latent
variables is the correlation between the two latent variables on which they are based (Pearson, 1901).
Thus, the tetrachoric correlation between two decisions, even in the same sequence, is the correlation
between the two posterior means on which they are based.
The problem with simply calculating the tetrachoric correlations between the decision in slot 𝑚 and the
decision in slot 𝑚 − 2 is that it is not conditioned on the decision in slot 𝑚 − 1, as is required for a test
of the full information model. To address this, we add two steps. First, we demean the decision data
within each sequence to account for the possibility that the probability of an A decision varies across
sequences. E.g., if a sequence is AABAB, which we code as 11010, we subtract the mean of the sequence from each element in the sequence, making it 0.4, 0.4, -0.6, 0.4, -0.6. Second, we regress the
demeaned decision in slot m on the demeaned decision in slot m-1. (We can also add whether there
was dissent in decision m-1 as a regressor.) We then calculate the residual, which captures all
information in the demeaned decision at m that is orthogonal to observed decision and perhaps the
existence of a dissent at time m-1. Finally, we calculate the tetrachoric correlation between the residual and the decision at time m-2, which in turn reveals the correlation between the opinion at m (controlling for the decision and perhaps existence of dissent at time m-1) and the opinion at time m-2.
The one weakness of this approach is that the residual contains information from the opinion at time
m-1 that was not captured by the decision and dissent at time m-1. We are still in the process of
estimating this conditional tetrachoric correlation.
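A sketch of the pipeline up to the final step (assumed implementation; a plain Pearson correlation stands in here for the tetrachoric correlation the text describes, which is still being estimated) is:

```python
# A sketch (assumed implementation) of the conditional correlation pipeline:
# demean decisions within each sequence, regress the slot-m decision on the
# slot-(m-1) decision, and correlate the residual with the slot-(m-2)
# decision (Pearson correlation as a stand-in for the tetrachoric step).

import numpy as np

def conditional_corr(sequences):
    """sequences: list of 0/1 decision lists, one per sequence."""
    rows = []
    for s in sequences:
        s = np.asarray(s, dtype=float)
        d = s - s.mean()                      # demean within the sequence
        for m in range(2, len(s)):
            rows.append((d[m], d[m - 1], s[m - 2]))
    y, ylag, y2 = map(np.array, zip(*rows))
    b = np.dot(ylag, y) / np.dot(ylag, ylag)  # slope of y on its first lag
    resid = y - b * ylag                      # strip the slot m-1 information
    return np.corrcoef(resid, y2)[0, 1]       # correlate with slot m-2

print(conditional_corr([[1, 1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 1, 1, 1]]))
```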
Conclusion
[To be completed.]
References
Baker, Scott and Claudio Mezzetti. 2012. “A Theory of Rational Jurisprudence.” Journal of Political
Economy, 120(3), pages 513 – 551
Banerjee, Abhijit. 1992. "A Simple Model of Herd Behavior." Quarterly Journal of Economics, vol. 107,
pp. 797-817.
Bikhchandani, Sushil, Hirshleifer, David and Ivo Welch. 1992. “A Theory of Fads, Fashion, Custom, and
Cultural Change as Informational Cascades.” Journal of Political Economy, vol. 100, pp. 992-1026
Danziger, Shai, Jonathan Levav, and Liora Avnaim-Pesso. 2011. "Extraneous Factors in Judicial
Decisions." Proceedings of the National Academy of Sciences, vol. 108, pp. 6889-6892.
Daughety, Andrew and Jennifer Reinganum. 1999. "Stampede to Judgment: Persuasive Influence and
Herding Behavior by Courts." American Law and Economics Review, vol. 1, pp. 158-189.
Epstein, Lee and Jack Knight. 2013. “Reconsidering Judicial Preferences.” Annual Review of Political
Science, vol. 16, pp. 19.1-19.21.
Epstein, Lee, William M. Landes, and Richard Posner. 2013. The Behavior of Federal Judges: A
Theoretical and Empirical Study of Rational Choice. Cambridge, Mass.: Harvard University Press.
Lax, Jeffrey R. 2011. "The New Judicial Politics of Legal Doctrine." Annual Review of Political Science,
vol. 14, pp. 131-157.
Leiter, Brian. 2003. “American Legal Realism.” The Blackwell Guide to Philosophy of Law and Legal
Theory; W. Edmundson and M. Golding, eds. Oxford: Blackwell.
Pritchett, C. Herman. 1948. The Roosevelt Court: A Study in Judicial Politics and Values, 1937-1947.
New York: Macmillan.
Pritchett, C. Herman. 1941. "Divisions of Opinion Among Justices of the U.S. Supreme Court, 1939-
1941." American Political Science Review, vol. 35, pp. 890-___.
Segal, Jeffrey A. and Harold J. Spaeth. 2002. The Supreme Court and the Attitudinal Model Revisited.
New York: Cambridge University Press.
Spitzer, Matthew and Eric Talley. 2013. "Left, Right, and Center: Strategic Information Acquisition and
Diversity in Judicial Panels." Journal of Law, Economics, and Organization, vol. 29, pp. 638-680.
Sunstein, Cass R., David Schkade, Lisa M. Ellman, and Andres Sawicki. 2006. Are Judges Political? An
Empirical Analysis of the Federal Judiciary. Washington, D.C.: Brookings Institution Press.
Talley, Eric. 1999. “Precedential Cascades: An Appraisal.” Southern California Law Review, vol. 73, pp.
87-__.
Tiller, Emerson and Frank Cross. 2006. "What is Legal Doctrine?" Northwestern University Law Review,
vol. 100, pp. 517-534.
TABLES
Table 2. Summary statistics on various covariates for sequences of federal appellate cases, by how decisions in the sequence
were coded.

                                           USLW-coded   Hand-coded   Difference   P-value   All
                                           decisions    decisions
Sequence length, all decisions             3.875        4.114        0.239        0.240     3.970
  (no. of decisions)                       (1.878)      (2.075)                             (1.945)
Sequence length, A and B                   3.704        4.083        0.379        0.032     3.848
  decisions (no.)                          (1.800)      (2.069)                             (1.896)
Sequences truncated because of C           0.094        0.015        0.079        0.000*    0.072
  decisions (fraction)                     (0.292)      (0.122)                             (0.259)
Duration (years)                           12.310       10.290       2.020        0.184     11.750
                                           (10.87)      (8.05)                              (10.21)
"A" decisions (fraction of cases)          0.554        0.536        0.018        0.889     0.549
                                           (0.161)      (0.182)                             (0.167)
Opinion by Republican                      0.591        0.575        0.016        0.449     0.586
  judges (fraction)                        (0.306)      (0.309)                             (0.307)
Observations (no. of sequences)            713          264                                 977

Note: All sequences are identified by USLW, but decisions may be coded by USLW or a
research assistant (hand-coded). Table provides summary statistics by how decisions are
coded. Cells contain means with standard deviations in parentheses. P-values are
calculated using the Wilcoxon rank sum test for continuous variables and Fisher's exact
test for binary variables. * indicates significance with 95% confidence in a 2-sided test.
Table 3. Non-parametric, bootstrap-based test of hypothesis that courts are chosen in random order.

                          Slot (cells contain probabilities)
Court          1       2       3       4       5       6       7       8       9
Circuit 1      0.278   0.256   0.554   0.645   0.706   0.710   0.744   0.540   0.200
Circuit 2      0.730   0.723   0.440   0.254   0.371   0.357   0.288   0.369   0.208
Circuit 3      0.507   0.492   0.480   0.478   0.501   0.442   0.465   0.432   0.113
Circuit 4      0.202   0.206   0.463   0.673   0.751   0.744   0.744   0.668   0.379
Circuit 5      0.535   0.562   0.501   0.442   0.418   0.427   0.457   0.457   0.451
Circuit 6      0.475   0.468   0.501   0.476   0.632   0.545   0.346   0.520   0.432
Circuit 7      0.495   0.491   0.538   0.518   0.545   0.385   0.437   0.339   0.444
Circuit 8      0.431   0.426   0.540   0.572   0.509   0.341   0.524   0.590   0.455
Circuit 9      0.854   0.863   0.539   0.276   0.140   0.133   0.152   0.165   0.107
Circuit 10     0.292   0.301   0.465   0.625   0.629   0.763   0.701   0.725   0.301
Circuit 11     0.432   0.434   0.455   0.593   0.538   0.484   0.577   0.614   0.444
DC Circuit     0.425   0.421   0.458   0.534   0.505   0.714   0.481   0.551   0.447
Fed Circuit    0.557   0.553   0.647   0.279   0.448   0.444   0.572   0.322   0.107
State Court    0.308   0.298   0.366   0.493   0.520   0.821   0.753   0.958   0.108

Note: Table reports the fraction of draws in which the number of times a given circuit court is in a given slot
(across all sequences) is less than the number of times that circuit is in that slot in the actual data (across
all sequences). We made 5,000 draws. In each draw, we reorder the courts in each actual sequence at
random. Draws do not reallocate courts across sequences.
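The reordering procedure described in the note could be implemented as in the following sketch, where sequences is a hypothetical list of lists of court identifiers (one list per sequence) and courts is the set of court labels.

```python
# Sketch of the permutation test in Table 3: within each draw, shuffle the
# order of courts within every sequence, then count how often each court
# lands in each slot. `sequences` and `courts` are hypothetical inputs.
import numpy as np

def permutation_slot_test(sequences, courts, n_draws=5000, max_slot=9, seed=0):
    rng = np.random.default_rng(seed)

    def slot_counts(seqs):
        counts = {(c, s): 0 for c in courts for s in range(1, max_slot + 1)}
        for seq in seqs:
            for s, c in enumerate(seq[:max_slot], start=1):
                counts[(c, s)] += 1
        return counts

    actual = slot_counts(sequences)
    below = {key: 0 for key in actual}
    for _ in range(n_draws):
        shuffled = [list(rng.permutation(seq)) for seq in sequences]
        draw = slot_counts(shuffled)
        for key in actual:
            below[key] += draw[key] < actual[key]
    # Fraction of draws with fewer occurrences than the actual data.
    return {key: below[key] / n_draws for key in actual}
```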
Table 4. Unconditional, exact tests of balanced history prediction.

                                 Balanced history length
                                 2        4        6
Follow last case (fraction)      0.599    0.600    0.583
Std error                        0.0015   0.006    0.021
P-value v. 1/2                   0.001    0.090    0.540
Obs.                             312      80       24

Notes: Sequences tested are sequences that never had a C, D, E, or F
decision. Table calculates the probability that a case follows the decision in
the case immediately prior to it, given that the history of cases prior to
the case is balanced. If the balanced history length is 2/4/6, then the
first 2/4/6 cases in the sequence consist of pairs of cases with equal
numbers of A and B decisions, and we examine whether the case
in position 3/5/7 follows that in position 2/4/6.
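A minimal sketch of the calculation in this table, assuming sequences is a hypothetical list of decision strings over {A, B} (e.g., "ABAB"):

```python
# Sketch of the unconditional balanced-history test (Table 4): among
# sequences whose first k cases are balanced (each consecutive pair
# contains one A and one B), compute how often case k+1 follows case k.
def follow_rate(sequences, k):
    def balanced(prefix):
        return all(len(set(prefix[i:i + 2])) == 2 for i in range(0, len(prefix), 2))

    eligible = [s for s in sequences if len(s) > k and balanced(s[:k])]
    if not eligible:
        return float("nan"), 0
    follows = [s[k] == s[k - 1] for s in eligible]  # case k+1 matches case k
    return sum(follows) / len(follows), len(follows)
```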
Table 5. Pooled balanced history test with covariates.

Dependent variable: Decision follows prior case

                              (1)        (2)        (3)        (4)        (5)        (6)
History length 2              -0.154*    -0.227***  -0.227***  -0.228***  -0.229***  -0.209**
                              (0.082)    (0.086)    (0.086)    (0.086)    (0.086)    (0.087)
History length 4              -0.093*    -0.106*    -0.106*    -0.106*    -0.106*    -0.119**
                              (0.054)    (0.058)    (0.059)    (0.059)    (0.059)    (0.059)
History length 6              -0.097     -0.106     -0.098     -0.099     -0.100     -0.114
                              (0.067)    (0.072)    (0.073)    (0.073)    (0.073)    (0.073)
Constant                      0.798***   0.875***   0.873***   0.872***   0.870***   0.995***
                              (0.086)    (0.098)    (0.103)    (0.103)    (0.103)    (0.179)
F-test results (p-values)
History length 2              0.000      0.001      0.005      0.006      0.008      0.063
History length 4              0.007      0.002      0.004      0.005      0.005      0.037
History length 6              0.046      0.019      0.022      0.024      0.026      0.046
Joint test                    0.000      0.001      0.010      0.012      0.014      0.206
Controlled covariates
Party                                    x          x          x          x          x
Avg. vote of prior judges
  from same party                                   x          x          x          x
Fraction of prior A
  decisions with dissent                                       x          x          x
Fraction of prior B
  decisions with dissent                                                  x          x
Circuit dummy                                                                        x
Observations                  443        387        379        379        379        379
R2                            0.018      0.027      0.026      0.027      0.027      0.092

Notes: Table tests the probability that a case follows the decision in the case immediately prior to it, given
that the history of cases prior to the case is balanced. The sample only includes cases with balanced
histories, i.e., each pair of cases in the history contains an equal number of A's and B's (e.g., the history is
AB, ABAB, ABBA, ABABAB, ABBAAB, etc.). The sample excludes sequences with any C, D, etc. decisions. If
the balanced history length is 2/4/6, then the first 2/4/6 cases in the sequence consist of pairs of cases
with equal numbers of A and B decisions, and we regress whether the case in position 3/5/7 follows that
in position 2/4/6 on the indicated covariates. Observations are cases with balanced histories, weighted in
inverse proportion to the number of times the sequence from which the observation is drawn appears in
the sample. ***/**/* indicates p<0.01/0.05/0.1.
Table 6. Test for timing of first split on split-sequence-only and augmented samples.

                                                      Slot
                                  3 or     4 or     5 or     6 or     7 or     8 or     9 or
                                  higher   higher   higher   higher   higher   higher   higher
Split Sequences Only
First split (fraction of cases)   0.187    0.113    0.090    0.083    0.072    0.060    0.120
Std err                           0.000    0.000    0.000    0.001    0.001    0.003    0.013
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.029    0.117
Total number of sequences         663      446      283      179      110      59       18
Split Sequences 50% of Sample
First split (fraction of cases)   0.095    0.058    0.047    0.043    0.039    0.032    0.057
Std err                           0.000    0.000    0.000    0.000    0.001    0.001    0.004
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.030    0.121
Total number of sequences         1316     881      549      353      203      105      37
Split Sequences 20% of Sample
First split (fraction of cases)   0.037    0.022    0.017    0.016    0.014    0.011    0.024
Std err                           0.000    0.000    0.000    0.000    0.000    0.000    0.001
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.031    0.124
Total number of sequences         3322     2247     1434     929      577      327      91
Split Sequences 10% of Sample
First split (fraction of cases)   0.018    0.011    0.009    0.008    0.007    0.006    0.013
Std err                           0.000    0.000    0.000    0.000    0.000    0.000    0.000
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.031    0.124
Total number of sequences         6670     4518     2871     1831     1125     607      169
Split Sequences 5% of Sample
First split (fraction of cases)   0.009    0.006    0.004    0.004    0.004    0.003    0.006
Std err                           0.000    0.000    0.000    0.000    0.000    0.000    0.000
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.031    0.125
Total number of sequences         13266    8889     5673     3629     2238     1179     365
Split Sequences 1% of Sample
First split (fraction of cases)   0.002    0.001    0.001    0.001    0.001    0.001    0.001
Std err                           0.000    0.000    0.000    0.000    0.000    0.000    0.000
Fisher's test against 0           0.000    0.000    0.000    0.000    0.000    0.031    0.125
Total number of sequences         66363    44846    28505    18082    10989    5915     1785

Notes: Table reports the probability of a first split, by case position in a sequence. Sequences
other than split sequences are all-A decision sequences, generated by randomly selecting a
split sequence and creating an all-A sequence of the same length. We use Fisher's exact test
against a null of 0. We do not report numbers for positions above 9 because there is no first
split case in any of those positions.
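The augmentation step described in the note could be sketched as follows, assuming split_sequences is a hypothetical list of decision strings and target_share is the desired fraction of split sequences in the augmented sample; the exact sample sizes in the table reflect our actual data rather than this illustration.

```python
# Sketch of the sample-augmentation step in Table 6: pad the split sequences
# with synthetic all-A sequences until splits make up the target share.
import numpy as np

def augment(split_sequences, target_share, seed=0):
    rng = np.random.default_rng(seed)
    n_split = len(split_sequences)
    n_all_a = int(n_split * (1 - target_share) / target_share)
    synthetic = []
    for _ in range(n_all_a):
        # Randomly pick a split sequence and mirror its length with all A's.
        template = split_sequences[rng.integers(n_split)]
        synthetic.append("A" * len(template))
    return list(split_sequences) + synthetic
```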
Table 7. Model selection results for VAR models for data on decisions and dissents.

Lag   CD      J       J p-value   MBIC      MAIC      MQIC
1     0.732   7.109   0.850       -63.950   -16.891   -35.578
2     0.743   3.175   0.923       -44.198   -12.825   -25.283
3     0.579   1.433   0.838       -22.253   -6.567    -12.796

Notes: Table presents the J-statistic and various information criteria to discriminate between
VAR models (1, 2, or 3 lags) for a vector that includes the decision in a case and whether the
case has a dissent.
APPENDIX TABLES
Table 8. Distribution of cases in sequences across circuit courts, by how decisions in the sequence were coded.

           No USLW     With USLW
Court      citation    citation     All
1          5.68        5.20         5.34
2          7.92        9.61         9.12
3          6.34        8.09         7.58
4          8.57        7.86         8.07
5          8.67        10.36        9.87
6          8.67        8.24         8.36
7          10.90       10.29        10.47
8          8.95        7.14         7.66
9          12.40       14.01        13.54
10         8.85        6.19         6.96
11         8.29        7.93         8.04
Fed        0.00        0.95         0.67
DC         3.91        3.19         3.40
State      0.84        0.95         0.92
Total      100         100          100

Note: Table lists the percent of cases in the data from each circuit or
state court.
Table 9. Distribution of sequences by date of last observed case in sequence.

         USLW-coded   Hand-coded
         decisions    decisions
Year     (percent)    (percent)    All
1933     0.00         0.15         0.11
1984     0.00         0.08         0.05
1989     0.19         0.19         0.19
1990     0.00         0.19         0.13
1991     0.00         0.42         0.29
1992     0.00         0.15         0.11
1993     0.19         0.00         0.05
1994     0.00         0.11         0.08
1995     0.56         0.11         0.24
1996     0.00         0.23         0.16
1997     0.00         0.15         0.11
1998     0.00         0.72         0.51
1999     1.76         0.15         0.62
2000     0.46         0.30         0.35
2001     29.32        3.47         10.96
2002     27.75        4.08         10.94
2003     31.54        4.98         12.68
2004     8.23         1.55         3.48
2005     0.00         1.06         0.75
2006     0.00         6.72         4.77
2007     0.00         8.38         5.95
2008     0.00         6.38         4.53
2009     0.00         6.98         4.96
2010     0.00         8.30         5.90
2011     0.00         11.85        8.42
2012     0.00         7.21         5.12
2013     0.00         9.66         6.86
2014     0.00         11.74        8.34
2015     0.00         4.72         3.35
Total    100.00       100.00       100.00

Note: All sequences are identified by USLW, but decisions may be
coded by USLW or a research assistant (hand-coded). Table lists the
percent of sequences in the data by the year of the last observed case.