
Relationships Among Rivals:
Analyzing Contending Hypotheses with a New Logic
of Process Tracing∗†
Sherry Zaks
University of California, Davis
August 31, 2011
Abstract
Process tracing is a powerful qualitative tool for causal inference. Yet, as it is
currently conceived, the procedure’s treatment of rival hypotheses is fundamentally
incomplete. In this paper, I demonstrate that rival hypotheses relate to one another
in diverse ways. Since one of the primary goals of process tracing is to adjudicate
among alternative explanations, this observation brings into sharp focus the importance of devoting central attention to identifying the nature of the relationships
among the main and alternative hypotheses when using process tracing tests. I
identify three relationships among hypotheses: mutually exclusive, coincident, and
congruent. I demonstrate that depending on the relationship between the main and
rival hypotheses under consideration, passage or failure of the process tracing tests
yields differential and sometimes counterintuitive inferences. This paper offers new
insight into the tools of process tracing by focusing on the relationships between
the main hypothesis of interest and each rival hypothesis being evaluated.
∗ Paper prepared for the Short Course 1 on Multi-Method Research, 2011 Annual Meeting of the American Political Science Association: Seattle, Washington.
† The 2011 Institute for Qualitative and Multi-Method Research at Syracuse University played a crucial role in crystallizing the ideas presented here.
Contents

1 Introduction
2 Process Tracing: The State of the Art
   2.1 Advances in Process Tracing
   2.2 Shortcomings
3 Process Tracing Reframed
   3.1 Sufficiency Tests: Reframing the “Smoking-Gun”
      3.1.1 Passing Sufficiency Tests
      3.1.2 Impacts on Rival Hypotheses
      3.1.3 Failing a Sufficiency Test
   3.2 Necessity Tests: Reconceptualizing “Hoops”
      3.2.1 Passing a Necessity Test
      3.2.2 Implications for Rival Hypotheses
      3.2.3 Failing a Necessity Test
   3.3 “Doubly-Decisive” Tests and Biconditionality
      3.3.1 Implications for Rival Hypotheses
   3.4 Leveraging Tests: More Than Just a “Straw-in-the-Wind”
      3.4.1 Leverage in Favor
      3.4.2 Leverage Against
4 Assessing the Relative Strength of Tests
5 Implications for Research and Concluding Remarks
1 Introduction
Process tracing is a powerful tool for systematically evaluating pieces of evidence gathered
in a qualitative research setting. As both a stand-alone methodology and a complement
to other methodological tools, process tracing has tremendous promise for contributing
to causal inference. Strides have been made in developing and refining this methodology.
I argue, however, that the research procedures associated with process tracing require
further crystallization. Specifically, its treatment of rival hypotheses is fundamentally
incomplete. The current literature overlooks how passing or failing the process tracing
tests should often lead to diverse and potentially counterintuitive inferences concerning
the validity of alternative hypotheses.
In this article, I draw on the concepts of the predicate and modal branches of formal
logic to develop a new framework for conceptualizing the reasoning underlying process
tracing tests. This new lens elicits a significant insight: rival hypotheses relate to one
another in diverse ways that give rise to diverse inferences. Since the goal of process
tracing is to adjudicate among alternative explanations, this observation brings into sharp
focus the fact that part of using process tracing tests involves devoting central attention
to identifying the relationships between the main hypothesis under examination and each of
the rival hypotheses. I develop a more nuanced framework for considering how alternative
explanations relate to one another, which in turn will allow researchers to establish with
more precision when rival hypotheses can be disqualified from consideration.
Reconceptualizing process tracing in terms of formal logic accomplishes three tasks
that add significant value to this research program and should enhance both the quality
and frequency of its use. First, I demonstrate that rival hypotheses can — and frequently
do — relate to the main hypothesis in diverse ways. Consequently, passing the different
process tracing tests differentially impacts the validity of rival hypotheses depending on
whether they are mutually exclusive to, coincident with, or congruent with the main
hypothesis.
Second, formal logic concepts as they are applied to process tracing tests provide
researchers with a clearer set of criteria with which to evaluate pieces of evidence. I
argue that the current designations for the tests have some significant drawbacks that
negatively impact the ease with which the methodology is adopted as well as its overall
clarity. Once the process tracing tests are redefined in terms of formal logic, we can then
explore, by logical proof, how passing or failing a given test bears on the main hypothesis
as well as alternative hypotheses.
Third, reframing process tracing in terms of formal logic provides researchers with a
language more conducive to traversing the quantitative-qualitative methodological divide.
Having a more clearly defined set of criteria describing the relationship between the
phenomenon under investigation and observable evidence will more easily denote the
sorts of statistical results one should expect in large-n studies used to triangulate or
bolster the strength of process tracing research.
Before laying out the new framework, I turn first to a summary of process tracing
as it is currently conceived and employed. I discuss its strengths as well as some of the
inherent shortcomings and their implications for researchers and wide adoption of the
methodology.
2 Process Tracing: The State of the Art
The goals of process tracing are two-fold: first, by examining multiple pieces of sequential
evidence within a case, the method evinces underlying processes that give rise to the
phenomenon, thus contributing to causal inference; second, it provides a systematized
way of establishing how different pieces of evidence bear on competing explanations.
As originally conceptualized, process tracing involved the use of historical narratives
and within-case analysis as a means of evaluating complex causal processes (George and
McKeown, 1985). Following its inception, there have been many additional advances
in codifying the procedure of process tracing. A recurring theme in this literature is
assessing the strength of inferential leverage that derives from different process tracing
tests.
2.1 Advances in Process Tracing
A critical advancement in process tracing came with the introduction of four “tests” under
which each piece of evidence can be evaluated (Van Evera, 1997). Passing or failing the
tests dictates the level of confidence with which we can accept our proposed explanation.
The weakest, straw-in-the-wind tests, assess pieces of evidence that, by themselves,
are not sufficiently strong either to confirm a hypothesis when passed or to eliminate
it when failed. They allow researchers to gain insight into the overall balance of
evidence in favor of (or in opposition to) one hypothesis, yet are not decisive one way or
the other.
In Van Evera’s account, two additional tests of intermediate strength yield stronger
inferences than straw-in-the-wind, though their utility relative to each other is contested
in this paper. Hoop tests are used to evaluate observations that must be satisfied in
order for an explanation to hold. While failing a hoop test (i.e. failing to meet a necessary condition) decisively rules out a given hypothesis from consideration, passing only
suggests that the hypothesis is still a contender. In Van Evera’s framework, the next
strongest, smoking-gun, evaluates pieces of evidence that are so uniquely confirmatory
for a given explanation that they affirm the validity of the hypothesis. In contrast to
hoop tests, passing a smoking-gun test is sufficient to accept an explanation, yet failing
to find smoking-gun evidence is not sufficient reason to disqualify it.
Finally, the strongest of the tests, deemed doubly-decisive, evaluates pieces of evidence
(or a combination of evidence) that are so uniquely applicable and so important for a
given explanation that passing confirms the hypothesis and failing disqualifies it.
Another important step forward in the development of this methodology was the call
to critically assess alternative hypotheses (George and Bennett, 2005). Noting that for
any one case, there are often many potential causal paths by which the phenomenon
could have occurred, George and Bennett enjoin researchers to map out the alternatives
as a crucial stride towards establishing more rigorously the contributions of different
tests (2005, 207).
Following from this logic, a crucial effort towards codification and unification of the
principles in process tracing took on the task of combining Van Evera’s four tests with
the consideration of their implications for rival hypotheses (Bennett, 2010; Collier, 2011).
The four tests, definitions of passing and failing, their implications for rival hypotheses,
and their proposed gradations in the strength of inferences they yield are depicted in
Table 1.
Table 1: The Four Process Tracing Tests†

1. Straw-in-the-Wind (neither necessary nor sufficient to affirm causal inference)
   Passing: Affirms relevance of hypothesis, but does not confirm it.
   Failing: Hypothesis is slightly weakened, though not eliminated.
   Implications for Rival Hypotheses: Passing slightly weakens them; failing slightly strengthens them.

2. Hoop (necessary, but not sufficient, to affirm causal inference)
   Passing: Affirms relevance of hypothesis, but does not confirm it.
   Failing: Eliminates hypothesis.
   Implications for Rival Hypotheses: Passing somewhat weakens them; failing somewhat strengthens them.

3. Smoking-Gun (sufficient, but not necessary, to affirm causal inference)
   Passing: Confirms hypothesis.
   Failing: Hypothesis is somewhat weakened, though not eliminated.
   Implications for Rival Hypotheses: Passing substantially weakens them; failing somewhat strengthens them.

4. Doubly-Decisive (necessary and sufficient to affirm causal inference)
   Passing: Confirms hypothesis and eliminates others.
   Failing: Eliminates hypothesis.
   Implications for Rival Hypotheses: Passing eliminates them; failing substantially strengthens them.

† Source: Collier (forthcoming 2011), who adapts the table from Bennett (2010).
Mahoney’s recent contribution extends the efforts of Van Evera, George and Bennett,
and Collier. In an attempt to make distinctions not among, but within tests in terms
of how much inferential leverage they yield, Mahoney’s work dissects the cells by asking
whether all hoop tests (or all smoking-gun tests, etc.) are the same, or whether they
provide more leverage in some instances than in others (Mahoney, 2011). Adopting what
is very nearly a frequentist approach, Mahoney argues that the difficulty of passing a
hoop test, for instance, is related to the frequency with which the observation is present.
Specifically, he notes that when a certain condition is rare or abnormal, a hoop test based
on that condition will be more difficult to pass. Thus, contrary to the way many have
conceptualized meeting a necessary condition, he concludes that these more demanding
hoop tests have the additional utility of lending positive support to a hypothesis if it passes.
It is at this point worth asking whether the tests provide an exhaustive account of
the possible inferences and implications that we can draw from a piece of evidence. The
proper formulation and implications of these tests are of central importance to this article.
Categorizing the tests in terms of whether their contribution is necessary or sufficient
for affirming causal inference provides a clear and useful framework for researchers who
wish to employ process tracing. Yet, many of the inferences that follow from a logical
framework have still gone unexamined.
2.2 Shortcomings
If the goal of methodological work is to delineate and advocate the usage of different
methodological tools, we must ask ourselves the following question: Given the current
state of the process tracing literature, should we expect to see more students and researchers within political science adopting this technique? I argue that as it is currently
formulated, process tracing exhibits two limitations that impede its adoption and potentially leave it vulnerable to criticism. The first limitation concerns the nature of the
terminology, and the second concerns an incomplete treatment of rival hypotheses.
One need not look far into the process tracing literature to find a wealth of examples
of scholars (or detectives) “using” process tracing as a research tool.1 Upon further
examination of the articles held up as exemplars of this methodology, however, it quickly
becomes clear that these scholars are not so much utilizing this method as implicitly
conforming to what it is deemed to be. In fact, scholars still only infrequently
refer to “process tracing” at all, let alone the names of
the corresponding tests.2 None of these scholars (or detectives) — not Tannenwald, not
Holmes — are specifying a priori the sort of observation that would constitute a hoop
test and then looking for confirmatory or contradictory examples. To illustrate the true
peculiarity of an article representing a creditable example of a methodology while never
referring to the terminology associated with it, we could imagine how hard-pressed we
would be to find a comparably excellent use of regression analysis without the author
making reference to “regression coefficients” or “statistical significance.” Ultimately, the
language we are expected to use to describe our work matters, and I expect that many
substantive scholars searching for a template of good qualitative research would find
themselves hesitant to classify their findings as “straws-in-the-wind.”
The second and more inimical shortcoming is that its treatment of implications for
rival hypotheses is fundamentally incomplete. While it is important to consider explanations in addition to the main hypothesis of interest, it is, furthermore, vital that we
also consider the relationships among them. Indiscriminately assuming that if one hypothesis passes a test then rival hypotheses are necessarily weakened could ultimately
lead researchers to hastily rule out explanations that may work in conjunction with the
1 Mahoney (2009) references Tannenwald (1999); Bennett (2010) references Schultz (2001) and Goemans (2000) as well as his own earlier work.
2 A Google Scholar search of “hoop test” returns a total of eight results: three of which are articles or books written on process tracing (rather than using process tracing); in the other five cases, the phrase is used only once, most often in a footnote. A search on “smoking-gun test” returns five results, four of which are methodological, rather than substantive, in nature. “Doubly-decisive test” returns only three works, all of which are on process tracing. Finally, “straw-in-the-wind test” returns two results, both of which are also written on process tracing.
primary one.
3 Process Tracing Reframed
Contrary to the prior formulation of the process tracing tests, I demonstrate here that
passing or failing a given test does not have a singular, deterministic implication for rival
hypotheses. Rather, the consequences for competing explanations depend entirely on
their relationship to the main hypothesis. I propose reframing the four process tracing
tests in terms of formal logic in order to reveal the diverse implications for competing
hypotheses. I discuss and reframe each test: smoking-gun tests, hoop tests, doubly-decisive
tests, and finally, straw-in-the-wind tests. In order to draw inferences from each, we must
consider the implications from the test along three dimensions: (1) how passing a test
bears on the hypothesis, (2) what passing a test for the main hypothesis implies for the
viability of rival hypotheses, and (3) how failing a test bears on the hypothesis.
The central finding that derives from reframing the process tracing tests concerns
their diverse, and sometimes counterintuitive, impacts on rival hypotheses. This is the
direct result of hypotheses relating to one another in different ways. Before I delineate the
new logic-based framework for the tests, I first provide a brief summary of the different
relationships among the main and rival hypotheses. The three potential relationships
among hypotheses are depicted in Table 2. As I move into describing the tests, the
logical implications of these relationships will be further elaborated.
3.1 Sufficiency Tests: Reframing the “Smoking-Gun”
Previously known in the literature as the “smoking-gun test,” this test can be usefully reframed in terms of the logic of sufficient conditions and referred to as a Sufficiency Test.
To say that a piece of evidence is sufficient to accept a hypothesis as true is to say that,
Table 2: Relationships Among Hypotheses

Mutually Exclusive: Hypotheses are completely disjoint. Bolstering the validity of one necessarily undermines the validity of the other.

Coincident: Hypotheses are independent of one another. The validity of one neither bolsters nor undermines the validity of the other. Both may contribute to the phenomenon.

Congruent: Hypotheses are similar and possibly related. Bolstering one hypothesis simultaneously bolsters the other. The possibility also exists that they are jointly sufficient for the outcome.
alone, the piece of evidence under examination confirms or validates the hypothesis. Yet,
the logical framework reveals that the sufficiency test does not have as strong an impact on rival hypotheses as some have suggested — and certainly not as strong as the
“smoking-gun” metaphor implies. The image of the “smoking gun,” while evocative and
intuitive, belies the complexity of the phenomena we study. When we come across a
person with a smoking gun standing over a corpse with one bullet wound, that “observation” is certainly sufficient to establish the gun-bearing individual as the murderer, and it
simultaneously provides convincing (if not certain) evidence exonerating other suspects.
The problem is that in social phenomena, the corpse often has a bullet wound in addition
to multiple stab wounds and clear evidence of poisoning. Finding one smoking gun in this
case is not only insufficient to explain the murder, but it also clearly fails to eliminate
many other suspects from consideration.
Essentially, it is not only possible, but likely, that many of the phenomena we study
exhibit overdetermination. Overdetermined events exhibit a variety of distinct causes, any
one of which may independently be responsible for the event, but due to the presence of
the others, the individual cause cannot be determined. Firing squads are a prime example
of overdetermination: any one bullet could have killed the victim, yet it might also be
the case that two bullets from two different guns jointly killed the victim, or that one bullet
exacerbated problems caused by a different bullet.
3.1.1 Passing Sufficiency Tests
A given hypothesis passes a sufficiency test when a specific observation or piece of evidence, alone, is adequate to validate the hypothesis under consideration. In other words,
passing a sufficiency test entails finding a piece of evidence that confirms the validity of
the hypothesis under consideration. Alternately, we could think of the hypothesized outcome as a necessary consequence that would have to follow given the occurrence of that
causal-process observation. Analytically, if we take E to represent a given piece of evidence and HM to represent the main hypothesis being tested, we can define a sufficiency
test as follows,
E → HM.    (1)
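The inference pattern in equation (1) can be checked mechanically. The short sketch below (the function and variable names are mine, purely illustrative) enumerates every truth assignment consistent with E → HM and confirms the asymmetry developed in this section: observing E settles HM, while the absence of E settles nothing.

```python
from itertools import product

def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Worlds are (E, H_M) truth assignments; keep only those consistent
# with the sufficiency test relation E -> H_M.
worlds = [(e, h) for e, h in product([False, True], repeat=2) if implies(e, h)]

# Passing: in every consistent world where E is observed, H_M holds.
assert all(h for e, h in worlds if e)

# Failing: when E is absent, H_M can still be either true or false,
# so failure to find E does not by itself undermine the hypothesis.
assert {h for e, h in worlds if not e} == {True, False}
```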
As Bennett (2010) observes, an excellent example of research employing evidence
that constitutes a sufficiency test is found in Schultz’s (2001) examination of the Fashoda
crisis. In seeking to explain why the Fashoda crisis did not escalate to war, Schultz
theorizes that the structure of democratic institutions facilitates disclosure of leaders’
private information irrespective of whether leaders desire that such information be made
known. So, on the one hand, democratic leaders have a difficult time bluffing, yet on
the other hand, threats are all the more credible due to the visibility of public reactions.
Schultz argues that Britain’s resolve to “take a hard line against France” was sufficiently
credible that France had no choice but to back down or else face near-certain defeat. The
sufficiency test evidence he presents in support of this hypothesis comes from a speech
delivered by the opposition party in Britain in support of the Prime Minister’s resolve.
According to Schultz, such evidence of unanimity of the “hard-line” stance — particularly
at a point when Britain’s government was highly polarized — provoked France to retreat
rather than face war.
The logical implications of passing sufficiency tests are clear for the given hypothesis
to which the evidence is relevant, yet two questions remain. What are the implications
for additional, rival hypotheses? And, what happens when a hypothesis fails a sufficiency
test?
3.1.2 Impacts on Rival Hypotheses
What does passing a sufficiency test imply for competing hypotheses? The answer depends, in part, on the nature of the competing hypothesis. Nearly every phenomenon is
subject to different explanations as to how it arose, yet not all explanations relate to each
other in the same way. For the sake of illustration, I continue to refer to the proposed
or main hypothesis as HM and I refer to other (rival) hypotheses under consideration
as HR . Passing a sufficiency test has different implications for the validity of rival hypotheses depending on the relationship between HM and HR . HM and HR can have one
of three relationships: they can be mutually exclusive, coincident, or congruent. The
logical implications for differential relationships among main and rival hypotheses have
gone unexamined in the previous literature on process tracing, and yet the differences
have highly salient consequences for the inferences we draw from research.
Mutually Exclusive Hypotheses. The singular condition under which passing a sufficiency test (for HM ) undermines the validity of a rival hypothesis is when the two
explanations are mutually exclusive. In other words, if the piece of evidence under examination simultaneously validates HM , yet would be impossible under HR , then this
observation would be considered sufficient to both accept HM and reject HR . Formally,
this situation occurs when the presence of a piece of evidence is a sufficient condition
for HM and the absence of that same piece of evidence is a necessary condition for HR
to hold. By definition (and contraposition) then,
E → HM, and
E → ¬HR.    (2)
The burden of proof of mutual exclusivity of two hypotheses, of course, is on the researcher. One would have to convincingly argue (and preferably, demonstrate) that two
hypotheses could not co-explain the outcome.
Coincident Hypotheses. In contrast to rival explanations that exhibit mutual exclusivity, when nothing precludes HM and HR from coexisting, yet they are unrelated to one
another, HM and HR constitute coincident hypotheses. In this case, both hypotheses may
be valid in contributing to the outcome, but they are unrelated in such a way that confirmatory evidence for one hypothesis neither bolsters nor undermines the other. Thus,
in terms of logical notation, we would observe,
E → HM
E → HR or ¬HR.    (3)
An example of simultaneous rival explanations of an outcome is found in the democratic peace literature. In seeking an explanation for the paradox that although democracies are comparably conflict-prone to their autocratic counterparts, they nonetheless
exhibit some immunity from fighting one another, scholars have proposed two explanations: the normative model and the institutional model. The logic of the normative
perspective holds that different polity types foster different types of norms for dealing
with conflict within the state, which are then externalized reliably enough to determine
how that state will deal with conflicts of interests between states. Alternately, the institutional approach contends that the institutional structure of democracies acts as a
constraint on leaders and makes it fundamentally more difficult for them to wage war
than their autocratic counterparts. Each model posits a very different causal story even
though they arrive at the same outcome. Yet, does support for one necessarily invalidate
the other explanation? The answer here is no. After all, it seems not only reasonable, but
likely, that the democratic peace phenomenon exists as a result of multiple compounding
factors. A piece of evidence that is sufficient to verify an underlying normative process in
the democratic peace phenomenon does not preclude institutional/structural influences
from playing a comparably strong role.
Congruent Hypotheses. Finally, it is possible to have two (or more) hypotheses for
which some amalgamation of evidence is sufficient to verify the contribution of both
explanations.3 Analytically, congruent hypotheses are defined as:
E → HM and
E → HR.    (4)
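The three relationships in equations (2) through (4) can be verified by brute-force enumeration. In this sketch (the helper names are mine, not part of the framework), each relationship is a set of constraints on worlds (E, HM, HR), and we ask what values HR can take once the sufficiency test for HM has been passed.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

def rival_values_given_E(constraints):
    """Possible truth values of H_R across all worlds (E, H_M, H_R)
    satisfying every constraint, restricted to worlds where E is observed."""
    worlds = [w for w in product([False, True], repeat=3)
              if all(c(*w) for c in constraints)]
    return {hr for e, hm, hr in worlds if e}

# All three relationships share the sufficiency test E -> H_M.
suff = lambda e, hm, hr: implies(e, hm)

mutually_exclusive = [suff, lambda e, hm, hr: implies(e, not hr)]  # eq. (2)
coincident         = [suff]                                        # eq. (3): H_R unconstrained
congruent          = [suff, lambda e, hm, hr: implies(e, hr)]      # eq. (4)

assert rival_values_given_E(mutually_exclusive) == {False}   # rival eliminated
assert rival_values_given_E(coincident) == {True, False}     # rival untouched
assert rival_values_given_E(congruent) == {True}             # rival bolstered
```

The three assertions restate the paper's central claim: the same passing test eliminates, ignores, or supports the rival depending solely on the relationship between the hypotheses.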
Turning back to Schultz’s example, we would do well to ask ourselves, is Schultz’s
discovery a true “smoking-gun” in the classic sense of the metaphor? The answer, I argue, is no. For reasons I discuss later in the context of Necessity Tests, Schultz rules out
the power asymmetry in Britain’s favor as a possible explanation for the cessation of the
Fashoda Crisis. What is more likely, however, is that France observed the power asymmetry in conjunction with Britain’s newfound unanimous resolve to use force, and that
combination of factors ultimately prompted France to back down. After all, convincing
resolve to use force does not prompt retreat unless one’s opponent is also stronger. I
argue that all Schultz did was rule out the balance of power hypothesis as the sole factor
explaining the end of the crisis.
3 This concept is analytically similar to the INUS approach to causal inference, in which a condition is by itself neither necessary nor sufficient, yet is jointly sufficient with another condition to explain the phenomenon under examination (Mackie, 1980; Mahoney and Goertz, 2006).
3.1.3 Failing a Sufficiency Test
Finally, what are the implications for a hypothesis if an observation fails to satisfy
a sufficiency test? The consequences are insubstantial. While many pieces of evidence
would alone be sufficient to confirm a hypothesis, their absence does not undermine
the validity of the hypothesis. We could imagine, for instance, many different pieces of
evidence that would individually be sufficient to validate a hypothesis. The absence of
one does not preclude the existence of another comparably strong piece of evidence that
would do the same job.
3.2 Necessity Tests: Reconceptualizing “Hoops”
In this section, I reformulate what the literature refers to as “hoop tests” in terms of the
logic of necessary conditions. For this test, the evidence under examination is not, alone,
sufficient to verify the hypothesis, but it is required in order for the hypothesis to be true.
3.2.1 Passing a Necessity Test
A given explanation passes a necessity test when a piece of evidence confirms a condition
that is required for the validity of the hypothesis. In contrast to sufficiency tests, in which
the proposed hypothesis is a necessary outcome of the evidence, here, the evidence/CPO is
a necessary condition for the proposed hypothesis to be valid. In other words, the absence
of that condition, by definition, compromises the validity of the hypothesis. Analytically,
a piece of evidence constitutes a necessity test when,
HM → E    (5)
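As a quick check on equation (5), a truth-table enumeration (an illustrative sketch, not part of the original framework) makes the asymmetry of the necessity test concrete:

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

# Necessity test: H_M -> E. Keep the (H_M, E) assignments consistent with it.
worlds = [(hm, e) for hm, e in product([False, True], repeat=2) if implies(hm, e)]

# Observing E leaves H_M undetermined: passing only keeps the hypothesis alive.
assert {hm for hm, e in worlds if e} == {True, False}

# The absence of E rules H_M out entirely.
assert {hm for hm, e in worlds if not e} == {False}
```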
It is important to note that the evidence/observation is not alone sufficient to establish
that HM holds — only that it is possible. When evaluating a given piece of evidence under
a necessity test, we must ask, what are the logical implications of passing and failing?
I demonstrate in this section that careful consideration of necessity tests under the new
framework reveals some new insights about the implications of passing and failing, as
well as the relative strength of this test as compared to the others vis-à-vis adjudicating
from among multiple hypotheses.
Passing a necessity test raises two questions that bear importantly on our evaluation
of the hypotheses on the table: (1) What does satisfying a necessity test imply for the
hypothesis under consideration? And, (2) What does passing imply for other rival hypotheses? Keeping with the previously defined notation, passing a necessity test entails
finding evidence that constitutes a necessary condition or requirement under HM . [EXAMPLE]. An observation that verifies the existence of a necessary condition buttresses
the hypothesis under consideration, though is not, by itself, sufficient to verify the causal
claim. We should, nonetheless, consider passing necessity tests as offering an important
level of support in favor of the main hypothesis. Since, after all, pieces of evidence that
we examine are merely the observables that are indicative of an underlying process, finding a concrete piece of evidence verifying that a requirement was satisfied should not be
taken lightly.
3.2.2 Implications for Rival Hypotheses
As with sufficiency tests, passing a necessity test has different implications for rival hypotheses depending on how the alternatives relate to the hypothesis under consideration.
In the current literature on process tracing, others have noted that passing a hoop-test
“slightly weakens rival hypotheses” (Collier, 2011). While this assertion is true in some
cases, it does not necessarily hold for all alternatives on the table. Once again, since
hypotheses may (likely) exhibit heterogeneity in the way that they relate to the main
hypothesis, the logical implications for their validity will vary accordingly. Since there
are three different possible relationships between HM and HR , passing a necessity test
will have one of three different implications for competing hypotheses depending on the
nature of the relationship.
Mutually Exclusive Hypotheses. First, if the two competing hypotheses are mutually
exclusive of one another, passing a necessity test for one has grave consequences for
the other. This case is one in which the implications of the necessity test are stronger
than have been previously acknowledged. Mutually exclusive hypotheses in the case of
necessity tests are analytically defined as,
HM → E
HR → ¬E.    (6)
In words, mutual exclusivity occurs when the main hypothesis requires the presence
of a particular piece of evidence in order for the hypothesis to hold, whereas the rival
hypothesis requires the absence of that same piece of evidence. Here, passing a necessity
test for HM not only bolsters HM, but it also necessarily rules out HR from consideration.
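This inference can likewise be confirmed by enumeration. The sketch below (names are illustrative) encodes equation (6), in which HM requires E while the rival requires E's absence, and checks that observing E eliminates the rival while leaving HM merely possible.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

# Eq. (6): H_M requires E, while the mutually exclusive rival requires E's absence.
worlds = [(e, hm, hr) for e, hm, hr in product([False, True], repeat=3)
          if implies(hm, e) and implies(hr, not e)]

# Once E is observed, H_R is false in every consistent world:
# passing the necessity test for H_M eliminates the mutually exclusive rival.
assert {hr for e, hm, hr in worlds if e} == {False}

# H_M itself remains merely possible, not confirmed.
assert {hm for e, hm, hr in worlds if e} == {True, False}
```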
Coincident Hypotheses. In the event that the piece of evidence under consideration
lies outside the scope of relevance of the rival hypothesis, we might call this a coincident
relationship. Here, the evidence/observation, while satisfying a requirement for HM ,
has no necessary effect on the validity of HR . Both explanations may contribute to the
phenomenon. [EXAMPLE]
HM → E
HR → E or ¬E.    (7)
Congruent Hypotheses. Lastly, if the two competing explanations both require the
same observation, they ought to be considered congruent hypotheses. In this case, passing
a necessity test for HM does not weaken HR; rather, the same piece of evidence lends support to both hypotheses.
HM → E
HR → E.    (8)
Although this sort of evidence is not as useful as evidence that bears on only one hypothesis, it is nonetheless important that researchers acknowledge these observations and draw correct inferences as to their effects on rival hypotheses. Of course, the nature of the relationships will differ depending on the piece of evidence under consideration: some explanations are congruent on one piece of evidence but sharply divergent, or even mutually exclusive, on others.
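These three inference patterns can be collected in a short sketch (a purely illustrative encoding in Python; the relationship labels and return strings are mine, not part of any established process-tracing toolkit):

```python
# Sketch of the three relationships from equations (6)-(8): what observing E,
# the evidence H_M requires, licenses us to infer about a rival H_R.
def implication_for_rival(relationship):
    """Return the inference about H_R when H_M passes its necessity test."""
    rules = {
        # Eq. (6): H_M -> E while H_R -> not-E, so observing E eliminates H_R.
        "mutually_exclusive": "H_R is ruled out",
        # Eq. (7): E lies outside H_R's scope, so H_R is untouched either way.
        "coincident": "H_R is unaffected",
        # Eq. (8): H_R -> E as well, so the same evidence supports both.
        "congruent": "H_R is also supported",
    }
    return rules[relationship]

for rel in ("mutually_exclusive", "coincident", "congruent"):
    print(rel, "->", implication_for_rival(rel))
```

The point of the sketch is simply that the same passed test maps to three different conclusions, so the relationship must be classified before the inference is drawn.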
3.2.3 Failing a Necessity Test
Failing a necessity test has grave implications for a hypothesis. In stark contrast to the inconsequentiality of failing sufficiency tests, when a piece of evidence undermines a hypothesis via a necessity test (i.e., implies that a necessary condition for that hypothesis is not satisfied), that hypothesis must be immediately stricken from consideration. This
conclusion follows from a logical deductive process known as contraposition. The intuition
is that if one removes a necessary condition or a requirement for an outcome, the outcome
thus becomes impossible. Formally defined, the absence of a necessary condition precludes
the existence of the sufficient condition, here:
HM → E
¬E → ¬HM.    (9)
Thus, failure to observe a requirement for the hypothesis, by definition, controverts the
validity of that hypothesis.
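The contraposition in equation (9) can also be verified mechanically: over every truth assignment, H → E agrees with ¬E → ¬H. A minimal check, with material implication written as a small helper function:

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Contraposition: (H -> E) and (not-E -> not-H) agree on all four assignments,
# which is why failing to observe required evidence E eliminates H outright.
for h, e in product([True, False], repeat=2):
    assert implies(h, e) == implies(not e, not h)
print("(H -> E) is equivalent to (not-E -> not-H)")
```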
3.3 “Doubly-Decisive” Tests and Biconditionality
Referred to in the literature as doubly-decisive, this test entails finding a piece of evidence
that is of such high certitude and uniqueness that it confirms the main hypothesis while
simultaneously eliminating rival explanations (Van Evera, 1997; Collier, 2011). In terms
of logic, this piece of evidence exhibits what is known as biconditionality: it constitutes
both a sufficient and necessary condition for an outcome. Formally, this relationship is
depicted as follows:
E ⇐⇒ HM
¬E ⇐⇒ ¬HM.    (10)
In words, the observation is possible if, and only if, the main hypothesis is valid.
3.3.1 Implications for Rival Hypotheses
Perhaps surprisingly, passing a biconditional test does not have the impact on alternative explanations that much of the literature currently implies. Recent process tracing
literature has mistakenly interpreted biconditional satisfaction (i.e. finding a piece of
evidence that constitutes both a necessary and sufficient condition for an outcome) to
imply both confirmation of the main hypothesis as well as incontestable elimination of
rival hypotheses (Bennett, 2010). If the piece of evidence is sufficiently relevant to the
hypothesis, then it might carry the strength implied in the literature. However, while one
piece of evidence might be strong enough to confirm a hypothesis if it passes the test, and
eliminate that same hypothesis if it fails the test, it is not necessarily strong enough to
simultaneously rule out alternative hypotheses. Take, for instance, Van Evera’s example:
“If a bank security camera records the faces of bank robbers, its film is decisive both ways
— it proves suspects guilty or innocent” (Van Evera, 1997, 32). Certainly, the recording
can confirm and absolve suspects if the hypothesis concerns those who physically held up the tellers. Yet, if the outcome we wish to explain instead concerns involvement in the
robbery, relying solely on the security camera evidence might lead us to wrongfully exonerate the driver of the getaway car, crooked bank personnel, or the person who organized
(though was not present for) the operation.
Consequently, although a piece of evidence can be both necessary and sufficient to
confirm (or disconfirm) one specific explanation, the validity of coincident or congruent
explanations is not necessarily undermined. In other words, even the strength of biconditional tests is called into question when we encounter situations in which the phenomenon
is overdetermined.
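Van Evera's example can be made concrete with a toy sketch (the suspects and roles below are invented for illustration): film that is doubly decisive for the hypothesis that a suspect held up the tellers has no purchase on the coincident hypothesis that a suspect was involved.

```python
# Invented cast: who the camera recorded versus who was involved in the robbery.
on_camera = {"robber_a", "robber_b"}
involved = {"robber_a", "robber_b", "getaway_driver", "organizer"}

def held_up_tellers(suspect):
    """Biconditional test: filmed at the counter <=> physically held up the tellers."""
    return suspect in on_camera

def was_involved(suspect):
    return suspect in involved

# Doubly decisive for the narrow hypothesis: the film convicts and absolves.
assert held_up_tellers("robber_a") and not held_up_tellers("organizer")
# But relying on it alone would wrongly exonerate coincident explanations:
assert was_involved("getaway_driver") and not held_up_tellers("getaway_driver")
```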
3.4 Leveraging Tests: More Than Just a “Straw-in-the-Wind”
As with the “smoking-gun”, the “straw-in-the-wind test” demands relabeling, in part, due
to the misleading and overly narrow nature of the metaphor. Straws-in-the-wind refer to
fleeting observations that slightly hint at future events. By this narrow definition, a great deal of useful and salient evidence bearing on hypotheses, whether in support or in contradiction, would be left uncategorized. I propose relabeling the “straw-in-the-wind” test as a leveraging test. This test is somewhat distinct from the previous tests due to the character of the evidence involved. So-called “straw-in-the-wind” tests are assessed with pieces
of evidence that are insufficiently unique and certain to give rise to decisive confirmatory
inferences (Van Evera, 1997, 32). Such evidence may lend support to (or undermine) an
explanation, but neither passing nor failing the test establishes conclusively the validity
of the hypothesis.
I argue, however, that its incertitude should not obscure its importance. Evidence that points researchers in one direction or another is as salient a part of the process as evidence that points to a finish line or a dead end. Yet the metaphor ascribed to the test calls into question the importance of the evidence that fits into this
category. We must, after all, keep in mind that the goal of process tracing — drawing
causal inferences — is a difficult one. Consequently, much of the evidence that we come across is unlikely to qualify for evaluation via necessity, sufficiency, or biconditional tests.
It is useful to have a designation that allows researchers to present their evidence without
suggesting that they are relying on chance occurrences (i.e. pieces of straw traveling in
the wind) to corroborate their theories.
With structured research and reasoning, some pieces of corroborating evidence provide
much analytic leverage, while others do not. The question that remains is, how can we
impose useful structure on a test meant to evaluate uncertain evidence? An observation evaluated under this test will result in one of two outcomes: it will either provide leverage in favor of a given hypothesis or leverage against it.
3.4.1 Leverage in Favor
To pass a leveraging test, researchers must find evidence that bolsters the credibility of
the main hypothesis. Due to the uncertain nature of the evidence, we must alter slightly
the formal notation associated with leveraging tests. To capture the uncertainty, I draw
on modal logic, which is a branch of formal logic that extends the notation to capture
possibility, symbolized by ♦. Thus, the analytic definition of evidence that provides
leverage in favor of a hypothesis is,
E → ♦HM.    (11)
In words, we would say that the observation suggests the main hypothesis is possible, or renders it more probable.
3.4.2 Leverage Against
Failing a leveraging test entails finding a piece of evidence that counts against the main
hypothesis. One of two conditions can result in failure. First, we may observe something
that is seemingly incompatible with the hypothesis under consideration:
E → ¬♦HM.    (12)
Or else, we might find tentative support for a rival hypothesis that has a priori been designated as mutually exclusive:
E → ♦HRME.    (13)
In either of the aforementioned cases, the observation negatively impacts the main hypothesis.
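One way to give equations (11)–(13) concrete content is a toy possible-worlds reading of ♦ (“possibly”): evidence prunes the set of worlds under consideration, and ♦H holds if H is true in at least one surviving world. The worlds below are invented purely for illustration:

```python
# Toy possible-worlds semantics for the modal operator "possibly" (♦).
# Each world assigns truth values to H_M and a mutually exclusive rival H_RME.
worlds = [
    {"H_M": True, "H_RME": False},
    {"H_M": False, "H_RME": True},
    {"H_M": False, "H_RME": False},
]

def possibly(hypothesis, live_worlds):
    """♦H: H holds in at least one world still compatible with the evidence."""
    return any(w[hypothesis] for w in live_worlds)

# Suppose evidence E rules out only the world where neither hypothesis holds.
live = [w for w in worlds if w["H_M"] or w["H_RME"]]

# Leverage in favor (eq. 11): E -> ♦H_M, since H_M survives in some live world.
assert possibly("H_M", live)
# Leverage against (eq. 13): E -> ♦H_RME; because H_RME excludes H_M,
# its continued possibility counts against the main hypothesis.
assert possibly("H_RME", live)
```

The sketch captures why leveraging tests are indecisive: the evidence narrows the field without collapsing it to a single surviving explanation.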
Reframed, the new process tracing tests and their definitions are summarized in Table
3.
Table 3: Process Tracing Reframed†

Not Sufficient, Not Necessary to Affirm Causal Inference: Straw-in-the-Wind → Leveraging
    Passing: Provides leverage in favor of H
    Failing: Provides leverage against H
    Implications for Rival Hypotheses: Variable

Sufficient, Not Necessary to Affirm Causal Inference: Smoking-Gun → Sufficiency
    Passing: Sufficient to establish H
    Failing: Inconsequential for H
    Implications for Rival Hypotheses: Variable

Not Sufficient, Necessary to Affirm Causal Inference: Hoop → Necessity
    Passing: Provides leverage in favor of H
    Failing: Eliminates H
    Implications for Rival Hypotheses: Variable

Sufficient, Necessary to Affirm Causal Inference: Doubly-Decisive → Biconditional
    Passing: Sufficient to establish H
    Failing: Eliminates H
    Implications for Rival Hypotheses: Variable

† Source: Collier (forthcoming 2011).
4 Assessing the Relative Strength of Tests
An additional insight that follows from this discussion concerns the relative strength of
the different tests. The literature on process tracing uniformly suggests that necessity tests (or hoop tests, as they are called) provide a weaker benchmark for hypothesis evaluation than do sufficiency (smoking-gun) tests (Van Evera, 1997, 31; Collier, 2011). I argue, however, that the goals of process tracing suggest this ranking be reversed.
If the purposes of process tracing are both to aid in causal inference and to adjudicate among competing hypotheses, necessity tests offer two advantages that elevate their
importance so as to — at the very least — put them on equal footing with sufficiency
tests.
First, failing a necessity test allows researchers to eliminate the contravened hypothesis from consideration. Irrespective of the methodological tradition, scholars are
repeatedly enjoined to take plausible alternative explanations into consideration. This
task is no small feat, especially since we so frequently investigate phenomena that exhibit
equifinality or overdetermination. Consequently, a test that allows for the dismissal of
hypotheses is of great importance in the process of narrowing down the scope of potential explanations, which, by definition, makes any project more tractable. Appraisal of
a theory must take into account the severity of the tests to which the theory has been
subjected (Popper, 2009). Sufficiency tests — except under the condition of mutually exclusive hypotheses — cannot falsify, and falsification is crucial for the task of adjudication.
The second advantage of necessity tests derives from their ability to be specified a
priori. Van Evera, I argue, attaches excessive merit to the uniqueness of tests, and
insufficient merit to their certitude. Yet, the very problem with sufficiency tests concerns
the uniqueness of the evidence required to satisfy them. A piece of evidence so strong
and unique that it confirms a hypothesis is, by the same token, difficult to specify ahead of time. It is a more tractable and likely more fruitful task for researchers to ask “what would I need to see to make this hypothesis false?” than to ask “what would I need to see to confirm this hypothesis?”
5 Implications for Research and Concluding Remarks
This new framework has important implications for the process of laying out and executing a research design. Before evidence is collected — irrespective of its source —
researchers should aim to specify two aspects of their projects. First, for every hypothesis under consideration, researchers ought to specify a priori the sort of observations that
would pass (and fail) necessity tests. This list is crucial in achieving the goals of process
tracing (i.e. providing criteria for the systematic elimination of alternative hypotheses)
and furthermore, it provides an efficient and methodical approach to evidence gathering.
Second, given a (hopefully) comprehensive list of explanations, researchers ought to posit the relationships between the main hypothesis and each rival hypothesis. This prior knowledge, once again, can shape the process of evidence gathering and will allow researchers
to judge the impact of a given piece of evidence with considerably more precision.
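These two a-priori specifications can be recorded in a simple structure before any evidence is gathered; the sketch below (hypothesis names, evidence strings, and field names are all invented for illustration, not a prescribed format) shows how a pre-specified list of necessity-test requirements supports systematic elimination:

```python
# Hypothetical a-priori design: for each hypothesis, evidence whose absence
# fails its necessity test, plus the posited relationships among hypotheses.
design = {
    "necessity_evidence": {
        "H_M": ["actor had access to the documents"],   # not-E -> not-H_M
        "H_R1": ["decision preceded the briefing"],
    },
    "relationships": {
        ("H_M", "H_R1"): "mutually_exclusive",
        ("H_M", "H_R2"): "coincident",
    },
}

def surviving(absent_evidence, design):
    """Strike any hypothesis whose required evidence turned out to be absent."""
    return [h for h, required in design["necessity_evidence"].items()
            if not any(e in absent_evidence for e in required)]

# If fieldwork shows the decision did NOT precede the briefing, H_R1 falls out.
print(surviving(["decision preceded the briefing"], design))  # prints ['H_M']
```

Laying the design out this way makes the elimination criteria auditable before the evidence arrives, which is precisely the a-priori discipline argued for above.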
Without a framework specifying how relationships among hypotheses affect the inferences we draw from evidence, we risk mistakenly discounting sound explanations or
mistakenly accepting faulty ones.
References
Bennett, Andrew. 2010. Process Tracing and Causal Inference. In Rethinking Social
Inquiry: Diverse Tools, Shared Standards, ed. Henry E. Brady and David Collier.
Second ed. Lanham, MD: Rowman and Littlefield.
Collier, David. 2011. “Teaching Process Tracing.” Forthcoming, pp. 1–52.
George, Alexander L. and Andrew Bennett. 2005. Case Studies and Theory Development
in the Social Sciences. BCSIA Studies in International Security Cambridge, MA:
MIT Press.
George, Alexander L. and Timothy J. McKeown. 1985. “Case Studies and Theories of Organizational Decision Making.” In Advances in Information Processing in Organizations, Vol. 2. Greenwich, CT: JAI Press pp. 21–58.
Goemans, Hein. 2000. War and Punishment: The Causes of War Termination and the
First World War. Princeton, NJ: Princeton University Press.
Mackie, John Leslie. 1980. The Cement of the Universe: A Study of Causation. Oxford:
Oxford University Press.
Mahoney, James. 2009. “After KKV: The New Methodology of Qualitative Research.” World Politics 62(1):120–147.
Mahoney, James. 2011. “The Logic of Process Tracing Tests in the Social Sciences.”
Mahoney, James and Gary Goertz. 2006. “A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research.” Political Analysis 14(3):227–249.
Popper, Karl. 2009. The Logic of Scientific Discovery. New York: Routledge Classics.
Schultz, Kenneth. 2001. Democracy and Coercive Diplomacy. Cambridge: Cambridge
University Press.
Tannenwald, Nina. 1999. “The Nuclear Taboo: The United States and the Normative
Basis of Nuclear Non-Use.” International Organization 53(3):433–468.
Van Evera, Stephen. 1997. Guide to Methods for Students of Political Science. Ithaca, NY: Cornell University Press.