Relationships Among Rivals: Analyzing Contending Hypotheses with a New Logic of Process Tracing∗†

Sherry Zaks
University of California, Davis
August 31, 2011

Abstract

Process tracing is a powerful qualitative tool for causal inference. Yet, as it is currently conceived, the procedure's treatment of rival hypotheses is fundamentally incomplete. In this paper, I demonstrate that rival hypotheses relate to one another in diverse ways. Since one of the primary goals of process tracing is to adjudicate among alternative explanations, this observation brings into sharp focus the importance of devoting central attention to identifying the nature of the relationships among the main and alternative hypotheses when using process tracing tests. I identify three relationships among hypotheses: mutually exclusive, coincident, and congruent. I demonstrate that depending on the relationship between the main and rival hypotheses under consideration, passage or failure of the process tracing tests yields differential and sometimes counterintuitive inferences. This paper offers new insight into the tools of process tracing by focusing on the relationships between the main hypothesis of interest and each rival hypothesis being evaluated.

∗ Paper prepared for the Short Course 1 on Multi-Method Research, 2011 Annual Meeting of the American Political Science Association: Seattle, Washington.
† The 2011 Institute for Qualitative and Multi-Method Research at Syracuse University played a crucial role in crystallizing the ideas presented here.

Contents

1 Introduction
2 Process Tracing: The State of the Art
  2.1 Advances in Process Tracing
  2.2 Shortcomings
3 Process Tracing Reframed
  3.1 Sufficiency Tests: Reframing the "Smoking-Gun"
    3.1.1 Passing Sufficiency Tests
    3.1.2 Impacts on Rival Hypotheses
    3.1.3 Failing a Sufficiency Test
  3.2 Necessity Tests: Reconceptualizing "Hoops"
    3.2.1 Passing a Necessity Test
    3.2.2 Implications for Rival Hypotheses
    3.2.3 Failing a Necessity Test
  3.3 "Doubly-Decisive" Tests and Biconditionality
    3.3.1 Implications for Rival Hypotheses
  3.4 Leveraging Tests: More Than Just a "Straw-in-the-Wind"
    3.4.1 Leverage in Favor
    3.4.2 Leverage Against
4 Assessing the Relative Strength of Tests
5 Implications for Research and Concluding Remarks

1 Introduction

Process tracing is a powerful tool for systematically evaluating pieces of evidence gathered in a qualitative research setting. As both a stand-alone methodology and a complement to other methodological tools, process tracing has tremendous promise for contributing to causal inference. Strides have been made in developing and refining this methodology. I argue, however, that the research procedures associated with process tracing require further crystallization. Specifically, its treatment of rival hypotheses is fundamentally incomplete. The current literature overlooks how passing or failing the process tracing tests should often lead to diverse and potentially counterintuitive inferences concerning the validity of alternative hypotheses. In this article, I draw on the concepts of the predicate and modal branches of formal logic to develop a new framework for conceptualizing the reasoning underlying process tracing tests.
This new lens elicits a significant insight: rival hypotheses relate to one another in diverse ways that give rise to diverse inferences. Since one of the primary goals of process tracing is to adjudicate among alternative explanations, this observation brings into sharp focus the fact that using process tracing tests requires devoting central attention to identifying the relationship between the main hypothesis under examination and each of the rival hypotheses. I develop a more nuanced framework for considering how alternative explanations relate to one another, which in turn allows researchers to establish with more precision when rival hypotheses can be disqualified from consideration.

Reconceptualizing process tracing in terms of formal logic accomplishes three tasks that add significant value to this research program and should enhance both the quality and frequency of its use. First, I demonstrate that rival hypotheses can — and frequently do — relate to the main hypothesis in diverse ways. Consequently, passing the different process tracing tests differentially impacts the validity of rival hypotheses depending on whether they are mutually exclusive of, coincident with, or congruent with the main hypothesis. Second, formal logic concepts as they are applied to process tracing tests provide researchers with a clearer set of criteria with which to evaluate pieces of evidence. I argue that the current designations for the tests have some significant drawbacks that negatively impact the ease with which the methodology is adopted as well as its overall clarity. Once the process tracing tests are redefined in terms of formal logic, we can then explore, by logical proof, how passing or failing a given test bears on the main hypothesis as well as alternative hypotheses. Third, reframing process tracing in terms of formal logic provides researchers with a language more conducive to traversing the quantitative-qualitative methodological divide.
Having a more clearly defined set of criteria describing the relationship between the phenomenon under investigation and observable evidence makes it easier to specify the sorts of statistical results one should expect in large-n studies used to triangulate or bolster the strength of process tracing research.

Before laying out the new framework, I turn first to a summary of process tracing as it is currently conceived and employed. I discuss its strengths as well as some of its inherent shortcomings and their implications for researchers and for wider adoption of the methodology.

2 Process Tracing: The State of the Art

The goals of process tracing are two-fold: first, by examining multiple pieces of sequential evidence within a case, the method evinces underlying processes that give rise to the phenomenon, thus contributing to causal inference; second, it provides a systematized way of establishing how different pieces of evidence bear on competing explanations. As originally conceptualized, process tracing involved the use of historical narratives and within-case analysis as a means of evaluating complex causal processes (George and McKeown, 1985). Following its inception, there have been many additional advances in codifying the procedure of process tracing. A recurring theme in this literature is assessing the strength of inferential leverage that derives from different process tracing tests.

2.1 Advances in Process Tracing

A critical advancement in process tracing came with the introduction of four "tests" under which each piece of evidence can be evaluated (Van Evera, 1997). Passing or failing the tests dictates the level of confidence with which we can accept our proposed explanation. The weakest, straw-in-the-wind tests, assess pieces of evidence that, by themselves, are not sufficiently strong to either confirm a hypothesis if it passes or eliminate the hypothesis if it fails.
They allow researchers to gain insight into the overall balance of evidence in favor of (or in opposition to) one hypothesis, yet are not decisive one way or the other.

In Van Evera's account, two additional tests of intermediate strength yield stronger inferences than straw-in-the-wind, though their utility relative to each other is contested in this paper. Hoop tests are used to evaluate observations that must be satisfied in order for an explanation to hold. While failing a hoop test (i.e., failing to meet a necessary condition) decisively rules out a given hypothesis from consideration, passing only suggests that the hypothesis is still a contender. In Van Evera's framework, the next strongest test, the smoking-gun, evaluates pieces of evidence that are so uniquely confirmatory for a given explanation that they affirm the validity of the hypothesis. In contrast to hoop tests, passing a smoking-gun test is sufficient to accept an explanation, yet failing to find smoking-gun evidence is not sufficient reason to disqualify it. Finally, the strongest of the tests, deemed doubly-decisive, evaluates pieces of evidence (or a combination of evidence) that are so uniquely applicable and so important for a given explanation that passing confirms the hypothesis and failing disqualifies it.

Another important step forward in the development of this methodology was the call to critically assess alternative hypotheses (George and Bennett, 2005). Noting that for any one case, there are often many potential causal paths by which the phenomenon could have occurred, George and Bennett enjoin researchers to map out the alternatives as a crucial stride towards establishing more rigorously the contributions of different tests (2005, 207).
Following from this logic, a crucial effort towards codification and unification of the principles in process tracing took on the task of combining Van Evera's four tests with the consideration of their implications for rival hypotheses (Bennett, 2010; Collier, 2011). The four tests, definitions of passing and failing, their implications for rival hypotheses, and their proposed gradations in the strength of inferences they yield are depicted in Table 1.

Table 1: The Four Process Tracing Tests†

1. Straw-in-the-Wind (neither necessary nor sufficient to affirm causal inference)
   Passing: Affirms relevance of hypothesis, but does not confirm it.
   Failing: Hypothesis is slightly weakened, though not eliminated.
   Implications for Rival Hypotheses: Passing slightly weakens them. Failing slightly strengthens them.

2. Hoop (necessary but not sufficient to affirm causal inference)
   Passing: Affirms relevance of hypothesis, but does not confirm it.
   Failing: Eliminates hypothesis.
   Implications for Rival Hypotheses: Passing somewhat weakens them. Failing somewhat strengthens them.

3. Smoking-Gun (sufficient but not necessary to affirm causal inference)
   Passing: Confirms hypothesis.
   Failing: Hypothesis is somewhat weakened, though not eliminated.
   Implications for Rival Hypotheses: Passing substantially weakens them. Failing somewhat strengthens them.

4. Doubly-Decisive (both necessary and sufficient to affirm causal inference)
   Passing: Confirms hypothesis and eliminates others.
   Failing: Eliminates hypothesis.
   Implications for Rival Hypotheses: Passing eliminates them. Failing substantially strengthens them.

† Source: Collier (forthcoming 2011), who adapts the table from Bennett (2010).

Mahoney's recent contribution extends the efforts of Van Evera, George and Bennett, and Collier. In an attempt to make distinctions not among, but within, tests in terms of how much inferential leverage they yield, Mahoney's work dissects the cells by asking whether all hoop tests (or all smoking-gun tests, etc.) are the same, or whether they provide more leverage in some instances than in others (Mahoney, 2011).
Adopting what is very nearly a frequentist approach, Mahoney argues that the difficulty of passing a hoop test, for instance, is related to the frequency with which the observation is present. Specifically, he notes that when a certain condition is rare or abnormal, a hoop test based on that condition will be more difficult to pass. Thus, contrary to the way many have conceptualized meeting a necessary condition, he concludes that these more demanding hoop tests have the additional utility of lending positive support to a hypothesis if it passes.

It is at this point worth asking whether the tests provide an exhaustive account of the possible inferences and implications that we can draw from a piece of evidence. The proper formulation and implications of these tests are of central importance to this article. Categorizing the tests in terms of whether their contribution is necessary or sufficient for affirming causal inference provides a clear and useful framework for researchers who wish to employ process tracing. Yet, many of the inferences that follow from a logical framework have still gone unexamined.

2.2 Shortcomings

If the goal of methodological work is to delineate and advocate the usage of different methodological tools, we must ask ourselves the following question: Given the current state of the process tracing literature, should we expect to see more students and researchers within political science adopting this technique? I argue that as it is currently formulated, process tracing exhibits two limitations that impede its adoption and potentially leave it vulnerable to criticism. The first limitation concerns the nature of the terminology, and the second concerns an incomplete treatment of rival hypotheses.
One need not look far into the process tracing literature to find a wealth of examples of scholars (or detectives) "using" process tracing as a research tool.1 Upon further examination of the articles held up as exemplars of this methodology, however, it quickly becomes clear that these scholars are not so much utilizing this method as they are implicitly conforming to what it is deemed to be. In fact, it is still quite infrequent that scholars refer to "process tracing" at all, let alone to the names of the corresponding tests.2 None of these scholars (or detectives) — not Tannenwald, not Holmes — are specifying a priori the sort of observation that would constitute a hoop test and then looking for confirmatory or contradictory examples. To illustrate the true peculiarity of an article representing a creditable example of a methodology while never referring to the terminology associated with it, consider how hard-pressed we would be to find a comparably excellent use of regression analysis without the author making reference to "regression coefficients" or "statistical significance." Ultimately, the language we are expected to use to describe our work matters, and I expect that many substantive scholars searching for a template of good qualitative research would find themselves hesitant to classify their findings as "straws-in-the-wind."

The second and more inimical shortcoming is that the literature's treatment of implications for rival hypotheses is fundamentally incomplete. While it is important to consider explanations in addition to the main hypothesis of interest, it is also vital that we consider the relationships among them.
Indiscriminately assuming that if one hypothesis passes a test then rival hypotheses are necessarily weakened could ultimately lead researchers to hastily rule out explanations that may work in conjunction with the primary one.

1 Mahoney (2009) references Tannenwald (1999); Bennett (2010) references Schultz (2001) and Goemans (2000) as well as his own earlier work.
2 A Google Scholar search of "hoop test" returns a total of eight results: three of which are articles or books written on process tracing (rather than using process tracing), and in the other five cases, the phrase is only used once, most often in a footnote. A search on "smoking-gun test" returns five results, four of which are methodological, rather than substantive, in nature. "Doubly-decisive test" returns only three works, all of which are on process tracing. Finally, "straw-in-the-wind test" returns two results, both of which are also written on process tracing.

3 Process Tracing Reframed

Contrary to the prior formulation of the process tracing tests, I demonstrate here that passing or failing a given test does not have a singular, deterministic implication for rival hypotheses. Rather, the consequences for competing explanations depend entirely on their relationship to the main hypothesis. I propose reframing the four process tracing tests in terms of formal logic in order to reveal the diverse implications for competing hypotheses. I discuss and reframe each test: smoking-gun tests, hoop tests, doubly-decisive tests, and finally, straw-in-the-wind tests. In order to draw inferences from each, we must consider the implications of the test along three dimensions: (1) how passing a test bears on the hypothesis, (2) what passing a test for the main hypothesis implies for the viability of rival hypotheses, and (3) how failing a test bears on the hypothesis.
The central finding that derives from reframing the process tracing tests concerns their diverse, and sometimes counterintuitive, impacts on rival hypotheses. This is the direct result of hypotheses relating to one another in different ways. Before I delineate the new logic-based framework for the tests, I first provide a brief summary of the different relationships among the main and rival hypotheses. The three potential relationships among hypotheses are depicted in Table 2. As I move into describing the tests, the logical implications of these relationships will be further elaborated.

Table 2: Relationships Among Hypotheses

Mutually Exclusive: Hypotheses are completely disjoint. Bolstering the validity of one necessarily undermines the validity of the other.
Coincident: Hypotheses are independent of one another. The validity of one neither bolsters nor undermines the validity of the other. Both may contribute to the phenomenon.
Congruent: Hypotheses are similar and possibly related. Bolstering one hypothesis simultaneously bolsters the other. The possibility also exists that they are jointly sufficient for the outcome.

3.1 Sufficiency Tests: Reframing the "Smoking-Gun"

Previously known in the literature as the "smoking-gun test," this test can be usefully reframed in terms of the logic of sufficient conditions and referred to as a Sufficiency Test. To say that a piece of evidence is sufficient to accept a hypothesis as true is to say that, alone, the piece of evidence under examination confirms or validates the hypothesis. Yet, the logical framework reveals that the sufficiency test does not have as strong an impact on rival hypotheses as some have suggested — and certainly not as strong as the "smoking-gun" metaphor implies. The image of the "smoking-gun," while evocative and intuitive, belies the complexity of the phenomena we study.
When we come across a person with a smoking gun standing over a corpse with one bullet wound, that "observation" is certainly sufficient to establish the gun-bearing individual as the murderer; and it simultaneously provides convincing (if not certain) evidence exonerating other suspects. The problem is that in social phenomena, the corpse often has a bullet wound in addition to multiple stab wounds and clear evidence of poisoning. Finding one smoking gun in this case is not only insufficient to explain the murder, but it also clearly fails to eliminate many other suspects from consideration.

Essentially, it is not only possible, but likely, that many of the phenomena we study exhibit overdetermination. Overdetermined events exhibit a variety of distinct causes, any one of which may independently be responsible for the event, but due to the presence of the others, the individual cause cannot be determined. Firing squads are a prime example of overdetermination: any one bullet could have killed the victim, yet it might also be the case that two bullets from two different guns jointly killed the victim, or that one bullet exacerbated problems caused by a different bullet.

3.1.1 Passing Sufficiency Tests

A given hypothesis passes a sufficiency test when a specific observation or piece of evidence, alone, is adequate to validate the hypothesis under consideration. In other words, passing a sufficiency test entails finding a piece of evidence that confirms the validity of the hypothesis under consideration. Alternately, we could think of the hypothesized outcome as a necessary consequence that would have to follow given the occurrence of that causal-process observation. Analytically, if we take E to represent a given piece of evidence and HM to represent the main hypothesis being tested, we can define a sufficiency test as follows,

E → HM .
(1)

As Bennett (2010) observes, an excellent example of research employing evidence that constitutes a sufficiency test is found in Schultz's (2001) examination of the Fashoda crisis. In seeking to explain why the Fashoda crisis did not escalate to war, Schultz theorizes that the structure of democratic institutions facilitates disclosure of leaders' private information irrespective of whether leaders desire that such information be made known. So, on the one hand, democratic leaders have a difficult time bluffing, yet on the other hand, threats are all the more credible due to the visibility of public reactions. Schultz argues that Britain's resolve to "take a hard line against France" was sufficiently credible that France had no choice but to back down or else face near-certain defeat. The sufficiency test evidence he presents in support of this hypothesis comes from a speech delivered by the opposition party in Britain in support of the Prime Minister's resolve. According to Schultz, such evidence of unanimity of the "hard-line" stance — particularly at a point when Britain's government was highly polarized — provoked France to retreat rather than face war.

The logical implications of passing sufficiency tests are clear for the given hypothesis to which the evidence is relevant, yet two questions remain. What are the implications for additional, rival hypotheses? And, what happens when a hypothesis fails a sufficiency test?

3.1.2 Impacts on Rival Hypotheses

What does passing a sufficiency test imply for competing hypotheses? The answer depends, in part, on the nature of the competing hypothesis. Nearly every phenomenon is subject to different explanations as to how it arose, yet not all explanations relate to each other in the same way. For the sake of illustration, I continue to refer to the proposed or main hypothesis as HM and I refer to other (rival) hypotheses under consideration as HR .
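The inference pattern of Equation 1 can be checked mechanically. The following sketch is purely illustrative (the boolean encoding is my own, not part of the process tracing literature): it enumerates every truth assignment consistent with the sufficiency relation and confirms that finding E validates HM, while the absence of E leaves HM undecided.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Enumerate every truth assignment to (E, HM) and keep only the worlds
# consistent with the sufficiency relation E -> HM.
worlds = [(E, HM) for E, HM in product([False, True], repeat=2)
          if implies(E, HM)]

# Passing the test: in every consistent world where the evidence E is
# found, the main hypothesis HM must hold (modus ponens).
assert all(HM for E, HM in worlds if E)

# Absence of E, by contrast, leaves worlds in which HM is still true,
# so failing to find this particular piece of evidence does not, by
# itself, undermine the hypothesis.
assert any(HM for E, HM in worlds if not E)
```

The second assertion anticipates the point made formally in Section 3.1.3: a sufficiency relation says nothing about worlds in which the evidence is absent.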
Passing a sufficiency test has different implications for the validity of rival hypotheses depending on the relationship between HM and HR . HM and HR can have one of three relationships: they can be mutually exclusive, coincident, or congruent. The logical implications of differential relationships among main and rival hypotheses have gone unexamined in the previous literature on process tracing, and yet the differences have highly salient consequences for the inferences we draw from research.

Mutually Exclusive Hypotheses. The singular condition under which passing a sufficiency test (for HM ) undermines the validity of a rival hypothesis is when the two explanations are mutually exclusive. In other words, if the piece of evidence under examination simultaneously validates HM , yet would be impossible under HR , then this observation would be considered sufficient to both accept HM and reject HR . Formally, this situation occurs when the presence of a piece of evidence is a sufficient condition for HM and the absence of that same piece of evidence is a necessary condition for HR to hold. By definition (and contraposition) then,

E → HM , and
E → ¬HR . (2)

The burden of proof of mutual exclusivity of two hypotheses, of course, is on the researcher. One would have to convincingly argue (and preferably, demonstrate) that two hypotheses could not co-explain the outcome.

Coincident Hypotheses. In contrast to rival explanations that exhibit mutual exclusivity, when nothing precludes HM and HR from coexisting, yet they are unrelated to one another, HM and HR constitute coincident hypotheses. In this case, both hypotheses may be valid in contributing to the outcome, but they are unrelated in such a way that confirmatory evidence for one hypothesis neither bolsters nor undermines the other. Thus, in terms of logical notation, we would observe,

E → HM
E → HR or ¬HR . (3)
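Before turning to a substantive example, the contrast between these two relationships can be verified by brute-force enumeration. This is a minimal sketch under an invented boolean encoding, not a procedure drawn from the process tracing literature:

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# All truth assignments to the triple (E, HM, HR).
assignments = list(product([False, True], repeat=3))

# Mutually exclusive rival (Equation 2): E -> HM and E -> not HR.
exclusive = [(E, HM, HR) for E, HM, HR in assignments
             if implies(E, HM) and implies(E, not HR)]
# Observing E forces HM to be true and HR to be false:
# the rival is eliminated along with the confirmation of HM.
assert all(HM and not HR for E, HM, HR in exclusive if E)

# Coincident rival (Equation 3): only E -> HM is assumed, so the
# evidence carries no information whatsoever about HR.
coincident = [(E, HM, HR) for E, HM, HR in assignments if implies(E, HM)]
# Among worlds where E is observed, HR may be either true or false.
assert any(HR for E, HM, HR in coincident if E)
assert any(not HR for E, HM, HR in coincident if E)
```

The enumeration makes the paper's point concrete: identical test results license sharply different conclusions about HR depending solely on which relation the researcher can defend.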
An example of simultaneous rival explanations of an outcome is found in the democratic peace literature. In seeking an explanation for the paradox that although democracies are comparably conflict-prone to their autocratic counterparts, they nonetheless exhibit some immunity from fighting one another, scholars have proposed two explanations: the normative model and the institutional model. The logic of the normative perspective holds that different polity types foster different types of norms for dealing with conflict within the state, which are then externalized reliably enough to determine how that state will deal with conflicts of interest between states. Alternately, the institutional approach contends that the institutional structure of democracies acts as a constraint on leaders and makes it fundamentally more difficult for them to wage war than for their autocratic counterparts. Each model posits a very different causal story even though they arrive at the same outcome. Yet, does support for one necessarily invalidate the other explanation? The answer here is no. After all, it seems not only reasonable, but likely, that the democratic peace phenomenon exists as a result of multiple compounding factors. A piece of evidence that is sufficient to verify an underlying normative process in the democratic peace phenomenon does not preclude institutional/structural influences from playing a comparably strong role.

Congruent Hypotheses. Finally, it is possible to have two (or more) hypotheses for which some amalgamation of evidence is sufficient to verify the contribution of both explanations.3 Analytically, congruent hypotheses are defined as:

E → HM and
E → HR . (4)

Turning back to Schultz's example, we would do well to ask ourselves, is Schultz's discovery a true "smoking-gun" in the classic sense of the metaphor? The answer, I argue, is no.
For reasons I discuss later in the context of Necessity Tests, Schultz rules out the power asymmetry in Britain's favor as a possible explanation for the cessation of the Fashoda Crisis. What is more likely, however, is that France observed the power asymmetry in conjunction with Britain's newfound unanimous resolve to use force, and that combination of factors ultimately prompted France to back down. After all, convincing resolve to use force does not prompt retreat unless one's opponent is also stronger. I argue that all Schultz did was rule out the balance of power hypothesis as the sole factor explaining the end of the crisis.

3 This concept is analytically similar to the INUS approach to causal inference, in which a condition is by itself neither necessary nor sufficient, yet is jointly sufficient with another condition to explain the phenomenon under examination (Mackie, 1980; Mahoney and Goertz, 2006).

3.1.3 Failing a Sufficiency Test

Finally, what are the implications for a hypothesis if an observation fails to satisfy a sufficiency test? The consequences are insubstantial. While many pieces of evidence would alone be sufficient to confirm a hypothesis, their absence does not undermine the validity of the hypothesis. We could imagine, for instance, many different pieces of evidence that would individually be sufficient to validate a hypothesis. The absence of one does not preclude the existence of another comparably strong piece of evidence that would do the same job.

3.2 Necessity Tests: Reconceptualizing "Hoops"

In this section, I reformulate what the literature refers to as "hoop tests" in terms of the logic of necessary conditions. For this test, the evidence under examination is not, alone, sufficient to verify the hypothesis, but it is required in order for the hypothesis to be true.
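The asymmetry between necessity and sufficiency tests can be previewed with the same enumeration device used above. The sketch below is illustrative only (the boolean encoding is my own): it shows that under a necessity relation, absence of the required evidence eliminates the hypothesis, while its presence merely keeps the hypothesis in contention.

```python
from itertools import product

def implies(p, q):
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q

# Necessity relation: the hypothesis HM requires the evidence E, i.e. HM -> E.
worlds = [(E, HM) for E, HM in product([False, True], repeat=2)
          if implies(HM, E)]

# Failing the test: if the required evidence is absent, no consistent
# world keeps HM alive (contraposition: not E -> not HM).
assert all(not HM for E, HM in worlds if not E)

# Passing the test: finding E keeps HM viable but does not confirm it;
# consistent worlds with E exist on both sides.
assert any(HM for E, HM in worlds if E)
assert any(not HM for E, HM in worlds if E)
```

This is the mirror image of the sufficiency relation: there, passing was decisive and failing uninformative; here, failing is decisive and passing merely supportive.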
3.2.1 Passing a Necessity Test

A given explanation passes a necessity test when a piece of evidence confirms a condition that is required for the validity of the hypothesis. In contrast to sufficiency tests, in which the proposed hypothesis is a necessary outcome of the evidence, here, the evidence/CPO is a necessary condition for the proposed hypothesis to be valid. In other words, the absence of that condition, by definition, compromises the validity of the hypothesis. Analytically, a piece of evidence constitutes a necessity test when,

HM → E. (5)

It is important to note that the evidence/observation is not alone sufficient to establish that HM holds — only that it is possible. When evaluating a given piece of evidence under a necessity test, we must ask, what are the logical implications of passing and failing? I demonstrate in this section that careful consideration of necessity tests under the new framework reveals some new insights about the implications of passing and failing, as well as the relative strength of this test as compared to the others vis-à-vis adjudicating among multiple hypotheses.

Passing a necessity test raises two questions that bear importantly on our evaluation of the hypotheses on the table: (1) What does satisfying a necessity test imply for the hypothesis under consideration? And, (2) What does passing imply for other rival hypotheses? Keeping with the previously defined notation, passing a necessity test entails finding evidence that constitutes a necessary condition or requirement under HM . [EXAMPLE]. An observation that verifies the existence of a necessary condition buttresses the hypothesis under consideration, though it is not, by itself, sufficient to verify the causal claim. We should, nonetheless, consider passing necessity tests as offering an important level of support in favor of the main hypothesis.
Since, after all, the pieces of evidence that we examine are merely the observables that are indicative of an underlying process, finding a concrete piece of evidence verifying that a requirement was satisfied should not be taken lightly.

3.2.2 Implications for Rival Hypotheses

As with sufficiency tests, passing a necessity test has different implications for rival hypotheses depending on how the alternatives relate to the hypothesis under consideration. In the current literature on process tracing, others have noted that passing a hoop test "slightly weakens rival hypotheses" (Collier, 2011). While this assertion is true in some cases, it does not necessarily hold for all alternatives on the table. Once again, since hypotheses may (and likely do) exhibit heterogeneity in the way that they relate to the main hypothesis, the logical implications for their validity will vary accordingly. Since there are three different possible relationships between HM and HR , passing a necessity test will have one of three different implications for competing hypotheses depending on the nature of the relationship.

Mutually Exclusive Hypotheses. First, if the two competing hypotheses are mutually exclusive of one another, passing a necessity test for one has grave consequences for the other. This case is one in which the implications of the necessity test are stronger than have been previously acknowledged. Mutually exclusive hypotheses in the case of necessity tests are analytically defined as,

HM → E
HR → ¬E. (6)

In words, mutual exclusivity occurs when the main hypothesis requires the presence of a particular piece of evidence in order for the hypothesis to hold, whereas the rival hypothesis requires the absence of that same piece of evidence. Here, passing a necessity test for HM not only bolsters HM , but it also necessarily rules out HR from consideration.

Coincident Hypotheses.
In the event that the piece of evidence under consideration lies outside the scope of relevance of the rival hypothesis, we might call this a coincident relationship. Here, the evidence/observation, while satisfying a requirement for HM , has no necessary effect on the validity of HR . Both explanations may contribute to the phenomenon. [EXAMPLE]

HM → E
HR → E or ¬E (7)

Congruent Hypotheses. Lastly, if the two competing explanations both require the same observation, they ought to be considered congruent hypotheses. In this case, passing a necessity test for HM not only does not weaken HR ; rather, the same piece of evidence lends support to both.

HM → E
HR → E. (8)

Although this sort of evidence is obviously not as useful as evidence that bears on only one hypothesis, it is nonetheless important that researchers acknowledge these observations and are able to draw correct inferences as to their effects on rival hypotheses. Of course, the nature of the relationships will differ depending on the piece of evidence under consideration. Obviously, there are some explanations that are congruent on one piece of evidence, but sharply divergent or even mutually exclusive on others.

3.2.3 Failing a Necessity Test

Failing necessity tests has grave implications for hypotheses. In stark contrast to the inconsequentiality of failing sufficiency tests, when a piece of evidence undermines a hypothesis via a necessity test (i.e., implies that a necessary condition for that hypothesis is not satisfied), that hypothesis must be immediately stricken from consideration. This conclusion follows from a logical deductive process known as contraposition. The intuition is that if one removes a necessary condition or a requirement for an outcome, the outcome thus becomes impossible. Formally defined, the absence of the necessary condition precludes the existence of the sufficient condition, here:

HM → E
¬E → ¬HM . (9)
Thus, failure to observe a requirement for the hypothesis, by definition, controverts the validity of that hypothesis.

3.3 “Doubly-Decisive” Tests and Biconditionality

Referred to in the literature as doubly-decisive, this test entails finding a piece of evidence of such high certitude and uniqueness that it confirms the main hypothesis while simultaneously eliminating rival explanations (Van Evera, 1997; Collier, 2011). In terms of logic, this piece of evidence exhibits what is known as biconditionality: it constitutes both a sufficient and a necessary condition for an outcome. Formally, this relationship is depicted as follows:

E ⇐⇒ HM    (10)
¬E ⇐⇒ ¬HM.

In words, the observation is possible if, and only if, the main hypothesis is valid.

3.3.1 Implications for Rival Hypotheses

Perhaps surprisingly, passing a biconditional test does not have the impact on alternative explanations that much of the literature currently implies. Recent process tracing literature has mistakenly interpreted biconditional satisfaction (i.e., finding a piece of evidence that constitutes both a necessary and sufficient condition for an outcome) to imply both confirmation of the main hypothesis and incontestable elimination of rival hypotheses (Bennett, 2010). If the piece of evidence is sufficiently relevant to the hypothesis, then it might carry the strength implied in the literature. However, while one piece of evidence might be strong enough to confirm a hypothesis if it passes the test, and to eliminate that same hypothesis if it fails, it is not necessarily strong enough to simultaneously rule out alternative hypotheses. Take, for instance, Van Evera’s example: “If a bank security camera records the faces of bank robbers, its film is decisive both ways — it proves suspects guilty or innocent” (Van Evera, 1997, 32). Certainly, the recording can confirm and absolve suspects if the hypothesis concerns those who physically held up the tellers.
Yet, if the outcome we wish to explain instead concerns involvement in the robbery, relying solely on the security camera evidence might lead us to wrongfully exonerate the driver of the getaway car, crooked bank personnel, or the person who organized (though was not present for) the operation. Consequently, although a piece of evidence can be both necessary and sufficient to confirm (or disconfirm) one specific explanation, the validity of coincident or congruent explanations is not necessarily undermined. In other words, even the strength of biconditional tests is called into question when we encounter situations in which the phenomenon is overdetermined.

3.4 Leveraging Tests: More Than Just a “Straw-in-the-Wind”

As with the “smoking-gun,” the “straw-in-the-wind” test demands relabeling, in part due to the misleading and overly narrow nature of the metaphor. Straws in the wind are fleeting observations that merely hint at future events. By this narrow definition, a great deal of useful and salient evidence in support (or in contest) of hypotheses would be left uncategorized. I propose reconceptualizing and relabeling the “straw-in-the-wind” test as a leveraging test. This test is somewhat distinct from the previous ones because of the nature of the evidence it evaluates. So-called “straw-in-the-wind” tests are assessed with pieces of evidence that are insufficiently unique and certain to give rise to decisive confirmatory inferences (Van Evera, 1997, 32). Such evidence may lend support to (or undermine) an explanation, but neither passing nor failing the test conclusively establishes the validity of the hypothesis. I argue, however, that this incertitude should not obscure the evidence’s importance. Evidence that points researchers in one direction or another is as salient a part of the process as evidence that points to a finish line or a dead end.
Yet the metaphor ascribed to the test calls into question the importance of the evidence that fits into this category. We must, after all, keep in mind that the goal of process tracing — drawing causal inferences — is a difficult one. Consequently, much of the evidence we come across is unlikely to qualify for evaluation via necessity, sufficiency, or biconditional tests. It is useful to have a designation that allows researchers to present their evidence without suggesting that they are relying on chance occurrences (i.e., pieces of straw traveling in the wind) to corroborate their theories. With structured research and reasoning, some pieces of corroborating evidence provide considerable analytic leverage, while others do not. The question that remains is: how can we impose useful structure on a test meant to evaluate uncertain evidence? An observation evaluated under this test will result in one of two outcomes: it will provide leverage either in favor of a given hypothesis or against it.

3.4.1 Leverage in Favor

To pass a leveraging test, researchers must find evidence that bolsters the credibility of the main hypothesis. Due to the uncertain nature of the evidence, we must slightly alter the formal notation associated with leveraging tests. To capture the uncertainty, I draw on modal logic, a branch of formal logic that extends the notation to express possibility, symbolized by ♦. Thus, the analytic definition of evidence that provides leverage in favor of a hypothesis is

E → ♦HM.    (11)

In words, we would say that the observation suggests that the main hypothesis is possible, or even more probable.

3.4.2 Leverage Against

Failing a leveraging test entails finding a piece of evidence that counts against the main hypothesis. One of two conditions can result in failure. First, we may observe something that is seemingly incompatible with the hypothesis under consideration:

E → ¬♦HM.
(12)

Or else, we might find tentative support for a rival hypothesis that has a priori been designated as mutually exclusive:

E → ♦HRME.    (13)

In either of these cases, the observation negatively impacts the main hypothesis. Reframed, the new process tracing tests and their definitions are summarized in Table 3.

Table 3: Process Tracing Reframed†

Straw-in-the-Wind → Leveraging (neither necessary nor sufficient to affirm causal inference)
  Passing: Provides leverage in favor of H
  Failing: Provides leverage against H
  Implications for Rival Hypotheses: Variable

Smoking-Gun → Sufficiency (sufficient, but not necessary, to affirm causal inference)
  Passing: Sufficient to establish H
  Failing: Inconsequential for H
  Implications for Rival Hypotheses: Variable

Hoop → Necessity (necessary, but not sufficient, to affirm causal inference)
  Passing: Provides leverage in favor of H
  Failing: Eliminates H
  Implications for Rival Hypotheses: Variable

Doubly-Decisive → Biconditional (necessary and sufficient to affirm causal inference)
  Passing: Sufficient to establish H
  Failing: Eliminates H
  Implications for Rival Hypotheses: Variable

† Source: Collier (forthcoming 2011).

4 Assessing the Relative Strength of Tests

An additional insight that follows from this discussion concerns the relative strength of the different tests. The literature on process tracing uniformly suggests that necessity tests (or hoop tests, as they are called) provide a weaker benchmark for hypothesis evaluation than do sufficiency (smoking-gun) tests (Van Evera, 1997, 31; Collier, 2011). I argue, however, that the goals of process tracing suggest this evaluation be reversed. If the purposes of process tracing are both to aid in causal inference and to adjudicate among competing hypotheses, necessity tests offer two advantages that elevate their importance so as to — at the very least — put them on equal footing with sufficiency tests. First, failing a necessity test allows researchers to eliminate the contravening hypothesis from consideration.
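The reframed tests summarized in Table 3 lend themselves to a compact encoding. The sketch below is my own illustration (the dictionary simply transcribes the table; none of the identifiers come from the paper) showing that each test-outcome pair licenses a distinct inference for H, and that the necessity and biconditional tests are the only eliminative ones:

```python
# Hypothetical encoding of Table 3: (test, outcome) -> inference about H.
# Implications for rival hypotheses are variable in every cell, which is
# the paper's central point.
INFERENCES = {
    ("leveraging",    "pass"): "provides leverage in favor of H",
    ("leveraging",    "fail"): "provides leverage against H",
    ("sufficiency",   "pass"): "sufficient to establish H",
    ("sufficiency",   "fail"): "inconsequential for H",
    ("necessity",     "pass"): "provides leverage in favor of H",
    ("necessity",     "fail"): "eliminates H",
    ("biconditional", "pass"): "sufficient to establish H",
    ("biconditional", "fail"): "eliminates H",
}

def evaluate(test: str, outcome: str) -> str:
    """Look up the inference a given test outcome licenses for H."""
    return INFERENCES[(test, outcome)]

# Failing a necessity test is falsifying; failing a sufficiency test is not.
print(evaluate("necessity", "fail"))    # eliminates H
print(evaluate("sufficiency", "fail"))  # inconsequential for H
```

The asymmetry visible in the last two lines is precisely the basis for the argument that necessity tests deserve at least equal standing with sufficiency tests.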
Irrespective of the methodological tradition, scholars are repeatedly enjoined to take plausible alternative explanations into consideration. This task is no small feat, especially since we so frequently investigate phenomena that exhibit equifinality or overdetermination. Consequently, a test that allows for the dismissal of hypotheses is of great importance in narrowing the scope of potential explanations, which, by definition, makes any project more tractable. Appraisal of a theory must take into account the severity of the tests to which the theory has been subjected (Popper, 2009). Sufficiency tests — except under the condition of mutually exclusive hypotheses — cannot falsify; and falsification is crucial for the task of adjudication.

The second advantage of necessity tests derives from their ability to be specified a priori. Van Evera, I argue, attaches excessive merit to the uniqueness of tests and insufficient merit to their certitude. Yet the very problem with sufficiency tests concerns the uniqueness of the evidence required to satisfy them. A piece of evidence so strong and unique that it confirms a hypothesis is, by the same token, difficult to specify ahead of time. It is a more tractable, and likely more fruitful, task for researchers to ask “what would I need to see to make this hypothesis false?” than to ask “what would I need to see to confirm this hypothesis?”

5 Implications for Research and Concluding Remarks

This new framework has important implications for the process of laying out and executing a research design. Before evidence is collected — irrespective of its source — researchers should aim to specify two aspects of their projects. First, for every hypothesis under consideration, researchers ought to specify a priori the sorts of observations that would pass (and fail) necessity tests. This list is crucial for achieving the goals of process tracing (i.e.
providing criteria for the systematic elimination of alternative hypotheses); furthermore, it provides an efficient and methodical approach to evidence gathering. Second, given a (hopefully) comprehensive list of explanations, researchers ought to posit the relationships among the main hypothesis and the rival hypotheses. This prior knowledge, once again, can shape the process of evidence gathering and will allow researchers to judge the impact of a given piece of evidence with considerably more precision. Without a framework specifying how relationships among hypotheses affect the inferences we draw from evidence, we risk mistakenly discounting sound explanations or mistakenly accepting faulty ones.

References

Bennett, Andrew. 2010. Process Tracing and Causal Inference. In Rethinking Social Inquiry: Diverse Tools, Shared Standards, ed. Henry E. Brady and David Collier. Second ed. Lanham, MD: Rowman and Littlefield.

Collier, David. 2011. “Teaching Process Tracing.” Forthcoming, pp. 1–52.

George, Alexander L. and Andrew Bennett. 2005. Case Studies and Theory Development in the Social Sciences. BCSIA Studies in International Security. Cambridge, MA: MIT Press.

George, Alexander L. and Timothy J. McKeown. 1985. Case Studies and Theories of Organizational Decision Making. In Advances in Information Processing in Organizations, Vol. 2. Greenwich, CT: JAI Press.

Goemans, Hein. 2000. War and Punishment: The Causes of War Termination and the First World War. Princeton, NJ: Princeton University Press.

Mackie, John Leslie. 1980. The Cement of the Universe: A Study of Causation. Oxford: Oxford University Press.

Mahoney, James. 2009. “After KKV: The New Methodology of Qualitative Research.” World Politics 62(1):120.

Mahoney, James. 2011. “The Logic of Process Tracing Tests in the Social Sciences.”

Mahoney, James and Gary Goertz. 2006. “A Tale of Two Cultures: Contrasting Quantitative and Qualitative Research.” Political Analysis 14(3):227–249.

Popper, Karl. 2009. The Logic of Scientific Discovery. New York: Routledge Classics.
Schultz, Kenneth. 2001. Democracy and Coercive Diplomacy. Cambridge: Cambridge University Press.

Tannenwald, Nina. 1999. “The Nuclear Taboo: The United States and the Normative Basis of Nuclear Non-Use.” International Organization 53(3):433–468.

Van Evera, Stephen. 1997. Guide to Methods for Students of Political Science. Ithaca, NY: Cornell University Press.