
Qualitative Methods
John Gerring
Department of Political Science
Boston University
[email protected]
Forthcoming:
Annual Review of Political Science 20 (May/June 2017)
Draft: 1 June 2016
Estimated pages: 23
(http://www.annualreviews.org/page/authors/article-length-estimator-1)
Comments welcome!
Please do not cite without permission
Qualitative methods, broadly construed, extend back to the very beginnings of social and
political analysis (however that might be dated).1 Self-conscious reflection on those methods,
however, is comparatively recent. The first methodological statements of contemporary
relevance grew out of the work of logicians, philosophers, and historians in the nineteenth
century, most importantly J.S. Mill (1843). To be sure, these scholars were engaged in a quest for science,
understood as a unified venture. So the notion of a method that applies only to qualitative data
would have made little sense to them.
At the turn of the twentieth century, a bifurcation appeared between quantitative and
qualitative methods (Platt 1992). The natural sciences, along with economics, moved fairly
quickly and without much fuss into the quantitative camp, while the humanities remained largely
qualitative in orientation. The social sciences found themselves in the middle – divided between
scholars aligned with each camp, and some who embraced both. For this reason, the qual/quant
distinction has assumed considerable importance in these fields, and very little importance
outside these fields.
Perhaps it is not coincidental that the quest for a method of qualitative inquiry has
proceeded further in the social sciences than in the humanities. And among the social sciences
one might argue that political science has gone further than any other in developing the field of
qualitative methods. Accordingly, this review article focuses primarily on work produced by
political scientists, with an occasional glance at neighboring disciplines.
I begin by discussing the time-honored qualitative/quantitative distinction. What is
qualitative data and analysis and how does it differ from quantitative data and analysis? I propose
a narrow definition for “qualitative” and then explore the implications of that definition. I also
explore in a speculative vein some of the factors underlying the ongoing Methodenstreit between
scholars who identify with quantitative and qualitative approaches to social science. In the
remainder of the article I discuss areas of qualitative research that seem especially fecund,
judging by output over the past decade. These include case-selection, causal inference, and multimethod
research.
In treating these subjects I try to represent the current state of the field. However,
representing every position in every debate would not be possible in the short space of this
review. There are simply too many ways to cut the cake. Instead, I endeavor to reduce the
material in a fashion that incorporates work conducted by many scholars but – inevitably –
imposes my own views on the subject matter.
Even so, many vitally important subjects are neglected in this short review. I do not
address concept formation (Collier & Gerring 2009; Goertz 2005), typological methods (Collier,
LaPorte & Seawright 2012; Elman 2005; George & Bennett 2005), set theory and qualitative
comparative analysis (Mahoney 2010; Mahoney & Vanderpoel 2015; Rihoux 2013), data
archiving, transparency, and replication (Elman & Kapiszewski 2014; Elman, Kapiszewski &
Vinuela 2010; Lieberman 2010), comparative historical analysis (Mahoney & Thelen 2015), path
dependence (Bennett & Elman 2006a; Boas 2007; Page 2006), the organizational features of
qualitative methods (Collier & Elman 2008), interpretivism and ethnography (Schatz 2009;
Yanow & Schwartz-Shea 2013), or other methods of data collection grouped together under the
rubric of field research (Kapiszewski, MacLean & Read 2015). Fortunately, these topics are
amply covered in recent work, as the foregoing citations attest. I should also signal that the
following discussion pertains mostly to causal inference, leaving aside many knotty questions
pertaining to descriptive inference (Gerring 2012).
Having acknowledged my biases and omissions – especially important in a review
focused on a subject as contested as qualitative methods – let us begin.
1 I am grateful to Colin Elman, Evan Lieberman, Jim Mahoney, and David Waldner for comments and suggestions on this manuscript.
Qual and Quant
Although the qual/quant distinction is ubiquitous in social science today, the distinction is
viewed differently by scholars identified with each camp. As a rule, scholars whose work is
primarily quantitative tend to view social science as a unified endeavor, following similar rules
and assumptions. The naturalistic ideal centers on goals such as replication, cumulation, and
consensus – all of which point toward a single logic of inference (Beck 2006, 2010; King,
Keohane & Verba 1994).
By contrast, scholars whose work is primarily qualitative tend to view the two modes of
inquiry as distinctive, perhaps even incommensurable. They are more likely to believe that
knowledge of the world is embedded in theoretical, epistemological, or ontological frameworks
from which we can scarcely disentangle ourselves. They may also identify with the
phenomenological idea that all human endeavor, including science, is grounded in human
experience. Since experiences – inevitably couched in positions of differential power and status –
vary, one can reasonably expect that the methods and goals of social science might also vary. The
apparent embeddedness of knowledge reinforces qualitative scholars’ predilection toward
pluralism, as it suggests that there are fundamentally – and legitimately – different ways of going
about business (Ahmed & Sil 2012; Bennett & Elman 2006b: 456-57; Goertz & Mahoney 2012;
Hall 2003; Mahoney & Goertz 2006; Shapiro, Smith & Masoud 2004; Sil 2000; Yanow &
Schwartz-Shea 2013).
Following the axiom that where one sits determines where one stands, we must also
consider the stakes in this controversy. Over the past century, quantitative work has been on the
ascendant and qualitative work has been cast in a defensive posture. Qualitative researchers are at
pains to explain their work in ways that those in the quantitative tradition can understand.
One may surmise that many qualitative scholars, uncomfortable with the prospect of absorption into
a "quantitative template," have sought to emphasize the distinctiveness of what they do for
strategic reasons – establishing a nature preserve for endangered species, as it were.
Whatever its intellectual and sociological sources, the question of unity or dis-unity
depends upon how one chooses to define similarity and difference. Any two objects will share
some characteristics and differ in others. It follows that they may be either compared or
contrasted, depending upon the author’s point of view. Quantitatively inclined scholars may
choose to focus on similarities while qualitatively inclined scholars choose to focus on
differences. Both are correct, as far as they go. The half-empty/half-full conundrum seems
difficult to overcome in this particular context. 2 To put the matter in a more specific frame: most
political scientists probably agree with Brady & Collier (2010) that there are “diverse tools” (the
pluralistic angle) as well as “shared standards” (the monist angle). 3 But they do not necessarily
agree on what those shared standards are or to what extent they should discipline the work of
social science.
Any attempt to resolve the monism/pluralism question that begins with high-level
concepts (e.g., monism and pluralism, logic of inquiry, epistemology, commensurability,
naturalism, interpretivism) is probably doomed to failure. These words are loaded, and once they
have been uttered the die is cast. Those who identify with either camp are likely to dig in their
heels.
I propose, therefore, to take a ground-level approach that seeks to avoid diffuse – and
loaded – concepts from philosophy of science, focusing instead on matters of definition. What,
exactly, is qualitative data? And what, by contrast, is quantitative data? We shall then explore the
2 This is nicely illustrated in recent arguments about causation (Reiss 2009).
3 A more radical pluralist view, associated with post-structuralism (Rosenau 1992), denies the existence of shared standards. I suspect that few political scientists hold that view.
repercussions of this distinction, working toward some tentative conclusions which may resolve
some (though not all) aspects of the qual/quant debate.
Definitions
Qualitative and quantitative are usually understood as antonyms. The resulting polar concepts
may be viewed as a continuum (a matter of degrees) or as a set of crisp concepts (with clear-cut
boundaries). In either case, the two terms are defined in opposition to each other. Let us
consider some of the attributes commonly associated with these contrasting approaches.
Qualitative work is expressed in natural language while quantitative work is
expressed in numbers and in statistical models. Qualitative work employs small
samples, while quantitative work is large-n. Qualitative work draws on cases chosen
in an opportunistic or purposive fashion while quantitative work employs
systematic (random) sampling. Qualitative work is often focused on particular
individuals, events, and contexts, lending itself to an idiographic style of analysis.
Quantitative work is more likely to be focused on features that (in the researcher’s
view) can be generalized across a larger population, lending itself to a nomothetic
style of analysis.
I shall suppose that all of the foregoing contrasts contain some truth; that is, they
describe patterns found in the work of social scientists, even if there are many exceptions. And
let us further suppose that they resonate with common usage of these terms, as reflected in
extant work on the subject (e.g., Bennett & Elman 2006b; Brady 2010; Caporaso 2009; Collier &
Elman 2008; Glassner & Moreno 1989; Goertz & Mahoney 2012; Hammersley 1992; King,
Keohane & Verba 1994; McLaughlin 1991; Levy 2007; Morgan 2012; Patton 2002; Schwartz &
Jacobs 1979; Shweder 1996; Snow 1959/1993; Strauss & Corbin 1998). If so, we have usefully
surveyed the field. But we have not provided anything more than a semantic map of this rugged
terrain. And because the foregoing attributes are multidimensional, the subject remains elusive.
We cannot bring methodological clarity to it because “it” remains ambiguous.
My goal is to arrive at a minimal definition that bounds our subject in a fairly crisp
fashion, that resonates with extant understandings (subsuming many of the meanings contained
in the passage above), and that does not trespass on other well-established terms. (It would be
inefficient, semantically speaking, to conflate qualitative with idiographic, ethnographic, or some
other term in this family of concepts.) In addition, it would be helpful if the proffered definition
accounts for (in a loosely causal sense) the various attributes commonly associated with the
terms “qualitative” and “quantitative” as surveyed above.
With these goals in mind, I propose that the defining feature of qualitative work is its use
of non-comparable observations – observations that pertain to different aspects of a causal or
descriptive question. As an example, one may consider the clues in a typical detective story. One
clue concerns the suspect’s motives; another concerns his location at the time the crime was
committed; a third concerns a second suspect; and so forth. Each observation, or clue, draws
from a different population. This is why they cannot be arrayed in a matrix (rectangular) dataset
and must be dealt with in prose (aka narrative analysis). It is also why we have difficulty counting
such observations. The time-honored question of quantitative research – What is the n? – is
impossible to answer in a definitive fashion. Likewise, styles of inference based on qualitative
data operate somewhat differently than styles of inference based on quantitative data.
I therefore define quantitative observations as comparable (along whatever dimensions
are relevant) and qualitative observations as non-comparable, regardless of how many there are.
When qualitative observations are employed for causal analysis they may be referred to as causal-process observations (Brady 2010), though I shall continue to employ the more general (and less
bulky) term, qualitative observation, which applies to both descriptive and causal inferences.
The notion of a qualitative or quantitative analysis is, accordingly, an inference that rests
on one or the other sort of data. If the work is quantitative, it enlists patterns of covariation
found in a matrix of observations and analyzed with a formal model (e.g., set theory/QCA,
frequentist statistics, Bayesian probabilities, randomization inference) to reach a descriptive or
causal inference. If the work is qualitative, the inference is based on bits and pieces of non-comparable observations that address different aspects of a problem. Traditionally, these are
analyzed in an informal fashion, an issue taken up below.
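To make the contrast concrete, here is a minimal sketch (in Python, using the pandas library; the variables and values are invented purely for illustration) of the two kinds of evidence implied by this definition: a rectangular matrix of comparable observations versus a collection of non-comparable clues.

```python
import pandas as pd

# Quantitative evidence: comparable observations arrayed in a rectangular
# (matrix) dataset. Every row is drawn from the same population and measured
# on the same dimensions, so "What is the n?" has a definite answer.
cross_case = pd.DataFrame(
    {"gdp_growth": [2.1, 0.4, 3.3],
     "left_seat_share": [0.35, 0.51, 0.22],
     "adopted_pr": [0, 1, 0]},
    index=["Country A", "Country B", "Country C"],
)
print(cross_case.shape)  # (3, 3): a well-defined number of comparable observations

# Qualitative evidence: non-comparable clues, each drawn from a different
# population and bearing on a different aspect of the question. They cannot
# be stacked into a single matrix, and counting them is not very meaningful.
clues = [
    {"source": "archival memo", "bears_on": "the actor's motive"},
    {"source": "elite interview", "bears_on": "the timing of the decision"},
    {"source": "press report", "bears_on": "a rival explanation"},
]
for clue in clues:
    print(clue["source"], "->", clue["bears_on"])
```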
Some strategies of data collection seem inherently qualitative, e.g., unstructured
interviews, participant-observation (ethnography), and archival work. This is because researchers
are likely to incorporate a wide variety of clues drawn from different kinds of sources and
addressing different aspects of a problem. The different-ness of the evidence makes them non-comparable, and hence qualitative. Other data collection strategies such as standardized surveys
are inherently quantitative, as they involve counting large numbers of observations that are
comparable by assumption. Of course, they might not actually be comparable. We are speaking
here of assumptions about the data generating process, not about the truth with a capital T. But
we cannot avoid assumptions about the world, and these assumptions – quite rightly – lead
researchers to adopt one or the other method of apprehending that reality.
Before moving on I want to call attention to the fact that the proposed definition of
qualitative/quantitative offered here imposes a somewhat narrower purview on the subject than
is common in everyday speech. It encompasses much of what scholars refer to as qualitative, but
not everything. It excludes qualitative comparative analysis, for example. Note that QCA utilizes
multiple observations that are comparable to each other, as suggested by the matrix format that
underlies QCA algorithms (crisp, fuzzy, and so forth). The proposed definition is also orthogonal to the notion of a
“qualitative” level of measurement, i.e., a binary or ordinal scale. Qualitative, as the term is used
here, refers to a type of data and analysis, not a type of scale.
Other contrasts might be drawn with extant usage of "qualitative/quantitative," which is
extremely loose. The point is, in order to say anything about our subject we need to circumscribe
it. And in circumscribing it we necessarily include some phenomena and exclude others. There is
no getting around the somewhat arbitrary act of definition. Let us now turn to the payoff: What
might we learn about qualitative (and quantitative) research when defined in this manner?
Converting Words to Numbers
No qualitative observation is immune from quantification. Interviews, pictures, ethnographic
notes, and texts drawn from other sources may be coded, either through judgments exercised by
coders or through mathematical algorithms (Grimmer & Stewart 2013). By coding I refer to the
systematic measurement of the phenomenon at hand – reducing the available information to a
small number of dimensions, consistently defined across the units of interest. All that is required,
following our definition, is that multiple observations of the same kind be produced and (voila!)
quantitative observations are born. These may then be represented in the matrix format familiar
to those who work with rectangular datasets.
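As an illustration of what such coding involves, consider the following sketch (Python; the interview excerpts and the coding scheme are invented for illustration). Each text is reduced to the same few dimensions, consistently defined across units, yielding a rectangular dataset.

```python
import pandas as pd

# Hypothetical raw material: interview excerpts, one per respondent.
interviews = {
    "respondent_1": "The new tax felt unfair, but the roads have improved.",
    "respondent_2": "Taxes are fine; what matters is corruption in the council.",
    "respondent_3": "No one consulted us about the tax or the road project.",
}

# A toy coding scheme: reduce each text to the same few dimensions,
# consistently defined across units. Real coding would rest on a codebook
# and trained coders, or on a text-analysis algorithm.
def code_text(text):
    t = text.lower()
    return {
        "mentions_tax": int("tax" in t),
        "mentions_infrastructure": int("road" in t),
        "negative_tone": int(any(w in t for w in ("unfair", "corruption", "no one"))),
    }

coded = pd.DataFrame({rid: code_text(txt) for rid, txt in interviews.items()}).T
print(coded)  # rows = respondents, columns = coded dimensions: quantitative data
```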
Of course, there are often practical obstacles to quantification. Perhaps additional
sources (informants, pictures, texts) are unavailable. Perhaps, if available, they are not really
comparable, or they introduce problems of causal identification (e.g., heterogeneity across cases
that could pose a problem of noise or confounding). Alternatively, it may be possible to generate
additional (comparable) observations but not worthwhile, e.g., because the first observation is
sufficient to prove the point at issue. Sometimes, one clue is decisive. Nonetheless, in principle,
if the researcher’s assumptions of comparability are justified, qualitative data can become
quantitative data. The plural of anecdote is data.
Something is generally lost in the process of reducing qualitative information to
quantitative data. One must ignore the unique aspects of each qualitative observation in order to
render them comparable. If one wishes to generalize across a population, ignoring idiosyncratic
features of the data is desirable. But if one wishes to shed light on these heterogeneous features
the conversion of qualitative to quantitative data will iron out the ruggedness of the landscape –
obscuring variation of theoretical interest. Information loss must be reckoned with. 4
Finally, and perhaps most importantly, there is an asymmetry between qual and quant.
One can convert qualitative data to quantitative data but not the reverse. It is a one-way street.
Once a piece of information is rendered in a matrix template whatever unique aspects may have
adhered to that observation have been lost. Data reduction is possible, but not expansion. The
singular of data is not anecdote, which is to say one can never recover an anecdote from a data
point.
Contrasting Affinities
It follows from our discussion that the utility of qualitative and quantitative data varies according
to the researcher’s goals, i.e., whether the research is exploratory and whether it is case-based. Other
features of a theory or an analysis do not seem to have a direct bearing on the relative utility of
these varying approaches. 5
First, qualitative data is likely to be more important at early stages of research, when not
much is known about a subject or when the goal is to uncover a new angle or hypothesis.
Qualitative data are ideal for exploratory analysis, and often indispensable. Arguably, social
science knowledge begins at a qualitative level and then (sometimes) proceeds to a quantitative
level. This is implicit in the notion that data can be converted from qual to quant, but not the
reverse.
Granted, the qualitative component of research sometimes follows the quantitative
component, as when a cross-case study is combined with a case-based investigation of causal
mechanisms. Here, the causal effect may be well-understood but not the pathway(s) by which X
is connected to Y. Again, case-based investigation plays an exploratory role.
Second, qualitative data is likely to be more useful insofar as a study is focused on a
single case (or event), or a small number of cases (or events). Such investigations bear close
resemblance, methodologically speaking, to a detective’s quest to explain a crime, which may be
thought of as a single event or a small number of associated events (if it is a string of crimes
committed by the same person or group). The reason that these investigations often rest on
qualitative data is that the researcher wishes to know a lot about the chosen case/event, and this
requires a supple mode of investigation that allows one to draw different kinds of observations
from different populations.
Whether case-level analysis is warranted may rest on other, more fundamental aspects of
the analysis. For example, case-level analysis is more plausible if the cases of theoretical interest
are heterogeneous and scarce (e.g., nation-states) rather than homogeneous and plentiful (e.g.,
firms or individuals), if the causal factor cannot be manipulated by the researcher, if the causal
factor or outcome is extremely rare, if the theory is focused on a single case or a small set of
cases, and so forth.
4 Of course, any rendering of a complex phenomenon involves some loss of information. This is true even for the most faithful – and detailed – descriptions of reality such as those produced by ethnomethodologists (Garfinkel 1967).
5 For example, the long-standing distinction between research that seeks a complete explanation of an outcome ("causes-of-effects") and research that narrows its scope to a single hypothesis ("effects-of-causes") seems to bear ambivalently on the qual/quant divide. Note that a causes-of-effects explanation may be provided solely on the basis of quantitative data, e.g., a "full" regression model. Likewise, an effects-of-causes explanation may be provided based solely on qualitative data, i.e., a process tracing analysis.
Case Selection
We have observed that case-based analysis is likely to contain qualitative observations (even if it
also incorporates quantitative observations). Consequently, the question of case-selection – how
a case, or a small number of cases, is chosen from a large number of potential cases – is central
to qualitative analysis.
Quite a number of case-selection typologies have been proposed over the years, with a
noticeable acceleration in the past decade. Mill (1843/1872) proposes the method of difference
(aka most-similar method) and method of agreement (aka most-different method), along with
several others that have not gained traction. Lijphart (1971: 691) proposes six case study types: atheoretical, interpretative, hypothesis-generating, theory-confirming, theory-infirming, and
deviant. Eckstein (1975) identifies five species: configurative-idiographic, disciplined-configurative, heuristic, plausibility probes, and crucial-case. Skocpol & Somers (1980) identify
three logics of comparative history: macro-causal analysis, parallel demonstration of theory, and
contrast of contexts. Gerring (2007) and Seawright & Gerring (2008) identify nine techniques:
typical, diverse, extreme, deviant, influential, crucial, pathway, most-similar, and most-different.
Levy (2008) identifies five case study research designs: comparable, most and least likely, deviant,
and process tracing. Rohlfing (2012: ch3) identifies five case-types – typical, diverse, most-likely,
least-likely, and deviant – which are applied differently according to the purpose of the case
study. Blatter & Haverland (2012: 24-26) identify three explanatory approaches – covariational,
process tracing, and congruence analysis – each of which offers a variety of case-selection
strategies.
Building on these efforts, Gerring & Cojocaru (2016) propose a new typology that
(arguably) qualifies as the most comprehensive to date, incorporating much of the foregoing
literature. Its organizing feature is the goal that a case study is intended to serve, identified in the
first column of Table 1. Column 2 specifies the number of cases (n) in the case study. It will be
seen that case studies enlist a minimum of one or two cases, with no clearly defined ceiling
(though at a certain point the defining goal of a case study – intensive analysis of a case –
becomes dubious). Column 3 clarifies which dimensions of the case are relevant for case selection, i.e., descriptive features (D), causal factors of theoretical interest (X), background
factors (Z), and/or the outcome (Y). Column 4 specifies the criteria used to select a case(s) from
a universe of possible cases. Column 5 offers an example of each case-selection strategy. In what
follows, I offer a brief résumé of the resulting typology.
Table 1: Case-Selection Strategies

I. DESCRIPTIVE (to describe)
● Typical (N: 1+; factors: D; criteria: mean, mode, or median of D). Example: Lynd & Lynd (1929) Middletown
● Diverse (N: 2+; factors: D; criteria: typical sub-types). Example: Fenno (1977, 1978) Home Style

II. CAUSAL (to explain Y)
1. Exploratory (to identify HX)
● Extreme (N: 1+; factors: X or Y; criteria: maximize variation in X or Y). Example: Skocpol (1979) States and Social Revolutions
● Index (N: 1+; factors: Y; criteria: first instance of ∆Y). Example: Pincus (2011) 1688: First Modern Revolution
● Deviant (N: 1+; factors: Z, Y; criteria: poorly explained by Z). Example: Alesina et al (2001) Why Doesn’t US Have Welfare State?
● Most-similar (N: 2+; factors: Z, Y; criteria: similar on Z, different on Y). Example: Epstein (1964) A Comparative Study of Canadian Parties
● Most-different (N: 2+; factors: Z, Y; criteria: different on Z, similar on Y). Example: Karl (1997) Paradox of Plenty
● Diverse (N: 2+; factors: Z, Y; criteria: all possible configurations of Z, assuming X ∈ Z). Example: Moore (1966) Social Origins of Dictatorship and Democracy
2. Estimating (to estimate HX)
● Longitudinal (N: 1+; factors: X, Z; criteria: X changes, Z constant or biased against HX). Example: Friedman & Schwartz (1963) Monetary History of US
● Most-similar (N: 2+; factors: X, Z; criteria: similar on Z, different on X). Example: Posner (2004) Political Salience of Cultural Difference
3. Diagnostic (to assess HX)
● Influential (N: 1+; factors: X, Z, Y; criteria: greatest impact on P(HX)). Example: Ray (1993) Wars between Democracies
● Pathway (N: 1+; factors: X, Z, Y; criteria: X→Y strong, Z constant or biased against HX). Example: Mansfield & Snyder (2005) Electing to Fight
● Most-similar (N: 2+; factors: X, Z, Y; criteria: similar on Z, different on X & Y, X→Y strong). Example: Walter (2002) Committing to Peace

D = descriptive features (other than those to be described in a case study). HX = causal hypothesis of interest. P(HX) = the probability of HX. X = causal factor(s) of theoretical interest. X→Y = apparent or estimated causal effect. Y = outcome of interest. Z = vector of background factors that may affect X and/or Y.
Many case studies are primarily descriptive, which is to say they are not organized around a
central, overarching causal hypothesis. Although writers are not always explicit about their
selection of cases, most of these decisions might be described as following a typical or diverse case
strategy. That is, they aim to identify a case, or cases, that exemplify a common pattern (typical)
or patterns (diverse). This follows from the minimal goals of descriptive analysis. Where the goal
is to describe there is no need to worry about the more complex desiderata that might allow one
to gain causal leverage on a question of interest.
Other case studies are oriented toward causal analysis. A good case (or set of cases) for
purposes of causal analysis is generally one that exemplifies quasi-experimental properties,
replicating the virtues of a true experiment even while lacking a manipulated treatment (Gerring
& McDermott 2007). Specifically, for a given case (observed through time) or for several cases
(compared to each other), variation in X should not be correlated with other factors that are also
causes of Y, which might serve as confounders (Z), generating a spurious (non-causal)
relationship between X and Y.
Exploratory case studies aim to identify a hypothesis. Sometimes, the researcher begins
with a factor that is presumed to have fundamental influence on a range of outcomes. The
research question is, what outcomes (Y) does X affect? More commonly, the researcher works
backward from a known outcome to its possible causes. The research question is, therefore,
what accounts for variation in Y? Or, if Y is a discrete event, Why does Y occur? The researcher
may also have an idea about background conditions, Z, that influence Y but are not of
theoretical interest. The purpose of the study, in any case, is to identify X, regarded as a possible
or probable cause of Y. Specific exploratory techniques may be classified as extreme, index, deviant,
most-different, most-similar, or diverse, as specified in Table 1.
Estimating cases aim to test a hypothesis by estimating a causal effect. That might mean a
precise point estimate along with a confidence interval (e.g., from a time-series or synthetic
matching analysis), or an estimate of the “sign” of a relationship, i.e., whether X has a positive,
negative, or no relationship to Y. The latter is more common, not only because of the small size
of the sample (at the case level) but also because it is more likely to be generalizable across a
population of cases. In either situation, case selection rests on information about X and Z (not
Y). Two general approaches are viable – longitudinal and most similar – as outlined in Table 1.
Diagnostic case studies help to confirm, disconfirm, or refine a hypothesis (garnered from
the literature on a subject or from the researcher’s own ruminations) and identify the generative
agent (mechanism) at work in that relationship. All the elements of a causal model – X, Z, and Y
– are generally involved in the selection of a diagnostic case. Specific strategies may be classified
as influential, pathway, or most-similar, as shown in Table 1.
Note that virtually all of these case selection strategies may be executed in an informal,
qualitative fashion or by employing a quantitative algorithm. For example, a deviant case could
be chosen based on a researcher’s sense about which case(s) is poorly explained by extant
theories. Or it might be chosen by looking at residuals from a regression model. Discussion of
the pros and cons of algorithmic case selection can be found in Gerring (2017).
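As a sketch of the algorithmic route, the following Python snippet (using simulated data; the variable names and values are invented) selects a deviant case as the observation with the largest absolute residual from a regression of the outcome on background factors, and a typical case as the observation with the smallest.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Invented cross-case dataset: outcome Y and two background factors Z1, Z2.
cases = pd.DataFrame({"Z1": rng.normal(size=30), "Z2": rng.normal(size=30)},
                     index=[f"case_{i}" for i in range(30)])
cases["Y"] = 0.8 * cases["Z1"] - 0.5 * cases["Z2"] + rng.normal(scale=0.5, size=30)

# Regress Y on the background factors by ordinary least squares.
X = np.column_stack([np.ones(len(cases)), cases[["Z1", "Z2"]].to_numpy()])
beta, *_ = np.linalg.lstsq(X, cases["Y"].to_numpy(), rcond=None)
residuals = cases["Y"].to_numpy() - X @ beta

# Deviant case: most poorly explained by Z (largest absolute residual);
# a typical case would minimize the absolute residual instead.
cases["abs_residual"] = np.abs(residuals)
print("deviant case:", cases["abs_residual"].idxmax())
print("typical case:", cases["abs_residual"].idxmin())
```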
Validation
The reader may wonder, how does one know whether a designated strategy will achieve what it is
intended to achieve? Given a research goal, which is the best way to choose cases? Why these
strategies (listed in Table 1) and not others? Evidently, there are serious problems of validation to
wrestle with.
With this nagging question in mind, several attempts have been made to assess varying
case selection strategies using simulation techniques. Herron & Quinn (2016) assess estimating
strategies, i.e., where the case is intended to measure causal effects. Seawright (2016) assesses
diagnostic strategies, where the case is designed to help confirm or disconfirm a causal
hypothesis. Lucas & Szatrowski (2014) assess QCA-based strategies of case-selection.
It would take some time to discuss these complex studies, so I shall content myself with
several summary judgments. First, case selection techniques have different goals, so any attempt
to compare them must focus on the goals that are appropriate to that technique. A technique
whose purpose is exploratory (to identify a new hypothesis about Y) cannot be judged by its
efficacy in identifying causal mechanisms, for example. Second, among these goals, estimating
causal effects is the least common – and, by all accounts, the least successful – so any attempt
to gauge the effectiveness of case selection methods should probably focus primarily on
exploratory and diagnostic functions. Third, case selection techniques are best practiced when
taking into account change over time in the key variables, rather than in the static, cross-sectional
fashion that most of the simulation exercises appear to assume. Finally, and most importantly, it is
difficult and perhaps impossible to simulate the complex features involved in an in-depth case
analysis. The question of interest – which case(s) would best serve my purpose if I devoted a
case study to it? – is hard to model without introducing assumptions that pre-judge the results of
the case study and are in this respect endogenous to the case-selection strategy. 6
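The basic logic of these simulation-based assessments can be conveyed with a stylized sketch (Python; this is an illustration of the general approach under invented assumptions, not a reproduction of any of the studies cited above). A population with a known but omitted cause is simulated repeatedly, and two selection rules are compared on how often they land on a case possessing that cause.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_population(n=200):
    # Z: a known background factor; X: a rare, omitted cause the researcher
    # hopes to discover through intensive study of the selected case.
    Z = rng.normal(size=n)
    X = rng.binomial(1, 0.2, size=n)
    Y = 1.0 * Z + 2.0 * X + rng.normal(scale=0.5, size=n)
    return Z, X, Y

def deviant_case(Z, Y):
    # The case most poorly explained by a regression of Y on Z alone.
    coeffs = np.polyfit(Z, Y, deg=1)
    residuals = Y - np.polyval(coeffs, Z)
    return int(np.argmax(np.abs(residuals)))

trials = 2000
hits_deviant = hits_random = 0
for _ in range(trials):
    Z, X, Y = simulate_population()
    hits_deviant += X[deviant_case(Z, Y)]   # did the chosen case carry the omitted cause?
    hits_random += X[rng.integers(len(Y))]

print("P(selected case has the omitted cause)")
print("  deviant-case strategy:", hits_deviant / trials)
print("  random selection:     ", hits_random / trials)
```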
In my opinion, testing the viability of case selection strategies in a rigorous fashion would
involve a methodological experiment of the following sort. First, assemble a panel of researchers
with similar background knowledge of a subject. Second, identify a subject deemed ripe for case
study research, i.e., it is not well-studied or has received no authoritative treatment and is not
amenable to experimental manipulation. Third, select cases algorithmically, following one of the
protocols laid out in Table 1. Fourth, randomly assign these cases to the researchers with
instructions to pursue all case study goals – exploratory, estimating, and diagnostic. Fifth,
assemble a panel of judges, who are well-versed in the subject of theoretical focus, to evaluate
how well each case study achieved each of these goals. These could be scored on a questionnaire
using ordinal, Likert-style categories. Judges would be instructed to decide independently
(without conferring), though there might be a second round of judgments following a
deliberative process in which they shared their thoughts and their preliminary decisions.
Such an experiment would be time-consuming and costly (assuming participants receive
some remuneration). And it would need to be iterated across several research topics and with
several panels of researchers and judges in order to make strong claims of generalizability.
Nonetheless, it might be worth pursuing given the possible downstream benefits. 7
Causal Inference
Having discussed case selection, we proceed to case analysis, with a focus on the qualitative
components of that inquiry. Can causal inference be reached with qualitative data? Here, we
encounter the most mysterious, and most contested, aspect of qualitative methods.
Causal inference in a quantitative context usually refers to the estimation of a fairly
precise causal (treatment) effect. In qualitative contexts, the meaning of causal inference is more
6 For example, Herron & Quinn (2016: 9) make the assumption that the potential outcomes inherent in a case (i.e., the unit-level causal relationship) will be discovered by the case study researcher in the course of an intensive analysis of the case. Yet, "discoverability" is the very thing that case selection techniques are designed to achieve. That is, a case selection technique is regarded as superior insofar as it offers a higher probability of discovering an unknown feature of a case.
7 Note, however, that this experiment disregards qualitative judgments by researchers that might be undertaken after an algorithmic selection of cases. These qualitative judgments might serve as mediators. It could be, for example, that some case-selection strategies work better when the researcher is allowed to make final judgments – from among a set of potential cases that meet the stipulated case-selection criteria – based on knowledge of the potential cases. One must also consider a problem of generalizability that stems from the use of algorithmic procedures for selecting cases. It could be that subjects for which algorithmic case selection is feasible (i.e., where values for X, Z, and Y can be measured across a large sample) are different from subjects for which algorithmic case selection is infeasible. If so, we could not generalize the results of this experiment to the latter genre of case study research.
complicated. First, inferences about a causal effect are apt to be looser, less precise (unless the
relationship is deemed to be deterministic). Typically, an author will attempt to determine
whether X is a cause of Y and whether its effect is positive or negative. Sometimes, an attempt
will be made to account for all the causes of an outcome (a causes-of-effects style of research).
Invariably, there will be an attempt to identify a mechanism. Indeed, the latter may form the
main focus of analysis, as it would be in situations where a causal effect is presumed at the outset
(perhaps as a product of quantitative analysis).
For qualitative inquiry, the distinction between internal validity (causal relationships for the
studied cases) and external validity (causal relationships inferred for a broader population) is
especially critical. This is because the studied cases are usually small in number and not chosen
randomly from a known population. When we speak of causal inference in this section we are
concerned about inferences drawn for the studied cases, not for a larger population.
Rules of Thumb
Over the past several decades, scholars have attempted to identify a set of loosely framed rules to
guide the process of qualitative inquiry where the goal is causal inference (loosely defined). 8
These may be summarized as follows…
• Analyze sources according to their relevance (to the question of theoretical interest), proximity (whether the source is in a position to know what s/he is claiming), authenticity (the source is not fake or reflecting the influence of someone else), validity (the source is not biased), and diversity (collectively, sources represent a diversity of viewpoints on the question at hand).
• When identifying a new causal factor or theory, look for one (a) that is potentially generalizable to a larger population, (b) that is neglected in the extant literature on your subject, (c) that greatly enhances the probability of an outcome (if binary) or explains a lot of variation on that outcome (if interval-level), and (d) that is exogenous (not explained by other factors).
• Canvass widely for rival explanations, which also serve as potential confounders. Treat them seriously (not as "straw men"), dismissing them only when warranted. Utilize this logic of elimination, where possible, to enhance the strength of the favored hypothesis.
• For each explanation, construct as many testable hypotheses as possible, paying close attention to within-case opportunities – e.g., mechanisms and alternative outcomes.
• Enlist counterfactual thought-experiments in an explicit fashion, making clear which features of the world are being altered, and which are assumed to remain the same, in order to test the viability of a theory. Also, focus on periods when background features are stable (so they don't serve as confounders) and minimize changes to the world (the minimal-rewrite rule) so that the alternate scenario is tractable.
• Utilize chronologies and diagrams to clarify temporal and causal interrelationships among complex causal factors. Include as many features as possible so that the time-line is continuous, uninterrupted.
These are the loose guidelines – “rules of thumb” – that students are taught, and that
scholars follow (we hope). Because of the informal nature of these guidelines, qualitative evidence is often
regarded with suspicion. It’s hard to articulate what a convincing inference might consist of and
how to know it when one sees it. Are there methodological standards applying to qualitative data
analysis (aka process tracing)?
Inferential Frameworks
To remedy this situation a number of recent studies try to make sense of qualitative data,
imposing order on the seeming chaos. Proposed frameworks include set theory (Mahoney 2012;
8 See Beach & Pedersen (2013), Bennett & Checkel (2015), Brady & Collier (2004), Collier (2011), George (1979), Hall (2006), Jacobs (2015), Mahoney (2012), Roberts (1996), Schimmelfennig (2015), Waldner (2012, 2015a, 2015b).
Mahoney & Vanderpoel 2015), acyclic graphs (Waldner 2015b), or – most commonly – Bayesian
inference (Beach & Pedersen 2013: 83-99; Bennett 2008, 2015; Crandell et al. 2011; George &
McKeown 1985; Gill et al. 2005; Humphreys & Jacobs 2015, forthcoming; McKeown 1999;
Rohlfing 2012: 180-99).
These efforts have performed an invaluable service to the cause of qualitative inquiry,
fitting its procedures into frameworks that are already well-established for quantitative inquiry. It should
be no surprise that there are multiple frameworks, just as there are multiple frameworks for
quantitative methodology. Scholars may debate whether, or to what extent, these frameworks are
compatible with each other; this important debate is orthogonal to the present topic. The point
to stress is that qualitative inquiry can be understood within the rubric of general causal
frameworks. There is, in this sense, a unifying logic of inquiry.
Thus far, applications of set theory, acyclic graphs, and Bayesianism to qualitative
methods have focused on making sense of the activity rather than providing a practical guide to
research. It remains to be seen whether these can be developed in such a way as to alter the ways
that qualitative researchers go about their business. Let me illustrate.
Some years ago, Van Evera (1997) proposed a fourfold typology of tests that has since
been widely adopted (e.g., Bennett & Checkel 2015: 17; George & Bennett 2005; Mahoney &
Vanderpoel 2015; Waldner 2015a). A “hoop” test is necessary (but not sufficient) for
demonstrating Hx. A “smoking-gun” test is sufficient (but not necessary) for demonstrating Hx.
A “doubly-decisive” test is necessary and sufficient for demonstrating Hx. A “straw-in-the-wind”
test is neither necessary nor sufficient, constituting weak or circumstantial evidence. These
concepts, diagramed in Table 2, are useful for classifying the nature of evidence according to a
researcher’s judgment. However, the hard question – the judgment itself – is elided. When does a
particular piece of evidence qualify as a hoop, smoking-gun, doubly-decisive, or straw-in-the-wind test (or something in between)?
Table 2: Qualitative Tests and their Presumed Inferential Role

Test                  Necessary   Sufficient
Hoop                      ✓           –
Smoking-gun               –           ✓
Doubly-decisive           ✓           ✓
Straw-in-the-wind         –           –
Likewise, Bayesian frameworks are useful for combining evidence from diverse quarters
in a logical fashion with the use of subjective assessments, e.g., the probability that a hypothesis
is true, ex ante, and assessments of the probability that the hypothesis is true if a piece of
evidence (stipulated in advance) is observed. The hard question, again, is the case-specific
judgment. Consider the lengthy debate that has ensued over the reasons for electoral system
choice in Europe (Kreuzer 2010). Humphreys & Jacobs (2015) use this example to sketch out
their application of Bayesian inference to qualitative research. In particular, they explore the “left
threat” hypothesis, which suggests that the presence of a large left-wing party explains the
adoption of proportional representation (PR) in the early twentieth century (Boix 1999). The
authors point out that “for cases with high left threat and a shift to PR, the inferential task is to
determine whether they would have, or would not have, shifted to PR without left threat”
(Humphreys & Jacobs 2015: 664). Bayesian frameworks do nothing to ease this inferential task,
which takes the form of a counterfactual thought-experiment. Similar judgments are required by
other frameworks – set theory, acyclic graphs, and so forth.
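For readers unfamiliar with the mechanics, the following sketch (Python; all probabilities are invented for illustration) shows what Bayesian updating over a sequence of clues involves, treating the clues as conditionally independent. The numbers fed into the function – the likelihood of observing each clue if the hypothesis is true or false – are precisely the case-specific judgments that the framework itself cannot supply.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of the hypothesis after observing one clue."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Illustrative (invented) judgments about a single hypothesis, e.g., that a
# left threat drove the adoption of PR in a given country. Each clue is
# summarized by two subjective probabilities: how likely we would be to see
# it if the hypothesis were true, and if it were false.
prior = 0.5
clues = [
    ("hoop-like clue (expected if true, fairly common anyway)", 0.95, 0.60),
    ("smoking-gun-like clue (rare unless the hypothesis is true)", 0.40, 0.05),
    ("straw-in-the-wind (weakly diagnostic either way)", 0.55, 0.45),
]
posterior = prior
for label, p_true, p_false in clues:
    posterior = bayes_update(posterior, p_true, p_false)
    print(f"after {label}: P(H | evidence so far) = {posterior:.2f}")
```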
To get a feel for the level of detail required in qualitative research let us take a closer look
at a particular inquiry. Helpfully, Tasha Fairfield (2013: 55-6; see also 2015) provides a
scrupulous blow-by-blow account of the sleuthing required to reach each case-level inference in
her study of how policymakers avoid political backlash when they attempt to tax economic elites.
One of her three country cases is Chile, which is observed during and after a recent presidential
election. Fairfield explains,
During the 2005 presidential campaign, right candidate Lavín blamed Chile’s
persistent inequality on the left and accused President Lagos of failing to deliver his
promise of growth with equity. Lagos responded by publicly challenging the right to
eliminate 57 bis, a highly regressive tax benefit for wealthy stockholders that he
called “a tremendous support for inequality.” The right accepted the challenge and
voted in favor of eliminating the tax benefit in congress, deviating from its prior
position on this policy and the preferences of its core business constituency.
The following three hypotheses encompass the main components of my argument
regarding why the right voted in favor of the reform:
Hypothesis 1. Lagos’ equity appeal motivated the right to accept the reform, due to
concern over public opinion.
Hypothesis 2. The timing of the equity appeal—during a major electoral
campaign—contributed to its success.
Hypothesis 3. The high issue-salience of inequality contributed to the equity
appeal’s success.
The following four observations, drawn from different sources, provide indirect,
circumstantial support for Hypothesis 1:
Observation 1a (p. 48): The Lagos administration considered eliminating 57 bis in
the 2001 Anti-Evasion reform but judged it politically infeasible given business-right opposition (interview: Finance Ministry-a, 2005).
Observation 1b: The Lagos administration subsequently tried to reach an agreement
with business to eliminate 57 bis without success (interview, Finance Ministry-b,
2005).
Observation 1c: Initiatives to eliminate the exemption were blocked in 1995 and
1998 due to right opposition. (Sources: congressional records, multiple interviews)
Observation 1d: Previous efforts to eliminate 57 bis did not involve concerted
equity appeals. Although Concertación governments had mentioned equity in prior
efforts, technical language predominated, and government statements focused
much more on 57 bis’ failure to stimulate investment rather than its regressive
distributive impact (congressional records, La Segunda, March 27, 1998, El
Mercurio, April 1, 1998, Interview, Ffrench-Davis, Santiago, Chile, Sept. 5, 2005).
Inference: These observations suggest that right votes to eliminate 57 bis would
have been highly unlikely without some new, distinct political dynamic. Lagos’
strong, high-profile equity appeal, in the unusual context of electoral competition
from the right on the issue of inequality, becomes a strong candidate for explaining
the right’s acceptance of the reform.
The appendix continues in this vein for several pages, focused relentlessly on explaining the
behavior of one particular set of actors in one event, i.e., the motivation of the right-wing in
favoring the reform. This event is just one of a multitude of events discussed in connection with
the Chilean case study, to which must be added the equally complex set of events occurring in
Argentina and Bolivia in Fairfield’s three-country study. Clearly, reaching case-level inferences is
complicated business.
One may conclude that if researchers agreed on case-level judgments then general frameworks
could successfully cumulate those judgments into higher-level inferences, accompanied by a
(very useful!) confidence interval. But if one cannot assume case-level consensus, conclusions
based on qualitative judgments combined through a Bayesian (or other) framework represent
one researcher’s views, which might vary appreciably from another’s. Readers who are not
versed in the intricacies of Chilean politics will have a hard time ascertaining whether Fairfield’s
judgments are correct.
A Crowd-based Approach
In principle, this sort of problem could be overcome with a crowd-based approach. Specifically,
one might survey a panel of experts – chosen randomly or with an aim to represent diverse
perspectives – on each point of judgment. One could then cumulate these judgments into an
overall inference, in which the confidence interval reflects the level of disagreement among
experts (among other things). Unfortunately, not just any crowd will do. The extreme difficulty
of case study research derives in no small part from the expertise that case study researchers
bring to their task. I cannot envision a world in which lay coders, recruited through Amazon
Mechanical Turk or Facebook, would replace that expertise, honed through years of work on a particular
problem and in a particular site (a historical period, country, city, village, organization,…).
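A minimal sketch of the aggregation step (Python; the judgments are invented) shows how a panel's case-level assessments might be pooled, with the spread of the judgments standing in for a measure of expert disagreement.

```python
import numpy as np

# Invented subjective probabilities from a hypothetical panel of area experts
# on one case-level question (e.g., "Would this country have adopted PR
# absent a left threat?").
judgments = np.array([0.20, 0.35, 0.25, 0.60, 0.30, 0.15, 0.40])

pooled = judgments.mean()
disagreement = judgments.std(ddof=1)
low, high = np.percentile(judgments, [10, 90])
# A rough summary: the interval's width reflects expert disagreement. A fuller
# treatment would model the judgments (e.g., within a Bayesian framework).
print(f"pooled judgment: {pooled:.2f} (sd {disagreement:.2f}); "
      f"10th-90th percentile: [{low:.2f}, {high:.2f}]")
```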
To be credible, a crowd-based approach to the problem of judgment would need to
enlist the small community of experts who study a subject and can be expected to make
knowledgeable judgments about highly specific questions such as the “left wing threat.” In the
previous example, it would entail enlisting scholars versed in the politics of early twentieth
century Europe. This procedure is conceivable, but difficult to implement. How would one
identify a random, or otherwise representative, sample? (What is the sampling frame?) How
would one motivate scholars to undertake the task? How would one elicit honest judgments
about the specific questions on a questionnaire, uncorrupted by broader judgments about the
theoretical question at hand (which they would probably be able to infer)?
Likewise, if one goes to the trouble of constructing a common coding frame (a
questionnaire), an on-line system for recording responses, a system of recruitment, and a
Bayesian (or some other) framework for integrating judgments, the considerable investment in
time and expense of such a venture would probably justify extending the analysis to many cases,
chosen randomly, so that a representative sample can be attained and stochastic threats to
inference minimized. In this fashion, procedures to integrate qualitative data into a quantitative
framework seem likely to morph from case studies into cross-case coding exercises. This is not
to argue against the idea. It is simply to point out that any standardization of procedures tends to
work against the intensive focus on one or several cases which (by my definition) characterizes
case study research.
Multimethod Research
In multimethod research both qual and quant styles of evidence are brought to bear on the same
general research question (Brewer & Hunter 2006; Goertz 2015; Harrits 2011; Lieberman 2005;
Seawright 2017). While multimethod research is increasingly common, there are serious
questions about its effectiveness (Lohmann 2007). Doing more than one thing might mean
doing multiple things poorly, by dint of limited time or expertise. Nor is it clear whether qualitative
and quantitative analysis can speak to one another productively (Ahmed & Sil 2012).
In discussing this question it is important not to confuse disagreement with
incommensurability. If qual and quant tests of a proposition are truly independent there is always
the possibility that they will elicit different, perhaps even directly contradictory, answers. For
example, the most common style of multimethod analysis combines a quantitative analysis of
many units with an in-depth, qualitative (or at least partially qualitative) analysis of a single case
or a small set of cases, which Lieberman (2005) refers to as a nested analysis. Occasionally, these
two analyses reach different conclusions about a causal relationship (though, one suspects,
authors do not always bring these disagreements to the fore). However, the same disagreements
also arise from rival quantitative analyses (e.g., conducted with different samples or
specifications) and rival qualitative analyses (e.g., focused on different research sites or generated
by different researchers). Disagreement about whether X causes Y, or about the mechanisms at
work, does not entail that multimethod research is unavailing. Sometimes, triangulation does not
confirm one’s hypothesis. It is still useful information; and for those worried about confirmation
bias, it is critical.
In any case, Seawright (2017) points out that when qualitative and quantitative evidence
is combined these analyses are usually oriented toward somewhat different goals. Typically, a
large-n cross-case analysis is focused on measuring a causal effect while a small-n within-case
analysis is focused on identifying a causal mechanism. As such, the two styles of evidence cannot
directly conflict since their objectives are different. They nonetheless inform each other in a
useful fashion.
This leaves open another way of viewing multimethod research. Sometimes, the
qualitative and quantitative aspects of research are profitably united within a larger “research
cycle” that includes a diversity of methods and authors (Lieberman 2016). This allows scholars
with a qual or quant bent to do what they do best, concentrating their efforts on their particular
skill-set and on one particular context that they can become intimately acquainted with. The
research cycle also mitigates a presentational problem – stuffing results from myriad analyses
into a 10,000-word article.
Unfortunately, the research cycle approach to multimethod research also encounters
obstacles. In particular, one must wonder whether cumulation can occur successfully across
diverse studies utilizing diverse research methods. Note that political science work is not highly
standardized, even when focused on the same research question and when utilizing the same
quantitative method. This inhibits the integration of findings, and helps to account for the
scarcity of meta-analyses in political science. Qualitative studies are even less likely to be
standardized in a way that allows for their integration into an ongoing research trajectory. Inputs
and outputs may be defined and operationalized in disparate ways, or perhaps not clearly
operationalized at all. And because samples are not randomly chosen, any aggregation of studies
cannot purport to represent a larger population in an unbiased fashion.
There is yet another angle on this topic that offers what is perhaps a more optimistic –
not to mention realistic – reading of the multimethod ideal. Rather than conceptualizing
qualitative and quantitative research as separate research designs we might regard them as
integral components of the same design.
Nowadays, it is my impression that there are fewer purely qualitative studies. Although
the main burden of inference may be carried by qualitative data, this is often supplemented by a
large-n cross-case analysis or a large-n within-case analysis (where observations are drawn from a
lower level of analysis). Likewise, there are few purely quantitative analyses, since quantitative work is
usually (always?) accompanied by qualitative observations of one sort or another. At a minimum,
qualitative data is trotted out by way of illustration. At a maximum, qualitative data is essential to
causal inference.
In this vein, a number of recent studies highlight the vital role played by qualitative data,
even when the research design is experimental or quasi-experimental. Although we tend to think
of these designs as being quantitative – since they generally incorporate a large number of
comparable units – they may also contain important qualitative components.
There is, to begin with, the problem of research design. Without an ethnographic
understanding of the research site and the individuals who are likely to serve as subjects it is
impossible to design an experiment that adequately tests a hypothesis of interest. It is impossible
to define a confounder “in the abstract.” In-depth case-based understanding is especially
important in the context of field experimentation, where the local context is likely to influence how
subjects react to a given treatment.
Second, one must assess potential threats to inference. Where the assignment is randomized,
ex ante comparability is assured. But ex post comparability remains a serious threat to inference.
For example, experiments often face problems of compliance, so it is incumbent on the
researcher to ascertain whether subjects adhered to the prescribed protocol and, if not, which
subjects violated the protocol. Where significant numbers of subjects attrit (withdraw from
participation) there is an important question about what motivated their withdrawal and what
sort of subjects were inclined to withdraw. In field experiments, where a significant time lag
often separates the treatment and the outcome of theoretical interest, one must try to determine
whether subjects under study may have communicated with one another, introducing potential
problems of interference and/or contamination (interference across treatment and control
groups).
Third, there is a question of causal mechanisms. Assuming a treatment effect can be
measured without bias, what is it that accounts for the connection between X and Y?
Finally, there are questions of generalizability. In order to determine the external validity of
an experiment one must have a good sense of the research site and the subjects who have been
studied. Specifically, one must be able to assess the extent to which these individuals, and this
particular treatment effect, can be mapped across other – potentially quite different – settings.
These issues – of research design, inferential threats, causal mechanisms, and
generalizability – are often assessable with qualitative data. Indeed, they may only be assessable
by means of a rich, contextual knowledge of a research project as it unfolds on a particular site.
Paluck (2010) argues, further, that experimental designs may be combined with
qualitative measurement to access outcomes that would not be apprehended with traditional
quantitative measures. As an example, she explores Chattopadhyay & Duflo’s (2004) study of
women leaders in India. While praising this landmark study, Paluck (2010: 61) points out,
participant observation of women leaders outside of the council settings—such as
in their homes, where they visit with other women—could have revealed whether
they were influenced by women constituents in these more informal settings.
Intensive interviews could compare social processes in villages with female or male
council leaders to reveal how beliefs about women leaders’ efficacy shift. For
example, did other council members, elders, or religious leaders make public
statements about female leaders or the reservation system? Was there a tipping
point at which common sentiment in villages with female leaders diverged from
villages with male leaders? Such qualitatively generated insights could have enabled
this study to contribute more to general theories of identity, leadership, and political
and social change. Moreover, ethnographic work could compare understandings of
authority and political legitimacy in villages with female- and male-led councils. Do
the first female leaders inspire novel understandings of female authority and
legitimacy, or are traditional gender narratives invoked just as frequently to explain
women’s new power and position?
Paluck concludes that experiments provide an opportunity for qualitative analysis, one that is
grossly under-utilized. Quantitative scholars who are enamored of experiments are well-advised
to pursue a parallel investigation along ethnographic lines. Qualitative scholars who wish to understand causal relationships are well-advised to conduct experiments to facilitate this analysis.
For example, suppose one is interested in the impact of modernization across a range of
measurable outcomes. To address this question, one might construct a field experiment in which
an agent of modernization – e.g., a bridge, road, harbor, radio tower – is randomized across sites,
allowing for an opportunity to systematically compare treatment and control groups over time.
This design not only allows for unbiased estimates of a causal effect; it also affords an occasion
for participant-observation – focused, one supposes, on how subjects respond to the treatment,
what sense they make of their changing world, and what mechanisms are at work.
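To make the quantitative side of such a design concrete, a minimal sketch follows (in Python, using invented site-level data; no actual study is implied). Because the treatment is assigned at random across sites, a simple difference in mean outcomes estimates the average treatment effect, while the ethnographic work proceeds alongside.

    import random

    random.seed(42)

    # Forty invented sites; half are assigned at random to receive the
    # "agent of modernization" (e.g., a new road). Outcomes are fabricated
    # purely for illustration, with a stipulated "true" effect of 1.5.
    site_ids = list(range(40))
    treated_ids = set(random.sample(site_ids, 20))

    outcomes = {}
    for i in site_ids:
        baseline = random.gauss(10, 2)
        effect = 1.5 if i in treated_ids else 0.0
        outcomes[i] = baseline + effect

    treated = [outcomes[i] for i in site_ids if i in treated_ids]
    control = [outcomes[i] for i in site_ids if i not in treated_ids]

    # Under random assignment, the difference in mean outcomes is an unbiased
    # estimate of the average treatment effect across sites.
    ate_hat = sum(treated) / len(treated) - sum(control) / len(control)
    print("Estimated average treatment effect:", round(ate_hat, 2))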
Where the treatment is not randomly assigned (i.e., in observational research) there are
additional issues pertaining to potential assignment (or selection) bias. Here, qualitative data
often comes into play (Dunning 2012). For example, Jeremy Ferwerda & Nicholas Miller (2014)
argue that devolution of power reduces resistance to foreign rule. To make this argument, they focus on France during World War Two, when the northern part of the country was ruled directly by German forces and the southern part was ruled indirectly by the “Vichy” regime headed by Marshal Pétain. The key methodological assumption of their regression discontinuity design is that the
line of demarcation was assigned in an as-if random fashion. For the authors, and for their critics
(Kocher & Monteiro 2015), this assumption requires in-depth qualitative research – research that
promises to uphold, or call into question, the authors’ entire analysis.
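The quantitative counterpart of that qualitative check is easy to state. The sketch below (Python, with invented municipality records and hypothetical variable names) illustrates one common diagnostic: if assignment to direct versus indirect rule were as-if random at the line of demarcation, pre-treatment characteristics should be balanced within a narrow band on either side of the line.

    # Invented municipality records: signed distance (km) from the line of
    # demarcation (negative = directly ruled zone, positive = Vichy zone) and a
    # pre-war covariate that should be balanced under as-if random assignment.
    municipalities = [
        {"dist": -8.0, "prewar_pop": 2100}, {"dist": -3.5, "prewar_pop": 1900},
        {"dist": -1.2, "prewar_pop": 2050}, {"dist":  0.9, "prewar_pop": 1980},
        {"dist":  2.7, "prewar_pop": 2200}, {"dist":  7.4, "prewar_pop": 1850},
    ]

    BANDWIDTH = 10.0  # compare only municipalities close to the line

    north = [m["prewar_pop"] for m in municipalities if -BANDWIDTH <= m["dist"] < 0]
    south = [m["prewar_pop"] for m in municipalities if 0 <= m["dist"] <= BANDWIDTH]

    # A large gap in pre-treatment covariates near the line would cast doubt on
    # the as-if random assumption; qualitative evidence about how the line was
    # actually drawn speaks to the same question.
    gap = sum(north) / len(north) - sum(south) / len(south)
    print("Pre-war covariate gap near the line:", round(gap, 1))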
As a second example, we may consider Romer & Romer’s (2010) analysis of the impact
of tax changes on economic activity. Because tax changes are non-random, and likely to be
correlated with the outcome of interest, anyone interested in this question must be concerned
with bias arising from the assignment of the treatment. To deal with this threat, Romer & Romer
make use of the narrative record provided by presidential speeches and congressional reports to
elucidate the motivation of tax policy changes in the postwar era. This allows them to distinguish
policy changes that might have been motivated by economic performance from those that may
be considered as-if random. By focusing solely on the latter, they claim to provide an unbiased
test of the theory that tax increases are contractionary.
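The underlying procedure can be sketched compactly. In the illustration below (Python, with invented values that bear no relation to Romer & Romer’s actual codings or estimates), each tax change carries a motivation code derived from the narrative record, and only the changes coded as exogenous enter the subsequent comparison.

    # Invented records of tax changes. Each is coded from the narrative record
    # (speeches, reports) as "exogenous" (motivated by long-run or ideological
    # concerns) or "endogenous" (responding to current economic conditions).
    tax_changes = [
        {"id": "A", "size_pct_gdp": -1.0, "motivation": "exogenous",  "growth_next_yr": 5.2},
        {"id": "B", "size_pct_gdp":  1.2, "motivation": "endogenous", "growth_next_yr": 4.0},
        {"id": "C", "size_pct_gdp": -1.6, "motivation": "exogenous",  "growth_next_yr": 5.8},
        {"id": "D", "size_pct_gdp":  0.9, "motivation": "exogenous",  "growth_next_yr": 1.9},
        {"id": "E", "size_pct_gdp":  0.7, "motivation": "endogenous", "growth_next_yr": 3.1},
    ]

    # Keep only the changes coded as exogenous, so that the comparison is not
    # driven by tax policy responding to the state of the economy.
    exog = [t for t in tax_changes if t["motivation"] == "exogenous"]

    cuts = [t["growth_next_yr"] for t in exog if t["size_pct_gdp"] < 0]
    hikes = [t["growth_next_yr"] for t in exog if t["size_pct_gdp"] > 0]
    print("Mean growth after exogenous tax cuts: ", round(sum(cuts) / len(cuts), 1))
    print("Mean growth after exogenous tax hikes:", round(sum(hikes) / len(hikes), 1))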
Conclusions
I began by stipulating a definition for quantitative and qualitative research. If the work is
quantitative, it enlists patterns of covariation, found in a matrix of observations and analyzed with a formal model, to reach a descriptive or causal inference. If the work is qualitative, the inference
is based on bits and pieces of non-comparable observations that address different aspects of a
problem. Traditionally, these are analyzed in an informal fashion. If one accepts this definition, it
follows that one can convert qualitative data to quantitative data (“coding”) but not the reverse.
And it follows that each approach to social science has characteristic strengths and weaknesses.
Qualitative data is generally (but not always!) more useful insofar as a study is exploratory and
insofar as it is focused on a single case or a small number of cases.
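The asymmetry between coding and its reverse can be shown with a toy example. In the sketch below (Python, with invented field notes and hypothetical coding rules), heterogeneous qualitative observations are reduced to a rectangular case-by-variable matrix; the original observations cannot be recovered from that matrix.

    # Invented field notes for two cases.
    field_notes = {
        "Case A": "Local leaders organized weekly meetings; turnout visibly high.",
        "Case B": "No standing organization observed; residents skeptical of outsiders.",
    }

    def code_case(note):
        """Apply hypothetical coding rules to reduce a note to binary variables."""
        text = note.lower()
        return {
            "organized": "organized" in text,
            "high_turnout": "turnout visibly high" in text,
        }

    # The resulting case-by-variable matrix is quantitative; the richness of the
    # original notes is lost, which is why the conversion runs only one way.
    matrix = {case: code_case(note) for case, note in field_notes.items()}
    print(matrix)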
In the second section, I presented a typology of case-selection strategies whose
organizing feature is the goal that a case study is intended to serve. I argued (implicitly) that
methods of case selection are considerably more differentiated than extant work suggests.
In the third section, I discussed the application of qualitative data to the goal of causal
inference – beginning with common “rules of thumb” and proceeding to general frameworks
such as set theory, acyclic graphs, and Bayesian probability. These general frameworks have
demonstrated (at least to my mind) that qualitative and quantitative observations can be
incorporated into a unified framework in the pursuit of causal inference. They have not yet
provided practical tools for the conduct of qualitative inquiry (more on this below).
In the fourth section, I discussed multimethod research – which in this context refers to
the combination of quantitative and qualitative data in the same analysis, the same study, or in
various studies devoted to the same research question (a “research cycle”). This pluralistic
approach seems to offer the possibility of combining the strengths of both styles of research,
while avoiding their respective weaknesses. And it seems to be reflected in current trends within
the discipline. Acknowledging the burdens imposed upon authors (who must master a diverse
range of skills), the limitations imposed by journals with stringent word-counts, and the problem
of cumulating results across diverse methods, I would argue that the multi-method ideal
nonetheless offers a plausible solution to the pervasive conflict between quantitative and
qualitative styles of research.
By way of conclusion to this short review, I want to invoke a fundamental tradeoff in
scientific endeavor – between a context of discovery (aka exploration, innovation) and a context of
justification (aka appraisal, demonstration, proof, verification/falsification) (Reichenbach 1938).
While both are acknowledged to be essential to scientific progress, the field of methodology is
strongly aligned with the latter. This is because the task of justification is amenable to systematic
rules which can be presented in academic journals, summarized in textbooks, and taught in
courses. By contrast, the task of discovery is a comparatively anarchistic affair. There are no rules
for finding new things. There may be some informal rules of thumb, analogies, pieces of advice,
but nothing one could build an academic field around.
I am exaggerating, to be sure. But the dichotomy is useful in illustrating a core feature of
qualitative inquiry. If exploratory work is inherently hostile to systematic method (Feyerabend
1975), and if qualitative approaches are uniquely insightful at early stages of research, then it may
be a mistake to suppose that systematic rules of method can apply, or should apply, to this genre
of research. Indeed, the very features that inhibit the achievement of falsifiability, replicability,
and cumulation may enhance the achievement of discovery. For example, qualitative work is
often derided as providing multiple angles on a subject, all of which are plausible and none of
which can be definitively proven or disproven. This flows from the narrow but intensive manner
of study, which may be summarized as large-K (variables), small-N (observations – in the classic
sense of comparable/dataset observations). Qualitative work is also seen as “post hoc,”
adjusting theories to fit the facts or adjusting facts to fit the theories (i.e., looking for settings in
which Theory X might be true). These are indeed vices if the researcher’s goal is to avoid Type I (false-positive) errors. But they are virtues insofar as one wishes to discover new – and potentially true – things
about the world.
I do not wish to ghetto-ize qualitative inquiry as purely exploratory. Non-comparable bits
of evidence have a vital role to play in confirming, and disconfirming, theories, as the foregoing
discussion illustrates. However, insofar as qualitative inquiry contributes to the discovery of new
concepts, new hypotheses, and new frameworks of analysis, we must come to terms with the
nature of that inquiry, which is often at odds with current trends in social science methodology.
To honor the contributions of qualitative research is to honor the role of exploratory research in
the progress of social science.
References
Ahmed, Amel, Rudra Sil. 2012. “When Multi-Method Research Subverts Methodological
Pluralism - Or, Why We Still Need Single-Method Research.” Perspectives on
Politics 10:4 (December) 935-953.
Alesina, Alberto, Edward Glaeser, Bruce Sacerdote. 2001. “Why Doesn’t the US Have a
European-Style Welfare State?” Brookings Papers on Economic Activity 2, 187-277.
Beach, Derek, Rasmus Brun Pedersen. 2013. Process-Tracing Methods: Foundations and Guidelines.
Ann Arbor, MI: University of Michigan Press.
Beck, Nathaniel. 2006. “Is Causal-Process Observation An Oxymoron?” Political Analysis 14(3):
347–52.
Beck, Nathaniel. 2010. “Causal Process ‘Observations’: Oxymoron or (Fine) Old Wine.” Political
Analysis 18(4): 499–505.
Bennett, Andrew. 2008. “Process Tracing: A Bayesian Approach.” In Janet Box-Steffensmeier,
Henry Brady, & David Collier (eds), Oxford Handbook of Political Methodology (Oxford: Oxford
University Press) 702-21.
Bennett, Andrew. 2015. “Disciplining our Conjectures: Systematizing Process Tracing with
Bayesian Analysis.” In Andrew Bennett & Jeffrey T. Checkel (eds), Process Tracing: From
Metaphor to Analytic Tool (Cambridge: Cambridge University Press) 276-98.
Bennett, Andrew, Colin Elman. 2006a. “Complex Causal Relations and Case Study Methods:
The Example of Path Dependence.” Political Analysis 14:3, 250-67.
Bennett, Andrew, Colin Elman. 2006b. “Qualitative Research: Recent Developments in Case
Study Methods.” Annual Review of Political Science 9:455–76.
Bennett, Andrew, Jeffrey T. Checkel (eds). 2015. Process Tracing: From Metaphor to Analytic Tool.
Cambridge: Cambridge University Press.
Blatter, Joachim, Markus Haverland. 2012. Designing Case Studies: Explanatory Approaches in Small-n
Research. Palgrave Macmillan.
Boas, Taylor C. 2007. “Conceptualizing Continuity and Change: The Composite-Standard Model
of Path Dependence.” Journal of Theoretical Politics 19:1, 33–54.
Boix, Carles. 1999. “Setting the Rules of the Game: The Choice of Electoral Systems in
Advanced Democracies.” American Political Science Review 93:3, 609-624.
Brady, Henry E. 2010. “Data-Set Observations versus Causal-Process Observations: The 2000
U.S. Presidential Election.” In Henry E. Brady & David Collier (eds), Rethinking Social Inquiry:
Diverse Tools, Shared Standards. 2nd ed. (Lanham, MD: Rowman & Littlefield) 237–42.
Brady, Henry E., David Collier (eds). 2004. Rethinking Social Inquiry: Diverse Tools, Shared Standards.
Lanham: Rowman & Littlefield.
Brady, Henry E., David Collier (eds). 2010. Rethinking Social Inquiry: Diverse Tools, Shared Standards.
2nd ed. Lanham, MD: Rowman & Littlefield.
Brewer, John, Albert Hunter. 2006. Foundations of Multimethod Research: Synthesizing Styles.
Thousand Oaks, CA: Sage.
Caporaso, James. 2009. “Is There a Quantitative-Qualitative Divide in Comparative Politics?” In
Todd Landman and Neil Robinson (eds), Sage Handbook of Comparative Politics (Thousand Oaks:
Sage).
Chattopadhyay, Raghabendra, Esther Duflo. 2004. “Women as policy makers: Evidence from a
randomized policy experiment in India.” Econometrica 72 (5): 1409-43.
Collier, David. 2011. “Understanding Process Tracing.” PS: Political Science and Politics 44 (04):
823–30.
Collier, David, Colin Elman. 2008. “Qualitative and Multimethod Research: Organizations,
Publications, and Reflections on Integration.” In Janet M. Box-Steffensmeier, Henry Brady,
and David Collier (eds), The Oxford Handbook for Political Methodology (Oxford: Oxford University
Press) 779–95.
Collier, David, Jody LaPorte, Jason Seawright. 2012. “Putting Typologies to Work: Concept
Formation, Measurement, and Analytic Rigor.” Political Research Quarterly 65(1) 217–232.
Collier, David, John Gerring (eds). 2009. Concepts and Method in Social Science: The Tradition of
Giovanni Sartori. Routledge.
Crandell, Jamie L., Corrine I. Voils, YunKyung Chang, Margarete Sandelowski. 2011. “Bayesian
data augmentation methods for the synthesis of qualitative and quantitative research
findings.” Quality & Quantity 45, 653–669.
Crasnow, Sharon. 2012. “The Role of Case Study Research in Political Science: Evidence for
Causal Claims.” Philosophy of Science 79:5 (December) 655-66.
Dunning, Thad. 2012. Natural Experiments in the Social Sciences: A Design-Based Approach.
Cambridge: Cambridge University Press.
Eckstein, Harry. 1975. “Case Studies and Theory in Political Science.” In Fred I. Greenstein and
Nelson W. Polsby (eds), Handbook of Political Science, vol. 7. Political Science: Scope and Theory
(Reading, MA: Addison-Wesley).
Elman, Colin. 2005. “Explanatory Typologies in Qualitative Studies of International Politics.”
International Organization 59:2 (April) 293-326.
Elman, Colin, Diana Kapiszewski. 2014. “Data Access and Research Transparency in the
Qualitative Tradition.” PS: Political Science & Politics 47/1: 43–47.
Elman, Colin, Diana Kapiszewski, Lorena Vinuela. 2010. “Qualitative Data Archiving: Rewards
and Challenges.” PS: Political Science & Politics 43, 23-27.
Epstein, Leon D. 1964. “A Comparative Study of Canadian Parties.” American Political Science
Review 58 (March) 46-59.
Fairfield, Tasha. 2013. “Going Where the Money Is: Strategies for Taxing Economic Elites in
Unequal Democracies.” World Development 47, 42–57.
Fairfield, Tasha. 2015. Private Wealth and Public Revenue in Latin America: Business Power and Tax
Politics. Cambridge: Cambridge University Press.
Fenno, Richard F., Jr. 1977. “U.S. House Members in Their Constituencies: An Exploration.”
American Political Science Review 71:3 (September) 883-917.
Fenno, Richard F., Jr. 1978. Home Style: House Members in their Districts. Boston: Little, Brown.
Ferwerda, Jeremy, Nicholas Miller. 2014. “Political Devolution and Resistance to Foreign Rule:
A Natural Experiment.” American Political Science Review 108:3 (August) 642-60.
Feyerabend, Paul. 1975. Against Method. London: New Left Books.
Friedman, Milton, Anna Jacobson Schwartz. 1963. A Monetary History of the United States, 1867–1960. Princeton: Princeton University Press.
Garfinkel, Harold. 1967. Studies in Ethnomethodology. Englewood Cliffs: Prentice-Hall.
George, Alexander L. 1979. “Case Studies and Theory Development: The Method of Structured,
Focused Comparison.” In Paul Gordon Lauren (ed), Diplomacy: New Approaches in History,
Theory, and Policy (New York: The Free Press).
George, Alexander L., Andrew Bennett. 2005. Case Studies and Theory Development. Cambridge:
MIT Press.
George, Alexander L., Timothy J. McKeown. 1985. “Case Studies and Theories of
Organizational Decision-making.” In Robert F. Coulam & Richard A. Smith (eds), Advances in
Information Processing in Organizations (Greenwich, Conn.: JAI Press) 21–58.
Gerring, John. 2007. Case Study Research: Principles and Practices. Cambridge: Cambridge University
Press.
Gerring, John. 2012. “Mere Description.” British Journal of Political Science 42:4 (October) 721-46.
Gerring, John. 2017. Case Study Research: Principles and Practices, 2d ed. Cambridge: Cambridge
University Press.
Gerring, John, Jason Seawright. 2016. “The Inference in Causal Inference: A Psychology for
Social Science Methodology.” Unpublished manuscript, Department of Political Science,
Boston University.
Gerring, John, Lee Cojocaru. 2016. “Selecting Cases for Intensive Analysis: A Diversity of Goals
and Methods.” Sociological Methods & Research (forthcoming).
Gerring, John, Rose McDermott. 2007. “An Experimental Template for Case-Study Research.”
American Journal of Political Science 51:3 (July) 688-701.
Gill, Christopher J, Lora Sabin, Christopher H. Schmid. 2005. “Why Clinicians are Natural
Bayesians.” BMJ 330:1080-3 (May 7).
Glassner, Barry, Jonathan D. Moreno (eds). 1989. The Qualitative-Quantitative Distinction in the Social
Sciences (Boston Studies in the Philosophy of Science, 112).
Goertz, Gary. 2005. Social Science Concepts: A User’s Guide. Princeton: Princeton University Press.
Goertz, Gary. 2017. Multimethod Research, Causal Mechanisms, and Selecting Cases: The Research Triad.
Princeton: Princeton University Press.
Goertz, Gary, James Mahoney. 2012. A Tale of Two Cultures: Qualitative and Quantitative Research in
the Social Sciences. Princeton: Princeton University Press.
Grimmer, Justin, Brandon M. Stewart. 2013. “Text as Data: The Promise and Pitfalls of
Automatic Content Analysis Methods for Political Texts.” Political Analysis 21:3, 267-97.
Hall, Peter A. 2003. “Aligning Ontology and Methodology in Comparative Politics.” In James
Mahoney and Dietrich Rueschemeyer (eds), Comparative Historical Analysis in the Social Sciences
(Cambridge: Cambridge University Press).
Hall, Peter A. 2006. “Systematic process analysis: when and how to use it.” European Management
Review 3, 24–31.
Hammersley, Martyn. 1992. “Deconstructing the Qualitative-Quantitative Divide.” In Julie
Brannen (ed), Mixing Methods: Qualitative and Quantitative Research (Aldershot: Avebury).
Harrits, Gitte Sommer. 2011. “More Than Method? A Discussion of Paradigm Differences
within Mixed Methods Research.” Journal of Mixed Methods Research 5(2): 150–66.
Herron, Michael C., Kevin M. Quinn. 2016. “A Careful Look at Modern Case Selection
Methods.” Sociological Methods & Research (forthcoming).
Humphreys, Macartan, Alan M. Jacobs. 2015. “Mixing Methods: A Bayesian Approach.”
American Political Science Review 109:4 (November) 653-73.
Humphreys, Macartan, Alan M. Jacobs. Forthcoming. Integrated Inferences: A Bayesian Integration of
Qualitative and Quantitative Approaches to Causal Inference. Cambridge: Cambridge University
Press.
Jacobs, Alan. 2015. “Process Tracing the Effects of Ideas.” In Andrew Bennett, Jeffrey T.
Checkel (eds), Process Tracing: From Metaphor to Analytic Tool (Cambridge: Cambridge University
Press) 41-73.
Kapiszewski, Diana, Lauren M. MacLean, Benjamin L. Read. 2015. Field Research in Political
Science: Practices and Principles. Cambridge: Cambridge University Press.
Karl, Terry Lynn. 1997. The Paradox of Plenty: Oil Booms and Petro-States. Berkeley: University of
California Press.
King, Gary, Robert O. Keohane, Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in
Qualitative Research. Princeton: Princeton University Press.
Kocher, Matthew, Nuno Monteiro. 2015. “What’s in a Line? Natural Experiments and the Line
of Demarcation in WWII Occupied France.” Unpublished manuscript, Department of
Political Science, Yale University.
Kreuzer, Markus. 2010. “Historical Knowledge and Quantitative Analysis: The Case of the
Origins of Proportional Representation.” American Political Science Review 104:369–92.
Levy, Jack S. 2007. “Qualitative Methods and Cross-Method Dialogue in Political Science.”
Comparative Political Studies 40(2): 196–214.
Levy, Jack S. 2008. “Case Studies: Types, Designs, and Logics of Inference.” Conflict Management
and Peace Science 25:1–18.
Lieberman, Evan S. 2005. “Nested Analysis as a Mixed-Method Strategy for Comparative
Research.” American Political Science Review 99:3 (August) 435-52.
Lieberman, Evan S. 2010. “Bridging the Qualitative-Quantitative Divide: Best Practices in the
Development of Historically Oriented Replication Databases.” Annual Review of Political Science
13, 37-59.
Lieberman, Evan S. 2016. “Improving Causal Inference through Non-Causal Research: Can the
Bio-Medical Research Cycle Provide a Model for Political Science?” Unpublished manuscript,
Department of Political Science, MIT.
Lijphart, Arend. 1971. “Comparative Politics and the Comparative Method.” American Political
Science Review 65, 682-93.
Lohmann, Susanne. 2007. “The Trouble with Multi-Methodism.” Newsletter of the APSA Organized
Section on Qualitative Methods 5(1): 13–17.
Lucas, Samuel R., Alisa Szatrowski. 2014. “Qualitative Comparative Analysis in Critical
Perspective.” Sociological Methodology 44:1, 1–79.
Lynd, Robert Staughton, Helen Merrell Lynd. 1929/1956. Middletown: A Study in American Culture. New York: Harcourt, Brace.
Mahoney, James. 2010. “After KKV: The New Methodology of Qualitative Research.” World
Politics 62 (1): 120–47.
Mahoney, James. 2012. “The Logic of Process Tracing Tests in the Social Sciences.” Sociological
Methods & Research 41:4 (November) 566-590.
Mahoney, James, Gary Goertz. 2006. “A Tale of Two Cultures: Contrasting Quantitative and
Qualitative Research.” Political Analysis 14:3 (Summer) 227-49.
Mahoney, James, Kathleen Thelen (eds). 2015. Advances in Comparative-Historical Analysis.
Cambridge: Cambridge University Press.
Mahoney, James, Rachel Sweet Vanderpoel. 2015. “Set Diagrams and Qualitative Research.”
Comparative Political Studies 48:1 (January) 65-100.
Mansfield, Edward D., Jack Snyder. 2005. Electing to Fight: Why Emerging Democracies go to War.
Cambridge: MIT Press.
McKeown, Timothy J. 1999. “Case Studies and the Statistical World View.” International
Organization 53 (Winter) 161-190.
McLaughlin, Eithne. 1991. “Oppositional Poverty: The Quantitative/Qualitative Divide and
Other Dichotomies.” The Sociological Review 39 (May): 292-308.
Mill, John Stuart. 1843/1872. A System of Logic, 8th ed. London: Longmans, Green.
Moore, Barrington, Jr. 1966. Social Origins of Dictatorship and Democracy: Lord and Peasant in the
Making of the Modern World. Boston: Beacon Press.
Morgan, Mary. 2012. “Case Studies: One Observation or Many? Justification or Discovery?”
Philosophy of Science 79:5 (December) 655-66.
Page, Scott E. 2006. “Essay: Path Dependence.” Quarterly Journal of Political Science 1: 87–115.
Paluck, Elizabeth Levy. 2010. “The Promising Integration of Qualitative Methods and Field
Experiments.” The ANNALS of the American Academy of Political and Social Science 628, 59-71.
Patton, Michael Quinn. 2002. Qualitative Research & Evaluation Methods. Sage.
Pincus, Steve. 2011. 1688: The First Modern Revolution. New Haven: Yale University Press.
Platt, Jennifer. 1992. “’Case Study’ in American Methodological Thought.” Current Sociology 40:1,
17-48.
Posner, Daniel. 2004. “The Political Salience of Cultural Difference: Why Chewas and
Tumbukas are Allies in Zambia and Adversaries in Malawi.” American Political Science Review
98:4 (November) 529-46.
Ray, James Lee. 1993. “Wars between Democracies: Rare or Nonexistent?” International
Interactions 18:251–76.
Reichenbach, Hans 1938. Experience and Prediction: An Analysis of the Foundations and the Structure of
Knowledge. University of Chicago Press.
Reiss, Julian. 2009. “Causation in the Social Sciences: Evidence, Inference, and Purpose.”
Philosophy of the Social Sciences 39(1): 20–40.
Rihoux, Benoit. 2013. “Qualitative Comparative Analysis (QCA), Anno 2013: Reframing The
Comparative Method’s Seminal Statements.” Swiss Political Science Review 19:2, 233–45.
Roberts, Clayton. 1996. The Logic of Historical Explanation. University Park: Pennsylvania State
University Press.
Rohlfing, Ingo. 2012. Case Studies and Causal Inference: An Integrative Framework. Palgrave
Macmillan.
Romer, Christina D., David H. Romer. 2010. “The Macroeconomic Effects of Tax Changes:
Estimates Based on a New Measure of Fiscal Shocks.” American Economic Review 100 (June)
763–801.
Rosenau, Pauline Marie. 1992. Post-Modernism and the Social Sciences: Insights, Inroads, and Intrusions.
Princeton: Princeton University Press.
Schatz, Edward (ed). 2009. Political Ethnography: What Immersion Contributes to the Study of Power.
Chicago: University of Chicago Press.
Schimmelfennig, Frank. 2015. “Efficient Process Tracing: Analyzing the Causal Mechanisms of
European Integration.” In Andrew Bennett, Jeffrey T. Checkel (eds), Process Tracing: From
Metaphor to Analytic Tool (Cambridge: Cambridge University Press) 98-125.
Schwartz, Howard, Jerry Jacobs. 1979. Qualitative Sociology: A Method to the Madness. New York:
Free Press.
Seawright, Jason. 2016. “The Case for Selecting Cases that are Deviant or Extreme on the
Independent Variable.” Sociological Methods & Research (forthcoming).
Seawright, Jason. 2017. Multi-Method Social Science: Combining Qualitative and Quantitative Tools.
Cambridge: Cambridge University Press, forthcoming.
Seawright, Jason, John Gerring. 2008. “Case-Selection Techniques in Case Study Research: A
Menu of Qualitative and Quantitative Options.” Political Research Quarterly 61:2 (June) 294-308.
Shapiro, Ian, Rogers Smith, Tarek Masoud (eds). 2004. Problems and Methods in the Study of Politics.
Cambridge: Cambridge University Press.
Shweder, Richard A. 1996. “Quanta and Qualia: What is the ‘Object’ of Ethnographic Method?”
In Richard Jessor, Anne Colby, and Richard A. Shweder (eds), Ethnography and Human
Development: Context and Meaning in Social Inquiry (Chicago: University of Chicago Press).
Sil, Rudra. 2000. “The Division of Labor in Social Science Research: Unified Methodology or
‘Organic Solidarity’?” Polity 32:4 (Summer) 499-531.
Skocpol, Theda. 1979. States and Social Revolutions: A Comparative Analysis of France, Russia, and
China. Cambridge: Cambridge University Press.
Skocpol, Theda, Margaret Somers. 1980. “The Uses of Comparative History in Macrosocial
Inquiry.” Comparative Studies in Society and History 22:2 (April) 147-97.
Snow, C.P. 1959/1993. The Two Cultures. Cambridge: Cambridge University Press.
Strauss, Anselm, Juliet Corbin. 1998. Basics of Qualitative Research: Techniques and Procedures for
Developing Grounded Theory. Thousand Oaks: Sage.
Van Evera, Stephen. 1997. Guide to Methods for Students of Political Science. Ithaca: Cornell University
Press.
Waldner, David. 2012. “Process Tracing and Causal Mechanisms.” In Harold Kincaid (ed),
Oxford Handbook of Philosophy of Social Science (Oxford: Oxford University Press) 65-84.
Waldner, David. 2015a. “Process Tracing and Qualitative Causal Inference.” Security Studies 24:2,
239-50.
Waldner, David. 2015b. “What Makes Process Tracing Good? Causal Mechanisms, Causal
Inference, and the Completeness Standard in Comparative Politics.” In Andrew Bennett,
Jeffrey T. Checkel (eds), Process Tracing: From Metaphor to Analytic Tool (Cambridge: Cambridge
University Press) 126-52.
Walter, Barbara. 2002. Committing to Peace: The Successful Settlement of Civil Wars. Princeton:
Princeton University Press.
Yanow, Dvora, Peregrine Schwartz-Shea (eds). 2013. Interpretation and Method: Empirical Research
Methods and the Interpretive Turn, 2d ed. Armonk, NY: M E Sharpe.