
Quality & Quantity 37: 221–238, 2003.
© 2003 Kluwer Academic Publishers. Printed in the Netherlands.
Force and Influence in Content Analysis: The
Production of New Social Knowledge
ROBERT HOGENRAAD1, DEAN P. MCKENZIE2 and NORMAND
PÉLADEAU3
1 Department of Psychology, Catholic University of Louvain, Louvain-la-Neuve, Belgium;
2 Department of Psychological Medicine, Monash University, Melbourne, Australia; 3 Provalis
Research and Université du Québec à Montréal, Montreal, Canada
Abstract. We examine the two traditions of content analysis: the first in which one substitutes words
of a text with categories, and the second in which one looks for clusters of words that may refer to
a theme. In the first tradition, preexisting dictionary categories give meaning to the words; in the
second, meaning comes after the fact. Preexisting dictionary categories (the substitution model) are
calibrated instruments applied within experimental designs that leave no space for doubt; meanwhile,
the ability of the correlational model to conjure up complex themes from fragments of a text yields no
unique solution. These differences have a bearing on the production of new social knowledge. We expound on the epistemological foundations of the two traditions of interpretation and draw from them
decision rules upon which one may rely for choosing among appropriate content-analytic tactics.
Two reasons make this essay timely and critical: (1) the increasing variety of new content-analytic
software for particular purposes and (2) the almost exclusive focusing on software and technology
at the expense of adjusting the choice of the software to the nature of the text. Two studies, one in
historiometry, the other in autobiography, illustrate the liabilities and benefits of the two models of
content analysis.
Key words: tactics of computer-aided content analysis, hermeneutic chiasma, words as predictors
versus words as symptoms
Abbreviations: IR = Industrial Relations, PROTAN = Protocol Analyzer, RID = Regressive Imagery
Dictionary
Traditions choose you as much as you choose them. On the subject of content
analysis, computer-aided or not, analysts do not have much choice but to fit into
one of two traditions – delineated by Weber (1983) in this journal. In the first,
dictionary-based tradition, one substitutes words of the text with categories; in
the second, correlational tradition, one correlates words of the text and looks for
suggestive patterns (themes).
These two traditions of interpretation have always consisted of either donning
an attitude of mistrust and suspicion toward the text or accepting and commenting on the text as it is, without altering it (Norris, 1987: 229). To the tradition of distrust
corresponds the content-analytic tactic based on forced substitution (of words of
the text with dictionary categories); this tactic provides “answers to preexisting
questions” (Toulmin, 1982: 95). To the other tradition corresponds the tactic of
looking for contiguities between words of the text; this tactic brings readers or
analysts to unfix the manifest meaning attached to words and, in an attempt to
generalize from preexisting facts, to discover latent ones (Handelman, 1982: xiii).
Handelman dubs the substitutive tradition, patristic; the correlational one, rabbinic.
Nomothetic dictionaries cause a text to share features with other texts. Idiographic
contiguities are less likely to be shared by other texts, because contiguities are
unique, never seen before, never to be seen after (Hogenraad, 2002).
Not only do these two traditions not stand easily together for a team photograph, but their indiscriminate use on text not otherwise qualified may lead to confusing results. The purpose of the present paper is (1) to expound on the epistemological foundations of the two dominant traditions of content analysis and (2) to draw, from these foundations, decision rules concerning the choice of content-analytic tactics. Scaffolding such a coherence in content analysis should take us much further in our capability to evaluate content-analytic findings with respect to the production of new social knowledge: More precisely,
the nature of the text to be analyzed determines in part the choice of the tactic
which in turn determines in part the degree to which findings can be cumulative.
Both Weber (1983) and his contender Muskens (1985) were curiously blind to the
dependence of the choice of the tactics on the nature of the text.
1. Weber’s 1983 “Measurement Models . . . ” Revisited
Weber was at his best when he reserved the term category for “groups of words
which have similar meanings and/or connotations” (1983: 140), and the term
theme for “clusters of words with different meanings or connotations that taken
together refer to some theme or issue” (ibid., italics in the original). The principles
which inform the established experimental treatment of textual data do not lie in
the nature of the data, but in the use of tools required to process data through
experimental/control binary designs. The use of calibrated computer-readable
dictionaries allows the content analyst to zero in on and extract the relevant information, airily lopping off the human bits that do not fit into the experimental design.
It is not hard to see why: There always was, in the disciplines concerned with the
experimental treatment of language, a feeling of truth in doing experiments and
coming up with results, stemming in part from a reluctance to confront anything
that cannot be brought under the control of the experimental design.
When Weber’s paper appeared in 1983, the only complete and autonomous system of content analysis in the correlational tradition was Iker’s WORDS computer
program (Iker and Klein, 1974). However, things have begun perceptibly to change,
to wit, the return of interest in thematics, i.e., in capturing the information available
but disseminated throughout the text like shifting mists (Duchastel et al., 1992;
Dyer, 1983; Salton et al., 1994). Trying to grasp dispersed information – a moving
target without fixed meaning, as indeed topics won’t stay in place for the length
even of a moderately short sentence – is what thematics is aiming at. We are now
lost in a palisade of new software for particular content-analytic purposes. Meanwhile, new fields turn up on the horizon, like textual data mining – an expression
seemingly pressed into service by the business community – abetted by a variety of
new algorithms of cluster analysis. The current focus on textual data mining could
be the sign that contemporary society is tired of scientists who have eyes only for a few variables while remaining silent on the rest. Says Hannah in Tom Stoppard's Arcadia
(1993: 59), “You’ve left out everything which doesn’t fit”; and again, “You’ve gone
from a glint in your eye to a sure thing in a hop”. One view that helps us to see
what has been too obvious or continuously ignored is this: that looking for patterns
of variables obliges us to assimilate as much as possible of the totality of the text
to guard against selectivity, and to maintain a sense of proportion. The remainder
of this paper examines the significance and implications of “looking for patterns”
in texts.
The increasing variety of new content-analysis software for particular purposes, each with implicit assumptions – a trivial truth in itself – makes it timely and critical to examine the epistemological foundations on which content analysis relies. What is striking – but less trivial – about these developments is the almost
exclusive focusing on software and technology at the expense of the question of
adjusting the choice of the software to the nature of the text.
2. Two Syntaxes of Knowing: The Code and the Context
For most of the last 50 years, the drift of computer-supported content analysis has
been on the word as unit of analysis (Perry and Kent, 1958; Péladeau, 1998; Sammet, 1969; SPSS, 1997; Stone, 1997; Stone et al., 1962) rather than on sentences
(Gottschalk, 1997; Gottschalk and Bechtel, 1982), semantic structures (Roberts,
1997), or linguistic neural networks (Carley, 1994). This does not mean that these
other units of analysis are less interesting than the word. Our perspective is primarily on the word as the unit of analysis, not for lack of consideration of other methods, but because word-based systems have broader realms of application while also being able to manage huge amounts of text. More basically, capitalizing on the word as a unit of content analysis finds strong support in the fundamentals of language,
similarity and contiguity (Firth, 1957). When a speaker selects a word from among a set of possible ones (similarity, e.g., SNOW rather than ICE) and then arranges the selected words into a meaningful sequence (contiguity, e.g., IS MELTING), the processes involved go
deep down into the twofold character of language, corresponding to two types of
speech disturbances, similarity and contiguity disorders (Jakobson, 1956).
It may be a pleasant surprise for text analysts to discover that the 1960s technology (Stone et al.'s 1962, 1966 totemic GENERAL INQUIRER or Iker and Klein's
1974 WORDS; see also Salton, 1991), with the word as unit of analysis, has been
repeatedly tried and tested, and still works very satisfactorily. A quick look at web
pages related to content analysis, e.g., www.gsu.edu/∼wwwcom/content.html or
www.intext.de/textanae.htm, suggests that word-based technology is still in high gear,
particularly when combined with new ideas in statistics: autocorrelation modeling,
resampling, or classification and regression techniques that recursively test their
assumptions to see if they may not themselves be in error (Efron and Tibshirani,
1991; Hogenraad and McKenzie, 1999).
2.1. WHAT A CODE DOES (THE SUBSTITUTION DICTIONARY-BASED MODEL)
Let us agree that the words ICE, SNOW, and IGLOO share a fixed common code, i.e., possess in common an idea of COLD. The content-analytic processing of text starts when a list of words is selected and organized, and recursively so, as in lists of lists, in a manner such that the shared common codes are set to be of some agreed-upon and controlled heuristic value (determined independently in other places of the field of enquiry). In content analysis, a dictionary is such a heuristic list of lists of words, all sharing at least one code, and all being substitutable for one another by virtue of their common codes. The list may be extensive or not, but this is a question of validity that needs to be answered empirically. For example, Martindale (1979) validated his "Regressive Imagery Dictionary" (RID) (1975); Saris-Gallhofer et al. (1978) validated Holsti's "Stanford Political Dictionary" (1969: 164–190); Whissell et al. (1986), her "Dictionary of Emotions"; Winter (1973), McClelland's imagery motive (1975); finally, Anderson and McMaster (1982, 1986) and Bestgen (1994)
showed that dictionary words do almost as well as human readers in evaluating the
emotional content of short stories.
Having a common code of some heuristic value at the base of a dictionary is
what converts the dictionary into a quasi-experimental tool designed to predictively
test a theory. A dictionary is like the needle of a galvanometer which retraces and
amplifies the variations in a current. If the current increases 10 percent, the needle
goes up, say, two scale degrees. When a text is compared to the words of the list,
each word of the text being compared to each word of the list, what does not match
the list is ignored, and we return full-circle to the hypothesis-verification model of
standard experimental science: The substitution model is selective, i.e., it predicts
a single issue – that is what hypotheses are for – and unambiguously so – the
hypothesis is verified or it is not – leaving no room for doubt, at least in an ideal
world.
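To make the substitution logic concrete, here is a minimal sketch in Python: a hypothetical category list stands in for a dictionary, every token of the text is compared with the list, and whatever does not match is simply ignored. The COLD_WORDS list and the sample sentence are illustrative inventions, not entries of the RID or of any published dictionary.

# Minimal sketch of the substitution (dictionary-based) model; the category
# list and the sample text are illustrative only, not a published dictionary.
COLD_WORDS = {"ice", "snow", "igloo", "frost", "winter"}   # hypothetical category "COLD"

def category_rate(text, category):
    """Percentage of tokens matching the category list; non-matching tokens are ignored."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    hits = sum(1 for t in tokens if t in category)
    return 100.0 * hits / len(tokens) if tokens else 0.0

sample = "The snow is melting, but the igloo still stands on the ice."
print(f"COLD rate: {category_rate(sample, COLD_WORDS):.1f}%")

Whatever its size, the category list behaves like the galvanometer needle described above: it registers only what it was built to register.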
2.2. THE CORRELATIONAL MODEL: "EACH WORD WHEN USED IN A NEW CONTEXT IS A NEW WORD" (FIRTH, 1957: 190)
The above quote by linguist J. R. Firth illustrates what makes the correlational
model incommensurable with the substitution one. The difference demands that we redraw the line between the two models. A good stab at defining the correlational model is that it stems from a desire to meet the text on its own terms, and not in terms of fixed codes shared by a list of words. The correlational model is predicated on the notion that the associations existing between words allow one to gain knowledge of the patterns of distribution of soft-fingered themes as the text unfolds.
Church and Hanks argue at greater length that “it is indeed common practice in
linguistics to classify words not only on the basis of their meanings but also on
the basis of their co-occurrence with other words” (1990: 22). What is at stake
here is the power of blending an association measure and words of the text; yet,
applied to subcategories of a dictionary, the same association measure would still
be marked by the flaw and asset of the substitution model: The flaw at the heart
of the substitution model is that it lifts the words out of context, its asset, that
the words keep the same code throughout the text. By contrast, in context, as every avid reader knows, words lose their fixed correspondence between sign and meaning while being enriched with new significance. Also, whereas dictionaries need to be designed for each language in which one intends to apply them, word-word associations are, in this sense, language-independent.
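The correlational alternative can be sketched just as briefly. In the toy example below, word frequencies are tallied per text segment and then correlated across segments, so that words which travel together earn high correlations without any preexisting code being imposed on them. The segments and vocabulary are invented for illustration and do not reproduce any corpus discussed here.

import numpy as np
from collections import Counter

# Hypothetical text segments (e.g., abstracts), invented for illustration only.
segments = [
    "nato army attack military base",
    "army military attack revolutionary action",
    "wage bargaining union wage dispute",
    "union bargaining strike wage",
]

vocab = sorted({w for seg in segments for w in seg.split()})
index = {w: i for i, w in enumerate(vocab)}

# Segment-by-word frequency matrix.
freq = np.zeros((len(segments), len(vocab)))
for row, seg in enumerate(segments):
    for w, n in Counter(seg.split()).items():
        freq[row, index[w]] = n

# Word-word correlations across segments: no code is imposed; the words speak
# through their pattern of co-occurrence.
corr = np.corrcoef(freq, rowvar=False)
i, j = index["army"], index["military"]
print(f"r(army, military) = {corr[i, j]:.2f}")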
3. On Aboutness: A Commendation for Vagueness
The question facing us now is how to tackle themes that are as real as Henry James's "figure in the carpet" (James and Kermode, 1896/1986): how do we, in other words, cut the themes at the joints? Themes scatter and gather again, like
Bach’s “Goldberg Variations”, yet, we can never see the patterns that pattern the
text, as they are everywhere and nowhere.
A distinctive feature of a category is that it takes only one word to make up
a category, but that it takes many to make up a theme. In the earlier series ICE, SNOW, and IGLOO making up the hypothetical category COLD, each word has the same exchange value with respect to the category COLD, i.e., each is substitutable for any other in the series because the category retains only the COLD dimension that is common to all. There is among these words a certain degree of likeness of meaning. Now consider another group made up of NATO, ARMY, MILITARY, ATTACK, and REVOLUTIONARY. This group of words characterized one of the dimensions in a factor analysis of 86 words distributed over 19 documents produced by a terrorist group in Europe in the years 1984–1985 (Hogenraad et al., 1995a: 55–56). Taken separately, none of these words can be substituted for another, except perhaps the second and third. It is correct to state that this group of words is about
NATO, and about the army, and about the military establishment, etc., but it is only
when one recombines their different use values together and watches the reciprocal
influences of these values in the text that one can give shape, a posteriori, to a theme
by connecting the dots between these words. In the act of conjuring up complex
themes from fragments of a text, words are only pointers to a theme, but they are
not the theme. Dubbing this cluster of words “military concern” is not enough. One
has also to describe their affinities by careful phrasing. The idea that looms here is
that connecting the dots between the words can only mean that behind words, there
are only other words.
4. Putting the Text First: The Hermeneutic Chiasma
The answer to the next question about when to use one tactic rather than the other
owes much to Spence’s (1982) rule of critical asymmetry, with reference to two
kinds of reading. The rule of critical asymmetry sets the conditions in which analyzing the content of a text can reasonably be carried out and those in which such
content analyses are unsuitable.
The more clear and transparent a text is, the less effort is required of the receiver.
In other words, on the sender’s side, a high level of active structuring authorizes
a low level of active structuring on the receiving end; a low structuring of the text on the sender's side requires an active, structuring reading on the receiving end.
For content analysis, the rule translates as “the less structured the text, the more
structured and categorizing must be the analysis”. Nothing can be more puzzling
than trying to understand the content of a postcard that lands by mistake in your
mailbox. The writer of the postcard does not feel the need to explain to the intended
receiver a frame of reference which the writer knows is perfectly known to the
receiver, yet only to him or her. That the hairline context is unknown to a third
party makes the postcard unstructured and a quasi-private language in the eyes of
that party.1
5. An Illustration Using Historiometry
We are now living in the twilight of a declining sensate era, and after a stormy
period of transition new ideational values will appear on the horizon (Michel
P. Richard, Introduction to the Transaction Edition of Sorokin’s Social and
Cultural Dynamics, 1985: xi).
5.1. INDUSTRIAL RELATIONS (IR)
The subject of IR traditionally rested on the tripartite contract comprising government, business, and labor. Adverse economic conditions prompted governments to withdraw from the triangle in which various forces operated, giving managers
wider options, e.g., to escape higher labor costs by avoiding participatory labor
relations (Hyman, 1997). We wanted to know how these changing conditions affected the evolution of IR as a scientific discipline. Kuhn’s (2000) model of the
development of science best fits science at “the cutting edge of research”; for its
part, a text-based history of science attempts to grasp the minor developments of
science as well. We limited the analysis of scholarly communication to the abstracts
of five IR journals.2 Scientific abstracts are both a reflection and a determinant of
a science's mind-set. Across journals, the total number of words in the abstracts amounts to 353,822; the total number of abstracts is 3,455.
5.2. A COMPUTER-AIDED CONTENT ANALYSIS
The corpus was analyzed with the help of the PROTAN content analysis system (Hogenraad et al., 1995b). What PROTAN valuably does, beyond counting
words, is to standardize and categorize words, look for word-word correlations
(theme-spotting), recognize patterns of words, juxtapose sentences containing particular combinations of categories, and insert nonverbal information into the textual
analysis. What it does still more valuably is to create, for each of these tasks,
system-independent output files that are then submitted to statistical tools.
Using a suffix removal procedure, we bring down the number of different word entries to some degree by removing from the texts most of the verb inflections, plurals, and other adverbial forms. Disambiguating is optional; in practice, pace Kelley and
Stone (1975), we have found it more effective simply to avoid very ambiguous
words, having observed that the cost of the process is not in proportion to the
ensuing gain.
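As an illustration of the suffix-removal step, and only as a stand-in (PROTAN's own rules are not reproduced here), an off-the-shelf Snowball stemmer can collapse inflected forms onto a common entry before counting:

from collections import Counter
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("english")

# Inflected forms collapse onto a common entry before counting.
tokens = ["bargaining", "bargains", "bargained", "unions", "union", "striking"]
entries = Counter(stemmer.stem(t) for t in tokens)
print(entries)    # e.g., Counter({'bargain': 3, 'union': 2, 'strike': 1})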
In the search for word-word correlations, the first step is to collect all the words
with a frequency above a given level and to prune them of everything that is
not a content word. In a second step, all the words retained in the first step are cross-correlated: significant cross-correlations are then exponentiated, usually to the fifth power, while insignificant correlations are set to zero.
This transformation maintains high correlations at a high to moderately high level,
while accelerating the decrease of lower ones. This algorithm was devised by Iker
(1974a, 1975) within the WORDS system (Iker and Klein, 1974), and revived
within PROTAN. In a third and last step, the most associationally rich words that
emerge from this procedure are then factor or cluster analyzed.
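A rough sketch of these three steps, assuming a simple cut-off on the absolute correlation in place of a formal significance test, might look as follows; the original WORDS and PROTAN implementations may of course differ in their details.

import numpy as np

def sharpen_correlations(freq, min_freq=5, r_crit=0.30, power=5):
    """freq: text-units-by-words frequency matrix of content words.

    Step 1: keep the words whose total frequency reaches the threshold.
    Step 2: cross-correlate the retained words across text units.
    Step 3: raise the correlations above the cut-off to the fifth power
            (sign preserved) and set the others to zero, so that high
            correlations stay high while lower ones decay quickly.
    """
    keep = freq.sum(axis=0) >= min_freq
    corr = np.corrcoef(freq[:, keep], rowvar=False)
    sharpened = np.where(np.abs(corr) >= r_crit,
                         np.sign(corr) * np.abs(corr) ** power,
                         0.0)
    return sharpened, keep

# The sharpened matrix would then be submitted to factor or cluster analysis.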
5.3. RESULTS
5.3.1. A Result without Suspicion
Like any science, IR should achieve its goal by moving toward ever more abstract concepts, and away from sensate ones (Martindale, 1990: 350–368): Without principled concepts, IR would not even be able to identify industrial situations. We assessed the degree of abstractness of the texts with the American-English version of Martindale's RID (1975). The RID is composed of two calibrated lists of words (N = 2,483), one (N = 1,815), with words such as LOVE, SEX, FOOD,
COLD, CHAOS, DREAM, FLYING, representing sensate systems viewing reality as
that which is presented to the sense organs; the other (N = 668), with words
such as MONEY, WORK, DISCIPLINE, POLICE, TIME, JUSTICE, LAW, representing
ideational systems, i.e., conventions. When all the words of the texts are compared
to all the words of the dictionary, we observe the profile in Figure 1, arrayed by
year (each data point is the difference, in percentage per year, between the lists of
convention and sensate words): e.g., in Industrial and Labor Relations Review, R² = 0.78, F(1, 23) = 79.35, p < 0.0001, 2nd and 4th order (p < 0.50) autocorrelations removed (Hogenraad et al., 1997).

Figure 1. Symbolic thought contents (convention minus sensate words) in the abstracts (1971–1995) of Industrial and Labor Relations Review, observed and fitted.
Is this a chance result? Resampling the series 2,000 times with the SIMSTAT bootstrap algorithm (Péladeau, 1996) yields an average R² of 0.78, with 95% confidence intervals ranging from 0.63 to 0.89; the average first-degree beta value is −0.05, with 95% confidence intervals ranging from −0.06 to −0.04. These results are totally unambiguous, not so much in regard to the replications as in regard to a predicted hypothesis being confirmed or not. Actually, the predicted hypothesis is confirmed neither for this journal nor for the remaining four. All we can
do after the predictive test of the theory is to provide interpretations based on our
experience accumulated with an instrument, i.e., the RID dictionary; in this specific
case, we express our conviction that, except for some possible paradigm shift to
come, Figure 1 stops short of the terminal moraine of a discipline that, justifiably or
not, has likely thickened up to problem-solving at the expense of science-building
(Adams, 1988) – without pausing to wonder what a science that is so keen to solve
problems might be a science of. Nonetheless, acknowledging our own reactions to
the result, to use Toulmin’s words (1982: 97), does not influence it, as we will see
is possible in the correlational model.
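For readers who want to see the mechanics of such a case-resampling check, the sketch below refits a simple linear year trend on bootstrap samples and collects R². It is an illustration with fabricated data, not the SIMSTAT algorithm, and it omits the autoregressive terms used in the published model.

import numpy as np

rng = np.random.default_rng(0)

def r_squared(x, y):
    slope, intercept = np.polyfit(x, y, 1)
    resid = y - (slope * x + intercept)
    return 1.0 - resid.var() / y.var()

def bootstrap_r2(x, y, n_boot=2000):
    n = len(x)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)     # resample (year, score) pairs with replacement
        stats.append(r_squared(x[idx], y[idx]))
    stats = np.array(stats)
    return stats.mean(), np.percentile(stats, [2.5, 97.5])

# Fabricated series standing in for 25 yearly scores with a downward trend.
years = np.arange(1971, 1996, dtype=float)
scores = -0.05 * (years - years.mean()) + rng.normal(0.0, 0.3, years.size)

mean_r2, ci = bootstrap_r2(years, scores)
print(f"bootstrap mean R2 = {mean_r2:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")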
Admittedly, in addition to being suspicious of the belief that one could discover the essential pattern of a process still going on, some could consider it an overstatement to pretend to capture in one equation the deep intricacies of years of writing in IR. Yet, the most economical way to enhance the effectiveness of a purposeful action, i.e., testing whether a particular discipline really becomes more conceptual, is by verifying focused hypotheses. Meanwhile, we close our eyes to anything that is irrelevant to the case.
5.3.2. A Quest without a Quarry
With the help of FACSTRAP, a bootstrap procedure for factor analysis (Thompson,
1988), we arbitrarily and systematically requested three factors from the word-word correlation matrices between the different words retained from the abstracts in each journal. From the abstracts of Industrial Relations, for example, we selected 92 richly associated words distributed over 25 years (1971–1995). Table I displays the words that are symptoms of the first factor, together with their mean estimate loadings (100 resamplings), their variability (SD) as an estimate of their stability, and the original coefficients in the last column. No matter how many factors we request from the correlation matrix, a single motif keeps surfacing as a first prominent factor, i.e., the social (as opposed to the economic) or nonprofit (as opposed to profit). In the remaining journals' abstracts (not shown here), the same
motif of profit or individualism is not only visible, but also impossible to avoid. In
addition to sharing the same world view, i.e., a separation between the economic
and the social realm, the journals show the same trend over time, e.g., in Industrial
Relations (Figure 2), R² = 0.65, F(2, 22) = 20.29, p < 0.0001, 3rd and 4th order (p < 0.50) autocorrelations removed.

Figure 2. From the social realm to its opposite (profit) in the abstracts of Industrial Relations (1981–1995), observed and fitted.
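A FACSTRAP-style bootstrap of loadings can be sketched as follows: resample the text units (years) with replacement, re-extract a first factor from the word-word correlation matrix, and summarize each word's loading by its mean and SD over the resamples. The principal-component extraction used here is an assumption made for illustration and stands in for whatever extraction FACSTRAP actually applies.

import numpy as np

rng = np.random.default_rng(1)

def first_factor_loadings(freq):
    """First principal-component loadings of the word-word correlation matrix."""
    corr = np.corrcoef(freq, rowvar=False)
    corr = np.nan_to_num(corr)                    # guard against zero-variance columns
    eigvals, eigvecs = np.linalg.eigh(corr)       # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
    return loadings if loadings.sum() >= 0 else -loadings   # fix the arbitrary sign

def bootstrap_loadings(freq, n_boot=100):
    """Resample text units (rows) with replacement; return mean and SD of the loadings."""
    n_units = freq.shape[0]
    draws = np.array([first_factor_loadings(freq[rng.integers(0, n_units, n_units)])
                      for _ in range(n_boot)])
    return draws.mean(axis=0), draws.std(axis=0)

# freq would be the years-by-words frequency matrix of the retained words; the mean
# and SD per word loosely correspond to the first two numeric columns of Table I.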
6. Murray Sidman’s Autobiography in Science: A Cameo on the Superiority
of Intra-Subject over Inter-Subject Variability
Out of the air a voice without a face/ Proved by statistics that some cause
was just/ In tones as dry and level as the place (W. H. Auden’s The Shield of
Achilles, 1976/1994: 597).
In 1994, the psychologist Murray Sidman wrote a research story of his work
on equivalence relations (between words and their referents, e.g., in aphasic patients and retarded Down’s-syndrome children). Sidman’s is an autobiography in
the literal sense of the term, one that allows readers to attend to the “genesis and
development of a scientific fact”, to borrow the title of Fleck’s 1979 work. Along
with the 11 reprints Sidman selected as representative of his work on this question,
he commented on the reasons why each experiment was done, even the mood of
the research team at the time; most fittingly for us, he arrayed the reprints, not chronologically, but in the order of the development of his thoughts on the
subject. With Sidman’s story we have an elegant cameo test case for evaluating
Table I. The social motif (mean eigenvalue = 14.4, SD = 2.7) in the Industrial Relations abstracts (mean estimates of structure coefficients based on 100 replications with sample size = 25)

Word or cluster     Mean estimate loading   SD     Original coefficient
Educate             0.74                    0.36   0.49
Social              0.69                    0.39   0.45
History             0.61                    0.34   0.07
Black               0.61                    0.44   0.43
Suggest             0.60                    0.32   0.20
Woman               0.53                    0.28   0.19
Equal               0.58                    0.48   0.27
Law                 0.58                    0.36   0.46
Man                 0.53                    0.29   0.06
Race                0.52                    0.50   0.46
Discriminate        0.51                    0.32   0.47
Member              0.50                    0.26   −0.03
Role                0.49                    0.45   0.59
Bargaining          0.48                    0.22   0.12
Reduce              0.43                    0.33   0.07
Result              0.43                    0.29   −0.48
the hypothesis that, as a scientist develops his grasp of a question, he moves away
from sensate thought contents. This test case is elegant on several grounds. First,
Sidman tells us the whole story of each paper, a story that journals never let us
have. Secondly, the rationally ordered reprints offer us both a behavioral baseline – Sidman's first paper – and a systematic intra-subject replication of a quasi-experiment – his remaining 10 papers – that further avoids the obscuring effect
of inter-subject variability of group statistical procedures (Bakan, 1954; Sidman,
1960/1988). For this quasi-experiment, we downloaded from the PsycInfo data
base (www.apa.org/psycinfo/) the abstracts of each of the 11 papers commented
upon by Sidman in his research story. The data collected made a total of 1,489 words, of which 451 were different. A simple rate of sensate words was computed
using the RID.
The result, in Figure 3 (see Note 3), displays another side of the elegance of our test case. The anticipated decline in sensate words is highly visible [R² = 0.74, F(2, 8) = 11.43, p < 0.01, 1st and 2nd order (p < 0.50) autocorrelations removed]. But the final autobiographic text, numbered 11 though dated 1989, makes
for an even stronger case. Sidman comments about this last study, "I am presenting the study out of order because (. . . ) the considerations underlying the Contextual
Control experiments and the procedures they introduce mark off a new direction
for experimental, applied, and theoretical investigations of equivalence” (1994:
475; italics in the original, underlining is ours). Text 11 marks the end of a line
of thought and the beginning of a new line. Little wonder then that sensate words revert to the baseline.

Figure 3. Rate of sensate words in Murray Sidman's research story (1971–1990) on equivalence relations.
7. Discussion: A Spiral of Interpretations
In both IR and Sidman’s studies, we predicted the modification that one variable
(simple rate of sensate words in the Sidman’s study or difference between rates of
conceptual and sensate words in the IR one) would go through over time. In the
Sidman’s study, our prediction was successful; in the IR one, it was not. Actually
we obtained the converse of what we expected. Yet, we could reconcile the result
– as a way of gathering new knowledge – with other information regarding the
transformation of a scientific discipline into a problem-solving one . . . before a new
cycle of idealization, come the day (Kaufman, 1993: 237, note 1). It is as if the to-and-fro of facts and theory recurrently refining each other, as in normal science, was superseded in North-American, British, and Australian IR by an accelerating process of piling up tangible facts on tangible facts (anecdotalism), causing the discipline to lose its conceptual ballast, possibly at the expense of de-skilling its own scientists (Reich, 1991: 282–300).
However, in both studies, our prediction rested on a preexisting body of knowledge, i.e., the knowledge accumulated about Martindale’s (1990) evolutionary
theory of scientific change and the instrument that goes with it, the RID dictionary.
Not so though when we looked for themes, as in the analysis of the abstracts of Industrial Relations, in which case it was totally impossible to foresee which themes
could have emerged. If words are predictors in the dictionary-based procedure,
words can only be symptoms when one is looking for patterns, whatever the discovery procedure (factor, cluster, or correspondence analysis). Theme making is a self-explanatory unveiling of symptoms arrived at only by cognitive and emotional
participation in a cause that contains both the method and the object of observation.
The trouble is, as Rhees notes (Wittgenstein, 1966: 26, note 1), that “this does
not help us to predict anything” (italics in the original). Theme making merely
extends our knowledge of preexisting facts (while the substitution model redefines
our conception of them).
To say this is of course not to imply that results obtained with a dictionary are
necessarily unconnected to those obtained from correlating text words, although
much depends here on the dictionary selected and the reasons why it is selected.
Actually, in the IR case, these results are related. In the abstracts of Industrial
and Labor Relations Review for example, the correlation between the observed
profiles of symbolic thought content (Figure 1) and the profit motif is −0.55
(N = 25, 95% confidence intervals, −0.76 to −0.30 after 25,000 replications);
in the other journals, the correlations between symbolic thought content and the
profit motif are similarly negative and significant. These correlations confirm Sorokin’s (1957/1985: 93 and 214) theory and observations regarding the relationships
between individualism and sensate literature.
The fact remains that theme-spotting leaves the analyst always vulnerable to the accusation of having missed that one crucial word. To cap it all, recognizing one pattern interminably raises new suggestions waiting to be disclosed. The treatment
of themes discussed and exemplified here accentuates three difficulties of the correlational model. The first, yet least problematic, is that the word-word correlation
approach rests on an old rhetoric; indeed, the contiguity principle has a long history
in behavioral science (Iker, 1974b). It is a relatively simple device to have two
words coexist side by side and then argue that what the words represent is coherent
because they coexist. The name of this device is reifying, i.e., granting the status
of static reality to a moving target (Spencer, 1982). Although philosophers are uneasy with reifying (Vayda, 1991), we must concede that much of social and behavioral
science practice rests on it without much ado. The second, more serious, difficulty
is that looking for themes contributes to the permanent suspicion that there could
be another sense to a text, that is, another hidden theme leading to a spiral of
interpretations at the expense of communication (Raven et al., 1971). Actually,
the difficulty is only one facet of a larger question identified by Sorokin under
the label of "principle of limits" (1985: 647–663). Although one may suppose that the number of possible themes and the amount of their respective variations in a given text are finite, one has no means of knowing these limits. Put differently, themes
have more possibilities of change than dictionary categories, while some facets
of a theme may never come out in a given text, or quite infrequently. The third,
pragmatic, difficulty of eliciting robust themes comes from the nature of the text:
The less a text is tied to the constraints of reality, as in literature, the wider the gap between sign and meaning (de Man, 1989: 17), hence the name, "literary fiction". Conversely, one may expect themes to be transparent whenever there is pressure to abide by the constraints of reality in language use, even if one cannot predict which themes will emerge from a correlational analysis; in this ideal case, analysts get the best of both worlds: theme-spotting techniques may yield clearly distinct themes and dictionary
techniques, new re-definitions of facts.
8. Conclusions
“There’s always more to it” says character David Ferrie in Don DeLillo’s (1988:
321) novel “Libra”. Indeed! The correlational model of content analysis yields no
unique solution: It will never be the “Food and Drug Administration” of texts. For
the less a text is structured, the more discovering a pattern in it is likely to produce
results that may be difficult or even impossible to interpret. Not only may a pattern
not be the ultimate one, but even if it is, one may find it not easily transportable to
other contexts. Besides, not only does discovering the pattern of a class of words not allow one to predict, but it may be fallacious to believe that one could discover
the essential pattern of a process that is still going on. Finally, themes or patterns
being pure representations (i.e., only names, without fundamental reality, as real as
the epitaph of Sherlock Holmes), when one analyzes a theme or pattern, one should
not fall into the fancy of its existence. Altogether, there is no way to get closer to the
way themes really are, for nothing can be like a sentence except another sentence.
Acknowledgement
Acknowledgment is made to the National Fund for Scientific Research, Belgium (Grant No. 1.5.126.99, 1998–1999). Reprint requests should be addressed
to Robert Hogenraad, Psychology Department, Catholic University of Louvain, 10 Place Cardinal Mercier, B-1348 Louvain-la-Neuve (Belgium); E-mail
[email protected]
Notes
1. Technical conversations between experts, too, turn easily into a quasi-private language because
so many elements of it are implicit, i.e., the text is structured only for the experts, but not for
outsiders. Consider the following:
A: “Bill, you’d better get that Linden back or you’ll lose that baby too.”
B: “Yeah, I just lost 81”.
B: “Look any better?”
A: “No. You still got to get rid of about 400 Bill because you’re 400 over the short time emergency on that 80 line.”
B: “Yeah – that’s what I’m saying. Can you help me with that?”
(Conversations between Senior Pool Dispatcher (A) and Con Edison System Operator (B)
between 8:56 pm and 9:02 pm, July 13, 1977. Extracts from the State of New York Investigation,
New York City Blackout, July 13, 1977, p. 13. (blackout.gmu.edu/archive/pfd/ny_state_77.pdf).
2. These journals are: British Journal of Industrial Relations, London (1963–1995), Industrial Relations, Berkeley (1962–1995), Industrial and Labor Relations Review, New York (1947–1995),
Journal of Industrial Relations, Sydney (1959–1995), and Relations Industrielles/Industrial
Relations, Quebec (1945–1995).
3. Murray Sidman’s papers analyzed in Figure 3 are:
(1) Sidman, M. (1971). Reading and auditory-visual equivalences. Journal of Speech and
Hearing Research 14: 5–13.
(2) Sidman, M. & Cresson, O. (1973). Reading and cross-modal transfer of stimulus equivalences in severe retardation. American Journal of Mental Deficiency 77: 515–523.
(3) Sidman, M. (1977). Teaching some basic prerequisites for reading. In: P. Mittler (ed.), Research to Practice in Mental Retardation: Vol. 2. Education and Training. Baltimore, MD:
University Park Press, pp. 353–360.
(4) Sidman, M., Cresson, O. & Willson-Morris, M. (1974). Acquisition of matching to sample
via mediated transfer. Journal of the Experimental Analysis of Behavior 22: 261–273.
(5) Sidman, M., Rauzin, R., Lazar, R., Cunningham, S., Tailby, W. & Carrigan, P. (1982). A
search for symmetry in the conditional discriminations of rhesus monkeys, baboons, and
children. Journal of the Experimental Analysis of Behavior 37: 23–44.
(6) Sidman, M. & Tailby, W. (1982). Conditional discrimination vs. matching to sample: An
expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior 37:
5–22.
(7) Sidman, M., Kirk, B. & Willson-Morris, M. (1985). Six-member stimulus classes generated
by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior
43: 21–42.
(8) Sidman, M., Willson-Morris, M. & Kirk, B. (1986). Matching-to-sample procedures and
the development of equivalence relations: The role of naming. Analysis and Intervention in
Developmental Disabilities 6: 1–19.
(9) Sidman, M. (1990). Equivalence relations: where do they come from? In: D. E. Blackmore & H. Lejeune (eds.), Behaviour Analysis in Theory and Practice: Contributions and
Controversies. Hillsdale, NJ: Erlbaum, pp. 93–114.
(10) Sidman, M., Wynne, C. K., Maguire, R. W. & Barnes, T. (1989). Functional classes and
equivalence relations. Journal of the Experimental Analysis of Behavior 52: 261–274.
(11) Bush, K. M., Sidman, M. & de Rose, T. (1989). Contextual control of emergent equivalence
relations. Journal of the Experimental Analysis of Behavior 51: 29–45.
References
Adams, R. J. (1988). Desperately seeking industrial relations theory. International Journal of
Comparative Labor Law and Industrial Relations 4: 1–10.
Anderson, C. W. & McMaster, G. E. (1982). Computer assisted modeling of affective tone in written
documents. Computers and the Humanities 16: 1–9.
Anderson, C. W. & McMaster, G. E. (1986). Modeling emotional tone in stories using tension levels
and categorical states. Computers and the Humanities 20: 3–9.
Auden, W. H. (1976/1994). Collected Poems. London: Faber and Faber.
Bakan, D. (1954). A generalization of Sidman’s results on group and individual functions. Psychological Bulletin 51: 63–64.
Bestgen, Y. (1994). Can emotional valence in stories be determined from words? Cognition and
Emotion 8: 21–36.
Carley, K. M. (1994). Extracting culture through textual analysis. Poetics 22: 291–312.
Church, K. W. & Hanks, P. (1990). Word association norms, mutual information, and lexicography.
Computational Linguistics 16: 22–29.
DeLillo, D. (1988). Libra. Harmondsworth: Penguin.
de Man, P. (1989). Blindness and Insight: Essays in the Rhetoric of Contemporary Criticism. (2nd
revised edition, with an introduction by Wlad Godzich). London: Routledge.
Duchastel, J., Paquin, L.-C. & Beauchemin, J. (1992). Automated syntactic text description
enhancement: Thematic structure analysis. Computers and the Humanities 26: 31–42.
Dyer, M. G. (1983). In-Depth Understanding: A Computer Model of Integrated Processing for
Narrative Comprehension. Cambridge, MA: MIT Press.
Efron, B. & Tibshirani, R. (1991). Statistical data in the computer age. Science 253: 390–395.
Firth, J. R. (1957). Modes of meaning (1951). In: J. R. Firth (ed.), Papers in Linguistics 1934–1951.
London: Oxford University Press, pp. 190–215.
Fleck, L. (1979). In: Thaddeus J. Trenn and Robert K. Merton (eds), Genesis and Development of
a Scientific Fact [Entstehung und Entwicklung einer wissenschaftlichen Tatsache: Einführung in
die Lehre vom Denkstil und Denkkollektiv]. Fred Bradley and Thaddeus J. Trenn (trans.). Foreword by Thomas S. Kuhn. Chicago: The University of Chicago Press. (Original work published
1935.)
Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In: C. W.
Roberts (ed.), Text Analysis for the Social Sciences: Methods for Drawing Inferences from Texts
and Transcripts. Mahwah, NJ: Erlbaum, pp. 117–130.
Gottschalk, L. A. & Bechtel, R. J. (1982). The measurement of anxiety through the computer analysis
of verbal samples. Comprehensive Psychiatry 23: 364–369.
Handelman, S. A. (1982). The Slayers of Moses: The Emergence of Rabbinic Interpretation in
Modern Literary Theory. Albany: State University of New York Press.
Hogenraad, R. (2002). Moving targets: The making and molding of a theme. In: M. M. Louwerse
& W. van Peer (eds), Thematics: Interdisciplinary Studies. Philadelphia: John Benjamins, pp.
353–376.
Hogenraad, R., Bestgen, Y. & Nysten, J.-L. (1995a). Terrorist rhetoric: Texture and architecture. In:
E. Nissan & K. M. Schmidt (eds), From Information to Knowledge. Conceptual and Content
Analysis by Computer. Oxford: Intellect, pp. 48–59.
Hogenraad, R., Daubies, C. & Bestgen, Y. (1995b). Une Théorie et une Méthode Générale
d’Analyse Textuelle Assistée par Ordinateur. Le Système PROTAN (PROTocol ANalyzer)
(Version du 2 mars 1995) [A General Theory and Method of Computer-Aided Text Analysis: The PROTAN System (PROTocol ANalyzer), Version of March 2, 1995]. Computer program, Louvain-la-Neuve (Belgium), Psychology, Catholic University of Louvain.
http://www.psor.ucl.ac.be/protan/protanae.html
Hogenraad, R. & McKenzie, D. P. (1999). Replicating text: The cumulation of knowledge in social
science. Quality & Quantity 33: 97–116.
Hogenraad, R., McKenzie, D. P. & Martindale, C. (1997). The enemy within: Autocorrelation bias
in content analysis of narratives. Computers and the Humanities 30: 433–439.
Holsti, O. R. (1969). Content Analysis for the Social Sciences and the Humanities. Reading, MA:
Addison Wesley.
Hyman, R. (1997). La géométrie du syndicalisme: Une analyse comparative des identités et des
idéologies [The geometry of trade unionism: A comparative analysis of identities and ideology.].
Relations Industrielles/Industrial Relations 52: 7–38.
Iker, H. P. (1974a). SELECT: A computer program to identify associationally rich words for content
analysis: I. Statistical results. Computers and the Humanities 8: 313–319.
Iker, H. P. (1974b). An historical note on the use of word-frequency contiguities in content analysis.
Computers and the Humanities 8: 93–98.
Iker, H. P. (1975). SELECT: A computer program to identify associationally rich words for content
analysis. II. Substantive results. Computers and the Humanities 9: 3–12.
Iker, H. P. & Klein, R. H. (1974). WORDS: A computer system for the analysis of content. Behavior
Research Methods & Instrumentation 6: 430–438.
Jakobson, R. (1956). Two aspects of language and two types of aphasic disturbances. In: R. Jakobson
& M. Halle (eds), Fundamentals of Language. The Hague: Mouton, pp. 53–82.
James, H. & Kermode, F. E. (eds) (1896/1986). The Figure in the Carpet and Other Stories. New
York: Viking Press.
Kaufman, B. E. (1993). The Origins and Evolution of the Field of Industrial Relations in the United
States. Ithaca, NY: ILR Press.
Kelley, E. F. & Stone, P. J. (1975). Computer Recognition of English Word Senses. Amsterdam:
North-Holland.
Kuhn, T. (2000). In: James Conant and John Haugeland (eds), The Road Since “Structure”:
Philosophical Essays, 1970–1993, with an Autobiographical Interview. Chicago: University of
Chicago Press.
Martindale, C. (1975). Romantic Progression: The Psychology of Literary History. Washington, DC:
Hemisphere.
Martindale, C. (1979). The night journey: Trends in the content of narratives symbolizing alteration
of consciousness. Journal of Altered States of Consciousness 4: 321–343.
Martindale, C. (1990). The Clockwork Muse: The Predictability of Artistic Change. New York: Basic
Books.
McClelland, D. C. (1975). Power: The Inner Experience. New York: Irvington Publishers.
Muskens, G. (1985). Mathematical analysis of content. Quality & Quantity 19: 99–103.
Norris, C. (1987). Derrida. London: Fontana Press.
Péladeau, N. (1996). Simstat for Windows. User’s Guide (Version 2.0, 10 October 2000). Montreal:
Provalis Research. (www.simstat.com)
Péladeau, N. (1998). WordStat. Content Analysis Module for SIMSTAT. User’s Guide (Version 3.0,
21 December 2000). Montreal: Provalis Research. (www.simstat.com)
Perry, J. W. & Kent, A. (eds) (1958). Tools for Machine Literature Searching: Semantic Code
Dictionary, Equipment, Procedures. New York: Interscience.
Raven, P. H., Berlin, B. & Breedlove, D. E. (1971, 17 December). The origins of taxonomy. Science
174: 1210–1213.
Reich, R. B. (1991). The Work of Nations. Preparing Ourselves for 21st-Century Capitalism. London:
Simon & Schuster.
Roberts, C. W. (ed.) (1997). Text Analysis for the Social Sciences: Methods for Drawing Statistical
Inferences from Texts and Transcripts. Mahwah, NJ: Erlbaum.
Salton, G. (1991). Developments in automatic text retrieval. Science 253: 974–980.
Salton, G., Allan, J., Buckley, C. & Singhal, A. (1994). Automatic analysis, theme generation, and
summarization of machine-readable texts. Science 264: 1421–1426.
Sammet, J. E. (1969). Programming Languages: History and Fundamentals. Englewood Cliffs, NJ:
Prentice-Hall.
Saris-Gallhofer, I. N., Saris, W. E. & Morton, E. L. (1978). A validation study of Holsti’s content
analysis procedure. Quality & Quantity 12: 131–145.
Sidman, M. (1960/1988). Tactics of Scientific Research: Evaluating Experimental Data in Psychology. Boston: Authors Cooperative.
Sidman, M. (1994). Equivalence Relations and Behavior: A Research Story. Boston: Authors
Cooperative.
Sorokin, P. (1985). Social and Cultural Dynamics. A Study of Change in Major Systems of Art, Truth,
Ethics, Law, and Social Relationships. London: Transaction Publishers.
Spence, D. P. (1982). Narrative Truth and Historical Truth. Meaning and Interpretation in
Psychoanalysis. New York: Norton.
Spencer, M. E. (1982). The ontologies of social science. Philosophy of the Social Sciences 12: 121–
141.
SPSS Inc. (1997). TextSmart 1.0 User's Guide. Chicago: SPSS Inc.
Stone, P. J. (1997). Thematic text analysis: New agendas for analyzing text content. In: C. W. Roberts
(ed.), Text Analysis for the Social Science. Methods for Drawing Statistical Inferences from Texts
and Transcripts. Mahwah, NJ: Erlbaum, pp. 35–54.
Stone, P. J., Bales, R. F., Namenwirth, J. Z. & Ogilvie, D. M. (1962). The General Inquirer: A
computer system for content analysis and retrieval based on the sentence as a unit of information.
Behavioral Science 7: 484–498.
Stone, P. J., Dunphy, D. C., Smith, M. S. & Ogilvie, D. M. (1966). The General Inquirer: A Computer
Approach to Content Analysis. Cambridge, MA: MIT Press.
Stoppard, T. (1993). Arcadia. London: Faber and Faber.
Thompson, B. (1988). Program FACSTRAP: A program that computes bootstrap estimates of factor
structures. Educational and Psychological Measurement 48: 681–686.
Toulmin, S. (1982). The construal of reality: Criticism in modern and postmodern science. Critical
Inquiry 9: 93–111.
Vayda, A. P. (1991). Concepts of process in social science explanations. Philosophy of the Social
Sciences 21: 318–331.
Weber, R. P. (1983). Measurement models for content analysis. Quality & Quantity 17: 127–149.
Whissell, C., Fournier, M., Pelland, R., Weir, D. & Makarec, K. (1986). A dictionary of affect in
language. IV. Reliability, validity, and applications. Perceptual and Motor Skills 62: 875–888.
Winter, D. G. (1973). The Power Motive. New York: Free Press.
Wittgenstein, L. (1966). Lectures on aesthetics. In: L. Wittgenstein (ed.), Lectures and Conversations
on Aesthetics, Psychology and Religious Belief. Oxford: Basil Blackwell, pp. 1–40.