Quality & Quantity 37: 221–238, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.

Force and Influence in Content Analysis: The Production of New Social Knowledge

ROBERT HOGENRAAD1, DEAN P. MCKENZIE2 and NORMAND PÉLADEAU3
1 Department of Psychology, Catholic University of Louvain, Louvain-la-Neuve, Belgium; 2 Department of Psychological Medicine, Monash University, Melbourne, Australia; 3 Provalis Research and Université du Québec à Montréal, Montreal, Canada

Abstract. We examine the two traditions of content analysis: the first, in which one substitutes words of a text with categories, and the second, in which one looks for clusters of words that may refer to a theme. In the first tradition, preexisting dictionary categories give meaning to the words; in the second, meaning comes after the fact. Preexisting dictionary categories (the substitution model) are calibrated instruments applied within experimental designs that leave no space for doubt; meanwhile, the ability of the correlational model to conjure up complex themes from fragments of a text yields no unique solution. These differences have bearing on the production of new social knowledge. We expound on the epistemological foundations of the two traditions of interpretation and draw from them decision rules upon which one may rely for choosing among appropriate content-analytic tactics. Two reasons make this essay timely and critical: (1) the increasing variety of new content-analytic software for particular purposes and (2) the almost exclusive focus on software and technology at the expense of adjusting the choice of the software to the nature of the text. Two studies, one in historiometry, the other in autobiography, illustrate the liabilities and benefits of the two models of content analysis.
Key words: tactics of computer-aided content analysis, hermeneutic chiasma, words as predictors versus words as symptoms

Abbreviations: IR = Industrial Relations; PROTAN = Protocol Analyzer; RID = Regressive Imagery Dictionary

Traditions choose you as much as you choose them. On the subject of content analysis, computer-aided or not, analysts do not have much choice but to fit into one of two traditions – delineated by Weber (1983) in this journal. In the first, dictionary-based tradition, one substitutes words of the text with categories; in the second, correlational tradition, one correlates words of the text and looks for suggestive patterns (themes). These two traditions of interpretation have always consisted of either donning an attitude of mistrust and suspicion toward the text or accepting and commenting on the text as it is, without altering it (Norris, 1987: 229). To the tradition of distrust corresponds the content-analytic tactic based on forced substitution (of words of the text with dictionary categories); this tactic provides “answers to preexisting questions” (Toulmin, 1982: 95). To the other tradition corresponds the tactic of looking for contiguities between words of the text; this tactic brings readers or analysts to unfix the manifest meaning attached to words and, in an attempt to generalize from preexisting facts, to discover latent ones (Handelman, 1982: xiii). Handelman dubs the substitutive tradition patristic; the correlational one, rabbinic. Nomothetic dictionaries cause a text to share features with other texts. Idiographic contiguities are less likely to be shared by other texts, because contiguities are unique, never seen before, never to be seen after (Hogenraad, 2002). Not only do these two traditions not stand easily together for a team photograph, but their indiscriminate use to analyze text not otherwise qualified may lead to confusing results.
The purpose of the present paper is (1) to expound on the epistemological foundations of the two dominant traditions of content analysis; and (2) to draw, from these foundations, decision rules concerning the choice of content-analytic tactics. The scaffolding of a coherence in content analysis should take us much further in our capability to evaluate content-analytic findings with respect to the production of new social knowledge. More precisely, the nature of the text to be analyzed determines in part the choice of the tactic, which in turn determines in part the degree to which findings can be cumulative. Both Weber (1983) and his contender Muskens (1985) were curiously blind to the dependence of the choice of tactics on the nature of the text.

1. Weber’s 1983 “Measurement Models . . .” Revisited

Weber was at his best when he reserved the term category for “groups of words which have similar meanings and/or connotations” (1983: 140), and the term theme for “clusters of words with different meanings or connotations that taken together refer to some theme or issue” (ibid., italics in the original). The principles which inform the established experimental treatment of textual data do not lie in the nature of the data, but in the use of tools required to process data through experimental/control binary designs. The use of calibrated computer-readable dictionaries allows the content analyst to zero in on and extract the relevant information, airily lopping off the human bits that do not fit into the experimental design. It is not hard to see why: there has always been, in the disciplines concerned with the experimental treatment of language, a feeling of truth in doing experiments and coming up with results, stemming in part from a reluctance to confront anything that cannot be brought under the control of the experimental design.
When Weber’s paper appeared in 1983, the only complete and autonomous system of content analysis in the correlational tradition was Iker’s WORDS computer program (Iker and Klein, 1974). However, things have begun perceptibly to change, to wit, the return of interest in thematics, i.e., in capturing the information available but disseminated throughout the text like shifting mists (Duchastel et al., 1992; Dyer, 1983; Salton et al., 1994). Trying to grasp dispersed information – a moving target without fixed meaning, as indeed topics won’t stay in place for the length even of a moderately short sentence – is what thematics is aiming at. We are now lost in a palisade of new software for particular content-analytic purposes. Meanwhile, new fields turn up on the horizon, like textual data mining – an expression seemingly pressed into service by the business community – abetted by a variety of new algorithms of cluster analysis. The current focus on textual data mining could be the sign that contemporary society is tired of scientists who have eyes only for a few variables and are silent on the rest. Says Hannah in Tom Stoppard’s Arcadia (1993: 59), “You’ve left out everything which doesn’t fit”; and again, “You’ve gone from a glint in your eye to a sure thing in a hop”. One view that helps us to see what has been too obvious or continuously ignored is this: that looking for patterns of variables obliges us to assimilate as much as possible of the totality of the text to guard against selectivity, and to maintain a sense of proportion. The remainder of this paper examines the significance and implications of “looking for patterns” in texts. The increasing variety of new content-analytic software for particular purposes, each with implicit assumptions – a trivial truth in itself – makes it timely and critical to examine the epistemological foundations on which content analysis relies.
What is striking – but less trivial – about these developments is the almost exclusive focus on software and technology at the expense of the question of adjusting the choice of the software to the nature of the text.

2. Two Syntaxes of Knowing: The Code and the Context

For most of the last 50 years, the drift of computer-supported content analysis has been toward the word as unit of analysis (Perry and Kent, 1958; Péladeau, 1998; Sammet, 1969; SPSS, 1997; Stone, 1997; Stone et al., 1962) rather than toward sentences (Gottschalk, 1997; Gottschalk and Bechtel, 1982), semantic structures (Roberts, 1997), or linguistic neural networks (Carley, 1994). This does not mean that these other units of analysis are less interesting than the word. Our perspective centers on the word as unit of analysis not for lack of consideration of other methods, but because systems based on words have a wider range of applications while also being able to manage huge amounts of text. More basically, capitalizing on the word as a unit of content analysis finds strong support in the fundamentals of language, similarity and contiguity (Firth, 1957). When a speaker selects a word among a set of possible ones (similarity, e.g., SNOW rather than ICE), then arranges words into a meaningful sequence (contiguity, e.g., IS MELTING), the processes involved go deep down into the twofold character of language, corresponding to two types of speech disturbances, similarity and contiguity disorders (Jakobson, 1956). It may be a pleasant surprise for text analysts to discover that the 1960s technology (Stone et al.’s 1962, 1966 totemic GENERAL INQUIRER or Iker and Klein’s 1974 WORDS; see also Salton, 1991), with the word as unit of analysis, has been repeatedly tried and tested, and still works very satisfactorily.
A quick look at web pages related to content analysis, e.g., www.gsu.edu/~wwwcom/content.html or www.intext.de/textanae.htm, suggests that the word technology is still in high gear, particularly when combined with new ideas in statistics: autocorrelation modeling, resampling, or classification and regression techniques that recursively test their assumptions to see if they may not themselves be in error (Efron and Tibshirani, 1991; Hogenraad and McKenzie, 1999).

2.1. WHAT A CODE DOES (THE SUBSTITUTION, DICTIONARY-BASED MODEL)

Let us agree that the words ICE, SNOW, and IGLOO share a fixed common code, i.e., possess in common an idea of COLD. The content-analytic processing of text starts when a list of words is selected and organized, and recursively so, as in lists of lists, in a manner such that the shared common codes are set to be of some agreed-upon and controlled heuristic value (determined independently in other places of the field of enquiry). In content analysis, a dictionary is such a heuristic list of lists of words, all sharing at least one code, and all being substitutable for each other by virtue of their common codes. The list may be extensive or not, but this is a question of validity that needs to be answered empirically. For example: Martindale (1979) validated his “Regressive Imagery Dictionary” (RID) (1975); Saris-Gallhofer et al. (1978) validated Holsti’s “Stanford Political Dictionary” (1969: 164–190); Whissell et al. (1986) validated her “Dictionary of Emotions”; Winter (1973) validated McClelland’s motive imagery (1975); finally, Anderson and McMaster (1982, 1986) and Bestgen (1994) showed that dictionary words do almost as well as human readers in evaluating the emotional content of short stories. Having a common code of some heuristic value at the base of a dictionary is what converts the dictionary into a quasi-experimental tool designed to predictively test a theory.
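For readers who want to see the mechanics, the substitution step – match each word of the text against the dictionary, replace it by its category, ignore the rest – can be sketched in a few lines of Python. The two-category dictionary below is a toy illustration of ours, not the actual RID or any published instrument.

```python
from collections import Counter

# Toy dictionary in the spirit of the substitution model. The categories and
# word lists are illustrative only, not those of any calibrated dictionary.
DICTIONARY = {
    "COLD": {"ice", "snow", "igloo", "frost"},
    "WARM": {"sun", "fire", "summer"},
}

def categorize(text: str) -> Counter:
    """Substitute each matching word with its category and count categories."""
    counts = Counter()
    for word in text.lower().split():
        word = word.strip(".,;:!?\"'")
        for category, members in DICTIONARY.items():
            if word in members:
                counts[category] += 1
    # Words matching no category are simply ignored, as in the model.
    return counts

print(categorize("The snow and the ice melt under the summer sun."))
# Counter({'COLD': 2, 'WARM': 2})
```

Because every matching word collapses into its category, two texts sharing not a single word can still yield identical category profiles – which is precisely what makes dictionary counts comparable across texts.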
A dictionary is like the needle of a galvanometer, which retraces and amplifies the variations in a current. If the current increases 10 percent, the needle goes up, say, two scale degrees. When a text is compared to the words of the list, each word of the text being compared to each word of the list, what does not match the list is ignored, and we return full-circle to the hypothesis-verification model of standard experimental science: the substitution model is selective, i.e., it predicts a single issue – that is what hypotheses are for – and unambiguously so – the hypothesis is verified or it is not – leaving no room for doubt, at least in an ideal world.

2.2. THE CORRELATIONAL MODEL: “EACH WORD WHEN USED IN A NEW CONTEXT IS A NEW WORD” (FIRTH, 1957: 190)

The above quote by linguist J. R. Firth illustrates what makes the correlational model incommensurable with the substitution one. The difference demands that we redraw the line between the two models. A good stab at defining the correlational model is that it stems from a desire to meet the text on its own terms, and not in terms of fixed codes shared by a list of words. The correlational model is predicated on the notion that the associations existing between words allow one to gain knowledge of the patterns of distribution of soft-fingered themes as the text unfolds. Church and Hanks argue at greater length that “it is indeed common practice in linguistics to classify words not only on the basis of their meanings but also on the basis of their co-occurrence with other words” (1990: 22).
What is at stake here is the power of blending an association measure and words of the text; yet, applied to subcategories of a dictionary, the same association measure would still be marked by the flaw and asset of the substitution model: the flaw at the heart of the substitution model is that it lifts the words out of context; its asset, that the words keep the same code throughout the text. By contrast, in context, as every addicted reader knows, words lose their correspondences between sign and meaning while being enriched with new significance. Also, whereas dictionaries need to be designed for each language in which one intends to apply them, word-word associations are in this sense language-independent.

3. On Aboutness: A Commendation for Vagueness

The question facing us now is how to tackle themes that are as real as Henry James’s “figure in the carpet” (James and Kermode, 1896/1986) – how do we, in other words, cut the themes at the joints? Themes scatter and gather again, like Bach’s “Goldberg Variations”; yet we can never see the patterns that pattern the text, as they are everywhere and nowhere. A distinctive feature of a category is that it takes only one word to make up a category, but it takes many to make up a theme. In the earlier series ICE, SNOW, and IGLOO making up the hypothetical category COLD, each word has the same exchange value with respect to the category COLD, i.e., each is substitutable for any other in the series because the category retains only the COLD dimension that is common to all. There is among these words a certain degree of likeness of meaning. Now consider another group made up of NATO, ARMY, MILITARY, ATTACK, and REVOLUTIONARY. This group of words characterized one of the dimensions in a factor analysis of 86 words distributed over 19 documents produced by a terrorist group in Europe in the years 1984–1985 (Hogenraad et al., 1995a: 55–56).
Taken separately, none of these words can be substituted with another, except perhaps the second and third. It is correct to state that this group of words is about NATO, and about the army, and about the military establishment, etc., but it is only when one recombines their different use values and watches the reciprocal influences of these values in the text that one can give shape, a posteriori, to a theme by connecting the dots between these words. In the act of conjuring up complex themes from fragments of a text, words are only pointers to a theme; they are not the theme. Dubbing this cluster of words “military concern” is not enough. One has also to describe their affinities by careful phrasing. The idea that looms here is that connecting the dots between the words can only mean that behind words, there are only other words.

4. Putting the Text First: The Hermeneutic Chiasma

The answer to the next question – when to use one tactic rather than the other – owes much to Spence’s (1982) rule of critical asymmetry, with reference to two kinds of reading. The rule of critical asymmetry sets the conditions in which analyzing the content of a text can reasonably be carried out and those in which such content analyses are unsuitable. The more clear and transparent a text is, the less effort is required of the receiver. In other words, on the sender’s side, a high level of active structuring authorizes a low level of active structuring on the receiving end; a low structuring of the text on the sender’s side requires active structuring on the receiving end. For content analysis, the rule translates as “the less structured the text, the more structured and categorizing must be the analysis”. Nothing can be more puzzling than trying to understand the content of a postcard that lands by mistake in your mailbox.
The writer of the postcard does not feel the need to explain to the intended receiver a frame of reference which the writer knows is perfectly known to the receiver – yet only to him or her. That the hairline context is unknown to a third party makes the postcard unstructured and a quasi-private language in the eyes of that party (see Note 1).

5. An Illustration Using Historiometry

We are now living in the twilight of a declining sensate era, and after a stormy period of transition new ideational values will appear on the horizon (Michel P. Richard, Introduction to the Transaction Edition of Sorokin’s Social and Cultural Dynamics, 1985: xi).

5.1. INDUSTRIAL RELATIONS (IR)

The IR field traditionally rested on the tripartite contract comprising government, business, and labor. Adverse economic conditions prompted governments to withdraw from the triangle in which various forces operated, giving managers wider options, e.g., to escape higher labor costs by avoiding participatory labor relations (Hyman, 1997). We wanted to know how these changing conditions affected the evolution of IR as a scientific discipline. Kuhn’s (2000) model of the development of science best fits science at “the cutting edge of research”; for its part, a text-based history of science attempts to grasp the minor developments of science as well. We limited the analysis of scholarly communication to the abstracts of five IR journals (see Note 2). Scientific abstracts are both a reflection and a determinant of a science mind-set. Across journals, the total number of words in the abstracts amounts to 353,822; the total number of abstracts is 3,455.

5.2. A COMPUTER-AIDED CONTENT ANALYSIS

The corpus was analyzed with the help of the PROTAN content analysis system (Hogenraad et al., 1995b).
What PROTAN valuably does, beyond counting words, is to standardize and categorize words, look for word-word correlations (theme-spotting), recognize patterns of words, juxtapose sentences containing particular combinations of categories, and insert nonverbal information into the textual analysis. What it does still more valuably is to create, for each of these tasks, system-independent output files that are then submitted to statistical tools. Using a suffix-removal procedure, we bring down the number of different word entries to some degree by taking out of the texts most of the verb declensions, plural and other adverbial forms. Disambiguating is optional; in practice, pace Kelley and Stone (1975), we have found it more effective simply to avoid very ambiguous words, having observed that the cost of the process is not in proportion to the ensuing gain. In the search for word-word correlations, the first step is to collect all the words with a frequency above a given level and to prune them of everything that is not a content word. In a second step, all the words retained in the first step are cross-correlated: significant cross-correlations are then exponentiated, usually to the fifth power, while insignificant correlations are set to zero. This transformation maintains high correlations at a high to moderately high level, while accelerating the decrease of lower ones. This algorithm was devised by Iker (1974a, 1975) within the WORDS system (Iker and Klein, 1974), and revived within PROTAN. In a third and last step, the most associationally rich words that emerge from this procedure are factor or cluster analyzed.

5.3. RESULTS

5.3.1. A Result without Suspicion

Like any science, IR should achieve its goal by moving toward ever more abstract concepts, and away from sensate ones (Martindale, 1990: 350–368): without principled concepts, IR would not even be able to identify industrial situations.
We assessed the degree of abstractedness of the texts with the American-English version of Martindale’s RID (1975). The RID is composed of two calibrated lists of words (N = 2,483): one (N = 1,815) with words such as LOVE, SEX, FOOD, COLD, CHAOS, DREAM, FLYING, representing sensate systems viewing reality as that which is presented to the sense organs; the other (N = 668) with words such as MONEY, WORK, DISCIPLINE, POLICE, TIME, JUSTICE, LAW, representing ideational systems, i.e., conventions. When all the words of the texts are compared to all the words of the dictionary, we observe the profile in Figure 1, arrayed by year (each data point is the difference, in percentage per year, between the lists of convention and sensate words): e.g., in Industrial and Labor Relations Review, R² = 0.78, F(1, 23) = 79.35, p < 0.0001, 2nd and 4th order (p < 0.50) autocorrelations removed (Hogenraad et al., 1997).

Figure 1. Symbolic thought contents (convention minus sensate words) in the abstracts (1971–1995) of Industrial and Labor Relations Review, observed and fitted.

Is this a chance result? Resampling the series 2,000 times with the SIMSTAT bootstrap algorithm (Péladeau, 1996) yields an average R² of 0.78, with 95% confidence intervals ranging from 0.63 to 0.89; the average first-degree beta value is −0.05, with 95% confidence intervals ranging from −0.06 to −0.04. These results are totally unambiguous, not so much in regard to the replications as in regard to a predicted hypothesis being confirmed or not. Actually, the predicted hypothesis is not confirmed, neither for this journal nor for the remaining four.
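Generically, a case-resampling bootstrap of R² for a linear time trend of this kind can be sketched as below. This is our own minimal sketch; SIMSTAT’s actual algorithm may differ in its details (resampling scheme, treatment of autocorrelation, interval construction).

```python
import numpy as np

def bootstrap_r2(y, n_boot=2000, seed=0):
    """Case-resampling bootstrap of R-squared for a linear time trend:
    resample (year, value) pairs with replacement, refit, collect R²."""
    rng = np.random.default_rng(seed)
    t = np.arange(len(y))
    r2s = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))
        tt, yy = t[idx], y[idx]
        if np.ptp(tt) == 0:          # degenerate resample: all the same year
            continue
        slope, intercept = np.polyfit(tt, yy, 1)
        resid = yy - (slope * tt + intercept)
        r2s.append(1.0 - resid.var() / yy.var())
    r2s = np.sort(np.array(r2s))
    lo, hi = r2s[int(0.025 * len(r2s))], r2s[int(0.975 * len(r2s))]
    return float(r2s.mean()), (float(lo), float(hi))

# Toy series: 25 "years" with a clear downward trend plus noise.
rng = np.random.default_rng(1)
y = -0.05 * np.arange(25) + rng.normal(0, 0.1, 25)
mean_r2, (ci_lo, ci_hi) = bootstrap_r2(y, n_boot=500)
```

Note that resampling year/value pairs ignores the serial dependence of a real time series; a block bootstrap would respect it better.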
All we can do after the predictive test of the theory is to provide interpretations based on our experience accumulated with an instrument, i.e., the RID dictionary; in this specific case, we express our conviction that, except for some possible paradigm shift to come, Figure 1 stops short of the terminal moraine of a discipline that, justifiably or not, has likely thickened up to problem-solving at the expense of science-building (Adams, 1988) – without pausing to wonder what a science that is so keen to solve problems might be a science of. Nonetheless, acknowledging our own reactions to the result, to use Toulmin’s words (1982: 97), does not influence it, as we will see is possible in the correlational model. Admittedly, in addition to being suspicious of believing that one could discover the essential pattern of a process still going on, some could consider as an overstatement our pretension to catch in one equation the deep intricacies of years of writing in IR. Yet the most economical way to enhance the effectiveness of a purposeful action, i.e., testing whether a particular discipline really becomes more conceptual, is by verifying focused hypotheses. Meanwhile, we close our eyes to anything that is irrelevant to the case.

5.3.2. A Quest without a Quarry

With the help of FACSTRAP, a bootstrap procedure for factor analysis (Thompson, 1988), we arbitrarily and systematically requested three factors from the word-word correlation matrices between the different words retained from the abstracts in each journal. From the abstracts of Industrial Relations, for example, we selected 92 richly associated words distributed over 25 years (1971–1995). Table I displays the words of the first factor of which they are symptoms, together with their mean estimate loadings (100 resamplings), their variability (SD) as an estimate of their stability, and the original coefficients in the last column.
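The pipeline that produces such a factor – Iker’s transformation of the word-word correlation matrix, a factoring step, and bootstrap resampling of the loadings – might be sketched as follows. This is a stand-in for WORDS/PROTAN/FACSTRAP, not their code: the critical value for “significant” correlations, the PCA-style factor extraction, and the sign handling are all our assumptions.

```python
import numpy as np

def iker_transform(X, crit, power=5):
    """Cross-correlate word-frequency columns, zero out correlations whose
    absolute value falls below crit, and exponentiate the survivors to the
    fifth power (sign preserved): high correlations stay high, low ones decay."""
    r = np.corrcoef(X, rowvar=False)
    r = np.where(np.abs(r) >= crit, r, 0.0)
    return np.sign(r) * np.abs(r) ** power

def first_factor_loadings(R):
    """Loadings on the first factor of a (transformed) correlation matrix,
    via eigendecomposition -- a PCA-style stand-in for factor analysis."""
    vals, vecs = np.linalg.eigh(R)          # eigenvalues in ascending order
    v = vecs[:, -1] * np.sqrt(vals[-1])
    return v if v.sum() >= 0 else -v        # resolve the arbitrary sign

# Toy corpus: 25 "years" x 6 word-frequency columns; words 0-2 covary.
rng = np.random.default_rng(0)
t = rng.poisson(6, 25).astype(float)
X = np.column_stack([t + rng.normal(0, 1, 25) for _ in range(3)] +
                    [rng.poisson(6, 25).astype(float) for _ in range(3)])

# Bootstrap the loadings by resampling the 25 segments with replacement.
boot = np.array([
    first_factor_loadings(iker_transform(X[rng.integers(0, 25, 25)], crit=0.3))
    for _ in range(100)
])
mean_loadings = np.abs(boot).mean(axis=0)   # |.| guards against sign flips
sd_loadings = np.abs(boot).std(axis=0)      # stability estimate, as in Table I
```

On these toy data, the three covarying columns come out with large, stable loadings and the three independent ones with loadings near zero – the shape of result that Table I reports for real abstracts.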
Table I. The social motif (mean eigenvalue = 14.4, SD = 2.7) in the Industrial Relations abstracts (mean estimates of structure coefficients based on 100 replications with sample size = 25)

Word or cluster   Mean estimate loading   SD     Original coefficient
Educate           0.74                    0.36    0.49
Social            0.69                    0.39    0.45
History           0.61                    0.34    0.07
Black             0.61                    0.44    0.43
Suggest           0.60                    0.32    0.20
Woman             0.53                    0.28    0.19
Equal             0.58                    0.48    0.27
Law               0.58                    0.36    0.46
Man               0.53                    0.29    0.06
Race              0.52                    0.50    0.46
Discriminate      0.51                    0.32    0.47
Member            0.50                    0.26   −0.03
Role              0.49                    0.45    0.59
Bargaining        0.48                    0.22    0.12
Reduce            0.43                    0.33    0.07
Result            0.43                    0.29   −0.48

No matter how many factors we request from the correlation matrix, a single motif keeps surfacing as the first prominent factor, i.e., the social (as opposed to the economic) or nonprofit (as opposed to profit). In the abstracts of the remaining journals (not shown here), the same motif of profit or individualism is not only visible, but impossible to avoid. In addition to sharing the same world view, i.e., a separation between the economic and the social realm, the journals show the same trend over time, e.g., in Industrial Relations (Figure 2), R² = 0.65, F(2, 22) = 20.29, p < 0.0001, 3rd and 4th order (p < 0.50) autocorrelations removed.

Figure 2. From the social realm to its opposite (profit) in the abstracts of Industrial Relations (1981–1995), observed and fitted.

6. Murray Sidman’s Autobiography in Science: A Cameo on the Superiority of Intra-Subject over Inter-Subject Variability

Out of the air a voice without a face/ Proved by statistics that some cause was just/ In tones as dry and level as the place (W. H. Auden’s The Shield of Achilles, 1976/1994: 597).

In 1994, the psychologist Murray Sidman wrote a research story of his work on equivalence relations (between words and their referents, e.g., in aphasic patients and retarded Down’s-syndrome children). Sidman’s is an autobiography in the literal sense of the term, one that allows readers to attend to the “genesis and development of a scientific fact”, to borrow the title of Fleck’s 1979 work. Along with the 11 reprints Sidman selected as representative of his work on this question, he commented on the reasons why each experiment was done, even the mood of the research team at the time; most fittingly for us, he arrayed the reprints not chronologically, but in the order of the development of his thoughts on the subject. With Sidman’s story we have an elegant cameo test case for evaluating the hypothesis that, as a scientist develops his grasp of a question, he moves away from sensate thought contents. This test case is elegant on several grounds. First, Sidman tells us the whole story of each paper, a story that journals never let us have. Secondly, the rationally ordered reprints offer us both a behavioral baseline – Sidman’s first paper – and a systematic intra-subject replication of a quasi-experiment – his remaining 10 papers – that further avoids the obscuring effect of the inter-subject variability of group statistical procedures (Bakan, 1954; Sidman, 1960/1988). For this quasi-experiment, we downloaded from the PsycInfo database (www.apa.org/psycinfo/) the abstracts of each of the 11 papers commented upon by Sidman in his research story. The data collected made a total of 1,489 words, of which 451 different ones. A simple rate of sensate words was computed using the RID. The result, in Figure 3 (see Note 3), displays another side of the elegance of our test case.

Figure 3. Rate of sensate words in Murray Sidman’s research story (1971–1990) on equivalence relations.

The anticipated decline in sensate words is highly visible [R² = 0.74, F(2, 8) = 11.43, p < 0.01, 1st and 2nd order (p < 0.50) autocorrelations removed]. But the final autobiographic text, numbered 11 though dated 1989, makes for an even stronger case. Sidman comments about this last study, “I am presenting the study out of order because (. . .) the considerations underlying the Contextual Control experiments and the procedures they introduce mark off a new direction for experimental, applied, and theoretical investigations of equivalence” (1994: 475; italics in the original, underlining is ours). Text 11 marks the end of a line of thought and the beginning of a new line. Little wonder then that sensate words revert to the baseline.

7. Discussion: A Spiral of Interpretations

In both the IR and Sidman studies, we predicted the modification that one variable (the simple rate of sensate words in the Sidman study, or the difference between rates of conceptual and sensate words in the IR one) would go through over time. In the Sidman study, our prediction was successful; in the IR one, it was not. Actually, we obtained the converse of what we expected. Yet we could reconcile the result – as a way of gathering new knowledge – with other information regarding the transformation of a scientific discipline into a problem-solving one . . . before a new cycle of idealization, come the day (Kaufman, 1993: 237, note 1). It is as if the to-and-fro of facts and theory recurrently refining each other, as in normal science, was superseded in North-American, British, and Australian IR by an accelerating process of piling up tangible facts on tangible facts (anecdotalism), causing the discipline to lose its conceptual ballast, possibly at the expense of de-skilling its own scientists (Reich, 1991: 282–300).
However, in both studies our prediction rested on a preexisting body of knowledge, i.e., the knowledge accumulated about Martindale’s (1990) evolutionary theory of scientific change and the instrument that goes with it, the RID dictionary. Not so when we looked for themes, as in the analysis of the abstracts of Industrial Relations, in which case it was totally impossible to foresee which themes could emerge. If words are predictors in the dictionary-based procedure, words can only be symptoms when one is looking for patterns, whatever the discovery procedure: factor, cluster, or correspondence analysis. Theme making is a self-explanatory unveiling of symptoms arrived at only by cognitive and emotional participation in a cause that contains both the method and the object of observation. The trouble is, as Rhees notes (Wittgenstein, 1966: 26, note 1), that “this does not help us to predict anything” (italics in the original). Theme making merely extends our knowledge of preexisting facts (while the substitution model redefines our conception of them). To say this is of course not to imply that results obtained with a dictionary are necessarily unconnected to those obtained from correlating text words, although much depends here on the dictionary selected and the reasons why it is selected. Actually, in the IR case, these results are related. In the abstracts of Industrial and Labor Relations Review, for example, the correlation between the observed profiles of symbolic thought content (Figure 1) and the profit motif is −0.55 (N = 25; 95% confidence intervals, −0.76 to −0.30 after 25,000 replications); in the other journals, the correlations between symbolic thought content and the profit motif are similarly negative and significant. These correlations confirm Sorokin’s (1957/1985: 93 and 214) theory and observations regarding the relationships between individualism and sensate literature.
It remains that theme-spotting leaves the analyst always vulnerable to the accusation of having missed that one crucial word. To cap it all, recognizing one pattern interminably raises new suggestions waiting to be disclosed. The treatment of themes discussed and exemplified here accentuates three difficulties of the correlational model. The first, yet least problematic, is that the word-word correlation approach rests on an old rhetoric; indeed, the contiguity principle has a long history in behavioral science (Iker, 1974b). It is a relatively simple device to have two words coexist side by side and then argue that what the words represent is coherent because they coexist. The name of this device is reifying, i.e., granting the status of static reality to a moving target (Spencer, 1982). If philosophers are uneasy with reifying (Vayda, 1991), we must concede that much of social and behavioral science practice rests on it without much ado. The second, more serious, difficulty is that looking for themes contributes to the permanent suspicion that there could be another sense to a text, that is, another hidden theme, leading to a spiral of interpretations at the expense of communication (Raven et al., 1971). Actually, the difficulty is only one facet of a larger question identified by Sorokin under the label of “principle of limits” (1985: 647–663). Although one may suppose that the number of possible themes and the amount of their respective variations in a given text is finite, one has no means of knowing these limits. Put differently, themes have more possibilities of change than dictionary categories, while some facets of a theme may never come out in a given text, or only quite infrequently. The third, pragmatic, difficulty of eliciting robust themes comes from the nature of the text: the less tied to the constraint of reality, as in literature, the wider the gap between sign and meaning (de Man, 1989: 17); hence the name, “literary fiction”.
Conversely, one may expect themes to be transparent whenever there is pressure to abide by the constraints of reality in language use, even if one cannot predict which themes will emerge from a correlational analysis. In this ideal case, analysts get the best of both worlds: theme-spotting techniques may yield clearly distinct themes, and dictionary techniques new redefinitions of facts. 8. Conclusions “There’s always more to it”, says the character David Ferrie in Don DeLillo’s (1988: 321) novel “Libra”. Indeed! The correlational model of content analysis yields no unique solution: it will never be the “Food and Drug Administration” of texts. The less structured a text, the more likely the discovery of a pattern in it is to produce results that are difficult or even impossible to interpret. Not only may a pattern not be the ultimate one, but even if it is, one may find it not easily transportable to other contexts. Besides, not only does discovering the pattern of a class of words not allow one to predict, but it may be fallacious to believe that one could discover the essential pattern of a process that is still going on. Finally, themes or patterns being pure representations (i.e., only names, without fundamental reality, as real as the epitaph of Sherlock Holmes), when one analyzes a theme or pattern, one should not fall for the fancy of its existence. Altogether, there is no way to get closer to the way themes really are, for nothing can be like a sentence except another sentence. Acknowledgement Acknowledgment is made to the National Fund for Scientific Research, Belgium (Grant No. 1.5.126.99, 1998–1999). Reprint requests should be addressed to Robert Hogenraad, Psychology Department, Catholic University of Louvain, 10 Place Cardinal Mercier, B-1348 Louvain-la-Neuve (Belgium); E-mail: [email protected] Notes 1.
Technical conversations between experts, too, easily turn into a quasi-private language because so many of their elements are implicit, i.e., the text is structured only for the experts, not for outsiders. Consider the following: A: “Bill, you’d better get that Linden back or you’ll lose that baby too.” B: “Yeah, I just lost 81.” B: “Look any better?” A: “No. You still got to get rid of about 400 Bill because you’re 400 over the short time emergency on that 80 line.” B: “Yeah – that’s what I’m saying. Can you help me with that?” (Conversations between the Senior Pool Dispatcher (A) and the Con Edison System Operator (B) between 8:56 pm and 9:02 pm, July 13, 1977. Extracts from the State of New York Investigation, New York City Blackout, July 13, 1977, p. 13; blackout.gmu.edu/archive/pfd/ny_state_77.pdf.) 2. These journals are: British Journal of Industrial Relations, London (1963–1995), Industrial Relations, Berkeley (1962–1995), Industrial and Labor Relations Review, New York (1947–1995), Journal of Industrial Relations, Sydney (1959–1995), and Relations Industrielles/Industrial Relations, Quebec (1945–1995). 3. Murray Sidman’s papers analyzed in Figure 3 are: (1) Sidman, M. (1971). Reading and auditory-visual equivalences. Journal of Speech and Hearing Research 14: 5–13. (2) Sidman, M. & Cresson, O. (1973). Reading and cross-modal transfer of stimulus equivalences in severe retardation. American Journal of Mental Deficiency 77: 515–523. (3) Sidman, M. (1977). Teaching some basic prerequisites for reading. In: P. Mittler (ed.), Research to Practice in Mental Retardation: Vol. 2. Education and Training. Baltimore, MD: University Park Press, pp. 353–360. (4) Sidman, M., Cresson, O. & Willson-Morris, M. (1974). Acquisition of matching to sample via mediated transfer. Journal of the Experimental Analysis of Behavior 22: 261–273. (5) Sidman, M., Rauzin, R., Lazar, R., Cunningham, S., Tailby, W. & Carrigan, P. (1982).
A search for symmetry in the conditional discriminations of rhesus monkeys, baboons, and children. Journal of the Experimental Analysis of Behavior 37: 23–44. (6) Sidman, M. & Tailby, W. (1982). Conditional discrimination vs. matching to sample: An expansion of the testing paradigm. Journal of the Experimental Analysis of Behavior 37: 5–22. (7) Sidman, M., Kirk, B. & Willson-Morris, M. (1985). Six-member stimulus classes generated by conditional-discrimination procedures. Journal of the Experimental Analysis of Behavior 43: 21–42. (8) Sidman, M., Willson-Morris, M. & Kirk, B. (1986). Matching-to-sample procedures and the development of equivalence relations: The role of naming. Analysis and Intervention in Developmental Disabilities 6: 1–19. (9) Sidman, M. (1990). Equivalence relations: Where do they come from? In: D. E. Blackman & H. Lejeune (eds), Behaviour Analysis in Theory and Practice: Contributions and Controversies. Hillsdale, NJ: Erlbaum, pp. 93–114. (10) Sidman, M., Wynne, C. K., Maguire, R. W. & Barnes, T. (1989). Functional classes and equivalence relations. Journal of the Experimental Analysis of Behavior 52: 261–274. (11) Bush, K. M., Sidman, M. & de Rose, T. (1989). Contextual control of emergent equivalence relations. Journal of the Experimental Analysis of Behavior 51: 29–45. References Adams, R. J. (1988). Desperately seeking industrial relations theory. International Journal of Comparative Labor Law and Industrial Relations 4: 1–10. Anderson, C. W. & McMaster, G. E. (1982). Computer assisted modeling of affective tone in written documents. Computers and the Humanities 16: 1–9. Anderson, C. W. & McMaster, G. E. (1986). Modeling emotional tone in stories using tension levels and categorical states. Computers and the Humanities 20: 3–9. Auden, W. H. (1976/1994). Collected Poems. London: Faber and Faber. Bakan, D. (1954). A generalization of Sidman’s results on group and individual functions. Psychological Bulletin 51: 63–64. Bestgen, Y. (1994).
Can emotional valence in stories be determined from words? Cognition and Emotion 8: 21–36. Carley, K. M. (1994). Extracting culture through textual analysis. Poetics 22: 291–312. Church, K. W. & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics 16: 22–29. DeLillo, D. (1988). Libra. Harmondsworth: Penguin. de Man, P. (1989). Blindness and Insight: Essays in the Rhetoric of Contemporary Criticism. (2nd revised edition, with an introduction by Wlad Godzich). London: Routledge. Duchastel, J., Paquin, L.-C. & Beauchemin, J. (1992). Automated syntactic text description enhancement: Thematic structure analysis. Computers and the Humanities 26: 31–42. Dyer, M. G. (1983). In-Depth Understanding: A Computer Model of Integrated Processing for Narrative Comprehension. Cambridge, MA: MIT Press. Efron, B. & Tibshirani, R. (1991). Statistical data analysis in the computer age. Science 253: 390–395. Firth, J. R. (1957). Modes of meaning (1951). In: J. R. Firth (ed.), Papers in Linguistics 1934–1951. London: Oxford University Press, pp. 190–215. Fleck, L. (1979). In: Thaddeus J. Trenn and Robert K. Merton (eds), Genesis and Development of a Scientific Fact [Entstehung und Entwicklung einer wissenschaftlichen Tatsache: Einführung in die Lehre vom Denkstil und Denkkollektiv]. Fred Bradley and Thaddeus J. Trenn (trans.). Foreword by Thomas S. Kuhn. Chicago: The University of Chicago Press. (Original work published 1935.) Gottschalk, L. A. (1997). The unobtrusive measurement of psychological states and traits. In: C. W. Roberts (ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Mahwah, NJ: Erlbaum, pp. 117–130. Gottschalk, L. A. & Bechtel, R. J. (1982). The measurement of anxiety through the computer analysis of verbal samples. Comprehensive Psychiatry 23: 364–369. Handelman, S. A. (1982).
The Slayers of Moses: The Emergence of Rabbinic Interpretation in Modern Literary Theory. Albany: State University of New York Press. Hogenraad, R. (2002). Moving targets: The making and molding of a theme. In: M. M. Louwerse & W. van Peer (eds), Thematics: Interdisciplinary Studies. Philadelphia: John Benjamins, pp. 353–376. Hogenraad, R., Bestgen, Y. & Nysten, J.-L. (1995a). Terrorist rhetoric: Texture and architecture. In: E. Nissan & K. M. Schmidt (eds), From Information to Knowledge. Conceptual and Content Analysis by Computer. Oxford: Intellect, pp. 48–59. Hogenraad, R., Daubies, C. & Bestgen, Y. (1995b). Une Théorie et une Méthode Générale d’Analyse Textuelle Assistée par Ordinateur. Le Système PROTAN (PROTocol ANalyzer) (Version du 2 mars 1995) [A General Theory and Method of Computer-Aided Text Analysis: The PROTAN System (PROTocol ANalyzer), Version of March 2, 1995]. Computer program, Louvain-la-Neuve (Belgium), Psychology, Catholic University of Louvain. http://www.psor.ucl.ac.be/protan/protanae.html Hogenraad, R. & McKenzie, D. P. (1999). Replicating text: The cumulation of knowledge in social science. Quality & Quantity 33: 97–116. Hogenraad, R., McKenzie, D. P. & Martindale, C. (1997). The enemy within: Autocorrelation bias in content analysis of narratives. Computers and the Humanities 30: 433–439. Holsti, O. R. (1969). Content Analysis for the Social Sciences and the Humanities. Reading, MA: Addison Wesley. Hyman, R. (1997). La géométrie du syndicalisme: Une analyse comparative des identités et des idéologies [The geometry of trade unionism: A comparative analysis of identities and ideology.]. Relations Industrielles/Industrial Relations 52: 7–38. Iker, H. P. (1974a). SELECT: A computer program to identify associationally rich words for content analysis: I. Statistical results. Computers and the Humanities 8: 313–319. Iker, H. P. (1974b). An historical note on the use of word-frequency contiguities in content analysis. 
Computers and the Humanities 8: 93–98. Iker, H. P. (1975). SELECT: A computer program to identify associationally rich words for content analysis. II. Substantive results. Computers and the Humanities 9: 3–12. Iker, H. P. & Klein, R. H. (1974). WORDS: A computer system for the analysis of content. Behavior Research Methods & Instrumentation 6: 430–438. Jakobson, R. (1956). Two aspects of language and two types of aphasic disturbances. In: R. Jakobson & M. Halle (eds), Fundamentals of Language. The Hague: Mouton, pp. 53–82. James, H. & Kermode, F. E. (eds) (1896/1986). The Figure in the Carpet and Other Stories. New York: Viking Press. Kaufman, B. E. (1993). The Origins and Evolution of the Field of Industrial Relations in the United States. Ithaca, NY: ILR Press. Kelly, E. F. & Stone, P. J. (1975). Computer Recognition of English Word Senses. Amsterdam: North-Holland. Kuhn, T. (2000). In: James Conant and John Haugeland (eds), The Road Since “Structure”: Philosophical Essays, 1970–1993, with an Autobiographical Interview. Chicago: University of Chicago Press. Martindale, C. (1975). Romantic Progression: The Psychology of Literary History. Washington, DC: Hemisphere. Martindale, C. (1979). The night journey: Trends in the content of narratives symbolizing alteration of consciousness. Journal of Altered States of Consciousness 4: 321–343. Martindale, C. (1990). The Clockwork Muse: The Predictability of Artistic Change. New York: Basic Books. McClelland, D. C. (1975). Power: The Inner Experience. New York: Irvington Publishers. Muskens, G. (1985). Mathematical analysis of content. Quality & Quantity 19: 99–103. Norris, C. (1987). Derrida. London: Fontana Press. Péladeau, N. (1996). Simstat for Windows. User’s Guide (Version 2.0, 10 October 2000). Montreal: Provalis Research. (www.simstat.com) Péladeau, N. (1998). WordStat. Content Analysis Module for SIMSTAT. User’s Guide (Version 3.0, 21 December 2000).
Montreal: Provalis Research. (www.simstat.com) Perry, J. W. & Kent, A. (eds) (1958). Tools for Machine Literature Searching: Semantic Code Dictionary, Equipment, Procedures. New York: Interscience. Raven, P. H., Berlin, B. & Breedlove, D. E. (1971, 17 December). The origins of taxonomy. Science 174: 1210–1213. Reich, R. B. (1991). The Work of Nations: Preparing Ourselves for 21st-Century Capitalism. London: Simon & Schuster. Roberts, C. W. (ed.) (1997). Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Mahwah, NJ: Erlbaum. Salton, G. (1991). Developments in automatic text retrieval. Science 253: 974–980. Salton, G., Allan, J., Buckley, C. & Singhal, A. (1994). Automatic analysis, theme generation, and summarization of machine-readable texts. Science 264: 1421–1426. Sammet, J. E. (1969). Programming Languages: History and Fundamentals. Englewood Cliffs, NJ: Prentice-Hall. Saris-Gallhofer, I. N., Saris, W. E. & Morton, E. L. (1978). A validation study of Holsti’s content analysis procedure. Quality & Quantity 12: 131–145. Sidman, M. (1960/1988). Tactics of Scientific Research: Evaluating Experimental Data in Psychology. Boston: Authors Cooperative. Sidman, M. (1994). Equivalence Relations and Behavior: A Research Story. Boston: Authors Cooperative. Sorokin, P. (1985). Social and Cultural Dynamics: A Study of Change in Major Systems of Art, Truth, Ethics, Law, and Social Relationships. London: Transaction Publishers. Spence, D. P. (1982). Narrative Truth and Historical Truth: Meaning and Interpretation in Psychoanalysis. New York: Norton. Spencer, M. E. (1982). The ontologies of social science. Philosophy of the Social Sciences 12: 121–141. SPSS Inc. (1997). TextSmart 1.0 User’s Guide. Chicago: SPSS Inc. Stone, P. J. (1997). Thematic text analysis: New agendas for analyzing text content. In: C. W. Roberts (ed.), Text Analysis for the Social Sciences:
Methods for Drawing Statistical Inferences from Texts and Transcripts. Mahwah, NJ: Erlbaum, pp. 35–54. Stone, P. J., Bales, R. F., Namenwirth, J. Z. & Ogilvie, D. M. (1962). The General Inquirer: A computer system for content analysis and retrieval based on the sentence as a unit of information. Behavioral Science 7: 484–498. Stone, P. J., Dunphy, D. C., Smith, M. S. & Ogilvie, D. M. (1966). The General Inquirer: A Computer Approach to Content Analysis. Cambridge, MA: MIT Press. Stoppard, T. (1993). Arcadia. London: Faber and Faber. Thompson, B. (1988). Program FACSTRAP: A program that computes bootstrap estimates of factor structures. Educational and Psychological Measurement 48: 681–686. Toulmin, S. (1982). The construal of reality: Criticism in modern and postmodern science. Critical Inquiry 9: 93–111. Vayda, A. P. (1991). Concepts of process in social science explanations. Philosophy of the Social Sciences 21: 318–331. Weber, R. P. (1983). Measurement models for content analysis. Quality & Quantity 17: 127–149. Whissell, C., Fournier, M., Pelland, R., Weir, D. & Makarec, K. (1986). A dictionary of affect in language. IV. Reliability, validity, and applications. Perceptual and Motor Skills 62: 875–888. Winter, D. G. (1973). The Power Motive. New York: Free Press. Wittgenstein, L. (1966). Lectures on aesthetics. In: L. Wittgenstein (ed.), Lectures and Conversations on Aesthetics, Psychology and Religious Belief. Oxford: Basil Blackwell, pp. 1–40.