On the Combination of Associative Probabilities in Linguistic Contexts
Author(s): Davis Howes and Charles E. Osgood
Source: The American Journal of Psychology, Vol. 67, No. 2 (Jun., 1954), pp. 241-258
Published by: University of Illinois Press
Stable URL: http://www.jstor.org/stable/1418626
ON THE COMBINATION OF ASSOCIATIVE PROBABILITIES
IN LINGUISTIC CONTEXTS
By DAVIS HOWES, Aero Medical Laboratory, Wright-Patterson AFB,
and CHARLES E. OSGOOD, University of Illinois
It is a commonplace that meanings of words depend upon the contexts in which they occur. This dependence sets a fundamental problem in the psychology of language: calculation of the psychological effects of a word in its context from the individual properties of the word and of the contextual elements. Recognizing a conventional distinction between the linguistic and non-linguistic contexts of a person's speech, we may conveniently subdivide the former into (a) the homogeneous linguistic context, i.e. the context provided by his own previous language behavior; and (b) the heterogeneous linguistic context, i.e. that provided by the utterances of other persons in his environment. Recent work by Shannon has provided a statistical model for describing certain problems of homogeneous sequences and has stimulated several experimental studies in that area.¹ In the present paper we shall be concerned solely with heterogeneous linguistic contexts; i.e. with the prediction of the language behavior of an experimental subject from the language behavior of another person in his environment.
For experimental purposes we take a sequence of four words spoken by an experimenter (E) and measure as a dependent variable the probability that a given word will be emitted as an association to the last word of the sequence. This is a modified form of the conventional word-association experiment. Since the sequence is spoken by E, the properties of the words constituting it can be controlled as independent variables. The strength of the associative effect of each of the first three words of the sequence upon the subject's (S's) response to the fourth word is the property investigated in the following experiments. Three studies will be reported. If we designate the first three words of the four-word sequence as the context and the fourth word as the test-word, the independent variables defining the three experiments are as follows: (1) interposition, the number of stimulus-words separating a given contextual word from the test-word; (2) density, the number of contextual words having similar associative effects; (3) frequency of occurrence, the number of times a contextual word occurs in a general sample of the language under study. Since the experiments attempt to predict, from knowledge of its probability following single words, the probability that a particular association will be emitted following stimulation with a sequence of words, these studies take the general form of experiments on the combination of word probabilities.

* Accepted for publication October 23, 1952.
¹ C. E. Shannon, A mathematical theory of communication, Bell Syst. Tech. J., 27, 1948, 379-423, 623-656; G. A. Miller and J. A. Selfridge, Verbal context and the recall of meaningful material, this JOURNAL, 63, 1950, 176-185; E. B. Newman, The pattern of vowels and consonants in various languages, this JOURNAL, 64, 1951, 369-379; C. E. Shannon, Prediction and entropy of printed English, Bell Syst. Tech. J., 30, 1951, 50-64.
Definitions and symbols. From the above description it can be seen that the basic idea requiring definition is that of the probability of a word occurring as an association. For the general case we can define the associative word-probability $P_{j_1, j_2, \ldots, j_m}(i_1, i_2, \ldots, i_n)$ for two populations of subjects U and W as the limiting relative frequency with which the sequence of words $i_1, i_2, \ldots, i_n$ is emitted by population U following the emission of the sequence of words $j_1, j_2, \ldots, j_m$ by population W. To give the definition experimental meaning, certain boundary conditions have to be specified. First, prediction will be attempted for the probability of only the first word of the sequence of associated words, $i_1$ (which can therefore be written simply as $i$). Similarly, only sequences of four stimulating words $j$ will be considered. The stimulating population W will be a single person (one of the Es), and the associating population U will consist of a class of college students. One other very important condition must be specified, viz. the instructions given the Ss. These are, briefly, to associate only to the final word of the stimulating sequence. This final word will therefore be called the test-word and will be designated $j_t$. The entire sequence of four stimulating words, called a word-set, will be written $j_1, j_2, j_3, j_t$ to indicate the special emphasis placed on the test-word by the instructions. The three words that precede the test-word constitute a context. The form of our experiments can thus be represented by the expression $P_{j_1, j_2, j_3, j_t}(i)$. In words, this expression represents the probability that word $i$ will occur as the association to a sequence of four words when the Ss are instructed to associate to the last word only.
Associative probabilities following single-word stimulation, such as those reported by Kent and Rosanoff,² will be called first-order associative probabilities and written $p_j(i)$. Since the strength of these associations is the basic quantity in the present experiments, subscripts will be used to indicate different values of this variable. Large first-order associative probabilities will be indicated by letter subscripts to the stimulus-symbol, small first-order associative probabilities by numerical subscripts. Thus $p_{j_a}(i) \gg p_{j_1}(i)$. It will be convenient to define the associative effect of a stimulus-word $j$ as the probability of the association $i$ following $j$ in the conventional word-association experiment. In the above inequality, for example, a word represented by $j_a$ (e.g. man) can be said to have a stronger associative effect than $j_1$ (boy) upon the association $i$ (woman) for the population studied by Kent and Rosanoff, since in their tables the relative frequency with which woman is associated to man is 0.394 while its relative frequency of association to boy is only 0.002. These symbols for the strength of first-order associative effects will be carried over to the word-sets: thus, $P_{j_a, j_1, j_2, j_t}(i)$ indicates that $p_{j_a}(i)$ and $p_{j_t}(i)$ are large relative to $p_{j_1}(i)$ and $p_{j_2}(i)$.

² G. H. Kent and A. J. Rosanoff, A study of association in insanity, Amer. J. Insanity, 67, 1910, 37-96, 317-390.
The effect of the context of a word-set upon the probability of the association $i$ will be measured as the difference between the probability of $i$ following the word-set containing that context and the probability of $i$ following the test-word with a control context consisting of nonsense words and three-place numbers (represented by $j'$). It is assumed that such contextual materials affect the associative response negligibly. Values of $p_{j_t}(i)$, estimated from the Kent-Rosanoff tables, can also be put to this purpose. But the Kent-Rosanoff data can be applied to the present population only with reservation in view of the type of Ss they used and the historical period in which their measurements were made.
The extent to which the word-set constitutes a meaningful sequence that might be emitted in the course of ordinary continuous speech can be expected to influence the associative response. Response to the word-set "Mary had a little," for example, can be expected to differ greatly from that to a set, such as "a had Mary little," which presents the same components in a less familiar sequence. To control for this factor we can specify that the transitional probabilities for populations U and W shall be negligible, where the transitional probability $P^*_{j_{k-1}}(j_k)$ is defined as the probability that word $j_k$ will follow a given word $j_{k-1}$ in samples from the continuous discourse of a stated population. Populations U and W in the present case both represent samples of the general population of English-speaking persons of college education.
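As a minimal sketch, assuming a tokenized sample of continuous discourse were available (the authors had no such tables and relied on judgment instead), $P^*_{j_{k-1}}(j_k)$ could be estimated as a bigram relative frequency: the number of times $j_k$ immediately follows $j_{k-1}$, divided by the number of times $j_{k-1}$ is followed by any word. The helper name and the toy sample are hypothetical.

```python
from collections import Counter


def transitional_probability(tokens, prev_word, next_word):
    """Estimate P*_{prev_word}(next_word) from a token sequence.

    Hypothetical helper: relative frequency with which next_word immediately
    follows prev_word. Returns 0.0 if prev_word is never followed by a word.
    """
    # Occurrences of each word that have a successor (the final token has none).
    predecessors = Counter(tokens[:-1])
    # Counts of adjacent word pairs.
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    if predecessors[prev_word] == 0:
        return 0.0
    return bigrams[(prev_word, next_word)] / predecessors[prev_word]


# Toy discourse sample (hypothetical, for illustration only).
sample = "mary had a little lamb its fleece was white as snow mary had a cat".split()
print(transitional_probability(sample, "had", "a"))  # familiar order -> 1.0
print(transitional_probability(sample, "a", "had"))  # scrambled order -> 0.0
```

Under this reading, a word-set would meet the condition stated above when every adjacent pair in it has an estimate near zero; with a small sample, counts for rare words would make such estimates unreliable.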
It should be clear that the symbols defined here represent only experimental
variables that we have attempted to control in the present studies, and do not imply
that the dependent variable can be completely specified by these concepts. All kinds
of complex interactions, in addition to those specified by the transitional probabilities of the word-set, may modify the effects upon the subject of each contextual
word. Indeed, one of the purposes of the experiments is to ascertain the extent to
which a few simple conditions, such as those defined above, can determine the
probability of a given associative response. The elucidation of complex interactions
between the effects of different elements of a word-set is a natural corollary of such
an experimental program.
All measurements necessary to specify the independent variables of these experiments could be obtained from extensive tables of first-order associative and transitional probabilities analogous to those published for the former by Kent and Rosanoff and others.³ Sufficiently extensive tables, unfortunately, exist in neither case, so that it has been necessary to make subjective estimations of these probabilities. Little validity can be imputed to subjective estimations of fine differences in associative or transitional probabilities; the experimental designs have therefore been simplified to require only these very simple judgments: (1) that a given word is a highly probable association, in the conventional word-association test, to a second given word; (2) that a given word is a highly improbable association to a second given word; and (3) that a given sequence of words will occur only very rarely (if at all) in ordinary continuous discourse, and hence has very low transitional probabilities. The design of the experiments makes it necessary that the first two judgments be valid only relative to each other; thus only two subjective judgments are really required for the experiments. Selection of sequences with low transitional probabilities can be facilitated by eliminating from the word-sets all connective words (articles, conjunctions, prepositions, etc.), since sequences of high transitional probability usually include one or more of these words.

³ Kent and Rosanoff, op. cit.; J. O'Connor, Born That Way, 1928; Herbert Woodrow and F. Lowell, Children's association frequency tables, Psychol. Monog., 22, 1916, No. 97, 1-110.
While no validity studies specific to these judgments have been conducted, the ability of subjects to estimate the relative probabilities of occurrence of words suggests that such simple judgments can be carried out with small error.⁴ Such estimates, of course, do not possess the objectivity of measurements that are independent of the experimenter's judgment, and the results to be presented below must be interpreted with that limitation in mind. At the same time, it should be emphasized that the difficulties are not inherent in the conceptual design of the investigation, but are artifacts of a temporary lack of empirical information. When more extensive tables of associative and transitional frequencies become available, the present experiments can be replaced by more rigorous ones.
Subjects and instructions. Approximately 200 men, students in the introductory
psychology course in Yale College, served as Ss in these experiments. They were
divided into four groups,⁵ each of which was given the same test-words but different
contexts. The following instructions were read to all the groups.
This is an experiment on what is known as 'free association.' When a person
calls out a word and you say (or write) the first word that comes into your head,
that is a free association. Thus, for example, if I should say tree you might immediately think trunk or leaf or green or sleep, or any word whatsoever. The present
experiment differs from this simple type in one respect: I will read four words
slowly and you are to listen carefully to all of them, but you are to free-associate
to the last word only. Thus if I should say 'toy, come, wretched, book,' you would
then write down the first word that book made you think of. This is important:
listen to each word carefully, but respond only to the last one. Be sure to get the
first association down. There will be 50 such word-sets altogether. I will call out
the number of the set before I read off the words. Although the test-words will all
be meaningful, the other three will sometimes be nonsense words or numbers. Try
to avoid responding with one of the words in the set, but write it down if it comes
up strongly.
Before the experiment was begun the main rules were repeated: "(1) Listen to
all four words carefully. (2) Free-associate to the last word only. (3) Put down the
first word that comes to mind; write it immediately. (4) Try to avoid writing down
one of the words used in the word set." To set the test-word off from the context,
E paused slightly between reading the last word of the context and the test-word
and enunciated the latter with an intonation of finality (the tacit period of a spoken
sentence). Each word-set required about 5 sec. to read. The same E read the instructions and word-sets to all groups of Ss.
Materials. The first 5 word-sets read to each group were given to show the Ss the procedure. Of the remaining 45 sets, 10 were devoted to Experiment 1 (interposition), 10 to Experiment 2 (density), and 8 or 7, depending on the group, to Experiment 3 (frequency of occurrence). The remaining word-sets concerned variables not reported in this paper (the word-sets for these experiments appeared in random order throughout the list of 50).

⁴ D. Howes, The definition and measurement of word probability, Doctoral Dissertation, Harvard University, 1950, 33-36.
⁵ Scheduling difficulties resulted in a slight imbalance among the groups. The actual number in each group varied from 46 to 57.
Since the Ss were instructed to avoid using contextual words as associations, it was necessary to exclude from each context any word likely to occur as an associate to the corresponding test-word. To ensure this condition, all test-words were taken from the list of 100 stimulus-words for which Kent and Rosanoff published 1000 associations, and no word was admitted to a context if in those tables it appeared as an association to the test-word with a relative frequency greater than 0.005.
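The admission rule for contextual words amounts to a threshold on a table lookup. The sketch below assumes the relevant Kent-Rosanoff column is available as a dictionary from association to relative frequency; the function name and all frequencies in the example are invented for illustration, not taken from the published tables.

```python
def admissible_context_words(candidates, assoc_freq, max_freq=0.005):
    """Return the candidates that may appear in a context for one test-word.

    assoc_freq maps each association to its relative frequency as a response
    to the test-word (a stand-in for one Kent-Rosanoff table). A word is
    rejected if that frequency exceeds max_freq; words absent from the table
    count as 0.0.
    """
    return [w for w in candidates if assoc_freq.get(w, 0.0) <= max_freq]


# Invented frequencies for a single test-word.
table = {"smooth": 0.30, "utter": 0.005}
print(admissible_context_words(["smooth", "hour", "utter"], table))
# -> ['hour', 'utter']  (0.005 is not "greater than 0.005", so utter is kept)
```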
Treatment of data. The dependent variable in these experiments, $P_w(i)$,⁶ is the proportion of Ss who respond to a word-set with a particular association, $i$. The associations to be so measured must be selected carefully for the probability with which they occur as first-order associations to the test-word. For if $p_{j_t}(i)$ were very high, no context could effect a considerable increase in the probability of $i$, which must always be less than unity; while too low a value of $p_{j_t}(i)$ would be impossible to measure accurately with groups of only 50 subjects. If a cluster of associations, all having similar first-order associative relations to the contextual words, is taken as the dependent event $i$, the first-order associative probability of each response-word can be small while the probability of the whole cluster can be large enough to present no serious difficulty of measurement in samples of 50. This solution has been followed in the present experiments. For the word-set devil, fearful, sinister, dark, for example (a set, used in the density experiment, in which each contextual word should have a strong associative effect upon the response), the frequency of occurrence of $i$ was tabulated for the associative cluster (bad, evil, fear, fright, frightening, gloomy, hell, mysterious, scared, scary).
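Under these definitions $P_w(i)$ is simply the proportion of Ss whose single written association falls anywhere in the cluster, and the context effect described earlier is its difference from the value after the control word-set. A minimal sketch, assuming responses are available as lists of strings; the function names and the ten responses per list are hypothetical, while the cluster is the one given above.

```python
def cluster_probability(responses, cluster):
    """P_w(i): proportion of responses that belong to the associative cluster i."""
    if not responses:
        raise ValueError("at least one response is required")
    members = {word.lower() for word in cluster}
    hits = sum(1 for r in responses if r.strip().lower() in members)
    return hits / len(responses)


def context_effect(experimental, control, cluster):
    """Difference between P_w(i) after the experimental word-set and after the
    control word-set whose context is nonsense words or numbers."""
    return cluster_probability(experimental, cluster) - cluster_probability(control, cluster)


DARK_CLUSTER = ["bad", "evil", "fear", "fright", "frightening",
                "gloomy", "hell", "mysterious", "scared", "scary"]
# Hypothetical responses to "devil, fearful, sinister, dark" and to "429, 124, 713, dark".
experimental = ["evil", "night", "Hell", "light", "scary",
                "light", "gloomy", "night", "black", "fear"]
control = ["light", "night", "light", "black", "evil",
           "night", "room", "light", "night", "black"]
print(cluster_probability(experimental, DARK_CLUSTER))  # 0.5
print(cluster_probability(control, DARK_CLUSTER))       # 0.1
print(round(context_effect(experimental, control, DARK_CLUSTER), 3))  # 0.4
```

Pooling the ten words into one event is what lets a group of about 50 Ss register a measurable proportion even though each word alone is a rare response.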
The chief danger in specifying these clusters is that personal idiosyncrasies in the meanings of words will influence the selection. Several precautions were observed in order to minimize this effect: (1) each E examined all 200 associations made to a given test-word without distinguishing among those made under different experimental conditions; (2) each E excluded any word about which he felt doubtful (i.e. with regard to its associative relations to the contextual words); (3) separate lists of associative clusters were drawn up by the two Es, and no important differences appear when results are calculated separately for each list; (4) a final list was prepared, including only words listed by both Es, in which a still stricter criterion for selection was adopted. Calculations given below are for this final list. As additional assurance against the problem posed by associations of high probability, any word occurring as an association to the test-word with a relative frequency of 0.02 or higher in the Kent-Rosanoff tables, or of over 0.04 to the control word-sets with nonsense-word contexts, was excluded from the clusters.
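This last safeguard is again a pair of thresholds. Assuming, hypothetically, that each candidate's relative frequency as a response to the test-word alone (Kent-Rosanoff) and after the control word-set are held in dictionaries, the screening could be written as below; the names and numbers are illustrative only.

```python
def screen_cluster(candidates, kr_freq, control_freq,
                   kr_limit=0.02, control_limit=0.04):
    """Remove candidate cluster words that are already likely responses to the
    test-word by itself.

    A word is dropped if its Kent-Rosanoff-style relative frequency is
    kr_limit or higher, or if its relative frequency after the control
    word-set is over control_limit. Missing entries count as 0.0.
    """
    return [
        w for w in candidates
        if kr_freq.get(w, 0.0) < kr_limit and control_freq.get(w, 0.0) <= control_limit
    ]


# Invented frequencies for illustration.
kr = {"night": 0.30, "evil": 0.01, "gloomy": 0.02}
ctrl = {"hell": 0.06, "fear": 0.04}
print(screen_cluster(["night", "evil", "gloomy", "hell", "fear"], kr, ctrl))
# -> ['evil', 'fear']
```

The asymmetric comparisons (`<` for the first limit, `<=` for the second) mirror the wording "0.02 or higher" and "over 0.04".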
While it is unfortunate that these subjective estimations had to be made, it should be remembered that the discrimination that is basic to all these approximations is simply the ability to select one set of words that occur as associations to a given stimulus-word much more frequently than to a second set of words. This discrimination is made especially easy in the present study by the fact that each associative cluster need include only a few words. Thus any word about which there was doubt could be eliminated. As will be seen, the experimental design itself provides a further safeguard: any word selected for an associative cluster is counted in all experimental groups, including the control, and poor selection would tend to minimize effects of the experimental variables rather than to exaggerate them.

⁶ The subscript w appears here as a general symbol for word-set.
Results: Experiment 1: Interposition. It can be expected that an increase in the number of stimulus-words, $j_1, j_2, \ldots$, that are presented between a given contextual word $j_a$ and the moment of the S's response will tend to decrease the effect of $j_a$ upon the associative response. The present experiment explores this relation.
The experiment involves measurement of the probability of a given associative cluster $i$ under three conditions: one in which the experimental word $j_a$ is the first word of the context (hence farthest removed from $j_t$); a second in which it is the second word of the context; and a third in which it is the last word in the context (hence nearest to $j_t$). The remaining words in the context are neutral with respect to $i$. A control context made up of 3-place numbers or nonsense words of two or three syllables must be added. The four word-sets of this experiment can thus be represented:

Condition I, $j_a, j_1, j_2, j_t$;
Condition II, $j_2, j_a, j_1, j_t$;
Condition III, $j_1, j_2, j_a, j_t$;
Condition IV, $j'_1, j'_2, j'_3, j_t$.   [1]
For example, let $j_a$ = skin, $j_t$ = rough, and consider the association hands. The word-sets read to the Ss might then be: I, skin, hour, utter, rough; II, hour, skin, utter, rough; III, utter, hour, skin, rough; IV, 318, hokiba, rafuny, rough. It is assumed that neither hour nor utter has an appreciable effect upon the occurrence of hands as an association.

The amount of data this experiment generates can be tripled by selecting second and third associations different from the first one. Representing by $i_\alpha$ the first association, which is a strong first-order associate of $j_a$, we can write the two other associations $i_\beta$ and $i_\gamma$. We now select a word that has a strong first-order associative effect on $i_\beta$ but negligible first-order associative effects on $i_\alpha$ and $i_\gamma$. This word, which can be represented by $j_b$, can be used in place of the neutral word $j_1$ of the contexts, and thus its separation from $j_t$ will be varied just as the separation between $j_a$ and $j_t$ is varied. A third word, $j_c$, can be chosen in like manner for strong first-order associative effect on $i_\gamma$ and negligible effects upon $i_\alpha$ and $i_\beta$. It can replace $j_2$ in the word-sets of Equation [1]. We can then rewrite the first three word-sets of the experiment:
Condition I, $j_a, j_b, j_c, j_t$;
Condition II, $j_c, j_a, j_b, j_t$;
Condition III, $j_b, j_c, j_a, j_t$,   [1a]
where it is assumed that $p_{j_a}(i_\alpha)$, $p_{j_b}(i_\beta)$, and $p_{j_c}(i_\gamma)$ are large relative to the remaining associative probabilities, $p_{j_a}(i_\beta)$, $p_{j_a}(i_\gamma)$, $p_{j_b}(i_\alpha)$, $p_{j_b}(i_\gamma)$, $p_{j_c}(i_\alpha)$, and $p_{j_c}(i_\beta)$. For illustration, let $j_t$ = rough, $i_\alpha$ = hands, $i_\beta$ = storm, and $i_\gamma$ = rocky, and let the three contexts of [1a] be: I, skin, wind, mountain; II, mountain, skin, wind; III, wind, mountain, skin. We then determine the combined probability of hands following both skin and rough when zero, one, or two neutral contextual words are interposed between them, just as before. In addition, the same calculations can be made for storm following wind and rough and for rocky following mountain and rough. Thus a single word-set provides three determinations of the function.
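The counterbalancing in [1a] can be checked mechanically. The sketch below uses symbolic labels only (no experimental data; the names `DESIGN_1A`, `TARGET`, and `interposed_counts` are hypothetical) and counts, for each association, how many contextual words separate its experimental word from the test-word in each condition.

```python
# Design [1a]: contextual words in the order read; the test-word j_t follows.
DESIGN_1A = {
    "I": ["j_a", "j_b", "j_c"],
    "II": ["j_c", "j_a", "j_b"],
    "III": ["j_b", "j_c", "j_a"],
}
# Association that each experimental word is chosen to evoke strongly.
TARGET = {"j_a": "i_alpha", "j_b": "i_beta", "j_c": "i_gamma"}


def interposed_counts(design, target):
    """Map each association to {condition: m}, where m is the number of
    contextual words between its experimental word and the test-word."""
    counts = {}
    for condition, context in design.items():
        for position, word in enumerate(context):
            m = len(context) - 1 - position  # words read after it, before j_t
            counts.setdefault(target[word], {})[condition] = m
    return counts


for assoc, by_condition in interposed_counts(DESIGN_1A, TARGET).items():
    print(assoc, by_condition)
# i_alpha {'I': 2, 'II': 1, 'III': 0}
# i_beta {'I': 1, 'II': 0, 'III': 2}
# i_gamma {'I': 0, 'II': 2, 'III': 1}
```

Every association meets m = 0, 1, and 2 exactly once across the three conditions, which is why 10 word-sets yield 30 determinations at each value of m.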
Fig. 1 shows the probability of associative clusters, $P_w(i)$, as a function of the number of interposed neutral words. Thirty determinations were made for each number of interposed words, three for each of the 10 word-sets. Thus $P_w(i_\alpha)$ is plotted opposite the abscissal number zero for Condition III, in which $j_a$ is the last contextual word, and opposite the number two for Condition I, in which $j_a$ is the first contextual word and separated from $j_t$ by two words neutral with respect to $i$. Mean values of $P_w(i)$ are indicated by the circles and solid line, quartiles by triangles and dashed lines. At the right of the graph, opposite the abscissal point labeled C, are shown the results for the control condition.

[Figure: FIG. 1. THE PROBABILITY OF AN ASSOCIATIVE CLUSTER, $P_w(i)$, AS A FUNCTION OF m, THE NUMBER OF NEUTRAL WORDS INTERPOSED BETWEEN THE EXPERIMENTAL WORD AND THE TEST-WORD. Ordinate: $P_w(i)$, .00 to .20; abscissa: m = 0, 1, 2, and C (control). Circles and solid line represent means of 30 determinations; triangles and broken lines represent quartiles. Results for the control condition are shown above the letter C.]
These data show that a contextual word has its greatest effect upon the association when it occurs immediately prior to the test-word. The word's effect is considerably diminished by introducing another word between it and the test-word. Interposition of two rather than one neutral word results in no appreciable further decrease in the effect of the experimental word. In each position the contextual words have a greater effect than have the control contexts. These trends are equally clear for means and for both quartiles. Statistical analysis, the results of which are presented in Table I, bears out these conclusions.
Ideally, this experiment would consist of a very long context in which the associative relations that define the experiment were preserved. Then the dissipation of a contextual word's effect on the probability of an associative cluster could be described as a function of the number of interposed neutral words, with the total number of words in the context as a parameter. For very large numbers of interposed words the effect of a contextual word should become negligible, and $P_w(i)$ should therefore approach its value in the control condition. In Fig. 1, however, the tendency is for $P_w(i)$ to approach a value higher than that found in the control condition. In our opinion, this is not merely an indication of inaccuracy of the data, but results from the reinforcement of the first word of a context by an additional factor. This reinforcement can be thought of as a consequence of greater attention paid to the first word of a sequence, or as a result of the fact that the first contextual word is the only one that is free of the competitive tendencies aroused by a prior word. Another possibility is that the transitional effects of the contextual words upon each other are not as negligible as assumed. The problem is amenable to straightforward experimental investigation.

TABLE I
SIGNIFICANCE OF DIFFERENCES IN EXPERIMENT 1
Values of t (df = 29), with corresponding p values, for differences in $P_w(i)$ when various numbers of neutral words are interposed between the experimental word and the test-word

Number of interposed words | 1 | 2 | Control
0 | t = 3.48, p = .01 | t = 2.94, p = .01 | t = 4.47, p = .01
1 | | t = 0.12, p = .9 | t = 2.67, p = .02
2 | | | t = 2.46, p = .02
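Table I gives 29 df for 30 determinations, which is what a t-test on paired differences would yield; the paper does not name the procedure, so the following is only a sketch under that assumption. Each pair would be one determination (one association in one word-set) measured under two conditions; the function name and the data in the example are invented.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for the mean of the paired differences x[k] - y[k]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))  # sample SD / sqrt(n)
    if standard_error == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / standard_error, len(diffs) - 1


# Invented P_w(i) values for four determinations under two conditions.
m0 = [0.30, 0.25, 0.20, 0.35]
control = [0.10, 0.15, 0.10, 0.20]
t, df = paired_t(m0, control)
print(round(t, 2), df)  # 5.74 3
```

A p value would then be read from the t distribution with `df` degrees of freedom, for example with `scipy.stats.t.sf`, or obtained directly from `scipy.stats.ttest_rel`, which computes the same statistic.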
Experiment 2: Density. For this experiment we take three words having strong first-order associative effects upon $i$ (experimental words) and compare the probability of $i$ following a context including only one of them with its probability following two of them or three of them.

The word-sets necessary for this experiment may be specified as follows:

Condition I, $j_a, j_1, j_2, j_t$;
Condition II, $j_a, j_b, j_2, j_t$;
Condition III, $j_a, j_b, j_c, j_t$;
Condition IV, $j'_1, j'_2, j'_3, j_t$,

where $j_a$, $j_b$, $j_c$ are contextual words presumed to have strong first-order associative effects upon the same associative cluster, $i$, and $j_1$ and $j_2$ are words presumed to have negligible first-order effects upon $i$. Word-set IV refers to the control condition. An illustration can be provided by the word-sets: I, devil, eat, basic, dark; II, devil, fearful, basic, dark; III, devil, fearful, sinister, dark; and IV, 429, 124, 713, dark, where the probability of the associate hell is to be measured. On the average, the three experimental words can be considered to have approximately equal first-order associative effects upon $i$, for the contextual words were assigned at random to $j_a$, $j_b$, and $j_c$ in the various word-sets.
The probability of $i$ following word-sets containing one, two, and three experimental words appears in Fig. 2. As in Fig. 1, circles and solid lines represent means, while triangles and dashed lines represent quartiles. Each point is based on 10 measurements, one for each word-set. Results for the control condition are shown opposite the letter C. The probability of an associative cluster is seen to be an increasing function of the number of contextual words having strong first-order associative effects upon that cluster. Statistical analysis, summarized in Table II, indicates that none of the differences in Fig. 2 would be expected to occur by random sampling as often as one time in a hundred.

[Figure: FIG. 2. THE PROBABILITY OF AN ASSOCIATIVE CLUSTER, $P_w(i)$, AS A FUNCTION OF n, THE NUMBER OF CONTEXTUAL WORDS HAVING STRONG FIRST-ORDER ASSOCIATIVE EFFECTS ON $i$. Ordinate: $P_w(i)$, .00 to .40; abscissa: n = 1, 2, 3, and C (control). Circles and solid line represent means of 10 determinations; triangles and broken lines, quartiles. Results for the control condition are shown above the letter C.]
In these raw data the effects of contextual words, which constitute the independent variable, are confounded with those of the test-word. These effects must be separated, for test-word and contextual words cannot be treated uniformly in view of the emphasis placed upon the former by the instructions. It has been suggested previously that the effect of a context can be measured by the difference between the probability of an associative cluster following an experimental word-set and either (a) its probability following the corresponding control context or (b) its first-order associative probability following the test-word (i.e. its Kent-Rosanoff frequency). Either of these procedures tacitly assumes that the associative effects of context and test-word are algebraically additive. Since the effects of the independent variable can be measured by the present technique only if the contribution of the test-word can be extracted, an assumption of this type is indispensable.

TABLE II
SIGNIFICANCE OF DIFFERENCES IN EXPERIMENT 2
Values of t (df = 9), with corresponding p values, for differences in $P_w(i)$ when different numbers of words with strong first-order associative effects appear in the context

Number of contextual words | 2 | 3 | Control
1 | t = 2.97, p = .02 | t = 5.49, p = .01 | t = 5.66, p = .01
2 | | t = 3.35, p = .01 | t = 4.73, p = .01
3 | | | t = 6.58, p = .01
A test of the additivity assumption can be obtained from the data presented in Fig. 2. We assume that the first-order associative effects of each member of a word-set are additive. Then the difference $\Delta P_w(i)$ between $P_w(i)$ following word-sets I, II, or III and $P_w(i)$ following the corresponding control word-set (IV) should be directly proportional to n, the number of contextual words having strong associative effects upon $i$. Taking the interposition variable of Experiment 1 into account, the theoretical value of $\Delta P_w(i)$ for a context of Experiment 2 is:

$$\Delta P_w(i) = \sum_r K_r \, p_{j_r}(i), \qquad [3]$$

where $K_r$ is a weighting factor, evaluated from the results of Experiment 1, which depends upon the number of words interposed between $j_r$ and $j_t$.⁷ These weights, obtained by subtracting the probabilities of the associative clusters under control conditions of Experiment 1 from their probabilities under the corresponding experimental conditions, are 0.025, 0.028, and 0.071 for, respectively, 2, 1, and 0 interposed neutral words. Thus the following theoretical values of $\Delta P_w(i)$ for Experiment 2 are given by Equation [3]: Condition I (one experimental word), 0.025; Condition II (two experimental words), 0.053; Condition III (three experimental words), 0.124. As the design of Experiment 1 required three associative clusters for each word-set while Experiment 2 required only one, the associative clusters of the latter tend to be somewhat larger. This difference in size of associative clusters can be corrected by multiplying the theoretical values by a constant which equates theoretical and experimental values for any one of the experimental conditions. Adjusted in this manner for Condition I, in which only one contextual word has a strong associative effect, the theoretical means for Experiment 2 are 0.050, 0.106, and 0.248, compared with the experimentally obtained means (data from Fig. 2) of 0.050, 0.137, and 0.245. For neither the two- nor the three-word experimental conditions does the theoretical value represent a significant departure from the experimentally obtained mean (ts are, respectively, 1.03 and 0.08 with 9 df.).
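The first comparison reduces to simple arithmetic once the three experimental words are treated, as the text does, as having equal first-order effects. Assuming that reading, Equation [3] becomes a sum of the K weights for the positions the experimental words occupy, with the common factor $p_{j_r}(i)$ absorbed into the rescaling constant. The sketch below uses hypothetical names; the weights and the observed Condition I mean of 0.050 are the values reported above.

```python
# K weights from Experiment 1, keyed by the number of words interposed between
# an experimental word and the test-word (values reported in the text).
K = {2: 0.025, 1: 0.028, 0: 0.071}


def predicted_delta(n_experimental, context_length=3):
    """Unscaled Delta P_w(i) for the density design, in which the n experimental
    words occupy the first n positions of the context. Assumes equal
    first-order effects, so each word contributes only its weight K."""
    total = 0.0
    for position in range(n_experimental):
        interposed = context_length - 1 - position  # words between it and j_t
        total += K[interposed]
    return total


raw = [predicted_delta(n) for n in (1, 2, 3)]
# Rescale so Condition I matches its observed mean (0.050), correcting for the
# larger clusters of Experiment 2.
scale = 0.050 / raw[0]
adjusted = [round(scale * v, 3) for v in raw]
print([round(v, 3) for v in raw])  # [0.025, 0.053, 0.124]
print(adjusted)                    # [0.05, 0.106, 0.248]
```

The rescaled values reproduce the theoretical means quoted above; whether they fit the observed 0.137 and 0.245 is then a matter for the t-tests reported in the text. As the authors note, applying weights derived from neutral interposed words to interposed words that are not neutral is itself an assumption.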
This comparison is based upon the assumption that the probability of the associative cluster unaffected by contextual words can be measured by the probability of the cluster following the control context. There is some reason to believe that this method may give too high a value, however, for in a few cases the numbers of control contexts probably had an appreciable associative effect. A second comparison of theoretical and observed values for Experiment 2 was therefore made, using the relative frequency of the words of the associative clusters in the Kent-Rosanoff tables in place of the control-condition values of $P_w(i)$. Computed by this method, the theoretical means are 0.056, 0.116, and 0.228 and the empirical means are 0.056, 0.143, and 0.251. Again the t's fail to approach statistical significance for the two- and the three-word conditions.
The assumption that associative effects are algebraically additive is thus consistent with the present data. It can be expected to hold, however, only within the limitations of two defining conditions of these experiments: (1) that the transitional probabilities of the contextual words are negligible; and (2) that the component associative effects are not too large. The latter restriction is imposed by the logical requirement that the sum of the component associative effects be less than unity. As for the former restriction, a word-set consisting of a very familiar sequence (e.g. "Mary had a little") would almost certainly lead to a disproportionate number of associations of the completion or speech-habit type (e.g. lamb).8 It is also probable that the associative effects of the component words, or their weights in influencing the response, would be modified by the transitional probabilities (cf. Experiment 3 below). Even thus qualified, the assumption of additivity should be accepted only with considerable caution, since the possibility remains that even within the present defining conditions some word-sets can be found that will yield results incompatible with the assumption.

7 This assumes that the results of Experiment 1, in which the interposed words were neutral with respect to the measured associative cluster, hold also for the interposition of words having strong associative effects on it.
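The additivity assumption and its "not too large" restriction can be written as a small guard. This sketch assumes a baseline probability for the cluster plus a list of component effects, which may be negative since the effects add algebraically; the function name and the example numbers are hypothetical.

```python
def additive_probability(baseline, effects):
    """Predict the probability of an associative cluster after a word-set,
    assuming the component associative effects add algebraically to a baseline.

    Enforces the restriction stated in the text: the summed component effects
    must stay below unity, and the prediction must remain a probability.
    """
    total = sum(effects)
    if total >= 1.0:
        raise ValueError(f"summed effects {total:.3f} are not below unity")
    predicted = baseline + total
    if not 0.0 <= predicted <= 1.0:
        raise ValueError(f"prediction {predicted:.3f} is outside [0, 1]")
    return predicted


# Illustrative numbers only: a small baseline and two moderate effects.
print(round(additive_probability(0.03, [0.025, 0.028]), 3))  # 0.083

try:
    additive_probability(0.03, [0.6, 0.5])  # effects too large for additivity
except ValueError as err:
    print("rejected:", err)
```

The guard says nothing about the other defining condition (negligible transitional probabilities); that one concerns how the word-sets are chosen, not the arithmetic.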
Experiment 3: Frequency. The extent to which the associative response to a word-set is determined by the first-order associative effects of a particular contextual word may be expected to depend upon how familiar S is with that word.
In this experiment we compare the effects of two contextual words, $j_a$ and $j_b$, for which $p(j_a) > p(j_b)$, where $p(j)$ is the probability of occurrence of $j$ in a general sample of the language behavior of the population under study. The further condition is imposed that the two words have approximately equal first-order associative effects upon the associative cluster, i.e. that $p_{j_a}(i) \approx p_{j_b}(i)$. In lieu of the tables of first-order associative probabilities called for by this condition, a subjective approximation was attempted by selecting pairs of words that are closely synonymous, as judged by the experimenters and corroborated by a thesaurus.9 The following word-sets then define the experiment:

$$\begin{aligned}
&\text{Condition I:} && j_1,\ j_2,\ j_a,\ j_t;\\
&\text{Condition II:} && j_1,\ j_2,\ j_b,\ j_t;\\
&\text{Condition III:} && j'_1,\ j'_2,\ j'_3,\ j_t.
\end{aligned} \qquad [4]$$

For an example, let the synonyms praise and panegyric be the contextual words $j_a$ and $j_b$, and let glory be the association $i$ made to the test-word soldier.
The mean probabilities of the 10 associative clusters obtained under each of the three conditions of this experiment are as follows: control (Condition III), 0.032; infrequent-word (Condition II), 0.035; frequent-word (Condition I), 0.075. The differences are statistically reliable between Conditions I and II (t = 2.64, p < 0.05) and between Conditions I and III (t = 3.46, p < 0.01). The difference between Conditions II and III is not significant.

8 For the meaning of these classifications cf. R. S. Woodworth, Experimental Psychology, 1938, 350-352.
9 C. O. S. Mawson, Roget's Thesaurus of the English Language in Dictionary Form, 1940.
These results indicate that the frequency with which a contextual word occurs in the general language behavior of a population can be regarded as a factor weighting that word's contribution to the associative response. The insignificance of the difference between the control and infrequent-word conditions then implies that the average weight of the infrequent words in this experiment was so low that their effective contribution to the response was little greater than that of neutral items.
Next let us express the contextual word's weight in determining the associative response as a function of its frequency of occurrence. On the assumption that associative effects of contextual words are algebraically additive, the desired function is given by $f$ in

$$\Delta P_w(i) = f[p(j)]. \qquad [5]$$
As in Experiment2, the quantityAPw(i) representsthe differencebetween
the probabilityof associativecluster i following an experimentalwordset (I or II in this experiment) and its probabilityfollowing the control
word-set (III). The term p(j) has been defined as the probabilityof
occurrenceof word j in a general sample of the languagebehaviorof the
population under study-American college students in this case. The
frequenciesof words in the Lorge MagazineCount and Thorndike-Lorge
SemanticCount, which correlatehighly with college students' ratings of
the frequencywith which they use words, can be used to measurep(j).10
Taken together, these counts give the number of times that a word
appearedin highly varied samples of written language behaviortotalling
over nine million words.
Fig. 3 presents the data. The abscissa, graphed logarithmically to conserve space, gives the Thorndike-Lorge frequency of each experimental word, $j_a$ or $j_b$, and the ordinate gives the difference $\Delta P_w(i)$ between the probability of associative cluster $i$ following a word-set containing the experimental word and the probability of $i$ in the control condition. Correlation coefficients computed for $\Delta P_w(i)$ as a function of $\log p(j)$ are significantly greater than zero ($\eta = 0.88$, $r = +0.77$); the difference between $\eta$ and $r$ is not sufficient to warrant rejection of the hypothesis that the function is rectilinear in $\log p(j)$ ($F = 1.91$; df = 5, 13; $p > 0.10$).11 A rough estimate of the reliability of the measurements of $\log p(j)$ is given by the correlation between the frequencies of words in the Semantic Count and their frequencies in the Magazine Count ($r = +0.80$).12 Since prediction of $\Delta P_w(i)$ from the frequency of occurrence of contextual words is of the same order of accuracy as prediction of word frequencies from one sample to another, a causal relation between the two variables is indicated.

[Figure 3 (plot): ordinate $\Delta P_w(i)$, from −.02 to .10; abscissa $\log p(j)$, up to 3.0.]
FIG. 3. THE WEIGHTED ASSOCIATIVE EFFECT OF A CONTEXTUAL WORD AS A FUNCTION OF ITS PROBABILITY OF OCCURRENCE
The abscissa shows the Thorndike-Lorge frequency of the experimental word. The ordinate gives the difference between the probability of an associative cluster following a context including the experimental word and its probability following a control context.

10 E. L. Thorndike and I. Lorge, The Teacher's Word Book of 30,000 Words, 1944; Howes, loc. cit.
11 The small number of Ss made it necessary to estimate $P_w(i)$ for each word-set in relative-frequency units of 0.02. This is too coarse a step-interval to permit accurate estimation of the small values of $P_w(i)$ that obtain for the control word-sets. In computing $\Delta P_w(i)$ for Fig. 3, therefore, the mean probability of $i$ over all 10 of the control word-sets has been used. This is permissible because all associative clusters and contexts for the control condition were subjected to the same selection procedures. If each value of $\Delta P_w(i)$ is recomputed using the control frequency for each individual word-set, $\eta$ and $r$ are reduced to 0.71 and +0.64, respectively.
12 The four rarest words could not be included in these calculations since their
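The linearity test reported above compares the correlation ratio η with r. A Python sketch of the usual F ratio for departure from linearity of regression follows. It assumes the values of log p(j) were grouped into k classes; the reported df = 5, 13 then imply k = 7 and N = 20, which is our inference (consistent with 10 clusters times 2 experimental words), not something the text states. Because η and r are given only to two decimals, the sketch also brackets F over their rounding intervals.

```python
def linearity_f(eta, r, k, n):
    """F ratio testing departure from a straight-line regression.

    eta: correlation ratio of y on x, with x grouped into k classes
    r:   product-moment correlation of y with x
    n:   number of observations; degrees of freedom are (k - 2, n - k)
    """
    if not 0.0 <= abs(r) <= eta < 1.0:
        raise ValueError("expected 0 <= |r| <= eta < 1")
    curvilinear = (eta ** 2 - r ** 2) / (k - 2)  # variance explained beyond the line
    residual = (1.0 - eta ** 2) / (n - k)        # variance unexplained by class means
    return curvilinear / residual


K, N = 7, 20  # inferred from df = (5, 13)
f_point = linearity_f(0.88, 0.77, K, N)
# Extremes allowed by two-decimal rounding of eta and r.
f_low = linearity_f(0.875, 0.775, K, N)
f_high = linearity_f(0.885, 0.765, K, N)
print(f"F = {f_point:.2f} (rounding range {f_low:.2f} to {f_high:.2f})")
```

With the rounded coefficients the ratio is about 2.09, and the rounding range runs from about 1.83 to 2.37, so the published F = 1.91 is consistent with the reported η and r.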
Some of our Ss may not have known the meanings of all of the rare contextual
words. The first-order associative effects of these words could not be expected to
resemble those of their corresponding frequent contextual words. Hence the infrequent contextual words would function as neutral words, and the infrequent-word contexts would be constituted essentially like control contexts. This possibility
offers an alternative interpretation of the fact that the results for the infrequent-word contexts do not differ reliably from those for the control contexts. Although it
is impossible to discover directly the extent to which this factor affected the results,
it seems unlikely that more than a small proportion of the Ss employed were unfamiliar with most of the infrequent words used here. The function of Fig. 3 also
gives no evidence of a discontinuity such as might be expected to result from an
artifact of that type. Moreover, the function approaches zero for fairly well-known
words (e.g. astringent, delectable): thus words of even greater rarity would also
be expected to have zero weights, in which case it would make no difference to the
experimental results whether the subjects understood the word or not.
Independence of neutral contextual words. In all of these experiments it has been assumed that the words we have selected as neutral contextual words (represented by numerical subscripts) have no appreciable effect upon the probability of the associative clusters. Precautions taken to assure satisfactory selection of these neutral words have already been explained, but it is desirable to have an empirical check.
The data of Experiment 3 can be used for this purpose. In Conditions I and II of that experiment we have two word-sets that are identical save for their third contextual words. A new associative cluster $u$ can then be so chosen that (1) each of the first two contextual words has a strong associative effect upon $u$, and (2) the words in the third contextual positions are neutral with respect to $u$. These third contextual words in Experiment 3 are synonyms with comparable first-order associative effects upon $u$, but the third word of Condition I is a much more frequent word than that of Condition II. Now let us suppose that the assumption that these words are neutral with respect to cluster $u$ is false. This will mean that the third word of each word-set will contribute to the probability of $u$. By Experiment 3, the contribution of a frequent word to the associative response must be weighted much more heavily than that of a rare word. The third contextual word of Condition I will thus increase the probability of $u$ more than will the third contextual word of Condition II, and therefore the measured value $P_w(u)$ will, on the average, be larger for Condition I. The results, however, show a small difference in the opposite direction: the mean probability of clusters $u$ is 0.100 following Condition I (frequent-word contexts) and 0.117 following Condition II (infrequent-word contexts). A difference of this size would be expected more than 3 out of 10 times by random sampling (t = 0.98, df = 9). Consequently the null hypothesis, that words selected to be neutral have in fact no effect upon an associative cluster, should not be rejected.

frequencies in the Magazine and Semantic counts are not distinguished in the published Thorndike-Lorge tables; but as the scatter-diagram of the data indicates that the reliability of infrequent words is as low as, or lower than, that of frequent words, the present argument is not invalidated by their omission.
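The phrase "more than 3 out of 10 times" in the neutral-word check is a two-tailed p-value for t = 0.98 with 9 df. A standard-library Python sketch that integrates Student's t density with Simpson's rule can confirm it. Applying the same function to the t-values reported for Experiment 3 (2.64 and 3.46) assumes 9 df there as well, i.e. 10 clusters; the text does not state that df.

```python
import math


def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """P(|T| >= |t|), from Simpson's rule over the central interval [0, |t|]."""
    b = abs(t)
    if b == 0.0:
        return 1.0
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_density(0.0, df) + t_density(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    central = total * h / 3.0  # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central


print(f"neutral-word check, t = 0.98: p = {two_tailed_p(0.98, 9):.2f}")
print(f"Conditions I vs II, t = 2.64: p = {two_tailed_p(2.64, 9):.3f}")
print(f"Conditions I vs III, t = 3.46: p = {two_tailed_p(3.46, 9):.3f}")
```

The first value comes out near 0.35, above the 0.3 implied by the text; under the 9-df assumption the other two fall below 0.05 and 0.01, matching the significance levels quoted for Experiment 3. Where SciPy is available, its Student t distribution gives the same tail areas without hand-written integration.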
Discussion. The results of these experiments lend themselves to a surprisingly simple interpretation. This, however, requires a more refined definition of the associative effect of a stimulus-word. We consider that S is capable of emitting any one of a set of alternative word-responses $i_\alpha, i_\beta, \ldots, i_\nu$. Each response-word we assume to have an average probability of emission $p(i)$ independent of any stimulus-word. These values may be regarded as the relative habit strengths of the words, and presumably they are sampled by tables like those published by Thorndike and Lorge.13 The effect of a particular stimulus-word (when S is set by the instructions of the word-association test) is then to redistribute these probabilities, increasing them for some words, decreasing them for others, and leaving some practically unchanged. Thus the associative effect of a stimulus-word is properly measured by the difference $p_j(i) - p(i)$. A set of such probability changes for all possible responses, $i_\alpha, \ldots, i_\nu$, we assume to be a fixed property of the stimulus-word and the population of Ss.
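This redefinition can be made concrete with a toy vocabulary. The Python sketch below uses hypothetical probabilities (not data from the paper) for two stimulus-words and three response-words. It computes $p(i)$ as $\sum_j p(j)\,p_j(i)$, the statistic footnote 13 names as the correct one, and then the associative effect $p_j(i) - p(i)$ of each stimulus-word. Because every response distribution sums to one, each effect vector sums to zero: a stimulus-word only redistributes probability among responses.

```python
# Hypothetical toy numbers, chosen only for illustration.
P_STIMULUS = {"soldier": 0.6, "praise": 0.4}  # p(j): probability of each stimulus-word
P_GIVEN = {                                   # p_j(i): response distribution after stimulus j
    "soldier": {"glory": 0.2, "war": 0.5, "uniform": 0.3},
    "praise": {"glory": 0.7, "war": 0.1, "uniform": 0.2},
}


def baseline(p_stimulus, p_given):
    """p(i) = sum over j of p(j) * p_j(i), for every response-word i."""
    responses = next(iter(p_given.values())).keys()
    return {i: sum(p_stimulus[j] * p_given[j][i] for j in p_stimulus) for i in responses}


def associative_effect(j, p_given, base):
    """Associative effect of stimulus-word j on each response: p_j(i) - p(i)."""
    return {i: p_given[j][i] - base[i] for i in base}


base = baseline(P_STIMULUS, P_GIVEN)
for j in P_GIVEN:
    effect = associative_effect(j, P_GIVEN, base)
    print(j, {i: round(v, 2) for i, v in effect.items()},
          "sums to zero:", abs(sum(effect.values())) < 1e-12)
```

With these numbers p(glory) = 0.40, and praise raises glory by 0.30 while lowering war and uniform by 0.24 and 0.06.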
Consider now what happens when an S perceives two or more stimulus-words in the association experiment. The change in probability of one response-word relative to another is a property of each stimulus-word and cannot be changed by the fact that each stimulus-word now appears as one of a sequence. Only the extent to which a stimulus-word affects the response-mechanism as a whole (its weight in Equation [3]) can vary. Thus it is the capacity of the perceived stimulus-word to 'capture' the response-mechanism that decreases with the number of other stimulus-words interposed between it and the moment of response (Experiment 1), and that increases in approximate proportion to the logarithm of its probability of occurrence (Experiment 3).14 This accounts both for the high correlation between $\Delta P_w(i)$ and $p(j)$ found in Experiment 3, which would hardly be expected if more complicated interactions among the effects of contextual words took place, and for the additivity of associative effects found in Experiment 2.

13 The correct statistic would be $\sum_j p(j)\,p_j(i)$, the sum of the associative probabilities for all possible stimulus-words weighted according to the probability of occurrence of each stimulus-word. A preliminary comparison of the Kent-Rosanoff and Thorndike-Lorge tables indicates that the Thorndike-Lorge frequency of a word gives a good estimate of this value except for a few special classes of words.
14 In this connection it is interesting to note that the time for which a stimulus-word must be exposed tachistoscopically in order for it to be perceived can be decreased in approximate proportion to the logarithm of its probability of occurrence (cf. Howes, op. cit., esp. Ch. IV).
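The claim that each stimulus-word contributes a fixed pattern of probability changes, scaled only by its weight, suggests the following Python sketch. It assumes, for illustration only, that Equation [3] combines weighted effects as $P_w(i) = p(i) + \sum_j w_j\,[p_j(i) - p(i)]$; the baseline, effect vectors and weights are hypothetical. In the paper's terms, $w_j$ would be smaller for words farther from the response and larger for more frequent words. Because each effect vector sums to zero, the combined prediction still sums to one, and the range check mirrors the restriction that component effects not be too large.

```python
# Hypothetical baseline p(i) and effect vectors p_j(i) - p(i); each vector sums to zero.
BASE = {"glory": 0.40, "war": 0.34, "uniform": 0.26}
EFFECTS = {
    "soldier": {"glory": -0.20, "war": 0.16, "uniform": 0.04},
    "praise": {"glory": 0.30, "war": -0.24, "uniform": -0.06},
}
# Hypothetical weights: how strongly each word 'captures' the response mechanism.
WEIGHTS = {"soldier": 0.5, "praise": 0.3}


def combine(base, effects, weights):
    """Assumed form of the combination: P_w(i) = p(i) + sum_j w_j * effect_j(i)."""
    combined = {
        i: base[i] + sum(weights[j] * effects[j][i] for j in effects) for i in base
    }
    if any(not 0.0 <= v <= 1.0 for v in combined.values()):
        raise ValueError("weighted effects too large: prediction left [0, 1]")
    return combined


prediction = combine(BASE, EFFECTS, WEIGHTS)
print({i: round(v, 3) for i, v in prediction.items()})
print("total:", round(sum(prediction.values()), 6))
```

These numbers give about 0.39, 0.348 and 0.262 for glory, war and uniform, summing to 1.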
In this paper only word-sets with zero transitional probabilities have been considered. What would happen if this restriction were removed? We have already seen that, on empirical grounds, one can expect that the presence of familiar sequences in the word-sets would greatly modify the associations given. The simple model described above predicts such differences. When transitional probabilities are appreciable, the probability of occurrence of a stimulus-word, $p(j)$, depends upon the particular words that precede it. The weights of the respective contextual words in determining the responses would thus be changed greatly, yielding results for $P_w(i)$ entirely different from those calculated on the simple basis used in the present studies.
This conception of the associative process is much simpler than many views of linguistic processes would lead us to expect. It does not, for example, postulate the representational mediation-processes found necessary by one of the authors to account for many aspects of linguistic behavior, particularly those involving semantic functions.15 This simplicity, however, relates only to the manner in which certain concepts are interrelated. The concepts themselves are statistically defined and thus are recognized to be the product of complex multiple determination. Study of further variables within the present experimental design may, indeed, require a more complicated interpretation like that afforded by the mediation hypothesis.

15 C. E. Osgood, The nature and measurement of meaning, Psychol. Bull., 49, 1952, 197-237.

SUMMARY
(1) The prediction of the language behavior of one population of Ss from the language behavior of a second population is formulated in statistical concepts. The basic concept is that of associative word-probability, defined as the probability that one person (or population of persons) will emit a word as an association following the emission of a given stimulus-word by another person. This concept is applied to the prediction of the probability of a word-association following a sequence of stimulus-words from the probability of that association following each of the component stimulus-words taken separately.
(2) Three experiments, each using 200 college students, indicate the following: (a) the effect of a given stimulus-word on an associative response is a decreasing function of the number of additional stimulus-words interposed between it and the time of response; (b) the effect of a sequence of stimulus-words upon an associative response is an increasing function of the proportion of those stimulus-words having similar first-order associative effects on the response; and (c) the effect of a given stimulus-word on an associative response is an increasing function of the frequency of occurrence of the stimulus-word in general linguistic usage.
(3) Quantitative specification of these functions suggests certain assumptions about the way in which the effects of several different stimulus-words interact upon the same associative response. The present data are consistent with the assumption that these effects are algebraically additive.