Does CHARM Need Depth? Similarity and Levels- of

Journal of Experimental Psychology;
Learning, Memory, and Cognition
1987, Vol. 13, No. 3,443-455
Copyright 1987 by the American Psychological Association, Inc
0278-7 393/87/$OO.75
Does CHARM Need Depth? Similarity and Levelsof-Processing Effects in Cued Recall
Stephan Lewandowsky and William E. Hockley
University of Toronto, Toronto, Ontario, Canada
Eich {1985) recently presented a distributed memory model in which the pattern of results used to
support the levels-of-processing view of Craik and Lockhart (1972) was modeled by different degrees
of similarity between the encoding context and the to-be-recalled item. We report two experiments
in which both phonemic and semantic similarity were varied between pairs of words and incidental
acquisition (rhyme vs. category judgments) was varied across the same pairs of items. In both experiments the manipulation of the acquisition task produced a difference in cued-recall performance
for positive and negative rhyme and category judgments. Recall was better following a category
encoding decision than following a rhyme decision. This difference was independent of the effects of
similarity, which demonstrated that Eich*s (1985) assumptions regarding the effects of similarity are
not sufficient to account for the differences resulting from the manner in which subjects encode
information. An alternative method of modeling the levels-of-processing effect within the framework
of distributed memory models is proposed.
The levels-of-processing approach has been an influential
framework for research in human memory since Craik and
Lockhart's seminal (1972) paper. According to the levels-ofprocessing view, memory for events is determined by the type
of processing that is performed on the to-be-encoded material.
Recently, Eich (1985) has offered a different theoretical interpretation of the typical results used to support the levels of processing framework. According to Eich's view, differences in
memory performance between encoding conditions result from
differences in the degree of similarity between the target items
and their respective context. The present article contrasts the
levels-of-processing approach with Eich's similarity hypothesis
and reports an empirical investigation that pits these two views
against each other.
quired a category judgment ("Is it in the category fruit?—
CHERRY") or a rhyme judgment ("Does it rhyme with pond?—
WAND"). These judgment tasks were assumed to emphasize semantic and phonemic processing, respectively. Orthogonal to
manipulating the type of processing, the judgments required
either a positive response (the target word was a member of the
cue category or the word did rhyme with the cue) or a negative
response (the target was not a member of the cue category or
the target and cue did not rhyme). Subjects were not aware of
the impending memory test, and when memory was measured
through cued recall, Moscovitch and Craik found (a) a main
effect of encoding condition: Cued recall was better when the
target items had been presented in a category judgment than
when the targets had involved rhyme judgments; (b) a main
effect of response type: Cued recall was better when the category
or rhyme judgment had required a positive response than when
the judgment had required a negative response; and (c) an interaction of encoding condition and response type: The difference
in cued recall between the encoding conditions was attenuated
for negative responses. This pattern is representative of the outcome of many experiments that manipulated levels of processing (e.g., Craik & Tulving, 1975).
The theoretical analysis of the levels-of-processing effect
evolved from the assumption that memory is an incidental byproduct of perceptual and cognitive processing. The durability
and discriminability of the memory trace was thus seen to be a
function of the nature and quality of the encoding operations.
The original levels-of-processing proposal (Craik & Lockhart,
1972) focused on the depth of processing as a determinant for
memory performance. Depth of processing was seen to vary
along a continuum from shallow (sensory) processing to deep
(semantic) processing. According to this early conception of
levels, the deeper the initial processing was, the better subsequent memory performance was expected to be.
Furtherfindingsled to the addition of constructs such as elab-
Levels of Processing
The pattern of results used to support the levels-of-processing
framework is exemplified by the data of Moscovitch and Craik
(1976). Subjects were presented with questions that either re-
The work reported in this article was supported by Natural Sciences
and Engineering Research Council (NSERC) of Canada Grant APA 146
to B. B. Murdock, Jr. During parts of this work, Stephan Lewandowsky
was a postdoctoral fellow supported by NSERC Grant A8351 to Ian
Spence.
Both authors contributed equally to this paper. Order of authorship
was determined by the toss of a coin performed by Rennet Murdock.
Thefirstauthor wishes to thank Bennet Murdock for his coin toss. We
are grateful to Fergus Craik, Robert Lockhart, and Bennet Murdock for
their comments on an earlier version of this manuscript, and we also
wish to thank Janet Metcalfe Eich for helpful discussions.
Correspondence concerning this article should be addressed to Stephan Lewandowsky or William E. Hockley, Department of Psychology,
University of Toronto, Toronto, Ontario M5S 1A1, Canada.
443
444
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
oration within domains of encoding (Craik & Tulving, 1975;
Lockhart, Craik, & Jacoby, 1976), integration or congruency
(Craik & Tulving, 1975), and uniqueness or distinctiveness (Jacoby & Craik, 1979; Moscovitch & Craik, 1976). These additions shifted emphasis away from the concept of depth of processing per se and toward qualitatively distinct encoding operations. In spite of these modifications, however, the core of the
levels-of-processing framework continues to be "the fact that
very different outcomes are observed in memory experiments
as a function of different mental operations" (Craik & Lockhart, 1986, p. 361).
Tantamount to the importance of the levels-of-processing
framework is the number of criticisms that have been leveled
against it, It has been pointed out that the levels framework is
inherently circular (cf. Nelson, 1977), and that terms such as
depth or domain of encoding lack formal definition (cf. Morris,
Bransford, & Franks, 1977). Consequently, a formal description
of the levels-of-processing effect should be welcomed by most
of the critics.
CHARM
Eich (1985) provided thefirstformal, detailed, and quantitative explanation for the levels-of-processing effect based on
CHARM (composite holographic associative recall model; Eich,
1982). In CHARM, items are represented as lists of ordered features (i.e., vectors). Two items are associated by means of the
convolution of the two corresponding vectors. The convolved
image is added to a composite memory trace. At recall the vector representing the context cue is correlated with the composite trace and the retrieved set of features is identified by being
matched to every item in a lexicon. The item that yields the best
match with the retrieved information is given as a response.
In her 1985 article, Eich applied CHARM to the levels-of-processing domain. She chose to implement the levels-of-processing effect in a way that contrasted sharply with the view developed by Craik and Lockhart (1972). She rejected the idea that
the levels-of-processing effect necessarily is the result of different types of encoding operations, because she did "not assume
that perceptual analysis produces different patterns of features
to represent the same word . . . depending on the experimental
condition" (Eich, 1985, p. 3).' Put another way, in Eich's simulations the features of an item vector were invariant regardless
of whether the orienting task emphasized phonemic or semantic aspects of the stimulus.
Consequently, in the simulations of CHARM the levels-of-processing effect was not located in the series of mental processes
following the presentation of a stimulus but in the representation of the stimulus itself. In particular, Eich proposed that the
similarity between the encoding context and the to-be-encoded
item was greater in the deep encoding condition than in the
shallow condition and that "recall of the target is an increasing
function of the similarity of the context associated with the target, and the target" (Eich, 1985, p. 19). As an example, consider
the encoding questions "is the word a kind offish?—TROUT"
(deep) and "does the word rhyme with shout?—TROUT" (shallow). According to Eich, the similarity between TROUT and fish
in thefirst(deep) question is greater than the similarity between
TROUT and shout in the second (shallow) encoding question.
The observed differences in recall (or recognition; Eich provided a thumbnail sketch of a recognition version of CHARM
that was able to model a levels-of-processing effect in recognition in the same fashion) are seen to be caused by the underlying
differences in similarity.2
This view of levels of processing immediately gives rise to two
questions. First, how can experimental precision and theoretical succinctness be achieved when resorting to a concept, such
as similarity, that appears to be susceptible to problems of fuzziness and circularity; and, second, is this explanation not overly
restrictive in that it can only apply to situations in which the
encoding manipulations are confounded with nominal stimulus similarity?
Similarity
Similarity can be of great utility when used as a theoretical
construct. In a computer simulation, for example, similarity
may be useful because it can be given a strict operational definition. However, similarity is a fuzzy concept when applied directly to the empirical realm. For example, it has been suggested (e.g., James, 1890) that similarity is not antecedent to
mental processes but a consequence thereof. Put another way,
the identity of two stimulus items (i.e., their nominal similarity)
may bear no correlation with the functional (i.e., psychological)
similarity as perceived by the subject. If one adopted the functional view of similarity, it would be impossible to assess similarity independent of mental processes. Thus, in the levels-ofprocessing domain, it would be impossible to assess similarity
independent of recall performance. The circularity of the levels
approach (cf. Nelson, 1977) would continue its spins, and such
a similarity interpretation of levels would have substituted one
circular mentalistic concept (similarity) for another (depth).
However, Eich defined similarity operationally as the number
1
In the extreme case Eich's statement seems to imply that qualitatively unchanged encoding processes occur even when the legibility of
the stimulus is virtually nil or when subjects pay no attention to the
words. However, Eich did state that all features must be encoded for her
implementation of levels to apply. Consequently, in order to allow a fair
test of CHARM, one should not consider marginal encoding conditions.
For the subsequent discussion, we therefore restrict our scope to include
only encoding manipulations that require treatment of the word as a
whole. This excludes situations in which subjects have to count letters
or vowels because these manipulations may even constitute a distractor
task that inhibits encoding of ail features of the stimulus, as is assumed
by CHARM. Instead, in line with Eich's (1985} article, we focus on tasks
that require semantic or phonemic analysis of the items as a whole.
2
CHARM'S explanation has been criticized by Craik and Lockhart
(1986) in a theoretical article that takes issue with Eich's (1985) proposal. One of the criticisms is that CHARM can only account for encoding judgments that require a positive (yes) response. For negatives (e.g.,
"does the word rhyme with shout?—HOTEL" and "is the word a kind
offish—HOTEL")it is virtually impossible to envisage systematic
differences in similarity between encoding conditions. Therefore,
CHARM cannot predict a levels effect for negatives, a shortcoming that
is explored in detail in Craik and Lockhart's (1986) theoretical note.
Consequently, in the subsequent discussion any levels effect for negatives must be recognized as damaging to Eich's analysis, and we will not
take up that point again before the Genera! Discussion.
445
LEVELS AND SIMILARITY
of features shared in common by two items. An increase in similarity was simulated by increasing the number of features that
are shared by two items. This feature overlap, in turn, can be
assessed by rating tasks, so that "the similarity among items
may be ascertained independently of recall" (Eich, 1985,p. 21).
Therefore, much to Eich's credit, it is possible to test her explanation by examining or manipulating similarity between items
by using subjects' ratings.
Eich performed an experiment in which subjects rated items
that had been used in a previous levels-of-processing study
(Fisher & Craik, 1977) for their similarity, which allowed predictions (or postdictions) of CHARM'S performance. Eich found
that items used in the "deep" encoding condition were indeed
more similar to each other than those used in the "shallow"
condition. This demonstrates that Eich's explanation is viable
for the Fisher and Craik study, but it does not guarantee that
such an explanation would encompass all of the findings that
have been used to support the levels-of-processing framework.
Stimulus Confounding
Given Eich's concept of similarity, her interpretation can apply to only those situations in which the stimulus materials
differed between encoding conditions. When stimulus materials
are held constant, CHARM cannot predict a difference in memory performance for different encoding conditions.
Surprisingly, in 14 years of research on levels, there are only
two studies known to us in which the stimulus material was
not confounded with levels of processing. Furthermore, both of
these studies used encoding manipulations that failed to ensure
that the item was processed as a whole. In the experiments of
Epstein, Phillips, and Johnson (1975) the shallow condition involved a letter-counting task. Similarly, in the shallow encoding
condition of Graf and Schacter (1985, Experiment 1) subjects
had to compare the number of vowels present in the stimulus
words.
These two studies do not present a fair test of CHARM'S predictions because it is not altogether clear whether subjects had
to read the stimulus words at all in the shallow encoding conditions. Consequently, CHARM'S position could be that in these
unusual encoding situations only a subset of the features is encoded.
Aside from these two examples, previous studies of levels of
processing have always confounded the particular materials
used with the encoding manipulation. Whereas the target items
in previous experiments would usually be the same in all encoding conditions (e.g., Craik & Tulving, 1975), the contextual
frame provided at encoding would always differ between conditions. Obviously, something must differ between encoding conditions to operationalize the variable of interest. And changing
the sentence frame was the most readily available way to induce
different types of cognitive processing across encoding conditions. However, owing to the use of contextual frames, all levels
studies varied nominal identity—and therefore similarity—of
the stimuli in addition to manipulating the nature of the orienting task.
Consequently, disregarding negatives, Eich's interpretation
of the levels-of-processing effect cannot be ruled out on the basis of extant data. If her interpretation were found to be correct.
it could raise the spectre that the levels-of-processing phenomenon may be the result of a confounding of stimulus material.
Levels Versus CHARM
Eich's interpretation of the levels-of-processing effect stands
in clear contrast to Craik and Lockhart's (1972) view. In
CHARM, the representations of target items differ between encoding conditions only in the degree of similarity between the
target items and the context. According to Craik and Lockhart,
on the other hand, the processes leading to the formation of
memory traces differ in nature and quality between encoding
conditions. The similarity of the context to the items is not considered to be the major determinant of performance.
These two contrasting positions were tested in the two experiments reported in this article. Similarity of items was manipulated orthogonally to encoding task in an incidental cued-recall
task. Identical items were used across encoding conditions. Subjects were presented with 80 pairs of words; both items in each
pair rhymed and were semantically related. On the basis of ratings provided by independent judges, 40 of these pairs were of
low phonemic and semantic similarity (e.g., Gorilla-Zebra).
The remaining 40 pairs were rated high on both dimensions
(e.g., Breast-Chest) and were used for the high-similarity condition.
These same pairs appeared in two conditions: The rhyme
condition induced "shallow" encoding, and the category condition reflected a "deep" orienting task. In the rhyme condition
subjects had to indicate whether or not the pair members
rhymed, whereas in the category condition subjects had to decide whether or not both items shared a common category. The
design of both experiments thus was a 2 (encoding condition) X
2 (similarity) completely with in-subjects arrangement. The
effect of similarity was evaluated between pairs of items and the
effect of encoding condition was evaluated within pairs of items
but across subjects. Every subject responded to all 80 pairs during the orienting task. Half of the orienting list required the
category (deep) judgment; the other half required the rhyme
(shallow) judgment. Incidental memory for the pairs was measured through cued recall after presentation of the orienting list.
The major difference between experiments was the set of stimuli
that was used.
According to Eich's implementation of the levels-of-processing effect, performance on identical items presented in the same
context should not differ between levels of processing, because
the item-to-context similarity remains unchanged between conditions. The levels-of-processing view, on the other hand, must
make the prediction that memory performance would vary
with levels of processing, even if the items remain unchanged.
Furthermore, Eich's view certainly, and perhaps also the levelsof-processing approach, would predict that memory performance should be better for item-context pairs that are highly
similar to each other than for pairs that are less similar.
Experiment 1
Method
Stimulus Selection
An initial pool of 150 pairs of words was generated. Items within a
pair both rhymed and shared a common semantic category. Category
446
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
membership was determined on the basis of the Toronto Categorized
Wordpool (Murdock, 1976) and the Battig and Montague (1969)
norms, whereas the presence of a rhyme was established by the authors'
judgment. An additional 10 pairs of semanticaily unrelated items,
which also did not rhyme, were added to this list. The pairs were then
given to judges for ratings of similarity.
One problem concerned how to ask judges to rate similarity, as the
similarity judgments could vary on the basis of the criteria judges might
decide to use. Furthermore, the relevant dimension of similarity (or the
weightings of these dimensions) could be different in a rhyme-judgment
than in a category-judgment task. This would have presented a problem
in the present case, because the experiment compared performance
across a rhyme and a category judgment, where the items had to be of
approximately equal similarity in both tasks. Therefore, we decided to
ask three separate groups of four judges each to rate similarity on
different dimensions. The judges were graduate students in psychology.
One group of judges was asked to rate each pair only on the degree of
semantic similarity between pair members, a second group was asked
to rate each pair only on the degree of phonemic similarity, and the third
group was asked to provide a global similarity rating using as many
criteria as possible. Each judge received the pairs in a different random
order and rated each pair on a scale of 0 (completely dissimilar) to 7
[extremely similar).
The correlations between judges ranged from .38 to .66 for semantic
similarity, from .52 to .71 for phonemic similarity, and from .43 to .68
for global similarity. The correlations between groups of judges (from
the mean rating for each item in a group), in turn, ranged from .52
(phonemic and semantic), .76 (global and phonemic), to .79 (global and
semantic).
The ratings for each of the 150 pairs (discarding the 10 unrelated
pairs) were averaged across judges lo obtain an overall similarity score.
The 40 pairs of items that were given the highest overall mean ratings
and the 40 pairs of items that were given the lowest overall mean ratings
were selected for the experimental orienting list. The high-similarity
set of pairs had mean ratings of 5.63 for semantic similarity, 5.47 for
phonemic similarity, and 4.92 for global similarity. The low-similarity
set of pairs had mean ratings of 3.71 for semantic similarity, 3.76 for
phonemic similarity, and 3.34 for global similarity. The high- and lowsimilarity pairs and the mean ratings for each group of judges and across
groups of judges for each pair are presented in Appendix A.
Subjects and Arrangement of Material
A total of 18 undergraduates participated in the experiment and were
given credit toward Ihe completion of a course requirement. The orienting list was presented on an IBM Personal Computer, which also scored
subjects' responses and response times(RTs)to the orienting judgments.
During testing, subjects were seated in a sound-attenuating Industrial
Acoustics chamber.
Each subject responded to all 80 pairs. Half of the pairs were presented in the category-judgment condition, the other half in the rhyme
judgment condition. Pairs in an encoding condition were presented
blocked: That is, all pairs requiring a category judgment were shown
one after another and the rhyme judgments were also shown together.
Across subjects, the order of encoding conditions was counterbalanced.
Category judgments preceded rhyme questions in the orienting list as
often as rhyme judgments preceded category questions. Within an encoding condition the presentation order of the particular set of pairs was
randomized separately for each subject without regard to the similarity
status of a pair.
Within each encoding condition, 20 pairs were highly similar and the
other 20 were taken from the low-similarity pool. Assignment of items
to encoding condition was determined randomly for every second subject. The other (yoked) subject, who did not receive a new random selec-
tion, responded to items in the other encoding condition. Thus, it was
ensured that every item pair—regardless of its similarity—was also presented in the category encoding condition, if the pair had been allocated
to the rhyme condition by a given randomization. Across the two yoked
subjects, therefore, the same items appeared in both the rhyme and in
the category conditions, and across pairs of yoked subjects every item
pair had an equal chance of co-occurring with any other pair in the
same encoding condition.
A random half of the 20 pairs in each similarity-encoding combination required a positive response ("yes"). These pairs consisted of the
items the initial judges had performed their ratings on, and thus always
rhymed and also shared a common category. The remaining 10 items
in each of the cells required a negative response ("no"). For these items,
thefirstmember of a pair was combined with another item that neither
rhymed with the first word nor shared the first word's category. Each
pair in the pool had associated with it such a third negative item that
was used whenever the pair was sampled for the negative condition. The
negative items associated with each pair are also listed in Appendix A.
The similarity factor was meaningless for negative pairs because items
neither rhymed nor shared a common category. Across the two yoked
subjects, the same items were assigned to negative and positive pairs.
Across yoked pairs of subjects, a given pair had an equal chance of being
either negative or positive. Each subject was thus presented with 10 positive pairs from each of the four cells of the 2 x 2 design. Each subject
also responded to 10 negative pairs from each of the cells.
Procedure
Orienting list. Subjects received instructions in writing. Participants
were told the purpose of the experiment was to provide norms for rhyme
and category pairs and that their task was to decide on a pair's category
membership or rhyming relation. Before receiving the experimental list,
subjects responded to a 16-item practice list in the presence of the experimenter. Practice items were the same for all subjects, and the nature of
the practice list was identical to that of the experimental list.
The subject initiated the experimental orienting list by pressing the
space bar after the experimenter had left. Depending on counterbalancing, thefirst40 pairs of the list either required the category judgment or
the rhyme judgment. Following the first 40 pairs, the subject was informed that the other judgment was required for the next half of the list.
The subject had to press the space bar again to proceed to the second
set of 40 pairs.
For the pairs in the category judgment task subjects had to decide
whether or not the two items shared a common category. To indicate a
yes response (for a pair such as Hotel-Motel), the subject pressed the
slash (/) key on the lower right of the keyboard. The backslash ( \ ) key
on the lower left of the keyboard was used for negative responses (HotelMoose). For pairs in the rhyme-judgment task—subjects had to decide
whether or not the two items rhymed—responses were indicated by using the same keys.
Items for a pair were presented next to each other in the middle of
the screen with a 2-cm blank space between pair members. Each pair
remained on the screen until subjects responded, and the next pair was
presented 400 ms after the response. Throughout presentation of 40
pairs in a given condition, an appropriate prompt (Category? or
Rhyme?) was displayed at the upper left part of the screen. Following
presentation of the last (80th) pair, the computer informed subjects that
the experiment was over, and the experimenter returned for the surprise
cued-recall test.
Cued recall. Subjects recorded their answers on a cuing sheet, which
contained the left member of each pair with a responsefieldnext to it.
The order of cues was randomized separately for each subject without
regard to encoding condition, similarity, or response status. To avoid
criterion problems, subjects were required to respond to every cue in
447
LEVELS AND SIMILARITY
Table I
Numbers and Percentages ofItems Recalled Across
Ail Subjects: Experiment 1
Encoding condition
Category
Similarity
No.
Rhyme
%
No.
%
Stimulus conditional1
8
Positive
High
Low
Total
Negative b
None
120/180
85/180
205/360
67
47
57
105/180
20/180
125/360
58
11
35
59/360
16
3/360
1
Subject conditional
c
Yes
High
Low
Total
Nod
None
109/160
73/152
182/312
68
48
58
92/152
13/9?
105/249
61
13
42
82/408
20
23/471
5
Note. No. refers to the number recalled out of a total number presented.
Yes response required on orienting list.b No response required on orienting list.c Yes response given on orienting list. d No response given on
orienting list.
MSK = 2.13. The interaction reflects the fact that the effect of
encoding condition was attenuated for high-similarity items.
For negatives, the similarity factor was meaningless, and indeed no differences were observed between items that would
have been highly similar had they been positives (9%) and items
that would have been of low similarity (9%). Across levels of
the encoding manipulation, however, striking differences were
observed, with category encoding resulting in a recall accuracy
of 16% as compared with rhyme encoding, which yielded
1% correct. A one-way within-subjects ANOVA showed that
this difference was significant, F(\, 17) = 18.31, p < .001,
MS« * 4.76.
Subject conditional. In addition, the cued-recall data were
classified in a subject-conditional way. In this analysis, we only
considered a subject's actual response on the orienting list. All
pairs subjects responded "yes" to were grouped together (mirroring the aforementioned positives), and were compared to all
pairs subjects responded to with "no." The lower panel of Table
1 presents the subject conditional data for yes and no responses
separately. No ANOVAS were performed on the subject-conditional cued-recall data because each subject's entries were
based on a different number of possible responses and because
the pattern was identical to the stimulus-conditional data.
a
order even if they had to guess. Subjects had an unlimited time in which
to complete the recall task.
Results
CuedRecall
Stimulus conditional. The principal cued-recall analysis was
performed separately for pairs requiring a positive orienting response (we call these items positives) and pairs requiring a negative response (negatives). Positives were pairs that our initial
judges had rated as being rhymes and sharing a category (regardless of their similarity), whereas negatives consisted of two
items that in the authors'judgment were unrelated and did not
rhyme. For this analysis the subjects' actual orienting responses
to the pairs were not considered. That is, some observations
were assigned to a positive condition although subjects might
have responded "no" to these pairs. The upper panel of Table
1 presents the average stimulus conditional cued recall for positives and negatives separately.
For positives, recall was superior for the category-encoding
condition (57%) as compared with the rhyme-encoding condition (35%). The 2 X 2 within-subjects analysis of variance
(ANOVA) confirmed this difference as significant, F(\, 17) =
41.85,/; < Ml, MS* = 2.12. Similarity also had an effect, with
high-similarity items leading to belter recall (63%) than lowsimilarity items (29%). That difference also was significant, F{ 1,
17) - 147.82, p < .001, MSe = 1.35. In addition, the ANOVA
showed that the interaction between encoding condition and
item simiJarity was significant, F(\, 17) = 16.27, p < .001,
Guessing Rates
Because positive pairs in each condition both rhymed and
$hared a common category, subjects had a good guessing strategy available to them during cued recall. To estimate the guessing rate, we calculated the number of instances in which subjects generated the "correct" word at recall when the cue word
had actually been presented in a negative pairing on the orienting list (and in which the "correct" word had never been seen).
An example of a "correct" guess would be if a subject recalled
"Motel" given the cue Hotel, even though Hotel had been presented in a pair with Moose. Correct guesses for each condition
are presented in the upper panel of Table 2.
Table 2 (upper panel) shows that a conservative baseline for
the guessing rate is about 11% for high-similarity pairs and 1%
Table 2
Estimates ofGuessing Rates for Each Encoding
Condition: Experiments 1 and 2
Categ;ory
Similarity
No.
Rhyme
%
No.
Total
%
No.
%
13
2
8
39/360
4/360
11
1
13
4
9
33/320
13/320
10
4
Experiment 1
High
Low
Total
15/180
1/180
16/360
8
1
4
24/180
3/180
27/360
Experiment 2
High
Low
Tbtal
12/160
6/160
18/320
8
4
6
21/160
7/160
28/320
Note. No. refers to the number recalled out of the total number presented.
448
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
Table 3
Mean Reaction Times (in Seconds) to Orienting
Judgments: Experiment 1
Encoding condition
Category
Rhyme
Similarity
RT
SE
RT
SE
High*
Untrimmed
Trimmed
2.45
2.35
0.26
0,20
3.30
2.37
0.48
0.21
Untrimmed
Trimmed
3.50
2.88
0.50
0.28
5.11
3.06
1.07
0.29
Untrimmed
Trimmed
4.16
3.27
0.75
0.40
3.24
2.60
0.48
0.25
Low*
None b
Note. Trimmed means exclude RTs over 10 s.
a
Yes response given on orienting list. b No response given on orienting
list.
for low-similarity pairs. Thus, the advantage of recall for highsimilarity pairs compared with low-similarity pairs seen in Table 1 should be qualified by the guessing rates presented in Table 2. If this is done, the overall advantage of high-similarity
over low-similarity pairs is reduced somewhat. However, the
guessing rates do not pose a problem for interpreting the advantage of cued recall following category judgments as opposed to
rhyme judgments, because the guessing rate was higher for the
rhyme condition. Thus, if a correction were made for guessing
rate, the differences between encoding conditions would only
be accentuated.
Response Times
During presentation of the orienting list, subjects' RTs were
recorded. These RTs were classified by yes and no responses in
the same manner as the foregoing subject-conditional recall
data. Two measures of RT were computed: Untrimmed means
were based on all available responses, whereas trimmed means
were calculated by excluding observations whose RTs exceeded
10 s. This cutoff appeared to be fairly conservative, yet a total
of 81 observations had to be dropped by this criterion. Table
3 presents both RT measures and their standard errors for all
conditions separately for yes and no responses. The response
times were not conditionalized on correct subsequent recall;
they provide an estimate of the overall average processing time
for each of the conditions.
Discussion
The present study found a highly significant effect of encoding condition. Deep semantic processing led to better recall
than shallow phonemic encoding. Thus, the standard levels-ofprocessing effect remains largely unchanged even when the confounding of treatment and materials is removed. And, when
compared to previous research, the effect was particularly large
for items that required or received a no response at encoding;
recall in the category encoding condition was almost 20 times
higher than recall in the rhyme condition (59 vs. 3 items recalled).
For positives, the overall effect of encoding condition was not
as striking as has been observed in previous studies. However,
high-similarity items in the category condition were recalled six
times better than low-similarity items in the rhyme condition;
this comparison provides the proper parallel with previous
studies (e.g., Craik & Tulving, 1975) that did not control similarity between encoding conditions. Ourfirstexperiment therefore replicated the magnitude of the difference obtained in previous studies if the diagonal entries in Table 1 are compared.
And if the main effect of encoding is considered, the results are
in clear disagreement with CHARM'S predictions. The very same
items, whose nominal similarity must be the same, are recalled
to greatly varying extents, depending only on the type of processing performed on them.
The first experiment also found a highly significant effect of
similarity on memory performance. It appears that CHARM correctly predicts that similarity between pair members benefits
cued-recall performance. This, does not, however, reflect on the
appropriateness of the levels-of-processing view, because its
proponents have never claimed that similarity would have no
effect.
However, the obtained interaction between similarity and encoding condition is reason for some concern. If the effect of encoding task is considered separately for high- and low-similarity
items, it turns out that the ubiquitous and powerful levels-ofprocessing effect has been reduced to an 8% difference for highsimilarity items. For low-si mi far ity items, on the other hand,
the leveis-of-processing effect was quite large (35%).
Although we attempted to equate high and low similarity for
both the semantic and phonemic dimensions of our stimuli, we
cannot be sure how successful this manipulation was. Whereas
the similarity ratings of the judges were numerically similar, the
ratings of semantic and phonemic similarity may not be directly comparable. Even though the same 7-point scale was
used in each rating task there is no guarantee that the numbers
on the scales had the same meaning in the two tasks.3
Two aspects of the data suggest that, indeed, the manipulation
of similarity was not equivalent for the semantic and phonemic
dimensions of the stimuli. First, contrary to previous findings
(e.g., Craik & Tulving, 1975) the mean yes response times for
rhyme judgments were slower than mean yes response times for
category judgments. This difference was particularly striking
for low-similarity pairs, as was the discrepancy between
trimmed and untrimmed RT means. This indicates that some
of our low-similarity items resulted in extremely long response
latencies. Second, comparison of the upper and lower panels of
Table I shows that virtually half of all orienting responses to
(positive) low-similarity rhymes were in fact negative. Thus, our
subjects were not in complete agreement with our judges, which
indicates that the quality of the rhymes for the low-similarity
pairs was not wholly adequate. Experiment 2 was designed to
correct this problem.
3
We wish to thank Arthur Glenberg for pointing out this problem
to us.
449
LEVELS AND SIMILARITY
Experiment 2
Experiment 2 was identical to the first experiment but used
a revised set of stimuli. A primary aim in revising the stimulus
set was to improve the quality of the low-similarity rhymes.
This should increase the agreement between our classification
of pairs (into positives and negatives) and subjects' responses
("yes" or "no"), and should avoid extremely long RTs that
would distort the pattern of means.
Table 4
Numbers and Percentages of Items Recalled Across
All Subjects: Experiment 2
Encoding condition
Category
Similarity
No.
Rhyme
%
No.
%
Stimulus conditional[
Method
Stimulus Selection
A total pool of 192 pairs of words was generated, which contained
most of the items from the initial pool used in Experiment 1 plus additional rhyming items that also shared a common semantic category. A
group of five undergraduate judges performed a binary classification of
all pairs from the total pool on the basis of relative quality of the rhyme
and the category membership. Judges had to classify a pair of words
either as constituting a better rhyme than an exemplar for joint category
membership (rhyme dominant) or as a better exemplar of category
membership than rhyme {category dominant).
The final choice of 80 experimental pairs (40 high, 40 low) was made
by the authors and guided by the judges' responses. In particular, an
attempt was made to place as many rhyme dominant pairs as possible
into the low-similarity group. A pair's rhyme dominance score corresponded to the number ofjudges that classified that pair as rhyme dominant. For the 40 low-similarity pairs (which included 11 pairs from Experiment 1) the average rhyme dominance was 2.92, whereas the 40
high-similarity pairs {23 old pairs) were of rhyme dominance 2.00.
To verify that high- and low-similarity pairs differed in semantic similarity six different undergraduate judges rated all items from the set of
80 (plus 10 unrelated, nonrhyming fillers) on the semantic dimension
using a 7-point rating scale. High-similarity items received a mean rating of 4.93, whereas low similarity items were rated 3.17 on the average.
That difference was significant, t(5) = 7,35, p < .0005. The 80 stimulus
pairs for Experiment 2 are presented in Appendix B.
Subjects and Procedure
A total of 16 subjects, recruited from the University of Toronto community, participated in Experiment 2 for a remuneration of $5. The
procedure was identical to that used in Experiment 1, with the exception that subjects had to perform a similarity-rating task on the stimuli
after the cued-recall attempt. Half of all subjects had to rate the items
for the goodness of their rhyme (contrary to Experiment 1, in which
judges had to perform a phonemic similarity rating), the other half of
subjects rated the goodness of category membership of the item pairs.
Both rating tasks involved the usual 7-point scale, and also included 10
filler items.
Results
Rating Scores
We present the analysis of subjects' ratings of the stimuli first
because it provides internal validation for the stimulus selection
used in Experiment 2. The mean ratings for semantic similarity
were 5.46 and 3.90 for high- and low-similarity items, respectively. This difference was significant, t{l) = 6.06, p < .0005.
For the rhyme ratings, the means for high- and low-similarity
pairs were 5.06 and 4.42, which were again significantly different, /(7) = 8.31, p < .0001. The ratings thus support the division
3
Positive
High
Low
Total
Negative13
None
112/160
71/160
183/320
70
44
57
78/160
33/160
111/320
49
21
35
36/320
11
5/320
2
Subject conditional
c
Yes
High
Low
Nod
Total
None
109/148
50/103
159/251
74
49
63
66/127
29/112
95/239
52
26
40
59/387
15
17/391
4
Note, No. refers to the number recalled out of a total number presented.
' Yes response required on orienting list.b No response required on orienting list.c Yes response given on orienting list.d No response given on
orienting list.
of pairs into low- and high-similarity items. Appendix B shows
the category and rhyme ratings given to each pair by the experimental subjects.
Cued Recall
Stimulus conditional. For this analysis pairs were classified
as positives or negatives in the same fashion as in Experiment
I. The upper panel of Table 4 presents the average cued recall
for positives and negatives separately.
For positives, a 2 x 2 within-subjects ANOVA confirmed that
recall was better for the category encoding condition (57%) than
for the rhyme condition (35%), F{1, 15) = 54.00, p < .0001,
MSe = 1.50. The similarity effect (59% recall for high similarity;
33% for low) also was significant F( 1, 15) = 43.44, p < .0001,
MSe - 2.66. Contrary to thefirstexperiment, however, no interaction was apparent, f\\, 15) < 1. This suggests that the stimulus selection had been improved over Experiment 1; the lowsimilarity rhymes in particular resulted in far better recall than
in thefirstexperiment.
For negatives, the large levels-of-processing effect of Experiment 1 was replicated, with category encoding resulting in better recall performance (11% as compared with the rhyme encoding, which yielded 2%). This difference was shown to be significant by a one-way within-subjects ANOVA, F\l, 15) = 22.01,
p<.0003, MSC = 1.37.
Subject conditional. The lower panel of Table 4 shows the
subject-conditional recall performance for Experiment 2 (excluding 12 orienting responses whose latencies exceeded 10 s).
Two features of the data deserve attention: First, the pattern of
450
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
founded with the manipulation of the orienting task. This finding allays any fears prompted by Eich's (1985) simulations to
the effect that the typical levels result might be an artifact of
stimulus confounding.
Table 5
Mean Reaction Times (in Seconds) to Orienting
Judgments: Experiment 2
Encoding response
Rhyme
Category
CHARM
Similarity
RT
SE
RT
SE
High"
Low8
Noneb
1.97
2.34
2.16
.21
.27
.20
1.83
i.87
1.87
.18
.21
.23
Note. Means exclude RTs over 10 s.
* Yes response given on orienting list. b No response given on orienting
list.
the subject-conditional analysis was identical to that obtained
by the stimulus-conditional analysis. Second, there was a good
degree of correspondence between the subjects' responses (yes
and no) in the orienting task and the positive and negative classification of the stimulus pairs.
Guessing Rates and Response Times
Guessing rates (shown in the lower panel of Table 2) were
computed in the same fashion as for Experiment 1. As before,
they indicate that the levels-of-processing effect is underestimated by the recall data, whereas the effect of similarity would
be reduced if one adjusted for guessing.
The response times displayed in Table 5 were again classified
by response ("yes" vs. "no"), We only present the trimmed
means (excluding 12 response latencies exceeding 10 s, 7 of
which were no), because the number of outliers was sufficiently
small and because there was no relevant difference between the
patterns of trimmed and untrirnmed means.
The data in Table 5 show that in this experiment, rhyme decisions were faster than category decisions, a result consistent
with thefindingsof Craik and Tulving (1975). Furthermore, the
yes responses for category decisions were faster for high-similarity pairs than for low-similarity pairs. However, the similarity
manipulation had no effect on the response times for rhyme
decisions.
Discussion
Again it seems that a levels-of-processing effect is obtainable
even if encoding manipulation is not confounded with stimulus
selection or stimulus similarity. In contrast to Experiment 1,
however, the effect of the manipulation of levels of encoding
was equal for both high- and low-similarity items (about 22%).
Furthermore, the response time data are in closer agreement
with previous results. Overall, Experiment 2 constitutes a replication—indeed much cleaner and stronger—of the effects
found in our first study.
General Discussion
The present experiments obtained an effect of encoding condition even though the target and its context were not con-
Ourfindingsalso have clear implications for CHARM*S (Eich,
1985) interpretation of the levels-of-processing effect- Eich
equated differences in levels of processing with differences in
similarity between the target and the cue. This approach could
account for parts of the results of earlier studies, but the present
experiments clearly show that it is not the whole explanation.
An alternative for CHARM would be to resort to functional
rather than nominal similarity to explain the levels-of-processing effect. That is, one could argue that in our experiments the
functional intrapair similarity for, say, Hotel-Motel was—despite our validation efforts—greater in the category-judgment
than in the rhyme-judgment condition. By this interpretation,
similarity would not be antecedent to mental processes but
rather a result thereof, and CHARM'S explanation could be salvaged. However, wefindthis argument unacceptable because it
severs CHARM'S connection to the empirical realm. Similarity
could not be assessed empirically, which would render such an
implementation as circular and untestable as the original levelsof-processing proposal.
Circularity by itself does not necessarily render a theory useless, as is shown by the persistent influence of the levels-of-processing framework. Contrary to that framework, however,
Eich's (1985) CHARM implementation is not viable without
contact with the empirical realm. This is because CHARM fails
to predict an effect of encoding manipulation for negative orienting pairs. In previous studies, and indeed in our experiments, the effect on negatives has been clear and unambiguous.
Thus, whereas a testable version of CHARM (using nominal similarity) may have been of some use even though it failed to account for negatives, a circular version of that model (resorting
to functional similarity) would have less merit than the original
levels-of-processing proposal, because that version of CHARM
would still fail to explain negatives but would not even be
testable.
In summary, we accepted Eich*s (1985) invitation that "ordered differences in the similarity parameter should be open to
experimental verification" (p. 27) and ruled out similarity as
the variable responsible for the levels-of-processing effect. It follows that the current version of CHARM'S analysis of levels of
processing is inadequate and incorrect. It also appears that this
version of CHARM is not salvageable by resorting to a concept
of functional similarity. Do our experiments, therefore, unambiguously endorse the levels-of-processing framework and imply that there is no merit in a distributed memory approach to
the levels phenomenon?
Levels of Processing
Although the present results provide support for the levels-ofprocessing framework, this view is also not without problems.
Consider, for example, the data provided by Morris, Bransford,
and Franks (1977) and Stein (1978). Morris et al. found that.
451
LEVELS AND SIMILARITY
although semantic encoding was superior to rhyme encoding
for a standard recognition test, rhyme acquisition was superior
to semantic acquisition when a rhyming recognition test was
used. That is, Train was recognized better than Eagle (given the
encoding sentence frames "The
had a silver engine" and
"
rhymes with regal," respectively) only when recognition was tested using the items presented at encoding. If rhymes
of the words were instead used as targets, Legal would be recognized better than Pain. Similarly, Stein found that an encoding
question concerning cases of letters resulted in better recognition than a semantic acquisition task when a case oriented recognition test was given. Thesefindingsdemonstrate that semantic (or deep) processing need not always result in superior retention. As alternatives to the levels-of-processing framework
Morris et al. and Stein suggested "transfer appropriate processing" and "precision of encoding," respectively.
Although the results of the Morris et al. (1977) and Stein
(1978) studies are problematic for the levels-of-processing
framework, they do not require that the concept of levels be
entirely abandoned. If one considers recall performance between conditions that preserve the encoding context at test (e.g.,
rhyme-rhyme or semantic-semantic), the semantic encoding
condition invariably leads to better performance than the phonemic conditions. Thus, both "transfer-appropriate processing" and the nature (or depth) of encoding are relevant (cf.
Fisher & Craik, 1977).
However, the results of Morris et al. (1977) and Stein (1978),
as well as the criticism of circularity, do show that Craik and
Lockhart's original proposal (1972) requires theoretical clarification and a more precise formulation. A formulation of this
type can only come from formal theoretical development, such
as model building and simulations of theories. Perhaps, therefore, one should search for a different implementation of the
levels-of-processing effect within the framework of distributed
memory models.
ing and retrieval observed in the present experiments, different
assumptions within the same theory could, in principle, prove
to be more successful. One possible modification consists of the
assumption that at encoding the features representing an item
are given more weight under deeper processing conditions. Success of retrieval would be a direct function of the weighting the
features received at the time of encoding.4
Note that this version of a distributed memory system would
predict a levels-of-processing effect that is independent of the
similarity between items and that depends only on the weighting (i.e., the amount of processing) given to an item's features
at encoding. Orthogonal to the levels prediction, such a model
could address similarity effects in the same way as Eich's (1985)
CHARM implementation and could thus also account for the
present data.
Indeed, using a modification of the matched filter model of
Anderson (1973; see also Murdock, 1982), which shares many
basic assumptions with CHARM, Shaw (1986) has demonstrated
that such an implementation successfully simulates the levelsof-processing effect reported by Craik and Tulving (1975).
However, unlike CHARM, Shaw's implementation can account
for the effect on negatives.
Alternative Distributed Memory Implementations
4
In more formal terms, this assumption amounts to a manipulation
of the power of the item vectors across different levels of processing. If
one thinks of the item vectors in geometric terms, the power of a vector
is directly related to its length. For further details, see Murdock (1982).
Craik and Lockhart (1986) state that "the CHARM model
does appear to be on the right track. The holographic approach,
the notion that stimulus properties interact with 'accumulated
knowledge' to form new memory traces, the distributed nature
of trace representation, the parallels between encoding and retrieval—all these aspects of the modelfitcurrent knowledge and
beliefs, and we have no quarrel with them" (p. 360). We agree
with Craik and Lockhart, and furthermore, we consider CHARM
to be an elegant and powerful model. Indeed, the operations of
convolution and correlation that form the basis of CHARM serve
to accentuate those features of items that are similar at encoding and test. Thus, the operations of CHARM would appear to
reflect the current views of the memory system emphasizing the
distinctiveness of the memory trace and the importance of the
interaction of information at the time of encoding and the time
of test. In fact, CHARM'S operations constitute a natural implementation of the transfer-appropriate processing view of Morris
etal.(1977).
Therefore, although Eich's assumptions concerning how features of the target and context are encoded are clearly inadequate and cannot account for the interactions between encod-
Conclusion
We have demonstrated that a strong levels-of-processing
effect can be obtained even if the stimulus material is not confounded with the encoding manipulation. Our data furthermore show that CHARM'S account of the levels-of-processing
effect cannot be correct. However, we believe that a version of a
distributed memory system—as, for instance, implemented by
Shaw (1986)—may provide a viable, quantitative account of the
levels-of-processing effect while also providing a natural mechanism for transfer-appropriate processing and other interactions between encoding and retrieval.
References
Anderson, J. A. (1973). A theory for the recognition of items from short
memorized lists. Psychological Review, 80, 417-438.
Battig, W. F., & Montague, W. E. (1969). Category norms for verbal
items in 56 categories: A replication and extension of the Connecticut
norms. Journal of Experimental Psychology Monographs, 80(3,
Pt. 2).
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning and Verbal
Behavior, 77,671-684.
Craik, F. I. M., & Lockharl, R. S. (1986). CHARM is not enough: Comments on Eich's model of cued recall. Psychological Review, 93, 360364.
Craik, F.I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.
Eich, J. M. (1982). A composite holographic associative recall model.
Psychological Review, 89, 627-661.
452
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
Eich, J. M. (19,85). Levels of processing, encoding specificity, elaboration, and CHARM. Psychological Review, 92, 1-38.
Epstein, M. L., Phillips, W. U , & Johnson, S. J. (1975). Recall of related
and unrelated word pairs as a function of processing level. Journal of
Experimental Psychology: Human Learning and Memory, 104, 149152.
Fisher, R. P., & Craik, F. I, M (1977). Interaction between encoding
and retrieval operations in cued recall. Journal of Experimental Psychology: Human Learning and Memory, 3, 701-7 U .
Graf, P., & Schacter, D. L. (i 985). Implicit and explicit memory for new
associations in normal and amnesic subjects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 501-518.
Jacoby, L. L., <fc Craik, F. I. M. (1979). Effects of elaboration of processing at encoding and retrieval: Trace distinctions and recovery of initial context. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing and human memory (pp. 1-21). Hillsdale, NJ: Erlbaum.
James, W. (1890). The principles of psychology. New York: Holt.
Lockhart, R. S., Craik, F. I. M., & Jacoby, L. L. (1976). Depth of processing, recognition, and recall: Some aspects of a general memory
system. In J. Brown (Ed.), Recall and recognition (pp. 75-102). London: Wiley.
Morris, C. D., Bransford, J. D., & Franks, J. J. (1977). Levels of processing versus transfer appropriate processing. Journal of Verbal Learningand Verbal Behavior, 16, 519-533.
Moscovitch, M., & Craik, F. I.M.(1976). Depth of processing, retrieval
cues, and uniqueness of encoding as factors in recall. Journal of Verbal Learning and Verbal Behavior, 15, 447-458.
Murdock, B. B., Jr. (1976). Item and order information in short-term
serial memory. Journal of Experimental Psychology: General, 105,
191-216.
Murdock, B, B., St. (1982). A theory for the storage and retrieval of item
and associative information. Psychological Review, 90, 316-338.
Nelson, X O. (1977). Repetition and depth of processing. Journal of
Verbal Learning and Verbal Behavior, 16, 151-172,
Shaw, R. J. (1986). A simulation model of encoding operations in levels
of processing. Manuscript in preparation, University of Toronto, Toronto, Ontario, Canada,
Stein, B. S. (1978). Depth of processing reexamined: The effects of the
precision of encoding and test appropriateness. Journal of Verbal
Learning and Verbal Behavior, 17, 165-174.
Appendix A
Stimuli Used in Experiment I
Negative
Positive pair
Phonemic
Semantic
Global
M
5.25
5.25
6.50
5.50
5.00
5.75
6.50
5.75
5.50
4.75
4.75
5.00
5.75
4.75
5.00
6.00
5.75
5.25
6.00
6.00
6.25
5.25
4.75
5.25
5.50
5.75
6.00
5.50
6.25
5.75
6.25
5.75
5.75
6.50
4,50
4.75
6.00
4.00
4.25
5.00
5.25
4.50
5.25
5.25
4.00
3,25
5.00
4.25
5.50
5.25
4.75
5.00
5.75
4.75
6.50
4.50
4.75
5.00
4.00
4.75
5.00
5.25
5.50
5.25
5.50
5.50
5.50
3.50
5.00
5.17
6.33
4.92
4,67
5.25
5.50
5.00
5.75
5.50
4.50
4.42
5.42
4.58
5.67
5.75
5.08
5.00
5.92
5.42
6.58
4.92
4.83
5.00
4.92
5.42
5.50
5.50
5.75
5.92
5.83
5.50
5.42
4.92
High-similarity pool
Argon
Barium
Breast
Cinnamon
Congressman
Cornet
Cougar
Dahlia
Deer
Door
Ebony
Edmonton
Escalator
Falcon
Flute
Gasoline
General
Governor
Guitar
Gully
Hotel
Houston
Hydrogen
Leukemia
Lithium
Mackerel
Magnesium
Mandolin
Millipede
Mother
Nectarine
Nylon
Oceanography
Pastor
Boron
Cesium
Chest
Cardamom
Alderman
Clarinet
Jaguar
Magnolia
Steer
Floor
Ivory
Washington
Elevator
Heron
Lute
Kerosene
Admiral
Senator
Sitar
Valley
Motel
Boston
Oxygen
Hemophilia
Uranium
Pickerel
Potassium
Violin
Centipede
Brother
Tangerine
Rayon
Geography
Rector
Bigamy
Brandy
Limbo
Broccoli
Orion
Raspberry
Chapel
Clipper
Venice
Tango
Cuckoo
Vicar
Giraffe
Grater
Frog
Hockey
Curling
Geology
Journal
Bison
Moose
Apricot
Leprosy
Martini
Canvas
Lemon
Monkey
Perjury
Petunia
Potato
Peony
PoppyPercolator
Panda
5.25
5.50
6.50
5,25
4.75
5.00
4,75
4.75
6.50
6,50
4.75
5.00
5.50
4.75
6.50
6.00
4,75
4.75
6.00
5.50
7.00
5.00
5,00
4.75
5.25
5.75
5.50
5.75
6.00
6.75
5.75
5.25
5.00
4.75
453
LEVELS AND SIMILARITY
Appendix K (continued)
Negative
Positive pair
Phonemic
Semantic
Global
M
5.75
5.75
6.00
5.25
6.00
6.00
5.63
5.50
5.25
4.75
4.75
5.75
4.00
4.92
5.33
5.67
5.25
5.50
5.58
5.25
5.34
4.00
3.50
3.50
3.25
4.50
4.25
4.25
4,00
2.50
3.75
4.25
3.25
4.00
3.50
3.75
3.25
4.25
3.75
3.25
4.25
2.00
4.00
3.25
3.75
3.75
3.75
3.50
3.00
4.00
4.25
3.50
4.50
3.75
3.00
4.00
4.00
3.75
4.25
4.00
3,25
3.71
3.50
3.00
3.25
2.50
3.50
3.75
4.50
2.25
2.75
3.75
3.50
2.75
3.25
3.50
3.50
3.50
3.25
3,00
3.25
3.75
2.50
3.25
3.25
2.25
3.75
3.75
4.25
2.50
2.50
3.75
4.00
3.50
4.00
3.25
3.50
4.00
3.75
2.75
3.25
3.75
3.34
3.67
3.50
3.33
2.92
4.00
4.00
4.17
3.25
3.17
3.92
4.08
3.33
3.67
3.83
3.42
3.50
3-33
3,75
3.50
4.00
2.67
3.83
3.25
3.33
3.75
3.50
4.00
2.75
3.50
3,92
3-92
4.08
4.08
2.92
3.83
4.00
3.92
3.25
3.67
3.58
3.60
High-similarity pool
Pence
Propane
Rumba
Slug
Swallow
Typhoon
M across items
Cents
Butane
Samba
Bug
Sparrow
Monsoon
Denim
Salmon
Shirt
Wall
Weevil
Spinach
4.75
6.00
5.00
6.50
5.00
5.75
5.47
Low-similarity pool
Aspen
Banker
Canary
Celesta
Chicago
Cloudy
Cotton
Crappie
Crater
Deacon
Dentist
Doctor
Elbow
Farmer
Ferry
Gnu
Gorilla
Hornet
Judo
Kettle
larceny
Lavender
Liver
Malaria
Mango
Merchant
Physics
Plateau
Play
Poetry
Riding
Rupee
Scarlet
Squirrel
Tanker
Tuba
Tuna
Tundra
Ulcer
Vanilla
M across items
Linden
Teacher
Turkey
Harmonica
Tokyo
Sunny
Dacron
Guppy
Geyser
Canon
Pharmacist
Sailor
Torso
Plumber
Dinghy
Kangaroo
Zebra
Cricket
Poio
Griddle
Sodomy
Amber
Finger
Gonorrhea
Avocado
Accountant
Genetics
Volcano
Essay
Story
Diving
Guinea
Russet
Camel
Schooner
Marimba
Piranha
Delta
Fever
Paprika
Whiskey
Cairo
Poplar
Cedar
Chiffon
Cathedral
Celery
Cruiser
Cherry
Disco
Drought
Dallas
Fox
Cheetah
Friar
Hip
Melon
Hunting
Lion
Dolphin
Marl in
Kumquat
Daiquiri
Madras
Donkey
Goose
Daisy
Minister
Cello
Dagger
Raven
Ruler
Savory
Strainer
Turin
Train
Wrench
Sock
Storm
Surfing
3.50
4.00
3.25
3.00
4.00
4.00
3.75
3.50
4.25
4.25
4.50
4.00
3.75
4.50
3.00
3.75
2.50
4.50
4.00
4.00
3.50
4.25
3.25
4.00
3.75
3.00
4.25
2.75
4.00
3.75
4.25
4.25
4.50
2.50
4.00
4.00
4.25
2.75
3.75
3.75
3.76
454
STEPHAN LEWANDOWSKY AND WILLIAM E. HOCKLEY
Appendix B
Stimuli Used in Experiment 2
Mean rating
Positive pair
Negative
Category
Rhyme
5.88
5.50
6.50
5.75
2.88
4.75
4.50
4.75
4.75
5.38
5.13
5.25
5.13
6.13
5.88
5.75
3.75
5.88
6.50
6.25
5.25
6.25
5.88
6.00
6.00
6.25
5.63
4.75
5.13
5.00
5.88
6.38
5.88
5.63
6.38
5.50
5.00
6.38
4.38
4.63
5.46
1.38
6.00
5.88
4.63
4.63
6.75
6.63
4.75
4.75
5.00
3.38
2.25
6.75
5.25
3.75
6.75
6.38
2.63
6.88
4.63
6.75
5.38
4.63
6.88
5.25
3.25
4.38
5.50
5.44
6.75
4.63
4.88
4.50
6.63
2.75
5.00
5.75
5.38
3.00
6.75
5.06
5.75
5.25
2.63
4.50
3.13
5.00
4.25
1.75
4.63
5.00
3.88
5.25
4.13
4.38
3.25
3.75
5.25
5.63
6.25
0.88
3.00
5.44
3.06
3.44
6.88
6.38
4.00
2.38
6.88
6.88
High-similarity pool
Adultery
Bee
Breast"
Cherry
Chive
Deer*
Doora
Duet
Eglintonb
Escalator3
Evangelist
Falcon8
Flute"
Gasoline3
General*
Grime
Guitar"
Gullva
Hotel" a
Hydrogen
Leer
Mackerel3
Mandolin"
Mother"
Nectarinea
Nylon3
Pence8
Plaster
Potato
Preacher
Princess
Propane3
Rumba*
Slug"
Swallow"
Terrarium
Train 3
Typhoon
Vesicle
Wine
M across items:
Bigamy
Flea
Chest
Raspberry
Endive
Steer
Floor
Minuet
Ossington
Elevator
Revivalist
Heron
Lute
Kerosine
Admiral
Slime
Sitar
Valley
Motel
Oxygen
Sneer
Pickerel
Violin
Brother
Tangerine
Rayon
Cents
Alabaster
Tomato
Teacher
Duchess
Butane
Samba
Bug
Sparrow
Aquarium
Plane
Monsoon
Ventricle
Dine
Cougar
Volcano
Poetry
Panda
Sander
Root
Lemon
Spoon
Elbow
Cloudy
Skirt
Wall
Ivory
Cedar
Uranium
Canary
Socialist
Rector
Centipede
Beetle
Celery
Brandy
Bus
Hockey
Journal
Petunia
Salmon
Ferry
Plateau
Leukemia
Dolphin
Farmer
Parsley
Stereo
Clock
Waltz
Ashtray
Revolver
Church
Pamphlet
Low-similarity pool
Aluminum
Arthritis
Cheek
Chin
Crater8
Dentist3
Diarrhea
Drought
Emu
Fox
Platinum
Bronchitis
Beak
Skin
Geyser
Pharmacist
Gonorrhea
Waterspout
Cuckoo
Ox
Hog
Frog
Giraffe
Grater
Calf
Beater
Hip
Jam
Lip
Ham
Savory
Piano
Watch
Sunny
Desk
Rugby
Denim
Chair
Doctor
Cathedral
Larceny
Harmonica
Pigeon
Radio
Car
455
LEVELS AND SIMILARITY
Appendix B (continued)
Mean rating
Positive pair
Negative
Category
Rhyme
4.25
2.75
2.00
2.75
6.88
6.88
438
2.63
6.75
4.75
4.25
4.88
6.88
5.88
6.50
6.75
2.38
2.38
1.63
0.88
3.88
0.63
1.00
5.63
6.75
6.75
4.42
Low-similarity pool
Judo*8
Liver
Merchant
Moose
Mice
Musician
Pajama
Pencil
Perjury
Play"
Refrigerator
Sailor
Sleet
Slipper
Sock
Squirrel11
Tubaa
Tuna"
Tundra*
Udder
Ulcera 8
Vanilla
Warm
Wheel
Wrench
M across items
Polo
Finger
Accountant
Goose
Lice
Electrician
Bandana
Stencil
Forgery
Essay
Percolator
Tailor
Heat
Zipper
Frock
Camel
Marimba
Piranha
Delta
Bladder
Fever
Paprika
Storm
Reel
Bench
Hornet
Kangaroo
Blender
Spittoon
Saxophone
Canvas
Bison
Kitchen
Drizzle
Hemophilia
Buggy
Tango
Magazine
Broccoli
Leprosy
Apartment
Torso
Daiquiri
Senator
Surfing
Raven
Poplar
Marlin
Gorilla
Cheetah
5.25
3.75
4.38
3.25
3.13
3.38
5.63
4.25
4.50
3.13
2.75
2.75
4.50
4.25
1.50
4,50
3.50
3,63
4.00
4.63
2.63
5.25
2.50
3.90
' Pair used in Experiment 1 also.h Pair of Toronto street names.
Received January 17, 1986
Revision received August 27, 1986
Accepted September I, 1986