Memory Processes in Frequency Judgment:
The impact of pre-experimental frequencies and
co-occurrences on frequency estimates.
Dissertation
submitted for the academic degree of
doctor rerum naturalium (Dr. rer. nat.)
to the Philosophische Fakultät of the
Technische Universität Chemnitz
by Frank Renkewitz, born on 9 October 1970 in Issum
Chemnitz, 4 February 2004
Preface
Many people supported me in many different ways while I worked on this
dissertation, and I do not want to miss the opportunity to thank them. Prof. Dr. Peter
Sedlmeier and Prof. Dr. Manfred Wettler played a substantial part in making it possible
for me to develop and complete this dissertation in a consistently pleasant atmosphere.
Whenever I ran into problems of content or organization during this work, they were
always approachable. I owe thanks to Petra Seidensticker for programming the
simulations reported in Chapter 3. Alexander Bischof supported me in programming
the experiment control software for some of the experiments. Conny Belger, Sonja
Kunze, Susann Porstmann and Maya Reimer helped me conduct the studies. I thank
Johannes Hönekopp, Andreas Keinath, Martin Baumann and Gesine Heisterkamp for
their willingness to engage in numerous and surely tiring discussions about frequency
estimates, but above all for their exceptionally friendly support in all aspects of this
work. This dissertation has undoubtedly benefited greatly from their suggestions,
corrections and encouragement. Finally, my thanks go to Birgit Aufderheide and to
Dieter and Christa Renkewitz, whose help extends far beyond my struggles with this
dissertation and accompanies me always.
Contents
CHAPTER 1:
JUDGMENTS ON FREQUENCIES: INTRODUCTION AND OVERVIEW
CHAPTER 2:
IS THE FAMOUS-NAMES EFFECT CAUSED BY DIFFERENCES IN THE
PRE-EXPERIMENTAL FREQUENCIES OF THE NAMES?
A TEST OF A HYPOTHESIS OF MINERVA-DM AND PASS
Summary
Introduction
Demonstrations of Faulty Estimates
The Availability Heuristic as an Explanation for Biased Estimates
Memory Models of Frequency Judgment
Biased Estimates Reconsidered
An Alternative Account for the Famous-Names Effect
Overview of the Experiments
Experiment 1
    Method
    Predictions of the Memory Models
    Results
    Discussion
Experiment 2
    Method
    Predictions of the Memory Models
    Results
    Discussion
Experiment 3
    Method
    Predictions of the Memory Models
    Results
    Discussion
Experiment 4
    Method
    Predictions of the Memory Models
    Results
    Discussion
General Discussion
Conclusion
CHAPTER 3:
EFFECTS OF ASSOCIATIONS ON FREQUENCY JUDGMENTS:
SIMULATIONS WITH PASS AND EMPIRICAL RESULTS
Summary
Introduction
    What Kinds of Frequency Judgments can be modeled with PASS?
The PASS-Model
    FEN – Architecture and Learning Rule
    The Generation of Frequency Judgments
    Former Simulations with PASS
Overview of the Simulations and Experiments
Study 1
    Method
    Simulation Method
    Simulation Results
    Experimental Results
    Discussion
Study 2
    Method
    Simulation Method and Results
    Experimental Results
    Discussion
General Discussion
    The Explanatory Power of PASS
    What about other Memory Models of Frequency Judgment?
    What Problems Remain in Predicting Frequency Estimates with PASS?
    Beyond Memory Assessment: Transforming Intuitions into Absolute
    Frequency Judgments
CHAPTER 4:
SUMMARY AND OUTLOOK
Can Memory Models Account For the Famous-Names Effect?
The Influence of Associations on Frequency Judgments
REFERENCES
CHAPTER 1
Judgments on Frequencies: Introduction and
Overview
Psychologists have shown a long-standing interest in people’s estimates of
absolute and relative frequencies and the processes underlying such estimates. One
motivation for this interest can be seen in the fact that judgments about frequencies
permeate our daily life. For instance, to select the appropriate rate of an Internet
provider, we have to assess how often we usually use the Internet. Before we decide to
leave the umbrella at home, we may wonder how often it rains in the afternoon
although the sky is blue in the morning. Before we take out insurance, we will
probably consider how often the misfortune in question may occur. Judgments on
the frequency of certain student mistakes may influence a teacher’s lesson plan. A
doctor will often ask about the frequency of particular symptoms. Her diagnosis will
be influenced by how often the symptoms under consideration have indicated a
specific disease in her professional practice (Eraker & Politser, 1988; Weber et al.,
1993).
Some of these examples already point to a second motivation for the interest in
frequency processing: The experience of frequencies plays a significant role in
numerous areas of thinking and judgment. To highlight the diversity of the cognitive
processes that are affected by frequencies, let us name a few of them. The frequency of
jointly occurring objects or features of objects is the decisive input for categorization
performance and contingency learning (Hintzman, 1988; Shanks, 1995; Smith &
Medin, 1981). Probability judgments can be based on the frequency with which certain
events occurred in the past (Reyna & Brainerd, 1994; Sedlmeier, 1999; Tversky &
Kahneman, 1973). In turn, judgments on probabilities and relative frequencies are an
essential determinant of decisions (e.g. Yates, 1990). Valence judgments on a neutral
stimulus are influenced by its presentation frequency (Zajonc, 1968; Bornstein, 1989).
The extent of happiness in life depends, among other things, on the perceived
frequency of positive and negative life events (Diener, Sandvik & Pavot, 1991). As a
last example, frequencies affect various social judgments such as attributions (Kelley,
1973; Cheng, 1997; Cheng & Novick, 1992) or stereotyping (Hamilton & Gifford,
1976; Schaller & Maas, 1989; Schaller et al., 1996).
In keeping with the manifold effects of frequencies, the question of how people
process frequencies and how they judge them on the basis of this processing has been
examined in various sub-disciplines of psychology. Thus, there are numerous studies
on frequency judgments in experimental social psychology (e.g. Betsch et al., 1999;
Manis et al., 1993; Schwarz et al., 1991), in developmental psychology (e.g. Fischbein,
1975; Ginsburg & Rapoport, 1967; Huber, 1993; Kuzmak & Gelman, 1986) or in the
literature on survey methods (e.g. Blair & Burton, 1987; Conrad, Brown, & Cashman,
1998; Menon, Raghubir, & Schwarz, 1995). However, the longest tradition of research
on frequency judgment can be found in the area of judgment and decision-making and
in the literature on memory and learning. Unfortunately, methods, theoretical
assumptions and results from one of those areas have often been neglected in the other.
As a result, the two areas have arrived at completely contradictory conclusions even
on questions that may appear simple to settle empirically at first sight. The most
prominent example of this kind concerns the accuracy of frequency judgments.
A particular interest in the quality of frequency judgments chiefly existed in the
area of decision-making. Here, research on frequency estimation was heavily
influenced by the heuristics and biases program (Kahneman, Slovic, & Tversky, 1982).
At the core of this program was the assumption that people use a toolbox of heuristics
to solve the complex tasks of generating judgments on frequencies or probabilities and
of making predictions on uncertain quantities. In order to explain frequency judgments
on serially encoded events, which are in the focus of this thesis, it was assumed for the
most part that the availability heuristic is selected from the toolbox (e.g. Tversky &
Kahneman, 1973; Beyth-Marom & Fischhoff, 1977; Lichtenstein et al., 1978).
According to the availability heuristic, judgments about the frequency of an event can
be based on the ease with which instances of that event can be brought to mind
(Tversky & Kahneman, 1973). Although it was assumed that the heuristic can lead to
reasonable estimates under favorable conditions, the typical procedure to test the
availability hypothesis was to create an experimental situation in which the heuristic
predicted that the estimates would not preserve the rank ordering of the actual event
frequencies and to check whether the predicted error occurred. This procedure produced
several impressive demonstrations of biased frequency judgments. However, another
consequence of this test strategy was that evidence for the heuristic stemmed almost
exclusively from faulty estimates. Starting from these demonstrations of biases, the
conclusion was drawn in the literature on judgment and decision-making that
frequency estimates are generally error-prone. This conclusion gained enormous
popularity far beyond the heuristics and biases program and entered numerous
textbooks on cognitive psychology (e.g. Mayer, 1992; Anderson, 1996; Yates, 1990)
and social psychology (e.g. Fiske & Taylor, 1991).
For a long time, the point of view that frequency judgments are susceptible to
biases remained astonishingly unaffected by the fact that a large number of empirical
studies appeared in parallel in the literature on memory and learning in which the
participants showed a remarkable sensitivity to frequencies. Accurate frequency
judgments were found for various events from everyday life such as letters, names,
words, professions or karate techniques within training sessions (Attneave, 1953;
Bedon & Howard, 1992; Shapiro, 1969; Tryk, 1968; Zechmeister et al., 1975). The
same held in laboratory studies in which, even under difficult conditions,
participants regularly gave reasonable estimates of the experimentally varied
frequencies of stimuli previously presented on lists (e.g. Greene, 1984). From the
multitude of corresponding results a consensus arose in the literature on memory and
learning which exactly contradicted the conclusion from the research on judgment and
decision-making: It was generally assumed that frequency judgments ordinarily reflect
the actual frequencies of the events in question fairly accurately (e.g. Peterson &
Beach, 1967; Howell, 1973; Estes, 1976; Jonides & Jones, 1992). Hasher and Zacks
(1984) presumably took the most explicit position of this kind when they formulated the
“automatic coding hypothesis”. In this approach, frequency was regarded as one of the
few attributes of events or objects that are registered automatically. According to this
hypothesis, the encoding of frequency information requires little or no attentional
capacity and works without any awareness or intention (see also Hasher & Zacks, 1979;
Zacks, Hasher, & Sanft, 1982; Zacks & Hasher, 2002). In this view, people will obviously not have to
rely on heuristics to generate frequency judgments since the required information will
regularly be stored in memory.
Within the memory literature, several precise computational models have been
suggested that allow the prediction of frequency judgments (e.g. Dougherty et al.,
1999; Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1993; Sedlmeier, 1999).
All these models incorporate the assumption that every event encoded with a minimum
of attention modifies the memory representation. Accordingly, in these
models the state of the memory representation at the time of judgment can be used to
produce a signal, the intensity of which depends on the frequency with which the event
in question was encoded. Although the models differ clearly regarding their
architecture, their specific representation assumptions and the processes with which
the memory representation is accessed, they therefore make the same basic prediction:
Frequency judgments will mostly reflect the actual frequencies fairly well. In all
models, this prediction is subject to one general restriction. Since the memory
representation of an event is not a perfect copy of this event, regression to the mean is
expected. Accordingly, small frequencies should be overestimated and large
frequencies underestimated. In fact, regression has been observed in numerous studies (e.g.
Attneave, 1953; Fiedler, 1991; Hintzman, 1969; Varey et al., 1990). Today there is
hardly any dissent that the fundamental predictions of the memory models
correctly describe the typical findings on frequency judgments (e.g. Sedlmeier et al.,
2002; Fiedler, 1996): estimates are mostly calibrated fairly well, but regressed.
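The basic prediction shared by these models can be illustrated with a small simulation. The following Python sketch is loosely patterned on multiple-trace models of the kind proposed by Hintzman (1988): every presentation stores a degraded copy of the event, and a probe produces a summed signal over all stored traces. It is an illustration under simplifying assumptions, not an implementation of any of the cited models; the function names, the feature coding and all parameter values (20 features, a retention probability of 0.6, 200 simulated lists) are hypothetical choices made for this example.

```python
import random


def make_item(n_features, rng):
    # Hypothetical representation: an event is a vector of +1/-1 features.
    return [rng.choice((-1, 1)) for _ in range(n_features)]


def encode(item, learning_rate, rng):
    # Each presentation stores an imperfect copy: a feature is retained with
    # probability learning_rate and otherwise stored as 0 (missing).
    return [f if rng.random() < learning_rate else 0 for f in item]


def similarity(probe, trace):
    # Dot product, normalized by the number of features that are non-zero
    # in the probe or in the trace.
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    # Memory signal: cubed similarity summed over all stored traces.
    return sum(similarity(probe, trace) ** 3 for trace in memory)


def mean_intensities(frequencies, runs=200, n_features=20,
                     learning_rate=0.6, seed=1):
    # Average signal per presentation frequency over many simulated lists.
    rng = random.Random(seed)
    totals = [0.0] * len(frequencies)
    for _ in range(runs):
        items = [make_item(n_features, rng) for _ in frequencies]
        memory = [encode(item, learning_rate, rng)
                  for item, freq in zip(items, frequencies)
                  for _ in range(freq)]
        for i, item in enumerate(items):
            totals[i] += echo_intensity(item, memory)
    return [total / runs for total in totals]


# Example: four items presented 1, 3, 6 and 10 times on each simulated list.
for freq, signal in zip((1, 3, 6, 10), mean_intensities((1, 3, 6, 10))):
    print(freq, round(signal, 2))
```

Averaged over the simulated lists, the signal grows with presentation frequency, while the signal obtained from a single list is noisy because each trace is an imperfect copy. The sketch does not model how the signal is translated into a numerical estimate.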
However, this does not mean that the models assume that the memory for
frequencies is invariant. On the contrary, it is expected that factors affecting the
encoding or representation of events also have an effect on frequency judgments on
these events. In research on memory models, recognition judgments have often
been regarded as a special case of frequency judgments (in which only the frequencies
0 and not-0 are differentiated). In line with the expectations of the models,
several factors have indeed been identified that exert a reliable influence on both kinds of
judgments such as item similarity (Hintzman, 1988), depth of encoding (Greene, 1988)
or context effects (Brown, 1995). Moreover, recency effects (Lopez et al., 1998;
Sedlmeier, 1999) as well as list length and spacing effects (Hintzman, 1988) in frequency
judgments have been successfully predicted or explained with memory models. Finally,
phenomena beyond simple frequency or recognition judgments, such as people’s
confidence in their frequency estimates, have also been simulated (Sedlmeier, 1999).
Altogether, research on frequency judgments within the memory paradigm
has been very fruitful both empirically and theoretically. Empirically, researchers succeeded
in finding a series of stable phenomena that allow conclusions on how memory
processes frequencies. Theoretically, models of memory were developed that are able
to account for these phenomena and that have an explanatory scope which is often not
confined to frequency estimates. Contemporary research within the memory paradigm
primarily aims at the further evaluation of these models. This is mainly evidenced by
studies in which new predictions about memory factors influencing frequency
estimates are derived and tested (e.g. Murdock, 1999; Hintzman, 2001; Dougherty &
Franco-Watkins, 2003).
For a long time, however, the theoretical approaches from memory research
were not applied to the biases described in the literature on judgment and
decision-making. This has changed only recently. More recent memory models no longer ignore the
demonstrations of faulty estimates from the heuristics and biases program. Dougherty,
Gettys and Ogden (1999) suggested MINERVA-DM. The aim of this model is to
preserve the advantages of earlier memory models and to explain a multitude of
judgmental biases at the same time. Within the PASS model (Sedlmeier, 1999),
attempts have also been made to explain some of the demonstrations of faulty frequency
estimates on serially encoded events by memory processes. These integrative
approaches thus challenge the assumption that the biased frequency estimates observed
in the judgment and decision-making literature must be ascribed to the use of the
availability heuristic.
The present dissertation includes two series of studies that fit into the two
trends in memory research on frequency estimates described above. In chapter 2, I
tested a memory-model-based explanation of one of the best-known and most
frequently cited biases, the famous-names effect. Tversky and Kahneman (1973) demonstrated
that after the presentation of a list with a similar number of female and male names,
participants overestimate the frequency of the sex that was represented on the list by
the more famous names. Since the more famous names were also recalled better, this effect was
usually explained with the availability heuristic. An alternative explanation was
recently suggested both in MINERVA-DM (Dougherty et al., 1999) and in PASS
(Sedlmeier, 1999). In both models it is assumed that the property of famous names
eliciting the bias is their higher pre-experimental frequency of occurrence. The
memory models suggest furthermore that people are not able to discriminate perfectly
between different encoding contexts of the same event. Thus, if the memory
representation is accessed to estimate the experimental frequency of famous names,
then the resulting memory signal will be influenced by the extra-experimental
occurrence of the names. More famous names thus elicit a stronger signal than less
famous names. In this view, the assumption that single exemplars are retrieved is not
necessary to explain the famous-names effect. An implication of this explanation is
that an analogous effect should occur with stimulus material other than famous names:
If the frequencies of two categories are to be judged, the category whose exemplars
occur more often outside the target context should be overestimated. I
tested this hypothesis in four experiments. The predictions of the memory models were
compared with the predictions of a recall-estimate version of the availability heuristic.
This procedure makes it possible to determine whether the occurrence of a bias is indeed
caused by the processes described in the memory models or by the use of the availability
heuristic.
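The logic of this alternative explanation can be made concrete with a toy calculation. The Python sketch below assumes, purely for illustration, that each name was presented once and that a fixed proportion of a name's pre-experimental occurrences (the hypothetical `leak` parameter) is added to its memory signal; the category signal is the sum over names. Neither MINERVA-DM nor PASS computes its signal this way, and the pre-experimental counts are invented. Only the list composition (19 famous versus 20 less famous names) follows the design used by Tversky and Kahneman (1973).

```python
def category_signal(pre_experimental_counts, leak):
    """Summed memory signal for one category of names.

    Each name contributes 1.0 for its single experimental presentation plus
    a share of its pre-experimental occurrences (hypothetical `leak`).
    """
    return sum(1.0 + leak * count for count in pre_experimental_counts)


def judged_more_frequent(famous_counts, less_famous_counts, leak):
    """Return the category with the stronger summed signal."""
    famous = category_signal(famous_counts, leak)
    less_famous = category_signal(less_famous_counts, leak)
    if famous == less_famous:
        return "tie"
    return "famous" if famous > less_famous else "less famous"


# Invented pre-experimental occurrence counts, for illustration only.
famous_names = [50] * 19       # 19 very famous names
less_famous_names = [5] * 20   # 20 less famous names

# Perfect context discrimination versus a small amount of leakage.
print(judged_more_frequent(famous_names, less_famous_names, leak=0.0))
print(judged_more_frequent(famous_names, less_famous_names, leak=0.01))
```

With perfect context discrimination (`leak=0.0`) the objectively larger category produces the stronger signal. With even a small leak, the category containing the more famous names wins, without any exemplar being retrieved.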
In chapter 3 new predictions about a factor influencing the memory for
frequencies are derived from the PASS model. Earlier studies indicate that estimates
on the experimental frequency of a word increase with the number of jointly presented
associated words (Leicht, 1968; Shaughnessy & Underwood, 1973; Vereb & Voss,
1974). PASS can offer an explanation for the effect of the factor ‘association’. In this
memory model the ability to judge the frequencies of serially encoded events is
regarded as a by-product of associative learning. Accordingly, knowledge about the
frequencies of events is represented in memory in the form of associative strengths
between features constituting these events. The acquisition of these associative
strengths is modeled by a neural net that successively encodes events. In this process,
PASS also acquires associations between events (or objects). Sedlmeier, Wettler and
Seidensticker (2004) have now demonstrated that this property of the model can be
successfully used to predict human primary associations to stimulus words if the model
has encoded a sufficiently large amount of text beforehand. This indicates that PASS can
map the associative strengths between words that participants have acquired
before the experiment. Other memory models of frequency judgment lack
this ability. To the best of my knowledge, PASS is thus the only model able to
describe the acquisition of associations between words, the experimental encoding of
these words and, finally, the generation of frequency estimates on these words.
In simulations with PASS, I used this capability of the model to gain precise
predictions about the effect of associations between words on frequency estimates.
These predictions were tested in two experiments in which the participants were
presented with lists containing words that were either strongly or weakly associated
with one another. The results indicate that associative learning is more than just a
plausible candidate for the memory processes underlying frequency estimates.
In chapter 4 the results of both series of studies are summarized briefly and
some tentative guidelines for future research are suggested.
CHAPTER 2
Is the Famous-Names Effect Caused by Differences in
the Pre-Experimental Frequencies of the Names?
A Test of a Hypothesis of MINERVA-DM and PASS.
Summary
One of the best-known demonstrations of biased frequency judgments is the
famous-names effect (Tversky & Kahneman, 1973). If participants are presented with
a list of names of female and male persons, they overestimate the frequency of the sex
that was represented by the more famous names on the list. Usually, this
finding is explained with the availability heuristic. According to this heuristic,
frequency judgments are based on the ease-of-retrieval of relevant exemplars. An
alternative explanation of the effect was suggested within the framework of the models
MINERVA-DM (Dougherty, Gettys, & Ogden, 1999) and PASS (Sedlmeier, 1999).
These models postulate that frequency judgments are generated with a retrieval-free
memory assessment process. It is assumed in both models that the difference in the
pre-experimental frequencies of famous and non-famous names is responsible for the effect.
If this is correct, the famous-names effect should generalize to any stimulus material:
When the frequencies of two categories in a target context are judged, the frequency of
the category whose exemplars occur more often outside the target context should be
overestimated. We tested this hypothesis in four experiments. The
predictions of the memory models were compared with the predictions of a
recall-estimate version of the availability heuristic. We found that differences in the
frequencies of exemplars are not sufficient to produce a bias in category size
judgments. The results suggest furthermore that an overestimation of a category only
occurred if the participants based their judgments at least partially on the number of
retrieved exemplars. However, the participants did not do so in all experiments. We
conclude that the entire result pattern is explained best within a multiple-strategy
perspective (Brown, 2002) and name factors that may foster the use of a
recall-estimate strategy.
Introduction
How accurately can people estimate the absolute and relative frequencies of
events? Surprisingly, two contrasting answers to this question can be found in the
literature. In the area of research on judgment and decision-making, frequency estimates
are generally described as error-prone. This point of view originated from the
heuristics and biases program (Kahneman, Slovic, & Tversky, 1982) in which several
experimental demonstrations of faulty estimates were produced (e.g. Tversky &
Kahneman, 1973; 1974). Theoretically, these demonstrations were considered as
evidence for the assumption that people use heuristics to simplify and solve the overly
demanding task of frequency estimation. Both the conclusion that frequency
judgments will often be biased and the explanation that this is caused by the
application of heuristics received wide attention and gained acceptance in various
sub-disciplines of psychology.
However, a completely different point of view on the accuracy of frequency
estimates and the processes underlying such estimates can be taken from the literature
on memory and learning. Here, frequency judgments were mostly explained with
precise memory models (e.g. Hintzman, 1969; 1988; Murdock, 1993; Dougherty,
Gettys, & Ogden, 1999; Sedlmeier, 1999). In these models it is generally assumed that
the memory representation of an event changes with the frequency with which the
event was encoded. Furthermore it is assumed that people are usually able to access
the memory representation in a way that allows them to generate frequency estimates.
Thus, according to the memory models, it is at least not necessary to apply heuristics.
Instead people should mostly be able to estimate frequencies with little effort (e.g.
Brown, 2002) and fairly accurately (e.g. Hasher & Zacks, 1984).
In line with the assumptions of the models, a multitude of studies were
published in the memory literature in which the participants showed a remarkable
sensitivity to frequencies. This held both for estimates of naturally varying
frequencies from everyday life (e.g. Attneave, 1953; Bedon & Howard, 1992; Shapiro,
1969; Tryk, 1968; Zechmeister et al., 1975) and for estimates of experimentally varied
frequencies in laboratory studies (e.g. Greene, 1986; Jonides & Naveh-Benjamin,
1987; Hockley & Christi, 1996). Furthermore, laboratory studies showed that quite
accurate estimates can be expected both in so-called frequency of occurrence
judgments (e.g., Flexser & Bower, 1975; Hasher & Chromiak, 1977; Hintzman, 1969;
Hintzman & Block, 1971) and in so-called set size judgments (e.g., Alba, Chromiak,
Hasher, & Attig, 1980; Barsalou & Ross, 1986; Brooks, 1985; Watkins & LeCompte,
1991). In frequency of occurrence judgments the required estimate refers to the
repeated occurrence of the same item (e.g. “How often did the word ‘oak’ appear in
the list you just studied?”). In set size judgments the correct frequency estimate is
the number of different exemplars of a given category that were presented (e.g. “How
many different kinds of trees appeared on the list?”).
Consequently, within the memory literature the findings on frequency estimates
were often summarized in a way that was opposed to the conclusions from the area of
judgment and decision-making. It was generally inferred that people are mostly able to
judge event frequencies with great fidelity (e.g. Peterson & Beach, 1967; Howell,
1973; Estes, 1976; Hasher & Zacks, 1984; Jonides & Jones, 1992).
Demonstrations of Faulty Estimates
Given the repeatedly observed accuracy of frequency estimates, one may
wonder how the assumption that frequency estimates are generally error-prone could
gain acceptance in many areas of psychology. Part of the answer can presumably be
found in the great popularity of the comparatively few demonstrations of inaccurate
estimates. Below, three of these demonstrations are described.
In the letter study Tversky and Kahneman (1973) asked their participants to
estimate the relative frequency of certain letters in the first and third position of words
of the English language. The participants were first asked to specify whether a given
letter appears more often in the first or third position. Afterwards, they were asked to
estimate the ratio of the frequencies of occurrence of the letter in question in the two positions.
Tversky and Kahneman used the five consonants k, l, n, r and v which all, in contrast
to the majority of consonants, appear more often in the third place. However, the
participants did not recognize this. For all five letters the majority of participants
estimated that they occur more frequently in the initial position. The median of the
estimated ratio of frequencies in the first and third place was 2 : 1 for all letters.
Another study in which the participants at least partially gave inaccurate
estimates was conducted by Lichtenstein, Slovic, Fischhoff, Layman and Combs
(1978). These authors asked their participants for judgments on the frequencies of 41
different causes of death. Generally, the participants demonstrated sensitivity to the
frequencies of these causes of death – the median of the correlations of the individual
estimates with the actual frequencies amounted to approximately r = .70 in several
experimental groups. Nevertheless, the estimates deviated systematically
from the normative responses in two respects. First, the estimates were
regressed, that is, large frequencies were underestimated and small frequencies
overestimated. Regression was also observed in many other older as well as more
recent studies (e.g. Attneave, 1953; Erlick, 1964; Greene, 1984; Varey, Mellers, &
Birnbaum, 1990) and must generally be regarded as a qualification of the view
that frequency estimates correspond fairly precisely to the actual frequencies.
Second, and far more surprisingly, the estimates for some causes of death also
consistently deviated from the regression line in several experimental groups.
The frequencies of deaths caused by catastrophic events that mostly receive
intense media coverage were overestimated compared to the values predicted by the
regression. Examples are tornados, floods and homicides, but also cancer. Less salient
causes of death, such as asthma, diabetes or strokes, on the other hand, received lower
estimates than predicted.
A third very well-known demonstration of biased estimates is the famous-names
study (Tversky & Kahneman, 1973). In this study the participants were
presented with a list of 39 names. The list contained either 19 names of very famous
men and 20 names of less famous women or, exactly the reverse, 20 names of less
famous men and 19 names of very famous women. Tversky and Kahneman
asked only for an ordinal frequency judgment after the list presentation. In other
words, the participants were asked to indicate whether the list contained more men’s or
women’s names. The result was clear-cut: 81%
of the participants judged erroneously that the sex represented with more famous
names on the list appeared more often.
The Availability Heuristic as an Explanation for Biased Estimates
Tversky and Kahneman (1973; 1974) suggested several heuristics to account
for different phenomena observed in judgments on frequencies and probabilities and in
predictions of uncertain quantities. For the case of frequency judgments on serially
encoded events it was mostly assumed that people rely on the availability heuristic.
This was also true for the demonstrations of faulty estimates reported here.
According to the availability heuristic, frequency judgments can be obtained by
assessing the “ease of retrieval”. Thus, when this heuristic is applied, frequency
estimates on an event or a category should be based on how easily relevant instances
or exemplars can be recalled from memory. The biases observed in frequency
judgments were explained by the assumption that availability is influenced not only by
the objective frequency of the event in question but also by a series of further factors.
Examples of such factors can be taken from the study of frequency judgments on
causes of death. Lichtenstein et al. (1978) demonstrated that the over- and
underestimation of the frequencies of different causes of death could be predicted
fairly well by means of two variables. These variables were the frequency with which
these causes are reported in newspapers and ratings of the catastrophic quality of these
causes. While no direct availability measure was collected in the study, the authors
assumed that these variables influence the ease of retrieval of the causes of death.
Correspondingly, they regarded their results as being consistent with the view that the
frequency judgments were generated with the availability heuristic.
The results of the letter study were explained in a similar way. Here as well
availability was not measured independently. Tversky and Kahneman, however,
claimed correctly (see Sedlmeier, Hertwig, & Gigerenzer, 1998) that words beginning
with a certain consonant could be retrieved from memory more easily than words in
which this consonant occurs in third position. The observed overestimation of the
frequencies of consonants in the initial position could thus be attributed to the
application of the availability heuristic.
In contrast to the other studies, a direct measure of the ease of retrieval was
used in the famous names study. Each list of names was presented to two groups of
participants here. Subsequent to the presentation one group gave the ordinal frequency
judgments already described. The participants of the other group were asked to recall
as many names on the list as possible. In accordance with the hypothesis that the
frequency estimates were generated by means of the availability heuristic, it could be
shown that the more famous names were easier to retrieve. On average, 12.3 of the
famous names and 8.4 of the less famous names were recalled. Approximately 85% of
the participants recalled more or at least as many names of the sex represented with the
more famous names on the list.
Tversky and Kahneman (1973) suggested that availability could be generally
measured by means of the number of retrieved instances as in the famous-names study.
In addition to this, they assumed that people judging the frequency of an event can
assess the availability of this event in a similar way. According to the proposed
mechanism, frequency estimates based on availability can be generated by recalling as
many instances of the event as possible for a certain period of time. The number of
recalled instances then serves as a basis for the frequency estimate on this event.
The measurement of availability by the number of recalled instances became a
widespread practice in further research (e.g. Beyth-Marom & Fischhoff, 1977;
Barsalou & Ross, 1986; Williams & Durso, 1986; Bruce, Hockley, & Craik, 1991;
Brooks, 1985). The corresponding cognitive mechanism received different labels, such
as exemplar-retrieval hypothesis (Greene, 1989), recall-estimate theory (Watkins &
LeCompte, 1991), enumeration (Brown, 1995) or availability-by-number (Sedlmeier,
Hertwig, & Gigerenzer, 1998). A reason for the fact that later researchers preferred
different labels for this mechanism was that the notion of the availability heuristic was
originally phrased in a way that is consistent with several different cognitive
mechanisms (Fiedler, 1983; Schwarz & Wänke, 2002; Betsch & Pohl, 2002). Thus,
Tversky and Kahneman (1973) emphasized that a person using this heuristic can also
assess availability alternatively without performing a retrieval operation. However,
they did not specify the process that allows for such a retrieval-free judgment. In the
subsequent research no generally accepted measure of availability evolved which can
be collected independent of a recall test. The vague phrasing of the availability
heuristic thus left space for manifold interpretations of this notion. In later literature
very different forms of memory assessment were suggested and yet likewise described
as possible availability mechanisms, among them assessment of processing fluency
(Reber & Zupanek, 2002), subjective experience of ease-of-retrieval (Schwarz et al.,
1991; Wänke, Schwarz, & Bless, 1995), assessment of familiarity (Dougherty, Gettys,
& Ogden, 1999) or assessment of associative activation (Sedlmeier, 1999).
It is evident that the availability hypothesis cannot be tested on the basis of
such an ambiguous definition (Fiedler, 1983; Betsch & Pohl, 2002). Thus, the term
‘availability’ can hardly be used in a reasonable manner. In the following
investigations of the famous-names effect and its generalizability to different
stimulus material, we will instead use the term recall-estimate strategy. This term
solely denotes a cognitive process for estimating frequencies in which a person first
attempts to retrieve relevant instances or exemplars from memory and then bases the
estimate on the size of the recalled sample. Correspondingly, to derive the frequency
estimates this strategy predicts, we will use the number of exemplars retrieved in a
recall test, analogous to the original famous-names study.
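The recall-estimate strategy can be illustrated with a minimal sketch. It assumes, purely for illustration, that the judged relative frequency of a category equals its share of the recalled exemplars; the function name `recall_estimate` and this proportional mapping are assumptions and are not taken from the original study.

```python
def recall_estimate(recalled_a, recalled_b):
    """Predicted relative frequency of category A, assuming the estimate is
    proportional to the size of the recalled sample (illustrative assumption)."""
    total = recalled_a + recalled_b
    if total == 0:
        return 0.5  # nothing recalled: no basis for preferring either category
    return recalled_a / total


# Mean recall reported for the famous-names study:
# 12.3 famous names versus 8.4 less famous names.
predicted_share_famous = recall_estimate(12.3, 8.4)
print(round(predicted_share_famous, 3))
```

With the mean recall values reported above, this mapping yields a predicted share of roughly 0.59 for the sex represented by the famous names.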
Memory Models of Frequency Judgment
As already alluded to, in the memory literature several detailed computational
models were suggested that can be used to explain frequency judgments (e.g. Gillund
& Shiffrin, 1984; Hintzman, 1988; Murdock, 1993; Fiedler, 1996; Dougherty, Gettys,
& Ogden, 1999; Sedlmeier, 1999). All these models share the assumption that every
additional encoding of an event occurring with a minimum of attention has an effect on
memory representation. Accordingly, in these models the state of the memory
representation at the time of judgment can be used to produce a signal the intensity of
which depends on the frequency of the event in question. Depending on the specific
model, this signal is called trace strength (Hintzman, 1969), familiarity (Hintzman,
1988; Dougherty, Gettys, & Ogden, 1999) or associative response (Sedlmeier, 2002).
Generally, it is considered an equivalent of intuitions about event frequencies. All
models thus also share the assumption that frequency judgments can be
generated without retrieving single instances.
However, the models clearly differ with regard to their specific representation
assumptions and, as a result, also with regard to the process with which the memory
representation is accessed. According to these criteria, the models can be subdivided
into two categories: multiple-trace models (sometimes also termed exemplar models;
Estes, 1986) and associationist models (sometimes also termed prototype models;
Smith & Medin, 1981). In multiple-trace models a trace is added to an indefinitely
increasing memory matrix for each newly encoded event. MINERVA 2 (Hintzman,
1988) is the model of this kind that was most often used to explain frequency
judgments. In this model, a trace with which an event is represented in memory
corresponds to a vector of features. Possible feature values are “1” (feature present),
“-1” (feature not present) and “0” (feature either irrelevant or unknown). Features of an
event are stored correctly in memory with the probability L and are coded as unknown
with the probability 1 - L. To elicit a frequency judgment, memory is probed with the
vector that represents the event in question. In order to determine the ‘response’ of
memory to this probe, the similarity between each trace and the probe is determined.
The similarity of a single trace to the probe results from the scalar product of the two
feature vectors¹. The familiarity signal used to model frequency judgments
corresponds to the sum of the cubed similarities of all traces. As far as all other variables
remain unchanged, a more frequent event will thus elicit a greater familiarity signal
than a less frequent event since it is represented in memory with more traces.
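The familiarity computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hintzman's implementation: the function names, the vector length of 20 and the value L = 0.7 are hypothetical, and the similarity is normalized by the number of features that are nonzero in at least one of the two vectors.

```python
import random


def encode(event, L, rng):
    """Copy each feature into the trace with probability L; otherwise store 0 (unknown)."""
    return [feature if rng.random() < L else 0 for feature in event]


def similarity(probe, trace):
    """Scalar product divided by the number of features that are nonzero
    in at least one of the two vectors."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def familiarity(probe, memory):
    """Familiarity signal: sum of the cubed similarities of all traces to the probe."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


rng = random.Random(0)
rare = [rng.choice((-1, 1)) for _ in range(20)]
frequent = [rng.choice((-1, 1)) for _ in range(20)]

# One encoding of the rare event, five encodings of the frequent event (L = 0.7).
memory = [encode(rare, 0.7, rng)]
memory += [encode(frequent, 0.7, rng) for _ in range(5)]

print(familiarity(rare, memory), familiarity(frequent, memory))
```

All else being equal, the probe for the event encoded five times should usually return the larger signal, because more traces in memory resemble it.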
In associationist models, on the other hand, events are not stored in the form of
separate traces. An example of an associationist model is PASS (Probability
ASSociator) (Sedlmeier, 1999; 2002). PASS is based on a neural network that acquires
frequency knowledge by associative learning. The network consists of one layer of
units and weights connecting these units with each other². A given unit is activated
each time the appropriate feature is available in an event that is to be encoded. The
input format for PASS therefore consists of vectors, similar to the input format for
MINERVA 2, in which the values 1 and 0 indicate the presence or absence of certain
features in events. To encode an event, the weights (or associative strengths) between
the units are modified. The associative strength between two units is increased if both
corresponding features are available in the event. If only one or neither feature is
available, the associative strength between the units is decreased. The extent of the
modification of the weights in each encoding is regulated by a learning parameter, an
interference parameter and a decay parameter. To elicit a frequency judgment, PASS is
presented with a featural description of the event in question. The activation which this
event triggers at the corresponding units spreads across all units of the network
according to the previously acquired associative strengths. The sum of the activation in
all units is called associative response and is considered the basis for frequency
judgments. As far as all remaining potentially relevant variables are constant, more
frequent events will elicit a stronger response than less frequent ones since higher
associative strengths exist between units that encode their features.
¹ More precisely, the similarity is given by the scalar product divided by the number of features that are
not 0 in both vectors, i.e., that are nonzero in at least one of them.
² Sedlmeier (1999) demonstrated that different network architectures yield similar
simulation results. PASS thus comprises a family of neural networks. The present description refers to a
simple one-layered network called FEN II (see also Sedlmeier, 2002).
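The verbal description of PASS can be illustrated with a simplified Hebbian-style sketch. It is not the FEN II network and does not reproduce Sedlmeier's update equations: the update rule, the single `decrease_rate` (standing in for the interference and decay parameters), the parameter values and all function names are assumptions made for illustration only.

```python
def new_network(n_units):
    """Weight matrix of a one-layer network; all associative strengths start at 0."""
    return [[0.0] * n_units for _ in range(n_units)]


def encode(weights, event, learning_rate=0.1, decrease_rate=0.01):
    """Hebbian-style update: strengthen the link between two units whose features
    are both present (1); weaken it if only one or neither feature is present."""
    n = len(event)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if event[i] == 1 and event[j] == 1:
                weights[i][j] += learning_rate * (1.0 - weights[i][j])
            else:
                weights[i][j] -= decrease_rate * weights[i][j]


def associative_response(weights, probe):
    """Sum, over all units, of the activation spreading from the probe's active units."""
    n = len(probe)
    return sum(
        probe[i] * weights[i][j]
        for j in range(n)
        for i in range(n)
        if i != j
    )


event_a = [1, 1, 0, 0]  # encoded five times
event_b = [0, 0, 1, 1]  # encoded once
net = new_network(4)
for _ in range(5):
    encode(net, event_a)
encode(net, event_b)

print(associative_response(net, event_a), associative_response(net, event_b))
```

In this sketch the event that was encoded more often elicits the stronger response because the weights between its feature units have been increased more often, which mirrors the qualitative argument in the text.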
In spite of their different architectures, multiple-trace models and associationist
models often make similar or identical predictions. Thus, both MINERVA 2 and PASS
were used to simulate the typical finding of quite accurate but regressed frequency
judgments. In addition to this, memory models have been successful in explaining the
influence of diverse factors on frequency judgments. For instance, recency effects
(Lopez et al., 1998; Sedlmeier, 1999), list-length effects and spacing effects
(Hintzman, 1988) in frequency judgments could be simulated with these models.
Biased Estimates Reconsidered
Until recently, however, no attempt was made to explain biased estimates in the
scope of memory models. Just as the memory literature had little influence on
research in the area of judgment and decision-making, the results and explanatory
approaches from the heuristics and biases program were hardly addressed in memory
research. This only changed in the course of the 1990s. Researchers in both
subdisciplines started to carry out studies in which predictions of the availability heuristic
were tested against predictions of the memory models (e.g. Betsch et al., 1999; Manis
et al., 1993; Sedlmeier, Hertwig, & Gigerenzer, 1998; Watkins & LeCompte, 1991;
Williams & Durso, 1986). Additionally, theoretical attempts to integrate the findings
from both research paradigms emerged. An example is MINERVA-DM (with DM
standing for decision making), a modified version of MINERVA 2 that was suggested
by Dougherty, Gettys and Ogden (1999). This model could be used successfully to
simulate both accurate and biased estimates.
However, the greater consideration of results from the area of judgment and
decision-making in memory literature also resulted in a criticism of those studies
demonstrating biases in frequency judgments (e.g. Gigerenzer, 1991). In case of the
letter study (Tversky & Kahneman, 1973), this criticism was mainly empirically
founded. The finding of a general overestimation of the frequency of occurrence of
letters in the initial position of words could not be replicated in subsequent studies.
White (1991) reported in a short article that his participants judged the majority of
consonants also used by Tversky and Kahneman correctly as occurring more often in
third position. Sedlmeier, Hertwig and Gigerenzer (1998) used a greater sample of
letters. The judgment task of their participants was to estimate the relative frequency of
these letters in initial and second position of German words. Additionally, to measure
the availability of words with these letters in the relevant positions, a recall test was
carried out. The result of this recall test confirmed Tversky and Kahneman’s
assumption: Irrespective of how often the letters actually occur in initial and second
position, the participants generally retrieved more words beginning with a given letter
than words showing this letter in second position. However, the result of the judgment
task contradicted the recall-estimate hypothesis. In three successive studies, letters
occurring more often in second position were mostly also judged to appear more often
in this position. Correspondingly, the covariation between the frequency judgments
and the number of retrieved words was low. On the other hand, the estimates could be
predicted very well by the actual positional frequencies of the letters. The only “bias”
evident in the judgments consisted of the well-known phenomenon of regression to the
mean.
Criticism of the study of judgments on the frequencies of causes of death
(Lichtenstein et al., 1978) primarily referred to the normative standard with which the
estimates were compared. In a comment on the study, Shanteau (1978) already pointed
out that most people hardly have any immediate experiences with the repeated
occurrence of a greater number of causes of death. However, if the participants had no
opportunity to learn the frequencies normatively regarded as being correct, it is little
surprising that their estimates deviated from the comparison standard at least in some
cases. Therefore, he considered it tenuous to suggest a psychological interpretation
of the supposed judgment errors in terms of the availability heuristic. Indeed, no
special mechanism may be necessary to explain the errors observed in this study,
because the systematic over- and underestimations of the frequencies of
different causes of death were closely related to the frequency with which newspapers
reported on these causes. This leaves open the interpretation that the estimates did
not correspond to the actual facts but did reflect the frequencies with which the
causes of death were presented to the participants (Sedlmeier, 1999; 2002). In this
view, the result of the study by Lichtenstein et al. (1978) would correspond to the
usual finding in research on frequency judgments: The participants gave quite accurate
but regressed estimates on the frequencies with which they encoded the stimuli. This
finding could in turn easily be explained by memory models.
Different from the letter study and the study of frequency judgments on causes
of death, later research did not call into question that the famous-names study (Tversky
& Kahneman, 1973) demonstrates a reliable bias. Normatively, it is clearly a
systematic error if the large majority of participants judges the category that was
presented less frequently to be the more frequent one. Empirically it could be
shown that the famous-names effect is replicable. In several experiments,
Lewandowsky and Smith (1983) presented lists with a varying number of famous
names of the one sex and non-famous names of the other sex to their participants.
Subsequent to the list presentation, the frequency of the “famous sex” was consistently
overestimated. A study by Manis et al. (1993) in which 20 famous and 20 less famous
names of both sexes were presented yielded the same result.
An Alternative Account for the Famous-Names Effect
However, later research did call into question Tversky and Kahneman’s
original explanation for this bias according to which the participants used a recall-estimate strategy to generate their estimates. Dougherty, Gettys and Ogden (1999) and
Sedlmeier (1999) suggested virtually the same alternative account for the famous-names effect on the basis of memory models. Accordingly, these authors assume that
the famous-names effect is mediated by a cognitive process for which the retrieval of
exemplars from the list of names is without relevance. The starting point of the
alternative explanation is the assumption that the essential feature of famous names is
their increased frequency of occurrence outside the experiment. Consider as an
example the famous name “Marilyn Monroe”. Every participant of a study in which
this name is used as a stimulus will probably have come across the name “Marilyn
Monroe” several hundred times before the study in magazines, films, books, on
posters, in conversations, radio and television broadcasts. Less or non-famous names,
on the other hand, will rarely or not at all have been encoded before the experiment.
Dougherty and colleagues (1999) and Sedlmeier (1999) suppose furthermore that the
participants are not able to discriminate perfectly between the occurrence of stimuli
inside and outside the experiment. Thus, when memory is probed for the frequency of
famous and less famous names after the list presentation, the memory response will be
influenced, to some degree, by the pre-experimental frequency of these names. This
way, the famous names that were presented less often in the experiment could receive
a higher response and thus a higher estimate than the less famous names.
Dougherty et al. (1999) modeled the assumed process with MINERVA-DM³.
In each of their simulations the memory matrix contained 39 traces, corresponding to
the 19 famous and 20 less famous names presented in the experiment. In order to
assess the frequency with which the participants may have encoded the names pre-experimentally, the authors determined how often male and female actors whom they
judged to be highly famous or less famous were cited in some major US-American newspapers. Famous male actors were mentioned 46 times more often in
these newspapers than less famous male actors. For famous and less famous female
actors the ratio amounted to 26 : 1. According to this ratio, in two simulations either 46
or 26 additional extra-experimental traces were stored in memory for every famous
name. For each less famous name the memory matrix was supplemented with one
extra-experimental trace.
³ For the current purpose, the differences between MINERVA 2 and MINERVA-DM are irrelevant. Our
description of MINERVA 2 is thus a valid description of the model which formed the basis of the
simulations described below.
Dougherty and colleagues now assume that the extent to which the pre-experimental occurrence of stimuli influences estimates on the experimental frequency
of these stimuli depends on the contexts in which the stimuli were encoded. The
influence of pre-experimental events is supposed to be the greater the more similar the
context in which these events were encoded is to the experimental context (see also
Dougherty & Franco-Watkins, 2003; Johnson, Hashtroudi, & Lindsay, 1993). To
integrate this “context-discrimination mechanism” (Dougherty & Franco-Watkins,
2002) into the simulation, every trace in memory consisted of two mini-vectors. One
of these mini-vectors represented a name; the second mini-vector represented the
respective encoding context of the name. Correspondingly, each of the 39 intra-experimental traces was complemented by the same (experimental) context mini-vector. The context mini-vector for each extra-experimental trace, on the other hand,
was generated randomly.
The memory probes used to elicit frequency judgments of the model included
the experimental context component. Therefore, intra-experimental traces were
considerably more similar to the probes than extra-experimental traces. Consequently,
every single extra-experimental trace could increase the resulting familiarity signal
only by a comparatively small amount. Due to their large number, the extra-experimental traces of famous names nevertheless affected the simulation result
decisively. On average, famous names received a clearly more intense familiarity
signal than less famous names. After the summation of the familiarity signals to all
famous and less famous names, MINERVA-DM predicted a large overestimation of
the frequency of famous names on the experimental study list. In the simulation with
26 extra-experimental traces for each famous female actor, the model expected an
estimate of approximately 70% for the proportion of famous names on the study list. In
the simulation with 46 extra-experimental traces for each famous name, MINERVA-DM even predicted an estimate of about 80% for this proportion. In contrast to this, the
model estimated the actual relative frequency of 48.7% of famous names on the study
list approximately correctly in a third simulation in which no extra-experimental traces
were included in the memory matrix. Dougherty et al. (1999) concluded from these
results that “[the simulations] have shown that availability effects can be accounted for
by a global familiarity signal without assuming recall is taking place“ (p. 193).
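The logic of these simulations can be sketched in a few lines. The following is an illustrative reconstruction under stated assumptions, not the code of Dougherty et al. (1999): the mini-vector length of 12 features, perfect encoding, the seed, the function names and the way the summed signals are converted into a proportion are all hypothetical choices. The sketch therefore cannot be expected to reproduce the reported estimates of about 70% and 80%; it only shows the direction of the effect.

```python
import random


def similarity(probe, trace):
    """Scalar product divided by the number of features nonzero in either vector."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def familiarity(probe, memory):
    """Sum of the cubed similarities of all traces to the probe."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


def random_vector(n, rng):
    return [rng.choice((-1, 1)) for _ in range(n)]


def estimated_share_famous(extra_per_famous, extra_per_less_famous,
                           n_famous=19, n_less_famous=20,
                           n_features=12, seed=0):
    """Share of the summed familiarity that goes to the famous names.

    Each trace is a name mini-vector followed by a context mini-vector.
    Vector length, seed and perfect encoding are illustrative choices."""
    rng = random.Random(seed)
    experimental_context = random_vector(n_features, rng)
    famous = [random_vector(n_features, rng) for _ in range(n_famous)]
    less_famous = [random_vector(n_features, rng) for _ in range(n_less_famous)]

    memory = []
    for names, n_extra in ((famous, extra_per_famous),
                           (less_famous, extra_per_less_famous)):
        for name in names:
            # One intra-experimental trace with the shared experimental context.
            memory.append(name + experimental_context)
            # Extra-experimental traces, each with a randomly generated context.
            for _ in range(n_extra):
                memory.append(name + random_vector(n_features, rng))

    # Probe with every name plus the experimental context, then sum the signals.
    signal_famous = sum(familiarity(n + experimental_context, memory) for n in famous)
    signal_less = sum(familiarity(n + experimental_context, memory) for n in less_famous)
    return signal_famous / (signal_famous + signal_less)


baseline = estimated_share_famous(0, 0)   # no extra-experimental traces
biased = estimated_share_famous(46, 1)    # 46 : 1 ratio of pre-experimental traces
print(round(baseline, 3), round(biased, 3))
```

Calling `estimated_share_famous(26, 1)` corresponds to the 26 : 1 ratio used in the second simulation. Because the probes contain the experimental context, each extra-experimental trace contributes only a small amount, yet the large number of such traces for famous names shifts the estimated share in their favor.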
Sedlmeier (1999) pointed out that the famous-names effect could be explained
in the scope of the PASS model, again due to the higher pre-experimental frequency of
famous names. While he did not carry out a simulation demonstrating that the model
predicts this effect, he outlined what such a simulation could look like. The suggested
procedure is identical with the simulation method used by Dougherty and colleagues.
In a first step PASS encodes the pre-experimental occurrences of famous and less
famous names together with their respective contexts. After that, every name on the
study list is encoded a further time – now with the experimental context identical for
all names. At the time of judgment, memory is probed with the 39 experimentally
presented names. PASS’ estimates on the proportion of famous names on the study list
result from the sum of the associative responses to the 19 famous names in relation to
the sum of the responses to the 20 less famous names. Thus, the only difference to the
explanation of the famous-names effect with MINERVA-DM consists in the fact that
the repeated (pre-experimental) occurrence of names is in PASS not represented by
additional traces in a memory matrix but by changed associative strengths in a neural
network. Higher associative responses for famous names than for less famous names
are to be expected in this model since the associative strengths between the units of the
network coding the features of a famous name should be already pre-experimentally
higher.
The simulation method suggested by Sedlmeier would undoubtedly lead to the
result that PASS overestimates the frequency of famous names on the study list.
However, it should be noted that this simulation method is definitely not suitable to
map the cognitive process with which the participants judged the frequency of famous
and less famous names in the original study. The same applies to the simulation
method used by Dougherty and colleagues to demonstrate that MINERVA-DM
expects a famous-names effect. In both simulations all 39 names from the study list
were used successively as probes to elicit frequency judgments of the memory models.
The participants of the original study, however, were not presented again with all
names from the study list at the time of judgment. As a result, they could not access
memory by the individual names.
By using individual names as probes, the simulations primarily demonstrate
that memory models expect an overestimation of famous names in the case of
frequency of occurrence judgments. According to this prediction, participants should,
for instance, overestimate the experimental frequency of “Marilyn Monroe” relative to
a less famous name if both names are included in a study list equally often. On the
other hand, in the famous-names study set size judgments on the relative frequency of
male and female names were requested. Correspondingly, the participants could
exclusively use the prompts “male names” and “female names” to access memory. The
simulation of set size judgments by adding the responses to individual names at the
time of judgment can thus not correspond to the actual cognitive process.
Despite this problem, the simulations lend credence to the assumption that
memory models can offer an alternative explanation for the famous-names effect. They
demonstrate that famous names elicit higher memory responses due to their increased
pre-experimental frequency. These increased responses could be aggregated to set size
judgments in a different way than by addition at the time of judgment. Various
aggregation mechanisms are conceivable and could be integrated into the models. The
simplest assumption presumably is that stimuli activate the memory representation of
appropriate superordinate categories already during their encoding. Since famous
names elicit a greater memory response, their superordinate categories should then be
activated more than the superordinate categories of less famous names. Thus, “Marilyn
Monroe” would, for example, activate the category “female” more than a less famous
male name would activate the category “male”. This stronger activation of the memory representation
of “female” could build up through the continual presentation of famous women and less
famous men over the course of the learning phase. Therefore, when memory is probed
for the frequency of male names and female names at the time of judgment the
activations at the level of superordinate categories could be directly accessed.⁴
Consequently, the more ‘famous’ category would receive an inflated frequency
estimate.
⁴ A similar suggestion was made by Alba et al. (1980). These authors found that the absolute number of
exemplars of several categories was estimated quite accurately. Furthermore, the accuracy of the
judgments was not impaired by severe response time restrictions. From this finding Alba et al.
concluded that the basis for category size judgments is already formed during the encoding of the
exemplars and can later be directly retrieved.
Thus, the essence of the alternative explanation for the famous-names effect by
memory models can be regarded as plausible despite the deficient specification and
implementation of an aggregation mechanism: The models explain the effect by the
increased pre-experimental frequency of occurrence of famous names and the
participants’ inability to discriminate perfectly between different encoding
contexts. An important characteristic of this explanation is that it deems other features
of famous names irrelevant. The size of the effect should therefore covary with the
differences in the pre-experimental frequencies of occurrence of famous and less
famous names. Besides, the famous-names effect should generalize to different
stimulus material. Judgments on the relative frequency of any two categories should
exhibit an analogous effect: The category whose exemplars on the study list have
the greater extra-experimental frequency should generally be
overestimated.
In the case of frequency of occurrence judgments, similar hypotheses have
already been tested successfully. Several studies demonstrate that estimates on the
frequencies of stimuli in a target context are influenced by the frequency of these
stimuli outside this target context. Johnson, Taylor and Raye (1977) found that
estimates on how often words appeared on a study list increased the more often the
participants were additionally asked to generate the words internally. Hintzman and
Block (1971) presented their participants with two successive study lists that both
contained the same stimuli with varying frequencies. Estimates on the frequency of a
stimulus on one of the two lists were influenced by the frequency of this stimulus on
the other list. Dougherty and Franco-Watkins (2003) used a similar design. First they
presented a list on which words occurred with different frequencies. On the subsequent
target list these words all had the same frequency. Again, judgments on the target list
frequency of a word increased with the frequency of this word on the previous list. In
this study it could also be demonstrated that the extent of the influence of the
frequencies of occurrence outside the target list is dependent on the similarity of the
encoding contexts. As predicted by the memory models, this influence was smaller
when target list and non-target list were not identically designed.
The studies cited above do not only differ from the situation in the famous-names experiment by the fact that frequency of occurrence judgments were requested
instead of set size judgments. In these studies, the non-target frequency was
manipulated within the experiment. In contrast, according to the explanation of the
memory models, the famous-names effect is caused by the naturally varying
frequencies of occurrence of the names outside the experiment. On the one hand, these
extra-experimental frequencies of occurrence can be assumed to be larger by orders of
magnitude than the frequencies of stimuli in an experimentally presented non-target
list (amounting to a maximum of 16 in the studies described above, see Dougherty and
Franco-Watkins, 2003). On the other hand, the similarity of the encoding context to the
experimental target context is of course greatly reduced in the case of extra-experimental
occurrences of stimuli. Nevertheless, at least two studies show that judgments on the
experimental frequency of occurrence of words can be influenced by the extra-experimental, linguistic frequency of the words. Both Rao (1983) and Greene and
Thapar (1994) found that the experimental frequencies of high-linguistic-frequency
words are discriminated worse than those of low-linguistic-frequency words.
Moreover, Rao reports that in several studies high-linguistic-frequency words with an
experimental frequency of 1 received slightly higher judgments than low-linguistic-frequency words with the same experimental frequency.
The question of whether judgments on the experimental size of categories are also
affected by the frequency of occurrence of their exemplars in different contexts has not
been examined yet. The objective of the four experiments reported below was to
provide such a test. In all studies the participants were presented with exemplars from
two categories occurring with a considerably different frequency outside the target list.
The frequency of occurrence of the exemplars was either varied by using stimuli with
extra-experimentally different frequencies (experiments 1, 2, and 4) or by presenting a
non-target list in advance (experiment 3). If the category whose exemplars
occur more frequently outside the target list is overestimated in these experiments, this
will support the assumption that the famous-names effect can be explained in the scope
of memory models. The assumption that the participants retrieve single exemplars to
generate their frequency judgments would no longer be necessary. On the other hand,
if the expected effect does not occur with stimulus material other than famous names,
the hypothesis of the memory models will have to be rejected. In this case, the famous-names effect could not be attributed to the different pre-experimental frequency of
famous and non-famous names. The explanation of the famous-names effect by
memory models would then be deprived of its basis.
It should be pointed out that a covariation between the extra-experimental
frequency of exemplars and experimental frequency judgments on the corresponding
categories would merely demonstrate that memory models can offer an explanation for
the famous-names effect. Such a result would, on the other hand, not necessarily
indicate that the participants really generated their estimates with a memory
assessment strategy. Alternatively, it can be assumed that the manipulation of extra-experimental frequencies influences recall performances and memory responses in an
equal manner. In this case, the frequency judgments could be traced back to both a
memory assessment strategy and a recall-estimate strategy. In general, whenever these
strategies are tested the problem occurs that factors resulting in an improved memory
representation of events and thus higher memory responses usually also elicit an
enhanced retrieval of these events (Watkins & LeCompte, 1991; Brown, 1997; Manis
et al., 1993; Lewandowsky & Smith, 1983). This also holds for the extra-experimental
frequency manipulated in the current experiments. It is a well-established fact that
high-linguistic-frequency words are better recalled than low-linguistic-frequency
words (e.g. Sumby, 1963). Thus, it can be expected that the recall-estimate
strategy also predicts that the relative frequency of the category with the more frequent
exemplars is overestimated. However, this does not necessarily mean that the
recall-estimate approach and the memory models also arrive at the same quantitative
predictions. For instance, in a study by Watkins and LeCompte (1991) the predictions
of the recall-estimate strategy were too weakly correlated with the actual frequencies
to be able to explain the usual finding of fairly well calibrated category size judgments.
Therefore, in the present experiments we also checked whether the recall-estimate
strategy predicts an overestimation of the category with the more frequent exemplars
that is similar in size to the one predicted by the memory models. A comparison of the
quality of the predictions of both approaches in all experiments should allow a conclusion
as to whether the approaches are really equally suitable to explain the participants’
frequency judgments.
Overview of the Experiments
For all four experiments we chose a design that closely followed the procedure
of the famous-names study. Specifically, we presented the participants in every
experiment with a study list containing 39 different stimuli. These stimuli were
exemplars of two different superordinate categories. One of these categories was
represented with 19 exemplars on the study list, the other with 20 exemplars. The 19
exemplars of the one category always had a clearly greater frequency of occurrence
outside the study list than the 20 exemplars of the other category. Thus, there was
always an inverse relation between the number of exemplars of a category on the study
list and the frequency of occurrence of these exemplars outside the study list. After the
presentation of the study list, the participants judged from which of the two categories
more exemplars were included in the list. Following this ordinal judgment, we asked
the participants for quantitative estimates on the relative frequencies of the categories.
Furthermore, we administered a free recall test in all experiments to be able to derive
quantitative predictions of the recall-estimate strategy. To control for possible order
effects, half of the participants completed this recall test before giving their frequency
estimates and the other half completed the recall test after having given their estimates.
The explanation of the memory models for the famous-names effect is based on
the assumption that more famous names generally occur more often than less-famous
names. Obviously, this explanation can only be valid if the effect is also observed
when names are used in the experiment that were not selected because of their fame
but due to their extra-experimental frequency of occurrence. If the effect did not occur
in this case, a further test of the memory model explanation would be superfluous. In
Experiment 1 we therefore used names of male and female personalities mentioned in
major German newspapers with different frequencies. A further aim of Experiment 1
was to obtain a comparative standard for the evaluation of the size of the effects in the
subsequent experiments: Insofar as the famous-names effect is really caused by the
extra-experimental frequency of occurrence of the names, an analogous effect should
show a similar size when different stimulus material with similar frequencies is
used.
An obvious problem with Experiment 1 is that it does not resolve the supposed
confounding of fame and frequency of occurrence outside the target list. To address this
problem, we used stimulus material that cannot be considered famous in the two
subsequent experiments. In Experiment 2 the study lists contained exemplars of the
categories ‘plants’ and ‘animals’ which pre-experimentally occur with different
frequencies. In Experiment 3 we used female and male names of persons unknown to
the participants. The frequency of occurrence of these names outside the target list was
varied in a preceding learning phase.
For reasons that will become clear from the results of Experiments 1 to 3,
we looked for stimulus material for Experiment 4 that would be as similar as possible to
that of the original famous-names study. We decided on names of German and
foreign cities with different pre-experimental frequencies of occurrence.
Experiment 1
Experiment 1 is a modified and extended replication of the famous-names
study. The essential difference from the original study is that the names were not selected
for their fame but for their extra-experimental frequency of occurrence.
Moreover, instead of two study lists we used six lists on which the differences between
the pre-experimental frequencies of male and female names were varied. This allowed
us to check whether the estimates of the relative experimental frequencies of the
categories covary with the extra-experimental frequencies of the exemplars.
Method
Participants: Ninety-six persons took part in the experiment (41 men and 55
women, with an average age of 24 years). Most of the participants were students of the
TU Chemnitz from various departments. Students of Psychology did not take part in
the experiment. The participants received a reimbursement of 3 Euros.
Material: The stimulus names were selected as follows. In a first step,
we prepared a list of names of 80 known female and 80 known male personalities. These
personalities were predominantly actors, musicians, athletes, politicians and writers. In
a second step, we determined the frequency of occurrence of the names in several
volumes of three nation-wide German daily newspapers (Süddeutsche Zeitung,
Frankfurter Rundschau and die tageszeitung). From the 19 female and the 19 male names
with the greatest frequency of occurrence in this text corpus we formed sets of
“high-frequent names” (these sets contained, for example, Steffi Graf, Alice Schwarzer,
Marilyn Monroe or Berti Vogts, Günter Grass and Steven Spielberg). The 20 male and
the 20 female names with the lowest frequency of occurrence in the corpus were, on the
other hand, used as “medium-frequent names” in the experiment (e.g. Udo Jürgens, Roman
Polanski, Georg Büchner or Doris Day, Ella Fitzgerald and Isabel Allende). In
addition to this, we formed two sets with 20 unknown female and 20 unknown male names.
These names were drawn from an Internet list of participants of a lottery game and had a
frequency of occurrence of 0 in the corpus. For all stimuli used in the experiment the
first names allowed an unambiguous identification of sex. The six sets with
high-frequent, medium-frequent and unknown names of both sexes were combined into the
six study lists used in the experiment according to the scheme illustrated in Table 2.1.
Table 2.1
Composition of the study lists in Experiment 1

                                         female names
male names         high-frequent            medium-frequent           unknown
high-frequent      --                       19 males / 20 females     19 males / 20 females
medium-frequent    19 females / 20 males    --                        19 males / 20 femalesᵃ
unknown            19 females / 20 males    19 females / 20 malesᵃ    --

ᵃ In study lists with medium-frequent and unknown names the medium-frequent name
with the smallest frequency of occurrence in the corpus was not used.
Table 2.2 displays the sums of the frequencies of occurrence in the text corpus
for male and female names on each of the six study lists. These sums of the
frequencies of occurrence form the basis of the memory model prediction of the
frequency estimates. On the first three lists the male names show the greater
pre-experimental frequency of occurrence, while female names are pre-experimentally
more frequent on the following lists. As the table shows, in both cases the greatest
differences between the frequencies of occurrence of male and female names exist on
the lists with high-frequent and unknown names, followed by the differences on the
lists with high-frequent and medium-frequent names. The smallest differences in the
pre-experimental frequency of occurrence are in both cases found on the lists with
medium-frequent and unknown names.
Procedure: The participants were tested individually or in groups of up to six
persons at personal computers. Every participant was randomly assigned one of the six
study lists. Prior to the study phase, the participants were informed that they would be
presented with a list of names in the following minutes. Furthermore, they were
instructed to read through this list of names attentively as they would later have to
answer some questions referring to this list. The exact nature of the memory test,
however, remained unspecified.
Table 2.2
Sum total of the frequencies of occurrence in the text corpus for male and female
names on the study lists in Experiment 1

study lists                                              male names   female names
19 high-frequent males / 20 unknown females                   18101              0
19 high-frequent males / 20 medium-frequent females           18101           1364
19 medium-frequent males / 20 unknown females                  2267              0
19 high-frequent females / 20 unknown males                       0           9661
19 high-frequent females / 20 medium-frequent males            2311           9661
19 medium-frequent females / 20 unknown males                     0           1343
During the study phase, names were presented one at a time. Each presentation
lasted three seconds, followed by a one-second inter-stimulus interval in which the
screen was blank. The presentation order of the 39 names on the study list was
randomly determined for every participant. Subsequent to the study phase, one half of
the participants in each study list condition first completed the frequency judgment
tasks, and the other half first completed the free recall test. Within the frequency
judgment tasks, participants were first asked to indicate whether the study list
contained more female names or more male names. After this ordinal frequency
judgment, exact quantitative estimates of the percentages of female names and male
names on the study list were requested from the participants. Although the ordinal
judgment was a forced-choice task, participants were free to estimate the percentages of
both female and male names at 50% in the quantitative judgment.
For the free recall test, participants were handed a sheet of paper. They
were instructed to recall and write down within two minutes as many names as
possible from the study list in any order. After the two minutes, the experimenter
collected the sheet.
Predictions of the Memory Models
To gain quantitative predictions of the memory models, we used the same
procedure as Dougherty et al. (1999) in their simulations with MINERVA-DM.
However, since the results of this procedure can easily be derived mathematically,
we refrained from running further simulations. In general, the memory models assume
that the estimates of the relative frequency of a category are
determined by the sums of the memory responses to all female and male names.5 In the
terms of MINERVA-DM the estimate on the relative frequency of female names on
the study list should be given by the equation below (the expected relative frequency
estimate on male names can be determined likewise):
\[
\text{Estimate}_{\text{female names}} =
\frac{\sum \text{familiarity signals}_{\text{female names}}}
     {\sum \text{familiarity signals}_{\text{female names}} + \sum \text{familiarity signals}_{\text{male names}}}
\]
Given that all intra-experimental occurrences of names are encoded equally
well, all names can be expected to elicit a familiarity signal of approximately the same
strength based solely on their intra-experimental traces. If the study list exclusively
contained 19 unknown female and 20 unknown male names, the model would
consequently expect a relative frequency judgment for female names of 19/39. This
was also the result of the simulation by Dougherty et al. in which the memory did not
contain any extra-experimental traces. However, if the study list contains names that
have already been encoded extra-experimentally, the extra-experimental traces will also
contribute to the strength of the familiarity signal elicited by a name. As already
explained above, the size of this contribution should depend on the similarity between
the extra-experimental encoding context and the intra-experimental encoding context.
Since one can assume that the extra-experimental encoding contexts of female and
male names show on average the same similarity to the intra-experimental context,
the average contribution of the extra-experimental traces of female and male
names to the familiarity signal should also be alike. The contribution of each
extra-experimental trace can thus be described as a constant fraction a of the contribution of
an intra-experimental trace. Both the sum of the familiarity signals to all female
names and the sum of the familiarity signals to all male names are thus proportional to:
\[
\sum \text{intra-experimental occurrences of names} + a \cdot \sum \text{extra-experimental occurrences of names}
\]
5
It should be pointed out once more that we do not assume that the addition of responses to individual
names maps the process with which category size judgments are generated. Nevertheless, we follow the
predictions from MINERVA-DM and PASS insofar as we assume that the responses to categories are
proportional to the corresponding sums.
If the frequencies of occurrence in the text corpus given in Table 2.2 were
inserted in the formulas described above, the MINERVA-DM estimate for the relative
frequency of female names on the list with 19 high-frequent women and 20
medium-frequent men would result from (19 + a * 9661) / ((19 + a * 9661) + (20 + a * 2311)).
The value of a is obviously unknown and has to be estimated. Without applying any
formal fitting procedures, we determined a so that the magnitude of the MINERVA-DM
predictions approximately corresponds to the magnitude of the participants’
frequency judgments on all six study lists. This resulted in an estimate of a = 0.001.6 In
Experiments 2 and 4, in which stimulus material with a varying extra-experimental
frequency was also used, we determined the quantitative prediction of the
memory model with the same parameter setting for a. For the present experiment the
selected procedure evidently means that the magnitude of frequency judgments cannot
be used for an evaluation of the memory model prediction.
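As a minimal sketch of this calculation, the following Python snippet evaluates the
prediction formula for all six study lists, using the corpus sums from Table 2.2 and
a = 0.001. The function and variable names are illustrative and not part of the original
analysis; the code only restates the ratio described above.

```python
# Sketch of the memory-model prediction described in the text (names are illustrative).
# Each entry: (label, n_male, corpus_sum_male, n_female, corpus_sum_female), from Table 2.2.
STUDY_LISTS = [
    ("19 hfm/20 ukf", 19, 18101, 20, 0),
    ("19 hfm/20 mff", 19, 18101, 20, 1364),
    ("19 mfm/20 ukf", 19, 2267, 20, 0),
    ("19 mff/20 ukm", 20, 0, 19, 1343),
    ("19 hff/20 mfm", 20, 2311, 19, 9661),
    ("19 hff/20 ukm", 20, 0, 19, 9661),
]


def summed_signal(n_intra, corpus_sum, a):
    """Summed familiarity signal, proportional to intra + a * extra occurrences."""
    return n_intra + a * corpus_sum


def predicted_share_male(n_male, sum_male, n_female, sum_female, a=0.001):
    """Predicted relative-frequency estimate for male names on one study list."""
    male = summed_signal(n_male, sum_male, a)
    female = summed_signal(n_female, sum_female, a)
    return male / (male + female)


predictions = {
    label: predicted_share_male(n_m, f_m, n_f, f_f)
    for label, n_m, f_m, n_f, f_f in STUDY_LISTS
}

for label, share in predictions.items():
    print(f"{label}: {100 * share:.1f}% male names predicted")
```

With these inputs the predicted share of male names lies above the actual 19/39 on the
three lists with pre-experimentally more frequent male names and below the actual 20/39
on the other three lists.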
Figure 2.1 displays the resulting predictions for the estimated relative
frequency of male names on the six study lists. In the figure the study lists are arranged
according to the difference between the sum totals of the extra-experimental
frequencies of occurrence of male and female names. On the first three study lists this
difference is positive. Starting from the list with medium-frequent female names and
unknown male names, the difference takes negative values. As the figure shows, the
prediction of the memory model covaries with the difference of the pre-experimental
frequency of occurrence of the names of both sexes. The highest estimate for male
names is expected on the list with high-frequent male and unknown female names.
Conversely, the lowest estimate is predicted on the list with high-frequent female
and unknown male names, for which this difference takes the largest negative value.
Overestimations of the proportion of male names are expected on all study lists on
which these names are pre-experimentally more frequent. The rank order of the
predicted estimates of the frequency of male names on the six study lists is largely
independent of the parameter a. The relative differences between the predictions
for the six lists also vary only slightly with the parameter as long as values are taken for a
that are (I) so large that the model expects overestimations of the experimentally rarer
category, and (II) so small that an intra-experimental trace exerts a clearly greater
influence on the familiarity signal than an extra-experimental trace.7 The relative
differences of the expected frequency judgments across the six study lists thus provide
a criterion for the assessment of the quality of the prediction of the memory models.
6
The frequencies of occurrence in the text corpus certainly represent only a rough estimate of the
total number of the extra-experimental encodings of the names. However, a proportional increase of the
extra-experimental frequencies of male and female names would be compensated by our setting of the
parameter a and thus not change the MINERVA-DM predictions. Consequently, the use of the
frequencies of occurrence in the text corpus merely implies the assumption that these frequencies of
occurrence reflect the relative differences between the extra-experimental frequencies of male and
female names approximately correctly.
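The claim that the rank order of the predictions hardly depends on a can be checked with
a few lines of Python. The sketch below recomputes the predictions from the Table 2.2 sums
for three settings of a between 0.001 and 0.01 and compares the resulting orderings. The
intermediate value 0.005 is chosen here purely for illustration, and all identifiers are
hypothetical.

```python
# Illustrative check: does the ordering of the predicted estimates depend on a?
# Each entry: (label, n_male, corpus_sum_male, n_female, corpus_sum_female), from Table 2.2.
LISTS = [
    ("19 hfm/20 ukf", 19, 18101, 20, 0),
    ("19 hfm/20 mff", 19, 18101, 20, 1364),
    ("19 mfm/20 ukf", 19, 2267, 20, 0),
    ("19 mff/20 ukm", 20, 0, 19, 1343),
    ("19 hff/20 mfm", 20, 2311, 19, 9661),
    ("19 hff/20 ukm", 20, 0, 19, 9661),
]


def share_male(n_m, f_m, n_f, f_f, a):
    """Predicted share of male names for one list and one value of a."""
    male = n_m + a * f_m
    female = n_f + a * f_f
    return male / (male + female)


def rank_order(a):
    """Labels sorted from the highest to the lowest predicted share of male names."""
    scored = [(share_male(n_m, f_m, n_f, f_f, a), label)
              for label, n_m, f_m, n_f, f_f in LISTS]
    return [label for _, label in sorted(scored, reverse=True)]


orders = {a: rank_order(a) for a in (0.001, 0.005, 0.01)}
same_order = all(order == orders[0.001] for order in orders.values())
print("rank order identical across settings:", same_order)
```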
As already explained above, on the basis of the same assumptions also entering
into the derivation of the MINERVA-DM predictions, the PASS model (Sedlmeier,
1998), too, expects an overestimation of the sex with the pre-experimentally more
frequent names. However, the quantitative predictions of PASS do not tally precisely
with those of MINERVA-DM. In MINERVA-DM every additional occurrence of an
event is represented in memory by an additional trace. The familiarity signal elicited
by extra-experimental traces of names thus increases linearly with the number of these
traces (Hintzman, 1988). In contrast to this, events in PASS are encoded by a
modification of associative strengths. The increase in associative strength becomes
smaller the more frequently the event in question has been encoded before. The
associative response elicited in PASS by the names thus does not follow a linear but a
negatively accelerated function of the pre-experimental frequencies of the names.8 To
gain predictions of the PASS model, the pre-experimental frequencies thus have to be
transformed first according to this function. As a consequence, PASS expects a weaker
increase of the judgmental bias between study lists with medium-frequent and
unknown names and study lists with high-frequent and unknown names than
MINERVA-DM.
For the present experiments the difference between the predictions by
MINERVA-DM and PASS can nevertheless be neglected. The exact function relating
the extra-experimental frequencies to the associative responses depends in PASS on
the settings of the model parameters and on the temporal course of the encodings. In
order to set the necessary parameters, however, we had to rely in the present
studies on the magnitude of the frequency judgments in Experiment 1. The frequency
judgments expected by PASS would thus vary in a similar range as the MINERVA-DM
predictions. The remaining differences in the progression of the predictions across
the six study lists would be too small to permit a competitive test of both models. To
represent the predictions of both memory models, we thus used the MINERVA-DM
predictions in all experiments.
7
To illustrate this point: We tested different parameter settings between a = 0.001 and a = 0.01. While
the absolute magnitude of the predicted frequency judgments with these parameter settings changes by
up to 30%, the covariation between the MINERVA-DM predictions and the participants’ estimates
remains nearly constant.
8
In general, MINERVA-DM expects that frequency judgments increase linearly with the actual
frequencies while PASS assumes that the function relating estimates to frequencies is negatively
accelerated. Surprisingly, even the large number of studies on frequency judgments does not allow any
definite conclusions as to which of the assumptions applies. In the vast majority of these studies, the
frequencies are varied experimentally. The maximum presentation frequency seldom exceeds 15 in
these cases. In such studies the estimates can mostly - but not always (e.g. Leicht, 1968; see also the
results reported in chapter 3) - be approximated fairly well by a linear function of the actual frequencies.
However, it remains unclear whether this result is caused by the restricted range of the presentation
frequencies.
Results
Predictions of the Recall-Estimate approach
To gain quantitative predictions of the recall-estimate strategy for the estimates
of the relative frequency of male names on the study lists, we calculated a recall ratio
for every participant. This measure is usually applied in tests of the recall-estimate
strategy (e.g. Lewandowsky & Smith, 1983; Watkins & LeCompte, 1991; Manis et al.,
1993; Sedlmeier, Hertwig, & Gigerenzer, 1998). In the present experiment, the recall
ratio denotes the percentage of recalled male names among the total number of recalled
names.9 Figure 2.1 shows the means of these ratios on the six study lists. In the figure
the data of the participants who worked on the recall test before and after the
frequency judgment tasks are pooled since the recall performances did not differ
between the two task orders.
9
Here and in all following experiments we scored all recalled exemplars that could unambiguously be
assigned to one category. Generally, the recall-estimate predictions change only slightly if a stricter
scoring criterion is applied. However, there is a tendency that greater biases are expected with such a
criterion. As will become clear from the results of the experiments, this would be a disadvantage for the
recall-estimate strategy.
[Figure 2.1: plot not reproduced. Y-axis: Percentage of Male Names (0 to 100). X-axis: the six study
lists (19 hfm/20 ukf, 19 hfm/20 mff, 19 mfm/20 ukf, 19 mff/20 ukm, 19 hff/20 mfm, 19 hff/20 ukm).
Series: Memory Model Predictions, Recall-Estimate Predictions, Estimates.]
Figure 2.1. Mean estimated percentages of male names on the six study lists and the
corresponding predictions of the memory models and the recall-estimate strategy
in Experiment 1. Error bars denote standard errors. hfm = high-frequent male
names; mfm = medium-frequent male names; ukm = unknown male names; hff =
high-frequent female names; mff = medium-frequent female names; ukf =
unknown female names.
The recall ratios were strongly influenced by the pre-experimental frequencies
of the names. On all six lists clearly more names were recalled of the sex whose
exemplars had the greater pre-experimental frequencies of occurrence. The
recall-estimate strategy thus expects, like the memory models, an overestimation of the
relative frequency of male names on the first three study lists and an underestimation
of this frequency on the last three study lists (see Fig. 2.1). However, the predictions of
the recall-estimate strategy do not decrease monotonically with the differences of the
frequencies of occurrence of male and female names in the text corpus. The smallest
bias in the estimates is here predicted on the two lists with high-frequent and
medium-frequent names, on which merely a medium difference in the frequencies of
occurrence of male and female names in the corpus was realized. On each of the four lists
with unknown names, on the other hand, the recall-estimate strategy expects a bias of a
similarly extreme size. The predicted deviation between the estimated and the actual
relative frequency of male names here consistently amounts to more than 30%.
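For concreteness, the recall ratio on which these predictions rest can be sketched as
follows. The example protocol is invented for illustration and is not data from the
experiment, and the function name is hypothetical. Recalled items that cannot be assigned
unambiguously to one category are left out of the ratio, in line with the scoring rule
described in footnote 9.

```python
# Sketch of the recall ratio: percentage of recalled male names among all scored recalls.
# The protocol below is made up for illustration; it is not data from the experiment.


def recall_ratio(recalled_sexes):
    """Return the percentage of male names among recalled names scored as 'm' or 'f'.

    Entries that are neither 'm' nor 'f' (e.g. None for a name that cannot be
    assigned to a category) are ignored. Returns None if nothing can be scored.
    """
    males = sum(1 for sex in recalled_sexes if sex == "m")
    females = sum(1 for sex in recalled_sexes if sex == "f")
    scored = males + females
    if scored == 0:
        return None
    return 100.0 * males / scored


example_protocol = ["m", "m", "f", None, "m", "f", "m"]  # hypothetical participant
print(recall_ratio(example_protocol))
```

Under the recall-estimate strategy, a ratio computed in this way serves directly as the
predicted estimate of the percentage of male names for that participant.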
We evaluated the influence of the pre-experimental frequency of the names on
the recall ratios additionally with the help of correlational effect sizes. We compared
the recall ratios for male names (I) on the two lists on which high-frequent and
unknown names were combined, (II) on the two lists on which high-frequent and
medium-frequent names were combined and (III) on the two lists on which
medium-frequent and unknown names were combined. The resulting three effect sizes together
with the corresponding t-values and degrees of freedom are given in Table
2.3. The effect sizes display the pattern that is suggested by Figure 2.1. The effect in
the recall ratios on the lists with medium-frequent and unknown names is as large as
on the lists with high-frequent and unknown names. On the lists with high-frequent
and medium-frequent names, on the other hand, the effect is somewhat smaller.10
10
Since also the experimental frequencies of male names on the compared lists varied between 48.7%
and 51.3%, the effect sizes do not reflect exclusively the influence of pre-experimental frequencies. Due
to the fact that pre-experimental and experimental frequencies were inversely related, one can assume
that the actual influence of the pre-experimental frequencies is somewhat larger than expressed in the
effect sizes. For the present purpose this issue is irrelevant for we use effect sizes to compare the impact
of pre-experimental frequencies across conditions and experiments in which the experimental
frequencies always varied in the same way.
Table 2.3
Effect sizes of the list compositions on recall ratios and frequency estimates in
Experiment 1

                                        Recall Ratio             Frequency Estimates
Study Lists                         r_effect size      t       r_effect size      t       df
19 hfm/20 ukf vs. 19 hff/20 ukm          .93         13.57          .67          4.94     30
19 hfm/20 mff vs. 19 hff/20 mfm          .67          4.99          .40          2.37     30
19 mfm/20 ukf vs. 19 mff/20 ukm          .95         16.17          .32          1.88     30

Note. hfm = high-frequent male names; mfm = medium-frequent male names; ukm = unknown
male names; hff = high-frequent female names; mff = medium-frequent female names; ukf = unknown
female names.
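The effect sizes in Table 2.3 are consistent with the standard conversion of a t value
into a correlational effect size, r = sqrt(t² / (t² + df)). That this conversion was used
is an assumption, since the text does not state the formula; the sketch below merely shows
that it reproduces the tabled values to two decimals. The function name is illustrative.

```python
import math


def r_from_t(t, df):
    """Correlational effect size from a t value: r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t * t / (t * t + df))


# (t, df, r as reported in Table 2.3); recall ratio first, frequency estimate second
TABLE_2_3 = [
    (13.57, 30, 0.93), (4.94, 30, 0.67),  # high-frequent vs. unknown names
    (4.99, 30, 0.67), (2.37, 30, 0.40),   # high-frequent vs. medium-frequent names
    (16.17, 30, 0.95), (1.88, 30, 0.32),  # medium-frequent vs. unknown names
]

for t, df, reported in TABLE_2_3:
    print(f"t = {t:5.2f}, df = {df}: r = {r_from_t(t, df):.2f} (reported {reported:.2f})")
```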
Frequency Estimates
Figure 2.1 displays the mean quantitative estimates about the percentage of
male names on the six study lists. The data from the two conditions in which the
frequency judgment tasks preceded or followed the recall test are summarized here and
in the analyses below since the task order had no systematic influence on the
magnitude of the quantitative estimates and the ordinal frequency judgments. On the
first three lists the actual percentage of male names on the study list amounts to 48.7%.
This percentage is overestimated on all of the three lists on which male names have the
greater pre-experimental frequency than female names. In contrast to this, the actual
percentage of male names of 51.3% on the three lists with pre-experimentally more
frequent female names is consistently underestimated. Although the deviation of the
mean estimates from the normative value is generally not large and exceeds 10% only
on the two lists with unknown female names, a bias in the estimates is evident. This is
particularly obvious in the effect sizes with which we analyzed the influence of the
pre-experimental frequencies of occurrence on the quantitative estimates for male
names. For lists with high-frequent and medium-frequent names and for lists with
medium-frequent and unknown names the effects are of similar size (see Tab.
2.3) and can, according to Cohen’s conventions (1992), be described as medium to
large. On the two lists that combined high-frequent and unknown names the estimates
for male names differed even more, which is reflected in a further increased effect size.
In general, the pre-experimental frequencies, which are normatively irrelevant for the
frequency judgment task, thus seem to have a strong influence on the quantitative estimates.
A similar picture results from the participants’ ordinal frequency judgments.
Table 2.4 shows for each of the six study lists the percentage of those participants who
judged that the list contained more male than female names. For each of the three lists
with fewer male than female names a vast majority of participants was of the
erroneous opinion that they were presented with more male names. In contrast to this,
only a minority of participants judged that male names were more frequent on those
three lists that really contained more male names. Thus, both in the quantitative and in
the ordinal frequency judgments an effect analogous to the famous-names effect is
found: The relative frequency of the sex that is represented on the study list by the
pre-experimentally more frequent names is overestimated.
Table 2.4
Percentage of the participants who judged male names to be more frequent on each of
the six study lists in Experiment 1. For all lists N = 16.

Study List    19 hfm/20 ukf  19 hfm/20 mff  19 mfm/20 ukf  19 mff/20 ukm  19 hff/20 mfm  19 hff/20 ukm
Percentage         87.5           75.0           81.3           37.5           25.0           12.5

Note. hfm = high-frequent male names; mfm = medium-frequent male names; ukm = unknown
male names; hff = high-frequent female names; mff = medium-frequent female names; ukf = unknown
female names.
How well are the frequency estimates predicted by the recall-estimate strategy
and the memory models? The quality of the prediction of the recall-estimate strategy
can not only be evaluated according to the progression of the estimates across the six
study lists but also by the magnitude of the estimates. As Figure 2.1 illustrates, the
recall-estimate strategy performs poorly on this criterion. On all study lists
the recall-estimate approach expects a greater deviation between the correct and
estimated percentage of male names than actually occurs. Only on the list with
high-frequent female names and medium-frequent male names do predictions and estimates
have a similar size. In contrast, on the four lists containing unknown names the bias in
the estimates is overestimated by at least 25%.
As already explained, the magnitude of the predictions of the memory models
resulted from a fitting with the actual data. A comparison of the quality of the
predictions of the memory models and the recall-estimate strategy can hence be
exclusively based on the progression of the estimates. A visual inspection of the data
reveals that the progression of the estimates across the three lists with
pre-experimentally more frequent female names corresponds to the prediction of the
memory models. Here the estimates decrease together with the differences of the
pre-experimental frequencies of occurrence of male and female names. In contrast to this,
the estimates on the three lists with pre-experimentally more frequent male names
rather follow the prediction of the recall-estimate strategy. The smallest mean estimate
for the percentage of male names can be found here on the list with high-frequent male
and medium-frequent female names. Furthermore, as expected by the recall-estimate
strategy, the mean estimate for the list with medium-frequent male and unknown female
names is about as large as the mean estimate for the list with high-frequent male and
unknown female names.
To assess the covariation between the predictions and the estimates more
exactly, we used contrast analyses (Rosenthal et al., 2000). As a fit measure, contrast
analysis provides an effect size that can be interpreted as the correlation of the
predicted estimates on the six study lists with the individual estimates of all 96
participants. Table 2.5 displays the results of the contrast analyses for the predictions
of the memory models and the predictions of the recall-estimate strategy. Both
approaches attain high and nearly identical effect sizes. As already suggested by the
visual inspection of the data, the predictive performances of the memory models and
the recall-estimate strategy can thus be judged alike.
Table 2.5
Contrast Analyses for the Predictions of the Memory Models and the Recall-Estimate
Strategy in Experiment 1

Prediction          MScontrast    MSerror    dferror    r_effect size
Memory Models           2983.6      136.3         90         .43
Recall-Estimate         3153.6      136.3         90         .44

Note. F values can be calculated by dividing MScontrast by MSerror.
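The interpretation of the contrast effect size as a correlation can be illustrated with a
short sketch: every participant is paired with the prediction for his or her study list,
and the Pearson correlation between these predictions and the individual estimates is
computed. The toy numbers below are invented solely to make the example runnable and bear
no relation to the experimental data; the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def contrast_effect_size(prediction_by_list, estimates_by_list):
    """Correlate each participant's estimate with the prediction for his or her list."""
    predicted, observed = [], []
    for label, estimates in estimates_by_list.items():
        for estimate in estimates:
            predicted.append(prediction_by_list[label])
            observed.append(estimate)
    return pearson_r(predicted, observed)


# Invented toy data: two lists with three participants each (not experimental data).
toy_predictions = {"list A": 60.0, "list B": 45.0}
toy_estimates = {"list A": [55.0, 62.0, 58.0], "list B": [50.0, 47.0, 41.0]}
print(round(contrast_effect_size(toy_predictions, toy_estimates), 2))
```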
In the experiment we determined a recall ratio for every participant on the basis
of the results of the recall test. In contrast to the memory models, the recall-estimate
strategy thus allows the prediction not only of the mean estimates for the six study
lists but also of individual estimates. We therefore additionally calculated the
correlation between the 96 individual recall ratios and the corresponding estimates.
This correlation amounted to r = .44. However, the relation between the recall ratios
and the estimates differed somewhat between the two task orders. For the 48
participants who completed the recall test before the frequency judgments, the
correlation amounted to r = .52. For the participants who gave their frequency
judgments first, the correlation was r = .36 (Fisher-Z = 0.944, p = .171, one-tailed).
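The comparison of the two correlations can be retraced with Fisher's r-to-z transformation. The sketch below is a minimal illustration, assuming two independent groups of 48 participants each (96 in total) and using the rounded correlations given in the text, so the result can differ from the reported statistic in the third decimal. The function name is hypothetical.

```python
import math


def fisher_z_test(r1: float, n1: int, r2: float, n2: int):
    """Compare two independent correlations via Fisher's r-to-z transformation.

    Returns the z statistic and the one-tailed p-value for r1 > r2.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    z = (z1 - z2) / se
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability
    return z, p_one_tailed


# Experiment 1: recall test first (r = .52) vs. frequency judgments first (r = .36).
z, p = fisher_z_test(0.52, 48, 0.36, 48)
print(f"Z = {z:.3f}, p (one-tailed) = {p:.3f}")
```

The sketch uses only the standard library; the normal tail probability is obtained from the complementary error function rather than from a statistics package.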
Discussion
The primary objective of the present experiment was to check whether an effect
analogous to the famous-names effect can also be observed if the names presented on
the study lists are selected not according to their fame but according to their
pre-experimental frequency of occurrence. The answer to this question is an unambiguous
‘Yes’. The pre-experimental frequencies of occurrence of female and male names
exerted a strong influence on the quantitative frequency judgments. Moreover, in the
ordinal frequency judgments for all six study lists, a majority of participants chose the
sex with the pre-experimentally more frequent names although fewer names of this sex
were actually contained on the lists. In the original study by Tversky and Kahneman
(1973), 81% of the participants erroneously judged the sex with the more famous names
to be the more frequent one. In the current study the lists with high-frequent and
medium-frequent names are the most similar to the lists with famous and less-famous
names used by Tversky and Kahneman. For these lists, 75% of the participants judged
that the number of pre-experimentally more frequent names was greater. The size of
the effect observed here hence seems to be comparable to the size of the effect in the
original study.
Although the participants’ frequency judgments were systematically biased, the
absolute deviation between the quantitative estimates (which were not collected in the
original study) and the normatively correct responses was rather small. For example,
the difference between the mean estimated and the actual percentage of male names on
the lists with high-frequent and medium-frequent names amounted to little more than
5%. Similarly minor deviations have also been found in earlier studies using famous
and unknown names (Lewandowsky & Smith, 1983) or famous and less-famous names
(Manis et al., 1993). The fact that the absolute size of the error is rather small may well
qualify the far-reaching conclusions that have been drawn in the literature on judgment
and decision making, including those based on the famous-names effect. The view that
frequency and probability judgments are generally extremely susceptible to biases
has led to the recommendation to delegate important decisions exclusively to
trained decision-makers or to computers (e.g. Bazerman, 2001; Lichtenstein et al.,
1978; Shanteau, 1978). This recommendation appears to be exaggerated, at least as
far as it is based on the famous-names effect. An estimate of a little over 50% for
an actual frequency of 48.7% will normally cause no serious damage, even in
critical decisions.
In fact, the results of the present experiment can also be seen as evidence for
the thesis that memory for frequencies is less error-prone than other memory
performances (e.g. Hasher & Zacks, 1984). In accordance with this assumption, the
recall ratios for all six study lists were more strongly biased than the frequency
judgments. Obviously, this poses problems for the recall-estimate approach. If the
participants had used the number of recalled male and female names as the sole indicator
of the requested frequencies, their estimates would have had to deviate more from
the normatively correct responses. Thus, if one assumes that the participants used a
recall-estimate strategy to generate their estimates, one has to assume at the same time
that they had some kind of additional frequency knowledge at their disposal. This
knowledge should make it possible to correct the strong recall bias at least to some
degree. Several previous studies indicate that recall performance is often too poor a
cue to frequency to explain the comparatively precise estimates completely
(Watkins & LeCompte, 1991; Lewandowsky & Smith, 1983; Manis et al., 1993). So far,
however, the literature on the recall-estimate approach offers no suggestions as to
which additional information might be used for the generation of frequency estimates
and how this information could be integrated with recall performance.
In spite of this problem of the recall-estimate approach, the results of
Experiment 1 were inconclusive as to which estimation strategy
produced the bias in the participants’ judgments. The covariation between the
predictions of the recall-estimate strategy and the estimates was as high as the
covariation between the predictions of the memory models and the estimates.
Therefore, the assumption that the participants relied on the recall-estimate strategy
remains as plausible as the assumption that they used a retrieval-free memory
assessment process to generate their judgments.
Altogether, the findings from Experiment 1 thus confirmed that the necessary
preconditions of the alternative explanation for the famous-names effect by the
memory models are fulfilled: the presentation of names with different
pre-experimental frequencies led to a bias in the estimates that corresponded to the
famous-names effect in size. Moreover, the estimates could be predicted by the
memory models, whose sole input is the pre-experimental frequencies, just as well
as by the recall-estimate strategy.
In order to attribute the famous-names effect unambiguously to the
differences in the pre-experimental frequencies of occurrence, a confounding of fame
and frequency of occurrence would have to be avoided. This was not the case in the
present experiment. The assumption that names mentioned more often in the text corpus
are also judged to be more famous was confirmed in a classroom study. We
asked 64 participants to rate, on 9-point scales, the fame of 160 names for which we
had determined citation frequencies. The mean ratings of the names correlated with
the logarithm of the frequency of occurrence in the corpus with r = .48. For the 78
‘known’ names used in the experiment, which showed the greatest and smallest citation
frequencies, the correlation amounted to r = .60. The mean ratings of the high-frequent
male and female names were 6.1 and 5.1, respectively. For the medium-frequent male
and female names the corresponding values were 4.5 and 4.0. This leaves
open the interpretation that the bias in the frequency judgments observed in
Experiment 1 was in fact caused not by the different pre-experimental frequencies of
the names but by other characteristics of famous names. A critical test of the memory
model hypothesis thus requires that the pre-experimental frequency of occurrence of
the stimuli be varied independently of fame. Providing such a test was the objective of
Experiment 2.
Experiment 2
In Experiment 2 we used exemplars of the categories ‘plants’ and ‘animals’ as
stimulus material. From this material we constructed two study lists. On one of the lists
the animals had the greater pre-experimental frequency, on the other the plants.
According to the hypothesis of the memory models, we should find the same effect as
in Experiment 1 with this material: the category with the pre-experimentally more
frequent exemplars should in each case receive higher frequency estimates although
fewer of its exemplars are included in the study list. The differences between the
pre-experimental frequencies of occurrence of the plants and animals selected as stimulus
material were greater than the differences between the pre-experimental frequencies of
occurrence of female and male names on each of the six study lists in the previous
experiment. The memory models attribute the famous-names effect exclusively to
these differences. On the assumption that the extra-experimental encoding contexts of
names are not systematically more similar to the experimental encoding context than
those of plants and animals, the effect should be even greater in the present study.
Method
Participants: Forty persons took part in the experiment (27 men and 13
women, with an average age of 25 years). The participants were students of the
University of Paderborn, most of them from the departments of economics and
engineering. Psychology students did not take part in the experiment. The
participants received a payment of 3 Euros.
Material: For the selection of the stimulus material we first determined the
frequency of occurrence of 220 animals and 121 plants in the same text corpus we had
already used in Experiment 1. From each of the two categories we selected 19
exemplars with a frequency of occurrence as high as possible and 20 exemplars with a
frequency of occurrence as low as possible. Exemplars with fewer than 15 occurrences
in the corpus, however, were not included in the study lists. Additionally, words with
more than three syllables were excluded. Finally, the total number of syllables of the
19 plants and 19 animals with a high frequency of occurrence was matched. The same
was done with the total number of syllables of the 20 plants and 20 animals with a low
frequency of occurrence.
The selected exemplars were combined into one study list with 19 high-frequent
animals (e.g. Löwe (lion), Ente (duck), Biene (bee)) and 20 low-frequent plants (e.g.
Eibe (yew), Aster (aster), Hirse (millet)) and another study list with 19 high-frequent
plants (e.g. Rose (rose), Tanne (fir), Tomate (tomato)) and 20 low-frequent animals
(e.g. Hyäne (hyena), Fasan (pheasant), Libelle (dragonfly)). Table 2.6 displays the
summed frequencies of occurrence in the text corpus for animals and plants on
each of the two study lists.
Table 2.6
Sums of the frequencies of occurrence in the text corpus for animals and plants on the
study lists in Experiment 2
Study list                                           Animals   Plants
19 high-frequent animals / 20 low-frequent plants    31617     2052
19 high-frequent plants / 20 low-frequent animals    2029      22188
Procedure: The results of a pilot study with identical test material indicated
that the participants did not spontaneously categorize the presented words into plants
and animals. On both lists, the participants’ frequency judgments for plants and animals
did not add up to 100% but to approximately 80%. In a post-experimental interview,
many participants stated that they had subdivided the test material into categories such
as trees, flowers, vegetables, birds, carnivores etc.¹¹ Apparently, the relevant categories
plants and animals were not salient during the presentation of the study lists. In
contrast, one can assume that the sex categorization relevant in the famous-names study
and in Experiment 1 was perceived automatically (Taylor et al., 1978; Fiske & Neuberg,
1990; Messick & Mackie, 1989). Therefore, to ensure that the presented words were
categorized into plants and animals, we made some changes in Experiment 2 relative
to Experiment 1. Before the presentation of the study lists the participants were
informed that the list would exclusively include animals and plants. Furthermore, the
participants worked on a categorization task during the presentation of the lists. After
the presentation of each word, two fields with the labels “plant” and “animal” appeared
on the screen. The participants were to indicate by a mouse click to which of these
categories the last word presented belonged. Right after this mouse click, the next
word appeared.
¹¹ This observation is in line with earlier findings demonstrating that the otherwise quite
stable sensitivity to frequencies breaks down completely when participants are questioned
about the frequency of categories which they did not perceive during the encoding of the
stimuli (Barsalou & Ross, 1986; Freund & Hasher, 1989; Conrad, Brown, & Dashen, 2003).
With the exception of these changes, the procedure was identical to the
procedure of Experiment 1: One of the two study lists was randomly assigned to
every participant. The presentation time for a word was three seconds. The order of
presentation of the 39 words on a study list was randomly determined for every
participant. After the study phase, every participant was asked for an ordinal and a
quantitative frequency judgment. In addition, the participants completed a recall
test. The results from Experiment 1 indicated that the order in which the frequency
judgment tasks and the recall test are completed possibly has an influence on the
correlation between the individual recall ratios and the quantitative frequency
judgments. To test whether this result can be replicated, the task order was again
varied.
Predictions of the Memory Models
To derive quantitative predictions from the memory models we used
the parameter setting a = 0.001 that was estimated from the magnitude of the
frequency judgments in Experiment 1. This parameter setting and the sums of the
frequencies of occurrence of animals and plants in the text corpus given in Table 2.6
determine the predictions of the memory models. For the list with high-frequent
animals and low-frequent plants, an estimate of 69.6% for the relative frequency of
animals is predicted. For the list with high-frequent plants and low-frequent animals,
on the other hand, an estimate of 34.8% for the same relative frequency is expected
(see also Fig. 2.2).
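The equation that turns the parameter a and the corpus sums into a predicted percentage is not restated in this section. As an illustration only, the Python sketch below ASSUMES a simple form that is not taken from the text: every presentation on the study list contributes one unit to a category's memory strength, every corpus occurrence contributes a units, and the estimate is the category's share of the total strength. With a = 0.001 and the sums from Table 2.6, this assumed form comes within about a tenth of a percentage point of the reported predictions.

```python
A = 0.001  # parameter setting reported in the text


def predicted_percentage(n_target, corpus_target, n_other, corpus_other, a=A):
    """ASSUMED form (not from the source): share of total memory strength,
    where strength = number of list items + a * summed corpus frequency."""
    strength_target = n_target + a * corpus_target
    strength_other = n_other + a * corpus_other
    return 100.0 * strength_target / (strength_target + strength_other)


# Predicted percentage of animals for the two study lists (sums from Table 2.6).
hfa_lfp = predicted_percentage(19, 31617, 20, 2052)   # 19 high-frequent animals / 20 low-frequent plants
hfp_lfa = predicted_percentage(20, 2029, 19, 22188)   # 19 high-frequent plants / 20 low-frequent animals

print(f"{hfa_lfp:.2f}% vs. reported 69.6%")
print(f"{hfp_lfa:.2f}% vs. reported 34.8%")
```

The close agreement suggests that the assumed form is at least compatible with the reported figures, but it should not be read as the models' actual equation.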
Results
An analysis of the results of the categorization task revealed that 10 out of 40
participants categorized one word incorrectly. A further 5 participants made 2 to 5
categorization errors. The vast majority of all categorization errors occurred with
plants and animals with a low corpus frequency. Moreover, some words were
categorized wrongly by several participants (e.g. Zeisig (siskin), Flunder (flounder),
Hirse (millet)). This suggests that at least some of the errors can be
ascribed to ignorance of the meanings of the corresponding words and not to careless
mistakes or a lack of attention during the categorization task. We
hence excluded from the further analysis the 5 participants who made more than one
categorization error. Moreover, we carried out all analyses described below twice:
once with the data of the 25 participants without mistakes and once with the
35 participants who made no or one categorization error. Since the results
of both analyses did not differ significantly and led to the same conclusions, we only
report those results below that are based on 35 participants (19 in the condition “19
high-frequent animals / 20 low-frequent plants” and 16 in the condition “19
high-frequent plants / 20 low-frequent animals”).
Predictions of the Recall-Estimate Approach
As in Experiment 1 we calculated a recall ratio for every participant. Here this
ratio indicates the percentage of recalled animals among all recalled exemplars. Since
the recall ratios from the two task order conditions did not differ, these conditions were
combined in the analyses reported below.
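The definition of the recall ratio can be written out in a few lines. The following Python sketch is only an illustration: the recall counts of the example participant are hypothetical, and the function name is not taken from the original analysis.

```python
def recall_ratio(recalled_animals: int, recalled_plants: int) -> float:
    """Percentage of recalled animals among all recalled exemplars."""
    total = recalled_animals + recalled_plants
    if total == 0:
        raise ValueError("recall ratio is undefined when nothing was recalled")
    return 100.0 * recalled_animals / total


# Hypothetical participant who recalled 9 animals and 7 plants.
example_ratio = recall_ratio(9, 7)

# Actual percentage of animals on the list with 19 animals and 20 plants.
actual_percentage = 100.0 * 19 / 39

print(f"recall ratio = {example_ratio:.1f}%, actual = {actual_percentage:.1f}%")
```

Under the recall-estimate strategy, the recall ratio itself serves as the predicted frequency estimate, so its deviation from the actual percentage is the predicted bias.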
The recall ratios were again influenced by the pre-experimental frequency of
occurrence of the words. Figure 2.2 displays the corresponding means on the study
lists. For both study lists the participants recalled more words from the category with
the pre-experimentally more frequent exemplars. The calculation of the effect size of
the list composition on the recall ratios resulted in reffect size = .29 (t (33) = 1.738), a
medium effect according to Cohen’s conventions (1992).
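For a comparison of two groups, the effect size r can be recovered from the t statistic and its degrees of freedom with the standard conversion r = sqrt(t² / (t² + df)). The Python sketch below applies this conversion to the reported t value; the function name is illustrative.

```python
import math


def r_from_t(t: float, df: int) -> float:
    """Effect size r for a two-group comparison: sqrt(t^2 / (t^2 + df)),
    carrying the sign of t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)


# Effect of list composition on the recall ratios: t(33) = 1.738.
r_recall = r_from_t(1.738, 33)
print(f"r = {r_recall:.2f}")  # r = 0.29
```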
Consequently, the recall-estimate strategy also predicts an overestimation of
the percentage of animals on the list with high-frequent exemplars of this category. At
the same time, an underestimation of the percentage of animals on the list with
low-frequent exemplars is expected. However, unlike in Experiment 1,
the absolute size of the bias in the predictions of the recall-estimate strategy is clearly
smaller than in the predictions of the memory models. The predictions of this
strategy deviate from the correct responses by 6.0% on the list with high-frequent
animals and by only 3.2% on the list with low-frequent animals.
[Figure 2.2: percentage of animals (y-axis) for the two study lists, 19 hfa / 20 lfp and
19 hfp / 20 lfa (x-axis); plotted are the memory model predictions, the recall-estimate
predictions and the estimates.]
Figure 2.2. Mean estimated percentages of animals on the two study lists and the
corresponding predictions of the memory models and the recall-estimate
strategy in Experiment 2. Error bars denote standard errors. hfa = high-frequent
animals, lfp = low-frequent plants, hfp = high-frequent plants, lfa
= low-frequent animals.
Frequency Estimates
No differences in the magnitude of the quantitative estimates or in the ordinal
frequency judgments were associated with the variation of the task order. Therefore, in
the following analyses, the data of the participants from both task order conditions are
combined again.
The frequency judgments were not influenced by the different pre-experimental
frequencies of animals and plants. Contrary to the predictions of the memory models
and the recall-estimate strategy, the mean estimate of the relative frequency of animals
on the list with low-frequent animals was slightly higher than on the list with
high-frequent animals (see Fig. 2.2). The effect size of the list composition on the
quantitative frequency judgments was reffect size = -.09 (t (33) = -0.512). Furthermore,
the mean estimates of the relative frequency of animals on both lists corresponded
nearly exactly to the actual proportions. On the list with high-frequent animals the
deviation amounted to 1.7%, on the list with low-frequent animals it was 0.9%.
Accordingly, the ordinal frequency judgments also gave no indication of an influence
of the pre-experimental frequencies of occurrence. Eleven of the nineteen (58%)
participants presented with the list with high-frequent animals judged that the list
contained more animals. In the case of the list with low-frequent animals, nine out of
sixteen (56%) participants made the same judgment.
As in Experiment 1, we also determined the relation between the individual
recall ratios and the individual frequency judgments in the present data. As suggested
by the previous results, this correlation was close to zero (r = .04). However, once
more, differences between the correlations in the two task order conditions were found.
In the condition with a preceding recall test the correlation was positive, r = .38
(N = 17), although even in this condition the overestimation of the pre-experimentally
more frequent category predicted by the recall-estimate strategy could not be observed.
In the condition in which the estimates were given before the recall test was
completed, on the other hand, the correlation was r = -.29 (N = 18; Fisher-Z = 1.886,
p = .06 (one-tailed)).
Discussion
The second experiment used stimulus material whose pre-experimental
frequency of occurrence does not covary with fame. Contrary to the
expectations of the memory models, no influence of the pre-experimental frequency of
occurrence on the frequency judgments could be found for this stimulus material. As
the quantitative predictions of the memory models make clear, an even greater effect
than in Experiment 1 should have occurred in the present experiment, owing to
the larger differences in the pre-experimental frequencies of the items. The results of
the second experiment hence suggest that the bias in the estimates for
female and male names in Experiment 1 was produced by characteristics of
the names other than their pre-experimental frequency. The results thus also disagree
with the hypothesis that the famous-names effect is triggered by the different
pre-experimental frequencies of occurrence of famous and non-famous names. Since the
explanation of the memory models for the famous-names effect is based on this
hypothesis, the results consequently speak against the assumption that estimates showing
this bias are generated by a retrieval-free memory assessment process.
However, the recall-estimate approach cannot offer a satisfying
explanation for the findings of the present experiment either. It is true that the
recall-estimate strategy predicted a clearly smaller bias in the estimates than the memory
models. The deviation of the mean estimates from the predictions was thus smaller
for the recall-estimate approach. On the other hand, the results provide hardly any
evidence for the assumption that the frequency judgments were in any way influenced
by the number of retrieved animals and plants. The recall-estimate strategy, too, led
to the expectation that the estimated percentage of animals on the list with
pre-experimentally high-frequent animals would be higher than on the list with
pre-experimentally low-frequent animals. This effect did not appear. Furthermore, the
individual estimates did not correlate with the individual recall ratios. Only in the
condition with a preceding recall test could a moderately positive correlation between
these variables be found. However, even in this condition the relation was apparently
too weak to produce the expected effect in the mean frequency judgments.
Nevertheless, the fact that higher correlations between recall ratios and estimates were
found in both experiments if the recall test was completed before the frequency
judgment tasks may be significant. The recall test inevitably provides the participants
with information about the number of retrieved exemplars. Without the test, the
participants would possibly use this information less, or not at all, to generate their
frequency judgments (Betsch et al., 1999; Bruce, Hockley, & Craik, 1991). The preceding
completion of a recall test may thus foster the use of a recall-estimate strategy.
Why did the bias expected both by the memory models and by the
recall-estimate approach not occur in the estimates in Experiment 2? From the view of the
memory models, a possible explanation can be found in the assumption that the
extra-experimental encoding contexts of animals and plants are less similar to the
experimental encoding context than the extra-experimental encoding contexts of
proper names. Thus, it would be inappropriate to model the influence of the
extra-experimental frequencies on the estimates in both cases with the same setting for the
parameter a. Indeed, such an assumption might explain slight differences in the size of
the effect for proper names on the one hand and plants and animals on the other.
However, it seems hardly plausible that the extra-experimental encoding context of
plants and animals allows a perfect discrimination from the experimental context while
the extra-experimental context of proper names results in a considerable influence on
the experimental estimates.
Further possible explanations for the absence of the expected bias result from
the differences in the procedures of Experiment 2 and Experiment 1. To manipulate the
variable ‘pre-experimental frequency of occurrence’ independently of fame, words
signifying a whole class of objects (e.g. “goat”) were used in the present experiment
instead of proper names. Moreover, this change of stimulus material required
introducing a categorization task during the presentation of stimuli. Finally, this
categorization task showed that some of the participants had difficulty classifying the
stimulus material correctly.
The fact that the participants overtly categorized the presented words might
explain why the estimates were not related to recall performance. In a
study by Bruce, Hockley and Craik (1991) the participants estimated the number of
exemplars of different categories on a previously presented study list. During the
presentation, one group of participants completed a categorization task. The
estimates of this group correlated considerably more weakly with the
corresponding recall performance than the estimates of a group of participants who
did not complete a categorization task. This indicates that a categorization task can
hamper the use of a recall-estimate strategy. If an effect of extra-experimental
frequencies on category size judgments were mediated exclusively by the use of this
strategy, this could also explain why no bias occurred in the present experiment.
The fact that at least some participants could not categorize the presented
words correctly into animals and plants provides, on the other hand, a further possible
explanation for the estimates deviating clearly from the predictions of the memory
models. In the derivation of the memory model predictions we assumed that the
presented exemplars activate the memory representation of their superordinate
categories already at the stage of encoding. This activation can certainly only occur if
the participants have previously learned a stable connection between exemplars and
categories. To the extent that the observed categorization errors indicate that the
connection between the presented items and the categories “animals” and “plants” was
generally only weak, this activation could take place only to a reduced extent or not at
all. In this case, the predictions of the memory models would be invalid.
To exclude the alternative explanations for the absence of the bias in
Experiment 2, we designed the presentation of the target list in the following
experiment just as in Experiment 1. As stimulus material we again used proper names.
Experiment 3
To prevent a confounding of frequency of occurrence and fame in
Experiment 3 as well, the participants were exclusively presented with proper names
unknown to them before the experiment. Consequently, the participants had no
differential knowledge with regard to the pre-experimental frequency of occurrence of
these names. Therefore, to be able to test the hypothesis of the memory models that
estimates of the frequencies of categories in a target context are influenced by the
frequencies of the corresponding exemplars outside this context, we presented the
participants with two lists. The arrangement of the target list followed the design of the
famous-names study. On the previously presented non-target list the proper names
occurred with varied frequencies.
Method
Participants: Eighty-one persons took part in the experiment (16 men and 65
women, with an average age of 21 years). The participants, who were students in
introductory psychology courses at the TU Chemnitz, received course credit for
their participation. None of them had taken part in the earlier experiments.
Material: On the non-target list the participants were presented with 78
different names consisting of first names and surnames. The first names were 39 male
and 39 female names with a similar frequency of occurrence in the text corpus. The
surnames came from the telephone directory of the city of Münster. We selected
bisyllabic names which were not listed more than 20 times and which had no common
meaning. For every participant, first names and surnames were combined randomly to
form the names presented on the non-target list. Of the resulting 39 female and 39 male
names, 19 of each sex were presented on the non-target list ten times. Each of the
remaining 20 female and 20 male names was shown once. Altogether, 420 presentations
took place, 210 of female names and 210 of male names.
The target list again consisted of 39 names that were presented once.
Analogous to the famous-names experiment, this list contained either 19 male and 20
female names or 19 female and 20 male names. For the sex that was represented with
19 names on the target list, only those names were selected that had been presented ten
times on the non-target list. The 20 names of the other sex were those names that had
been shown only once on the non-target list. The summed frequencies of
occurrence of the names of the two sexes outside the target list thus amounted to 190
and 20.
Procedure: The experiment always started with the presentation of the
non-target list. The participants were tested individually or in groups of up to three persons
at personal computers. Every participant was randomly assigned to one of the two
target list compositions. The participants were first informed that this experiment was
a memory study. Furthermore, they were told that in the first phase of the experiment
they would be presented with a list of names for about 30 minutes. In addition, it
was explained to them that names could be repeated within this list.
The order of the presentation of names on the non-target list was randomly
determined for every participant with the restriction that the same name could not
occur twice in immediate succession. The presentation time was three seconds. During
this first phase of the experiment, the participants had to complete an additional task.
The purpose of this task was to ensure that the participants encoded the 420
presentations of names as completely as possible. To this end, a letter (A, E, I, O or L)
was displayed on the screen after every presentation of a name. The participants were
then to indicate by clicking the corresponding response option whether this letter was
contained both in the previously presented first name and in the previously presented
surname. As soon as the participants had selected a response option, an interstimulus
interval of 0.5 seconds followed in which the screen was blank. Before the presentation
of the non-target list started, the participants completed five practice trials. The names
used in the practice trials did not occur again in the further course of the experiment.
During the presentation of the non-target list, the participants were asked to pay more
attention if they made at least two wrong decisions in ten successive trials.
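The ordering restriction described above can be illustrated with a short Python sketch. It is not the program used in the experiment: the name labels (e.g. "F01") are hypothetical stand-ins, the random seed is arbitrary, and the sequential drawing procedure is just one simple way to satisfy the restriction. It does not sample all admissible orders with equal probability.

```python
import random
from collections import Counter


def build_counts():
    """Presentation counts per name: for each sex, 19 names shown ten times
    and 20 names shown once. Labels such as 'F01' are hypothetical."""
    counts = {}
    for sex in ("F", "M"):
        for i in range(1, 40):
            counts[f"{sex}{i:02d}"] = 10 if i <= 19 else 1
    return counts


def constrained_order(counts, rng):
    """Draw a random sequence in which no name occurs twice in immediate succession."""
    total = sum(counts.values())
    while True:  # restart in the rare case that a draw runs into a dead end
        remaining = dict(counts)
        order, prev = [], None
        for _ in range(total):
            # Names that still have presentations left and differ from the last one.
            candidates = [n for n, c in remaining.items() if c > 0 and n != prev]
            if not candidates:
                break
            # Weight each name by its remaining number of presentations.
            weights = [remaining[n] for n in candidates]
            prev = rng.choices(candidates, weights=weights, k=1)[0]
            remaining[prev] -= 1
            order.append(prev)
        if len(order) == total:
            return order


presentation_order = constrained_order(build_counts(), random.Random(1))
print(len(presentation_order))  # 420 presentations in total
```

Weighting each draw by the remaining number of presentations keeps the frequent names spread over the whole list instead of piling up at the end, which makes dead ends unlikely.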
As already described, the non-target list in Experiment 3 was
introduced with the aim of inducing differential pre-knowledge about the frequency of
occurrence of the different names. In order to test whether the described procedure was
successful, we carried out a pre-test with 20 participants. After the presentation, each
of these 20 participants gave estimates of the frequency of occurrence of 30 names
that had previously been shown 10 times, once or not at all. The corresponding mean
frequency judgments amounted to 13.10 (Md = 10.40), 2.08 (Md = 1.00) and 0.33 (Md
= 0.10). Consequently, one can assume that the participants’ pre-knowledge about the
frequency of occurrence of different names was manipulated successfully by the
presentation of the non-target list.
After the presentation of the non-target list, the experiment was
interrupted for 3 or 24 hours. After this interval, the participants were
presented with the target list in the second phase of the experiment. The intermission
between the two phases of the experiment was introduced to decrease the probability
that the participants would become aware of the relation between the sex of the names
on the target list and the frequency of the names on the non-target list. The length of the
interval and the composition of the target list were counterbalanced. During the
presentation of the target list, the participants completed no further task. The duration
and order of the presentations on the target list were determined as in Experiment 1.
The arrangement of the frequency judgment tasks and the recall test also followed the
procedure of Experiment 1. However, it was pointed out to the participants that their
estimates should exclusively refer to the frequency of male and female names in the
second phase of the experiment.
Predictions of the Memory Models
According to the memory models, the extent to which estimates on the
frequency of stimuli on a target list are influenced by the frequency of occurrence of
the stimuli outside the target list should depend on the similarity of the encoding
contexts. In the present experiment this similarity is evidently large. Both the non-target list and the target list were encoded here within the same experimental context
under nearly identical conditions. Thus, the influence of an item occurring on the non-target list should be greater than the influence of an extra-experimentally encoded
item. Hence, the parameter a with which this influence is modeled has to adopt a
clearly higher value than in the former experiments. To estimate the parameter a, we
relied on the results of two earlier studies by Dougherty and Franco-Watkins (2003)
and Hintzman and Block (1971). In both studies the influence of frequencies of stimuli
outside the target context on frequency of occurrence judgments was examined.
Moreover, in both studies, as in the present experiment, the frequency outside the
target context was manipulated by means of a non-target list. In the four experiments
by Dougherty and Franco-Watkins all words for which the frequency had to be
estimated occurred four times on the target list. The frequency of the same words on
the non-target list, however, varied between 0 and 16. In all experiments the estimates
on the frequency of words on the target list increased linearly with the frequency of
these words on the non-target list. The size of the increase was influenced by the
second independent variable in the experiments, the similarity of target and non-target
list. In all experiments for one group of participants both lists were designed
identically. Averaged across experiments, participants in this condition gave a mean
estimate of about 3.4 on words that did not appear on the non-target list. Thus, every
presentation on the target list increased the estimate by about 0.85 units (3.4 / 4). The mean
estimate for words included 16 times in the non-target list, on the other hand,
amounted to about 7.3. We can conclude from these figures that a presentation on the
non-target list increased the estimate by about 0.25 units ((7.3 - 3.4) / 16). The influence of
presentations on the non-target list relative to the influence of presentations on the
target list was thus slightly below 0.3.
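The arithmetic behind these figures can be retraced with a short sketch. It uses only the values quoted above; the function name and its arguments are illustrative and do not come from the cited studies. Without rounding, the gain per non-target presentation is about 0.24, which the text reports as about 0.25.

```python
def relative_influence(est_zero, est_many, n_target, n_nontarget):
    """Return (gain per target presentation,
               gain per non-target presentation,
               ratio of the two gains).

    est_zero    -- mean estimate for words absent from the non-target list
    est_many    -- mean estimate for words shown n_nontarget times on it
    n_target    -- presentations of each judged word on the target list
    n_nontarget -- presentations on the non-target list in the est_many case
    """
    per_target = est_zero / n_target
    per_nontarget = (est_many - est_zero) / n_nontarget
    return per_target, per_nontarget, per_nontarget / per_target


# Values quoted in the text for the identical-lists condition
per_target, per_nontarget, ratio = relative_influence(3.4, 7.3, 4, 16)
print(f"per target presentation:     {per_target:.2f}")
print(f"per non-target presentation: {per_nontarget:.2f}")
print(f"relative influence:          {ratio:.2f}")
```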
However, in at least some of the experiments the increase in the estimates induced by
the non-target list was smaller when this list was not designed exactly like the
target list. The increase remained unchanged in a second condition of two experiments
in which the two lists differed by the background on which the words were presented.
The discrimination of both lists improved if the words on the non-target list were not
presented alone but together with a varying context word. The same finding resulted if
the participants in a second condition were instructed to use a different encoding
strategy during the presentation of the non-target list than during the presentation of
the target list (mental imagery and rote rehearsal, respectively). In both cases, the
relative influence of presentations on the non-target list amounted to approximately
0.1.
In the study by Hintzman and Block (1971) target and non-target list were
designed identically and presented with identical instructions. The frequency of the
words to be estimated varied on both lists between 0 and 5. Irrespective of the
frequency on the target list, a linear increase of the estimates with the frequency on the
non-target list was also found here. The relative influence of presentations on the non-target list was again at approximately 0.3 (see Hintzman (1988) for a re-analysis of the
same data in which a similar approach to determine this influence was used). Thus, at
least in case of identically designed target and non-target lists this value seems to be
fairly stable.
Since it is assumed in the memory models that category size judgments are
influenced in the same way as frequency of occurrence judgments by the frequency of
the relevant stimuli outside the target context, the presentations on the non-target list
should have a similarly great influence also in the current experiment. The setting for
the parameter a should thus be somewhere between 0.1 and 0.3. It can hardly be
assessed whether the encoding task that our participants completed only during the
presentation of the non-target list allows an improved discrimination of the two lists.
We decided in favor of a conservative prediction. With the lowest plausible parameter
value of a = 0.1, the memory models expect an estimate of 63.3% on the relative
frequency of male names on the target list with 19 male and 20 female names.
Correspondingly, on the list with 20 male and 19 female names an estimate of 36.7%
is predicted for the same relative frequency (see also Fig. 2.3). Since frequency estimates
remain relatively stable even over a period of up to one week (Bruce, Hockley, &
Craik, 1991; Leicht, 1968), we used these predictions both in the condition in which 3
hours were between the non-target and the target list and in the condition in which this
interval lasted 24 hours.
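The two predicted percentages can be reproduced with a minimal sketch. It assumes that each name contributes one unit for its single presentation on the target list plus a units for every presentation on the non-target list, and that the estimate is the category's share of the summed contributions. This reading and the function name are assumptions made for illustration, not the models' actual implementation.

```python
def predicted_share(n_a, n_b, nontarget_a, nontarget_b, a):
    """Predicted relative frequency of category A on the target list.

    n_a, n_b                 -- number of names per category on the target list
    nontarget_a, nontarget_b -- non-target presentations per name in each category
    a                        -- weight of one non-target presentation relative to
                                one target presentation (assumed linear weighting)
    """
    response_a = n_a * (1 + a * nontarget_a)
    response_b = n_b * (1 + a * nontarget_b)
    return response_a / (response_a + response_b)


# List with 19 male names (10 non-target presentations each)
# and 20 female names (1 non-target presentation each)
few_males = predicted_share(19, 20, 10, 1, a=0.1)

# List with 20 male names (1 each) and 19 female names (10 each)
many_males = predicted_share(20, 19, 1, 10, a=0.1)

print(f"{few_males:.1%} {many_males:.1%}")
```

Under these assumptions the sketch yields the 63.3% and 36.7% stated above.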
Results
Predictions of the Recall-Estimate Approach
We determined individual recall ratios for every participant in the same way as
in Experiment 1. Figure 2.3 displays the mean recall ratios on the different target lists
separately for both interval conditions. The order in which the frequency judgment
tasks and the recall test were completed again had no effect on the recall ratios.
As shown in Figure 2.3, the frequency of occurrence of the names on the non-target list affected the recall ratios strongly. Male names on the target list were recalled
clearly more often than female names if they had been previously presented more often
on the non-target list. Conversely, male names were recalled worse than female
names if they had been included on the non-target list less often. This effect can be
observed in both interval conditions alike. The effect size of the list composition on
the recall ratios is large in both conditions, with r_effect size > .60 (see also Tab. 2.7).
[Figure 2.3: two panels, one per interval between non-target and target list (3h; 24h). Vertical axis: percentage of male names. Horizontal axis: target list (19 males / 20 females; 19 females / 20 males). Plotted series: memory model prediction, recall-estimate prediction, estimates.]
Figure 2.3. Mean estimated percentages of male names on the target lists in the two
conditions with different intervals between non-target and target lists in
Experiment 3. The corresponding predictions of the memory models and
the recall-estimate strategy are also shown. Error bars denote standard
errors. Names of the sex represented by 19 names on the target list
appeared 10 times on the non-target list. The names of the other sex were
presented once on the non-target list.
Consequently, the recall-estimate strategy also expects a large influence of the
frequencies of names on the non-target list on the estimates of the relative frequencies
of female and male names on the target list. Generally, the bias in the estimates
predicted by the recall-estimate strategy is marginally greater than the bias expected by
the memory models. The predicted deviation between the estimated and the actual
percentage of male names on the target lists varies in both interval conditions between
15% and 19%. The size of these deviations corresponds to the bias that the
recall-estimate strategy also predicted for the lists with pre-experimentally high-frequent and
medium-frequent names in Experiment 1 (cf. Fig. 2.1).
Table 2.7
Effect sizes of the list composition on recall ratios and frequency estimates for both
interval conditions in Experiment 3

              Recall Ratio              Frequency Estimates
Interval      r_effect size    t        r_effect size    t        df
3 h           .62              4.94     .11              0.66     39
24 h          .67              5.51     .11              0.67     38
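The r_effect size values in Table 2.7 are consistent with the common conversion r = sqrt(t² / (t² + df)). The text does not state which formula was used, so the following sketch assumes this conversion; the function name is illustrative.

```python
import math


def r_effect_size(t, df):
    """Convert a t statistic and its degrees of freedom into an effect size r.

    Assumed conversion: r = sqrt(t^2 / (t^2 + df)); the sign of t is kept so
    that effects in the unexpected direction come out negative.
    """
    magnitude = math.sqrt(t ** 2 / (t ** 2 + df))
    return math.copysign(magnitude, t)


# (t, df) pairs taken from Table 2.7
for label, t, df in [
    ("3 h, recall ratio", 4.94, 39),
    ("3 h, frequency estimates", 0.66, 39),
    ("24 h, recall ratio", 5.51, 38),
    ("24 h, frequency estimates", 0.67, 38),
]:
    print(f"{label}: r = {r_effect_size(t, df):.2f}")
```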
Frequency Estimates
As in the preceding experiments, the variation of the task order had no
influence on the magnitude of the quantitative estimates or the ordinal frequency
judgments. In the analyses below the task order conditions have thus been
combined.
As shown in Figure 2.3, only a weak effect on the frequency judgments was
associated with the manipulation of the frequencies of names on the non-target list. As
expected by the memory models and the recall-estimate strategy, the relative frequency
of male names on the target list on which the actual proportion was 48.7% was judged
higher than on the target list on which the actual proportion amounted to 51.3%. This
holds in the condition in which 3 hours were between the presentation of the non-target
and the target list as well as in the condition in which the lists were separated by a 24
hour interval. However, this bias in the estimates was generally very small. In no
interval condition and on no target list did the deviation between the estimated and the
actual relative frequency of male names exceed 2.8%. Accordingly, the effect
sizes of the list composition on the frequency judgments were also small in both interval
conditions, with r_effect size = .11 (see Tab. 2.7). The presentation frequencies on the non-target list thus exerted a clearly smaller influence on the frequency judgments than
expected by the memory models and the recall-estimate approach.
The same picture results from the ordinal frequency judgments. In case of the
target list with 19 male and 20 female names, in both interval conditions only a small
majority of participants judged that the list contained more male names (3h-interval:
57% (N = 21); 24h-interval: 55% (N = 20)). On the list with 19 female and 20 male
names a narrow minority of participants was of the same opinion (3h- and 24h-interval: 45% (in both conditions N = 20)).
As suggested by the large difference between the bias predicted by the recall-estimate approach and the actually observed bias, the correlations between the
individual recall ratios and the individual frequency judgments were small. With a
3-hour interval between non-target and target list this correlation amounted to r = .15
(N = 41). After a 24-hour break the correlation was r = .07 (N = 40; Fisher-Z = 0.351,
p = 0.72 (two-tailed)). As in the previous experiments, these correlations were again
higher in the condition in which the recall test preceded the frequency judgments than
in the condition in which the frequency judgments were submitted first. However, this
time the differences between the correlations in the two task order conditions were
close to 0 (3h-interval: r = .18 and r = .15; 24h-interval: r = .10 and r = .04).
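The comparison of the two interval conditions can be retraced with the standard Fisher z test for two independent correlations. The sketch below assumes this is the test behind the reported Fisher-Z value; the function names are illustrative. With r = .15 (N = 41) and r = .07 (N = 40) it gives a z of roughly 0.35, in line with the value reported above.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error


def two_tailed_p(z):
    """Two-tailed p value of z under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = fisher_z_difference(0.15, 41, 0.07, 40)
print(f"z = {z:.3f}, p (two-tailed) = {two_tailed_p(z):.2f}")
```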
Discussion
In Experiment 3 the frequencies of occurrence of the stimuli outside the target
context were manipulated by the presentation of a preceding non-target list. According
to the hypothesis of the memory models, this manipulation should exert a considerable
influence on the estimates on the relative frequency of the superordinate categories of
the stimuli in the target context. In fact, a slight but in both interval conditions
consistent effect on the frequency judgments was found. Thus, the results of the
present experiment turn out somewhat more advantageous for the memory models than
the results from Experiment 2 in which no effect at all was found for animals and
plants with a pre-experimentally different frequency. Nevertheless, both experiments
largely lead to similar conclusions. In Experiment 3 the bias in the frequency
judgments was again clearly smaller than expected by the memory models. This also
means - contrary to the predictions - that the bias turned out much smaller than the bias
in the estimates on pre-experimentally high- and medium-frequent names in
Experiment 1. The absence of a greater effect cannot be ascribed to any changes in the
procedure for presenting the target lists relative to Experiment 1. In both
experiments the participants completed no categorization task during the presentation
of these lists. The alternative explanation that was plausible in Experiment 2, namely
that the participants had difficulties assigning exemplars to categories, can also be
excluded for the current experiment. All presented exemplars were common first
names (e.g. Matthias, Franz, Felix, Claudia, Helga, Inge) that should allow every
participant an easy identification of the corresponding sex. Thus, the results of
Experiment 3 indicate that the frequencies of exemplars outside the target context can
have an influence on estimates on the corresponding categories if these alternative
explanations are excluded. Nevertheless, the results also confirm the conclusion from
Experiment 2 that contrary to the hypothesis of the memory models this effect is not
suitable to explain the famous-names effect. The influence of the frequencies outside
the target context was again too small to be regarded as the driving force behind the
famous-names effect.
Also with regard to the recall-estimate approach, similar conclusions result
from Experiment 3 as from Experiment 2. In the present case, this approach predicted
an equally great bias in the estimates as the memory models. Once more the estimates
turned out to be considerably less error-prone than the recall performances underlying
this prediction. Moreover, the low correlations between the individual recall ratios and
estimates show that not more than 2% of the variance in the estimates can be explained
with the recall-estimate strategy in both interval conditions. In other words, the
participants obviously did not use this strategy to generate their estimates. Particularly
noteworthy is furthermore that the bias predicted by the recall-estimate approach was
as large as the bias predicted for the lists with high-frequent and medium-frequent
names in Experiment 1. Thus, a bias in the recall ratios of the size observed with these
names is apparently not sufficient to induce a bias in the estimates that corresponds to
the famous-names effect.
A last and important implication for the memory models of the findings from
Experiment 3 results from a comparison with earlier studies in which the influence of
frequencies on non-target lists on frequency of occurrence judgments was examined.
As already described above, the relative influence of presentations on the non-target
list on estimates about the frequency of occurrence of words on the target list was in
the studies by Hintzman and Block (1971) and Dougherty and Franco-Watkins (2003)
fairly stable at approximately 0.3. However, this was only valid if target and non-target
list were designed identically. Where the lists differed in their instructions or
design, the influence varied between 0.1 and 0.3. The explanation of
the memory models for the famous-names effect is based on the assumption that
category size judgments are influenced in the same way by frequencies outside the
target context as frequency of occurrence judgments. We hence based the predictions
of the memory models on these values. Since in the present experiment the non-target
and target list were not designed completely identically, we used the lowest plausible
parameter setting of a = 0.1. Despite this conservative decision, the bias in the
category size judgments was clearly overestimated. An approximately correct
prediction of the estimates would have resulted with a parameter setting of a = 0.01.
That is, the influence of the frequencies on the non-target list on the category size
judgments in the present experiment was one order of magnitude smaller than the
influence on frequency of occurrence judgments in every earlier, comparable study.
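To illustrate how strongly the predicted bias depends on the parameter a, the following sketch compares three settings for the target list with 19 male and 20 female names. It assumes the same linear weighting as the predictions described earlier (one unit per target presentation plus a units per non-target presentation); the function name and default arguments are illustrative.

```python
def predicted_male_share(a, n_male=19, n_female=20, nt_male=10, nt_female=1):
    """Predicted share of male names under an assumed linear weighting.

    nt_male, nt_female -- non-target presentations per name of each sex
    a                  -- weight of one non-target presentation
    """
    male = n_male * (1 + a * nt_male)
    female = n_female * (1 + a * nt_female)
    return male / (male + female)


actual_share = 19 / 39  # about 48.7% male names on this target list

for a in (0.3, 0.1, 0.01):
    predicted = predicted_male_share(a)
    bias_points = 100 * (predicted - actual_share)
    print(f"a = {a}: predicted {100 * predicted:.1f}%, "
          f"bias {bias_points:+.1f} percentage points")
```

Under these assumptions, only the setting a = 0.01 produces a predicted bias below the 2.8% maximum deviation observed in the estimates.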
What can we conclude from this observation? Given that the participants
obviously did not use the number of retrievable male and female names to generate
their estimates, the assumption seems to be reasonable that they relied on a kind of
memory assessment process. This assessment process, however, apparently leads to a
different result than expected by the memory models. The predictions of the memory
models were based on the hypothesis that the memory response elicited by the
category labels is proportional to the sum of the responses to the corresponding
exemplars on the target list. If the participants actually used a memory assessment
strategy, this hypothesis is wrong. Instead, the response to category labels is apparently
biased far less by the frequencies of the exemplars outside the target context.
The results from Experiments 2 and 3 make the famous-names effect appear to
be a singular phenomenon. Owing to the manipulation of the frequencies of the stimuli
outside the target context, in both experiments a bias of a similar size as in the famous-names study was to be expected. Furthermore, this manipulation resulted in both
experiments in a strong recall bias which was in one case as great as the recall bias
with famous names. Notwithstanding, a comparable effect in the frequency judgments
did not occur. Therefore, the question arises whether an effect analogous to the
famous-names effect can be induced at all by different stimulus material than famous
names. In Experiment 4 we thus used stimulus material supposed to be as similar to the
material of the original study as possible.
Experiment 4
The stimulus material in Experiment 4 consisted of names of German and
foreign cities with varying pre-experimental frequency of occurrence (and possibly of
varying fame). In contrast to Experiment 2, the stimuli were thus proper names here.
Moreover, unlike in Experiment 3, the participants here already possessed
pre-experimental knowledge of the frequency of occurrence and other attributes of the
stimuli. In other words, the stimuli here share more characteristics with the names of
famous persons than the stimuli of the former experiments.
Method
Participants: Eighty-one persons took part in the experiment (27 men and 54
women, with an average age of 23 years). The large majority of the participants were
students of the TU Chemnitz; 26 of them came from the Department of Psychology.
The participants either received a reimbursement of 3 Euros or course credit for their
participation. None of them had taken part in the earlier experiments.
Material: Again we constructed two study lists. Analogous to the previous
experiments, these lists either contained 19 German cities with a high and 20 foreign
cities with a low frequency of occurrence in the text corpus or 19 foreign cities with a
high and 20 German cities with a low frequency of occurrence. As pre-experimentally
high-frequent German cities we used the 19 cities in Germany with the largest number
of inhabitants (e.g. Berlin, Hamburg, Wuppertal, Bielefeld). For the selection of the
pre-experimentally high-frequent foreign cities first a set with 50 known, international
cities was created. From these 50 cities those 19 cities with the highest frequency of
occurrence in the text corpus were included in the corresponding study list (e.g. New
York, Paris, Moscow, Peking). The low-frequent foreign and German cities used in the
experiment were selected by means of a pre-test. The aim of this pre-test was to
guarantee that the cities mentioned on the study lists could be categorized correctly by
the participants. The 34 participants of the pre-test were presented with a questionnaire
with 60 foreign and 60 German cities with a low frequency of occurrence in the text
corpus. The participants’ task was to indicate for every city whether it is a German or a
foreign city (see footnote 12). Of the 120 cities, 21 German and 35 foreign cities were
correctly categorized by all participants. From these cities 20 were selected randomly
out of each category and used as low-frequent cities on the study lists (e.g. Büsingen,
Wunsiedel, Goslar, Meppen or Quito, Amiens, Sfax, Toledo). Table 2.8 displays the
sum totals of the frequencies of occurrence in the text corpus of high- and low-frequent German and foreign cities on both study lists (see footnote 13).
Table 2.8
Sums of the frequencies of occurrence in the text corpus for German and foreign cities
on the study lists in Experiment 4

Study list                                                         German cities    Foreign cities
19 high-frequent German cities / 20 low-frequent foreign cities          207,775             1,734
19 high-frequent foreign cities / 20 low-frequent German cities            4,035           104,731
Procedure: In the pilot study to Experiment 2 the participants’ estimates did
not add up to 100%. At the same time, the participants stated that they subdivided the
stimulus material into different categories than those to be estimated. This suggests that
the participants give reasonable estimates of the size of the relevant categories only if
they have already perceived those categories during the encoding of the stimulus material.
It appeared doubtful that the participants in the present experiment would
automatically categorize the stimuli into German and foreign cities. As in Experiment 2,
the participants thus completed a categorization task during the presentation of the
study lists. In other words, the participants were informed in advance that the study
lists would exclusively contain cities from the categories German and foreign. After
each presentation of a name of a city, the participants indicated by a mouse click on an
appropriate field to which category this city belongs. As in Experiment 2, the
procedure was identical with the procedure in Experiments 1 and 3 with the
exception of this categorization task.

12 None of the foreign cities included in the questionnaire was a German-speaking city.
13 The newspapers contained in our text corpus are published in Berlin, Munich and Frankfurt. In all
three newspapers their places of publication occurred with an extremely increased frequency. We thus
excluded these frequencies and always estimated them by the frequencies of occurrence in the other two
newspapers.
Predictions of the Memory Models
To derive predictions of the memory models, we used the parameter setting a =
0.001, which allowed a fairly accurate prediction of the magnitude of the frequency
judgments in Experiment 1. From this parameter setting and the sums of the
frequencies of occurrence of German and foreign cities in the text corpus given in
Table 2.8 the following predictions result: for the list with high-frequent German and
low-frequent foreign cities an estimate of 91.3% on the relative frequency of German
cities is expected. For the list with high-frequent foreign and low-frequent German
cities, on the other hand, an estimate of 16.3% on the same relative frequency is
predicted (see also Fig. 2.4).
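Both predictions can be reproduced from the corpus sums in Table 2.8 with a short sketch. It assumes that each city contributes one unit for its presentation on the study list and that the summed corpus frequency of a category is weighted by a; this reading and the function name are assumptions made for illustration.

```python
def predicted_german_share(n_german, n_foreign,
                           corpus_german, corpus_foreign, a=0.001):
    """Predicted relative frequency of German cities on a study list.

    n_german, n_foreign           -- number of cities per category on the list
    corpus_german, corpus_foreign -- summed corpus frequencies (Table 2.8)
    a                             -- weight of one pre-experimental occurrence
    """
    german = n_german + a * corpus_german
    foreign = n_foreign + a * corpus_foreign
    return german / (german + foreign)


# 19 high-frequent German cities / 20 low-frequent foreign cities
list_1 = predicted_german_share(19, 20, 207_775, 1_734)

# 19 high-frequent foreign cities / 20 low-frequent German cities
list_2 = predicted_german_share(20, 19, 4_035, 104_731)

print(f"{list_1:.1%} {list_2:.1%}")
```

Under these assumptions the sketch yields the 91.3% and 16.3% stated above.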
Results
Predictions of the Recall-Estimate Approach
For every participant we determined a recall ratio indicating the percentage of
recalled German cities among all recalled cities. Figure 2.4 displays the mean recall
ratios on the different study lists separately for the two orders in which the frequency
judgment tasks and the recall test were completed. As illustrated by the figure, the task
order once more had no influence on the recall ratios. A strong bias in the recall ratios
resulted in both order conditions. Generally, more cities from the category with the
pre-experimentally more frequent exemplars were recalled. The effect size of the list
composition on the recall ratios is very large in both order conditions, with
r_effect size > .80 (see Tab. 2.9).
Table 2.9
Effect sizes of the list composition on recall ratios and frequency estimates in both task
order conditions in Experiment 4

                                       Recall Ratio              Frequency Estimates
Task Order                             r_effect size    t        r_effect size    t        df
Recall Test – Frequency Judgments      .83              9.38     .35              2.34     39
Frequency Judgments – Recall Test      .89              12.68    -.06             -0.39    38
Thus, the recall-estimate approach again expects strongly distorted estimates.
The predicted deviation between the estimated and the actual percentage of German
cities amounts to about 30% on the list with high-frequent German cities irrespective
of the order condition. On the list with low-frequent German cities this deviation
amounts to approximately 16%.
[Figure 2.4: two panels, one per task order (Recall Test - Frequency Judgments; Frequency Judgments - Recall Test). Vertical axis: percentage of German cities. Horizontal axis: study list (19 hfgc / 20 lffc; 19 hffc / 20 lfgc). Plotted series: memory model prediction, recall-estimate prediction, estimates.]
Figure 2.4. Mean estimated percentages of German cities on the study lists in the two
task order conditions in Experiment 4. The corresponding predictions of
the memory models and the recall-estimate strategy are also shown. Error
bars denote standard errors. hfgc = high-frequent German cities, lffc =
low-frequent foreign cities, hffc = high-frequent foreign cities, lfgc =
low-frequent German cities.
Frequency Estimates
The occurrence of a bias in the frequency judgments was dependent on the task
order condition. As illustrated in Figure 2.4, a bias could be found if the recall test
preceded the estimates. In this condition German cities received higher estimates on
the study list on which they were represented with 19 exemplars than on the study list
on which they were represented with 20 exemplars. The effect size of the list
composition on the estimates is similarly large as with high- and medium-frequent
names in Experiment 1 (cf. Tab. 2.9 and 2.3; see footnote 14). On the other hand, the estimates were
uninfluenced by the pre-experimental frequency of the cities if they had been given
before the recall test. In this order condition the relative frequency of German cities on
both study lists was slightly overestimated (by approximately 2%). Consequently,
German cities here received higher estimates on the list in which they were included
more often. Accordingly, the effect size of the list composition takes a slightly
negative value (see Tab. 2.9). As can be seen from Table 2.10, the ordinal frequency
judgments confirm the results of the analysis of the quantitative estimates.
Table 2.10
Percentage of the participants who judged German cities to be more frequent on the
study lists in the two task order conditions in Experiment 4.

Task Order                             Study list           Judged German cities more frequent
Recall Test – Frequency Judgments      19 hfgc / 20 lffc    86% (N = 21)
Recall Test – Frequency Judgments      19 hffc / 20 lfgc    50% (N = 20)
Frequency Judgments – Recall Test      19 hfgc / 20 lffc    55% (N = 20)
Frequency Judgments – Recall Test      19 hffc / 20 lfgc    60% (N = 20)

Note. hfgc = high-frequent German cities, lffc = low-frequent foreign cities, hffc = high-frequent foreign
cities, lfgc = low-frequent German cities.
In the condition in which the participants completed the recall test first the
correlation between the individual recall ratios and the estimates amounts to r = .38 (N
= 41). In the condition in which the participants gave their estimates first this
correlation comes up to r = .01 (N = 40; Fisher-Z = 1.69, p = 0.05 (one-tailed)).
14 It should be noted that the frequency of German cities on the list with high-frequent foreign and low-frequent German cities was hardly underestimated. The results in the condition in which the frequency
judgments were given before the recall test indicate that this might be ascribed to a general tendency
independent of the list composition to estimate the frequency of German cities slightly higher.
Discussion
The aim of Experiment 4 was to check whether the famous-names effect could
be replicated with stimulus material that shares as many characteristics with the names
of famous persons as possible. This replication succeeded at least in the condition in
which the recall test preceded the frequency judgments. In this condition, the results
largely corresponded to the findings from Experiment 1. Study lists with names of
cities with a different pre-experimental frequency here induced a very strong recall
bias. As the recall-estimate strategy therefore predicted, the frequency
judgments were also systematically distorted. The effect in the estimates was similarly
large as with lists with high- and medium-frequent and lists with medium-frequent and
unknown names of persons in Experiment 1. However, this means at the same time
that the bias in the estimates was again clearly smaller than the bias in the recall ratios.
Finally, also the correlations between the individual recall ratios and the individual
frequency judgments obtained at least a roughly similar size as in Experiment 1.
However, in the condition in which the participants gave their frequency
judgments before the recall test, a completely different picture resulted. Although the
recall-estimate approach here makes the same predictions as in the other order
condition, the frequency judgments corresponded fairly well to the actual percentage
of German cities on the study lists. Individual recall ratios and estimates were
uncorrelated. Thus, the number of retrieved foreign and German cities had no
influence at all on the estimates.
Furthermore, the absence of a bias in the estimates in this order condition
demonstrates that the predictions of the memory models once again do not apply. Due
to the great differences in the pre-experimental frequencies of German and foreign
cities, these models expected an even larger bias than the recall-estimate approach.
However, differences in the pre-experimental frequencies alone obviously do not cause
a bias analogous to the famous-names effect. In addition to this, the coexistence of a
strong recall bias and approximately correct frequency judgments supports the
conclusion we already derived in the discussion of the results of Experiment 3:
Contrary to our original expectation, participants seem to have access to memory
responses that indicate the relative size of the categories on the study lists fairly accurately.
In the previous experiments we already observed that the
correlation between recall ratios and estimates turns out higher if the recall test was
completed before the judgment tasks. In accordance with some findings from the
literature (Betsch et al., 1999; Bruce, Hockley, & Craik, 1991), we interpreted this as
evidence for the assumption that the recall test makes information salient that the
participants would otherwise use less or not at all to generate their judgments. In
Experiment 1 we found in the condition in which the recall test was completed first
similar results as in the same order condition of the present experiment. However, the
correlation between recall ratios and estimates was in this experiment not reduced to 0
if the estimates preceded the recall test. Besides, in Experiment 1 there was a bias in
the frequency judgments also in this task order condition. Thus, the question arises
why estimates that could not be influenced by a previous recall test were completely
unbiased in the present experiment. The answer to this question may be found in the
categorization task. With the exception of the modified stimulus material to which the
absence of the bias can apparently not be attributed, this task was the only difference
between the experiments. As already described above, earlier research demonstrated
that a categorization task can impede the use of a recall-estimate strategy (Bruce,
Hockley, & Craik, 1991). The current results suggest furthermore that the combination
of overt categorization and no preceding recall test blocks the use of this strategy
entirely. It seems that after a categorization of the stimulus material the recall bias
affects the estimates only if the participants become aware through a recall test how
different the number of retrieved exemplars per category is. Consistent with this
interpretation, also Bruce, Hockley and Craik (1991) found in several experiments
correlations of 0 or close to 0 between recall performances and estimates if the stimuli
were categorized but no preceding recall test was completed. In the present study the
same condition was realized one further time in Experiment 2. There the correlation
was even slightly negative. Moreover, the differences between the correlations in the
two task order conditions were larger in the experiments with a categorization task
than in the experiments without such a task (Experiments 1 and 3). This also supports
the assumption that after an overt categorization of the stimuli, the recall-estimate
strategy is not used spontaneously but can still be motivated by a recall test.
General Discussion
The main objective of the present study was to test the hypothesis of the
memory models that the famous-names effect is caused by the different pre-experimental
frequency of famous and less famous names. All in all, the results of our
experiments contradict this hypothesis. In Experiments 2 and 3, in which the
frequency of the stimuli outside the target context was manipulated independently of
fame, we found no bias or only an extremely slight bias in the frequency judgments on the
corresponding superordinate categories. In Experiment 4, in which the study lists
contained cities with different pre-experimental frequencies, a bias occurred only if a
recall test had been completed before the frequency judgments were given. This result,
too, should not occur if different pre-experimental frequencies of stimuli
generally led to distorted estimates of category sizes mediated by a retrieval-free
memory assessment process. None of Experiments 2 to 4 thus provided
evidence for the assumption that differences in the frequency of exemplars outside the
target context alone are sufficient to influence estimates on categories to a considerable
degree. As a result, we have to conclude that the explanation the memory
models offer for the famous-names effect is incorrect.
Originally, the famous-names effect was explained by Tversky and Kahneman
(1973) with the assumption that frequency judgments on categories are based on the
number of the relevant retrieved exemplars. This assumption was supported in the
famous-names study by the observation that recall performances and estimates were
influenced by the fame of the presented names in the same way. We also found this
result pattern in Experiment 1 for study lists with names that were pre-experimentally
differently frequent and famous at the same time. Here a bias in the mean recall ratios
was associated with a bias in the mean frequency judgments. In addition, the
participants’ individual estimates covaried to a substantial degree with the individual
recall ratios. All in all, the results of our experiments nevertheless demonstrate that the
hypothesis that the participants generally generated their estimates with the
recall-estimate strategy cannot be maintained. The recall-estimate approach predicted biased
estimates in all experiments. Even where we observed a bias in the estimates, as in
Experiment 1, this bias was smaller than the recall ratios would lead one to expect. We
found this pattern in all experiments and, without exception, in all
independent experimental groups.15 This confirms earlier findings according to which
recall performances are too poor an indicator of objective frequencies to
explain the comparatively accurate estimates completely (e.g. Hasher & Zacks, 1984;
Lewandowsky & Smith, 1983; Manis et al., 1993; Watkins & LeCompte, 1991). In
addition, the occurrence of a recall bias did not necessarily mean that
the estimates were biased as well. In Experiment 2 the estimates corresponded almost exactly
to the actual proportions of the categories in question on the study lists. The same was
true in Experiment 4 when the estimates were given before the recall test. In
Experiment 3 a recall bias corresponding in its size to the bias for high-frequent and
medium-frequent names was accompanied by only a minimal distortion in the
estimates. The occurrence of a recall bias alone thus obviously does not allow a
reasonable prediction about whether and how much the frequency judgments will
be distorted. Furthermore, the correlations between individual recall ratios and
estimates in Experiments 2 and 3 as well as in the mentioned task order condition
of Experiment 4 were 0 or close to 0. Thus, there is no evidence here that
the estimates were influenced by the recall performances in any way.
Apparently, neither the memory models nor the recall-estimate approach can
explain the entire result pattern in a satisfying manner. However, which cognitive
processes determined the participants’ estimates? In the remainder of this discussion
we will try to offer an explanation on the basis of the multiple strategy perspective
(Brown, 1995; 1997; 2002). According to this perspective, persons can use several
strategies to generate frequency judgments. Strategy selection is assumed to be
systematically related to properties of the events in question, encoding factors and the
conditions at the time of frequency estimation. Brown basically distinguishes three
strategies. Two of these strategies correspond to the memory assessment approach and
the recall-estimate approach examined by us.16 The third strategy is direct retrieval of
previously stored tallies of frequencies. For the present experiments the application of
this strategy can be excluded since participants of laboratory studies usually count item
frequencies only if they were explicitly instructed to do so (Brown, 2002).
15
One might argue that this finding is due to the way we transformed the number of retrieved exemplars
into predictions of the recall-estimate strategy or to the time span we allotted for the recall test. However,
previous research strongly suggests that this is not the case. In a study by Manis et al. (1993) five
different recall indices were determined to predict frequency estimates. Recall ratios performed best. A
study by Watkins and LeCompte (1991) indicates that recall performances indeed become a better cue
of objective frequencies the more time is allotted for recall. However, the maximum recall time in this
study was considerably lower than in our experiments (in which it was 2 min.). Additionally, frequency
estimates are, of course, usually given in less than 2 minutes. Finally, results of Sedlmeier et al. (1998)
demonstrate that predictions based on the time needed to retrieve a first exemplar of each category do
not differ systematically from predictions based on the number of retrieved exemplars. Thus, it seems,
no matter how recall performance is measured, it reflects actual frequencies worse than estimated
frequencies.
If a combination of the approaches we examined is to explain
the result pattern of all experiments, one of the strategies has to permit an
error-free estimation of category sizes. This obviously cannot be the recall-estimate
strategy. Contrary to the original hypothesis of the memory models, we thus have to
assume that the memory responses to a category label were at least almost unbiased.
Evidence for this assumption comes especially from Experiment 3, in which the
frequencies of the stimuli outside the target context were manipulated within the
experiment. Earlier experiments show that with such a manipulation a
considerable influence of the frequencies of the stimuli outside the target context on
frequency of occurrence judgments has to be expected (Hintzman & Block, 1971;
Dougherty & Franco-Watkins, 2003). In contrast, the category size judgments
we collected were nearly unbiased. This gives reason to assume that the responses to
category labels are not proportional to the sum of the responses to the corresponding
exemplars.
Let us briefly discuss how this finding relates to the memory models before we
return to the multiple strategy perspective. MINERVA-DM (Dougherty, Gettys, &
Ogden, 1999) and PASS (Sedlmeier, 1999) are aimed at modeling the memory
assessment process. Both Dougherty and colleagues and Sedlmeier assumed that the
response to categories is proportional to the sum of the responses to the exemplars. As
we already emphasized in the introduction, both authors, however, did not specify the
process that brings about this proportionality. The fact that Dougherty and colleagues
simulated the famous-names effect without integrating a corresponding process into
MINERVA-DM may signal that the proportionality assumption was made ad hoc.
Together with the finding that frequency of occurrence judgments - as expected by the
memory models - are influenced by frequencies outside the target context, this
assumption enabled an explanation of the famous-names effect within the scope of the
models. In the introduction we argued that proportionality could be brought about by
an activation of superordinate categories already during the encoding of the stimuli.
While this assumption seemed plausible to us at the beginning of this research, it is by
no means necessary within the framework of MINERVA-DM or PASS. Therefore, the
question arises which predictions the memory models will make if we abandon this or
other possible additional assumptions generating proportionality.
16
Brown refers to the recall-estimate strategy with the term enumeration. For the memory assessment
strategy he does not specify an underlying cognitive process (as done in MINERVA-DM or PASS).
Thus, in the multiple strategy perspective the only assumption connected with this strategy is that
persons have access to some kind of memory response that can be used as indicator of frequencies.
Let us consider the situation in the famous-names experiment. In MINERVA-DM
every new event is stored as an additional trace in a memory matrix. If the sex of
a name on the study list was encoded, it has to be represented within the
corresponding trace. At the time the judgments are given, memory is probed with the
label of a category, here ‘female names’ and ‘male names’. The activation of an
intra-experimental trace can thus only depend on its similarity to this probe. Given that the
participants know the sex of the presented names and that the feature ‘sex’ is encoded
automatically (Taylor et al., 1978; Fiske & Neuberg, 1990; Messieck & Mackie,
1989), the probability that sex is represented in memory should be equal for
female and male names. Consequently, the familiarity signals elicited by
intra-experimental traces should also correspond to the proportions of male and female names on
the study list. The present experiments provide no indication that memory responses to
categories are influenced by the frequency of extra-experimental events. When the
frequency outside the target context was varied independently of fame, we found a
(slight) bias in the estimates only if intra-experimental non-target lists were used
(Experiment 3). Let us nevertheless assume that in the case of category size
judgments, too, participants are not able to discriminate perfectly between the occurrence of
events in the experimental and extra-experimental context. Since memory is not
probed with single names, the activation of extra-experimental traces - just as the
activation of intra-experimental traces - would have to depend on their similarity to
the probe ‘male’ or ‘female names’. In addition, the activation of
extra-experimental traces would be influenced by the similarity of the encoding context to
the experimental context. A distorting effect of extra-experimental traces on estimates
of the relative frequency of male and female names would thus only be expected if at
least one of the following circumstances is given: 1. Names of one sex are
generally encoded systematically more often than names of the other sex, or 2. names
of one sex are encoded in contexts that are systematically more similar to the
experimental context than the encoding contexts of names of the other sex. Both
circumstances do not seem to be very plausible.17
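The argument of this paragraph can be illustrated with a minimal sketch. It assumes a MINERVA 2-style memory (Hintzman, 1988), in which the similarity of a probe to a trace is the dot product divided by the number of features that are nonzero in either vector, and the activation of a trace is the cube of this similarity. The feature codes, the vector lengths and the list composition (10 female and 9 male names) are hypothetical and chosen only for illustration; they are not the parameters of the simulations by Dougherty and colleagues.

```python
# Illustrative sketch only: feature codes, vector lengths and list sizes are
# assumptions made for this example, not values taken from MINERVA-DM simulations.

def similarity(probe, trace):
    """Dot product divided by the number of features nonzero in probe or trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    """Sum of the cubed similarities (activations) over all stored traces."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


# Hypothetical coding: four features per category label (female, then male).
CATEGORY = {
    "female": [1, -1, 1, -1, 0, 0, 0, 0],
    "male": [0, 0, 0, 0, 1, -1, 1, -1],
}


def make_trace(sex, item_index):
    """Trace = category features of the name plus arbitrary item-specific features."""
    item = [1 if (item_index >> k) & 1 else -1 for k in range(4)]
    return CATEGORY[sex] + item


def make_probe(sex):
    """A category-label probe carries no item-specific information."""
    return CATEGORY[sex] + [0, 0, 0, 0]


# Hypothetical study list; the feature 'sex' is encoded for every name.
memory = [make_trace("female", i) for i in range(10)]
memory += [make_trace("male", i) for i in range(9)]

intensity_female = echo_intensity(make_probe("female"), memory)
intensity_male = echo_intensity(make_probe("male"), memory)
share_female = intensity_female / (intensity_female + intensity_male)
print(round(share_female, 3))
```

Under these assumptions the relative echo intensity for ‘female names’ equals the proportion of female names on the list, whatever the item-specific features of the individual names are. A bias could only arise in this sketch if the category features were encoded less reliably for names of one sex.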
In a similar manner also the predictions of the PASS model have to be
modified if one abandons the assumption that the names elicit an activation of their
superordinate category already during the presentation of the study lists. The
associative response of this model to the probes ‘male’ and ‘female names’ then
mainly depends on whether and to what extent an associative connection between the
sex of the single names and the experimental context was learned. Again there is no
reason for the assumption that this associative connection is systematically learned
better for the names of one of the two sexes within the experiment. Also outside the
experiment no systematically different weights between the units representing the
experimental context and the units representing the sexes should have been acquired.
Consequently, without additional assumptions both models come to the
prediction that frequency judgments on categories in a target context should not be
influenced by the frequency of occurrence of the relevant exemplars outside the target
context. Thus, the models are in principle able to explain distorted frequency of
occurrence judgments and accurate category size judgments at the same time.
The present experiments give every reason to believe that the models fare far
better if no additional assumptions modifying these predictions are made. In that case,
however, the models forfeit the possibility of explaining the famous-names effect
by the different pre-experimental frequencies of the names.
17
Of course, MINERVA-DM would not predict a famous-names effect even if one of these
circumstances were given. If, for example, more male than female names were encoded
outside the experiment, the model would expect the frequency of male names to be overestimated
irrespective of the specific names on the study list. In fact, in Experiments 2 and 4 we found a slight
tendency to overestimate the frequency of animals and German cities on all study lists. Whether this was
due not only to chance variation but also to more extra-experimental encodings of animals and German
cities than of plants and foreign cities has to remain speculation.
According to the previous considerations, we suggest that our participants
generally had access to unbiased information about category sizes via a memory
assessment strategy. Thus, the question arises why they nevertheless gave distorted
estimates in some of the experimental conditions. From the multiple
strategy perspective, the biased estimates must stem from an application of the
recall-estimate strategy. Since the recall ratios were biased in all experiments, this
strategy indeed represents a possible cause of distorted estimates throughout. In other
words, an increased use of this strategy should cause a greater bias. The correlation
between individual recall ratios and estimates can be regarded as an indication of the
use of the recall-estimate strategy. We determined the covariation between this measure
and the effect size in the estimates on the eight study list combinations for which
separate analyses were reported.18 This covariation was indeed strong, r_alerting = .88
(Rosenthal et al., 2000). That is, a larger bias resulted in those experimental conditions
in which the estimates were systematically related to the recall ratios. On the other
hand, if the estimates correlated only weakly or not at all with the recall ratios, no bias
occurred. However, for the hypothesis that a bias in the estimates is caused by the use
of the recall-estimate strategy, this is only weak evidence. While a missing relation
between recall ratios and estimates indicates that the estimates were definitely not
generated with the recall-estimate strategy, the presence of a correlation does, of
course, not have to indicate that the participants used this strategy. Alternatively, any
unknown third variable could have influenced estimates and recall ratios in the
corresponding experimental conditions in the same direction.
18
Four of these eight combinations result from the lists with high-frequent and unknown, high-frequent
and medium-frequent, and medium-frequent and unknown names in Experiment 1 as well as from the
lists with plants and animals in Experiment 2. Two further combinations consist of the lists with male
and female names in the two interval conditions in Experiment 3. Moreover, the lists with German and
foreign cities in Experiment 4 were analyzed separately for the two task order conditions due to the
extremely different effect in the estimates.
A stronger point in favor of the hypothesis could be made if we could name
reasons why the participants should have used the recall-estimate strategy more in some
experimental conditions than in others. We suggest that persons base
their estimates increasingly on the recall-estimate strategy whenever the recall
performance is perceived as a valid and informative cue for frequencies. As a matter of
principle, this strategy can only be used if persons succeed in recalling relevant
exemplars. Furthermore, the information provided by the recall performance should
appear more trustworthy the larger the sample of retrieved exemplars is (Sedlmeier &
Gigerenzer, 1997; Sedlmeier, 2004). In accordance with this consideration, Brown
(1997) demonstrated that the use of the recall-estimate strategy increases with the
number of recalled exemplars. In a series of three experiments he presented his
participants with word pairs consisting of a category label and an exemplar. Exemplars
appeared on the study lists only once whereas the frequency of the categories varied
between 0 and 16. After the study phase, participants estimated the presentation
frequency of categories and then recalled exemplars. Within and across experiments,
encoding time, category-exemplar relatedness and study phase instructions were
manipulated. This set of manipulations produced large differences in the number of
retrieved exemplars. More importantly, across the altogether nine different
experimental conditions the number of retrieved exemplars covaried nearly perfectly
with an index for the use of the recall-estimate strategy.19
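The response-time index mentioned here (see footnote 19) can be sketched as the slope of a least-squares line relating response times to presentation frequencies. The following minimal example uses hypothetical response times invented for illustration; they are not data from Brown (1997), and the function name is likewise an assumption.

```python
# Hypothetical data for illustration only; not taken from Brown (1997).

def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


presentation_frequencies = [0, 2, 4, 8, 16]

# If exemplars are enumerated, every additional exemplar costs time,
# so response times (in seconds) should rise with presentation frequency.
rt_enumeration = [2.0, 2.6, 3.2, 4.4, 6.8]

# With a memory assessment strategy, response times should be roughly flat.
rt_assessment = [2.1, 2.0, 2.2, 2.1, 2.1]

slope_enumeration = least_squares_slope(presentation_frequencies, rt_enumeration)
slope_assessment = least_squares_slope(presentation_frequencies, rt_assessment)
print(round(slope_enumeration, 3), round(slope_assessment, 3))
```

A clearly positive slope would be read as a sign that exemplars were enumerated, whereas a slope near zero is what a memory assessment strategy leads one to expect.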
However, even if the hypothesis held that the biased frequency judgments in
the present experiments were generated with the recall-estimate strategy, their
occurrence could not be predicted by the number of recalled exemplars
alone. The estimates should also be influenced by the size of the
recall bias. Even a large sample of recalled exemplars gives the participants little
reason to deviate from an estimate based on memory assessment as long as the recall
ratios provide a similar result. Given that the memory assessment strategy enables
approximately correct responses, this means that no bias should occur if the number of
recalled exemplars is small or the recall ratios are only weakly distorted. Conversely,
a large bias in the estimates should only occur if both the number of recalled
exemplars and the distortion in the recall ratios are large. In other words, the bias in the
estimates should depend on the additional informational value of the recall
performances. In order to check this assumption, we re-analyzed the data of the
different experimental conditions. As a measure of the additional informational value of
the recall performances in these conditions we calculated the product of the mean
number of retrieved exemplars and the recall bias. Table 2.11 displays these variables
together with the informational value in the eight experimental conditions. In Figure
2.5 the bias in the estimates is plotted against the informational value. Both the recall
bias and the bias in the estimates were measured as differences. In each experimental
condition we determined the difference between the mean estimates for the same
category on the respective study lists. The recall ratios were treated analogously.20
Positive values indicate a bias in the expected direction.
19
This index was based on response times. Since the recall of every exemplar requires additional time,
the response time should increase with the presentation frequencies insofar as the recall-estimate strategy
is used. On the other hand, there is no reason to assume that the response times should be influenced by
the presentation frequencies if a memory assessment strategy is applied. Thus, the steepness of the
function relating response times to presentation frequencies can be regarded as an index for the use of
the recall-estimate strategy.
Table 2.11
Mean recall bias, mean number of retrieved exemplars and informational value of
recall performance for the different list compositions and conditions in each
experiment.
Experiment / list composition (condition)         Recall     Number of   Informational     N
                                                  bias (%)   retrieved   value of recall
                                                             exemplars   performance
Exp. 1
  high-frequent / unknown names                     69.6        9.1          6.3           32
  high-frequent / medium-frequent names             24.4       11.5          2.8           32
  medium-frequent / unknown names                   71.8        7.5          5.4           32
Exp. 2
  animals / plants                                   6.9       12.5          0.9           35
Exp. 3
  names with varying non-target list frequencies
  (interval non-target – target list: 3h)           30.9        6.2          1.9           41
  names with varying non-target list frequencies
  (interval non-target – target list: 24h)          31.8        5.3          1.7           40
Exp. 4
  German / foreign cities
  (task order: recall-test before)                  40.6       11.8          4.8           41
  German / foreign cities
  (task order: recall-test after)                   45.1       12.0          5.4           40
20
We did not use effect sizes here because these depend on the variances of the recall ratios (or the
estimates) between participants. The variances of the recall ratios, however, cannot influence a
participant’s strategy selection (and his estimates) since they are unknown to him.
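The informational values reported in Table 2.11 can be recomputed with a short sketch. The recall biases and the numbers of retrieved exemplars are copied from the table. That the recall bias enters the product as a proportion (percentage divided by 100) is an assumption inferred from the magnitude of the tabled values; the text only states that the two quantities were multiplied. The function and variable names are hypothetical.

```python
# Rows of Table 2.11: (condition, recall bias in %, mean number of retrieved
# exemplars, reported informational value).
TABLE_2_11 = [
    ("Exp. 1: high-frequent / unknown names", 69.6, 9.1, 6.3),
    ("Exp. 1: high-frequent / medium-frequent names", 24.4, 11.5, 2.8),
    ("Exp. 1: medium-frequent / unknown names", 71.8, 7.5, 5.4),
    ("Exp. 2: animals / plants", 6.9, 12.5, 0.9),
    ("Exp. 3: non-target list, interval 3h", 30.9, 6.2, 1.9),
    ("Exp. 3: non-target list, interval 24h", 31.8, 5.3, 1.7),
    ("Exp. 4: cities, recall test before", 40.6, 11.8, 4.8),
    ("Exp. 4: cities, recall test after", 45.1, 12.0, 5.4),
]


def informational_value(recall_bias_percent, retrieved_exemplars):
    """Product of recall bias and number of retrieved exemplars.

    Assumption: the bias is converted from a percentage to a proportion.
    """
    return recall_bias_percent / 100 * retrieved_exemplars


for label, bias, n_retrieved, reported in TABLE_2_11:
    computed = round(informational_value(bias, n_retrieved), 1)
    print(f"{label}: computed {computed}, reported {reported}")
```

With this scaling, the rounded products agree with the tabled informational values in all eight conditions.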
As illustrated in Figure 2.5, there is a strong tendency for the bias in the
estimates to increase with the product of the recall bias and the number of retrieved
exemplars. Thus, the informational value of the recall performances is indeed
predictive of the estimation bias. Particularly instructive is a comparison of the results
of Experiments 1 and 3, in which names were used as stimuli. In Experiment 3, in
which the frequency of unknown names was manipulated by means of a non-target list,
the recall bias was slightly larger than for the lists with pre-experimentally
high-frequent and medium-frequent names in Experiment 1. Nevertheless, we observed a
clearly smaller estimation bias in Experiment 3. The present analysis suggests that this
can be ascribed to the fact that markedly fewer exemplars were recalled in Experiment 3.
Consequently, the information from the recall performances possibly appeared less
reliable and was used less for the generation of the estimates.
[Figure 2.5: scatter plot. x-axis: Informational Value of Recall Performance (0 to 7);
y-axis: Estimation Bias (%) (-5 to 20). Labeled data points: Exp. 1: hf/uk, Exp. 1: mf/uk,
Exp. 1: hf/mf, Exp. 2, Exp. 3: 3h, Exp. 3: 24h, Exp. 4: Rec. before, Exp. 4: Rec. after.]
Figure 2.5: Estimation bias plotted against informational value of the recall
performance for the different list compositions and conditions in each
experiment. See text for details. List compositions and conditions can be
taken from Tab. 2.11.
As could be expected, within Experiment 1 fewer exemplars were recalled on
the lists that contained unknown names than on the lists with high-frequent and
medium-frequent names. At the same time, however, the recall bias was
disproportionately larger on the lists with unknown names. Thus, the recall performances
attained the highest informational value on the list with high-frequent and unknown
names. In fact, the bias in the estimates also increased from the list with
high-frequent and medium-frequent names up to the list with high-frequent and unknown
names. This suggests that what is special about names with pre-experimentally
varying frequencies (and varying fame) might be that they lead to a large sample of
recalled exemplars and a strong recall bias at the same time.
Within our studies, only in Experiment 4 did the recall performances attain an
informational value similar to that in Experiment 1. In this experiment city names with
pre-experimentally varying frequencies were used as stimulus material. The results from
the condition in which the recall test was completed before the frequency judgment
tasks fit well into the general picture. However, the results from the condition in which
the frequency judgments were given first represent an outlier in Figure 2.5. Despite a
high informational value of the recall performances, no bias occurred here in the
estimates. Remember that Experiment 4 was one of the two experiments in which the
participants completed a categorization task during the presentation of the study lists.
As we already emphasized before, there is evidence that a categorization task impedes
the use of a recall-estimate strategy (Bruce, Hockley, & Craik, 1991). This finding
makes sense also from the psychological point of view. Generally, this strategy can
only exert an influence on the estimates if the participants assume a priori that the
recall performances could provide necessary and meaningful information about the
frequencies and thus at least try to retrieve exemplars. Exactly this may, however, be
prevented by the categorization task. This task provides the participants with a second
possible basis for their estimates which is not available without this task: the frequency
with which they allocated the category labels during the presentation of the study list.
This second basis may be sufficient to generate subjectively credible estimates. An
additional attempt to retrieve exemplars would thus be superfluous (Bruce, Hockley, &
Craik, 1991). On the other hand, if the participants learned “by force” from a
preceding recall test that they could retrieve a large sample of exemplars in which the
categories occurred with very different frequencies, this information can apparently not
be ignored.
It should be noted that the estimates in Experiment 2 in which likewise a
categorization task was completed were unbiased even if they were given after the
recall test. However, this corresponds to our assumption that the occurrence of a bias
should depend on the informational value of the recall performances. This value was
particularly low here.
The preceding considerations suggest that the experimental conditions combining
a categorization task and no preceding recall test should be excluded from our analysis
of the relation between the informational value of the recall performances and the
estimation bias. After this exclusion, the correlation between these variables across the
remaining seven experimental conditions amounts to r_alerting = .94 (Rosenthal et al.,
2000).
Let us summarize: Neither the memory models nor the recall-estimate strategy
could account for the results of the present study. We suggest that the following set of
assumptions allows a satisfactory explanation of the result pattern within the
framework of a multiple strategy perspective: Our participants generally had access to
a memory response that indicated the relative frequencies of the categories in a more
or less unbiased manner. An estimation bias occurred whenever the participants
nevertheless used the recall-estimate strategy. Since the estimates were consistently
closer to the actual frequencies than the recall ratios, we have to assume that the
participants never exclusively relied on the recall-estimate strategy. Thus, the memory
response was at least always used to correct the information from the recall ratios. The
extent to which the recall-estimate strategy exerted an influence on the estimates
depended on the perceived additional informational value of the recall performances.
This informational value results from the number of recalled exemplars and the size of
the recall bias.21 Finally, we assume that an overt categorization of stimuli prevented
any attempt of the participants to gain information from the retrieval of exemplars.
That is, without a preceding recall test, the recall-estimate strategy was in this case not
used.
Conclusion
MINERVA-DM is fairly explicitly formulated with the claim to be able to
explain all kinds of frequency judgments on serially encoded events (Dougherty,
Gettys, & Ogden, 1999). Sedlmeier (1999, 2002) acknowledges, on the other hand,
that mechanisms other than memory assessment can exert an influence on
frequency judgments. However, he apparently only expects deviations from the
predictions of the PASS model if such mechanisms are triggered by additional
information at the moment the judgments are given (a preceding recall test may be an
example of a situation in which such information is given). Indeed, this memory
model and other memory models have been highly successful in predicting various
phenomena mainly observed in frequency of occurrence judgments. Thus, besides
accurate frequency judgments, these models can offer explanations for regressed
judgments (Fiedler, 1996; Sedlmeier, 1999), recency effects (Lopez et al., 1998;
Sedlmeier, 1999), list-length and spacing effects (Hintzman, 1988). The influence of
varying attention (Sedlmeier & Renkewitz, 2004) and varying depth-of-encoding
(Dougherty et al., 1999) on estimates could be simulated with these models as well. In
chapter 3 we will demonstrate furthermore that PASS correctly predicts effects of
associations between jointly presented stimuli on frequency judgments. On the other
hand, evidence for the use of a recall-estimate strategy in case of frequency of
occurrence judgments is indeed generally scarce if not absent (Brown, 2002; Marx,
1985).
In spite of these remarkable successes of the memory models, the results of the
present research suggest, in accordance with the multiple strategy perspective (Brown,
2002), that the situation is different in the case of category size judgments. The
famous-names effect could not be attributed to different pre-experimental frequencies of the
stimuli. As a result, the memory models could offer no convincing explanation for this
effect and the result pattern in our experiments. In contrast, the result pattern
becomes understandable if one assumes that the recall-estimate strategy plays a part in
the generation of category size judgments at least under particular circumstances. If
one accepts that people use several strategies for the generation of frequency
judgments, the question arises which variables determine the selection of a strategy.
The present research indicates that properties of the stimulus material and all other
variables influencing the number of retrievable exemplars are plausible candidates.
Furthermore, there may be factors that generally prevent the use of a recall-estimate
strategy since they suggest a priori that the retrieval of exemplars is not necessary to
obtain subjectively acceptable estimates. The overt categorization of the stimulus
material could be such a factor. These considerations are in accordance with factors
proposed and identified within the multiple strategy perspective (see Brown, 2002, for
an overview). The challenge of future research on category size judgments will be to
integrate these factors into a comprehensive model of strategy selection.
21
It should be pointed out once more that a large unbiased sample of recalled exemplars of course
also represents a trustworthy source of information. However, the recall ratios would in this case only
confirm the memory responses and could thus exert no modifying influence on the estimates.
CHAPTER 3
Effects of Associations on Frequency Judgments:
Simulations with PASS and Empirical Results
Summary
PASS (Sedlmeier, 1999), a computational memory model of frequency judgment,
considers sensitivity to the frequencies of events as a by-product of associative
learning. In this chapter predictions of the model about the influence of associations
between jointly presented items on frequency estimates are derived and tested. In two
experiments sets of words that were either strongly or weakly associated with each
other were presented. Within these sets the presentation frequency of the words was
varied. Predictions of the PASS model for the frequency estimates about this stimulus
material were obtained by means of simulations. In a first step PASS encoded a very
large text corpus and acquired associations between the stimulus words. In a second
step the model encoded the experimental study lists. PASS expected higher frequency
judgments for associated than for non-associated words. The size of this effect was
predicted to be independent of the actual presentation frequencies of the words.
Finally, the model expected that the frequencies of associated words are discriminated
slightly worse than the frequencies of non-associated words. These predictions were
largely confirmed in both experiments, if the participants did not complete a recall test
before the frequency judgment task. The results thus strengthen the assumption that the
encoding and representation of frequency information are based on associative
learning. However, the results also point to conditions in which the ability of PASS to
predict frequency judgments is nonetheless restricted. Implications for PASS and other
memory models of frequency judgment are discussed.
Introduction
People show considerable sensitivity to the frequency of repeatedly occurring
events. Over the last four decades, a vast number of studies have accumulated that compared
participants’ judgments about the frequencies of serially encoded events to the
objective frequencies of these events. The central finding of this research is that people
arrive at quite accurate judgments under a wide variety of conditions. This holds for
frequencies of events from everyday life - such as letters, pairs of letters, syllables,
surnames or professions (Attneave, 1953; Shapiro, 1969; Tryk, 1968; Zechmeister et
al., 1975) - as well as for experimentally varied frequencies of repeated items in
laboratory settings (e.g. Greene, 1986; Jonides & Naveh-Benjamin, 1987; Hockley &
Cristi, 1996).
What forms the basis of this pervasive ability to judge the frequency of
occurrence of items? The PASS (Probability Associator) model (Sedlmeier, 1999;
2002), which aims at explaining judgments of frequencies and probabilities, suggests
that sensitivity to frequencies is a by-product of associative learning. Accordingly, the
basic assumption of PASS is that frequency knowledge is represented in associations
between objects or the features that constitute those objects. These associations are
assumed to be acquired in the course of all encodings of objects or events that are
performed with a minimum of attention. Several simulations with PASS demonstrate
that the model can account for the sensitivity to frequencies observed in numerous
empirical studies (Sedlmeier, 1999).
However, any valid model of frequency judgments does not only have to
account for the usual finding of estimates that roughly conform to actual frequencies
but also for a number of experimental results demonstrating that estimates can be
systematically influenced by variables different from objective frequencies. Several
factors working at the stage of encoding, representing or retrieving frequency
information, are known to have a reliable impact on frequency judgments. One of
these factors is associative relatedness between items. Leicht (1968) was the first to
demonstrate that a target item obtains greater frequency judgments the more items
associated to it have been presented in the same list. In his study list target items were
presented zero, one, three or five times. Additionally, the list contained zero, one, three
or five (different or repeated) associates of each target item. Partly, associates were
words eliciting the target item as response in a word-association task (e.g. “eating”
elicits the target item “food”). Further pairs of target items and associates consisted of
category names and corresponding instances (e.g. “tree” and “oak”) or antonyms (e.g.
“buy” and “sell”). Leicht found that frequency judgments about target items increased
with the number of presented associates. This held for all presentation frequencies of
the target items, although the effect was rather small. Averaged over presentation
frequencies, target items with zero associates on the study list received a mean
frequency judgment of 2.74 while target items with five associates received a mean
judgment of 2.92. Two other studies replicated this effect of associative
relatedness between items on frequency estimates. Shaughnessy and Underwood
(1973) used instances of the same category as targets and associates in two
experiments. In both experiments targets that were presented along with several
associates obtained higher estimates than targets that had no items conceptually related
to them on the study list. In a study by Vereb and Voss (1974) frequency judgments
again increased with the number of presented words that were associates of the items
in question.
The main objective of the present article is to test if PASS can account for the
effects of associations between jointly presented items on frequency estimates. For this
purpose we report two new experiments and compare their results to predictions
derived from simulations with PASS. The experiments differ in one important aspect
from the studies reported above. All studies cited have in common that they
exclusively analyze whether associations exert an influence on the absolute magnitude
of frequency judgments. However, generally frequency judgments may not only differ
in their absolute magnitude but also with regard to the validity with which different
frequencies are discriminated (Flexser & Bower, 1975; Naveh-Benjamin & Jonides,
1986). Thus, a set of frequency judgments on associated stimuli presented with various
frequencies can turn out higher or lower than a set of judgments on non-associated
stimuli with the same presentation frequencies. Irrespective of this, these judgments
could furthermore reflect a better or worse ability to discriminate the actual
frequencies of associated or non-associated stimuli. An assessment of whether
associations do have an effect on frequency judgments also in this respect requires that
the presentation frequency within a set of associated stimuli is varied systematically. A
set of non-associated stimuli with the same presentation frequencies has to serve as
control group. These requirements were not fulfilled in the former studies. The study
lists there always contained a number of target words each of which was accompanied
by several associates. The (associative) relations between different sets of target words
and their associates remained undetermined or were minimized. Furthermore, within a
set of a target word and its associates the frequency of the stimuli was not varied
systematically.
In the following experiments the associative strength between all items on a
study list was controlled and manipulated. Furthermore, the presentation frequency of
items was varied within sets of high and low associative strength. As a result,
measures of the discriminability of frequencies of associated and non-associated
stimuli could be collected in addition to the magnitude of the frequency judgments.
This allows us to extend the comparison between the predictions of the PASS-model
and the experimental results to several characteristics of frequency judgments. Before
we take a closer look at the model, the simulations and the predictions derived from
these simulations, we have to clarify what kinds of frequency judgments lie in the
explanatory scope of PASS.
What kind of frequency judgments can be modeled with PASS?
The first restriction of PASS is that it exclusively deals with frequency or
probability judgments pertaining to directly experienced and serially encoded events.
Recent research accumulated evidence that persons use various strategies to generate
such judgments (e.g. Manis et al., 1993; Brown, 1997). Brown (1995) was the first to
suggest a taxonomy of these strategies (for a more detailed classification of strategies
see Brown, 2002). Besides the direct retrieval of previously stored facts about
frequencies he distinguishes between enumeration and memory assessment. If
enumeration is used, single relevant instances will be retrieved from memory and
counted. The result of the counting is either given directly as frequency judgment or
the frequency judgment is extrapolated from the result. As far as memory assessment
is concerned, intuitions of relative frequencies of events serve as a basis for judgments.
These intuitions are in turn based on the evaluation of some aspect of memory
performance. Therefore, in contrast to enumeration, memory assessment first of all
provides a non-numerical impression about event frequencies which thus has to be
translated into a numerical response in a further step.
The second restriction of PASS is that it exclusively aims at modeling the
memory assessment process. There is a large degree of agreement on the point that
every encoding of objects and events occurring with a minimum of attention affects
memory representation in a way that can be called on for the generation of judgments
on the frequency of these objects and events (Reichardt et al., 1973; Dougherty &
Franco-Watkins, 2002; Hasher & Zacks, 1984; Zacks & Hasher, 2002; Sedlmeier,
2002). The PASS model holds specific assumptions on how the representation of an
event changes with every additional encoding of this event and how this change can be
used to give frequency judgments.
As already mentioned, the main objective of this article is to test whether these
assumptions are appropriate to explain possible effects of associations on frequency
judgments. Such a test is obviously only reasonable if it can be demonstrated first that
these effects occur when memory assessment is used. For this reason, in addition to
frequency judgments in our experiments the reaction times required by the participants
to decide in favor of a judgment were also collected. Decision times provide a
process-sensitive measure which allows differentiating between judgments based on
enumeration and judgments based on memory assessment (Brown, 2002). [Footnote 22: The
possibility that participants use a direct retrieval strategy to generate their judgments
can be neglected. The use of a direct retrieval strategy requires that facts about the
relevant frequencies were memorized in advance. In an experimental context such knowledge
about facts could only be acquired if repetitions of items were counted explicitly during
the study phase. However, experimental event frequencies are only tallied by participants
that were previously instructed to do so (Brown, 2002).] If
enumeration is used, the required decision times sharply increase with presentation
frequencies (Brown, 1995; 1997). The reason for this is that enumeration is a serial
process. The judgments are here based on the number of retrieved instances. The
retrieval of additional instances requires additional time (Bousefield & Sedgewick,
1944; Indow & Togano, 1970). In contrast to enumeration, memory assessment is
described as a non-serial process (Hintzman, 1988; Dougherty, Gettys, & Ogden,
1999; Sedlmeier, 1999). Thus, there is no reason to suppose that the speed of this
process is affected by presentation frequencies (Brown, 1995). Accordingly, in
judgments based on memory assessment only a notably smaller increase of decision
times can be found with increasing presentation frequency. Numerous studies,
however, document that the function relating decision times to frequencies even in
such judgments is not entirely flat (Brown, 1995; 1997; Dougherty & Franco-Watkins,
2002; Schimmack, 2002). In an extensive series of studies in which list frequencies of
words had to be estimated Brown found slight differences in the decision times
between various presentation frequencies for judgments based on memory assessment.
This can possibly be ascribed to the fact that the transformation of the intuitions of
event frequencies into numerical responses is more difficult with higher numbers than
with smaller ones (Brown, Buchanan & Cabeza, 2000; Moyer & Dumais, 1978). In
Brown’s studies (1995; 1997) the increase within the frequency range of 2 to 8 –
which is also considered in our studies reported below – constantly amounted to about
0.7 seconds. In judgments based on enumeration, on the other hand, an increase of
approximately 4 seconds could be found in the same frequency range.
There is good reason to assume that our participants should use memory
assessment to generate their judgments and thus produce the relatively flat decision
time function tied to this estimation strategy. Enumeration requires that relevant single
instances can be retrieved. The more distinct the event instances are, the easier
they are to retrieve (Brown, 2002). Accordingly, studies on judgments
about behavioral frequencies found that the use of enumeration becomes more
common when event instances are judged to be distinctive (Menon, 1993; Conrad et
al., 1998). In experimental settings evidence for the use of this strategy almost
exclusively stems from studies, in which participants had to judge category sizes (e.g.
Tversky & Kahneman, 1973; Lewandowsky & Smith, 1983; Barsalou & Ross, 1986;
Begg et al., 1986; Williams & Durso, 1986; Greene, 1989; Brown, 1995; 1997). In
such studies the requested frequency estimate results from the number of presented
diverse exemplars of the category in question (e.g. "How many different kinds of trees
did appear on the list you just studied?"). In contrast to this, in our experiments
participants had to judge the frequency of occurrence of words on the study list. Here
the frequency judgment refers to the number of repetitions of identical events. The
retrieval of single instances should thus be inhibited. Accordingly, previous studies in
which participants had to judge the frequency of occurrence of items consistently
found that enumeration was rare (Marx, 1985) or even absent (Brown, 1995; see also
Brown, 1997; Manis et al., 1993).
The PASS-Model
How can one imagine the memory assessment process exactly? How does the
representation of an event change with every additional encoding of this event? And
how can this change be used to give appropriate judgments on the frequency of
occurrence of the event?
The PASS-model (Sedlmeier, 1999; 2002) assumes that knowledge of
frequencies is represented in learned associations. Associative learning is based on the
frequency with which events co-occur. The greater the frequency of co-occurrence the
greater is the association between two events (Slamecka, 1985). Therefore, it seems
straightforward to use associative strengths, the result of this learning process, as a
basis for frequency judgments. Within PASS, this learning process is modeled by FEN
(Frequency Encoding Network), a neural network that encodes events and acquires
associations while doing this. The strength of these acquired associations determines
the extent of activation in the neural network triggered by an event after its repeated
presentation. The PASS-model postulates that it is this activation that provides an
intuition of relative frequencies of events in the memory assessment process.
There are a number of associative models that aim to explain how contingency
relations between events are acquired (e.g., Shanks, 1995; Wasserman & Miller,
1997). Contingency relations between events allow the prediction of one or several
events from the presence of another set of events. Thus, for example, a set of
symptoms may enable the diagnosis of a certain disease. The corresponding
associative models mainly use the occurrence or non-occurrence of the predicted event
as a corrective feedback for the modification of associations between the events
involved. However, contingency relations between events are in most cases only of
secondary importance for judgments of relative and absolute frequencies. If you are,
for instance, asked about the relative frequency of red cars, a possible co-occurrence of
other objects will be without any relevance. If in typical experiments on frequency
estimates a judgment about the absolute frequency with which a word was included in
a study list is requested, there will be no contingency to other events which could be
useful to answer the question. In other words, frequency judgments often merely refer
to the repeated occurrence of a single event. What kinds of associations are learned in
that case? In PASS it is assumed that events and objects are represented by their
features or values on attributes. A given collection of features and the associations
between these features determine which object or event is experienced. Learning
consists of changing the strength of associations between features whenever an event
or object is encoded. Associations between co-occurring objects can be acquired by
treating the objects in question as a single event. Accordingly, the associative strengths
between the features constituting the objects are modified together.
FEN – Architecture and Learning Rule
What does the applied neural network exactly look like? There is no single
model behind FEN but a family of models. Sedlmeier and Köhlers (unpublished data)
used a number of different neural networks with varied architectures and associative
learning rules in their simulations of frequency judgments. In these simulations,
feed-forward models using competitive learning (Grossberg, 1976; Rumelhart & Zipser,
1985), feed-forward and recurrent architectures working with Hebbian
learning, and several other models led to very similar results. The principal
commonality of all models in the FEN family is that they have to operate without
corrective feedback. Below, only the FEN variant used for the following simulations
(Sedlmeier, 1999; 2002) is explained in more detail.
Figure 3.1 shows the architecture of this variant. FEN here only consists of one
layer of units. Every unit is connected with every other unit and with itself. A
particular unit is activated each time the suitable feature is available in a stimulus.
Therefore, a given input for FEN consists of a vector of 1s and 0s, with 1s coding the
presence and 0s coding the absence of features in a stimulus. Learning takes place by
changing the associative strengths or weights that connect two units. The learning rule
controlling this modification of weights is a variant of the Hebbian learning rule and
was adapted from the stimulus sampling theory (Bower, 1994; Estes, 1950). It is:
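\[
\Delta w_{ij} =
\begin{cases}
\theta_1 (1 - w_{ij}), & \text{if both units } i \text{ and } j \text{ are active} \\
-\theta_2 \, w_{ij}, & \text{if either unit } i \text{ or } j \text{ is active} \\
-\theta_3 \, w_{ij}, & \text{if neither unit } i \text{ nor unit } j \text{ is active}
\end{cases}
\tag{1}
\]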
Here wij corresponds to the associative strength between the units i and j, ∆wij is the
change in the associative strength after a learning trial, and θ1, θ2 and θ3 represent
the learning, interference and decay rates, respectively. The weights can vary
between 0 and 1 according to this rule. If two units are active in an input vector, the
weight between them will, according to the learning parameter θ1, be increased by a
constant proportion of the difference between the current weight and the maximum
associative strength. An interference between two units occurs if one unit is active and
the other one not. In this case, the weight between the units in question is decreased by
a proportion of the current associative strength specified in θ2. The weight between
two units decays if both units are inactive. The strength of the decay is given by the
parameter θ3.
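[Figure 3.1: diagram of a single layer of four interconnected units, I1 to I4, with weights such as w12, w14 and w23 labeled on the connections; the input pattern S1 = (1, 0, 0, 1) is applied to the units.]
Figure 3.1. The architecture of a variant of the FEN module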
The situation represented in Figure 3.1 may serve as an example. Here, units
I1 and I4 are activated by the input pattern S1, whereas units I2 and I3 are not. Let us
assume that the learning, interference and decay parameters are θ1 = 0.2, θ2 = 0.05 and
θ3 = 0.02. If the weight w14 that connects node I1 with node I4 is at 0.3 prior to the
encoding of the input pattern S1, the increase of the weight is ∆w14 = 0.2(1
– 0.3) = 0.14. As a result, the updated weight amounts to w14(new) = w14(old) + ∆w14 = 0.3
+ 0.14 = 0.44 after the encoding. The effect of interference can be illustrated by weight
w12. If this weight has a starting value of 0.5, it will be at w12(new) = 0.5 – 0.05 * 0.5 =
0.475 after the learning trial. Finally, in the given input pattern the weight w23 will
decay. If w23 = 0.1, then w23(new) = 0.1 – 0.02 * 0.1 = 0.098.
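The worked example can be retraced in a few lines of code. The following is a minimal sketch under stated assumptions, not the original FEN implementation: the function name fen_update, the use of NumPy, the symmetric weight matrix and the zero starting values for all weights not mentioned in the text are introduced here purely for illustration.

```python
import numpy as np

def fen_update(weights, pattern, theta1=0.2, theta2=0.05, theta3=0.02):
    """Apply one learning trial of Equation 1 to a square weight matrix.

    Hypothetical helper: name and signature are illustrative assumptions.
    """
    x = np.asarray(pattern, dtype=bool)
    both = np.outer(x, x)          # units i and j are both active
    neither = np.outer(~x, ~x)     # units i and j are both inactive
    one = ~(both | neither)        # exactly one of the two units is active
    delta = np.where(both, theta1 * (1.0 - weights),
                     np.where(one, -theta2 * weights, -theta3 * weights))
    return weights + delta

# Four units; only the three weights from the worked example are set,
# all other starting values are an arbitrary assumption (zero).
w = np.zeros((4, 4))
w[0, 3] = w[3, 0] = 0.3   # w14
w[0, 1] = w[1, 0] = 0.5   # w12
w[1, 2] = w[2, 1] = 0.1   # w23

w_new = fen_update(w, [1, 0, 0, 1])   # input pattern S1
print(round(w_new[0, 3], 3), round(w_new[0, 1], 3), round(w_new[1, 2], 3))
```

Under these assumptions the three printed weights should match the values derived by hand above for w14, w12 and w23.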
The Generation of Frequency Judgments
After any number of stimulus presentations and the corresponding successive
weight modifications in FEN, PASS can render frequency judgments. For this, first the
activation is determined which is elicited in all units of the neural network by the
stimulus the frequency of which has to be estimated. The activation of a particular unit
results from the scalar product of the input vector and the vector of weights that
connect the unit with itself and the remaining units in the network. Let us assume that
in Figure 3.1 the weights that establish the connection between all nodes of the
network and node I1 amount to w11 = 0.6, w21 = 0.475, w31 = 0.35 and w41 = 0.44 at the
time of probing. Then the present input pattern S1 triggers an activation at node I1
which is a1 = 1 * 0.6 + 0 * 0.475 + 0 * 0.35 + 1 * 0.44 = 1.04. The activation is
calculated for all units of the network correspondingly. The sum of these activations is
FEN’s response to a stimulus. In the PASS-model this response is regarded as the basis
of intuitions of frequencies of occurrences.
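The computation of activations and of FEN's response can be sketched as follows. This is an illustrative sketch only: the function name fen_response and the convention that weights[j, i] holds the weight connecting unit j to unit i are assumptions, and all weights other than those leading to unit I1 are set to zero because the text does not specify them.

```python
import numpy as np

def fen_response(weights, pattern):
    """Return the unit activations and their sum for an input pattern.

    Hypothetical helper; weights[j, i] is assumed to connect unit j to unit i.
    """
    x = np.asarray(pattern, dtype=float)
    activations = x @ weights          # a_i = sum over j of x_j * w_ji
    return activations, activations.sum()

# Only the weights leading to unit I1 are given in the text; the other
# columns are left at zero as a placeholder assumption.
w = np.zeros((4, 4))
w[:, 0] = [0.6, 0.475, 0.35, 0.44]     # w11, w21, w31, w41

a, response = fen_response(w, [1, 0, 0, 1])   # input pattern S1
print(round(a[0], 2))
```

With a complete weight matrix, the second return value would be the summed activation that the model treats as FEN's response to the stimulus.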
FEN’s response cannot yet be used as an estimate because it is not scaled in the same
way as frequencies are. For that reason, the PASS-model contains a CA-module
(cognitive algorithms module) beside FEN which interprets the output of FEN with the
help of several rules and turns it into frequency judgments (for a more detailed
description of the CA-module, see Sedlmeier, 2002). The rule used to generate
judgments about relative frequencies is called the relative-frequency algorithm. This
algorithm takes advantage of the fact that all kinds of information required for a
relative frequency judgment can be acquired from FEN. Let us assume PASS was to
give a judgment on the proportion of red cars among all cars encoded during a certain
period of time. According to the relative frequency algorithm, first of all, PASS would
determine FEN’s response to the input pattern representing red cars. In a second step,
the responses to cars of all colors that occurred would be determined and added.
Finally, the response to red cars would be divided by the total sum determined in the
second step. This division results in a quotient that would be used by PASS as estimate
for the requested relative frequency.
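The three steps of the relative-frequency algorithm can be sketched as a short function. The function name relative_frequency and the response values for the car colors are hypothetical and serve only to illustrate the division; they are not taken from any simulation.

```python
def relative_frequency(responses, target):
    """Divide the response to the target by the summed responses to all
    members of the reference class (the target included).

    Hypothetical helper illustrating the relative-frequency algorithm.
    """
    total = sum(responses.values())
    if total == 0:
        raise ValueError("no response to any member of the reference class")
    return responses[target] / total

# Made-up FEN responses to cars of each encoded color (illustration only)
car_responses = {"red": 1.2, "blue": 1.8, "white": 3.0}
estimate = relative_frequency(car_responses, "red")
print(estimate)
```

Because every quantity in the quotient is a FEN response, no information from outside the network is needed for this kind of judgment.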
PASS cannot give absolute frequency judgments by solely falling back upon
knowledge represented in FEN. The response triggered by a stimulus in FEN does not
allow any immediate conclusions about its absolute presentation frequency.
Consequently, the transformation of this response or the corresponding relative
frequency judgment into an absolute judgment can only take place on the basis of
information that is delivered to PASS from the outside. However, it is assumed that
absolute frequency judgments on different stimuli are scaled in the same manner by
PASS as long as the stimuli have been presented in the same (e.g. an experimental)
context and the judgments are given under identical conditions. As a result, in order to
be able to generate absolute judgments on a given set of stimuli, an assumption of one
single correspondence between a relative frequency and one absolute frequency is
sufficient for PASS. If, for example, an assumption of or knowledge about the total
number of stimulus presentations is available to PASS, absolute judgments can be
determined by multiplying this total number with the relative frequency judgment on
the stimulus in question (Sedlmeier, 2002). If the total number is 40 and the relative
frequency judgment of PASS 0.3, then this amounts to, for example, an absolute
judgment of 40 * 0.3 = 12. Alternatively and with the same result, absolute judgments
for a set of stimuli can also be generated directly from responses. For instance, if it is
known to PASS that a stimulus triggering a response of 0.8 was presented 12 times,
absolute frequency judgments for all remaining stimuli can be obtained by multiplying
their responses with the factor 15. In experimental settings, the value of one absolute
frequency required by PASS is often given directly or indirectly. Thus, participants are
informed about the maximum presentation frequency of a single stimulus in many
cases or they are presented with a response scale (e.g. Johnson et al., 1977; Hockley &
Cristi, 1996; Naveh-Benjamin & Jonides, 1986).
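The two routes from FEN's output to an absolute judgment can be written out as a brief sketch. The function names are hypothetical, and the response value of 0.4 in the second call is an invented example; the remaining numbers are those used in the text.

```python
def absolute_from_total(relative_judgment, total_presentations):
    """Scale a relative judgment by the assumed total number of presentations."""
    return total_presentations * relative_judgment

def absolute_from_anchor(response, anchor_response, anchor_frequency):
    """Scale a response using one stimulus whose absolute frequency is known."""
    scale = anchor_frequency / anchor_response   # 12 / 0.8 in the text
    return scale * response

# Route 1: total of 40 presentations, relative judgment of 0.3
judgment_a = absolute_from_total(0.3, 40)

# Route 2: a response of 0.8 is known to correspond to 12 presentations;
# the response of 0.4 for another stimulus is a made-up value.
judgment_b = absolute_from_anchor(0.4, 0.8, 12)
print(judgment_a, judgment_b)
```

Both routes rely on a single externally supplied correspondence, which is why the scaling leaves the rank order of the judgments untouched.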
There is evidence that also people's judgments on absolute frequencies based
on memory assessment are generated in a two-stage process. Memory assessment first
of all results in a non-numerical intuition of relative event frequencies (Brown, 1995;
1997; Hintzman, 1988; Dougherty et al., 1999). Therefore, the non-numerical intuition
has to be transformed into a numerical answer in a second step. This transformation is
assumed to be independent of the assessment process (Brown & Siegler, 1993).
Evidence for this assumption mainly ensues from studies that gave different response
scales or that manipulated information about the maximum presentation frequency of a
stimulus. These manipulations can strongly influence the absolute magnitude of
frequency judgments without impairing the discrimination of different frequencies
(Brown, 1995; Schimmack, 2002).
Former Simulations with PASS
PASS can simulate a number of findings on frequency judgments based on
memory assessment. As mentioned earlier, judgments of the model show that PASS is
sensitive to frequencies just as experimental participants are. However, empirical tests
on relative frequency judgments have often found that this sensitivity is subject to a
restriction: small relative frequencies tend to be overestimated and large ones
underestimated (e.g. Erlick, 1964; Sedlmeier et al., 1998; Varey et al., 1990). PASS
also produces this “regression effect” (Edgington, 1965). Within the framework of the
model, the extent of the regression depends on two factors: the regression is the more
pronounced the more the encoded stimuli overlap, i.e. the more features they share.
Furthermore, the regression increases the further the interference parameter θ2 lies
below the learning parameter θ1 (Sedlmeier, 1999).
A further finding successfully simulated by PASS involves the chronological
sequence of stimulus presentations. If the occurrence rate of a stimulus changes in the
course of a presentation sequence, the occurrence rate at the end of the sequence exerts
an increased influence on the corresponding frequency judgments. Thus, the total
relative frequency of a stimulus occurring towards the end of a presentation sequence
with an increased rate is overestimated and the total frequency of a stimulus occurring
more seldom towards the end is underestimated. PASS’ frequency judgments show
this recency effect (Sedlmeier, 1999).
A lot of studies demonstrate that frequency judgments based on memory
assessment are influenced by factors impinging upon the encoding of stimuli. In this
way, frequency judgments are influenced by the salience (Sedlmeier & Renkewitz,
2004) as well as by the processing depth of stimuli (Fisk & Schneider, 1984; Greene,
1984; Jonides & Naveh-Benjamin, 1987; Maki & Ostby, 1987; Naveh-Benjamin &
Jonides, 1986; Rose & Rowe, 1976; Rowe, 1974). In addition to this, a generation
effect appears in frequency judgments; i.e. judgments on self-generated stimuli differ
from judgments on stimuli that have been merely copied (Greene, 1988). The typical
pattern of results in all these cases is that stimuli that are encoded better receive higher
judgments than stimuli encoded less well. Moreover, some studies show that a better
encoding also implies a better discrimination between frequencies (e.g. Greene, 1984;
1988; Naveh-Benjamin & Jonides, 1986; Rowe, 1974). These findings can be
simulated by PASS, too. An improved encoding is modeled by an increase of the
learning parameter θ1. With this increase, PASS renders higher frequency
judgments and discriminates better between the relevant stimuli (Sedlmeier &
Renkewitz, 2004).
Overview of the simulations and experiments
The effect of associations between jointly presented items on frequency
judgments has so far not been examined with PASS. Therefore, the simulations
reported below serve first to clarify which effects of associations are
predicted by the model. Apart from the influence of associations on the magnitude of
judgments, it is also to be examined in this process whether PASS expects differences
in the discriminability of frequencies of associated and non-associated stimuli.
Afterwards, the predictions of the model will be compared with data from two
experiments. In both experiments the participants were presented with sets of
associated and non-associated words. The presentation frequencies for associated and
non-associated items were 0, 2, 5 and 8. Besides frequency judgments on this stimulus
material, decision times were collected in both experiments. These data served to
identify the strategy used for the generation of frequency judgments. We expected that
the participants would rely on a memory assessment strategy. As a consequence, with
an increased presentation frequency the relatively slight rise in the decision times
involved with this strategy should be found.
In the simulations with which the predictions of the PASS model were obtained
the same stimulus material as in the experiments was used. That is, PASS encoded the
same study lists with the same words and presentation frequencies as the participants.
For PASS to be able to make predictions about the effect of the associative relations in
this stimulus material, it is necessary that it "knows of" these associative relations.
Similar to a participant who already pre-experimentally acquired associations between
the experimentally presented items, PASS requires a pre-knowledge about the
associations between words. This knowledge should not be conveyed to the model
from the outside but should be learned "autonomously" by PASS. Thus, the simulation
of frequency judgments on associated stimulus material required a two-step procedure.
In a first pre-experimental step, PASS encoded a large text corpus. From this corpus
PASS acquired associative pre-knowledge. (The detailed process of the acquisition of
associations between words from this corpus is described below). In a second step, the
processes during the experimental presentation of the study lists were simulated on this
basis.
Besides the association and the frequency of the presented words, one variable
was manipulated in the experiments that was not part of the PASS simulations: One
half of the participants completed a recall test before they gave their frequency
judgments. The other half gave their judgments immediately after the presentation of
the study list. The recall test was mainly introduced to examine an effect observed by
Erickson & Gaffney (1985). In their study the participants gave judgments referring to
the frequency of occurrence of words directly after the study phase or after an
intermediate recall test. The recall test influenced the subsequent judgments and
interacted with the actual presentation frequencies. Judgments in the condition with
recall test turned out smaller than judgments in the condition without recall test. The
difference between the conditions increased with rising presentation frequency.
Participants who recalled words from the study list before they gave their judgments
showed a tendency to underestimate the presentation frequency of items. However, in
judgments uninfluenced by a previous recall test a (slight) tendency to overestimate the
actual frequencies could be found. Up to the present moment, the study by Erickson
and Gaffney seems to be the only one in which the impact of a recall task upon
subsequent judgments on frequencies of occurrence was examined. Thus, the first
objective of the recall treatment was to check whether the observed effect is replicable.
Second, we wanted to test a possible explanation of this effect. The differences
found by Erickson and Gaffney between judgments generated with or without a
preceding recall test correspond to typical differences between judgments based on
enumeration and on memory assessment (Brown, 1995; 2002): If enumeration is used,
there is a tendency to underestimate actual event frequencies. Memory assessment, on
the other hand, often results in overestimation (at least when no upper limit of the
response range is given). Moreover, the difference in the magnitude of judgments that
can be traced back to these strategies increases with presentation frequency. In
addition, Betsch et al. (1999) demonstrated for frequency judgments on category sizes
that a preceding recall test fosters the use of an enumeration strategy. The information
made available by the recall test is apparently also used more intensively for the
generation of frequency judgments. A possible explanation for the effect observed by
Erickson and Gaffney is thus that a preceding recall test facilitates the use of
enumeration even for judgments on frequencies of occurrence. Unlike in judgments on
category sizes, however, different relevant exemplars cannot be retrieved in such
judgments. Instead, enumeration would have to be based on the retrieval of various
encoding episodes of identical events. In other words, when using this strategy, the
participants would have to recollect several episodic memories of repeated
presentations of the same word by, for example, remembering adjacent words or their
own thoughts or activities during various presentations (Brown et al., 2000; Roediger
& McDermott, 1995; Gardiner, 1988; Dobbins et al., 1998). If the recall test facilitates
such recollections and thereby fosters the use of enumeration for judgments on
frequencies of occurrence, this should be accompanied by the typical differences in
decision times: The function relating decision times to presentation frequencies should
be steeper for judgments after a preceding recall test than for judgments without one.
Furthermore, judgments after the recall test should increase less steeply with rising
presentation frequency than judgments without a recall test, as they did in the study by
Erickson and Gaffney.
Below, the design and the exact procedure of the first experiment are described.
This is followed by the description of how this procedure was realized with PASS in
the simulation. The section 'Simulation Method' is subdivided into two parts. The first
part provides a description of how PASS acquired pre-knowledge about the associative
strengths between the words used in the experiments. In the second part the modeling
of the experimental presentation of the study list is illustrated. After that, we will
report the results of the simulation and derive from them predictions of the PASS
model about the magnitude of judgments and the discrimination of presentation
frequencies of associated and non-associated items. Finally, the results of the first
experiment are reported and compared with the predictions of the PASS model.
Study 1
Method
Participants: Ninety-six persons took part in the experiment (43 men and 56
women, with an average age of 23 years). Most of the participants were students of the
TU Chemnitz from various departments. Students of Psychology did not take part in
the experiment. The participants received a reimbursement of 3 Euros.
Material: In the experiment four lists were used, two lists with associated
words as well as two lists with non-associated words. Each of the four lists contained
20 words. To arrive at the lists, first of all 10 associated lists were constructed.
Afterwards, two of them were selected for experimental use. German word-association
norms (Russell, 1970) served as a basis for the construction of the 10
associated lists. These norms merely itemize the associates of a given stimulus word
and, thus, do not contain any information on the relations between the associates of the
stimulus word. Generally, however, it can be observed that words triggered as
associative response by a stimulus word also tend to be associated with each other. The
first ten associates of the stimulus word river may illustrate this tendency (from the
English associative norms by Russell and Jenkins, 1954): water, stream, lake,
Mississippi, boat, tide, swim, flow, run, barge. We selected 10 stimulus words from the
German associative norms whose associates were, in our judgment, also particularly
strongly associated with one another. The selected stimulus words were:
Berg (mountain), Brot (bread), Butter (butter), Dieb (thief), Fluss (river), Fuß (foot),
Hand (hand), König (king), Nadel (needle) and Stuhl (chair). The following words
were deleted from the ten lists containing the stimulus words and their first 19
associates: 1. Associates repeating the stimulus word or the stem of a preceding
associate (e.g. Diebstahl (theft) as an associate of Dieb, or Sitz (seat) and
Sitzgelegenheit (seating), which are more weakly associated with Stuhl than sitzen (to sit)). 2.
Associates that are in German typically used as compounds together with the stimulus
word and that, as single nouns, mainly occur in different contexts of meaning (e.g. Bett
(bed) from the compound Flussbett (river bed)). The deleted associates were replaced
by words that appeared to be plausible as associates of the respective stimulus word
and its remaining associates.
Of the ten associated lists thus constructed, the two whose words showed the
strongest mean association with one another were to be used in the experiment. The
PASS model was used to quantify the associations. In the simulation of the required
associative pre-knowledge, PASS had learned the associations between the 20 words
on each of the ten lists (for the details of this learning trial see the section on the
simulation method). For each word list, the mean of the 20 × 19 ÷ 2 = 190
associations acquired by PASS was calculated. This mean was highest for the lists
around the words Stuhl and Fluss. Table 3.1 shows the two associated lists used.
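The list-selection criterion can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes symmetric associations (as the count of 20 × 19 ÷ 2 unordered pairs suggests), and the word labels and association strengths are made-up placeholders.

```python
from itertools import combinations


def mean_pairwise_association(words, assoc):
    """Return the mean association over all unordered word pairs and the pair count."""
    pairs = list(combinations(words, 2))
    total = sum(assoc[frozenset(pair)] for pair in pairs)
    return total / len(pairs), len(pairs)


# Hypothetical data: 20 placeholder words with an arbitrary constant strength.
words = [f"w{i}" for i in range(20)]
assoc = {frozenset(pair): 0.01 for pair in combinations(words, 2)}

mean_strength, n_pairs = mean_pairwise_association(words, assoc)
print(n_pairs)  # 190
```

Applied to each of the ten candidate lists, the two lists with the highest mean would be retained.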
Table 3.1
Word lists used in Study 1. English translations of the original German words are
given in parentheses.

Associated lists
Stuhl (chair): Tisch (table), Sessel (easy chair), sitzen (to sit), bequem (comfortable),
setzen (to sit down), Hocker (stool), Möbel (furniture), Sofa (sofa), Holz (wood),
Lehne (backrest), hart (hard), Kissen (cushion), schaukeln (to swing), rutschen (to
slide), Zimmer (room), kippen (to tilt), Wohnung (apartment), Bett (bed), stehen (to
stand)
Fluss (river): Wasser (water), Strom (stream), Ufer (bank), Rhein (Rhine), Meer (sea),
tief (deep), See (lake), Tal (valley), breit (broad), schwimmen (to swim), Boot (boat),
Fisch (fish), Brücke (bridge), Bach (brook), baden (to bathe), Stadt (city), Schiff
(ship), Landschaft (landscape), überqueren (to cross)

Non-associated lists
Stuhl (chair) and Fluss (river), each combined with 19 of the following words:
Boot (boat), Lösung (solution), Solist (soloist), Wand (wall), punkten (to score), Mode
(fashion), mutig (courageous), Münster (German city), verlangen (to demand), Stil
(style), Teufel (devil), spitz (pointed), Anruf (call), Ohr (ear), Beleg (receipt), Horn
(horn), Sinn (sense), karg (meagre), Bereich (area), merken (to realize), sauer (sour),
Gen (gene), ragen (to tower), entziehen (to withdraw), Start (start), urteilen (to judge),
Protest (protest), Herbst (fall), Anteil (share), Mittel (means), Brief (letter), leiden (to
suffer), Sport (sport), Druck (pressure), Bahn (railroad), Herrschaft (power),
vernichten (to destroy), stecken (to stick)
The words Stuhl and Fluss again served as the starting point for the
non-associated lists. The remaining words on each of these lists were drawn from the
100 words of the PASS vocabulary (Sedlmeier et al., 2004) that showed the weakest
associations to Stuhl or Fluss, respectively. From these, 19 words were chosen for each
list in such a way that 1) the various word classes (nouns, proper names, adjectives,
verbs) were represented equally often on associated and non-associated lists, and 2) the
words on associated and non-associated lists had nearly the same length (see Table
3.1). For both associated lists, the means and medians of all associations between the
20 words are approximately 4.5 times as large as the means and medians on the
corresponding non-associated list.
For each participant a unique study list was created from one of the four word
lists. In this study list, five words from the corresponding word list appeared 0 times,
five appeared 2 times, five appeared 5 times, and five appeared 8 times. Thus, the
study list had an overall length of 75 items. The words were randomly assigned to
these presentation frequencies for each participant. The sequence of items was also
randomly determined for every participant, with the restriction that the same word
could not appear twice in immediate succession.
The test list for the frequency judgment contained the 20 words of the
corresponding word list, again in random sequence.
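The construction of a study list can be sketched as follows. The text does not say how the restriction on immediate repetitions was enforced; reshuffling until the restriction is met is an assumption made here for simplicity, and the function name and word labels are hypothetical.

```python
import random


def build_study_list(words, rng, frequencies=(0, 2, 5, 8), per_level=5):
    """Assign words to presentation frequencies and order the items at random."""
    shuffled = list(words)
    rng.shuffle(shuffled)
    assignment = {}
    for level, freq in enumerate(frequencies):
        for word in shuffled[level * per_level:(level + 1) * per_level]:
            assignment[word] = freq
    items = [word for word, freq in assignment.items() for _ in range(freq)]
    # Assumed mechanism: reshuffle until no word directly follows itself.
    while True:
        rng.shuffle(items)
        if all(a != b for a, b in zip(items, items[1:])):
            return assignment, items


words = [f"word{i:02d}" for i in range(20)]  # placeholders for one word list
assignment, study_list = build_study_list(words, random.Random(1))
print(len(study_list))  # 75
```

The length of 75 follows directly from 5 × (0 + 2 + 5 + 8).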
Procedure: The participants were tested individually at a personal computer.
Every participant was randomly assigned one of the four word lists. Prior to the study
phase the participants were informed that they would be presented with a longer word
list in the following minutes. Furthermore, they were instructed to read through this
word list attentively, as they would later have to answer some questions about it. The
exact nature of the memory test, however, remained unspecified. The participants were
also told that words on the list could occur repeatedly.
During the study phase, words were presented one at a time. Each presentation
lasted three seconds, followed by a one-second interval in which the screen was blank.
Subsequent to the presentation of the word list, a free-recall test was carried out with
half of the participants. The participants were given a sheet of paper and instructed to
write down, within two minutes and in any order, as many words as possible from the
list previously studied. After the two minutes were over, the sheet was collected.
After the recall test or, for the other half of the participants, immediately after
the presentation of the study list, the participants were required to estimate as
accurately as possible the number of times each word had appeared on the list. For
this purpose, the participants were presented with each of the 20 words on their test list
one at a time. It was pointed out to them in advance that they might also have to judge
the frequency of words that had not been presented. In this case their frequency
judgment should be "0".
In addition, the participants were informed that decision times would be
recorded as well. The measurement of the decision times largely followed a method
introduced by Brown (1995). The participants were instructed to press a key marked
red (the spacebar) in each test trial "as soon as they decided in favor of a particular
number". When the spacebar was pressed, a response field appeared on the screen next
to the word to be judged. The participants were then to start entering their estimate as
quickly as possible. A test trial ended when the participants pressed the enter key to
confirm their judgment. After a one-second interval in which the screen was blank, the
next test trial started with the appearance of the next word. Three times were
measured: 1) the time from the appearance of the test word to the pressing of the
spacebar (decision time), 2) the time from the pressing of the spacebar to the entry of
the first digit (initiation time), and 3) the time from the entry of the first digit to the
pressing of the enter key (entry time). This response sequence was practiced with the
participants beforehand by means of three examples, in which the participants judged
how often they had performed particular activities in the past week (e.g. "washing the
dishes").
Altogether, an experimental session lasted about 20 minutes.
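The three time measures can be derived from four event timestamps per test trial. The following sketch only illustrates the definitions given above; the class and field names are hypothetical and are not taken from the original experiment software.

```python
from dataclasses import dataclass


@dataclass
class TrialTimestamps:
    """Event times of one test trial in seconds (hypothetical field names)."""
    word_onset: float
    spacebar: float
    first_digit: float
    enter: float


def trial_times(t):
    """Split a trial into decision, initiation, and entry time."""
    return {
        "decision": t.spacebar - t.word_onset,
        "initiation": t.first_digit - t.spacebar,
        "entry": t.enter - t.first_digit,
    }


# Made-up timestamps for a single trial.
example = TrialTimestamps(word_onset=0.0, spacebar=2.5, first_digit=3.0, enter=3.75)
print(trial_times(example))
```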
Simulation Method
Simulation of Pre-knowledge
A large machine-readable text corpus served as a model of the environment in
which our participants acquired their associative pre-knowledge. This text corpus
consisted of four complete volumes (1995–1998) of the Süddeutsche Zeitung, a major
German national daily newspaper. In total, the text corpus contained approximately
120 million word forms. A text corpus of this scale can be regarded as an adequate
sample of the linguistic information that was pre-experimentally encoded by our
participants [23]. First of all, the text corpus was lemmatized, that is to say, all word
forms included in the corpus were reduced to their stems [24]. From the lemmatized
corpus PASS then learned the associations between the experimental stimulus words.
To learn associations from texts, PASS simulates the process of reading. Similar to a
human reader whose attentional span comprises not only the most recent word but also
several words of the text previously read, PASS uses an "attentional window". This
window is moved over the corpus word by word. In every step, PASS encodes all
words in the window and thereby changes the associations between these words and
the remaining words in its vocabulary according to the rules in equation 1 [25]. Thus,
associations between two words that both occur in the attentional window are
strengthened, whereas associations between two words of which only one or neither is
contained in the window are attenuated. Since all words in the window are taken into
account in this updating process, the association between adjacent words is increased
more frequently, and thus more strongly, than the association between words separated
by some intermediate words. Thus, with an assumed window length of 10, two
adjacent words would be contained together in the window eight times. On the other
hand, two words with eight intermediates between them would co-occur only once.
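Why nearby words are strengthened more often can be illustrated with a plain count of the window positions that contain both words of a pair. The sketch below does not implement the updating rules of equation 1; it only counts co-occurrences under the assumption of a window that slides one word at a time and always lies fully inside the text. The exact counts depend on the counting convention at the window edges, which is not spelled out here, so the function and token names are purely illustrative.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrences(tokens, window):
    """Count, for each word pair, the window positions that contain both words."""
    counts = Counter()
    for start in range(len(tokens) - window + 1):
        present = set(tokens[start:start + window])
        for pair in combinations(sorted(present), 2):
            counts[pair] += 1
    return counts


# Hypothetical text of 40 distinct tokens, window length 10.
tokens = [f"t{i:02d}" for i in range(40)]
counts = window_cooccurrences(tokens, window=10)

print(counts[("t20", "t29")])                           # 1 (eight intermediate words)
print(counts[("t20", "t21")] > counts[("t20", "t25")])  # True: closer pairs co-occur more often
```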
[23] The extent to which a text corpus represents an adequate sample of language seems to depend
primarily on its size (Burgess & Livesay, 1998). In this respect, our corpus compares well with
other corpora such as the frequently used Brown Corpus of Standard American English (Kucera &
Francis, 1967), which comprises approx. 1 million word forms.
[24] For this we used the program MORPHY, a lemmatizer for the German language (Lezius, Rapp, &
Wettler, 1998).
[25] I am indebted to Petra Seidensticker for the programming of the simulations reported here and in
Study 2.
To be able to simulate the process of association acquisition, it was necessary
to have PASS treat every word as an independent feature. Therefore, every word of the
PASS vocabulary was assigned its own unit in FEN that encoded the occurrence of
this word. Compared with the original formulation of PASS, in which every object is
represented by several features, this simplification was necessary for pragmatic and
theoretical reasons. For one thing, the required number of arithmetic operations cannot
be managed with a distributed representation [26]. For another, there is currently no
model that makes both theoretically and empirically satisfying assumptions about how
words can be represented in a distributed manner (Kintsch, 1998). A reasonable
"decomposition" of words into different features was hence impossible.
For the learning trial in which PASS acquired its pre-knowledge about the
associative relations between the words used in the experiment, the text corpus was
subdivided into the four volumes 1995, 1996, 1997 and 1998 of the Süddeutsche
Zeitung. The associations acquired from the single volumes were then averaged to
diminish possible recency effects. In the learning trial the following parameter settings
were used: θ1 = 0.005, θ2 = 0.0001 and θ3 = 0.0000001. The length of the attentional
window was 12 words. These parameter settings yielded the best results among several
tested parameter combinations in a study by Sedlmeier, Wettler and Seidensticker
(2004) on the prediction of primary associations. In the simulation by Sedlmeier et al.,
PASS had a vocabulary of 3580 words. PASS acquired all associations between these
words from the text corpus also used here. After that, PASS was presented with 100
stimulus words. The strongest associations of the model to these stimulus words
agreed best with human primary associations if the parameter combination mentioned
was used. With these parameters, the number of matches between the associative
responses of the model and the primary associations approximately corresponded to
the number of matches achieved by human participants. The study thereby also
demonstrates that the procedure followed here is in general suitable for simulating
human associations.
[26] In every step in which the attentional window is moved over the text, PASS updates the associations
between all words in its vocabulary according to one of the three rules in equation 1. Thus, in a
localist representation the total number of required arithmetic operations already results from the
product of the number of words in the text corpus with the square of the number of words in the PASS
vocabulary. In a distributed representation this total number of arithmetic operations would have to be
multiplied once again by the square of the number of features per word.
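The order of magnitude implied by footnote 26 can be worked out as follows. The sketch assumes the approximate corpus size of 120 million word forms and the vocabulary size of 3580 words reported for Sedlmeier et al. (2004); whether exactly this vocabulary size applied to the present learning trial is an assumption, and the number of features per word in the distributed case is purely hypothetical.

```python
corpus_tokens = 120_000_000  # approximate corpus size stated in the text
vocabulary = 3580            # vocabulary size reported for Sedlmeier et al. (2004)

# Localist representation: corpus size times the square of the vocabulary size.
localist_ops = corpus_tokens * vocabulary ** 2
print(f"{localist_ops:.2e}")  # 1.54e+15


def distributed_ops(features_per_word):
    """Operation count if every word were represented by several features."""
    return localist_ops * features_per_word ** 2


# Hypothetical example: 10 features per word multiply the count by 100.
print(distributed_ops(10) // localist_ops)  # 100
```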
The Simulation of the Experimental List Presentation
In a second step of the PASS simulation, the encoding of the experimental
presentation of the study lists had to be modeled. For this, a further learning trial was
run with PASS. As described above, the study lists of the participants differed from
each other even within the same experimental condition, because the stimulus words
were randomly assigned to the presentation frequencies and to positions on the list. All
in all, 48 different associated and 48 different non-associated study lists were used, one
for each participant. Exactly these 96 study lists were now presented to PASS and
encoded by the model.
Prior to the encoding of the study lists, the self-associations of the words (i.e.
the weights in FEN connecting a unit with itself) previously acquired by PASS from
the text corpus were set to 0. These self-associations are exclusively determined by the
pre-experimental frequency of occurrence of the words in the text corpus. However, at
least two studies demonstrate that estimates on list frequencies of words are only
marginally influenced by the linguistic frequency of these words (Rao, 1983; Greene
& Thapar, 1994). Consequently, a noteworthy impact of the frequencies of the
stimulus words in the text corpus on the judgments about the experimental list
frequencies was not to be expected. To take this into account, we decided to set the
pre-experimentally acquired self-associations to 0. As a result, the self-associations of
words were completely relearned during the modeling of the experimental presentation
while the pre-experimentally acquired associations between words were overlearned.
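In a localist weight matrix, resetting the self-associations amounts to setting the diagonal to 0 while leaving all other weights untouched. The following sketch uses a made-up 3 × 3 matrix; the function name is hypothetical.

```python
def reset_self_associations(weights):
    """Return a copy of the weight matrix with all self-connections set to 0."""
    return [
        [0.0 if i == j else w for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


# Made-up weights: diagonal entries stand for self-associations.
weights = [
    [0.90, 0.20, 0.10],
    [0.20, 0.70, 0.05],
    [0.10, 0.05, 0.80],
]
reset = reset_self_associations(weights)
print(reset[0])  # [0.0, 0.2, 0.1]
```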
For the encoding of the study lists by PASS, various parameter combinations
were used. We changed the parameter settings on the consideration that the
information presented in the experiment is possibly encoded with more attention and is
represented better than information processed in more familiar contexts. Thus, starting
from the parameter settings used for the encoding of the text corpus, we increased the
learning parameter θ1 and decreased the parameters θ2 and θ3, which regulate the
amount of forgetting. θ1 was increased in nine steps from 0.005 to 0.05. The
parameters θ2 and θ3 were either 0.0001 and 0.0000001 or were both set to 0.
Additionally, we varied the length of the window in which two stimulus words had to
co-occur for the associative connection between them to be increased. A window
length of 12 words had proved optimal in the prediction of primary associations from
the text corpus. However, in the text corpus the relevant words are semantically
connected and present at the same time. For the case realized in the experiment, in
which unconnected words follow each other and are presented one at a time, it seems
reasonable that the distance between two words has to be shorter for the associative
connection between them to be increased. Therefore, the three shortened window
lengths 3, 5, and 7 were used for the encoding of the study lists. Altogether, the 96
study lists were thus each encoded by PASS with 60 different parameter settings (10
values of θ1, 2 joint settings for θ2 and θ3, and 3 window lengths).
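The resulting parameter grid can be enumerated as follows. The text states only that θ1 was increased in nine steps from 0.005 to 0.05; that the steps were equally spaced (0.005 each) is an assumption of this sketch, as are the variable names.

```python
from itertools import product

# Assumption: ten equally spaced theta1 values from 0.005 to 0.05.
theta1_values = [round(0.005 * k, 3) for k in range(1, 11)]
forgetting_settings = [(0.0001, 0.0000001), (0.0, 0.0)]  # (theta2, theta3)
window_lengths = [3, 5, 7]

grid = [
    {"theta1": t1, "theta2": t2, "theta3": t3, "window": w}
    for t1, (t2, t3), w in product(theta1_values, forgetting_settings, window_lengths)
]
print(len(grid))  # 60
```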
Simulation Results
Relative magnitude of responses: Figure 3.2 illustrates, by way of example,
FEN’s standardized mean responses for four different settings of the learning
parameter θ1 and the window length. The interference parameter θ2 is always 0.0001;
the decay parameter θ3 is always 0.0000001. The results of the item sets Stuhl and
Fluss did not differ systematically and are combined in the figure. The data points thus
result from averaging the responses over all 48 study lists presented. The absolute
magnitude of FEN’s responses varies considerably with the settings for θ1 and the
window length. For instance, with a θ1 of 0.01 and a window length of 3 (chart A), the
mean response over all stimuli amounts to 0.66. With a θ1 of 0.03 and a window length
of 5 (chart D), on the other hand, the mean is 2.41. However, since it is assumed in
PASS that the frequency judgments of participants result from a linear transformation
of responses, the predictions of the model refer to the relative differences between the
data points and are independent of the absolute magnitude of the responses. To
facilitate a comparison of the predictions of the model across different parameter
settings, the displayed means were therefore standardized in such a way that the mean
response for associated stimuli with the presentation frequency 8 always takes the
value 1.
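The standardization described above is a simple rescaling by the mean response of the reference condition. In the following sketch the raw mean responses are made-up numbers, and the dictionary layout is only one possible way to organize them.

```python
def standardize(mean_responses, reference=("associated", 8)):
    """Rescale mean responses so that the reference condition takes the value 1."""
    scale = mean_responses[reference]
    return {condition: value / scale for condition, value in mean_responses.items()}


# Made-up raw mean responses keyed by (item type, presentation frequency).
raw = {
    ("associated", 0): 0.5, ("associated", 2): 1.0,
    ("associated", 5): 1.5, ("associated", 8): 2.0,
    ("non-associated", 0): 0.25, ("non-associated", 2): 0.75,
    ("non-associated", 5): 1.25, ("non-associated", 8): 1.75,
}
standardized = standardize(raw)
print(standardized[("associated", 8)])  # 1.0
```

Because all values are divided by the same constant, the relative differences between the data points are unchanged.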
[Figure 3.2: four line charts, each plotting Standardized Mean Response (y-axis, 0.0 to
1.2) against Presentation Frequency (x-axis: 0, 2, 5, 8) for associated and
non-associated items. Chart A: θ1 = 0.01, window length = 3. Chart B: θ1 = 0.01,
window length = 5. Chart C: θ1 = 0.03, window length = 3. Chart D: θ1 = 0.03,
window length = 5.]
Figure 3.2. FEN’s standardized mean responses for 4 different settings of the learning
parameter θ1 and the window length in Study 1. In all charts θ2 = 0.0001
and θ3 = 0.0000001.
Three characteristics of the graphs in Figure 3.2 are of interest. First, the figure
exemplifies the sensitivity of the PASS model to frequencies. FEN’s responses
increase with the presentation frequency of items irrespective of the selected parameter
combination. This holds for both associated and non-associated words. Second, FEN’s
responses to associated and non-associated items show a similar trend, so that the
curves for both types of items are approximately parallel. Again, this holds irrespective
of the parameter combination. Third, and most important for the question pursued
here, all four exemplary parameter combinations lead PASS to the prediction that
associated words receive higher frequency judgments than non-associated words.
However, the magnitude of the expected difference between estimates for
associated and non-associated items is influenced by θ1 and the window length. The
general pattern is that the expected effect decreases the more FEN learns during the
experimental presentation of the study lists. Chart A illustrates FEN’s responses with
θ1 = 0.01 and a window length of three words, the lowest parameter settings included
in the figure. In this case, the pre-experimentally acquired knowledge about the
associations between the stimulus words exerts the strongest influence on the
responses. Accordingly, the maximum difference between associated and
non-associated items can be found here. The impact of the pre-knowledge on the
judgments of the experimental presentation frequencies is reduced the more, the better
the study lists are represented in PASS’ memory. In FEN, a better representation
corresponds to a greater change of the weights connecting the words in the study list
with themselves and with each other. An increase of either the learning parameter θ1 or
the window length causes such a greater change of weights during the experimental
presentation. Charts B and C illustrate that it is irrelevant for the resulting responses
whether this is obtained by a greater change with each encoding (i.e. by a higher θ1) or
by a more frequent updating of weights (i.e. a greater window length). Chart D
illustrates that θ1 and the window length have an additive effect: A joint increase of
both parameters results in a further reduction of the difference in the responses to
associated and non-associated items.
The variation of the parameters θ2 and θ3, which jointly regulate the extent of
forgetting, also had a systematic effect on the mean responses in FEN. If no forgetting
occurred during the presentation of study lists (θ2 = θ3 = 0), greater differences
generally resulted between associated and non-associated stimuli than with active
forgetting processes. This can be ascribed to the fact that the higher weights between
associated words are reduced more strongly by forgetting processes than the lower
weights between non-associated words (see equation 1). However, the effect of the
variation of the parameters θ2 and θ3 was very small. Adding forgetting processes did
not change the mean responses by more than 1%. For all practical purposes, the
influence of the forgetting processes realized in the simulation can thus be neglected.
Within what range does the expected influence of the associations on
frequency judgments vary as a function of the parameter settings? To answer this
question, the ratio of mean responses to associated and non-associated words was
calculated separately for each presentation frequency and for each of the 60 parameter
combinations used. If, for example, a parameter combination produced a mean
response of 0.9 for associated words with the presentation frequency 2 and a mean
response of 0.6 for non-associated words with the same frequency, the ratio of this
parameter combination for the frequency 2 would be 0.9 ÷ 0.6 = 1.5. The box plots in
Figure 3.3 illustrate the resulting distributions of ratios over the 60 parameter
combinations. Values above 1 show that greater frequency judgments are expected for
associated words than for non-associated ones. The uppermost outliers at the
presentation frequencies 2, 5, and 8 result from a θ1 of 0.005 and a window length of
3, the lowest parameter settings realized in the simulation. Accordingly, the responses
to associated words are here most clearly above those to non-associated words. As
soon as θ1 or the window length is increased, the ratios between associated and
non-associated words become smaller [27]. Values equal to or smaller than 1,
however, are not obtained with any of these parameter combinations. Thus,
independent of the quality of the encoding of the study lists, PASS expects for all
presentation frequencies that associated words receive higher frequency judgments
than non-associated words. Within the representation assumptions of the model, the
influence of the pre-experimentally acquired associations on the judgments of list
frequencies obviously cannot be totally neutralized. Consequently, higher experimental
judgments for non-associated words than for associated words would be incompatible
with PASS.
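The ratio measure can be sketched as follows for a single parameter combination. Only the values for presentation frequency 2 reproduce the worked example from the text; all other mean responses are made up, and the function name is hypothetical.

```python
def response_ratios(mean_responses, frequencies=(0, 2, 5, 8)):
    """Ratio of mean responses (associated / non-associated) per presentation frequency."""
    return {
        freq: mean_responses[("associated", freq)] / mean_responses[("non-associated", freq)]
        for freq in frequencies
    }


# Mean responses of one hypothetical parameter combination.
example = {
    ("associated", 0): 0.2, ("non-associated", 0): 0.2,
    ("associated", 2): 0.9, ("non-associated", 2): 0.6,  # worked example from the text
    ("associated", 5): 1.8, ("non-associated", 5): 1.5,
    ("associated", 8): 2.6, ("non-associated", 8): 2.4,
}
ratios = response_ratios(example)
print(round(ratios[2], 2))  # 1.5
```

Repeating this for every parameter combination yields, per presentation frequency, a distribution of ratios of the kind summarized in a box plot.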
[27] In case of the presentation frequency 0, such an influence cannot be found. This can be ascribed to the
fact that the pre-experimentally acquired associations of words with the presentation frequency 0 are not
overlearned during the presentation of the study list. The weights connecting these words with the
remaining stimulus words are only subject to the influence of forgetting. This influence, however, turns
out to be very small for the selected parameter combinations.
[Figure 3.3: box plots of Ratios of Mean Responses (y-axis, 0 to 4) by Presentation
Frequency (x-axis: 0, 2, 5, 8).]
Figure 3.3. Distributions of ratios of mean responses for associated and non-associated
items in Study 1.
In principle, however, the difference in the responses to associated and
non-associated words can be reduced arbitrarily by increasing θ1 and the window
length. On the other hand, there is a considerable trade-off between both parameters
due to their similar impact on the responses. Only if both parameters take extreme
values, i.e. if the study list is encoded extremely well, will the expected effect be small.
For the great majority of parameter combinations used, a relatively narrow prediction
range results in which the differences between associated and non-associated items
exceed 10%. On the assumption that the parameter settings chosen in the simulation
map the quality of the encoding of the study list by the participants more or less
adequately, PASS thus expects substantially higher frequency judgments for associated
items.
Discrimination of Frequencies: As a measure for the ability to discriminate different
frequencies Flexser and Bower (1975) suggested the discrimination coefficient. This
coefficient is defined as the correlation between actual and estimated frequencies. The
correlation is calculated separately for every participant. Thus, the coefficient provides
an individual measure for the quality of the discrimination of various frequencies,
independent of the absolute magnitude of frequency judgments.
To derive predictions of the PASS model for the discriminability of associated
and non-associated items, this measure was used. For each of the 48 associated and 48
non-associated study lists, the correlation between the actual presentation frequencies
of the 20 items and FEN’s responses was calculated. With regard to the discrimination
coefficient, too, no systematic differences arose between the simulation results for the
item sets Stuhl and Fluss. Therefore, the results for both item sets are combined below.
Figure 3.4 shows the resulting mean correlations for associated and non-associated
words for six different settings of the window length and the learning parameter θ1
[28]. The interference parameter θ2 is always 0.0001; θ3 is 0.0000001.
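The discrimination coefficient and the averaging of correlations via Fisher's Z transformation can be sketched as follows. The Pearson correlation is used here on the understanding that the coefficient is defined as the correlation between actual and estimated frequencies; the response values are made up, and the function names are hypothetical. The Z transformation requires every correlation to lie strictly between -1 and 1.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(correlations):
    """Average correlations via Fisher's Z and back-transform the mean."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Made-up example: presentation frequencies and responses for one study list.
actual = [0, 0, 2, 2, 5, 5, 8, 8]
responses = [0.1, 0.3, 0.5, 0.4, 0.9, 1.2, 1.4, 1.9]
coefficient = pearson(actual, responses)

# Mean of several made-up discrimination coefficients.
print(round(mean_correlation([0.9, 0.5]), 3))
```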
First of all, Figure 3.4 once again illustrates PASS’ sensitivity to frequencies.
With all parameter combinations, high to very high correlations between the
presentation frequencies and FEN’s responses are obtained for both associated and
non-associated items. However, the frequencies of non-associated items are
systematically discriminated better than the frequencies of associated items. The size
of this effect is influenced by θ1 and the window length in a similar way as the size of
the effect of associations on the magnitude of responses. Generally, the quality of the
discrimination of different frequencies increases with the quality of the encoding of the
study lists. Thus, if θ1 or the window length is increased, the correlations between the
actual frequencies and the responses also increase. This holds especially for the
associated stimuli, whose discriminability is fairly clearly impaired by the associative
connections between the stimulus words with low parameter settings, that is, with poor
encoding of the study lists. With a better encoding of study lists, the influence of the
associative pre-knowledge on the discrimination of frequencies decreases. As a result,
the differences between associated and non-associated items in the correlations of the
corresponding responses with the presentation frequencies also diminish. As Figure
3.4 illustrates, for the resulting predictions of the PASS model it is again irrelevant
whether an improved encoding of study lists is brought about by an increase of θ1 or
of the window length [29].
[28] Here and below, the correlations were transformed into Fisher Z-values before they were
averaged or subjected to statistical tests (Silver & Dunlap, 1987). The reported means result from the
back-transformation of the corresponding mean Z-values.
[Figure 3.4, two panels: θ1 = 0.01 and θ1 = 0.03. X-axis: window length (3, 5, 7); y-axis: mean correlation (0.5 to 1.0); separate lines for associated and non-associated items.]
Figure 3.4. Mean correlations between FEN’s responses and presentation frequencies
for 6 different settings of the learning parameter θ1 and the window length
in Study 1. In both charts θ2 = 0.0001 and θ3 = 0.0000001.
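The averaging procedure described in footnote 28 (Fisher Z-transformation, averaging, back-transformation) can be sketched as follows. The correlations used here are hypothetical placeholders, not simulation output, and the function name is an assumption.

```python
from math import atanh, tanh


def mean_correlation(correlations):
    """Average correlations via Fisher's Z: transform, average, back-transform."""
    z_values = [atanh(r) for r in correlations]  # Fisher Z-transformation
    mean_z = sum(z_values) / len(z_values)
    return tanh(mean_z)  # back-transformation of the mean Z-value


# Hypothetical correlations for three study lists (each strictly between -1 and 1).
rs = [0.95, 0.80, 0.60]
mean_r = mean_correlation(rs)
arithmetic_mean = sum(rs) / len(rs)
```

For unequal correlations such as these, the back-transformed mean lies somewhat above the plain arithmetic mean, because the transformation stretches values close to 1.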
In order to illustrate the range in which the differences in the discriminability of
associated and non-associated items expected by PASS vary across all parameter
settings, a procedure similar to the one used for the predictions of the relative
magnitude of frequency judgments was chosen. For each of the 60 parameter settings
the ratio of the mean predictions, that is, the mean discrimination coefficients, for
associated and non-associated words was calculated. Values below 1 hence show that
frequencies of associated words are discriminated worse than frequencies of
non-associated words. Figure 3.5 illustrates the distribution of the resulting ratios.
29
The variation of the parameters θ2 and θ3 exerted only an extremely small influence of no practical
importance on the discrimination of frequencies. Generally, however, there was a tendency for the
differences in the discriminability of frequencies of associated and non-associated items to be
smaller if forgetting processes occurred.
[Figure 3.5: distribution plot. Y-axis: ratio of mean correlations (0.5 to 1.0).]
Figure 3.5. Distribution of ratios of mean correlations between FEN’s responses and
presentation frequencies for associated and non-associated items in Study 1.
The figure shows that PASS also yields an invariant prediction about the effect of
associations on the discriminability of frequencies. Values equal to or above 1 are not
attained even if the study lists are encoded extremely well. That is, independent of the
parameter settings, PASS expects that the frequencies of associated items are
discriminated worse than the frequencies of non-associated items. However, clear
differences in the discriminability of frequencies of associated and non-associated
items only arise if both the learning parameter θ1 and the window length take low
values. Thus, the two outliers at the bottom of Figure 3.5 result from the lowest
settings for both parameters (window length = 3, θ1 = 0.005 or θ1 = 0.01).
Generally, a trade-off between θ1 and the window length also exists with regard to the
discrimination of frequencies, which is due to the similar effect of both parameters.
Once again, this trade-off means that the predictions of the PASS model vary only
within a relatively narrow range for the majority of parameter combinations. In this
range only slight differences in the discriminability of frequencies of associated and
non-associated items are expected. Accordingly, the median of the ratios resulting from
the 60 parameter combinations amounts to 96.3%, as can be seen in Figure 3.5. Hence,
for half of the parameter combinations it is predicted that the mean correlations
between estimated and actual frequencies for associated words differ by less than 4%
from the correlations for non-associated words. In other words, PASS generally
expects, on the one hand, that the frequencies of associated items are discriminated
less well than the frequencies of non-associated items. On the other hand, provided
that the parameter settings chosen in the simulation map the quality of the
participants’ encoding of the study list more or less adequately, this effect should
turn out to be small.
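The ratio measure and its median can be sketched as follows. Only the structure of the computation follows the text; the coefficients for the three settings are hypothetical placeholders (the simulation itself used 60 settings), and all names are assumptions.

```python
from statistics import median

# Hypothetical mean discrimination coefficients (associated, non-associated)
# for three parameter settings; placeholders, not the simulated values.
mean_coefficients = {
    "theta1 = 0.005, window length = 3": (0.70, 0.85),
    "theta1 = 0.025, window length = 5": (0.93, 0.96),
    "theta1 = 0.05, window length = 7": (0.97, 0.98),
}

# Ratio below 1: associated items are discriminated worse than non-associated ones.
ratios = {
    setting: associated / non_associated
    for setting, (associated, non_associated) in mean_coefficients.items()
}
median_ratio = median(ratios.values())
```

A median ratio of, say, 0.97 would mean that for half of the settings the coefficient for associated items lies less than 3% below the coefficient for non-associated items.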
Experimental Results
Decision Times
A meaningful comparison of the predictions of the PASS model with the
experimentally collected frequency judgments is only possible if it can first be shown
that the participants generated their judgments with a memory assessment
strategy. Therefore, we first examine below whether the decision times show the slight
increase with presentation frequency that is typical of judgments based on memory
assessment. The decision times are additionally used to examine whether a preceding
recall test fosters the use of an enumeration strategy. If this is the case, the function
that relates the decision times to the presentation frequencies should be steeper than
without a preceding recall test.
Following Brown’s (1995; 1997) procedure, three time measures were recorded
during each trial of the frequency test. In addition to the decision time, an initiation
time and an entry time were recorded for control purposes. Moreover, all three
time-measures were added up to a total time indicating the time needed for the
generation and entry of a frequency judgment. For each of these four time-measures the
mean and the median over the five trials on each presentation frequency level were
calculated for every participant. These means and medians served as dependent
variables for all further analyses. However, all results reported below are based on
analyses of mean decision times. Hence, some explanatory notes on the remaining time
measures and medians are in order: For all time-measures the medians were slightly
smaller than the corresponding means (by 0.1 s to 0.2 s). The resulting effects were
unaffected by this, so that the analyses of means and medians lead to an identical
pattern of results. Typically, the initiation and entry of frequency judgments occurred
quickly. The corresponding means were 1.0 s for initiation times and 0.9 s for entry
times. Both time-measures slightly increased with the presentation frequency. The
difference in mean times between the presentation frequency 0 and the presentation
frequency 8 amounted to approximately 0.3 s for both initiation times and entry times.
No other effects occurred with the initiation and entry times. Finally, an analysis of
the total times leads to the same pattern of results as the analysis of decision times
presented here. All reported findings on initiation, entry and total times replicate
results also observed by Brown (1995; 1997).
In Figure 3.6 the mean decision times are plotted against the presentation
frequencies separately for the conditions with and without a preceding recall test.
Irrespective of presentation frequencies, the recall test consistently slows down the
generation of frequency judgments by approximately 0.2 s. The effect size of the recall
test on the decision times, however, is small with r_effect size = .08. For the question
of how the recall test influences the judgment strategies used, it is more important
that there is obviously no interaction with the presentation frequencies. The
decision times increase slightly and in a similar manner in both the condition with and
the condition without recall test. Thus, the data speak clearly against the assumption
that a preceding recall test fosters the use of an enumeration strategy.
In general, the decision times correspond to the result typical of judgments
based on memory assessment. For example, Brown (1995, Experiment 3) found
decision times of 2.0 s, 2.8 s, 3.0 s and 3.5 s for the presentation frequencies 0, 2, 4 and
8 for judgments based on memory assessment. In the present experiment we obtained
values of 2.3 s, 3.0 s, 3.4 s and 3.7 s, averaged over the conditions with and without
recall test, for the frequencies 0, 2, 5, and 8. In magnitude and trend the decision times
thus correspond almost exactly to previous results for judgments based on memory
assessment (see also Brown, 1995; 1997). On the other hand, the sharp increase in the
decision times to values above 6.5 s for the frequency 8, which occurs when an
enumeration strategy is used, cannot be found. Hence, it can be assumed that the
participants in the present experiment used an assessment strategy to generate their
frequency judgments.
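The similarity of the two series of decision times quoted above can be checked roughly by comparing their least-squares slopes. This is only a descriptive sketch based on the eight means given in the text, not an analysis reported in the study; the function name is an assumption.

```python
def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Mean decision times in seconds, as quoted in the text above.
brown_1995 = least_squares_slope([0, 2, 4, 8], [2.0, 2.8, 3.0, 3.5])
present = least_squares_slope([0, 2, 5, 8], [2.3, 3.0, 3.4, 3.7])

print(f"Brown (1995, Exp. 3): {brown_1995:.2f} s per presentation")
print(f"Present experiment:   {present:.2f} s per presentation")
```

Both slopes come out at roughly 0.17 s per additional presentation, which matches the flat increase described for judgments based on memory assessment.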
[Figure 3.6: line chart. X-axis: presentation frequency (0, 2, 5, 8); y-axis: mean decision time in s (2 to 5); separate lines for the groups Recall and No Recall.]
Figure 3.6. Mean decision time as a function of presentation frequency for groups with
and without a preceding recall test in Experiment 1. Error bars denote
standard errors.
The decision times for associated and non-associated items did not differ from
each other (r_effect size = 0.02). Accordingly, an ANOVA with the within-factor
‘presentation frequency’ and the between-factors ‘recall test’ and ‘association’
identifies only the main effect of the presentation frequency as significant,
η = 0.45, F(3, 264) = 23.03, p < 0.001. All other main effects and interactions were not
significant (all Fs < 1) and the corresponding effect sizes were close to 0 (all ηs < 0.1).
Frequency Estimates
Like the responses of the PASS model before, the frequency estimates of the
participants were also analyzed with regard to their relative magnitude and the quality
of discrimination of the actual frequencies. Apart from the influence of the
associations and presentation frequencies on the estimates, it should also be examined
whether a preceding recall test has an effect on the estimates. Below, the magnitude of
frequency estimates is considered first.
Relative Magnitude of Frequency Estimates: Every participant provided 5 estimates
on each level of presentation frequency. The means of these 5 estimates formed the
basis of the further analysis. From each of the 4 conditions resulting from the crossing
of the factors ’association’ and ’recall test’, 2 participants were excluded from this
analysis since they had provided exceptionally high estimates.30
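The exclusion rule specified in footnote 30 (an estimate more than 1.5 interquartile ranges above the 75th percentile of the reference group for at least one of the presentation frequencies 2, 5 and 8) can be sketched as follows. The quartile method is an assumption, since the text does not state how percentiles were computed, and all data and names are hypothetical.

```python
from statistics import quantiles


def upper_fence(values):
    """75th percentile plus 1.5 interquartile ranges."""
    # Assumption: 'inclusive' quartiles; the source does not name a method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def is_excluded(participant_means, group_means_by_frequency):
    """True if the participant exceeds the fence at any of the frequencies 2, 5, 8."""
    return any(
        participant_means[f] > upper_fence(group_means_by_frequency[f])
        for f in (2, 5, 8)
    )


# Hypothetical mean estimates of a reference group of eight participants.
group = {
    2: [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 9.0],
    5: [4.0, 4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5],
    8: [6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5],
}
typical = {2: 2.4, 5: 5.5, 8: 7.5}
extreme = {2: 9.0, 5: 7.5, 8: 9.5}

typical_excluded = is_excluded(typical, group)
extreme_excluded = is_excluded(extreme, group)
```

In this illustration only the second participant is flagged, because the estimate of 9.0 at presentation frequency 2 lies far above the fence for that frequency level.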
The two charts in Figure 3.7 show the mean frequency estimates of the 44
participants remaining in each of the recall conditions as a function of the presentation
frequencies. A first visual assessment of the estimates in the condition without recall
test shows that the data display all features that were predicted by the PASS model
irrespective of possible parameter settings: First, the judgments on associated and
non-associated items increase with rising presentation frequencies. Second, this
increase is largely unaffected by the associations between the items, so that the graphs
for associated and non-associated items show a similar trend. Third, for all presentation
frequencies the associated items receive higher estimates than the non-associated ones.
Collapsing over presentation frequencies, the mean for associated items is 4.5 and
the mean for non-associated items is 3.8. The effect size of the
associations on the frequency estimates is r_effect size = 0.17, that is, according to
Cohen’s (1992) conventions, a small to medium effect. Altogether, the experimental
frequency estimates in the condition without a preceding recall test thus seem to
correspond well to the predictions of the PASS model.
30
The estimates of these participants were, for at least one of the presentation frequencies 2, 5 and 8,
more than 1.5 interquartile ranges above the 75th percentile of the corresponding reference group.
[Figure 3.7, two panels: No Recall and Recall. X-axis: presentation frequency (0, 2, 5, 8); y-axis: mean estimated frequency (0 to 8); separate lines for associated and non-associated items.]
Figure 3.7. Mean estimated frequencies for associated and non-associated words in the
conditions without (left graph) and with (right graph) a preceding recall test
in Experiment 1. Error bars denote standard errors. The dashed lines
represent the actual frequencies.
In the condition with recall test the data deviate from this pattern in several
respects. First, the estimates given after the recall test are generally lower. The total
means of the estimates in the conditions with and without recall test amount to 3.6 and
4.2, respectively. The effect size of the recall test is r_effect size = 0.18. Furthermore,
the differences between the recall conditions increase with rising presentation
frequencies. Collapsing over associated and non-associated items, the means for the
presentation frequencies 2, 5, and 8 differ between the recall conditions by 0.1, 0.7,
and 1.3. So far, the findings on the effect of the recall test on frequency estimates thus
replicate the results of Erickson and Gaffney (1985).
Moreover, in the condition with recall test the trend of the estimates differs
between associated and non-associated items. As a consequence, the data pattern in
this condition apparently disagrees with the predictions of the PASS model. It is true
that in the condition with recall test, too, associated items receive on average higher
estimates than non-associated ones (M_associated = 3.8; M_non-associated = 3.4). With
r_effect size = 0.18, the effect of associations on the estimates is about as large here
as in the condition without recall test. On the other hand, the estimates on
associated items show a clearly smaller increase from presentation frequency 2
onwards than the judgments on non-associated items. As a result, for the
presentation frequency 8 associated items receive lower estimates than non-associated
ones. Such differences in the trend of judgments for associated and non-associated
words are not expected by PASS, irrespective of the parameter settings.
How well exactly does PASS predict the trend of the estimates in the
four conditions resulting from the combination of the factors ‘association’ and ‘recall
test’? To shed light on this, we determined a fit-measure for each of the individual
conditions. The 20 frequency estimates of each participant were correlated with the
mean predictions of the PASS model for the four presentation frequencies in the
corresponding condition31. Thus, the resulting correlations indicate for each participant
how well the trend in the estimates of this participant is predicted by PASS, including
the entire individual error variance. Table 3.2 displays the correlations in the four
conditions averaged across participants for a simulation with parameter values of
medium magnitude (θ1 = 0.025, window length = 5).
For estimates given without an intermediate recall test, high fit-values arise, with
mean correlations of approximately r = 0.8 for both associated and non-associated
items. A t-test shows that the correlations for associated and non-associated
items do not differ significantly from each other, t(42) = 1.24. Thus, an analysis at the
level of individual participants confirms the result already indicated by the mean
estimates: As predicted by PASS, the estimates on associated and non-associated
words take a similar trend.
31
In other words, the predictions of the PASS model for the individual presentation frequencies each
time resulted from the mean responses over the 24 study lists presented in the corresponding condition.
However, remember that the study lists in the conditions with and without recall test only differed from
each other by randomly assigning items to presentation frequencies. Correspondingly, there are no
systematic differences between the conditions with and without recall test in the PASS predictions.
Thus, it is irrelevant for the resulting fit-measures whether the PASS predictions are based on all 48
associated and 48 non-associated study lists or whether they are – as in the reported analysis –
additionally determined separately according to recall conditions.
Table 3.2
Mean correlations between estimated frequencies and PASS mean responses (a) for
associated and non-associated items in Experiment 1

                  No Recall    Recall
Associated        .77          .71
Non-associated    .81          .81

a Parameter values for the PASS simulation: θ1 = 0.025, θ2 = 0.0001, θ3 = 0.0000001, window length = 5
However, this does not hold for estimates given after a recall test. In this case,
the correlation between the PASS predictions and the frequency estimates is smaller
for associated words than for non-associated ones, t(42) = 2.94, p < 0.005. Hence, after
a recall test, the trends of the estimates on associated and non-associated items differ
from each other. Apparently, PASS can map the trend of estimates on associated
words only comparatively poorly.
Since PASS expects a similar trend of estimates in all conditions irrespective of
the parameter combinations (see Figure 3.2), the fit-values hardly vary with the
parameter settings. For example, in the condition without recall test the mean
correlations for non-associated items lie between r = 0.80 and r = 0.81 for all 60
parameter combinations. As a result, the differences in the fit-values between the
conditions are also nearly invariant across all parameter settings. Thus, independent of
the parameter settings, the recall test affects the estimates on associated words in a
way not predicted by PASS.
The fit-measures reported so far refer to the predicted trend of estimates across
the presentation frequencies. They therefore do not yet take into account the
differences predicted by PASS in the relative magnitude of estimates for associated
and non-associated items. In order to determine the co-variation between the estimates
and the predictions for all eight conditions that result from the combinations of the
factors ‘association’ and ‘presentation frequency’, contrast analyses (Rosenthal,
Rosnow, & Rubin, 2000) were carried out. To obtain a single effect size for the PASS
predictions, it was necessary to treat the data as if a complete between-subjects design
had been employed. The resulting effect sizes can be interpreted as the correlation
between the eight mean predictions of the PASS model and the individual mean
estimates of all participants. That is, the variance between participants enters these
effect sizes as error.
Table 3.3 contains the results of the contrast analyses for the lowest, the
medium and the highest settings for θ1 and the window length in the condition
without preceding recall test. In the condition with recall test the effect sizes are
similar despite the different trend of the estimates for associated and non-associated
words, which was not expected by PASS. This can be ascribed to the fact that
generally lower variances occurred in the condition with recall test32. Thus, irrespective
of the predictions, the effect sizes in this condition are impaired by less error variance
than in the condition without recall test. Therefore, the effect sizes cannot be compared
meaningfully between the two recall conditions. However, the preceding
analysis of the trends of the estimates already demonstrated that in the condition with
recall test differences between associated and non-associated words appeared that
cannot be accounted for by PASS. Consequently, only the total fit in the condition
without recall test is described in detail below.
As can be seen in Table 3.3, high effect sizes of r_effect size = 0.77 result
for the medium parameter combination (θ1 = 0.025, window length = 5) as well as for
the highest parameter combination used (θ1 = 0.05, window length = 7). In other
words, with these parameter settings PASS predicts the total pattern of frequency
judgments across the presentation frequencies and the two levels of association very
well. For the lowest parameter combination (θ1 = 0.005, window length = 3) the effect
size is clearly smaller with r_effect size = 0.46. Recall that the predictions of the various
parameter settings mainly differed in the expected influence of the associations on the
magnitude of estimates. With low parameter settings, that is, with a
comparatively poor encoding of the study lists by FEN, this influence is obviously
overestimated. The predicted differences between the estimates on associated and
non-associated items turn out to be too large here.
32
This particularly holds for the estimates on associated words. Here, the standard deviations between
the participants’ mean judgments on the four levels of presentation frequency were 0.58, 1.54, 1.89 and
2.01 in the condition with recall test, and 0.83, 1.50, 3.70 and 3.61 in the condition without recall test.
Table 3.3
Contrast analyses with predictions from three different settings for θ1 and the window
length for the estimates in the condition without recall test in Experiment 1

Parameter settings (a)               MScontrast    MSerror    dferror    r_effect size
θ1 = 0.005, window length = 3        290.62        5.12       168        .46
θ1 = 0.025, window length = 5        855.19        5.12       168        .77
θ1 = 0.05, window length = 7         884.68        5.12       168        .77

Note. F values can be calculated by dividing MScontrast by MSerror.
a θ2 = 0.0001; θ3 = 0.0000001
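The note to Table 3.3 can be applied directly. The following minimal sketch divides the tabled MScontrast values by MSerror; the variable names are assumptions, and the numerator degrees of freedom are taken to be 1, as is usual for a single contrast.

```python
# Values taken from Table 3.3 (condition without recall test).
MS_ERROR = 5.12
DF_ERROR = 168
ms_contrast = {
    "theta1 = 0.005, window length = 3": 290.62,
    "theta1 = 0.025, window length = 5": 855.19,
    "theta1 = 0.05, window length = 7": 884.68,
}

# F = MScontrast / MSerror, as stated in the table note.
f_values = {setting: ms / MS_ERROR for setting, ms in ms_contrast.items()}

for setting, f in f_values.items():
    # Assumption: one numerator degree of freedom per contrast.
    print(f"{setting}: F(1, {DF_ERROR}) = {f:.2f}")
```

This yields F values of about 56.76, 167.03 and 172.79 for the lowest, medium and highest parameter combination, respectively.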
The development of the effect sizes over the three parameter combinations
contained in Table 3.3 also holds in general: the fit-values clearly increase
from low to medium parameter settings. On the other hand, a further increase of the
parameter settings generally brings about no change in the fit-values. The reason for
this is that the predictions for parameter settings of medium magnitude correlate
strongly with the predictions for high parameter settings. For example, the correlation
of the predictions for the medium parameter combination in Table 3.3 with the
predictions of the highest parameter combination used amounts to r = .98.
Nevertheless, even the predictions for these parameter settings differ in the expected
degree of the impact of the associations on the estimates. This has no effect on the
correlations between the predictions since, for parameter settings in the higher range of
values, the variance of the predicted estimates is mainly due to the variation of the
presentation frequencies (see also chart D in Figure 3.2). Compared to this, the
differences in the predicted distances between estimates on associated and
non-associated words are too small to reduce the correlation substantially.
If one regards the expected influence of the association on the magnitude of the
estimates in isolation, the differences between the predictions of various parameter
combinations are nonetheless considerable even in the higher range of values.
Averaging over presentation frequencies, with the highest parameter combination it is
predicted that the estimates for associated words are 8% above the estimates for
non-associated words. In contrast, with the medium parameter combination a difference
of 23% between the estimates for associated and non-associated words is expected. In
fact, the estimates of the participants differed by 15% between the two levels of
association. Owing to the trade-off between the learning parameter θ1 and the window
length, identical or similar differences between associated and non-associated words
are predicted with several parameter combinations. One of these combinations is θ1 =
0.04 and window length = 5. Figure 3.8 illustrates the correspondence between the
PASS predictions for this parameter combination and the estimates averaged over the
participants in the condition without recall test. In the figure both the estimates and the
responses from FEN are standardized in such a way that the means for associated
words with the presentation frequency 8 assume the value 1. The result is a nearly
perfect fit. The correlation of the participants’ mean estimates with the responses is
r_alert = 0.99 (Rosenthal, Rosnow, & Rubin, 2000).
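The standardization used for Figure 3.8 and the correlation between the eight condition means and the corresponding model responses can be sketched as follows. All numbers are hypothetical placeholders rather than the values plotted in the figure, and the function names are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def standardize(means, reference_key):
    """Divide all condition means by the mean of the reference condition."""
    reference = means[reference_key]
    return {condition: value / reference for condition, value in means.items()}


# Eight conditions: two levels of association x four presentation frequencies.
conditions = [(a, f) for a in ("associated", "non-associated") for f in (0, 2, 5, 8)]
# Hypothetical condition means (placeholders, not the study data).
estimates = dict(zip(conditions, [0.8, 3.1, 5.9, 8.2, 0.5, 2.6, 5.0, 7.1]))
responses = dict(zip(conditions, [0.3, 2.2, 4.4, 6.0, 0.2, 1.9, 3.8, 5.1]))

reference = ("associated", 8)
std_estimates = standardize(estimates, reference)
std_responses = standardize(responses, reference)

r_raw = pearson_r([responses[c] for c in conditions],
                  [estimates[c] for c in conditions])
r_std = pearson_r([std_responses[c] for c in conditions],
                  [std_estimates[c] for c in conditions])
```

Because the standardization only rescales both variables, the correlation is the same before and after it; the rescaling merely places estimates and responses on a common axis for plotting.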
Taken together, the analyses reported so far indicate that PASS is able to
predict the pattern in the estimates in the condition without recall test extraordinarily
well. And yet, in this condition, too, the judgments display a feature that was not
expected by PASS: In contrast to the means, the medians of the estimates for associated
items hardly differ from the medians of the estimates for non-associated items at any
presentation frequency. Figure 3.9 illustrates the distributions of the estimates in both
conditions. The mean estimates per participant collapsed over presentation frequencies
served as the dependent variable here. The distribution of judgments on associated words
is strongly skewed to the right. Accordingly, the median of the judgments is clearly
below the mean here (Md_associated = 3.6). A comparably strong asymmetry in the
distribution of estimates cannot be found for non-associated words. For this type of
item the median is identical with the mean (Md_non-associated = 3.8). As a consequence,
the medians in both conditions do not display an associative effect.33
33
In the condition with recall test, the distributions of the judgments are approximately symmetrical for
both associated and non-associated items, and the medians only differ marginally from the means. For
the mean estimates per participant a median of 3.7 (mean 3.8) results for associated words and a median
of 3.4 (mean 3.4) for non-associated words.
[Figure 3.8: scatter plot with fitted line. X-axis: standardized mean response (0.0 to 1.0); y-axis: standardized mean estimate (0.0 to 1.0).]
Figure 3.8. Standardized mean estimated frequencies in the condition without recall
test plotted against standardized mean responses from FEN in Experiment
1. The mean responses result from a simulation with the parameter settings
θ1 = 0.04, θ2 = 0.0001, θ3 = 0.0000001 and window length = 5. The line
indicates the best linear fit.
According to PASS, differences in the magnitude of the estimates on associated
and non-associated items should appear in the medians as well as in the means.
Irrespective of parameter settings, PASS expects approximately symmetrical
distributions for both associated and non-associated words at all presentation
frequencies. Correspondingly, predicted differences between the mean judgments on
associated and non-associated words are generally accompanied by similar differences
between the median judgments. For example, in the simulation with the parameter
settings θ1 = 0.04 and window length = 5, which predicts the mean estimates of the
participants almost perfectly, the median as well as the mean of all responses for
associated items amounts to 3.2. For non-associated words, too, the mean and the
median are identical at 2.7. The lack of an associative effect in the
medians of the estimates is thus inconsistent with the prediction derived from PASS.
[Figure 3.9: distribution plot. Y-axis: mean estimated frequency (0 to 10); groups: associated, non-associated.]
Figure 3.9. Distributions of the mean estimates on associated and non-associated
words in the condition without recall test in Experiment 1. The dotted
lines indicate means.
Discrimination of Frequencies: To determine the discriminability of the frequencies
of associated and non-associated words, the correlations between the 20 frequency
judgments of each participant and the actual presentation frequencies were calculated.
Figure 3.10 displays the means of these correlations in the conditions without and with
recall test34.
34
The data of those participants that had been excluded from the analysis of the relative magnitude of
frequency judgments are included in the analysis of the discrimination of frequencies. The judgments of
these participants were unusually high but did not differ notably from the values of the remaining
participants with regard to their correlation with the actual frequencies.
For both associated and non-associated words, fairly high correlations result in
both recall conditions. That is, the participants generally show great sensitivity to the
frequencies of the presented items. However, as predicted by the PASS model with all
parameter combinations, the association between the items impairs this sensitivity. In
both recall conditions the correlations are lower for associated words than for
non-associated ones. Without a preceding recall test, this difference is small. For
associated items the mean correlation here amounts to r = 0.77, for non-associated
items to r = 0.80. Such slight differences in the discriminability of associated and
non-associated words were expected by PASS with a multitude of parameter settings (see
Fig. 3.5). The effect size of the association on the correlations in this recall condition
also turns out to be rather small with r_effect size = 0.14. In the condition with a
preceding recall test the difference is notably greater. Here, the participants
discriminate the frequencies of associated words clearly worse, as was already indicated
by the comparatively slight increase of mean estimates across the presentation
frequencies. For these words the mean correlation is r = 0.69. For non-associated words
the mean is again r = 0.80. The effect size of the association in the condition with
recall test amounts to r_effect size = 0.45, a large effect according to Cohen’s
convention.
[Figure 3.10: chart. X-axis: recall condition (No Recall, Recall); y-axis: mean correlation (0.5 to 1.0); separate series for associated and non-associated items.]
Figure 3.10. Mean correlations between estimated and actual presentation frequencies
for associated and non-associated items in Experiment 1. Error bars
denote standard errors.
In the ideal case, the PASS model should of course predict the relative
differences in the discriminability of associated and non-associated words best with the
parameter combination that attained the best fit for the predictions concerning
magnitude and trend of the estimates. Due to the similarity of the predictions of
various parameter settings, an unequivocally best parameter combination cannot be
determined either for the discrimination or for the magnitude of judgments.
However, the previous section showed that medium to high parameter
settings yield the best fit with regard to the differences in the magnitude of estimates
on associated and non-associated items. With parameter settings from this range of
values PASS expects that the correlations between estimated and actual frequencies for
associated items turn out to be smaller by about 1% to 4% than for non-associated
items (see Fig. 3.5). In fact, the correlations for associated and non-associated items in
the condition without recall test differ from each other by 3.8%. In other words, in this
condition similar parameter combinations result in good predictions both with regard
to the magnitude of estimates and with regard to the discrimination. This does not hold
for the condition with a preceding recall test. Here the correlations for associated
words are 13.8% below the correlations for non-associated words. Such large
differences in the discriminability of associated and non-associated items are predicted
by PASS only with low parameter settings. However, with these settings the difference
in the magnitude of estimates on associated and non-associated items is at the same
time clearly overestimated. For example, PASS correctly predicts the difference
in the correlations in the condition with recall test with the parameter combination θ1 =
0.025 and window length = 3. This parameter combination, however, also results in the
prediction that the mean frequency estimates for associated words are 54% above
the frequency judgments for non-associated words.
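The percentages for the experimental correlations can be approximated from the rounded mean correlations reported above (0.77 and 0.80 without recall test, 0.69 and 0.80 with recall test). The following sketch is a simple arithmetic check; the function name is an assumption.

```python
def percent_below(associated, non_associated):
    """How far (in %) the associated value lies below the non-associated one."""
    return (1 - associated / non_associated) * 100


# Rounded mean correlations reported for Experiment 1.
no_recall = percent_below(0.77, 0.80)
recall = percent_below(0.69, 0.80)

print(f"No recall: {no_recall:.2f}%  Recall: {recall:.2f}%")
```

With the rounded inputs the differences are 3.75% and 13.75%, which agrees with the reported 3.8% and 13.8% up to rounding; the reported figures were presumably computed from unrounded correlations.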
A final remark on the discrimination of frequencies and their prediction by
PASS is in order: As we have seen above, the participants generally discriminate fairly
well between different frequencies. Nevertheless, the participants’ discrimination
performances are in all conditions worse than expected by the PASS model. For non-associated
words PASS already predicted with the lowest parameter combination a
mean correlation between estimated and actual frequencies of 0.85. With all other
parameter settings values above 0.90 resulted. Even for associated words,
correlations that were without exception clearly above 0.80 already resulted with
relatively low parameter settings (see Fig. 3.4). In other words, while PASS predicts
the relative difference in the correlations for associated and non-associated words
correctly – at least in the condition without recall test - it overestimates at the same
time the absolute magnitude of the correlations.
Discussion
The main objective of the present study was to clarify whether the PASS model
can explain possible effects of the associations between jointly presented items on
frequency judgments on these items. We examined in this respect both the magnitude
of the estimates and the discrimination of the actual frequencies for lists of associated
and non-associated words. Let us first consider the results in the condition in which no
recall test was carried out between the presentation of study lists and the submitting of
frequency judgments.
Estimates in the Condition without Preceding Recall Test
In this condition the associated words received higher mean frequency
judgments than the non-associated ones. This effect was independent of the
presentation frequency of the items so that the estimates on associated and non-associated
words had a nearly parallel trend. Besides, the participants discriminated
worse between the frequencies of associated words than between the frequencies of
non-associated ones. However, the difference between the mean discrimination
performances in both conditions was modest. In addition to this, by means of the flat
increase in the decision times we could demonstrate that the participants relied on a
memory assessment strategy when they generated their judgments. This confirms the
results of earlier studies, which likewise found no indication of the use of an enumeration
strategy for judgments on the frequency of occurrence of items (Brown,
1995; Dougherty & Franco-Watkins, 2003; Manis et al., 1993; Marx, 1985). Thus, we
can conclude that the effects of associations on frequency judgments are mediated via
the process of memory assessment. Therefore, PASS, originally developed specifically
for the purpose of modeling frequency judgments based on memory assessment, should
be able to explain these effects.
The result pattern observed in the condition without recall test indeed exactly
corresponded to the invariant predictions of the PASS model. The preceding
simulations had shown that independent of possible parameter settings PASS expects
that the frequencies of associated words are estimated higher and are, at the same time,
discriminated worse than the frequencies of non-associated words. The trend of the
estimates should in this process not be influenced by the associations between the
words. Consequently, the parameter-independent, ordinal predictions of the PASS
model are in accordance with the data.
Furthermore, we could demonstrate that with medium to high parameter
settings PASS is able to predict various features of the estimates also quantitatively
very well. Thus, high fit-values resulted for the trend in the estimates of individual
participants both for associated and non-associated words. Huge effect sizes also
resulted for the total fit between the PASS predictions and the participants’ estimates
for associated and non-associated words on the four levels of presentation frequency.
For the mean estimates this fit was even almost perfect. Moreover, those parameter
settings obtaining these high fit-values also predicted the difference in the
discrimination performances with associated and non-associated items approximately
correctly. Thus, the best fit-values were without exception achieved with parameter
settings from the higher range of values which simulated a relatively good encoding of
the study lists. This also means that from the model’s point of view the influence of the
pre-experimentally acquired associations on the frequency judgments was smaller than
it could have been since the participants showed a relatively good learning
performance in the experiment.
To summarize, the analysis of the quality of the predictions of the PASS model
in the condition without recall test so far led to quite impressive results. However, the
participants’ estimates also revealed a feature that deviated systematically from the
predictions of the model: The medians of the estimates on associated and non-associated
items were approximately of the same size. That the medians, unlike the
means, showed no associative effect was due to the right-skewed distribution of the
estimates on associated items. In contrast to this, the responses from FEN were roughly
symmetrically distributed in case of both types of items.
Differences between the distributions of estimates and responses alone are,
however, not yet critical for the assumption that the responses form the basis for
intuitions of event frequencies. As already explained above, memory assessment first
of all results in a non-numerical intuition of frequencies which is modeled in PASS by
FEN’s responses. In a second step, this intuition has to be transformed into a numerical
judgment. In experiments in which, as in the present study, no upper bound of the
judgment range is given, right-skewed distributions of frequency judgments are
frequently observed (Brown, 1995; 1997; 2002). This is mostly explained by the fact
that the participants can deviate arbitrarily far upwards from the actual frequencies
when scaling their absolute estimates due to the lack of an upper limit. Deviations
downwards are, however, limited by the minimal frequency 0. Thus, upwards more
extreme deviations can be expected than downwards (Conrad & Brown, 1994; Brown,
1995).
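The asymmetry argument can be illustrated with a small simulation. The following sketch is not part of the PASS model: it assumes, purely for illustration, that each participant scales a correct intuition with a multiplicative error whose logarithm is symmetric (a lognormal factor). The function name, the spread value, and the sample size are hypothetical choices. Under this assumption estimates can never fall below 0 but can exceed the actual frequency by any amount, so the mean lies above the median.

```python
import random
import statistics


def simulate_estimates(actual_frequency, n_participants=10000, spread=0.6, seed=1):
    """Simulate absolute estimates for one item (illustrative assumption only).

    Each estimate is the actual frequency multiplied by a lognormal scaling
    error. The error is symmetric on the log scale, so downward deviations
    are bounded by 0 while upward deviations are unbounded.
    """
    rng = random.Random(seed)
    return [actual_frequency * rng.lognormvariate(0.0, spread)
            for _ in range(n_participants)]


estimates = simulate_estimates(actual_frequency=5)
mean_estimate = statistics.mean(estimates)
median_estimate = statistics.median(estimates)

# In a right-skewed distribution the mean exceeds the median.
print(f"mean = {mean_estimate:.2f}, median = {median_estimate:.2f}")
```

Other error models with a floor at 0 and no ceiling would produce the same qualitative pattern; the lognormal factor is only one convenient choice.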
Against the background of this consideration it cannot necessarily be expected
that the distributions of the frequency judgments are as symmetrical as the
distributions of the responses from FEN. What could nevertheless be expected
according to the PASS predictions is that the distributions of the estimates for
associated and non-associated items turn out similar. The reason for this is that when
deriving the PASS predictions, we assumed that participants encoding associated
words carry out the transformation into absolute frequencies in the same way as
participants encoding non-associated words. Since, furthermore, the responses from
FEN are distributed equally for both kinds of items, equal distributions in the estimates
should consequently be the result. That also makes PASS predict that differences in the
estimates to associated and non-associated items should be of a similar size in medians
and means.
Why did the data not meet this prediction? In the experiment the participants
were not provided with any information about the frequency of any single item or the
length of the study list. Consequently, when giving their estimates, the participants had
to infer the range of the absolute word frequencies. Generally, only little is known
about how people transform their intuitions of event frequencies into estimates on
absolute frequencies if no indications to an adequate judgment scale are given. The
assumption that the participants carry out these transformations in both associative
conditions in the same way was based on the fact that associated and non-associated
words were presented in the same way and that the judgments on the frequencies of
those words took place with identical instructions and under identical conditions.
Equal transformations in both associative conditions could perhaps develop if
participants used former experiences of which memory responses in comparable
situations corresponded to which absolute frequencies. Actually, the difference in the
means of the frequency judgments between both associative conditions supports the
assumption that in the condition with associated words at least some participants
carried out the transformation in the same way as the participants in the condition with
non-associated words, and thus transformed higher responses into higher estimates.
However, in retrospect, the assumption that all participants behaved like this seems
doubtful. An alternative possibility is that some participants could use information
they had gained independent of the process of memory assessment when transforming
their intuitions into frequency judgments. For example, it is conceivable that
participants happen to know35 the frequency of a single item and can thus find an
adequate judgment range for the remaining items (Brown & Siegler, 1993). A further
possibility is that participants can derive a plausible value for the total number of item
presentations from the time span of the individual presentations and the time span of
the presentation of the total study list. Irrespective of whether and how often
participants in the condition with non-associated words could use such kinds of
information, these participants inferred the absolute frequency of the items
approximately correctly. The mean estimates on non-associated words correspond to
the actual frequencies quite well (see Fig. 3.7). The sum of the 20 estimates given by
each participant amounts in mean and median to 76.1 and corresponds nearly exactly
to the total number of 75 presentations on the study list. A consequence of this is that
an identical transformation of responses in estimates would have led the participants in
35
This could be the case when participants try to count the presentation frequency of at least a single
item during the study phase. It is well known that the frequencies of numerous items presented
experimentally are not all counted successfully (Brown, 2002; cf. footnote 1). Nevertheless, Brown’s
results (1997) support the assumption that accurate tallies of single items can be occasionally used by
participants to infer an adequate frequency range for the remaining items. Another possibility would be
that the participants are able to recollect various encoding episodes for at least one item and, thus, find
the absolute frequency of this item correctly via enumeration.
the condition with associated items to estimates that would have been too high.
Participants in this condition who could correctly infer the frequency range perhaps by
means of their knowing the absolute frequency of a single item should thus behave
differently: They should choose a different transformation than, but the same (correct)
scaling of their estimates as, the participants in the condition with non-associated items.
Thus, despite higher responses, these participants would give equally high estimates as
participants who encoded non-associated items. Such a group of participants could
produce the lower estimates on associated words and could therefore prevent a
difference between estimates on associated and non-associated items from also appearing
in the medians.
The explanation that the different distributions of the estimates for associated
and non-associated items are based on different transformations hinges critically on the
fact that the factor ’association’ was manipulated between-subjects in Experiment 1.
The situation should change provided that participants in a within-subjects design give
judgments on associated and non-associated items that were previously jointly
presented on one study list. In this case, every single participant should perceive the
differences in the intuitions of event frequencies between both types of items predicted
by FEN. Even if participants had additional pieces of information about the absolute
frequency of single items or about the length of the list, then this should merely lead
them to infer a corresponding judgment scale for their absolute estimates. This
judgment scale should then be applied to both associated and non-associated items.
Consequently, differences in the intuitions to both types of items should in this case be
converted into differences in the estimates, irrespective of whether additional
knowledge about an adequate judgment scale is available or not. Hence, an associative
effect occurring in FEN’s responses should in a within-subjects design be found in
both the medians and the means of the frequency estimates. We tested this prediction in
a second experiment in which associated and non-associated items were presented
jointly on one study list to the participants. It should be pointed out that even in a
within-subjects design it is not necessarily to be expected that the distributions of
frequency judgments show the same shape as the distributions of the responses in
FEN. While it is assumed that a given participant carries out the transformations in the
same way for all items to be assessed, different participants could perfectly well use
different transformations (e.g., due to dissimilar knowledge that can be used to develop
a subjectively suitable judgment scale). Since generally in the transformation process
more extreme deviations from the actual frequencies are possible upwards than
downwards, right-skewed distributions of the estimates can furthermore appear even if
the distributions of the responses are symmetrical. Thus, the prediction for the
within-subjects design only states that the distributions of the estimates to associated
and non-associated items should have the same shape as long as FEN’s responses for both
types of items are distributed alike.
Estimates in the Condition with Preceding Recall Test
In the condition with recall test the estimates turned out clearly lower than in
the condition without recall test for both associated and non-associated items. Besides,
the differences in the magnitude of the estimates between the recall conditions
increased with rising presentation frequencies. In this respect, the results of the present
experiment replicate the findings of a previous study by Erickson & Gaffney (1985)
which also examined the influence of a preceding recall test on frequency judgments.
The differences between the estimates in both recall conditions also correspond
to the typical differences between estimates based on memory assessment and those
estimates generated with enumeration. Similar to the estimates in the condition with
recall test, estimates based on enumeration mostly turn out lower than estimates based
on memory assessment and show a smaller slope (Brown, 2002). However, the
decision times contradict the obvious hypothesis that a recall test fosters the use of an
enumeration strategy and thus produces the differences in the magnitude of the
estimates. The preceding recall test did slow the decisions marginally,
but an interaction between the recall conditions and the presentation frequencies was
not found. Accordingly, the sharp increase in decision times with rising presentation
frequencies - typical of judgments based on enumeration - did not appear even after the
recall test. Hence, the decision times support the conclusion that the participants
applied memory assessment in both recall conditions to generate their estimates.
This evidently also means that a model of the memory assessment process
should be able to explain both the estimates in the conditions with and without a recall
test. The differences between the recall conditions in magnitude and slope of estimates
described so far do not yet represent any essential problem. By the additional
assumption that the recall test causes the participants to choose a smaller scaling when
transforming their intuitions into absolute estimates, such differences could be
integrated in the PASS prediction.
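A minimal sketch of this scaling assumption follows. The response values and both scale factors are invented for illustration, and the transformation is simplified to a proportional one (a linear transformation without intercept). It shows that a smaller scaling lowers the estimates and that the gap between the two recall conditions grows with presentation frequency, which is the pattern described above.

```python
def to_estimates(responses, scale):
    """Proportional transformation of non-numerical responses into estimates."""
    return [scale * response for response in responses]


# Hypothetical responses at the presentation frequencies 0, 2, 5, and 8.
frequencies = [0, 2, 5, 8]
responses = [0.05, 0.30, 0.65, 1.00]

# Assumed scale factors: a smaller scaling after the recall test.
without_recall = to_estimates(responses, scale=8.0)
with_recall = to_estimates(responses, scale=6.0)

# The difference between conditions increases with presentation frequency.
differences = [a - b for a, b in zip(without_recall, with_recall)]
for freq, diff in zip(frequencies, differences):
    print(f"frequency {freq}: difference = {diff:.2f}")
```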
Problems for the PASS model do arise, however, from the fact that the
findings concerning the impact of associations on estimates also differed between
the recall conditions. It is true that higher judgments for associated items were found,
as expected, also after a recall test, but the estimates on associated and non-associated
items did not run parallel. For associated words the estimates increased clearly more slowly
from presentation frequency 2 onwards. This trend of judgments on associated words
matched the PASS prediction comparatively poorly. Furthermore, the
participants discriminated the frequencies of associated items clearly worse than those
of non-associated items. Therefore, it was impossible to find parameter combinations
with which PASS predicts both the differences in the discrimination performances and
the differences in the magnitude of the estimates approximately correctly.
From the model’s point of view, these findings are surprising. They indicate
that the recall test specifically affected the estimates on associated items in a way not
covered by the model. Presumably, this influence can only be integrated in PASS with
difficulty. The influence of the recall test cannot take effect during the encoding phase
since the recall conditions in the experiment did not differ until the completion of the
presentation of the study lists. Consequently, it is by no means possible to map the
impact of the recall test in PASS by the assumption of a changed learning parameter or
a changed size of the attentional window. Erickson and Gaffney (1985) assumed that
the recall test could take effect on the level of memory representations. They presumed
that the successful recall of an item alters the memory representation of this item in a
similar way as an additional presentation of this item on the study list. This assumption
about the effect of a recall test on frequency judgments could in fact be easily modeled
with PASS, but it disagrees with the results found. If the successful recall of an item
had a similar effect as an additional presentation, one would rather expect higher than
lower estimates after a preceding recall test. Moreover, in our experiment as well as in
previous studies (e.g. Deese, 1959; Cofer, 1965) words presented more frequently
were recalled with a higher probability. Correspondingly, the estimates in the
condition with recall test should have increased more steeply than in the condition
without test. Finally, associated words were also recalled more often in the
experiment than non-associated ones. Hence, the difference in the magnitude of
estimates on associated and non-associated words would have had to be greater after
the recall test. None of these predictions applied.
Therefore, it has to be assumed that the impact of the recall test takes effect
only at the time when the judgments are given. Presently, PASS does not offer any
possibility to account for such influences. In the model merely one invariant process to
retrieve frequency information from memory is specified. The resulting responses
should always be converted with a linear transformation into absolute frequency
estimates. In situations in which additional influences take effect at the time when
judgments are given the model can thus not be applied. If the findings from the present
experiment were replicable, then a preceding recall test would constitute such a
limitation of applicability of PASS. Therefore, we decided to incorporate the recall
condition in the second experiment as well.
Study 2
For the most part, Experiment 2 represents a replication of the previous study.
The essential modification is that the factor ‘association’ was manipulated within-subjects
in Experiment 2. Associated and non-associated items were presented together
on one study list and every participant judged the presentation frequencies of all items.
This change in the design was mainly motivated by the observation in Experiment 1
that the estimates on associated and non-associated items were distributed unequally.
We assumed that this deviation from the PASS prediction went back to the fact that in
a between-subjects design the intuitions of frequencies of associated and non-associated
items were converted into estimates in a different manner. Provided that this
assumption is correct, in a within-subjects design the distributions of the estimates
should correspond to the PASS predictions. In addition to this, the use of the within-subjects
design allows us to examine whether the effects of the associations on the
magnitude of estimates and the discrimination of frequencies from Experiment 1 also
occur in case of mixed lists.
The frequency estimates in the second experiment were the subject of a new
simulation with PASS. We carried out this simulation in order to check whether the
changed arrangement of the study lists alters the basic predictions of the model
concerning the effect of associations on frequency estimates. The results of this
simulation enable us furthermore to determine again the quantitative fit between the
experimental estimates and the predictions of the model.
In the following paragraph the construction of the study lists and the procedure
of the second experiment are described. Afterwards, we turn to the simulation method
and the simulation results. Finally, the results of the experiment are reported and
compared with the PASS predictions.
Method
Participants: Forty-eight persons took part in the experiment (23 men and 25
women with an average age of 23 years). None of them had taken part in Study 1. The
participants were mainly students of the TU Chemnitz from different departments.
Students of Psychology did not take part in the experiment. The participants received a
reimbursement of 3 Euros.
Material: In the experiment two item sets with 20 words each were used. Both
item sets contained ten words strongly associated with each other and ten words as
weakly associated to all other words in the set as possible. The ten associated items
were taken from the two sets with associated words from Experiment 1. We used the
stimulus words Stuhl (chair) and Fluss (river) and those nine words which were,
according to PASS, most strongly associated with these stimulus words. Both sets
were completed by ten words from the corresponding non-associated sets from
Experiment 1. Here, those words were admitted in the sets which, according to PASS,
showed the weakest association with “Stuhl” or “Fluss”. Table 3.4 shows the two
resulting item sets.
Table 3.4
Word lists used in Study 2. English translations of the original German words are
given in parentheses.

         Associated words         Non-associated words
Set 1    Stuhl (chair)            mutig (courageous)
         Tisch (table)            Sinn (sense)
         Sessel (easy-chair)      Bereich (area)
         sitzen (to sit)          sauer (sour)
         setzen (to sit down)     Gen (gene)
         Hocker (stool)           ragen (to tower)
         Möbel (furniture)        Start (start)
         Sofa (sofa)              urteilen (to judge)
         rutschen (to slide)      Protest (protest)
         Bett (bed)               vernichten (to destroy)

Set 2    Fluss (river)            Münster (German city)
         Wasser (water)           Wand (wall)
         Strom (stream)           Ohr (ear)
         Ufer (bank)              Horn (horn)
         tief (deep)              Herbst (fall)
         breit (broad)            Brief (letter)
         schwimmen (to swim)      leiden (to suffer)
         Fisch (fish)             Sport (sport)
         Brücke (bridge)          Herrschaft (power)
         überqueren (to cross)    Druck (pressure)
To quantify the associative strengths between all words on the item sets, we
used the PASS model again. PASS learned these associations – as far as this did not
already happen within Study 1 – during the simulation of the pre-knowledge required
for the predictions. For each word the mean association to the 19 remaining words
from the same item set was calculated. After that, the mean and the median for the 10
associated words as well as for the 10 non-associated words were calculated from
these means. According to PASS, the means and medians of the associated words are
in both sets about 4 times larger than the means and medians of the non-associated
words.
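The aggregation described in this paragraph can be sketched as follows. The associative strengths are invented, the item set is shortened to four words (in the real item sets each word has 19 remaining words), and the strengths are treated as symmetric for simplicity; none of these values comes from PASS.

```python
from statistics import mean, median


def mean_association(word, item_set, strength):
    """Mean associative strength of `word` to all remaining words in the set."""
    others = [other for other in item_set if other != word]
    return mean([strength[frozenset((word, other))] for other in others])


# Toy item set with invented, symmetric associative strengths.
associated = ["Stuhl", "Tisch"]
non_associated = ["mutig", "Sinn"]
item_set = associated + non_associated
strength = {
    frozenset(("Stuhl", "Tisch")): 0.8,
    frozenset(("Stuhl", "mutig")): 0.1,
    frozenset(("Stuhl", "Sinn")): 0.1,
    frozenset(("Tisch", "mutig")): 0.1,
    frozenset(("Tisch", "Sinn")): 0.1,
    frozenset(("mutig", "Sinn")): 0.1,
}

# Step 1: one mean association per word.
per_word = {word: mean_association(word, item_set, strength) for word in item_set}

# Step 2: mean and median of these per-word means within each word type.
assoc_mean = mean([per_word[w] for w in associated])
assoc_median = median([per_word[w] for w in associated])
non_mean = mean([per_word[w] for w in non_associated])
non_median = median([per_word[w] for w in non_associated])

ratio_of_means = assoc_mean / non_mean
print(f"ratio of means = {ratio_of_means:.2f}")
```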
For each participant a unique study list was created from one of the item sets.
Each study list presented five words from the corresponding item set at each of the
presentation frequencies 0, 2, 5, and 8. For half of the participants three associated and two non-associated words each
were assigned to the presentation frequencies 0 and 5, while the presentation
frequencies 2 and 8 were filled with two associated and three non-associated words
each. For the other half of the participants the distribution of associated and non-associated
items to the presentation frequencies was reversed. With the exception of
these restrictions, associated and non-associated words were randomly assigned to the
presentation frequencies for every participant. The sequence of items was also
randomly determined for every participant, with the restriction that the same word
could not appear twice in immediate succession. Thus, associated and non-associated
words appeared randomly mixed on the study lists. The same held for the test lists that
contained the 20 words of the corresponding item set once again in random sequence.
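The list construction can be sketched in a few lines. This is an illustrative reconstruction, not the program used in the experiment: the function name is hypothetical, and the constraint against immediate repetitions is enforced here by simple rejection sampling (reshuffling until no word occurs twice in a row), a technique the text does not specify. With five words at each of the frequencies 0, 2, 5, and 8, every list contains 5 × (0 + 2 + 5 + 8) = 75 presentations.

```python
import random
from collections import Counter

FREQUENCIES = [0, 2, 5, 8]
WORDS_PER_FREQUENCY = 5


def build_study_list(associated, non_associated, first_half, rng):
    """Return (assignment, sequence) for one participant.

    first_half=True:  3 associated + 2 non-associated words at frequencies 0 and 5,
                      2 associated + 3 non-associated words at frequencies 2 and 8.
    first_half=False: the reversed distribution.
    """
    if first_half:
        n_assoc = {0: 3, 2: 2, 5: 3, 8: 2}
    else:
        n_assoc = {0: 2, 2: 3, 5: 2, 8: 3}

    # Random assignment of words to frequency levels within these restrictions.
    assoc_pool = rng.sample(associated, len(associated))
    non_pool = rng.sample(non_associated, len(non_associated))
    assignment = {}
    for freq in FREQUENCIES:
        for _ in range(n_assoc[freq]):
            assignment[assoc_pool.pop()] = freq
        for _ in range(WORDS_PER_FREQUENCY - n_assoc[freq]):
            assignment[non_pool.pop()] = freq

    # Rejection sampling (assumed method): reshuffle until no immediate repetition.
    sequence = [word for word, freq in assignment.items() for _ in range(freq)]
    while True:
        rng.shuffle(sequence)
        if all(a != b for a, b in zip(sequence, sequence[1:])):
            return assignment, sequence


# Item set 1 from Table 3.4.
associated = ["Stuhl", "Tisch", "Sessel", "sitzen", "setzen",
              "Hocker", "Möbel", "Sofa", "rutschen", "Bett"]
non_associated = ["mutig", "Sinn", "Bereich", "sauer", "Gen",
                  "ragen", "Start", "urteilen", "Protest", "vernichten"]

assignment, sequence = build_study_list(
    associated, non_associated, first_half=True, rng=random.Random(2004))
words_per_level = Counter(assignment.values())
print(len(sequence), dict(words_per_level))
```

Rejection sampling draws uniformly from all admissible orders; on average a few hundred shuffles are needed for a list of this composition, which is negligible in practice.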
Procedure: The procedure was identical to the one used in Experiment 1.
Simulation Method and Results
The simulations with PASS followed the same procedure as in Study 1. First of
all, PASS learned the additionally required associative pre-knowledge from the text
corpus. As before, the parameter settings were θ1 = 0.005, θ2 = 0.0001 and θ3 =
0.0000001. Also the window length of 12 words was retained. Subsequently, the
presentation of the study lists was modeled. Here PASS encoded the same 48 study
lists that were presented to the participants in the experiment. This learning trial was
run with the 60 different parameter combinations also used in Study 1.
Relative magnitude of responses: The simulation results to both item sets did not
differ systematically and are thus summarized below. Figure 3.11 displays a typical
PASS prediction about the relative magnitude of frequency judgments for associated
and non-associated items. The predictions come from a simulation with parameter
settings of medium magnitude for θ1 and the window length. The interference and
decay parameter were θ2 = 0.0001 and θ3 = 0.0000001, respectively. In order to
determine the mapped data points, first, within each study list the means of FEN’s
responses to associated and non-associated words were calculated on a given
frequency level. Afterwards, the unweighted means across the 48 study lists were
computed from these means. Finally, these means were standardized via a division by
the mean for associated words with the frequency 8.
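The three aggregation steps just described can be sketched as follows. The data layout, the function name, and the response values are hypothetical; only the order of operations (cell means within each list, unweighted means across lists, division by the mean for associated words with frequency 8) follows the text.

```python
from statistics import mean


def standardized_means(study_lists):
    """Aggregate responses as described in the text.

    study_lists: one list per study list, each containing tuples
    (is_associated, presentation_frequency, response).
    """
    # Step 1: within each study list, mean response per (association, frequency) cell.
    cell_means_per_list = {}
    for items in study_lists:
        cells = {}
        for is_associated, freq, response in items:
            cells.setdefault((is_associated, freq), []).append(response)
        for key, values in cells.items():
            cell_means_per_list.setdefault(key, []).append(mean(values))

    # Step 2: unweighted mean of the list-wise cell means across study lists.
    grand_means = {key: mean(values) for key, values in cell_means_per_list.items()}

    # Step 3: standardize by the mean for associated words with frequency 8.
    reference = grand_means[(True, 8)]
    return {key: value / reference for key, value in grand_means.items()}


# Two toy study lists with invented responses.
toy_lists = [
    [(True, 8, 1.0), (True, 8, 1.2), (False, 8, 0.9), (True, 2, 0.4), (False, 2, 0.3)],
    [(True, 8, 0.9), (False, 8, 0.8), (False, 8, 0.6), (True, 2, 0.5), (False, 2, 0.2)],
]
result = standardized_means(toy_lists)
for key in sorted(result):
    print(key, round(result[key], 3))
```

Because the means across lists are unweighted, a list contributes equally to a cell regardless of how many of its words fall into that cell. In the toy data the non-associated cell at frequency 8 has one word in the first list and two in the second, yet both lists count the same.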
[Line plot. Y-axis: Standardized Mean Response (0.0 to 1.2); x-axis: Presentation Frequency (0, 2, 5, 8); series: associated, non-associated.]
Figure 3.11. FEN’s standardized mean responses for the parameter settings θ1 = 0.03,
θ2 = 0.0001, θ3 = 0.0000001 and window length = 5 in Study 2.
As illustrated in the figure, the essential features of the PASS prediction from
Study 1 are preserved even if associated and non-associated words are presented
together on one study list. PASS is sensitive to the presentation frequencies of the
items. It expects higher estimates for associated than for non-associated words. The
size of this effect is not influenced by the frequency of the words so that the predicted
estimates on associated and non-associated words run parallel. Changes in the
parameter settings also take a similar effect on the PASS predictions as in Study 1. The
estimates on associated and non-associated words show a similar trend for all
parameter combinations. However, the size of the expected effect of the associations
on the magnitude of the estimates depends on the settings for θ1 and the window
length. With an improved encoding of the study lists – i.e. a higher θ1 and a greater
window length – the impact of the associations on the judgments becomes smaller.
Moreover, if forgetting processes occur, greater differences in the estimates between
associated and non-associated items generally come about than if this is not the case.
The extent of the influence of the parameters θ2 and θ3 regulating these forgetting
processes, however, was too minor again to be of practical relevance.
Figure 3.12 illustrates in which range the expected associative effect varies
depending on the parameter settings. The dependent variable is once more the ratio of
the mean responses to associated and non-associated items on each frequency level.
For all parameter settings and all presentation frequencies values greater than 1 are the
result. In other words, irrespective of the parameter settings, PASS also expects for a
mixed list of associated and non-associated words higher judgments for the associated
items. The only systematic difference to the predictions in Study 1 is that, given equal
parameter settings, the associative effect is somewhat smaller. For
example, PASS here expects with a parameter combination of medium magnitude (θ1
= 0.025, window length = 5) that the judgments for associated words averaged over
frequencies are 19% higher than the judgments for non-associated words. In Study
1, on the other hand, the predicted estimates for associated and non-associated words
in the same parameter combination differed by 23%. This difference in the expected
associative effect is due to the fact that the number of jointly presented associated words
on the mixed lists is smaller than in Study 1. Consequently, here as well the mean
associative strength of an associated item to all remaining words on the study list is
smaller. Accordingly, FEN’s response to such a word is influenced less strongly by the
remaining words.
[Plot. Y-axis: Ratios of Mean Responses (0 to 4); x-axis: Presentation Frequency (0, 2, 5, 8).]
Figure 3.12. Distributions of ratios of mean responses for associated and non-associated
items in Study 2.
Discrimination of Frequencies: Similar to the relative magnitude of estimates, the
model basically makes the same predictions for the discriminability of the frequencies
of associated and non-associated items as in Study 1. To determine the expected
quality of the discrimination on the two levels of association, the correlations between
FEN’s responses and the actual frequencies of the words were calculated on each of
the 48 study lists. The correlations averaged over the study lists varied for the
associated words between 0.47 and 0.98 depending on the parameter settings. For non-associated
words the correlations varied between 0.88 and 0.99. Generally, the
correlations increased with higher settings for θ1 and the window length36. Thus, a
better encoding of the study lists again led to a better discrimination of the presentation
frequencies. With a given parameter combination the correlations for associated items
were constantly lower than for non-associated items. In other words, irrespective of the
parameter settings, PASS expects that the frequencies of associated items are
discriminated worse than the frequencies of non-associated items. Figure 3.13
illustrates this situation by means of the ratios of the mean correlations for associated
and non-associated words. However, the figure also demonstrates that for the great
majority of parameter combinations again only slight differences are yielded between
the correlations in both associative conditions. Thus, with parameter settings of
medium magnitude PASS already expects that the correlations for associated words
only differ by approximately 3% from the correlations for non-associated words. With
a further improved encoding of the study lists the correlations also assimilate further.
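The discrimination measure used here can be sketched as follows: a product-moment correlation between responses and actual frequencies is computed on each study list, the correlations are averaged over lists, and the ratio of the two averages is formed. The response values are invented and the helper names are hypothetical; the text does not state which correlation coefficient was used, so Pearson's r is an assumption.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_correlation(study_lists):
    """Average the list-wise correlations between frequencies and responses."""
    return mean([pearson(freqs, resp) for freqs, resp in study_lists])


frequencies = [0, 2, 5, 8]

# Invented responses on two study lists per word type.
associated_lists = [
    (frequencies, [0.30, 0.30, 0.60, 0.80]),
    (frequencies, [0.25, 0.45, 0.55, 0.85]),
]
non_associated_lists = [
    (frequencies, [0.00, 0.20, 0.50, 0.80]),
    (frequencies, [0.05, 0.20, 0.55, 0.80]),
]

r_associated = mean_correlation(associated_lists)
r_non_associated = mean_correlation(non_associated_lists)

# A ratio below 1 means that associated words are discriminated worse.
ratio = r_associated / r_non_associated
print(f"ratio of mean correlations = {ratio:.3f}")
```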
The effect is almost of exactly the same size as in Study 1 when medium or
high parameter combinations are used. However, with low parameter settings a
tendency becomes apparent that the difference between the correlations for associated
and non-associated words is slightly smaller here. This again can be put down to the
fact that the mean associative strength of an associated item to the remaining items on
the study list is smaller here than in the between-subjects design from Study 1. As a
result, the discrimination of frequencies of associated items is less strongly impaired
by the associative connection between the words. Due to this, slightly higher
correlations between FEN’s responses and the actual presentation frequencies of
associated words result here, and the difference to the correlations for non-associated
words is somewhat smaller. However, with medium and high parameter settings PASS
already expected in Study 1 that even the frequencies of associated words are
discriminated extremely well (see Fig. 3.4). Thus, the discrimination performance can
hardly be improved by reducing the mean associative strength between the words.
36 The influence of the variation of the parameters θ2 and θ3 on the correlations was again very small.
However, the tendency already observed in Study 1, according to which forgetting processes
went along with smaller differences in the discriminability of associated and non-associated items,
appeared again.
Consequently, with these parameter settings the difference between the correlations for
associated and non-associated items is practically of the same size in both studies.
Figure 3.13. Distribution of ratios of mean correlations between FEN’s responses and
presentation frequencies for associated and non-associated items in Study 2.
(Axis: ratio of mean correlations, 0.5 to 1.0.)
All in all, as in Study 1 PASS expects that associated words are discriminated
worse than non-associated ones. Provided that the quality of the encoding of the study lists
by the participants does not clearly deteriorate compared to Study 1, the difference in
the discrimination performances should, however, be small again.
Experimental Results
Decision Times
For every participant the means and medians of the decision times were
calculated for associated and non-associated items on each frequency level. The same
measures were also determined for the initiation, entry and total times (the sum of the
three timings). Below, only the results of the analysis of the mean decision times are
reported in detail. For the remaining time measures and the medians, results similar to those
in Experiment 1 were found. For all time measures the medians were slightly smaller
than the means (about 0.1 s) but showed the same effects. The mean initiation time
was 1.2 s, the mean entry time 1.1 s. Both time measures slightly increased across the
range of the presentation frequencies (again by about 0.3 s). An analysis of the total
times yields the same pattern of results as the analysis of the decision times.
Mean decision time is plotted against presentation frequency for the
participants with and without a preceding recall test in Figure 3.14.
Figure 3.14. Mean decision time as a function of presentation frequency for groups
with and without a preceding recall test in Experiment 2. Error bars
denote standard errors. (Axes: presentation frequency, 0 to 8; mean decision
time in s. Series: Recall, No Recall.)
As in Experiment 1, the participants in the condition with recall test took more
time to generate their frequency judgments than the participants in the condition
without recall test. Averaged over presentation frequencies, the decision times between
both recall conditions differed by 0.5 s. The effect size of the recall test was rather
small, with r_effect size = .13. Generally, the participants responded somewhat more slowly than
in Experiment 1 (cf. Fig. 3.6). However, in both recall conditions again only a
comparatively small increase in the decision times resulted across the range of
presentation frequencies. Thus, the difference between the mean decision times for the
presentation frequencies 0 and 8 amounted to 1.3 s in the condition with recall test and
1.6 s in the condition without recall test, similar to the results in Experiment 1. Hence,
the data suggest again that the participants in both recall conditions employed a
memory assessment strategy to generate their frequency estimates.
An ANOVA with the within-subjects factors ‘presentation frequency’ and
‘association’ and the between-subjects factor ‘recall test’ reveals that, apart from the
presentation frequency (η = 0.57, F(3, 138) = 22.24, p < 0.001), the association
between the items (η = 0.37, F(1, 46) = 7.36, p < 0.01) also exerted a significant influence
on the decision times. The main effect of the recall test, on the other hand, is not
significant, nor are any of the interactions (all Fs < 1.1 and all ηs < 0.13). The main effect of
the factor ‘association’ is due to the fact that the participants generated their estimates
for associated items (M = 3.7 s) slightly more slowly than the estimates for non-associated
items (M = 3.4 s). However, it should be noted that the effect size of the association,
r_effect size = .09, is even lower than that of the recall test if one treats the association in
the calculation as a between-subjects factor (for reasons to use between-subjects effect
sizes in repeated measurement designs, see Rosenthal, Rosnow, & Rubin, 2000;
Dunlap et al., 1996).
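The coefficients reported next to the F values in this ANOVA can be recovered from the
test statistics themselves. The following is a minimal sketch, assuming that the reported
coefficient is η, the square root of η² = (F · df_effect) / (F · df_effect + df_error). The
function name eta_from_f is introduced here for illustration only and is not part of the
original analysis.

```python
import math


def eta_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Return eta (square root of eta squared) implied by an F statistic.

    Assumption: eta^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    explained = f_value * df_effect
    return math.sqrt(explained / (explained + df_error))


# F values and degrees of freedom taken from the ANOVA on decision times.
print(round(eta_from_f(22.24, 3, 138), 2))  # main effect of presentation frequency
print(round(eta_from_f(7.36, 1, 46), 2))    # main effect of association
```

Under this assumption both calls reproduce the coefficients given for the two significant
main effects (0.57 and 0.37).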
Frequency Estimates
Relative Magnitude of Frequency Estimates: The estimates of every
participant on associated and non-associated words on each frequency level were
averaged separately. These individual means formed the basis of the further analysis.
We excluded five participants from this analysis (three in the condition with and two in
the condition without recall test) since they gave inordinately high estimates.37
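The exclusion criterion (spelled out in footnote 37: at least two mean estimates more than
1.5 interquartile ranges above the 75th percentile of the reference group) can be sketched
as follows. This is an illustration under assumptions, not the procedure actually used:
the quartile definition (Python's default 'exclusive' method), the function names and the
example values are all hypothetical.

```python
from statistics import quantiles


def upper_fence(reference_values):
    """75th percentile plus 1.5 interquartile ranges of the reference group."""
    q1, _, q3 = quantiles(reference_values, n=4)  # default method: 'exclusive'
    return q3 + 1.5 * (q3 - q1)


def should_exclude(participant_means, reference_groups, min_outliers=2):
    """Flag a participant whose mean estimate exceeds the fence in at least
    `min_outliers` cells (footnote 37 names two of six cells)."""
    n_outliers = sum(
        mean > upper_fence(group)
        for mean, group in zip(participant_means, reference_groups)
    )
    return n_outliers >= min_outliers


# Hypothetical mean estimates of a reference group in one cell (not real data).
group = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(upper_fence(group))
```

Other quartile definitions shift the fence slightly, so borderline cases may be classified
differently depending on the software used.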
In Figure 3.15 the unweighted means of the frequency judgments of the
remaining participants are plotted against presentation frequencies. As can be seen in
the figure, the findings on the magnitude of the mean frequency judgments from
37 At least two of the six mean estimates of these participants for associated and non-associated items
with the presentation frequencies 2, 5 and 8 were more than 1.5 interquartile ranges above the 75th
percentile of the corresponding reference group. Four of the five excluded participants produced outliers
both with associated and non-associated items. Generally, the difference between the estimates for
associated and non-associated words was clearly greater for the excluded participants than for the
remaining ones. However, the estimates of the excluded participants were also on average three times as
large as the judgments of the remaining participants.
Experiment 1 are largely replicated. In the condition without recall test the results thus
comply again with the invariant predictions of the PASS model: The judgments on
associated and non-associated items show an approximately parallel trend and the
associated items consistently receive higher estimates. This associative effect is
notably greater here than in Experiment 1. Collapsing over presentation frequencies,
means of 4.7 and 3.7 result for associated and non-associated words, respectively.
Following the suggestions of Rosenthal et al. (2000), we computed the effect size of
the association as if a between-subjects design had been employed, which resulted in
r_effect size = 0.28.
Figure 3.15. Mean estimated frequencies for associated and non-associated words in
the conditions without (left graph) and with (right graph) a preceding
recall test in Experiment 2. Error bars denote standard errors. The dashed
lines represent the actual frequencies. (Panels: No Recall, Recall. Axes: presentation
frequency, 0 to 8; mean estimated frequency, 0 to 8.)
In the condition without recall test in Experiment 1 the means of the estimates
on associated and non-associated items differed but, on the other hand, the medians of
the estimates turned out roughly identical, contrary to the PASS prediction. We
presumed that this occurred because participants giving judgments on associated items
transformed their intuitions of event frequencies – at least partly – differently into
absolute judgments than participants whose judgments referred to non-associated
items. Furthermore, we assumed that associated and non-associated items should be
treated alike in the transformation process by any given participant, provided that the
words are both presented on the same study list in a within-subjects design and that the
participants make judgments on both kinds of items. The results of the second
experiment confirm this consideration. Figure 3.16 displays the distributions of the
mean estimates to associated and non-associated items collapsed over presentation
frequencies in the condition without recall test. Both distributions show a similar form
and are somewhat skewed to the right. The medians (4.3 for associated and 3.2 for
non-associated words) are accordingly below the means. However, the difference in
the magnitude of the estimates can be seen in the medians as well as in the means as
was expected by PASS.
Figure 3.16. Distributions of the mean estimates on associated and non-associated
words in the condition without recall test in Experiment 2. The dotted
lines indicate means. (Axis: mean estimated frequency, 0 to 10. Groups: associated,
non-associated.)
The results in the condition with recall test differ from those in the condition
without recall test in a similar way as they did in Experiment 1 (see Figs. 3.15 and 3.7).
Again, lower estimates are given after the recall test. The total means with and without
recall test amount to 3.3 and 4.2, respectively. The effect size of the recall test is
r_effect size = 0.36. The differences between the recall conditions rise slightly with
increasing presentation frequencies. Collapsing over associated and non-associated
items, the mean estimates differ between the recall conditions by 1.0, 1.2 and 1.4 for
the presentation frequencies 2, 5 and 8. Moreover, in contrast to the PASS prediction,
the trends of the mean estimates on associated and non-associated items differ after the
recall test. As in Experiment 1, from presentation frequency 2 onwards the estimates
on associated words show a clearly smaller increase than the judgments on
non-associated words. Nevertheless, associated items receive on average somewhat higher
estimates (M_associated = 3.5; M_non-associated = 3.1).38 The effect size of the association,
r_effect size = 0.19, is, however, notably smaller than in the condition without recall
test (again the effect size was calculated as if a between-subjects design had been
employed).
An analysis of the fit between the participants’ individual estimates and the
PASS predictions in both associative conditions likewise shows that the estimates for
associated and non-associated items follow different trends after the recall test. To
determine the fit, in both associative conditions the mean predictions of the PASS
model for the four presentation frequencies were correlated with the 10 corresponding
estimates of every participant, similar to the procedure in Study 1.39 Table 3.5 shows
these correlations averaged across participants for the simulation with the parameter
settings θ1 = 0.025 and window length = 5. In the condition without recall test high
and identical fit values result for associated and non-associated items. In the condition
with recall test, however, the fit values for associated and non-associated items differ.
The mean correlation between the PASS predictions and the judgments is lower with
associated words, t(20) = 1.83; p = 0.08. Since PASS expects here, as in Study 1, with all
parameter combinations a similar trend of estimates for both associated and
non-associated items, this result is independent of the parameter settings. Consequently, the
38 The distributions of the estimates in the condition with recall test are again slightly skewed to the right
for both associated and non-associated words. The medians are accordingly only slightly below the
means (Md_associated = 3.3; Md_non-associated = 3.0).
39 The mean predictions of the PASS model were determined separately for the 24 study lists from the
conditions with and without recall test. However, the predictions of the model again turn out almost
identical in both recall conditions. Consequently, the fit hardly changes if the predictions are in both
conditions based upon all 48 study lists.
data here suggest again that the recall test exerts an influence on the trend of the
judgments on associated items that is not covered by PASS.
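The fit measure described above (correlating the model's mean prediction for each
presentation frequency with every single estimate of a participant, then averaging across
participants) can be sketched as follows. This is a minimal illustration, not the original
analysis code: the function names are hypothetical, the numbers are invented for
demonstration, and a simple arithmetic mean of the correlations is assumed.

```python
import math


def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def participant_fit(predictions, judgments):
    """Correlate model predictions with one participant's estimates.

    predictions: dict mapping presentation frequency -> mean model response
    judgments:   list of (presentation frequency, estimate) pairs
    """
    predicted = [predictions[freq] for freq, _ in judgments]
    estimated = [estimate for _, estimate in judgments]
    return pearson(predicted, estimated)


def mean_fit(predictions, all_judgments):
    """Average the per-participant correlations (assumed: arithmetic mean)."""
    fits = [participant_fit(predictions, j) for j in all_judgments]
    return sum(fits) / len(fits)


# Hypothetical values for illustration only (not data from the experiment).
predictions = {0: 0.4, 2: 2.1, 5: 4.0, 8: 5.6}
participant = [(0, 0), (0, 1), (2, 2), (2, 3), (2, 2),
               (5, 4), (5, 6), (5, 5), (8, 7), (8, 6)]
print(round(participant_fit(predictions, participant), 2))
```

Because every estimate enters the correlation individually, the measure is lowered both
by a deviating trend and by the variability of a participant's estimates within one
frequency level.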
Table 3.5
Mean correlations between estimated frequencies and PASS mean responses^a for
associated and non-associated items in Experiment 2

                  No Recall   Recall
Associated           .81        .76
Non-associated       .81        .83

^a Parameter values for the PASS simulation: θ1 = 0.025, θ2 = 0.0001, θ3 = 0.0000001, window length = 5
The correlations between the predicted and actual trend of the estimates in both
associative conditions given in Table 3.5 obviously do not yet take into account the
different magnitude of the estimates expected by PASS for associated and
non-associated items. That is why we determined a second fit measure including the factor
‘association’ in addition to the factor ‘presentation frequency’. In the present
experiment both factors were manipulated within subjects, unlike in Study 1. This
allows us to determine the correspondence between predictions and judgments
individually for every participant. For this purpose, we computed the correlation
between the eight mean predictions of the PASS model for associated and
non-associated items and the 20 judgments of every participant. Table 3.6 shows the
resulting mean correlations for three different parameter combinations in the condition
without recall test. We confine ourselves to this condition since the preceding analysis
of the trend of the estimates already demonstrated that systematic deviations from the
PASS prediction occur in the condition with recall test.40
For the parameter combination from the medium range of values (θ1 = 0.015,
window length = 5) as well as for the highest parameter combination used (θ1 = 0.05,
40 Similar to the situation with the contrast analysis in Study 1, the mean correlations reported
here also do not permit a direct comparison between the recall conditions. This is due to the fact that not
only the variances between the participants but also the variances between the estimates of one
participant relevant here turn out to be smaller after a preceding recall test. Thus, for instance, the mean
standard deviation for non-associated items with the presentation frequency 8 amounts to 1.2 in the
condition with recall test and to 2.0 in the condition without test. Irrespective of the predictions, the
correlations in the condition with recall test are thus less impaired by error variance. As a result,
the mean correlations in this condition are – despite the different trend of the estimates for
associated and non-associated items – only slightly smaller than in the condition without recall test for
all parameter combinations. They reach a maximum value of r = .77.
window length = 7) high mean correlations are the result. Particularly noteworthy is
that the correlations including both associated and non-associated items reach, with
r = .79, nearly the same magnitude as the single correlations in both associative
conditions (cf. Tab. 3.5). This indicates that with these parameter
combinations PASS also predicts the relative distances between the estimates on associated and
non-associated items very well. On the other hand, this does not hold for the lowest
parameter combination (θ1 = 0.005, window length = 3), for which the mean correlation
is clearly smaller, with r = .59. Evidently, with this parameter setting – i.e. in case of a
simulated worse encoding of the study lists – the influence of the associations on the
magnitude of the frequency judgments is overestimated, as was already the case in
Study 1.
Table 3.6
Mean correlations between estimated frequencies and PASS mean responses^a in
Experiment 2

Parameter settings^a                 r
θ1 = 0.005, window length = 3       .59
θ1 = 0.015, window length = 5       .79
θ1 = 0.050, window length = 7       .79

^a θ2 = 0.0001; θ3 = 0.0000001
The change in the correlations between estimates and PASS predictions from
low via medium to high parameter settings also corresponds in general to the pattern
shown in Table 3.6. The correlations increase from low to medium parameter
settings. As the associative effect is greater here than in Experiment 1, the
maximum correspondence between the predictions and the estimates is achieved with
somewhat lower parameter combinations than in the previous study. From these
medium parameter settings to high settings no further change of the fit occurs. The
reason for this is once more that the predictions for parameter combinations from the
medium to the high range of values strongly correlate with each other. Nevertheless, as
in Study 1 substantial differences between these predictions do exist if one considers
the expected size of the associative effect in isolation. Thus, averaging over
presentation frequencies, with the highest parameter setting it is expected that the
estimates for associated items are 6 % above the estimates for non-associated items.
With parameter combinations from the medium range of values, on the other hand,
differences of up to 30 % between the estimates on associated and non-associated
items are predicted. In fact, the participants’ estimates differed in the condition without
recall test by 27 % between the associative conditions. One of the parameter
combinations with which a difference of similar size is expected is θ1 = 0.02 and
window length = 5. Figure 3.17 illustrates the correspondence between the PASS
predictions for this parameter setting and the participants’ mean estimates. In the
figure both the estimates and the responses from FEN are standardized by dividing them by
the means for associated words with the presentation frequency 8. A nearly perfect fit
is the result. The correlation between the participants’ mean estimates and the
responses amounts to r_alert = 0.98 (Rosenthal, Rosnow, & Rubin, 2000).
Figure 3.17. Standardized mean estimated frequencies in the condition without recall
test plotted against standardized mean responses from FEN in Experiment
2. The mean responses result from a simulation with the parameter
settings θ1 = 0.02, θ2 = 0.0001, θ3 = 0.0000001 and window length = 5.
The line indicates the best linear fit. (Axes: standardized mean response, 0.0 to 1.0;
standardized mean estimate, 0.0 to 1.0.)
Discrimination of Frequencies: To measure the discrimination performances
in both associative conditions, we computed the correlations between estimated and
actual frequencies for associated and non-associated items for every participant. Since
unusually low correlations appeared for three participants, these were excluded from the
further analysis.41 Figure 3.18 displays the mean correlations of the remaining
participants in both recall conditions. As illustrated, the results from Experiment 1 are
replicated (cf. Fig. 3.10). In both recall conditions lower correlations occur with
associated items than with non-associated ones. In the condition without recall test,
however, this difference turns out to be small again. The means here are r = .77 for
associated words and r = .80 for non-associated ones. The effect size is r_effect size = .14.
In the condition with recall test the effect of the association is notably greater, with
r_effect size = .29 (again both effect sizes were calculated as if a between-subjects design
had been employed). This is due to the worse discrimination of the frequencies of
associated items in this condition (r = .71). The discrimination of non-associated
words, however, is not impaired by the recall test (r = .80).
With which parameter settings does PASS expect differences of the observed
size in the discrimination performances? For the condition without recall test these are
– as in Study 1 – the same parameter settings with which the differences in the
magnitude of estimates were also predicted correctly. Thus, PASS expects with the
parameter combination θ1 = 0.02 and window length = 5, which predicts the
magnitude of estimates exceedingly well (see Fig. 3.17), that the correlations between
estimated and actual frequencies for associated items are 3.7 % lower than for
non-associated items. Actually, the correlations differ by 3.8 % in the condition
without recall test. For the condition with recall test, in contrast, no parameter settings
can be found which predict the differences in the magnitude of the estimates and the
discrimination performances equally well. Thus, with the parameter combination θ1 =
41 Two of these participants came from the condition without and one from the condition with recall test.
For all three participants the correlations for non-associated items were more than 1.5 interquartile
ranges below the 25th percentile of the corresponding reference group. However, for
associated items, too, the discrimination performances of the participants concerned were far below average.
Thus, the exclusion of these participants does not substantially influence the reported differences between the
discrimination performances for associated and non-associated items.
0.025 and window length = 3, for example, PASS can predict the difference of 11.3 %
in the discrimination performances between associated and non-associated words
approximately correctly. However, this parameter combination also leads to the
expectation that associated items receive estimates 47 % higher than those for
non-associated ones. The effect of the association on the magnitude of the estimates is
therefore clearly overestimated.
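The percentage differences compared with the PASS predictions here can be checked
against the reported means. The sketch below assumes that each difference is expressed
relative to the value for non-associated items. This reading is an assumption, and the
function names are introduced for illustration only.

```python
def percent_lower(associated: float, non_associated: float) -> float:
    """Percentage by which the associated value lies below the non-associated one."""
    return (non_associated - associated) / non_associated * 100


def percent_higher(associated: float, non_associated: float) -> float:
    """Percentage by which the associated value lies above the non-associated one."""
    return (associated - non_associated) / non_associated * 100


# Mean correlations between estimated and actual frequencies (Experiment 2).
print(percent_lower(0.77, 0.80))  # condition without recall test
print(percent_lower(0.71, 0.80))  # condition with recall test

# Mean estimates for associated and non-associated words, no recall test.
print(percent_higher(4.7, 3.7))
```

With the rounded means this yields about 3.75 %, 11.25 % and 27 %, in line with the
reported 3.8 %, 11.3 % and 27 %. Small deviations are to be expected because the
published means are themselves rounded.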
Figure 3.18. Mean correlations between estimated and actual presentation frequencies
for associated and non-associated items in Experiment 2. Error bars
denote standard errors. (Axes: recall condition, No Recall and Recall; mean
correlation, 0.5 to 1.0. Series: associated, non-associated.)
Finally, the results of the second experiment again show that the participants
discriminate fairly well between different frequencies. On the other hand, their
discrimination performances nevertheless fall short of the predictions of the PASS
model. For non-associated items PASS expected correlations above r = .87 with all
parameter combinations. For associated items correlations above r = .90 were expected
as long as only parameter settings are considered that predict the differences to
non-associated items in the magnitude of the estimates and the discrimination performances
roughly correctly. Thus, as in Study 1, the absolute magnitude of correlations between
estimated and actual frequencies is overestimated by PASS in all conditions.
Discussion
Study 2 pursued three main aims. First, we wanted to examine
whether the findings on the effect of associations on frequency judgments from
Experiment 1 could also be replicated with a joint presentation of associated and
non-associated items. Moreover, the frequency judgments were to be used to subject the
predictions of the PASS model to a further test. Finally, another aim was to check our
assumption that in a within-subjects design the estimates for associated and
non-associated items show – in contrast to Experiment 1 – the same distribution and, thus,
correspond to the PASS predictions.
Estimates in the Condition without Preceding Recall Test
In the condition without preceding recall test the results entirely corresponded
to our expectations. First of all, the decision times again confirmed that the participants
used a memory assessment strategy to generate their frequency judgments. These
judgments showed the expected associative effect. Associated items received clearly
higher estimates than non-associated ones. The estimates on both kinds of items showed
a parallel trend across the presentation frequencies. Moreover, as predicted by PASS,
the frequencies of associated items were discriminated slightly worse than the
frequencies of non-associated items.
Finally, our assumption regarding the distributions of the estimates proved true:
In line with the PASS prediction, the estimates on associated and non-associated
words were now nearly equally distributed. Therefore, differences in the magnitude of
the estimates between both kinds of items could equally be found in the means and the
medians. This also supports our explanation for the fact that the medians of the
estimates in Experiment 1 showed no associative effect. There, associated and
non-associated words were judged by different participants. We assumed that this gave the
participants in both groups the opportunity to transform their intuitions of event
frequencies into estimates in different ways. Differences in the intuitions about the
frequency of associated and non-associated items could thus have been at least
partially leveled in the estimates. This could not happen in Experiment 2 because each
participant gave judgments on associated and non-associated items, presumably using
the same kind of transformation for both kinds of items. Consequently, the
distributions of the frequency estimates, too, conformed to the PASS predictions.
Thus, the results of Study 2 correspond to all ordinal predictions of the PASS
model that can be derived independently of specific parameter settings. Moreover, we
could demonstrate that with suitable parameter combinations from the medium range
of values the model also predicts the data very well quantitatively. The fit between
the PASS predictions and the trend of the individual estimates of the participants was
very high for both associated and non-associated items. The fit was hardly lower when,
in addition to the trend of the estimates, the effect of the factor association was also
considered in the calculation. Thus, with appropriate parameter settings, PASS could
also successfully predict the relative distances between the estimates for associated and
non-associated items. Furthermore, similar to Study 1, a nearly perfect fit resulted with
the means of the participants’ estimates. Finally, the parameter combination with
which this fit was attained also predicted the size of the difference in the
discrimination performances for associated and non-associated items correctly. Only
the absolute magnitude of the discrimination performances was overestimated by
PASS, as was already the case in Experiment 1.
PASS obtained the best fit values with lower parameter settings than in Study
1. This may appear surprising at first sight. As already described above, lower settings
for the learning parameter θ1 and the length of the attentional window model a worse
encoding of the study lists. However, there is little reason to assume that the
participants of the second experiment really encoded the items worse. The presentation
of the items occurred in both experiments under identical conditions and with identical
instructions, and the study lists had the same length. Moreover, the participants in both
experiments were drawn from the same population and did not differ from each other
in features such as age and education. It thus seems reasonable to assume that
the quality of encoding did not differ substantially between the two experiments. However, the
simulations with PASS demonstrated that with identical parameter settings, the model
expects here a slightly smaller associative effect with regard to the magnitude of the
estimates than in Study 1. We found that the opposite was the case: The difference
between associated and non-associated items was clearly larger in Study 2 than in
Study 1.
However, this only appears to contradict our considerations concerning the
quality of the encoding of the study lists and can be explained by our assumptions about
the transformation process. We assume that some of the participants working on
associated items in Experiment 1 used the same response range as the participants
working on non-associated items despite different intuitions. Thus, they transformed
different intuitions into similar estimates. Consequently, the associative effect turns out
smaller in the mean frequency estimates than in the intuitions. Now if PASS is
supposed to match the effect in the estimates under the restriction of equal
transformations, this can only be balanced via changes in the parameter settings. To
predict a decreased association effect, PASS requires higher parameter settings. In this
way, the different optimal parameter settings in both experiments can be explained.
Actually, this result had to occur if our assumptions about the transformation process are
correct and the study lists were encoded similarly well in both experiments. In this
view, the actual quality of the encoding in Experiment 1 was overestimated by the
parameter settings obtaining the best fit. This argumentation is further supported by the
results of the discrimination performances. The discrimination performances are not
influenced by the transformation process or by the selected response range.
Accordingly, the results of this measure should correspond to the PASS predictions in
case of similarly good encodings in both experiments. This is exactly what we found.
As expected by PASS, the difference in the discrimination performances for associated
and non-associated items was just as great in Experiment 2 as in Experiment 1.
Altogether, the PASS model was thus extremely successful in predicting
the estimates in the condition without intermediate recall test. The results hence
demonstrate that the model is not only capable of simulating the fundamental finding of
sensitive frequency judgments but that it is also able to explain the effects of
associations on frequency estimates.
Estimates in the Condition with Preceding Recall Test
The results in the condition with preceding recall test largely replicated the
findings from Experiment 1. First, the decision times confirmed again that the
participants relied on memory assessment and not on an enumeration strategy when
they gave their frequency judgments. The differences in the frequency judgments
between the recall conditions also found in Experiment 2 thus cannot be attributed to a
change of strategy.
After the recall test, the estimates were smaller and increased more slowly with
increasing presentation frequencies than in the condition without recall test. Moreover,
after the recall test, the effect of the associations on the frequency judgments changed.
The effect size of the association on the magnitude of the estimates was smaller than in
the condition without recall test. The estimates on both types of items no longer ran
parallel. The estimates for associated items increased more slowly than those for
non-associated items. For the presentation frequency 8, the estimates for associated
items were not higher than those for non-associated items. Once again, PASS predicted
the trend for associated items less well than the trend for non-associated ones. Finally,
the participants discriminated the frequencies of associated items after a recall test
clearly worse than the frequencies of non-associated items. As a result,
PASS was not able to predict both the differences in the discrimination
performances and the differences in the magnitude of the estimates correctly with the same
parameter settings.
The findings on the effect of the recall test on the frequency estimates were thus
essentially the same as in Experiment 1. In the meantime, we could also replicate the
result that the recall test leads to reduced absolute estimates in a series of experiments
on the role of attentional processes in frequency judgments (Sedlmeier & Renkewitz,
2004). Besides, in these experiments the estimates given after a recall test on words
presented repeatedly before also showed a smaller increase than estimates given without
an intermediate recall test. Hence, we can conclude that a recall test reliably lowers
absolute frequency judgments.
In the condition with recall test, PASS failed to predict the complete data
pattern correctly in Experiment 1 and in Experiment 2. As already discussed in
Study 1, the influence of the recall test cannot take effect during the encoding phase.
The assumption that the recall test influences the frequency estimates by changing the
memory representation is not consistent with the data, either. Consequently, the recall
test must take effect at the time the judgment is given. Manipulations and information
that influence the estimates at the time the judgment is given are not considered in
PASS. Where such influences take effect, the model generally cannot be expected to
predict the estimates correctly. For this reason, the results of the current
experiments show that carrying out a recall test establishes a situation in which
PASS cannot be applied in its present form.
General Discussion
With the current experiments we primarily examined the influence of
associations between jointly presented items on frequency estimates. We pursued the
aim to check whether the PASS model is capable of explaining possible effects of
associations on frequency judgments. Earlier studies already demonstrated that
estimates on the presentation frequency of a word increased with the number of jointly
presented associated words (Leicht, 1968; Shaughnessy & Underwood, 1973; Vereb &
Voss, 1974). The typical design in these studies consisted of presenting several target
words with different frequencies. Moreover, the study lists contained a varying number
of associated words to each target word. The associations between diverse sets of
target words and their associates were either minimized or remained undetermined.
Furthermore, since the frequencies of target words and associates were not
systematically varied within a set, none of these studies allowed conclusions as to whether
associations between stimuli affect the discriminability of the
frequencies of these stimuli. In contrast, in our experiments the associations
between all words on the study lists were controlled. We created item sets in which all
words were associated with each other as strongly or as weakly as possible. Within
these sets, the frequencies of the words were manipulated in the same way. This
procedure allowed examining the effects of associations on both the magnitude of
frequency estimates and the discriminability of frequencies.
Our simulations with PASS demonstrated that irrespective of possible
parameter settings the model expects that higher frequency estimates are given for sets
of associated words. Besides, according to the model predictions, this effect should not
be influenced by the actual presentation frequencies of the stimuli. The estimates on
associated and non-associated words should thus increase in a parallel way with rising
frequencies. Finally, PASS expected that the frequencies of associated items are
discriminated slightly worse than the frequencies of non-associated items. All these
predictions were largely confirmed in both experiments as long as the frequency
estimates were not influenced by an additional manipulation, namely an intermediate
recall test.
The Explanatory Power of PASS
Associations between words proved to affect frequency estimates in the studies
reported here. These associations did not arise from experimental manipulation.
Instead they were acquired pre-experimentally by the participants. A great advantage
of PASS is that it is not necessary to feed participants’ pre-experimentally learned
knowledge into the model. Rather, PASS experiences a large sample of natural
language, just as the participants had done before the experiment. PASS, using an
associative learning algorithm, is sensitive to the frequencies of co-occurrences of
words within this sample of natural language. The model acquires proper associations
between words solely on the basis of these frequencies of co-occurrences (see also
Sedlmeier, Wettler, & Seidensticker, 2004). The same learning mechanism can
obviously be successfully used to simulate participants’ encoding of information
within the experiment and the subsequent estimates of the participants on the
experimental frequency of words. Thus, PASS gives a parsimonious account of both
the pre-experimental acquisition of associations and the frequency judgments within
the experiment that are affected by these associations.
Remember that PASS can additionally account for several other phenomena in
frequency judgments such as regression to the mean (Sedlmeier, 1999), recency effects
(Sedlmeier, 1999), and the effects of varying attention at encoding (Sedlmeier &
Renkewitz, 2004). Taken together, these findings strongly suggest that associative
learning – as modeled in PASS – might be more than a plausible candidate for the
processes underlying frequency estimates based on memory assessment.
What about other Memory Models of Frequency Judgment?
PASS is not the only successful memory model of frequency judgments. Like
PASS these other models (e.g. SAM (Gillund & Shiffrin, 1984), TODAM 2 (Murdock,
1993), MINERVA 2 (Hintzman, 1988), MINERVA-DM (Dougherty, Gettys, &
Ogden, 1999)) assume that (I) memory representation changes with every additional
encoding of an event that occurs with a minimum of attention, and (II) that the current
state of memory representation can be used to produce a signal the intensity of which
depends on the frequency of an event in question. The models differ largely with
respect to their architecture, their specific representation assumptions and the
processes with which the memory representation is accessed. Nonetheless these
diverse models have arrived at similar or identical predictions across a wide range of
phenomena in frequency judgments (e.g. Dougherty & Franco-Watkins, 2002;
Hintzman et al., 1992; Hintzman, 2001; Sedlmeier, 2002). However, the capability of
all other models to account for the influence of associations on frequency judgments
studied here might be limited. To the best of my knowledge, PASS is so far the only
model that successfully simulates human primary associations (Sedlmeier, Wettler, &
Seidensticker, 2004). Thus, the other models would have to be supplied from the outside
with the crucial information about associations between words to arrive at any predictions.
Let us illustrate the latter aspect for MINERVA 2 (Hintzman, 1988), the model that
has presumably been used most often to simulate frequency judgments.
Unlike PASS, MINERVA 2 is an exemplar model (e.g. Estes, 1986). In
MINERVA 2 associative learning is not considered as the basis of sensitivity to
frequencies. Consequently, frequency knowledge is not represented in associations.
Instead, every newly encoded event adds a trace to an ever-growing memory matrix.
An event is considered as a collection of features. Accordingly, a trace corresponds to
a vector of features with possible values of ‘1’ (feature present), ‘–1’ (feature absent),
or ‘0’ (feature irrelevant or unknown). Random error has the effect that any specific
feature of an event might not be stored correctly but instead be coded as unknown. To arrive at
a frequency judgment for an event, the model assesses the similarity between this
event and all traces in the memory matrix. This results in a global familiarity signal
that is in turn the basis for frequency judgments. As MINERVA 2 encodes each new
event in a separate trace that is by no means connected to any other trace in memory, it
cannot learn about relations between events. In particular, it cannot learn
associations between words. Thus, to be able to make any prediction about the
influence of associations on frequency judgments, MINERVA 2 has to be supplied
with information about which word is how strongly associated to which other words.
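The retrieval process just described can be illustrated with a minimal sketch. It
is not the simulation code used in this work. The cubing of the similarities and
the normalisation by the number of relevant features follow the usual description
of MINERVA 2 (Hintzman, 1988) and are not spelled out in the text above; the
number of features, the learning rate, the seed and the two example words are
hypothetical choices made only for illustration.

```python
import random


def encode(event, learning_rate, rng):
    """Return a memory trace of the event.

    Each feature is stored with probability learning_rate; otherwise it is
    coded as 0 (unknown), which is the random encoding error described above.
    """
    return [f if rng.random() < learning_rate else 0 for f in event]


def similarity(probe, trace):
    """Similarity of probe and trace: their dot product divided by the
    number of features that are nonzero in the probe or in the trace."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    """Global familiarity signal: sum of the cubed similarities between the
    probe and every trace in memory (assumed activation function)."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


def random_event(n_features, rng):
    """Hypothetical event: a random vector of +1/-1 feature values."""
    return [rng.choice([-1, 1]) for _ in range(n_features)]


rng = random.Random(0)
chair = random_event(20, rng)
river = random_event(20, rng)

# Hypothetical study list: 'chair' presented four times, 'river' once.
# Every presentation adds a separate, unconnected trace to memory.
memory = [encode(chair, 0.7, rng) for _ in range(4)]
memory.append(encode(river, 0.7, rng))

print("familiarity of chair:", echo_intensity(chair, memory))
print("familiarity of river:", echo_intensity(river, memory))
```

Because each trace is stored independently, nothing in this sketch records that
two words occurred together. Any associative effect would have to enter through
overlapping feature values of the event vectors themselves.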
How could associations be represented in the memory of MINERVA 2? The
only way would be to translate associations between events into similarities of these
events. In MINERVA 2 the similarity of two events is a function of the number of
features that these events share. PASS acquired associations solely on the basis of the
frequency of co-occurrences of words. This knowledge about co-occurrences allowed
the model to predict effects of associations on frequency estimates very well.
Consequently, it is apparently only possible to translate associations into similarities if
one can assume that the number of features two words share is highly, if not perfectly,
correlated with the frequency with which these words co-occur. Is that a reasonable
assumption? This question has no definite answer, as it is not specified in MINERVA
2 – or elsewhere – what the relevant features are that constitute words or events. If we
regard the stimulus material in our studies, there are undoubtedly word pairs for which
this assumption seems to be plausible: For example, Stuhl (chair) and Hocker (stool)
are associated and they certainly share many features. However, PASS also identifies
words as strongly associated for which it seems difficult to enumerate common
features such as, for example, Fluss (river) and Brücke (bridge) or Stuhl (chair) and
rutschen (to slide). Thus, it seems at least doubtful that associations could be
adequately translated into similarities. If this is not possible, the effects observed in our
studies would be based on an attribute that cannot be represented in MINERVA 2.
Therefore, PASS might not only be a promising model of the memory assessment
process but also the only model that can presently offer a comprehensive account for
the effects of associations we observed in the present experiments.
What Problems Remain in Predicting Frequency Estimates with PASS?
Obviously, the current studies also revealed some remaining weaknesses in the
prediction of frequency estimates by PASS. Let us conclude the discussion with a
glance at these problems. Two of these problems immediately concern the modeling of
the memory assessment process.
Predicting Discrimination Performances
That this modeling is not yet perfect is first evident from the prediction of the
discrimination performances. PASS did correctly predict the differences in the
discriminability of the frequencies of associated and non-associated words; however,
the absolute discrimination performance was overestimated by PASS in all conditions
of both experiments. The correlations with which we measured the discrimination
performances are determined by the trend of the estimates and the variance of the
estimates at each given presentation frequency. Since the trend of the participants’
estimates corresponded to the PASS predictions in the condition without recall test, it
becomes obvious that the model underestimates the error variance of the estimates.
Hence, PASS evidently lacks an appropriate error component producing such variance.
In fact, in the present simulations the model contained the assumption of a faultless
encoding of stimuli. Since, furthermore, the parameters for learning, interference and
decay remained constant throughout learning, all variance in the PASS predictions at a
given presentation frequency can be put down to the associations between the stimuli
and the temporal course of the encoding. Any kind of additional unsystematic error
variance was not taken into consideration. In contrast, in other memory models it is
assumed that accidental information loss occurs when stimuli are encoded (e.g.
Hintzman, 1988; Fiedler, 1996). How could such a kind of information loss be
incorporated in PASS? In its original form PASS encodes events as a collection of
features. Randomly losing or exchanging values of single features prior to or
during the encoding of events would produce the desired error variance. A
corresponding procedure is usually implemented in simulations with other memory models
(e.g. Dougherty et al., 1999; Hintzman, 1988). In the present studies this was not
possible because we had to use localist representations of words to be able to
simulate the acquisition of associations between words. However, in principle, the
assumption of unsystematic information loss can be integrated into PASS without any
problems.
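The argument that missing error variance inflates the predicted discrimination
performance can be made concrete with a small sketch. It does not reproduce PASS.
It only assumes a linear trend of the estimates over presentation frequencies and
adds normally distributed error of varying size; the frequencies, slope, intercept
and error levels are hypothetical values chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_estimates(frequencies, slope, intercept, error_sd, rng):
    """Estimates that follow a linear trend plus unsystematic error."""
    return [intercept + slope * f + rng.gauss(0.0, error_sd)
            for f in frequencies]


# Hypothetical design: five presentation frequencies, 40 judgments each.
frequencies = [1, 2, 4, 6, 8] * 40
rng = random.Random(1)

# The trend is identical in all three runs; only the error variance differs.
for error_sd in (0.0, 1.0, 3.0):
    estimates = simulate_estimates(frequencies, 0.5, 1.0, error_sd, rng)
    print(error_sd, round(pearson(frequencies, estimates), 3))
```

With the same trend, the correlation between presented and estimated frequencies
drops as the error variance grows. A model without an error component therefore
tends to predict correlations that are too high.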
Predicting Context Sensitivity in Frequency Judgments
The second problem with the modeling of the memory assessment process
concerns the ability of PASS to discriminate the frequency of occurrence of stimuli in
different contexts. We demonstrated that associations between words increase
estimates on the experimental frequencies of these words. Associations are, according
to PASS, a result of the frequency with which words were jointly encoded before the
experiment. Thus, people are not able to disregard their knowledge about the frequency
of co-occurrences of words outside the experiment when they give experimental
frequency estimates. Interestingly, earlier studies demonstrated that the
pre-experimental frequency of a single word does not increase experimental estimates on
this word (Greene & Thapar, 1994; Rao, 1983). In this case, people can very well
discriminate between the experimental and the extra-experimental context.
In the ideal case, a model should be capable of simulating both findings
described. In the present simulations PASS was generally not sensitive to contexts
since no information at all was encoded about whether the words occurred outside or
within the experiment. For this reason, not only the associations between words but also
the pre-experimental frequencies of occurrence of words would have influenced PASS’
predictions. To avoid this unwanted effect, we intervened in the simulations by setting
the self-associations of the words to zero after PASS had acquired pre-experimental
knowledge from the text corpus. The pre-experimental frequency of occurrence of a
word could thus not increase the predictions.
In principle, PASS can also produce frequency estimates differentiating
between dissimilar contexts. This only requires that the model encodes these contexts
and that the memory probe specifies a target context in addition to a target stimulus.
Knowledge about the frequency of occurrence of a word itself and about the frequency
of co-occurrences of words, however, is identically represented in PASS and by no
means separated. Thus if the model discriminates between contexts, this will not only
weaken the (undesired) influence of the pre-experimental word frequencies but also the
(desired) effect of pre-experimental co-occurrences of words. A further development
of PASS should thus aim at integrating a mechanism into the model that allows
differential effects of associations and pre-experimental frequencies of occurrence.
Beyond Memory Assessment: Transforming Intuitions into Absolute Frequency
Judgments
PASS shares the last problem to be discussed here with all other memory
assessment models: Even if participants use a memory assessment strategy to generate
frequency judgments and if a model exists that perfectly describes the corresponding
memory processes, this may not suffice to predict frequency estimates. The reason for
this is that the models do not directly simulate frequency judgments but intuitions
about frequencies. These intuitions still need to be transformed by the participants into
absolute frequency judgments. With respect to this transformation all memory models
only contain the assumption that a participant will use the same linear function for all
relevant stimuli to transform his intuitions into estimates. At least as far as the
participants are not provided with a response range, this function remains unspecified.
As a consequence, only predictions about relative differences between frequency
estimates are possible but not predictions about the absolute magnitude of estimates.
Especially if one takes into account that participants obviously carry out very different
transformations even within the same experimental condition, this seems
unsatisfactory. For example, in the condition without recall test in Experiment 1 the
25th percentile of the mean estimates on associated stimuli was 3.0 while the 75th
percentile amounted to 5.8 (see Fig. 3.9). With regard to the reasons for this variance in the
frequency estimates the memory models remain silent.
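Why an unspecified linear transformation permits predictions about relative
differences but not about absolute magnitudes can be shown with a short sketch.
The intuition values and the two transformation functions below are hypothetical.
They merely stand for two participants who share the same memory signals but
choose different slopes and intercepts.

```python
def transform(intuitions, slope, intercept):
    """Apply one linear function to all intuitions of a participant."""
    return [intercept + slope * x for x in intuitions]


def relative_pattern(estimates):
    """Express each estimate as a proportion of the range between the first
    and the last estimate; this removes slope and intercept."""
    span = estimates[-1] - estimates[0]
    return [(e - estimates[0]) / span for e in estimates]


# Hypothetical memory signals for four stimuli of increasing frequency.
intuitions = [0.2, 0.4, 0.7, 0.9]

# Two hypothetical participants with different transformation functions.
participant_a = transform(intuitions, 5.0, 1.0)
participant_b = transform(intuitions, 10.0, 0.5)

print(participant_a, relative_pattern(participant_a))
print(participant_b, relative_pattern(participant_b))
```

The absolute estimates of the two participants differ, whereas their relative
patterns coincide. A memory model that leaves slope and intercept open can
therefore be tested only against the latter.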
Moreover, our results indicate that even relative differences in absolute
frequency estimates can only be predicted completely correctly if the factors to be
studied are manipulated within-subjects. In Experiment 1 we manipulated the factor
‘association’ between-subjects. To derive model predictions we assumed that the
transformation functions would be equally distributed in both associative conditions.
The results of our experiments strongly suggest that this assumption was wrong. In
Experiment 1 PASS could not predict the distributions and the medians of the
estimates for associated and non-associated words correctly. These problems
disappeared when the factor ‘association’ was manipulated within-subjects in
Experiment 2. This indicates that participants who gave estimates on associated words
in Experiment 1 at least partially selected different transformations than participants
who judged the frequency of non-associated words. However, we could only speculate
about the factors influencing the transformation process. In general, the literature on
frequency estimates reveals hardly more about the transformation process than that
given information about absolute frequencies will affect the magnitude of the estimates
(e.g. Brown & Siegler, 1993; Brown, 1995; Schwarz et al., 1985; Schwarz &
Scheuring, 1992; Schwarz, 1999). As long as this gap is not closed, memory models
can only make precise predictions about the relative differences of absolute frequency
estimates in within-subjects designs.
The fact that transformation processes are obviously influenced by factors that
remain unconsidered in memory models may also explain why the participants’
frequency estimates deviated from the PASS predictions after a recall test. The recall
test had two effects in both experiments: On the one hand, it generally led to lower
frequency judgments and, on the other, it affected the trend of the estimates on
associated words. These estimates increased more slowly in the condition with recall test
than in the condition without recall test. As already explained in the discussion of
Experiment 1, it has to be assumed that the recall test takes effect only at the time the
judgment is given. At this stage, two processes have to be carried out. Frequency
information has to be retrieved from memory and it must then be transformed into
frequency estimates. On the basis of the present experiments, it cannot be resolved
which of these two processes was influenced by the recall test. However, it is worth
noting that several findings make an influence of a recall
test on the transformation process seem plausible. Schwarz and colleagues
(1991; see also Wänke et al., 1995; Schwarz & Wänke, 2002) found that the experienced
ease of recall changed the magnitude of subsequent frequency estimates. Irrespective
of the actual recall performance, the participants gave higher frequency estimates on an
event if the retrieval of corresponding instances was subjectively easy. In our
experiments the participants had two minutes to recall words from the study list. On
average, our participants recalled 11.0 (Experiment 1) and 11.3 (Experiment 2) of the
15 stimuli presented. We observed that the participants started to have difficulties in
retrieving additional words a long time before the two minutes ran out. Thus, the
participants may have experienced the recall test as rather difficult. The experience that
the retrieval of words failed or succeeded only with effort might have rendered the
assumption implausible that these words were presented very often immediately
before. And this might be reason enough for the participants to choose a comparatively
low response scale for their frequency estimates.
One could at least speculate about whether an effect of the recall test on the
transformation process can also explain the changed trend of the estimates on
associated items. It would be possible that the participants infer a maximum
presentation frequency from their experiences with the recall and, as a consequence,
never give a frequency estimate beyond this maximum. Such a limitation of the
subjective response range could result in a ceiling effect. The decelerated increase of
the estimates on associated items and the absence of an associative effect with the
maximum presentation frequency 8 observed in both experiments could be explained
in this way.
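The speculated ceiling effect can be sketched as follows. This is only an
illustration of the argument, not a model fitted to our data. It assumes that the
estimates rise linearly with presentation frequency, that association adds a
constant bonus, and that participants never exceed a subjective maximum. All
numerical values are hypothetical.

```python
def capped_estimates(frequencies, slope, intercept, bonus, ceiling):
    """Linear estimates plus an associative bonus, limited by a subjective
    maximum that the participant never exceeds."""
    return [min(intercept + slope * f + bonus, ceiling) for f in frequencies]


# Hypothetical presentation frequencies and parameter values.
frequencies = [1, 2, 4, 8]
non_associated = capped_estimates(frequencies, 0.6, 1.0, 0.0, 5.0)
associated = capped_estimates(frequencies, 0.6, 1.0, 1.0, 5.0)

for f, low, high in zip(frequencies, non_associated, associated):
    print(f, low, high, round(high - low, 2))
```

Under these assumptions the associative bonus is constant at low frequencies and
vanishes at the highest frequency, where both item types reach the ceiling. The
associated items also show the flatter overall increase.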
CHAPTER 4
Summary and Outlook
To act successfully in the world, humans must notice the frequencies of events
occurring. For example, when parents think of sending their child to a particular
school, they may rely on the experiences of other parents with this school. To exploit
these experiences properly, it is not sufficient to remember that the school was praised
and criticised. One must also assess how often each of these occurred. How do
people do that?
Two answers have dominated the psychological literature on frequency
judgments: (1) People try to remember specific instances of events (“Mrs. Hull said X,
Mr. Schmidt said Y.”). They base their frequency judgments about events on the size
of the recalled samples (i.e. numbers of praises and criticisms in our example). This
approach has been termed enumeration or recall-estimate strategy. (2) In
contrast, memory models of frequency judgments assume that the representation of
frequency information in memory is an inevitable consequence of attending to events.
Additionally, it is assumed that the current state of the memory representation can be
used to produce a signal the intensity of which reflects the frequency of events. Thus,
memory models postulate that people can estimate frequencies by using a retrieval-free
memory assessment strategy.
In a first step, this dissertation explores whether memory models can explain
the famous-names effect, hitherto accounted for by the use of a recall-estimate strategy.
In a second step, this work then takes a closer look at the PASS model (Sedlmeier,
1999; 2002), which assumes that sensitivity to frequencies is based on associative
learning. PASS is used to generate new predictions concerning the influence of
associations between events on frequency judgments.
Can Memory Models Account for the Famous-Names Effect?
An important issue in the research on frequency estimates is the differentiation
between category size judgments and frequency of occurrence judgments. In category
size judgments the requested frequency estimates refer to the number of different
exemplars of a category. In contrast, frequency of occurrence judgments refer to the
number of repetitions of the same event. Evidence for the recall-estimate strategy
mostly stems from research on category size judgments.
The best-known study that advocates of the recall-estimate strategy refer to is
presumably Tversky and Kahneman’s famous-names experiment (1973). If
participants are presented with a list of male and female persons they overestimate the
frequency of the sex that was represented by more famous names on the list. As the
participants additionally recall more names of the “famous” sex, the bias in frequency
judgments was ascribed to the recall-estimate strategy. Proponents of memory models
claimed that the famous-names effect could also result from the use of a memory
assessment strategy (Dougherty et al., 1999; Sedlmeier, 1999). They proposed that the
higher pre-experimental frequency of occurrence of famous names is responsible for
the bias. They additionally assumed that the memory assessment process does not
allow for a perfect discrimination of encounters in different contexts. In this case,
memory models would also expect an overestimation of the category with the more famous
names. An implication of this explanation is that the famous-names effect should
generalize to any stimulus material: When the frequencies of two categories are judged,
the category whose exemplars occur more often outside the target context should be
overestimated.
These ideas were tested in the four experiments reported in chapter 2. The
results largely disagreed with the memory model explanation for the famous-names
effect. It was found that differences in the pre-experimental frequency of stimuli are
not sufficient to produce biases in category size judgments. However, the
recall-estimate strategy was also not successful in explaining the results of the four
experiments. While this strategy expected a considerable bias in all experiments, the
frequency estimates were quite accurate in several conditions. Thus neither the memory
assessment strategy nor the recall-estimate strategy could provide a satisfying account
of the entire pattern of results. A reanalysis of the data suggested that the results might
be best understood by the following assumptions: 1. The memory response to
categories was not influenced by the frequency of the exemplars outside the target
context. Thus, the participants generally had access to unbiased information about the
category sizes by memory assessment. 2. Nonetheless, the participants supplemented
this information sometimes with information that they gained from their recall
performances. This resulted in slightly biased frequency estimates.
Thus, a multiple strategy perspective (Brown, 1995; 2002) appears most
promising to account for the heterogeneous results with category size tasks found here
and in numerous other studies (e.g. Alba et al., 1980; Barsalou & Ross, 1986; Watkins
& LeCompte, 1991; Williams & Durso, 1986). What does affect the choice of
strategy? The current experiments suggest that participants rely more heavily on a
recall strategy when many instances were remembered, which presumably renders the
recall strategy subjectively valid. In line with this, participants should hardly employ
enumeration when they do not judge category sizes but frequencies of occurrence: If
chair for example is encountered twice over the course of an experiment, these two
episodes can hardly be remembered separately because they occurred within very
similar contexts. In line with this, we found no indication of the use of the
recall-estimate strategy in the experiments of chapter 3, which required frequency of
occurrence judgments.
What are implications for further research? Outside the laboratory, repetitions
of events hardly occur in contexts as similar to each other as in a typical experiment.
When the same event occurs in richer and more diverse contexts enumeration might be
feasible, even for frequency of occurrence judgments. Future research should address
this issue. Another interesting question is which other factors affect the choice of
strategy. It seems reasonable that not only the perceived validity of the recall-estimate
strategy affects the choice but also the subjective validity of a memory assessment
strategy. In Experiment 4, after an overt categorization of the stimuli participants relied
exclusively on memory assessment, even though they did recall a large number of
exemplars. This suggests that a categorization task increases the perceived validity of
memory assessment. This issue deserves further scrutiny as well as the question which
other factors influence the perceived validity of enumeration and memory assessment.
The Influence of Associations on Frequency Judgments
The results reported in chapter 2 suggest that the encoding of events is
sufficient to bring about the representation of quite accurate knowledge about the
frequencies of the superordinate categories of these events in memory. However,
people do not necessarily exclusively rely on this frequency memory to generate
frequency judgments. In general there is much evidence that knowledge about
frequencies of events and their superordinate categories is an inevitable consequence
of attending to events (e.g. Alba et al., 1980; Hasher & Zacks, 1984; Betsch &
Sedlmeier, 2002). This raises the question how frequencies are encoded and
represented in memory.
PASS (Sedlmeier, 1999; 2002) describes the emergence of frequency
knowledge as a by-product of associative learning. The model acquires not only
knowledge about the frequencies of events but also about the frequencies with which
different events co-occur. The latter enables PASS to predict human primary
associations to stimuli successfully (Sedlmeier, Wettler, & Seidensticker, 2004).
In chapter 3 I used the capability of PASS to acquire associations to gain
precise predictions about the effects of associations on frequency estimates. In two
simulations PASS encoded a large text corpus and learned in this way about the
associations between words. In a second step of the simulations PASS encoded study
lists that were also presented to human participants in two experiments. PASS
predicted that the joint presentation of associated words (like river and bridge) on a
study list would lead participants to give higher frequency of occurrence estimates than
the joint presentation of non-associated words (like river and fashion). Moreover, this
effect should be independent of the presentation frequency of the words. Finally, the
frequencies of associated items should be discriminated slightly worse than the
frequencies of non-associated items.
These predictions were largely confirmed in the experiments reported in
chapter 3. Thus, PASS can obviously provide a parsimonious account of both the
acquisition of associations and judgments of frequencies that are based on memory
assessment. This indicates that associative learning – as modelled in PASS – might be
more than a plausible candidate for the memory mechanisms underlying frequency
processing. Proponents of other memory models will have to show whether their models can
also account for the demonstrated associative effects. Since exemplar models (such as
MINERVA 2 (Hintzman, 1988) or MINERVA-DM (Dougherty et al., 1999)) cannot
acquire associations between stimuli, it is doubtful that this class of models is suitable
to achieve this.
Although PASS successfully predicted the effects of association on frequency
judgments, the experiments in chapter 3 nonetheless revealed several limitations of
PASS. Two limitations pertain to the modelling of the memory assessment process:
First, PASS does not yet integrate the effects of random error during the encoding of
stimuli. As a consequence, PASS must necessarily overestimate participants’ ability to
discriminate frequencies. Second, PASS does not yet allow the differential treatment of
effects of the pre-experimental frequencies of co-occurrences of events and effects of
the pre-experimental frequencies of the events themselves. However, the present
studies demonstrated that associations between words – that are according to PASS
based on pre-experimental co-occurrences – can exert a quite considerable influence
on frequency estimates. In contrast, earlier studies suggest that the pre-experimental,
linguistic frequency of words hardly influences frequency judgments (Rao, 1983;
Greene & Thapar, 1994). Thus, a further development of PASS should aim at
integrating mechanisms that can simultaneously account for these diverging effects.
A last problem of PASS and all other memory models results from the fact that
these models simulate intuitions about frequencies. However, the memory models lack
a comprehensive description of how people transform intuitions about frequencies into
absolute judgments of frequencies. The experiments in chapter 3 suggest that this
shortcoming can restrict the ability of the models to predict absolute frequency
estimates even if the memory assessment process is modelled correctly.
References
Alba, J. W., Chromiak, W., Hasher, L., & Attig, M. S. (1980). Automatic encoding of
category size information. Journal of Experimental Psychology: Human
Learning and Memory, 6, 370-378.
Anderson, J. R. (1996). Kognitive Psychologie. Heidelberg: Spektrum, Akademischer
Verlag.
Attneave, F. (1953). Psychological probability as a function of experienced frequency.
Journal of Experimental Psychology, 46, 81-86.
Barsalou, L. W., & Ross, B. H. (1986). The role of automatic and strategic processing
in sensitivity to superordinate and property frequency. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 12, 116-134.
Bazerman, M. H. (2001). Judgment in managerial decision making. New York: John
Wiley & Sons.
Bedon, B. G., & Howard, D. V. (1992). Memory for the frequency of occurrence of
karate techniques: A comparison of experts and novices. Bulletin of the
Psychonomic Society, 30 (2), 117-119.
Begg, I., Maxwell, D., Mitterer, J. O., & Harris, G. (1986). Estimates of frequency:
Attribute or attribution? Journal of Experimental Psychology: Learning,
Memory, and Cognition, 12, 496-508.
Betsch, T., & Pohl, D. (2002). Tversky and Kahneman’s availability approach to
frequency judgment: A critical analysis. In P. Sedlmeier, & T. Betsch (Eds.),
Etc.: Frequency processing and cognition (pp. 109-120). Oxford: University
Press.
Betsch, T., & Sedlmeier, P. (2002). Frequency Processing and Cognition: Stock-
Taking and Outlook. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency
processing and cognition (pp. 303-318). Oxford: University Press.
Betsch, T., Siebler, F., Marz, P., Hormuth, S., & Dickenberger, D. (1999). The
moderating role of category salience and category focus in judgments of set
size and frequency of occurrence. Personality and Social Psychology Bulletin,
25, 463-481.
Beyth-Marom, R., & Fischhoff, B. (1977). Direct measures of availability and
judgments of category frequency. Bulletin of the Psychonomic Society, 9, 236-238.
Blair, E., & Burton, S. (1987). Cognitive processes used by survey respondents to
answer behavioral frequency questions. Journal of Consumer Research, 14,
280-288.
Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research,
1968-1987. Psychological Bulletin, 106(2), 265-289.
Bousfield, W. A., & Sedgewick, C. H. (1944). An analysis of sequences of restricted
associative responses. Journal of General Psychology, 30, 149-165.
Bower, G. H. (1994). A turning point in mathematical learning theory. Psychological
Review, 101, 290–300.
Brooks, J. E. (1985). Judgments of category frequency. American Journal of
Psychology, 98, 363-372.
Brown, N. R. (1995). Estimation strategies and the judgment of event frequency.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 21,
1539-1553.
Brown, N. R. (1997). Context memory and the selection of frequency estimation
strategies. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 23, 898-914.
Brown, N. R. (2002). Encoding, representing, and estimating event frequencies: A
multiple strategy perspective. In P. Sedlmeier, & T. Betsch (Eds.), Etc.:
Frequency processing and cognition (pp. 37-54). Oxford: University Press.
Brown, N. R., Buchanan, L., & Cabeza, R. (2000). Estimating the frequency of
nonevents: The role of recollection failure in false recognition. Psychonomic
Bulletin and Review, 7, 684-691.
Brown, N. R., & Siegler, R. S. (1993). Metrics and mappings: A framework for
understanding real-world quantitative estimation. Psychological Review, 100,
511-534.
Bruce, D., Hockley, W. E., & Craik, F. I. M. (1991). Availability and category
frequency estimation. Memory and Cognition, 19, 301-312.
Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time
in a basic word recognition task: Moving on from Kucera and Francis.
Behavior Research Methods, Instruments, & Computers, 30, 272-277.
Cheng, P. W. (1997). From covariation to causation: A causal power theory.
Psychological Review, 104, 367-405.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction.
Psychological Review, 99, 365-382.
Cofer, C. N. (1965). On some factors in the organizational characteristics of free recall.
American Psychologist, 20, 261-272.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Conrad, F. G., & Brown, N. R. (1994). Strategies for estimating category frequency:
Effects of abstractness and distinctiveness. American Statistical Association,
Proceedings of the Section on Survey Methods Research (pp. 1345-1350).
Alexandria, VA: American Statistical Association.
Conrad, F. G., Brown, N. R., & Cashman, E. (1998). Strategies for answering
behavioral frequency questions. Memory, 6, 339-366.
Conrad, F. G., Brown, N. R., & Dashen, M. (2003). Estimating the frequency of events
from unnatural categories. Memory and Cognition, 31, 552-562.
Deese, J. (1959). Influence of inter-item associative strength upon immediate free
recall. Psychological Reports, 5, 305-312.
Diener, E., Sandvik, E., & Pavot, W. (1991). Happiness is the frequency, not the
intensity, of positive versus negative affect. In F. Strack, M. Argyle, & N.
Schwarz (Eds.), Subjective Well-Being. Oxford: Pergamon Press.
Dobbins, I. G., Kroll, N. E. A., Yonelinas, A. P., & Liu, Q. (1998). Distinctiveness in
recognition and free recall: The role of recollection in the rejection of the
familiar. Journal of Memory and Language, 38, 381-400.
Dougherty, M. R. P., & Franco-Watkins, A. M. (2002). A memory models approach to
frequency and probability judgement: Applications of Minerva 2 and Minerva
DM. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and
cognition (pp. 121-136). Oxford: University Press.
Dougherty, M. R. P., & Franco-Watkins, A. M. (2003). Reducing bias in frequency
judgment by improving source monitoring. Acta Psychologica, 113, 23-44.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A
memory process model for judgments of likelihood. Psychological Review,
106, 180-209.
Dunlap, W. P., Cortina, J. M., Vaslow, J. B., & Burke, M. J. (1996). Meta-analysis of
experiments with matched groups or repeated measures designs. Psychological
Methods, 1, 170-177.
Edgington, E. S. (1965). A type of spurious constant error in magnitude estimation.
Perceptual and Motor Skills, 20, 199-202.
Eraker, S. A., & Politser, P. (1988). How decisions are reached: Physicians and the
patient. In J. Dowie & A. S. Elstein (Eds.), Professional Judgment: A reader in
clinical decision making (pp. 379-394). Cambridge: Cambridge University
Press.
Erickson, J. R., & Gaffney, C. R. (1985). Effects of instructions, orienting task, and
memory tests on memory for words and word frequency. Bulletin of the
Psychonomic Society, 23, 377-380.
Erlick, D. E. (1964). Absolute judgments of discrete quantities randomly distributed
over time. Journal of Experimental Psychology, 57, 475–482.
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57,
94-107.
Estes, W. K. (1976). Structural aspects of associative models of memory. In C. N.
Cofer (Ed.), The Structure of Human Memory (pp. 31-53). San Francisco:
Freeman.
Estes, W. K. (1986). Array models for category learning. Cognitive Psychology, 18,
500-549.
Fiedler, K. (1983). On the testability of the availability heuristic. In R. W. Scholz (Ed.),
Decision making under uncertainty (pp. 109-119). Amsterdam: Elsevier,
North-Holland.
Fiedler, K. (1991). The tricky nature of skewed frequency tables: An information loss
account of distinctiveness-based illusory correlations. Journal of Personality
and Social Psychology, 60, 24-36.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation
phenomenon in probabilistic, multiple-cue environments. Psychological
Review, 103, 193–214.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children.
Dordrecht, The Netherlands: D. Reidel.
Fisk, A. D., & Schneider, W. (1984). Memory as a function of attention, level of
processing, and automatization. Journal of Experimental Psychology:
Learning, Memory, and Cognition, 10, 181-197.
Fiske, S. T., & Neuberg, S. L. (1990). A continuum of impression formation from
category based to individuation: influences of information and motivation on
attention and interpretation. In M. Zanna (Ed.), Advances in experimental
social psychology (Vol. 23, pp. 1-74). New York: Academic Press.
Fiske, S. T., & Taylor, S. E. (1991). Social Cognition. New York: McGraw-Hill.
Flexser, A. J., & Bower, G. H. (1975). Further evidence regarding instructional effects
on frequency judgements. Bulletin of the Psychonomic Society, 6 (3), 321-324.
Freund, J. S., & Hasher, L. (1989). Judgments of category size: Now you have them,
now you don't. American Journal of Psychology, 102, 333-352.
Gardiner, J. M. (1988). Functional aspects of recollective experience. Memory &
Cognition, 16, 309-313.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond ‘heuristics
and biases’. In W. Stroebe, & M. Hewstone (Eds.), European review of social
psychology (Vol. 2, pp. 83-115). New York: Wiley.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and
recall. Psychological Review, 91, 1-67.
Ginsburg, H., & Rapoport, A. (1967). Children’s estimates of proportions. Child
Development, 38 (1), 205-212.
Greene, R. L. (1984). Incidental learning of event frequencies. Memory & Cognition,
12, 90–95.
Greene, R. L. (1986). Effects of intentionality and strategy on memory for frequency.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 12,
489-495.
Greene, R. L. (1988). Generation effects in frequency judgment. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 14, 298-304.
Greene, R. L. (1989). On the relationship between categorical frequency estimation
and cued recall. Memory and Cognition, 17, 235-239.
Greene, R. L., & Thapar, A. (1994). Mirror effect in frequency discrimination. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 20, 946-952.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding, I: Parallel
development and coding of neural feature detectors. Biological Cybernetics,
23, 121-134.
Hamilton, D. L., & Gifford, R. K. (1976). Illusory correlation in interpersonal
perception: A cognitive basis of stereotypic judgements. Journal of
Experimental Social Psychology, 12, 392-407.
Hasher, L., & Chromiak, W. (1977). The processing of frequency information: An
automatic mechanism? Journal of Verbal Learning and Verbal Behavior, 16,
173-184.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory.
Journal of Experimental Psychology: General, 108 (3), 356-388.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information:
The case of frequency of occurrence. American Psychologist, 39, 1372–1388.
Hintzman, D. L. (1969). Apparent frequency as a function of frequency and the
spacing of repetitions. Journal of Experimental Psychology, 80, 139–145.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a
multiple-trace memory model. Psychological Review, 95, 528–551.
Hintzman, D. L. (2001). Similarity, global matching, and judgments of frequency.
Memory & Cognition, 29, 547-556.
Hintzman, D. L., & Block, R. A. (1971). Repetition and memory: Evidence for a
multiple-trace hypothesis. Journal of Experimental Psychology, 88 (3), 297-306.
Hintzman, D. L., Curran, T., & Oppy, B. (1992). Effects of similarity and repetition on
memory: Registration without learning? Journal of Experimental Psychology:
Learning, Memory, and Cognition, 18, 667-680.
Hockley, W. E., & Christi, C. (1996). Tests of the separate retrieval of item and
associative information using a frequency judgment task. Memory &
Cognition, 24, 796-811.
Howell, W. C. (1973). Storage of events and event frequencies: A comparison of two
paradigms in memory. Journal of Experimental Psychology, 98, 260-263.
Huber, O. (1993). The development of the probability concept: Some reflections.
Archives de Psychologie, 61, 187-195.
Indow, T., & Togano, K. (1970). On retrieving sequence from long-term memory.
Psychological Review, 77, 317-331.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring.
Psychological Bulletin, 114, 3-28.
Johnson, M. K., Taylor, T. H., & Raye, C. L. (1977). Fact and Fantasy: The effects of
internally generated events on the apparent frequency of externally generated
events. Memory & Cognition, 5, 116-122.
Jonides, J., & Naveh-Benjamin, M. (1987). Estimating frequency of occurrence.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 13,
230-240.
Jonides, J., & Jones, C. (1992). Direct coding for frequency of occurrence. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 18, 107-128.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty:
Heuristics and biases. Cambridge University Press.
Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28,
107-128.
Kintsch, W. (1998). Comprehension. A paradigm for cognition. Cambridge:
Cambridge University Press.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American
English. Providence: Brown University Press.
Kuzmak, S. D., & Gelman, R. (1986). Young children’s understanding of random
phenomena. Child Development, 57, 559-566.
Leicht, K. L. (1968). Recall and judged frequency of implicitly occurring words.
Journal of Verbal Learning and Verbal Behavior, 7, 918-923.
Lewandowsky, S., & Smith, P. W. (1983). The effect of increasing the memorability of
category instances on estimates of category size. Memory & Cognition, 11,
347-350.
Lezius, W., Rapp, R., & Wettler, M. (1998). A freely available morphological
analyser, disambiguator, and context sensitive lemmatizer for German.
Proceedings of the COLING-ACL 1998 (pp. 743-747).
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (1978). Judged
frequency of lethal events. Journal of Experimental Psychology: Human
Learning and Memory, 4, 551–581.
Lopez, F. J., Shanks, D. R., Almaraz, J., & Fernandez, P. (1998). Effects of trial order
on contingency judgments: A comparison of associative and probabilistic
contrast accounts. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 24, 672–694.
Maki, R. H., & Ostby, R. S. (1987). Effects of levels of processing and rehearsal on
frequency judgments. Journal of Experimental Psychology: Learning, Memory,
and Cognition, 13, 151-163.
Manis, M., Shedler, J., Jonides, J., & Nelson, T. E. (1993). Availability heuristic in
judgments of set-size and frequency of occurrence. Journal of Personality and
Social Psychology, 65, 448-457.
Marx, M. H. (1985). Retrospective reports on frequency judgments. Bulletin of the
Psychonomic Society, 23, 309-310.
Mayer, R. E. (1992). Thinking, Problem Solving, Cognition (2nd ed.). New York:
Freeman.
Menon, G. (1993). The effects of accessibility of information in memory on judgments
of behavioral frequencies. Journal of Consumer Research, 20, 431-440.
Menon, G., Raghubir, P., & Schwarz, N. (1995). Behavioral frequency judgments: An
accessibility-diagnosticity framework. Journal of Consumer Research, 22, 212-228.
Messick, D. M., & Mackie, D. (1989). Intergroup relations. Annual Review of
Psychology, 40, 45-81.
Moyer, R. S., & Dumais, S. T. (1978). Mental comparison. In G. H. Bower (Ed.), The
psychology of learning and motivation (Vol. 12, pp. 117-155). New York:
Academic Press.
Murdock, B. (1993). TODAM2: A model for the storage and retrieval of item,
associative, and serial-order information. Psychological Review, 100, 183-203.
Murdock, B. (1999). Item and associative interactions in short-term memory: Multiple
memory systems? International Journal of Psychology, 34, 427-433.
Naveh-Benjamin, M., & Jonides, J. (1986). On the automaticity of frequency coding:
Effects of competing task load, encoding strategy, and intention. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 12, 378-386.
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological
Bulletin, 68, 29-46.
Rao, K. V. (1983). Word frequency effect in situational frequency estimation. Journal
of Experimental Psychology: Learning, Memory, and Cognition, 9, 79-81.
Reber, R., & Zupanek, N. (2002). Effects of processing fluency on estimates of
probability and frequency. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency
processing and cognition (pp. 175-188). Oxford: University Press.
Reichardt, C. S., Shaughnessy, J. J., & Zimmerman, J. (1973). On the independence of
judged frequencies for items presented in successive lists. Memory and
Cognition, 1, 149-156.
Reyna, V. F., & Brainerd, C. J. (1994). The origins of probability judgment: A review
of data and theories. In G. Wright, & P. Ayton (Eds.), Subjective probability
(pp. 239-272). Chichester: Wiley.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering
words that were not presented in lists. Journal of Experimental Psychology:
Learning, Memory, & Cognition, 21, 803-814.
Rose, R. J., & Rowe, E. J. (1976). Effects of orienting task and spacing of repetitions
on frequency judgments. Journal of Experimental Psychology: Human
Learning and Memory, 2, 142-152.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in
behavioral research. A correlational approach. Cambridge: University Press.
Rowe, E. J. (1974). Depth of processing in a frequency judgment task. Journal of
Verbal Learning and Verbal Behavior, 13, 638-643.
Rumelhart, D. E., & Zipser, D. (1985). Feature discovery by competitive learning.
Cognitive Science, 9, 75–112.
Russell, W. A. (1970). The complete German language norms for responses to 100
words from the Kent-Rosanoff word association test. In L. Postman, & G.
Keppel (Eds.), Norms of word association (pp. 53-94). New York: Academic
Press.
Russell, W. A., & Jenkins, J. J. (1954). The complete Minnesota norms for responses
to 100 words from the Kent-Rosanoff Word Association Test. University of
Minnesota.
Schaller, M., Asp, C. H., Rossell, M. C., & Heim, S. J. (1996). Training in statistical
reasoning inhibits formation of erroneous group stereotypes. Personality and
Social Psychology Bulletin, 22, 829-844.
Schaller, M., & Maass, A. (1989). Illusory correlations and social categorization:
Toward an integration of motivational and cognitive factors in stereotype
formation. Journal of Personality and Social Psychology, 56, 709-721.
Schimmack, U. (2002). Frequency judgments of emotions: The cognitive basis of
personality assessment. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency
processing and cognition (pp. 189-204). Oxford: University Press.
Schwarz, N. (1999). Frequency reports of physical symptoms and health behaviors:
How the questionnaire determines the results. In D. C. Park, R. W. Morrell, &
K. Shifren (Eds.), Processing medical information in aging patients: Cognitive
and human factors perspectives (pp. 93-108). Mahwah, NJ: Erlbaum.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A.
(1991). Ease of retrieval as information: Another look at the availability
heuristic. Journal of Personality and Social Psychology, 61, 195-202.
Schwarz, N., Hippler, H. J., Deutsch, B., & Strack, F. (1985). Response categories:
Effects on behavioral reports and comparative judgments. Public Opinion
Quarterly, 49, 388-395.
Schwarz, N., & Scheuring, B. (1992). Selbstberichtete Verhaltens- und
Symptomhäufigkeiten: Was Befragte aus Antwortvorgaben des Fragebogens
lernen. Zeitschrift für Klinische Psychologie, 22, 197-208.
Schwarz, N., & Wänke, M. (2002). Experiential and contextual heuristics in frequency
judgement: Ease of recall and response scales. In P. Sedlmeier, & T. Betsch
(Eds.), Etc.: Frequency processing and cognition (pp. 89-108). Oxford:
University Press.
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and
practical implications. Mahwah: Lawrence Erlbaum Associates.
Sedlmeier, P. (2002). Associative learning and frequency judgement. In P. Sedlmeier,
& T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 137-152).
Oxford: University Press.
Sedlmeier, P. (in press). Intuitive judgments about sample size. In K. Fiedler, & P.
Juslin (Eds.), In the beginning there is a sample: Information sampling as a key
to understand adaptive cognition. Cambridge: Cambridge University Press.
Sedlmeier, P., Betsch, T., & Renkewitz, F. (2002). Frequency processing and
cognition: introduction and overview. In P. Sedlmeier, & T. Betsch (Eds.), Etc.:
Frequency processing and cognition (pp. 1-17). Oxford: University Press.
Sedlmeier, P., & Gigerenzer, G. (1997). Intuitions about sample size: The empirical
law of large numbers? Journal of Behavioral Decision Making, 10, 33-51.
Sedlmeier, P., Hertwig, R., & Gigerenzer, G. (1998). Are judgments of the positional
frequencies of letters systematically biased due to availability? Journal of
Experimental Psychology: Learning, Memory, and Cognition, 24, 754-770.
Sedlmeier, P., & Renkewitz, F. (2004). Attentional processes and judgments about
frequency: Does more attention lead to higher estimates? Manuscript in
Preparation.
Sedlmeier, P., Wettler, M., & Seidensticker, P. (2004). Why Do We Associate
„Woman“ when we hear „man“? A computational model of associative
responses based on co-occurrences of words in large text corpora. Manuscript
in Preparation.
Shanks, D. R. (1995). The psychology of associative learning. Cambridge: University
Press.
Shanteau, J. (1978). When does a response error become a judgmental bias? Journal
of Experimental Psychology: Human Learning and Memory, 4, 579-581.
Shapiro, B. J. (1969). The subjective estimation of relative word frequency. Journal of
Verbal Learning and Verbal Behavior, 8, 248-251.
Shaughnessy, J. J., & Underwood, B. J. (1973). The retention of frequency information
for categorized lists. Journal of Verbal Learning and Verbal Behavior, 12, 99-107.
Silver, N. C., & Dunlap, W. P. (1987). Averaging correlation coefficients: Should
Fisher's Z transformation be used? Journal of Applied Psychology, 72, 146-148.
Slamecka, N. J. (1985). Ebbinghaus: Some associations. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 11, 414–435.
Smith, E., & Medin, D. (1981). Categories and Concepts. Cambridge, Mass.: Harvard
University Press.
Sumby, W. H. (1963). Word frequency and the serial position effect. Journal of Verbal
Learning and Verbal Behavior, 1, 443-450.
Taylor, S. E., Fiske, S. T., Etcoff, N. L., & Ruderman, A. J. (1978). Categorical and
contextual bases of person memory and stereotyping. Journal of Personality
and Social Psychology, 36, 778-793.
Tryk, H. E. (1968). Subjective scaling and word frequency. American Journal of
Psychology, 81, 170-177.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency
and probability. Cognitive Psychology, 4, 207–232.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and
biases. Science, 185, 1124-1131.
Varey, C. A., Mellers, B. A., & Birnbaum, M. H. (1990). Judgments of proportions.
Journal of Experimental Psychology: Human Perception and Performance, 16,
613–625.
Vereb, C. E., & Voss, J. F. (1974). Perceived frequency of implicit associative
responses as a function of frequency of occurrence of list items. Journal of
Experimental Psychology, 103, 992-998.
Wänke, M., Schwarz, N., & Bless, H. (1995). The availability heuristic revisited:
Experienced ease of retrieval in mundane frequency judgments. Acta
Psychologica, 89, 83-90.
Wasserman, E. A., & Miller, R. R. (1997). What's elementary about associative
learning? Annual Review of Psychology, 48, 573-607.
Watkins, M. J., & LeCompte, D. (1991). Inadequacy of recall as a basis for frequency
knowledge. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 17, 1161–1176.
Weber, E. U., Bockenholt, U., Hilton, D. J., & Wallace, B. (1993). Determinants of
diagnostic hypothesis generation: Effects of information, base rates, and
experience. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 19, 1151–1164.
White, B. J. (1991). Availability heuristic and judgments of letter frequency.
Perceptual and Motor Skills, 72, 34.
Williams, K. W., & Durso, F. T. (1986). Judging category frequency: Automaticity or
availability. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 12, 387-396.
Yates, J. F. (1990). Judgment and decision-making. Englewood Cliffs: Prentice-Hall.
Zacks, R. T., & Hasher, L. (2002). Frequency processing: A twenty-five year
perspective. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing
and cognition (pp. 21-36). Oxford: University Press.
Zacks, R. T., Hasher, L., & Sanft, H. (1982). Automatic encoding of event frequency:
Further findings. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 8, 106-116.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and
Social Psychology, Monograph Supplement, 9 (2, part 2), 1-27.
Zechmeister, E. B., King, J. F., Gude, C., & Opera-Nadi, B. (1975). Ratings of
frequency, familiarity, orthographic distinctiveness and pronounciability for
192 surnames. Instrumentation and Psychophysics, 7, 531-533.