Memory Processes in Frequency Judgment: The impact of pre-experimental frequencies and co-occurrences on frequency estimates

Dissertation for the academic degree of doctor rerum naturalium (Dr. rer. nat.), submitted to the Philosophische Fakultät (Faculty of Humanities) of the Technische Universität Chemnitz by Frank Renkewitz, born on 9 October 1970 in Issum

Chemnitz, 4 February 2004

Foreword

Many people supported me in many ways while I worked on this dissertation, and I do not want to miss the opportunity to thank them. Prof. Dr. Peter Sedlmeier and Prof. Dr. Manfred Wettler played a substantial part in my being able to develop and complete this dissertation in a consistently pleasant atmosphere. Whenever I ran into problems of content or organization during this work, they were available at any time. I owe thanks to Petra Seidensticker for programming the simulations reported in chapter 3. Alexander Bischof supported me in programming the experimental control software for some of the experiments. Conny Belger, Sonja Kunze, Susann Porstmann and Maya Reimer helped me to conduct the studies. I thank Johannes Hönekopp, Andreas Keinath, Martin Baumann and Gesine Heisterkamp for their willingness to engage in numerous and surely tiring discussions about frequency estimates, but above all for their exceptionally friendly support in all aspects of this work. This dissertation has undoubtedly benefited greatly from their suggestions, corrections and encouragement. Finally, my thanks go to Birgit Aufderheide and to Dieter and Christa Renkewitz, whose help extends far beyond my struggles with this dissertation and accompanies me always.

Contents

CHAPTER 1: JUDGMENTS ON FREQUENCIES: INTRODUCTION AND OVERVIEW

CHAPTER 2: IS THE FAMOUS-NAMES EFFECT CAUSED BY DIFFERENCES IN THE PRE-EXPERIMENTAL FREQUENCIES OF THE NAMES? A TEST OF A HYPOTHESIS OF MINERVA-DM AND PASS
  Summary
  Introduction
  Demonstrations of Faulty Estimates
  The Availability Heuristic as an Explanation for Biased Estimates
  Memory Models of Frequency Judgment
  Biased Estimates Reconsidered
  An Alternative Account for the Famous-Names Effect
  Overview of the Experiments
  Experiment 1
    Method
    Predictions of the Memory Models
    Results
    Discussion
  Experiment 2
    Method
    Predictions of the Memory Models
    Results
    Discussion
  Experiment 3
    Method
    Predictions of the Memory Models
    Results
    Discussion
  Experiment 4
    Method
    Predictions of the Memory Models
    Results
    Discussion
  General Discussion
  Conclusion

CHAPTER 3: EFFECTS OF ASSOCIATIONS ON FREQUENCY JUDGMENTS: SIMULATIONS WITH PASS AND EMPIRICAL RESULTS
  Summary
  Introduction
    What Kinds of Frequency Judgments can be modeled with PASS?
  The PASS-Model
    FEN – Architecture and Learning Rule
    The Generation of Frequency Judgments
    Former Simulations with PASS
  Overview of the Simulations and Experiments
  Study 1
    Method
    Simulation Method
    Simulation Results
    Experimental Results
    Discussion
  Study 2
    Method
    Simulation Method and Results
    Experimental Results
    Discussion
  General Discussion
    The Explanatory Power of PASS
    What about other Memory Models of Frequency Judgment?
    What Problems Remain in Predicting Frequency Estimates with PASS?
    Beyond Memory Assessment: Transforming Intuitions into Absolute Frequency Judgments

CHAPTER 4: SUMMARY AND OUTLOOK
  Can Memory Models Account For the Famous-Names Effect?
  The Influence of Associations on Frequency Judgments

REFERENCES

CHAPTER 1
Judgments on Frequencies: Introduction and Overview

Psychologists have shown a long-standing interest in people’s estimates of absolute and relative frequencies and in the processes underlying such estimates. One motivation for this interest is that judgments about frequencies permeate our daily life. For instance, to select the appropriate tariff of an Internet provider, we have to assess how often we usually use the Internet. Before we decide to leave the umbrella at home, we may wonder how often it rains in the afternoon although the sky is blue in the morning. Before we take out insurance, we will probably consider how often the misfortune in question may occur.
Judgments on the frequency of certain student mistakes may influence a teacher’s lesson plan. A doctor will often ask about the frequency of particular symptoms. Her diagnosis will be influenced by how often the symptoms under consideration have indicated a specific disease in her professional practice (Eraker & Politser, 1988; Weber et al., 1993).

Some of these examples already point to a second motivation for the interest in frequency processing: The experience of frequencies plays a significant role in numerous areas of thinking and judgment. To highlight the diversity of the cognitive processes that are affected by frequencies, let us name a few of them. The frequency of jointly occurring objects or features of objects is the decisive input for categorization and contingency learning (Hintzman, 1988; Shanks, 1995; Smith & Medin, 1981). Probability judgments can be based on the frequency with which certain events occurred in the past (Reyna & Brainerd, 1994; Sedlmeier, 1999; Tversky & Kahneman, 1973). In turn, judgments on probabilities and relative frequencies are an essential determinant of decisions (e.g. Yates, 1990). Valence judgments on a neutral stimulus are influenced by its presentation frequency (Zajonc, 1968; Bornstein, 1989). The extent of happiness in life depends, among other things, on the perceived frequency of positive and negative life events (Diener, Sandvik & Pavot, 1991). As a last example, frequencies affect various social judgments such as attributions (Kelley, 1973; Cheng, 1997; Cheng & Novick, 1992) or stereotyping (Hamilton & Gifford, 1976; Schaller & Maas, 1989; Schaller et al., 1996).

In line with the manifold effects of frequencies, the question of how people process frequencies and how they judge them on the basis of this processing has been examined in various sub-disciplines of psychology. Thus, there are numerous studies on frequency judgments in experimental social psychology (e.g.
Betsch et al., 1999; Manis et al., 1993; Schwarz et al., 1991), in developmental psychology (e.g. Fischbein, 1975; Ginsburg & Rapoport, 1967; Huber, 1993; Kuzmak & Gelman, 1986) or in the literature on survey methods (e.g. Blair & Burton, 1987; Conrad, Brown, & Cashman, 1998; Menon, Raghubir, & Schwarz, 1995). However, the longest tradition of research on frequency judgment can be found in the area of judgment and decision-making and in the literature on memory and learning. Unfortunately, methods, theoretical assumptions and results from one of those areas have often been neglected in the other. This led to the surprising fact that completely contradictory conclusions have been reached even on questions whose empirical clarification may appear simple at first sight. The most prominent example of this kind concerns the accuracy of frequency judgments.

A particular interest in the quality of frequency judgments existed chiefly in the area of decision-making. Here, research on frequency estimation was heavily influenced by the heuristics and biases program (Kahneman, Slovic, & Tversky, 1982). At the core of this program was the assumption that people use a toolbox of heuristics to solve the complex tasks of generating judgments on frequencies or probabilities and of making predictions on uncertain quantities. To explain frequency judgments on serially encoded events, which are the focus of this thesis, it was assumed for the most part that the availability heuristic is selected from the toolbox (e.g. Tversky & Kahneman, 1973; Beyth-Marom & Fischhoff, 1977; Lichtenstein et al., 1978). According to the availability heuristic, judgments about the frequency of an event can be based on the ease with which instances of that event can be brought to mind (Tversky & Kahneman, 1973).
Although it was assumed that the heuristic can lead to reasonable estimates under favorable conditions, the typical procedure for testing the availability hypothesis was to create an experimental situation in which the heuristic predicted that the estimates would not preserve the rank order of the actual event frequencies, and to check whether the predicted error occurred. This procedure produced several impressive demonstrations of biased frequency judgments. However, another consequence of this test strategy was that evidence for the heuristic stemmed almost exclusively from faulty estimates. Starting from these demonstrations of biases, the conclusion was drawn in the literature on judgment and decision-making that frequency estimates are generally error-prone. This conclusion gained enormous popularity far beyond the heuristics and biases program and entered numerous textbooks on cognitive psychology (e.g. Mayer, 1992; Anderson, 1996; Yates, 1990) and social psychology (e.g. Fiske & Taylor, 1991).

For a long time, the view that frequency judgments are susceptible to biases remained astonishingly unaffected by the fact that a large number of empirical studies appeared in parallel in the literature on memory and learning in which the participants showed a remarkable sensitivity to frequencies. Accurate frequency judgments were found for various events from everyday life such as letters, names, words, professions or karate techniques within training sessions (Attneave, 1953; Bedon & Howard, 1992; Shapiro, 1969; Tryk, 1968; Zechmeister et al., 1975). The same result was found in laboratory studies in which, even under difficult conditions, reasonable estimates were regularly given for the experimentally varied frequencies of stimuli previously presented on lists (e.g. Greene, 1984).
From the multitude of corresponding results, a consensus arose in the literature on memory and learning that directly contradicted the conclusion from the research on judgment and decision-making: It was generally assumed that frequency judgments ordinarily reflect the actual frequencies of the events in question fairly accurately (e.g. Peterson & Beach, 1967; Howell, 1973; Estes, 1976; Jonides & Jones, 1992). Hasher and Zacks (1984) presumably adopted the most explicit position of this kind when they formulated the “automatic coding hypothesis”. In this approach, frequency was regarded as one of the few attributes of events or objects that are registered automatically. Accordingly, the encoding of frequency information requires little or no attentional capacity and works without any awareness or intention (see also Hasher & Zacks, 1979; Zacks, Hasher, & Sanft, 1982; Zacks & Hasher, 2002). In this view, people will obviously not have to rely on heuristics to generate frequency judgments since the required information will regularly be stored in memory.

Within the memory literature, several precise computational models were suggested that allow the prediction of frequency judgments (e.g. Dougherty et al., 1999; Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1993; Sedlmeier, 1999). All these models incorporate the assumption that every event encoded with a minimum of attention causes a modification of the memory representation. Accordingly, in these models the state of the memory representation at the time of judgment can be used to produce a signal, the intensity of which depends on the frequency with which the event in question was encoded. Although the models differ clearly regarding their architecture, their specific representation assumptions and the processes with which the memory representation is accessed, they therefore make the same basic prediction: Frequency judgments will mostly reflect the actual frequencies fairly well.
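The shared prediction of these models can be illustrated with a deliberately simplified sketch. It assumes a toy trace store in which each presentation of an event leaves a trace with probability `encoding_prob`, and the memory signal is simply the number of stored traces. The names `encode_list`, `memory_signal` and `encoding_prob` are hypothetical, and the mechanism is not the architecture of any of the models cited above; it only shows how a signal can grow with encoding frequency.

```python
import random
from collections import Counter


def encode_list(events, encoding_prob=0.8, seed=0):
    """Return a toy memory: a count of stored traces per event.

    Assumption (illustrative only): each presentation leaves a trace
    with probability `encoding_prob`, independently of all others.
    """
    rng = random.Random(seed)
    memory = Counter()
    for event in events:
        if rng.random() < encoding_prob:
            memory[event] += 1
    return memory


def memory_signal(memory, event):
    """Signal intensity for `event`: the number of stored traces."""
    return memory[event]


if __name__ == "__main__":
    # Hypothetical study list with presentation frequencies 8, 4 and 2.
    study_list = ["oak"] * 8 + ["pine"] * 4 + ["elm"] * 2
    memory = encode_list(study_list)
    for word in ("oak", "pine", "elm"):
        print(word, memory_signal(memory, word))
```

With `encoding_prob=1.0` the signal equals the presented frequency; lower values make the signal noisier, although it still tends to increase with the number of presentations.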
In all models, this prediction is subject to one general restriction. Since the memory representation of an event is not a perfect copy of this event, regression to the mean is expected. Accordingly, small frequencies should be overestimated and large frequencies underestimated. In fact, regression was observed in numerous studies (e.g. Attneave, 1953; Fiedler, 1991; Hintzman, 1969; Varey et al., 1990). Nowadays there is hardly any dissent that the fundamental predictions of the memory models correctly describe the typical findings on frequency judgments (e.g. Sedlmeier et al., 2002; Fiedler, 1996): estimates are mostly calibrated fairly well, but regressed.

However, this does not mean that the models assume that the memory for frequencies is invariant. On the contrary, it is expected that factors affecting the encoding or representation of events also have an effect on frequency judgments on these events. In research on memory models, recognition judgments have often been regarded as a special case of frequency judgments (in which only the frequencies 0 and not-0 are differentiated). In line with the expectation of the models, several factors were indeed identified that exert a reliable influence on both kinds of judgments, such as item similarity (Hintzman, 1988), depth of encoding (Greene, 1988) or context effects (Brown, 1995). Moreover, recency effects (Lopez et al., 1998; Sedlmeier, 1999) as well as list length and spacing effects (Hintzman, 1988) in frequency judgments were successfully predicted or explained with memory models. Finally, phenomena beyond simple frequency or recognition judgments could also be simulated, such as people’s confidence in their frequency estimates (Sedlmeier, 1999).

Altogether, research on frequency judgments within the memory paradigm was very fertile both empirically and theoretically.
Empirically, researchers succeeded in finding a series of stable phenomena that allow conclusions on how memory processes frequencies. Theoretically, models of memory were developed that are able to account for these phenomena and that have an explanatory scope which is often not confined to frequency estimates. Contemporary research within the memory paradigm primarily aims at the further evaluation of these models. This is mainly evidenced by studies in which new predictions about memory factors influencing frequency estimates are derived and tested (e.g. Murdock, 1999; Hintzman, 2001; Dougherty & Franco-Watkins, 2003).

For a long time, however, the theoretical approaches from memory research were not applied to the biases described in the literature on judgment and decision-making. This has changed only recently. Later memory models no longer ignored the demonstrations of faulty estimates from the heuristics and biases program. Dougherty, Gettys and Ogden (1999) suggested MINERVA-DM. The aim of this model is to preserve the advantages of earlier memory models and to explain a multitude of judgmental biases at the same time. Within the PASS model (Sedlmeier, 1999), attempts have also been made to explain some of the demonstrations of faulty frequency estimates on serially encoded events by memory processes. These integrative approaches thus challenge the assumption that the biased frequency estimates observed in the judgment and decision-making literature must be ascribed to the use of the availability heuristic.

The present dissertation includes two series of studies that fit into the two trends in memory research on frequency estimates described above: In chapter 2 I tested a memory-model-based explanation of one of the best known and most quoted biases, the famous-names effect.
Tversky and Kahneman (1973) demonstrated that after the presentation of a list with a similar number of female and male names, the frequency of the sex that was represented on the list by the more famous names is overestimated. Since the more famous names were also recalled better, this effect was usually explained with the availability heuristic. An alternative explanation was recently suggested both in MINERVA-DM (Dougherty et al., 1999) and in PASS (Sedlmeier, 1999). In both models it is assumed that the property of famous names eliciting the bias is their higher pre-experimental frequency of occurrence. The memory models suggest furthermore that people are not able to discriminate perfectly between different encoding contexts of the same event. Thus, if the memory representation is accessed to estimate the experimental frequency of famous names, then the resulting memory signal will be influenced by the extra-experimental occurrence of the names. More famous names thus elicit a stronger signal than less famous names. In this view, the assumption that single exemplars are retrieved is not necessary to explain the famous-names effect. An implication of this explanation is that an analogous effect should occur with stimulus material other than famous names: If the frequency of two categories is to be judged, then the category whose exemplars occur more often outside the target context should be overestimated. I tested this hypothesis in four experiments. The predictions of the memory models were compared with the predictions of a recall-estimate version of the availability heuristic. This procedure makes it possible to decide whether the occurrence of a bias is indeed caused by the processes described in the memory models or by the use of the availability heuristic.

In chapter 3 new predictions about a factor influencing the memory for frequencies are derived from the PASS model.
Earlier studies indicate that estimates of the experimental frequency of a word increase with the number of jointly presented associated words (Leicht, 1968; Shaughnessy & Underwood, 1973; Vereb & Voss, 1974). PASS can offer an explanation for the effect of the factor ‘association’. In this memory model the ability to judge the frequencies of serially encoded events is regarded as a by-product of associative learning. Accordingly, knowledge about the frequencies of events is represented in memory in the form of associative strengths between the features constituting these events. The acquisition of these associative strengths is modeled by a neural net that successively encodes events. In this process, PASS also acquires associations between events (or objects). Sedlmeier, Wettler and Seidensticker (2004) demonstrated that this property of the model can be successfully used to predict human primary associations to stimulus words if the model has encoded an adequately large amount of text beforehand. This indicates that PASS can map the associative strengths between words that the participants had already acquired before the experiment. Other memory models of frequency judgments do not have this ability. To the best of my knowledge, PASS is thus the only model able to describe the acquisition of associations between words, the experimental encoding of these words and, finally, the generation of frequency estimates on these words. In simulations with PASS, I used this capability of the model to gain precise predictions about the effect of associations between words on frequency estimates. These predictions were tested in two experiments in which the participants were presented with lists containing words that were either strongly or weakly associated with each other. The results indicate that associative learning is more than just a plausible candidate for the memory processes underlying frequency estimates.
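As a rough illustration of why associations could raise frequency estimates under an associative account, the following toy sketch adds to a word's own presentation count a share of the counts of associated list words. The function `associative_signal`, the strength table `assoc` and all numeric values are hypothetical. This is not the PASS learning rule or its judgment mechanism; it is only a minimal sketch of the general idea that pre-experimental associations can feed into the signal for a target word.

```python
def associative_signal(word, presented_counts, assoc):
    """Toy memory signal for `word` after a list presentation.

    presented_counts: dict mapping each list word to its presentation count.
    assoc: dict mapping (word, other) pairs to a pre-experimental
           associative strength between 0 and 1 (hypothetical values).
    """
    own = presented_counts.get(word, 0)
    # Activation spreading from other list words, weighted by association.
    spread = sum(
        assoc.get((word, other), 0.0) * count
        for other, count in presented_counts.items()
        if other != word
    )
    return own + spread


if __name__ == "__main__":
    assoc = {("doctor", "nurse"): 0.5, ("doctor", "hospital"): 0.25}
    related_list = {"doctor": 3, "nurse": 2, "hospital": 4}
    unrelated_list = {"doctor": 3, "table": 2, "river": 4}
    print(associative_signal("doctor", related_list, assoc))
    print(associative_signal("doctor", unrelated_list, assoc))
```

In this sketch the target word is presented three times in both lists, yet its signal is larger in the list that also contains associated words.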
In chapter 4 the results of both series of studies are summarized briefly and some tentative guidelines for future research are suggested.

CHAPTER 2
Is the Famous-Names Effect Caused by Differences in the Pre-Experimental Frequencies of the Names? A Test of a Hypothesis of MINERVA-DM and PASS

Summary

One of the best-known demonstrations of biased frequency judgments is the famous-names effect (Tversky & Kahneman, 1973). If participants are presented with a list of names of female and male persons, then the frequency of the sex that was represented by the more famous names on the list is overestimated. Usually, this finding is explained with the availability heuristic. According to this heuristic, frequency judgments are based on the ease of retrieval of relevant exemplars. An alternative explanation of the effect was suggested within the framework of the models MINERVA-DM (Dougherty, Gettys, & Ogden, 1999) and PASS (Sedlmeier, 1999). These models postulate that frequency judgments are generated with a retrieval-free memory assessment process. It is assumed in both models that the different pre-experimental frequency of famous and non-famous names is responsible for the effect. If this is correct, the famous-names effect should generalize to any stimulus material: When the frequency of two categories in a target context is judged, the category whose exemplars occur more often outside the target context should be overestimated. We tested this hypothesis in four experiments. The predictions of the memory models were compared with the predictions of a recall-estimate version of the availability heuristic. We found that differences in the frequencies of exemplars are not sufficient to produce a bias in category size judgments. The results suggest furthermore that an overestimation of a category only occurred if the participants based their judgments at least partially on the number of retrieved exemplars.
However, the participants did not do so in all experiments. We conclude that the entire result pattern is explained best within a multiple strategy perspective (Brown, 2002) and name factors that may foster the use of a recall-estimate strategy.

Introduction

How accurately can people estimate the absolute and relative frequencies of events? Surprisingly, two contrasting answers to this question can be found in the literature. In the area of research on judgment and decision-making, frequency estimates are generally described as error-prone. This point of view originated from the heuristics and biases program (Kahneman, Slovic, & Tversky, 1982), in which several experimental demonstrations of faulty estimates were produced (e.g. Tversky & Kahneman, 1973; 1974). Theoretically, these demonstrations were considered as evidence for the assumption that people use heuristics to simplify and solve the overly demanding task of frequency estimation. Both the conclusion that frequency judgments will often be biased and the explanation that this is caused by the application of heuristics were widely received and gained acceptance in various sub-disciplines of psychology.

However, a completely different point of view on the accuracy of frequency estimates and the processes underlying such estimates can be taken from the literature on memory and learning. Here, frequency judgments were mostly explained with precise memory models (e.g. Hintzman, 1969; 1988; Murdock, 1993; Dougherty, Gettys, & Ogden, 1999; Sedlmeier, 1999). In these models it is generally assumed that the memory representation of an event changes with the frequency with which the event was encoded. Furthermore, it is assumed that people are usually able to access the memory representation in a way that allows them to generate frequency estimates. Thus, according to the memory models, it is at least not necessary to apply heuristics. Instead, people should mostly be able to estimate frequencies with little effort (e.g.
Brown, 2002) and fairly accurately (e.g. Hasher & Zacks, 1984). In line with the assumptions of the models, a multitude of studies were published in the memory literature in which the participants showed a remarkable sensitivity to frequencies. This held both for estimates of naturally varying frequencies from everyday life (e.g. Attneave, 1953; Bedon & Howard, 1992; Shapiro, 1969; Tryk, 1968; Zechmeister et al., 1975) and for estimates of experimentally varied frequencies in laboratory studies (e.g. Greene, 1986; Jonides & Naveh-Benjamin, 1987; Hockley & Christi, 1996). Furthermore, laboratory studies showed that quite accurate estimates can be expected both in so-called frequency of occurrence judgments (e.g., Flexser & Bower, 1975; Hasher & Chromiak, 1977; Hintzman, 1969; Hintzman & Block, 1971) and in so-called set size judgments (e.g., Alba, Chromiak, Hasher, & Attig, 1980; Barsalou & Ross, 1986; Brooks, 1985; Watkins & LeCompte, 1991). In frequency of occurrence judgments the required estimate refers to the repeated occurrence of the same item (e.g. “How often did the word ‘oak’ appear in the list you just studied?”). In set size judgments the correct frequency estimate results from the number of different exemplars of a given category that were presented (e.g. “How many different kinds of trees appeared on the list?”). Consequently, within the memory literature the findings on frequency estimates were often summarized in a way that was opposed to the conclusions from the area of judgment and decision-making. It was generally inferred that people are mostly able to judge event frequencies with great fidelity (e.g. Peterson & Beach, 1967; Howell, 1973; Estes, 1976; Hasher & Zacks, 1984; Jonides & Jones, 1992).

Demonstrations of Faulty Estimates

In consideration of the recurrently observed valid frequency estimates, one may wonder how the assumption that frequency estimates are generally error-prone could gain acceptance in many areas of psychology.
Part of the answer can presumably be found in the great popularity of the comparatively few demonstrations of inaccurate estimates. Below, three of these demonstrations are described.

In the letter study, Tversky and Kahneman (1973) asked their participants to estimate the relative frequency of certain letters in the first and third position of words of the English language. The participants were first asked to specify whether a given letter appears more often in the first or the third position. Afterwards, they were asked to estimate the ratio of the frequencies of occurrence of the letter in question in both positions. Tversky and Kahneman used the five consonants k, l, n, r and v, all of which, in contrast to the majority of consonants, appear more often in the third position. However, the participants did not recognize this. For all five letters the majority of participants estimated that they occur more frequently in the initial position. The median of the estimated ratio of frequencies in the first and third position was 2 : 1 for all letters.

Another study in which the participants at least partially gave inaccurate estimates was conducted by Lichtenstein, Slovic, Fischhoff, Layman and Combs (1978). These authors asked their participants for judgments on the frequencies of 41 different causes of death. Generally, the participants demonstrated sensitivity to the frequencies of these causes of death: the median of the correlations of the individual estimates with the actual frequencies amounted to approximately r = .70 in several experimental groups. Nevertheless, the estimates deviated systematically from the normative responses in two different respects. First, the estimates were regressed, that is, large frequencies were underestimated and small frequencies overestimated. Regression was also observed in many other older as well as more recent studies (e.g.
Attneave, 1953; Erlick, 1964; Greene, 1984; Varey, Mellers, & Birnbaum, 1990) and has to be regarded as a general qualification of the view that frequency estimates correspond fairly precisely to the actual frequencies. Therefore, the finding that the estimates for some causes of death also consistently deviated from the regression line in several experimental groups was far more surprising. The frequencies of deaths caused by catastrophic events that are typically reported intensively in the media were overestimated compared to the values predicted by the regression. Examples are tornadoes, floods and homicides, but also cancer. Less salient causes of death, such as asthma, diabetes or strokes, on the other hand, received lower estimates than predicted.

A third very well known demonstration of biased estimates is the famous-names study (Tversky & Kahneman, 1973). In this study the participants were presented with a list of 39 names. The list contained either 19 names of very famous men and 20 names of less famous women or, exactly the reverse, 20 names of less famous men and 19 names of very famous women. Tversky and Kahneman asked exclusively for an ordinal frequency judgment after the list presentation. In other words, the participants were asked to indicate whether the list contained more men’s or more women’s names. This frequency judgment turned out to be clear-cut: 81% of the participants judged erroneously that the sex represented with more famous names on the list appeared more often.

The Availability Heuristic as an Explanation for Biased Estimates

Tversky and Kahneman (1973; 1974) suggested several heuristics to account for different phenomena observed in judgments on frequencies and probabilities and in predictions of uncertain quantities. For the case of frequency judgments on serially encoded events it was mostly assumed that people rely on the availability heuristic. This was also true for the demonstrations of faulty estimates reported here.
According to the availability heuristic, frequency judgments can be obtained by assessing the “ease of retrieval”. Thus, when this heuristic is applied, frequency estimates on an event or a category should be based on how easily relevant instances or exemplars can be recalled from memory. The biases observed in frequency judgments were explained by the assumption that availability is influenced not only by the objective frequency of the event in question but also by a series of further factors.

Examples of such factors can be taken from the study of frequency judgments on causes of death. Lichtenstein et al. (1978) demonstrated that the over- and underestimation of the frequencies of different causes of death could be predicted fairly well by means of two variables. These variables were the frequency with which these causes are reported in newspapers and ratings of the catastrophic quality of these causes. While no direct availability measure was collected in the study, the authors assumed that these variables influence the ease of retrieval of the causes of death. Correspondingly, they regarded their results as being consistent with the view that the frequency judgments were generated with the availability heuristic.

The results of the letter study were explained in a similar way. Here as well, availability was not measured independently. Tversky and Kahneman, however, claimed correctly (see Sedlmeier, Hertwig, & Gigerenzer, 1998) that words beginning with a certain consonant could be retrieved from memory more easily than words in which this consonant occurs in the third position. The observed overestimation of the frequencies of consonants in the initial position could thus be attributed to the application of the availability heuristic.

In contrast to the other studies, a direct measure of the ease of retrieval was used in the famous-names study. Here, each list of names was presented to two groups of participants.
Subsequent to the presentation, one group gave the ordinal frequency judgments already described. The participants of the other group were asked to recall as many names on the list as possible. In accordance with the hypothesis that the frequency estimates were generated by means of the availability heuristic, it could be shown that the more famous names were easier to retrieve. On average, 12.3 of the famous names and 8.4 of the less famous names were recalled. Approximately 85% of the participants recalled more or at least as many names of the sex represented with the more famous names on the list.

Tversky and Kahneman (1973) suggested that availability could generally be measured by means of the number of retrieved instances, as in the famous-names study. In addition, they assumed that people judging the frequency of an event can assess the availability of this event in a similar way. According to the proposed mechanism, frequency estimates based on availability can be generated by recalling as many instances of the event as possible within a certain period of time. The number of recalled instances then serves as the basis for the frequency estimate for this event. The measurement of availability by the number of recalled instances became a widespread practice in further research (e.g. Beyth-Marom & Fischhoff, 1977; Barsalou & Ross, 1986; Williams & Durso, 1986; Bruce, Hockley, & Craik, 1991; Brooks, 1985). The corresponding cognitive mechanism received different labels, such as exemplar-retrieval hypothesis (Greene, 1989), recall-estimate theory (Watkins & LeCompte, 1991), enumeration (Brown, 1995) or availability-by-number (Sedlmeier, Hertwig, & Gigerenzer, 1998).

One reason why later researchers preferred different labels for this mechanism was that the notion of the availability heuristic was originally phrased in a way that is consistent with several different cognitive mechanisms (Fiedler, 1983; Schwarz & Wänke, 2002; Betsch & Pohl, 2002).
Thus, Tversky and Kahneman (1973) emphasized that a person using this heuristic can alternatively assess availability without performing a retrieval operation. However, they did not specify the process that allows for such a retrieval-free judgment. In subsequent research no generally accepted measure of availability evolved that can be collected independently of a recall test. The vague phrasing of the availability heuristic thus left room for manifold interpretations of this notion. In the later literature very different forms of memory assessment were suggested and yet likewise described as possible availability mechanisms, among them assessment of processing fluency (Reber & Zupanek, 2002), subjective experience of ease-of-retrieval (Schwarz et al., 1991; Wänke, Schwarz, & Bless, 1995), assessment of familiarity (Dougherty, Gettys, & Ogden, 1999) or assessment of associative activation (Sedlmeier, 1999). It is evident that the availability hypothesis cannot be tested on the basis of such an ambiguous definition (Fiedler, 1983; Betsch & Pohl, 2002). Thus, the term “availability” can hardly be used in a reasonable manner. In the following investigations of the famous-names effect and its ability to generalize to different stimulus material we will instead use the term recall-estimate strategy. This term is meant solely to signify a cognitive process for estimating frequencies in which an attempt is first made to retrieve relevant instances or exemplars from memory. The estimate is then based on the size of the recalled sample. Correspondingly, to derive the predictions of this strategy for frequency estimates we will use the number of retrieved exemplars in a recall test, analogous to the original famous-names study.

Memory Models of Frequency Judgment

As already alluded to, in the memory literature several detailed computational models were suggested that can be used to explain frequency judgments (e.g.
Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1993; Fiedler, 1996; Dougherty, Gettys, & Ogden, 1999; Sedlmeier, 1999). All these models share the assumption that every additional encoding of an event that occurs with a minimum of attention has an effect on the memory representation. Accordingly, in these models the state of the memory representation at the time of judgment can be used to produce a signal whose intensity depends on the frequency of the event in question. Depending on the specific model, this signal is called trace strength (Hintzman, 1969), familiarity (Hintzman, 1988; Dougherty, Gettys, & Ogden, 1999) or associative response (Sedlmeier, 2002). Generally, it is considered an equivalent to intuitions about event frequencies. All models thus also share the assumption that frequency judgments can be generated without retrieving single instances. However, the models clearly differ with regard to their specific representation assumptions and, as a result, also with regard to the process by which the memory representation is accessed. According to these criteria, the models can be subdivided into two categories: multiple-trace models (sometimes also termed exemplar models; Estes, 1986) and associationist models (sometimes also termed prototype models; Smith & Medin, 1981).

In multiple-trace models a trace is added to an indefinitely increasing memory matrix for each newly encoded event. MINERVA 2 (Hintzman, 1988) is the model of this kind that was most often used to explain frequency judgments. In this model, a trace with which an event is represented in memory corresponds to a vector of features. Possible feature values are “1” (feature present), “-1” (feature not present) and “0” (feature either irrelevant or unknown). Features of an event are stored correctly in memory with the probability L and are coded as unknown with the probability 1 - L.
To elicit a frequency judgment, memory is probed with the vector that represents the event in question. In order to determine the “response” of memory to this probe, the similarity between each trace and the probe is determined. The similarity of a single trace to the probe results from the scalar product of the two feature vectors¹. The familiarity signal used to model frequency judgments corresponds to the sum of the cubed similarities of all traces. Provided that all other variables remain unchanged, a more frequent event will thus elicit a greater familiarity signal than a less frequent event since it is represented in memory with more traces.

In associationist models, on the other hand, events are not stored in the form of separate traces. An example of an associationist model is PASS (Probability ASSociator) (Sedlmeier, 1999; 2002). PASS is based on a neural network that acquires frequency knowledge by associative learning. The network consists of one layer of units and weights connecting these units with each other². A given unit is activated each time the appropriate feature is present in an event that is to be encoded. The input format for PASS therefore consists of vectors, similar to the input format for MINERVA 2, in which the values 1 and 0 indicate the presence or absence of certain features in events. To encode an event, the weights (or associative strengths) between the units are modified. The associative strength between two units is increased if both corresponding features are present in the event.

¹ More precisely, the similarity is given by the scalar product divided by the number of features that are not 0 in both vectors.

² Sedlmeier (1999) could demonstrate that different network architectures lead to similar simulation results. PASS thus includes a family of neural networks. The present description refers to a simple one-layered network called FEN II (see also Sedlmeier, 2002).
If only one or neither feature is present, the associative strength between the units is decreased. The extent of the modification of the weights in each encoding is regulated by a learning parameter, an interference parameter and a decay parameter. To elicit a frequency judgment, PASS is presented with a featural description of the event in question. The activation which this event triggers at the corresponding units spreads across all units of the network according to the previously acquired associative strengths. The sum of the activation in all units is called the associative response and is considered the basis for frequency judgments. Provided that all remaining potentially relevant variables are constant, more frequent events will elicit a stronger response than less frequent ones since higher associative strengths exist between the units that encode their features.

In spite of their different architectures, multiple-trace models and associationist models often make similar or identical predictions. Thus, both MINERVA 2 and PASS were used to simulate the typical finding of quite accurate but regressed frequency judgments. In addition, memory models have been successful in explaining the influence of diverse factors on frequency judgments. For instance, recency effects (Lopez et al., 1998; Sedlmeier, 1999), list-length effects and spacing effects (Hintzman, 1988) in frequency judgments could be simulated with these models.

Biased Estimates Reconsidered

Until recently, however, no attempt was made to explain biased estimates within the scope of memory models. Just as the memory literature had little influence on research in the area of judgment and decision-making, the results and explanatory approaches from the heuristics and biases program were hardly addressed in memory research. This only changed in the course of the 1990s.
Researchers in both subdisciplines started to carry out studies in which predictions of the availability heuristic were tested against predictions of the memory models (e.g. Betsch et al., 1999; Manis et al., 1993; Sedlmeier, Hertwig, & Gigerenzer, 1998; Watkins & LeCompte, 1991; Williams & Durso, 1986). Additionally, theoretical attempts to integrate the findings from both research paradigms emerged. An example is MINERVA-DM (with DM standing for decision making), a modified version of MINERVA 2 that was suggested by Dougherty, Gettys and Ogden (1999). This model could be used successfully to simulate both accurate and biased estimates.

However, the greater consideration of results from the area of judgment and decision-making in the memory literature also resulted in criticism of those studies demonstrating biases in frequency judgments (e.g. Gigerenzer, 1991). In the case of the letter study (Tversky & Kahneman, 1973), this criticism was mainly empirically founded. The finding of a general overestimation of the frequency of occurrence of letters in the initial position of words could not be replicated in subsequent studies. White (1991) reported in a short article that his participants correctly judged the majority of the consonants also used by Tversky and Kahneman as occurring more often in third position. Sedlmeier, Hertwig and Gigerenzer (1998) used a larger sample of letters. The judgment task of their participants was to estimate the relative frequency of these letters in the initial and second position of German words. Additionally, to measure the availability of words with these letters in the relevant positions, a recall test was carried out. The result of this recall test confirmed Tversky and Kahneman’s assumption: Irrespective of how often the letters actually occur in initial and second position, the participants generally retrieved more words beginning with a given letter than words showing this letter in second position.
However, the result of the judgment task contradicted the recall-estimate hypothesis. In three successive studies, letters occurring more often in second position were mostly also judged to appear more often in this position. Correspondingly, the covariation between the frequency judgments and the number of retrieved words was low. On the other hand, the estimates could be predicted very well by the actual positional frequencies of the letters. The only “bias” evident in the judgments consisted of the well-known phenomenon of regression to the mean.

Criticism of the study of judgments on the frequencies of causes of death (Lichtenstein et al., 1978) primarily referred to the normative standard with which the estimates were compared. In a comment on the study, Shanteau (1978) already pointed out that most people hardly have any immediate experience with the repeated occurrence of a greater number of causes of death. However, if the participants had no opportunity to learn the frequencies normatively regarded as correct, it is hardly surprising that their estimates deviated from the comparison standard at least in some cases. Therefore, he considered it tenuous to suggest a psychological interpretation of the supposed judgment errors in terms of the availability heuristic. That possibly no special mechanism is necessary to explain the errors observed in this study follows from the fact that the systematic over- and underestimations of the frequencies of different causes of death were closely related to the frequency with which newspapers reported on these causes. This leaves open the interpretation that the estimates did indeed not correspond to the actual facts but reflected the frequencies with which the causes of death were presented to the participants (Sedlmeier, 1999; 2002). In this view, the result of the study by Lichtenstein et al.
(1978) would correspond to the usual finding in research on frequency judgments: The participants gave quite accurate but regressed estimates of the frequencies with which they encoded the stimuli. This finding could in turn easily be explained by memory models.

Unlike the letter study and the study of frequency judgments on causes of death, the famous-names study (Tversky & Kahneman, 1973) was not called into question by later research as a demonstration of a reliable bias. It is normatively evident that a systematic error is present if the large majority of participants judge the category that was presented less frequently to be the more frequent one. Empirically it could be shown that the famous-names effect is replicable. In several experiments, Lewandowsky and Smith (1983) presented their participants with lists containing a varying number of famous names of the one sex and non-famous names of the other sex. Subsequent to the list presentation, the frequency of the “famous sex” was consistently overestimated. A study by Manis et al. (1993) in which 20 famous and 20 less famous names of both sexes were presented yielded the same result.

An Alternative Account for the Famous-Names Effect

However, later research did call into question Tversky and Kahneman’s original explanation for this bias, according to which the participants used a recall-estimate strategy to generate their estimates. Dougherty, Gettys and Ogden (1999) and Sedlmeier (1999) suggested virtually the same alternative account for the famous-names effect on the basis of memory models. Accordingly, these authors assume that the famous-names effect is mediated by a cognitive process for which the retrieval of exemplars from the list of names is without relevance.

The starting point of the alternative explanation is the assumption that the essential feature of famous names is their increased frequency of occurrence outside the experiment. Consider as an example the famous name “Marilyn Monroe”.
Every participant of a study in which this name is used as a stimulus will probably have come across the name “Marilyn Monroe” several hundred times before the study in magazines, films, books, on posters, in conversations, and in radio and television broadcasts. Less famous or non-famous names, on the other hand, will rarely or not at all have been encoded before the experiment. Dougherty and colleagues (1999) and Sedlmeier (1999) suppose furthermore that the participants are not able to discriminate perfectly between the occurrence of stimuli inside and outside the experiment. Thus, when memory is probed for the frequency of famous and less famous names after the list presentation, the memory response will be influenced, to some degree, by the pre-experimental frequency of these names. In this way, the famous names that were presented less often in the experiment could receive a higher response and thus a higher estimate than the less famous names.

Dougherty et al. (1999) modeled the assumed process with MINERVA-DM³. In each of their simulations the memory matrix contained 39 traces, corresponding to the 19 famous and 20 less famous names presented in the experiment. In order to assess the frequency with which the participants may have encoded the names pre-experimentally, the authors determined the citation frequency of actors they judged to be highly famous or less famous, both male and female, in some major US newspapers. Famous male actors were mentioned 46 times more often in these newspapers than less famous male actors. For famous and less famous female actors the ratio amounted to 26 : 1. According to these ratios, in two simulations either 46 or 26 additional extra-experimental traces were stored in memory for every famous name. For each less famous name the memory matrix was supplemented with one extra-experimental trace.

³ For the current purpose, the differences between MINERVA 2 and MINERVA-DM are irrelevant. Our description of MINERVA 2 is thus a valid description of the model which formed the basis of the simulations described below.

Dougherty and colleagues now assume that the extent to which the pre-experimental occurrence of stimuli influences estimates of the experimental frequency of these stimuli depends on the contexts in which the stimuli were encoded. The influence of pre-experimental events is supposed to be the greater, the more similar the context in which these events were encoded is to the experimental context (see also Dougherty & Franco-Watkins, 2003; Johnson, Hashtroudi, & Lindsay, 1993). To integrate this “context-discrimination mechanism” (Dougherty & Franco-Watkins, 2002) into the simulation, every trace in memory consisted of two mini-vectors. One of these mini-vectors represented a name; the second mini-vector represented the respective encoding context of the name. Correspondingly, each of the 39 intra-experimental traces was complemented by the same (experimental) context mini-vector. The context mini-vector for each extra-experimental trace, on the other hand, was generated randomly. The memory probes used to elicit frequency judgments of the model included the experimental context component. Therefore, intra-experimental traces were considerably more similar to the probes than extra-experimental traces. Consequently, every single extra-experimental trace could increase the resulting familiarity signal only by a comparatively small amount. Due to their large number, the extra-experimental traces of famous names nevertheless affected the simulation result decisively. On average, famous names received a clearly more intense familiarity signal than less famous names. After the summation of the familiarity signals for all famous and less famous names, MINERVA-DM predicted a large overestimation of the frequency of famous names on the experimental study list.
In the simulation with 26 extra-experimental traces for each famous female actor, the model expected an estimate of approximately 70% for the proportion of famous names on the study list. In the simulation with 46 extra-experimental traces for each famous name, MINERVA-DM even predicted an estimate of about 80% for this proportion. In contrast to this, the model estimated the actual relative frequency of 48.7% of famous names on the study list approximately correctly in a third simulation in which no extra-experimental traces were included in the memory matrix. Dougherty et al. (1999) concluded from these results that “[the simulations] have shown that availability effects can be accounted for by a global familiarity signal without assuming recall is taking place” (p. 193).

Sedlmeier (1999) pointed out that the famous-names effect could be explained within the scope of the PASS model, again due to the higher pre-experimental frequency of famous names. While he did not carry out a simulation demonstrating that the model predicts this effect, he outlined what such a simulation could look like. The suggested procedure is identical with the simulation method used by Dougherty and colleagues. In a first step PASS encodes the pre-experimental occurrences of famous and less famous names together with their respective contexts. After that, every name on the study list is encoded a further time, now with the experimental context identical for all names. At the time of judgment, memory is probed with the 39 experimentally presented names. PASS’ estimates of the proportion of famous names on the study list result from the sum of the associative responses to the 19 famous names in relation to the sum of the responses to the 20 less famous names.
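The logic of the simulation procedure described above for MINERVA-DM can be illustrated with a minimal sketch. It is not the implementation used by Dougherty et al. (1999): the function names, the vector lengths (20 name features and 20 context features), the learning rate L = 1 and the random seed are assumptions made only for illustration, and the similarity normalization follows one reading of footnote 1 (division by the number of features that are nonzero in at least one of the two vectors). With these arbitrary settings the sketch should not be expected to reproduce the reported estimates of about 70% and 80%; it only shows the direction of the effect.

```python
import random


def random_vector(n, rng):
    """Random feature vector with values +1 / -1 (hypothetical stimulus coding)."""
    return [rng.choice((1, -1)) for _ in range(n)]


def encode(vector, learning_rate, rng):
    """Store each feature correctly with probability L, otherwise as 0 (unknown)."""
    return [f if rng.random() < learning_rate else 0 for f in vector]


def similarity(probe, trace):
    """Scalar product divided by the number of relevant features.

    Assumption: a feature is relevant if it is nonzero in the probe or the trace.
    """
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def familiarity(probe, memory):
    """Familiarity signal: sum of the cubed similarities of all traces."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


def simulate(extra_famous, extra_less_famous=1, n_famous=19, n_less_famous=20,
             n_name=20, n_context=20, learning_rate=1.0, seed=0):
    """Return the model's estimate of the proportion of famous names on the list."""
    rng = random.Random(seed)
    exp_context = random_vector(n_context, rng)
    famous = [random_vector(n_name, rng) for _ in range(n_famous)]
    less_famous = [random_vector(n_name, rng) for _ in range(n_less_famous)]

    memory = []
    for names, n_extra in ((famous, extra_famous), (less_famous, extra_less_famous)):
        for name in names:
            # One intra-experimental trace: name plus the shared experimental context.
            memory.append(encode(name + exp_context, learning_rate, rng))
            # Extra-experimental traces: same name, randomly generated context.
            for _ in range(n_extra):
                context = random_vector(n_context, rng)
                memory.append(encode(name + context, learning_rate, rng))

    def summed_familiarity(names):
        # Each probe consists of a name plus the experimental context.
        return sum(familiarity(name + exp_context, memory) for name in names)

    fam = summed_familiarity(famous)
    less = summed_familiarity(less_famous)
    return fam / (fam + less)
```

Calling `simulate(0, 0)` corresponds to the simulation without extra-experimental traces, while `simulate(26)` and `simulate(46)` correspond to the two simulations with additional traces for every famous name and one additional trace for every less famous name. In this sketch the estimated proportion of famous names rises with the number of extra-experimental traces although every name appears only once on the study list.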
Thus, the only difference to the explanation of the famous-names effect with MINERVA-DM consists in the fact that in PASS the repeated (pre-experimental) occurrence of names is not represented by additional traces in a memory matrix but by changed associative strengths in a neural network. Higher associative responses for famous names than for less famous names are to be expected in this model since the associative strengths between the units of the network coding the features of a famous name should already be higher pre-experimentally. The simulation method suggested by Sedlmeier would undoubtedly lead to the result that PASS overestimates the frequency of famous names on the study list.

However, it should be noted that this simulation method is definitely not suitable for mapping the cognitive process with which the participants judged the frequency of famous and less famous names in the original study. The same applies to the simulation method used by Dougherty and colleagues to demonstrate that MINERVA-DM expects a famous-names effect. In both simulations all 39 names from the study list were used successively as probes to elicit frequency judgments of the memory models. The participants of the original study, however, were not presented again with all names from the study list at the time of judgment. As a result, they could not access memory by the individual names. By using individual names as probes, the simulations primarily demonstrate that memory models expect an overestimation of famous names in the case of frequency of occurrence judgments. According to this prediction, participants should, for instance, overestimate the experimental frequency of “Marilyn Monroe” relative to a less famous name if both names are included in a study list equally often. In the famous-names study, on the other hand, set size judgments on the relative frequency of male and female names were requested.
Correspondingly, the participants could exclusively use the prompts “male names” and “female names” to access memory. The simulation of set size judgments by adding the responses to individual names at the time of judgment can thus not correspond to the actual cognitive process.

Despite this problem, the simulations lend credence to the assumption that memory models can offer an alternative explanation for the famous-names effect. They demonstrate that famous names elicit higher memory responses due to their increased pre-experimental frequency. These increased responses could be aggregated to set size judgments in a different way than by addition at the time of judgment. Various aggregation mechanisms are conceivable and could be integrated into the models. The simplest assumption presumably is that stimuli activate the memory representation of appropriate superordinate categories already during their encoding. Since famous names elicit a greater memory response, their superordinate categories should then be activated more than the superordinate categories of less famous names. Thus, “Marilyn Monroe” would, for example, activate the category “female” more than a less famous male name would activate the category “male”. This stronger activation of the memory representation of “female” could build up over the learning phase through the continual presentation of famous women and less famous men. Therefore, when memory is probed for the frequency of male names and female names at the time of judgment, the activations at the level of superordinate categories could be directly accessed.⁴ Consequently, the more “famous” category would receive an inflated frequency estimate.

⁴ A similar suggestion was made by Alba et al. (1980). These authors found that the absolute number of exemplars of several categories was estimated quite accurately. Furthermore, the accuracy of the judgments was not impaired by severe response time restrictions. From this finding Alba et al. concluded that the basis for category size judgments is already formed during the encoding of the exemplars and can later be directly retrieved.

Thus, the essence of the alternative explanation for the famous-names effect by memory models can be regarded as plausible despite the deficient specification and implementation of an aggregation mechanism: The models explain the effect by the increased pre-experimental frequency of occurrence of famous names and the participants’ inability to discriminate perfectly between different encoding contexts. An important characteristic of this explanation is that it deems other features of famous names irrelevant. The size of the effect should therefore covary with the differences in the pre-experimental frequencies of occurrence of famous and less famous names. Besides, the famous-names effect should generalize to different stimulus material. Judgments on the relative frequency of any two categories should exhibit an analogous effect: The category whose exemplars on the study list have the greater extra-experimental frequency should generally be overestimated.

In the case of frequency of occurrence judgments, similar hypotheses have already been tested successfully. Several studies demonstrate that estimates of the frequencies of stimuli in a target context are influenced by the frequency of these stimuli outside this target context. Johnson, Taylor and Raye (1977) found that estimates of how often words appeared on a study list increased the more often the participants were additionally asked to generate the words internally. Hintzman and Block (1971) presented their participants with two successive study lists that both contained the same stimuli with varying frequencies. Estimates of the frequency of a stimulus on one of the two lists were influenced by the frequency of this stimulus on the other list. Dougherty and Franco-Watkins (2003) used a similar design. First they presented a list on which words occurred with different frequencies. On the subsequent target list these words all had the same frequency. Again, judgments on the target list frequency of a word increased with the frequency of this word on the previous list. In this study it could also be demonstrated that the extent of the influence of the frequencies of occurrence outside the target list depends on the similarity of the encoding contexts. As predicted by the memory models, this influence was smaller when target list and non-target list were not identically designed.

The studies cited above differ from the situation in the famous-names experiment not only in that frequency of occurrence judgments were requested instead of set size judgments. In these studies, the non-target frequency was also manipulated within the experiment. In contrast, according to the explanation of the memory models, the famous-names effect is caused by the naturally varying frequencies of occurrence of the names outside the experiment. On the one hand, these extra-experimental frequencies of occurrence can be assumed to be larger by orders of magnitude than the frequencies of stimuli in an experimentally presented non-target list (amounting to a maximum of 16 in the studies described above, see Dougherty and Franco-Watkins, 2003). On the other hand, the similarity of the encoding context to the experimental target context is of course greatly reduced in the case of extra-experimental occurrences of stimuli. Nevertheless, at least two studies show that judgments on the experimental frequency of occurrence of words can be influenced by the extra-experimental, linguistic frequency of the words. Both Rao (1983) and Greene and Thapar (1994) found that the experimental frequencies of high-linguistic-frequency words are discriminated worse than those of low-linguistic-frequency words.
Moreover, Rao reports that in several studies high-linguistic-frequency words with an experimental frequency of 1 received slightly higher judgments than low-linguistic-frequency words with the same experimental frequency.

The question whether judgments on the experimental size of categories are also affected by the frequency of occurrence of their exemplars in different contexts has not been examined yet. The objective of the four experiments reported below was to provide such a test. In all studies the participants were presented with exemplars from two categories occurring with a considerably different frequency outside the target list. The frequency of occurrence of the exemplars was varied either by using stimuli with extra-experimentally different frequencies (experiments 1, 2, and 4) or by presenting a non-target list in advance (experiment 3). If the category whose exemplars occur more frequently outside the target list is overestimated in these experiments, this will support the assumption that the famous-names effect can be explained within the scope of memory models. The assumption that the participants retrieve single exemplars to generate their frequency judgments would no longer be necessary. On the other hand, if the expected effect does not occur with stimulus material other than famous names, the hypothesis of the memory models will have to be rejected. In this case, the famous-names effect could not be attributed to the different pre-experimental frequency of famous and non-famous names. The explanation of the famous-names effect by memory models would then be deprived of its basis.

It should be pointed out that a covariation between the extra-experimental frequency of exemplars and experimental frequency judgments on the corresponding categories would merely demonstrate that memory models can offer an explanation for the famous-names effect.
Such a result would, on the other hand, not necessarily indicate that the participants really generated their estimates with a memory assessment strategy. Alternatively, it can be assumed that the manipulation of extra-experimental frequencies influences recall performance and memory responses in an equal manner. In this case, the frequency judgments could be traced back to either a memory assessment strategy or a recall-estimate strategy. In general, whenever these strategies are tested, the problem arises that factors resulting in an improved memory representation of events, and thus higher memory responses, usually also enhance the retrieval of these events (Watkins & LeCompte, 1991; Brown, 1997; Manis et al., 1993; Lewandowsky & Smith, 1983). This also holds for the extra-experimental frequency manipulated in the current experiments. It is a well-established fact that high-linguistic-frequency words are recalled better than low-linguistic-frequency words (e.g. Sumby, 1963). Thus, it can be expected that the recall-estimate strategy also predicts that the relative frequency of the category with the more frequent exemplars is overestimated. However, this does not necessarily mean that the recall-estimate approach and the memory models also arrive at the same quantitative predictions. For instance, in a study by Watkins and LeCompte (1991) the predictions of the recall-estimate strategy were too weakly correlated with the actual frequencies to be able to explain the usual finding of fairly well calibrated category size judgments. Therefore, in the present experiments we also checked whether the recall-estimate strategy predicts an overestimation of the category with the more frequent exemplars of a similar size as the memory models do. A comparison of the quality of the predictions of both approaches in all experiments should allow a conclusion as to whether the approaches are really equally suitable to explain the participants’ frequency judgments.
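How a quantitative prediction of the recall-estimate strategy can be derived from recall data may be illustrated with a small sketch. It assumes, purely for illustration, that the predicted relative frequency of a category equals its share of all recalled exemplars; the function name is hypothetical. The example values are the mean recall scores reported for the original famous-names study (12.3 famous and 8.4 less famous names), so the result is a ratio of group means and not a prediction for individual participants.

```python
def recall_estimate_proportion(recalled_a, recalled_b):
    """Predicted relative frequency of category A under a recall-estimate strategy.

    Assumption: the estimate equals the share of category A among all
    recalled exemplars of the two categories.
    """
    total = recalled_a + recalled_b
    if total <= 0:
        raise ValueError("at least one exemplar must have been recalled")
    return recalled_a / total


# Mean recall in the original famous-names study (Tversky & Kahneman, 1973).
predicted = recall_estimate_proportion(12.3, 8.4)

# Actual relative frequency of the famous names on the study list (19 of 39).
actual = 19 / 39

print(f"predicted: {predicted:.3f}, actual: {actual:.3f}")
# prints "predicted: 0.594, actual: 0.487"
```

Under this assumption the recall data of the original study imply an overestimation of the famous category by roughly eleven percentage points.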
Overview of the Experiments

For all four experiments we chose a design that closely followed the procedure of the famous-names study. Specifically, we presented the participants in every experiment with a study list containing 39 different stimuli. These stimuli were exemplars of two different superordinate categories. One of these categories was represented with 19 exemplars on the study list, the other with 20 exemplars. The 19 exemplars of the one category always had a clearly greater frequency of occurrence outside the study list than the 20 exemplars of the other category. Thus, there was always an inverse relation between the number of exemplars of a category on the study list and the frequency of occurrence of these exemplars outside the study list. After the presentation of the study list, the participants judged from which of the two categories more exemplars were included in the list. Following this ordinal judgment, we asked the participants for quantitative estimates on the relative frequencies of the categories. Furthermore, we administered a free recall test in all experiments to be able to derive quantitative predictions of the recall-estimate strategy. To control for possible order effects, half of the participants completed this recall test before giving their frequency estimates and the other half completed the recall test after having given their estimates.

The explanation of the memory models for the famous-names effect is based on the assumption that more famous names generally occur more often than less-famous names. Obviously, this explanation can only be valid if the effect is also observed when names are used in the experiment that were not selected because of their fame but due to their extra-experimental frequency of occurrence. If the effect did not occur in this case, a further test of the memory model explanation would be superfluous.
In Experiment 1 we therefore used names of male and female personalities mentioned with different frequencies in major German newspapers. A further aim of Experiment 1 was to obtain a comparative standard for evaluating the size of the effects in the subsequent experiments: if the famous-names effect is really caused by the extra-experimental frequency of occurrence of the names, an analogous effect of similar size should appear when different stimulus material with similar frequencies is used. An apparent problem of Experiment 1 is that it does not resolve the supposed confounding of fame and frequency of occurrence outside the target list. To address this problem, we used stimulus material that cannot be considered famous in the two subsequent experiments. In Experiment 2 the study lists contained exemplars of the categories ‘plants’ and ‘animals’ that differ in their pre-experimental frequency of occurrence. In Experiment 3 we used female and male names of persons unknown to the participants. The frequency of occurrence of these names outside the target list was varied in a preceding learning phase. For reasons that will become clear from the results of Experiments 1 to 3, we looked for stimulus material for Experiment 4 that was as similar as possible to that of the original famous-names study. We decided on names of German and foreign cities with different pre-experimental frequencies of occurrence.

Experiment 1

Experiment 1 is a modified and extended replication of the famous-names study. The essential difference from the original study is that the names were selected not for their fame but for their extra-experimental frequency of occurrence. Moreover, instead of two study lists we used six lists on which the differences between the pre-experimental frequencies of male and female names were varied.
This allowed us to check whether the estimates of the relative experimental frequencies of the categories covary with the extra-experimental frequencies of the exemplars.

Method

Participants: Ninety-six persons took part in the experiment (41 men and 55 women, with an average age of 24 years). Most of the participants were students from various departments of the TU Chemnitz. Students of psychology did not take part in the experiment. The participants received a reimbursement of 3 euros.

Material: The stimulus names were selected as follows. In a first step we prepared a list of names of 80 known female and 80 known male personalities. These personalities were predominantly actors, musicians, athletes, politicians and writers. In a second step, we determined the frequency of occurrence of the names in several volumes of three nationwide German daily newspapers (Süddeutsche Zeitung, Frankfurter Rundschau and die tageszeitung). From the 19 female and the 19 male names with the greatest frequency of occurrence in this text corpus we formed sets of “high-frequent names” (these sets contained, for example, Steffi Graf, Alice Schwarzer and Marilyn Monroe, or Berti Vogts, Günter Grass and Steven Spielberg). The 20 male and the 20 female names with the lowest frequency of occurrence in the corpus were used as “medium-frequent names” in the experiment (e.g., Udo Jürgens, Roman Polanski and Georg Büchner, or Doris Day, Ella Fitzgerald and Isabel Allende). In addition, we formed two sets of 20 unknown female and 20 unknown male names. These names were drawn from an Internet list of participants in a lottery game and had a frequency of occurrence of 0 in the corpus. For all stimuli used in the experiment the first names allowed an unambiguous identification of sex. The six sets of high-frequent, medium-frequent and unknown names of both sexes were combined into the six study lists used in the experiment according to the scheme illustrated in Table 2.1.
Table 2.1
Composition of the study lists in Experiment 1

                                          female names
male names         high-frequent           medium-frequent             unknown
high-frequent      --                      19 males / 20 females       19 males / 20 females
medium-frequent    19 females / 20 males   --                          19 males / 20 females (a)
unknown            19 females / 20 males   19 females / 20 males (a)   --

(a) In study lists with medium-frequent and unknown names the medium-frequent name with the smallest frequency of occurrence in the corpus was not used.

Table 2.2 displays the sums of the frequencies of occurrence in the text corpus for male and female names on each of the six study lists. These sums form the basis of the memory model prediction of the frequency estimates. On the first three lists the male names have the greater pre-experimental frequency of occurrence, whereas female names are pre-experimentally more frequent on the remaining lists. As the table shows, in both cases the greatest difference between the frequencies of occurrence of male and female names is found on the list with high-frequent and unknown names, followed by the list with high-frequent and medium-frequent names. The smallest difference in pre-experimental frequency of occurrence is found in both cases on the list with medium-frequent and unknown names.

Procedure: The participants were tested individually or in groups of up to six persons at personal computers. Every participant was randomly assigned to one of the six study lists. Prior to the study phase, the participants were informed that they would be presented with a list of names in the following minutes. Furthermore, they were instructed to read through this list of names attentively, as they would later have to answer some questions referring to this list. The exact nature of the memory test, however, remained unspecified.
Table 2.2
Sum total of the frequencies of occurrence in the text corpus for male and female names on the study lists in Experiment 1

Study list                                             Male names   Female names
19 high-frequent males / 20 unknown females                 18101              0
19 high-frequent males / 20 medium-frequent females         18101           1364
19 medium-frequent males / 20 unknown females                2267              0
19 high-frequent females / 20 unknown males                     0           9661
19 high-frequent females / 20 medium-frequent males          2311           9661
19 medium-frequent females / 20 unknown males                   0           1343

During the study phase, names were presented one at a time. Each presentation lasted three seconds, followed by a one-second inter-stimulus interval in which the screen was blank. The presentation order of the 39 names on the study list was randomly determined for every participant. Subsequent to the study phase, one half of the participants in each study list condition first completed the frequency judgment tasks, and the other half first completed the free recall test. Within the frequency judgment tasks, participants were first asked to indicate whether the study list contained more female names or more male names. After this ordinal frequency judgment, participants were asked for exact quantitative estimates of the percentages of female names and male names on the study list. Although the ordinal judgment was a forced-choice task, participants were free to estimate both percentages at 50% in the quantitative judgment. For the free recall test, participants were handed a sheet of paper. They were instructed to recall and write down, within two minutes and in any order, as many names as possible from the study list. After the two minutes, the experimenter collected the sheet.

Predictions of the Memory Models

To gain quantitative predictions of the memory models, we used the same procedure as Dougherty et al. (1999) in their simulations with MINERVA-DM.
However, since the results of this procedure can easily be derived mathematically, we refrained from running further simulations. In general, the memory models assume that the estimates of the relative frequency of a category are determined by the sums of the memory responses to all female and male names.[5] In terms of MINERVA-DM, the estimate of the relative frequency of female names on the study list should be given by the equation below (the expected relative frequency estimate for male names can be determined likewise):

\[ \text{Estimate}_{\text{female names}} = \frac{\sum \text{familiarity signals}_{\text{female names}}}{\sum \text{familiarity signals}_{\text{female names}} + \sum \text{familiarity signals}_{\text{male names}}} \]

Given that all intra-experimental occurrences of names are encoded equally well, all names can be expected to elicit a familiarity signal of approximately the same strength based solely on their intra-experimental traces. If the study list contained exclusively 19 unknown female and 20 unknown male names, the model would consequently expect a relative frequency judgment for female names of 19/39. This was also the result of the simulation by Dougherty et al. in which memory did not contain any extra-experimental traces. However, if names that have already been encoded extra-experimentally appear on the study list, the extra-experimental traces will also contribute to the strength of the familiarity signal elicited by a name. As already explained above, the size of this contribution should depend on the similarity between the extra-experimental encoding context and the intra-experimental encoding context. Since one can assume that the extra-experimental encoding contexts of female and male names are on average equally similar to the intra-experimental context, the average contribution of the extra-experimental traces of female and male names to the familiarity signal should also be alike.
[5] It should be pointed out once more that we do not assume that the addition of responses to individual names maps the process by which category size judgments are generated. Nevertheless, we follow the predictions from MINERVA-DM and PASS insofar as we assume that the responses to categories are proportional to the corresponding sums.

The contribution of each extra-experimental trace can thus be described as a constant fraction a of the contribution of an intra-experimental trace. Both the sum of the familiarity signals to all female names and the sum of the familiarity signals to all male names are thus proportional to:

\[ \text{intra-experimental occurrences of names} + a \cdot \text{extra-experimental occurrences of names} \]

If the frequencies of occurrence in the text corpus given in Table 2.2 are inserted into the formulas described above, the MINERVA-DM estimate for the relative frequency of female names on the list with 19 high-frequent women and 20 medium-frequent men results from

\[ \frac{19 + a \cdot 9661}{(19 + a \cdot 9661) + (20 + a \cdot 2311)}. \]

The value of a is obviously unknown and has to be estimated. Without applying any formal fitting procedure, we determined a so that the magnitude of the MINERVA-DM predictions approximately corresponds to the magnitude of the participants’ frequency judgments on all six study lists. This resulted in an estimate of a = 0.001.[6] In Experiments 2 and 4, in which stimulus material with varying extra-experimental frequency was also used, we determined the quantitative prediction of the memory model with the same parameter setting for a. For the present experiment, the selected procedure evidently means that the magnitude of the frequency judgments cannot be used to evaluate the memory model prediction. Figure 2.1 displays the resulting predictions for the estimated relative frequency of male names on the six study lists.
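The prediction rule described here can be made concrete with a short computation. The following is only a minimal sketch, not the procedure actually run for this work: it inserts the corpus sums from Table 2.2 and a = 0.001 into the proportionality stated in the text and returns the predicted share of male names for each study list. The names `minerva_dm_male_share`, `predictions` and `STUDY_LISTS` are illustrative assumptions.

```python
# Sketch of the MINERVA-DM prediction rule described in the text.
# All identifiers are hypothetical; the corpus sums come from Table 2.2.

A = 0.001  # weight of one extra-experimental trace relative to an intra-experimental one

# (label, male names on list, male corpus sum, female names on list, female corpus sum)
STUDY_LISTS = [
    ("19 hfm / 20 ukf", 19, 18101, 20, 0),
    ("19 hfm / 20 mff", 19, 18101, 20, 1364),
    ("19 mfm / 20 ukf", 19, 2267, 20, 0),
    ("19 mff / 20 ukm", 20, 0, 19, 1343),
    ("19 hff / 20 mfm", 20, 2311, 19, 9661),
    ("19 hff / 20 ukm", 20, 0, 19, 9661),
]


def minerva_dm_male_share(n_male, extra_male, n_female, extra_female, a=A):
    """Predicted relative frequency of male names as a proportion between 0 and 1."""
    # Each summed familiarity signal is taken to be proportional to
    # intra-experimental occurrences + a * extra-experimental occurrences.
    signal_male = n_male + a * extra_male
    signal_female = n_female + a * extra_female
    return signal_male / (signal_male + signal_female)


def predictions(a=A):
    """Predicted male share for the six lists, in the order of STUDY_LISTS."""
    return [
        minerva_dm_male_share(n_m, e_m, n_f, e_f, a)
        for _, n_m, e_m, n_f, e_f in STUDY_LISTS
    ]


if __name__ == "__main__":
    for (label, *_), share in zip(STUDY_LISTS, predictions()):
        print(f"{label}: {100 * share:.1f}% male names predicted")
```

Calling `predictions(a)` with other values of a is a quick way to inspect how the magnitude of the predictions changes with the parameter while their order across the six lists is preserved.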
In the figure the study lists are arranged according to the difference between the sum totals of the extra-experimental frequencies of occurrence of male and female names. On the first three study lists this difference is positive. Starting with the list with medium-frequent female names and unknown male names, the difference takes negative values. As the figure shows, the prediction of the memory model covaries with the difference in the pre-experimental frequency of occurrence of the names of both sexes. The highest estimate for male names is expected on the list with high-frequent male and unknown female names. Conversely, the lowest estimate is predicted on the list with high-frequent female and unknown male names, for which this difference takes the largest negative value. Overestimations of the proportion of male names are expected on all study lists on which these names are pre-experimentally more frequent. The rank order of the predicted estimates of the frequency of male names on the six study lists is largely independent of the parameter a.

[6] The frequencies of occurrence in the text corpus certainly represent only a rough estimate of the total number of extra-experimental encodings of the names. However, a proportional increase of the extra-experimental frequencies of male and female names would be compensated by our setting of the parameter a and thus would not change the MINERVA-DM predictions. Consequently, the use of the frequencies of occurrence in the text corpus merely implies the assumption that these frequencies reflect the relative differences between the extra-experimental frequencies of male and female names approximately correctly.
The relative differences between the predictions for the six lists also vary only slightly with the parameter, as long as values are chosen for a that are (I) so large that the model expects overestimations of the experimentally rarer category, and (II) so small that an intra-experimental trace exerts a clearly greater influence on the familiarity signal than an extra-experimental trace.[7] The relative differences of the expected frequency judgments across the six study lists thus provide a criterion for assessing the quality of the prediction of the memory models.

[7] To illustrate this point: We tested different parameter settings between a = 0.001 and a = 0.01. While the absolute magnitude of the predicted frequency judgments changes by up to 30% with these parameter settings, the covariation between the MINERVA-DM predictions and the participants’ estimates remains nearly constant.

As already explained above, the PASS model (Sedlmeier, 1998), on the basis of the same assumptions that enter into the derivation of the MINERVA-DM predictions, also expects an overestimation of the sex with the pre-experimentally more frequent names. However, the quantitative predictions of PASS do not tally precisely with those of MINERVA-DM. In MINERVA-DM every additional occurrence of an event is represented in memory by an additional trace. The familiarity signal elicited by extra-experimental traces of names thus increases linearly with the number of these traces (Hintzman, 1988). In contrast, events in PASS are encoded by a modification of associative strengths. The increase in associative strength becomes smaller the more often the event in question has been encoded before. The associative response elicited in PASS by the names thus follows not a linear but a negatively accelerated function of the pre-experimental frequencies of the names.[8] To gain predictions of the PASS model, the pre-experimental frequencies thus first have to be transformed according to this function. As a consequence, PASS expects a weaker increase of the judgmental bias between study lists with medium-frequent and unknown names and study lists with high-frequent and unknown names than MINERVA-DM does. For the present experiments the difference between the predictions of MINERVA-DM and PASS can nevertheless be neglected. The exact function relating the extra-experimental frequencies to the associative responses depends in PASS on the settings of the model parameters and on the temporal course of the encodings. To carry out the necessary parameter settings, however, we had to rely in the present studies on the magnitude of the frequency judgments in Experiment 1. The frequency judgments expected by PASS would thus vary in a range similar to that of the MINERVA-DM predictions. The remaining differences in the progression of the predictions across the six study lists would be too small to permit a competitive test of both models. We therefore used the MINERVA-DM predictions in all experiments to represent the predictions of both memory models.

[8] In general, MINERVA-DM expects that frequency judgments increase linearly with the actual frequencies, while PASS assumes that the function relating estimates to frequencies is negatively accelerated. Surprisingly, even the large number of studies on frequency judgments does not allow any definite conclusion as to which of the assumptions applies. In the vast majority of these studies, the frequencies are varied experimentally. The maximum presentation frequency seldom exceeds 15 in these cases. In such studies the estimates can mostly, though not always (e.g., Leicht, 1968; see also the results reported in Chapter 3), be approximated fairly well by a linear function of the actual frequencies. However, it remains unclear whether this result is caused by the restricted range of the presentation frequencies.

Results

Predictions of the Recall-Estimate Approach

To gain quantitative predictions of the recall-estimate strategy for the estimates of the relative frequency of male names on the study lists, we calculated a recall ratio for every participant. This measure is usually applied in tests of the recall-estimate strategy (e.g., Lewandowsky & Smith, 1983; Watkins & LeCompte, 1991; Manis et al., 1993; Sedlmeier, Hertwig, & Gigerenzer, 1998). In the present experiment, the recall ratio denotes the percentage of recalled male names among the total number of recalled names.[9] Figure 2.1 shows the means of these ratios on the six study lists. In the figure, the data of the participants who worked on the recall test before the frequency judgment tasks and of those who worked on it afterwards are combined, since recall performance did not differ between the two task orders.

[9] Here and in all following experiments we scored all recalled exemplars that could unambiguously be assigned to one category. Generally, the recall-estimate predictions change only slightly if a stricter scoring criterion is applied. However, there is a tendency for greater biases to be expected with such a criterion. As will become clear from the results of the experiments, this would be a disadvantage for the recall-estimate strategy.

[Figure 2.1: y-axis "Percentage of Male Names" (0 to 100); x-axis: the six study lists (19 hfm/20 ukf, 19 hfm/20 mff, 19 mfm/20 ukf, 19 mff/20 ukm, 19 hff/20 mfm, 19 hff/20 ukm); series: Memory Model Predictions, Recall-Estimate Predictions, Estimates]

Figure 2.1. Mean estimated percentages of male names on the six study lists and the corresponding predictions of the memory models and the recall-estimate strategy in Experiment 1. Error bars denote standard errors. hfm = high-frequent male names; mfm = medium-frequent male names; ukm = unknown male names; hff = high-frequent female names; mff = medium-frequent female names; ukf = unknown female names.

The recall ratios were strongly influenced by the pre-experimental frequencies of the names. On all six lists clearly more names were recalled of the sex whose exemplars had greater pre-experimental frequencies of occurrence. Like the memory models, the recall-estimate strategy thus expects an overestimation of the relative frequency of male names on the first three study lists and an underestimation of this frequency on the last three study lists (see Fig. 2.1). However, the predictions of the recall-estimate strategy do not decrease monotonically with the differences between the frequencies of occurrence of male and female names in the text corpus. The smallest bias in the estimates is predicted here on the two lists with high-frequent and medium-frequent names, on which merely a medium difference in the frequencies of occurrence of male and female names in the corpus was realized. On each of the four lists with unknown names, on the other hand, the recall-estimate strategy expects a bias of similarly extreme size. The predicted deviation between the estimated and the actual relative frequency of male names consistently amounts to more than 30% here. We additionally evaluated the influence of the pre-experimental frequency of the names on the recall ratios with the help of correlational effect sizes. We compared the recall ratios for male names (I) on the two lists on which high-frequent and unknown names were combined, (II) on the two lists on which high-frequent and medium-frequent names were combined and (III) on the two lists on which medium-frequent and unknown names were combined.
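The correlational effect sizes used for these comparisons can be related to the t values and degrees of freedom reported in Table 2.3. The text does not spell out the conversion; the sketch below assumes the common formula r = sqrt(t^2 / (t^2 + df)), which agrees with the tabled values to two decimals. The function name `r_from_t` and the data structure are illustrative assumptions.

```python
import math


def r_from_t(t, df):
    """Correlational effect size from a t statistic (assumed conversion formula)."""
    return math.sqrt(t * t / (t * t + df))


# (comparison, t for recall ratios, t for frequency estimates, df), taken from Table 2.3
TABLE_2_3 = [
    ("19 hfm/20 ukf vs. 19 hff/20 ukm", 13.57, 4.94, 30),
    ("19 hfm/20 mff vs. 19 hff/20 mfm", 4.99, 2.37, 30),
    ("19 mfm/20 ukf vs. 19 mff/20 ukm", 16.17, 1.88, 30),
]

if __name__ == "__main__":
    for label, t_recall, t_estimate, df in TABLE_2_3:
        print(
            f"{label}: r(recall ratio) = {r_from_t(t_recall, df):.2f}, "
            f"r(frequency estimate) = {r_from_t(t_estimate, df):.2f}"
        )
```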
The resulting three effect sizes, together with the corresponding t values and degrees of freedom, are given in Table 2.3. The effect sizes display the pattern suggested by Figure 2.1. The effect in the recall ratios on the lists with medium-frequent and unknown names is as large as on the lists with high-frequent and unknown names. On the lists with high-frequent and medium-frequent names, on the other hand, the effect is somewhat smaller.[10]

[10] Since the experimental frequencies of male names on the compared lists also varied between 48.7% and 51.3%, the effect sizes do not reflect exclusively the influence of pre-experimental frequencies. Because pre-experimental and experimental frequencies were inversely related, one can assume that the actual influence of the pre-experimental frequencies is somewhat larger than expressed in the effect sizes. For the present purpose this issue is irrelevant, for we use effect sizes to compare the impact of pre-experimental frequencies across conditions and experiments in which the experimental frequencies always varied in the same way.

Table 2.3
Effect sizes of the list compositions on recall ratios and frequency estimates in Experiment 1

                                     Recall Ratio             Frequency Estimates
Study Lists                          r(effect size)   t       r(effect size)   t       df
19 hfm/20 ukf vs. 19 hff/20 ukm      .93              13.57   .67              4.94    30
19 hfm/20 mff vs. 19 hff/20 mfm      .67               4.99   .40              2.37    30
19 mfm/20 ukf vs. 19 mff/20 ukm      .95              16.17   .32              1.88    30

Note. hfm = high-frequent male names; mfm = medium-frequent male names; ukm = unknown male names; hff = high-frequent female names; mff = medium-frequent female names; ukf = unknown female names.

Frequency Estimates

Figure 2.1 displays the mean quantitative estimates of the percentage of male names on the six study lists.
The data from the two conditions in which the frequency judgment tasks preceded or followed the recall test are combined here and in the analyses below, since the task order had no systematic influence on the magnitude of the quantitative estimates or on the ordinal frequency judgments. On the first three lists the actual percentage of male names on the study list amounts to 48.7%. This percentage is overestimated on all three of these lists, on which male names have a greater pre-experimental frequency than female names. In contrast, the actual percentage of male names of 51.3% on the three lists with pre-experimentally more frequent female names is consistently underestimated. Although the deviation of the mean estimates from the normative value is generally not large and exceeds 10% only on the two lists with unknown female names, a bias in the estimates is evident. This is particularly obvious in the effect sizes with which we analyzed the influence of the pre-experimental frequencies of occurrence on the quantitative estimates for male names. For lists with high-frequent and medium-frequent names and for lists with medium-frequent and unknown names, effects of similar size result (see Table 2.3), which according to Cohen’s conventions (1992) can be described as medium to large. On the two lists that combined high-frequent and unknown names the estimates for male names differed even more, which is reflected in a still larger effect size. In general, the pre-experimental frequencies, which are normatively irrelevant for the frequency judgment task, thus seem to have a strong influence on the quantitative estimates. A similar picture results from the participants’ ordinal frequency judgments. Table 2.4 shows for each of the six study lists the percentage of participants who judged that the list contained more male than female names.
For each of the three lists with fewer male than female names, a large majority of participants was of the erroneous opinion that they had been presented with more male names. In contrast, only a minority of participants judged that male names were more frequent on the three lists that really contained more male names. Thus, in both the quantitative and the ordinal frequency judgments an effect analogous to the famous-names effect is found: the relative frequency of the sex represented with pre-experimentally more frequent names on the study list is overestimated.

Table 2.4
Percentage of the participants who judged male names to be more frequent on each of the six study lists in Experiment 1. For all lists N = 16.

Study List        Percentage
19 hfm/20 ukf     87.5
19 hfm/20 mff     75.0
19 mfm/20 ukf     81.3
19 mff/20 ukm     37.5
19 hff/20 mfm     25.0
19 hff/20 ukm     12.5

Note. hfm = high-frequent male names; mfm = medium-frequent male names; ukm = unknown male names; hff = high-frequent female names; mff = medium-frequent female names; ukf = unknown female names.

How well are the frequency estimates predicted by the recall-estimate strategy and the memory models? The quality of the prediction of the recall-estimate strategy can be evaluated not only according to the progression of the estimates across the six study lists but also by the magnitude of the estimates. As Figure 2.1 illustrates, the recall-estimate strategy fares badly on this criterion. On all study lists the recall-estimate approach expects a greater deviation between the correct and the estimated percentage of male names than actually occurs. Only on the list with high-frequent female names and medium-frequent male names are predictions and estimates of similar size. In contrast, on the four lists containing unknown names the bias in the estimates is overestimated by at least 25%.

As already explained, the magnitude of the predictions of the memory models resulted from a fitting with the actual data.
A comparison of the quality of the predictions of the memory models and the recall-estimate strategy can hence be based exclusively on the progression of the estimates. A visual inspection of the data reveals that the progression of the estimates across the three lists with pre-experimentally more frequent female names corresponds to the prediction of the memory models. Here the estimates decrease together with the differences in the pre-experimental frequencies of occurrence of male and female names. In contrast, the estimates on the three lists with pre-experimentally more frequent male names rather follow the prediction of the recall-estimate strategy. The smallest mean estimate for the percentage of male names is found here on the list with high-frequent male and medium-frequent female names. Furthermore, as expected by the recall-estimate strategy, the mean estimate for the list with medium-frequent male and unknown female names is about as large as the mean estimate for the list with high-frequent male and unknown female names.

To assess the covariation between the predictions and the estimates more exactly, we used contrast analyses (Rosenthal et al., 2000). As a fit measure, contrast analysis provides an effect size that can be interpreted as the correlation of the predicted estimates on the six study lists with the individual estimates of all 96 participants. Table 2.5 displays the results of the contrast analyses for the predictions of the memory models and the predictions of the recall-estimate strategy. Both approaches attain high and nearly identical effect sizes. As already suggested by the visual inspection of the data, the predictive performance of the memory models and of the recall-estimate strategy can thus be judged alike.

Table 2.5
Contrast Analyses for the Predictions of the Memory Models and the Recall-Estimate Strategy in Experiment 1

Prediction         MS(contrast)   MS(error)   df(error)   r(effect size)
Memory Models      2983.6         136.3       90          .43
Recall-Estimate    3153.6         136.3       90          .44

Note. F values can be calculated by dividing MS(contrast) by MS(error).

In the experiment we determined a recall ratio for every participant on the basis of the results of the recall test. In contrast to the memory models, the recall-estimate strategy thus allows the prediction not only of the mean estimates on the six study lists but also of individual estimates. We therefore additionally calculated the correlation between the 96 individual recall ratios and the corresponding estimates. This correlation amounted to r = .44. However, the relation between the recall ratios and the estimates differed somewhat between the two task orders. For the 48 participants who completed the recall test before the frequency judgments, the correlation amounted to r = .52. For the participants who gave their frequency judgments first, the correlation was r = .36 (Fisher-Z = 0.944, p = .171, one-tailed).

Discussion

The primary objective of the present experiment was to check whether an effect analogous to the famous-names effect can also be observed if the names presented on the study lists are selected not according to their fame but according to their pre-experimental frequency of occurrence. The answer to this question is an unambiguous ‘Yes’. The pre-experimental frequencies of occurrence of female and male names exerted a strong influence on the quantitative frequency judgments. Moreover, in the ordinal frequency judgments on all six study lists, a majority of participants chose the sex with the pre-experimentally more frequent names although the names of this sex were actually less frequent on the lists. In the original study by Tversky and Kahneman (1973), 81% of the participants judged falsely that they had been presented with more famous names.
In the current study, the lists with high-frequent and medium-frequent names are the most similar to the lists with famous and less-famous names used by Tversky and Kahneman. For these lists, 75% of the participants judged that the number of pre-experimentally more frequent names was greater. The size of the effect observed here hence seems to be comparable with the size of the effect in the original study.

Although the participants’ frequency judgments were systematically biased, the absolute deviation between the quantitative estimates, which were not collected in the original study, and the normatively correct responses was rather small. For example, the difference between the mean estimated and the actual percentage of male names on the lists with high-frequent and medium-frequent names amounted to little more than 5%. Similarly minor deviations have also been found in earlier studies using famous and unknown names (Lewandowsky & Smith, 1983) or famous and less-famous names (Manis et al., 1993). The fact that the absolute size of the error is rather small may well qualify the far-reaching conclusions that have been drawn in the literature on judgment and decision making, also with respect to the famous-names effect. From the view that frequency and probability judgments are generally extremely susceptible to biases, the recommendation has been derived to delegate important decisions exclusively to trained decision-makers or to computers (e.g., Bazerman, 2001; Lichtenstein et al., 1978; Shanteau, 1978). This recommendation appears to be exaggerated, at least as far as it is based on the famous-names effect. An estimate of a little over 50% for an actual frequency of 48.7% will normally cause no serious damage even in critical decisions. In fact, the results of the present experiment can also be seen as evidence for the thesis that memory for frequencies is less error-prone than other memory performances (e.g., Hasher & Zacks, 1984).
In accordance with this assumption, the recall ratios for all six study lists were more strongly biased than the frequency judgments. Obviously, this creates problems for the recall-estimate approach. If the participants had used the number of recalled male and female names as the sole indicator of the requested frequencies, their estimates would have had to deviate more from the normatively correct responses. Thus, if one assumes that the participants used a recall-estimate strategy to generate their estimates, one has to assume at the same time that they had some kind of additional frequency knowledge at their disposal. This knowledge should make it possible to correct the strong recall bias at least to some degree. Several previous studies indicate that recall performance is often too poor a cue to frequencies to explain the comparatively precise estimates completely (Watkins & LeCompte, 1991; Lewandowsky & Smith, 1983; Manis et al., 1993). The literature on the recall-estimate approach has so far offered no suggestions as to which additional information might be used to generate frequency estimates and how this information could be integrated with recall performance.

In spite of this problem of the recall-estimate approach, the results of Experiment 1 were inconclusive with regard to the question of which estimation strategy produced the bias in the participants’ judgments. The covariation between the predictions of the recall-estimate strategy and the estimates was as high as the covariation between the predictions of the memory models and the estimates. Therefore, the assumption that the participants relied on the recall-estimate strategy remains as plausible as the assumption that they used a retrieval-free memory assessment process to generate their judgments.
Altogether, the findings from Experiment 1 thus confirmed that the necessary preconditions of the memory models’ alternative explanation for the famous-names effect are fulfilled: the presentation of names with different pre-experimental frequencies led to a bias in the estimates that corresponded in size to the famous-names effect. Moreover, the estimates could be predicted by the memory models, whose sole input is the pre-experimental frequencies, as well as by the recall-estimate strategy.

In order to attribute the famous-names effect unambiguously to the differences in the pre-experimental frequencies of occurrence, a confounding of fame and frequency of occurrence would have to be avoided. This was not the case in the present experiment. The assumption that names mentioned more often in the text corpus are also judged to be more famous was confirmed in a classroom study. We asked 64 participants to rate, on 9-point scales, the fame of 160 names for which we had determined citation frequencies. The mean ratings of the names correlated with the logarithm of the frequency of occurrence in the corpus at r = .48. For the 78 ‘known’ names used in the experiment that showed the greatest and smallest citation frequencies, the correlation amounted to r = .60. The mean ratings of the high-frequent male and female names were 6.1 and 5.1, respectively. For the medium-frequent male and female names the corresponding values were 4.5 and 4.0. This leaves open the interpretation that the bias in the frequency judgments observed in Experiment 1 was caused not by the different pre-experimental frequencies of the names but by other characteristics of famous names. A critical test of the memory model hypothesis thus requires that the pre-experimental frequency of occurrence of the stimuli be varied independently of fame. Providing such a test was the objective of Experiment 2.
Experiment 2

In Experiment 2 we used exemplars of the categories ‘plants’ and ‘animals’ as stimulus material. From this material we constructed two study lists. On one of the lists the animals had the greater pre-experimental frequency, on the other the plants. According to the hypothesis of the memory models, we should find the same effect with this material as in Experiment 1: the category with the pre-experimentally more frequent exemplars should in each case receive higher frequency estimates although fewer of its exemplars are included in the study list. The differences between the pre-experimental frequencies of occurrence of the plants and animals selected as stimulus material were greater than the differences between the pre-experimental frequencies of occurrence of female and male names on each of the six study lists in the previous experiment. The memory models attribute the famous-names effect exclusively to these differences. On the assumption that the extra-experimental encoding contexts of names are not systematically more similar to the experimental encoding context than those of plants and animals, the effect should be even greater in the present study.

Method

Participants: Forty persons took part in the experiment (27 men and 13 women, with an average age of 25 years). The participants were students of the University of Paderborn, most of them from the departments of economics and engineering. Students of psychology did not take part in the experiment. The participants received a reimbursement of 3 Euros.

Material: For the selection of the stimulus material we first determined the frequency of occurrence of 220 animals and 121 plants in the same text corpus we had already used in Experiment 1. From each of the two categories we selected 19 exemplars with a frequency of occurrence as high as possible and 20 exemplars with a frequency of occurrence as low as possible.
Exemplars with fewer than 15 listings in the corpus, however, were not included in the study lists. Additionally, words with more than three syllables were excluded. Finally, the total number of syllables of the 19 plants and 19 animals with a high frequency of occurrence was matched. The same was done with the total number of syllables of the 20 plants and 20 animals with a low frequency of occurrence. The selected exemplars were combined into one study list with 19 high-frequent animals (e.g. Löwe (lion), Ente (duck), Biene (bee)) and 20 low-frequent plants (e.g. Eibe (yew), Aster (aster), Hirse (millet)) and another study list with 19 high-frequent plants (e.g. Rose (rose), Tanne (fir), Tomate (tomato)) and 20 low-frequent animals (e.g. Hyäne (hyena), Fasan (pheasant), Libelle (dragonfly)). Table 2.6 displays the sum totals of the frequencies of occurrence in the text corpus for animals and plants on each of the two study lists.

Table 2.6
Sums of the frequencies of occurrence in the text corpus for animals and plants on the study lists in Experiment 2

study list                                            animals    plants
19 high-frequent animals / 20 low-frequent plants       31617      2052
19 high-frequent plants / 20 low-frequent animals        2029     22188

Procedure: The results of a pilot study with identical test material indicated that the participants did not spontaneously categorize the presented words into plants and animals. On both lists, the participants’ frequency judgments for plants and animals added up not to 100% but to approximately 80%. When questioned afterwards, a large number of participants stated that they had subdivided the test material into categories such as trees, flowers, vegetables, birds, carnivores, etc.[11] Apparently, the relevant categories plants and animals were not salient during the presentation of the study lists.
In contrast, one can assume that the sex categorization relevant in the famous-names study and in Experiment 1 was perceived automatically (Taylor et al., 1978; Fiske & Neuberg, 1990; Messick & Mackie, 1989). Therefore, to ensure that the presented words were categorized into plants and animals, we made some changes in Experiment 2 relative to Experiment 1. Before the presentation of the study lists, the participants were informed that the list would exclusively include animals and plants. Furthermore, the participants worked on a categorization task during the presentation of the lists. After the presentation of each word, two fields with the labels “plant” and “animal” appeared on the screen. The participants were to indicate by a mouse click to which of these categories the last word presented belonged. Right after this mouse click, the next word appeared.

[11] This observation is in line with earlier findings demonstrating that the otherwise quite stable sensitivity to frequencies breaks down completely when participants are questioned about the frequency of categories which they did not perceive during the encoding of the stimuli (Barsalou & Ross, 1986; Freund & Hasher, 1989; Conrad, Brown, & Dashen, 2003).

With the exception of these changes, the procedure was identical to that of Experiment 1: one of the two study lists was randomly assigned to every participant. The presentation time for a word was three seconds. The order of presentation of the 39 words on a study list was randomly determined for every participant. After the study phase, every participant was asked for an ordinal and a quantitative frequency judgment. In addition, the participants completed a recall test.
The results from Experiment 1 indicated that the order in which the frequency judgment tasks and the recall test are completed may influence the correlation between the individual recall ratios and the quantitative frequency judgments. To test whether this result can be replicated, the task order was varied again.

Predictions of the Memory Models

For the derivation of quantitative predictions of the memory models we used the parameter setting a = 0.001 that was estimated from the magnitude of the frequency judgments in Experiment 1. This parameter setting and the sums of the frequencies of occurrence of animals and plants in the text corpus given in Table 2.6 determine the predictions of the memory models. For the list with high-frequent animals and low-frequent plants, an estimate of 69.6% for the relative frequency of animals is predicted. For the list with high-frequent plants and low-frequent animals, on the other hand, an estimate of 34.8% for the same relative frequency is expected (see also Fig. 2.2).

Results

An analysis of the results of the categorization task revealed that 10 of the 40 participants categorized one word incorrectly. A further 5 participants made 2 to 5 categorization errors. The vast majority of all categorization errors occurred with plants and animals with a low corpus frequency. In addition, some words were categorized wrongly by several participants (e.g. Zeisig (siskin), Flunder (flounder), Hirse (millet)). This suggests that at least some of the errors can be ascribed to ignorance of the meanings of the words concerned rather than to careless mistakes or a lack of attention during the categorization task. We hence excluded from the further analysis the 5 participants who made more than one categorization error. Moreover, we carried out all analyses described below twice.
The first analysis used the data of the 25 participants without mistakes, the second the data of all 35 participants who made at most one categorization error. Since the results of both analyses did not differ significantly and led to the same conclusions, we report below only those results that are based on 35 participants (19 in the condition “19 high-frequent animals / 20 low-frequent plants” and 16 in the condition “19 high-frequent plants / 20 low-frequent animals”).

Predictions of the Recall-Estimate Approach

As in Experiment 1, we calculated a recall ratio for every participant. Here this ratio indicates the percentage of recalled animals among all recalled exemplars. Since the recall ratios from both task order conditions did not differ, these conditions were combined in the analyses reported below. The recall ratios were again influenced by the pre-experimental frequency of occurrence of the words. Figure 2.2 displays the corresponding means on the study lists. For both study lists the participants recalled more words from the category with the pre-experimentally more frequent exemplars. The calculation of the effect size of the list composition on the recall ratios resulted in r_effect size = .29 (t(33) = 1.738), a medium effect according to Cohen’s (1992) conventions.

Consequently, the recall-estimate strategy also predicts an overestimation of the percentage of animals on the list with high-frequent exemplars of this category. At the same time, an underestimation of the percentage of animals on the list with low-frequent exemplars is expected. However, unlike in Experiment 1, the absolute size of the bias in the predictions of the recall-estimate strategy is clearly smaller than in the predictions of the memory models. The predictions of this strategy deviate from the correct responses by 6.0% on the list with high-frequent animals and by only 3.2% on the list with low-frequent animals.
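The effect sizes in this section are reported together with the underlying t statistics, so each r can be retraced from t and its degrees of freedom. The following is a minimal sketch, assuming the standard conversion r = sqrt(t² / (t² + df)) with the sign of t carried over; the function name `r_from_t` is a hypothetical helper and is not taken from the thesis.

```python
import math


def r_from_t(t, df):
    """Convert a t statistic with df degrees of freedom into the effect size r.

    Assumes the standard conversion r = sqrt(t^2 / (t^2 + df)); the sign of t
    is kept so that the direction of the effect is preserved.
    """
    r = math.sqrt(t * t / (t * t + df))
    return math.copysign(r, t)


# Effect of list composition on the recall ratios in Experiment 2: t(33) = 1.738
print(round(r_from_t(1.738, 33), 2))
```

Applied to t(33) = 1.738 this yields roughly .29. Applied to t(33) = -0.512, the value reported for the frequency estimates in this experiment, it yields roughly -.09.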
[Figure 2.2: percentage of animals (y-axis) on the study lists 19 hfa / 20 lfp and 19 hfp / 20 lfa, shown for the memory model predictions, the recall-estimate predictions and the estimates.]

Figure 2.2. Mean estimated percentages of animals on the two study lists and the corresponding predictions of the memory models and the recall-estimate strategy in Experiment 2. Error bars denote standard errors. hfa = high-frequent animals, lfp = low-frequent plants, hfp = high-frequent plants, lfa = low-frequent animals.

Frequency Estimates

No differences in the magnitude of the quantitative estimates or in the ordinal frequency judgments were associated with the variation of the task order. Therefore, the data of the participants from both task order conditions are again combined in the following analyses.

The frequency judgments were not influenced by the different pre-experimental frequencies of animals and plants. Contrary to the predictions of the memory models and the recall-estimate strategy, the mean estimate of the relative frequency of animals was slightly higher on the list with low-frequent animals than on the list with high-frequent animals (see Fig. 2.2). The effect size of the list composition on the quantitative frequency judgments was r_effect size = -.09 (t(33) = -0.512). Furthermore, the mean estimates of the relative frequency of animals on both lists corresponded almost exactly to the actual proportions. On the list with high-frequent animals the deviation amounted to 1.7%, on the list with low-frequent animals to 0.9%. Accordingly, the ordinal frequency judgments also gave no indication of an influence of the pre-experimental frequencies of occurrence. Eleven of the nineteen participants (58%) presented with the list with high-frequent animals judged that the list contained more animals. In the case of the list with low-frequent animals, nine of the sixteen participants (56%) made the same judgment.
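In this chapter, the correlations between recall ratios and estimates in the two task-order conditions are compared with a Fisher Z test. As a minimal sketch, assuming the standard test for two independent correlations (Fisher r-to-z transformation with standard error sqrt(1/(n1 - 3) + 1/(n2 - 3))), the comparison reported for Experiment 1 (r = .52 versus r = .36, 48 participants per task order) can be retraced as follows. The function name `fisher_z_test` is a hypothetical helper.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Compare two independent correlations.

    Returns the test statistic Z and the one-tailed p-value for the
    hypothesis that the first correlation is larger than the second.
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Upper-tail probability of the standard normal distribution
    p_one_tailed = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_one_tailed


# Experiment 1: recall test first (r = .52) versus judgments first (r = .36)
z, p = fisher_z_test(0.52, 48, 0.36, 48)
print(round(z, 3), round(p, 3))
```

With these inputs the sketch gives Z of roughly 0.95 and a one-tailed p of roughly .17, in line with the reported Fisher-Z = 0.944 and p = .171. Small differences are to be expected because the correlations entered here are rounded to two decimals.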
As in Experiment 1, we also determined the relation between the individual recall ratios and the individual frequency judgments in the present data. As suggested by the previous results, this correlation was close to zero at r = .04. However, once more, differences between the correlations in the two task order conditions were found. In the condition with a preceding recall test, a positive correlation of r = .38 (N = 17) resulted, although even in this condition the overestimation of the pre-experimentally more frequent category predicted by the recall-estimate strategy could not be observed. In the condition in which the estimates were given before the recall test was completed, on the other hand, the correlation was r = -.29 (N = 18; Fisher-Z = 1.886, p = .06, one-tailed).

Discussion

In the second experiment we used stimulus material whose pre-experimental frequency of occurrence does not covary with fame. Contrary to the expectations of the memory models, no influence of the pre-experimental frequency of occurrence on the frequency judgments could be found for this stimulus material. As the quantitative predictions of the memory models make clear, an even greater effect than in Experiment 1 should have occurred in the present experiment because of the larger differences in the pre-experimental frequencies of the items. The results of the second experiment hence suggest that the bias in the estimates for female and male names in Experiment 1 was produced by characteristics of the names other than their pre-experimental frequency. The results thus also disagree with the hypothesis that the famous-names effect is triggered by the different pre-experimental frequencies of occurrence of famous and non-famous names.
Since the explanation of the memory models for the famous-names effect is based on this hypothesis, the results consequently contradict the assumption that estimates showing this bias are generated by a retrieval-free memory assessment process.

However, the recall-estimate approach cannot offer a satisfying explanation for the findings of the present experiment either. It is true that the recall-estimate strategy predicted a clearly smaller bias in the estimates than the memory models. The deviation of the mean estimates from the predictions was thus smaller for the recall-estimate approach. On the other hand, the results provide hardly any evidence for the assumption that the frequency judgments were in any way influenced by the number of retrieved animals and plants. The recall-estimate strategy, too, led to the expectation that the estimated percentage of animals would be higher on the list with pre-experimentally high-frequent animals than on the list with pre-experimentally low-frequent animals. This effect did not appear. Furthermore, the individual estimates did not correlate with the individual recall ratios. Only in the condition with a preceding recall test was a moderately positive correlation between these variables found. However, even in this condition the relation was apparently too weak to produce the expected effect in the mean frequency judgments.

Nevertheless, the fact that higher correlations between recall ratios and estimates were found in both experiments if the recall test was completed before the frequency judgment tasks may be significant. The recall test inevitably provides the participants with information about the number of retrieved exemplars. Without the test, the participants might use this information less, or not at all, to generate their frequency judgments (Betsch et al., 1999; Bruce, Hockley, & Craik, 1991). The preceding completion of a recall test may thus foster the use of a recall-estimate strategy.
Why did the bias expected both by the memory models and by the recall-estimate approach not occur in the estimates in Experiment 2? From the view of the memory models, a possible explanation lies in the assumption that the extra-experimental encoding contexts of animals and plants are less similar to the experimental encoding context than the extra-experimental encoding contexts of proper names. It would then be inappropriate to model the influence of the extra-experimental frequencies on the estimates in both cases with the same setting for the parameter a. Indeed, such an assumption might explain slight differences in the size of the effect for proper names on the one hand and plants and animals on the other. However, it seems hardly plausible that the extra-experimental encoding context of plants and animals allows a perfect discrimination from the experimental context while the extra-experimental context of proper names results in a considerable influence on the experimental estimates.

Further possible explanations for the absence of the expected bias result from the differences between the procedures of Experiment 2 and Experiment 1. To manipulate the variable ‘pre-experimental frequency of occurrence’ independently of fame, words signifying a whole class of objects (e.g. “goat”) were used in the present experiment instead of proper names. Moreover, this change of stimulus material required the introduction of a categorization task during the presentation of the stimuli. Finally, this categorization task showed that some of the participants had difficulty classifying the stimulus material correctly.

The fact that the participants overtly categorized the presented words might explain why the estimates were not related to recall performance. In a study by Bruce, Hockley and Craik (1991), the participants estimated the number of exemplars of different categories on a previously presented study list.
During the presentation, one group of participants completed a categorization task. The estimates of this group correlated clearly more weakly with the corresponding recall performance than the estimates of a group of participants who did not complete a categorization task. This indicates that a categorization task can hamper the use of a recall-estimate strategy. If an effect of extra-experimental frequencies on category size judgments were mediated exclusively by the use of this strategy, this could also explain why no bias occurred in the present experiment.

The fact that at least some participants could not categorize the presented words correctly into animals and plants provides, on the other hand, a further possible explanation for the clear deviation of the estimates from the predictions of the memory models. In deriving the memory model predictions we assumed that the presented exemplars activate the memory representation of their superordinate categories already at the stage of encoding. This activation can only occur if the participants had previously learned a stable connection between exemplars and categories. To the extent that the observed categorization errors indicate that the connection between the presented items and the categories “animals” and “plants” was generally only weak, this activation could take place only to a reduced extent or not at all. In this case, the predictions of the memory models would be invalid.

To exclude these alternative explanations for the absence of the bias in Experiment 2, we designed the presentation of the target list in the following experiment just as in Experiment 1. As stimulus material we again used proper names.

Experiment 3

To prevent a confounding of frequency of occurrence and fame in Experiment 3 as well, the participants were presented exclusively with proper names that were unknown to them before the experiment.
Consequently, the participants had no differential knowledge about the pre-experimental frequency of occurrence of these names. Therefore, to test the hypothesis of the memory models that estimates of the frequencies of categories in a target context are influenced by the frequencies of the corresponding exemplars outside this context, we presented the participants with two lists. The arrangement of the target list followed the design of the famous-names study. On the previously presented non-target list, the frequency with which the proper names occurred was varied.

Method

Participants: Eighty-one persons took part in the experiment (16 men and 65 women, with an average age of 21 years). The participants, who were students in introductory psychology courses at the TU Chemnitz, received course credits for their participation. None of them had taken part in the earlier experiments.

Material: On the non-target list the participants were presented with 78 different names, each consisting of a first name and a surname. The first names were 39 male and 39 female names with a similar frequency of occurrence in the text corpus. The surnames came from the telephone directory of the city of Münster. We selected bisyllabic surnames which were not listed more than 20 times and which had no common meaning. For every participant, first names and surnames were combined randomly to form the names presented on the non-target list. Of the resulting 39 female and 39 male names, 19 of each sex were presented ten times on the non-target list. Each of the remaining 20 female and 20 male names was shown once. Altogether, 420 presentations took place, 210 of female names and 210 of male names.

The target list again consisted of 39 names that were presented once. Analogous to the famous-names experiment, this list contained either 19 male and 20 female names or 19 female and 20 male names.
For the sex that was represented with 19 names on the target list, only names that had been presented ten times on the non-target list were selected. The 20 names of the other sex were names that had been shown only once on the non-target list. The sum totals of the frequencies of occurrence outside the target list thus amounted to 190 for the names of one sex and 20 for the names of the other.

Procedure: The experiment always started with the presentation of the non-target list. The participants were tested individually or in groups of up to three persons at personal computers. Every participant was randomly assigned to one of the two target list compositions. The participants were first informed that this experiment was a memory study. Furthermore, they were told that in the first phase of the experiment they would be presented with a list of names for about 30 minutes. In addition, it was explained to them that names could be repeated within this list. The order of presentation of the names on the non-target list was randomly determined for every participant, with the restriction that the same name could not occur twice in immediate succession. The presentation time was three seconds.

During this first phase of the experiment, the participants had to complete an additional task. The purpose of this task was to guarantee that the participants encoded the 420 presentations of names as completely as possible. For this purpose, a letter (A, E, I, O or L) was displayed on the screen after every presentation of a name. The participants then had to indicate by clicking the corresponding response option whether this letter was contained in both the previously presented first name and the previously presented surname. As soon as the participants had selected a response option, an interstimulus interval of 0.5 seconds followed in which the screen was blank. Before the presentation of the non-target list started, the participants completed five practice trials.
The practice trials used names that did not occur again in the further course of the experiment. During the presentation of the non-target list, the participants were asked to pay more attention if they made at least two wrong decisions in ten successive trials.

As already described, the non-target list in Experiment 3 was introduced in order to induce differential prior knowledge about the frequency of occurrence of the different names. To test whether the described procedure is successful, we carried out a pre-test with 20 participants. After the presentation, each of these 20 participants estimated the frequency of occurrence of 30 names that had previously been shown 10 times, once, or not at all. The corresponding mean frequency judgments amounted to 13.10 (Md = 10.40), 2.08 (Md = 1.00) and 0.33 (Md = 0.10). Consequently, one can assume that the participants’ prior knowledge about the frequency of occurrence of different names was manipulated successfully by the presentation of the non-target list.

Subsequent to the presentation of the non-target list, the experiment was interrupted for 3 or 24 hours. After this interval, the participants were presented with the target list in the second phase of the experiment. The interval between the two phases of the experiment was introduced to decrease the probability that the participants would become aware of the relation between the sex of the names on the target list and the frequency of the names on the non-target list. The length of the interval and the composition of the target list were counterbalanced. During the presentation of the target list, the participants completed no further task. Duration and order of the presentations on the target list were determined as in Experiment 1. The arrangement of the frequency judgment tasks and the recall test also followed the procedure of Experiment 1.
However, it was pointed out to the participants that their estimates should refer exclusively to the frequency of male and female names in the second phase of the experiment.

Predictions of the Memory Models

According to the memory models, the extent to which estimates of the frequency of stimuli on a target list are influenced by the frequency of occurrence of the stimuli outside the target list should depend on the similarity of the encoding contexts. In the present experiment this similarity is evidently high. Both the non-target list and the target list were encoded within the same experimental context under nearly identical conditions. Thus, the influence of an item occurring on the non-target list should be greater than the influence of an extra-experimentally encoded item. Hence, the parameter a with which this influence is modeled has to take a clearly higher value than in the former experiments.

To estimate the parameter a, we relied on the results of two earlier studies by Dougherty and Franco-Watkins (2003) and Hintzman and Block (1971). Both studies examined the influence of frequencies of stimuli outside the target context on frequency of occurrence judgments. In addition, in both studies, as in the present experiment, the frequency outside the target context was manipulated by means of a non-target list.

In the four experiments by Dougherty and Franco-Watkins, all words for which the frequency had to be estimated occurred four times on the target list. The frequency of the same words on the non-target list, however, varied between 0 and 16. In all experiments the estimates of the frequency of words on the target list increased linearly with the frequency of these words on the non-target list. The size of the increase was influenced by the second independent variable in the experiments, the similarity of target and non-target list. In all experiments, both lists were designed identically for one group of participants.
Averaged across experiments, participants in this condition gave a mean estimate of about 3.4 for words that did not appear on the non-target list. Thus, every presentation on the target list increased the estimate by about 0.85 units (3.4 divided by the 4 target presentations). The mean estimate for words included 16 times in the non-target list, on the other hand, amounted to about 7.3. We can conclude from these figures that a presentation on the non-target list increased the estimate by about 0.25 units (the difference of 3.9 divided by 16 presentations). The influence of presentations on the non-target list relative to the influence of presentations on the target list was thus slightly below 0.3.

However, in at least some of the experiments the increase in the estimates induced by the non-target list was smaller when this list was not designed exactly like the target list. The increase remained unchanged in a second condition of two experiments in which the two lists differed in the background on which the words were presented. The discrimination of the two lists improved if the words on the non-target list were not presented alone but together with a varying context word. The same finding resulted if the participants in a second condition were instructed to use a different encoding strategy during the presentation of the non-target list than during the presentation of the target list (mental imagery and rote rehearsal, respectively). In both cases, the relative influence of presentations on the non-target list amounted to approximately 0.1.

In the study by Hintzman and Block (1971), target and non-target list were designed identically and presented with identical instructions. The frequency of the words to be estimated varied between 0 and 5 on both lists. Irrespective of the frequency on the target list, a linear increase of the estimates with the frequency on the non-target list was also found here.
The relative influence of presentations on the non-target list was again approximately 0.3 (see Hintzman (1988) for a re-analysis of the same data in which a similar approach was used to determine this influence). Thus, at least for identically designed target and non-target lists, this value seems to be fairly stable.

Since the memory models assume that category size judgments are influenced by the frequency of the relevant stimuli outside the target context in the same way as frequency of occurrence judgments, the presentations on the non-target list should have a similarly great influence in the current experiment. The setting for the parameter a should thus be somewhere between 0.1 and 0.3. It can hardly be assessed whether the encoding task that our participants completed only during the presentation of the non-target list improves the discrimination of the two lists. We decided in favor of a conservative prediction. With the lowest plausible parameter value of a = 0.1, the memory models expect an estimate of 63.3% for the relative frequency of male names on the target list with 19 male and 20 female names. Correspondingly, on the list with 20 male and 19 female names an estimate of 36.7% is predicted for the same relative frequency (see also Fig. 2.3). Since frequency estimates remain relatively stable even over a period of up to one week (Bruce, Hockley, & Craik, 1991; Leicht, 1968), we used these predictions both in the condition in which 3 hours passed between the non-target and the target list and in the condition in which this interval lasted 24 hours.

Results

Predictions of the Recall-Estimate Approach

We determined individual recall ratios for every participant in the same way as in Experiment 1. Figure 2.3 displays the mean recall ratios on the different target lists separately for both interval conditions.
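The memory-model predictions of 63.3% and 36.7% stated in the Predictions section can be reproduced with a simple additive reading of the parameter a. The sketch below rests on an assumption that is not spelled out in this passage: each exemplar contributes one unit for its presentation on the target list plus a units for each presentation on the non-target list, and the category estimate is the category sum divided by the total. The function name is hypothetical. The non-target frequencies (10 presentations for the sex represented by 19 names, 1 presentation for the other sex) are taken from the caption of Figure 2.3.

```python
def predicted_share(n_a, f_out_a, n_b, f_out_b, a):
    """Predicted relative frequency of category A on the target list.

    Assumed model (illustrative): every exemplar contributes 1 unit for its
    target-list presentation plus `a` units per presentation outside the
    target list; the estimate is the category sum over the total sum.
    """
    resp_a = n_a * (1 + a * f_out_a)
    resp_b = n_b * (1 + a * f_out_b)
    return resp_a / (resp_a + resp_b)


# List with 19 male names (10 non-target presentations each)
# and 20 female names (1 non-target presentation each).
p_19_males = predicted_share(19, 10, 20, 1, a=0.1)

# List with 20 male names (1 each) and 19 female names (10 each).
p_20_males = predicted_share(20, 1, 19, 10, a=0.1)

print(f"{100 * p_19_males:.1f}%")  # 63.3%
print(f"{100 * p_20_males:.1f}%")  # 36.7%
```

That this reading returns exactly the two reported percentages suggests, but does not prove, that it matches the computation used for the predictions.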
The order in which the frequency judgment tasks and the recall test were completed again had no effect on the recall ratios.

As shown in Figure 2.3, the frequency of occurrence of the names on the non-target list affected the recall ratios strongly. Male names on the target list were recalled clearly more often than female names if they had previously been presented more often on the non-target list. Conversely, male names were recalled worse than female names if they had been included on the non-target list less often. This effect can be observed equally in both interval conditions. The effect size of the list composition on the recall ratios is large in both conditions, with effect size r > .60 (see also Tab. 2.7).

[Figure 2.3: two panels, "Interval Non-Target - Target List: 3h" and "Interval Non-Target - Target List: 24h"; y-axis: Percentage of Male Names; x-axis: 19 males / 20 females, 19 females / 20 males; series: Memory Model Prediction, Recall-Estimate Prediction, Estimates.]

Figure 2.3. Mean estimated percentages of male names on the target lists in the two conditions with different intervals between non-target and target lists in Experiment 3. The corresponding predictions of the memory models and the recall-estimate strategy are also shown. Error bars denote standard errors. Names of the sex represented by 19 names on the target list appeared 10 times on the non-target list. The names of the other sex were presented once on the non-target list.

Consequently, the recall-estimate strategy also expects a large influence of the frequencies of names on the non-target list on the estimates of the relative frequencies of female and male names on the target list. Generally, the bias in the estimates predicted by the recall-estimate strategy is marginally greater than the bias expected by the memory models.
The predicted deviation between the estimated and the actual percentage of male names on the target lists varies in both interval conditions between 15% and 19%. The size of these deviations corresponds to the bias that the recall-estimate strategy also predicted for the lists with pre-experimentally high-frequent and medium-frequent names in Experiment 1 (cf. Fig. 2.1).

Table 2.7
Effect sizes of the list composition on recall ratios and frequency estimates for both interval conditions in Experiment 3

            Recall Ratio              Frequency Estimates
Interval    effect size r    t        effect size r    t        df
3 h         .62              4.94     .11              0.66     39
24 h        .67              5.51     .11              0.67     38

Frequency Estimates

As in the preceding experiments, the variation of the task order had no influence on the magnitude of the quantitative estimates or the ordinal frequency judgments. In the analyses below the task order conditions have thus been pooled.

As shown in Figure 2.3, the manipulation of the frequencies of names on the non-target list had only a weak effect on the frequency judgments. As expected by the memory models and the recall-estimate strategy, the relative frequency of male names on the target list on which the actual proportion was 48.7% was judged higher than on the target list on which the actual proportion amounted to 51.3%. This holds in the condition in which 3 hours passed between the presentation of the non-target and the target list as well as in the condition in which the lists were separated by a 24-hour interval. However, this bias in the estimates was generally very small. In no interval condition and on no target list did the deviation between the estimated and the actual relative frequency of male names exceed 2.8%. Accordingly, the effect sizes of the list composition on the frequency judgments were also small in both interval conditions, with effect size r = .11 (see Tab. 2.7).
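The effect sizes in Table 2.7 are consistent with the common conversion of a t statistic into an effect size r, r = sqrt(t² / (t² + df)). The sketch below assumes that this conversion was used; the text does not state the formula, and the function name is illustrative.

```python
import math


def r_effect_size(t, df):
    """Effect size r from a t statistic: sqrt(t^2 / (t^2 + df)), signed like t."""
    r = math.sqrt(t * t / (t * t + df))
    return math.copysign(r, t)


# (t, df) pairs as reported in Table 2.7.
table_2_7 = {
    ("3 h", "recall ratio"): (4.94, 39),
    ("24 h", "recall ratio"): (5.51, 38),
    ("3 h", "frequency estimates"): (0.66, 39),
    ("24 h", "frequency estimates"): (0.67, 38),
}

for (interval, measure), (t, df) in table_2_7.items():
    print(f"{interval:>4}  {measure:<20} r = {r_effect_size(t, df):.2f}")
```

Because the sign of t is carried over, the same helper also returns negative effect sizes for negative t values.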
The presentation frequencies on the non-target list thus exerted a clearly smaller influence on the frequency judgments than expected by the memory models and the recall-estimate approach.

The same picture results from the ordinal frequency judgments. For the target list with 19 male and 20 female names, in both interval conditions only a small majority of participants judged that the list contained more male names (3h-interval: 57% (N = 21); 24h-interval: 55% (N = 20)). On the list with 19 female and 20 male names a narrow minority of participants was of the same opinion (3h- and 24h-interval: 45% (in both conditions N = 20)).

As suggested by the large difference between the bias predicted by the recall-estimate approach and the actually observed bias, the correlations between the individual recall ratios and the individual frequency judgments were small. With a 3-hour interval between non-target and target list this correlation amounted to r = .15 (N = 41). After a 24-hour break the correlation was r = .07 (N = 40; Fisher-Z = 0.351, p = 0.72 (two-tailed)). As in the previous experiments, these correlations were again higher in the condition in which the recall test preceded the frequency judgments than in the condition in which the frequency judgments were given first. However, this time the differences between the correlations in the two task order conditions were close to 0 (3h-interval: r = .18 and r = .15; 24h-interval: r = .10 and r = .04).

Discussion

In Experiment 3 the frequencies of occurrence of the stimuli outside the target context were manipulated by the presentation of a preceding non-target list. According to the hypothesis of the memory models, this manipulation should exert a considerable influence on the estimates of the relative frequency of the superordinate categories of the stimuli in the target context. In fact, a slight effect on the frequency judgments, consistent across both interval conditions, was found.
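The Fisher-Z comparison of the two correlations reported in the Results (r = .15 with N = 41 against r = .07 with N = 40) can be retraced as follows. This is a sketch under the assumption that the standard test for two independent correlations was used; the function name is illustrative, and only the Python standard library is needed.

```python
import math


def fisher_z_test(r1, n1, r2, n2):
    """Compare two independent correlations via Fisher's r-to-z transform.

    Returns the z statistic and the two-tailed p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)            # r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))    # SE of z1 - z2
    z = (z1 - z2) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))  # 2 * (1 - Phi(|z|))
    return z, p_two_tailed


z, p = fisher_z_test(0.15, 41, 0.07, 40)
print(f"Fisher-Z = {z:.3f}, p (two-tailed) = {p:.2f}")
```

A one-tailed p value, where a directional hypothesis applies, is half of the two-tailed value returned here.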
Thus, the results of the present experiment turn out somewhat more favorable for the memory models than the results from Experiment 2, in which no effect at all was found for animals and plants with different pre-experimental frequencies. Nevertheless, both experiments largely lead to similar conclusions. In Experiment 3 the bias in the frequency judgments was again clearly smaller than expected by the memory models. This also means, contrary to the predictions, that the bias turned out much smaller than the bias in the estimates for pre-experimentally high- and medium-frequent names in Experiment 1. The absence of a greater effect cannot be ascribed to any changes in the procedure of the presentation of the target lists relative to Experiment 1. In both experiments the participants completed no categorization task during the presentation of these lists. The alternative explanation that was plausible in Experiment 2, namely that the participants had difficulties assigning exemplars to categories, can also be excluded for the current experiment. All presented exemplars were common first names (e.g. Matthias, Franz, Felix, Claudia, Helga, Inge) that should allow every participant to identify the corresponding sex easily. Thus, the results of Experiment 3 indicate that the frequencies of exemplars outside the target context can have an influence on estimates of the corresponding categories if these alternative explanations are excluded. Nevertheless, the results also confirm the conclusion from Experiment 2 that, contrary to the hypothesis of the memory models, this effect is not suitable to explain the famous-names effect. The influence of the frequencies outside the target context was again too small to be regarded as the driving force behind the famous-names effect.

With regard to the recall-estimate approach, too, Experiment 3 leads to similar conclusions as Experiment 2.
In the present case, this approach predicted a bias in the estimates as great as that predicted by the memory models. Once more the estimates turned out to be considerably less error-prone than the recall performances underlying this prediction. Moreover, the low correlations between the individual recall ratios and estimates show that in both interval conditions no more than about 2% of the variance in the estimates can be explained by the recall-estimate strategy. In other words, the participants obviously did not use this strategy to generate their estimates. It is particularly noteworthy that the bias predicted by the recall-estimate approach was as large as the bias predicted for the lists with high-frequent and medium-frequent names in Experiment 1. Thus, a bias in the recall ratios of the size observed with these names is apparently not sufficient to induce a bias in the estimates that corresponds to the famous-names effect.

A last and important implication of the findings from Experiment 3 for the memory models results from a comparison with earlier studies that examined the influence of frequencies on non-target lists on frequency of occurrence judgments. As already described above, the relative influence of presentations on the non-target list on estimates of the frequency of occurrence of words on the target list was fairly stable at approximately 0.3 in the studies by Hintzman and Block (1971) and Dougherty and Franco-Watkins (2003). However, this held only if target and non-target list were designed identically. Where the lists differed in their instructions or design, the influence varied between 0.1 and 0.3. The explanation of the memory models for the famous-names effect is based on the assumption that category size judgments are influenced by frequencies outside the target context in the same way as frequency of occurrence judgments. We hence based the predictions of the memory models on these values.
Since in the present experiment the non-target and target list were not designed completely identically, we used the lowest plausible parameter setting of a = 0.1. Despite this conservative decision, the bias in the category size judgments was clearly overestimated. An approximately correct prediction of the estimates would have resulted with a parameter setting of a = 0.01. That is, the influence of the frequencies on the non-target list on the category size judgments in the present experiment was one order of magnitude smaller than the influence on frequency of occurrence judgments in every earlier, comparable study.

What can we conclude from this observation? Given that the participants obviously did not use the number of retrievable male and female names to generate their estimates, it seems reasonable to assume that they relied on a kind of memory assessment process. This assessment process, however, apparently leads to a different result than expected by the memory models. The predictions of the memory models were based on the hypothesis that the memory response elicited by the category labels is proportional to the sum of the responses to the corresponding exemplars on the target list. If the participants actually used a memory assessment strategy, this hypothesis is wrong. Instead, the response to category labels is apparently biased clearly less by the frequencies of the exemplars outside the target context.

The results from Experiments 2 and 3 make the famous-names effect appear to be a singular phenomenon. Owing to the manipulation of the frequencies of the stimuli outside the target context, a bias of a similar size as in the famous-names study was to be expected in both experiments. Furthermore, this manipulation resulted in both experiments in a strong recall bias which was in one case as great as the recall bias with famous names. Nevertheless, a comparable effect in the frequency judgments did not occur.
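The remark above that a setting of a = 0.01 would have predicted the estimates of Experiment 3 approximately correctly can be checked by inverting the prediction. The sketch below makes two assumptions that are not stated in the text: the additive model in which each exemplar contributes 1 + a × (non-target frequency) to its category, and the reading of the reported maximum deviation of 2.8% as 2.8 percentage points. Both function names are hypothetical.

```python
def predicted_share(n_a, f_a, n_b, f_b, a):
    """Assumed additive model: category A's share of the summed responses."""
    resp_a = n_a * (1 + a * f_a)
    resp_b = n_b * (1 + a * f_b)
    return resp_a / (resp_a + resp_b)


def implied_a(share, n_a, f_a, n_b, f_b):
    """Closed-form solution of predicted_share(..., a) == share for a."""
    numerator = n_a - share * (n_a + n_b)
    denominator = share * (n_a * f_a + n_b * f_b) - n_a * f_a
    return numerator / denominator


# List with 19 male names (10 non-target presentations each)
# and 20 female names (1 non-target presentation each).
actual_share = 19 / 39                  # 48.7% male names on the target list
largest_estimate = actual_share + 0.028 # deviation never exceeded 2.8 points

a_upper = implied_a(largest_estimate, 19, 10, 20, 1)
share_at_001 = predicted_share(19, 10, 20, 1, a=0.01)

print(f"a implied by the largest observed deviation: {a_upper:.3f}")
print(f"predicted share of male names with a = 0.01: {100 * share_at_001:.1f}%")
```

Under these assumptions the largest observed deviation corresponds to a value of a close to 0.01, roughly one order of magnitude below the setting of 0.1 used for the prediction.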
Therefore, the question arises whether an effect analogous to the famous-names effect can be induced at all by stimulus material other than famous names. In Experiment 4 we thus used stimulus material intended to be as similar as possible to the material of the original study.

Experiment 4

The stimulus material in Experiment 4 consisted of names of German and foreign cities with varying pre-experimental frequency of occurrence (and possibly of varying fame). In contrast to Experiment 2, the stimuli here were thus proper names. Moreover, unlike in Experiment 3, the participants here already possessed pre-experimental knowledge of the frequency of occurrence and other attributes of the stimuli. In other words, the stimuli here share more characteristics with the names of famous persons than the stimuli of the previous experiments.

Method

Participants: Eighty-one persons took part in the experiment (27 men and 54 women, with an average age of 23 years). The large majority of the participants were students of the TU Chemnitz; 26 of them came from the Department of Psychology. The participants received either a reimbursement of 3 Euros or course credit for their participation. None of them had taken part in the earlier experiments.

Material: Again we constructed two study lists. Analogous to the previous experiments, these lists contained either 19 German cities with a high and 20 foreign cities with a low frequency of occurrence in the text corpus, or 19 foreign cities with a high and 20 German cities with a low frequency of occurrence. As pre-experimentally high-frequent German cities we used the 19 cities in Germany with the largest number of inhabitants (e.g. Berlin, Hamburg, Wuppertal, Bielefeld). For the selection of the pre-experimentally high-frequent foreign cities, a set of 50 well-known international cities was first created.
From these 50 cities, the 19 cities with the highest frequency of occurrence in the text corpus were included in the corresponding study list (e.g. New York, Paris, Moscow, Peking). The low-frequent foreign and German cities used in the experiment were selected by means of a pre-test. The aim of this pre-test was to guarantee that the cities on the study lists could be categorized correctly by the participants. The 34 participants of the pre-test were presented with a questionnaire listing 60 foreign and 60 German cities with a low frequency of occurrence in the text corpus. The participants’ task was to indicate for every city whether it is a German or a foreign city.[12] Of the 120 cities altogether, 21 German and 35 foreign cities were correctly categorized by all participants. From these cities, 20 were selected randomly from each category and used as low-frequent cities on the study lists (e.g. Büsingen, Wunsiedel, Goslar, Meppen or Quito, Amiens, Sfax, Toledo). Table 2.8 displays the sum totals of the frequencies of occurrence in the text corpus of high- and low-frequent German and foreign cities on both study lists.[13]

Table 2.8
Sums of the frequencies of occurrence in the text corpus for German and foreign cities on the study lists in Experiment 4

Study list                                                         German cities    Foreign cities
19 high-frequent German cities / 20 low-frequent foreign cities    207,775          1,734
19 high-frequent foreign cities / 20 low-frequent German cities    4,035            104,731

Procedure: In the pilot study to Experiment 2 the participants’ estimates did not add up to 100%. At the same time, the participants stated that they subdivided the stimulus material into different categories than those to be estimated. This suggests that participants give reasonable estimates of the size of the relevant categories only if they perceived those categories already during the encoding of the stimulus material.
It appeared doubtful that the participants in the present experiment would automatically categorize the stimuli into German and foreign cities. As in Experiment 2, the participants thus completed a categorization task during the presentation of the study lists. In other words, the participants were informed in advance that the study lists would exclusively contain cities from the categories German and foreign. After each presentation of a city name, the participants indicated by a mouse click on an appropriate field to which category this city belongs. As in Experiment 2, the procedure was identical to the procedure in Experiments 1 and 3 with the exception of this categorization task.

[12] None of the foreign cities included in the questionnaire was a German-speaking city.

[13] The newspapers contained in our text corpus are published in Berlin, Munich and Frankfurt. In all three newspapers their places of publication occurred with an extremely increased frequency. We thus excluded these frequencies and estimated them in each case from the frequencies of occurrence in the other two newspapers.

Predictions of the Memory Models

To derive predictions of the memory models, we used the parameter setting a = 0.001, which allowed a fairly accurate prediction of the magnitude of the frequency judgments in Experiment 1. From this parameter setting and the sums of the frequencies of occurrence of German and foreign cities in the text corpus given in Table 2.8 the following predictions result: for the list with high-frequent German and low-frequent foreign cities an estimate of 91.3% for the relative frequency of German cities is expected. For the list with high-frequent foreign and low-frequent German cities, on the other hand, an estimate of 16.3% for the same relative frequency is predicted (see also Fig. 2.4).
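The two predictions follow from the corpus sums in Table 2.8 if one assumes that each category's response is the number of its exemplars on the study list plus a times the summed corpus frequency of those exemplars. This additive form is an assumption of the sketch below, and the function name is hypothetical; the inputs are the values given in the text.

```python
def predicted_share_from_sums(n_a, corpus_sum_a, n_b, corpus_sum_b, a):
    """Predicted relative frequency of category A on the study list.

    Assumed model (illustrative): response = number of exemplars on the list
    + a * summed pre-experimental (corpus) frequency of those exemplars.
    """
    resp_a = n_a + a * corpus_sum_a
    resp_b = n_b + a * corpus_sum_b
    return resp_a / (resp_a + resp_b)


A = 0.001  # parameter setting carried over from Experiment 1

# 19 high-frequent German cities / 20 low-frequent foreign cities
german_share_list_1 = predicted_share_from_sums(19, 207_775, 20, 1_734, A)

# 20 low-frequent German cities / 19 high-frequent foreign cities
german_share_list_2 = predicted_share_from_sums(20, 4_035, 19, 104_731, A)

print(f"{100 * german_share_list_1:.1f}%")  # 91.3%
print(f"{100 * german_share_list_2:.1f}%")  # 16.3%
```

With corpus sums in the hundreds of thousands, even a = 0.001 lets the pre-experimental frequencies dominate the single study-list presentation of each city, which is why the predicted bias is so large.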
Results

Predictions of the Recall-Estimate Approach

For every participant we determined a recall ratio indicating the percentage of recalled German cities among all recalled cities. Figure 2.4 displays the mean recall ratios on the different study lists separately for the two orders in which the frequency judgment tasks and the recall test were completed. As illustrated by the figure, the task order once more had no influence on the recall ratios. In both order conditions a strong bias in the recall ratios is the result. Generally, more cities from the category with the pre-experimentally more frequent exemplars are recalled. The effect size of the list composition on the recall ratios is very large in both order conditions, with effect size r > .80 (see Tab. 2.9).

Table 2.9
Effect sizes of the list composition on recall ratios and frequency estimates in both task order conditions in Experiment 4

                                     Recall Ratio              Frequency Estimates
Task Order                           effect size r    t        effect size r    t        df
Recall Test - Frequency Judgments    .83              9.38     .35              2.34     39
Frequency Judgments - Recall Test    .89              12.68    -.06             -0.39    38

Thus, the recall-estimate approach again expects strongly distorted estimates. The predicted deviation between the estimated and the actual percentage of German cities amounts to about 30% on the list with high-frequent German cities irrespective of the order condition. On the list with low-frequent German cities this deviation comes to approximately 16%.

[Figure 2.4: two panels, "Task Order: Recall Test - Frequency Judgments" and "Task Order: Frequency Judgments - Recall Test"; y-axis: Percentage of German Cities; x-axis: 19 hfgc/20 lffc, 19 hffc/20 lfgc; series: Memory Model Prediction, Recall-Estimate Prediction, Estimates.]

Figure 2.4. Mean estimated percentages of German cities on the study lists in the two task order conditions in Experiment 4.
The corresponding predictions of the memory models and the recall-estimate strategy are also shown. Error bars denote standard errors. hfgc = high-frequent German cities, lffc = low-frequent foreign cities, hffc = high-frequent foreign cities, lfgc = low-frequent German cities.

Frequency Estimates

The occurrence of a bias in the frequency judgments depended on the task order condition. As illustrated in Figure 2.4, a bias could be found if the recall test preceded the estimates. In this condition German cities received higher estimates on the study list on which they were represented with 19 exemplars than on the study list on which they were represented with 20 exemplars. The effect size of the list composition on the estimates is similarly large as with high- and medium-frequent names in Experiment 1 (cf. Tab. 2.9 and 2.3).[14] On the other hand, the estimates were unaffected by the pre-experimental frequency of the cities if they had been given before the recall test. In this order condition the relative frequency of German cities on both study lists was slightly overestimated (by approximately 2%). Consequently, German cities here received higher estimates on the list in which they were included more often. Accordingly, the effect size of the list composition takes a slightly negative value (see Tab. 2.9). As can be seen from Table 2.10, the ordinal frequency judgments confirm the results of the analysis of the quantitative estimates.

Table 2.10
Percentage of the participants who judged German cities to be more frequent on the study lists in the two task order conditions in Experiment 4.

Task Order                           19 hfgc / 20 lffc    19 hffc / 20 lfgc
Recall Test - Frequency Judgments    86% (N = 21)         50% (N = 20)
Frequency Judgments - Recall Test    55% (N = 20)         60% (N = 20)

Note. hfgc = high-frequent German cities, lffc = low-frequent foreign cities, hffc = high-frequent foreign cities, lfgc = low-frequent German cities.
In the condition in which the participants completed the recall test first, the correlation between the individual recall ratios and the estimates amounts to r = .38 (N = 41). In the condition in which the participants gave their estimates first, this correlation is r = .01 (N = 40; Fisher-Z = 1.69, p = 0.05 (one-tailed)).

[14] It should be noted that the frequency of German cities on the list with high-frequent foreign and low-frequent German cities was hardly underestimated. The results in the condition in which the frequency judgments were given before the recall test indicate that this might be ascribed to a general tendency, independent of the list composition, to estimate the frequency of German cities slightly higher.

Discussion

The aim of Experiment 4 was to check whether the famous-names effect could be replicated with stimulus material that shares as many characteristics with the names of famous persons as possible. This replication succeeded at least in the condition in which the recall test preceded the frequency judgments. In this condition, the results largely corresponded to the findings from Experiment 1. Study lists with names of cities with different pre-experimental frequencies here induced a very strong recall bias. As was therefore expected by the recall-estimate strategy, the frequency judgments were also systematically distorted. The effect in the estimates was similarly large as with lists with high- and medium-frequent names and lists with medium-frequent and unknown names of persons in Experiment 1. However, this means at the same time that the bias in the estimates was again clearly smaller than the bias in the recall ratios. Finally, the correlations between the individual recall ratios and the individual frequency judgments also reached at least a roughly similar size as in Experiment 1.

However, in the condition in which the participants gave their frequency judgments before the recall test, a completely different picture resulted.
Although the recall-estimate approach here makes the same predictions as in the other order condition, the frequency judgments corresponded fairly well to the actual percentage of German cities on the study lists. Individual recall ratios and estimates were uncorrelated. Thus, the number of retrieved foreign and German cities had no influence at all on the estimates.

Furthermore, the absence of a bias in the estimates in this order condition demonstrates that the predictions of the memory models once again do not apply. Due to the great differences in the pre-experimental frequencies of German and foreign cities, these models expected an even larger bias than the recall-estimate approach. However, differences in the pre-experimental frequencies alone obviously do not cause a bias analogous to the famous-names effect. In addition, the coexistence of a strong recall bias and approximately correct frequency judgments supports the conclusion we already derived in the discussion of the results of Experiment 3: Contrary to our original expectation, participants seem to have access to memory responses that indicate the relative size of the categories on the study lists fairly accurately.

In the previous experiments we already observed that the correlation between recall ratios and estimates turns out higher if the recall test was completed before the judgment tasks. In accordance with some findings from the literature (Betsch et al., 1999; Bruce, Hockley, & Craik, 1991), we interpreted this as evidence for the assumption that the recall test makes information salient that the participants would otherwise use less or not at all to generate their judgments. In Experiment 1 we found, in the condition in which the recall test was completed first, results similar to those in the same order condition of the present experiment. However, in Experiment 1 the correlation between recall ratios and estimates was not reduced to 0 if the estimates preceded the recall test.
In addition, in Experiment 1 there was a bias in the frequency judgments in this task order condition as well. Thus, the question arises why estimates that could not be influenced by a previous recall test were completely unbiased in the present experiment.

The answer to this question may be found in the categorization task. With the exception of the modified stimulus material, to which the absence of the bias can apparently not be attributed, this task was the only difference between the experiments. As already described above, earlier research demonstrated that a categorization task can impede the use of a recall-estimate strategy (Bruce, Hockley, & Craik, 1991). The current results suggest furthermore that the combination of overt categorization and no preceding recall test blocks the use of this strategy entirely. It seems that after a categorization of the stimulus material the recall bias affects the estimates only if the participants become aware through a recall test of how much the number of retrieved exemplars differs between the categories. Consistent with this interpretation, Bruce, Hockley and Craik (1991) also found in several experiments correlations of 0 or close to 0 between recall performances and estimates if the stimuli were categorized but no preceding recall test was completed. In the present study the same condition was realized once more in Experiment 2. There the correlation was even slightly negative. Moreover, the differences between the correlations in the two task order conditions were larger in the experiments with a categorization task than in the experiments without such a task (Experiments 1 and 3). This also supports the assumption that after an overt categorization of the stimuli, the recall-estimate strategy is not used spontaneously but can still be prompted by a recall test.
General Discussion

The main objective of the present study was to test the hypothesis of the memory models that the famous-names effect is caused by the different pre-experimental frequency of famous and less famous names. All in all, the results of our experiments contradict this hypothesis. In Experiments 2 and 3, in which the frequency of the stimuli outside the target context was manipulated independently of fame, we found no bias or just an extremely slight bias in the frequency judgments of the corresponding superordinate categories. In Experiment 4, in which the study lists contained cities with different pre-experimental frequencies, a bias occurred only if a recall test had been completed before the frequency judgments were given. This result, too, should of course not occur if different pre-experimental frequencies of stimuli generally led to distorted estimates of category sizes via a retrieval-free memory assessment process. None of the Experiments 2 to 4 thus provided evidence for the assumption that differences in the frequency of exemplars outside the target context alone are sufficient to influence estimates of categories in a considerable way. As a result, we have to conclude that the explanation of the memory models for the famous-names effect is incorrect.

Originally, the famous-names effect was explained by Tversky and Kahneman (1973) with the assumption that frequency judgments of categories are based on the number of the relevant retrieved exemplars. This assumption was supported in the famous-names study by the observation that recall performances and estimates were influenced by the fame of the presented names in the same way. We also found this result pattern in Experiment 1 for study lists with names that differed pre-experimentally in both frequency and fame. Here a bias in the mean recall ratios was associated with a bias in the mean frequency judgments.
In addition, the participants’ individual estimates covaried to a substantial degree with the individual recall ratios.

All in all, the results of our experiments nevertheless demonstrate that the hypothesis that the participants generally generated their estimates with the recall-estimate strategy cannot be maintained. The recall-estimate approach predicted biased estimates in all experiments. Even where we observed a bias in the estimates, as in Experiment 1, this bias was smaller than had to be expected from the recall ratios. We found this pattern in all experiments and, moreover, without exception in all independent experimental groups.[15] This confirms earlier findings according to which recall performance is too poor an indicator of objective frequencies to explain the comparatively accurate estimates completely (e.g. Hasher & Zacks, 1984; Lewandowsky & Smith, 1983; Manis et al., 1993; Watkins & LeCompte, 1991).

In addition, the occurrence of a recall bias by no means necessarily meant that the estimates were biased as well. In Experiment 2 the estimates corresponded almost exactly to the actual proportions of the categories in question on the study lists. The same was true in Experiment 4 when the estimates were given before the recall test. In Experiment 3 a recall bias corresponding in size to the bias for high-frequent and medium-frequent names was accompanied merely by a minimal distortion in the estimates. The occurrence of a recall bias alone thus obviously does not allow a reasonable prediction about whether and how much the frequency judgments will also be distorted. Furthermore, the correlations between individual recall ratios and estimates in Experiments 2 and 3 as well as in the mentioned task order condition of Experiment 4 were at 0 or close to 0. Thus, there is no evidence here that the estimates were influenced by the recall performances in any way.
Apparently, neither the memory models nor the recall-estimate approach can explain the entire result pattern in a satisfying manner. However, which cognitive processes determined the participants’ estimates? In the remainder of this discussion we will try to offer an explanation on the basis of the multiple strategy perspective (Brown, 1995; 1997; 2002). According to this perspective, persons can use several strategies to generate frequency judgments. Strategy selection is assumed to be 15 One might argue that this finding is due to the way we transformed the number of retrieved exemplars in predictions of the recall-estimate strategy or to the time span we allotted for the recall test. However, previous research strongly suggests that this is not the case. In a study by Manis et al. (1993) five different recall indices were determined to predict frequency estimates. Recall ratios performed best. A study by Watkins and LeCompte (1991) indicates that recall performances indeed become a better cue of objective frequencies the more time is allotted for recall. However the maximum recall time in this study was considerably lower than in our experiments (in which it was 2 min.). Additionally, frequency estimates are, of course, usually given in less than 2 minutes. Finally, results of Sedlmeier et al. (1998) demonstrate that predictions based on the time needed to retrieve a first exemplar of each category do not differ systematically from predictions based on the number of retrieved exemplars. Thus, it seems, no matter how recall performance is measured, it reflects actual frequencies worse than estimated frequencies. 72 systematically related to properties of the events in question, encoding factors and the conditions at the time of frequency estimation. Brown basically distinguishes three strategies. 
Two of these strategies correspond to the memory assessment approach and the recall-estimate approach examined by us.[16] The third strategy is direct retrieval of previously stored tallies of frequencies. For the present experiments the application of this strategy can be excluded since participants in laboratory studies usually count item frequencies only if they are explicitly instructed to do so (Brown, 2002). If a combination of the approaches examined by us is to explain the result pattern of all experiments, one of the strategies has to allow an error-free estimation of category sizes. This obviously cannot be the recall-estimate strategy. Contrary to the original hypothesis of the memory models, we thus have to assume that the memory responses to a category label were at least almost unbiased. Evidence for this assumption comes especially from Experiment 3, in which the frequencies of the stimuli outside the target context were manipulated within the experiment. Earlier experiments show that with such a manipulation a considerable influence of the frequencies of the stimuli outside the target context on frequency of occurrence judgments is to be expected (Hintzman & Block, 1971; Dougherty & Franco-Watkins, 2003). In contrast, the category size judgments collected by us were nearly unbiased. This gives reason to assume that the responses to category labels are not proportional to the sum of the responses to the corresponding exemplars. Let us briefly discuss how this finding relates to the memory models before we return to the multiple strategy perspective.

[16] Brown refers to the recall-estimate strategy with the term enumeration. For the memory assessment strategy he does not specify an underlying cognitive process (as done in MINERVA-DM or PASS). Thus, in the multiple strategy perspective the only assumption connected with this strategy is that persons have access to some kind of memory response that can be used as an indicator of frequencies.

MINERVA-DM (Dougherty, Gettys, & Ogden, 1999) and PASS (Sedlmeier, 1999) are aimed at modeling the memory assessment process. Both Dougherty and colleagues and Sedlmeier assumed that the response to categories is proportional to the sum of the responses to the exemplars. As we already emphasized in the introduction, however, neither of them specified the process that brings about this proportionality. The fact that Dougherty and colleagues simulated the famous-names effect without integrating a corresponding process into MINERVA-DM may signal that the proportionality assumption was made ad hoc. Together with the finding that frequency of occurrence judgments - as expected by the memory models - are influenced by frequencies outside the target context, this assumption enabled an explanation of the famous-names effect within the scope of the models. In the introduction we argued that proportionality could be brought about by an activation of superordinate categories already during the encoding of the stimuli. While this assumption seemed plausible to us at the beginning of this research, it is by no means necessary within the framework of MINERVA-DM or PASS. Therefore, the question arises which predictions the memory models will make if we abandon this or other possible additional assumptions generating proportionality.

Let us consider the situation in the famous-names experiment. In MINERVA-DM every new event is stored by an additional trace in a memory matrix. If the sex of a name on the study list was encoded, it then has to be represented within the corresponding trace. At the time the judgments are given, memory is probed with the label of a category, here ‘female names’ and ‘male names’. The activation of an intra-experimental trace can thus only depend on its similarity to this probe.
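The dependence of trace activation on probe similarity can be made concrete with a small sketch. It assumes the standard MINERVA 2 retrieval equations (Hintzman, 1988), on which MINERVA-DM builds: similarity is the dot product of probe and trace divided by the number of features that are nonzero in either vector, activation is the cube of similarity, and echo intensity is the sum of all activations. The feature coding (four ‘sex’ features with orthogonal patterns for the two categories, four item-specific features), the perfect encoding of every feature and the 12:8 study list are purely hypothetical choices made for illustration; they are not the parameterization used by Dougherty et al. (1999).

```python
# Hypothetical feature patterns for the two categories (orthogonal to each other).
FEMALE = (1, 1, 1, -1)
MALE = (1, -1, 1, 1)
N_ITEM_FEATURES = 4  # hypothetical number of name-specific features


def make_trace(category, item_id):
    """Trace = category features followed by name-specific +1/-1 features."""
    item = tuple(1 if (item_id >> bit) & 1 else -1 for bit in range(N_ITEM_FEATURES))
    return category + item


def make_probe(category):
    """Category-label probe: category features set, item features unknown (0)."""
    return category + (0,) * N_ITEM_FEATURES


def similarity(probe, trace):
    """MINERVA 2 similarity: dot product over the number of relevant features."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant


def echo_intensity(probe, memory):
    """Sum of cubed similarities across all traces in memory."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)


# Hypothetical study list: 12 female and 8 male names, each stored once.
memory = [make_trace(FEMALE, i) for i in range(12)] + [make_trace(MALE, i) for i in range(8)]

female_echo = echo_intensity(make_probe(FEMALE), memory)
male_echo = echo_intensity(make_probe(MALE), memory)
female_share = female_echo / (female_echo + male_echo)

print(f"echo(female) = {female_echo}, echo(male) = {male_echo}")
print(f"share of female names implied by echo intensities = {female_share:.2f}")
```

Under these assumptions every trace of the probed category contributes the same activation regardless of which name it stores, so the two echo intensities stand in the same ratio as the list frequencies of the categories.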
Given that the participants know the sex of the presented names and that the feature ‘sex’ is encoded automatically (Taylor et al., 1978; Fiske & Neuberg, 1990; Messieck & Mackie, 1989), the probability that the sex of a name is represented in memory should be equal for female and male names. Consequently, the familiarity signals elicited by intra-experimental traces should also correspond to the proportions of male and female names on the study list.

The present experiments provide no indication that memory responses to categories are influenced by the frequency of extra-experimental events. When the frequency outside the target context was varied independently of fame, we found a (slight) bias in the estimates only if intra-experimental non-target lists were used (Experiment 3). Let us nevertheless assume that in the case of category size judgments, too, participants are not able to discriminate perfectly between the occurrence of events in the experimental and the extra-experimental context. Since memory is not probed with single names, the activation of extra-experimental traces - just like the activation of intra-experimental traces - would have to depend on their similarity to the probe ‘male names’ or ‘female names’. In addition, the activation of extra-experimental traces would be influenced by the similarity of the encoding context to the experimental context. A distorting effect of extra-experimental traces on estimates of the relative frequency of male and female names would thus only be expected if at least one of the following circumstances holds:

1. Names of one sex are generally encoded systematically more often than names of the other sex, or
2. names of one sex are encoded in contexts that are systematically more similar to the experimental context than the encoding contexts of names of the other sex.
Neither circumstance seems very plausible.[17] In a similar manner, the predictions of the PASS model also have to be modified if one abandons the assumption that the names elicit an activation of their superordinate category already during the presentation of the study lists. The associative response of this model to the probes ‘male names’ and ‘female names’ then mainly depends on whether and to what extent an associative connection between the sex of the single names and the experimental context was learned. Again there is no reason to assume that this associative connection is systematically learned better for the names of one of the two sexes within the experiment. Nor should systematically different weights between the units representing the experimental context and the units representing the sexes have been acquired outside the experiment.

[17] Of course, MINERVA-DM would not predict a famous-names effect even if one of these circumstances were given. If, for example, more male than female names were encoded outside the experiment, the model would expect that the frequency of male names is overestimated irrespective of the specific names on the study list. In fact, in Experiments 2 and 4 we found a slight tendency to overestimate the frequency of animals and German cities on all study lists. Whether this was due not only to chance variation but also to more extra-experimental encodings of animals and German cities than of plants and foreign cities has to remain speculation.

Consequently, without additional assumptions both models arrive at the prediction that frequency judgments on categories in a target context should not be influenced by the frequency of occurrence of the relevant exemplars outside the target context. Thus, the models are in principle perfectly able to explain distorted frequency of occurrence judgments and accurate category size judgments at the same time. The present experiments give every reason to believe that the models fare far better if no additional assumptions are made that modify these predictions. With that, the models of course forfeit in any event the possibility of explaining the famous-names effect by the different pre-experimental frequencies of the names.

According to the previous considerations, we suggest that our participants generally had access to unbiased information about category sizes via a memory assessment strategy. Thus, the question arises why they nevertheless gave distorted estimates in some of the experimental conditions. From the multiple strategy perspective, the biased estimates have to go back to an application of the recall-estimate strategy. Since the recall ratios were biased in all experiments, this strategy is indeed a possible cause of distorted estimates in every experiment. In other words, an increased use of this strategy should cause a greater bias. The correlation between individual recall ratios and estimates can be regarded as an indication of the use of the recall-estimate strategy. We determined the covariation between this measure and the effect size in the estimates on the eight study list combinations for which separate analyses were reported.[18] This covariation was indeed strong, r_alerting = .88 (Rosenthal et al., 2000). That is, a larger bias resulted in those experimental conditions in which the estimates were systematically related to the recall ratios. On the other hand, if the estimates correlated only weakly or not at all with the recall ratios, no bias occurred. However, for the hypothesis that a bias in the estimates is caused by the use of the recall-estimate strategy, this is only weak evidence. While a missing relation between recall ratios and estimates indicates that the estimates were definitely not generated with the recall-estimate strategy, the presence of a correlation does not, of course, have to indicate that the participants used this strategy. Alternatively, any unknown third variable could have influenced estimates and recall ratios in the corresponding experimental conditions in the same direction.

[18] Four of these eight combinations result from the lists with high-frequent and unknown, high-frequent and medium-frequent and medium-frequent and unknown names in Experiment 1 as well as from the lists with plants and animals in Experiment 2. Two further combinations consist of the lists with male and female names in the two interval conditions in Experiment 3. Moreover, the lists with German and foreign cities in Experiment 4 were analyzed separately for the two task order conditions due to the extremely different effect in the estimates.

A stronger point in favor of the hypothesis could be made if we could name reasons why the participants should have used the recall-estimate strategy more in some experimental conditions than in others. We suggest that persons base their estimates increasingly on the recall-estimate strategy whenever the recall performance is perceived as a valid and informative cue for frequencies. As a matter of principle, this strategy can only be used if persons succeed in recalling relevant exemplars. Furthermore, the information provided by the recall performance should appear more trustworthy the larger the sample of retrieved exemplars is (Sedlmeier & Gigerenzer, 1997; Sedlmeier, 2004). In accordance with this consideration, Brown (1997) demonstrated that the use of the recall-estimate strategy increases with the number of recalled exemplars. In a series of three experiments he presented his participants with word pairs consisting of a category label and an exemplar.
Exemplars appeared on the study lists only once whereas the frequency of the categories varied between 0 and 16. After the study phase, participants estimated the presentation frequency of categories and then recalled exemplars. Within and across experiments, encoding time, category-exemplar relatedness and study phase instructions were manipulated. This set of manipulations produced large differences in the number of retrieved exemplars. More importantly, across the altogether nine different experimental conditions the number of retrieved exemplars covaried nearly perfectly with an index for the use of the recall-estimate strategy.[19]

[19] This index was based on response times. Since the recall of every exemplar requires additional time, the response time should increase with the presentation frequencies as far as the recall-estimate strategy is used. On the other hand, there is no reason to assume that the response times should be influenced by the presentation frequencies if a memory assessment strategy is applied. Thus, the steepness of the function relating response times to presentation frequencies can be regarded as an index for the use of the recall-estimate strategy.

However, even if the hypothesis held that the biased frequency judgments in the present experiments were generated with the recall-estimate strategy, their occurrence could not be predicted by the number of recalled exemplars alone. As a matter of course, the estimates should also be influenced by the size of the recall bias. Even a large sample of recalled exemplars gives the participants little reason to deviate from an estimate based on memory assessment as long as the recall ratios provide a similar result. Given that the memory assessment strategy enables approximately correct responses, this means that no bias should occur if the number of recalled exemplars is small or the recall ratios are only weakly distorted. Conversely, a large bias in the estimates should only occur if both the number of recalled exemplars and the distortion in the recall ratios are large. In other words, the bias in the estimates should depend on the additional informational value of the recall performance.

In order to check this assumption, we re-analyzed the data of the different experimental conditions. As a measure of the additional informational value of the recall performance in these conditions we calculated the product of the mean number of retrieved exemplars and the recall bias. Table 2.11 displays these variables together with the informational value in the eight experimental conditions. In Figure 2.5 the bias in the estimates is plotted against the informational value. Both the recall bias and the bias in the estimates were measured as differences. In each experimental condition we determined the difference between the mean estimates for the same category on the respective study lists. The recall ratios were treated analogously.[20] Positive values indicate a bias in the expected direction.

[20] We did not use effect sizes here because these depend on the variances of the recall ratios (or the estimates) between participants. The variances of the recall ratios, however, cannot influence a participant’s strategy selection (and his estimates) since they are unknown to him.

Table 2.11
Mean recall bias, mean number of retrieved exemplars and informational value of recall performance for the different list compositions and conditions in each experiment.

Exp.  List composition (condition)                                     Recall bias (%)  Retrieved exemplars  Informational value  N
1     high-frequent / unknown names                                    69.6              9.1                 6.3                  32
1     high-frequent / medium-frequent names                            24.4             11.5                 2.8                  32
1     medium-frequent / unknown names                                  71.8              7.5                 5.4                  32
2     animals / plants                                                  6.9             12.5                 0.9                  35
3     names with varying non-target list frequencies (interval: 3 h)   30.9              6.2                 1.9                  41
3     names with varying non-target list frequencies (interval: 24 h)  31.8              5.3                 1.7                  40
4     German / foreign cities (task order: recall test before)         40.6             11.8                 4.8                  41
4     German / foreign cities (task order: recall test after)          45.1             12.0                 5.4                  40

Note. Interval = interval between non-target list and target list.

As illustrated in Figure 2.5, there is a strong tendency for the bias in the estimates to increase with the product of the recall bias and the number of retrieved exemplars. Thus, the informational value of the recall performance is indeed predictive of the estimation bias. Particularly instructive is a comparison of the results of Experiments 1 and 3, in which names were used as stimuli. In Experiment 3, in which the frequency of unknown names was manipulated by means of a non-target list, the recall bias was slightly larger than for the lists with pre-experimentally high-frequent and medium-frequent names in Experiment 1. Nevertheless, we observed a clearly smaller estimation bias in Experiment 3. The present analysis suggests that this can be ascribed to the fact that markedly fewer exemplars were recalled in Experiment 3.

[Figure 2.5: scatterplot. x-axis: Informational Value of Recall Performance (0 to 7); y-axis: Estimation Bias (%) (-5 to 20). Data points: Exp. 1: hf/uk, Exp. 1: mf/uk, Exp. 1: hf/mf, Exp. 2, Exp. 3: 3h, Exp. 3: 24h, Exp. 4: Rec. before, Exp. 4: Rec. after.]
Figure 2.5: Estimation bias plotted against informational value of the recall performance for the different list compositions and conditions in each experiment. See text for details. List compositions and conditions can be taken from Tab. 2.11.
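The informational value in Table 2.11 can be recomputed from the other two columns. The sketch below assumes that the recall bias enters the product as a proportion (the percentage divided by 100); this reading is not stated explicitly in the text, but it reproduces the tabulated values to within rounding. The condition labels follow the abbreviations of Figure 2.5, and all function and variable names are illustrative.

```python
# (condition, recall bias in %, mean number of retrieved exemplars,
#  informational value as reported in Table 2.11)
TABLE_2_11 = [
    ("Exp. 1: hf/uk", 69.6, 9.1, 6.3),
    ("Exp. 1: hf/mf", 24.4, 11.5, 2.8),
    ("Exp. 1: mf/uk", 71.8, 7.5, 5.4),
    ("Exp. 2", 6.9, 12.5, 0.9),
    ("Exp. 3: 3h", 30.9, 6.2, 1.9),
    ("Exp. 3: 24h", 31.8, 5.3, 1.7),
    ("Exp. 4: Rec. before", 40.6, 11.8, 4.8),
    ("Exp. 4: Rec. after", 45.1, 12.0, 5.4),
]


def informational_value(recall_bias_percent, n_retrieved):
    """Product of recall bias (taken as a proportion, an assumption) and sample size."""
    return recall_bias_percent / 100 * n_retrieved


for label, bias, n_retrieved, reported in TABLE_2_11:
    computed = informational_value(bias, n_retrieved)
    print(f"{label:20s} computed {computed:4.1f}   reported {reported:4.1f}")
```

Because the product is large only when both factors are large, a condition such as Experiment 2 (many retrieved exemplars, small recall bias) ends up with a low value, while the lists with high-frequent and unknown names in Experiment 1 obtain the highest one.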
Consequently, the information from the recall performance possibly appeared less reliable and was used less for the generation of the estimates.

As could be expected, within Experiment 1 fewer exemplars were recalled on the lists that contained unknown names than on the list with high-frequent and medium-frequent names. At the same time, however, the recall bias was disproportionately larger on the lists with unknown names. Thus, the recall performance attained the highest informational value on the list with high-frequent and unknown names. In fact, the bias in the estimates also increased from the list with high-frequent and medium-frequent names up to the list with high-frequent and unknown names. This suggests that what is special about names with pre-experimentally varying frequencies (and varying fame) might be that they lead to a large sample of recalled exemplars and a strong recall bias at the same time.

Within our studies, only in Experiment 4 did the recall performance attain an informational value similar to that in Experiment 1. In this experiment city names with pre-experimentally varying frequencies were used as stimulus material. The results from the condition in which the recall test was completed before the frequency judgment tasks fit well into the general picture. However, the results from the condition in which the frequency judgments were given first represent an outlier in Figure 2.5. Despite a high informational value of the recall performance, no bias occurred in the estimates here. Remember that Experiment 4 was one of the two experiments in which the participants completed a categorization task during the presentation of the study lists. As we already emphasized before, there is evidence that a categorization task impedes the use of a recall-estimate strategy (Bruce, Hockley, & Craik, 1991). This finding also makes sense from a psychological point of view.
Generally, this strategy can only exert an influence on the estimates if the participants assume a priori that the recall performance could provide necessary and meaningful information about the frequencies and thus at least try to retrieve exemplars. Exactly this may, however, be prevented by the categorization task. This task provides the participants with a second possible basis for their estimates which is not available without this task: the frequency with which they allocated the category labels during the presentation of the study list. This second basis may be sufficient to generate subjectively credible estimates. An additional attempt to retrieve exemplars would thus be superfluous (Bruce, Hockley, & Craik, 1991). On the other hand, if the participants learned “by force” from a preceding recall test that they could retrieve a large sample of exemplars in which the categories occurred with very different frequencies, this information can apparently not be ignored. It should be noted that the estimates in Experiment 2, in which a categorization task was likewise completed, were unbiased even when they were given after the recall test. However, this corresponds to our assumption that the occurrence of a bias should depend on the informational value of the recall performance. This value was particularly low here.

The preceding considerations suggest that the experimental conditions combining a categorization task and no preceding recall test should be excluded from our analysis of the relation between the informational value of the recall performance and the estimation bias. After this exclusion, the correlation between these variables across the remaining seven experimental conditions amounts to r_alerting = .94 (Rosenthal et al., 2000).

Let us summarize: Neither the memory models nor the recall-estimate strategy could account for the results of the present study.
We suggest that the following set of assumptions allows a satisfactory explanation of the result pattern within the framework of a multiple strategy perspective: Our participants generally had access to a memory response that indicated the relative frequencies of the categories in a more or less unbiased manner. An estimation bias occurred whenever the participants nevertheless used the recall-estimate strategy. Since the estimates were consistently closer to the actual frequencies than the recall ratios, we have to assume that the participants never relied exclusively on the recall-estimate strategy. Thus, the memory response was always used at least to correct the information from the recall ratios. The extent to which the recall-estimate strategy exerted an influence on the estimates depended on the perceived additional informational value of the recall performance. This informational value results from the number of recalled exemplars and the size of the recall bias.[21] Finally, we assume that an overt categorization of stimuli prevented any attempt of the participants to gain information from the retrieval of exemplars. That is, without a preceding recall test, the recall-estimate strategy was not used in this case.

[21] It should be pointed out once more that a large unbiased sample of recalled exemplars of course also represents a trustworthy source of information. However, the recall ratios would in this case only confirm the memory responses and could thus exert no modifying influence on the estimates.

Conclusion

MINERVA-DM is fairly explicitly formulated with the claim to be able to explain all kinds of frequency judgments on serially encoded events (Dougherty, Gettys, & Ogden, 1999). Sedlmeier (1999, 2002) acknowledges, on the other hand, that mechanisms other than memory assessment can exert an influence on frequency judgments. However, he apparently only expects deviations from the predictions of the PASS model if such mechanisms are triggered by additional information at the moment the judgments are given (a preceding recall test may be an example of a situation in which such information is given). In fact, this and other memory models have been highly successful in predicting various phenomena mainly observed in frequency of occurrence judgments. Thus, besides accurate frequency judgments, these models can offer explanations for regressed judgments (Fiedler, 1996; Sedlmeier, 1999), recency effects (Lopez et al., 1998; Sedlmeier, 1999), and list-length and spacing effects (Hintzman, 1988). The influence of varying attention (Sedlmeier & Renkewitz, 2004) and varying depth of encoding (Dougherty et al., 1999) on estimates could be simulated with these models as well. In chapter 3 we will furthermore demonstrate that PASS correctly predicts effects of associations between jointly presented stimuli on frequency judgments. On the other hand, evidence for the use of a recall-estimate strategy in frequency of occurrence judgments is generally scarce if not absent (Brown, 2002; Marx, 1985).

In spite of these remarkable successes of the memory models, the results of the present research suggest, in accordance with the multiple strategy perspective (Brown, 2002), that the situation is different for category size judgments. The famous-names effect could not be attributed to different pre-experimental frequencies of the stimuli. As a result, the memory models could offer no convincing explanation for this effect and the result pattern in our experiments. By contrast, the result pattern becomes understandable if one assumes that the recall-estimate strategy plays a part in the generation of category size judgments at least under particular circumstances.
If one accepts that people use several strategies for the generation of frequency judgments, the question arises which variables determine the selection of a strategy. The present research indicates that properties of the stimulus material and all other variables influencing the number of retrievable exemplars are plausible candidates. Furthermore, there may be factors that generally prevent the use of a recall-estimate strategy since they suggest a priori that the retrieval of exemplars is not necessary to obtain subjectively acceptable estimates. The overt categorization of the stimulus material could be such a factor. These considerations are in accordance with factors proposed and identified within the multiple strategy perspective (see Brown, 2002, for an overview). The challenge for future research on category size judgments will be to integrate these factors into a comprehensive model of strategy selection.

CHAPTER 3
Effects of Associations on Frequency Judgments: Simulations with PASS and Empirical Results

Summary

PASS (Sedlmeier, 1999), a computational memory model of frequency judgment, considers sensitivity to the frequencies of events as a by-product of associative learning. In this chapter predictions of the model about the influence of associations between jointly presented items on frequency estimates are derived and tested. In two experiments sets of words that were either strongly or weakly associated with each other were presented. Within these sets the presentation frequency of the words was varied. Predictions of the PASS model for the frequency estimates on this stimulus material were obtained by means of simulations. In a first step PASS encoded a very large text corpus and acquired associations between the stimulus words. In a second step the model encoded the experimental study lists. PASS predicted higher frequency judgments for associated than for non-associated words.
The size of this effect was predicted to be independent of the actual presentation frequencies of the words. Finally, the model predicted that the frequencies of associated words are discriminated slightly worse than the frequencies of non-associated words. These predictions were largely confirmed in both experiments if the participants did not complete a recall test before the frequency judgment task. The results thus strengthen the assumption that the encoding and representation of frequency information are based on associative learning. However, the results also point to conditions in which the ability of PASS to predict frequency judgments is nonetheless restricted. Implications for PASS and other memory models of frequency judgment are discussed.

Introduction

People show considerable sensitivity to the frequency of repeatedly occurring events. In the last four decades, a vast number of studies have accumulated that compared participants’ judgments about the frequencies of serially encoded events to the objective frequencies of these events. The central finding of this research is that people arrive at quite accurate judgments under a wide variety of conditions. This holds for frequencies of events from everyday life - such as letters, pairs of letters, syllables, surnames or professions (Attneave, 1953; Shapiro, 1969; Tryk, 1968; Zechmeister et al., 1975) - as well as for experimentally varied frequencies of repeated items in laboratory settings (e.g. Greene, 1986; Jonides & Naveh-Benjamin, 1987; Hockley & Christi, 1996).

What forms the basis of this pervasive ability to judge the frequency of occurrence of items? The PASS (Probability Associator) model (Sedlmeier, 1999; 2002), which aims at explaining judgments of frequencies and probabilities, suggests that sensitivity to frequencies is a by-product of associative learning.
Accordingly, the basic assumption of PASS is that frequency knowledge is represented in associations between objects or the features that constitute those objects. These associations are assumed to be acquired in the course of all encodings of objects or events that are performed with a minimum of attention. Several simulations with PASS demonstrate that the model can account for the sensitivity to frequencies observed in numerous empirical studies (Sedlmeier, 1999).

However, any valid model of frequency judgments has to account not only for the usual finding of estimates that roughly conform to actual frequencies but also for a number of experimental results demonstrating that estimates can be systematically influenced by variables other than objective frequencies. Several factors operating at the stage of encoding, representing or retrieving frequency information are known to have a reliable impact on frequency judgments. One of these factors is associative relatedness between items. Leicht (1968) was the first to demonstrate that a target item obtains greater frequency judgments the more items associated with it have been presented in the same list. In his study list, target items were presented zero, one, three or five times. Additionally, the list contained zero, one, three or five (different or repeated) associates of each target item. Some of the associates were words eliciting the target item as a response in a word-association task (e.g. “eating” elicits the target item “food”). Further pairs of target items and associates consisted of category names and corresponding instances (e.g. “tree” and “oak”) or antonyms (e.g. “buy” and “sell”). Leicht found that frequency judgments about target items increased with the number of presented associates. This held for all presentation frequencies of the target items, although the effect was rather small.
Averaged over presentation frequencies, target items with zero associates on the study list received a mean frequency judgment of 2.74 while target items with five associates received a mean judgment of 2.92. Two other studies replicated this effect of associative relatedness between items on frequency estimates. Shaughnessy and Underwood (1973) used instances of the same category as targets and associates in two experiments. In both experiments targets that were presented along with several associates obtained higher estimates than targets that had no conceptually related items on the study list. In a study by Vereb and Voss (1974) frequency judgments again increased with the number of presented words that were associates of the items in question.

The main objective of the present article is to test whether PASS can account for the effects of associations between jointly presented items on frequency estimates. For this purpose we report two new experiments and compare their results to predictions derived from simulations with PASS. The experiments differ in one important aspect from the studies reported above. All studies cited have in common that they exclusively analyze whether associations exert an influence on the absolute magnitude of frequency judgments. However, frequency judgments may generally differ not only in their absolute magnitude but also with regard to the validity with which different frequencies are discriminated (Flexser & Bower, 1975; Naveh-Benjamin & Jonides, 1986). Thus, a set of frequency judgments on associated stimuli presented with various frequencies can turn out higher or lower than a set of judgments on non-associated stimuli with the same presentation frequencies. Irrespective of this, these judgments could furthermore reflect a better or worse ability to discriminate the actual frequencies of associated or non-associated stimuli.
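The distinction between the magnitude of frequency judgments and the discrimination of frequencies can be illustrated with a short sketch. The judgments below are invented purely for illustration and are not data from any experiment. The least-squares slope of judged on presented frequency serves as one possible discrimination index; this choice is an assumption of the sketch, not the measure defined in the text.

```python
def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def slope(presented, judged):
    """Least-squares slope of judged frequency on presented frequency."""
    mean_x, mean_y = mean(presented), mean(judged)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(presented, judged))
    s_xx = sum((x - mean_x) ** 2 for x in presented)
    return s_xy / s_xx


# Hypothetical presentation frequencies and mean judgments (illustration only).
presented = [1, 2, 4, 8]
judged_associated = [2.5, 3.0, 4.5, 7.0]
judged_non_associated = [1.5, 2.0, 4.0, 8.0]

for label, judged in [("associated", judged_associated),
                      ("non-associated", judged_non_associated)]:
    print(f"{label:15s} magnitude (mean judgment) = {mean(judged):.2f}, "
          f"discrimination (slope) = {slope(presented, judged):.2f}")
```

In this made-up example the associated set receives higher judgments on average but a flatter slope, showing that the two characteristics can vary independently and therefore have to be assessed separately.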
An assessment of whether associations also affect frequency judgments in this respect requires that the presentation frequency within a set of associated stimuli is varied systematically. A set of non-associated stimuli with the same presentation frequencies has to serve as a control. These requirements were not fulfilled in the former studies. The study lists there always contained a number of target words, each of which was accompanied by several associates. The (associative) relations between different sets of target words and their associates remained undetermined or were minimized. Furthermore, within a set consisting of a target word and its associates the frequency of the stimuli was not varied systematically. In the following experiments the associative strength between all items on a study list was controlled and manipulated. Furthermore, the presentation frequency of items was varied within sets of high and low associative strength. As a result, measures of the discriminability of frequencies of associated and non-associated stimuli could be collected in addition to the magnitude of the frequency judgments. This allows us to extend the comparison between the predictions of the PASS-model and the experimental results to several characteristics of frequency judgments. Before we take a closer look at the model, the simulations and the predictions derived from these simulations, we have to clarify what kinds of frequency judgments lie within the explanatory scope of PASS.

What kind of frequency judgments can be modeled with PASS?

The first restriction of PASS is that it exclusively deals with frequency or probability judgments pertaining to directly experienced and serially encoded events. Recent research has accumulated evidence that people use various strategies to generate such judgments (e.g. Manis et al., 1993; Brown, 1997).
Brown (1995) was the first to suggest a taxonomy of these strategies (for a more detailed classification of strategies see Brown, 2002). Besides the direct retrieval of previously stored facts about frequencies, he distinguishes between enumeration and memory assessment. If enumeration is used, single relevant instances are retrieved from memory and counted. The result of the counting is either given directly as the frequency judgment, or the frequency judgment is extrapolated from the result. In memory assessment, intuitions about the relative frequencies of events serve as the basis for judgments. These intuitions are in turn based on the evaluation of some aspect of memory performance. Therefore, in contrast to enumeration, memory assessment initially provides a non-numerical impression of event frequencies, which has to be translated into a numerical response in a further step. The second restriction of PASS is that it exclusively aims at modeling the memory assessment process. There is broad agreement that every encoding of objects and events that occurs with a minimum of attention affects the memory representation in a way that can be drawn upon to generate judgments on the frequency of these objects and events (Reichardt et al., 1973; Dougherty & Franco-Watkins, 2002; Hasher & Zacks, 1984; Zacks & Hasher, 2002; Sedlmeier, 2002). The PASS model holds specific assumptions on how the representation of an event changes with every additional encoding of this event and how this change can be used to give frequency judgments. As already mentioned, the main objective of this article is to test whether these assumptions are appropriate to explain possible effects of associations on frequency judgments. Such a test is obviously only reasonable if it can first be demonstrated that these effects occur when memory assessment is used.
For this reason, in addition to the frequency judgments, the times participants required to decide on a judgment were also collected in our experiments. Decision times provide a process-sensitive measure that allows differentiating between judgments based on enumeration and judgments based on memory assessment (Brown, 2002). [Footnote 22: The possibility that participants use a direct retrieval strategy to generate their judgments can be neglected. The use of a direct retrieval strategy requires that facts about the relevant frequencies were memorized in advance. In an experimental context such factual knowledge could only be acquired if repetitions of items were counted explicitly during the study phase. However, experimental event frequencies are only tallied by participants who were previously instructed to do so (Brown, 2002).] If enumeration is used, the required decision times increase sharply with presentation frequencies (Brown, 1995; 1997). The reason for this is that enumeration is a serial process. The judgments are here based on the number of retrieved instances, and the retrieval of additional instances requires additional time (Bousfield & Sedgewick, 1944; Indow & Togano, 1970). In contrast to enumeration, memory assessment is described as a non-serial process (Hintzman, 1988; Dougherty, Gettys, & Ogden, 1999; Sedlmeier, 1999). Thus, there is no reason to suppose that the speed of this process is affected by presentation frequencies (Brown, 1995). Accordingly, in judgments based on memory assessment only a notably smaller increase of decision times can be found with increasing presentation frequency. Numerous studies, however, document that even in such judgments the function relating decision times to frequencies is not entirely flat (Brown, 1995; 1997; Dougherty & Franco-Watkins, 2002; Schimmack, 2002).
In an extensive series of studies in which list frequencies of words had to be estimated, Brown found slight differences in the decision times between various presentation frequencies for judgments based on memory assessment. This can possibly be ascribed to the fact that transforming intuitions of event frequencies into numerical responses is more difficult with larger numbers than with smaller ones (Brown, Buchanan & Cabeza, 2000; Moyer & Dumais, 1978). In Brown’s studies (1995; 1997) the increase within the frequency range of 2 to 8, which is also considered in our studies reported below, consistently amounted to about 0.7 seconds. In judgments based on enumeration, on the other hand, an increase of approximately 4 seconds was found in the same frequency range. There is good reason to assume that our participants will use memory assessment to generate their judgments and thus produce the relatively flat decision time function tied to this estimation strategy. Enumeration requires that relevant single instances can be retrieved. The more distinct the instances of an event are, the easier they are to retrieve (Brown, 2002). Accordingly, studies on judgments about behavioral frequencies found that the use of enumeration becomes more common when event instances are judged to be distinctive (Menon, 1993; Conrad et al., 1998). In experimental settings, evidence for the use of this strategy stems almost exclusively from studies in which participants had to judge category sizes (e.g. Tversky & Kahneman, 1973; Lewandowsky & Smith, 1983; Barsalou & Ross, 1986; Begg et al., 1986; Williams & Durso, 1986; Greene, 1989; Brown, 1995; 1997). In such studies the requested frequency estimate results from the number of different exemplars of the category in question that were presented (e.g. “How many different kinds of trees appeared on the list you just studied?”).
In contrast to this, participants in our experiments had to judge the frequency of occurrence of words on the study list. Here the frequency judgment refers to the number of repetitions of identical events. The retrieval of single instances should thus be impeded. Accordingly, previous studies in which participants had to judge the frequency of occurrence of items consistently found that enumeration was rare (Marx, 1985) or even absent (Brown, 1995; see also Brown, 1997; Manis et al., 1993).

The PASS-Model

What exactly does the memory assessment process look like? How does the representation of an event change with every additional encoding of this event? And how can this change be used to give appropriate judgments on the frequency of occurrence of the event? The PASS-model (Sedlmeier, 1999; 2002) assumes that knowledge of frequencies is represented in learned associations. Associative learning is based on the frequency with which events co-occur. The greater the frequency of co-occurrence, the stronger the association between two events (Slamecka, 1985). Therefore, it seems straightforward to use associative strengths, the result of this learning process, as a basis for frequency judgments. Within PASS, this learning process is modeled by FEN (Frequency Encoding Network), a neural network that encodes events and acquires associations while doing so. The strength of these acquired associations determines the extent of activation in the neural network triggered by an event after its repeated presentation. The PASS-model postulates that it is this activation that provides an intuition of relative frequencies of events in the memory assessment process. There are a number of associative models aimed at explaining how contingency relations between events are acquired (e.g., Shanks, 1995; Wasserman & Miller, 1997). Contingency relations between events allow the prediction of one or several events from the presence of another set of events.
Thus, for example, a set of symptoms may enable the diagnosis of a certain disease. The corresponding associative models mainly use the occurrence or non-occurrence of the predicted event as corrective feedback for the modification of associations between the events involved. However, contingency relations between events are in most cases only of secondary importance for judgments of relative and absolute frequencies. If you are asked, for instance, about the relative frequency of red cars, a possible co-occurrence of other objects is irrelevant. If, as in typical experiments on frequency estimates, a judgment is requested about the absolute frequency with which a word was included in a study list, there is no contingency to other events that could help to answer the question. In other words, frequency judgments often merely refer to the repeated occurrence of a single event. What kinds of associations are learned in that case? In PASS it is assumed that events and objects are represented by their features or values on attributes. A given collection of features and the associations between these features determine which object or event is experienced. Learning consists of changing the strength of associations between features whenever an event or object is encoded. Associations between co-occurring objects can be acquired by treating the objects in question as a single event. Accordingly, the associative strengths between the features constituting the objects are modified together.

FEN – Architecture and Learning Rule

What exactly does the neural network look like? There is no single model behind FEN but a family of models. Sedlmeier and Köhlers (unpublished data) used a number of different neural networks with varied architectures and associative learning rules in their simulations of frequency judgments.
In these simulations, feed-forward models using competitive learning (Grossberg, 1976; Rumelhart & Zipser, 1985), feed-forward and recurrent architectures working with Hebbian learning, and several other models led to very similar results. The principal common feature of all models in the FEN family is that they operate without corrective feedback. Below, only the FEN variant used for the following simulations (Sedlmeier, 1999; 2002) is explained in more detail. Figure 3.1 shows the architecture of this variant. Here FEN consists of only one layer of units. Every unit is connected with every other unit and with itself. A particular unit is activated each time the corresponding feature is present in a stimulus. Therefore, a given input for FEN consists of a vector of 1s and 0s, with 1s coding the presence and 0s coding the absence of features in a stimulus. Learning takes place by changing the associative strengths or weights that connect two units. The learning rule controlling this modification of weights is a variant of the Hebbian learning rule and was adapted from stimulus sampling theory (Bower, 1994; Estes, 1950). It is:

\[
\Delta w_{ij} =
\begin{cases}
\theta_1 (1 - w_{ij}), & \text{if both units } i \text{ and } j \text{ are active} \\
-\theta_2 \, w_{ij}, & \text{if only one of the units } i \text{ and } j \text{ is active} \\
-\theta_3 \, w_{ij}, & \text{if neither unit } i \text{ nor unit } j \text{ is active}
\end{cases}
\tag{1}
\]

Here wij corresponds to the associative strength between the units i and j, ∆wij is the change in the associative strength after a learning trial, and θ1, θ2 and θ3 represent the learning, interference and decay rates, respectively. The weights can vary between 0 and 1 according to this rule. If two units are active in an input vector, the weight between them is increased, according to the learning parameter θ1, by a constant proportion of the difference between the current weight and the maximum associative strength. Interference between two units occurs if one unit is active and the other is not.
In this case, the weight between the units in question is decreased by a proportion of the current associative strength specified by θ2. The weight between two units decays if both units are inactive. The strength of the decay is given by the parameter θ3.

[Figure 3.1. The architecture of a variant of the FEN module: four units I1 to I4 with the weights w12, w14 and w23 labeled, receiving the input pattern S1 = (1, 0, 0, 1).]

The situation represented in Figure 3.1 may serve as an example. Here, units I1 and I4 are activated by the input pattern S1, whereas units I2 and I3 are not. Let us assume that the learning, interference and decay parameters are θ1 = 0.2, θ2 = 0.05 and θ3 = 0.02. If the weight w14 that connects node I1 with node I4 is 0.3 prior to the encoding of the input pattern S1, the increase of the weight is ∆w14 = 0.2 × (1 – 0.3) = 0.14. As a result, the updated weight amounts to w14(new) = w14(old) + ∆w14 = 0.3 + 0.14 = 0.44 after the encoding. The effect of interference can be illustrated by weight w12. If this weight has a starting value of 0.5, it will be w12(new) = 0.5 – 0.05 × 0.5 = 0.475 after the learning trial. Finally, with the given input pattern the weight w23 will decay. If w23 = 0.1, then w23(new) = 0.1 – 0.02 × 0.1 = 0.098.

The Generation of Frequency Judgments

After any number of stimulus presentations and the corresponding successive weight modifications in FEN, PASS can render frequency judgments. For this, the activation elicited in all units of the neural network by the stimulus whose frequency has to be estimated is determined first. The activation of a particular unit results from the scalar product of the input vector and the vector of weights that connect the unit with itself and the remaining units in the network. Let us assume that in Figure 3.1 the weights that connect all nodes of the network with node I1 amount to w11 = 0.6, w21 = 0.475, w31 = 0.35 and w41 = 0.44 at the time of probing.
Then the present input pattern S1 triggers an activation at node I1 of a1 = 1 × 0.6 + 0 × 0.475 + 0 × 0.35 + 1 × 0.44 = 1.04. The activation is calculated correspondingly for all units of the network. The sum of these activations is FEN’s response to a stimulus. In the PASS-model this response is regarded as the basis of intuitions about frequencies of occurrence. FEN’s response cannot yet be used as an estimate, as it is not scaled in the same way as frequencies are. For that reason, the PASS-model contains, besides FEN, a CA-module (cognitive algorithms module) which interprets the output of FEN with the help of several rules and turns it into frequency judgments (for a more detailed description of the CA-module, see Sedlmeier, 2002). The rule used to generate judgments about relative frequencies is called the relative-frequency algorithm. This algorithm takes advantage of the fact that all information required for a relative frequency judgment can be obtained from FEN. Let us assume PASS were to give a judgment on the proportion of red cars among all cars encoded during a certain period of time. According to the relative-frequency algorithm, PASS would first determine FEN’s response to the input pattern representing red cars. In a second step, the responses to cars of all colors that occurred would be determined and added. Finally, the response to red cars would be divided by the total sum determined in the second step. This division results in a quotient that would be used by PASS as the estimate for the requested relative frequency. PASS cannot give absolute frequency judgments by relying solely on knowledge represented in FEN. The response triggered by a stimulus in FEN does not allow any immediate conclusions about its absolute presentation frequency.
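The learning rule (1), the computation of unit activations and the relative-frequency algorithm described above can be illustrated with a minimal Python sketch. It is not the simulation code used for PASS: the function names, the representation of weights as a nested list, and the two patterns and the presentation sequence in the final mini-simulation are assumptions made purely for illustration. Only the parameter values and the numbers of the worked examples are taken from the text.

```python
def fen_update(weights, pattern, theta1=0.2, theta2=0.05, theta3=0.02):
    """Return a new weight matrix after one learning trial of rule (1)."""
    n = len(pattern)
    updated = [row[:] for row in weights]
    for i in range(n):
        for j in range(n):
            w = weights[i][j]
            n_active = pattern[i] + pattern[j]
            if n_active == 2:      # both units active: learning
                updated[i][j] = w + theta1 * (1 - w)
            elif n_active == 1:    # exactly one unit active: interference
                updated[i][j] = w - theta2 * w
            else:                  # neither unit active: decay
                updated[i][j] = w - theta3 * w
    return updated


def unit_activation(weights, pattern, j):
    """Scalar product of the input vector and the weights leading into unit j."""
    return sum(x * weights[i][j] for i, x in enumerate(pattern))


def fen_response(weights, pattern):
    """FEN's response: the sum of the activations of all units."""
    return sum(unit_activation(weights, pattern, j) for j in range(len(pattern)))


def relative_frequency(weights, target, patterns):
    """Response to the target divided by the summed responses to all patterns."""
    total = sum(fen_response(weights, p) for p in patterns)
    return fen_response(weights, target) / total


# Worked example from the text: S1 activates units I1 and I4.
s1 = [1, 0, 0, 1]
w = [[0.0] * 4 for _ in range(4)]
w[0][3] = w[3][0] = 0.3   # w14
w[0][1] = w[1][0] = 0.5   # w12
w[1][2] = w[2][1] = 0.1   # w23
w_new = fen_update(w, s1)
print(round(w_new[0][3], 3), round(w_new[0][1], 3), round(w_new[1][2], 3))
# 0.44 0.475 0.098

# Activation of unit I1 for the weights w11, w21, w31, w41 given in the text.
w_probe = [[0.0] * 4 for _ in range(4)]
for i, value in enumerate([0.6, 0.475, 0.35, 0.44]):
    w_probe[i][0] = value
a1 = unit_activation(w_probe, s1, 0)
print(round(a1, 2))   # 1.04

# Hypothetical mini-simulation (patterns and sequence are assumptions):
# stimulus A is presented eight times, stimulus B twice.
a, b = [1, 1, 0, 0], [0, 0, 1, 1]
net = [[0.0] * 4 for _ in range(4)]
for stimulus in [a, a, a, a, b, a, a, a, a, b]:
    net = fen_update(net, stimulus)
rf_a = relative_frequency(net, a, [a, b])
rf_b = relative_frequency(net, b, [a, b])
print(rf_a > rf_b)    # True: the more frequent stimulus receives the higher judgment
```

The nested loop updates every pair of units, including the connection of a unit with itself, which mirrors the fully connected single-layer architecture. The two relative judgments in the mini-simulation sum to 1 by construction, because both are divided by the same total.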
Because FEN’s response does not reveal absolute frequencies, the transformation of this response, or of the corresponding relative frequency judgment, into an absolute judgment can only take place on the basis of information that is supplied to PASS from outside. However, it is assumed that absolute frequency judgments on different stimuli are scaled in the same manner by PASS as long as the stimuli have been presented in the same context (e.g. an experimental one) and the judgments are given under identical conditions. As a result, a single assumed correspondence between one relative frequency and one absolute frequency is sufficient for PASS to generate absolute judgments on a given set of stimuli. If, for example, an assumption of or knowledge about the total number of stimulus presentations is available to PASS, absolute judgments can be determined by multiplying this total number by the relative frequency judgment on the stimulus in question (Sedlmeier, 2002). If, for example, the total number is 40 and the relative frequency judgment of PASS is 0.3, this yields an absolute judgment of 40 × 0.3 = 12. Alternatively and with the same result, absolute judgments for a set of stimuli can also be generated directly from responses. For instance, if it is known to PASS that a stimulus triggering a response of 0.8 was presented 12 times, absolute frequency judgments for all remaining stimuli can be obtained by multiplying their responses by the factor 12 ÷ 0.8 = 15. In experimental settings, the value of the one absolute frequency required by PASS is often given directly or indirectly. Thus, in many cases participants are informed about the maximum presentation frequency of a single stimulus, or they are presented with a response scale (e.g. Johnson et al., 1977; Hockley & Cristi, 1996; Naveh-Benjamin & Jonides, 1986). There is evidence that people's judgments on absolute frequencies based on memory assessment are likewise generated in a two-stage process.
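The two routes to an absolute judgment described in the preceding paragraph amount to simple rescaling. The following Python sketch restates the arithmetic of the two examples; the function names are hypothetical and are not part of the CA-module as documented by Sedlmeier (2002).

```python
def absolute_from_total(relative_judgment, total_presentations):
    """Scale a relative frequency judgment by a known total number of presentations."""
    return relative_judgment * total_presentations


def scaling_factor(anchor_response, anchor_frequency):
    """Factor that converts FEN responses into absolute judgments, given one
    stimulus whose response and presentation frequency are both known."""
    return anchor_frequency / anchor_response


judgment = absolute_from_total(0.3, 40)   # about 12, as in the first example
factor = scaling_factor(0.8, 12)          # about 15, as in the second example
print(round(judgment, 6), round(factor, 6))
```

Both routes need exactly one externally supplied correspondence, either the total number of presentations or one known stimulus frequency. Everything else is taken from FEN.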
Memory assessment initially yields a non-numerical intuition of relative event frequencies (Brown, 1995; 1997; Hintzman, 1988; Dougherty et al., 1999). Therefore, the non-numerical intuition has to be transformed into a numerical answer in a second step. This transformation is assumed to be independent of the assessment process (Brown & Siegler, 1993). Evidence for this assumption mainly comes from studies that provided different response scales or that manipulated information about the maximum presentation frequency of a stimulus. These manipulations can exert a strong influence on the absolute magnitude of frequency judgments without impairing the discrimination of different frequencies (Brown, 1995; Schimmack, 2002).

Former Simulations with PASS

PASS can simulate a number of findings on frequency judgments based on memory assessment. As mentioned earlier, judgments of the model show that PASS is sensitive to frequencies just as experimental participants are. However, empirical tests on relative frequency judgments have often found that this sensitivity is subject to a restriction: small relative frequencies tend to be overestimated and large ones underestimated (e.g. Erlick, 1964; Sedlmeier et al., 1998; Varey et al., 1990). PASS also produces this “regression effect” (Edgington, 1965). Within the framework of the model, the extent of the regression depends on two factors. First, the regression is the more pronounced the more the encoded stimuli overlap, i.e. the more features they share. Second, the regression increases the further the interference parameter θ2 lies below the learning parameter θ1 (Sedlmeier, 1999). A further finding successfully simulated by PASS involves the chronological sequence of stimulus presentations. If the occurrence rate of a stimulus changes in the course of a presentation sequence, the occurrence rate at the end of the sequence exerts an increased influence on the corresponding frequency judgments.
Thus, the total relative frequency of a stimulus that occurs at an increased rate towards the end of a presentation sequence is overestimated, and the total frequency of a stimulus that occurs more rarely towards the end is underestimated. PASS’ frequency judgments show this recency effect (Sedlmeier, 1999). Many studies demonstrate that frequency judgments based on memory assessment are influenced by factors affecting the encoding of stimuli. Thus, frequency judgments are influenced by the salience (Sedlmeier & Renkewitz, 2004) as well as by the processing depth of stimuli (Fisk & Schneider, 1984; Greene, 1984; Jonides & Naveh-Benjamin, 1987; Maki & Ostby, 1987; Naveh-Benjamin & Jonides, 1986; Rose & Rowe, 1976; Rowe, 1974). In addition, a generation effect appears in frequency judgments, i.e. judgments on self-generated stimuli differ from judgments on stimuli that have merely been copied (Greene, 1988). The typical pattern of results in all these cases is that stimuli that are encoded better receive higher judgments than stimuli encoded less well. Moreover, some studies show that better encoding also implies better discrimination between frequencies (e.g. Greene, 1984; 1988; Naveh-Benjamin & Jonides, 1986; Rowe, 1974). These findings can be simulated by PASS, too. Improved encoding is modeled by an increase of the learning parameter θ1. This increase has the effect that PASS renders higher frequency judgments and discriminates better between the relevant stimuli (Sedlmeier & Renkewitz, 2004).

Overview of the simulations and experiments

The effect of associations between jointly presented items on frequency judgments has so far not been examined with PASS. Therefore, the simulations reported below first serve to clarify which effects of associations are predicted by the model.
Apart from the influence of associations on the magnitude of judgments, the simulations are also used to examine whether PASS predicts differences in the discriminability of frequencies of associated and non-associated stimuli. Afterwards, the predictions of the model are compared with data from two experiments. In both experiments the participants were presented with sets of associated and non-associated words. The presentation frequencies for associated and non-associated items were 0, 2, 5 and 8. Besides frequency judgments on this stimulus material, decision times were collected in both experiments. These data served to identify the strategy used for the generation of frequency judgments. We expected that the participants would rely on a memory assessment strategy. As a consequence, the relatively slight rise in decision times with increasing presentation frequency that is typical of this strategy should be found. The simulations from which the predictions of the PASS model were obtained used the same stimulus material as the experiments. That is, PASS encoded the same study lists with the same words and presentation frequencies as the participants. For PASS to be able to make predictions about the effect of the associative relations in this stimulus material, it has to "know of" these associative relations. Similar to a participant who has already acquired associations between the experimentally presented items pre-experimentally, PASS requires prior knowledge about the associations between words. This knowledge should not be conveyed to the model from outside but should be learned "autonomously" by PASS. Thus, the simulation of frequency judgments on associated stimulus material required a two-step procedure. In a first, pre-experimental step, PASS encoded a large text corpus. From this corpus PASS acquired associative pre-knowledge.
(The detailed process of the acquisition of associations between words from this corpus is described below.) In a second step, the processes during the experimental presentation of the study lists were simulated on this basis. Besides the association and the frequency of the presented words, one variable was manipulated in the experiments that is not addressed by the PASS simulations: a recall test was administered to one half of the participants before they gave their frequency judgments. The other half gave their judgments immediately after the presentation of the study list. The recall test was mainly introduced to examine an effect observed by Erickson and Gaffney (1985). In their study the participants gave judgments referring to the frequency of occurrence of words either directly after the study phase or after an intermediate recall test. The recall test influenced the subsequent judgments and interacted with the actual presentation frequencies. Judgments in the condition with a recall test turned out smaller than judgments in the condition without a recall test. The difference between the conditions increased with rising presentation frequency. Participants who recalled words from the study list before they gave their judgments showed a tendency to underestimate the presentation frequency of items. In judgments uninfluenced by a previous recall test, however, a (slight) tendency to overestimate the actual frequencies was found. To date, the study by Erickson and Gaffney seems to be the only one in which the impact of a recall task upon subsequent judgments on frequencies of occurrence was examined. Thus, the first objective of the recall treatment was to check whether the observed effect is replicable. Second, we wanted to test a possible explanation of this effect.
The differences found by Erickson and Gaffney between judgments generated with or without a preceding recall test correspond to typical differences between judgments based on enumeration and memory assessment (Brown, 1995; 2002): If enumeration is used, there is a tendency to underestimate actual event frequencies. Memory assessment, on the other hand, often results in overestimation (at least as long as no upper limit of the response range is given). Moreover, the difference in the magnitude of judgments that can be traced back to these strategies increases with presentation frequencies. In addition, Betsch et al. (1999) demonstrated for frequency judgments on category sizes that a preceding recall test fosters the use of an enumeration strategy. The information made available by the recall test is apparently also used more intensively for the generation of frequency judgments. A possible explanation for the effect observed by Erickson and Gaffney can thus be found in the assumption that a preceding recall test facilitates the use of enumeration even in the case of judgments on frequencies of occurrence. In contrast to judgments on category sizes, however, different relevant exemplars cannot be retrieved in such judgments. Instead, enumeration would here have to be based upon the retrieval of various encoding episodes of identical events. In other words, when using this strategy, the participants would have to recollect several episodic memories of repeated presentations of the same word by, for example, remembering adjacent words or their own thoughts or activities during various presentations (Brown et al., 2000; Roediger & McDermott, 1995; Gardiner, 1988; Dobbins et al., 1998).
If the recall test facilitates such recollections and thereby fosters the use of enumeration in the case of judgments on frequencies of occurrence, this should be accompanied by the typical differences in the decision times: The function relating decision times to presentation frequencies should be steeper for judgments after a preceding recall test than for judgments without a preceding recall test. Furthermore, as in the study by Erickson and Gaffney, judgments after the recall test should increase less steeply with rising presentation frequency than judgments without a recall test.

Below, the design and the exact procedure of the first experiment are described. This is followed by the description of how this procedure was realized with PASS in the simulation. The section 'Simulation Method' is subdivided into two parts. The first part provides a description of how PASS acquired pre-knowledge about the associative strengths between the words used in the experiments. In the second part the modeling of the experimental presentation of the study list is illustrated. After that, we report the results of the simulation and derive from them predictions of the PASS model about the magnitude of judgments and the discrimination of presentation frequencies of associated and non-associated items. Finally, the results of the first experiment are reported and compared with the predictions of the PASS model.

Study 1

Method

Participants: Ninety-six persons took part in the experiment (43 men and 56 women, with an average age of 23 years). Most of the participants were students of the TU Chemnitz from various departments. Students of Psychology did not take part in the experiment. The participants received a reimbursement of 3 Euros.

Material: In the experiment four lists were used, two lists with associated words as well as two lists with non-associated words. Each of the four lists contained 20 words.
To arrive at these lists, 10 associated lists were constructed first. Afterwards, two of them were selected for experimental use. German word-association norms (Russell, 1970) served as the basis for the construction of the 10 associated lists. These norms merely itemize the associates of a given stimulus word and thus do not contain any information on the relations between the associates of the stimulus word. Generally, however, it can be observed that words triggered as associative responses by a stimulus word also tend to be associated with each other. The first ten associates of the stimulus word river may illustrate this tendency (from the English associative norms by Russell and Jenkins, 1954): water, stream, lake, Mississippi, boat, tide, swim, flow, run, barge. We selected 10 stimulus words from the German associative norms whose associates were, in our judgment, also particularly strongly associated with each other. The selected stimulus words were: Berg (mountain), Brot (bread), Butter (butter), Dieb (thief), Fluss (river), Fuß (foot), Hand (hand), König (king), Nadel (needle) and Stuhl (chair). The following words were deleted from the ten lists containing the stimulus words and their first 19 associates:

1. Associates repeating the stimulus word or the stem of a preceding associate (e.g. Diebstahl (theft) as an associate of Dieb, or Sitz (seat) and Sitzgelegenheit (seating), which are more weakly associated with Stuhl than sitzen (sit)).
2. Associates that in German are typically used in compounds together with the stimulus word and that, as single nouns, mainly occur in different contexts of meaning (e.g. Bett (bed) from the compound Flussbett (river bed)).

The deleted associates were replaced by words that appeared to be plausible as associates of the respective stimulus word and its remaining associates.
From the ten associated lists constructed in this way, the two whose words showed the strongest mean association with each other were to be used in the experiment. The PASS model was used to quantify the associations. In the simulation of the required associative pre-knowledge, PASS had learned the associations between all 20 words on each of the ten lists (for the details of this learning trial see the section on the simulation method). For each word list, the mean of the 20 × 19 ÷ 2 = 190 associations acquired by PASS was calculated. This mean was highest for the lists around the words Stuhl and Fluss. Table 3.1 shows the two associated lists used.

Table 3.1
Word lists used in Study 1. English translations of the original German words are given in parentheses.

Associated lists
Stuhl (chair): Tisch (table), Sessel (easy-chair), sitzen (to sit), bequem (comfortable), setzen (to sit down), Hocker (stool), Möbel (furniture), Sofa (sofa), Holz (wood), Lehne (back-rest), hart (hard), Kissen (cushion), schaukeln (to swing), rutschen (to slide), Zimmer (room), kippen (to tilt), Wohnung (apartment), Bett (bed), stehen (to stand)
Fluss (river): Wasser (water), Strom (stream), Ufer (bank), Rhein (Rhine), Meer (sea), tief (deep), See (lake), Tal (valley), breit (broad), schwimmen (to swim), Boot (boat), Fisch (fish), Brücke (bridge), Bach (brook), baden (to bathe), Stadt (city), Schiff (ship), Landschaft (landscape), überqueren (to cross)

Non-associated lists
Stuhl (chair), Fluss (river), Boot (boat), Lösung (solution), Solist (soloist), Wand (wall), punkten (to score), Mode (fashion), mutig (courageous), Münster (German city), verlangen (to demand), Stil (style), Teufel (devil), spitz (pointed), Anruf (call), Ohr (ear), Beleg (receipt), Horn (horn), Sinn (sense), karg (meagre), Bereich (area), merken (to realize), sauer (sour), Gen (gene), ragen (to tower), entziehen (to withdraw), Start (start), urteilen (to judge), Protest (protest), Herbst (fall), Anteil (share), Mittel (means), Brief (letter), leiden (to suffer), Sport (sport), Druck (pressure), Bahn (railroad), Herrschaft (power), vernichten (to destroy), stecken (to stick)

The words Stuhl and Fluss again served as the starting point for the non-associated lists. The remaining words on each of these lists came from the 100 words in the PASS vocabulary (Sedlmeier et al., 2004) showing the weakest associations to Stuhl or Fluss, respectively. From these words, 19 were chosen for each list in such a way that 1) the various classes of words (nouns, proper names, adjectives, verbs) were represented equally often on associated and non-associated lists, and 2) the words on associated and non-associated lists had nearly the same length (see Table 3.1). For the resulting word lists, the means and medians of all associations between the 20 words on both associated lists are approximately 4.5 times as large as the means and medians on the corresponding non-associated lists.

For each participant a unique study list was created from one of the four word lists. Five words from the corresponding word list were assigned to each of the presentation frequencies 0, 2, 5, and 8, so that the study list had an overall length of 5 × (0 + 2 + 5 + 8) = 75 items. The words were randomly assigned to these presentation frequencies for each participant. The sequence of items was also randomly determined for every participant, with the restriction that the same word could not appear twice in immediate succession. The test list for the frequency judgment contained the 20 words of the corresponding word list, again in random sequence.

Procedure: The participants were tested individually at a personal computer. Every participant was randomly assigned one of the four word lists. Prior to the study phase the participants were informed that they would be presented with a longer word list in the following minutes. Furthermore, they were instructed to read through this word list attentively, as they would later have to answer some questions referring to this list.
The exact nature of the memory test, however, remained unspecified. The participants were also told that words on the list could occur repeatedly. During the study phase, words were presented one at a time. Each presentation lasted three seconds, followed by a one-second interval in which the screen was blank.

Following the presentation of the word list, half of the participants completed a free-recall test. They were given a sheet of paper and instructed to write down, within two minutes and in any order, as many words as possible from the list previously studied. After the two minutes were over, the sheet was collected.

After the recall test or, for the other half of the participants, immediately after the presentation of the study list, the participants were required to estimate as accurately as possible the number of times each word had appeared on the list. For this purpose, the 20 words on their test list were presented one at a time. The participants were told in advance that they might also have to assess the frequency of words that had not been presented; in this case their frequency judgment should be "0". In addition, the participants were informed that decision times would be recorded as well. The measurement of the decision times largely followed a method introduced by Brown (1995). The participants were instructed to press a key marked red (the spacebar) in each test trial "as soon as they decided in favor of a particular number". When the spacebar was pressed, a response field appeared on the screen next to the word to be judged. The participants were then to enter their estimate as quickly as possible. A test trial ended when the participants pressed the enter key to confirm their judgment. After a one-second interval in which the screen was blank, the next test trial started with the appearance of the next word.
Three times were measured: 1) the time from the appearance of the test word to the pressing of the spacebar (decision time), 2) the time from the pressing of the spacebar to the entry of the first digit (initiation time), and 3) the time from the entry of the first digit to the pressing of the enter key (entry time). The sequence to be followed for each judgment was practiced with the participants beforehand by means of three examples, in which they judged how often they had performed particular activities in the past week (e.g. "washing the dishes"). Altogether, an experimental session lasted about 20 minutes.

Simulation Method

Simulation of Pre-knowledge

A large machine-readable text corpus served as a model of the environment in which our participants acquired their associative pre-knowledge. This text corpus consisted of four complete volumes (1995–1998) of the Süddeutsche Zeitung, a major German national daily newspaper. The text corpus contained approximately 120 million word forms altogether. A text corpus of this scale can be regarded as an adequate sample of the linguistic information that was pre-experimentally encoded by our participants.[23] First, the text corpus was lemmatized, that is, all word forms included in the corpus were reduced to their stems.[24] From the lemmatized corpus, PASS then learned the associations between the experimental stimulus words. To learn associations from texts, PASS simulates the process of reading. Similar to a human reader, whose attentional span comprises not only the most recent word but also several words of the text previously read, PASS uses an “attentional window”. This window is moved over the corpus word by word.
In this process, PASS encodes all words in the window at every step and thereby changes the associations between these words and the remaining words in its vocabulary according to the rules in equation 1.[25] Thus, associations between two words that both occur in the attentional window are strengthened, whereas associations between two words of which only one or none is contained in the window are attenuated. Since all words in the window are taken into account in this updating process, the association between adjacent words is increased more frequently, and thus more strongly, than the association between words separated by some intermediate words. Thus, with an assumed window length of 10, two adjacent words would be contained together in the window eight times. On the other hand, two words with eight intermediates between them would co-occur only once.

[23] The extent to which a text corpus represents an adequate sample of language seems to depend primarily on its size (Burgess & Livesay, 1998). In this respect, our corpus compares well with other corpora such as the frequently used Brown Corpus of Standard American English (Kucera & Francis, 1967), which comprises approx. 1 million word forms.
[24] For that we used the program MORPHY, a lemmatizer for the German language (Lezius, Rapp, & Wettler, 1998).
[25] I am indebted to Petra Seidensticker for the programming of the simulations reported here and in Study 2.

To be able to simulate the process of association acquisition, it was necessary to have PASS treat every word as an independent feature. Therefore, every word of the PASS vocabulary was assigned its own unit in FEN that encoded the occurrence of this word. Compared with the original formulation of PASS, in which every object is represented by several features, this simplification was necessary for pragmatic and theoretical reasons.
For one thing, the required number of arithmetic operations cannot be managed with a distributed representation.[26] For another, there is currently no model that makes both theoretically and empirically satisfying assumptions on how words can be represented in a distributed manner (Kintsch, 1998). A reasonable “decomposition” of words into different features was hence impossible.

For the learning trial in which PASS acquired its pre-knowledge about the associative relations between the words used in the experiment, the text corpus was subdivided into the four volumes 1995, 1996, 1997 and 1998 of the Süddeutsche Zeitung. The associations acquired from the single volumes were then averaged to diminish possible recency effects. In the learning trial the following parameter settings were used: θ1 = 0.005, θ2 = 0.0001 and θ3 = 0.0000001. The length of the attentional window was 12 words. These parameter settings yielded the best results among several tested parameter combinations in a study by Sedlmeier, Wettler and Seidensticker (2004) on the prediction of primary associations. In the simulation by Sedlmeier et al., PASS had a vocabulary of 3580 words and acquired all associations between these words from the text corpus also used here. PASS was then presented with 100 stimulus words. The strongest associations of the model to these stimulus words matched human primary associations best when the parameter combination mentioned was used. With these parameters, the number of matches between the associative responses of the model and the primary associations approximately corresponded to the number of matches achieved by human participants. The study thereby also demonstrates that the procedure followed here is in general suitable for simulating human associations.
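The attentional window described in this section can be illustrated with a minimal sketch. It only counts how often two words fall into the same window while the window is moved over a token sequence word by word; it does not implement the learning rules of equation 1. The function name and the restriction to completely filled windows are assumptions made for illustration. How PASS treats the window boundaries is not specified in this section, so absolute counts under this sketch may differ slightly from those of the model.

```python
from collections import Counter
from itertools import combinations


def window_cooccurrences(tokens, window_length):
    """Count, for every unordered word pair, the number of window positions
    in which both words are present (hypothetical helper, full windows only)."""
    counts = Counter()
    for start in range(len(tokens) - window_length + 1):
        # Words currently inside the attentional window.
        window = set(tokens[start:start + window_length])
        # Every pair of distinct words in the window co-occurs once at this step.
        for first, second in combinations(sorted(window), 2):
            counts[(first, second)] += 1
    return counts


if __name__ == "__main__":
    # Toy sequence of 30 distinct placeholder words (illustrative only).
    toy_tokens = [f"w{i:02d}" for i in range(30)]
    toy_counts = window_cooccurrences(toy_tokens, window_length=10)
    # Adjacent words share more window positions than distant words.
    print(toy_counts[("w14", "w15")], toy_counts[("w10", "w19")])
```

In such a scheme, pairs that are close together in the text are updated at more window positions than pairs that are far apart, which is the property the text relies on.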
[26] In every step in which the attentional window is moved over the text, PASS updates the associations between all words in its vocabulary according to one of the three rules in equation 1. Thus, in a localist representation the total number of required arithmetic operations already equals the product of the number of words in the text corpus and the square of the number of words in the PASS vocabulary. In a distributed representation this total would have to be multiplied once more by the square of the number of features per word.

The Simulation of the Experimental List Presentation

In a second step of the PASS simulation, the encoding of the experimental presentation of the study lists had to be modeled. For that, a further learning trial was realized with PASS. As already described above, the study lists of participants differed from each other even within the same experimental condition, because the stimulus words were randomly assigned to the various presentation frequencies and positions on the list. In all, 48 different associated and 48 different non-associated study lists were used, one for each participant. Exactly these 96 study lists were now presented to PASS and encoded by the model. Prior to the encoding of the study lists, the self-associations of the words (i.e. the weights in FEN connecting a unit with itself) previously acquired by PASS from the text corpus were set to 0. These self-associations are exclusively determined by the pre-experimental frequency of occurrence of the words in the text corpus. However, at least two studies demonstrate that estimates of the list frequencies of words are only marginally influenced by the linguistic frequency of these words (Rao, 1983; Greene & Thapar, 1994). Consequently, a noteworthy impact of the frequencies of the stimulus words in the text corpus on the judgments about the experimental list frequencies was not to be expected.
To take this circumstance into account, we decided to set the pre-experimentally acquired self-associations to 0. As a result, the self-associations of words were completely relearned during the modeling of the experimental presentation, while the pre-experimentally acquired associations between words were overlearned.

For the encoding of the study lists by PASS, various parameter combinations were used. We changed the parameter settings on the consideration that the information presented in the experiment is possibly encoded with more attention and represented better than information processed in more familiar contexts. Thus, starting from the parameter settings used for the encoding of the text corpus, we increased the learning parameter θ1 and decreased the parameters θ2 and θ3, which regulate the amount of forgetting. θ1 was increased in nine steps from 0.005 to 0.05. The parameters θ2 and θ3 were either 0.0001 and 0.0000001 or both set to 0. Additionally, we varied the length of the window in which two stimulus words had to co-occur for the associative connection between them to be increased. A window length of 12 words had proved optimal in the prediction of primary associations from the text corpus. However, in the text corpus the relevant words are semantically connected and present at the same time. For the case realized in the experiment, in which unconnected words follow each other and are presented one at a time, it seems reasonable that the distance between two words has to be shorter for the associative connection between them to be increased. Therefore, the three shortened window lengths 3, 5, and 7 were used for the encoding of the study lists. Altogether, each of the 96 study lists was thus encoded by PASS with 60 different parameter settings (10 values of θ1, 2 joint settings for θ2 and θ3, and 3 window lengths).
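The 60 parameter settings can be enumerated with a short sketch. The text states only that θ1 was increased in nine steps from 0.005 to 0.05; the equal step size of 0.005 used below is an assumption (it is at least consistent with the values 0.01 and 0.03 that appear in the figures). All names are hypothetical.

```python
from itertools import product

# Ten theta1 values from 0.005 to 0.05; equal steps of 0.005 are an assumption.
THETA1_VALUES = [round(0.005 * step, 3) for step in range(1, 11)]
# Joint settings for (theta2, theta3): forgetting active or switched off.
FORGETTING_SETTINGS = [(0.0001, 0.0000001), (0.0, 0.0)]
# Shortened window lengths used for encoding the study lists.
WINDOW_LENGTHS = [3, 5, 7]


def parameter_grid():
    """Return every combination of theta1, (theta2, theta3) and window length."""
    return [
        {"theta1": theta1, "theta2": theta2, "theta3": theta3, "window_length": window}
        for theta1, (theta2, theta3), window in product(
            THETA1_VALUES, FORGETTING_SETTINGS, WINDOW_LENGTHS
        )
    ]


if __name__ == "__main__":
    print(len(parameter_grid()))  # 10 * 2 * 3 combinations
```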
Simulation Results

Relative magnitude of responses: Figure 3.2 illustrates, by way of example, FEN’s standardized mean responses for four different settings of the learning parameter θ1 and the window length. The interference parameter θ2 is always 0.0001; the decay parameter θ3 is constant at 0.0000001. The results for the item sets Stuhl and Fluss did not differ systematically and are pooled in the figure. The data points thus result from averaging the responses over all 48 study lists presented. The absolute magnitude of FEN’s responses varies considerably with the settings for θ1 and the window length. For example, with a θ1 of 0.01 and a window length of 3 (chart A), the mean response over all stimuli amounts to 0.66. With a θ1 of 0.03 and a window length of 5 (chart D), on the other hand, the mean is 2.41. However, since PASS assumes that the frequency judgments of participants result from a linear transformation of responses, the predictions of the model refer to the relative differences between the data points and are independent of the absolute magnitude of the responses. To facilitate a comparison of the predictions of the model across different parameter settings, the displayed means were therefore standardized in such a way that the mean response for associated stimuli with the presentation frequency 8 always takes the value 1.

[Figure 3.2: four charts plotting the standardized mean response against presentation frequency (0, 2, 5, 8) for associated and non-associated items. Chart A: θ1 = 0.01, window length = 3; chart B: θ1 = 0.01, window length = 5; chart C: θ1 = 0.03, window length = 3; chart D: θ1 = 0.03, window length = 5.]
Figure 3.2. FEN’s standardized mean responses for 4 different settings of the learning parameter θ1 and the window length in Study 1. In all charts θ2 = 0.0001 and θ3 = 0.0000001.

Three characteristics of the graphs in Figure 3.2 are of interest. First, the figure exemplifies the sensitivity of the PASS model to frequencies. FEN’s responses increase with the presentation frequency of items irrespective of the selected parameter combination. This holds for both associated and non-associated words. Second, FEN’s responses to associated and non-associated items show a similar trend, so that the curves for both types of items are approximately parallel. Again, this holds irrespective of the parameter combinations. Third, and most important for the question pursued here, all four exemplary parameter combinations lead PASS to the prediction that associated words receive higher frequency judgments than non-associated words. However, the magnitude of the expected difference between estimates for associated and non-associated items is influenced by θ1 and the window length. The general pattern is that the expected effect decreases the more FEN learns during the experimental presentation of the study lists. Chart A illustrates FEN’s responses with θ1 = 0.01 and a window length of three words, the lowest parameter settings included in the figure. In this case, the pre-experimentally acquired knowledge about the associations between the stimulus words exerts the strongest influence on the responses. Accordingly, the maximum difference between associated and non-associated items can be found here. The better the study lists are represented in PASS’s memory, the more the impact of the pre-knowledge on the judgments of the experimental presentation frequencies is suppressed. In FEN, a better representation shows up as a greater change of the weights connecting the words in the study list with themselves and each other.
An increase of either the learning parameter θ1 or the window length causes such a greater change of weights during the experimental presentation. Charts B and C illustrate that it is irrelevant for the resulting responses whether this is obtained by a greater change with each encoding (i.e. by a higher θ1) or by a more frequent updating of weights (i.e. a greater window length). Chart D illustrates that θ1 and the window length have an additive effect. A joint increase of both parameters results in a further reduction of the difference in the responses to associated and non-associated items.

The variation of the parameters θ2 and θ3, which jointly regulate the scope of forgetting processes, also had a systematic effect on the mean responses in FEN. If no forgetting occurred during the presentation of study lists (θ2 = θ3 = 0), greater differences generally resulted between associated and non-associated stimuli than in the case of active forgetting processes. This can be ascribed to the fact that the higher weights between associated words are reduced more strongly by forgetting processes than the lower weights between non-associated words (see equation 1). However, the effect of the variation of the parameters θ2 and θ3 was very small. Adding forgetting processes did not change the mean responses by more than 1%. For all practical purposes, the influence of the forgetting processes realized in the simulation can thus be neglected.

Within which range does the expected influence of the associations on frequency judgments vary as a function of the parameter settings? To answer this question, the ratio of mean responses to associated and non-associated words was calculated separately for each presentation frequency and each of the 60 parameter combinations used.
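The standardization and the ratio computation can be sketched as follows. The mean responses used in the usage example are invented placeholder values, not simulation results, and the function names are hypothetical. The sketch also shows why the ratios do not depend on the standardization: both sets of means are divided by the same reference value.

```python
def standardize(associated_means, non_associated_means, reference_frequency=8):
    """Scale both sets of means so that the associated mean at the
    reference frequency equals 1 (hypothetical helper)."""
    reference = associated_means[reference_frequency]
    scaled_associated = {f: m / reference for f, m in associated_means.items()}
    scaled_non_associated = {f: m / reference for f, m in non_associated_means.items()}
    return scaled_associated, scaled_non_associated


def response_ratios(associated_means, non_associated_means):
    """Ratio of associated to non-associated mean response per presentation frequency."""
    return {
        frequency: associated_means[frequency] / non_associated_means[frequency]
        for frequency in associated_means
    }


if __name__ == "__main__":
    # Placeholder means for one parameter combination (illustrative values only).
    associated = {0: 0.2, 2: 0.9, 5: 1.4, 8: 2.0}
    non_associated = {0: 0.2, 2: 0.6, 5: 1.0, 8: 1.6}
    print(response_ratios(associated, non_associated))
    print(response_ratios(*standardize(associated, non_associated)))
```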
If, for example, a parameter combination produced a mean response of 0.9 for associated words with the presentation frequency 2 and a mean response of 0.6 for non-associated words with the same frequency, the ratio for this parameter combination and the frequency 2 would be 0.9 ÷ 0.6 = 1.5. The box-plots in Figure 3.3 illustrate the resulting distributions of ratios over the 60 parameter combinations. Values above 1 show that greater frequency judgments are expected for associated words than for non-associated ones. The uppermost outliers for the presentation frequencies 2, 5, and 8 result from a θ1 of 0.005 and a window length of 3, the lowest parameter settings realized in the simulation. Accordingly, the responses to associated words are here most clearly above those to non-associated words. As soon as θ1 or the window length is increased, the ratios between associated and non-associated words become smaller.[27] Values equal to or smaller than 1, however, are not obtained with any of these parameter combinations. Thus, independent of the quality of the encoding of the study lists, PASS expects for all presentation frequencies that associated words receive higher frequency judgments than non-associated words. Within the representation assumptions of the model, the influence of the pre-experimentally acquired associations on the judgments of list frequencies evidently cannot be totally neutralized. Consequently, higher experimental judgments for non-associated words than for associated words would be incompatible with PASS.

[27] In the case of the presentation frequency 0, such an influence cannot be found. This can be ascribed to the fact that the pre-experimentally acquired associations of words with the presentation frequency 0 are not overlearned during the presentation of the study list. The weights connecting these words with the remaining stimulus words are only subject to the influence of forgetting. This influence, however, turns out to be very small for the selected parameter combinations.

[Figure 3.3: box-plots of the ratios of mean responses (y-axis, 0 to 4) for the presentation frequencies 0, 2, 5, and 8 (x-axis).]
Figure 3.3. Distributions of ratios of mean responses for associated and non-associated items in Study 1.

In principle, however, the difference in the responses to associated and non-associated words can be reduced arbitrarily by increasing θ1 and the window length. On the other hand, there is a considerable trade-off between both parameters due to their similar impact on the responses. Only if both parameters take extreme values, i.e. if the study list is encoded extremely well, will the expected effect be small. For the great majority of the parameter combinations used, a relatively narrow prediction range results in which the differences between associated and non-associated items exceed 10%. On the assumption that the parameter settings chosen in the simulation map the quality of the encoding of the study list by the participants more or less adequately, PASS thus expects substantially higher frequency judgments for associated items.

Discrimination of Frequencies: As a measure of the ability to discriminate different frequencies, Flexser and Bower (1975) suggested the discrimination coefficient. This coefficient is defined as the correlation between actual and estimated frequencies. The correlation is calculated separately for every participant. Thus, the coefficient provides an individual measure of the quality of the discrimination of various frequencies, independent of the absolute magnitude of frequency judgments. This measure was used to derive predictions of the PASS model for the discriminability of associated and non-associated items. For each of the 48 different associated and 48 different non-associated study lists, the correlation between the actual presentation frequencies of the 20 items and FEN’s responses was calculated.
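As a minimal sketch, the discrimination coefficient for a single study list can be computed as a Pearson correlation between the 20 presentation frequencies and the 20 responses. The function name and the example responses are assumptions for illustration; the example also shows that a linear rescaling of the responses leaves the coefficient unchanged.

```python
import math


def discrimination_coefficient(actual_frequencies, responses):
    """Pearson correlation between actual frequencies and responses
    (hypothetical helper; raises ValueError if either series is constant)."""
    n = len(actual_frequencies)
    mean_actual = sum(actual_frequencies) / n
    mean_response = sum(responses) / n
    covariance = sum(
        (a - mean_actual) * (r - mean_response)
        for a, r in zip(actual_frequencies, responses)
    )
    variance_actual = sum((a - mean_actual) ** 2 for a in actual_frequencies)
    variance_response = sum((r - mean_response) ** 2 for r in responses)
    if variance_actual == 0 or variance_response == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(variance_actual * variance_response)


if __name__ == "__main__":
    # Five items at each of the presentation frequencies 0, 2, 5 and 8.
    actual = [0] * 5 + [2] * 5 + [5] * 5 + [8] * 5
    # Invented responses with some item-specific noise (illustrative only).
    responses = [0.5 + 0.2 * f + (0.3 if i % 2 else -0.3) for i, f in enumerate(actual)]
    print(discrimination_coefficient(actual, responses))
    # Rescaling the responses linearly does not change the coefficient.
    print(discrimination_coefficient(actual, [3 * r + 1 for r in responses]))
```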
Systematic differences between the simulation results for the item sets “Stuhl” and “Fluss” did not arise with regard to the discrimination coefficient either. Therefore, the results for both item sets are pooled below. Figure 3.4 shows the resulting mean correlations for associated and non-associated words for six different settings of the window length and the learning parameter θ1.[28] The interference parameter θ2 is always 0.0001, and θ3 is 0.0000001. First of all, Figure 3.4 once again illustrates PASS’s sensitivity to frequencies. With all parameter combinations, high to very high correlations between the presentation frequencies and FEN’s responses are obtained for both associated and non-associated items. However, the frequencies of non-associated items are systematically discriminated better than the frequencies of associated items. The size of this effect is influenced by θ1 and the window length in a similar way as the size of the effect of associations on the magnitude of responses. Generally, the quality of the discrimination of different frequencies increases with the quality of the encoding of the study lists. Thus, if θ1 or the window length is increased, the correlations between the actual frequencies and the responses also increase. This holds especially for the associated stimuli, whose discriminability is fairly clearly impaired by the associative connections between the stimulus words at low parameter settings, that is, with poor encoding of the study lists.

[28] Here and below, the correlations were transformed into Fisher Z-values before they were averaged or subjected to statistical tests (Silver & Dunlap, 1987). The reported means result from the back-transformation of the corresponding mean Z-values.

With a better encoding of study lists, the influence of the associative pre-knowledge on the discrimination of frequencies decreases.
As a result, the differences between associated and non-associated items in the correlations of the corresponding responses with the presentation frequencies also diminish. As Figure 3.4 illustrates, for the resulting predictions of the PASS model it is again irrelevant whether an improved encoding of study lists is brought about by an increase of θ1 or of the window length.[29]

[Figure 3.4: two charts (θ1 = 0.01 and θ1 = 0.03) plotting the mean correlation (y-axis, 0.5 to 1.0) against window length (3, 5, 7) for associated and non-associated items.]
Figure 3.4. Mean correlations between FEN’s responses and presentation frequencies for 6 different settings of the learning parameter θ1 and the window length in Study 1. In both charts θ2 = 0.0001 and θ3 = 0.0000001.

[29] The variation of the parameters θ2 and θ3 only exerted an extremely small influence of no practical importance on the discrimination of frequencies. Generally, however, a tendency could be observed according to which the differences in the discriminability of frequencies of associated and non-associated items were smaller if forgetting processes occurred.

In order to illustrate the range in which the differences in the discriminability of associated and non-associated items expected by PASS vary across all parameter settings, a procedure similar to that used for the predictions of the relative magnitude of frequency judgments was chosen. For each of the 60 parameter settings, the ratio of the mean predictions, that is, the mean discrimination coefficients, for associated and non-associated words was calculated. Values below 1 hence show that frequencies of associated words are discriminated worse than frequencies of non-associated words. Figure 3.5 illustrates the distribution of the resulting ratios.

[Figure 3.5: box-plot of the ratio of mean correlations (y-axis, 0.5 to 1.0).]
Figure 3.5. Distribution of ratios of mean correlations between FEN’s responses and presentation frequencies for associated and non-associated items in Study 1.
The figure shows that the PASS model also yields an invariant prediction about the effect of associations on the discriminability of frequencies. Values above or equal to 1 are not attained even if the study lists are encoded extremely well. That is, independent of the parameter settings, PASS expects that the frequencies of associated items are discriminated worse than the frequencies of non-associated items. However, clear differences in the discriminability of frequencies of associated and non-associated items only arise if both the learning parameter θ1 and the window length take low values. Thus, the two outliers at the bottom of Figure 3.5 result from the lowest settings for both parameters (window length = 3, θ1 = 0.005 or θ1 = 0.01). Generally, a trade-off between θ1 and the window length also exists with regard to the discrimination of frequencies, which is due to the similar effect of both parameters. Again, this trade-off has the consequence that the predictions of the PASS model vary only within a relatively narrow range for the majority of parameter combinations. In this range only slight differences in the discriminability of frequencies of associated and non-associated items are expected. Accordingly, the median of the ratios resulting from the 60 parameter combinations amounts to 96.3%, as can be seen in Figure 3.5. Hence, for half of the parameter combinations it is predicted that the mean correlations between estimated and actual frequencies for associated words differ by less than 4% from the correlations for non-associated words. In other words, PASS generally expects, on the one hand, that the frequencies of associated items are discriminated less well than the frequencies of non-associated items. On the assumption that the parameter settings chosen in the simulation map the quality of the encoding of the study list by the participants more or less adequately, this effect should, on the other hand, turn out to be small.
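The averaging of correlations via Fisher Z-values and the ratio of mean correlations described above can be sketched as follows. The function names and the correlation values in the usage example are hypothetical and serve only as an illustration; the transform requires correlations strictly between -1 and 1.

```python
import math


def mean_correlation(correlations):
    """Average correlations via Fisher's Z: transform, average, back-transform."""
    z_values = [math.atanh(r) for r in correlations]  # Fisher Z-transformation
    return math.tanh(sum(z_values) / len(z_values))   # back-transformation


def ratio_of_mean_correlations(associated, non_associated):
    """Ratio of the mean discrimination coefficients (associated / non-associated).
    Values below 1 indicate worse discrimination for associated items."""
    return mean_correlation(associated) / mean_correlation(non_associated)


if __name__ == "__main__":
    # Invented per-list correlations for one parameter setting (illustrative only).
    associated_lists = [0.90, 0.92, 0.88]
    non_associated_lists = [0.95, 0.96, 0.94]
    print(mean_correlation(associated_lists))
    print(ratio_of_mean_correlations(associated_lists, non_associated_lists))
```

Averaging in the Z-metric gives somewhat more weight to high correlations than a plain arithmetic mean of the correlation coefficients would.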
Experimental Results

Decision Times

A reasonable comparison of the predictions of the PASS model with the experimentally collected frequency judgments is only possible if it can first be shown that the judgments of the participants were generated with a memory assessment strategy. Therefore, we first examine whether the decision times show the slight increase with presentation frequency that is typical of judgments based on memory assessment. The decision times are additionally used to examine whether a preceding recall test fosters the use of an enumeration strategy. If this is the case, the function that relates the decision times to the presentation frequencies should be steeper than without a preceding recall test.

Following Brown’s (1995; 1997) procedure, three time measures were recorded during each trial of the frequency test. In addition to the decision time, an initiation time and an entry time were recorded for control purposes. Moreover, all three time measures were added up to a total time indicating the time needed for the generation and entry of a frequency judgment. For each of these four time measures, the mean and the median over the five trials on each presentation frequency level were calculated for every participant. These means and medians served as dependent variables for all further analyses. However, all results reported below are based on analyses of mean decision times. A few notes on the remaining time measures and on the medians are therefore in order: for all time measures the medians were slightly smaller than the corresponding means (by between 0.1 s and 0.2 s). The resulting effects were unaffected by this, so that the analyses of means and medians lead to an identical pattern of results. Typically, the initiation and entry of frequency judgments occurred quickly. The corresponding means were 1.0 s for initiation times and 0.9 s for entry times. Both time measures slightly increased with the presentation frequency.
The difference in mean times between presentation frequency 0 and presentation frequency 8 amounted to approximately 0.3 s for both initiation times and entry times. No other effects occurred for the initiation and entry times. Finally, an analysis of the total times leads to the same pattern of results as the analysis of decision times presented here. All reported findings on initiation, entry and total times replicate results also observed by Brown (1995; 1997).

In Figure 3.6 the mean decision times are plotted against the presentation frequencies separately for the conditions with and without a preceding recall test. Irrespective of presentation frequency, the recall test consistently slows down the generation of frequency judgments by approximately 0.2 s. The effect size of the recall test on the decision times, however, is small, with r_effect size = 0.08. For the question of how the recall test influences the judgment strategies used, it is more important that there is obviously no interaction with the presentation frequencies. The decision times increase slightly and in a similar manner in the conditions with and without recall test. Thus, the data speak clearly against the assumption that a preceding recall test fosters the use of an enumeration strategy.

In general, the decision times correspond to the pattern typical of judgments based on memory assessment. For example, Brown (1995, Experiment 3) found decision times of 2.0 s, 2.8 s, 3.0 s and 3.5 s for the presentation frequencies 0, 2, 4 and 8 for judgments based on memory assessment. In the present experiment we obtained values of 2.3 s, 3.0 s, 3.4 s and 3.7 s, averaged over the conditions with and without recall test, for the frequencies 0, 2, 5 and 8. In magnitude and trend the decision times thus match previous results for judgments based on memory assessment almost exactly (see also Brown, 1995; 1997).
On the other hand, the sharp increase in decision times to values above 6.5 s for the frequency 8 that occurs when an enumeration strategy is used cannot be found. Hence, it can be assumed that the participants in the present experiment used an assessment strategy to generate their frequency judgments.

Figure 3.6. Mean decision time (s) as a function of presentation frequency for groups with and without a preceding recall test in Experiment 1. Error bars denote standard errors.

The decision times for associated and non-associated items did not differ from each other (r_effect size = 0.02). Accordingly, an ANOVA with the within-factor 'presentation frequency' and the between-factors 'recall test' and 'association' identifies only the main effect of the presentation frequency as significant, η = 0.45, F(3, 264) = 23.03, p = 0.001. All other main effects and interactions were not significant (all Fs < 1), and the corresponding effect sizes were close to 0 (all ηs < 0.1).

Frequency Estimates

Like the responses of the PASS model before, the frequency estimates of the participants were analyzed with regard to their relative magnitude and the quality of discrimination of the actual frequencies. Apart from the influence of the associations and presentation frequencies on the estimates, we also examined whether a preceding recall test has an effect on the estimates. Below, the magnitude of the frequency estimates is considered first.

Relative Magnitude of Frequency Estimates: Every participant provided 5 estimates on each level of presentation frequency. The means of these 5 estimates formed the basis of the further analysis.
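The aggregation step just described, in which the five judgments at each presentation frequency level are reduced to one value per participant, can be sketched as follows. This is only an illustration under stated assumptions: the raw estimates are hypothetical, and the names `estimates_by_frequency` and `summarize` are introduced here and do not come from the original analysis.

```python
from statistics import mean, median

# HYPOTHETICAL raw estimates of ONE participant: five judgments on each
# presentation frequency level (0, 2, 5, 8). Invented for illustration only.
estimates_by_frequency = {
    0: [0, 0, 1, 0, 0],
    2: [2, 3, 2, 1, 2],
    5: [4, 6, 5, 5, 7],
    8: [8, 7, 10, 9, 6],
}

def summarize(estimates):
    """Return {frequency level: (mean, median)} over the trials of that level."""
    return {
        level: (mean(values), median(values))
        for level, values in estimates.items()
    }

participant_summary = summarize(estimates_by_frequency)
for level, (m, md) in participant_summary.items():
    print(f"frequency {level}: mean = {m:.1f}, median = {md}")
```

The per-level means obtained in this way (one set per participant) would then enter the group analyses; the medians are computed alongside because the text reports both for the time measures.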
From each of the 4 conditions resulting from the crossing of the factors 'association' and 'recall test', 2 participants were excluded from this analysis since they had provided exceptionally high estimates.30 The two charts in Figure 3.7 show the mean frequency estimates of the 44 participants remaining in each of the recall conditions as a function of the presentation frequencies. A first visual assessment of the estimates in the condition without recall test shows that the data display all features that were predicted by the PASS model irrespective of possible parameter settings: First, the judgments on associated and non-associated items increase with rising presentation frequencies. Second, this increase is largely unaffected by the associations between the items, so that the graphs for associated and non-associated items show a similar trend. Third, for all presentation frequencies the associated items receive higher estimates than the non-associated ones. Collapsing over presentation frequencies, the mean for associated items is 4.5 and the mean for non-associated items is 3.8. The effect size of the associations on the frequency estimates is r_effect size = 0.17, that is, according to Cohen's (1992) conventions, a small to medium effect. Altogether, the experimental frequency estimates in the condition without a preceding recall test thus seem to correspond well to the predictions of the PASS model.

30 The estimates of these participants were, for at least one of the presentation frequencies 2, 5 and 8, more than 1.5 interquartile ranges above the 75th percentile of the corresponding reference group.

Figure 3.7.
Mean estimated frequencies for associated and non-associated words in the conditions without (left graph) and with (right graph) a preceding recall test in Experiment 1. Error bars denote standard errors. The dashed lines represent the actual frequencies.

In the condition with recall test the data deviate from this pattern in several respects. First, the estimates given after the recall test are generally lower. The overall means of the estimates in the conditions with and without recall test amount to 3.6 and 4.2, respectively. The effect size of the recall test is r_effect size = 0.18. Furthermore, the differences between the recall conditions increase with rising presentation frequencies. Collapsing over associated and non-associated items, the means for the presentation frequencies 2, 5, and 8 differ between the recall conditions by 0.1, 0.7, and 1.3. So far, the findings on the effect of the recall test on frequency estimates thus replicate the results of Erickson and Gaffney (1985).

Moreover, in the condition with recall test the trend of the estimates differs between associated and non-associated items. As a result of these differences, the data pattern in this condition apparently disagrees with the predictions of the PASS model. It is true that in the condition with recall test, too, associated items receive on average higher estimates than non-associated ones (M_associated = 3.8; M_non-associated = 3.4). With r_effect size = 0.18, the effect of associations on the estimates is as great here as in the condition without recall test. On the other hand, the estimates on associated items show a clearly smaller increase from presentation frequency 2 onwards than the judgments on non-associated items. As a consequence, for the presentation frequency 8 associated items receive lower estimates than non-associated ones.
Such differences in the trend of judgments for associated and non-associated words are not expected by PASS, irrespective of the parameter settings.

Now, how well exactly does PASS predict the trend of the estimates in the four conditions resulting from the combination of the factors 'association' and 'recall test'? To shed light on this, we determined a fit-measure for each of the individual conditions. The 20 frequency estimates of each participant were correlated with the mean predictions of the PASS model for the four presentation frequencies in the corresponding condition.31 Thus, the resulting correlations indicate for each participant how well the trend in the estimates of this participant is predicted by PASS, inclusive of the entire individual error variance.

Table 3.2 displays the correlations in the four conditions averaged across participants for a simulation with parameter values of medium magnitude (θ1 = 0.025, window length = 5). For estimates given without an intermediate recall test, high fit-values arise, with a mean correlation of approximately r = 0.8 for both associated and non-associated items. A t-test shows that the correlations for associated and non-associated items do not differ significantly from each other, t(42) = 1.24. Thus, an analysis at the level of individual participants confirms the result already indicated by the mean estimates: As predicted by PASS, the estimates on associated and non-associated words follow a similar trend.

31 In other words, the predictions of the PASS model for the individual presentation frequencies each time resulted from the mean responses over the 24 study lists presented in the corresponding condition. However, remember that the study lists in the conditions with and without recall test only differed from each other in the random assignment of items to presentation frequencies.
Correspondingly, there are no systematic differences between the conditions with and without recall test in the PASS predictions. Thus, it is irrelevant for the resulting fit-measures whether the PASS predictions are based on all 48 associated and 48 non-associated study lists or whether they are, as in the reported analysis, additionally determined separately according to recall conditions.

Table 3.2
Mean correlations between estimated frequencies and PASS mean responsesᵃ for associated and non-associated items in Experiment 1

                    No Recall    Recall
Associated             .77         .71
Non-associated         .81         .81

ᵃ Parameter values for the PASS simulation: θ1 = 0.025, θ2 = 0.0001, θ3 = 0.0000001, window length = 5

However, this does not hold for estimates given after a recall test. In this case, the correlation between the PASS predictions and the frequency estimates is smaller for associated words than for non-associated ones, t(42) = 2.94, p < 0.005. Hence, after a recall test, the trends of the estimates on associated and non-associated items differ from each other. Apparently, PASS can map the trend of estimates on associated words only comparatively poorly.

Since PASS expects a similar trend of estimates in all conditions irrespective of the parameter combinations (see Figure 3.2), the fit-values hardly vary with the parameter settings. For example, in the condition without recall test the mean correlations for non-associated items lie between r = 0.80 and r = 0.81 for all 60 parameter combinations. As a result, the differences in the fit-values between the conditions are also nearly invariant across all parameter settings. Thus, the recall test affects the estimates on associated words in a way not predicted by PASS, independent of the parameter settings.

The fit-measures reported so far refer to the predicted trend of estimates across the presentation frequencies.
With that, they do not yet take into account the differences predicted by PASS in the relative magnitude of estimates for associated and non-associated items. In order to determine the co-variation between the estimates and the predictions for all eight conditions that result from the combinations of the factors 'association' and 'presentation frequency', contrast analyses (Rosenthal, Rosnow, & Rubin, 2000) were carried out. To obtain a single effect size for the PASS predictions, it was necessary to treat the data as if a complete between-subjects design had been employed. The resulting effect sizes can be interpreted as the correlation between the eight mean predictions of the PASS model and the individual mean estimates of all participants. That is, the variance between participants enters these effect sizes as error.

Table 3.3 contains the results of the contrast analyses for the lowest, the medium and the highest settings for θ1 and the window length in the condition without preceding recall test. In the condition with recall test the effect sizes are similar despite the different trend of the estimates for associated and non-associated words, which was not expected by PASS. This can be ascribed to the fact that generally lower variances occurred in the condition with recall test.32 Thus, irrespective of the predictions, the effect sizes in this condition are impaired by less error variance than in the condition without recall test. Therefore, the effect sizes cannot be meaningfully compared between the two recall conditions. However, the preceding analysis of the trends of the estimates already demonstrated that in the condition with recall test differences between associated and non-associated words appeared that cannot be accounted for by PASS. Consequently, only the total fit in the condition without recall test is described in detail below.
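The note to Table 3.3 states that F values can be obtained by dividing MScontrast by MSerror. The following sketch simply carries out that division for the three tabled parameter settings. The dictionary layout and the names `table_3_3` and `contrast_f` are introduced here for illustration; the sketch does not attempt to reproduce the reported effect sizes, whose exact computation is not given in this section.

```python
# MScontrast and MSerror as listed in Table 3.3 (condition without recall
# test). Keys are (theta1, window length); the layout is an assumption made
# for this illustration.
table_3_3 = {
    (0.005, 3): {"ms_contrast": 290.62, "ms_error": 5.12},
    (0.025, 5): {"ms_contrast": 855.19, "ms_error": 5.12},
    (0.05, 7): {"ms_contrast": 884.68, "ms_error": 5.12},
}

def contrast_f(ms_contrast, ms_error):
    """F value of a contrast: mean square of the contrast over error mean square."""
    return ms_contrast / ms_error

f_values = {
    params: contrast_f(row["ms_contrast"], row["ms_error"])
    for params, row in table_3_3.items()
}

for (theta1, window), f_value in f_values.items():
    print(f"theta1 = {theta1}, window length = {window}: F(1, 168) = {f_value:.2f}")
```

The resulting F values are roughly 56.8, 167.0 and 172.8, which mirrors the pattern in the effect sizes: a clear gain from the lowest to the medium setting and almost none from the medium to the highest setting.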
As can be seen in Table 3.3, high effect sizes of r_effect size = 0.77 result for the medium parameter combination (θ1 = 0.025, window length = 5) as well as for the highest parameter combination used (θ1 = 0.05, window length = 7). In other words, with these parameter settings PASS predicts the overall pattern of frequency judgments across the presentation frequencies and the two levels of association very well. For the lowest parameter combination (θ1 = 0.005, window length = 3) the effect size is clearly smaller, with r_effect size = 0.46. Recall that the predictions of various parameter settings mainly differed in the expected influence of the associations on the magnitude of estimates. With low parameter settings, that is, with a comparatively poor encoding of the study lists by FEN, this influence is obviously overestimated. The predicted differences between the estimates on associated and non-associated items turn out to be too great here.

32 This particularly holds for the estimates on associated words. Here, the standard deviations between the participants' mean judgments on the four levels of presentation frequency were 0.58, 1.54, 1.89 and 2.01 in the condition with recall test, and 0.83, 1.50, 3.70 and 3.61 in the condition without recall test.

Table 3.3
Contrast analyses with predictions from three different settings for θ1 and the window length for the estimates in the condition without recall test in Experiment 1

Parameter settingsᵃ                MScontrast    MSerror    dferror    r_effect size
θ1 = 0.005, window length = 3        290.62        5.12       168          .46
θ1 = 0.025, window length = 5        855.19        5.12       168          .77
θ1 = 0.05, window length = 7         884.68        5.12       168          .77

Note. F values can be calculated by dividing MScontrast by MSerror.
ᵃ θ2 = 0.0001; θ3 = 0.0000001

The development of the effect sizes over the three parameter combinations contained in Table 3.3 also holds in general: the fit-values clearly increase from lower to medium parameter settings.
On the other hand, a further increase of parameter settings generally brings about no change in the fit-values. The reason for this is that the predictions for parameter settings of medium magnitude correlate strongly with the predictions for high parameter settings. For example, the correlation of the predictions for the medium parameter combination in Table 3.3 with the predictions of the highest parameter combination used amounts to r = 0.98. Nevertheless, even the predictions for these parameter settings differ in the expected degree of the impact of the associations on the estimates. This has no effect on the correlations between the predictions since, for parameter settings in the higher range of values, the variance of the predicted estimates predominantly goes back to the variation of the presentation frequencies (see also chart D in Figure 3.2). Compared to this, the differences in the predicted distances between estimates on associated and non-associated words are too small to lower the correlation substantially.

If one regards the expected influence of the association on the magnitude of the estimates in isolation, the differences between the predictions of various parameter combinations are nonetheless considerable even in the higher range of values. Averaging over presentation frequencies, the highest parameter combination predicts that the estimates for associated words are 8% above the estimates for non-associated words. In contrast, with the medium parameter combination a difference of 23% between the estimates for associated and non-associated words is expected. In fact, the estimates of the participants differed by 15% between the two levels of association. Owing to the trade-off between the learning parameter θ1 and the window length, identical or similar differences between associated and non-associated words are predicted with several parameter combinations. One of these combinations is θ1 = 0.04 and window length = 5.
Figure 3.8 illustrates the correspondence between the PASS predictions for this parameter combination and the estimates averaged over the participants in the condition without recall test. In the figure both the estimates and the responses from FEN are standardized in such a way that the means for associated words with the presentation frequency 8 assume the value 1. A nearly perfect fit is the result. The correlation of the participants' mean estimates with the responses is r_alert = 0.99 (Rosenthal, Rosnow, & Rubin, 2000).

Taken together, the analyses reported so far indicate that PASS is able to predict the pattern in the estimates in the condition without recall test extraordinarily well. And yet, in this condition too the judgments display a feature that was not expected by PASS: In contrast to the means, the medians of the estimates for associated items hardly differ from the medians of the estimates for non-associated items for all presentation frequencies. Figure 3.9 illustrates the distributions of the estimates in both conditions. The mean estimates per participant collapsed over presentation frequencies served as the dependent variable here. The distribution of judgments on associated words is strongly skewed to the right. Accordingly, the median of the judgments is clearly below the mean here (Md_associated = 3.6). A comparably strong asymmetry in the distribution of estimates cannot be found for non-associated words. For this type of item the median is identical with the mean (Md_non-associated = 3.8). As a consequence, the medians in both conditions do not display an associative effect.33

33 In the condition with recall test, the distributions of the judgments are approximately symmetrical for both associated and non-associated items, and the medians only differ marginally from the means. For the mean estimates per participant a median of 3.7 (mean 3.8) results for associated words and a median of 3.4 (mean 3.4) for non-associated words.
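The standardization used for Figure 3.8 can be read as a rescaling in which every condition mean is divided by the mean of the reference condition (associated words, presentation frequency 8). The sketch below illustrates this reading; it is an assumption, since the text only states that the reference condition assumes the value 1. The condition means and the names `mean_estimates` and `standardize` are hypothetical.

```python
# HYPOTHETICAL condition means keyed by (association, presentation frequency).
# The numbers are invented for illustration and are NOT the reported data.
mean_estimates = {
    ("associated", 0): 0.8, ("associated", 2): 3.0,
    ("associated", 5): 6.0, ("associated", 8): 8.0,
    ("non-associated", 0): 0.6, ("non-associated", 2): 2.5,
    ("non-associated", 5): 5.0, ("non-associated", 8): 7.0,
}

def standardize(condition_means, reference=("associated", 8)):
    """Divide every condition mean by the mean of the reference condition,
    so that the reference condition itself takes the value 1."""
    ref_value = condition_means[reference]
    return {cond: value / ref_value for cond, value in condition_means.items()}

standardized = standardize(mean_estimates)
for condition, value in standardized.items():
    print(condition, round(value, 3))
```

Applying the same rescaling to the participants' mean estimates and to the mean responses from FEN puts both on a common 0 to 1 scale, so that the eight condition means can be plotted against each other and correlated.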
Figure 3.8. Standardized mean estimated frequencies in the condition without recall test plotted against standardized mean responses from FEN in Experiment 1. The mean responses result from a simulation with the parameter settings θ1 = 0.04, θ2 = 0.0001, θ3 = 0.0000001 and window length = 5. The line indicates the best linear fit.

According to PASS, differences in the magnitude of the estimates on associated and non-associated items should appear in the medians as well as in the means. Irrespective of parameter settings, PASS expects approximately symmetrical distributions for both associated and non-associated words with all presentation frequencies. Correspondingly, predicted differences between the mean judgments on associated and non-associated words are generally accompanied by similar differences between the median judgments. For example, in the simulation with the parameter settings θ1 = 0.04 and window length = 5, which predicts the mean estimates of the participants almost perfectly, the median as well as the mean of all responses for associated items amounts to 3.2. For non-associated words, too, the mean and the median turn out to be identical, at 2.7. The lack of an associative effect in the medians of the estimates is thus inconsistent with the prediction derived from PASS.

Figure 3.9. Distributions of the mean estimates on associated and non-associated words in the condition without recall test in Experiment 1. The dotted lines indicate means.

Discrimination of Frequencies: To determine the discriminability of the frequencies of associated and non-associated words, the correlations between the 20 frequency judgments of each participant and the actual presentation frequencies were calculated.
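The discrimination measure described here can be sketched for a single participant as follows. The actual frequencies follow from the design as reported in this chapter (five judged items on each of the levels 0, 2, 5 and 8, which gives 20 judgments and 75 presentations in total). The judgments themselves are hypothetical, the helper `pearson_r` is written out here for illustration, and the use of a Pearson correlation is an assumption, since the text speaks only of "correlations".

```python
import math

# Actual presentation frequencies: five judged items on each of the four
# levels (0, 2, 5, 8), i.e. 20 items and 75 presentations in total.
actual = [0] * 5 + [2] * 5 + [5] * 5 + [8] * 5

# HYPOTHETICAL judgments of one participant, invented for illustration only.
judged = [0, 1, 0, 0, 2, 2, 3, 1, 4, 2, 5, 4, 7, 6, 3, 9, 6, 8, 12, 7]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

r_participant = pearson_r(actual, judged)
print(f"r = {r_participant:.2f}")
```

Averaging such per-participant correlations within each condition yields the kind of mean correlation that is compared between associated and non-associated words.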
Figure 3.10 displays the means of these correlations in the conditions without and with recall test.34 For both associated and non-associated words, fairly high correlations result in both recall conditions. That is, the participants generally show great sensitivity to the frequencies of the presented items. However, as predicted by the PASS model with all parameter combinations, the association between the items impairs this sensitivity. In both recall conditions the correlations are lower for associated words than for non-associated ones. Without a preceding recall test, this difference is small. For associated items the mean correlation here amounts to r = 0.77, for non-associated items to r = 0.80. Such slight differences in the discriminability of associated and non-associated words were expected by PASS with a multitude of parameter settings (see Fig. 3.5). The effect size of the association on the correlations in this recall condition also turns out to be rather small, with r_effect size = 0.14.

34 The data of those participants that had been excluded from the analysis of the relative magnitude of frequency judgments are included in the analysis of the discrimination of frequencies. The judgments of these participants were unusually high but did not differ notably from the values of the remaining participants with regard to their correlation with the actual frequencies.

In the condition with a preceding recall test the difference is notably greater. Here, the participants discriminate the frequencies of associated words clearly worse, which was already suggested by the comparatively slight increase of mean estimates across the presentation frequencies. For these words the mean correlation is r = 0.69. For non-associated words the mean is again r = 0.80. The effect size of the association in the condition with recall test comes to r_effect size = 0.45, a large effect according to Cohen's convention.
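To relate these correlations to the ratio predictions of the PASS simulations, the relative gap between the two item types can be computed directly. The sketch below uses the rounded mean correlations given above; the function name `relative_deficit` is introduced here for illustration, and expressing the gap relative to the correlation for non-associated words is an assumption about how the percentages in this section were obtained.

```python
def relative_deficit(r_associated, r_non_associated):
    """Percentage by which r_associated falls below r_non_associated."""
    return 100.0 * (r_non_associated - r_associated) / r_non_associated

# Rounded mean correlations as reported for Experiment 1.
no_recall = relative_deficit(0.77, 0.80)  # condition without recall test
recall = relative_deficit(0.69, 0.80)     # condition with recall test

print(f"without recall test: {no_recall:.2f}%")
print(f"with recall test:    {recall:.2f}%")
```

With the rounded inputs this gives about 3.75% and 13.75%, close to the 3.8% and 13.8% stated in this section; the small discrepancy presumably reflects that the reported percentages rest on unrounded correlations.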
Figure 3.10. Mean correlations between estimated and actual presentation frequencies for associated and non-associated items in Experiment 1. Error bars denote standard errors.

In the ideal case, the PASS model should of course predict the relative differences in the discriminability of associated and non-associated words best with the parameter combination that attained the best fit for the predictions concerning magnitude and trend of the estimates. Due to the similarity of the predictions of various parameter settings, an unequivocally best parameter combination can be determined neither for the discrimination nor for the magnitude of judgments. However, in the previous section it was shown that medium to high parameter settings yield the best fit with regard to the differences in the magnitude of estimates on associated and non-associated items. With parameter settings from this range of values PASS expects that the correlations between estimated and actual frequencies for associated items turn out to be about 1% to 4% smaller than for non-associated items (see Fig. 3.5). In fact, the correlations for associated and non-associated items in the condition without recall test differ from each other by 3.8%. In other words, in this condition similar parameter combinations result in good predictions both with regard to the magnitude of estimates and with regard to the discrimination.

This does not hold for the condition with a preceding recall test. Here the correlations for associated words are 13.8% below the correlations for non-associated words. Such great differences in the discriminability of associated and non-associated items are predicted by PASS only with low parameter settings. However, with these settings the difference in the magnitude of estimates on associated and non-associated items is at the same time clearly overestimated.
Thus, for example, PASS correctly predicts the difference in the correlations in the condition with recall test with the parameter combination θ1 = 0.025 and window length = 3. This parameter combination, however, also results in the prediction that the mean frequency estimates for associated words are 54% above the frequency judgments for non-associated words.

A final remark on the discrimination of frequencies and their prediction by PASS is in order: As we have seen above, the participants generally discriminate fairly well between different frequencies. Nevertheless, the participants' discrimination performance is worse in all conditions than expected by the PASS model. For non-associated words PASS predicted a mean correlation between estimated and actual frequencies of 0.85 even with the lowest parameter combination. All other parameter settings resulted in values above 0.90. Even for associated words, correlations that were without exception clearly above 0.80 already resulted with relatively low parameter settings (see Fig. 3.4). In other words, while PASS predicts the relative difference in the correlations for associated and non-associated words correctly, at least in the condition without recall test, it at the same time overestimates the absolute magnitude of the correlations.

Discussion

The main objective of the present study was to clarify whether the PASS model can explain possible effects of the associations between jointly presented items on frequency judgments on these items. In this respect we examined both the magnitude of the estimates and the discrimination of the actual frequencies for lists of associated and non-associated words. Let us first consider the results in the condition in which no recall test was carried out between the presentation of the study lists and the submission of frequency judgments.
Estimates in the Condition without Preceding Recall Test

In this condition the associated words received higher mean frequency judgments than the non-associated ones. This effect was independent of the presentation frequency of the items, so that the estimates on associated and non-associated words had a nearly parallel trend. Besides, the participants discriminated worse between the frequencies of associated words than between the frequencies of non-associated ones. However, the difference between the mean discrimination performances in both conditions was modest.

In addition, the flat increase in the decision times demonstrated that the participants relied on a memory assessment strategy when they generated their judgments. This confirms the results of earlier studies, which likewise found no indication of the use of an enumeration strategy for judgments on the frequency of occurrence of items (Brown 1995; Dougherty & Franco-Watkins, 2003; Manis et al., 1993; Marx, 1985). Thus, we can conclude that the effects of associations on frequency judgments are mediated via the process of memory assessment. Therefore, PASS, which was originally developed specifically for the purpose of modeling frequency judgments based on memory assessment, should be able to explain these effects.

The result pattern observed in the condition without recall test indeed corresponded exactly to the invariant predictions of the PASS model. The preceding simulations had shown that, independent of possible parameter settings, PASS expects that the frequencies of associated words are estimated higher and are, at the same time, discriminated worse than the frequencies of non-associated words. The trend of the estimates should not be influenced by the associations between the words. Consequently, the parameter-independent, ordinal predictions of the PASS model are in accordance with the data.
Furthermore, we could demonstrate that with medium to high parameter settings PASS is also able to predict various features of the estimates very well quantitatively. Thus, high fit-values resulted for the trend in the estimates of individual participants for both associated and non-associated words. Very large effect sizes also resulted for the total fit between the PASS predictions and the participants' estimates for associated and non-associated words on the four levels of presentation frequency. For the mean estimates this fit was even almost perfect. Moreover, the parameter settings that obtained these high fit-values also predicted the difference in the discrimination performance for associated and non-associated items approximately correctly. Thus, the best fit-values were without exception achieved with parameter settings from the higher range of values, which simulated a relatively good encoding of the study lists. This also means that, from the model's point of view, the influence of the pre-experimentally acquired associations on the frequency judgments was smaller than it could have been, since the participants showed a relatively good learning performance in the experiment.

To summarize, the analysis of the quality of the predictions of the PASS model in the condition without recall test so far led to quite impressive results. However, the participants' estimates also revealed a feature that deviated systematically from the predictions of the model: The medians of the estimates on associated and non-associated items were approximately of the same size. That the medians, unlike the means, showed no associative effect was due to the right-skewed distribution of the estimates for associated items. In contrast, the responses from FEN were roughly symmetrically distributed for both types of items.
Differences between the distributions of estimates and responses alone are, however, not yet critical for the assumption that the responses form the basis for intuitions of event frequencies. As already explained above, memory assessment first of all results in a non-numerical intuition of frequencies, which is modeled in PASS by FEN's responses. In a second step, this intuition has to be transformed into a numerical judgment. In experiments in which, as in the present study, no upper bound of the judgment range is given, right-skewed distributions of frequency judgments are frequently observed (Brown, 1995; 1997; 2002). This is mostly explained by the fact that, due to the lack of an upper limit, the participants can deviate arbitrarily far upwards from the actual frequencies when scaling their absolute estimates. Deviations downwards are, however, limited by the minimal frequency 0. Thus, more extreme deviations can be expected upwards than downwards (Conrad & Brown, 1994; Brown, 1995).

Against the background of this consideration it cannot necessarily be expected that the distributions of the frequency judgments are as symmetrical as the distributions of the responses from FEN. What could nevertheless be expected according to the PASS predictions is that the distributions of the estimates for associated and non-associated items turn out similar. The reason for this is that, when deriving the PASS predictions, we assumed that participants encoding associated words carry out the transformation into absolute frequencies in the same way as participants encoding non-associated words. Since, furthermore, the responses from FEN are distributed equally for both kinds of items, equal distributions in the estimates should consequently result. This also leads PASS to predict that differences in the estimates for associated and non-associated items should be of a similar size in medians and means.

Why did the data not meet this prediction?
In the experiment the participants were not provided with any information about the frequency of any single item or the length of the study list. Consequently, when giving their estimates, the participants had to infer the range of the absolute word frequencies. Generally, only little is known about how people transform their intuitions of event frequencies into estimates of absolute frequencies if no indications of an adequate judgment scale are given. The assumption that the participants carry out these transformations in the same way in both associative conditions was based on the fact that associated and non-associated words were presented in the same way and that the judgments on the frequencies of those words took place with identical instructions and under identical conditions. Equal transformations in both associative conditions could perhaps develop if participants drew on earlier experience of which memory responses corresponded to which absolute frequencies in comparable situations. Indeed, the difference in the means of the frequency judgments between the two associative conditions supports the assumption that, in the condition with associated words, at least some participants carried out the transformation in the same way as the participants in the condition with non-associated words, and thus transformed higher responses into higher estimates. In retrospect, however, the assumption that all participants behaved like this seems doubtful. An alternative possibility is that some participants could use information they had gained independently of the process of memory assessment when transforming their intuitions into frequency judgments. For example, it is conceivable that participants happen to know³⁵ the frequency of a single item and can thus find an adequate judgment range for the remaining items (Brown & Siegler, 1993).
A further possibility is that participants can derive a plausible value for the total number of item presentations from the duration of the individual presentations and the duration of the presentation of the total study list. Irrespective of whether and how often participants in the condition with non-associated words could use such kinds of information, these participants inferred the absolute frequency of the items approximately correctly. The mean estimates for non-associated words correspond to the actual frequencies quite well (see Fig. 3.7). The sum of the 20 estimates given by each participant amounts in mean and median to 76.1 and corresponds nearly exactly to the total number of 75 presentations on the study list. A consequence of this is that an identical transformation of responses into estimates would have led the participants in the condition with associated items to estimates that would have been too high. Participants in this condition who could correctly infer the frequency range, perhaps by knowing the absolute frequency of a single item, should thus behave differently: They should choose a different transformation than, but the same (correct) scaling of their estimates as, the participants in the condition with non-associated items.

³⁵ This could be the case when participants try to count the presentation frequency of at least a single item during the study phase. It is well known that the frequencies of numerous items presented experimentally are not all counted successfully (Brown, 2002; cf. footnote 1). Nevertheless, Brown’s results (1997) support the assumption that accurate tallies of single items can occasionally be used by participants to infer an adequate frequency range for the remaining items. Another possibility would be that the participants are able to recollect various encoding episodes for at least one item and thus find the absolute frequency of this item correctly via enumeration.
Thus, despite higher responses, these participants would give estimates equally high as those of participants who encoded non-associated items. Such a group of participants could produce the lower estimates for associated words and could therefore prevent a difference between estimates for associated and non-associated items from also showing up in the medians. The explanation that the different distributions of the estimates for associated and non-associated items are based on different transformations hinges critically on the fact that the factor ‘association’ was manipulated between-subjects in Experiment 1. The situation should change if participants in a within-subjects design give judgments on associated and non-associated items that were previously presented jointly on one study list. In this case, every single participant should perceive the differences in the intuitions of event frequencies between both types of items predicted by FEN. Even if participants had additional pieces of information about the absolute frequency of single items or about the length of the list, this should merely lead them to infer a corresponding judgment scale for their absolute estimates. This judgment scale should then be applied to both associated and non-associated items. Consequently, differences in the intuitions for both types of items should in this case be converted into differences in the estimates, irrespective of whether additional knowledge about an adequate judgment scale is available or not. Hence, an associative effect occurring in FEN’s responses should, in a within-subjects design, be found in both the medians and the means of the frequency estimates. We tested this prediction in a second experiment in which associated and non-associated items were presented jointly on one study list to the participants.
It should be pointed out that even in a within-subjects design it is not necessarily to be expected that the distributions of frequency judgments show the same shape as the distributions of the responses in FEN. While it is assumed that a given participant carries out the transformations in the same way for all items to be assessed, different participants could perfectly well use different transformations (e.g. due to dissimilar knowledge that can be used to develop a subjectively suitable judgment scale). Since, in the transformation process, more extreme deviations from the actual frequencies are generally possible upwards than downwards, right-skewed distributions of the estimates can furthermore appear even if the distributions of the responses are symmetrical. Thus, the prediction for the within-subjects design only states that the distributions of the estimates for associated and non-associated items should have the same shape as long as FEN’s responses for both types of items are also distributed alike.

Estimates in the Condition with Preceding Recall Test

In the condition with recall test the estimates turned out clearly lower than in the condition without recall test for both associated and non-associated items. In addition, the differences in the magnitude of the estimates between the recall conditions increased with rising presentation frequencies. In this respect, the results of the present experiment replicate the findings of a previous study by Erickson & Gaffney (1985), which also examined the influence of a preceding recall test on frequency judgments. The differences between the estimates in the two recall conditions also correspond to the typical differences between estimates based on memory assessment and estimates generated by enumeration. Like the estimates in the condition with recall test, estimates based on enumeration mostly turn out lower than estimates based on memory assessment and show a smaller slope (Brown, 2002).
However, the decision times contradict the obvious hypothesis that a recall test fosters the use of an enumeration strategy and thus produces the differences in the magnitude of the estimates. The preceding recall test did slow the decision times marginally, but no interaction between the recall conditions and the presentation frequencies was found. Accordingly, the sharp increase in decision times with rising presentation frequencies, which is typical of judgments based on enumeration, did not appear even after the recall test. Hence, the decision times support the conclusion that the participants applied memory assessment in both recall conditions to generate their estimates. This evidently also means that a model of the memory assessment process should be able to explain the estimates in the conditions both with and without a recall test. The differences between the recall conditions in magnitude and slope of estimates described so far do not yet represent an essential problem. With the additional assumption that the recall test causes the participants to choose a smaller scaling when transforming their intuitions into absolute estimates, such differences could be integrated into the PASS prediction. Problems for the PASS model do arise, however, from the fact that the findings concerning the impact of associations on estimates also differed between the recall conditions. It is true that, as expected, higher judgments for associated items were also found after a recall test, but the estimates for associated and non-associated items did not run parallel. For associated words the estimates increased clearly more slowly from presentation frequency 2 onwards. This trend of judgments on associated words matched the PASS prediction only comparatively poorly. Furthermore, the participants discriminated the frequencies of associated items clearly worse than those of non-associated items.
Therefore, it was impossible to find parameter combinations with which PASS predicts both the differences in discrimination performance and the differences in the magnitude of the estimates approximately correctly. From the model’s point of view, these findings are surprising. They indicate that the recall test specifically affected the estimates for associated items in a way not covered by the model. Presumably, this influence can be integrated into PASS only with difficulty. The influence of the recall test cannot take effect during the encoding phase, since the recall conditions in the experiment did not differ until the presentation of the study lists was completed. Consequently, it is not possible to map the impact of the recall test in PASS by assuming a changed learning parameter or a changed size of the attentional window. Erickson and Gaffney (1985) assumed that the recall test could take effect on the level of memory representations. They presumed that the successful recall of an item alters the memory representation of this item in a similar way as an additional presentation of this item on the study list. This assumption about the effect of a recall test on frequency judgments could in fact be easily modeled with PASS, but it disagrees with the results found. If the successful recall of an item had a similar effect as an additional presentation, one would expect higher rather than lower estimates after a preceding recall test. Moreover, in our experiment as well as in previous studies (e.g. Deese, 1959; Cofer, 1965) words presented more frequently were recalled with a higher probability. Correspondingly, the estimates in the condition with recall test should have increased more steeply than in the condition without test. Finally, associated words were also recalled more often in the experiment than non-associated ones.
Hence, the difference in the magnitude of estimates for associated and non-associated words would have had to be greater after the recall test. None of these predictions held. Therefore, it has to be assumed that the impact of the recall test takes effect only at the time when the judgments are given. Presently, PASS does not offer any possibility to account for such influences. The model specifies merely one invariant process for retrieving frequency information from memory. The resulting responses should always be converted into absolute frequency estimates with a linear transformation. In situations in which additional influences take effect at the time when judgments are given, the model can thus not be applied. If the findings from the present experiment were replicable, then a preceding recall test would constitute such a limitation of the applicability of PASS. Therefore, we decided to incorporate the recall condition in the second experiment as well.

Study 2

For the most part, Experiment 2 represents a replication of the previous study. The essential modification is that the factor ‘association’ was manipulated within-subjects in Experiment 2. Associated and non-associated items were presented together on one study list and every participant judged the presentation frequencies of all items. This change in the design was mainly motivated by the observation in Experiment 1 that the estimates for associated and non-associated items were distributed unequally. We assumed that this deviation from the PASS prediction was due to the fact that, in a between-subjects design, the intuitions of frequencies of associated and non-associated items were converted into estimates in a different manner. Provided that this assumption is correct, in a within-subjects design the distributions of the estimates should correspond to the PASS predictions.
In addition, the use of the within-subjects design allows us to examine whether the effects of the associations on the magnitude of estimates and the discrimination of frequencies from Experiment 1 also occur with mixed lists. The frequency estimates in the second experiment were the subject of a new simulation with PASS. We carried out this simulation in order to check whether the changed arrangement of the study lists alters the basic predictions of the model concerning the effect of associations on frequency estimates. The results of this simulation furthermore enable us to determine again the quantitative fit between the experimental estimates and the predictions of the model. In the following section the construction of the study lists and the procedure of the second experiment are described. Afterwards, we turn to the simulation method and the simulation results. Finally, the results of the experiment are reported and compared with the PASS predictions.

Method

Participants: Forty-eight persons took part in the experiment (23 men and 25 women with an average age of 23 years). None of them had taken part in Study 1. The participants were mainly students of the TU Chemnitz from different departments. Students of Psychology did not take part in the experiment. The participants received a reimbursement of 3 Euros.

Material: In the experiment two item sets with 20 words each were used. Both item sets contained ten words strongly associated with each other and ten words as weakly associated with all other words in the set as possible. The ten associated items were taken from the two sets with associated words from Experiment 1. We used the stimulus words Stuhl (chair) and Fluss (river) and those nine words which were, according to PASS, most strongly associated with these stimulus words. Both sets were completed by ten words from the corresponding non-associated sets from Experiment 1.
Here, the words admitted to the sets were those which, according to PASS, showed the weakest association with “Stuhl” or “Fluss”. Table 3.4 shows the two resulting item sets.

Table 3.4
Word lists used in Study 2. English translations of the original German words are given in parentheses.

Set 1
Associated words: Stuhl (chair), Tisch (table), Sessel (easy-chair), sitzen (to sit), setzen (to sit down), Hocker (stool), Möbel (furniture), Sofa (sofa), rutschen (to slide), Bett (bed)
Non-associated words: mutig (courageous), Sinn (sense), Bereich (area), sauer (sour), Gen (gene), ragen (to tower), Start (start), urteilen (to judge), Protest (protest), vernichten (to destroy)

Set 2
Associated words: Fluss (river), Wasser (water), Strom (stream), Ufer (bank), tief (deep), breit (broad), schwimmen (to swim), Fisch (fish), Brücke (bridge), überqueren (to cross)
Non-associated words: Münster (German city), Wand (wall), Ohr (ear), Horn (horn), Herbst (fall), Brief (letter), leiden (to suffer), Sport (sport), Herrschaft (power), Druck (pressure)

To quantify the associative strengths between all words in the item sets, we used the PASS model again. PASS learned these associations, as far as this had not already happened within Study 1, during the simulation of the pre-knowledge required for the predictions. For each word the mean association with the 19 remaining words from the same item set was calculated. After that, the mean and the median for the 10 associated words as well as for the 10 non-associated words were calculated from these means. According to PASS, the means and medians of the associated words are in both sets about 4 times larger than the means and medians of the non-associated words. For each participant a unique study list was created from one of the item sets. Each study list contained five words from the corresponding item set at each of the presentation frequencies 0, 2, 5, and 8.
For half of the participants three associated and two non-associated words each were assigned to the presentation frequencies 0 and 5, while the presentation frequencies 2 and 8 were filled with two associated and three non-associated words each. For the other half of the participants the distribution of associated and non-associated items across the presentation frequencies was reversed. Apart from these restrictions, associated and non-associated words were randomly assigned to the presentation frequencies for every participant. The sequence of items was also randomly determined for every participant, with the restriction that the same word could not appear twice in immediate succession. Thus, associated and non-associated words appeared randomly mixed on the study lists. The same held for the test lists, which contained the 20 words of the corresponding item set once again in random sequence.

Procedure: The procedure was identical to the one used in Experiment 1.

Simulation Method and Results

The simulations with PASS followed the same procedure as in Study 1. First of all, PASS learned the additionally required associative pre-knowledge from the text corpus. As before, the parameter settings were θ1 = 0.005, θ2 = 0.0001 and θ3 = 0.0000001. The window length of 12 words was also retained. Subsequently, the presentation of the study lists was modeled. Here PASS encoded the same 48 study lists that were presented to the participants in the experiment. This learning trial was run with the 60 different parameter combinations also used in Study 1.

Relative magnitude of responses: The simulation results for the two item sets did not differ systematically and are thus summarized below. Figure 3.11 displays a typical PASS prediction about the relative magnitude of frequency judgments for associated and non-associated items. The predictions come from a simulation with parameter settings of medium magnitude for θ1 and the window length.
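The study-list construction described under Material can be sketched in a few lines. This is only an illustration under stated assumptions: the function names `assign_frequencies` and `build_sequence` are hypothetical, and the use of repeated shuffling to avoid immediate repetitions is an assumption, since the text does not say how the restriction was implemented.

```python
import random

FREQUENCIES = (0, 2, 5, 8)
WORDS_PER_LEVEL = 5

def assign_frequencies(associated, non_associated, reversed_split, rng):
    """Randomly map each word to a presentation frequency (hypothetical helper)."""
    # Associated words per frequency level for one half of the participants;
    # the other half receives the reversed split.
    n_assoc = {0: 3, 2: 2, 5: 3, 8: 2}
    if reversed_split:
        n_assoc = {f: WORDS_PER_LEVEL - n for f, n in n_assoc.items()}
    assoc = rng.sample(associated, len(associated))
    non = rng.sample(non_associated, len(non_associated))
    assignment = {}
    for f in FREQUENCIES:
        for _ in range(n_assoc[f]):
            assignment[assoc.pop()] = f
        for _ in range(WORDS_PER_LEVEL - n_assoc[f]):
            assignment[non.pop()] = f
    return assignment

def build_sequence(assignment, rng):
    """Shuffle all presentations until no word occurs twice in a row.

    Assumption: rejection sampling; the original procedure is not documented.
    """
    items = [word for word, f in assignment.items() for _ in range(f)]
    while True:
        rng.shuffle(items)
        if all(a != b for a, b in zip(items, items[1:])):
            return items
```

With ten associated and ten non-associated words, the resulting list contains 5 × (0 + 2 + 5 + 8) = 75 presentations.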
The interference and decay parameters were θ2 = 0.0001 and θ3 = 0.0000001, respectively. In order to determine the plotted data points, the means of FEN’s responses to associated and non-associated words on a given frequency level were first calculated within each study list. Afterwards, the unweighted means across the 48 study lists were computed from these means. Finally, these means were standardized by dividing them by the mean for associated words with the frequency 8.

[Figure: standardized mean response (y-axis) plotted against presentation frequency 0, 2, 5, 8 (x-axis) for associated and non-associated items]
Figure 3.11. FEN’s standardized mean responses for the parameter settings θ1 = 0.03, θ2 = 0.0001, θ3 = 0.0000001 and window length = 5 in Study 2.

As illustrated in the figure, the essential features of the PASS prediction from Study 1 are preserved even if associated and non-associated words are presented together on one study list. PASS is sensitive to the presentation frequencies of the items. It expects higher estimates for associated than for non-associated words. The size of this effect is not influenced by the frequency of the words, so that the predicted estimates for associated and non-associated words run parallel. Changes in the parameter settings also have a similar effect on the PASS predictions as in Study 1. The estimates for associated and non-associated words show a similar trend for all parameter combinations. However, the size of the expected effect of the associations on the magnitude of the estimates depends on the settings for θ1 and the window length. With an improved encoding of the study lists, i.e. a higher θ1 and a greater window length, the impact of the associations on the judgments becomes smaller. Moreover, if forgetting processes occur, the differences in the estimates between associated and non-associated items are generally greater than if this is not the case.
The influence of the parameters θ2 and θ3, which regulate these forgetting processes, was, however, again too small to be of practical relevance. Figure 3.12 illustrates the range in which the expected associative effect varies depending on the parameter settings. The dependent variable is once more the ratio of the mean responses to associated and non-associated items on each frequency level. For all parameter settings and all presentation frequencies the resulting values are greater than 1. In other words, irrespective of the parameter settings, PASS also expects higher judgments for the associated items on a mixed list of associated and non-associated words. The only systematic difference from the predictions in Study 1 is that, given equal parameter settings, the associative effect is somewhat smaller. For example, with a parameter combination of medium magnitude (θ1 = 0.025, window length = 5) PASS here expects that the judgments for associated words, averaged over frequencies, are 19% higher than the judgments for non-associated words. In Study 1, on the other hand, the predicted estimates for associated and non-associated words with the same parameter combination differed by 23%. This difference in the expected associative effect is due to the fact that the number of jointly presented associated words on the mixed lists is smaller than in Study 1. Consequently, the mean associative strength of an associated item to all remaining words on the study list is smaller here as well. Accordingly, FEN’s response to such a word is influenced less strongly by the remaining words.

[Figure: ratios of mean responses (y-axis) plotted against presentation frequency 0, 2, 5, 8 (x-axis)]
Figure 3.12. Distributions of ratios of mean responses for associated and non-associated items in Study 2.
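The aggregation behind Figures 3.11 and 3.12 (means within each study list, unweighted means across lists, standardization by the mean for associated words with frequency 8, and the ratio of associated to non-associated means per frequency level) can be sketched as follows. The data layout and the function names are assumptions made for illustration only; FEN’s actual responses are not reproduced here.

```python
from statistics import mean

def standardized_means(lists):
    """lists: one dict per study list, mapping (item_type, frequency)
    to the list of responses in that cell (hypothetical layout)."""
    cells = {cell for lst in lists for cell in lst}
    # Step 1: mean within each list; step 2: unweighted mean across lists.
    grand = {c: mean(mean(lst[c]) for lst in lists if c in lst) for c in cells}
    # Step 3: divide by the mean for associated words with frequency 8.
    reference = grand[("associated", 8)]
    return {c: value / reference for c, value in grand.items()}

def associative_ratios(means):
    """Ratio of associated to non-associated mean response per frequency.
    Assumes non-zero means for non-associated items."""
    freqs = sorted({f for (_, f) in means})
    return {f: means[("associated", f)] / means[("non-associated", f)]
            for f in freqs}
```

A ratio above 1 on a frequency level corresponds to higher predicted judgments for associated items on that level.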
Discrimination of Frequencies: As for the relative magnitude of estimates, the model makes basically the same predictions for the discriminability of the frequencies of associated and non-associated items as in Study 1. To determine the expected quality of the discrimination on the two levels of association, the correlations between FEN’s responses and the actual frequencies of the words were calculated on each of the 48 study lists. The correlations averaged over the study lists varied for the associated words between 0.47 and 0.98, depending on the parameter settings. For non-associated words the correlations varied between 0.88 and 0.99. Generally, the correlations increased with higher settings for θ1 and the window length³⁶. Thus, a better encoding of the study lists again led to a better discrimination of the presentation frequencies. With a given parameter combination the correlations for associated items were consistently lower than for non-associated items. In other words, irrespective of the parameter settings, PASS expects that the frequencies of associated items are discriminated worse than the frequencies of non-associated items. Figure 3.13 illustrates this by means of the ratios of the mean correlations for associated and non-associated words. However, the figure also demonstrates that for the great majority of parameter combinations the correlations in the two associative conditions again differ only slightly. With parameter settings of medium magnitude PASS already expects that the correlations for associated words differ by only approximately 3% from the correlations for non-associated words. With a further improved encoding of the study lists the correlations converge further. The effect is of almost exactly the same size as in Study 1 when medium or high parameter combinations are used.
However, with low parameter settings a tendency becomes apparent for the difference between the correlations for associated and non-associated words to be slightly smaller here. This again can be attributed to the fact that the mean associative strength of an associated item to the remaining items on the study list is smaller here than in the between-subjects design of Study 1. As a result, the discrimination of frequencies of associated items is less strongly impaired by the associative connection between the words. Due to this, slightly higher correlations between FEN’s responses and the actual presentation frequencies of associated words result here, and the difference from the correlations for non-associated words is somewhat smaller. However, with medium and high parameter settings PASS already expected in Study 1 that even the frequencies of associated words are discriminated extremely well (see Fig. 3.4). Thus, the discrimination performance can hardly be improved by reducing the mean associative strength between the words. Consequently, with these parameter settings the difference between the correlations for associated and non-associated items is practically of the same size in both studies.

³⁶ The influence of the variation of the parameters θ2 and θ3 on the correlations was again very small. The tendency already observed in Study 1, according to which the occurrence of forgetting processes involved slighter differences in the discriminability of associated and non-associated items, appeared again, however.

[Figure: ratio of mean correlations (y-axis, range 0.5 to 1.0)]
Figure 3.13. Distribution of ratios of mean correlations between FEN’s responses and presentation frequencies for associated and non-associated items in Study 2.

All in all, as in Study 1 PASS expects that associated words are discriminated worse than non-associated ones.
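The discrimination measure used in this section (a correlation between responses and actual presentation frequencies on each study list, averaged over lists, then expressed as a ratio of associated to non-associated items) can be sketched as below. The sketch assumes Pearson correlations and a hypothetical data layout; the text does not state which correlation coefficient was computed.

```python
from math import sqrt
from statistics import mean

def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_discrimination(lists):
    """lists: one dict per study list, mapping item type to a list of
    (presentation_frequency, response) pairs (hypothetical layout)."""
    result = {}
    for item_type in ("associated", "non-associated"):
        per_list = []
        for lst in lists:
            freqs, responses = zip(*lst[item_type])
            per_list.append(pearson(freqs, responses))
        # Unweighted mean of the per-list correlations.
        result[item_type] = mean(per_list)
    result["ratio"] = result["associated"] / result["non-associated"]
    return result
```

A ratio below 1 indicates that the frequencies of associated items are discriminated worse than those of non-associated items.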
Provided that the quality of the encoding of the study lists by the participants does not deteriorate markedly compared to Study 1, the difference in discrimination performance should, however, be small again.

Experimental Results

Decision Times

For every participant the means and medians of the decision times were calculated for associated and non-associated items on each frequency level. The same measures were also determined for the initiation, entry and total times (the sum of the three timings). Below, only the results of the analysis of the mean decision times are reported in detail. For the remaining time measures and the medians, results similar to those in Experiment 1 were found. For all time measures the medians were slightly smaller than the means (by about 0.1 s) but showed the same effects. The mean initiation time was 1.2 s, the mean entry time 1.1 s. Both time measures increased slightly across the range of the presentation frequencies (again by about 0.3 s). An analysis of the total times yields the same pattern of results as the analysis of the decision times. Mean decision time is plotted against presentation frequency for the participants with and without a preceding recall test in Figure 3.14.

[Figure: mean decision time in seconds (y-axis) plotted against presentation frequency 0, 2, 5, 8 (x-axis) for the groups Recall and No Recall]
Figure 3.14. Mean decision time as a function of presentation frequency for groups with and without a preceding recall test in Experiment 2. Error bars denote standard errors.

As in Experiment 1, the participants in the condition with recall test took more time to generate their frequency judgments than the participants in the condition without recall test. Averaged over presentation frequencies, the decision times in the two recall conditions differed by 0.5 s. The effect size of the recall test was rather small with r_effect size = .13. Generally, the participants responded somewhat more slowly than in Experiment 1 (cf. Fig. 3.6).
However, in both recall conditions the decision times again increased only comparatively little across the range of presentation frequencies. The difference between the mean decision times for the presentation frequencies 0 and 8 amounted to 1.3 s in the condition with recall test and 1.6 s in the condition without recall test, similar to the results in Experiment 1. Hence, the data suggest again that the participants in both recall conditions employed a memory assessment strategy to generate their frequency estimates. An ANOVA with the within-subjects factors ‘presentation frequency’ and ‘association’ and the between-subjects factor ‘recall test’ reveals that, apart from the presentation frequency (η = 0.57, F(3, 138) = 22.24, p < 0.001), the association between the items (η = 0.37, F(1, 46) = 7.36, p < 0.01) also exerted a significant influence on the decision times. The main effect of the recall test, on the other hand, is not significant, and neither are any of the interactions (all Fs < 1.1 and all ηs < 0.13). The main effect of the factor ‘association’ is due to the fact that the participants generated their estimates for associated items (M = 3.7 s) slightly more slowly than the estimates for non-associated items (M = 3.4 s). However, it should be noted that the effect size of the association, r_effect size = .09, is even lower than that of the recall test if one treats the association as a between-subjects factor in the calculation (for reasons to use between-subjects effect sizes in repeated measurement designs, see Rosenthal, Rosnow, & Rubin, 2000; Dunlap et al., 1996).

Frequency Estimates

Relative Magnitude of Frequency Estimates: The estimates of every participant for associated and non-associated words on each frequency level were averaged separately. These individual means formed the basis of the further analysis.
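As a plausibility check on the ANOVA above, the effect sizes 0.57 and 0.37 reported alongside the F statistics can be recomputed from F and its degrees of freedom with the standard identity for the (partial) correlation ratio, η = √(F · df_effect / (F · df_effect + df_error)). That the reported values were obtained in exactly this way is an assumption; the function name is hypothetical.

```python
from math import sqrt

def eta_from_f(f_value, df_effect, df_error):
    """(Partial) eta recovered from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # equals SS_effect / MS_error
    return sqrt(ss_ratio / (ss_ratio + df_error))

# Values taken from the ANOVA reported in the text.
eta_frequency = eta_from_f(22.24, 3, 138)   # presentation frequency
eta_association = eta_from_f(7.36, 1, 46)   # association
```

Rounded to two decimals, both results agree with the values given in the text.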
We excluded five participants from this analysis (three in the condition with and two in the condition without recall test) since they gave inordinately high estimates.³⁷ In Figure 3.15 the unweighted means of the frequency judgments of the remaining participants are plotted against presentation frequencies. As can be seen in the figure, the findings on the magnitude of the mean frequency judgments from Experiment 1 are largely replicated. In the condition without recall test the results thus comply again with the invariant predictions of the PASS model: The judgments on associated and non-associated items show an approximately parallel trend, and the associated items consistently receive higher estimates. This associative effect turns out notably greater here than in Experiment 1. Collapsing over presentation frequencies, means of 4.7 and 3.7 result for associated and non-associated words, respectively. Following the suggestions of Rosenthal et al. (2000), we computed the effect size of the association as if a between-subjects design had been employed, which resulted in r_effect size = 0.28.

³⁷ At least two of the six mean estimates of these participants for associated and non-associated items with the presentation frequencies 2, 5 and 8 were more than 1.5 interquartile ranges above the 75th percentile of the corresponding reference group. Four of the five excluded participants produced outliers both with associated and non-associated items. Generally, the difference between the estimates for associated and non-associated words was clearly greater for the excluded participants than for the remaining ones. However, the estimates of the excluded participants were also on average three times as large as the judgments of the remaining participants.

[Figure: two panels, No Recall and Recall; mean estimated frequency (y-axis) plotted against presentation frequency 0, 2, 5, 8 (x-axis) for associated and non-associated items]
Figure 3.15.
Mean estimated frequencies for associated and non-associated words in the conditions without (left graph) and with (right graph) a preceding recall test in Experiment 2. Error bars denote standard errors. The dashed lines represent the actual frequencies. In the condition without recall test in Experiment 1 the means of the estimates on associated and non-associated items differed but, on the other hand, the medians of the estimates turned out roughly identical, contrary to the PASS prediction. We presumed that this occurred because participants giving judgments on associated items transformed their intuitions of event frequencies – at least partly – differently into absolute judgments than participants whose judgments referred to non-associated 147 items. Furthermore, we assumed that associated and non-associated items should be treated alike in the transformation process by any given participant, provided that the words are both presented on the same study list in a within-subjects design and that the participants make judgments on both kinds of items. The results of the second experiment confirm this consideration. Figure 3.16 displays the distributions of the mean estimates to associated and non-associated items collapsed over presentation frequencies in the condition without recall test. Both distributions show a similar form and are somewhat skewed to the right. The medians (4.3 for associated and 3.2 for non-associated words) are accordingly below the means. However, the difference in the magnitude of the estimates can be seen in the medians as well as in the means as was expected by PASS. Mean Estimated Frequency 10 8 6 4 2 0 associated non associated Figure 3.16. Distributions of the mean estimates on associated and non-associated words in the condition without recall test in Experiment 2. The dotted lines indicate means. 
The results in the condition with recall test differ from those in the condition without recall test in a similar way as they did in Experiment 1 (see Figs. 3.15 and 3.7). Again, lower estimates are given after the recall test. The total means with and without recall test amount to 3.3 and 4.2, respectively. The effect size of the recall test is r_effect size = 0.36. The differences between the recall conditions rise slightly with increasing presentation frequencies. Collapsing over associated and non-associated items, the mean estimates differ between the recall conditions by 1.0, 1.2 and 1.4 for the presentation frequencies 2, 5 and 8. Moreover, in contrast to the PASS prediction, the trends of the mean estimates on associated and non-associated items differ after the recall test. As in Experiment 1, from presentation frequency 2 onwards the estimates on associated words show a clearly smaller increase than the judgments on non-associated words. Nevertheless, associated items receive on average somewhat higher estimates (M_associated = 3.5; M_non-associated = 3.1).[38] The effect size of the association, however, is notably smaller than in the condition without recall test, with r_effect size = 0.19 (again the effect size was calculated as if a between-subjects design had been employed). An analysis of the fit between the participants’ individual estimates and the PASS predictions in both associative conditions likewise shows that the estimates for associated and non-associated items follow different trends after the recall test. To determine the fit, in both associative conditions the mean predictions of the PASS model for the four presentation frequencies were correlated with the 10 corresponding estimates of every participant, similar to the procedure in Study 1.[39] Table 3.5 shows these correlations averaged across participants for the simulation with the parameter settings θ1 = 0.025 and window length = 5.
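The fit measure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names, the prediction values and the participants' estimates are hypothetical. The sketch pairs each estimate of a participant with the mean model prediction for the presentation frequency of the judged item, computes one Pearson correlation per participant, and then averages these correlations.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mean_fit(predictions, participants):
    """Average, over participants, of the correlation between the model's
    mean prediction for each judged item and the participant's estimate.

    predictions:  dict mapping presentation frequency -> mean prediction
    participants: list of participants, each a list of (frequency, estimate)
    """
    correlations = []
    for judgments in participants:
        predicted = [predictions[freq] for freq, _ in judgments]
        estimated = [est for _, est in judgments]
        correlations.append(pearson(predicted, estimated))
    return sum(correlations) / len(correlations)

# Hypothetical mean predictions for the four presentation frequencies.
predictions = {0: 0.1, 2: 0.4, 5: 0.7, 8: 1.0}

# Hypothetical estimates of two participants (frequency, estimate).
participants = [
    [(0, 0), (2, 2), (2, 3), (5, 4), (5, 5), (8, 6)],
    [(0, 1), (2, 2), (5, 3), (5, 6), (8, 5), (8, 7)],
]

print(round(mean_fit(predictions, participants), 2))
```

Because a correlation is unaffected by linear rescaling, this measure captures only the trend of the estimates across presentation frequencies, not their absolute magnitude.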
In the condition without recall test high and identical fit values result for associated and non-associated items. In the condition with recall test, however, the fit values for associated and non-associated items differ. The mean correlation between the PASS predictions and the judgments is lower with associated words, t(20) = 1.83; p = 0.08. Since PASS expects here, as in Study 1, a similar trend of estimates for both associated and non-associated items with all parameter combinations, this result is independent of the parameter settings. Consequently, the data here suggest again that the recall test exerts an influence on the trend of the judgments on associated items that is not covered by PASS.

[38] The distributions of the estimates in the condition with recall test are again slightly skewed to the right for both associated and non-associated words. The medians are accordingly only slightly below the means (Md_associated = 3.3; Md_non-associated = 3.0).

[39] The mean predictions of the PASS model were determined separately for the 24 study lists from the conditions with and without recall test. However, the predictions of the model again turn out almost identical in both recall conditions. Consequently, the fit hardly changes if the predictions are in both conditions based upon all 48 study lists.

Table 3.5
Mean correlations between estimated frequencies and PASS mean responses(a) for associated and non-associated items in Experiment 2

                    No Recall    Recall
Associated             .81         .76
Non-associated         .81         .83

(a) Parameter values for the PASS simulation: θ1 = 0.025, θ2 = 0.0001, θ3 = 0.0000001, window length = 5

The correlations between the predicted and actual trend of the estimates in both associative conditions given in Table 3.5 obviously do not yet take into account the different magnitude of the estimates expected by PASS for associated and non-associated items.
That is why we determined a second fit measure including the factor ‘association’ in addition to the factor ‘presentation frequency’. In the present experiment both factors were manipulated within subjects, unlike in Study 1. This allows us to determine the correspondence between predictions and judgments individually for every participant. For this purpose, we computed the correlation between the eight mean predictions of the PASS model for associated and non-associated items and the 20 judgments of every participant. Table 3.6 shows the resulting mean correlations for three different parameter combinations in the condition without recall test. We confine ourselves to this condition since the preceding analysis of the trend of the estimates already demonstrated that systematic deviations from the PASS prediction occur in the condition with recall test.[40] For the parameter combination from the medium range of values (θ1 = 0.015, window length = 5) as well as for the highest parameter combination used (θ1 = 0.05, window length = 7) high mean correlations are the result. Particularly noteworthy is that the correlations including both associated and non-associated items reach, with r = .79, nearly the same magnitude as the separate correlations in both associative conditions (cf. Tab. 3.5). This indicates that with these parameter combinations PASS also predicts the relative distances between the estimates on associated and non-associated items very well. On the other hand, this does not hold for the lowest parameter combination (θ1 = 0.005, window length = 3), for which the mean correlation is clearly smaller with r = .59. Evidently, with this parameter setting, i.e. in case of a simulated worse encoding of the study lists, the influence of the associations on the magnitude of the frequency judgments is overestimated, as was already the case in Study 1.

[40] Similar to the situation with the contrast analysis in Study 1, the mean correlations reported here also do not permit a direct comparison between the recall conditions. This is due to the fact that not only the variances between the participants but also the variances between the estimates of one participant relevant here turn out to be smaller after a preceding recall test. Thus, for instance, the mean standard deviation for non-associated items with the presentation frequency 8 amounts to 1.2 in the condition with recall test and to 2.0 in the condition without test. Irrespective of the predictions, the correlations in the condition with recall test are thus less impaired by error variance. As a consequence, the mean correlations in this condition are, despite the different trend of the estimates for associated and non-associated items, only slightly smaller than in the condition without recall test for all parameter combinations. They reach a maximum value of r = .77.

Table 3.6
Mean correlations between estimated frequencies and PASS mean responses in Experiment 2

Parameter settings(a)                  r
θ1 = 0.005, window length = 3         .59
θ1 = 0.015, window length = 5         .79
θ1 = 0.050, window length = 7         .79

(a) θ2 = 0.0001; θ3 = 0.0000001

The change in the correlations between estimates and PASS predictions from low via medium to high parameter settings corresponds also in general to the situation represented in Table 3.6. The correlations increase from low to medium parameter settings. As the associative effect here turns out greater than in Experiment 1, the maximum correspondence between the predictions and the estimates is achieved with somewhat lower parameter combinations than in the previous study. From these medium parameter settings to high settings no further change of the fit occurs. The reason for this is once more that the predictions for parameter combinations from the medium to the high range of values strongly correlate with each other.
Nevertheless, as in Study 1, substantial differences between these predictions do exist if one considers the expected size of the associative effect in isolation. Thus, averaging over presentation frequencies, with the highest parameter setting the estimates for associated items are expected to be 6 % above the estimates for non-associated items. With parameter combinations from the medium range of values, on the other hand, differences of up to 30 % between the estimates on associated and non-associated items are predicted. In fact, the participants’ estimates differed in the condition without recall test by 27 % between the associative conditions. One of the parameter combinations with which a difference of similar size is expected is θ1 = 0.02 and window length = 5. Figure 3.17 illustrates the correspondence between the PASS predictions for this parameter setting and the participants’ mean estimates. In the figure both the estimates and the responses from FEN are standardized through division by the respective mean for associated words with the presentation frequency 8. A nearly perfect fit is the result. The correlation between the participants’ mean estimates and the responses amounts to r_alerting = 0.98 (Rosenthal, Rosnow, & Rubin, 2000).

[Figure 3.17: scatter plot; x-axis: Standardized Mean Response (0.0 to 1.0); y-axis: Standardized Mean Estimate (0.0 to 1.0).]

Figure 3.17. Standardized mean estimated frequencies in the condition without recall test plotted against standardized mean responses from FEN in Experiment 2. The mean responses result from a simulation with the parameter settings θ1 = 0.02, θ2 = 0.0001, θ3 = 0.0000001 and window length = 5. The line indicates the best linear fit.

Discrimination of Frequencies: To measure the discrimination performances in both associative conditions, we computed the correlations between estimated and actual frequencies for associated and non-associated items for every participant.
Since unusually low correlations appeared with three participants, these were excluded from the further analysis.[41] Figure 3.18 displays the mean correlations of the remaining participants in both recall conditions. As illustrated, the results from Experiment 1 are replicated (cf. Fig. 3.10). In both recall conditions lower correlations occur with associated items than with non-associated ones. In the condition without recall test, however, this difference turns out to be small again. The means here are r = .77 for associated words and r = .80 for non-associated ones. The effect size is r_effect size = .14. In the condition with recall test the effect of the association is notably greater with r_effect size = .29 (again both effect sizes were calculated as if a between-subjects design had been employed). This is due to the worse discrimination of the frequencies of associated items in this condition (r = .71). The discrimination of non-associated words, however, is not impaired by the recall test (r = .80). With which parameter settings does PASS expect differences of the observed size in the discrimination performances? For the condition without recall test these are, as in Study 1, the same parameter settings with which the differences in the magnitude of estimates were also predicted correctly. Thus, with the parameter combination θ1 = 0.02 and window length = 5, which predicts the magnitude of estimates exceedingly well (see Fig. 3.17), PASS expects that the correlations between estimated and actual frequencies turn out 3.7 % lower for associated items than for non-associated items. The observed correlations differ by 3.8 % in the condition without recall test. For the condition with recall test, in contrast, no parameter settings can be found which predict the differences in the magnitude of the estimates and the discrimination performances equally well.
Thus, with the parameter combination θ1 = 0.025 and window length = 3, for example, PASS can predict the difference of 11.3 % in the discrimination performances between associated and non-associated words approximately correctly. However, this parameter combination also leads to the expectation that associated items receive estimates that are 47 % higher than those for non-associated ones. The effect of the association on the magnitude of the estimates is therefore clearly overestimated.

[41] Two of these participants came from the condition without and one from the condition with recall test. For all three participants the correlations for non-associated items were more than 1.5 interquartile ranges below the 25th percentile of the corresponding reference group. However, for associated items, too, the discrimination performances of the participants concerned were far below average. Thus, the exclusion of these participants does not substantially influence the reported differences between the discrimination performances for associated and non-associated items.

[Figure 3.18: bar chart for the conditions No Recall and Recall; y-axis: Mean Correlation (0.5 to 1.0); series: associated, non-associated.]

Figure 3.18. Mean correlations between estimated and actual presentation frequencies for associated and non-associated items in Experiment 2. Error bars denote standard errors.

Finally, the results of the second experiment again show that the participants discriminate fairly well between different frequencies. Nevertheless, their discrimination performances fall short of the predictions of the PASS model. For non-associated items PASS expected correlations above r = .87 with all parameter combinations. For associated items correlations above r = .90 were expected as long as only parameter settings are considered that predict the differences to non-associated items in the magnitude of the estimates and the discrimination performances roughly correctly.
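The percentage differences quoted in this passage can be recomputed from the rounded means reported above. The following Python sketch is only a plausibility check: the helper name percent_difference is hypothetical, and taking the value for non-associated items as the base of the percentage is an assumption that reproduces the reported figures up to rounding.

```python
def percent_difference(associated, non_associated):
    """Absolute difference between the two conditions, expressed as a
    percentage of the value for non-associated items (assumed base)."""
    return 100.0 * abs(associated - non_associated) / non_associated

# Discrimination (mean correlations), condition without recall test:
# r = .77 vs. r = .80 gives roughly 3.75 % (reported: 3.8 %).
print(percent_difference(0.77, 0.80))

# Discrimination, condition with recall test:
# r = .71 vs. r = .80 gives roughly 11.25 % (reported: 11.3 %).
print(percent_difference(0.71, 0.80))

# Magnitude of estimates, condition without recall test:
# M = 4.7 vs. M = 3.7 gives roughly 27 % (reported: 27 %).
print(percent_difference(4.7, 3.7))
```

Small deviations from the reported values are to be expected, since the reported percentages were presumably computed from unrounded means.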
Thus, as in Study 1, the absolute magnitude of correlations between estimated and actual frequencies is overestimated by PASS in all conditions.

Discussion

Study 2 pursued three main aims. First, we examined whether the findings on the effect of associations on frequency judgments from Experiment 1 could also be replicated with a joint presentation of associated and non-associated items. Second, the frequency judgments were used to subject the predictions of the PASS model to a further test. Finally, we wanted to check our assumption that in a within-subjects design the estimates for associated and non-associated items show, in contrast to Experiment 1, the same distribution and thus correspond to the PASS predictions.

Estimates in the Condition without Preceding Recall Test

In the condition without preceding recall test the results entirely corresponded to our expectations. First of all, the decision times again confirmed that the participants used a memory assessment strategy to generate their frequency judgments. These judgments showed the expected associative effect. Associated items received clearly higher estimates than non-associated ones. The estimates on both kinds of items showed a parallel trend across the presentation frequencies. Moreover, as predicted by PASS, the frequencies of associated items were discriminated slightly worse than the frequencies of non-associated items. Finally, our assumption regarding the distributions of the estimates proved true: In accordance with the PASS prediction, the estimates on associated and non-associated words now had nearly the same distribution. Therefore, differences in the magnitude of the estimates between both kinds of items could be found in the means and the medians alike. This also supports our explanation for the fact that the medians of the estimates in Experiment 1 showed no associative effect. There, associated and non-associated words were judged by different participants.
We assumed that this gave the participants in both groups the opportunity to transform their intuitions of event frequencies into estimates in different ways. Differences in the intuitions about the frequency of associated and non-associated items could thus have been at least partially leveled in the estimates. This could not happen in Experiment 2 because each participant gave judgments on associated and non-associated items, presumably using the same kind of transformation for both kinds of items. Consequently, the distributions of the frequency estimates, too, conformed to the PASS predictions.

Thus, the results of Study 2 correspond to all ordinal predictions of the PASS model that can be derived independently of specific parameter settings. Moreover, we could demonstrate that with suitable parameter combinations from the medium range of values the model also predicts the data excellently in quantitative terms. The fit between the PASS predictions and the trend of the individual estimates of the participants was very high for both associated and non-associated items. The fit was hardly lower when, in addition to the trend of the estimates, the effect of the factor association was also considered in the calculation. Thus, with appropriate parameter settings, PASS could also successfully predict the relative distances between the estimates for associated and non-associated items. Furthermore, similar to Study 1, a nearly perfect fit resulted with the means of the participants’ estimates. Finally, the parameter combination with which this fit was attained also correctly predicted the size of the difference in the discrimination performances for associated and non-associated items. Only the absolute magnitude of the discrimination performances was overestimated by PASS, as was already the case in Experiment 1.

PASS obtained the best fit values with lower parameter settings than in Study 1. This may appear surprising at first sight.
As already described above, lower settings for the learning parameter θ1 and the length of the attentional window model a worse encoding of the study lists. However, there is little reason to assume that the participants of the second experiment really encoded the items worse. The presentation of the items occurred in both experiments under identical conditions and with identical instructions, and the study lists had the same length. Moreover, the participants in both experiments were drawn from the same population and did not differ from each other in features such as age and education. It thus seems reasonable to assume that the quality of encoding did not differ substantially between the two experiments. However, the simulations with PASS demonstrated that with identical parameter settings, the model expects here a slightly smaller associative effect with regard to the magnitude of the estimates than in Study 1. We found that the opposite was the case: The difference between associated and non-associated items was clearly larger in Study 2 than in Study 1. However, this only apparently contradicts our considerations concerning the quality of the encoding of the study lists and can be explained by our assumptions about the transformation process. We assume that some of the participants working on associated items in Experiment 1 used the same response range as the participants working on non-associated items despite different intuitions. Thus, they transformed different intuitions into similar estimates. Consequently, the associative effect turns out smaller in the mean frequency estimates than in the intuitions. Now if PASS is supposed to match the effect in the estimates under the restriction of equal transformations, this can only be balanced via changes in the parameter settings. To predict a decreased association effect, PASS requires higher parameter settings. In this way, the different optimal parameter settings in both experiments can be explained.
In fact, this result had to occur if our assumptions about the transformation process are correct and the study lists were encoded similarly well in both experiments. In this view, the actual quality of the encoding in Experiment 1 was overestimated by the parameter settings obtaining the best fit. This argumentation is further supported by the results on the discrimination performances. The discrimination performances are not influenced by the transformation process or by the selected response range. Accordingly, the results of this measure should correspond to the PASS predictions in case of similarly good encodings in both experiments. This is exactly what we found. As expected by PASS, the difference in the discrimination performances for associated and non-associated items was just as great in Experiment 2 as in Experiment 1. Altogether, the PASS model was thus extremely successful in predicting the estimates in the condition without intermediate recall test. The results hence demonstrate that the model is not only capable of simulating the fundamental finding of sensitive frequency judgments but is also able to explain the effects of associations on frequency estimates.

Estimates in the Condition with Preceding Recall Test

The results in the condition with preceding recall test largely replicated the findings from Experiment 1. First, the decision times confirmed again that the participants relied on memory assessment and not on an enumeration strategy when they gave their frequency judgments. The differences in the frequency judgments between the recall conditions that were also found in Experiment 2 can thus not be attributed to a change of strategy. After the recall test, the estimates were smaller and increased more slowly with increasing presentation frequencies than in the condition without recall test. Moreover, after the recall test, the effect of the associations on the frequency judgments changed.
The effect size of the association on the magnitude of the estimates was smaller than in the condition without recall test. The estimates on both types of items no longer ran parallel. The estimates for associated items increased more slowly than those for non-associated items. For the presentation frequency 8, the estimates for associated items were not higher than those for non-associated items. Once again, PASS predicted the trend for associated items less well than the trend for non-associated ones. Finally, after a recall test the participants discriminated the frequencies of associated items clearly worse than the frequencies of non-associated items. As a result, PASS was not able to correctly predict both the differences in the discrimination performances and the differences in the magnitude of the estimates with the same parameter settings. The findings on the effect of the recall test on the frequency estimates were thus essentially the same as in Experiment 1. In the meantime, we could also replicate the result that the recall test leads to reduced absolute estimates in a series of experiments on the role of attentional processes in frequency judgments (Sedlmeier & Renkewitz, 2004). In addition, the estimates given after a recall test on words that had been presented repeatedly also showed a smaller increase in these experiments than estimates given without an intermediate recall test. Hence, we can conclude that a recall test reliably lowers absolute frequency judgments. In the condition with recall test, PASS failed to predict the complete data pattern correctly in Experiment 1 and in Experiment 2. As already discussed in Study 1, the influence of the recall test cannot take effect during the encoding phase. The assumption that the recall test influences the frequency estimates by changing the memory representation is not consistent with the data, either. Consequently, the recall test must take effect at the time the judgment is given.
Manipulations and information that influence the estimates at the time the judgment is given are not considered in PASS. To the extent that such influences take effect, it can generally not be expected that the model predicts the estimates correctly. For this reason, the results of the current experiments show that carrying out a recall test establishes a situation in which PASS cannot be applied in its present form.

General Discussion

With the current experiments we primarily examined the influence of associations between jointly presented items on frequency estimates. Our aim was to check whether the PASS model is capable of explaining possible effects of associations on frequency judgments. Earlier studies already demonstrated that estimates of the presentation frequency of a word increased with the number of jointly presented associated words (Leicht, 1968; Shaughnessy & Underwood, 1973; Vereb & Voss, 1974). The typical design in these studies consisted of presenting several target words with different frequencies. Moreover, the study lists contained a varying number of associated words for each target word. The associations between the different sets of target words and their associates were either minimized or remained undetermined. Furthermore, since the frequencies of target words and associates were not systematically varied within a set, none of these studies allowed conclusions as to whether associations between stimuli have an effect on the discriminability of the frequencies of these stimuli. In contrast, in our experiments the associations between all words on the study lists were controlled. We created item sets in which all words were associated with each other as strongly or as weakly as possible. Within these sets, the frequencies of the words were manipulated in the same way. This procedure allowed examining the effects of associations on both the magnitude of frequency estimates and the discriminability of frequencies.
Our simulations with PASS demonstrated that, irrespective of possible parameter settings, the model expects that higher frequency estimates are given for sets of associated words. In addition, according to the model predictions, this effect should not be influenced by the actual presentation frequencies of the stimuli. The estimates on associated and non-associated words should thus increase in a parallel way with rising frequencies. Finally, PASS expected that the frequencies of associated items are discriminated slightly worse than the frequencies of non-associated items. All these predictions were largely confirmed in both experiments as long as the frequency estimates were not influenced by an additional manipulation, an intermediate recall test.

The Explanatory Power of PASS

Associations between words proved to affect frequency estimates in the studies reported here. These associations did not arise from experimental manipulation. Instead they were acquired pre-experimentally by the participants. A great advantage of PASS is that it is not necessary to feed participants’ pre-experimentally learned knowledge into the model. Rather, PASS experiences a large sample of natural language, just as the participants had done before the experiment. PASS, using an associative learning algorithm, is sensitive to the frequencies of co-occurrences of words within this sample of natural language. The model acquires proper associations between words solely on the basis of these frequencies of co-occurrences (see also Sedlmeier, Wettler, & Seidensticker, 2004). The same learning mechanism can obviously be successfully used to simulate participants’ encoding of information within the experiment and the subsequent estimates of the participants on the experimental frequency of words. Thus, PASS gives a parsimonious account of both the pre-experimental acquisition of associations and the frequency judgments within the experiment that are affected by these associations.
Remember that PASS can additionally account for several other phenomena in frequency judgments such as regression to the mean (Sedlmeier, 1999), recency effects (Sedlmeier, 1999), and the effects of varying attention at encoding (Sedlmeier & Renkewitz, 2004). Taken together, these findings strongly suggest that associative learning, as modeled in PASS, might be more than a plausible candidate for the processes underlying frequency estimates based on memory assessment.

What about other Memory Models of Frequency Judgment?

PASS is not the only successful memory model of frequency judgments. Like PASS, these other models (e.g. SAM (Gillund & Shiffrin, 1984), TODAM 2 (Murdock, 1993), MINERVA 2 (Hintzman, 1988), MINERVA-DM (Dougherty, Gettys, & Ogden, 1999)) assume (I) that the memory representation changes with every additional encoding of an event that occurs with a minimum of attention, and (II) that the current state of the memory representation can be used to produce a signal whose intensity depends on the frequency of the event in question. The models differ largely with respect to their architecture, their specific representation assumptions and the processes with which the memory representation is accessed. Nonetheless, these diverse models have arrived at similar or identical predictions across a wide range of phenomena in frequency judgments (e.g. Dougherty & Franco-Watkins, 2002; Hintzman et al., 1992; Hintzman, 2001; Sedlmeier, 2002). However, the capability of all other models to account for the influence of associations on frequency judgments studied here might be limited. To the best of my knowledge, PASS is so far the only model that successfully simulates human primary associations (Sedlmeier, Wettler, & Seidensticker, 2004). Thus, the other models would have to be supplied from the outside with the crucial information about associations between words to arrive at any predictions.
Let us illustrate the latter aspect for MINERVA 2 (Hintzman, 1988), the model that has presumably been used most often to simulate frequency judgments. Unlike PASS, MINERVA 2 is an exemplar model (e.g. Estes, 1986). In MINERVA 2 associative learning is not considered as the basis of sensitivity to frequencies. Consequently, frequency knowledge is not represented in associations. Instead, every newly encoded event adds a trace to an ever-growing memory matrix. An event is considered as a collection of features. Accordingly, a trace corresponds to a vector of features with possible values of ‘1’ (feature present), ‘-1’ (feature absent), or ‘0’ (feature irrelevant or unknown). Random error has the effect that any specific feature of an event might not be stored correctly but instead be coded as unknown. To arrive at a frequency judgment for an event, the model assesses the similarity between this event and all traces in the memory matrix. This results in a global familiarity signal that is in turn the basis for frequency judgments. As MINERVA 2 encodes each new event in a separate trace that is in no way connected to any other trace in memory, it cannot learn about relations between events. This implies that it cannot learn associations between words. Thus, to be able to make any prediction about the influence of associations on frequency judgments, MINERVA 2 has to be supplied with information about which word is how strongly associated with which other words. How could associations be represented in the memory of MINERVA 2? The only way would be to translate associations between events into similarities of these events. In MINERVA 2 the similarity of two events is a function of the number of features that these events share. PASS acquired associations solely on the basis of the frequency of co-occurrences of words. This knowledge about co-occurrences allowed the model to predict effects of associations on frequency estimates very well.
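To make the mechanism just described more concrete, the following Python sketch implements a minimal MINERVA-2-style memory. It is a sketch under stated assumptions, not a faithful reimplementation of any published simulation. The text above specifies only the feature coding (1, -1, 0), the random loss of features at encoding, and a global familiarity signal based on the similarity of a probe to all traces. The specific similarity measure (dot product divided by the number of relevant features), the cubing of the similarities before summation, the parameter name learning_rate and the two feature vectors are assumptions of this sketch that follow the usual description of the model.

```python
import random

def encode(event, learning_rate, rng):
    """Return a memory trace: each feature is copied with probability
    learning_rate and otherwise stored as 0 (unknown)."""
    return [f if rng.random() < learning_rate else 0 for f in event]

def similarity(probe, trace):
    """Dot product divided by the number of features that are non-zero
    in the probe or in the trace (assumed similarity measure)."""
    relevant = sum(1 for p, t in zip(probe, trace) if p != 0 or t != 0)
    if relevant == 0:
        return 0.0
    return sum(p * t for p, t in zip(probe, trace)) / relevant

def echo_intensity(probe, memory):
    """Global familiarity signal: sum of cubed similarities between the
    probe and every trace in memory (cubing is an assumption here)."""
    return sum(similarity(probe, trace) ** 3 for trace in memory)

# Hypothetical feature vectors of two events that are dissimilar
# (their dot product is zero).
item_a = [1, 1, 1, 1, -1, -1, -1, -1]
item_b = [1, -1, 1, -1, 1, -1, 1, -1]

# Present item_a eight times and item_b twice, with imperfect encoding.
rng = random.Random(0)
memory = [encode(item_a, 0.8, rng) for _ in range(8)]
memory += [encode(item_b, 0.8, rng) for _ in range(2)]

print("familiarity of item_a:", echo_intensity(item_a, memory))
print("familiarity of item_b:", echo_intensity(item_b, memory))
```

Because every presentation is stored as a separate trace, a relation between two events can enter the familiarity signal in such a model only through features that the events share, which is the point at issue in this section.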
Consequently, it is apparently only possible to translate associations into similarities if one can assume that the number of features two words share is highly, if not perfectly, correlated with the frequency with which these words co-occur. Is that a reasonable assumption? This question has no definite answer, as it is not specified in MINERVA 2 – or elsewhere – what the relevant features are that constitute words or events. If we regard the stimulus material in our studies, there are undoubtedly word pairs for which this assumption seems to be plausible: For example, Stuhl (chair) and Hocker (stool) are associated and they certainly share many features. However, PASS also identifies words as strongly associated for which it seems difficult to enumerate common features such as, for example, Fluss (river) and Brücke (bridge) or Stuhl (chair) and rutschen (to slide). Thus, it seems at least doubtful that associations could be adequately translated into similarities. If this is not possible, the effects observed in our studies would be based on an attribute that cannot be represented in MINERVA 2. Therefore, PASS might not only be a promising model of the memory assessment process but also the only model that can presently offer a comprehensive account of the effects of associations we observed in the present experiments.

What Problems Remain in Predicting Frequency Estimates with PASS?

Obviously, the current studies also revealed some remaining weaknesses in the prediction of frequency estimates by PASS. Let us conclude the discussion with a glance at these problems. Two of these problems immediately concern the modeling of the memory assessment process.

Predicting Discrimination Performances

That this modeling cannot be perfect follows, first, from the prediction of the discrimination performances.
PASS indeed correctly predicted the differences in the discriminability of the frequencies of associated and non-associated words; however, the absolute discrimination performance was overestimated by PASS in all conditions of both experiments. The correlations with which we measured the discrimination performances are determined by the trend of the estimates and the variance of the estimates at each given presentation frequency. Since the trend of the participants’ estimates corresponded to the PASS predictions in the condition without recall test, it becomes obvious that the model underestimates the error variance of the estimates. Hence, PASS evidently lacks an appropriate error component producing such variance. In fact, in the present simulations the model contained the assumption of a faultless encoding of stimuli. Since, furthermore, the parameters for learning, interference and decay remained constant throughout learning, all variance in the PASS predictions at a given presentation frequency can be attributed to the associations between the stimuli and the temporal course of the encoding. No additional unsystematic error variance was taken into consideration. In contrast, in other memory models it is assumed that accidental information loss occurs when stimuli are encoded (e.g. Hintzman, 1988; Fiedler, 1996). How could such a kind of information loss be incorporated in PASS? In its original form PASS encodes events as a collection of features. Accidentally losing or exchanging values of single features prior to or during the encoding of events would produce the desired error variance. A corresponding procedure is usually realized in simulations with other memory models (e.g. Dougherty et al., 1999; Hintzman, 1988). In the present studies this was not possible because we had to use localist representations of words to be able to simulate the acquisition of associations between words.
However, in principle, the assumption of unsystematic information loss can be integrated into PASS without any problems.

Predicting Context Sensitivity in Frequency Judgments

The second problem with the modeling of the memory assessment process concerns the ability of PASS to discriminate the frequency of occurrence of stimuli in different contexts. We demonstrated that associations between words increase estimates of the experimental frequencies of these words. Associations are, according to PASS, a result of the frequency with which words were jointly encoded before the experiment. Thus, people are not able to disregard their knowledge about the frequency of co-occurrences of words outside the experiment when they give experimental frequency estimates. Interestingly, earlier studies demonstrated that the pre-experimental frequency of a single word does not increase experimental estimates for this word (Greene & Thapar, 1994; Rao, 1983). In this case, people can very well discriminate between the experimental and the extra-experimental context. In the ideal case, a model should be capable of simulating both findings described. In the present simulations PASS was generally not sensitive to contexts since no information at all was encoded about whether the words occurred outside or within the experiment. For this reason, not only the associations between words but also the pre-experimental frequencies of occurrence of words would have influenced PASS’ predictions. To avoid this unwanted effect, we intervened in the simulations by setting the self-associations of the words to zero after PASS had acquired pre-experimental knowledge from the text corpus. The pre-experimental frequency of occurrence of a word could thus not increase the predictions. In principle, PASS can also produce frequency estimates differentiating between dissimilar contexts.
This only requires that the model encodes these contexts and that the memory probe specifies a target context in addition to a target stimulus. Knowledge about the frequency of occurrence of a word itself and about the frequency of co-occurrences of words, however, is identically represented in PASS and by no means separated. Thus, if the model discriminates between contexts, this will not only weaken the (undesired) influence of the pre-experimental word frequencies but also the (desired) effect of pre-experimental co-occurrences of words. A further development of PASS should thus aim at integrating a mechanism into the model that allows differential effects of associations and pre-experimental frequencies of occurrence.

Beyond Memory Assessment: Transforming Intuitions into Absolute Frequency Judgments

PASS shares the last problem to be discussed here with all other memory assessment models: Even if participants use a memory assessment strategy to generate frequency judgments and if a model exists that perfectly describes the corresponding memory processes, this may not suffice to predict frequency estimates. The reason for this is that the models do not directly simulate frequency judgments but intuitions about frequencies. These intuitions still need to be transformed by the participants into absolute frequency judgments. With respect to this transformation all memory models only contain the assumption that a participant will use the same linear function for all relevant stimuli to transform intuitions into estimates. At least as long as the participants are not provided with a response range, this function remains unspecified. As a consequence, only predictions about relative differences between frequency estimates are possible but not predictions about the absolute magnitude of estimates.
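The consequence of an unspecified linear transformation can be illustrated with a small Python sketch. All numbers are hypothetical and serve only as an illustration: `signals` stands for the model output for four stimuli, and the slopes and intercepts of the two fictitious participants are arbitrary assumptions.

```python
def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def to_estimates(signals, slope, intercept):
    """Apply one linear function to all intuitions of a participant."""
    return [slope * s + intercept for s in signals]

signals = [0.8, 1.5, 2.9, 4.2]  # hypothetical memory signals for four stimuli

# Two fictitious participants with different (assumed) transformations.
participant_a = to_estimates(signals, slope=1.0, intercept=0.5)
participant_b = to_estimates(signals, slope=2.5, intercept=1.0)

r_a = pearson(signals, participant_a)
r_b = pearson(signals, participant_b)
print(participant_a, r_a)
print(participant_b, r_b)
```

Both participants preserve the order and the relative spacing of the signals, so the correlation with the model output is the same for both. The absolute magnitude of their estimates nevertheless differs, and the model alone gives no basis for predicting it.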
Especially if one takes into account that participants obviously carry out very different transformations even within the same experimental condition, this seems to be unsatisfying. For example, in the condition without recall test in Experiment 1 the 25th percentile of the mean estimates for associated stimuli was 3.0 while the 75th percentile amounted to 5.8 (see Fig. 3.9). With regard to the reasons for this variance in the frequency estimates the memory models remain silent. Moreover, our results indicate that even relative differences in absolute frequency estimates can only be predicted completely correctly if the factors to be studied are manipulated within-subjects. In Experiment 1 we manipulated the factor ‘association’ between-subjects. To derive model predictions we assumed that the transformation functions would be identically distributed in both associative conditions. The results of our experiments strongly suggest that this assumption was wrong. In Experiment 1 PASS could not predict the distributions and the medians of the estimates for associated and non-associated words correctly. These problems disappeared when the factor ‘association’ was manipulated within-subjects in Experiment 2. This indicates that participants who gave estimates for associated words in Experiment 1 at least partially selected different transformations than participants who judged the frequency of non-associated words. However, we could only speculate about the factors influencing the transformation process. In general, the literature on frequency estimates reveals hardly more about the transformation process than that given information about absolute frequencies will affect the magnitude of the estimates (e.g. Brown & Siegler, 1993; Brown, 1995; Schwarz et al., 1985; Schwarz & Scheuring, 1992; Schwarz, 1999). As long as this gap is not closed, memory models can only make precise predictions about the relative differences of absolute frequency estimates in within-subjects designs.
The fact that transformation processes are obviously influenced by factors that remain unconsidered in memory models may also explain why the participants’ frequency estimates deviated from the PASS predictions after a recall test. The recall test had two effects in both experiments: On the one hand, it generally led to lower frequency judgments and, on the other, it affected the trend of the estimates for associated words. These estimates increased more slowly in the condition with recall test than in the condition without recall test. As already explained in the discussion of Experiment 1, it has to be assumed that the recall test takes effect only at the time the judgment is given. At this stage, two processes have to be carried out. Frequency information has to be retrieved from memory and it must then be transformed into frequency estimates. On the basis of the present experiments, it cannot be resolved which of these two processes was influenced by the recall test. However, it is worth noting that there are several findings according to which an influence of a recall test on the transformation process seems to be plausible. Schwarz and colleagues (1991; see also Wänke et al., 1995; Schwarz & Wänke, 2002) found that the experienced ease of recall changed the magnitude of subsequent frequency estimates. Irrespective of the actual recall performance, the participants gave higher frequency estimates for an event if the retrieval of corresponding instances was subjectively easy. In our experiments the participants had two minutes to recall words from the study list. On average, our participants recalled 11.0 (Experiment 1) and 11.3 (Experiment 2) of the 15 stimuli presented. We observed that the participants started to have difficulties in retrieving additional words a long time before the two minutes ran out. Thus, the participants may have experienced the recall test as rather difficult.
The experience that the retrieval of words failed or succeeded only with effort might have rendered the assumption implausible that these words were presented very often immediately before. And this might be reason enough for the participants to choose a comparatively low response scale for their frequency estimates. One could at least speculate about whether an effect of the recall test on the transformation process can also explain the changed trend of the estimates for associated items. It would be possible that the participants infer a maximum presentation frequency from their experiences with the recall and, as a consequence, never give a frequency estimate beyond this maximum. Such a limitation of the subjective response range could result in a ceiling effect. The decelerated increase of the estimates for associated items and the absence of an associative effect at the maximum presentation frequency of 8 observed in both experiments could be explained in this way.

CHAPTER 4
Summary and Outlook

To act successfully in the world, humans must notice the frequencies of events occurring. For example, when parents think of sending their child to a particular school, they may rely on the experiences of other parents with this school. To exploit these experiences properly, it is not sufficient to remember that the school was praised and criticised. One must also assess how often each of these occurred. How do people do that? Two answers have dominated the psychological literature on frequency judgments:

(1) People try to remember specific instances of events (“Mrs. Hull said X, Mr. Schmidt said Y.”). They base their frequency judgments about events on the size of the recalled samples (i.e. numbers of praises and criticisms in our example). This approach has been termed enumeration or recall-estimate strategy.
(2) In contrast, memory models of frequency judgments assume that the representation of frequency information in memory is an inevitable consequence of attending to events. Additionally, it is assumed that the current state of the memory representation can be used to produce a signal the intensity of which reflects the frequency of events. Thus, memory models postulate that people can estimate frequencies by using a retrieval-free memory assessment strategy.

In a first step, this dissertation explores whether memory models can explain the famous-names effect, hitherto accounted for by the use of a recall-estimate strategy. In a second step, this work then takes a closer look at the PASS model (Sedlmeier, 1999; 2002), which assumes that sensitivity to frequencies is based on associative learning. PASS is used to generate new predictions concerning the influence of associations between events on frequency judgments.

Can Memory Models Account for the Famous-Names Effect?

An important issue in the research on frequency estimates is the differentiation between category size judgments and frequency of occurrence judgments. In category size judgments the requested frequency estimates refer to the number of different exemplars of a category. In contrast, frequency of occurrence judgments refer to the number of repetitions of the same event. Evidence for the recall-estimate strategy mostly stems from research on category size judgments. The best-known study that advocates of the recall-estimate strategy refer to is presumably Kahneman and Tversky’s famous-names experiment (1973). If participants are presented with a list of male and female persons, they overestimate the frequency of the sex that was represented by more famous names on the list. As the participants additionally recall more names of the “famous” sex, the bias in frequency judgments was ascribed to the recall-estimate strategy.
Proponents of memory models claimed that the famous-names effect could also result from the use of a memory assessment strategy (Dougherty et al., 1999; Sedlmeier, 1999). They proposed that the higher pre-experimental frequency of occurrence of famous names is responsible for the bias. They additionally assumed that the memory assessment process does not allow for a perfect discrimination of encounters in different contexts. In this case, memory models would also predict an overestimation of the category with the more famous names. An implication of this explanation is that the famous-names effect should generalize to any stimulus material: When the frequency of two categories is judged, the frequency of the category whose exemplars occur more often outside the target context should be overestimated. These ideas were tested in the four experiments reported in chapter 2. The results largely disagreed with the memory model explanation for the famous-names effect. It was found that differences in the pre-experimental frequency of stimuli are not sufficient to produce biases in category size judgments. However, the recall-estimate strategy was not successful in explaining the results of the four experiments either. While this strategy predicted a considerable bias in all experiments, the frequency estimates were quite accurate in several conditions. Thus, neither the memory assessment strategy nor the recall-estimate strategy could provide a satisfying account of the entire pattern of results. A reanalysis of the data suggested that the results might be best understood by the following assumptions:

1. The memory response to categories was not influenced by the frequency of the exemplars outside the target context. Thus, the participants generally had access to unbiased information about the category sizes by memory assessment.
2. Nonetheless, the participants supplemented this information sometimes with information that they gained from their recall performances.
This resulted in slightly biased frequency estimates.

Thus, a multiple strategy perspective (Brown, 1995; 2002) appears most promising to account for the heterogeneous results with category size tasks found here and in numerous other studies (e.g. Alba et al., 1980; Barsalou & Ross, 1986; Watkins & LeCompte, 1991; Williams & Durso, 1986). What affects the choice of strategy? The current experiments suggest that participants rely more heavily on a recall strategy when many instances were remembered, which presumably renders the recall strategy subjectively valid. In line with this, participants should hardly employ enumeration when they do not judge category sizes but frequencies of occurrence: If chair, for example, is encountered twice over the course of an experiment, these two episodes can hardly be remembered separately because they occurred within very similar contexts. Consistent with this, we found no indication of the use of the recall-estimate strategy in the experiments of chapter 3, which required frequency of occurrence judgments. What are the implications for further research? Outside the laboratory, repetitions of events hardly occur in contexts as similar to each other as in a typical experiment. When the same event occurs in richer and more diverse contexts, enumeration might be feasible, even for frequency of occurrence judgments. Future research should address this issue. Another interesting question is which other factors affect the choice of strategy. It seems reasonable that not only the perceived validity of the recall-estimate strategy affects the choice but also the subjective validity of a memory assessment strategy. In Experiment 4, after an overt categorization of the stimuli, participants relied exclusively on memory assessment, even though they did recall a large number of exemplars. This suggests that a categorization task increases the perceived validity of memory assessment.
This issue deserves further scrutiny, as does the question of which other factors influence the perceived validity of enumeration and memory assessment.

The Influence of Associations on Frequency Judgments

The results reported in chapter 2 suggest that the encoding of events is sufficient to bring about the representation of quite accurate knowledge about the frequencies of the superordinate categories of these events in memory. However, people do not necessarily rely exclusively on this frequency memory to generate frequency judgments. In general there is much evidence that knowledge about frequencies of events and their superordinate categories is an inevitable consequence of attending to events (e.g. Alba et al., 1980; Hasher & Zacks, 1984; Betsch & Sedlmeier, 2002). This raises the question of how frequencies are encoded and represented in memory. PASS (Sedlmeier, 1999; 2002) describes the emergence of frequency knowledge as a by-product of associative learning. The model acquires not only knowledge about the frequencies of events but also about the frequencies with which different events co-occur. The latter enables PASS to predict human primary associations to stimuli successfully (Sedlmeier, Wettler, & Seidensticker, 2004). In chapter 3 I used the capability of PASS to acquire associations to gain precise predictions about the effects of associations on frequency estimates. In two simulations PASS encoded a large text corpus and learned in this way about the associations between words. In a second step of the simulations PASS encoded study lists that were also presented to human participants in two experiments. PASS predicted that the joint presentation of associated words (like river and bridge) on a study list would lead participants to give higher frequency of occurrence estimates than the joint presentation of non-associated words (like river and fashion). Moreover, this effect should be independent of the presentation frequency of the words.
Finally, the frequencies of associated items should be discriminated slightly worse than the frequencies of non-associated items. These predictions were largely confirmed in the experiments reported in chapter 3. Thus, PASS can obviously provide a parsimonious account of both the acquisition of associations and judgments of frequencies that are based on memory assessment. This indicates that associative learning – as modelled in PASS – might be more than a plausible candidate for the memory mechanisms underlying frequency processing. Proponents of other memory models will have to show whether their models can also account for the demonstrated associative effects. Since exemplar models (such as MINERVA 2 (Hintzman, 1988) or MINERVA-DM (Dougherty et al., 1999)) cannot acquire associations between stimuli, it is doubtful that this class of models is suitable to achieve this. Although PASS successfully predicted the effects of association on frequency judgments, the experiments in chapter 3 nonetheless revealed several limitations of PASS. Two limitations pertain to the modelling of the memory assessment process: First, PASS does not yet integrate the effects of random error during the encoding of stimuli. As a consequence, PASS must necessarily overestimate participants’ ability to discriminate frequencies. Second, PASS does not yet allow for a differential treatment of the effects of the pre-experimental frequencies of co-occurrences of events and the effects of the pre-experimental frequencies of the events themselves. However, the present studies demonstrated that associations between words – which are, according to PASS, based on pre-experimental co-occurrences – can exert a quite considerable influence on frequency estimates. In contrast, earlier studies suggest that the pre-experimental, linguistic frequency of words hardly influences frequency judgments (Rao, 1983; Greene & Thapar, 1994).
Thus, a further development of PASS should aim at integrating mechanisms that can simultaneously account for these diverging effects. A last problem of PASS and all other memory models results from the fact that these models simulate intuitions about frequencies. However, the memory models lack a comprehensive description of how people transform intuitions about frequencies into absolute judgments of frequencies. The experiments in chapter 3 suggest that this shortcoming can restrict the ability of the models to predict absolute frequency estimates even if the memory assessment process is modelled correctly.

References

Alba, J. W., Chromiak, W., Hasher, L., & Attig, M. S. (1980). Automatic encoding of category size information. Journal of Experimental Psychology: Human Learning and Memory, 6, 370-378.
Anderson, J. R. (1996). Kognitive Psychologie. Heidelberg: Spektrum, Akademischer Verlag.
Attneave, F. (1953). Psychological probability as a function of experienced frequency. Journal of Experimental Psychology, 46, 81-86.
Barsalou, L. W., & Ross, B. H. (1986). The role of automatic and strategic processing in sensitivity to superordinate and property frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 116-134.
Bazerman, M. H. (2001). Judgment in managerial decision making. New York: John Wiley & Sons.
Bedon, B. G., & Howard, D. V. (1992). Memory for the frequency of occurrence of karate techniques: A comparison of experts and novices. Bulletin of the Psychonomic Society, 30 (2), 117-119.
Begg, I., Maxwell, D., Mitterer, J. O., & Harris, G. (1986). Estimates of frequency: Attribute or attribution? Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 496-508.
Betsch, T., & Pohl, D. (2002). Tversky and Kahneman’s availability approach to frequency judgment: A critical analysis. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 109-120). Oxford: University Press.
Betsch, T., & Sedlmeier, P. (2002). Frequency Processing and Cognition: Stock-Taking and Outlook. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 303-318). Oxford: University Press.
Betsch, T., Siebler, F., Marz, P., Hormuth, S., & Dickenberger, D. (1999). The moderating role of category salience and category focus in judgments of set size and frequency of occurrence. Personality and Social Psychology Bulletin, 25, 463-481.
Beyth-Marom, R., & Fischhoff, B. (1977). Direct measures of availability and judgments of category frequency. Bulletin of the Psychonomic Society, 9, 236-238.
Blair, E., & Burton, S. (1987). Cognitive processes used by survey respondents to answer behavioral frequency questions. Journal of Consumer Research, 14, 280-288.
Bornstein, R. F. (1989). Exposure and affect: Overview and meta-analysis of research, 1968-1987. Psychological Bulletin, 106 (2), 265-289.
Bousfield, W. A., & Sedgewick, C. H. (1944). An analysis of sequences of restricted associative responses. Journal of General Psychology, 30, 149-165.
Bower, G. H. (1994). A turning point in mathematical learning theory. Psychological Review, 101, 290-300.
Brooks, J. E. (1985). Judgments of category frequency. American Journal of Psychology, 98, 363-372.
Brown, N. R. (1995). Estimation strategies and the judgment of event frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1539-1553.
Brown, N. R. (1997). Context memory and the selection of frequency estimation strategies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 898-914.
Brown, N. R. (2002). Encoding, representing, and estimating event frequencies: A multiple strategy perspective. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 37-54). Oxford: University Press.
Brown, N. R., Buchanan, L., & Cabeza, R. (2000). Estimating the frequency of nonevents: The role of recollection failure in false recognition.
Psychonomic Bulletin and Review, 7, 684-691.
Brown, N. R., & Siegler, R. S. (1993). Metrics and mappings: A framework for understanding real-world quantitative estimation. Psychological Review, 100, 511-534.
Bruce, D., Hockley, W. E., & Craik, F. I. M. (1991). Availability and category frequency estimation. Memory and Cognition, 19, 301-312.
Burgess, C., & Livesay, K. (1998). The effect of corpus size in predicting reaction time in a basic word recognition task: Moving on from Kucera and Francis. Behavior Research Methods, Instruments, & Computers, 30, 272-277.
Cheng, P. W. (1997). From covariation to causation: A causal power theory. Psychological Review, 104, 367-405.
Cheng, P. W., & Novick, L. R. (1992). Covariation in natural causal induction. Psychological Review, 99, 365-382.
Cofer, C. N. (1965). On some factors in the organizational characteristics of free recall. American Psychologist, 20, 261-272.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159.
Conrad, F. G., & Brown, N. R. (1994). Strategies for estimating category frequency: Effects of abstractness and distinctiveness. American Statistical Association, Proceedings of the Section on Survey Methods Research (pp. 1345-1350). Alexandria, VA: American Statistical Association.
Conrad, F. G., Brown, N. R., & Cashman, E. (1998). Strategies for answering behavioral frequency questions. Memory, 6, 339-366.
Conrad, F. G., Brown, N. R., & Dashen, M. (2003). Estimating the frequency of events from unnatural categories. Memory and Cognition, 31, 552-562.
Deese, J. (1959). Influence of inter-item associative strength upon immediate free recall. Psychological Reports, 5, 305-312.
Diener, E., Sandvik, E., & Pavot, W. (1991). Happiness is the frequency, not the intensity, of positive versus negative affect. In F. Strack, M. Argyle, & N. Schwarz (Eds.), Subjective Well-Being. Oxford: Pergamon Press.
Dobbins, I. G., Kroll, N. E. A., Yonelinas, A. P., & Liu, Q. (1998).
Distinctiveness in recognition and free recall: The role of recollection in the rejection of the familiar. Journal of Memory and Language, 38, 381-400.
Dougherty, M. R. P., & Franco-Watkins, A. M. (2002). A memory models approach to frequency and probability judgement: Applications of Minerva 2 and Minerva DM. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 121-136). Oxford: University Press.
Dougherty, M. R. P., & Franco-Watkins, A. M. (2003). Reducing bias in frequency judgment by improving source monitoring. Acta Psychologica, 113, 23-44.
Dougherty, M. R. P., Gettys, C. F., & Ogden, E. E. (1999). MINERVA-DM: A memory process model for judgments of likelihood. Psychological Review, 106, 180-209.
Dunlap, W. P., Cortina, J. M., Vaslow, J. B., & Burke, M. J. (1996). Meta-analysis of experiments with matched groups or repeated measures designs. Psychological Methods, 2, 170-177.
Edgington, E. S. (1965). A type of spurious constant error in magnitude estimation. Perceptual and Motor Skills, 20, 199-202.
Eraker, S. A., & Politser, P. (1988). How decisions are reached: Physicians and the patient. In J. Dowie & A. S. Elstein (Eds.), Professional Judgment: A reader in clinical decision making (pp. 379-394). Cambridge: Cambridge University Press.
Erickson, J. R., & Gaffney, C. R. (1985). Effects of instructions, orienting task, and memory tests on memory for words and word frequency. Bulletin of the Psychonomic Society, 23, 377-380.
Erlick, D. E. (1964). Absolute judgments of discrete quantities randomly distributed over time. Journal of Experimental Psychology, 57, 475-482.
Estes, W. K. (1950). Toward a statistical theory of learning. Psychological Review, 57, 94-107.
Estes, W. K. (1976). Structural aspects of associative models of memory. In C. N. Cofer (Ed.), The Structure of Human Memory (pp. 31-53). San Francisco: Freeman.
Estes, W. K. (1986). Array models for category learning. Cognitive Psychology, 18, 500-549.
Fiedler, K.
(1983). On the testability of the availability heuristic. In R. W. Scholz (Ed.), Decision making under uncertainty (pp. 109-119). Amsterdam: Elsevier, North-Holland.
Fiedler, K. (1991). The tricky nature of skewed frequency tables: An information loss account of distinctiveness-based illusory correlations. Journal of Personality and Social Psychology, 60, 24-36.
Fiedler, K. (1996). Explaining and simulating judgment biases as an aggregation phenomenon in probabilistic, multiple-cue environments. Psychological Review, 103, 193-214.
Fischbein, E. (1975). The intuitive sources of probabilistic thinking in children. Dordrecht, The Netherlands: D. Reidel.
Fisk, A. D., & Schneider, W. (1984). Memory as a function of attention, level of processing, and automatization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 181-197.
Fiske, S. T., & Neuberg, S. L. (1990). A continuum of impression formation from category based to individuation: Influences of information and motivation on attention and interpretation. In M. Zanna (Ed.), Advances in experimental social psychology (Vol. 23, pp. 1-74). New York: Academic Press.
Fiske, S. T., & Taylor, S. E. (1991). Social Cognition. New York: McGraw-Hill.
Flexser, A. J., & Bower, G. H. (1975). Further evidence regarding instructional effects on frequency judgements. Bulletin of the Psychonomic Society, 6 (3), 321-324.
Freund, J. S., & Hasher, L. (1989). Judgments of category size: Now you have them, now you don't. American Journal of Psychology, 102, 333-352.
Gardiner, J. M. (1988). Functional aspects of recollective experience. Memory & Cognition, 16, 309-313.
Gigerenzer, G. (1991). How to make cognitive illusions disappear: Beyond ‘heuristics and biases’. In W. Stroebe, & M. Hewstone (Eds.), European review of social psychology (Vol. 2, pp. 83-115). New York: Wiley.
Gillund, G., & Shiffrin, R. M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.
Ginsburg, H., & Rapoport, A. (1967). Children’s estimates of proportions. Child Development, 38 (1), 205-212.
Greene, R. L. (1984). Incidental learning of event frequencies. Memory & Cognition, 12, 90-95.
Greene, R. L. (1986). Effects of intentionality and strategy on memory for frequency. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 489-495.
Greene, R. L. (1988). Generation effects in frequency judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 298-304.
Greene, R. L. (1989). On the relationship between categorical frequency estimation and cued recall. Memory & Cognition, 17, 235-239.
Greene, R. L., & Thapar, A. (1994). Mirror effect in frequency discrimination. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 946-952.
Grossberg, S. (1976). Adaptive pattern classification and universal recoding, I: Parallel development and coding of neural feature detectors. Biological Cybernetics, 23, 121-134.
Hamilton, D. L., & Gifford, R. K. (1976). Illusory correlation in interpersonal perception: A cognitive basis of stereotypic judgements. Journal of Experimental Social Psychology, 12, 392-407.
Hasher, L., & Chromiak, W. (1977). The processing of frequency information: An automatic mechanism? Journal of Verbal Learning and Verbal Behavior, 16, 173-184.
Hasher, L., & Zacks, R. T. (1979). Automatic and effortful processes in memory. Journal of Experimental Psychology: General, 108 (3), 356-388.
Hasher, L., & Zacks, R. T. (1984). Automatic processing of fundamental information: The case of frequency of occurrence. American Psychologist, 39, 1372-1388.
Hintzman, D. L. (1969). Apparent frequency as a function of frequency and the spacing of repetitions. Journal of Experimental Psychology, 80, 139-145.
Hintzman, D. L. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528-551.
Hintzman, D. L. (2001). Similarity, global matching, and judgments of frequency. Memory & Cognition, 29, 547-556.
Hintzman, D. L., & Block, R. A. (1971). Repetition and memory: Evidence for a multiple-trace hypothesis. Journal of Experimental Psychology, 88 (3), 297-306.
Hintzman, D. L., Curran, T., & Oppy, B. (1992). Effects of similarity and repetition on memory: Registration without learning? Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 667-680.
Hockley, W. E., & Cristi, C. (1996). Tests of the separate retrieval of item and associative information using a frequency judgment task. Memory & Cognition, 24, 796-811.
Howell, W. C. (1973). Storage of events and event frequencies: A comparison of two paradigms in memory. Journal of Experimental Psychology, 98, 260-263.
Huber, O. (1993). The development of the probability concept: Some reflections. Archives de Psychologie, 61, 187-195.
Indow, T., & Togano, K. (1970). On retrieving sequence from long-term memory. Psychological Review, 77, 317-331.
Johnson, M. K., Hashtroudi, S., & Lindsay, D. S. (1993). Source monitoring. Psychological Bulletin, 114, 3-28.
Johnson, M. K., Taylor, T. H., & Raye, C. L. (1977). Fact and Fantasy: The effects of internally generated events on the apparent frequency of externally generated events. Memory & Cognition, 5, 116-122.
Jonides, J., & Naveh-Benjamin, M. (1987). Estimating frequency of occurrence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 230-240.
Jonides, J., & Jones, C. (1992). Direct coding for frequency of occurrence. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 107-128.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.
Kelley, H. H. (1973). The process of causal attribution. American Psychologist, 28, 107-128.
Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge: Cambridge University Press.
Kucera, H., & Francis, W. N. (1967). Computational analysis of present-day American English. Providence: Brown University Press.
Kuzmak, S. D., & Gelman, R. (1986). Young children’s understanding of random phenomena. Child Development, 57, 559-566.
Leicht, K. L. (1968). Recall and judged frequency of implicitly occurring words. Journal of Verbal Learning and Verbal Behavior, 7, 918-923.
Lewandowsky, S., & Smith, P. W. (1983). The effect of increasing the memorability of category instances on estimates of category size. Memory & Cognition, 11, 347-350.
Lezius, W., Rapp, R., & Wettler, M. (1998). A freely available morphological analyser, disambiguator, and context sensitive lemmatizer for German. Proceedings of the COLING-ACL 1998 (pp. 743-747).
Lichtenstein, S., Slovic, P., Fischhoff, B., Layman, M., & Combs, B. (1978). Judged frequency of lethal events. Journal of Experimental Psychology: Human Learning and Memory, 4, 551-581.
Lopez, F. J., Shanks, D. R., Almaraz, J., & Fernandez, P. (1998). Effects of trial order on contingency judgments: A comparison of associative and probabilistic contrast accounts. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 672-694.
Maki, R. H., & Ostby, R. S. (1987). Effects of levels of processing and rehearsal on frequency judgments. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13, 151-163.
Manis, M., Shedler, J., Jonides, J., & Nelson, T. E. (1993). Availability heuristic in judgments of set-size and frequency of occurrence. Journal of Personality and Social Psychology, 65, 448-457.
Marx, M. H. (1985). Retrospective reports on frequency judgments. Bulletin of the Psychonomic Society, 23, 309-310.
Mayer, R. E. (1992). Thinking, Problem Solving, Cognition (2nd ed.). New York: Freeman.
Menon, G. (1993). The effects of accessibility of information in memory on judgments of behavioral frequencies. Journal of Consumer Research, 20, 431-440.
Menon, G., Raghubir, P., & Schwarz, N. (1995). Behavioral frequency judgments: An accessibility-diagnosticity framework. Journal of Consumer Research, 22, 212-228.
Messick, D. M., & Mackie, D. (1989). Intergroup relations. Annual Review of Psychology, 40, 45-81.
Moyer, R. S., & Dumais, S. T. (1978). Mental comparison. In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 12, pp. 117-155). New York: Academic Press.
Murdock, B. (1993). TODAM2: A model for the storage and retrieval of item, associative, and serial-order information. Psychological Review, 100, 183-203.
Murdock, B. (1999). Item and associative interactions in short-term memory: Multiple memory systems? International Journal of Psychology, 34, 427-433.
Naveh-Benjamin, M., & Jonides, J. (1986). On the automaticity of frequency coding: Effects of competing task load, encoding strategy, and intention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 378-386.
Peterson, C. R., & Beach, L. R. (1967). Man as an intuitive statistician. Psychological Bulletin, 68, 29-46.
Rao, K. V. (1983). Word frequency effect in situational frequency estimation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9, 79-81.
Reber, R., & Zupanek, N. (2002). Effects of processing fluency on estimates of probability and frequency. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 175-188). Oxford: University Press.
Reichardt, C. S., Shaughnessy, J. J., & Zimmerman, J. (1973). On the independence of judged frequencies for items presented in successive lists. Memory & Cognition, 1, 149-156.
Reyna, V. F., & Brainerd, C. J. (1994). The origins of probability judgment: A review of data and theories. In G. Wright, & P. Ayton (Eds.), Subjective probability (pp. 239-272). Chichester: Wiley.
Roediger, H. L., & McDermott, K. B. (1995). Creating false memories: Remembering words that were not presented in lists. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 803-814.
Rose, R. J., & Rowe, E. J. (1976). Effects of orienting task and spacing of repetitions on frequency judgments. Journal of Experimental Psychology: Human Learning and Memory, 2, 142-152.
Rosenthal, R., Rosnow, R. L., & Rubin, D. B. (2000). Contrasts and effect sizes in behavioral research: A correlational approach. Cambridge: University Press.
Rowe, E. J. (1974). Depth of processing in a frequency judgment task. Journal of Verbal Learning and Verbal Behavior, 13, 638-643.
Rumelhart, D. E., & Zipser, D. (1985). Feature discovery by competitive learning. Cognitive Science, 9, 75-112.
Russell, W. A. (1970). The complete German language norms for responses to 100 words from the Kent-Rosanoff word association test. In L. Postman, & G. Keppel (Eds.), Norms of word association (pp. 53-94). New York: Academic Press.
Russell, W. A., & Jenkins, J. J. (1954). The complete Minnesota norms for responses to 100 words from the Kent-Rosanoff Word Association Test. University of Minnesota.
Schaller, M., Asp, C. H., Rossell, M. C., & Heim, S. J. (1996). Training in statistical reasoning inhibits formation of erroneous group stereotypes. Personality and Social Psychology Bulletin, 22, 829-844.
Schaller, M., & Maass, A. (1989). Illusory correlations and social categorization: Toward an integration of motivational and cognitive factors in stereotype formation. Journal of Personality and Social Psychology, 56, 709-721.
Schimmack, U. (2002). Frequency judgments of emotions: The cognitive basis of personality assessment. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 189-204). Oxford: University Press.
Schwarz, N. (1999). Frequency reports of physical symptoms and health behaviors: How the questionnaire determines the results. In D. C. Park, R. W. Morrell, & K. Shifren (Eds.), Processing medical information in aging patients: Cognitive and human factors perspectives (pp. 93-108). Mahwah, NJ: Erlbaum.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., & Simons, A. (1991). Ease of retrieval as information: Another look at the availability heuristic. Journal of Personality and Social Psychology, 61, 195-202.
Schwarz, N., Hippler, H. J., Deutsch, B., & Strack, F. (1985). Response categories: Effects on behavioral reports and comparative judgments. Public Opinion Quarterly, 49, 388-395.
Schwarz, N., & Scheuring, B. (1992). Selbstberichtete Verhaltens- und Symptomhäufigkeiten: Was Befragte aus Antwortvorgaben des Fragebogens lernen. Zeitschrift für Klinische Psychologie, 22, 197-208.
Schwarz, N., & Wänke, M. (2002). Experiential and contextual heuristics in frequency judgement: Ease of recall and response scales. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 89-108). Oxford: University Press.
Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications. Mahwah: Lawrence Erlbaum Associates.
Sedlmeier, P. (2002). Associative learning and frequency judgement. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 137-152). Oxford: University Press.
Sedlmeier, P. (in press). Intuitive judgments about sample size. In K. Fiedler, & P. Juslin (Eds.), In the beginning there is a sample: Information sampling as a key to understand adaptive cognition. Cambridge: Cambridge University Press.
Sedlmeier, P., Betsch, T., & Renkewitz, F. (2002). Frequency processing and cognition: Introduction and overview. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 1-17). Oxford: University Press.
Sedlmeier, P., & Gigerenzer, G. (1997). Intuitions about sample size: The empirical law of large numbers? Journal of Behavioral Decision Making, 10, 33-51.
Sedlmeier, P., Hertwig, R., & Gigerenzer, G. (1998). Are judgments of the positional frequencies of letters systematically biased due to availability? Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 754-770.
Sedlmeier, P., & Renkewitz, F. (2004). Attentional processes and judgments about frequency: Does more attention lead to higher estimates? Manuscript in preparation.
Sedlmeier, P., Wettler, M., & Seidensticker, P. (2004). Why do we associate "woman" when we hear "man"? A computational model of associative responses based on co-occurrences of words in large text corpora. Manuscript in preparation.
Shanks, D. R. (1995). The psychology of associative learning. Cambridge: University Press.
Shanteau, J. (1978). When does a response error become a judgmental bias? Journal of Experimental Psychology: Human Learning and Memory, 4, 579-581.
Shapiro, B. J. (1969). The subjective estimation of relative word frequency. Journal of Verbal Learning and Verbal Behavior, 8, 248-251.
Shaughnessy, J. J., & Underwood, B. J. (1973). The retention of frequency information for categorized lists. Journal of Verbal Learning and Verbal Behavior, 12, 99-107.
Silver, N. C., & Dunlap, W. P. (1987). Averaging correlation coefficients: Should Fisher's Z transformation be used? Journal of Applied Psychology, 72, 146-148.
Slamecka, N. J. (1985). Ebbinghaus: Some associations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 414-435.
Smith, E., & Medin, D. (1981). Categories and Concepts. Cambridge, Mass.: Harvard University Press.
Sumby, W. H. (1963). Word frequency and the serial position effect. Journal of Verbal Learning and Verbal Behavior, 1, 443-450.
Taylor, S. E., Fiske, S. T., Etcoff, N. L., & Ruderman, A. J. (1978). Categorical and contextual bases of person memory and stereotyping. Journal of Personality and Social Psychology, 36, 778-793.
Tryk, H. E. (1968). Subjective scaling and word frequency. American Journal of Psychology, 81, 170-177.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 4, 207-232.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
Varey, C. A., Mellers, B. A., & Birnbaum, M. H. (1990). Judgments of proportions. Journal of Experimental Psychology: Human Perception and Performance, 16, 613-625.
Vereb, C. E., & Voss, J. F. (1974). Perceived frequency of implicit associative responses as a function of frequency of occurrence of list items. Journal of Experimental Psychology, 103, 992-998.
Wänke, M., Schwarz, N., & Bless, H. (1995). The availability heuristic revisited: Experienced ease of retrieval in mundane frequency judgments. Acta Psychologica, 89, 83-90.
Wasserman, E. A., & Miller, R. R. (1997). What's elementary about associative learning? Annual Review of Psychology, 48, 573-607.
Watkins, M. J., & LeCompte, D. (1991). Inadequacy of recall as a basis for frequency knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 1161-1176.
Weber, E. U., Bockenholt, U., Hilton, D. J., & Wallace, B. (1993). Determinants of diagnostic hypothesis generation: Effects of information, base rates, and experience. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1151-1164.
White, B. J. (1991). Availability heuristic and judgments of letter frequency. Perceptual and Motor Skills, 72, 34.
Williams, K. W., & Durso, F. T. (1986). Judging category frequency: Automaticity or availability. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 387-396.
Yates, J. F. (1990). Judgment and decision-making. Englewood Cliffs: Prentice-Hall.
Zacks, R. T., & Hasher, L. (2002). Frequency processing: A twenty-five year perspective. In P. Sedlmeier, & T. Betsch (Eds.), Etc.: Frequency processing and cognition (pp. 21-36). Oxford: University Press.
Zacks, R. T., Hasher, L., & Sanft, H. (1982). Automatic encoding of event frequency: Further findings. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8, 106-116.
Zajonc, R. B. (1968). Attitudinal effects of mere exposure. Journal of Personality and Social Psychology, Monograph Supplement, 9 (2, part 2), 1-27.
Zechmeister, E. B., King, J. F., Gude, C., & Opera-Nadi, B. (1975). Ratings of frequency, familiarity, orthographic distinctiveness and pronounceability for 192 surnames. Instrumentation and Psychophysics, 7, 531-533.