VISUAL COGNITION, 2006, 13 (7/8), 1012-1026

Chasing psycholinguistic effects: A cautionary tale

Michael B. Lewis
Cardiff University, UK

Many studies have addressed the issue of whether age of acquisition and/or frequency affect particular lexical tasks. Methods typically employed in such studies are based on the general linear model (e.g., ANOVA or multiple regression). These methods assume manipulated independent variables, whereas the usual approach to investigating age-of-acquisition and frequency effects uses estimated norms of word properties. This failure to truly manipulate variables violates the assumptions of the analyses. A simulation is provided that demonstrates how this violation can lead to erroneous conclusions of effects when none are present. Recommendations are made for a more correlational approach to analysis using structural equation modelling techniques. It is also discussed how this use of estimates of lexical data is problematic for determining effects throughout psycholinguistic research.

Since Carroll and White's (1973) study into age-of-acquisition (AoA) and frequency effects in a picture naming task, there has been considerable debate concerning which tasks show an AoA effect and which show a frequency effect. The presence or absence of these two effects has been the focus of an increasing number of pages in psychology journals. At different times, for different tasks and by different authors, a variety of positions have been taken. It has been suggested that frequency affects reaction times in some lexical tasks and AoA is of no consequence (e.g., Lewis, Gerhand, & Ellis, 2001; Oldfield & Wingfield, 1965; Zevin & Seidenberg, 2002). It has been suggested that AoA affects reaction times and frequency has no effect for tasks such as object-naming and lexical-decision tasks (e.g., Morrison, Ellis, & Quinlan, 1992; Turner, Valentine, & Ellis, 1998).
Another suggestion is that both frequency and AoA affect reaction times, either as additive factors (e.g., Carroll & White, 1973; Gerhand & Barry, 1998; Turner et al., 1998) or in an interactive manner (e.g., Barry, Morrison, & Ellis, 1997; Gerhand & Barry, 1999). Some studies have attempted to map out the way the size of AoA and frequency effects changes over different tasks (e.g., Ghyselinck, Lewis, & Brysbaert, 2004). Similar debates erupt whenever frequency effects or AoA effects are shown in a new task or methodology (e.g., semantic processing; Brysbaert, van Wijnendaele, & de Deyne, 2000): Is that frequency effect really an AoA effect? Does that task show a frequency effect as well as an AoA effect?

Please address all correspondence to: Michael B. Lewis, School of Psychology, Cardiff University, PO Box 901, Cardiff CF10 3YG, UK. http://www.psypress.com/viscog © 2006 Psychology Press Ltd. DOI: 10.1080/13506280544000174

The main aim of these studies into AoA, and the related issue of frequency, is to establish whether, in a particular task in a particular language, the time to complete that task for a word is affected by AoA and/or frequency. Words have a variety of properties such as frequency, AoA, length, initial phoneme, and many others (some of which are matters of linguistic fact, i.e., that a word is a noun) that might affect reaction times. Some of these are directly measurable (e.g., word length), whereas others have to be estimated. Word frequency and AoA cannot be absolutely known and so estimates are generated. Word frequency estimates or norms can be generated by counting the number of times words occur in a corpus of words. Good agreement is found between estimates based on different corpora or based on ratings of word frequency of a language (e.g., r = .557; see Morrison, Chappell, & Ellis, 1997) and so it is likely that these estimates are accurate.
Even word frequency norms based on the largest of corpora, however, will only be an estimate of actual word frequency. AoA estimates or norms have been generated by asking people to say when they thought that they learnt a word (e.g., Gilhooly & Logie, 1980). Other AoA norms have been generated by finding the proportion of a sample of children who can read a particular word at a particular age (Morrison et al., 1997). These two methods for generating AoA norms show good agreement and so are probably quite accurate (e.g., r = .747; see Morrison et al., 1997).

The properties of words may well be related to each other for a variety of causal reasons. For example, we may tend to learn shorter words before being taught longer words (causation due to schooling) and it may be the case that commonly used items develop shorter names (e.g., "telephone" to "phone", causation due to evolution of language). Probably due to these causative factors, many of the estimates of properties of words that are of interest to us correlate strongly with each other. Importantly, for the question in hand, word frequency might very well correlate with AoA. This correlation might occur because we start using words earlier if they are heard more often, or because more frequent words may be taught to us at a younger age. This correlation between the word properties cannot be measured directly, but evidence for it comes from the fact that the word frequency norms correlate with the AoA norms (depending on which databases are used, the correlation can be as high as r = .616; see Morrison et al., 1997). Such correlations between estimates of different properties can be almost as large as the correlations between different estimates of the same property.

This intercorrelation between the properties of words is problematic for understanding direct effects of these properties on task performance.
If one wishes to establish that, say, AoA affects lexical decision speed then it is necessary to consider its contribution independently of other factors that may also affect speed. This is difficult if AoA is correlated with frequency. Researchers have typically adopted one of four different methods in order to isolate AoA effects. These different methods have been used because of differing beliefs about which one is best able to uncover causal relationships. These four methods are each briefly described below before considering which, if any, is best for determining the presence of psycholinguistic effects.

EXPERIMENTAL METHODS EMPLOYED TO INVESTIGATE AOA EFFECTS

Semifactorial designs

In these studies, two experiments are conducted, each with two lists of words. In one experiment the lists will differ in their average AoA values while all other considered parameters (usually including frequency) are matched. In the other experiment the lists will differ in their average frequency values while all other considered parameters (usually including AoA) are matched. Such experiments, it is supposed, can demonstrate the presence of independent AoA effects and frequency effects. An accepted limitation of this method is that it cannot reveal any potential interactions. An example of this method is the set of experiments conducted by Turner, Valentine, and Ellis (1998) into the effects of AoA and frequency in auditory and visual lexical-decision tasks.

Factorial designs

These studies also employ four lists of words but this time they are employed within the same experiment. The lists are arranged such that two lists have high frequency and two have low frequency, and also two lists are early acquired and two lists are late acquired. It is also common to match the lists on any other parameters that may affect reaction times.
This methodology is preferred by many researchers because it follows the principles of a traditional experimental design with full manipulation of the independent variables. The difficulty with this type of design is in generating the lists so that they are well matched. This method allows for consideration of interactions but ignores much of the wealth of information contained in word-norm data by converting continuous scales into dichotomous scales (semifactorial designs also suffer from this problem). Examples of this type of experiment are those conducted by Gerhand and Barry (1998) into the effects of AoA and frequency in word-reading tasks.

Multiple regression designs (simultaneous not stepwise)

These studies typically involve a single list of words chosen to be representative of the set of words in question. No selection process based on matching word properties is required and so the list is easy to obtain. The analysis is performed by considering how much of the variability in the dependent variable can be accounted for by a particular independent variable (e.g., AoA) once all other independent variables (usually including frequency) have been partialled out (i.e., their contribution to the accounting for the variability of the dependent variable has been removed). One problem with this type of analysis is that the correlation between AoA and frequency remains within the stimuli. This colinearity can swamp the independent effects of AoA and frequency, leading to nonsignificant results (hence, it can be conservative in revealing effects). The advantages of this method are that: it uses the real values of the word-norm estimates (not just a dichotomous split); it allows any number of additional predictors to be included without requiring matching; and if there really is an independent effect of AoA then it should overcome the colinearity problem.
This method has been used to investigate the effects of AoA and other factors in object naming in young (e.g., Barry et al., 1997) and older people (e.g., Hodgson & Ellis, 1998).

Stepwise regression designs

Stepwise regression has been employed as a way to bypass the colinearity problem of simultaneous multiple regression. This method selects the best predictors to include in the regression model. Any colinearity between factors inside the model and those outside the model is "given" to those inside the model. Morris (1981) has demonstrated previously that this method has dangerous consequences and so it has rarely been employed since. It is worth noting, however, that Morris' criticisms only apply to stepwise regression and not simultaneous multiple regression. Prior to Morris' discourse, this method had been used by Carroll and White (1973), and Gilhooly and Gilhooly (1979) to investigate AoA and frequency effects. In the light of Morris' work and because this method is no longer employed, stepwise regression will not be considered further here.

A SIMULATION TO SHOW THE PROBLEMS WITH MULTIPLE REGRESSION DESIGN

All of the methods described above are based on the same mathematical equations, that is, the General Linear Model. They have the same underlying assumptions about the data that are being considered. Unfortunately, when considering the effects of AoA and frequency on lexical tasks, at least one of these assumptions is violated (in fact more than one is violated, but the issues of curvilinearity and normality have been dealt with elsewhere; Lewis et al., 2001). The important assumption considered here is that in order to infer effects we have to manipulate our independent variables and have experimental control over them. In psycholinguistic experiments, however, we do not manipulate the AoA of a word or its frequency.
Instead, we may choose a word because of its AoA and its frequency for inclusion in an experiment but that is not the same thing as manipulating it. This is not strictly an experimental design: it is pseudoexperimental. All the analysis can do is to reveal correlational structures. This lack of actual manipulation is particularly damaging because our independent variables are themselves experimental estimates of the actual properties. The consequences of this can be demonstrated using a simulation of what might be happening and how this can lead to correlational structures or effects that are not present in the data for the actual values.

Figure 1. The hypothetical situation used for all the simulations. Actual frequency affects actual AoA and both of these are hidden from the experimenter. Actual AoA (only) affects reaction times. The word norms are affected by the hidden properties.

Let us assume, hypothetically, that what is really going on is as follows. First, the frequency with which we see a word affects the age at which it is learnt. Also, the age at which a word is learnt affects the time taken in a reaction time task (e.g., lexical decision). In this hypothetical simulation, frequency of the word does not directly affect reaction times: it only has an effect mediated by AoA. Now frequency is not the sole influence on AoA and so there is also some unexplained variation. Also, AoA is not the sole influence on reaction time and so there is some unexplained variation. The situation can be drawn as in Figure 1. We can measure the reaction times to words but we cannot have direct access to the actual frequency of the words or their actual AoA (these are hidden properties). Instead, we have the estimated word norms that we must assume are affected by the actual word values but have unexplained variation within them as well.

This artificial situation can easily be modelled using randomly generated data.
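The hypothetical situation in Figure 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the paper (which refers to packages such as SPSS or StatView): the function names `simulate_words` and `pearson_r`, the dictionary keys, and the seed are invented here, all noise terms are given unit variance, and the AoA norm is assumed to be built in the same way as the frequency norm (actual value plus independent noise).

```python
import random
from statistics import mean


def simulate_words(n_words=200, seed=1):
    """Generate hidden properties, reaction times and norms for n_words items.

    Assumption: five independent standard-normal columns, combined as in the
    hypothetical model (frequency -> AoA -> reaction time, norms = actual + noise).
    """
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n_words)] for _ in range(5)]
    actual_freq = cols[0]
    # Negative sign: more frequent words are acquired earlier.
    actual_aoa = [c - f for c, f in zip(cols[1], actual_freq)]
    rt = [a + c for a, c in zip(actual_aoa, cols[2])]
    norm_freq = [f + c for f, c in zip(actual_freq, cols[3])]
    # Assumed to mirror the frequency norm: actual AoA plus noise.
    norm_aoa = [a + c for a, c in zip(actual_aoa, cols[4])]
    return {
        "actual_freq": actual_freq,
        "actual_aoa": actual_aoa,
        "rt": rt,
        "norm_freq": norm_freq,
        "norm_aoa": norm_aoa,
    }


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    words = simulate_words()
    print("r(freq norm, AoA norm) =",
          round(pearson_r(words["norm_freq"], words["norm_aoa"]), 3))
    print("r(freq norm, RT)       =",
          round(pearson_r(words["norm_freq"], words["rt"]), 3))
```

Under these assumptions the frequency norm correlates with reaction time even though actual frequency never enters the reaction-time equation directly; the correlation is carried entirely by actual AoA.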
This was done using normal distributions to generate a set of items that represent the word frequencies for 200 words. In the hypothetical situation that we are using, word frequency affects AoA. To illustrate this we can add another normally distributed set of data to the negative frequency scores (negative because there is a negative correlation between AoA and frequency). These will be the actual AoA scores as generated by actual frequency values plus some random unexplained variance. Next, reaction times are generated by adding random data to the AoA data (because AoA affects reaction time together with unexplained variation). The frequency norms and the AoA norms can be generated by adding random variation to the actual values. The appendix describes how this hypothetical situation was generated using random data sets.

In an experiment, we typically wish to find out which factors (e.g., frequency or AoA) affect the outcome (e.g., reaction times). We do not know the actual values for these and so we use the estimates available from the word norms. A multiple regression analysis, therefore, would explore whether the AoA norms affect reaction times once any variation that can be accounted for by the frequency norms has been removed, and whether the frequency norms affect reaction times once any variation that can be accounted for by the AoA norms has been removed. In the current simulation, this regression led to an overall R squared value of .459 with both AoA, t(197) = 9.099, p < .001, and frequency, t(197) = 4.753, p < .001, being significant independent factors. The conclusion from this analysis, therefore, would be that both AoA and frequency have independent effects on reaction times.

In the current simulation we know that the conclusion drawn from the multiple regression is fallacious. We know from the way that the data were generated that only AoA has a direct effect on reaction times and any effect of frequency is completely mediated by AoA.
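The spurious frequency coefficient can also be derived analytically rather than by simulation. The worked calculation below is a sketch under assumptions that go slightly beyond the text: every random column has unit variance, and the AoA norm is formed as actual AoA plus independent noise, in the same way as the frequency norm. The symbols F, A, RT, N_F, N_A and e_1 to e_5 are notation introduced here for actual frequency, actual AoA, reaction time, the two norms, and the five noise columns.

```latex
% Assumed generating model (unit-variance, independent e_1, ..., e_5):
%   F = e_1,  A = e_2 - F,  RT = A + e_3,  N_F = F + e_4,  N_A = A + e_5
\begin{align*}
\operatorname{Var}(N_A) &= 3, &
\operatorname{Var}(N_F) &= 2, &
\operatorname{Cov}(N_A, N_F) &= -1, \\
\operatorname{Cov}(RT, N_A) &= 2, &
\operatorname{Cov}(RT, N_F) &= -1, &
\operatorname{Var}(RT) &= 3 .
\end{align*}
% Population regression of RT on the two norms:
\[
\begin{pmatrix} \beta_{N_A} \\ \beta_{N_F} \end{pmatrix}
= \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}^{-1}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \frac{1}{5}
  \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 0.6 \\ -0.2 \end{pmatrix},
\qquad
R^2 = \frac{(0.6)(2) + (-0.2)(-1)}{3} \approx .467 .
\]
% The same calculation with the hidden values A and F as predictors:
\[
\begin{pmatrix} \beta_{A} \\ \beta_{F} \end{pmatrix}
= \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}^{-1}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\qquad
R^2 = \frac{2}{3} \approx .667 .
\]
```

On these assumptions the frequency norm receives a nonzero population coefficient (-0.2) purely because the AoA norm is a noisy stand-in for actual AoA, whereas actual frequency receives a coefficient of exactly zero once actual AoA is in the model.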
The conclusion of effects from this multiple regression is flawed. To reinforce the fact that only AoA has a direct effect on reaction times, we can conduct the multiple regression using the actual values for frequency and AoA (i.e., the hidden properties). We can do this for the current artificial simulation but this would be impossible in reality. This analysis gives an R squared value of .665 with AoA, t(197) = 12.262, p < .001, being a significant predictor and frequency, t(197) = 0.486, p > .5, being nonsignificant. That is, only actual AoA affects reaction times, a correct conclusion.

In this simulation, the multiple regression using the word norms suggests a very different underlying model than was actually used to generate the data. It can suggest that an effect is present even when it is not. This simple demonstration suggests that multiple regression can give uninformative results when applied to psycholinguistic data derived from estimates.

The multiple regression is problematic in the case described above because it is designed for experimental manipulations. The regression analysis assumes that the predictors are fixed or experimentally manipulated variables, in which case the use of the term "effect" would be appropriate. What we have here, and in the psycholinguistic literature, are predictors that are themselves random variables and so we can only discuss the results in terms of multiple correlations. The word "cat" is not read quickly because it is judged to be learnt early or because it occurs X number of times in a particular corpus; the word "cat" is read quickly because of some (hidden) property of its mental representation. This mental property correlates with frequency and/or AoA norms. It is appropriate, therefore, to say that AoA norms and frequency norms correlate with reaction times but not that these norms (or whatever these norms might represent) affect reaction times.
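The contrast between the two regressions can be reproduced with a short, self-contained Python sketch. It is an illustration rather than the analysis actually run for the paper: the helper names (`simulate_words`, `invert`, `ols`) and the seed are invented here, all noise is unit-variance normal, and the AoA norm is assumed to be actual AoA plus independent noise. Because the data are random, the statistics will differ from those reported above, and with only 200 items the frequency norm will not reach significance in every run.

```python
import random


def simulate_words(n_words=200, seed=1):
    """Return (actual_freq, actual_aoa, rt, norm_freq, norm_aoa) as lists."""
    rng = random.Random(seed)
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n_words)] for _ in range(5)]
    actual_freq = cols[0]
    actual_aoa = [c - f for c, f in zip(cols[1], actual_freq)]
    rt = [a + c for a, c in zip(actual_aoa, cols[2])]
    norm_freq = [f + c for f, c in zip(actual_freq, cols[3])]
    norm_aoa = [a + c for a, c in zip(actual_aoa, cols[4])]  # assumed form
    return actual_freq, actual_aoa, rt, norm_freq, norm_aoa


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, predictors):
    """Simultaneous regression of y on the predictors (intercept added).

    Returns (coefficients, t_values, r_squared); index 0 is the intercept.
    """
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance
    t_values = [b / (sigma2 * inv[a][a]) ** 0.5 for a, b in enumerate(beta)]
    return beta, t_values, 1.0 - sse / sst


if __name__ == "__main__":
    freq, aoa, rt, norm_freq, norm_aoa = simulate_words()
    for label, preds in (("norms ", [norm_aoa, norm_freq]),
                         ("actual", [aoa, freq])):
        beta, t, r2 = ols(rt, preds)
        print(label, "R^2 = %.3f  t(AoA) = %.2f  t(freq) = %.2f"
              % (r2, t[1], t[2]))
```

Increasing `n_words` makes the pattern more stable: the coefficient for the frequency norm settles at a clearly negative value, while the coefficient for actual frequency settles near zero.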
A SIMULATION TO SHOW THE PROBLEMS WITH FACTORIAL DESIGNS

It has been suggested that because of the problem described above, and other reasons, factorial designs are better than multiple regression designs. In factorial designs, word lists are selected such that they manipulate frequency and AoA in unambiguous ways. It could be argued that the difference between the actual values of the property and the estimates of the property will not affect this type of analysis. Factorial designs, therefore, have been used as supposedly truly experimental designs whereas multiple regressions are more correlational or pseudoexperimental in nature. It can be demonstrated, however, that this argument is false for reasons very similar to those for the multiple regression. This demonstration uses the same set of data as the multiple regression demonstration above but applies a factorial experiment design. As before, we have frequency affecting AoA, which in turn affects reaction times. Also, frequency affects frequency norms and AoA affects the AoA norms.

Figure 2. Data from the simulation using a factorial design. The four lists of items were well matched on frequency and AoA norms. AoA and frequency appear to affect the reaction time data (high frequency and early acquired items produce lower reaction times). The four lists, however, are not matched on the actual frequency and AoA values.

From the original 200 items, four lists of 28 items each were selected. The criteria for selection of the items were that they generated sets that varied on their average AoA norms whilst being matched on their average frequency norms and vice versa. The left-hand side of Figure 2 shows how well the four sets were matched on frequency norms and AoA norms. Clearly, the match is very good.
The procedure used to generate the four lists was similar to that which would be employed by an experimenter selecting words for inclusion in a factorial AoA and frequency experiment. The reaction times for the items in the four sets were analysed using a standard two-by-two factorial by-item ANOVA. This ANOVA found a significant effect of AoA, F(1, 108) = 34.491, p < .001, and frequency, F(1, 108) = 10.135, p < .001. The interaction was not significant, F(1, 108) = 0.197, p > .5. Using this factorial design, therefore, one would conclude that both frequency and AoA affect the reaction times. Again, as we know that the actual model that generated the data does not have frequency affecting reaction times, this result means the conclusions drawn from the factorial analysis are incorrect.

The ANOVA gets it wrong for the same reason that the multiple regression got it wrong (after all, they are based on the same mathematics). While the selection of the items for the four lists appears to be in the experimenter's hands, in fact, it is based on estimates. Because of the correlation between the actual AoA and frequency data, the experimenter will have to be strategic when choosing items that reduce this correlation. Unfortunately, selection of items that reduce this correlation will exaggerate the incorrect part of the norms (i.e., that part which is spurious variation rather than being related to the actual values). The higher the correlation between AoA and frequency, the harder it becomes to select items that are matched on one factor but vary on the other. If the two actual word properties are completely correlated then the selection will be based entirely on the spurious variations in the norms. The lower the correlation between the actual values, the less influence unexplained variance will have.
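The way matching on norms fails to match on actual values can be illustrated with a simplified two-list version of the selection. This sketch is not the four-list procedure used for the simulation reported here: the function names (`simulate_words`, `matched_lists`, `list_means`), the tercile split, the pairing rule, and the tolerance of 0.1 are all assumptions chosen for illustration, and the AoA norm is assumed to be actual AoA plus independent unit-variance noise.

```python
import random
from statistics import mean


def simulate_words(n_words=200, seed=1):
    """Simulate items under the hypothetical model (frequency -> AoA -> RT)."""
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        freq = rng.gauss(0.0, 1.0)
        aoa = rng.gauss(0.0, 1.0) - freq
        words.append({
            "freq": freq,
            "aoa": aoa,
            "rt": aoa + rng.gauss(0.0, 1.0),
            "norm_freq": freq + rng.gauss(0.0, 1.0),
            "norm_aoa": aoa + rng.gauss(0.0, 1.0),  # assumed form
        })
    return words


def matched_lists(words, tolerance=0.1):
    """Build low- and high-frequency-norm lists pairwise matched on AoA norm.

    Candidates are the bottom and top thirds on the frequency norm; a low and
    a high item are paired only if their AoA norms differ by <= tolerance.
    """
    ranked = sorted(words, key=lambda w: w["norm_freq"])
    third = max(1, len(ranked) // 3)
    low = sorted(ranked[:third], key=lambda w: w["norm_aoa"])
    high = sorted(ranked[-third:], key=lambda w: w["norm_aoa"])
    low_list, high_list = [], []
    i = j = 0
    while i < len(low) and j < len(high):
        gap = high[j]["norm_aoa"] - low[i]["norm_aoa"]
        if abs(gap) <= tolerance:
            low_list.append(low[i])
            high_list.append(high[j])
            i += 1
            j += 1
        elif gap > 0:
            i += 1  # this low item is too early to match any remaining high item
        else:
            j += 1  # this high item is too early to match any remaining low item
    return low_list, high_list


def list_means(items):
    """Mean of every property across a list of items."""
    return {key: mean(w[key] for w in items) for key in items[0]}


if __name__ == "__main__":
    low, high = matched_lists(simulate_words(n_words=2000))
    low_m, high_m = list_means(low), list_means(high)
    for key in ("norm_freq", "norm_aoa", "aoa", "rt"):
        print("%-9s low = %6.3f  high = %6.3f" % (key, low_m[key], high_m[key]))
```

In runs of this sketch the two lists agree closely on the AoA norm by construction, yet the high-frequency-norm list tends to have a lower mean actual AoA, and with it lower reaction times, which is the pattern the argument above predicts.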
The net result of any correlation between actual AoA and frequency values is that when the lists are matched, they are being matched to a large degree on the spurious variation rather than the actual values. While two sets may have the same average AoA norm, the average actual AoA for one may be quite different than for the other. The right-hand side of Figure 2 illustrates the actual values of the four lists in the current simulation. These lists were almost perfectly matched on the word norms but their matching on the actual values would worry any experimentalist. This failure to match on actual values comes from the correlation of the actual values together with the use of estimates of these values. These two properties are present in the lexical data. It is this failure to match on actual values that leads to the erroneous conclusions in the ANOVA above.

In this way, the factorial design is no more experimental than the multiple regression design. As such, this type of pseudoexperimental analysis of psycholinguistic data cannot reveal effects but merely the presence of correlations (and if all we can get are correlations then the multiple regression, or rather multiple correlation, would be a better method to employ).

A SIMULATION TO SHOW THE PROBLEMS WITH SEMIFACTORIAL DESIGNS

For the sake of completeness, it can be demonstrated that the semifactorial design has the same failing. Using the same model and the same data as in the simulations above, two lists of 45 items each were chosen. These lists were chosen such that they were matched on their average AoA norm but differed on their frequency norm (see Figure 3). An unpaired t-test was conducted on the reaction times for the two lists. This found a significant difference, t(88) = 2.691, p < .001. The conclusion from this analysis would be that frequency affects reaction times when AoA is controlled for. Once again, this conclusion is incorrect given the model from which the data were actually generated. The reason for the incorrect conclusion is again the fact that estimates are being used in an analytical design that makes the assumption of a manipulation.

Figure 3. Data from the simulation using a semifactorial design. The two lists of items were well matched on AoA norms. Frequency appears to affect the reaction time data (high-frequency items produce lower reaction times). The two lists, however, are not matched on the actual AoA values.

WHERE DOES THIS LEAVE PSYCHOLINGUISTIC RESEARCH?

From the discussion so far, it might appear that only AoA and not frequency affects lexical tasks (the simulation shows that such a situation is consistent with the empirical evidence). This is not, however, what is intended. In fact, the same discussion could be had with the two terms swapped, so that it is frequency and not AoA that affects reaction times. In this case the pattern of results would still be that AoA and frequency affect reaction times but this time it is the AoA result that is fallacious. Indeed, the same argument applies to any set of correlated values.

The point is that the tools that we have been employing are not capable of answering the questions we wish to ask of the lexical data we have available to us. Neither factorial designs nor multiple regression techniques can provide evidence of causal relationships if we have not truly manipulated our independent variables. Mere selection from a list (albeit a long list of words) is not in itself an experimental manipulation. Virtually all studies into the effects of properties of words on lexical processing (including those exploring AoA effects) use these pseudoexperimental methods to draw conclusions that particular properties affect the reaction times for a particular task. The conclusions being drawn concern, for example, age-of-acquisition effects, whereas all that has really been established are correlations.
The problem highlighted here is not restricted to the psycholinguistic domain. Studies into face recognition using naturally occurring personalities (e.g., Lewis, 1999; Lewis, Chadwick, & Ellis, 2002; Moore & Valentine, 1998) also make the mistake of treating factors that we have no real control over (e.g., when a person becomes famous) as manipulated independent variables. The problem in this domain, however, is not as severe as in the lexical domain because AoA and frequency are not likely to be as highly correlated as they are for words (there is no particular reason why those faces learnt at a young age will be seen more often), but nevertheless the problem is still present.

One way to completely remove the problem is to generate a completely new set of stimuli on which to train the participants. Obviously such a procedure would have its own ethical problems if the AoA term were to be explored over a range covering several years. Some previous AoA studies have involved doing this but in a limited domain: Exploration of artificial neural network models typically involves the controlled delivery of arbitrary patterns during learning. In these, the experimenter really does have full control over the independent variables and so multiple regression or factorial designs would be a sound method of analysis. As Ellis and Lambon Ralph (2000) have demonstrated, these neural networks do show AoA and frequency effects and these are real experimental effects rather than just correlations. It can be argued that it is only these experiments, those using artificial neural networks, that show real AoA effects. These studies are experiments because variables have really been manipulated by the researcher and the outcome of these manipulations can be interpreted as real effects.

NONEXPERIMENTAL PSYCHOLINGUISTIC ANALYSIS

There are many possible factors involved in any psycholinguistic performance.
Some of the most commonly considered factors are AoA, frequency, imageability, concreteness, word length, familiarity, and so on, although there are many more that could be considered. These all correlate with each other to lesser or greater degrees and they also correlate with outcome measures such as reaction times in lexical decision tasks or word reading tasks. Language is a social construction that has evolved over a long period of time. This fact means that there may be many relationships caused by its history, such as the shortening of words that are used more often. Also, there is the possibility of causal relationships based on cognitive factors (such as the familiarization of words that are seen more often). The simulation and discussion above have demonstrated that the research that has been conducted so far is not able to distinguish between effects and mere correlations. The question remains, can we use these correlations to build and test some model of causality and can such models tell us more than the simple correlations do?

Structural equation modelling has been used effectively in other areas of psychology to use correlations to build up models of causality. This method explores the correlations between factors given a hypothetical set of causal relationships. This set of relationships can then be contrasted against other hypotheses and evaluated. Further, some forms of structural equation modelling allow for the consideration of latent variables. These are factors that cannot be measured directly (e.g., intelligence or depression). If a person has depression then this will show in a number of directly measurable ways (e.g., loss of appetite, self-loathing, or tiredness). When studying depression, we need to study these measurable factors in order to find out about the latent variable of depression.
In psycholinguistics, we are interested in word representations as affected by word frequency, AoA, or other factors. Obviously, we cannot measure directly the strength of a word representation (it is latent) but we can use reaction times to get a measure of it. Similarly, we cannot measure the age at which someone learns a word but we can have a number of variables that are likely to be related to it quite closely (e.g., AoA estimates from adults or age of reading scores from children). Only by considering models of language which incorporate latent variables (e.g., word representation strength, actual word frequency, actual age of acquisition) will it be possible to evaluate causation. The existing psycholinguistic data provide a great variety of different information about measurable indicators or factors (e.g., several different word frequency counts or estimates). These data can be brought together to compare and contrast the variety of different models of causation that have been suggested.

CONCLUSIONS

Above, the methods for exploring the effects of frequency and AoA on lexical tasks were reviewed. These methods assume that the experimenter freely manipulates the independent variables. In fact, psycholinguistic experiments often use estimates of the actual variables and these variables are themselves correlated. A simulation of a hypothetical situation was described in which only one property (of two correlated properties) affected reaction times. It was shown that, when using estimates of the properties, the two properties were found to "affect" reaction times. This erroneous conclusion would be made from multiple regression or factorial design experiments. It is suggested that the same types of erroneous conclusions occur in real psycholinguistic research employing these techniques. It is suggested that a structural equation modelling procedure might avoid these problems.
The case that has been made here is that there is a problem with the way the effects of age of acquisition and frequency are analysed in tasks involving word processing. This focus reflects the scope of this special issue of the journal. The same argument can apply to any situation where one is trying to infer causality from correlations between estimated properties and performance measures. Indeed, much of our understanding of lexical processing is based on the experimental techniques critiqued here being applied to normative estimate data and reaction times. A consequence of the analysis reported here is that there are no studies that demonstrate age-of-acquisition effects for psycholinguistic tasks. The argument, however, can be taken further to say that no study has demonstrated a psycholinguistic effect based on word properties. Reported effects as robust as word frequency effects or word length effects may be just as fallacious as the effects revealed in the simulations above.

REFERENCES

Barry, C., Morrison, C. M., & Ellis, A. W. (1997). Naming the Snodgrass and Vanderwart pictures: Effects of age of acquisition, frequency and name agreement. Quarterly Journal of Experimental Psychology, 50A, 560-585.
Brysbaert, M., van Wijnendaele, I., & de Deyne, S. (2000). Age-of-acquisition effects in semantic processing tasks. Acta Psychologica, 104, 215-226.
Carroll, J. B., & White, M. N. (1973). Word frequency and age of acquisition as determiners of picture naming latencies. Quarterly Journal of Experimental Psychology, 12, 85-95.
Ellis, A. W., & Lambon Ralph, M. A. (2000). Age of acquisition effects in adult lexical processing reflect loss of plasticity in maturing systems: Insights from connectionist networks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(5), 1103-1123.
Gerhand, S., & Barry, C. (1998). Word frequency effects in oral reading are not merely age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 267-283.
Gerhand, S., & Barry, C. (1999). Age of acquisition, frequency and the role of phonology in the lexical decision task. Memory and Cognition, 27, 592-602.
Ghyselinck, M., Lewis, M. B., & Brysbaert, M. (2004). Age of acquisition and the cumulative-frequency hypothesis: A review of the literature and a new multi-task investigation. Acta Psychologica, 115, 43-67.
Gilhooly, K. J., & Gilhooly, M. L. (1979). Age-of-acquisition effects in lexical and episodic memory tasks. Memory and Cognition, 7, 214-223.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity and ambiguity measures for 1944 words. Behavior Research Methods and Instrumentation, 12, 395-427.
Hodgson, C., & Ellis, A. W. (1998). Last in, first to go: Age of acquisition and naming in the elderly. Brain and Language, 64, 146-163.
Lewis, M. B. (1999). Age of acquisition in face categorisation: Is there an instance-based account? Cognition, 71, B23-B39.
Lewis, M. B., Chadwick, A. J., & Ellis, H. D. (2002). Exploring a neural network account of age-of-acquisition effects using repetition priming of faces. Memory and Cognition, 30, 1229-1237.
Lewis, M. B., Gerhand, S., & Ellis, H. D. (2001). Re-evaluating age of acquisition effects: Are they simply cumulative-frequency effects? Cognition, 75, 1-17.
Moore, V., & Valentine, T. (1998). The effect of age of acquisition on speed and accuracy of naming famous faces. Quarterly Journal of Experimental Psychology, 51A, 485-513.
Morris, P. E. (1981). Age of acquisition, imagery, recall, and the limitations of multiple-regression analysis. Memory and Cognition, 9, 277-282.
Morrison, C. M., Chappell, T. D., & Ellis, A. W. (1997). Age of acquisition norms for a large set of object names and their relation to adult estimates and other variables. Quarterly Journal of Experimental Psychology, 50A, 528-559.
Morrison, C. M., Ellis, A. W., & Quinlan, P. T. (1992). Age of acquisition, not word frequency, affects object naming, not object recognition. Memory and Cognition, 20, 705-714.
Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal of Experimental Psychology, 17, 273-281.
Turner, J. E., Valentine, V., & Ellis, A. W. (1998). Contrasting effects of AoA and word frequency on auditory and lexical decision. Memory and Cognition, 26, 740-753.
Zevin, J. D., & Siedenberg, M. S. (2002). Age of acquisition effects in word reading and other tasks. Journal of Memory and Language, 47, 1-29.

APPENDIX

Steps required to generate the simulation using statistical software such as SPSS or StatView

1. Generate five columns of normally distributed random data. Each column should be independent but have the same mean and standard deviation (e.g., 0.0 and 1.0 respectively). All columns should be the same length (e.g., 200 rows).
2. Label the first column "Actual Frequency". This is a property of the 200 words that is not affected by any other term (as such it is all unexplained variance).
3. Generate a new column called "Actual AoA" as "Column 2" + "Actual Frequency". This represents the actual AoA values of the words as affected by things not measured (unexplained variance) and by the frequency of the words (Actual Frequency).
4. Generate a new column called "Reaction time" as "Actual AoA" + "Column 3". This represents the reaction time for each of the words as affected by things not measured (unexplained variance) and by the actual AoA of the words.
5. Generate a new column called "Norm Frequency" as "Actual Frequency" + "Column 4". This represents the experimentally generated word frequency norms for the words, predicted by the real word frequency but also affected by measurement error (unexplained variance).
6. Generate a new column called "Norm AoA" as "Actual AoA" + "Column 5". This represents the experimentally generated word AoA norms for the words, predicted by the real word AoA but also affected by measurement error (unexplained variance).

The multiple regression that models those typically employed in real analyses is "Reaction time" predicted by "Norm Frequency" and "Norm AoA". This type of analysis of the simulation data suggests that frequency affects reaction times even when AoA effects have been removed. This conclusion is false: in the generated data, frequency influences reaction time only through actual AoA.

To conduct an ANOVA, choose four lists of items that vary norm AoA and norm frequency factorially. The analysis is likely to show an effect of frequency independent of AoA. This is because, typically, the actual AoA and actual frequency values are not well matched across the four lists. A semifactorial analysis can also be conducted using a similar list selection method. This analysis too will show an effect of frequency independent of AoA, an effect that is not true of the data as they were generated.
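
The steps of the appendix can also be sketched in code. What follows is a minimal Python sketch, not the original SPSS or StatView procedure. It assumes NumPy is available and that each generated column is a plain unweighted sum of its inputs. The function names `simulate` and `regress`, the random seed, and the dictionary keys are illustrative choices and not part of the original procedure.

```python
import numpy as np


def simulate(n_words=200, seed=0):
    """Generate the appendix columns (assumption: columns combine by simple addition)."""
    rng = np.random.default_rng(seed)
    # Step 1: five independent standard-normal columns of equal length.
    cols = rng.standard_normal((n_words, 5))
    # Step 2: actual frequency is pure unexplained variance.
    actual_freq = cols[:, 0]
    # Step 3: actual AoA depends on actual frequency plus unexplained variance.
    actual_aoa = cols[:, 1] + actual_freq
    # Step 4: reaction time depends ONLY on actual AoA plus unexplained variance.
    reaction_time = actual_aoa + cols[:, 2]
    # Steps 5-6: the norms are the actual values plus measurement error.
    norm_freq = actual_freq + cols[:, 3]
    norm_aoa = actual_aoa + cols[:, 4]
    return {
        "actual_freq": actual_freq,
        "actual_aoa": actual_aoa,
        "reaction_time": reaction_time,
        "norm_freq": norm_freq,
        "norm_aoa": norm_aoa,
    }


def regress(y, *predictors):
    """Ordinary least squares with an intercept; returns the slopes only."""
    design = np.column_stack([np.ones_like(y), *predictors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs[1:]


data = simulate(n_words=200, seed=0)
# Regression as typically run in practice: reaction time on the two norms.
print("slopes on norms  (freq, AoA):",
      regress(data["reaction_time"], data["norm_freq"], data["norm_aoa"]))
# Regression on the (normally unobservable) actual values, for comparison.
print("slopes on actual (freq, AoA):",
      regress(data["reaction_time"], data["actual_freq"], data["actual_aoa"]))
```

Under these assumptions, the population slopes for the regression on the norms work out to 0.2 for norm frequency and 0.6 for norm AoA, whereas the regression on the actual values gives 0 for frequency and 1 for AoA. A sample of 200 words will scatter around these values, so individual runs differ with the seed.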