Chasing psycholinguistic effects: A cautionary tale

VISUAL COGNITION, 2006, 13 (7/8), 1012–1026
Michael B. Lewis
Cardiff University, UK
Many studies have addressed the issue of whether age of acquisition and/or
frequency affect particular lexical tasks. Methods typically employed in such studies
are based on the general linear model (e.g., ANOVA or multiple regression). These
methods assume manipulated independent variables whereas the usual approach of
investigating age-of-acquisition and frequency effects uses estimated norms of word
properties. This failure to truly manipulate variables violates the assumptions of the
analyses. A simulation is provided that demonstrates how this violation can lead to
erroneous conclusions of effects when none are present. Recommendations are
made for a more correlational approach to analysis using structural equation
modelling techniques. It is also discussed how this use of estimates of lexical data is
problematic for determining effects throughout psycholinguistic research.
Since Carroll and White’s (1973) study into age-of-acquisition (AoA) and
frequency effects in a picture naming task, there has been considerable
debate concerning which tasks show an AoA effect and which show a
frequency effect. The presence or absence of these two effects has been the
focus of an increasing number of pages in psychology journals. At different
times, for different tasks and by different authors, a variety of positions have
been taken. It has been suggested that frequency affects reaction times in
some lexical tasks and AoA is of no consequence (e.g., Lewis, Gerhand, &
Ellis, 2001; Oldfield & Wingfield, 1965; Zevin & Seidenberg, 2002). It has
been suggested that AoA affects reaction times and frequency has no effect
for tasks such as object-naming and lexical-decision tasks (e.g., Morrison,
Ellis, & Quinlan, 1992; Turner, Valentine, & Ellis, 1998). Another suggestion
is that both frequency and AoA affect reaction times and this can be either
as additive factors (e.g., Carroll & White, 1973; Gerhand & Barry, 1998;
Turner et al., 1998) or in an interactive manner (e.g., Barry, Morrison, &
Ellis, 1997; Gerhand & Barry, 1999). Some studies have attempted to map
out the way the size of AoA and frequency effects change over different tasks
(e.g., Ghyselinck, Lewis, & Brysbaert, 2004). Similar debates erupt whenever
frequency effects or AoA effects are shown in a new task or methodology
(e.g., semantic processing; Brysbaert, van Wijnendaele, & de Deyne, 2000):
Is that frequency effect really an AoA effect? Does that task show a
frequency effect as well as an AoA effect?
Please address all correspondence to: Michael B. Lewis, School of Psychology, Cardiff
University, PO Box 901, Cardiff CF10 3YG, UK. Email: [email protected]
http://www.psypress.com/viscog
© 2006 Psychology Press Ltd
DOI: 10.1080/13506280544000174
The main aim of these studies into AoA, and the related issue of
frequency, is to establish whether, in a particular task in a particular
language, the time to complete that task for a word is affected by AoA and/
or frequency. Words have a variety of properties such as frequency, AoA,
length, initial phoneme, and many others (some of which are matters of
linguistic fact, i.e., that a word is a noun) that might affect reaction times.
Some of these are directly measurable (e.g., word length), whereas others
have to be estimated. Word frequency and AoA cannot be absolutely known
and so estimates are generated. Word frequency estimates or norms can be
generated by counting the number of times words occur in a corpus of
words. Good agreement is found between estimates based on different
corpora or based on ratings of word frequency of a language (e.g.,
r = .557; see Morrison, Chappell, & Ellis, 1997) and so it is likely that
these estimates are accurate. Even word frequency norms based on the
largest of corpora, however, will only be an estimate of actual word
frequency. AoA estimates or norms have been generated by asking people to
say when they thought that they learnt a word (e.g., Gilhooly & Logie, 1980).
Other AoA norms have been generated by finding the proportion of a
sample of children who can read a particular word at a particular age
(Morrison et al., 1997). These two methods for generating AoA norms show good
agreement and so are probably quite accurate (e.g., r = .747; see Morrison
et al., 1997).
The properties of words may well be related to each other for a variety of
causal reasons. For example, we may tend to learn shorter words before
being taught longer words (causation due to schooling) and it may be the
case that commonly used items develop shorter names (e.g., ‘‘telephone’’ to
‘‘phone’’, causation due to evolution of language). Probably due to these
causative factors, many of the estimates of properties of words that are of
interest to us correlate strongly with each other. Importantly, for the
question in hand, word frequency might very well correlate with AoA. This
correlation might occur because we start using words earlier if they are heard
more often, or more frequent words may be taught to us at a younger age.
This correlation between the word properties cannot be measured directly
but evidence for it comes from the fact that the word frequency norms
correlate with the AoA norms (depending on which databases are used the
correlation can be as high as r = .616; see Morrison et al., 1997). Such
correlation between estimates of different properties can be almost as large
as the correlations between different estimates of the same property.
This intercorrelation between the properties of words is problematic for
understanding direct effects of these properties on task performance. If one
wishes to establish that, say, AoA affects lexical decision speed then it is
necessary to consider its contribution independently of other factors that
may also affect speed. This is difficult if AoA is correlated with frequency.
Researchers have typically adopted one of four different methods in order to
isolate AoA effects. These different methods have been used because of
differing beliefs about which one is best able to uncover causal relationships.
These four methods are each briefly described below before considering
which, if any, is best for determining the presence of psycholinguistic effects.
EXPERIMENTAL METHODS EMPLOYED TO INVESTIGATE AOA EFFECTS
Semifactorial designs
In these studies, two experiments are conducted each with two lists of words.
In one experiment the lists will differ in their average AoA values while all
other considered parameters (usually including frequency) are matched. In
the other experiment the lists will differ in their average frequency values
while all other considered parameters (usually including AoA) are matched.
Such experiments, it is supposed, can demonstrate the presence of
independent AoA effects and frequency effects. An accepted limitation of
this method is that it cannot reveal any potential interactions. An example of
this method is the set of experiments conducted by Turner, Valentine, and
Ellis (1998) into the effects of AoA and frequency in auditory and visual
lexical-decision tasks.
Factorial designs
These studies also employ four lists of words but this time they are employed
within the same experiment. The lists are arranged such that two lists have
high frequency and two have low frequency, and also two lists are early
acquired and two lists are late acquired. It is also common to match the lists
on any other parameters that may affect reaction times. This methodology is
preferred by many researchers because it follows the principles of a
traditional experimental design with full manipulation of the independent
variables. The difficulty with this type of design is in generating the lists so
that they are well matched. This method allows for consideration of
interactions but ignores much of the wealth of information contained in
word-norm data by converting continuous scales into dichotomous scales
(semifactorial designs also suffer from this problem). Examples of this type
of experiment are those conducted by Gerhand and Barry (1998) into the
effects of AoA and frequency in word-reading tasks.
Multiple regression designs (simultaneous not stepwise)
These studies typically involve a single list of words chosen to be
representative of the set of words in question. No selection process based
on matching word properties is required and so the list is easy to obtain. The
analysis is performed by considering how much of the variability in the
dependent variable can be accounted for by a particular independent
variable (e.g., AoA) once all other independent variables (usually including
frequency) have been partialled out (i.e., their contribution to the accounting
for the variability of the dependent variable has been removed). One problem
with this type of analysis is that the correlation between AoA and frequency
remains within the stimuli. This colinearity can swamp the independent
effects of AoA and frequency leading to nonsignificant results (hence, it can
be conservative in revealing effects). The advantages of this method are that
it uses the real values of the word-norm estimates (not just a dichotomous
split); it allows any number of additional predictors to be included without
requiring matching; and if there really is an independent effect of AoA then
it should overcome the colinearity problem. This method has been used to
investigate the effects of AoA and other factors in object naming in young
(e.g., Barry et al., 1997) and older people (e.g., Hodgson & Ellis, 1998).
Stepwise regression designs
Stepwise regression has been employed as a way to bypass the colinearity
problem of simultaneous multiple regression. This method selects the best
predictors to include in the regression model. Any colinearity between
factors inside the model and those outside the model is ‘‘given’’ to those
inside the model. Morris (1981) has demonstrated previously that this
method has dangerous consequences and so has rarely been employed since.
It is worth noting, however, that Morris’ criticisms only apply to stepwise
regression and not simultaneous multiple regression. Prior to Morris’
discourse, this method had been used by Carroll and White (1973), and
Gilhooly and Gilhooly (1979) to investigate AoA and frequency effects. In
the light of Morris’ work and because this method is no longer employed,
stepwise regression will not be considered further here.
A SIMULATION TO SHOW THE PROBLEMS WITH MULTIPLE REGRESSION DESIGN
All of the methods described above are based on the same mathematical
equations, namely those of the General Linear Model. They have the same underlying assumptions about the data that are being considered. Unfortunately,
when considering the effects of AoA and frequency on lexical tasks, at least
one of these assumptions is violated (in fact more than one is violated but
the issues of curvilinearity and normality have been dealt with elsewhere;
see Lewis et al., 2001). The important assumption considered here is that in
order to infer effects we have to manipulate our independent variables and
have experimental control over them. In psycholinguistic experiments,
however, we do not manipulate the AoA of a word or its frequency. Instead,
we may choose a word because of its AoA and its frequency for inclusion in
an experiment but that is not the same thing as manipulating it. This is not
strictly an experimental design; it is pseudoexperimental. All the analysis
can do is to reveal correlational structures. This lack of actual manipulation
is particularly damaging because our independent variables are themselves
experimental estimates of the actual properties. The consequences of this can
be demonstrated using a simulation of what might be happening and how
this can lead to correlational structures or effects that are not present in the
data for the actual values.
Figure 1. The hypothetical situation used for all the simulations. Actual frequency affects actual
AoA and both of these are hidden from the experimenter. Actual AoA (only) affects reaction times.
The word norms are affected by the hidden properties.
Let us assume, hypothetically, that what is really going on is as follows.
First, the frequency with which we see a word affects the age at which it is
learnt. Also, the age at which a word is learnt affects the time taken in a
reaction time task (e.g., lexical decision). In this hypothetical simulation,
frequency of the word does not directly affect reaction times; it only has an
effect mediated by AoA. Now frequency is not the sole influence on AoA
and so there is also some unexplained variation. Also, AoA is not the sole
influence on reaction time and so there is some unexplained variation. The
situation can be drawn as in Figure 1. We can measure the reaction times to
words but we cannot have direct access to the actual frequency of the words
or their actual AoA (these are hidden properties). Instead, we have the
estimated word norms that we must assume are affected by the actual word
values but have unexplained variation within them as well.
This artificial situation can easily be modelled using randomly generated
data. This was done using normal distributions to generate a set of items
that represent the word frequencies for 200 words. From the hypothetical
situation that we are using, word frequency affects AoA. To illustrate this we
can add another normally distributed set of data to the negative frequency
scores (negative because there is a negative correlation between AoA and
frequency). These will be the actual AoA scores as generated by actual
frequency values plus some random unexplained variance. Next, reaction
times are then generated by adding random data to the AoA data (because
AoA affects reaction time together with unexplained variation). The
frequency norms and the AoA norms can be generated by adding random
variation to the actual values. The appendix describes how this hypothetical
situation was generated using random data sets.
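The generation steps described above can be sketched in a few lines of code. This is a minimal sketch and not the procedure given in the appendix: the function name `simulate_words`, the use of NumPy, the unit path weights, the standard-normal noise at every step, and the seed are all assumptions made here for illustration.

```python
import numpy as np

def simulate_words(n_words=200, noise_sd=1.0, seed=0):
    """Generate hidden word properties, reaction times, and noisy norms.

    Assumed for illustration: standard-normal draws and unit path weights.
    """
    rng = np.random.default_rng(seed)
    # Hidden property 1: actual frequency.
    freq = rng.normal(size=n_words)
    # Hidden property 2: actual AoA = negative frequency + unexplained variation.
    aoa = -freq + rng.normal(size=n_words)
    # Reaction time depends on actual AoA only, plus unexplained variation.
    rt = aoa + rng.normal(size=n_words)
    # Observable norms: hidden values plus estimation error.
    freq_norm = freq + noise_sd * rng.normal(size=n_words)
    aoa_norm = aoa + noise_sd * rng.normal(size=n_words)
    return {"freq": freq, "aoa": aoa, "rt": rt,
            "freq_norm": freq_norm, "aoa_norm": aoa_norm}

if __name__ == "__main__":
    words = simulate_words()
    # Correlation between the two sets of norms (negative by construction).
    print(np.corrcoef(words["freq_norm"], words["aoa_norm"])[0, 1])
```

Only `rt`, `freq_norm`, and `aoa_norm` would be available to an experimenter; `freq` and `aoa` play the role of the hidden properties.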
In an experiment, we typically wish to find out which factors (e.g.,
frequency or AoA) affect the outcome (e.g., reaction times). We do not know
the actual values for these and so we use the estimates available from the word
norms. A multiple regression analysis, therefore, would explore whether the
AoA norms affect reaction times once any variation that can be accounted
for by the frequency norms has been removed and whether the frequency
norms affect reaction times once any variation that can be accounted for by
AoA norms has been removed. In the current simulation, this regression led to
an overall R squared value of .459 with both AoA, t(197) = 9.099, p < .001,
and frequency, t(197) = 4.753, p < .001, being significant independent
factors. The conclusion from this analysis, therefore, would be that both
AoA and frequency have independent effects on reaction times.
In the current simulation we know that the conclusion drawn from the
multiple regression is fallacious. We know from the way that the data were
generated that only AoA has a direct effect on reaction times and any effect
of frequency is completely mediated by AoA. The conclusion of effects from
this multiple regression is flawed. To reinforce the fact that only AoA has a
direct effect on reaction times, we can conduct the multiple regression using
the actual values for frequency and AoA (i.e., the hidden properties). We can
do this for the current artificial simulation but this would be impossible in
reality. This analysis gives an R squared value of .665 with AoA, t(197) = 12.262, p < .001, being a significant predictor and frequency, t(197) = 0.486,
p > .5, being nonsignificant. That is, only actual AoA affects reaction
times, a correct conclusion.
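The contrast between the two regressions can be reproduced in outline with the sketch below. It is an illustration under assumptions, not the analysis reported here: the data are regenerated with unit-variance noise and unit path weights, and `simulate_words` and `regress` are hypothetical helper names. With only 200 words the size of the spurious frequency coefficient varies from seed to seed, so the statistics it prints will not match the values reported above.

```python
import numpy as np

def simulate_words(n_words=200, seed=0):
    """Hidden values, reaction times, and noisy norms (unit-variance noise assumed)."""
    rng = np.random.default_rng(seed)
    freq = rng.normal(size=n_words)               # actual frequency (hidden)
    aoa = -freq + rng.normal(size=n_words)        # actual AoA (hidden)
    rt = aoa + rng.normal(size=n_words)           # only actual AoA drives RT
    freq_norm = freq + rng.normal(size=n_words)   # noisy frequency norm
    aoa_norm = aoa + rng.normal(size=n_words)     # noisy AoA norm
    return freq, aoa, rt, freq_norm, aoa_norm

def regress(y, *predictors):
    """Simultaneous OLS; returns (t statistics of the predictors, R squared)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]                     # 197 for 200 words, 2 predictors
    sigma2 = resid @ resid / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return (beta / se)[1:], r2                    # drop the intercept's t value

if __name__ == "__main__":
    freq, aoa, rt, freq_norm, aoa_norm = simulate_words()
    for label, a, f in [("norms", aoa_norm, freq_norm), ("actual", aoa, freq)]:
        (t_aoa, t_freq), r2 = regress(rt, a, f)
        print(f"{label}: R2={r2:.3f}  t(AoA)={t_aoa:.2f}  t(frequency)={t_freq:.2f}")
```

In this sketch a negative t value for frequency means that higher frequency goes with shorter reaction times. Increasing `n_words` makes the difference between the two analyses easier to see, because the frequency coefficient settles near zero for the actual values but not for the norms.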
In this simulation, the multiple regression using the word norms suggests
a very different underlying model than was actually used to generate the
data. It can suggest that an effect is present even when it is not. This simple
demonstration suggests that multiple regression can give uninformative
results when applied to psycholinguistic data derived from estimates.
The reason why the multiple regression is problematic in the case
described above is that it is designed for experimental manipulations.
The regression analysis assumes that the predictors are fixed or experimentally manipulated variables in which case the use of the term ‘‘effect’’ would
be appropriate. What we have here, and in the psycholinguistic literature, are
predictors that are themselves random variables and so we can only discuss
the results in terms of multiple correlations.
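Why the norm-based regression gives weight to frequency can be seen from the population regression coefficients. The derivation below is a sketch under assumptions that are not stated in the text: every random term is independent with unit variance and every path weight is plus or minus one. The symbols are introduced here for illustration: $F$ and $A$ are actual frequency and AoA, and $\tilde F$ and $\tilde A$ are the corresponding norms.

```latex
\begin{align*}
F &\sim N(0,1), & A &= -F + e_A, & RT &= A + e_{RT}, \\
\tilde F &= F + u_F, & \tilde A &= A + u_A, &
e_A,\ e_{RT},\ u_F,\ u_A &\sim N(0,1)\ \text{independent}.
\end{align*}

% Regression of RT on the actual (hidden) values:
\[
\begin{pmatrix}\beta_A \\ \beta_F\end{pmatrix}
= \begin{pmatrix} \operatorname{Var}(A) & \operatorname{Cov}(A,F) \\
                  \operatorname{Cov}(A,F) & \operatorname{Var}(F) \end{pmatrix}^{-1}
  \begin{pmatrix} \operatorname{Cov}(RT,A) \\ \operatorname{Cov}(RT,F) \end{pmatrix}
= \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}^{-1}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\]

% Regression of RT on the norms:
\[
\begin{pmatrix}\beta_{\tilde A} \\ \beta_{\tilde F}\end{pmatrix}
= \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}^{-1}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \frac{1}{5}\begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}
  \begin{pmatrix} 2 \\ -1 \end{pmatrix}
= \begin{pmatrix} 0.6 \\ -0.2 \end{pmatrix}.
\]
```

Under these assumed variances, the error in the AoA norm leaves part of the AoA contribution unaccounted for, and the frequency norm, which is correlated with actual AoA, picks up the remainder. The frequency coefficient is therefore nonzero even though frequency has no direct path to reaction time.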
The word ‘‘cat’’ is not read quickly because it is judged to be learnt early
or because it occurs X number of times in a particular corpus; the word
‘‘cat’’ is read quickly because of some (hidden) property of its mental
representation. This mental property correlates with frequency and/or AoA
norms. It is appropriate, therefore, to say that AoA norms and frequency
norms correlate with reaction times but not that these norms (or whatever
these norms might represent) affect reaction times.
A SIMULATION TO SHOW THE PROBLEMS WITH FACTORIAL DESIGNS
It has been suggested that because of the problem described above, and other
reasons, factorial designs are better than multiple regression designs. In
factorial designs, word lists are selected such that they manipulate frequency
and AoA in unambiguous ways. It could be argued that the difference
between the actual values of the property and the estimates of the property
will not affect this type of analysis. Factorial designs, therefore, have been
used as supposedly truly experimental designs whereas multiple regressions
are more correlational or pseudoexperimental in nature. It can be demonstrated, however, that this argument is false for reasons very similar to those
for the multiple regression. This demonstration uses the same set of data as
the multiple regression demonstration above but applies a factorial experimental design.
Figure 2. Data from the simulation using a factorial design. The four lists of items were well
matched on frequency and AoA norms. AoA and frequency appear to affect the reaction time data
(high frequency and early acquired items produce lower reaction times). The four lists, however, are
not matched on the actual frequency and AoA values.
As before, we have frequency affecting AoA, which in turn affects
reaction times. Also, frequency affects frequency norms and AoA affects the
AoA norms. From the original 200 items, four lists of 28 items each were
selected. The criteria for selection of the items were that they generated sets
that varied on their average AoA norms whilst being matched on their
average frequency norms and vice versa. The left-hand side of Figure 2
shows how well the four sets were matched on frequency norms and AoA
norms. Clearly, the match is very good. The procedure used to generate the
four lists was similar to that which would be employed by an experimenter
selecting words for inclusion in a factorial AoA and frequency experiment.
The reaction times for the items in the four sets were analysed using a
standard two-by-two factorial by-item ANOVA. This ANOVA found a
significant effect of AoA, F(1, 108) = 34.491, p < .001, and frequency, F(1,
108) = 10.135, p < .001. The interaction was not significant, F(1, 108) = 0.197, p > .5. Using this factorial design, therefore, one would conclude
that both frequency and AoA affect the reaction times. Again, because we know
that the model that generated the data does not have frequency affecting
reaction times, the conclusions drawn from the factorial analysis are
incorrect.
The reason why the ANOVA gets it so wrong is the same as why the
multiple regression got it wrong (after all, they are based on the same
mathematics). While the selection of the items for the four lists appears to be
in the experimenter’s hands, in fact, it is based on estimates. Because of the
correlation between the actual AoA and frequency data, the experimenter
will have to be strategic when choosing items that reduce this correlation.
Unfortunately, selection of items that reduce this correlation will exaggerate
the incorrect part of the norms (i.e., that part which is spurious variation
rather than being related to the actual values). The higher the correlation
between AoA and frequency, the harder it becomes to select items that are
matched on one factor but vary on the other. If the two actual word
properties are completely correlated then the selection will be based entirely
on the spurious variations in the norms. The lower the correlation between
the actual values, the less influence unexplained variance will have. The net
result of any correlation between actual AoA and frequency values is that
when the lists are matched, they are being matched to a large degree on the
spurious variation rather than the actual values. While two sets may have the
same average AoA norm the average actual AoA for one may be quite
different than for the other. The right-hand side of Figure 2 illustrates the
actual values of the four lists in the current simulation. These lists were
almost perfectly matched on the word norms but their matching on the
actual values would worry any experimentalist. This failure to match on
actual values comes from the correlation of the actual values together with
the use of estimates of these values. These two properties are present in the
lexical data.
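The mismatch between matching on norms and matching on actual values can be illustrated with a sketch of a two-by-two selection. It rests on several assumptions that differ from the simulation reported here: the data are regenerated with unit-variance noise, a large pool of items is used so that cell means are stable, and the four lists are formed from fixed bands on the norms rather than by hand-matching 28 items per list. The names `simulate_words` and `factorial_cells` are hypothetical.

```python
import numpy as np

def simulate_words(n_words, seed=0):
    """Hidden values, reaction times, and noisy norms (unit-variance noise assumed)."""
    rng = np.random.default_rng(seed)
    freq = rng.normal(size=n_words)               # actual frequency (hidden)
    aoa = -freq + rng.normal(size=n_words)        # actual AoA (hidden)
    rt = aoa + rng.normal(size=n_words)           # only actual AoA drives RT
    freq_norm = freq + rng.normal(size=n_words)   # noisy frequency norm
    aoa_norm = aoa + rng.normal(size=n_words)     # noisy AoA norm
    return freq, aoa, rt, freq_norm, aoa_norm

def factorial_cells(n_words=20000, seed=0, lo=0.5, hi=1.5):
    """Cell means for a 2 x 2 design whose lists are selected on the norms only."""
    freq, aoa, rt, freq_norm, aoa_norm = simulate_words(n_words, seed)
    freq_bands = {"low_freq": (-hi, -lo), "high_freq": (lo, hi)}
    aoa_bands = {"early": (-hi, -lo), "late": (lo, hi)}
    cells = {}
    for f_name, (f_min, f_max) in freq_bands.items():
        for a_name, (a_min, a_max) in aoa_bands.items():
            # Selection uses the norms, as an experimenter would have to.
            mask = ((freq_norm > f_min) & (freq_norm < f_max) &
                    (aoa_norm > a_min) & (aoa_norm < a_max))
            cells[(f_name, a_name)] = {
                "n": int(mask.sum()),
                "freq_norm": freq_norm[mask].mean(),
                "aoa_norm": aoa_norm[mask].mean(),
                "freq_actual": freq[mask].mean(),
                "aoa_actual": aoa[mask].mean(),
                "rt": rt[mask].mean(),
            }
    return cells

if __name__ == "__main__":
    for name, cell in factorial_cells().items():
        summary = "  ".join(f"{k}={v:.2f}" for k, v in cell.items() if k != "n")
        print(name, f"n={cell['n']}", summary)
```

Comparing the `aoa_norm` and `aoa_actual` entries for the high- and low-frequency lists at the same AoA level shows the pattern described above: the lists agree closely on the norm yet differ on the actual value, and the reaction time means follow the actual value.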
It is this failure to match on actual values that leads to the erroneous
conclusions in the ANOVA above. In this way, the factorial design is no more
experimental than the multiple regression design. As such, this type of pseudoexperimental analysis of psycholinguistic data cannot reveal effects but
merely the presence of correlations (and if all we can get are correlations
then the multiple regression, or rather multiple correlation, would be a
better method to employ).
A SIMULATION TO SHOW THE PROBLEMS WITH SEMIFACTORIAL DESIGNS
For the sake of completeness, it can be demonstrated that the semifactorial
design has the same failing. Using the same model and the same data as in
the simulations above, two lists of 45 items each were chosen. These lists
were chosen such that they were matched on their average AoA norm but
differed on their frequency norm (see Figure 3). An unpaired t-test was
conducted on the reaction times for the two lists. This found a significant
difference, t(88) = 2.691, p < .001. The conclusion from this analysis would
be that frequency affects reaction times when AoA is controlled for. Once
again, this conclusion is incorrect given the model from which the data were
actually generated. The reason for the incorrect conclusion is
again the fact that estimates are being used in an analytical design that
makes the assumption of a manipulation.
Figure 3. Data from the simulation using a semifactorial design. The two lists of items were well
matched on AoA norms. Frequency appears to affect the reaction time data (high-frequency items produce
lower reaction times). The two lists, however, are not matched on the actual AoA values.
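A semifactorial comparison of this kind can be sketched as follows. The selection rule is a simplification assumed here and is not the procedure used to pick the two lists of 45 items: items are kept if their AoA norm lies within a narrow band around the median, and the retained items are split at the median frequency norm. The data are regenerated with unit-variance noise, and `simulate_words` and `semifactorial` are hypothetical names.

```python
import numpy as np

def simulate_words(n_words=200, seed=0):
    """Hidden values, reaction times, and noisy norms (unit-variance noise assumed)."""
    rng = np.random.default_rng(seed)
    freq = rng.normal(size=n_words)               # actual frequency (hidden)
    aoa = -freq + rng.normal(size=n_words)        # actual AoA (hidden)
    rt = aoa + rng.normal(size=n_words)           # only actual AoA drives RT
    freq_norm = freq + rng.normal(size=n_words)   # noisy frequency norm
    aoa_norm = aoa + rng.normal(size=n_words)     # noisy AoA norm
    return freq, aoa, rt, freq_norm, aoa_norm

def semifactorial(n_words=200, seed=0, band=0.5):
    """Compare high- and low-frequency-norm lists drawn from one AoA-norm band."""
    freq, aoa, rt, freq_norm, aoa_norm = simulate_words(n_words, seed)
    # Keep items whose AoA norm lies within `band` of the median, so both
    # lists come from the same narrow range of AoA norms.
    keep = np.abs(aoa_norm - np.median(aoa_norm)) < band
    cut = np.median(freq_norm[keep])
    high = keep & (freq_norm > cut)     # high-frequency-norm list
    low = keep & (freq_norm <= cut)     # low-frequency-norm list
    n1, n2 = int(high.sum()), int(low.sum())
    # Pooled-variance (unpaired Student) t statistic for the RT difference.
    pooled = ((n1 - 1) * rt[high].var(ddof=1) +
              (n2 - 1) * rt[low].var(ddof=1)) / (n1 + n2 - 2)
    t_rt = (rt[high].mean() - rt[low].mean()) / np.sqrt(pooled * (1 / n1 + 1 / n2))
    return {
        "n_high": n1,
        "n_low": n2,
        "aoa_norm_diff": aoa_norm[high].mean() - aoa_norm[low].mean(),
        "aoa_actual_diff": aoa[high].mean() - aoa[low].mean(),
        "t_rt": t_rt,
    }

if __name__ == "__main__":
    for key, value in semifactorial().items():
        print(key, round(float(value), 3))
```

The quantity to watch is `aoa_actual_diff`: the two lists are close on the AoA norm by construction, but they can still differ on actual AoA, and it is that difference which produces the apparent frequency effect on reaction times.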
WHERE DOES THIS LEAVE PSYCHOLINGUISTIC RESEARCH?
From the discussion so far, it might appear that only AoA and not frequency
affects lexical tasks (the simulation shows that such a situation is consistent
with the empirical evidence). This is not, however, what is intended. In fact, the
above discussion can be made swapping the two terms around so that it is
frequency and not AoA that affects reaction times. In this case the pattern of
results would still be that AoA and frequency affect reaction times but this
time it is the AoA result that is fallacious. Indeed, the same argument applies
to any set of correlated values. The point is that the tools that we have been
employing are not capable of answering the questions we wish to ask of the
lexical data we have available to us. Neither factorial designs nor multiple
regression techniques can provide evidence of causal relationships if we have
not truly manipulated our independent variables. Mere selection from a list
(albeit a long list of words) is not in itself an experimental manipulation.
Virtually all studies into the effects of properties of words on lexical
processing (including those exploring AoA effects) use these pseudoexperimental methods to draw conclusions that particular properties affect the
reaction times for a particular task. The conclusions being drawn concern,
for example, age-of-acquisition effects, whereas all that has really been
established are correlations.
The problem highlighted here is not restricted to the psycholinguistic
domain. Studies into face recognition using naturally occurring personalities
(e.g., Lewis, 1999; Lewis, Chadwick, & Ellis, 2002; Moore & Valentine, 1998)
also make the mistake of treating factors that we have no real control over
(e.g., when a person became famous) as a manipulated independent variable.
The problem in this domain, however, is not as severe as in the lexical domain
because AoA and frequency are not likely to be as highly correlated as they are
for words (there is no particular reason why those faces learnt at a young age
will be seen more often), but nevertheless the problem is still present.
One way to completely remove the problem is to generate a completely new
set of stimuli on which to train the participants. Obviously such a procedure
would have its own ethical problems if the AoA term were to be explored over
a range covering several years. Some previous AoA studies have involved
doing this but in a limited domain: Exploration of artificial neural network
models typically involves the controlled delivery of arbitrary patterns during
learning. In these, the experimenter really does have full control over the
independent variables and so multiple regression or factorial designs would be
a sound method of analysis. As Ellis and Lambon Ralph (2000) have
demonstrated, these neural networks do show AoA and frequency effects
and these are real experimental effects rather than just correlations. It can be
argued that it is only these experiments, those using artificial neural networks,
that show real AoA effects. These studies are experiments because variables
have really been manipulated by the researcher and the outcome of these
manipulations can be interpreted as real effects.
NONEXPERIMENTAL PSYCHOLINGUISTIC ANALYSIS
There are many possible factors involved in any psycholinguistic performance. Some of the most commonly considered factors are AoA, frequency,
imageability, concreteness, word length, familiarity, and so on, although
there are many more that could be considered. These all correlate with each
other to lesser or greater degrees and they also correlate with outcome
measures such as reaction times in lexical decision tasks or word reading
tasks. Language is a social construction that has evolved over a long period
of time. This fact means that there may be many relationships caused by its
history, such as the shortening of words that are used more often. Also, there
is the possibility of causal relationships based on cognitive factors (such as
the familiarization of words that are seen more often).
The simulation and discussion above have demonstrated that the research
that has been conducted so far is not able to distinguish between effects and
mere correlations. The question remains, can we use these correlations to
build and test some model of causality and can such models tell us more
than the simple correlations do?
Structural equation modelling has been used effectively in other areas of
psychology to use correlations to build up models of causality. This method
explores the correlations between factors given a hypothetical set of causal
relationships. This set of relationships can then be contrasted against other
hypotheses and evaluated. Further, some forms of structural equation
modelling allow for the consideration of latent variables. These are factors
that cannot be measured directly (e.g., intelligence or depression). If a person
has depression then this will show in a number of directly measurable ways
(e.g., loss of appetite, self-loathing, or tiredness). When studying depression,
we need to study these measurable factors in order to find out about the
latent variable of depression. In psycholinguistics, we are interested in word
representations as affected by word frequency, AoA, or other factors.
Obviously, we cannot measure directly the strength of a word representation
(it is latent) but we can use reaction times to get a measure of it. Similarly, we
cannot measure the age at which someone learns a word but we can have a
number of variables that are likely to be related to it quite closely (e.g., AoA
estimates from adults or age of reading scores from children).
Only by considering models of language which incorporate latent variables
(e.g., word representation strength, actual word frequency, actual age of
acquisition) will it be possible to evaluate causation. The existing psycholinguistic data provide a great variety of different information about
measurable indicators or factors (e.g., several different word frequency counts
or estimates). These data can be brought together to compare and contrast the
variety of different models of causation that have been suggested.
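As an illustration of what such a model might look like, the hypothetical situation of Figure 1 can be written as a small latent-variable model. The symbols, the choice of two indicators per latent variable, and the path labels below are assumptions introduced here; no such model is fitted in this paper. $F$ and $A$ are the latent actual frequency and actual AoA, $f_1, f_2$ are two observed frequency counts, and $a_1, a_2$ are two observed AoA measures (e.g., adult ratings and children's age-of-reading scores).

```latex
% Structural part: relations among the latent variables and the outcome
\begin{align*}
A  &= \gamma_{AF}\, F + \zeta_A, \\
RT &= \beta_A\, A + \beta_F\, F + \zeta_{RT}.
\end{align*}

% Measurement part: observed norms as indicators of the latent variables
\begin{align*}
f_k &= \lambda_{f,k}\, F + \varepsilon_{f,k}, \qquad k = 1, 2, \\
a_k &= \lambda_{a,k}\, A + \varepsilon_{a,k}, \qquad k = 1, 2.
\end{align*}
```

In this notation, the situation in Figure 1 corresponds to the restriction $\beta_F = 0$, and the reversed account corresponds to $\beta_A = 0$. Competing hypotheses of this kind can be compared by how well each restricted model reproduces the observed correlations among the indicators and reaction times.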
CONCLUSIONS
Above, the methods for exploring the effects of frequency and AoA on lexical
tasks were reviewed. These methods assume that the experimenter freely
manipulates the independent variables. In fact, psycholinguistic experiments
often use estimates of the actual variables and these variables are themselves
correlated. A simulation of a hypothetical situation was described in which
only one property (of two correlated properties) affected reaction times. It was
shown that, when using estimates of the properties, the two properties were
found to ‘‘affect’’ reaction times. This erroneous conclusion would be made
from multiple regression or factorial design experiments. It is suggested that
the same types of erroneous conclusions occur in real psycholinguistic
research employing these techniques. It is suggested that a structural equation
modelling procedure might avoid these problems.
The case that has been made here is that there is a problem with research
into the analysis of the effects of age of acquisition and frequency in tasks
involving word processing. The reason for this focus was that this is the
scope of this special issue of the journal. The same argument can apply to
any situation where one is trying to infer causality from correlations between
estimated properties and performance measures. Indeed, much of our
understanding of lexical processing is based on the experimental techniques
critiqued here being applied to normative estimate data and reaction times.
A consequence of the analysis reported here is that there are no studies that
demonstrate age-of-acquisition effects for psycholinguistic tasks. The
argument, however, can be taken further to say that no study has
demonstrated a psycholinguistic effect based on word properties. Reported
effects as robust as word frequency effects or word length effects may be just
as fallacious as the effects revealed in the simulations above.
REFERENCES
Barry, C., Morrison, C. M., & Ellis, A. W. (1997). Naming the Snodgrass and Vanderwart
pictures: Effects of age of acquisition, frequency and name agreement. Quarterly Journal of
Experimental Psychology, 50A, 560–585.
Brysbaert, M., van Wijnendaele, I., & de Deyne, S. (2000). Age-of-acquisition effects in semantic
processing tasks. Acta Psychologica, 104, 215–226.
Carroll, J. B., & White, M. N. (1973). Word frequency and age of acquisition as determiners of
picture naming latencies. Quarterly Journal of Experimental Psychology, 12, 85–95.
Ellis, A. W., & Lambon Ralph, M. A. (2000). Age of acquisition effects in adult lexical
processing reflect loss of plasticity in maturing systems: Insights from connectionist
networks. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(5),
1103–1123.
Gerhand, S., & Barry, C. (1998). Word frequency effects in oral reading are not merely
age-of-acquisition effects in disguise. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 24, 267–283.
Gerhand, S., & Barry, C. (1999). Age of acquisition, frequency and the role of phonology in the
lexical decision task. Memory and Cognition, 27, 592–602.
Ghyselinck, M., Lewis, M. B., & Brysbaert, M. (2004). Age of acquisition and the
cumulative-frequency hypothesis: A review of the literature and a new multi-task investigation. Acta
Psychologica, 115, 43–67.
Gilhooly, K. J., & Gilhooly, M. L. (1979). Age-of-acquisition effects in lexical and episodic
memory tasks. Memory and Cognition, 7, 214–223.
Gilhooly, K. J., & Logie, R. H. (1980). Age-of-acquisition, imagery, concreteness, familiarity and
ambiguity measures for 1944 words. Behavior Research Methods and Instrumentation, 12,
395–427.
Hodgson, C., & Ellis, A. W. (1998). Last in, first to go: Age of acquisition and naming in the
elderly. Brain and Language, 64, 146–163.
Lewis, M. B. (1999). Age of acquisition in face categorisation: Is there an instance-based
account? Cognition, 71, B23–B39.
Lewis, M. B., Chadwick, A. J., & Ellis, H. D. (2002). Exploring a neural network account of
age-of-acquisition effects using repetition priming of faces. Memory and Cognition, 30,
1229–1237.
Lewis, M. B., Gerhand, S., & Ellis, H. D. (2001). Re-evaluating age of acquisition effects: Are
they simply cumulative-frequency effects? Cognition, 75, 1–17.
Moore, V., & Valentine, T. (1998). The effect of age of acquisition on speed and accuracy of
naming famous faces. Quarterly Journal of Experimental Psychology, 51A, 485–513.
Morris, P. E. (1981). Age of acquisition, imagery, recall, and the limitations of
multiple-regression analysis. Memory and Cognition, 9, 277–282.
Morrison, C. M., Chappell, T. D., & Ellis, A. W. (1997). Age of acquisition norms for a large set
of object names and their relation to adult estimates and other variables. Quarterly Journal of
Experimental Psychology, 50A, 528–559.
Morrison, C. M., Ellis, A. W., & Quinlan, P. T. (1992). Age of acquisition, not word frequency,
affects object naming, not object recognition. Memory and Cognition, 20, 705–714.
Oldfield, R. C., & Wingfield, A. (1965). Response latencies in naming objects. Quarterly Journal
of Experimental Psychology, 17, 273–281.
Turner, J. E., Valentine, T., & Ellis, A. W. (1998). Contrasting effects of AoA and word frequency
on auditory and lexical decision. Memory and Cognition, 26, 740–753.
Zevin, J. D., & Seidenberg, M. S. (2002). Age of acquisition effects in word reading and other
tasks. Journal of Memory and Language, 47, 1–29.
APPENDIX
Steps required to generate the simulation using statistical
software such as SPSS or StatView
1. Generate five columns of normally distributed random data. Each column should be independent
but have the same mean and standard deviation (e.g., 0.0 and 1.0 respectively). All columns should
be the same length (e.g., 200 rows).
2. Label the first column “Actual frequency”. This is a property of the 200 words that is not affected
by any other term (as such it is all unexplained variance).
3. Generate a new column called “Actual AoA” as “Column 2” + “Actual frequency”. This represents
the actual AoA values of the words as affected by things not measured (unexplained variance) and the
frequency of the words (Actual frequency).
4. Generate a new column called “Reaction time” as “Actual AoA” + “Column 3”. This represents the
reaction time for each of the words as affected by things not measured (unexplained variance) and
the actual AoA of the words.
5. Generate a new column called “Norm frequency” as “Actual frequency” + “Column 4”. This
represents the experimentally generated word frequency norms for the words, predicted by the real
word frequency but also affected by measurement error (unexplained variance).
6. Generate a new column called “Norm AoA” as “Actual AoA” + “Column 5”. This represents the
experimentally generated word AoA norms for the words, predicted by the real word AoA but also
affected by measurement error (unexplained variance).
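The six steps above can be sketched in Python with NumPy rather than SPSS or StatView. This is a minimal illustration, not the original procedure: the function name `simulate_words`, the dictionary keys, and the seed are hypothetical, and each derived column is assumed to be the simple sum of its two components.

```python
import numpy as np

def simulate_words(n_words=200, seed=0):
    """Generate the simulated word properties as a dict of 1-D arrays."""
    rng = np.random.default_rng(seed)
    # Step 1: five independent columns, each with mean 0.0 and SD 1.0.
    cols = rng.standard_normal((5, n_words))
    actual_freq = cols[0]                   # Step 2: pure unexplained variance
    actual_aoa = cols[1] + actual_freq      # Step 3 (sum is an assumption)
    reaction_time = actual_aoa + cols[2]    # Step 4: depends on actual AoA only
    norm_freq = actual_freq + cols[3]       # Step 5: actual value plus measurement error
    norm_aoa = actual_aoa + cols[4]         # Step 6: actual value plus measurement error
    return {
        "actual_freq": actual_freq,
        "actual_aoa": actual_aoa,
        "reaction_time": reaction_time,
        "norm_freq": norm_freq,
        "norm_aoa": norm_aoa,
    }

data = simulate_words()
# Correlation between the two norms, which is what a researcher would observe.
print(np.corrcoef(data["norm_freq"], data["norm_aoa"])[0, 1])
```

Note that “Actual frequency” never enters “Reaction time” directly; any association between them runs through “Actual AoA”.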
The multiple regression that models those typically employed in real analyses is “Reaction time”
predicted by “Norm frequency” and “Norm AoA”. This type of analysis of the simulation data
suggests that frequency affects reaction times even when AoA effects have been removed. This is not
true: as generated, reaction time depends only on actual AoA and unexplained variance.
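The regression described above might be run as follows. This is a sketch only: the helper names `simulate` and `ols` are hypothetical, derived columns are assumed to be simple sums, and ordinary least squares via NumPy stands in for the regression routine of a statistics package. The same model is also fitted to the actual values for comparison.

```python
import numpy as np

def simulate(n_words=200, seed=0):
    """Return actual frequency, actual AoA, reaction time, norm frequency, norm AoA."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((5, n_words))
    freq = e[0]
    aoa = e[1] + freq            # assumed to be a simple sum
    rt = aoa + e[2]              # reaction time depends on actual AoA only
    return freq, aoa, rt, freq + e[3], aoa + e[4]

def ols(y, *predictors):
    """Least-squares fit with intercept; returns (slopes, t_values)."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef[1:], (coef / se)[1:]

freq, aoa, rt, norm_freq, norm_aoa = simulate()
for label, preds in (("norms", (norm_freq, norm_aoa)), ("actual", (freq, aoa))):
    slopes, t_values = ols(rt, *preds)
    print(label, "slopes (frequency, AoA):", slopes.round(2), "t:", t_values.round(2))
```

With the norms as predictors, the frequency slope tends to sit well away from zero; with the actual values as predictors, it should hover around zero, which is the true state of affairs in the generated data.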
To conduct an ANOVA, choose four lists of items that vary norm AoA and norm frequency
factorially. The analysis is likely to show an effect of frequency independent of AoA. This is because,
typically, the actual AoA and actual frequency values are not well matched across the four lists.
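The mismatch can be made visible with a sketch of the list selection. Several choices here are assumptions made for illustration and are not part of the procedure above: a much larger pool of simulated words (100,000 rather than 200) so that cell means are stable, selection of items whose norms fall within narrow bands around -1 and +1, and the hypothetical helper names `simulate`, `in_band`, and `factorial_cells`. Derived columns are again assumed to be simple sums.

```python
import numpy as np

def simulate(n_words, seed=0):
    """Simulated word properties (derived columns assumed to be simple sums)."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal((5, n_words))
    freq = e[0]
    aoa = e[1] + freq
    rt = aoa + e[2]
    return {"aoa": aoa, "rt": rt, "norm_freq": freq + e[3], "norm_aoa": aoa + e[4]}

def in_band(x, centre, half_width=0.25):
    """True for items whose value lies within half_width of centre."""
    return np.abs(x - centre) <= half_width

def factorial_cells(d, low=-1.0, high=1.0):
    """Pick four lists crossing norm frequency with norm AoA; summarise each list."""
    cells = {}
    for f_name, f_centre in (("low_freq", low), ("high_freq", high)):
        for a_name, a_centre in (("early", low), ("late", high)):
            pick = in_band(d["norm_freq"], f_centre) & in_band(d["norm_aoa"], a_centre)
            cells[(f_name, a_name)] = {
                "n": int(pick.sum()),
                "norm_aoa": d["norm_aoa"][pick].mean(),
                "actual_aoa": d["aoa"][pick].mean(),
                "rt": d["rt"][pick].mean(),
            }
    return cells

cells = factorial_cells(simulate(100_000))
for key, stats in cells.items():
    print(key, {k: round(float(v), 2) for k, v in stats.items()})
```

Within each norm-AoA level, the two frequency lists should be closely matched on norm AoA yet differ in mean actual AoA, and the mean reaction times should differ accordingly.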
A semifactorial analysis can also be conducted using a similar list selection method. This analysis
too will show an effect of frequency independent of AoA, an effect that is not true for the data as they
were generated.