Neural Mechanisms of Spoken Word Recognition:
MEG Evidence for Distinct Sources of Inhibitory Effects
Liina Pylkkänen, Andrew Stringfellow and Alec Marantz
Department of Linguistics and Philosophy, KIT/MIT MEG Laboratory,
Massachusetts Institute of Technology, Cambridge, MA
Correspondence:
Liina Pylkkänen
Department of Linguistics and Philosophy
Massachusetts Institute of Technology
E39-229
77 Massachusetts Avenue
Cambridge, MA 02139
(617) 253-2690 (Phone)
(617) 253-5017 (fax)
[email protected] (E-mail)
Category:
Report
Abstract
Spoken word recognition involves excitation of potential matches to a stimulus and inhibition of
representations that do not optimally conform to the input. Inhibition is evidenced by delayed reaction
times to stimuli whose representation has been rejected as the wrong match to a previous stimulus. The
cognitive level of this effect was investigated in a priming paradigm using magnetoencephalography.
Results locate the delay to inhibited activation for targets that match in onset with the prime and to
inhibited recognition for targets that are otherwise phonologically similar to the prime. The study
provides evidence that the neural correlates of behavioral inhibition are not uniform. (103 words)
Spoken word recognition involves mapping a temporal arrangement of distinctive speech units, i.e.
phonemes, to a single representation in the mental lexicon. A person’s vocabulary consists of tens of
thousands of entries. In contrast, the inventory of distinctive units contains on average only 30 to 40
phonemes (1). It is thus obvious that the sound representations are not highly distinctive. Nevertheless,
humans are able to perform this computationally demanding matching task effortlessly and extremely
rapidly. The average speaking rate in continuous speech has been estimated to be around five syllables
per second (2). Given that many words consist of only one syllable, this implies that it cannot take humans
more than a few hundred milliseconds to match acoustic input to lexical representations.
Similarly to all mental processes, successful spoken word recognition depends on a delicate balance
between excitation and inhibition. As the incoming speech signal unfolds over time, all potential matches
to the input must be rapidly activated. However, to allow for selection among these entries, the incorrect
items must also be quickly inhibited as soon as it can be determined that they do not constitute the best
match to the input. If either activation or inhibition is impaired, healthy language processing is
impossible. For example, the language problems of Broca’s aphasia have been attributed to deficient
activation and the problems of Wernicke’s aphasia to deficient inhibition (3).
In this study we employed magnetoencephalography (MEG) to investigate the neural mechanisms of
inhibition in spoken word recognition. The millisecond resolution of MEG makes it possible to probe into
the multiple mental processes occurring within the half second between stimulus presentation and
response (4). Behaviorally, inhibitory effects are obtained in priming paradigms where targets are
phonologically similar to their primes, as in spinach-spin (5). Most current models agree that these
inhibitory effects arise from competition among activated entries (6-8). However, these reaction time
effects are ambiguous as to the processing stage that is affected by competition. The delay in reaction
time could originate either from inhibited target activation or from inhibited target recognition.
The inhibited activation and recognition hypotheses make crucially different claims about the
fundamental mechanisms underlying the recognition process. The inhibited activation hypothesis assumes
that inhibition is suppression of wrong matches under their resting level of activation. On this hypothesis,
the delay in reaction time for prime-target pairs such as spinach-spin results from the fact that, during prime
processing, the correct match to the prime, i.e. SPINACH, suppresses its competitor SPIN under some
resting level of activation (9). Consequently, when spin occurs as the target, its activation is delayed. In
contrast, the inhibited recognition hypothesis locates the inhibitory effect entirely to post-activation
competition. On this hypothesis, incorrect matches to the input are not suppressed and therefore the
activation of phonologically related targets is not inhibited. Nevertheless, reaction times to targets in pairs
such as spinach-spin are predicted to be delayed because of competition from the highly active prime
entry. Since SPINACH is the correct match to the prime, its activation level at target presentation is
necessarily higher than the level of SPIN. Thus the recognition of SPIN as the correct match to the target
should be delayed due to interference from SPINACH. Discrimination between the inhibited activation
and inhibited recognition hypotheses is further complicated by the fact that both hypotheses make the
same predictions even in paradigms which in other situations prove useful for distinguishing between
different hypotheses about the cognitive level of behavioral effects, such as short SOA (stimulus onset
asynchrony) priming paradigms (10).
While the behavioral predictions of the inhibited activation and recognition hypotheses are difficult to
distinguish (11), they make crucially different predictions about the timing of initial lexical activation.
The inhibited activation hypothesis predicts delayed target activation. In contrast, the inhibited
recognition hypothesis predicts facilitated lexical activation: if competitors are not suppressed but simply
fail to reach some threshold of activation for recognition, then their activation should remain elevated for
some time after prime presentation. Thus, to locate the source of the inhibitory effect, one would need an
index of initial lexical activation that is independent of reaction time, i.e. an index of the excitation of
lexical entries before competition effects begin. Such a measure would be able to dissociate facilitation in
early lexical activation from later inhibitory effects.
To investigate the neural sources of behavioral inhibition, we recorded the magnetoencephalograms
of twenty-three healthy native English speakers while they made lexical decisions to visual targets
preceded by phonologically related auditory primes (12). Previous MEG studies on lexical processing
have concentrated on visual word recognition (13-17). Our crossmodal paradigm allowed us to investigate
inhibition in the processing of spoken primes by measuring the effects of inhibition on well-established
response components elicited by visual targets. In particular, we have previously identified an MEG
component at approximately 350ms post-stimulus onset (M350) which is sensitive to lexical factors but
not to competition (18). The M350 is generated by a source in the left superior temporal cortex (13, 17),
and it plausibly corresponds to ERP negativities in the 250-350 ms range found to be sensitive to lexical
factors in word-class studies (19, 20) (rather than the somewhat later-peaking N400). The M350 can vary
independently of reaction time (17) and is also the earliest MEG response component sensitive to lexical
frequency (16). Together, these results strongly support the hypothesis that the M350 indexes initial
excitation of lexical entries. Since this is precisely the stage of lexical processing where the inhibited
activation and inhibited recognition hypotheses make distinct predictions, the M350 offers a promising
tool for investigating the nature of inhibition in word recognition. If inhibition is suppression of
competitors (inhibited activation hypothesis), M350 latencies should be increased for phonologically
related targets. In contrast, if inhibition effects arise purely from post-activation competition (inhibited
recognition hypothesis), M350 latencies of phonologically related targets should show facilitation (Fig.
1).
The effects of two different types of phonological similarity were investigated. In one type the prime
and target matched in onset, as in spinach-spin, while in the other the prime and target were
phonologically similar without onset-matching, as in teacher-reach (21). The role of word onsets in
lexical access is controversial. The first influential model of spoken word recognition hypothesized that
the candidate set for recognition is constrained by the onset of the input, i.e. bullet would activate
BULLET and BULL but not FULL (22). However, evidence has since accumulated (6, 23, 24) to support
so-called continuous mapping models (7, 8), where lexical entries are activated at any point of the
stimulus. But even though the necessity for continuous activation is by now generally agreed on, evidence
exists for asymmetries between activated incorrect entries that share an onset with the input and incorrect
entries that do not. For example, while the stimulus trombone semantically primes the target rib via
BONE (25, 26), the same effect is not obtained when the target is homophonous to the beginning of the
prime. In other words, cargo does not prime bus via CAR (26, 27). Our stimulus manipulation allowed us
to investigate not only the neural mechanisms of inhibition but also the additional question of whether
onset-matching and non-onset-matching competitors would pattern together with respect to M350
priming. MEG data were continuously acquired while participants performed the lexical decision task, and
then averaged off-line (28).
In addition to the M350, two earlier components elicited by visual word stimuli were examined. The
first, the M170, follows the visual M100 and peaks at 150-220ms (16, 17, 29, 30). The second
component, the M250, peaks at 200-300ms and has a left-hemisphere source which is somewhat more
posterior than the source of the M350 (16, 17). For all three components, latencies and amplitudes were
determined individually for each participant by calculating the root mean square (RMS) field strength of
the sensors that covered the appropriate field pattern (Fig. 2). Two participants’ data did not show the
characteristic distribution of the M350; these participants were excluded from all analyses. In previous
studies, the M170 and the M250 have not shown sensitivity to lexical factors (16, 17); if the M350
indexes initial activation of lexical entries, neither the M170 nor the M250 should vary across conditions
in this experiment.
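The RMS measure described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function names, the toy sensor data, and the 300-400 ms search window are assumptions made for the example, while the 500 Hz sampling rate and the 100 ms baseline are taken from note 28.

```python
import math

def rms_waveform(sensor_data):
    """Root mean square across sensors at each time sample.

    sensor_data: list of per-sensor lists of equal length (one value per sample).
    """
    n_sensors = len(sensor_data)
    n_samples = len(sensor_data[0])
    return [
        math.sqrt(sum(sensor[t] ** 2 for sensor in sensor_data) / n_sensors)
        for t in range(n_samples)
    ]

def peak_in_window(rms, times_ms, start_ms, end_ms):
    """Return (latency_ms, amplitude) of the largest RMS value in [start_ms, end_ms]."""
    candidates = [(amp, t) for amp, t in zip(rms, times_ms) if start_ms <= t <= end_ms]
    amp, latency = max(candidates)
    return latency, amp

def bump(t, centre, width, scale):
    """Gaussian-shaped deflection used only to build the toy data."""
    return scale * math.exp(-((t - centre) / width) ** 2)

# Toy data (hypothetical): 3 sensors, 500 Hz sampling (2 ms per sample),
# epoch running from -100 ms to 898 ms relative to target onset.
SAMPLE_MS = 2
times_ms = [-100 + SAMPLE_MS * i for i in range(500)]
sensors = [
    [bump(t, 340, 40, 100.0) for t in times_ms],
    [bump(t, 340, 40, -80.0) for t in times_ms],
    [bump(t, 340, 40, 60.0) for t in times_ms],
]

rms = rms_waveform(sensors)
# The search window below is an assumption, not a value reported in the paper.
latency, amplitude = peak_in_window(rms, times_ms, 300, 400)
```

Because the values are squared before averaging, sensors on the ingoing and outgoing sides of the field pattern both contribute positively, which is why a single RMS waveform can summarize a dipolar distribution.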
In the behavioral data, inhibition was obtained for both the teacher-reach and spinach-spin
comparisons. Response times to targets of phonologically related non-onset-matching pairs (teacher-reach,
mean = 685 ms) were significantly slower than responses to their unrelated controls (ocean-reach, mean = 666 ms),
t(20) = 2.25, p < 0.05. Similarly, targets of onset-matching pairs (spinach-spin, mean = 705 ms) were responded
to more slowly than targets of their unrelated controls (muffler-spin, mean = 672 ms), t(20) = 2.76, p < 0.05 (31).
Thus, the two comparisons were behaviorally indistinguishable. However, onset-matching and
non-onset-matching phonological relatedness had strikingly different effects on the M350. The M350 latencies of
targets in teacher-reach type pairs (mean = 332.8 ms) were shorter than those of their controls (mean = 345.7 ms),
t(20) = -3.33, p < 0.005. In contrast, the M350s elicited by the targets of spinach-spin type pairs (mean = 349.7 ms)
were delayed in comparison to their controls (mean = 334.6 ms), t(20) = 2.8, p < 0.05. Similarly, the amplitudes
of the M350s elicited by phonologically related non-onset-matching targets were smaller than those of their
controls (t(20) = 2.6, p < 0.05), while the amplitudes of the M350s elicited by onset-matching targets were larger
than those of their controls, although the latter effect was only marginal (t(20) = 2, p = 0.06). Phonological
relatedness did not modulate the latencies or amplitudes of the earlier M170 or M250 components in either comparison.
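The t(20) statistics reported above are consistent with paired comparisons across the 21 analysed participants. A minimal sketch of such a comparison is given below. The function name and the five-participant latency values are hypothetical and serve only to show the computation; they are not data from the study.

```python
import math
from statistics import mean, stdev

def paired_t(related, control):
    """Paired-samples t statistic and its degrees of freedom.

    related, control: per-participant values listed in the same participant order.
    """
    if len(related) != len(control):
        raise ValueError("both conditions need one value per participant")
    diffs = [r - c for r, c in zip(related, control)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical M350 latencies in ms for five participants (NOT data from the study).
related_ms = [350.0, 342.0, 361.0, 338.0, 355.0]
control_ms = [336.0, 330.0, 344.0, 331.0, 339.0]

t_value, df = paired_t(related_ms, control_ms)
```

With the related condition entered first, a positive t indicates a delay relative to control (the pattern reported for spinach-spin) and a negative t indicates facilitation (the pattern reported for teacher-reach).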
Our results locate the inhibitory effects of onset-matching and non-onset-matching targets to distinct
processing stages. The increased M350 latencies and amplitudes of onset-matching targets (spinach-spin)
indicate that their activation is inhibited. In contrast, the decreased M350 latencies and amplitudes of
non-onset-matching targets (teacher-reach) show that the inhibitory effect observed in their reaction times
must arise purely from post-activation competition. The result supports models where activation is a
separate processing stage from competition (6, 8) and where onset-matching competitors have a special
status in the recognition process. The result also offers an interesting explanation for the asymmetries
obtained in the activations of initially and medially embedded words: cargo does not prime bus via CAR
because CARGO suppresses CAR, while trombone does prime rib since BONE does not match in onset
with the prime. Finally, our study converges with a body of results in acoustics and phonology supporting the
special status of word onsets. Both consonants (32, 33) and vowels (34) are perceptually more salient in
word-initial than in other positions, even when stress and syllabic position are controlled for. Word-initial
segments are also phonologically more stable than word-medial and final segments. For instance,
processes that neutralize phonemic distinctions are relatively rare word-initially (35). If word onsets play
a special role in word recognition, the mechanisms of the speech perception system can potentially shed
light on the distribution of these phonetic and phonological processes (35, 36).
References and Notes
1. I. Maddieson, Patterns of sounds. Cambridge University Press (1984).
2. D. O’Shaughnessy, Journal of the Acoustical Society of America, 76, 1664-1672 (1984).
3. S. E. Blumstein, W. P. Milberg. Language deficits in Broca’s and Wernicke’s aphasia: A singular
impairment. In Y. Grodzinsky, L. Shapiro, D. Swinney, Eds. Language and the brain: Representation
and processing. New York, Academic Press.
4. MEG is similar to EEG except that it measures the magnetic fields, instead of electric potentials, that
are generated by postsynaptic currents in neurons. Unlike electric potentials, magnetic fields are not
distorted by the skull and other tissue. Therefore, isolation of the sources underlying the activity
measured outside the head is more straightforward in MEG than in EEG.
5. K. Rastle et al., Language and Cognitive Processes, 15, 507-537 (2000).
6. P. A. Luce, D. B. Pisoni, S. D. Goldinger, Similarity neighborhoods of spoken words. In G. T. M.
Altmann, Ed., Cognitive models of speech processing: Psycholinguistic and computational
perspectives, 122-147. Cambridge, MA: MIT (1990).
7. J. L. McClelland, J. L. Elman, Cognitive Psychology, 18, 1-86 (1986).
8. D. Norris, Cognition, 52, 189-234 (1994).
9. Throughout this paper, italics are used to refer to stimulus items and capital letters to lexical
representations.
10. Shortening the interval between the prime and the target has been a useful method for discriminating
between different hypotheses about the sources of reaction time effects (e.g. 4). However, the
inhibited activation and recognition hypotheses cannot be distinguished behaviorally, even if stimulus
onset asynchrony (SOA) is very short. In such a paradigm, both hypotheses predict priming, rather
than inhibition, although for different reasons. The inhibited activation hypothesis makes this
prediction because suppression of the competitor SPIN would not have time to develop prior to
presentation of SPIN if the SOA is very short. The inhibited recognition hypothesis, on the other
hand, predicts priming because very brief SOAs would not allow the activation level of SPINACH to
significantly surpass the activation level of the competitor SPIN. Therefore, SPINACH would not
constitute a particularly strong competitor at the time of target presentation. This would plausibly
allow the elevated activation level of SPIN to surface as a speed-up in reaction time, without being
overridden by a competition effect.
11. Although see J. Vroomen, B. de Gelder, Frequency effects and lexical inhibition in spoken word
recognition, Paper presented at the Seventh Conference of the European Society for Cognitive
Psychology, Lisbon, Portugal (1994).
12. All subjects (7 females and 16 males) were right-handed and had normal or corrected to normal
vision.
13. P. Helenius et al., Journal of Cognitive Neuroscience, 11, 535-550 (1999).
14. S. Koyama et al., Neuropsychologia 36:1, 83-98 (1998).
15. S. Kuriki et al., Cognitive Brain Research, 4, 185-199.
16. D. Embick et al., Cognitive Brain Research, 10: 3, 345-348 (2001).
17. L. Pylkkänen, A. Stringfellow, A. Marantz, Brain and Language (in press).
18. In an initial study, Embick et al. (16) varied lexical frequency in a lexical decision task and showed
that the M350 is the first MEG response component whose latency increases as frequency decreases.
To investigate the cognitive level of the M350 further, Pylkkänen et al. (17) manipulated lexical
activation and selection/recognition simultaneously by varying stimuli in phonotactic probability.
Phonotactic probability affects activation and selection in opposite ways, which made it possible to
potentially dissociate the M350 from reaction times. Stimuli which are high in phonotactic probability
induce fast initial activation but late selection (M. S. Vitevitch, and P. A. Luce. Journal of Memory
and Language 40: 374-408 (1999)). The delay in selection, and subsequently in reaction time, is
caused by the fact that high-probability stimuli are associated with dense similarity neighborhoods,
and consequently with intense competition. M350 latencies were found to be decreased, rather than
increased, for high probability stimuli. This shows that while the M350 is sensitive to lexical factors,
it is not affected by competition.
19. J. W. King, M. Kutas. Neuroscience Letters, 244, 1-4 (1998).
20. C. M. Brown, P. Hagoort, M. ter Keurs. Journal of Cognitive Neuroscience 11:3, pp. 216-281 (1999).
21. The stimulus materials were derived from the behavioral study of L. Gonnerman. Morphology and
the lexicon: exploring the semantics-phonology interface, thesis, University of Southern California
(1999). The phonologically related, non-onset-matching comparison (teacher-reach) was added to her
original design. All auditory primes were prepared by Gonnerman. The two conditions along with
their unrelated controls yielded four categories of word stimuli: phonologically related
onset-matching pairs (spinach-spin, n=28) with their controls (muffler-spin, n=28) and phonologically
related non-onset-matching pairs (teacher-reach, n=56) with their controls (ocean-reach, n=56). In
addition, participants made lexical decisions on 290 other word-targets, which served as stimuli for a
different experiment. Nonword stimuli consisted of 170 targets of which 50 were onset-matching to
their primes (lecture-lect), 50 phonologically related without onset-matching (mundane-tund) and 70
phonologically unrelated (biscuit-cobe). To balance the ratio of words to nonwords, all the nonword
stimuli were presented twice. Target stimuli were presented at the offset of the auditory prime, i.e.
SOA equaled the duration of the prime. In addition to the lexical decision task, participants were
instructed to pronounce out loud the last auditory stimulus they had heard 30 times in the course of
the experiment. Participants were allowed to rest every 50 trials. Auditory primes were presented
binaurally via airtube earphones, and visual targets were projected onto a ground glass screen. Target
stimuli were presented in nonproportional Courier font, and subtended approximately 1.2° of visual
angle vertically and 1.2° per character horizontally (the length of target stimuli varied from 4 to 10 letters).
22. W. D. Marslen-Wilson, A. Welsh, Cognitive Psychology, 10, 29-63 (1978).
23. J. Andruski, S. E. Blumstein, M. Burton, Cognition, 52, 163-187 (1994).
24. P. D. Allopenna, J. S. Magnuson, M. K. Tanenhaus, Journal of Memory and Language, 38, 419-439
(1998).
25. J. Vroomen, B. de Gelder, Journal of Experimental Psychology, 23, 710-720 (1997).
26. F. Isel, N. Bacri, Brain and Language, 68, 61-67 (1999).
27. P. Zwitserlood, Cognition, 32, 25-64 (1989).
28. Neuromagnetic fields were recorded using a 93-channel axial gradiometer whole-head system
(Kanazawa Institute of Technology, Kanazawa, Japan). Data were acquired in a band between 1Hz
and 200Hz, at a 500Hz sampling frequency. External sources of noise were removed online using an
active compensation coil system (Vacuumschmelze, Hanau, Germany). Offline noise reduction using
three orthogonally-oriented reference sensors (CALM algorithm, Y. Adachi et al., IEEE Transactions
on Applied Superconductivity, 11, 669-72) was also performed. Data were averaged by stimulus
condition from the onset of the visual target. For averaging, an epoch length of 900ms, plus a 100ms
baseline period, was used. Trials where the subject responded incorrectly, or responded more than
3 SD faster or slower than his/her mean, were eliminated from averages. This resulted in the rejection
of 5.7 % of the trials. Artifact rejection excluded all trials to stimuli that contained signals exceeding
± 2pT in amplitude and resulted in the exclusion of an additional 7.5 % of the trials. Averaged files
were low-pass filtered at 30Hz, and baseline adjusted using the 100ms pre-stimulus interval.
29. A. Tarkiainen et al., Brain, 122, 2119-2131 (1999).
30. P. Helenius et al., Cerebral Cortex, 9, 1047-3211 (1999).
31. The phonologically related conditions also elicited more incorrect or timed-out responses than their
unrelated controls. 9.2% of onset-matching targets were responded to incorrectly or too slowly while
their unrelated controls elicited an error-rate of only 6.8%, t(20) = 2.02, p < 0.05. Similarly,
phonologically related non-onset-matching targets elicited more incorrect or timed-out responses
(5.2%) than their unrelated controls (3.9%), t(20) = -1.78, p < 0.05.
32. N. Umeda, Journal of the Acoustical Society of America, 61, 846-858 (1977).
33. A. M. Cooper, Proceedings of the 12th International Congress of Phonetic Sciences (1991).
34. Y. Meynadier, C. Fougeron, C. Meunier, Proceedings of the 14th International Congress of Phonetic
Sciences (1999).
35. D. W. Gow Jr., J. Melvold, S. Manuel, Proceedings of the Fourth International Conference on
Spoken Language Processing, 1, 66-69 (1994).
36. S. Frisch, Temporally ordered lexical representations as phonological units. In M. Broe, J.
Pierrehumbert, Eds., Papers in laboratory phonology: Acquisition and the Lexicon, vol. 5.
Cambridge: Cambridge University Press (2001).
37. The work reported here was supported by the Mind Articulation Project, Japan Science and
Technology Corporation, Tokyo, Japan, and the Kanazawa Institute of Technology, Kanazawa,
Japan. We thank Laura Gonnerman for providing her stimulus materials.
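The trial rejection and baseline adjustment described in note 28 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the processing code used in the study: the function names and toy values are hypothetical, the data are assumed to be expressed in picotesla, and the order in which the steps are applied is not specified in the note.

```python
from statistics import mean, stdev

def keep_by_rt(rts_ms, n_sd=3.0):
    """Flag trials whose reaction time lies within n_sd standard deviations of the mean."""
    m, sd = mean(rts_ms), stdev(rts_ms)
    return [abs(rt - m) <= n_sd * sd for rt in rts_ms]

def exceeds_threshold(epoch_pt, limit_pt=2.0):
    """True if any sample in any sensor exceeds +/- limit_pt (assumed to be picotesla)."""
    return any(abs(value) > limit_pt for sensor in epoch_pt for value in sensor)

def baseline_correct(waveform, n_baseline):
    """Subtract the mean of the first n_baseline samples from every sample."""
    offset = mean(waveform[:n_baseline])
    return [value - offset for value in waveform]

# A 100 ms baseline at a 500 Hz sampling frequency corresponds to 50 samples.
N_BASELINE = 100 * 500 // 1000

# Toy reaction times (hypothetical): twenty ordinary trials and one very slow one.
rts_ms = [640, 660] * 10 + [2000]
rt_flags = keep_by_rt(rts_ms)

# Toy epochs (hypothetical), two sensors by two samples each.
clean_epoch = [[0.5, -1.0], [0.2, 1.9]]
noisy_epoch = [[0.5, -1.0], [0.2, 2.5]]

# Toy averaged waveform (hypothetical): constant offset of 1.0 in the baseline.
corrected = baseline_correct([1.0] * N_BASELINE + [3.0] * 10, N_BASELINE)
```

In this toy example only the 2000 ms trial is flagged for rejection, the second epoch is excluded by the amplitude criterion, and the corrected waveform has a baseline of zero.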
Fig. 1. Changes in lexical activation levels predicted by the inhibited activation and inhibited recognition
hypotheses. The inhibited activation hypothesis attributes behavioral inhibitory effects to suppression of lexical
activation while the inhibited recognition hypothesis locates the source of response delay to intense
competition in target recognition.
Fig. 2. Magnetic field distributions corresponding to the M170, M250 and M350 response components.
RMS for the M350 was calculated from all 44 left hemisphere sensors (excluding midline sensors) as its
distribution often covers all of the left hemisphere. The source of the M250 is somewhat more posterior
than the source of the M350; therefore RMS for the M250 was taken from a set of left hemisphere sensors
that excluded the most anterior ones (n=21). Finally, the occipitotemporal distribution of the M170 was
captured by including all occipitotemporal sensors (n=37) (bilaterally arrayed) in the RMS. The sensors
used for RMS were held constant across experimental conditions and participants.
Fig. 3. Priming in reaction time (RT), M350 latency and M350 amplitude for onset-matching
(spinach-spin) and non-onset-matching (teacher-reach) targets as compared to their unrelated controls.
Fig. 4. Averaged MEG responses from one representative subject to all experimental conditions. Activity
on the left shows the time course of activation of the M350 positive maximum, indicated by a square on
the contour map on the right (here the same sensor is also the positive maximum of the M170). The
contour maps on the right show the distribution of the magnetic field at the times of M350 maxima for all
stimulus categories.