Neural Mechanisms of Spoken Word Recognition: MEG Evidence for Distinct Sources of Inhibitory Effects

Liina Pylkkänen, Andrew Stringfellow and Alec Marantz

Department of Linguistics and Philosophy, KIT/MIT MEG Laboratory, Massachusetts Institute of Technology, Cambridge, MA

Correspondence: Liina Pylkkänen
Department of Linguistics and Philosophy
Massachusetts Institute of Technology
E39-229
77 Massachusetts Avenue
Cambridge, MA 02139
(617) 253-2690 (Phone)
(617) 253-5017 (fax)
[email protected] (E-mail)

Category: Report

Abstract

Spoken word recognition involves excitation of potential matches to a stimulus and inhibition of representations that do not optimally conform to the input. Inhibition is evidenced by delayed reaction times to stimuli whose representation has been rejected as the wrong match to a previous stimulus. The cognitive level of this effect was investigated in a priming paradigm using magnetoencephalography. Results locate the delay to inhibited activation for targets that match in onset with the prime and to inhibited recognition for targets that are otherwise phonologically similar to the prime. The study provides evidence that the neural correlates of behavioral inhibition are not uniform. (103 words)

Spoken word recognition involves mapping a temporal arrangement of distinctive speech units, i.e. phonemes, to a single representation in the mental lexicon. A person’s vocabulary consists of tens of thousands of entries. In contrast, the inventory of distinctive units contains on average only 30 to 40 phonemes (1). It is thus obvious that the sound representations are not highly distinctive. Nevertheless, humans are able to perform this computationally demanding matching task effortlessly and extremely rapidly. The average speaking rate in continuous speech has been estimated to be around five syllables per second (2).
Given that many words consist of only one syllable, this implies it must not take humans more than a few hundred milliseconds to match acoustic input to lexical representations. Similarly to all mental processes, successful spoken word recognition depends on a delicate balance between excitation and inhibition. As the incoming speech signal unfolds over time, all potential matches to the input must be rapidly activated. However, to allow for selection among these entries, the incorrect items must also be quickly inhibited as soon as it can be determined that they do not constitute the best match to the input. If either activation or inhibition is impaired, healthy language processing is impossible. For example, the language problems of Broca’s aphasia have been attributed to deficient activation and the problems of Wernicke’s aphasia to deficient inhibition (3). In this study we employed magnetoencephalography (MEG) to investigate the neural mechanisms of inhibition in spoken word recognition. The millisecond-resolution of MEG makes it possible to probe into the multiple mental processes occurring within the half second between stimulus presentation and response (4). Behaviorally, inhibitory effects are obtained in priming paradigms where targets are phonologically similar to their primes, as in spinach-spin (5). Most current models agree that these inhibitory effects arise from competition among activated entries (6-8). However, these reaction time effects are ambiguous as to the processing stage that is affected by competition. The delay in reaction time could originate either from inhibited target activation or from inhibited target recognition. The inhibited activation and recognition hypotheses make crucially different claims about the fundamental mechanisms underlying the recognition process. The inhibited activation hypothesis assumes that inhibition is suppression of wrong matches under their resting level of activation. 
On this hypothesis, the delay in reaction time for prime-target pairs such as spinach-spin results from the fact that in prime processing the correct match to the prime, i.e. SPINACH, suppresses its competitor SPIN under some resting level of activation (9). Consequently, when spin occurs as the target, its activation is delayed. In contrast, the inhibited recognition hypothesis locates the inhibitory effect entirely to post-activation competition. On this hypothesis, incorrect matches to the input are not suppressed and therefore the activation of phonologically related targets is not inhibited. Nevertheless, reaction times to targets in pairs such as spinach-spin are predicted to be delayed because of competition from the highly active prime entry. Since SPINACH is the correct match to the prime, its activation level at target presentation is necessarily higher than the level of SPIN. Thus the recognition of SPIN as the correct match to the target should be delayed due to interference from SPINACH. Discrimination between the inhibited activation and inhibited recognition hypotheses is further complicated by the fact that both hypotheses make the same predictions even in paradigms which in other situations prove useful for distinguishing between different hypotheses about the cognitive level of behavioral effects, such as short SOA (stimulus onset asynchrony) priming paradigms (10). While the behavioral predictions of the inhibited activation and recognition hypotheses are difficult to distinguish (11), they make crucially different predictions about the timing of initial lexical activation. The inhibited activation hypothesis predicts delayed target activation. In contrast, the inhibited recognition hypothesis predicts facilitated lexical activation: if competitors are not suppressed but simply fail to reach some threshold of activation for recognition, then their activation should remain elevated for some time after prime presentation.
Thus, to locate the source of the inhibitory effect, one would need an index of initial lexical activation that is independent of reaction time, i.e. an index of the excitation of lexical entries before competition effects begin. Such a measure would be able to dissociate facilitation in early lexical activation from later inhibitory effects. To investigate the neural sources of behavioral inhibition, we recorded the magnetoencephalograms of twenty-three healthy native English speakers while they made lexical decisions to visual targets preceded by phonologically related auditory primes (12). Previous MEG studies on lexical processing have concentrated on visual word recognition (13-17). Our crossmodal paradigm allowed us to investigate inhibition in the processing of spoken primes by measuring the effects of inhibition on well-established response components elicited by visual targets. In particular, we have previously identified an MEG component at approximately 350ms post-stimulus onset (M350) which is sensitive to lexical factors but not to competition (18). The M350 is generated by a source in the left superior temporal cortex (13, 17), and it plausibly corresponds to ERP negativities in the 250-350 ms range found to be sensitive to lexical factors in word-class studies (19, 20) (rather than the somewhat later-peaking N400). The M350 can vary independently of reaction time (17) and is also the earliest MEG response component sensitive to lexical frequency (16). Together, these results strongly support the hypothesis that the M350 indexes initial excitation of lexical entries. Since this is precisely the stage of lexical processing where the inhibited activation and inhibited recognition hypotheses make distinct predictions, the M350 offers a promising tool for investigating the nature of inhibition in word recognition. 
If inhibition is suppression of competitors (inhibited activation hypothesis), M350 latencies should be increased for phonologically related targets. In contrast, if inhibition effects arise purely from post-activation competition (inhibited recognition hypothesis), M350 latencies of phonologically related targets should show facilitation (Fig. 1). The effects of two different types of phonological similarity were investigated. In one type the prime and target matched in onset, as in spinach-spin, while in the other the prime and target were phonologically similar without onset-matching, as in teacher-reach (21). The role of word onsets in lexical access is controversial. The first influential model of spoken word recognition hypothesized that the candidate set for recognition is constrained by the onset of the input, i.e. bullet would activate BULLET and BULL but not FULL (22). However, evidence has since accumulated (6, 23, 24) to support so-called continuous mapping models (7, 8), where lexical entries are activated at any point of the stimulus. But even though the necessity for continuous activation is by now generally agreed on, evidence exists for asymmetries between activated incorrect entries that share an onset with the input and incorrect entries that do not. For example, while the stimulus trombone semantically primes the target rib via BONE (25, 26), the same effect is not obtained when the target is homophonous with the beginning of the prime. In other words, cargo does not prime bus via CAR (26, 27). Our stimulus manipulation allowed us to investigate not only the neural mechanisms of inhibition but also the additional question of whether onset-matching and non-onset-matching competitors would pattern together with respect to M350 priming. MEG data were continuously acquired while participants performed the lexical decision task, and then averaged off-line (28).
In addition to the M350, two earlier components elicited by visual word stimuli were examined. The first, the M170, follows the visual M100 and peaks at 150-220ms (16, 17, 29, 30). The second component, the M250, peaks at 200-300ms and has a left-hemisphere source which is somewhat more posterior than the source of the M350 (16, 17). For all three components, latencies and amplitudes were determined individually for each participant by calculating the root mean square (RMS) field strength of the sensors that covered the appropriate field pattern (Fig. 2). Two participants’ data did not show the characteristic distribution of the M350; these participants were excluded from all analyses. In previous studies, the M170 and the M250 have not shown sensitivity to lexical factors (16, 17); if the M350 indexes initial activation of lexical entries, neither the M170 nor the M250 should vary across conditions in this experiment, either. In the behavioral data, inhibition was obtained for both the teacher-reach and the spinach-spin comparisons. Response times to targets of phonologically related non-onset-matching pairs (teacher-reach, x̄ = 685) were significantly slower than responses to their unrelated controls (ocean-reach, x̄ = 666), t(20) = 2.25, p < 0.05. Similarly, targets of onset-matching pairs (spinach-spin, x̄ = 705) were responded to more slowly than targets of their unrelated controls (muffler-spin, x̄ = 672), t(20) = 2.76, p < 0.05 (31). Thus, the two comparisons were behaviorally indistinguishable. However, onset-matching and non-onset-matching phonological relatedness had strikingly different effects on the M350. The M350 latencies of targets in teacher-reach type pairs (x̄ = 332.8) were shorter than their controls (x̄ = 345.7), t(20) = -3.33, p < 0.005. In contrast, the M350s elicited by the targets of spinach-spin type pairs (x̄ = 349.7) were delayed in comparison to their controls (x̄ = 334.6), t(20) = 2.8, p < 0.05.
Similarly, the amplitudes of the M350s elicited by phonologically related non-onset-matching targets were smaller than their controls (t(20) = 2.6, p < 0.05), while the amplitudes of the M350s elicited by onset-matching targets were larger than their controls, although the latter effect was only marginal (t(20) = 2, p = 0.06). Phonological relatedness did not modulate the latencies or amplitudes of the earlier M170 or M250 components in either comparison. Our results locate the inhibitory effects of onset-matching and non-onset-matching targets to distinct processing stages. The increased M350 latencies and amplitudes of onset-matching targets (spinach-spin) indicate that their activation is inhibited. In contrast, the decreased M350 latencies and amplitudes of non-onset-matching targets (teacher-reach) show that the inhibitory effect observed in their reaction times must arise purely from post-activation competition. The result supports models where activation is a separate processing stage from competition (6, 8) and where onset-matching competitors have a special status in the recognition process. The result also offers an interesting explanation for the asymmetries obtained in the activations of initially and medially embedded words: cargo does not prime bus via CAR because CARGO suppresses CAR, while trombone does prime rib since BONE does not match in onset with the prime. Finally, our study converges on a body of results in acoustics and phonology supporting the special status of word onsets. Both consonants (32, 33) and vowels (34) are perceptually more salient in word-initial than in other positions, even when stress and syllabic position are controlled for. Word-initial segments are also phonologically more stable than word-medial and final segments. For instance, processes that neutralize phonemic distinctions are relatively rare word-initially (35).
If word onsets play a special role in word recognition, the mechanisms of the speech perception system can potentially shed light on the distribution of these phonetic and phonological processes (35, 36).

References and Notes

1. I. Maddieson, Patterns of sounds. Cambridge University Press (1984).
2. D. O’Shaughnessy, Journal of the Acoustical Society of America, 76, 1664-1672 (1984).
3. S. E. Blumstein, W. P. Milberg, Language deficits in Broca’s and Wernicke’s aphasia: A singular impairment. In Y. Grodzinsky, L. Shapiro, D. Swinney, Eds., Language and the brain: Representation and processing. New York: Academic Press.
4. MEG is similar to EEG except that it measures the magnetic fields, instead of electric potentials, that are generated by postsynaptic currents in neurons. Unlike electric potentials, magnetic fields are not distorted by the skull and other tissue. Therefore, isolation of the sources underlying the activity measured outside the head is more straightforward in MEG than in EEG.
5. K. Rastle et al., Language and Cognitive Processes, 15, 507-537 (2000).
6. P. A. Luce, D. B. Pisoni, S. D. Goldinger, Similarity neighborhoods of spoken words. In G. T. M. Altmann, Ed., Cognitive models of speech processing: Psycholinguistic and computational perspectives, 122-147. Cambridge, MA: MIT Press (1990).
7. J. L. McClelland, J. L. Elman, Cognitive Psychology, 18, 1-86 (1986).
8. D. Norris, Cognition, 52, 189-234 (1994).
9. Throughout this paper, italics are used to refer to stimulus items and capital letters to lexical representations.
10. Shortening the interval between the prime and the target has been a useful method for discriminating between different hypotheses about the sources of reaction time effects (e.g. 4). However, the inhibited activation and recognition hypotheses cannot be distinguished behaviorally, even if stimulus onset asynchrony (SOA) is very short. In such a paradigm, both hypotheses predict priming, rather than inhibition, although for different reasons.
The inhibited activation hypothesis makes this prediction because suppression of the competitor SPIN would not have time to develop prior to presentation of SPIN if the SOA is very short. The inhibited recognition hypothesis, on the other hand, predicts priming because very brief SOAs would not allow the activation level of SPINACH to significantly surpass the activation level of the competitor SPIN. Therefore, SPINACH would not constitute a particularly strong competitor at the time of target presentation. This would plausibly allow the elevated activation level of SPIN to surface as a speed-up in reaction time, without being overridden by a competition effect.
11. Although see J. Vroomen, B. de Gelder, Frequency effects and lexical inhibition in spoken word recognition, Paper presented at the Seventh Conference of the European Society for Cognitive Psychology, Lisbon, Portugal (1994).
12. All subjects (7 females and 16 males) were right-handed and had normal or corrected-to-normal vision.
13. P. Helenius et al., Journal of Cognitive Neuroscience, 11, 535-550 (1999).
14. S. Koyama et al., Neuropsychologia, 36:1, 83-98 (1998).
15. S. Kuriki et al., Cognitive Brain Research, 4, 185-199.
16. D. Embick et al., Cognitive Brain Research, 10:3, 345-348 (2001).
17. L. Pylkkänen, A. Stringfellow, A. Marantz, Brain and Language (in press).
18. In an initial study, Embick et al. (16) varied lexical frequency in a lexical decision task and showed that the M350 is the first MEG response component whose latency increases as frequency decreases. To investigate the cognitive level of the M350 further, Pylkkänen et al. (17) manipulated lexical activation and selection/recognition simultaneously by varying stimuli in phonotactic probability. Phonotactic probability affects activation and selection in opposite ways, which made it possible to potentially dissociate the M350 from reaction times.
Stimuli which are high in phonotactic probability induce fast initial activation but late selection (M. S. Vitevitch, P. A. Luce, Journal of Memory and Language, 40, 374-408 (1999)). The delay in selection, and subsequently in reaction time, is caused by the fact that high-probability stimuli are associated with dense similarity neighborhoods, and consequently with intense competition. M350 latencies were found to be decreased, rather than increased, for high-probability stimuli. This shows that while the M350 is sensitive to lexical factors, it is not affected by competition.
19. J. W. King, M. Kutas, Neuroscience Letters, 244, 1-4 (1998).
20. C. M. Brown, P. Hagoort, M. ter Keurs, Journal of Cognitive Neuroscience, 11:3, 216-281 (1999).
21. The stimulus materials were derived from the behavioral study of L. Gonnerman, Morphology and the lexicon: exploring the semantics-phonology interface, thesis, University of Southern California (1999). The phonologically related, non-onset-matching comparison (teacher-reach) was added to her original design. All auditory primes were prepared by Gonnerman. The two conditions along with their unrelated controls yielded four categories of word stimuli: phonologically related onset-matching pairs (spinach-spin, n=28) with their controls (muffler-spin, n=28) and phonologically related non-onset-matching pairs (teacher-reach, n=56) with their controls (ocean-reach, n=56). In addition, participants made lexical decisions on 290 other word targets, which served as stimuli for a different experiment. Nonword stimuli consisted of 170 targets, of which 50 were onset-matching to their primes (lecture-lect), 50 phonologically related without onset-matching (mundane-tund) and 70 phonologically unrelated (biscuit-cobe). To balance the ratio of words to nonwords, all the nonword stimuli were presented twice. Target stimuli were presented at the offset of the auditory prime, i.e. SOA equaled the duration of the prime.
In addition to the lexical decision task, participants were instructed, 30 times in the course of the experiment, to pronounce out loud the last auditory stimulus they had heard. Participants were allowed to rest every 50 trials. Auditory primes were presented binaurally via airtube earphones, and visual targets were projected onto a ground glass screen. Target stimuli were presented in nonproportional Courier font, and subtended approximately 1.2° of visual angle vertically and 1.2° per character horizontally (the length of target stimuli varied from 4 to 10 letters).
22. W. D. Marslen-Wilson, A. Welsh, Cognitive Psychology, 10, 29-63 (1978).
23. J. Andruski, S. E. Blumstein, M. Burton, Cognition, 52, 163-187 (1994).
24. P. D. Allopenna, J. S. Magnuson, K. Tanenhaus, Journal of Memory and Language, 38, 419-439 (1998).
25. J. Vroomen, B. de Gelder, Journal of Experimental Psychology, 23, 710-720 (1997).
26. F. Isel, N. Bacri, Brain and Language, 68, 61-67 (1999).
27. P. Zwitserlood, Cognition, 32, 25-64 (1989).
28. Neuromagnetic fields were recorded using a 93-channel axial gradiometer whole-head system (Kanazawa Institute of Technology, Kanazawa, Japan). Data were acquired in a band between 1Hz and 200Hz, at a 500Hz sampling frequency. External sources of noise were removed online using an active compensation coil system (Vacuumschmelze, Hanau, Germany). Offline noise reduction using three orthogonally-oriented reference sensors (CALM algorithm, Y. Adachi et al., IEEE Transactions on Applied Superconductivity, 11, 669-72) was also performed. Data were averaged by stimulus condition from the onset of the visual target. For averaging, an epoch length of 900ms, plus a 100ms baseline period, was used. Trials where the subject responded incorrectly, or responded more than 3SD faster or slower than his/her mean, were eliminated from averages. This resulted in the rejection of 5.7% of the trials.
Artifact rejection excluded all trials that contained signals exceeding ±2pT in amplitude and resulted in the exclusion of an additional 7.5% of the trials. Averaged files were low-pass filtered at 30Hz, and baseline adjusted using the 100ms pre-stimulus interval.
29. A. Tarkiainen et al., Brain, 122, 2119-2131 (1999).
30. P. Helenius et al., Cerebral Cortex, 9, 1047-3211 (1999).
31. The phonologically related conditions also elicited more incorrect or timed-out responses than their unrelated controls. 9.2% of onset-matching targets were responded to incorrectly or too slowly while their unrelated controls elicited an error rate of only 6.8%, t(20) = 2.02, p < 0.05. Similarly, phonologically related non-onset-matching targets elicited more incorrect or timed-out responses (5.2%) than their unrelated controls (3.9%), t(20) = -1.78, p < 0.05.
32. N. Umeda, Journal of the Acoustical Society of America, 61, 846-858 (1977).
33. A. M. Cooper, Proceedings of the 12th International Congress of Phonetic Sciences (1991).
34. Y. Meynadier, C. Fougeron, C. Meunier, Proceedings of the 14th International Congress of Phonetic Sciences (1999).
35. D. W. Gow Jr., J. Melvold, S. Manuel, Proceedings of the Fourth International Conference on Spoken Language Processing, 1, 66-69 (1994).
36. S. Frisch, Temporally ordered lexical representations as phonological units. In M. Broe, J. Pierrehumbert, Eds., Papers in laboratory phonology: Acquisition and the Lexicon, vol. 5. Cambridge: Cambridge University Press (2001).
37. The work reported here was supported by the Mind Articulation Project, Japan Science and Technology Corporation, Tokyo, Japan, and the Kanazawa Institute of Technology, Kanazawa, Japan. We thank Laura Gonnerman for providing her stimulus materials.

Fig. 1. Changes in lexical activation levels predicted by the inhibited activation and inhibited recognition hypotheses.
The inhibited activation hypothesis attributes behavioral inhibitory effects to suppression of lexical activation, while the inhibited recognition hypothesis locates the source of response delay to intense competition in target recognition.

Fig. 2. Magnetic field distributions corresponding to the M170, M250 and M350 response components. RMS for the M350 was calculated from all 44 left hemisphere sensors (excluding midline sensors) as its distribution often covers all of the left hemisphere. The source of the M250 is somewhat more posterior than the source of the M350; therefore RMS for the M250 was taken from a set of left hemisphere sensors that excluded the most anterior ones (n=21). Finally, the occipitotemporal distribution of the M170 was captured by including all occipitotemporal sensors (n=37) (bilaterally arrayed) in the RMS. The sensors used for RMS were held constant across experimental conditions and participants.

Fig. 3. Priming in reaction time (RT), M350 latency and M350 amplitude for onset-matching (spinach-spin) and non-onset-matching (teacher-reach) targets as compared to their unrelated controls.

Fig. 4. Averaged MEG responses from one representative subject to all experimental conditions. Activity on the left shows the time course of activation of the M350 positive maximum, indicated by a square on the contour map on the right (here the same sensor is also the positive maximum of the M170). The contour maps on the right show the distribution of the magnetic field at the times of M350 maxima for all stimulus categories.
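
The RMS measure described in the Fig. 2 caption, and the way a component latency and amplitude could be read off it, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names, the three-sensor toy data and the 300-420ms search window are hypothetical and were introduced here for the example. Only the idea of taking the RMS over a fixed sensor set, the 500Hz sampling rate and the 100ms baseline plus 900ms epoch come from the text.

```python
import math

def rms_waveform(sensor_data, sensor_idx):
    """Root mean square across the chosen sensors at every time sample.

    sensor_data: list of per-sensor lists (sensor x time) of averaged field values.
    sensor_idx:  indices of the sensors covering the component's field pattern.
    """
    n_times = len(sensor_data[0])
    return [
        math.sqrt(sum(sensor_data[ch][t] ** 2 for ch in sensor_idx) / len(sensor_idx))
        for t in range(n_times)
    ]

def peak_in_window(rms, times_ms, t_min, t_max):
    """Return (latency_ms, amplitude) of the largest RMS value in [t_min, t_max]."""
    amplitude, latency = max(
        (value, t) for value, t in zip(rms, times_ms) if t_min <= t <= t_max
    )
    return latency, amplitude

# Time axis: 100 ms baseline plus 900 ms epoch at 500 Hz, i.e. one sample every 2 ms.
times_ms = list(range(-100, 900, 2))

def bump(t, center=340.0, width=40.0, height=100.0):
    """Hypothetical Gaussian-shaped deflection standing in for an evoked response."""
    return height * math.exp(-((t - center) / width) ** 2)

# Toy data (assumption): three sensors seeing the same deflection with different
# signs and gains, as sensors on opposite sides of a dipolar field pattern would.
sensor_data = [[scale * bump(t) for t in times_ms] for scale in (1.0, -0.5, 0.25)]

rms = rms_waveform(sensor_data, sensor_idx=[0, 1, 2])
# Search window chosen for illustration only; the paper does not state one.
latency, amplitude = peak_in_window(rms, times_ms, t_min=300, t_max=420)
print(latency, round(amplitude, 2))
```

Because the RMS squares each sensor's value, sensors on the outgoing and ingoing sides of the field pattern both contribute positively, which is why a single latency and amplitude can be taken from one waveform per component. Holding `sensor_idx` fixed across conditions mirrors the caption's statement that the sensor sets were constant across conditions and participants.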