Proceedings of ISCA Workshop on Plasticity in Speech Perception (PSP2005); London, UK; 15-17 June 2005

Perceptual Learning of Noise Vocoded Words

Alexis G. Hervais-Adelman, Matt H. Davis, Robert P. Carlyon
Cognition and Brain Sciences Unit

[Figure: spectrogram panel labelled "Noise-Vocoded Speech". Axes: Time (s), 0 to 3.0; Frequency (Hz), 0 to 8000.]

[Figure: Experiment 1 feedback conditions, 10 subjects each. Vocoded-Clear-Vocoded (VCV) induces pop-out; Vocoded-Vocoded-Clear (VVC) does not induce pop-out. Trial elements: Vocoded Word, Report, Clear Word.]

In noise-vocoded (NV) sentences, words that are initially difficult to understand, or even incomprehensible, appear to "pop out" of the distortion when their identity is known, and can then be reported much more accurately. Davis and colleagues (2005) found that performance improved more rapidly when the feedback provided induced the experience of pop-out. We attempt to replicate this finding with single words, using feedback conditions designed either to induce or to avoid pop-out.

Individuals fitted with a cochlear implant (CI) show dramatic improvements in comprehension over the first few months of using their implant. In this work we investigate whether responses to NV words show a similar perceptual learning effect, and further investigate the nature of the cognitive processes involved in this learning. We address 3 main questions:
Experiment 1: Is learning affected by feedback on the content of distorted speech?
Experiment 2: Is learning affected by the lexical status of single words?
Experiment 3: Are improvements in NV word recognition associated with enhanced discrimination performance?

Stimuli: The amplitude envelope of the clear words was extracted from 6 bands (50-8000 Hz, spaced approximately logarithmically according to Greenwood's (1990) equation). The extracted envelopes were half-wave rectified and used to modulate band-limited noise in their respective extraction bands.
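The band spacing described under Stimuli can be illustrated with a short sketch. It assumes that "spaced approximately logarithmically according to Greenwood's equation" means band edges equally spaced along the cochlear position axis of Greenwood's frequency-position function, and it uses the commonly quoted human constants (A = 165.4, a = 2.1, k = 0.88). The poster states neither the constants nor the exact band edges, so the function names and values below are illustrative assumptions rather than the authors' procedure.

```python
import math

# Greenwood (1990) frequency-position function: F = A * (10**(a * x) - k),
# where x is the proportional distance along the cochlea (0 = apex, 1 = base).
# ASSUMPTION: commonly quoted human constants; the poster does not state them.
A, ALPHA, K = 165.4, 2.1, 0.88


def position_to_frequency(x: float) -> float:
    """Frequency in Hz at proportional cochlear position x."""
    return A * (10 ** (ALPHA * x) - K)


def frequency_to_position(freq_hz: float) -> float:
    """Inverse of position_to_frequency."""
    return math.log10(freq_hz / A + K) / ALPHA


def greenwood_band_edges(low_hz: float, high_hz: float, n_bands: int) -> list:
    """Return n_bands + 1 edge frequencies, equally spaced in cochlear position."""
    x_low = frequency_to_position(low_hz)
    x_high = frequency_to_position(high_hz)
    step = (x_high - x_low) / n_bands
    return [position_to_frequency(x_low + i * step) for i in range(n_bands + 1)]


if __name__ == "__main__":
    # 6 bands spanning 50-8000 Hz, as described under Stimuli.
    edges = greenwood_band_edges(50.0, 8000.0, 6)
    for lower, upper in zip(edges[:-1], edges[1:]):
        print(f"{lower:8.1f} Hz to {upper:8.1f} Hz")
```

Each adjacent pair of edges would define one analysis band. In a vocoder built along these lines, the same band would be used both to extract the speech envelope and to filter the noise carrier that the envelope modulates.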
Subjects: All participants were native speakers of British English, aged 18-25, and had no history of hearing impairment or dyslexia.

References:
Davis, M. H., Johnsrude, I., Hervais-Adelman, A. G., Taylor, K., & McGettigan, C. (2005). Lexical information drives perceptual learning of distorted speech: Evidence from the comprehension of noise-vocoded sentences. JEP: General, 134(2), 222-241.
Greenwood, D. D. (1990). A cochlear frequency-position function for several species--29 years later. J Acoust Soc Am, 87(6), 2592-2605.
Shannon, R. V., Zeng, F. G., Kamath, V., Wygonski, J., & Ekelid, M. (1995). Speech recognition with primarily temporal cues. Science, 270, 303-304.

[Figure: plot of % Words Reported Correctly. Legend: Vocoded-Clear-Vocoded [pop-out], Vocoded-Vocoded-Clear [no pop-out]; x-axis label: First Test Block. Spectrogram panel title: Normal Speech.]

Experiment 2: Presence of Lexical Information

Davis et al. (2005) found that a period of exposure to noise-vocoded non-word sentences did not provide any advantage to listeners hearing NV speech, while training with syntactic prose was as effective as training with real English sentences. This was taken as an indication that lexical information is required for effective adaptation to NV speech. However, because short-term memory (STM) capacity may be exceeded by non-word sentences, we conducted an experiment in which subjects were trained with single words and non-words, and the effectiveness of each type of stimulus was assessed.

2 groups of 12 subjects took part in a cross-over study in which they were trained with 120 VCV word and 120 VCV non-word stimuli; one group was trained with words followed by non-words (W.N.) and the other group was trained with non-words followed by words (N.W.).

W.N.: Test: 40 V Words; Train: 120 VCV Words; Test: 40 V Words; Train: 120 VCV Non-Words; Test: 40 V Words
N.W.: Test: 40 V Words; Train: 120 VCV Non-Words; Test: 40 V Words; Train: 120 VCV Words; Test: 40 V Words
Introduction

Human speech perception is robust in the face of the degraded speech to which we are exposed in everyday life. This robustness is vital to cochlear implant (CI) users, who can understand speech despite a dramatic reduction in the spectral detail and temporal fine-structure provided by their implants. The information provided by a CI can be simulated for normally hearing listeners by using a range of vocoding techniques. In noise-vocoded (NV) speech, information within a number of frequency-specific regions (bands) is replaced by band-limited noise modulated by the amplitude envelope of speech within that frequency region (Shannon et al., 1995).

[Figure: Spectrograms of normal speech and noise-vocoded speech.]

Experiment 1: The Role of Feedback

Subjects in both conditions show a significant improvement in comprehension of NV speech (p<0.001). There is a significant difference in performance between VCV and VVC (p<0.003), indicating that training with VCV is more effective than training with VVC.

In Experiment 3, 4 groups of 8 subjects (aged 18-25) were trained with 0, 40, 80 or 120 real English words in VCV triplets. Phoneme-discrimination performance was tested immediately before and after training, using a 2AFC task with visually presented minimal phonological pairs of real English words. Free report performance was also assessed at the beginning and end of the experiment.

[Figure: Experiment 3 test block: 40 2AFC trials (minimal pairs of words); Free Report: 20 V Words. Plot y-axis: Retest-Test Scores (%).]

[Figure: Experiment 2 results. Y-axis: % Words Reported Correctly; x-axis: Baseline, After one training block, After both training blocks; legend: Word - Non Word Training, Non Word - Word Training.]

There is a significant interaction between training condition and improvement in performance (p<0.03). The improvement in performance after training with words is significantly greater than after training with non-words (p<0.05). This shows that previous findings were not due to STM limitations.
This result confirms the importance of lexical information for rapid learning of NV speech.

Experiment 3: Discrimination

It has been shown that subjects' ability to report noise-vocoded speech improves with training. However, it is not clear whether this is due to improvements in their ability to discriminate between noise-vocoded speech sounds or to improved guessing. In this experiment, subjects' discrimination performance was tested explicitly using a two-alternative forced-choice (2AFC) task. Free report performance was also assessed to provide a comparison of the two tasks.

[Figure: Experiment 3 procedure. Train: 0, 40, 80 or 120 VCV words; 40 2AFC trials (minimal pairs of words); Free Report: 20 V Words.]

[Figure: Experiment 3 results. X-axis: Number of Training Words (0, 40, 80, 120); y-axis ticks from -10 to 20; legend: Free Report, 2AFC.]

2AFC performance improves significantly from the first to the second trial block; however, the amount of improvement appears to be independent of the amount of training received. Improvements in free report performance are significantly greater with more training (p<0.001), with a significant linear correlation between number of training words and performance (p<0.005), suggesting that the two tasks do not rely on the same underlying processes.

Discussion

We have shown that explicit knowledge of the identity of distorted words improves the rate at which listeners learn to understand NV speech. Knowledge of the acoustic form of the . This advantage depends upon involvement of a level of information beyond the acoustic representation of the stimuli, as demonstrated by the significantly greater effect of training subjects with words rather than non-words.
The generalisability of the learning to untrained words suggests that the learning occurs at a pre-lexical level, but the importance of lexical information to learning indicates that it is driven by higher levels of the auditory system.

Improvements in word recognition, but not discrimination, correlate with the amount of training given; it is unclear whether listeners learn to understand NV phonemes after training with NV words.