Integration of Acoustic-Articulatory Information: Event Related Potentials to Speech and Non-speech Materials

Jenny Hedberg1, Emma Nilsson1, Cristina Ojeda Alvarez1, Åsa Wolgast1, Eeva Klintfors2, Marie Markelius2, Johannes Bjerva2 and Petter Kallioinen2
1 Department of Clinical Science, Intervention and Technology, Karolinska Institute, Stockholm
2 Department of Linguistics, Stockholm University, Stockholm

Abstract

Twenty adults participated in an EEG pilot study, with the future intention to assess the onset of integration of acoustic-articulatory information in infancy. The study examined ERP effects in response to matching vs. non-matching audio-visual speech and non-speech materials in four conditions: Conditions (1) and (2) consisted of speech sounds and visually displayed articulation of speech sounds, either congruent or incongruent; Condition (3) consisted of the sound of hand-clapping and visually displayed articulation of speech sounds; and Condition (4) consisted of the sound of hand-clapping and visual images of hand-clapping. In the third and fourth conditions, the visual materials were presented at different tempos that were either synchronized or unsynchronized with the auditory stimuli. The hypothesis was that all the non-matching materials would elicit a response similar to the N400. The results showed a possible N400 effect in the third condition when the hand-clapping sound was synchronized with the visually displayed articulation of speech sounds.
Introduction

Adults are known to utilize visual information from a speaker: mouth and lip reading are performed to facilitate the perception of speech. However, few studies have examined the age of onset and the development of this ability in infants. In a previous study, young infants (18 to 20 weeks old) showed the ability to pair acoustic and articulatory information (Kuhl & Meltzoff, 1982). The infants watched a split-screen showing two faces articulating different vowels while hearing the pronunciation of one of them, and they looked significantly longer at the face articulating the vowel heard. A similar study was conducted with 25- to 33-week-old infants (Lacerda, Klintfors, Gustavsson, Marklund & Sundberg, 2005). The infants were placed in front of a split-screen displaying four faces articulating the Swedish syllables and vowels /by/, /ba/, /a/ or /y/, while a sound of either /by/ or /a/ was played. The infants did not look at the faces consistent with the auditory information; instead they looked more at the faces with bilabial articulations, interpreted as the most visually prominent ones. The same study examined infants' ability to match the sound of hand-clapping to one of four images displaying hand-clapping movements at different tempos in a split-screen. In addition, it was explored whether infants could match the hand-clapping sound with a face pronouncing the syllable /by/ at the same pace as the clapping. In both cases, three of the images showed movements or articulations that were incongruent with the hand-clapping audio, while one image displayed synchronized visual material. The infants could match the hand-clapping audio with the synchronized visual clapping, but they could not match the temporally synchronized articulation of /by/ with the sound of hand-clapping; instead, they looked more at the film that displayed the most rapid repetition of articulation.

The current investigation examines ERP (event-related potential) components elicited by speech and non-speech materials. It is a pilot study for future EEG (electroencephalography) studies that will examine the onset of coordination of visual and acoustic information in infancy. One ERP component of possible relevance for the current study is the N400, which is related to semantic processing and characterized by a distinctive negativity 400 ms post-stimulus. Semantically incorrect sentences such as "He spread the warm bread with socks" give rise to a more extensive N400 effect than semantically correct sentences such as "He spread the warm bread with butter" (Kutas & Hillyard, 1980; Kutas & Hillyard, 1984). The N400 effect has also been observed for nonverbal, pictorial materials, a finding that implicates semantic systems representing conceptual knowledge independently of input modality (Nigam, Hoffman & Simons, 1992). Furthermore, unanticipated events in video sequences have been found to elicit N400 effects (Reid, Hoehl, Grigutsch, Groendahl, Parise & Striano, 2009). In the present study, negativity around 400 ms after stimulus onset in the ongoing EEG was anticipated when subjects were exposed to congruent and incongruent auditory-visual materials, and the negativity was expected to be more extensive for the incongruent stimuli, since they are more unanticipated.

Method

Participants

The participants were 20 adults (8 male, 12 female; age range 20 to 53 years; mean age 27.6 years). The subjects were personal acquaintances, first-year speech and language pathology students, and individuals randomly recruited to take part in the experiment. Seventeen participants were native speakers of Swedish; the remaining three were native speakers of Spanish, Portuguese and German respectively, but fluent in Swedish. All participants reported normal vision and hearing. Nineteen participants were right-handed and one was ambidextrous. The subjects were informed about the purpose of the experiment after their participation and did not receive any compensation.

Materials

Materials consisted of four animated conditions (Conditions 1 to 4, Table 1) presented in fixed order. Each condition consisted of six visual stimuli and one audio stimulus. All stimuli within each condition were randomized. Three repetitions of the congruent audio-visual pairing (10 s each), showing matched auditory and visual information, as well as three incongruent audio-visual pairings (10 s each), were shown within each condition, giving a total duration of 60 seconds per condition. A grey box was shown between stimuli (1 s). All videos featured the same female actress against a blue background.

In the first condition the actress articulated four different speech sounds: /a/, /ba/, /y/ and /by/, while the audio consisted of repetitions of the syllable /a/. The second condition consisted of the same visual stimuli, while the audio consisted of repetitions of the syllable /by/. In the third condition the syllable /by/ was articulated at different speeds (157%, 101%, 63% and 49% of the original recording tempo), while the audio consisted of repetitions of the clapping sound at a pace of 101%; thus, the auditory stimulus was congruent with the synchronized articulation of /by/ at 101% of the original tempo. In the fourth condition, the visual stimuli were video sequences of hand-clapping movements at different speeds (157%, 101%, 63% and 49% of the original recording tempo), while the audio again consisted of repetitions of the clapping sound at 101% of the original tempo. The resulting trial structure is sketched below, after Table 1.

Table 1. Schematic table of the test materials.

Condition 1: Speech-Articulation
  Congruent:   Visual /a/                                  Auditory /a/
  Incongruent: Visual /ba/, /by/, /y/                      Auditory /a/

Condition 2: Speech-Articulation
  Congruent:   Visual /by/                                 Auditory /by/
  Incongruent: Visual /a/, /ba/, /y/                       Auditory /by/

Condition 3: Hand clapping-Articulation
  Congruent:   Visual /by/ (pace 101%)                     Auditory hand-clapping (pace 101%)
  Incongruent: Visual /by/ (paces 49%, 63%, 157%)          Auditory hand-clapping (pace 101%)

Condition 4: Hand clapping-Hand clapping movements
  Congruent:   Visual hand-clapping (pace 101%)            Auditory hand-clapping (pace 101%)
  Incongruent: Visual hand-clapping (paces 49%, 63%, 157%) Auditory hand-clapping (pace 101%)
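As an illustration of this trial structure, the following minimal sketch builds the stimulus schedule for one condition. It is a reconstruction for clarity, not the actual E-Prime script; the video file names are hypothetical, and the per-condition shuffling is an assumption based on the description above.

    import random

    # Each condition: 3 congruent + 3 incongruent audio-visual clips,
    # 10 s each, randomized within the condition, separated by a 1 s
    # grey box, for roughly 60 s of stimulation per condition.
    CLIP_S, GREY_S = 10.0, 1.0

    def build_condition(congruent_clips, incongruent_clips):
        """Return the randomized trial list for one condition."""
        trials = [(clip, "congruent") for clip in congruent_clips]
        trials += [(clip, "incongruent") for clip in incongruent_clips]
        random.shuffle(trials)  # stimuli randomized within each condition
        return trials

    # Hypothetical file names for Condition 2 (audio: /by/ throughout):
    # the congruent /by/ video repeated three times, plus one video each
    # of the incongruent articulations /a/, /ba/ and /y/.
    condition2 = build_condition(
        congruent_clips=["visual_by.avi"] * 3,
        incongruent_clips=["visual_a.avi", "visual_ba.avi", "visual_y.avi"],
    )

    # Lay the clips out on a timeline with a grey box between stimuli.
    schedule, t = [], 0.0
    for clip, tag in condition2:
        schedule.append((t, clip, tag))
        t += CLIP_S + GREY_S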
Procedure

The participant was seated in a sound-attenuated studio at a distance of approximately 60 cm from a computer screen (HP L1940T, 19"). The experiment was run with E-Prime software (ver. 2.0). Loudspeakers were placed on each side of the screen. When the net (HydroCel Geodesic Sensor Net) was in place, the impedance of each electrode was measured against a threshold of 50 kΩ. During the experiment, the participant and the experimenters were separated by a soundproof wall with an observation window. All data were collected with the EEG recording software Net Station (ver. 4.2.1). The study adhered to the principles of research ethics: it was conducted in accordance with the regulations set by the Data Inspection Board and the Research Ethics Committee at Karolinska Institute (Dnr 2008/3:3), the Personal Data Act (1998:204) and the Act on Ethics Review of Research Involving Humans (2003:460).

Analysis of data

The data were analyzed with Net Station (ver. 4.2.1). A band-pass filter of 1-40 Hz was applied to remove body-movement artefacts. The 200 ms period immediately prior to stimulus onset served as the baseline for the EEG voltage measurements. The data were divided into groups based on congruence/incongruence, as well as speech/non-speech. To exclude eye blinks and eye movements, an amplitude criterion of 55-1400 µV was applied. To handle bad channels, changes exceeding 200 µV per time window were excluded and compensated for. Electrodes 100 and 57 (around the processus mastoideus) were used as references. An average ERP waveform was calculated for each test condition over the first three repetitions of the syllable (Conditions 1 and 2) or of the clapping sound (Conditions 3 and 4). The data were first checked for an N1-P2 response (vertex potential) at a number of electrodes; electrode 55, whose waveform was among the most distinctive of those located on the midline, was selected for analysis.
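For readers wishing to reproduce the pipeline outside Net Station, the sketch below expresses the same steps in MNE-Python. It is a sketch under stated assumptions, not the analysis actually run: the file name and trigger codes are hypothetical, and the channel labels follow the E1, E2, ... convention used for EGI HydroCel nets.

    import mne

    # Load the raw EGI recording (hypothetical file name).
    raw = mne.io.read_raw_egi("pilot_subject.raw", preload=True)

    # Band-pass filter 1-40 Hz, removing slow body-movement artefacts
    # and high-frequency noise, as in the Net Station analysis.
    raw.filter(l_freq=1.0, h_freq=40.0)

    # Re-reference to the two mastoid electrodes (57 and 100).
    raw.set_eeg_reference(ref_channels=["E57", "E100"])

    # Epoch around stimulus onsets: 200 ms pre-stimulus baseline and a
    # 900 ms post-stimulus window (matching the plotted epoch length);
    # drop epochs whose peak-to-peak amplitude exceeds 200 µV.
    events = mne.find_events(raw)
    event_id = {"congruent": 1, "incongruent": 2}  # assumed trigger codes
    epochs = mne.Epochs(raw, events, event_id,
                        tmin=-0.2, tmax=0.9,
                        baseline=(-0.2, 0.0),
                        reject=dict(eeg=200e-6),
                        preload=True)

    # Average per congruence group and inspect midline electrode 55.
    evoked_congruent = epochs["congruent"].average()
    evoked_incongruent = epochs["incongruent"].average()
    evoked_congruent.plot(picks=["E55"])

In practice the congruent/incongruent epochs would be built separately per condition (speech vs. non-speech), mirroring the grouping described above.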
Results

In the first stage of the analysis an apparent N1-P2 brain response was found. The ERP waveforms for Conditions 1, 2 and 4 showed no pattern consistent with a typical N400 effect (Figure 1). The ERP waveforms for Condition 3 displayed a greater negativity, at approximately 400 ms after onset, when the materials were synchronized (Figure 2).

Figure 1. Averaged ERP waveforms for congruent (thick lines) and incongruent (thin lines) audio-visual materials in Condition 1 (vowel /a/ and articulations of speech sounds), Condition 2 (syllable /by/ and articulations of speech sounds), and Condition 4 (sound of hand-clapping and visual hand-clapping movements). No distinct differences between the waves are observed.

Figure 2. Averaged ERP waveforms for congruent (thick line) and incongruent (thin line) audio-visual materials in Condition 3 (sound of hand-clapping and articulations of /by/ at different tempos). Negativity is found for the synchronized stimuli at approximately 400 ms after onset.

Discussion

Studies on the N400 have concluded that N400 effects are generated by sentences and words (Kutas & Hillyard, 1980; Kutas & Hillyard, 1984), not just by isolated syllables such as the ones presented in the current study. The current results showed no N400 effects for the incongruent stimuli in any of the test conditions. One explanation might be that the stimuli in the current experiment were too short to be perceived as having semantic content. For example, Condition 4, presenting the sound of hand-clapping together with visual hand-clapping movements, did not bring much, if any, linguistic information to the context. It is therefore likely that the participants did not perceive these audio-visual materials as semantically confusing enough to elicit an N400 effect.

In Condition 3, presenting the sound of hand-clapping together with visual articulatory movements, an extensive negativity was found for the temporally synchronized stimuli. This implies that the participants experienced that the temporally synchronized articulation did not match the sound of hand-clapping. A reason why an effect similar to the N400 appeared only for the congruent audio-visual materials, and not when the audio-visual materials were unsynchronized, could be that audio-visual information is bound together more instinctively when the two streams are presented at the very same time; adults know that mouth and lip movements do not correspond to a clapping sound. The primary finding of the present experiment could be useful in ERP studies of infants, to determine at what age they begin to perceive that mouth and lip movements do not correspond to the sound of hand-clapping.
This would also indicate the age at which infants have achieved the ability to discriminate between speech sounds and non-speech sounds.

To conclude, the results in Condition 3 are in agreement with previous findings and suggest that N400 effects might be related not only to purely semantic stimuli (Nigam et al., 1992), but also to unanticipated events (Reid et al., 2009). The incongruence effect found in the current experiment could thus be seen as a violation of world knowledge, i.e. interpreted as a semantic incongruence, but it could also be explained as a perceptual mismatch or as a difference related to attentional factors. In fact, the waveform and topographic similarities between the current results and those of Proverbio & Riva (2009), who found N400 effects for strange pictures, lend some support to an N400 interpretation. Further investigation along that line is clearly needed.

The current experimental paradigm was used to validate materials intended to assess the integration of audio-visual information in infancy. Since the materials had to be tested with adults, some inherent methodological difficulties arise when adults are exposed to child-directed materials. For example, the participants in the current study may not have been fully attentive throughout the experiment due to the lack of stimulation provided by the contents of the materials. To improve the study, adult participants could be given a concurrent pseudo-task to maintain attention to the materials.

Acknowledgements

Research supported by the Swedish Research Council (nr 2009-2245).

References

Kuhl, P. K. & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.

Kutas, M. & Hillyard, S. A. (1980). Reading senseless sentences: Brain potentials reflect semantic incongruity. Science, 207(4427), 203-205.

Kutas, M. & Hillyard, S. A. (1984). Brain potentials during reading reflect word expectancy and semantic association. Nature, 307, 161-163.

Lacerda, F., Klintfors, E., Gustavsson, L., Marklund, E. & Sundberg, U. (2005). Emerging linguistic functions in early infancy. Proceedings of the 5th International Workshop on Epigenetic Robotics, 55-62.

Nigam, A., Hoffman, J. E. & Simons, R. F. (1992). N400 to semantically anomalous pictures and words. Journal of Cognitive Neuroscience, 4(1), 15-22.

Proverbio, A. M. & Riva, F. (2009). RP and N400 ERP components reflect semantic violations in visual processing of human actions. Neuroscience Letters, 459, 142-146.

Reid, V. M., Hoehl, S., Grigutsch, M., Groendahl, A., Parise, E. & Striano, T. (2009). The neural correlates of infant and adult goal prediction: Evidence for semantic processing systems. Developmental Psychology, 45(3), 620-629.