Acoustic Modeling of the Perception of Place Information in Incomplete Stops
Presented at the 169th Meeting of the Acoustical Society of America, 19 May 2015, Session 2pSC, Pittsburgh, PA
Megan Willi & Brad Story
Speech, Language, and Hearing Sciences, The University of Arizona

Method

[Plot residue omitted. Recovered panel labels: /b/, /d/; F1, F2, F3; response categories b, d, g; stimulus label VV1_100ms; axes unlabeled ("Axis Title") in the source.]

Figure 3: Relative Formant Deflection Patterns. The formant frequencies of the VV contexts are represented with black lines and the relative formant deflections are represented with red (upward deflection) and blue (downward deflection) lines.

Figure 4: Characterization of the Tube Talker, where the black and blue lines represent the vocal tract shapes for the first and second vowels (i.e. [ə] and [i] respectively) and the red line represents the cross-sectional area achieved by the constrictive gesture. All stimuli for Experiment 1 and Experiment 2 had an incomplete closure of 0.1 cm².

Figure 9: Average participant ID curves and contour plots (F2 lower panel and F3 upper panel) for three vowel contexts (i.e. [əi], [əɑ], [əu] from top to bottom respectively) for Conditions 1, 2, and 3 (left to right respectively). See Figure 8 for a detailed description.

Discussion

Listeners’ phonetic boundaries in Experiment 1 and Experiment 2 indicate that place-of-articulation information is present in incomplete, voiced stop consonant VCV stimuli lacking canonical hold duration and burst cues.
Listeners’ phonetic boundaries coincide with the proposed relative formant deflection patterns for all three places of articulation (i.e. /b-d-g/) across vowel contexts (i.e. [əi], [əɑ], [əu]) and timing function manipulations (i.e. Conditions 1, 2, 3), except for the velar position in Condition 3. The results suggest that listeners may be sensitive to changes along this relative acoustic dimension and that relative formant deflection patterns could potentially explain the perception of place-of-articulation information in natural, reduced speech contexts. However, further investigation of the perceptual limits of this cue with respect to place is necessary.

[Figure panel labels: Experiment 1, original timing 500 ms; Experiment 2, Condition 1 300 ms (60%), Conditions 2 & 3 200 ms (40%).]

References

Crystal, T. H., & House, A. S. (1988). Segmental durations in connected-speech signals: Current results. The Journal of the Acoustical Society of America, 83(4), 1553-1573.
Story, B. H. (2009). Vowel and consonant contributions to vocal tract shape. The Journal of the Acoustical Society of America, 126, 825-836.
Story, B. H., & Bunton, K. (2010). Relation of vocal tract shape, formant transitions, and stop consonant identification. Journal of Speech, Language, and Hearing Research, 53(6), 1514-1528.
Warner, N., & Tucker, B. V. (2011). Phonetic variability of stops and flaps in spontaneous and careful speech. The Journal of the Acoustical Society of America, 130(3), 1606-1617.

Figure 6: Illustration of the proportionally reduced time-dependent activation functions for the constrictive gestures. The percent represents the proportion of the original signal’s timing function maintained.

[Figure panel labels: Experiment 1, original timing 500 ms; Experiment 2, Condition 1 300 ms (60%), Condition 2 200 ms (40%), Condition 3 100 ms (20%).]

Figure 5: Illustration of the proportionally reduced VV contexts. The percent represents the proportion of the original signal’s timing function maintained.
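The proportional reductions in the figure labels above are simple arithmetic on the 500 ms original. The sketch below reproduces that arithmetic and, as an assumption not stated on the poster, treats a reduction as a uniform linear compression of an activation function's time axis. The function names are hypothetical and are not part of the Tube Talker model.

```python
ORIGINAL_MS = 500  # original stimulus duration reported on the poster


def reduced_duration_ms(proportion, original_ms=ORIGINAL_MS):
    """Duration kept when `proportion` of the original timing is maintained."""
    return round(original_ms * proportion)


def compress_times(times_ms, proportion):
    """Rescale sample times uniformly (assumed linear compression)."""
    return [t * proportion for t in times_ms]


# Percentages taken from the poster's figure labels.
durations = {pct: reduced_duration_ms(pct / 100) for pct in (60, 40, 20)}
print(durations)

# Hypothetical activation-function sample times, compressed to 40%.
print(compress_times([0, 250, 500], 0.40))
```

Keeping the proportion as the only parameter mirrors the captions, which describe each condition by the percentage of the original timing function that is maintained.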
[Plot residue omitted. Recovered panel labels: Results: Experiment 1; [əi], [əɑ]; F3; response categories b, d, g; stimulus label VV1_100ms; axes unlabeled ("Axis Title") in the source.]

Figure 7: (Experiment 1) Example stimuli from one VV context (i.e. [əi]) at three different vocal tract locations (i.e. 17.5 cm, 13.9 cm, 11.9 cm respectively for /b-d-g/). (Experiment 2) Example stimuli at vocal tract location 17.5 cm (i.e. /b/) for Conditions 1, 2, and 3 for each vowel context (i.e. [əi], [əɑ], [əu] respectively).

Participants: 10 native English speakers (Exp. 1) and 5 native English speakers (Exp. 2).
Task: Forced Choice Test (i.e. /b-d-g/).
Materials: All stimuli were 500 ms vowel-consonant-vowel (VCV) utterances simulated using a voice source model based on the kinematic representation of the medial surfaces of the vocal folds and an airway modulation model of the vocal tract (also known as ‘Tube Talker’). VCV continua were created for 3 underlying vowel-to-vowel transition (VV) contexts (i.e. [əi], [əɑ], and [əu]) by incrementally moving the constriction location from the lips toward the velar part of the vocal tract in 20 (Exp. 1) and 15 (Exp. 2) discrete 0.4-cm steps. Experiment-specific manipulations are described below.
Design: Stimuli were randomly presented 5 times (Exp. 1) and 3 times (Exp. 2) in a block design where only one vowel context was presented per block.
Analysis: Participants’ ID curve boundaries were compared to the perceptual boundaries predicted by the contour plots of the relative formant deflection patterns.
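The continuum and boundary analysis described above can be sketched as follows. This is an illustration only, under stated assumptions: the continuum is taken to start at 17.5 cm (the /b/ location given in the Figure 7 caption), "15 steps" is read as 15 stimulus locations, and the boundary is located by linear interpolation at the 50% crossover point that the Figure 8 caption uses to define phonetic boundaries. The response proportions are made-up values, not data from the study, and the function names are hypothetical.

```python
def constriction_locations(start_cm=17.5, step_cm=0.4, n=15):
    """Constriction locations in cm, stepping from the lips toward the velar region.

    start_cm and n are assumptions; the 0.4 cm step is from the poster.
    """
    return [round(start_cm - i * step_cm, 1) for i in range(n)]


def crossover_points(x, proportions, level=0.5):
    """Locations where an ID curve passes through `level`, by linear interpolation.

    Samples lying exactly on `level` are not treated specially in this sketch.
    """
    points = []
    for i in range(len(x) - 1):
        a = proportions[i] - level
        b = proportions[i + 1] - level
        if a * b < 0:  # the curve crosses `level` between samples i and i + 1
            t = a / (a - b)
            points.append(x[i] + t * (x[i + 1] - x[i]))
    return points


locations = constriction_locations()

# Hypothetical proportion of /b/ responses at each location (illustrative only).
p_b = [1.0, 1.0, 1.0, 0.9, 0.7, 0.3, 0.1] + [0.0] * 8

b_boundaries = crossover_points(locations, p_b)
print(locations)
print(b_boundaries)
```

Under these assumptions, the first, tenth, and fifteenth locations come out at 17.5 cm, 13.9 cm, and 11.9 cm, which matches the /b/, /d/, and /g/ locations listed in the Figure 7 caption. A boundary computed this way could then be compared with the location at which the relative formant deflection pattern changes direction.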
[Plot residue omitted. Recovered panel labels: Results: Experiment 2 (Condition 1, Condition 2, Condition 3); Experiment 1; Experiment 2; /g/; [əi], [əɑ], [əu]; [əbi], [əbɑ], [əbu], [ədi], [əgi]; response categories b, d, g; stimulus label VV1_100ms; axes unlabeled ("Axis Title") in the source.]

Aims

Evaluate participants’ perceptions of place-of-articulation information in stimuli that simulate:
1) incomplete closure in reduced voiced stop consonants.
2) proportionally reduced consonant and vowel timing functions in stimuli with incomplete stop consonant closure.

Figure 1: Reduced speech example of the read word “sabotage.”

Figure 2: Reduced speech examples of 100 ms VCV segments excised from the read words “sabotage”, “steady”, and “spigot.”

Introduction

Previous research on stop consonant production found that fewer than 60% of the stops sampled from a connected speech corpus contained a clearly defined hold duration followed by a plosive release [Crystal & House, JASA, 1988]. How listeners perceive the remaining portion of incomplete stop consonants is not well understood. Prior pilot research demonstrated that participants could identify place information (i.e. /b-d-g/) in reduced, 100 ms vowel-consonant-vowel (VCV) segments excised from read word lists from the Arizona English Recording Corpus.
The purpose of the current study is to investigate whether relative formant deflection patterns, a potential model of acoustic invariance proposed by Story and Bunton (2010), are capable of predicting listeners’ perceptions of place information in acoustically continuous, voiced stop consonants. Listeners identified speech stimuli simulated using a computational model of speech production and model parameters based on x-ray microbeam articulatory data from VCV utterances [Story, JASA, 2009].

[Recovered panel headings at this position in the poster layout: Example Stimuli; Introduction.]

Figure 8: (Top) Identity curves averaged across all participants for the Forced Choice Test: /b/ (blue), /d/ (red), and /g/ (green). (Bottom) Contour plots depicting the relative formant deflection directions: upward (red) and downward (blue). The three panels correspond to F1 (lower panel), F2 (middle panel), and F3 (upper panel). The black lines indicate participants’ phonetic boundaries, defined as the 50% crossover point on the ID curve.

Acknowledgements

This research was supported by the Grunewald Foundation Fellowship and NIH R01-DC011275.