A proposal for labelling prosodic disfluencies in ToBI

Alejna Brugos, Boston University
Stefanie Shattuck-Hufnagel, MIT, RLE Speech Communication Group

Proposal: New disfluency transcription conventions
Motivation: Respond to increasing interest in prosody of spontaneous speech

Introduction

• Prosodic disfluencies are common in spontaneous speech
  - Interruptions/distortions of prosodic constituents
  - Due to morphosyntactic errors (e.g. substitutions) and planning difficulties
--> Need for a user-friendly system for transcribing prosodic disfluencies

Original disfluency conventions in ToBI [5, 7, 3] have several disadvantages
• Obligatorily relate disfluencies to well-formed prosodic structure, e.g. 1p, 2p, 3p
  - But: disfluencies are often ambiguous re intended well-formed target
• Don't distinguish different specific disfluency cues
  - E.g. pause, prolongation, interruption of constituent
  - Cues appear in a wide variety of combinations, not necessarily in a fixed relationship to e.g. 1p, 2p, 3p
--> Results in challenges for labellers; disfluency markers often not used in prosodic transcription

Proposal

• Based on proposal by Arbisi-Kelm [1, 2] for transcribing speech with particularly high rates of disfluency (stuttering); informed by other disfluency annotation systems [4, 6]
  - Annotates individual disfluency cues, e.g. separate markers for pause, prolongation, cut words
  - Does not require intuition about the nature of the well-formed target prosody

Proposed prosodic disfluency labelling scheme makes 2 important distinctions:

1) Separates prosodic disfluencies from morphosyntactic speech errors
• Morphosyntactic errors are sometimes, but not always, accompanied by prosodic disfluencies
• Provides separate labels for wrong word or sound

2) Separates various prosodic characteristics of disfluencies
• More accurate, as characteristics co-occur variably
  - Pause with/without prolongation
  - Filled pauses with/without silent pauses
  - Cutoff followed or not by restart of same word...

Phenomenon        Symbol     Description
prolongation      pr         abnormal and/or incongruous prolongation of a segment within a word
disfluent pause   ps, psw    abnormal and/or incongruous pause between (ps) or within (psw) words
silence           s          end of a silence (whether disfluent-sounding or not)
filler            f          filled pause, filler words or segments (e.g. um, huh, mm)
error             e          mispronunciation or wrong word
cut               c          a partially completed word
restart word      rs         restarting of a segment, syllable, word, after a word has been cut off
restart phrase    %r         start of a new phrase after a previous phrase was not finished

• Disfluency labels are to be used in conjunction with break indices, where possible
  - When more than one symbol is needed, use . (period) as a delimiter, e.g. c.pr, 1psw.ps
• These labels go in the break index tier, at the end of the word (right edge of interval)
  - Exceptions: rs and %r, which indicate that the word following that break index is part of a restart

Examples: all shown at same time scale

[Figure: example utterances, each with an f0 track (Hz) and tones, words, breaks (new) and breaks (old) tiers; scale bar 500 ms. Labels recoverable from the new-convention tiers include 3ps, 1pr, 3pr.ps, c.%r.rs, 1f, 4pr, e.c.ps, s.rs, 1ps and 1psw; labels in the old-convention tiers include 2p and 3p.]

Discussion

Summary of advantages:
• More user-friendly: relieves labeller of obligation to guess at speaker intent
• Enables studies of how cues do and don't co-occur

This proposal is a work in progress; we welcome your input. For updates and more info, visit: tobihome.org

References

[1] Arbisi-Kelm, T. (2006). An Intonational Analysis of Disfluency Patterns in Stuttering. Ph.D. thesis, UCLA.
[2] Arbisi-Kelm, T. (2010). Intonation structure and disfluency detection in stuttering. In C. Fougeron et al. (eds.), LabPhon 10. Berlin: De Gruyter Mouton, 405-432.
[3] Brugos, A., Shattuck-Hufnagel, S. & Veilleux, N. (2006). MIT OCW 6.911: Transcribing Prosodic Structure of Spoken Utterances with ToBI. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911-transcribing-prosodic-structure-of-spoken-utterances-with-tobi-january-iap-2006/
[4] Maekawa, K., Kikuchi, H., Igarashi, Y. & Venditti, J. (2002). X-JToBI: An extended J-ToBI for spontaneous speech. ICSLP 2002, Denver, CO.
[5] Nakatani, C. & Shriberg, E. (1993). Proposal for labeling disfluencies in ToBI. Presented at the 3rd ToBI Labeling Workshop, Ohio State University.
[6] Rodríguez, L.J., Torres, I. & Varona, A. (2001). Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish. ITRW on Disfluency in Spontaneous Speech (DiSS'01), Edinburgh, Aug. 2001.
[7] Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. & Hirschberg, J. (1992). TOBI: a standard for labeling English prosody. In ICSLP-1992, 867-870.

Acknowledgments: This project was supported in part by NSF Grant numbers 0842782 and 0842912.
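The label syntax proposed in this poster (an optional break index followed by period-delimited disfluency symbols, as in 3pr.ps or c.%r.rs) lends itself to a simple mechanical check. The following is a minimal sketch, not part of the proposal itself: the names `SYMBOLS`, `Label` and `parse_label` are hypothetical, and the parser assumes that a break index is a single digit 0-4 attached directly to the first symbol. Old-convention labels such as 2p are deliberately rejected under that assumption.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Symbols taken from the poster's table; the descriptions are abbreviated.
SYMBOLS = {
    "pr": "prolongation",
    "ps": "disfluent pause between words",
    "psw": "disfluent pause within a word",
    "s": "end of a silence",
    "f": "filler",
    "e": "error",
    "c": "cut",
    "rs": "restart word",
    "%r": "restart phrase",
}


class Label(NamedTuple):
    """One entry of the break index tier (hypothetical representation)."""
    break_index: Optional[int]   # None when the labeller gave no break index
    symbols: Tuple[str, ...]     # disfluency symbols in the order written


def parse_label(text: str) -> Label:
    """Split a label such as '3pr.ps' into break index and symbols.

    Assumption: the break index, if present, is one digit (0-4) that
    immediately precedes the first symbol.
    """
    text = text.strip()
    if not text:
        raise ValueError("empty label")
    parts = text.split(".")
    match = re.fullmatch(r"([0-4])?(.*)", parts[0])
    break_index = int(match.group(1)) if match.group(1) else None
    first = match.group(2)
    symbols = ([first] if first else []) + parts[1:]
    unknown = [s for s in symbols if s not in SYMBOLS]
    if unknown:
        raise ValueError(f"unknown symbol(s) in {text!r}: {unknown}")
    return Label(break_index, tuple(symbols))
```

For instance, `parse_label("3pr.ps")` yields break index 3 with the symbols pr and ps, while `parse_label("c.%r.rs")` yields no break index and the symbols c, %r and rs. Keeping the symbols as separate items is what makes it straightforward to count how often cues do and do not co-occur.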