A proposal for labelling prosodic disfluencies in ToBI
Alejna Brugos, Boston University
Stefanie Shattuck-Hufnagel, MIT, RLE Speech Communication Group
Introduction
Proposal: New disfluency transcription conventions
Motivation: Respond to increasing interest in prosody of spontaneous speech
• Prosodic disfluencies common in spontaneous speech
- Interruptions/distortions of prosodic constituents
- Due to morphosyntactic errors (e.g. substitutions) and planning difficulties
--> Need for user-friendly system for transcribing prosodic disfluencies

Original disfluency conventions in ToBI [5,7,3] have several disadvantages
• Obligatorily relate disfluencies to well-formed prosodic structure, e.g. 1p, 2p, 3p
- But: disfluencies often ambiguous re intended well-formed target
• Don't distinguish different specific disfluency cues
- E.g. pause, prolongation, interruption of constituent
- Cues appear in a wide variety of combinations, not necessarily in a fixed relationship to e.g. 1p, 2p, 3p
--> Results in challenges for labellers; disfluency markers often not used in prosodic transcription

• Based on proposal by Arbisi-Kelm [1, 2] for transcribing speech with particularly high rates of disfluency (stuttering); informed by other disfluency annotation systems [4, 6]
- Annotates individual disfluency cues, e.g. separate markers for pause, prolongation, cut words
- Does not require intuition about the nature of the well-formed target prosody

Proposed prosodic disfluency labelling scheme makes 2 important distinctions:
1) Separates prosodic disfluencies from morphosyntactic speech errors
• Morphosyntactic errors sometimes, but not always, accompanied by prosodic disfluencies
• Provides separate labels for wrong word or sound
2) Separates various prosodic characteristics of disfluencies
• More accurate, as characteristics co-occur variably
- Pause with/without prolongation;
- Filled pauses with/without silent pauses;
- Cutoff followed or not by restart of same word...

Phenomenon       Symbol    Description
prolongation     pr        abnormal and/or incongruous prolongation of a segment within a word
disfluent pause  ps, psw   abnormal and/or incongruous pause between (ps) or within (psw) words
silence          s         end of a silence (whether disfluent-sounding or not)
filler           f         filled pause, filler words or segments (e.g. um, huh, mm)
error            e         mispronunciation or wrong word
cut              c         a partially completed word
restart word     rs        restarting of a segment, syllable, word, after a word has been cut off
restart phrase   %r        start of a new phrase after a previous phrase was not finished

• Disfluency labels are to be used in conjunction with break indices, where possible
- When more than one symbol is needed, use . (period) as a delimiter, e.g. c.pr, 1psw.ps
• These labels go in the break index tier, at the end of the word (right edge of interval)
- Exceptions: rs and %r, which indicate that the word following that break index is part of a restart

Discussion
Summary of advantages:
• More user-friendly: relieves labeller of obligation to guess at speaker intent
• Enables studies of how cues do and don't co-occur
This proposal is a work in progress; we welcome your input. For updates and more info, visit: tobihome.org

Examples: all shown at same time scale
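The label conventions described above (an optional break index followed by period-delimited disfluency symbols) can be illustrated with a small parser. This is only a sketch under stated assumptions, not part of the proposal: the names `SYMBOLS`, `BreakLabel` and `parse_break_label` are hypothetical, the break index is assumed to be a single leading digit, and any symbol outside the proposed set (including old-style labels such as 2p) is assumed to be rejected.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Symbols taken from the proposal's table; everything else here is an assumption.
SYMBOLS = {
    "pr": "prolongation",
    "ps": "disfluent pause between words",
    "psw": "disfluent pause within a word",
    "s": "end of a silence",
    "f": "filler",
    "e": "error",
    "c": "cut",
    "rs": "restart word",
    "%r": "restart phrase",
}

# Symbols that describe the word FOLLOWING the break index (the stated exceptions).
FOLLOWING_WORD_SYMBOLS = {"rs", "%r"}


@dataclass
class BreakLabel:
    break_index: Optional[int]  # None when no break index is given
    symbols: List[str]          # disfluency symbols, in the order written

    @property
    def marks_following_word(self) -> bool:
        """True if the label says the next word is part of a restart."""
        return any(s in FOLLOWING_WORD_SYMBOLS for s in self.symbols)


def parse_break_label(label: str) -> BreakLabel:
    """Split a break-tier label such as '3pr.ps' into index and symbols."""
    parts = label.strip().split(".")
    break_index = None
    # Assumption: a break index, if present, is one digit at the very start.
    match = re.match(r"^(\d)(.*)$", parts[0])
    if match:
        break_index = int(match.group(1))
        parts[0] = match.group(2)
    symbols = [p for p in parts if p]
    unknown = [s for s in symbols if s not in SYMBOLS]
    if unknown:
        raise ValueError(f"unknown disfluency symbol(s): {unknown}")
    return BreakLabel(break_index, symbols)


if __name__ == "__main__":
    for text in ["3pr.ps", "c.%r.rs", "1psw.ps", "4"]:
        print(text, "->", parse_break_label(text))
```

Keeping the symbols as an ordered list rather than a single fused code reflects the point that cues co-occur variably, so counts of how often, say, pr and ps appear together can be taken directly from the parsed labels.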
[Figure: example utterances, each with an f0 (Hz) track and tones, words, breaks (new) and breaks (old) tiers; scale bar = 500 ms. Break labels visible in the examples include 3ps, 1pr, 3pr.ps, c.%r.rs, 1f, 4pr, e.c.ps, s.rs, 1ps and 1psw, alongside 2p and 3p.]
References
[1] Arbisi-Kelm, T. (2006). An Intonational Analysis of Disfluency Patterns in Stuttering. Ph.D. thesis, UCLA.
[2] Arbisi-Kelm, T. (2010). Intonation structure and disfluency detection in stuttering. In C. Fougeron, et al. (eds.), LabPhon 10. Berlin: De Gruyter Mouton, 405-432.
[3] Brugos, A., Shattuck-Hufnagel, S. & Veilleux, N. (2006). MIT OCW 6.911: Transcribing Prosodic Structure of Spoken Utterances with ToBI. http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-911-transcribing-prosodic-structure-of-spoken-utterances-with-tobi-january-iap-2006/
[4] Maekawa, K., Kikuchi, H., Igarashi, Y. & Venditti, J. (2002). X-JToBI: An extended J-ToBI for spontaneous speech. ICSLP 2002, Denver, CO.
[5] Nakatani, C. & Shriberg, E. (1993). Proposal for labeling disfluencies in ToBI. Presented at the 3rd ToBI Labeling Workshop, Ohio State University.
[6] Rodríguez, L.J., Torres, I. & Varona, A. (2001). Annotation and analysis of disfluencies in a spontaneous speech corpus in Spanish. ITRW on Disfluency in Spontaneous Speech (DiSS'01), Edinburgh, Aug. 2001.
[7] Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J. & Hirschberg, J. (1992). TOBI: a standard for labeling English prosody. In ICSLP-1992, 867-870.
Acknowledgments: This project was supported in part
by NSF Grant numbers: 0842782 and 0842912.