CLINICAL LINGUISTICS & PHONETICS,
1995, VOL. 9, NO. 2, 139-154
The role of listener familiarity in the
perception of dysarthric speech
KRISTIN K. TJADEN* and JULIE M. LISS**
115 Shevlin Hall, University of Minnesota, Minneapolis, MN 55455, USA
(Received 30 November 1993; accepted 29 April 1994)
Abstract
The effects of two qualitatively different familiarization procedures on listeners’
perceptions of speech produced by a Korean woman with moderate-severe spastic-ataxic dysarthria were investigated. Thirty listeners were randomly assigned to one of
three listening conditions. Prior to the transcription task, members of the two experimental listening
groups were familiarized with a speech sample produced by the woman with dysarthria.
Listeners assigned to the Paragraph group heard the speaker read a paragraph twice as
they followed along with a script. Listeners assigned to the Word List group heard the
speaker read the same words comprising the paragraph, but the words were presented
in a random order. This list was heard twice, as the listeners followed along with a script.
The Control group performed the transcription task without receiving prior exposure to
the dysarthric speech. The 30 listeners then orthographically transcribed sentences
produced by the dysarthric speaker. Results showed that listeners familiarized with this
dysarthric person’s speech pattern (Word List and Paragraph groups) performed
significantly better on the sentence transcription task than listeners who were not
familiarized (Control group). Although the Paragraph group did not significantly
outperform the Word List group, as was predicted, a trend in that direction did exist.
Keywords: dysarthria, speech perception, familiarization
Introduction
The human perceptual system is extraordinarily facile in its ability to select and interpret
information available from a complex acoustic signal (Pisoni and Luce, 1986; Engstrand,
1992). It is this proficiency that has permitted researchers to regard perception as somewhat
of a constant in speech intelligibility research. That is, intelligibility has been viewed largely
as a byproduct or function of the integrity of the speech signal, and the acoustic attributes
therein (see Weismer and Martin, 1992, for a review of intelligibility research). This
approach has provided information about the bottom-up component of speech processing,
in which signal characteristics drive perception. This causal relationship has been explored
in studies of deaf speech (e.g. Maassen and Povel, 1985; Metz, Samar, Schiavetti, Sitler
and Whitehead, 1985; Monsen, 1978), dysarthric speech (e.g. Kent, Weismer, Kent and
Rosenbek, 1989; Tikofsky, Glattke and Tikofsky, 1966), and synthesized speech (e.g. Nye
and Gaitenby, 1973).
*Currently affiliated with the University of Wisconsin-Madison.
**Currently affiliated with Arizona State University.
0269-9206/95 $10.00 © 1995 Taylor & Francis Ltd.
However, there is reason to believe that listeners confronted with a degraded speech
signal must invoke more top-down or higher-level cognitive processing to decipher and
reconstruct the message than would be necessary for the processing of non-degraded speech
(Duffy and Pisoni, 1992; Greene and Pisoni, 1988; Pisoni and Luce, 1986). Numerous
variables have been identified that could influence this process, including word-frequency
effects (Rosenzweig and Postman, 1957), semantic or syntactic predictability (Duffy and
Giolas, 1974; Giolas, Cooker and Duffy, 1970; Hammen, Yorkston and Dowden, 1991),
the amount of available contextual information (Kreider, 1988), and a person’s own prior
listening experience (Platt, Andrews, Young and Quinn, 1980). A successful model of the
perception of degraded speech must be able to accommodate the interaction between listener
variables and signal attributes.
Familiarity and synthetic speech
A first step towards this goal is to identify and define potent listener variables that affect
the listener-signal interaction when the speech signal is substantially degraded. This task
is fairly straightforward when speech signal attributes are well specified, as with synthesized
or digitally filtered speech. For example, consider the hypothesis that listeners can decipher
a degraded signal more readily if they have had prior exposure to it (listener familiarization).
Greenspan, Nusbaum, and Pisoni (1988) evaluated the effects of various listener
familiarization training procedures on the intelligibility of synthetic speech. They found that
listeners who received prior exposure to the Votrax speech signal outperformed those
listeners who did not receive the same training. Perhaps more importantly, these
investigators found a relationship between certain aspects of the training procedure and
listener benefits. Listener performance patterns were influenced by the type of training
(sentence versus word level), the extent of training (single exposure versus repeated
exposure), and the type of listening task subjects were asked to perform (sentence versus
single-word transcription). It was suggested that training at the sentence level may be
superior to word level because it provides the listener with information about prosodic
patterns that can facilitate segmentation of the acoustic stream into words.
The Greenspan et al. (1988) study is an example of how listener performance, and
perhaps perceptual strategies, can be altered differentially with various familiarization
training procedures. The listener variable of familiarization was manipulated, and the
resulting listener perceptions were interpreted relative to the information available in the
synthetic speech signal. The assumption is that, by knowing the acoustic information
available to listeners in the speech signal and by combining this with knowledge of how
the listener was familiarized with the speech signal, it is possible to hypothesize as to the
nature of the constructive, top-down perceptual strategies that listeners invoked to interpret
the message. Central to Greenspan et al.’s (1988) study is that the acoustic patterns of
synthesized speech are systematic, consistent, and easily characterized. The examination
of listener variables (e.g. listener familiarization) is less straightforward when the speech
signal is degraded in a non-systematic way, as is the case with dysarthria.
Familiarity and dysarthric speech
Although one would expect from anecdotal evidence and clinical experience that
familiarization would improve a listener’s ability to understand dysarthric speech, the few
attempts to document the phenomenon have met with limited success. There are at least
two possible reasons for this, including the heterogeneity of dysarthric speech, and
methodological differences involving various definitions and levels of familiarity. These
reasons are discussed in turn.
Heterogeneity of dysarthric speech
Dysarthric speakers comprise a highly heterogeneous group (Yorkston and Beukelman,
1980). Even people whose speech is regarded as representative of a specific dysarthric
subtype have speech patterns that reflect some combination of the motor speech disorder,
the idiosyncrasies of their premorbid speech style, and their attempts to compensate for their
loss of intelligibility. The picture is complicated further by differences in severity of speech
impairment, and by the lack of stability or consistency of certain segmental and
suprasegmental error patterns.
The fact that there are no ‘typical’ dysarthric speakers is perhaps the greatest barrier
to developing generalizations regarding the perceptual processes listeners use to decipher
dysarthric speech. Dysarthric speech differs on a variety of levels, and we do not yet know
which of these levels might be associated with certain preferences in perceptual strategies.
It is quite possible that listeners invoke different perceptual strategies for different dysarthric
speakers, selecting the strategies that are most efficient and effective. Until the phenomenon
of familiarization is better understood, the variables on which to categorize groups of
speakers in order to obtain generalizable results will be virtually unknown. Only then can
predictions about potentially potent speech signal variables, such as severity of impairment
and pattern of errors, be explored.
Methodological differences among familiarization studies
The term ‘familiarization’ can be defined along a variety of continua, including the duration
of the exposure, the type of material that is used, and whether feedback is provided. The
present investigation examined the effects of a brief prior exposure to dysarthric speech on
listeners’ ability to transcribe a series of sentences produced by that dysarthric speaker.
Because listeners were permitted to follow along on a written transcript while they heard
the dysarthric speaker’s familiarization sample, it was hypothesized that they would learn
something about her articulatory and prosodic patterns, and consequently be more
successful in a transcription task than those listeners who did not receive prior exposure.
The method used in the present investigation can be regarded as a form of specific
familiarization training, because listeners were given the opportunity to hear a sample of
a specific person’s speech before transcribing sentences produced by that speaker. By
contrast, previous studies have examined the effects of more general forms of
familiarization, such as exposure to a group of disordered speakers, or prior experience with
a particular disorder rather than with a specific individual.
Yorkston and Beukelman (1983) investigated the effects of a more general form of
familiarization in perception of dysarthric speech. Listeners in the experimental groups were
exposed to the speech of several dysarthric speakers. Nine dysarthric subjects produced two
sets of sentences (List I and List II). Nine listeners, who were assigned to one of three
familiarization conditions, transcribed these sets of sentences. The first listener group,
which received no familiarization, transcribed List I and then transcribed List II 2 weeks
later. The second group transcribed List I, listened to the List I sentences three more times
without feedback, and then immediately transcribed List II. The third group transcribed List
I, listened to the List I sentences three more times while following along with an accurate
transcript, then transcribed List II. Results indicated that neither familiarization group
benefited from the exposure to the dysarthric speech, nor did they perform better than the
non-familiarized group.
Yorkston and Beukelman interpreted these findings as evidence that judges could be
used repeatedly to transcribe intelligibility samples from the same speakers, without the
threat of artifactual increases in intelligibility scores. However, random assignment of
listeners to the three conditions resulted in the most experienced listeners being assigned
to the ‘no-familiarization’ condition. As the investigators noted, this may have accounted
for the higher prefamiliarization intelligibility scores of the ‘no-familiarization’ group
speakers. Thus, differences in listener experience may have obscured any specific
familiarization effects. This possibility is supported by evidence from Platt et al. (1980),
who reported performance differences between experienced and naive listeners in the
transcription of speech produced by people with cerebral palsy (see, however, Hunter, Pring
and Martin, 1991, for counter-evidence in this same population). Monsen (1983) described
similar results for the perception of deaf speech, wherein people who had personal or
professional experience listening to deaf speech performed a transcription task with higher
accuracy than those listeners who had little or no prior experience with deaf speech. These
mixed results suggest that familiarization is a complex phenomenon that goes beyond
allowing a listener to simply map the acoustic-phonetic structure of the speech signal onto
prototypical phonemes (see Verbrugge, Strange, Shankweiler and Edman, 1976).
Because so very little is known about the variables that influence the perceptual
processing of dysarthric speech, the present investigation explored the phenomenon of
familiarization as it pertained to one sample of dysarthric speech. The goals were to
determine whether listeners familiarized with a dysarthric person’s speech benefited from
the exposure, and whether the form of the familiarization material had an impact on degree
of perceptual benefit. It was hypothesized that listeners who were exposed to the speaker’s
sentence-level prosody and inter-word coarticulation would perform better on a sentence
transcription task than those listeners who were exposed only to a list of single words.
Method
Speech sample
A speech sample was collected, as part of a larger investigation, from a dysarthric speaker.
This speaker was a 26-year-old Korean woman with cerebral palsy and a spastic-ataxic
dysarthria of moderate-severe involvement. Because the purpose of this study was not to
generalize the results of the listening experiment to the perception of all dysarthric speech,
but to explore the phenomenon of familiarization with a single dysarthric speech pattern,
any dysarthric speaker with substantially reduced intelligibility would have provided
suitable stimuli to test the impact of the two familiarization procedures. The benefit of using
this particular speaker was the opportunity to assess listener responses to both relatively
variable and relatively consistent error patterns. In this case the variable articulatory errors
resulted from the spastic-ataxic dysarthria, and the more consistent error patterns were
judged to be related to the speaker’s acquisition of English as a second language (ESL).
This distinction is important, especially in this initial attempt to document familiarization
as it is defined herein.
The Assessment of Intelligibility of Dysarthric Speech (Yorkston and Beukelman,
1981), as scored by a certified speech-language pathologist unfamiliar with the speaker,
revealed a mean single-word intelligibility of 46%, and a sentence intelligibility range of
0-85% with a mean of 32.7%. A motor speech examination, conducted by the authors,
revealed substantial deficits in speech articulation, and in respiratory-laryngeal control.
Prosodic patterns were characterized by inappropriate loudness changes, short phrases, and
inappropriate pauses. Some of the more variable articulatory errors were related to the
breakdown in respiratory-laryngeal control. These included the intermittent devoicing of
voiced consonants and vowels, resulting in a whispered quality for some speech segments.
Although devoicing of voiced phonemes might be considered a result of this speaker’s
ESL errors, the fact that she was unable to produce a sustained vowel for more than several
seconds suggests that the devoicing was a laryngeal deficit resulting from the dysarthria.
Subjectively, this woman’s low intelligibility entirely masked the fact that she was a
non-native English speaker. However, some of her articulatory errors were believed to be
the result of her acquisition of English as a second language. These more consistent errors
included her tendency to substitute [l] for /r/ in the word-initial position.
It should be noted that the Korean language uses two variants of the liquid /l/, including
a lateral [l] in word-final position and a flap [r] in word-initial position (Kim, 1990). Thus,
it appears that substitution of word-initial [l] for /r/ would not be predicted based upon
transfer of native Korean contrasts to learning English as a second language. However,
Major (1994) notes that a speaker’s underlying representation or mental representation of
a particular sound may reflect the speaker’s native language, the second language, or
something intermediate. Furthermore, the processes that act on the underlying representation to produce a surface form may be the same as the native or second language, or may
be different from both. To illustrate, Major (1994) gives the example that a Korean learner
of English has an underlying representation of English /r/ that is unlike the native Korean
flap [r] or the English /r/; instead, when [r] is produced it may sound to English speakers
as intermediate between English /r/ and /l/. Thus, the interaction between native and
non-native contrasts is complex and may not be predictable across speakers. For our
purposes we were interested in distinguishing between variable dysarthria errors and
consistent (ESL) errors. The mechanism by which the consistent errors were occurring was
not of specific interest in the present investigation. However, Major’s (1994) observations
suggest that our Korean speaker’s substitution of [l] for /r/ word-initially was related to her
acquisition of English as a second language in some complex manner.
Audio recordings of the speech samples were made in a quiet room using a Tascam 112
audio recorder, a BK-1 Electrovoice condenser microphone mounted on a table-top
microphone stand, and high-fidelity recording medium. For the present study, three audio
tapes were constructed over two 1-hour sessions. The first tape (Transcription) consisted
of sentences that were used as the stimuli for the sentence transcription task performed by
all listeners in this investigation. The second and third audio tapes (Word List and
Paragraph) were used for the familiarization procedures to which some listeners were
exposed. The dysarthric speaker, who was a graduate student proficient in English, read
through all of the speech material several times before the audio taping, to increase her
familiarity with the material. Utterances judged to be poorly read during the taping (read
with dysfluencies or word-level additions or omissions) were repeated.
The Transcription tape contained 48 six-word sentences, including 16 questions, 16
declaratives, and 16 imperatives (see Table 3). The sentences were created by the
investigators and were designed to sample the speaker’s productions of a wide variety of
phonemes and a range of prosodic variation. The speaker was instructed to produce these
sentences in her customary way. The Paragraph tape was created by asking the speaker to
read a paragraph (see Appendix) that consisted of 12 six-word sentences (72 words). The
Word List was obtained by asking the speaker to read a list of 72 words. This list consisted
of those words contained in the Paragraph tape arranged in a randomized order. Thus, the
corpora of words in the Paragraph and Word List tapes were identical. The purpose of
creating two familiarization tapes identical in content but not form was to identify
differential benefits between prior exposure to sentence-level prosody and inter-word
coarticulation (Paragraph tape), and information about word-level articulation patterns only
(Word List tape).
None of the sentences in the Paragraph tape was identical to any sentences on the
Transcription tape. Overlap in content was restricted to common, frequently occurring
words (e.g. ‘the, is, it, off, can, to’).
Subjects
Thirty female college students from the University of Minnesota served as subjects for this
investigation. Subjects reported normal hearing, Standard American English as a first
language, and little or no experience listening to dysarthric speech. The listening task took
approximately 45 minutes, and subjects were compensated for their time.
Listening task
The 30 subjects were randomly assigned, in equal numbers, to one of the three listening
groups: Control, Word List, or Paragraph. One to five subjects belonging to the same group
were assembled in a room equipped with a master tape deck and rows of study carrels that
contained high-quality headphones (University of Minnesota Language Laboratory). This
setting was ideal for data collection because the study carrels reduced visual distractions
and prohibited subjects from being influenced by the written responses of others. General
instructions were identical for all listening groups. Subjects were told they would be hearing
the speech of a woman whose first language was not American English, and who had
suffered damage to her nervous system that affected the way she could use her muscles to
talk. Subjects were told that they would hear her produce a list of six-word sentences, each
sentence followed by a 20-second pause. During this pause, listeners should write what they
heard on an answer sheet. Listeners were encouraged to guess if they were unsure of the
speaker’s utterance.
Immediately prior to performing the sentence transcription task, members of the Word
List and Paragraph groups participated in similar familiarization procedures. Subjects in
the Word List familiarization group were given a written list of words corresponding to the
content of the Word List tape. Subjects followed along with this written list while they
listened to the Word List tape twice. The Paragraph familiarization procedure was
conducted in essentially the same way. However, these subjects followed a written script
of the paragraph while listening to the Paragraph tape twice. Subjects in the familiarization
groups did not transcribe their respective familiarization tapes, but only followed along with
the written scripts while they listened to the tape. Listeners in the Control group did not
participate in any familiarization procedure. Table 1 contains a summary of the
experimental conditions and procedures.
Data analysis
Two scores were calculated from the listener response sheets and were used for statistical
and descriptive purposes. Each score and its method of calculation will be described in detail
below. In addition, scores were calculated for individual listeners and then averaged within
experimental group.
Table 1. Summary of the experimental groups and procedures

Listening group      Familiarization procedure                                      Task
Control (n = 10)     None                                                           Transcribe 48 six-word sentences
Word List (n = 10)   Hear Word List tape twice; follow along with written list      Transcribe 48 six-word sentences
Paragraph (n = 10)   Hear Paragraph tape twice; follow along with written script    Transcribe 48 six-word sentences
Words-correct score
The number of words correctly transcribed of the six words in each sentence was tallied
and then summed across the 48 sentences, resulting in a words-correct score. The possible
range of this score was from 0 words correctly transcribed to 288 (48 sentences × 6 words)
words correctly transcribed. A word was considered to be transcribed correctly if the
listener’s response exactly matched the target word; that is, the word the speaker was
instructed to read.
Total word-response score
A total word-response score was obtained by tallying all of the words, correct and incorrect,
that a listener attempted to transcribe. This score had a range of possible values from 0 to
288. Because listeners were instructed to ‘guess’ if unsure of the speaker’s productions, it
was expected that this score would be considerably higher than the words-correct score.
The purpose of this score was to identify any differences in response rates across listener
groups.
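The two scores described above amount to a simple tally, which can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are ours, and the position-by-position matching rule is an assumption, since the study specifies exact word matches but not a scoring algorithm.

```python
def words_correct(targets, responses):
    """Words-correct score: exact matches between target words and the
    listener's transcribed words, summed over all sentences.
    With 48 six-word sentences the possible range is 0-288."""
    total = 0
    for target, response in zip(targets, responses):
        t_words = target.lower().split()
        r_words = response.lower().split()
        # Assumption: a word is credited when the response word in the
        # same position exactly matches the target word.
        total += sum(1 for t, r in zip(t_words, r_words) if t == r)
    return total


def total_word_responses(responses):
    """Total word-response score: every word the listener attempted,
    correct or incorrect (range 0-288)."""
    return sum(len(response.split()) for response in responses)
```

For example, a response of "open the door for her mother" to the target "open the door for your mother" would earn 5 words correct and 6 total word responses.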
Item analysis
An item analysis was conducted to assess if all groups performed similarly for each of the
48 sentences. This assessment was designed to identify inherent differences in intelligibility
among the sentences, and to reveal any item-specific differences in group performance.
Error pattern assessment
Finally, a qualitative assessment of listener error patterns was accomplished by comparing
listener responses with selected perceptual/acoustic attributes of the 48 stimuli sentences.
Toward this end, the two investigators performed a broad phonetic transcription of the 48
sentences using selected diacritics. Because of the highly distorted nature of the speech,
broad-band (300 Hz) spectrographic displays created on a Kay 5500 Workstation were
used in conjunction with the phonetic transcription to yield a perceptual/acoustic
characterization for each of the 48 stimuli sentences. These spectrograms served as an
additional source of information to help explain the perceptual judgements made by the
listeners. This error pattern analysis was undertaken to identify and compare perceptual
errors made across listener groups.
Table 2. Summary data for experimental groups. Mean words-correct
values were calculated on a possible 288 words for each subject. Total
response values are the means of the absolute number of words
(correct + incorrect) transcribed by each group.

Group        No. of words correct    Total no. of word responses
Control
  Mean               84.1                    219.2
  SD                 10.5                     37.3
Word List
  Mean               99.5                    229.0
  SD                 16.5                     36.4
Paragraph
  Mean              110.2                    238.9
  SD                 13.5                     29.1

Results
Parametric results
Results
Parametric results
Table 2 contains performance data for each of the three listener groups. A one-way analysis
of variance (ANOVA) of the words-correct scores revealed significant differences among
these group means, F(2,29) = 9.1333; p < 0.001. In addition, post hoc Newman-Keuls tests
identified significant differences between the Control and Word List groups (q = 3.547;
p < 0.05); and between the Control and the Paragraph groups (q = 6.012; p < 0.05). The
10-point difference between the Paragraph and Word List group means was not statistically
significant.
The second data column of Table 2 contains the group means of the total word-response
score. A one-way ANOVA of these total word-response scores revealed no significant
differences among the three group means. This indicates no differences in response rate
among the three groups.
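For readers wishing to reproduce this kind of analysis, the one-way ANOVA F statistic can be computed directly from the group scores. The sketch below is illustrative: the function name is ours, and the three score lists are hypothetical stand-ins, since the study's individual listener scores are not reproduced here.

```python
def one_way_anova_f(*groups):
    """Return the F statistic for a one-way ANOVA across groups:
    F = (SS_between / (k - 1)) / (SS_within / (n - k))."""
    k = len(groups)                                # number of groups
    n = sum(len(g) for g in groups)                # total observations
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares (df = k - 1)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Within-group sum of squares (df = n - k)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical words-correct scores for three listener groups of equal size.
control = [70, 80, 85, 90, 95]
word_list = [85, 95, 100, 105, 110]
paragraph = [95, 105, 110, 115, 125]
f_stat = one_way_anova_f(control, word_list, paragraph)
```

The resulting F would then be compared with the critical value at (k − 1, n − k) degrees of freedom to decide significance, with post hoc tests (such as Newman-Keuls) locating which pairs of group means differ.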
Descriptive results
Item analysis
Table 3 contains the target sentences and the total number of words correctly transcribed
for each sentence by group. The first data column contains total words-correct values in
which the Paragraph (P) group performed better than either the Word List (W), or Control
(C) groups (P > W,C). For example, sentence no. 2 (Open the door for your mother)
indicates that the Paragraph group had 50/60 correct word transcriptions for this sentence,
which exceeded both the 49 and 41 correct words transcribed by the Word List and Control
groups, respectively. This trend, in which the Paragraph group outperformed both the Word
List and Control groups, occurred for 25 of the 48 sentences. The second data column
contains total words-correct values for which the Word List performed best (W > P,C). This
pattern of performance held for 14 of 48 sentences.
The item analysis also revealed a range of inherent intelligibility across sentences. Eight
sentences elicited group sums across all listener groups (see Table 3) that met or exceeded
30/60 correct (50%), including sentences 2, 8, 10, 12, 15, 19, 27, and 46. These high
Table 3. Total number of words correctly transcribed for each sentence by group (Paragraph
(P), Word List (W), Control (C)) are provided. The first data column contains values for
which the Paragraph group outperformed the Word List and Control groups; the second
data column contains values for which the Word List group outperformed the other two.
Total values at the end of these data columns represent the number of times in which these
patterns occurred.

1. Old shoes are usually most comfortable.
2. Open the door for your mother.
3. Turn off the television before bed.
4. Shall we go out for lunch?
5. Who is your college major advisor?
6. Write your name on the paper.
7. Was it raining outside yesterday morning?
8. Too much coffee makes me jumpy.
9. Will she go to the movie?
10. What time are we going home?
11. Throw the garbage in the dumpster.
12. When do you get to work?
13. Move your books off the table.
14. Wash the dishes and dry them.
15. When can I go to sleep?
16. Feed the dog canned food only.
17. Will the mail be here soon?
18. Take the children to the zoo.
19. Be sure to lock the door.
20. Rain may ruin our spring picnic.
21. Bring a friend to the party.
22. Who won the game last night?
23. The husband and wife got divorced.
24. Are we having steak for dinner?
25. Let me carry that heavy package.
26. Working allows people to earn money.
27. Can you come for dinner tonight?
28. Please walk the dog for exercise.
29. Biking is a common summer sport.
30. Order a sausage pizza for dinner.
31. Green plants supply oxygen for us.
32. Will you dust the furniture?
33. Are flat tyres difficult to change?
34. The girl bought rice and potatoes.
35. Water the plants twice a week.
36. Glass shattered all over the counter.
37. Kittens sleep under beds and chairs.
38. Turn the radio to another station.
39. Find the recipe for cherry pie.
40. Indian jewellery sells well in shops.
41. Where will the band perform next?
42. Fans keep rooms cool during summer.
43. Chocolate cake is sweet to eat.
44. Is the shopping centre still open?
45. Telephones allow two people to communicate.
46. How many brothers do you have?
47. The gravel crunched beneath our feet.
48. Put the files in the cabinet.

Words-correct values (P > W,C and W > P,C columns):
8 > 3 > 0
50 > 49 > 41
46 > 26 > 20
14 > 9 > 3
10 > 8 > 4
57 > 33 > 26
9 > 6 = 6
46 > 40 > 34
39 > 38 > 29
51 =
51 =
20 > 18 > 17
46 > 44 > 42
48 > 34 > 27
18 > 9 < 14
44 > 38 < 41
35 > 22 < 32
13 > 5 < 7
0 =
48 > 43 > 42
0 =
23 > 22 > 13
37 > 28 < 32
38 > 31 = 31
12 > 10 > 4
46 > 37 > 35
31 > 29 > 18
7 > 0 = 0
24 > 19 > 8
8 > 4 > 2
29 > 6 > 5
17 > 6 < 8
6 =
6 > 5 > 0
6 > 5 > 4
6 >
5 > 1 = 1
27 > 16 > 15
12 > 9 > 6
16 > 10 < 15
8 > 6 > 4
32 > 18 < 19
36 > 27 > 18
58 > 52 > 50
7 > 6 > 2

Total (P > W,C): 25
Total (W > P,C): 14
intelligibility sentences were at the upper end of this speaker’s intelligibility range, as
defined by her performance on the Assessment of Intelligibility of Dysarthric Speech
(Yorkston and Beukelman, 1981). Similarly, 12 sentences elicited group sums across all
listener groups that fell at or below 15% correct (9/60), including sentences 1, 7, 20, 29,
31, 34, 35, 36, 37, 40, 43, and 47. These were identified as low intelligibility sentences.
The differences between the high and low intelligibility sentences cannot be explained
entirely by any single attribute; however, sentence length may play a role. For example,
the high intelligibility sentences contained one- and two-syllable words and typically
contained less than eight total syllables. In contrast, many of the low intelligibility sentences
contained three-syllable words and had more than eight syllables total. One interpretation
of these findings is that longer sentences may have been more difficult for the speaker to
produce, and listeners would be challenged in distinguishing word boundaries from syllable
boundaries.
The item analysis also revealed that none of the listener groups displayed a learning
effect across the task. Comparison of performance on items 1-25 was not different from
performance on items 26-50 for any group. The absence of feedback regarding accuracy
during the transcription task, combined with the speaker’s generally low intelligibility, may
have prevented listeners from learning more about her speech on-line.
Error pattern assessment
Although statistical analysis failed to reveal a significant difference between the two
familiarization procedures, descriptive results point to a pattern of performance in which
the Paragraph group outperformed the Word List group, with both groups outperforming
the Control group. For example, the words-correct scores of Table 2 indicate a consistent
hierarchy of performance in which the Paragraph group exceeded the Word List group,
which in turn surpassed the Control group. The item analysis presented in Table 3 shows
that the Paragraph group outperformed both the Word List and Control groups in 25 of 48
sentences. Also, of the 480 sentence responses per group (48 sentences X 10 listeners per
group), the Paragraph group transcribed 61 sentences without error while the Word List
and Control groups transcribed 48 and 37 sentences completely without error, respectively.
These results could reflect the importance of exposure to patterns of sentence-level prosody
and inter-word coarticulation in providing cues for segmenting the acoustic stream into its
lexical components (Greenspan et al., 1988; see also Verbrugge et al., 1976).
To further illustrate differences in performance between the Paragraph and Word List
groups for individual sentences, consider listener responses for sentence no. 3 (Table 3).
Table 3 indicates that the Paragraph group outperformed the other two groups for number
of words correctly transcribed (46 > 26 > 20). Although these group differences barely
reach statistical significance, the qualitative differences in response patterns across the three
groups are remarkable. Table 4 shows response differences across listener groups for
sentence no. 3.
First, it is noteworthy that four members of the Paragraph group, three of the Word List
group, and two of the Control group transcribed this sentence without error. Also, four
additional members of the Paragraph group correctly transcribed the words 'turn off the',
and five additional members correctly transcribed 'before bed'. By comparison, only one
additional member from each of the Word List and Control groups correctly transcribed
'before bed', and the rest of the sentence transcription for each of these subjects (e.g. 'Put
away potato chips' for the Word List member and 'Take the temperature' for the Control
member) differed more from the target than did the Paragraph group members'
transcriptions (e.g. 'Turn off the light', 'Turn off the light', 'Turn off the tape', 'Turn off
the temperature', and 'The day begins'). This suggests that the type of familiarization
procedure administered has an impact on a listener's ability to identify word boundaries
in the acoustic stream.

Table 4. Responses of listeners to sentence no. 3, 'Turn off the television before bed'. Words scored as correctly transcribed are typeset in bold. Numbers in parentheses indicate the total number of words correctly transcribed by each group.

Paragraph (46 correct):
Turn off the television before bed
Turn off the television before bed
Turn off the television before bed
Turn off the television before bed
Turn off the light before bed
Turn off the light before bed
Turn off the tape before bed
Turn off the temperature before bed
The day begins before bed
[No response]

Word List (26 correct):
Turn off the television before bed
Turn off the television before bed
Turn off the television before bed
Talk to all the people there
People people bad
Talk with the teacher people there
Put away potato chips before bed
In the picture people
The picture be perfect
The television

Control (20 correct):
Turn off the television before bed
Turn off the television before bed
I be prepared
Try the taping people
Carry on the baby
Tunnel vision of picture people vary
Trying to hold people back
Take the temperature before bed
Turn the tape recorder people back
Picture before that

Inspection of the listener error patterns also revealed that subjects across groups made
similar types of perceptual errors, including phonemic and semantic errors. Examples of
these types of errors are presented in Table 5. Phonemic errors were assumed to be a direct
byproduct of the distorted acoustic signal. The transcriptions 'grumpy' and 'dumpy' for the
target word 'jumpy' are plausible given the nature of the distortion produced by this speaker,
in which the /dʒ/ was realized as a distorted stop rather than an affricate. In contrast,
'exhausting' is not a plausible substitution for 'jumpy'; rather, it appears to be related to
the earlier percept of 'company'. Similarly, 'oven' and 'chimney' do not resemble this
speaker's production of the target, 'cabinet', but they are semantically related to 'pie' and
'fire'.

Table 5. Examples of errors that were judged to be primarily phonetic (P), or closely related to the acoustic structure of the distorted production; and those that were judged to be semantically (S) related to other percepts in the utterance rather than to the acoustic structure of the production.

Target:    Too much coffee makes me jumpy.
Responses: Too much coffee makes me grumpy (P)
           Too much coffee makes me dumpy (P)
           Too much company (P) can be exhausting (S)

Target:    Put the files in the cabinet.
Responses: Put the pie (P) in the oven (S)
           Put the fire (P) in the chimney (S)
Although no striking differences between groups were noted relative to phonemic and
semantic error patterns, a third error category suggests a particular benefit of the
familiarization procedure. This included misperceptions that occurred relative to the
speaker’s consistent (ESL) consonant substitution patterns. Recall that the speaker in this
investigation produced both variable and consistent segmental errors. Variable errors were
predominantly inappropriate voicing and devoicing, while consistent errors includcd the
word-initial substitution of [l] for /r/. In cases of this consistent speaker error, subjects in
the Control group were more likely than members of the familiarization groups to rely on
the surface phonetic form to interpret these consonant substitutions (see Duffy and Pisoni,
1992), while members of the Word List and Paragraph groups responded more variably to
these consistent speaker errors. That is, they did not always rely on the acoustic signal, but
presumably called upon their knowledge of the speaker’s articulatory patterns to sometimes
override the acoustic evidence.
Table 6 illustrates this differential group performance in response to an ESL consonant
error. Here the target word 'write' was produced by the speaker as 'light' according to the
broad transcription and acoustic evidence. Six of the nine Control group listeners who
attempted a transcription of this word interpreted the initial consonant as /l/. In contrast,
14 of the 17 listeners from the two familiarization groups who attempted a transcription of this
word perceived the initial consonant as /r/.
Table 6. Listener responses to an English-as-a-second-language consonant error produced by the speaker (Target: 'write', produced /lait/)

Control (C1-C10): lie, light, live, let, let's, let's, write, write, write, 'no response'
Word List (W1-W10) and Paragraph (P1-P10): write (14 responses), what's, light, learning
Discussion
The results of the present investigation indicate that listeners familiarized with a single
dysarthric person’s speech pattern (Word List and Paragraph groups) performed
significantly better on a sentence transcription task than listeners who were not familiarized
with her speech (Control group). Results of the statistical analysis did not support the
prediction that the form of the familiarization procedure provides differential perceptual
benefits to the listeners. That is, the Paragraph group did not significantly outperform the
Word List group.
Despite the absence of a statistically significant difference in performance between the
Paragraph and Word List groups, the Paragraph group consistently received higher scores
than the Word List group on the measures made in this investigation. This included the mean
number of words accurately transcribed, the total number of correct and incorrect words
transcribed by each group, and the total number of sentences transcribed without error.
Assessment of error patterns, such as the responses in Table 4, suggests that differential
benefits of the Paragraph procedure may need to be captured by a systematic qualitative
analysis.
Although the trend in which the Paragraph group outperformed the Word List
group was pervasive, there is some question whether there was an additional perceptual
benefit for perceiving segmental acoustic information. Recall the lack of quantifiable
benefit (Table 6, ‘write’) for the Paragraph group on the lexical items targeted herein.
There was also no notable difference in the types of perceptual errors that were committed
by the three groups. All groups made errors that were related to the surface characteristics
of the acoustic signal (phonemic), and errors that were related to other percepts in the
utterance but did not resemble the features of the acoustic signal (semantic). It was not
possible in the present investigation to draw definitive conclusions about the nature of the
relationships between signal integrity, error type, and perceptual strategies. However, it is
expected that utterance characteristics such as word frequency, semantic predictability, and
syntactic complexity may also play a role. These types of variables should be controlled
in future investigations.
Because so little is known about the variables that affect the processes underlying the
perception of dysarthric speech, the results of the present investigation must be considered
only relative to the conditions specific to this study. A rather limited range of semantic and
syntactic sentence structures were produced by the speaker, and it is likely that
manipulations of these two factors would influence transcription results. Specifically, it is
probable that less complex semantic and syntactic structures, as used in the present
investigation, would place the least amount of burden on the speaker’s productive ability
and the listener’s perceptual abilities. It is likely that the combination of speaker, listener,
and procedural characteristics permitted the effects of familiarization to be measurable and
significant.
This investigation used a single speaker with severe intelligibility deficits whose speech
contained both consistent and inconsistent error patterns. Although it is possible that the
relative consistency of some of her articulatory errors enhanced the effects of
familiarization, the severity of her dysarthria renders it highly unlikely that the
familiarization effects demonstrated here were attributable solely to these consistent
substitutions. A more plausible interpretation is that the phenomenon of familiarization, as
it was defined here, provided perceptual benefits to listeners. These perceptual benefits
derived from exposure to substantially unintelligible dysarthric speech containing both
consistent and variable error patterns. It would be expected that familiarization would not
have similar effects for all speech patterns, all dysarthria types, and all levels of speech
impairment (see also Hunter et al., 1991). This is supported by the fact that the item analysis
revealed some sentences that were 'highly intelligible' and others that were 'unintelligible' for
all groups; thus, there may be certain floor and ceiling areas of intelligibility in which the
effects of familiarization are less robust.
The characteristics of the listeners used in this investigation should also be considered.
The listeners were female, educated, and had little or no experience listening to dysarthric
speech. The gains in performance afforded by the familiarization procedure might not be
significant for a group of experienced listeners. Experienced listeners also may possess
fine-tuned perceptual strategies that tend to be less speaker-specific.
Finally, the familiarization procedures used here were specific in nature: a single speaker
was used, and feedback about the speaker’s word targets was provided to the listeners in
the form of a script. No feedback about their performance was provided to the listeners
during the actual transcription task. This final point may account for the absence of an
apparent learning effect across the task; that is, this speaker’s low intelligibility did not
permit listeners to capitalize on their exposure to her speech during the task.
Subsequent research on the phenomenon of familiarization should attempt to control
and manipulate the variables identified in this investigation. The results of such studies
could have a profound impact on clinical practice, particularly in the areas of intelligibility
testing, the determination of progress during treatment, and the development of treatment
strategies that target the listener (such as the spouse or caregiver) in addition to the dysarthric
patient.
Acknowledgements
Portions of this paper were presented at the Conference on Motor Speech Disorders and
Speech Motor Control, Boulder, CO, 1992. The authors thank Gary Weismer and Keith
Kluender for their comments on an earlier version of this manuscript. This work was
supported by the Bryng Bryngelson Fund of the University of Minnesota Department of
Communication Disorders. Requests for reprints should be directed to Kris Tjaden, 437
Waisman Center, 1500 Highland Ave, Madison, WI 53705.
Notes
1. It was expected that the transcription task used here to document the impact of the familiarization
procedure might not be sufficiently sensitive to capture the potentially smaller effects for speakers
with very high or very low intelligibility.
2. The Transcription tape actually contained a total of 50 sentences. However, after the speech
sample had been obtained, it was realized that two sentences contained five rather than six words.
Because subjects were told they would be transcribing six-word sentences, responses from these
sentences have not been included in the analysis. Thus, all analyses are based on 288 target words
rather than the intended 300.
3. Female subjects were chosen because of ease of recruitment from classes in the College of
Liberal Arts. It is an empirical question whether men and women perform differentially on the type
of listening task employed in this investigation.
References

Duffy, J. R. and Giolas, T. G. (1974) Sentence intelligibility as a function of key word selection. Journal of Speech and Hearing Research, 17, 631-637.
Duffy, S. A. and Pisoni, D. B. (1992) Comprehension of synthetic speech produced by rule: a review and theoretical interpretation. Language and Speech, 35, 351-389.
Engstrand, O. (1992) Systematicity of phonetic variation in natural discourse. Speech Communication, 11, 337-346.
Giolas, T. G., Cooker, H. and Duffy, J. R. (1970) The predictability of words in sentences. Journal of Auditory Research, 10, 328-334.
Greene, B. G. and Pisoni, D. B. (1988) Perception of synthetic speech by adults and children: research on processing voice output from text-to-speech systems. In L. E. Bernstein (Ed.), The Vocally Impaired: Clinical Practice and Research (Philadelphia: Grune & Stratton).
Greenspan, S. L., Nusbaum, H. C. and Pisoni, D. B. (1988) Perceptual learning of synthetic speech produced by rule. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14, 421-433.
Hammen, V. L., Yorkston, K. M. and Dowden, P. (1991) Index of contextual intelligibility: impact of semantic context in dysarthria. In C. Moore, K. Yorkston, and D. Beukelman (Eds), Dysarthria and Apraxia of Speech: Perspectives on Management (Baltimore, MD: Paul H. Brookes).
Hunter, L., Pring, T. and Martin, S. (1991) The use of strategies to increase speech intelligibility in cerebral palsy: an experimental evaluation. British Journal of Disorders of Communication, 26, 163-174.
Kent, R., Weismer, G., Kent, J. and Rosenbek, J. (1989) Toward phonetic intelligibility testing in dysarthria. Journal of Speech and Hearing Disorders, 54, 482-499.
Kim, N. K. (1990) Korean. In B. Comrie (Ed.), The World's Major Languages (New York: Oxford University Press).
Kreidek, M. A. (1988) The effects of context and rate on the intelligibility of the conversational speech of moderately disordered speakers. Unpublished MS thesis, University of Wisconsin-Madison.
Maassen, B. and Povel, D. J. (1985) The effect of segmental and suprasegmental corrections on the intelligibility of deaf speech. Journal of the Acoustical Society of America, 78, 877-886.
Major, R. (1994) Current trends in interlanguage phonology. In M. Yavas (Ed.), First and Second Language Phonology (San Diego, CA: Singular Publishing Group).
Metz, D. E., Samar, V. J., Schiavetti, N., Sitler, R. W. and Whitehead, R. L. (1985) Acoustic dimensions of hearing-impaired speakers' intelligibility. Journal of Speech and Hearing Research, 28, 345-355.
Monsen, R. (1978) Toward measuring how well hearing-impaired children speak. Journal of Speech and Hearing Research, 21, 197-219.
Monsen, R. (1983) The oral speech intelligibility of hearing-impaired talkers. Journal of Speech and Hearing Disorders, 48, 286-296.
Nye, P. W. and Gaitenby, J. H. (1973) Consonant intelligibility in synthetic speech and in a natural speech control (modified rhyme test results). Haskins Laboratories Status Report on Speech Research, SR-33, 77-91.
Pisoni, D. B. and Luce, P. A. (1986) Speech perception: research, theory, and the principal issues. In E. C. Schwab and H. C. Nusbaum (Eds), Pattern Recognition by Humans and Machines: Speech Perception, I (Orlando, FL: Academic Press).
Platt, L. J., Andrews, G., Young, M. and Quinn, P. T. (1980) Dysarthria of adult cerebral palsy: intelligibility and articulatory impairment. Journal of Speech and Hearing Research, 23, 28-40.
Rosenzweig, M. R. and Postman, L. (1957) Intelligibility as a function of frequency of usage. Journal of Experimental Psychology, 54, 412-421.
Tikofsky, R. S., Glattke, T. J. and Tikofsky, R. (1966) Listener confusions in response to dysarthric speech. Folia Phoniatrica, 18, 280-292.
Verbrugge, R. R., Strange, W., Shankweiler, D. P. and Edman, T. R. (1976) What information enables a listener to map a talker's vowel space? Journal of the Acoustical Society of America, 60, 198-212.
Weismer, G. and Martin, R. (1992) Acoustic and perceptual approaches to the study of intelligibility. In R. D. Kent (Ed.), Intelligibility in Speech Disorders (Amsterdam: John Benjamins).
Yorkston, K. and Beukelman, D. (1980) A clinician-judged technique for quantifying dysarthric speech based on single-word intelligibility. Journal of Communication Disorders, 13, 15-31.
Yorkston, K. and Beukelman, D. (1981) Assessment of the Intelligibility of Dysarthric Speech (Tigard, OR: CC Publications).
Yorkston, K. and Beukelman, D. (1983) The influence of judge familiarization with the speaker on dysarthric speech intelligibility. In W. Berry (Ed.), Clinical Dysarthria (San Diego, CA: College-Hill Press).
Appendix
College is different than high school. High expectations push everyone to excel. The
competition makes studies even harder. Students can find ways to relax. Many go biking,
walking, or read. Others will treat themselves to naps. When tests begin, nights get shorter.
Burning the midnight oil is common. When exams get returned most rejoice! Hard work
and discipline pay off! It is time to reward yourself. That is what school is about.