The Journal of Deaf Studies and Deaf Education 6:2, Spring 2001

Cognitive Correlates of Visual Speech Understanding in
Hearing-Impaired Individuals
Ulf Andersson
Björn Lyxell
Jerker Rönnberg
Linköping University
Örebro University
Karl-Erik Spens
Royal Institute of Technology
This study examined the extent to which different measures
of speechreading performance correlated with particular cognitive abilities in a population of hearing-impaired people.
Although the three speechreading tasks (isolated word identification, sentence comprehension, and text tracking) were
highly intercorrelated, they tapped different cognitive skills.
In this population, younger participants were better speechreaders, and, when age was taken into account, speech
tracking correlated primarily with (written) lexical decision
speed. In contrast, speechreading for sentence comprehension correlated most strongly with performance on a phonological processing task (written pseudohomophone detection)
but also on a span measure that may have utilized visual, nonverbal memory for letters. We discuss the implications of
this pattern.
Individuals who rely on speechreading because of
hearing impairment communicate to different degrees
of success by means of speechreading and speech
tracking (Arnold, 1997; Demorest & Bernstein, 1992;
Dodd & Campbell, 1987). Much of this variation can
be accounted for by individual differences in specific
cognitive skills (Arnold & Köpsel, 1996; Conrad, 1979;
Gailey, 1987; see Rönnberg, 1995, for a review). However, many of these findings might be valid only for
sentence-based speechreading, as this task has usually
been employed as a measure of visual speech understanding (e.g., Arnold & Köpsel, 1996; see Rönnberg, 1995). Because the tasks differ in contextual constraints (e.g., speech tracking vs. sentence-based speechreading; see De Filippo & Scott, 1978, and Demorest, Bernstein, & DeHaven, 1996) and linguistic complexity (e.g., visual word-decoding vs. sentence-based speechreading; Lyxell & Rönnberg, 1991a), it seems plausible that different measures of speechreading engage different sets of cognitive skills.

This work is supported by grants from the Swedish Council for Social Research awarded to Björn Lyxell (97-0319) and grants awarded to Jerker Rönnberg (30305108). We thank Ulla-Britt Persson for checking the language. Correspondence should be sent to Ulf Andersson, Department of Behavioural Sciences, Linköping University, S-581 83 Linköping, Sweden (e-mail: [email protected]).

© 2001 Oxford University Press
In this study we will re-analyze data from the
Rönnberg, Andersson, Lyxell, and Spens (1998)
speech tracking training study. Speech tracking refers
to a procedure in which a talker reads a text, sentence
by sentence, and the speechreader’s task is to repeat
each sentence verbatim (De Filippo & Scott, 1978).
The Rönnberg et al. study examined the relationship
between cognitive skills and visual and visual-tactile
speech tracking. It found that tasks of visual word-decoding, lexical decision speed, and phonological processing speed accounted for a substantial portion of
the individual differences in visual and visual-tactile
speech tracking performance. That study focused on
speech tracking only, whereas our study used several
tasks of visual speech understanding and included
some additional cognitive measures.
Previous research has identified four cognitive correlates of context-bound sentence-based speechreading
(see Rönnberg, 1995, for review). In addition to the
three skills of visual word-decoding, lexical identification speed, and phonological processing that also correlated with speech tracking, verbal inference-making
ability correlated strongly with the sentence comprehension task (De Filippo, 1982; Lyxell & Rönnberg,
1987; 1989; 1991a; 1992; Lyxell, Rönnberg, & Samuelsson, 1994; Öhngren, 1992). Why should these particular aspects of cognition be involved in these speechreading tasks? As lip movements and facial gestures
create a speech signal that is incomplete and poorly
specified, as well as transient (Berger, 1972; Dodd,
1977; Rönnberg, 1990), rapid access to a lexical address
for identification (i.e., lexical identification speed) is an
important cognitive operation (Lyxell, 1989; Lyxell &
Rönnberg, 1992; Rönnberg, 1990). Slowed lexical identification would impede processing of new incoming
visual information, as well as divert resources from
other important processes (e.g., inference making),
with negative consequences for speech understanding
(Lyxell, 1989). Similarly, especially when “watching
for meaning,” as in the context-bound sentence-based speechreading task, the speechreader must try to infer
missing pieces of verbal information, using stored
knowledge (i.e., verbal inference-making ability; Lyxell & Rönnberg, 1987, 1989). Phonological processing
includes a number of cognitive operations in which
phonological aspects of language are processed, represented, and used during spoken or seen language (i.e.,
speechreading and reading; Cutler, 1989; Marslen-Wilson, 1987; Rönnberg, 1995; Share & Stanovich,
1995).
Does the strength of the correlations of these skills vary with the requirements of the different speechreading tasks? Rönnberg (1995)
found that lexical identification speed and phonological processing speed were relatively more important in
relation to visual sentence-based speechreading and
visual word-decoding than to visual speech tracking.
Phonological representations were, on the other hand,
relatively more important in relation to visual speech
tracking. A recent study by Lidestam, Lyxell, and Andersson (1999) has also shown that the ability to speechread long sentences, compared to short sentences, is more strongly associated with individuals’ working memory capacity. Thus, previous studies indicate
that different tasks of visual speechreading relate in
different degrees to particular cognitive abilities.
To examine this further, our study used three
measures of speech tracking and two speechreading
tasks: sentence-based speechreading and visual word-decoding (Rönnberg et al., 1999). Concerning speech
tracking, some of these tasks have been used frequently
in previous studies (cf. Lyxell, 1989), whereas some are
relatively new (cf. Spens, 1995). The optimum wpm
rate and the blockage index constitute the two new
measures of speech tracking, in addition to the conventional wpm rate measure. The rationale for inclusion
of these newer measures is that the optimum wpm rate
measure and the blockage index assess contrasting aspects of speech tracking. The optimum wpm rate measure will provide an estimate of a speech process that
runs efficiently and smoothly (i.e., no blockages),
whereas the blockage index will provide an estimate of
a speech process that breaks down.
Speech tracking and sentence-based speechreading
differ in the degree and nature of contextual constraint.
In speech tracking (De Filippo & Scott, 1978) a coherent story is presented, sentence by sentence, to the
individual. Thus, context works cumulatively during
the tracking task. In contrast, during sentence-based
speechreading, as used here, context is provided by a
title describing the scenario to which the sentence belongs. Finally, a relatively context-free and less linguistically complex speechreading task is speechreading of
single words (a visual word-decoding test). All three
are used in this study.
The cognitive tests in this study are the same
written-material tasks as in Rönnberg, Andersson,
Lyxell, et al. (1998). The rationale for presenting the
test material visually is, besides the obvious reason of
not testing hearing-impaired individuals auditorily, that
these tasks all tap central, amodal cognitive functions
common to the processing of both spoken and seen
language/speech (Rönnberg, Andersson, Andersson,
et al., 1998). Lexical identification speed was examined
by means of a lexical decision test (cf. Lyxell & Rönnberg, 1992). Three rhyme-judgment tasks were employed to assess phonological processing ability
(Campbell, 1992; Hanson & MacGarr, 1989; Leybaert & Charlier, 1996; Lyxell et al., 1996). To further
examine the nature of the relationship between phonological processing, speech tracking, and speechreading,
we also included a phonological lexical decision test
and a letter span test (cf. Baddeley & Wilson, 1985).
The letter span test was included to specifically examine the phonological loop component (i.e., phonological coding and articulatory-phonological rehearsal; see
Baddeley, 1997, for a review) of working memory. We
accomplished this by manipulating the phonological
similarity of the stimulus material used in a short-term
memory task (cf. Baddeley, 1966; Conrad & Hull,
1964). We administered a sentence completion task to
obtain a measure of verbal inference making (i.e., the
ability to fill in missing pieces of information; Lyxell,
1989). General working memory capacity (i.e., the ability to simultaneously process and store information)
was tested by means of a reading span task (Daneman &
Carpenter, 1980; Towse, Hitch, & Hutton, 1998). We
used an analogy test and an antonym test as two indices
of verbal ability (Lyxell et al., 1994).
In sum, in our study we will examine whether performance in different visual speech understanding
tasks, varying in linguistic complexity and contextual
constraint, is related to a single set of cognitive abilities
or whether some tasks are more strongly associated
with a specific cognitive skill or skills. Based on previous research, we think it is reasonable to assume that
lexical identification speed and phonological processing speed will be related to performance in almost
all visual speech understanding tasks (Lyxell, 1989;
Lyxell & Rönnberg, 1992; Lyxell et al., 1994; Rönnberg, 1990; Öhngren, 1992). Linguistically complex
speechreading tasks (i.e., sentence-based speechreading and speech tracking) rather than speechreading of
single words should show this relationship especially
strongly. We make this assumption because linguistically complex tasks require faster throughput of larger
amounts of speech signals than does discrete word
identification.
As contextual support has a positive effect on
speechreading performance (Samuelsson & Rönnberg,
1991, 1993), it could also reduce some of the cognitive
demands that the individual must manage during visual speech understanding. Thus, speech tracking performance may be more weakly associated with lexical
identification speed and phonological processing speed
than sentence-based speechreading (Rönnberg, 1995).
Method
Participants
Eighteen severely hearing-impaired individuals (one male), ages 21–76 years (M = 53 years, SD = 16), participated in this study. Of these, 14 had participated in
the Rönnberg, Andersson, Lyxell, et al. training study
(1998). Characteristics for each participant are displayed in Table 1.
The mean duration of the participants’ hearing impairment was 31 years (SD = 14), and the pure-tone average hearing loss was 75 dB (SD = 16) for the better
ear according to the most recent available medical records. All participants were native speakers of Swedish
and preferred an oral communication mode.
General Procedure and Tests
All participants were tested individually during two
separate test sessions, one week apart. The cognitive
tests and the speechreading tests were administered in
the first session and the speech tracking test in the second session. All speechreading tests were presented on
a Finlux 26″ color TV and a video cassette recorder
(JVC HR-7700EG). The cognitive tests were run on a
Macintosh SE/30 computer with computer-controlled display and collection of results (TIPS [Text-Information-Processing-System]; Ausmeel, 1988). All
test instructions were given in writing and complemented with further oral instructions.
Lexical Identification Test
Lexical decision speed. The task was to decide whether
a string of three letters constituted a real word. One
hundred items were used as test material, 50 of which
were monosyllabic real words (e.g., “snö,” SNOW) and
50 were not. Out of the 50 lures, 25 were pronounceable (e.g., “GAR”) and 25 unpronounceable (e.g.,
“NCI”). The letter string was displayed in the center
of the screen for a maximum of 2 sec and the intertrial
interval was 2 sec. Latency time was measured from
onset of the stimulus display to the button-press response, or until the maximum response time had expired. Accuracy and speed of performance were measured.
Phonological Processing
The rhyme judgment tests. Three rhyme judgment tests
were administered. The task was to decide whether two
simultaneously presented words or nonwords rhymed.
The procedure for presentation of the items and response collection was as for the lexical decision speed
Table 1 Participant characteristics

Participant   Age   Age of   Years with    Years with     Three-frequency average
                    onset    hearing aid   hearing loss   (500, 1000, 2000 Hz)
1             41    31       10            10              83
2             52    36        8            16              70
3             61     1       19            61              57
4             61    34       18            27              88
5             76    61        9            15              93
6             40     7       12            33              93
7             72    32       25            40              85
8             39     3       36            36              60
9             21     3       17            18              57
10            57     7       34            50              75
11            64    39       22            25              77
12            68    33       32            35              73
13            45     1       40            44              87
14            37     1       15            37              70
15            51    36       14            15             113
16            45    31       14            14              65
17            60    15       37            46              53
18            60    29       11            31              58
Mean          53    22       21            31              75
SD            14    18       11            14              16
task, except that the maximum response time was set
at 5 sec. The first test list was composed of 50 pairs of
monosyllabic and bisyllabic Swedish words. The word
pairs were of four different types: 12 orthographically
similar rhyming word pairs (e.g., “meter-eter,”
METRE-ETHER), 12 orthographically dissimilar
rhyming word pairs (e.g., “kurs-dusch,” COURSE-SHOWER), 13 orthographically similar nonrhyming
word pairs (e.g., “klubba-stubbe,” STICK-STUMP),
and 13 orthographically dissimilar nonrhyming word
pairs (e.g., “cykel-päron,” BICYCLE-PEAR). In the
second test, the material consisted of 50 pairs of bisyllabic words, and in each pair there was one “real” word
and one nonword (e.g., “citron-mirol,” LEMON-MIROL). The third test contained 30 monosyllabic
pairs of nonwords (e.g., PRET-BLET).
Phonological lexical decision test. The participants were
instructed to decide, as quickly as possible, whether a
string of letters, a nonword, sounded like a real word
or not when it was read silently. Fifty pseudohomophones and 50 nonwords (all pronounceable) were employed as test material. One item at a time was dis-
played on the computer screen, and the subjects
responded “yes” or “no” by means of pressing predefined buttons. The response time was set at 5 sec, and
the string disappeared when participants pressed the
button. Accuracy and speed of performance were measured.
Working Memory Capacity
Reading span test. The participant’s task was to read sequences (3, 4, 5, or 6 sentences) of three-word sentences, judge whether each sentence was semantically
sensible or absurd, and respond by saying “yes” or
“no”. At the end of each sequence of sentences, the
participants were instructed to recall orally, in correct
serial order, the first or the final word in each sentence.
The words in each sentence were presented in sequential order at a rate of one word per 0.8 sec and with
an interword interval of 0.075 sec. The intersentence
interval in each sequence of sentences was 1.75 sec during which time the participants had to respond “yes”
or “no.” Half of the sentences were absurd (e.g.,
“Fisken körde bilen,” THE FISH DROVE THE
CAR), and half were normal sentences (e.g., “Kaninen
var snabb,” THE RABBIT WAS FAST). Three
different sequences were presented for each span size.
The response interval was set at 80 sec. However, no
participant needed more than 30 sec to respond. The
experimenter started the next sequence of sentences by
pushing a button. The participants’ responses were
scored by the experimenter in terms of total number of
recalled words.
Letter span. The test material consisted of two different
lists of letters, one list of phonologically similar letters
(B, C, D, E, G, P, T, V) and one of phonologically dissimilar letters (F, H, J, K, M, N, R, S). The presentation order of the two lists was counterbalanced across
participants. The letters were presented at a rate of one
letter per 0.8 sec with an interitem interval of 0.075 sec.
Each list contained 12 sequences of letters (three sequences for each span size). The span size ranged from
four to seven letters. The participants were asked to recall orally the letters in correct serial order. As no participant needed more than 30 sec to respond (maximized to 120 sec), presentation of the next series of
letters started after the experimenter pushed a button.
The participants’ responses were scored in terms of total number of letters recalled in correct serial order.
Verbal Inference Making
Sentence completion test. Twenty-eight incomplete sentences constituted the test material (e.g., “Kan jag . . .
ett par byxor?,” MAY I . . . PAIR OF TROUSERS?).
From each sentence two to four words were omitted.
The sentences were 4 to 13 words long. All words were
common Swedish words (Allén, 1970). Half the sentences were related to a restaurant scenario and the
other half to a shop scenario. The task was to complete
the sentence by filling in the missing words. The incomplete sentence was displayed on the computer
screen for 7 sec. Immediately after the presentation of
the sentence, the response interval started, which was
set at 30 sec. The participants completed the sentence
orally, and the experimenter transcribed the responses.
These were scored correct if they were semantically
and syntactically appropriate. The number of correct
words was divided by the maximum number of deleted
words in each sentence to generate a sentence-based
score, and the mean over the 28 sentences was the dependent variable.
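This scoring rule amounts to a mean of per-sentence proportions. A minimal sketch in Python (the function name and input format are illustrative, not part of the original test software):

```python
def sentence_completion_score(responses):
    """responses: one (n_correct, n_deleted) pair per sentence, where
    n_correct is the number of semantically and syntactically
    appropriate words supplied and n_deleted the number of words
    omitted from that sentence.

    Each sentence contributes the proportion of its deleted words
    that were restored; the dependent variable is the mean of these
    proportions over all sentences."""
    return sum(c / d for c, d in responses) / len(responses)

# e.g., restoring 2 of 2 words in one sentence and 1 of 4 in another:
# sentence_completion_score([(2, 2), (1, 4)]) -> 0.625
```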
Verbal Ability
Analogy test. This was a paper-and-pencil test in which
the participants were required to judge which two
words out of five were related to each other, in a similar
(analogous) way as two target words (e.g., “SUGAR-SWEET: SUN, DAY, WHITE, NIGHT, DARK”).
The test included 27 five-word strings and was performed under time pressure (maximum 5 minutes and
30 sec). The proportion of correct responses was recorded.
Antonym test. This consisted of 29 five-word strings,
and the participants’ task was to decide which two
words out of five were antonyms (e.g., “BEAUTIFUL,
OLD, SAD, FAST, YOUNG”). The participants had
a maximum of 5 minutes to perform the test, and the
scoring procedure was the same as for the analogy test.
This was also a paper-and-pencil test.
Tactile Aids
One single-channel aid and one multichannel aid were
used in this study.
MiniVib 3 (AB Specialinstrument, Stockholm). The single-channel aid was the MiniVib 3 device. This aid extracts the acoustic signal between 500 and 1800 Hz, but time/intensity variations are presented to the user at a fixed frequency of 230 Hz. The tactile signal is transmitted to the user via a bone conductor attached to the user’s wrist by a wristband.
Tact aid 7. The Tact aid 7 device was the multichannel
aid used in this study. This aid was equipped with
seven vibrators. Out of these seven channels four covered the first formant frequencies, and four covered the
second formant frequencies. Thus, one channel was
shared by the first and second formant analyzers. The
seven vibrators were attached to a specially designed
hand prototype, and the participant held the hand prototype in his or her right hand. Five of the vibrators
stimulated the fingertips and the two remaining vibrators stimulated the palm. (See Tact aid 7 user’s manual
[1991] for technical details.)
Visual Speech Understanding
Sentence-based speechreading test. The participants’ task
was to speechread sentences with and without tactile
support. Eighteen sentences subdivided into three separate scenarios, each containing six sentences, were
used as test material. The scenarios were a “train scenario,” a “shop scenario,” and a “restaurant scenario.”
Within each scenario, the participants speechread two
sentences in each one of the three speechreading conditions. The sentences were presented by a female native speaker of Swedish. The participants wrote their
responses on an answer sheet. The order of the aids
employed and the presentation order of the sentences
were counterbalanced. Performance was measured as
the proportion of words correctly perceived.
Visual word-decoding test. In this test the participants’
task was to speechread single words presented by a female actor. The test material included 30 common
Swedish bisyllabic nouns. Otherwise, the same test
routine was used as in the sentence-based speechreading test. The proportion of correct responses was measured.
Speech tracking test. A computerized speech tracking
procedure (De Filippo & Scott, 1978; Gnosspelius &
Spens, 1992) was employed in this test. The test material was a simplified and easy-to-read book by PerAnders Fogelström, Mina drömmars stad (“The town of
my dreams”). The following test routine was used: the
experimenter read a text from a computer screen, sentence by sentence, and the participants were instructed
to speechread and repeat as many words as possible following each sentence presentation. Words the participant was unable to perceive were repeated orally twice,
and, if necessary, after approximately 4–5 sec, finally
presented orthographically on an electronic display. All
participants performed the task with their hearing aids
turned off. The participants speechread for 10 min (2
× 5 min) in each of the three test conditions, and the
test order was counterbalanced across participants.
Three measures of speech tracking performance were
employed in this study: (1) conventional wpm rate (the
total number of words speechread during the test session was divided by the time elapsed to give a wpm rate
automatically calculated by the computer); (2) optimum wpm rate (this measure was based on error-free
sentences only, that is, sentences in which all words
were correctly perceived after the first presentation,
and also automatically calculated by the computer program); (3) blockages (words that the participants could
not correctly perceive after the first presentation constituted a blockage and were automatically registered
by the computer program; the total number of
blockages encountered during tracking constituted an
additional measure of speech tracking performance).
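The three tracking measures can be sketched as follows, assuming a per-sentence session log (the data format and function name are hypothetical; in the study itself these values were computed automatically by the tracking program):

```python
def tracking_measures(sentences):
    """sentences: list of (n_words, n_blocked, minutes) tuples, where
    n_blocked counts the words in that sentence not correctly
    perceived after the first presentation.

    Returns (conventional wpm rate, optimum wpm rate, blockages)."""
    total_words = sum(n for n, _, _ in sentences)
    total_time = sum(t for _, _, t in sentences)
    conventional_wpm = total_words / total_time

    # Optimum rate: based on error-free sentences only, i.e.,
    # sentences in which every word was perceived at first pass.
    free = [(n, t) for n, b, t in sentences if b == 0]
    optimum_wpm = sum(n for n, _ in free) / sum(t for _, t in free)

    # Blockage index: total number of blocked words in the session.
    blockages = sum(b for _, b, _ in sentences)
    return conventional_wpm, optimum_wpm, blockages

# e.g., a 10-word error-free sentence and an 8-word sentence with two
# blocked words, each taking half a minute:
# tracking_measures([(10, 0, 0.5), (8, 2, 0.5)]) -> (18.0, 20.0, 2)
```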
Results
The results are presented in two sections: the first section examines the intercorrelations among the five
tasks of visual speech understanding, and the second
the relationship between the cognitive tests and speech
understanding tasks.
Different Measures of Visual Speech Understanding
Descriptive statistics for and intercorrelations among
the five tasks of visual speech understanding, pooled
over the three conditions to reduce the number of variables (i.e., MiniVib 3, Tact aid 7, and visual), are shown
in Table 2. The conventional speech tracking rate and
the sentence-based speechreading test correlated with
almost all the other measures, indicating that they constitute composite indexes of visual speech understanding.
Our sample is relatively heterogeneous with respect to background variables such as chronological
age, age of onset of hearing loss, years with hearing
loss, number of years with hearing aid, and level of dB
loss (see Table 1), and it is, thus, possible that these
variables can affect the correlational patterns (cf.
Dancer, Krain, Thompson, Davis, et al., 1994; Lyxell & Rönnberg, 1991b; Rönnberg, 1990; Shoop & Binnie, 1979). Correlations were for this reason calculated
between these variables and visual speech understanding. Although previous studies have failed to obtain any
Table 2 Descriptive statistics for and correlations among different measures of visual speech understanding pooled over the three conditions and chronological age

                    X (SD)          Wpm     Wpm     Blockages   Sentence-   Word
                                    conv.   opt.                based       decoding
Chron. age (yrs)    53 (14)         −.50*   −.51*    .31        −.51*       −.57*
Speech tracking
  Wpm conv.         34.92 (17.72)           .94**   −.79**       .72**       .76**
  Wpm opt.          45.11 (17.97)                   −.62**       .63**       .67**
  Blockages         41.76 (16.54)                               −.81**      −.79**
Speechreading
  Sent.-based         .34 (.29)                                              .87**
  Word decod.         .35 (.23)

*p < .05.
**p < .01.
effect of some of these variables (number of years with
hearing loss and dB loss, Rönnberg, 1990), they were
included to validate previous results. The outcome of
these analyses showed that chronological age was the
only variable significantly related to visual speech understanding (see Table 2; Lyxell & Rönnberg, 1991b;
Rönnberg, 1990). To control for the effect of age, we
then computed partial correlations among the five tasks
of visual speech understanding.
As can be seen in Table 3, controlling for age reduced the magnitude of the correlation coefficients in
general, but all significant correlation coefficients remained significant.
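The age-partialled coefficients can be reproduced from the zero-order correlations with the standard first-order partial correlation formula. A sketch, using as illustration the zero-order values for conventional tracking rate, sentence-based speechreading, and age reported in Table 2 (the function name is ours, not the original analysis code):

```python
from math import sqrt

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z (here, chronological age)
    held constant, computed from the three zero-order correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Conventional wpm rate x sentence-based speechreading (r = .72),
# each correlating about -.5 with age, yields a partial of about .62:
print(round(partial_r(0.72, -0.50, -0.51), 2))  # -> 0.62
```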
Cognitive Correlates of the Visual Speech
Understanding Tasks
The purpose of this study was to identify possible cognitive correlates of visual speech understanding. To accomplish this aim, we computed correlation analyses
between the cognitive tests and the visual speech understanding tests. As the correlational patterns for the
three different speechreading conditions (i.e., MiniVib
3, Tact aid 7, and visual) did not deviate from each
other in any systematic way, the three conditions were
pooled to obtain one composite measure for each one
of the five speechreading tasks. Descriptive statistics
for the cognitive tests and the results of the correlation
analyses are displayed in Table 4.
Lexical identification speed, phonological lexical
decision speed, and rhyme judgment speed all showed
Table 3 Partial correlations among different measures of visual speech understanding pooled over the three conditions, holding chronological age constant

                    Wpm     Blockages   Sentence-   Word
                    opt.                based       decoding
Speech tracking
  Wpm conv. rate    .92**   −.77**       .62**       .69**
  Wpm opt. rate             −.57*        .47*        .53*
  Blockages                             −.82**      −.79**
Speechreading
  Sent.-based                                        .81**

*p < .05.
**p < .01.
systematic and significant correlations with the speech
understanding tests, whereas the tests of working
memory, verbal inference making, and verbal ability
did not correlate significantly with the visual speech
tests.
Additional correlations were calculated between the accuracy scores on the reaction-time tests
and the speech tests. However, these analyses did not
reveal any significant correlations.
One interesting feature of the correlational pattern
was that the blockage index did not show any systematic relation with the cognitive tests. Furthermore, the
speed measure of the phonological lexical decision test
was significantly correlated only with the sentence-based speechreading test and optimum speech
tracking rate.
As chronological age affects both cognitive func-
Table 4 Descriptive statistics, expressed as proportion and speed, for the cognitive tests used in this study and correlations among different measures of visual speech understanding, pooled over the three conditions, and cognitive tests

                                               Speech tracking              Speechreading
Cognitive tasks                  X (SD)        Wpm     Wpm     Blockages   Sentence-   Word
                                               conv.   opt.                based       decoding
Working memory
  Reading span                   .41 (.10)     −.10    −.05    −.08         .05         .09
  Letter span
    Phonologically dissimilar    .34 (.20)     −.24     .08     .08         .04        −.04
    Phonologically similar       .21 (.20)      .32     .38    −.20         .11         .29
Verbal ability
  Antonym test                   .46 (.21)      .06     .22    −.19         .24         .10
  Analogy test                   .43 (.27)      .16     .29    −.29         .46         .43
Inference making
  Sentence-completion            .71 (.11)      .06     .16    −.31         .27         .30
Lexical identification
  Lexical decision speed         .77 (.15)     −.58*   −.66**   .38        −.61**      −.54*
Phonological processing speed
  Phon. lexical decision        1.98 (.61)     −.41    −.50*    .43        −.61**      −.34
  Rhyme judgment
    Word-pairs                  1.40 (.41)     −.47*   −.63*    .27        −.51*       −.49*
    Pairs of words/nonwords     1.54 (.48)     −.53*   −.65**   .39        −.62**      −.57*
    Pairs of nonwords           1.59 (.51)     −.52*   −.62**   .43        −.68**      −.55*

*p < .05.
**p < .01.
tions (see Birren & Fisher, 1995; Salthouse, 1985, for
reviews) and visual speech understanding, partial correlations were calculated to examine whether significant correlations would remain when the effect of age
was partialled out (see Table 5).
The rhyme judgment tests (speed) continued to be
significantly associated with optimum speech tracking
rate and sentence-based speechreading but not with
the other measures of visual speech understanding.
The correlations between the speed measures of lexical
decision and phonological lexical decision tests also remained significant for optimum speech tracking rate
and sentence-based speechreading, respectively. The
most important aspect of the partial correlational pattern was that the letter span task (i.e., phonologically
similar letters) was now significantly correlated with almost all visual speech understanding tests. To examine
the contribution of phonological processing in these
correlations, we calculated a phonological similarity
effect index (PSE) by subtracting performance in the
phonologically similar condition from the phonologically dissimilar condition. This new variable (X = .23, SD = .32, min = −.34, max = .78) constitutes a measure of relative phonological processing (i.e., coding) in
verbal working memory (Baddeley, 1966; 1997; Baddeley & Wilson, 1985; Conrad & Hull, 1964). The correlations between the PSE and the speech tasks when the
effect of age was partialled out are presented in Table
5. Significant negative correlations were now evident
with conventional and optimum speech tracking rate.
That is, relative resistance to the PSE (low scores on
this index) was associated with higher speechreading
performance levels and vice versa. The PSE, on the
other hand, was significantly and positively correlated
with performance on the letter span task condition including phonologically dissimilar letters (r = .54, p < .05) and also showed a trend toward a positive correlation with the reading span task (r = .46, p < .06). These correlations are consistent with the assumption that phonological coding is particularly efficient for temporal order recall of verbal information
(Hanson, 1982; 1990; Logan, Maybery, & Fletcher,
1996), and as such the correlations also validate the assumption that the phonological similarity effect vari-
Table 5 Partial correlations among different measures of visual speech understanding, pooled over the three conditions, and cognitive tests, controlling for chronological age

                                    Speech tracking              Speechreading
Cognitive tasks                     Wpm     Wpm     Blockages   Sentence-   Word
                                    conv.   opt.                based       decoding
Reading span                        −.40    −.35    −.20        −.19        −.08
Letter span
  Phonologically dissimilar          .07    −.04     .29        −.15        −.39
  Phonologically similar             .58*    .65**  −.36         .47*        .63**
Phonological similarity effect
  (dissimilar-similar)              −.61**  −.52*    .27         .20         .06
Antonym test                         .03     .21    −.18         .23         .07
Analogy test                        −.05     .11    −.17         .32         .26
Sentence-completion                 −.05     .07    −.26         .21         .24
Lexical decision speed              −.38    −.50*    .24        −.41        −.24
Phonological lexical decision       −.26    −.38     .31        −.52*       −.13
Rhyme-judgment
  Word-pairs                        −.27    −.48*    .13        −.32        −.25
  Pairs of words/nonwords           −.39    −.54*    .30        −.50*       −.43
  Pairs of nonwords                 −.36    −.48*    .33        −.57*       −.37

*p < .05.
**p < .01.
able taps phonological processing in working memory
(i.e., coding).
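The PSE index described above is simply the difference between the two letter span conditions; a minimal sketch (the function name is illustrative, not the original analysis code):

```python
def pse_index(recall_dissimilar, recall_similar):
    """Phonological similarity effect (PSE): recall performance with
    phonologically dissimilar letters minus recall performance with
    phonologically similar letters.  A large positive index indicates
    reliance on phonological coding in working memory; a near-zero or
    negative index indicates relative resistance to the effect."""
    return recall_dissimilar - recall_similar
```

For example, a participant recalling 70% of dissimilar lists but only 40% of similar lists has an index of .30, whereas equal performance in both conditions gives an index of zero.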
Finally, to examine whether the letter span test and
the rhyme judgment tests tap similar aspects of phonological processing, we calculated intercorrelations.
These correlation coefficients were, however, nonsignificant (−.22 < r < −.02).
Discussion
The purpose of this study was to examine how cognitive abilities relate to different measures of visual
speech understanding. The results demonstrate that
the pattern of cognitive association varies with the
different tests of visual speech.
Before partialling for age, lexical identification
speed proved, once again (cf. Lyxell, 1989; Lyxell &
Rönnberg, 1992; Öhngren, 1992), to be strongly related
to almost all tests of visual speech understanding, as well as to phonological processing speed (i.e., rhyme
judgment and phonological lexical decision). In contrast to previous studies (Lyxell et al., 1994; Öhngren,
1992), the accuracy measures of the rhyme judgment
and phonological lexical decision tests were not associ-
ated with visual speech understanding. In line with previous studies, however, chronological age was associated
with visual speech understanding (Lyxell & Rönnberg,
1991b; Rönnberg, 1990; Shoop & Binnie, 1979).
There was no relationship between the blockage index of visual tracking fluency and any of the cognitive
tests. This finding is in sharp contrast to the patterns
of correlations for the other measures of visual speech
understanding. Moreover, considering the strong correlation between the blockage index and sentencebased speechreading, one would expect a similar
pattern to arise. This negative finding suggests that individual differences in blockages during speech
tracking are not related to the cognitive tasks used in
this study, whereas the index of a speech process that
runs efficiently and smoothly (i.e., optimum wpm rate)
is well accounted for by these cognitive functions. To
account for breakdowns in the processing of visual
speech input that result in blockages, other measures
of cognitive and perceptual functions may be required.
As the word-decoding tests were related to the
blockage index, the early stages of visual speech understanding (e.g., visual feature extraction) may be implicated here (cf. Gailey, 1987; Lyxell & Rönnberg, 1991a). In line with this, the lack of relationship with
the cognitive tests is logical as the tasks used in this
study all tap central amodal cognitive functions (cf.
Rönnberg, Andersson, Andersson, et al., 1998) that
come into play at a later stage of the processing of the
speech signal. Thus, when blockages occur, the flow of
speech information is cut off or considerably reduced,
and these central amodal cognitive functions are not required because there is no information, or only a small
amount, left for processing.
The suggestion that providing contextual information during visual speech understanding may reduce
some of the cognitive demands imposed by the task was
not strongly supported. Lexical identification speed
and phonological processing speed were relatively
stronger correlates of sentence-based speechreading
than of conventional speech tracking rate; however,
these two cognitive skills were also strongly correlated
with the optimum speech tracking rate and with
sentence-based speechreading. The provision of contextual information may not be responsible for the
different strengths of associations between sentence-based speechreading and conventional speech tracking
rate. Rather, it may reflect the fact that conventional
speech tracking rate is a composite measure that includes, among other things, blockages, which, as we
have seen, are uncorrelated with any cognitive test.
Thus, the inclusion of blockages reduces the strength
of the associations compared to optimum speech
tracking rate and sentence-based speechreading. The
results of the partial correlations corroborate this line
of reasoning: when the contribution of chronological age was eliminated, the sentence-based speechreading test and the optimum rate measure were the only speech measures that retained significant correlations with these cognitive skills.
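The partialling procedure referred to here removes the linear effect of the control variable (chronological age) from both measures and correlates the residuals. A self-contained sketch in Python with made-up scores (purely illustrative; not the study's data or its statistical software):

```python
import math

def residualize(v, z):
    """Return v with its least-squares linear dependence on z removed."""
    n = len(v)
    mz, mv = sum(z) / n, sum(v) / n
    slope = (sum((zi - mz) * (vi - mv) for zi, vi in zip(z, v))
             / sum((zi - mz) ** 2 for zi in z))
    return [vi - (mv + slope * (zi - mz)) for zi, vi in zip(z, v)]

def pearson(a, b):
    """Pearson product-moment correlation between two score lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a)
                           * sum((y - mb) ** 2 for y in b))

def partial_corr(x, y, z):
    """Correlation between x and y after partialling out z (e.g., age)."""
    return pearson(residualize(x, z), residualize(y, z))

# Made-up scores in which the positive zero-order correlation between
# x and y is carried entirely by the shared control variable z.
x = [2, 1, 4, 3, 6, 5]
y = [0, 3, 2, 5, 4, 7]
z = [1, 2, 3, 4, 5, 6]
print(pearson(x, y))         # ~= .51 (zero-order)
print(partial_corr(x, y, z)) # ~= -1.0 (relation reverses once z is removed)
```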
All significant correlations with the word-decoding test disappeared after partialling out age, except for the letter span task. This pattern of results is consistent
with the finding of Lidestam et al. (1999), showing that
the ability to speechread long sentences, compared to
short sentences, depends more on the individual’s
working memory capacity. Thus, the prediction that
lexical identification and phonological processing are
less powerful correlates of linguistically less complex
speech tests compared to linguistically complex speech
tests is supported. Word-decoding in speechreading
may tap relatively early (perceptual) stages of lexical
identification (see Lyxell & Rönnberg, 1991a), prior to
any phonological processing. Furthermore, the on-line
processing constraints of the word-decoding task are
relatively small compared to speechreading of sentences (cf. Lidestam et al., 1999).
Partialling for age revealed a sharper pattern of distinct associations of cognitive variables with the speechreading tasks. Phonological processing speed remained
significantly related to optimum speech tracking rate
and sentence-based speechreading, whereas lexical
identification speed was still related to optimum speech
tracking rate. A novel finding is that the letter span test with phonologically similar letters emerged as a significant correlate of all visual speech tests (except
the blockage index). None of the other associations
reached significance.
The correlations between the letter span task with phonologically similar letters and almost all
visual speech tests, suggest that working memory contributes to the speechreading process. However, the
finding that the PSE index correlated negatively with
speech tracking implicates reliance on a visual nonverbal component of short-term memory in this task (cf.
De Filippo, 1982), as well as in visual word decoding,
which showed a similar, though nonsignificant pattern.
Thus, these findings suggest that skilled speechreading
requires the ability to use both phonological and visual
working memory coding strategies.
Our results indicate that phonological processing
permeates visual speech understanding. For example,
performance on the phonological lexical decision task
correlated highly with sentence-based speechreading. Latencies on the rhyme judgment tests were associated with
visual speech understanding (e.g., optimum speech
tracking rate and sentence-based speechreading). The
rhyme judgment tasks were all strongly associated with
the lexical decision task (.81 < r < .92), implying that rhyme judgment speed taxes aspects of phonological processing related to lexical identification. From an auditory spoken word recognition perspective (Luce & Pisoni, 1998; Marslen-Wilson, 1987), phonological processing may contribute to the speechreading process in a number of ways. According to these models, the initial phoneme is critical for identification of spoken words. A
study by Lyxell and Rönnberg (1991a) shows that the initial phoneme is also critical in visual speechreading. Thus, the relationships between the rhyme
judgment tasks and speechreading performance may
reflect a stage in the speechreading process when the
extracted initial speech segments of a word are converted into abstract phoneme or syllable representations (cf. Pisoni & Luce, 1987). The correlations with
rhyme judgment speed may also manifest the subsequent process when these phonemic or syllabic representations activate the phonological-lexical items that
share this word-initial information in the lexicon,
which eventually results in lexical identification of the
speechread word (cf. Marslen-Wilson, 1987).
As rhyme judgment tasks involve manipulation and
comparison of suprasegmental information (i.e., syllables) including syllabic stress (see Gathercole & Baddeley, 1993), it is possible that the associations with
speechreading performance also reflect the processing
of prosodic information during speechreading. The
importance of word prosody (i.e., number of syllables
and syllabic stress) and sentence prosody (i.e., rhythm
and intonation) in auditory speech processing is well
established (Cutler, 1989; Kjelgaard & Speer, 1999;
Lindfield, Wingfield, & Goodglass, 1999a, 1999b; Norris, McQueen, & Cutler, 1995). Spoken word recognition is facilitated when word stress, not just initial phonology, is taken into account; that is, the word-initial
cohort is constrained to include only words that share both stress pattern and initial phonology with the stimulus word (Lindfield et al., 1999a, 1999b).
Sentence prosody contributes to auditory speech processing by resolving syntactic ambiguities and identifying syntactic boundaries (Kjelgaard & Speer, 1999;
Pisoni & Luce, 1987; Schepman & Rodway, 2000;
Steinhauer, Alter, & Friederici, 1999). Word stress patterns (syllabic stress) and the rhythm of an utterance may be extracted from lip and face movements, as these speech cues correspond directly to the energy of the acoustic signal (cf. Gathercole & Baddeley, 1993), and this energy should therefore also be displayed in the face of the talking person. Thus, the
speechreader may be able to obtain these speech cues
and use them to facilitate lexical identification and to disambiguate speechread utterances (cf. Rönnberg, Andersson, Andersson, et al., 1998). The amount of prosodic information is greater in speechreading of sentences than of words, which would explain why rhyme
judgment speed was associated with optimum speech
tracking rate and sentence-based speechreading only
after controlling for age.
Inference-making ability did not correlate with visual speech understanding in this study. However, this
study used a different methodology and a different
population than did previous studies (Lyxell & Rönnberg, 1987, 1989) in which such a relationship has been
obtained. Our negative result is, on the other hand, consistent with other previous studies (cf. Rönnberg, 1990).
As in previous studies on hearing-impaired participants, no direct relationships between general working
memory capacity (i.e., reading span) and visual speech
understanding were obtained (Lyxell & Rönnberg,
1989). The studies finding such a relationship have
typically employed populations of normal hearing participants (Lidestam et al., 1999; Lyxell & Rönnberg,
1993). One reason for this difference between the populations is that speechreading in hearing-impaired individuals may be relatively automatized compared to
that of normal hearing individuals. As automatized processes presumably require less working memory capacity (e.g., LaBerge & Samuels, 1985), working memory consequently becomes a less critical cognitive prerequisite for hearing-impaired individuals' speechreading.
In sum, these results demonstrate that specific cognitive skills relate to individual differences in visual
speech understanding. The major cognitive correlates
proved to be lexical identification speed and phonological processing. We observed the contribution of visual
as well as phonological working memory coding strategies. Furthermore, we elaborated on the importance
and function of phonological processing in visual
speech understanding. As hypothesized, the pattern of
correlations between cognitive skills and strengths of
relationship varied among the different visual speech
measures. Particularly after partialling out age, linguistically complex tests (i.e., sentence-based speechreading, speech tracking) were associated with measures of lexical identification speed and phonological processing speed, but the (linguistically) simpler word-decoding task was not. Letter span, when
measured either in terms of performance on similar-sounding letters or as strength of the PSE, correlated with
all tests of visual speech understanding. In contrast to
the other speech tests, none of these cognitive tests was
associated with individual differences in number of
blockages encountered during speech tracking. Prediction of this type of measure may require tests tapping
the early stages of visual speech processing (i.e., perceptual functions).
Received October 6, 1999; revision received April 13, 2000; accepted October 5, 2000
References
Allén, S. (1970). Frequency dictionary of present-day Swedish. (In
Swedish: Nusvensk frekvensbok.) Stockholm: Almqvist &
Wiksell.
Arnold, P. (1997). The structure and optimization of speechreading. Journal of Deaf Studies and Deaf Education, 2, 199–211.
Arnold, P., & Köpsel, A. (1996). Lipreading, reading and memory of hearing and hearing-impaired children. Scandinavian Audiology, 25, 13–20.
Ausmeel, H. (1988). TIPS (Text-Information-Processing-System): A user's guide. Linköping: Department of Education and Psychology, Linköping University, Sweden.
Baddeley, A. D. (1966). Short-term memory for word sequences
as a function of acoustic, semantic and formal similarity.
Quarterly Journal of Experimental Psychology, 18, 362–365.
Baddeley, A. D. (1997). Human memory: Theory and practice. Rev.
ed. Hove, England: Psychology Press.
Baddeley, A. D., & Wilson, B. (1985). Phonological coding and
short-term memory in patients without speech. Journal of
Memory and Language, 24, 490–502.
Berger, K. W. (1972). Visemes and homophonous words. Teacher
of the Deaf, 70, 396–399.
Birren, J. E., & Fisher, L. M. (1995). Speed of behavior: Possible
consequences for psychological functioning. Annual Review
of Psychology, 46, 329–353.
Campbell, R. (1992). Speech in the head? Rhyme skill, reading,
and immediate memory in the deaf. In D. Reisberg (Ed.),
Auditory imagery (pp. 73–94). Hillsdale, NJ: Erlbaum.
Conrad, R. (1979). The deaf schoolchild. London: Harper & Row.
Conrad, R., & Hull, A. J. (1964). Information, acoustic confusion and memory span. British Journal of Psychology, 55,
429–432.
Cutler, A. (1989). Auditory lexical access: Where do we start? In
W. Marslen-Wilson (Ed.), Lexical representation and processes
(pp. 342–356). Cambridge, MA: MIT Press.
Dancer, J., Krain, M., Thompson, C., Davis, P., et al. (1994).
A cross-sectional investigation of speechreading in adults:
Effects of age, gender, practice, and education. Volta Review,
96, 31–40.
Daneman, M., & Carpenter, P. A. (1980). Individual differences
in working memory and reading. Journal of Verbal Learning
and Verbal Behavior, 19, 450–466.
De Filippo, C. L. (1982). Memory for articulated sequences and
lipreading performance of hearing-impaired observers.
Volta Review, April, 134–146.
De Filippo, C. L., & Scott, B. L. (1978). A method for training
and evaluating the reception of ongoing speech. Journal of
the Acoustical Society of America, 63, 1186–1192.
Demorest, M. E., & Bernstein, L. E. (1992). Sources of variability in speechreading sentences: A generalizability analysis.
Journal of Speech and Hearing Research, 35, 876–891.
Demorest, M. E., Bernstein, L. E., & DeHaven, G. P. (1996). Generalizability of speechreading performance on nonsense syllables, words, and sentences: Subjects with normal hearing. Journal of Speech and Hearing Research, 39, 697–713.
Dodd, B. (1977). The role of vision in the perception of speech.
Perception, 6, 31–40.
Dodd, B., & Campbell, R. (1987). Hearing by eye: The psychology
of lipreading. London: Erlbaum.
Gailey, L. (1987). Psychological parameters of lipreading skill. In
B. Dodd & R. Campbell (Eds.), Hearing by eye: The psychology of lipreading (pp. 115–141). London: Erlbaum.
Gathercole, S. E., & Baddeley, A. D. (1993). Working memory and
language. Hove, England: Erlbaum.
Gnosspelius, J., & Spens, K.-E. (1992). A computer-based
speech tracking procedure. Speech Transmissions Laboratory
Quarterly Progress and Status Report, 2, 131–137.
Hanson, V. L. (1982). Short-term recall by deaf signers of American Sign Language: Implications of encoding strategy for order recall. Journal of Experimental Psychology: Learning,
Memory and Cognition, 8, 572–583.
Hanson, V. L. (1990). Recall of ordered information by deaf
signers: Phonetic coding in temporal order recall. Memory & Cognition, 18, 604–610.
Hanson, V. L., & MacGarr, N. S. (1989). Rhyme generation by
deaf adults. Journal of Speech and Hearing Research, 32, 2–11.
Kjelgaard, M. M., & Speer, S. H. (1999). Prosodic facilitation
and interference in the resolution of temporary syntactic
closure ambiguity. Journal of Memory and Language, 40,
153–194.
LaBerge, D., & Samuels, S. J. (1985). Toward a theory of automatic information processing in reading. In H. Singer &
R. B. Ruddell (Eds.), Theoretical models and processes of reading, 3rd ed. Newark, DE: International Reading Association.
Leybaert, J., & Charlier, B. (1996). Visual speech in the head:
The effect of cued-speech on rhyming, remembering and
spelling. Journal of Deaf Studies and Deaf Education, 1,
234–248.
Lidestam, B., Lyxell, B., & Andersson, G. (1999). Speechreading: Cognitive predictors and displayed emotion. Scandinavian Audiology, 28, 211–217.
Lindfield, K. C., Wingfield, A., & Goodglass, H. (1999a). The
contribution of prosody to spoken word recognition. Applied
Psycholinguistics, 20, 395–405.
Lindfield, K. C., Wingfield, A., & Goodglass, H. (1999b). The
role of prosody in the mental lexicon. Brain & Language,
68, 312–317.
Logan, K., Maybery, M., & Fletcher, J. (1996). The short-term
memory of profoundly deaf people for words, signs and abstract spatial stimuli. Applied Cognitive Psychology, 10,
105–119.
Luce, P. A., & Pisoni, D. B. (1998). Recognizing spoken words: The neighborhood activation model. Ear & Hearing, 19,
1–36.
Lyxell, B. (1989). Beyond lips: Components of speechreading. Doctoral dissertation, University of Umeå, Sweden.
Lyxell, B. (1994). Skilled speechreading: A single-case study.
Scandinavian Journal of Psychology, 35, 212–219.
Lyxell, B., Andersson, J., Arlinger, S., Bredberg, G., Harder,
H., & Rönnberg, J. (1996). Verbal information-processing
capabilities and cochlear implants: Implications for preoperative predictors of speech understanding. Journal of Deaf
Studies and Deaf Education, 1, 190–201.
Lyxell, B., & Rönnberg, J. (1987). Guessing and speech-reading.
British Journal of Audiology, 21, 13–20.
Lyxell, B., & Rönnberg, J. (1989). Information-processing skill
and speech-reading. British Journal of Audiology, 23,
339–347.
Lyxell, B., & Rönnberg, J. (1991a). Visual speech processing:
Word-decoding and word-discrimination related to
sentence-based speechreading and hearing-impairment.
Scandinavian Journal of Psychology, 32, 9–17.
Lyxell, B., & Rönnberg, J. (1991b). Word discrimination and
chronological age related to sentence-based speech-reading
skill. British Journal of Audiology, 25, 3–10.
Lyxell, B., & Rönnberg, J. (1992). The relationship between verbal ability and sentence-based speechreading. Scandinavian
Audiology, 21, 67–72.
Lyxell, B., & Rönnberg, J. (1993). The effects of background
noise and working memory capacity on speechreading performance. Scandinavian Audiology, 22, 67–70.
Lyxell, B., Rönnberg, J., & Samuelsson, S. (1994). Internal
speech functioning and speechreading in deafened and normal hearing adults. Scandinavian Audiology, 23, 179–185.
Marslen-Wilson, W. D. (1987). Functional parallelism in spoken
word-recognition. Cognition, 25, 71–102.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
Norris, D., McQueen, J. M., & Cutler, A. (1995). Competition
and segmentation in spoken-word recognition. Journal of
Experimental Psychology: Learning, Memory, & Cognition,
21, 1209–1228.
Öhngren, G. (1992). Touching voices: Components of direct tactually supported speechreading. Doctoral dissertation, University of Uppsala, Sweden.
Pisoni, D. B., & Luce, P. A. (1987). Acoustic-phonetic representations in word recognition. Cognition, 25, 21–52.
Rönnberg, J. (1990). Cognitive and communicative function: The effects of chronological age and “Handicap Age.” European Journal of Cognitive Psychology, 2, 253–273.
Rönnberg, J. (1995). What makes a skilled speechreader? In G.
Plant & K.-E. Spens (Eds.), Profound deafness and speech
communication (pp. 393–416). London: Whurr.
Rönnberg, J., Andersson, J., Andersson, U., Johansson, K., Lyxell, B., & Samuelsson, S. (1998). Cognition as a bridge between signal and dialogue: Communication in the hearing
impaired and deaf. Scandinavian Audiology, 27(suppl. 49),
101–108.
Rönnberg, J., Andersson, J., Samuelsson, S., Söderfeldt, B.,
Lyxell, B., & Risberg, J. (1999). A speechreading expert:
The case of MM. Journal of Speech, Language, and Hearing
Research, 42, 5–20.
Rönnberg, J., Andersson, U., Lyxell, B., & Spens, K.-E. (1998).
Vibrotactile speech tracking support: Cognitive prerequisites. Journal of Deaf Studies and Deaf Education, 3, 143–156.
Salthouse, T. (1985). A theory of cognitive aging. Amsterdam:
North-Holland.
Samuelsson, S., & Rönnberg, J. (1991). Script activation in lipreading. Scandinavian Journal of Psychology, 32, 124–143.
Samuelsson, S., & Rönnberg, J. (1993). Implicit and explicit use
of scripted constraints in lipreading. European Journal of
Cognitive Psychology, 5, 201–233.
Schepman, A., & Rodway, P. (2000). Prosody and parsing in coordination structures. Quarterly Journal of Experimental Psychology, 53A, 377–396.
Share, D. L., & Stanovich, K. E. (1995). Cognitive processing in early reading development: Accommodating individual differences into a model of acquisition. Issues in Education, 1, 1–57.
Shoop, C., & Binnie, C. A. (1979). The effect of age upon the
visual perception of speech. Scandinavian Audiology, 8, 3–8.
Spens, K.-E. (1995). Evaluation of speech tracking results: Some
numerical considerations and examples. In G. Plant & K.-E.
Spens (Eds.), Profound deafness and speech communication (pp.
417–437). London: Whurr.
Steinhauer, K., Alter, K., & Friederici, A. D. (1999). Brain potentials indicate immediate use of prosodic cues in natural
speech processing. Nature Neuroscience, 2, 191–196.
Tact aid 7: User’s manual. (1991). Somerville, MA: Audiological
Engineering Corporation.
Towse, J. N., Hitch, G. J., & Hutton, U. (1998). A reevaluation
of working memory capacity in children. Journal of Memory
and Language, 39, 195–217.