J Am Acad Audiol 4: 296-306 (1993)
Role of Perceptual Acclimatization
in the Selection of Frequency
Responses for Hearing Aids
Stuart Gatehouse*
Abstract
Previous work concerning the late-onset auditory deprivation and/or acclimatization effect in
adult hearing-aid users has suggested that the benefits of a particular frequency response
from a hearing aid may not become apparent until material exposure to that frequency
response has been achieved. The generality of that finding was tested further. A group of
subjects who were established users (12 to 15 months) of a particular frequency response
(limited at high frequencies by the system of provision) were re-prescribed with a theoretically
advantageous frequency response according to the NAL prescription. Using a speech-in-noise test (word identification) and a sentence verification test, the benefits of the re-prescription were not (or at best only marginally) evident upon immediate testing but became
statistically significant and of material clinical magnitude following experience with the re-prescription for 8 and 16 weeks. These results suggest that comparative selection regimes
and research designs based upon little or no experience of the listening environment through
the hearing aid are likely to seriously misrepresent the benefits available to the hearing-impaired listener.
*MRC Institute of Hearing Research, Glasgow, Scotland
Reprint requests: Stuart Gatehouse, MRC Institute of Hearing Research, Royal Infirmary, Alexandra Parade,
Glasgow G31 2ER, Scotland, UK
This special issue of the Journal of the American Academy of Audiology is a
reflection of the growing interest and activity in the area of auditory deprivation.
The content also reflects the two strands of
work, which cover effects of deprivation in developing auditory systems (with particular interest in the development of binaural processing abilities) and also deprivation effects in
adults suffering from hearing impairments who
are managed by hearing aids. Auditory deprivation of late onset was put forward as a concept
by Silman and his colleagues (Silman et al,
1984; Gelfand et al, 1987) as an explanation for
their finding that subjects using monaural amplification exhibit a relative decrement in speech
identification scores for the usually unaided ear
relative to the usually aided ear, while in contrast individuals using no amplification or using
binaural amplification show no such
interaural discrepancies. Further results from
other subject groups have reinforced the evidence for this experimental finding (Silverman,
1989; Stubblefield and Nye, 1989). The experimental results have been interpreted as a reflection of a decrease in the analyser capacity of
an (impaired) ear, which suffers a deprivation of
auditory stimulation relative to the (aided)
contralateral ear. These putative deprivation
effects have been suggested to have mechanisms of action somewhat similar to those occurring in the developing auditory system.
Gatehouse (1989) has presented an acceptable but alternative hypothesis for apparent
deprivation effects in terms of perceptual acclimatization . This was prompted by the finding
that in long-term users of a single hearing aid,
the ear that is usually aided performs better
than the usually unaided ear at high presentation levels, while at lower presentation levels
the converse occurs . This intensity dependence
was put forward as evidence that an ear accustomed to receiving a high level of stimulation
will "acclimatize" to the pattern of speech cues
present and be most efficient at analyzing at
high presentation levels . At lower presentation
Acclimatization and Frequency Responses/Gatehouse
levels, the usually unaided ear receives its accustomed pattern of cues and so performs better
than the usually aided ear. Further experiments (Gatehouse, 1992a) have shown that in
users of a single hearing aid, significant increases in the benefit from amplifying speech in
the aided ear occurred across a time course of 12
weeks after fitting, but not in the control (unfitted)
ear. Furthermore, it appeared that the benefits
from providing a particular frequency spectrum
did not emerge immediately, but over a time
course of some 6 to 12 weeks. These experiments
additionally showed a small, but statistically
significant, decrease in the speech identification abilities of the control (not-fitted) ear over
the 12-week period, which might be interpreted
as early signs of deprivation effects.
The debate between the alternative hypotheses of late-onset auditory deprivation and perceptual acclimatization may therefore be resolved into an empirical estimation of magnitudes and time parameters of two comparable
processes. So far we have been concerned with
speech identification abilities (both in the usually aided ear and in the usually not aided ear)
following provision of a single hearing aid and
the roles of the two processes might differ for
other types, materials, or configurations of stimulation . More detailed information and evidence
on the underlying psychoacoustic (and perhaps
physiological) functional changes is required to
underpin the choice of the most appropriate
ways to examine, compensate for, or even exploit acclimatization in practice, while taking
account of the potential effects of deprivation.
Experiments to test some specific hypotheses
underlying the changes in speech identification
ability of users of a single hearing aid are
underway, but in the interim the available experimental data have potentially important and
far reaching implications for both research on
hearing aids and the fitting of hearing aids in
clinical practice . There is some evidence (Hurley, 1991) that the long-term decline in the
speech identification ability of the usually unaided ear of single hearing-aid users might be
reversed by application of a second hearing aid.
At any rate, two important implications for
hearing-aid fitting emerge from the experimental results so far: (1) the benefits of amplification do increase across time, and also (2) the
benefits of a theoretically advantageous frequency response do not emerge immediately,
but only after exposure to that listening environment . This article pursues the latter thread
rather than distinguishing between the two
theoretical concepts of acclimatization and deprivation, which remain the focus of other experiments.
The suggestion that hearing-impaired listeners take time to "get used" to amplified
speech in some general sense is not new
to audiologists, and indeed the evidence is relatively strong (Kapteyn, 1977; Berger
and Hagberg, 1982; Brooks, 1999). Hearing-aid
use times and satisfaction do increase with time
(indeed structured rehabilitation regimes aim
to actively encourage manipulation of surroundings and acoustic conditions to encourage motivation whilst warning against early over-expectation). The evidence for underlying improvements in speech perception abilities and the
underlying perceptual rationale for this is much
more limited (Barford, 1979; Cox and Alexander, 1992; Gatehouse, 1992a). The audiologic
literature contains many methodologies and
formulae for the prescription of the frequency
response and gain of a hearing-aid fitting based
on measurements such as pure-tone thresholds
(McCandless and Lyregaard, 1983; Berger et al,
1984; Byrne and Dillon, 1986), on comfortable
and/or uncomfortable listening levels (Skinner
et al, 1982; Cox, 1985), or on strategies such as
the articulation index (Pavlovic, 1988; Berger,
1990; Rankovic, 1991). By whatever means
employed, these procedures aim to provide an
appropriate degree of amplification at each frequency. These procedures characteristically lack
validation on large-scale clinical populations.
Where attempts at evaluation have taken place,
there has been considerable difficulty in showing benefits from theoretically advantageous
approaches . It may be argued (Gatehouse, 1992b)
that part of the difficulty in distinguishing a
theoretically advantageous prescription from a
control condition has been a reliance on simple
identification scores . Real-world benefit from
amplification can be considered to have other
dimensions, including ease of listening. Another possible reason for the difficulties relates
to the acclimatization hypothesis. If the comparative evaluations have taken place prior to
the completion of any acclimatization (that is,
before the changes in speech identification ability on provision of amplification have the opportunity to asymptote) then the underlying differences between fitting strategies may not be
apparent . It was suggested in Gatehouse (1992a)
that this very process was occurring where the
benefits of a sloping high-frequency emphasis
over a flat-frequency spectrum only became
apparent after some 4 to 6 weeks experience
Journal of the American Academy of Audiology/Volume 4, Number 5, September 1993
with the high-frequency emphasis condition.
The general aim of this article is to identify a
situation where one prescription would be unequivocally accepted as superior to an alternative, and then to chart any differences as a
function of exposure to the listening environments.
In setting up such an experimental paradigm, care should be taken in the choice of the
control condition. If its characteristics were
deemed to be totally inappropriate to the hearing impairments fitted, the results would have
little applicability and, indeed, it is well established that speech identification indices can
distinguish between a broadly appropriate and
a broadly inappropriate frequency response
(Walden et al, 1983) . It would be tempting to
take two of the advocated prescription strategies from the literature and to run a comparative trial, but in the absence of any overall
acceptance of one particular methodology as
superior over a competitor, choice of the conditions proves problematic. However, in the United
Kingdom (UK) there is a system of socialized
medicine under which all hearing-impaired individuals can receive a standard hearing-aid
free of charge from a fairly extensive range.
Such a system of course has many advantages in
terms of the service costs per patient and in
particular to the pocket of the individual hearing-impaired listener, but these advantages are
conferred at the expense of quality and flexibility in technology . In the UK, standard postaural
hearing aids and standard ear molds are offered
free of charge, but these can, and do, result in
limited frequency responses. The ethics of a
trial in this circumstance, however, do not pose
any problems as this is the standard system of
provision in the UK. Given the system in place,
there is then the opportunity to change the
standard prescription to one of the methodologies advocated in the literature and to chart the
putative advantages of the re-prescription. This
then became the detailed aim of the experiment,
during the course of which individuals who had
been fitted (and acclimatized) to the standard
UK procedure were re-prescribed under a theoretically more advantageous procedure and the
speech identification abilities compared . Given
the suggestions in Gatehouse (1992b) concerning the limitations of simple identification tasks
and the demonstration of the ability of a more
complex procedure to distinguish between two
signal processing strategies, which were similar in terms of traditional identification performance (Baer et al, 1992), the experiment
contained not only a measure of traditional
identification ability, but also the recently developed sentence verification test .
METHOD
Audiometry
Pure-tone thresholds were measured in each
ear by air conduction at 0.25, 0.5, 1, 2, 4, and 8
kHz and by bone conduction at 0.25, 0.5, 1, and
2 kHz using a manual audiometry method recommended by the British Society of Audiology/
British Association of Otolaryngologists (1981) .
This is a modified Hughson-Westlake procedure using 5-dB steps. Measurements were
performed using a clinical audiometer calibrated
to BS2497 (British Standards Institution, 1969,
for air conduction ; British Standards Institution, 1972, for bone conduction).
Hearing-Aid Insertion Gain
The real-ear insertion gain of the hearing
aid was measured using a clinical flexible probe-tube system, the Acoustimed HA2000. Continuous speech was played in a sound-deadened
room from a loudspeaker 2 m distant and 0
degrees azimuth at a sound pressure level (SPL)
of 65 dB for the mean of the speech peaks at a
point equivalent to the subject's head with the
subject removed from the sound field. Initially
the subject was seated and asked to adjust the
gain on the hearing aid to achieve maximum
intelligibility. Thereafter the insertion gain of
the hearing aid was assessed using a speech-shaped wide-band input signal at 65 dB SPL.
The whole procedure was repeated three times
and the results averaged .
Word Identification in Noise
Traditional speech identification performance was assessed for single words in a background of noise using the four alternative auditory feature (FAAF) test. This is a forced-choice
word identification test based on the rhyme test
principle, described by Foster and Haggard
(1979, 1987). The material consists of 20 sets of
four minimally paired words, each based on two
binary auditory/phonetic distinctions giving an
80-item vocabulary and filtered noise with the
same long-term spectrum as the test items. Two
examples of these sets are: (1) mail, bale, nail,
dale ; and (2) rose, rove, robe, road. Nine sets
vary the initial consonant, and eleven the final
consonant. The test was administered free-field
with the subject seated 2 m from a loudspeaker
at 0 degrees azimuth. A fixed speech intensity of
65 dB SPL (measured at the center of the subject's head with the subject removed from the
sound field) was used with a noise level of 58
dBA. The speech level was defined from a 1-kHz
calibration tone, which had a sound pressure
level equal to the mean of the speech peaks of
the test words and the overall level of the noise
was measured as the A-weighted sound level.
The 80 items are stored as digitized waveform
files and presented to the subject via a standard
clinical audiometer and loudspeaker using 12-bit digital-to-analog conversion. Because the
study involved multiple presentations of this
procedure under different conditions, a random
number table was employed to generate sequences of unique ordering for the 80 items.
Subject performance is assessed as the number
of words (out of a total of 80) correctly identified
and is expressed as percentage correct.
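The ordering and scoring described above can be sketched in a few lines. This is only an illustration: the study used a random number table, whereas the sketch substitutes a seeded pseudo-random generator, and the function names `presentation_order` and `percent_correct` are hypothetical.

```python
import random

N_ITEMS = 80  # 20 sets of four minimally paired words


def presentation_order(seed):
    """Return one shuffled ordering of the 80 item indices.

    Assumption: a seeded generator stands in for the random number
    table used in the study; a different seed gives a different order.
    """
    rng = random.Random(seed)
    order = list(range(N_ITEMS))
    rng.shuffle(order)
    return order


def percent_correct(responses, targets):
    """Score a run as the percentage of items identified correctly."""
    if len(responses) != len(targets):
        raise ValueError("responses and targets must have equal length")
    n_correct = sum(r == t for r, t in zip(responses, targets))
    return 100.0 * n_correct / len(targets)
```

With 80 items each word contributes 1.25 percentage points, so a run with 68 words correct scores 85 percent.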
Sentence Verification Test
The construction, validation, and evaluation of the sentence verification test continues
(Gatehouse, 1992b), but a brief outline is given
here . The test uses a closed vocabulary to construct four-word sentences from an overall vocabulary of 32 words. There are 4 alternatives
for the first word in the sentence (Liz, Lynne,
Len, Ben), 12 alternatives for the second word
(sold, showed, stole, stored, wore, stitched, drove,
crashed, cracked, corked, read, tore), 12 alternatives for the third word (four, more, two, few,
tweed, cloth, fast, sports, glass, jam, road, street),
and 4 alternatives for the fourth word (caps,
cars, jars, maps). Of the 144 combinations of the
second and third words, there are 82 for which
there is at least one fourth word that makes the
sentence unequivocally silly (nonsense) and at
least one fourth word that makes the sentence
unequivocally sensible (e.g., Ben sold street
maps is sensible, while Ben sold street jars is
silly). Any combination of a fourth word with a
second word-third word pair previously judged
to be equivocal with regard to sense/nonsense is
not employed in the test. The eventual sentences require identification of the second, third,
and fourth words in the sentence before a decision regarding the sense/nonsense of the sentence may be made. The 32 words were stored as
digitized individual waveform files, which were
isolated from sentences spoken by a single male
talker and were concatenated to produce the
desired sentences. During the construction of
the test, care was taken to ensure that the
intonation contours of the items and other aspects such as duration of voicing, were similar
across items. This was done to remove extraneous cues not directly associated with the intelligibility of the individual word .
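The closed vocabulary lends itself to a short enumeration. The sketch below uses the word lists given above; the sense/nonsense judgement table `JUDGEMENTS` is hypothetical and holds only the two examples quoted in the text, since the full set of judged combinations is not reproduced here.

```python
from itertools import product

FIRST = ["Liz", "Lynne", "Len", "Ben"]
SECOND = ["sold", "showed", "stole", "stored", "wore", "stitched",
          "drove", "crashed", "cracked", "corked", "read", "tore"]
THIRD = ["four", "more", "two", "few", "tweed", "cloth",
         "fast", "sports", "glass", "jam", "road", "street"]
FOURTH = ["caps", "cars", "jars", "maps"]

# 4 + 12 + 12 + 4 words in total.
VOCABULARY_SIZE = len(FIRST) + len(SECOND) + len(THIRD) + len(FOURTH)

# Every second-word/third-word pairing (12 x 12).
SECOND_THIRD_PAIRS = list(product(SECOND, THIRD))

# Hypothetical judgement table: (second, third, fourth) -> sensible?
# Only the two examples from the text are included.
JUDGEMENTS = {
    ("sold", "street", "maps"): True,
    ("sold", "street", "jars"): False,
}


def usable_sentences(judgements):
    """Yield (sentence, is_sensible) for every judged combination.

    The first word does not affect sense, so each judged triple is
    expanded with all four names.
    """
    for (second, third, fourth), sensible in judgements.items():
        for first in FIRST:
            yield f"{first} {second} {third} {fourth}", sensible
```

Combinations judged equivocal would simply be left out of the judgement table, which mirrors the exclusion rule described above.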
Following presentation of the sentence to
the listener, the subject was asked to indicate
whether the sentence was "silly" or "sensible"
via a touch-sensitive computer screen, and the
response time for that decision (verification
time) was recorded . This verification element
was followed by the identification element, for
which four potential alternatives for the first
word in the sentence, four for the second, four
for the third and four for the fourth word were
displayed on the touch sensitive computer screen.
The subject was required to identify the components of the sentence . The test is run adaptively
and yields a signal-to-noise ratio for criterion
performance. The verification component of the
test yields the median response time for the
cognitive decision concerning sense/nonsense of
the sentences. Previous work (Gatehouse, 1992b)
has documented the within-session and between-session stability of the test for both normal-hearing and hearing-impaired subjects, and
showed no significant long-term learning effects associated with repeated administration
of the closed vocabulary .
A 1-kHz sinewave was included in the waveform files and the signal level and noise level
(shaped noise with the same long-term spectrum as the single male speaker) were defined
in a manner similar to that described above for
the FAAF test. The test was administered via a
Grason-Stadler GSI 16 audiometer and a
Goodmans B41 loudspeaker in the sound-treated
room with the subject seated 2 m from the
loudspeaker at 0 degrees azimuth. A fixed speech
intensity of 65 dB SPL was used and the level of
filtered noise with the same long-term spectrum
as the SVT items was adjusted to achieve criterion
performance. The test was configured to follow
a two-up, one-down procedure described by
Levitt (1971) converging on the 70.7 percent
correct identification point on the psychometric
function. The test started at a signal-to-noise
ratio of +20 dB and proceeded with a step size of
2 dB. The test continued until ten reversals of
signal-to-noise ratio occurred and the last eight
reversals were averaged to produce a mean
signal-to-noise ratio for 70.7 percent correct
identification. For the verification component of
the test, only those sentences that were correctly verified (correctly labelled as either being
silly or sensible) and identified (each of the four
constituent words in the sentence identified
correctly) were used. The median of the response times for the verification process using
this subset of sentences was then derived. Thus
each run of the sentence verification test yielded
an identification component (the signal-to-noise
ratio in dB for 70.7% correct identification of the
sentences) and a verification element (response
time for the decision regarding the sense/nonsense of the sentence).
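The adaptive track and the two outcome measures can be sketched as follows. The sketch assumes that the rule operates on the noise level, so that two consecutive correct identifications lower the signal-to-noise ratio by one step and a single error raises it; under that rule the track settles where the probability of two successive correct responses is 0.5, that is, at the 70.7 percent point (0.707 squared is 0.5). The function names, the trial tuple layout, and the choice to record each reversal at the ratio where the direction changed are assumptions for illustration.

```python
from statistics import mean, median


def run_track(respond, start_snr=20.0, step=2.0,
              n_reversals=10, n_average=8, max_trials=1000):
    """Run one adaptive track and return the mean of the last reversals.

    respond(snr) must return True when the sentence presented at that
    signal-to-noise ratio (dB) was identified correctly.
    """
    snr = start_snr
    run_of_correct = 0
    last_direction = 0  # -1 = made harder, +1 = made easier
    reversals = []
    trials = 0
    while len(reversals) < n_reversals:
        trials += 1
        if trials > max_trials:
            raise RuntimeError("track did not reach the required reversals")
        if respond(snr):
            run_of_correct += 1
            if run_of_correct < 2:
                continue  # wait for a second consecutive correct response
            direction = -1
        else:
            direction = +1
        run_of_correct = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(snr)  # ratio at which the direction changed
        last_direction = direction
        snr += direction * step
    return mean(reversals[-n_average:])


def verification_time(trials):
    """Median response time (msec) over fully correct sentences.

    Each trial is (verified_correctly, identified_correctly, rt_msec);
    only trials correct on both counts contribute.
    """
    times = [rt for verified, identified, rt in trials
             if verified and identified]
    return median(times)
```

A deterministic listener who is correct whenever the ratio is 0 dB or better produces reversals alternating between -2 and 0 dB, so the averaged result is -1 dB; a real listener would of course respond probabilistically.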
Subjects
Subjects were identified from the clinical
records at the Audiology Department at Glasgow Royal Infirmary. Individuals were selected
who had been fitted with a single UK National
Health Service postaural hearing aid between
12 and 15 months previously . They were then
invited to attend for review and a measurement
of real-ear insertion gain (REIG) performed
using the procedure and equipment described
above. The socialized system of medicine currently present in the UK (the National Health
Service, NHS) uses relatively unsophisticated
(though low cost) hearing-aid technology with
limited bandwidth, particularly at high frequencies; this, in conjunction with standard earmold
technology (again achieved at low cost at the
expense of output), can lead to limitations in the
high-frequency output of the hearing-aid fitting. The REIGs were inspected to identify subjects who, it might be considered, were using
less high-frequency gain than would be recommended by some external standard. Although
there is no internationally recognized standard
and many prescription procedures have been
proposed in the literature, the formula produced by the National Acoustic Laboratories
(NAL) in Australia (Byrne and Dillon, 1986) has
probably achieved the widest acceptance. It was
therefore decided to use the NAL predictions as
the external standard. Subjects were selected
whose aids gave REIG less than the NAL predictions by 12.5 dB or more averaged across the
frequencies of 2 and 3 kHz (the frequency of 4
kHz was not included in this selection process as
almost all subjects would fall into this category,
and it was desired to identify those subjects who
might be regarded as substantially "underfitted").
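The selection criterion amounts to a simple comparison of two gain curves. The sketch below assumes gains are held in dictionaries keyed by frequency in Hz; the function names and the example gains in the usage note are hypothetical, and the NAL targets themselves are taken as given rather than computed.

```python
def mean_shortfall(nal_target, measured_reig, freqs=(2000, 3000)):
    """Mean amount (dB) by which measured REIG falls below the target.

    Averaged across 2 and 3 kHz by default; 4 kHz is deliberately
    excluded, as in the selection rule described above.
    """
    return sum(nal_target[f] - measured_reig[f] for f in freqs) / len(freqs)


def is_underfitted(nal_target, measured_reig, criterion_db=12.5):
    """True when the mean shortfall meets or exceeds the criterion."""
    return mean_shortfall(nal_target, measured_reig) >= criterion_db
```

For example, with invented targets of 20 and 22 dB at 2 and 3 kHz and measured gains of 10 and 6 dB, the shortfalls are 10 and 16 dB, the mean is 13 dB, and the fitting would be selected.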
This selection procedure identified 36 such
subjects with a mean age of 64 years (range
46-81 years) and air-conduction thresholds in
the fitted ear of 31 dB HL (range 20-45 dB) at
0.25 kHz, 31 dB HL (range 15-35 dB) at 0.5 kHz,
33 dB HL (range 20-55 dB) at 1 kHz, 43 dB HL
(range 20-60 dB) at 2 kHz, 57 dB HL (range 30-80 dB) at 4 kHz, and 59 dB HL (range 40-85 dB)
at 8 kHz. The mean values by which these
subjects were "underfitted" with reference to
the NAL predictions were 10.6 dB, 17.2 dB, and
19.7 dB at the frequencies 2, 3, and 4 kHz
respectively.
The 36 subjects were then refitted, again
with a single postaural hearing aid, using hearing-aid and earmold technology available from
the commercial sector in an attempt to achieve
the NAL target figures for the REIG. In selecting the hearing aids, the technology was restricted to devices that were linear (no compression) and achieved their output limiting by peak
clipping in a manner similar to the UK NHS
technology. In addition, the standard coupler
performance in terms of harmonic and
intermodulation distortion between the NHS
fittings and NAL re-prescriptions was similar .
These restrictions were employed in an attempt
to ensure that any differences in performance
between the original fittings and re-prescriptions were due to changes in the frequency
response, as opposed to any other parameters of
the hearing aid characteristic. The solutions
adopted did differ across subjects, particularly
with regard to the necessary modification to the
earmold technology, but a judicious selection of
hearing-aid and earmold characteristics enabled all of the 36 subjects to be fitted to within
±3 dB of the NAL targets for the frequencies
between 0.25 kHz and 2 kHz and ±5 dB for the
frequencies of 3 and 4 kHz. Throughout this
article the original fitting is designated "UK
NHS" and the re-prescription the "NAL" fitting.
Thus, the selection procedure identified 36 subjects who, according to the NAL predictions,
initially received substantial underprovision in
terms of high-frequency gain, which upon application of the NAL fitting was materially alleviated .
Experimental Conditions
Each of the 36 subjects selected by the
process described above had been using their
UK NHS fitting for between 12 and 15 months,
and therefore any process of acclimatization to
that fitting should have been substantially complete. Each subject was then tested on three
occasions designated Week 0, Week 8, and Week
16 . Week 0 was the first occasion upon which the
NAL target was achieved and represents the
initial comparison of the benefits of providing
the extra high-frequency information. Tests at
Week 0 reflect the comparison between UK
NHS fitting and the NAL fitting when the
subject had no experience of the NAL characteristic and was acclimatized to the UK NHS
fitting. Following the assessment at Week 0, the
subject ceased to use the UK NHS fitting and
was switched to the NAL fitting, thus providing
the subject the opportunity to acclimatize to the
NAL fitting. Testing thereafter took place in
both conditions with the subject gaining increasing exposure, and hence acclimatization,
to the NAL fitting. During this time the earmold
and hearing aid associated with the UK NHS
fitting was retained in the laboratory and used
only during the test sessions . At each of the test
sessions (at Week 0, Week 8, and Week 16) the
FAAF test and sentence verification test (SVT)
were applied in a counterbalanced order between subjects and across visits . For each of the
tests (FAAF and SVT) the subject was tested in
the aided and unaided conditions (order counterbalanced) with an initial practice run in a
randomly selected (aided/unaided) condition.
The practice run served only to stabilize initial performance; its results were
discarded and not used in the analysis.
RESULTS
The design of the experiment produces results for performance on the FAAF and
sentence verification tests at Week 0, Week 8,
and Week 16 in the two conditions corresponding to the UK NHS fitting and NAL fitting
(remembering that prior to Week 0, the subject
listened in everyday life through the UK NHS
fitting and between Week 0 and Week 16 the
NAL fitting) . The results for the percentage
correct score on the FAAF test are shown graphically in Figure 1, which contains the mean score
for each fitting at Week 0, Week 8, and Week 16
accompanied by the 95 percent confidence interval (±2 standard errors) on that mean. Inspection of Figure 1 would suggest that at Week 0
(experience of the UK NHS fitting but not the
NAL fitting) there is little difference in performance . However, as time progresses to Week 8
and Week 16, performance in the NAL fitting
improves whereas performance under the UK
NHS fitting remains stable . Figures 2 and 3
contain the corresponding results for the identification component of the sentence verification test and the verification element respectively. Figure 2 shows a somewhat more complex pattern with improvements across time for
both the UK NHS and NAL fittings, but with the
advantage of the NAL fitting over the UK NHS
fitting becoming more apparent with time. The
results in Figure 3 for the response time (verification element) from the sentence verification
test show a pattern more similar to those in
Figure 1 from the FAAF test, with no apparent
difference between the UK NHS and NAL fittings at Week 0 but one emerging at Weeks 8
and 16.
Figure 1 Mean (±2 standard errors on the mean) of the
percentage correct score on the FAAF test for the 36
subjects at Week 0, Week 8, and Week 16 for the UK NHS
fitting and the NAL fitting.
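The interval plotted around each mean in Figure 1 (±2 standard errors, taken as an approximate 95 percent confidence interval) can be computed as below. This is a minimal sketch; the function name is hypothetical and the scores in the usage note are invented, not study data.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_interval(scores, k=2.0):
    """Return (mean, lower, upper) using mean ± k standard errors.

    The standard error is the sample standard deviation divided by
    the square root of the number of subjects.
    """
    m = mean(scores)
    se = stdev(scores) / sqrt(len(scores))
    return m, m - k * se, m + k * se
```

For four invented scores of 80, 84, 88, and 92 percent the mean is 86 and the interval runs from about 80.8 to 91.2; with 36 subjects the interval narrows by a factor of three relative to four subjects with the same spread.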
Preliminary inspection of the data showed
the distributions to be approximately normal
and therefore parametric statistical methods
have been employed. As a first stage, a series of
Student's paired t-tests was performed and a
digest of the results is shown in Table 1. It can
be seen that for the UK NHS versus NAL
comparison at Week 0 neither the percentage
correct score on the FAAF test nor the identification element of the sentence verification test
achieves statistical significance, although the
verification element of the sentence verification
test just achieves statistical significance at p <
.05, with an advantage in response time of 112
msec for the NAL fitting over the UK NHS
fitting. Contrast these findings with those at
Week 8 and Week 16 where all three aspects of
performance show significant advantages of the
NAL fitting. Paired comparisons were also taken
within fitting across time and comparisons for
Week 0 versus Week 8, Week 0 versus Week 16,
and Week 8 versus Week 16 are also shown in
Table 1.
The paired comparisons in Table 1 indicate
the likely outcome of experiments that might
have been conducted at single points in time,
but such comparisons on repeated data sets
could be misleading due to the problem of multiple comparisons capitalizing on chance statistical results. Accordingly, a series of repeated
measures analyses of variance was performed
using the FAAF test score and both the identification and verification elements of the sentence verification test as dependent variables.
These are summarized in Tables 2, 3, and 4
respectively. In these analyses, the fitting (NAL
versus UK NHS) and time of assessment (Week
0, Week 8, or Week 16) were designated as
within-subject factors, and the interaction term
between them was included. Table 2 shows that
for the FAAF test there is a significant effect of
fitting leading to an overall improvement of 2.87
percent correct score for NAL as opposed to UK
NHS fitting, and also an overall effect of assessment time with an improvement of score of 1.52
Figure 2 Mean (±2 standard errors on the mean) of the
identification component (signal-to-noise ratio in dB for
70.7% correct identification of the sentences) on the
sentence verification test for the 36 subjects at Week 0,
Week 8, and Week 16 for the UK NHS fitting and the NAL
fitting. Note that a decrease in signal-to-noise ratio for
criterion performance represents an improvement in
performance.
Figure 3 Mean (±2 standard errors on the mean) of the
verification component (median response time for decision concerning sense/nonsense of the sentences) on the
sentence verification test for the 36 subjects at Week 0,
Week 8, and Week 16 for the UK NHS fitting and the NAL
fitting. Note that a decrease in response time represents
an improvement in performance.
Table 1
Summary of the Student's Paired t-Tests
FAAF*
UK NHS vs NAL at Week 0
UK NHS vs NAL at Week 8
SVT-S/N†
SVT-RT‡
NS
NS
p < .05
p < .05
(+2.3%)
p < .001
p < .001
p < .001
(+4.4%)
p < .005
(+0.7 dB)
Week 0 vs Week 8 for UK NHS
Week 0 vs Week 16 for UK NHS
Week 8 vs Week 16 for UK NHS
NS
NS
NS
NS
p < .01
NS
p < .05
(-1.1%)
p < .05
(+0.8 dB)
NS
Week 0 vs Week 8 for NAL
Week 0 vs Week 16 for NAL
Week 8 vs Week 16 for NAL
p < .001
(+3.6%)
p < .001
(+4.5%)
NS
p < .05
(+0.9 dB)
NS
(+1.2 dB)
NS
(+230 msec)
p < .001
(+151 msec)
UK NHS vs NAL at Week 16
(+1.2 dB)
(+1.1 dB)
p < .01
(+112 msec)
(+223 msec)
p < .001
(+312 msec)
p < .001
Results of the Student's paired t-tests are summarized for *the percentage correct score on the FAAF test; †the
identification component (signal-to-noise ratio in dB for 70.7% correct identification of sentences) of the sentence verification
test; and ‡the verification component (median response time) from the sentence verification test. The significance level of
each comparison is listed and, where this achieved p < .05, the magnitude of the significant difference. The sign of this
magnitude is arranged to be positive if the NAL fitting gives an advantage over the UK NHS fitting (irrespective of the direction
of the individual metric employed), and also to be positive if there is an improvement across time.
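Each comparison in Table 1 is a paired test on the same listeners under two conditions. As a rough sketch of the computation, the function below returns the mean difference, the t statistic, and the degrees of freedom; the significance level would then be read from a t table or a statistics library. The function name is hypothetical and the scores in the usage note are invented, not study data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition_a, condition_b):
    """Paired t statistic for condition_b minus condition_a.

    Returns (mean_difference, t, degrees_of_freedom). A positive mean
    difference indicates higher values under condition_b.
    """
    if len(condition_a) != len(condition_b):
        raise ValueError("both conditions need one value per subject")
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs), mean(diffs) / standard_error, n - 1
```

With invented scores of 80, 82, 85, 78, 90 in one condition and 83, 84, 89, 80, 95 in the other, the mean difference is 3.2 and t is about 5.49 on 4 degrees of freedom. With 36 subjects, as here, each comparison has 35 degrees of freedom.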
Table 2  Summary of the Repeated Measures Analysis of Variance with Percentage Correct Score on the FAAF Test as the Dependent Variable

Source of Variation                 Sum of Squares  DF  Mean Square  F      P         Parameter Estimate (%)  SE of Estimate (%)
Within cells                        1724.8          35  49.3
NAL vs UK NHS                       296.3           1   296.3        6.01   p < .02   2.87                    1.17
Within cells                        826.8           70  11.8
Time                                305.2           2   152.6        12.92  p < .001  1.52; 2.49              0.62; 0.52
Within cells                        341.3           70  4.9
NAL vs UK NHS by Time interaction   146.0           2   73.0         14.97  p < .001  0.01; 2.01              0.39; 0.34

The within-subject factors are the hearing-aid fitting (NAL vs UK NHS) and the time of test (Week 8 and Week 16 vs Week 0). The interaction term is included. The parameter estimates are the magnitude of the effects with the reference condition set as the UK NHS fitting and Week 0. The sign of the parameter estimate is arranged so that a positive value corresponds to an improvement in performance in the NAL vs UK NHS fitting and an improvement in performance of Week 8 with respect to Week 0 and Week 16 with respect to Week 0.
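The F column of Table 2 can be checked from the printed sums of squares and degrees of freedom, since each F is the effect mean square divided by the mean square of its own within-cells error term. The sketch below is offered only as an arithmetic check; the helper name `f_ratio` and the dictionary layout are assumptions made for illustration.

```python
def f_ratio(ss_effect, df_effect, ss_error, df_error):
    """F = (SS_effect / DF_effect) / (SS_error / DF_error)."""
    ms_effect = ss_effect / df_effect
    ms_error = ss_error / df_error
    return ms_effect / ms_error


# (sum of squares, DF) for each effect and its within-cells error term,
# copied from the values printed in Table 2.
table2 = {
    "NAL vs UK NHS": ((296.3, 1), (1724.8, 35)),
    "Time": ((305.2, 2), (826.8, 70)),
    "Interaction": ((146.0, 2), (341.3, 70)),
}

f_values = {
    name: round(f_ratio(*effect, *error), 2)
    for name, (effect, error) in table2.items()
}
print(f_values)
```

The recomputed ratios agree with the printed F values of 6.01, 12.92, and 14.97.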
Table 3  Summary of the Repeated Measures Analysis of Variance with Identification Component (Signal-to-Noise Ratio in dB for 70.7% Correct Identification) on the Sentence Verification Test as the Dependent Variable

Source of Variation                 Sum of Squares  DF  Mean Square  F      P         Parameter Estimate (dB)  SE of Estimate (dB)
Within cells                        174.8           35  5.0
NAL vs UK NHS                       37.3            1   37.3         7.47   p < .01   1.02                     0.37
Within cells                        198.8           70  2.8
Time                                49.4            2   24.7         8.69   p < .001  1.17; 0.02               0.25; 0.31
Within cells                        106.4           70  1.5
NAL vs UK NHS by Time interaction   14.7            2   7.4          4.93   p < .02   0.07; 0.84               0.23; 0.26

The within-subject factors are the hearing-aid fitting (NAL vs UK NHS) and the time of test (Week 8 and Week 16 vs Week 0). The interaction term is included. The parameter estimates are the magnitude of the effects with the reference condition set as the UK NHS fitting and Week 0. The sign of the parameter estimate is arranged so that a positive value corresponds to an improvement in performance in the NAL vs UK NHS fitting and an improvement in performance of Week 8 with respect to Week 0 and Week 16 with respect to Week 0.
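The 70.7% point named in the title of Table 3 is the level on which a two-down, one-up transformed up-down track converges (Levitt, 1971). That this particular rule was used here is an assumption drawn from the 70.7% target and the citation of Levitt, not something stated in this section. Under that assumption, the target level follows from requiring that a downward step and an upward step be equally likely; the function name `convergence_level` is hypothetical.

```python
def convergence_level(n_down):
    """Proportion correct at which an n-down, 1-up track is in equilibrium.

    The track moves to a harder level only after n_down consecutive correct
    responses, so it is balanced where p ** n_down == 0.5.
    """
    if n_down < 1:
        raise ValueError("n_down must be a positive integer")
    return 0.5 ** (1.0 / n_down)


two_down = convergence_level(2)    # about 0.707, i.e. 70.7% correct
three_down = convergence_level(3)  # about 0.794, shown for comparison
print(f"{two_down:.3f} {three_down:.3f}")
```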
Table 4  Summary of the Repeated Measures Analysis of Variance with Verification Component (Median Response Time for Sense/Nonsense Decision) on the Sentence Verification Test as the Dependent Variable

Source of Variation                 Sum of Squares  DF  Mean Square  F      P         Parameter Estimate (msec)  SE of Estimate (msec)
Within cells                        2,660,249       35  76,007
NAL vs UK NHS                       2,506,265       1   2,506,265    32.97  p < .001  264.0                      46.1
Within cells                        4,010,406       70  57,291
Time                                698,026         2   349,013      6.01   p < .01   47.4; 130.9                35.0; 44.3
Within cells                        2,021,398       70  28,877
NAL vs UK NHS by Time interaction   361,133         2   180,566      6.25   p < .01   6.5; 99.9                  14.0; 37.5

The within-subject factors are the hearing-aid fitting (NAL vs UK NHS) and the time of test (Week 8 and Week 16 vs Week 0). The interaction term is included. The parameter estimates are the magnitude of the effects with the reference condition set as the UK NHS fitting and Week 0. The sign of the parameter estimate is arranged so that a positive value corresponds to an improvement in performance in the NAL vs UK NHS fitting and an improvement in performance of Week 8 with respect to Week 0 and Week 16 with respect to Week 0.
Journal of the American Academy of Audiology/Volume 4, Number 5, September 1993
percent from Week 0 to Week 8 and 2.49 percent
from Week 0 to Week 16. However, the interaction term is also highly significant and its predominant parameter estimate at 2.01 percent is
of comparable magnitude to the overall main
effects described above. Thus the repeated measures analysis of variance confirms the impressions obtained in the graphic representation in
Figure 1, of not only significant changes across
time and within fitting, but a significant interaction between the two, such that the advantages of the NAL fitting over the UK NHS fitting are not apparent immediately, but only
emerge following significant experience. Inspection of Tables 3 and 4 shows very similar findings for the two elements of the sentence verification test, again with overall main effects of
fitting and assessment time, but with significant interaction components, which yield parameter estimates of comparable magnitude to
the main effects. Thus the impressions from the
graphic representations in Figures 1, 2, and 3
and from the multiple paired t-tests in Table 1
are confirmed by the more robust multivariate
analysis.
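The fitting-by-time interaction described above can be read as a difference of differences: the advantage of one fitting over the other at a later week, minus the same advantage at Week 0. The sketch below illustrates that contrast on group means. The cell means are hypothetical numbers chosen only to show the pattern, not results from this study, and the function names are likewise assumptions.

```python
def fitting_advantage(cell_means, week):
    """NAL minus UK NHS mean score at a given week."""
    return cell_means[("NAL", week)] - cell_means[("UK NHS", week)]


def interaction_contrast(cell_means, week, baseline_week=0):
    """Change in the NAL advantage between the baseline week and a later week."""
    return fitting_advantage(cell_means, week) - fitting_advantage(
        cell_means, baseline_week
    )


# Hypothetical group mean scores in percent correct (NOT the study's data).
means = {
    ("UK NHS", 0): 70.0, ("NAL", 0): 70.5,
    ("UK NHS", 8): 71.0, ("NAL", 8): 73.5,
    ("UK NHS", 16): 71.0, ("NAL", 16): 75.0,
}

growth = {week: interaction_contrast(means, week) for week in (8, 16)}
print(growth)
```

A contrast near zero would indicate that any advantage of one fitting was already present at the first test; a positive contrast indicates an advantage that emerges with experience.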
DISCUSSION
This article has described a group of subjects fitted with single hearing aids who,
according to the requirements of a NAL prescription for gain as a function of frequency,
had been underfitted at the upper end of the
frequency range. The subjects used a standard
UK NHS postaural aid and mold for between 12
and 15 months, and were then
fitted according to the NAL prescription. Immediate testing of the NAL fitting versus the UK
NHS fitting using a word-in-noise identification
test and a sentence verification test suggested
little, if any, additional benefit from the provision of the high-frequency information, with
only the verification element (response time) of
the sentence verification test yielding statistically significant results. However, following
experience with the NAL fitting of 8 and 16
weeks, all three of the performance indices
showed clear and statistically significant advantages of the NAL fitting.
An appropriate multivariate analysis of the
results shows that these findings were not attributable to task learning (although contrary
to Gatehouse, 1992b, there was a significant
improvement in performance on the identification element of the sentence verification test,
suggesting that further investigations of task-learning effects in the SVT might be required).
However, there was a definite interaction between the NAL versus UK NHS fitting and time
of exposure to the re-prescription. Whilst the
initial significant differences on the response
time element of the sentence verification test
are encouraging from the point of view of demonstrating the potential advantages of such a
configuration over traditional identification
paradigms (Gatehouse, 1992b), the results as a
whole have major implications for the interpretation of previous research into hearing aids
and the design of future research and perhaps
also clinical practice. The experiments did not
attempt to address the issue of acclimatization
versus deprivation as competing hypotheses for
explaining some of the changes in speech identification ability in users of a single hearing aid.
There is certainly little evidence here of decline
in the now unfamiliar condition (if the single
finding of a decrease in the FAAF scores between Week 8 and Week 16 is ignored). The
initial comparison at Week 0 closely mimics
many reports in the literature of comparative
evaluations of hearing aid prescriptions and/or
hearing aid prescription strategies either in
individuals or in groups, where either null or
only marginally statistically significant results
are achieved. The results from the present experiments suggest that the potential benefits of
a theoretically advantageous frequency response
become apparent only after exposure to that
response. As such, the results reinforce the
arguments already put forward in Cox and
Alexander (1992) and Gatehouse (1992a) that,
when new patterns of speech cues are presented
to hearing-impaired listeners, the auditory system takes a certain amount of learning time to
make optimum use of the particular cues. The
present results throw little light, and indeed are
not intended to, on the underlying psychoacoustic
nature of the changes that take place, which
could conceivably be in the domain of streaming/grouping (for an introduction to general
principles, see Yost, 1991) or perhaps in a complex re-mapping of internal loudness coding
across frequencies.
In making the above interpretations of the
experimental finding, it has been necessary to
attribute the changes in test performance to the
differing frequency responses rather than changing patterns of hearing aid use, differing distortion characteristics across the aids employed, or
other as yet unidentified differences between
the conditions. Although such effects cannot be
categorically ruled out, there were no differences in reported daily responses or patterns of
use as assessed by simple self-report, but detailed information from daily diaries was not
available. Similarly there were no measured
differences in distortion as characterized by
simple harmonic or intermodulation indices,
but measures based on broad-band speech-like
signals were not performed. Given the substantial differences in frequency response, it appears reasonable to ascribe the changes to that
domain rather than to other potentially second-order effects. The gains measured for the low/mid
frequencies did not change across time,
although by definition the high-frequency measures did so.
The direct implications in terms of selecting
the frequency response for a particular individual or group of individuals reinforce the
already present evidence concerning the potential drawbacks of relying on immediate consumer preference on intelligibility judgments
by extending the limitations to performance
tests. Hitherto, the limitations of performance
tests have largely been confined to the problems
of configuring stable and sensitive instruments
(Shore et al, 1960; Resnick et al, 1963; Walden,
1983). The present results suggest that, even if
stable sensitive instruments can be achieved,
the fundamental limitation of acclimatization
would still potentially debar particular optima
from being identified prior to material exposure
to the particular frequency responses. There is,
however, some encouraging evidence in the significant differences even at Week 0 in the response time element of the sentence verification
test, which suggest some promise of escape from
uncertainty.
The data in Gatehouse (1992b) showed that
for a small group of 4 new hearing aid users, the
benefits of a rising over a flat frequency response only became apparent following some 4
to 6 weeks of experience. The present study on
a larger subject group further expands the argument, though here in experienced hearing aid
users. Given the previous results, the new data
suggest that the benefits of making a range of
high-frequency speech cues newly available to
the listener only become apparent when the
listener has substantial experience of, and exposure to, those new cues. The two sets of results
make it difficult to interpret the finding in
terms of subjects' motivation and effort in familiar circumstances, but rather suggest an underlying change in perceptual mechanisms.
In choosing a group of listeners with substantial "under-amplification" at high frequency,
the present experiment has allowed potentially
large acclimatization effects, due to the potentially large change in patterns of speech available to the listener. Further experiments are
required to quantify the magnitude of any improvements in performance for differing frequency
regions. Although the present data have
shown mean improvements across a group of
subjects, without extensive repeat measures as
in Gatehouse (1992b), it is not possible to quantify the number of subjects as individuals who
do show a performance improvement with time,
given the magnitude of a critical difference of
approximately 15 percent for a procedure such
as the 80-item FAAF test. The relationship
between magnitude of impairment and the auditory and nonauditory characteristics of the
listeners remains the subject of further study.
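A critical difference of approximately 15 percent for an 80-item test is consistent with a simple binomial model of the test score, although this section does not state how the figure was derived. The sketch below assumes independent items, a true score near 50 percent (where binomial variance is largest), and a two-sided 95% criterion applied to the difference between two scores from the same listener. All of these are assumptions made for illustration, and `critical_difference` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def critical_difference(n_items, p_correct=0.5, confidence=0.95):
    """Smallest difference (in percentage points) between two scores on an
    n_items test that exceeds binomial measurement error at the given
    two-sided confidence level."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    se_single = math.sqrt(p_correct * (1.0 - p_correct) / n_items)
    se_difference = math.sqrt(2.0) * se_single  # two independent scores
    return 100.0 * z * se_difference


cd_80 = critical_difference(80)
print(f"{cd_80:.1f} percentage points")
```

Under these assumptions the value is close to 15.5 percentage points, several times larger than the group mean changes reported here, which is why improvements cannot be assigned to individual subjects without repeated measures.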
The present results suggest that the current literature on comparative evaluation of
frequency responses be re-interpreted, not only
in the light of the now known limitations of the
particular speech instruments employed, but
also in terms of the time given to the various
frequency responses tested and whether this
achieved a meaningful comparison. It seems
sensible to suggest that the same process might
apply to intelligibility judgments as well as to
performance tests themselves, although the
present experimental results do not speak directly to that issue. It is also tempting to speculate that similar results might be found for
other types of processing, in particular some of
the signal processing schemes that attempt to
radically alter the nature of the speech signal
presented to the listener. If this does prove to be
the case, then the development of wearable
digital signal processing devices (as opposed to
simulations on laboratory computers where listening exposure and experience are by necessity
limited) becomes a high priority. Then experiments could be configured in such a way that
hearing-impaired listeners are given the opportunity to be exposed to, and maximally benefit
from, potentially advantageous signal processing strategies before they are assessed. In this
way, although only those schemes
with demonstrable potential benefits in the
laboratory are likely to proceed to field trials, it
may be that the laboratory benefits are rather
small and that potentially advantageous
schemes may be unnecessarily rejected because
of limitations in the experimental design.
Unlike the previous experiments by Gatehouse (1989, 1992a), the current results have
been obtained from a relatively heterogeneous
population in terms of age and hearing level and
use not only word identification in noise but
perhaps a more representative sentence-based speech test. Coupled with the results in Cox and
Alexander (1992) where changes in both measured and reported hearing-aid benefit across
time are documented (although not relative
changes of one hearing-aid prescription against
another) and similar findings in a study on
cochlear implant patients (Tyler et al, 1986), we
suggest that the role of acclimatization to particular hearing aid characteristics must be seriously considered both in the design of experimental research studies and in clinical practice
where comparative selection regimes are in
operation.
REFERENCES
Baer T, Moore BCJ, Gatehouse S. (1992). Spectral contrast enhancement of speech in noise for listeners with sensorineural hearing impairment: effects on intelligibility, quality and response times. J Rehabil Res Dev, in press.
Barfod J. (1979). Speech perception processes and fitting of hearing aids. Audiology 18:430-441.
Berger KW. (1990). The use of an articulation index to compare three hearing aid prescriptive methods. Audecibel 39:16-19.
Berger KW, Hagberg EN. (1982). Hearing aid users attitudes and hearing aid usage. Monogr Contemp Audiol 3:24.
Berger KW, Hagberg EN, Raine RL. (1984). Prescription of Hearing Aids: Rationale, Procedures and Results. 4th ed. Kent, OH: Herald.
British Society of Audiology/British Association of Otolaryngologists. (1981). Recommended procedures for pure tone audiometry using a manually operated instrument. Br J Audiol 15:213-216.
British Standards Institution. (1969). Specification for a Reference Zero for the Calibration of Pure Tone Audiometers. Data for Certain Earphones Used in Commercial Practice. BS2497 Part II. London: British Standards Institute.
British Standards Institution. (1972). Specification for a Reference Zero for the Calibration of Pure Tone Audiometers. Normal Threshold of Hearing for Pure Tones by Bone Conduction. BS2497 Part IV. London: British Standards Institute.
Brooks DN. (1989). Adult Auditory Rehabilitation. London: Chapman and Hall.
Byrne D, Dillon H. (1986). The National Acoustics Laboratories (NAL) new procedure for selecting the gain and frequency response of a hearing aid. Ear Hear 7:257-265.
Cox RM. (1985). Hearing aids and aural rehabilitation: a structured approach to hearing aid selection. Ear Hear 6:226-239.
Cox RM, Alexander GC. (1992). Maturation of hearing aid benefit: objective and subjective measurements. Ear Hear 13:131-141.
Foster JR, Haggard MP. (1979). FAAF-An efficient analytical test of speech perception. Proc Inst Acoust 182:9-12.
Foster JR, Haggard MP. (1987). The four alternative auditory feature test (FAAF): the acoustic and psychometric properties of the material with normative data in noise. Br J Audiol 21:165-174.
Gatehouse S. (1989). Apparent auditory deprivation effects of late onset: the role of presentation level. J Acoust Soc Am 86:2103-2106.
Gatehouse S. (1992a). The timecourse and magnitude of perceptual acclimatisation to frequency responses: evidence from monaural fitting of hearing aids. J Acoust Soc Am 92(3):1258-1268.
Gatehouse S. (1992b). The Evaluation of a Sentence Verification Test for the Assessment of Hearing Aid Benefit. Unpublished data.
Gelfand SA, Silman S, Ross L. (1987). Long-term effects of monaural, binaural and no amplification in subjects with bilateral hearing loss. Scand Audiol 16:201-207.
Hurley RM. (1991). Hearing Aid Use in Auditory Deprivation: A Prospective Study. Paper presented at the meeting of the American Academy of Audiology in Denver, Colorado, April 1991.
Kapteyn K. (1977). Satisfaction with fitted hearing aid. Scand Audiol 6:147-156.
Levitt H. (1971). Transformed up-down methods in psychoacoustics. J Acoust Soc Am 49:467-477.
McCandless GA, Lyregaard PE. (1983). Prescription of gain/output (POGO) for hearing aids. Hear Instr 34:16-21.
Pavlovic CV. (1988). Articulation index predictions of speech intelligibility in hearing aid selection. Asha 30:63-65.
Rankovic CM. (1991). An application of the articulation index to hearing aid fitting. J Speech Hear Res 34:391-402.
Resnick DM, Becker M. (1963). Hearing aid evaluation-a new approach. Asha 5:659-699.
Shore I, Bilger RC, Hirsh IJ. (1960). Hearing aid evaluations: reliability of repeated measures. J Speech Hear Disord 25:152-170.
Silman S, Gelfand SA, Silverman CA. (1984). Late onset auditory deprivation: effects of monaural versus binaural aids. J Acoust Soc Am 76:1357-1362.
Silverman CA. (1989). Auditory deprivation. Hear Instr 40(9):26-29.
Skinner MW, Pascoe DP, Miller JD, Popelka GR. (1982). Measurements of the region for the optimum placement of speech energy within the listener's auditory area: a basis for selecting amplification characteristics. In: Studebaker GA, Bess FH, eds. Vanderbilt Hearing Aid Report. Monogr Contemp Audiol 161-169.
Stubblefield J, Nye C. (1989). Aided and unaided time related differences in word discrimination. Hear Instr 40:38-78.
Tyler RS, Preece JP, Lansing CR, Otto ST, Gantz BJ. (1986). Previous experience as a complementary factor in comparing cochlear implant processing schemes. J Speech Hear Disord 29:282-287.
Walden BE, Schwartz DM, Williams DL, Holum-Hardegen LL, Crowley JM. (1983). Test of the assumptions underlying comparative hearing aid evaluations. J Speech Hear Disord 48:264-273.
Yost WA. (1991). Auditory image perception and analysis-The basis for hearing. Hear Res 56:8-18.