pdf

ICaD 2013
6–10 july, 2013, Łódź, Poland
The 19th International Conference on Auditory Display (ICAD-2013)
international Conference on auditory Display
July 6–10, 2013, Lodz, Poland
ANALYSIS OF THE SPECTRAL VARIATIONS IN
REPEATED HEAD-RELATED TRANSFER FUNCTION
Areti Andreopoulou
Agnieszka Roginska
Hariharan Mohanraj
MEASUREMENTS
ANALYSIS OF THE SPECTRAL VARIATIONS IN REPEATED HEAD-RELATED
TRANSFER FUNCTION MEASUREMENTS
∗
†
Music and Audio Research Laboratory (MARL)
Areti Andreopoulou1New
, Agnieszka
Rogińska2, Hariharan Mohanraj
York University
Music and Audio Research Laboratory (MARL)
New York, NY
New York University,
∗
† New York, NY
[email protected]
[email protected]
1
[email protected], [email protected]
ABSTRACT
This paper discusses the range of spectral variations between
HRTF sets measured on the same subjects. The analysis is done in
a corpus of 40 HRTF datasets of 4 subjects (10 datasets per subject). Variations are observed as a function of frequency, distance
of the ears to the sound source (ipsilateral or contralateral), and location. Assessments of the spatial quality of all datasets were made
through a subjective study which confirmed that despite their spectral variations, all individually measured HRTF sets maintained a
high degree of spatial realism. An understanding of the variability
in HRTFs can offer new intuition on objective binaural filter evaluation, and has significance in the research fields of spatial audio
reproduction and virtual auditory display.
1. INTRODUCTION
3D virtual auditory displays are created by applying the spatial
cues that are characteristic of each sources intended location on
monophonic sounds. Location-depended spatial cues are captured in the Head-Related Impulse Responses (HRIRs) or their
frequency domain equivalent Head-Related Transfer Functions
(HRTFs).
Due to the high variability in people’s head shapes and especially pinnae, HRTFs vary vastly among individuals. The use of
non-individualized HRTFs in binaural reproduction may lead to an
unconvincing spatial impression, which can compromise the realism, quality, and accuracy of the delivered message. Consequently,
various scientific approaches have emerged trying to overcome
the need for personalized data, such as HRTF customization [1],
[2], [3], database matching [4], [5], and HRTF modeling [6], [7],
[8]. Also, in an attempt to create realistically dense auditory environments, much research has been done regarding high-accuracy
HRTF interpolation methods [9], [10].
The accuracy of acquired or computed HRTFs can only be
evaluated either perceptually through a user study, or objectively
based on a defined metric, such as the mean squared error (MSE),
the correlation distance, or the signal to distortion ratio (SDR).
While in the first case a successful dataset is the one that conveys
a convincing spatial image, in the latter it is the one that demonstrates the smallest variation from an originally measured set.
Even though objective evaluation processes can be quick, they
are oversimplified as they assume uniformity in the perceptual
weights of spectral variation across frequency. Nevertheless, the
brain has a certain degree of tolerance in HRTF variations, as studies have shown that the human auditory system has the ability to
successfully adapt to altered spectral cues, given time [11]. Hence,
a more enhanced evaluation process would take into account prior
knowledge of the expected variability within HRTF sets, in order
to create perceptually driven thresholds.
Absolute variations in measured datasets have been reported
before by Riederer [12], [13], who offered a detailed overview of
the degree of error introduced to an HRTF pair by a number of
factors, such as the background noise, reflections from the measurement system, accidental movements or misalignment of the
subject, the use of ear plugs, accurate placement of the miniature
microphones etc. However, that work explored the effects of a single error-factor at a time, while in real-life situations multiple can
contribute to the final measurement outcome concurrently. In a
similar study Wersényi, and Illényi investigated the effect of nearthe-head everyday objects like hair, caps, glasses and clothing, on
HRTF datasets. The analysis was based on repeated measurements
on a computer- controlled dummy head measurement system [14].
The goal for this work is to revisit this topic by discussing
the spectral variability of binaural filters, through observation of
the MARL database, a corpus of repeated HRTF measurements
across multiple subjects. More specifically, this paper presents an
assessment of the spectral changes that can be observed between
repeated HRTF measurements on the same subject as a function of
frequency, location (azimuth / elevation angle), and distance of the
ear to the sound source (ipsilateral or contralateral filters).
2. THE DATABASE
2.1. Description of data
The database consists of repeated HRTF measurements on 4 subjects. It has a total of 40 datasets (10 sets per subject), all captured
by the Music and Audio Research Laboratory, at New York University. The spatial resolution of the database was designed to be
uniform at 10◦ horizontal and 15◦ vertical increments from −30◦
to +30◦ in elevation. The measurements were reduced to 256-tap
HRIRs to remove room reflections, and was free-field equalized to
compensate for the spectral characteristics of the speaker setup.
The measurements took place at two different locations: the
Spatial Auditory Research Lab, a 4.5 by 3.5 by 2.5 m semianechoic room, and the Dolan Studio live-room, a 9 by 4.6 by
3 m sound treated space. For both cases 5 different Genelec 8030a
speakers were positioned in a spiral configuration. The subjects
were seated on a rotating stool with adjustable height, at the center
of the spiral, at a distance of 1m from the speakers.
All data was measured using the blocked meatus tech-
213
ICAD 2013
aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER FUNCTION MEaSUREMENTS
The 19th International Conference on Auditory Display (ICAD-2013)
July 6–10, 2013, Lodz, Poland
3. SPECTRAL VARIATION IN HRTF SETS
3.1. Data post processing and variation estimation
The database contains 256-tap HRIRs in the form of minimum
phase filters with incorporated ITD information. The responses
are low-pass filtered with a cosine window at 20000 Hz to eliminate high frequency content. For the purposes of this analysis,
the length of the HRIRs was shortened to 1.5ms, to include only
the pinnae responses, and the amplitude of each filter set was normalized to the left-right ear maximum, to eliminate the impact
of overall amplitude differences between datasets on the intended
analysis. All binaural filters were band-limited between 500 Hz
and 16000 Hz using a rectangular window. In order to reduce the
weight of high frequency content on the results, signals were averaged across 12th octave bands. Ipsilateral and contralateral content data was stored independently, and analyzed separately.
The spectral variation between HRTF sets was calculated per
12th octave band. When the differences between all possible
sets, which belonged to the same subject, were calculated, the
frequency-band-dependent results were sorted and averaged out,
to indicate the average increase in dB magnitude difference across
all repeated measurements. Changes in the average spectral variations were then studied as a function of frequency, proximity to
the sound source (ipsilateral or contralateral filters), and location.
3.2. Spectral variation as a function of frequency
For this part of the analysis variation in dB magnitude across
HRTF sets is studied exclusively as a function of frequency.
Changes are observed across binaural filters from all available azimuth elevation locations, and all subjects. Results are reported at
the median value, the 75th percentile, at + 2.7 σ and at the maximum value. Figure 1 shows a graphical report of the results.
Below 1.5 kHz the median spectral variations are around 1.5
dB, the 75th percentiles 2.5 to 3.5 dB, and the upper boundaries
5.0 to 7.0 dB. From 1.5 kHz to 6 kHz the medians are between
2.5 dB and 4.0 dB, the 75th percentiles from 3.5 to 6.5 dB, and
the upper boundaries from 8.0 to 16.0 dB. For frequencies above 6
kHz there is an even larger increase in spectral variation. Median
214
20
15
10
5
15868
12862
10211
8106
6436
5110
4057
3221
2559
2033
1616
1283
1020
812
0
647
The capturing procedure of the collection was designed to explore
variability in HRTF sets while maintaining a high degree of accuracy in the measurement process. This was achieved by introducing changes in five otherwise controlled conditions: the space
where the measurements took place, the speaker setup, the removal
and readjustment of the miniature microphones between measurements of different datasets, the degree of precision in monitoring
the alignment of the subjects, and the involvement of 2 moderators with different levels of expertise in the HRTF measurement
process. A more detailed description of the database and its acquisition process can be found in [16].
25
515
2.2. Variability in the data
30
Variation in dB
nique with custom-made miniature binaural microphones using
Sennheiser’s KE − 4 capsules. For 4 of the HRTF sets the positioning of each subject was monitored with the use of laser pointers, while for the remaining the Polhemus Liberty magnetic tracker
was employed. The HRTF filters were captured in scanIR [15]
with 2 different one-second excitation signals -a Maximum Length
Sequence (MLS), and Golay codes-, sampled at 48000 Hz.
Center Frequencies in Hz
Figure 1: Box plot of the frequency dependent spectral variations.
Box edges indicate the 25th and 75th percentiles and the red line
marks the median. Variations are also marked at +2.7σ (black
stem), and at the maximum value (red cross).
values are around 5.3 dB, 75th percentiles are between 9.0 dB and
11.0 dB, and upper boundaries vary from 18.0 dB to 23.0 dB.
As far as maximum variations are concerned below 5 kHz they
range between 7.0 dB and 16.9 dB, while for frequency bands
above that they raise up to 30.0 dB. Deviations of up to 20 dB
have been reported before [14] for dummy head measurements
conducted in conditions where unwanted movements, body alignment, and microphone placements were well controlled. The increase of up to 10 dB in our results can be expected since data is
collected on human subjects.
3.3. Spectral variation as a function of distance from the ears
to the sound source
In this part spectral variations are analyzed separately for ipsilateral and contralateral filters across frequency. Results are presented at the same distribution points as above (Figure 2).
In the ipsilateral ear filter-banks, for frequencies below 1.5
kHz the median variations are around 1.0 dB, the 75th percentiles
around 2.0 dB, and the upper boundaries from 3.0 to 5.0 dB. Between 1.5 kHz and 6 kHz the medians vary between 1.5 dB and
2.9 dB, 75th percentiles between 3.5 dB and 5.0 dB, and the upper
boundaries between 5.0 dB and 11.0 dB. For frequencies above 6
kHz, median values are around 3.5 dB, 75th percentiles are between 6.0 dB and 8.0 dB, and upper boundaries vary from 11.5
dB to 17.0 dB. Looking at the maximum variations, octave bands
below 5 kHz reach maximum values between 5.0 and 12.75 dB,
while high frequency content between 16.0 dB and 23.0 dB.
A comparison of the just reported ipsilateral content results to
the contralateral spectral variations yields striking similarities between to two binaural filter groups. Both demonstrate almost identical variation patterns, with the contralateral ones being higher up
to 0.5 dB for the vast majority of the times. Yet, this is not the
case for the maximum variations. Especially for frequency bands
above 5 kHz differences between the two groups increase by up
to approximately 4.0 dB. However, given that the magnitude of
contralateral filters is lower that that of ipsilateral, and the noise
floor is significantly higher, it is questionable whether contralateral spectral variations can be equally perceived, and are of the
same importance to human listeners.
ICAD 2013
Wednesday, july 10
•
The 19th International Conference on Auditory Display (ICAD-2013)
Ipsilateral Content Filters
30
25
20
15
10
5
15868
12862
10211
8106
6436
5110
4057
3221
2559
2033
1616
1283
1020
812
647
515
Variation in dB
0
Contralateral Content Filters
30
25
20
SESSION 7: HRTF and Spatial audio
July 6–10, 2013, Lodz, Poland
significantly, especially when the source moves outside of the head
shadow (above 30◦ in elevation).
An example of this behavior from the MARL databse can be
viewed in the right column of Figure 4. The top three plots depict
the rate of change in spectral variation across frequency, for azimuth angle 90◦ , at elevations 30◦ , 0◦ , and −30◦ . For frequencies
from 1 kHz to 5 kHz one can see how the rate of increase in the
spectral variation distribution changes as a function of elevation,
being very low for high elevations and higher for upper elevations.
Another observation is that, irrespectively of elevation, frontal
plane azimuth angles yield higher spectral variations across frequency content above 5 kHz, than interaural or rear plane ones.
This can be seen by comparing the plots in the the left to those in
the right column of Figure 4. The left column contains azimuth
location 0◦ , at elevations 30◦ , 0◦ , and −30◦ , and azimuth location 50◦ on the horizontal plane, while the right column contains
azimuth locations 90◦ , at elevations 30◦ , 0◦ , and −30◦ , and 130◦
on the horizontal plane. For example, by looking at frequencies
above 5 kHz in just the bottom two plots, one can see a difference
in the maximum variation between 4.0 dB and 8.0 dB.
4. SUBJECTIVE EVALUATION OF THE HRTF
DATABASE
15
4.1. Experiment overview
10
5
15868
12862
10211
8106
6436
5110
4057
3221
2559
2033
1616
1283
1020
812
647
515
0
Center Frequencies in Hz
Figure 2: Box plots of the frequency dependent spectral variations
across ipsilateral and contralateral HRTFs. Box edges indicate the
25th and 75th percentiles and the red line marks the median. Variations are also marked at +2.7σ (black stem), and at the maximum
value (red cross).
The purpose of this experiment was to perceptually assess the spatial quality of the repeated measurements given three criteria: externalization perception, front / back discrimination, and up / down
discrimination. This was necessary in order to verify whether, in
spite of their spectral and ITD variations, individually measured
HRTF sets maintained spatial accuracy. A similar experimental
design was first introduced by Roginska et al. in [5].
A bank to 10 HRTF datasets was created for each of the 4
subjects in the study, which consisted of their personally measured
sets. All binaural filters were reduced to 128-tap minimum-phase
responses with their corresponding interaural time delays. The test
signals used were 0.5 second pink noise bursts. An overview of the
experimental procedure follows.
3.4. Spectral variation as a function of location
In the last part of the analysis spectral variations are observed as
a function of location. For every azimuth - elevation pair, average
distribution curves are computed in the dB magnitude across all
12th octave bands. Examples of the combined results for a single
location can be viewed in Figure 4, where each of the subplots
demonstrates the degree of change (from minimum to maximum)
in the spectral variations across frequencies.
Results indicate that there is a relationship between the rate
of increase in spectral variation and the elevation angle, for frequencies between 1 kHz and 5 kHz. In this frequency range highelevation distribution vectors are relatively flat, and their maximum values stay within the boundaries of the 75th percentiles, as
reported above for ipsilateral content filters. However, as elevations decrease variations become less flat and the increase more
drastic, reaching at −30◦ elevations the previously reported upper
boundary rates of spectral variation rates. Similar observations
have been reported by Wersényi in his analysis of repeated HRTF
measurements on dummy head mannequins [17]. Results have
shown that as the elevation of a sound source increases spectral
variations become flatter, and re-measurement accuracy improves
R
Figure 3: The MATLAB
interface used in the study
4.2. Experimental procedure
The experiment was conducted in the Spatial Auditory Research
Lab at New York University. Stimuli were presented to all subjects
through the Sennheiser HD650 open headphones. The graphical
interphase used for playback and collection of the user responses
R
was designed in MATLAB
2010 b . (Figure 3).
215
ICAD 2013
The 19th International Conference on Auditory Display (ICAD-2013)
July 6–10, 2013, Lodz, Poland
aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER FUNCTION MEaSUREMENTS
The 19th International Conference on Auditory Display (ICAD-2013)
July 6–10, 2013, Lodz, Poland
Azimuth Angle: 0 Elevation Angle: 30
Azimuth Angle: 90 Elevation Angle: 30
10
10
Azimuth Angle: 0 Elevation Angle: 30
Azimuth Angle: 90 Elevation Angle: 30
8 10
810
6 8
6 8
4 6
4 6
20
20
18
4
2
18
Azimuth Angle: 0 Elevation Angle: 0
10
11
46
11
0
46
0
15
8
15
86 68
8
51
5
Azimuth Angle: 90 Elevation Angle: 0
16
16
Azimuth Angle: 90 Elevation Angle: 0
10
10
8
57
35
57
35
81
06
81
06
5
51
5
51
51
5
Azimuth Angle: 0 Elevation Angle: 0
20
33
20
33
28
71
28
71
40
57
40
57
2
72
5
72
5
10
20
10
20
14
40
14
40
20
33
20
33
28
71
28
71
40
57
40
57
57
35
57
35
81
06
81
06
11
46
0
11
46
0
15
86
8
15
86
8
2
4
72
5
72
5
10
20
10
20
14
40
14
40
2
10
8
8
8
14
14
6
4 4
2 2
2 2
1212
51
515
5
72
725
5
10
2
10 0
20
14
1440
40
4 4
10
Azimuth Angle: 90 Elevation Angle: −30
Azimuth Angle: 0 Elevation Angle: −30
Azimuth Angle: 0 Elevation Angle: −30
10
10
57
5735
35
81
8106
06
11
11460
46
0
15
15868
86
8
6
20
2033
33
28
7
28 1
71
40
4057
57
6
51
5
51
5
72
5
72
5
10
20
10
20
14
40
14
40
20
33
20
33
28
71
28
71
40
5
40 7
57
57
35
57
35
81
06
81
06
11
4
11 60
46
0
15
8
15 68
86
8
10
Azimuth Angle: 90 Elevation Angle: −30
10
10
8
8
8
8
6
8
6
6
8
6
4
4
4
4
2
2
6
10
8
86 15
8 86
0
15
6
46 11
0 46
10
06 8
81
11
5
73
7
05
35 5
57
1
40
28
57 4
87
71 2
03
33 2
44
3
0
0
02
40 1
20
4
Azimuth Angle: 130 Elevation Angle: 0
10 8
10 8
8 6
8 6
4
4
6
2
4
Azimuth Angle: 130 Elevation Angle: 0
6
10
Azimuth Angle: 50 Elevation Angle: 0
6
14
10
20 1
5 7
25
5 5
15
51
8
86
0
46
011
86
15
46
11
Azimuth Angle: 50 Elevation Angle: 0
815
06
81
35
06
81
40
57
57
35
40
57
57
71
28
33
28
14
71
20
40
40
33
14
10
20
10
72
20
5
5
72
51
5
51
20
2
5
2
72
Spectral Variation
Spectral Variation
6
4
4
2
2
2
15
86
8
0
11
46
06
81
35
57
57
40
28
71
33
20
40
14
20
68
15
8
0
46
11
06
81
5
57
3
40
57
71
28
33
20
14
40
10
5
20
0
72
5
51
10
72
51
5
5
8
86
0
46
15
8
2
Center Frequency in Hz
15
86
11
81
06
60
11
4
06
81
35
57
57
5
57
3
28
40
7
40
5
71
28
20
33
20
71
33
40
14
14
40
20
10
20
10
5
72
72
5
51
5
51
5
0
2
Center Frequency in Hz
Figure 4: Rate of change in spectral variations as a function of location for frequencies 500 - 16000 Hz. The x-axis denotes center
frequencies, the y-axis the indexes of the 10-point distribution vectors, and the color-coding the magnitudes of the spectral variations in
dB. 8 different locations are depicted in the graph, azimuth angles 0◦ and 90◦ , across elevations ±30◦ and 0◦ , and azimuth angles 40◦ and
130◦ 4:
on Rate
the horizontal
Figure
of changeplane.
in spectral variations as a function of location for frequencies 500 - 16000 Hz. The x-axis denotes center
frequencies, the y-axis the indexes of the 10-point distribution vectors, and the color-coding the magnitudes of the spectral variations in
dB. 8 different locations are depicted in the graph, azimuth angles 0◦ and 90◦ , across elevations ±30◦ and 0◦ , and azimuth angles 40◦ and
- E) and were instructed to select all that met the stage-specific
The
130◦ on
theprocedure
horizontalconsisted
plane. of three stages designed to test the
spatial quality of the involved HRTFs, rather than the perceived
localization accuracy. In each of the stages the trials were created by randomly selected datasets convolved with the test signal.
The
procedure
consisted
three stages
test the(A
For
each
trial subjects
were of
presented
with 5designed
different to
intervals
spatial quality of the involved HRTFs, rather than the perceived
localization accuracy. In each of the stages the trials were created by randomly selected datasets convolved with the test signal.
For each trial subjects were presented with 5 different intervals (A
216
criterion. HRTF sets were presented 5 times in each of the three
experimental stages. Stages 2 and 3 (front / back and up / down
discrimination) only used binaural filters that were selected at least
- E)
and
instructed
to select all that
60%
of were
the time
in the externalization
task.met the stage-specific
criterion. HRTF sets were presented 5 times in each of the three
experimental stages. Stages 2 and 3 (front / back and up / down
discrimination) only used binaural filters that were selected at least
60% of the time in the externalization task.
ICAD 2013
The
The
19th
19th
International
International
Conference
Conference
onon
Auditory
Auditory
Display
Display
(ICAD-2013)
(ICAD-2013)
Subject
Subject
AA
Wednesday, july 10
Personalized
Personalized
HRTF
HRTF
Evaluation
Evaluation
100
100
8080
8080
6060
6060
4040
4040
2020
2020
HRTF Evaluation Rate
HRTF Evaluation Rate
100
100
0 0
P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10
P10
•
SESSION 7: HRTF and Spatial audio
July
July
6–10,
6–10,
2013,
2013,
Lodz,
Lodz,
Poland
Poland
Subject
Subject
HH
Extern.
Extern.
Fr /FrB/ B
UpUp
/ Dn
/ Dn
0 0
P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10
P10
Subject
Subject
SS
Subject
Subject
MM
100
100
100
100
8080
8080
6060
6060
4040
4040
2020
2020
0 0
P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10
P10
0 0
P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10
P10
HRTF
HRTF
Dataset
Dataset
IDID
Figure
Figure5:5:Evaulation
Evaulationofofthetheindividually
individuallymeasured
measuredHRTF
HRTFsets
setsofofthetheMARL
MARLdatabase,
database,forforeach
eachcriterion
criterionseparately,
separately,across
acrossallall4 4subjects.
subjects.
Bar
Bar
colors
colors
represent
represent
thethe
3 criteria
3 criteria
ofof
thethe
study,
study,
and
and
identifiers
identifiers
P1P1
- P10
- P10
thethe
1010
individually
individually
measured
measured
HRTFs
HRTFs
forfor
each
each
subject.
subject.
More
More
specifically,
specifically,stage
stage
1 of
1 of
thethe
experiment
experiment
tested
tested
thethe
ability
ability
ofofeach
eachHRTF
HRTFsetsettotoconvey
conveyexternalized
externalizedimages.
images.Each
Eachinterval
interval
consisted
consistedofofa monophonic
a monophonicreference
referencesignal,
signal,followed
followedbybya series
a series
ofof5 5signals
signalsspatialized
spatializedat atvarious
variouslocations
locationsonon5 5elevations
elevationsfrom
from
◦ ◦
◦ ◦
◦ ◦
◦ ◦
−30
−30
toto+30
+30
. .The
Theazimuth
azimuthlocations
locationsused
usedwere
were±30
±30
, ±60
, ±60
, ,
◦ ◦
◦ ◦
◦ ◦
±90
±90, ±120
, ±120, and
, and
±150
±150. .
InInstage
stage2 2subjects
subjectswere
wereasked
askedtotoselect
selectintervals
intervalsforforwhich
which
they
theycould
coulddiscriminate
discriminatefront
frontfrom
fromback.
back.Each
Eachinterval
intervalconsisted
consisted
ofofa monophonic
a monophonicreference
referencesignal,
signal,followed
followedbyby3 3pairs
pairsofofsignals,
signals,
alternating
alternating
onon
thethe
horizontal
horizontal
plane
plane
onon
azimuths
azimuths
locations
locations
at at
oppoopposite
site
sides
sides
ofof
thethe
cone
cone
ofof
confusion.
confusion.The
The
utilized
utilized
azimuthal
azimuthal
angles
angles
◦ ◦
◦ ◦
◦ ◦
◦ ◦
◦ ◦
were
were
0◦0, ◦±30
, ±30
, ±60
, ±60
, ±120
, ±120
, ±150
, ±150
, and
, and
180
180
, ,
Stage
Stage
3 of
3 of
thethe
experiment
experiment
presented
presented
subjects
subjects
with
with
anan
upup
/ down
/ down
discrimination
discrimination
task.
task.Stimuli
Stimuli
consisted
consisted
ofof
a monophonic
a monophonic
reference
reference
◦ ◦
signal,
signal,followed
followedbyby3 3pairs
pairsofofsignals
signalsalternating
alternatingbetween
between±30
±30
elevations.
elevations.Users
Userswere
wereinstructed
instructedtotoselect
selectallallintervals
intervalsforforwhich
which
they
they
perceived
perceived
a change
a change
inin
elevation.
elevation.The
The
presented
presented
azimuth
azimuth
localoca◦ ◦
◦ ◦
◦ ◦
tions
tions
were
were
0◦0, ◦±30
, ±30
, ±60
, ±60
, and
, and
±90
±90
. .
4.3.
4.3.Results
Results
Figure
Figure5 5presents
presentsthethesubjects’
subjects’assessments
assessmentsofoftheir
theirpersonalized
personalized
sets,
sets,forforevery
everyone
oneofofthethethree
threeexperimental
experimentalcriteria.
criteria.InInterms
termsofof
externalization
externalizationperception,
perception,allallHRTFs
HRTFswere
wererated
ratedvery
veryhighly
highlyinin
their
theirability
abilitytotocreate
createconvincing
convincingauditory
auditoryimages
imagesoutside
outsideofofthethe
subjects’
subjects’
heads,
heads,
receiving
receiving
a selection
a selection
rate
rate
ofof
8080
- 100%.
- 100%.
These
These
rates,
rates,
however,
however,
dropped
dropped
byby
2020
- 40%
- 40%
forfor
thethe
front
front
/ back
/ back
discrimination
discriminationcase.
case.ByBylooking
lookingat atthethegraph,
graph,weweseeseethat
thatfront
front/ /
back
back
discrimination
discrimination
was
was
thethe
weakest
weakest
rated
rated
task
task
across
across
allall
subjects.
subjects.
This
Thisbehavior
behaviorwas
wasrather
ratherexpected,
expected,asasthetheexperiment
experimentconsisted
consisted
only
only
ofof
static
static
(not
(not
following
following
head
head
movement)
movement)
auditory
auditory
cues,
cues,
which
which
areare
notnot
strong
strong
enough
enough
toto
contradict
contradict
visual
visual
components
components
reporting
reporting
nono
potential
potential
sound
sound
sources
sources
inin
front
front
ofof
thethe
subjects.
subjects.However,
However,
despite
despite
this
thisdrop,
drop,allallfilter
filtersets
setsremained
remainedat atororabove
abovethethe60%
60%evaluation
evaluation
margin,
margin,with
withthetheexception
exceptionofofmeasurement
measurement8 8ininsubject
subjectH Hwhich
which
received
received
20%
20%
preference
preference
only
only
onon
thethe
given
given
task.
task.
Individually
Individually
measured
measured
HRTFs
HRTFs
received
received
a 20%
a 20%
increase
increase
inin
thethe
ratings
ratings
ofof
thethe
upup
/ down
/ down
discrimination
discrimination
task,
task,
compared
compared
toto
thethe
front
front
/ back
/ backstage.
stage.However,
However,these
thesescores
scoreswere
werestill
stillmost
mostofofthethetimes
times
lower
lowerthan
thanininexternalization
externalizationcriterion
criterioncase.
case.This
Thisbehavior
behaviorwas
was
unanimous
unanimousacross
acrosssubjects
subjectsand
andforforallalldatasets,
datasets,with
withvery
veryfew
fewex-exceptions,
ceptions,
like
like
HRTF
HRTF
4 in
4 in
subject
subject
M,M,
oror
HRTF
HRTF
1010
inin
subject
subject
S.S.
InIngeneral,
general,allallsubjects
subjectsindicated
indicateda strong
a strongpreference
preferencetowards
towards
individually
individuallymeasured
measuredHRTFs,
HRTFs,bybyselecting
selectingthem
themat atleast
least60%
60%ofof
thethe
time
time
inin
most
most
cases.
cases.AllAll
binaural
binaural
filters
filters
were
were
consistently
consistently
evalevaluated
uatedabove
abovechance
chanceasassuccessful
successfulmeans
meansforforrealistic
realisticsound
soundspaspatialization,
tialization,regardless
regardlessofoftheir
theirpotential
potentialvariations.
variations.Such
Suchfindings
findings
support
support
thethe
known
known
ability
ability
ofof
thethe
human
human
auditory
auditory
system
system
toto
accept
accept
/ disregard
/ disregard
variations
variations
inin
HRTF
HRTF
datasets
datasets
within
within
certain
certain
boundaries.
boundaries.
5. 5.DISCUSSION
DISCUSSION
AND
AND
CONCLUSIONS
CONCLUSIONS
This
Thisstudy
studyinvestigated
investigatedthetherange
rangeofofspectral
spectralvariation
variationbetween
between
thth
HRTF
HRTF
datasets
datasets
originating
originating
from
from
thethe
same
same
subject,
subject,
across
across
1212
oc-octave
tave
bands
bands
from
from
500
500
toto
16000
16000
Hz.
Hz.Variations,
Variations,
observed
observed
asas
a funca function
tionofoffrequency,
frequency,distance
distanceofofthetheears
earstotothethesound
soundsource,
source,and
and
thth
location,
location,arearereported
reportedat atthethemedian
medianvalues,
values,thethe7575
percentile,
percentile,
and
and
at at
++
2.72.7
standard
standard
deviations
deviations
from
from
thethe
mean.
mean.Maximum
Maximum
values
values
areare
also
also
provided.
provided.
It It
is is
anan
extension
extension
ofof
Riederer’s
Riederer’s
[12]
[12]
work
work
onon
thethe
repeatability
repeatability
ofofHRTF
HRTFmeasurements.
measurements.That
Thatresearch
researchpresented
presenteda acase
caseforforthethe
most
most
accurate
accurate
method
method
ofof
capturing
capturing
binaural
binaural
filters,
filters,byby
identifying
identifying
thetheimpact
impactofofdifferent
differentcontrolled
controlledconditions
conditionsononthethemeasurement
measurement
outcome,
outcome,
and
and
thethe
appropriate
appropriate
experimental
experimental
adjustments
adjustments
that
that
could
could
minimize
minimizeit.it.The
Thesignificance
significanceofofmodifying
modifyingeach
eachofofthethecontrol
control
conditions,
conditions,quantified
quantifiedasasthethechange
changeininbinaural
binauralfilters,
filters,was
wasanaanalyzed
lyzed
separately.
separately.However
However
inin
this
this
work
work
thethe
impact
impact
ofof
allall
controlled
controlled
conditions
conditions
is is
studied
studied
collectively,
collectively,
asas
inin
regular
regular
HRTF
HRTF
measurement
measurement
217
ICAD 2013
[2] S. Xu, Z. Li, and G. Salvendy, “Improved method to individualize head-related transfer function using anthropometric
measurements,” Acoustical Science and Technology, vol. 29,
no. 6, pp. 388–390, 2008.
[3] K. J. Faller, A. Barreto, and M. Adjouadi, “Decomposition
of Head-Related
Transfer Functions into Multiple Damped
aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER
FUNCTION MEaSUREMENTS
The 19th International Conference on Auditory Display (ICAD-2013)
July 6–10,
2013, Lodz,
and Delayed Sinusoidals,” in Advanced
Techniques
in Poland
Computing Sciences and Software Engineering, K. Elleithy, Ed.
Springer Netherlands, 2010, pp. 273–278.
routines more than one of them may occur at a time, affecting the
data.
The analysis is based on a collection of 40 HRTF datasets on
The
19th International
Conference
onon
Auditory
(ICAD-2013)
azimuthal
grid, across
4 subjects
(10 sets per
subject),
a 10◦ Display
◦
◦
5 elevations from −30 to +30 . It is expected that the more
complete data collection of this work will give a broader overview
of the variability
theofHRTF
a wider
rangethe
of
routines
more thaninone
them spectrum
may occuracross
at a time,
affecting
locations,
given
a
greater
number
of
repeated
measurements.
data.
Observations
inter-HRTF
variations
frequency,
The
analysis isofbased
on a collection
of across
40 HRTF
datasets ear
on
◦
(ipsilateral
or
contralateral),
and
azimuth-elevation
location,
can
4 subjects (10 sets per subject), on a 10 azimuthal grid, across
◦
◦
used to create
the
5beelevations
frommore
−30dynamic
to +30ways
. Itofisobjectively
expected evaluating
that the more
quality
of data
non acoustically
For overview
example,
complete
collection ofmeasured
this work binaural
will givefilters.
a broader
they
be applied
definition
of thresholds
the range
distance
of thecan
variability
in to
thetheHRTF
spectrum
across a for
wider
of
functions
evaluatenumber
HRTFs.
Even though
such thresholdlocations, used
giventoa greater
of repeated
measurements.
ing will
still be not
we believe
that it will ear
ofObservations
ofcognitively
inter-HRTFdriven,
variations
across frequency,
fer
a
better
approximation
of
the
perceptual
HRTF
evaluation
pro(ipsilateral or contralateral), and azimuth-elevation location, can
cess,
than
the normally
used metrics,
as the mean
squared
be used
to create
more dynamic
ways ofsuch
objectively
evaluating
the
error
(MSE),
which
assume
uniformity
in
the
perceptual
weights
quality of non acoustically measured binaural filters. For example,
of spectral
frequency.
they
can bevariations
applied toacross
the definition
of thresholds for the distance
Knowledge
the expected
range
of though
variation
in thresholdmeasured
functions
used toofevaluate
HRTFs.
Even
such
HRTF
sets
can
offer
new
intuition
on
objective
binaural
filter
ing will still be not cognitively driven, we believe that it willevalofuation.
Several
spatial audio
researchHRTF
fields,evaluation
such as HRTF
fer a better
approximation
of related
the perceptual
procustomization
HRTFused
modeling,
which
currently
primarily
rely
cess, than the and
normally
metrics,
such
as the mean
squared
on perceptual
evaluation
tasks,
could benefit
that, by
gainerror
(MSE), which
assume
uniformity
in the from
perceptual
weights
ing
a more variations
robust method
of frequency.
assessing the quality of their designs.
of spectral
across
ThisKnowledge
research canofalso
lead
to
morerange
effective
metrics for
the expected
of variation
incomputing
measured
HRTF
similarity,
which
be directly
applied binaural
to database
matchHRTF sets
can offer
newcan
intuition
on objective
filter
evaling
tasks.
Finally,
assessments
of
the
accuracy
of
different
uation. Several spatial audio related research fields, such as HRTF
HRTF
interpolation algorithms
becomewhich
more currently
effective as
filter evalucustomization
and HRTFwill
modeling,
primarily
rely
ations
will be done
according
to perceptual
boundaries,
rather
than
on perceptual
evaluation
tasks,
could benefit
from that,
by gainstrict
mathematical
models.
ing a more
robust method
of assessing the quality of their designs.
work
studying
the database
distributions.
ThisFuture
research
canwill
alsofocus
lead toonmore
effective
metrics for
computing
This
analysis
will
lead
to
a
new
representation
of
HRTFs
will
HRTF similarity, which can be directly applied to databasethat
matchfavortasks.
discrimination
between sets
belong toof
dissimilar
ing
Finally, assessments
of that
the accuracy
differentpeople,
HRTF
while
at the same
time promoting
groupings
of moreassimilar
ones
interpolation
algorithms
will become
more effective
filter evaluin
the
same
class.
Such
a
representation
will
have
to
be
supported
ations will be done according to perceptual boundaries, rather than
by
further
subjectivemodels.
evaluation studies, which will assure that the
strict
mathematical
computed
models
do
contradict
cognitive
assessments.
Future work will not
focus
on studying
the database
distributions.
This analysis will lead to a new representation of HRTFs that will
favor discrimination between
sets that belong to dissimilar people,
6. REFERENCES
while at the same time promoting groupings of more similar ones
F. G.class.
Katz,Such
“Boundary
element method
calculation
of indiin[1]
theB.
same
a representation
will have
to be supported
vidualsubjective
head-related
transferstudies,
function,”
Journal
of the that
Acousby further
evaluation
which
will assure
the
tical models
Society do
of not
America,
vol.cognitive
110, no.assessments.
5, pp. 2440–2455,
computed
contradict
2001.
[2] S. Xu, Z. Li, and G.
“Improved method to individ6. Salvendy,
REFERENCES
ualize head-related transfer function using anthropometric
[1] B.
F. G. Katz, “Boundary
element
method
calculation vol.
of indimeasurements,”
Acoustical
Science
and Technology,
29,
vidual
head-related
no. 6, pp.
388–390, transfer
2008. function,” Journal of the Acoustical Society of America, vol. 110, no. 5, pp. 2440–2455,
[3] K. J. Faller, A. Barreto, and M. Adjouadi, “Decomposition
2001.
of Head-Related Transfer Functions into Multiple Damped
[2] and
S. Xu,
Z. Li, and
G. Salvendy,
“ImprovedTechniques
method to in
individDelayed
Sinusoidals,”
in Advanced
Comualize
using K.
anthropometric
puting head-related
Sciences and transfer
Softwarefunction
Engineering,
Elleithy, Ed.
measurements,”
Acoustical
and Technology, vol. 29,
Springer Netherlands,
2010,Science
pp. 273–278.
no. 6, pp. 388–390, 2008.
[4] D. Zotkin, J. Hwang, R. Duraiswaini, and L. Davis, “HRTF
[3] personalization
K. J. Faller, A. Barreto,
and M. Adjouadi,
“Decomposition
using anthropometric
measurements,”
in
of Head-Related
Transfer Functions
into Multiple
Damped
IEEE
Workshop on.Applications
of Signal
Processing
to Auand and
Delayed
Sinusoidals,”
Advanced
in Comdio
Acoustics,
Inst. for in
Adv.
Comput.Techniques
Studies, Maryland
puting Sciences
Engineering,
K. Elleithy,
Univ.
College and
Park,Software
MD, USA:
IEEE, October
2003, Ed.
pp.
Springer Netherlands, 2010, pp. 273–278.
157–160.
[4] D. Zotkin, J. Hwang, R. Duraiswaini, and L. Davis, “HRTF
personalization using anthropometric measurements,” in
IEEE Workshop on.Applications of Signal Processing to Au218 dio and Acoustics, Inst. for Adv. Comput. Studies, Maryland
Univ. College Park, MD, USA: IEEE, October 2003, pp.
157–160.
[5] D.
A. Zotkin,
Roginska,
T. Santoro,
and G. Wakefield,
“Stimulus[4]
J. Hwang,
R. Duraiswaini,
and L. Davis,
“HRTF
dependent HRTFusing
preference,”
in 129th measurements,”
Audio Engineering
personalization
anthropometric
in
Society
Convention,
San Francisco,
CA, USA,
2010. to AuIEEE
Workshop
on.Applications
of Signal
Processing
July 6–10,Studies,
2013, Lodz, Poland
andDurant,
Acoustics,
Inst. for Adv.
Comput.
[6] dio
E. A.
S. Member,
and G.
H. Wakefield, Maryland
“Efficient
Univ.
College
Park,
MD,
USA:
IEEE,
October
2003, Appp.
Model Fitting Using a Genetic Algorithm : Pole-Zero
157–160.
proximations of HRTFs,” Audio, vol. 10, no. 1, pp. 18–27,
[5] 2002.
A. Roginska, T. Santoro, and G. Wakefield, “Stimulusdependent HRTF preference,” in 129th Audio Engineering
[7] A. Kulkarni and H. S. Colburn, “Infinite-impulse-response
Society Convention, San Francisco, CA, USA, 2010.
models of the head-related transfer function,” The Journal of
[6] E.
Durant, S.
Member,
and G. vol.
H. Wakefield,
theA.
Acoustical
Society
of America,
115, no. 4,“Efficient
pp. 1714
Model
– 1728,Fitting
2004. Using a Genetic Algorithm : Pole-Zero Approximations of HRTFs,” Audio, vol. 10, no. 1, pp. 18–27,
[8] H. Hu, L. Chen, and Z. Y. Wu, “The Estimation of Person2002.
alized HRTFs in Individual VAS,” in 4th International Con[7] A.
Kulkarni
and H. Computation,
S. Colburn, “Infinite-impulse-response
ference
on Natural
vol. 1. Ieee, 2008, pp.
models
of the head-related transfer function,” The Journal of
203–207.
the Acoustical Society of America, vol. 115, no. 4, pp. 1714
[9] T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli, “Interpola– 1728, 2004.
tion of head related transfer functions considering acoustics,”
[8] H.
Hu,Audio
L. Chen,
and Z. Y.Society
Wu, “The
Estimation
of Person118th
Engineering
Convention,
2005.
alized HRTFs in Individual VAS,” in 4th International Con[10] Z. Xiao-Li and X. Bo-Sun, “Spatial characteristics of headference on Natural Computation, vol. 1. Ieee, 2008, pp.
related transfer function,” Chinese Physics Letters, vol. 22,
203–207.
no. 5, pp. 1166 – 1169, 2005.
[9] T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli, “Interpola[11] P. M. Hofman, J. G. Van Riswick, and A. J. Van Opstal, “Retion of head related transfer functions considering acoustics,”
learning sound localization with new ears.” Nature neuro118th Audio Engineering Society Convention, 2005.
science, vol. 1, no. 5, pp. 417–421, September 1998.
[10] Z. Xiao-Li and X. Bo-Sun, “Spatial characteristics of head[12] K. A. J. Riederer, “Repeatability analysis of head-related
related transfer function,” Chinese Physics Letters, vol. 22,
transfer function measurements,” in 105 thAudio Engineerno. 5, pp. 1166 – 1169, 2005.
ing Society Convention, 9 1998.
[11] P. M. Hofman, J. G. Van Riswick, and A. J. Van Opstal, “Re[13] ——, “Head-related transfer function measurements, master
learning sound localization with new ears.” Nature neurothesis,,” Ph.D. dissertation, Helsinki University of Technolscience, vol. 1, no. 5, pp. 417–421, September 1998.
ogy, 1998.
[12] K. A. J. Riederer, “Repeatability analysis of head-related
[14] G. Wersényi and A. Illényi, “Differences in dummy-head
transfer function measurements,” in 105 thAudio EngineerHRTFs caused by the acoustical environment near the head,”
ing Society Convention, 9 1998.
Electronic Journal of Technical Acoustics, vol. 1, pp. 1–15,
[13] ——,
“Head-related
transfer
function measurements, master
2005. [Online].
Available:
http://www.ejta.org
thesis,,” Ph.D. dissertation, Helsinki University of Technol[15] B. Boren and A. Roginska, “Multichannel Impulse Response
ogy, 1998.
Measurement in MATLAB,” in 131st Audio Engineering So[14] G.
and New
A. Illényi,
“Differences
in dummy-head
cietyWersényi
Convention,
York, NY,
October 2011.
HRTFs caused by the acoustical environment near the head,”
[16] A. Andreopoulou, A. Roginska, and H. Mohanraj, “A
Electronic Journal of Technical Acoustics, vol. 1, pp. 1–15,
Database of Repeated Head-Related Transfer Function Mea2005. [Online]. Available: http://www.ejta.org
surements,” in 19th International Conference on Auditory
[15] B.
Boren(ICAD
and A.2013),
Roginska,
“Multichannel
Impulse
Response
Display
Lodz,
Poland, July 6-10
2013.
Measurement in MATLAB,” in 131st Audio Engineering So[17] G. Wersényi, “Directional Properties of the Dummy-Head
ciety Convention, New York, NY, October 2011.
in Measurement Techniques based on Binaural Evaluation,”
[16] A.
Andreopoulou,
A. Roginska,
H. Mohanraj,
Journal
of Engineering,
Computingand
& Architecture,
vol.“A
1,
Database
Repeated
no. 2, pp. of
1–15,
2007. Head-Related Transfer Function Measurements,” in 19th International Conference on Auditory
Display (ICAD 2013), Lodz, Poland, July 6-10 2013.
[17] G. Wersényi, “Directional Properties of the Dummy-Head
in Measurement Techniques based on Binaural Evaluation,”
Journal of Engineering, Computing & Architecture, vol. 1,
no. 2, pp. 1–15, 2007.
[17] G.
in
Jou
no.