Audibility of head-above-torso orientation in head-related

Audibility of head-above-torso orientation in
head-related transfer functions
Fabian Brinkmann, Reinhild Roden, Alexander Lindau, Stefan Weinzierl
Audio Communication Group, TU Berlin, Einsteinufer 17c, 10587 Berlin, Germany
Summary
Head-related transfer functions (HRTFs) incorporate the fundamental cues for human spatial hearing,
and are often applied to auralize sound fields based on numerical room acoustic modeling techniques.
HRTFs are typically available for various directions of sound incidence and a fixed head-above-torso
orientation (HATO). Consequently, if head movements are tracked in order to exchange HRTFs in
interactive auralizations, they correspond to a listener turning head and torso simultaneously, while
– in reality – listeners usually turn their head independently above a fixed torso. Earlier studies
showed informal evidence that the influence of the head-above-torso orientation might be audible. A
convincing perceptual evaluation was, however, not available so far. In the present study we used an
ABX listening test design to assess whether correctly accounting for head-above-torso orientations will
be audible in a dynamic binaural simulation. Eleven subjects compared interactive simulations with
constant and variable head-above-torso orientations for different directions of sound incidence and
for different audio stimuli. Results show that differences were audible for all tested source positions
and stimuli thereby underlining the perceptual relevance of correct head-above-torso orientations for
binaural synthesis.
PACS no. 43.60.Sx, 43.66.Pn
1. Introduction
Numerical modeling techniques used to simulate the
sound field in room acoustical environments are typically auralized by superposing the direct sound and
each reflection at a listening position with corresponding HRTFs to obtain binaural room impulse responses
(BRIRs, [1, p. 272]). In order to obtain a stable
and plausible representation of the simulated environment, it is important to account for head movements
of the listener, which help to resolve front-back confusions [2], and are naturally used to judge attributes
such as the envelopment, timbre or the width and location of a sound source [3, 4]. However, HRTFs are
usually measured for different angles of sound incidence relative to a fixed dummy head or head and
torso simulator, so that head movements at the reproduction stage will necessarily correspond to a listener
moving head and torso (Fig. 1, left). In a physiologically typical situation, however, the head would be rotated independently above a fixed torso (Fig. 1, right).
The effect of the torso on HRTFs was extensively
studied by Algazi et al. [5] for static binaural synthesis
(c) European Acoustics Association
and a neutral head-above-torso orientation. Authors
found that in case the direct path from the sound
source to the ear is blocked by the torso, shadowing
occurs for frequencies above approximately 100 Hz,
causing increasing attenuation of up to 25 dB. For
other directions of sound incidence, the torso was
shown to act as a reflector causing comb-filters with
an amplitude of up to ±5 dB to interfere with the
magnitude spectrum of the HRTF. The exact positions of peaks and dips of the comb filter thus depends
mainly on the source elevation. For a source above the
listener the first dip was shown to occur already at a
frequency as low as 700 Hz. Above 3 kHz however, the
HRTF is dominated by distinct head and pinnae cues
with a magnitude of ±20 dB [6, 7].
From an analysis of HRTFs measured for various
head-above-torso orientations, Guldenschuh et al. [8]
found, that most prominent torso reflections occur if
ear, shoulder, and source are approximately aligned,
and if the source elevation is within 20◦ below to 40◦
above the horizontal plane. It was further hypothesized, that effects caused by the torso are audible at
least for critical source positions.
Despite the dominating role of head and pinnae effects on the HRTF, Genuit [9] already assumed torso
effects to hold localization cues at frequencies below
3.5 kHz. Algazi et al. [7] conducted localization experiments with stimuli low-passed at 3 kHz, and showed
FORUM ACUSTICUM 2014
7-12 September, Krakow
Brinkmann et al.: Audibility of head orientation in HRTFs
Figure 2. Photo of the HRTF measurement setup taken
while adjusting the source position with the help of a laser
mounted below FABIAN’s left ear.
Figure 1. Illustration of head movements with constant
(left) and variable (right) head-above-torso orientation.
that such cues aid in detecting the elevation of sound
sources outside the median plane.
In summary, several previous studies cited above support the hypothesis that accounting for correct headabove-torso orientations is necessary for a perceptually transparent binaural synthesis. Thus, in the
present study, we physically and perceptually examined differences between dynamic auralizations allowing for either constant or variable head-above-torso
orientations.
2. HRTF measurements
Before being able to assess the effect of headabove-torso orientation, an appropriate HRTF data
set was measured with the head and torso simulator FABIAN [10]. It is equipped with a softwarecontrolled neck joint, allowing for a precise control
of the head-above-torso orientation in multiple degrees of freedom. Transfer functions were measured
in the fully anechoic chamber of the TU Berlin
(V = 1850 m3 , fc =63 Hz).
HRTFs were measured for head orientations in a range
of ±82◦ azimuth with a resolution of 0.5◦ for six sound
sources each distributed on the frontal hemisphere.
Each head orientation was measured (a) for a constant and (b) for a variable head-above-torso orientation. The range of head movements was chosen according to the maximum range of motion [12, 13]. The
spatial resolution was assumed to be imperceptibly
fine, as it was smaller than the worst-case localization
blur of 0.75◦ reported by Blauert [14, p. 39]. In total,
329 HRTFs were measured per source position and for
both, constant and variable head-above-torso orientation.
Sound source positions were chosen to be both typical (e.g. in front of a listener) and particularly critical with respect to a strong shoulder/torso effect.
The latter is the case for source positions causing the
torso to act either as a strong reflector or as an obstacle for the sound field at the ears. Hence, sound
source positions are particularly critical when being
aligned with shoulder and ear, or when resulting in
a strongly shadowed contralateral ear. Consequently,
sources were placed as follows: (1) ϕ = 0◦ azimuth;
ϑ = 90◦ elevation, (2) 315◦ ; 30◦ , (3) 0◦ ; 0◦ , (4) 45◦ ;
0◦ , (5) 90◦ ; 0◦ , and (6) 315◦ ; −30◦ . Hereby, azimuth
angles of 90◦ denote sources to the left of FABIAN,
while positive elevations refer to sources above the
horizontal plane. Exact source positions were determined using a laser attached to FABIAN’s neck joint.
Sources were placed at distances between 2.1 m and
2.6 m, thus avoiding proximity effects [15, 16] and
ensuring that reflections from the speakers could be
removed by windowing.
For excitation, we used sine sweeps between 50 Hz
and 21 kHz and of an FFT order of 16. A bass emphasis was applied for SNR optimization at low frequencies [17]. A peak-to-tail SNR of about 90 dB
was achieved by averaging across two repetitions of
the sweep signal. Measurements were conducted using
Genelec 8030a active studio speakers, and time variability of the loudspeakers’ frequency response could
be reduced to ±0.2 dB by means of an one hour a
warm-up procedure. Controlling time variability was
particularly important because HRTFs measured at
different points in time were to be directly compared
to each other in a highly sensitive listening test later
on. The measurement setup is shown in Fig. 2.
Subsequent to the HRTF measurements, FABIAN
FORUM ACUSTICUM 2014
7-12 September, Krakow
Brinkmann et al.: Audibility of head orientation in HRTFs
was removed and its DPA 4060 miniature electret
condenser microphones were detached for conducting
reference measurements. The positions of the microphones were adjusted to be identical to FABIAN’s
interaural center. Finally, HRTFs were calculated
by spectral division of the measured HRTF and
the reference spectrum, simultaneously compensating for transfer functions of loudspeaker and microphone. Further processing of HRTFs was conducted
as described in Brinkmann et al. [18], however lowfrequency extrapolation was excluded as it was believed to be irrelevant in this context.
3. Physical evaluation
This section presents a physical evaluation of the
torso’s influence on HRTFs as a function of headabove-torso orientation and source position. Observed
differences between head movements with constant
and variable head-above-torso orientation are discussed and a critical subset of source positions is selected for perceptual evaluation in a subsequent listening test.
3.1. Method
Differences in HRTFs were examined with respect to
interaural time and level differences (ITD, ILD), as
well as with regard to spectral coloration. Therefore,
ILDs were estimated as RMS level differences between
left and right ear, whereas ITDs were calculated as
differences in onset times between left and right ear
using onset detection on ten times up-sampled headrelated impulse responses (HRIRs) [19].
In order to obtain an impression of the spectral differences, the log-ratio of the magnitude responses between the HRTFs for constant and variable headabove-torso conditions was calculated (in dB) as
∆HRTF(f ) = 20lg
|HRTFconst (f )|
,
|HRTFvar (f )|
(1)
where f is the frequency in Hz. For convenience, the
dependency of the HRTF on head orientation, source
position, and left and right ear was omitted.
For a better comparability across source positions, a
single value measure was calculated based on Minnaar et al. [20], who described the error between a
reference and an interpolated HRTF by averaging
absolute magnitude differences at 94 logarithmically
spaced frequencies, and adding results for left and
right ear. This physical measure was a good predictor
for listening test results, where subjects had to detect
differences between original and interpolated HRTFs.
However, instead of calculating the error for discrete
frequencies we used a Gammatone filter bank, as suggested by Schärer and Lindau [21]. The error level (in
dB) in one filter band is given by
R
C(f, fc ) |HRTFconst (f )| df
, (2)
∆HRTF(fc ) = 20lg R
C(f, fc ) |HRTFvar (f )| df
Figure 3. HRTFs measured for the right ear and a source
at (315◦ ; 30◦ ) with constant (top) and variable (middle)
head-above-torso orientation. Difference in magnitude responses are shown at the bottom.
where C is a Gammatone filter with center frequency
fc in Hz as implemented in the Auditory Toolbox [22].
The error level ∆HRTF(fc ) was calculated for N = 39
auditory filters between 70 Hz and 20 kHz. Then, the
results for left and right ear were added and averaged
across fc resulting in a single value error measure ∆G
(in dB) for each pair of HRTFs with constant and
variable head-above-torso orientation
1 X
∆G =
|HRTFl (fc )| + |HRTFr (fc )| (3)
N
fc
3.2. Results
On average, differences in ITD and ILD between head
movements with constant and variable head-abovetorso orientation were well below difference thresholds
of 10 µs and 0.6 dB [14, pp. 153], whereas maximum
deviations of 11.4 µs and 0.95 dB only slightly exceed
it.
Two exemplary HRTF sets for head movements with
constant and variable head-above-torso orientation
are depicted in Fig. 3 for the right ear and a source
at (315◦ ; 30◦ ). For both cases, a comb filter caused
by the shoulder reflection is visible for frequencies
up to approx. 1 kHz. Above 1 kHz, strong peak and
notch patterns caused by pinnae resonances are masking the visibility of the shoulder comb filter. However, when calculating the spectral difference according to (1) identical head-to-source orientation cause
the high frequency pinna cues to be cancelled out
FORUM ACUSTICUM 2014
7-12 September, Krakow
(Fig. 3, bottom). Thus, the comb filter becomes visible again over the full frequency range. As expected,
observed differences are nearly negligible for head-tosource orientations in the vicinity of 0◦ , as in this case
head-above-torso orientations for constant and variable head movements are very similar. For other head
orientations comb-filter-like structures stretch across
0.7 to 20 kHz. They result as the difference of two
mismatched comb filters which are related from to
different distances between ear and shoulder, in the
cases of either constant or variable head-above-torso
orientations. Similar difference patterns were observed
for other source positions, with a tendency for larger
deviations to occur at the contralateral ear. Deviations due to shadowing of the torso were present below 700 Hz, but less pronounced even for the source at
−30◦ elevation. This is in good accordance with Algazi et al. [5], where strongest influences of shadowing
in HRTFs were found for sound sources below −40◦
elevation and the contralateral ear (measurements of
a KEMAR mannequin).
The differences according to (3) are shown in Fig. 4 for
all source positions and head-to-source orientations.
As mentioned above, deviations are small in the vicinity of 0◦ whereas elsewise they reach a maximum of
up to 2.5 dB. Moreover, a tendency for the error to
increase with decreasing source elevation can be seen:
The smallest error of ≤ 1 dB is found for the source at
(0◦ ; 90◦ ). In this case, the shoulder reflection is weak
for both constant and variable head-above-torso orientations as most energy is reflected away from the
ear. Intermediate differences of up to 1.5 dB occur
for the sources on the horizontal plane and 30◦ elevation, most likely caused by strong shoulder reflections. The largest error of 2.5 dB was found for the
source at −30◦ elevation and for head-to-source orientations larger than 45◦ azimuth, because the ear is
partly shadowed by the torso in one case (constant
head-above-torso orientation). Confirming the observations of Guldenschuh [8], the error for sources on
the horizontal plane can be seen to increase with decreasing source azimuth.
4. Perceptual evaluation
To test whether or not differences are audible between head movements with either constant or variable head-above-torso orientation, a highly sensitive
ABX listening test was conducted. The setup allowed
for instantaneous and repeated comparison between
HRTF sets using a dynamic binaural auralization accounting for head movements of the listeners.
4.1. Method
Seven men and four women with a median age of 29
years and normal hearing participated in the test.
Hearing level, assessed by means of fixed tone audiograms averaged across octave frequencies between
Brinkmann et al.: Audibility of head orientation in HRTFs
Figure 4. Differences between HRTFs with constant and
variable head-above-torso orientation calculated according
to (3). Results for sources 1 to 6 are shown from top to
bottom.
500 Hz and 4 kHz, was below 25 dB(HL) for all subjects (Classified no hearing impairment by the World
Organisation [23]). All subjects had a musical background; nine subjects had participated in listening
tests before.
Following the ABX paradigm, three stimuli (A, B,
and X ) were presented to the subjects, whose task
was to identify whether A or B equaled X. Subjects
were encouraged to switch between stimuli and take
their time at will before giving an answer, while head
movements were randomly assigned to A, B, and X.
A single test consisted of 23 ABX sequences. Across
subjects, 23 · 11 = 253 observations were made under each condition. To obtain cumulated type 1 and
type 2 error levels of 0.05 after accounting for multiple testing by Bonferroni correction, results for one
tested condition will be significantly above chance at
≥147 observed correct answers, indicating audibility
between head movements. Insignificant results (<147
correct answers) imply group averaged detection rates
to be significantly below 65% [24].
For reproduction of binaural signals, the BKsystem
[25] consisting of a low noise DSP-driven amplifier
and extraaural headphones was used in conjunction
with a Polhemus Patriot tracker for monitoring the
subjects head positions. The fast convolution engine
fWonder [10] including an approach for post-hoc ITD
individualization [19] was used for dynamic binaural
simulation, allowing for head movements in the range
of ±82◦ azimuth. fWonder was also used for the application of a headphone compensation filter, which
was designed using regulated inversion based on headphone transfer functions measured on FABIAN [26].
4.2. Results
Individual results, and results averaged across subjects are shown in Fig. 5 for all test conditions. Average results, as given by the white horizontal bars,
indicate a clear distinguishability of head movements
with constant and variable head-above-torso orientation. For all tested conditions, even for relatively uncritical source positions at (0◦ ; 90◦ ), the results were
significantly above chance.
When asked for the most prominent perceived difference between head movements with constant and
variable head-above-torso orientation, subjects named
coloration in case of the noise content and coloration
and/or localization for the speech sample.
5. Discussion
In this study, we assessed the audibility of differences
occurring during head movements with constant and
variable head-above-torso orientation. To this end
we conducted a physical evaluation of spectral and
temporal deviations, as well as a highly sensitive
ABX listening test. Results show that differences
were audible for all tested source positions and audio
contents, stressing the importance of accounting for
correct head-above-torso orientation if aiming at an
authentic auralization, i.e. an auralization that is
1
http://dx.doi.org/10.14279/depositonce-31
100
5
2
8
2
11
2
5
5
23
21
87.5
19
75
17
15
62.5
13
2
50
11
9
37.5
noise
(0°;90°)
(90°;0°) (315°;−30°) (0°;90°)
speech
number of correct answers
In order to limit the duration of the experiment, a
subset of three sound sources was selected for perceptual evaluation. By drawing on results of the physical
evaluation, particularly critical and uncritical source
positions at (0◦ ; 90◦ ), (90◦ ; 0◦ ), and (315◦ ; 30◦ ) were
selected. Two different audio stimuli were used: Pink
noise was chosen in order to reveal spectral differences, an excerpt of anechoic male speech was used as
a familiar and typical ’real-life’ sound with a timevariant behavior. The combination of three sound
sources and two audio contents lead to six conditions
which were assessed individually by each subject. The
experiment was split in two blocks while successively
presenting the three source conditions per audio content. Source positions were randomized within blocks,
and the sequence of blocks was balanced across subjects.
The test was conducted in a quiet listening room
(Leq,A = 38 dB SPL). Subjects were placed on a revolving chair to comfortably reach and hold arbitrary
head orientations. The whisPER listening test environment1 [27] was used for the test, with the user
interface displayed on a touchpad. Training was conducted prior to the listening test in order to familiarize
subjects with the interface and stimuli. Subjects maximally needed 1.5 hours for rating and were encouraged to take breaks at will in order to avoid fatigue.
Brinkmann et al.: Audibility of head orientation in HRTFs
percentage of correct answers
FORUM ACUSTICUM 2014
7-12 September, Krakow
7
(90°;0°) (315°;−30°)
Figure 5. Listening test results of 11 subjects and for each
tested condition. Black dots indicate percentage/number
of correct answers; depicted numbers show how many subjects had identical results. Mean scores (given by solid
white lines) above dashed line are significantly above
chance.
indistinguishable from a corresponding real sound
field.
Differences between head movements with constant
and variable head-above-torso orientation were even
audible for a source at(0◦ ;90◦ ), which exhibited the
smallest deviations in the physical evaluation. Thus,
the outcome of the listening test can be expected to
hold for all source positions measured in the current
study.
From results of the physical evaluation, we suppose
perceived differences to be mostly due to spectral
differences, because observed deviations in ITDs and
ILDs were below the threshold of audibility for the
vast majority of head-above-torso orientations and
source positions. Perceived differences in localization,
as reported for the speech stimulus, might also be
due to spectral cues, with mismatched comb filters
in HRTFs exciting different directional bands [14,
pp. 93] and thus evoking differences in perceived
elevation. This assumption would be in accordance
with Algazi et al. [7], who found torso and shoulder
related cues to be involved in the perception of
elevation for sources outside the median plane.
Moreover, results of our physical evaluation support
findings of earlier studies regarding the comb-filter
like effect of the shoulder reflection on the HRTF,
which was found to be most prominent if sound
source, shoulder, and ear are aligned [8, 5, 7].
In summary, the results from our study emphasize
the importance of providing correct head-above-torso
orientations in dynamic auralizations. However, this
would require HRTF data containing various headabove-torso orientations and a fine resolution of sound
source positions, as provided by a recently measured
HRTF data set [18]. Because such data sets are costly
to produce, a perceptually transparent interpolation
between HRTFs with different head-above-torso ori-
FORUM ACUSTICUM 2014
7-12 September, Krakow
entations would be all the more desirable. This was
examined by the authors in a follow-up study [28].
Acknowledgement
This work was done within the Simulation and Evaluation of Acoustical Environments (SEACEN) research
group, and funded by the German Research Foundation (DFG WE 4057/3-1).
References
[1] M. Vorländer: Auralization. 1st Edition, Springer,
Berlin, Heidelberg, 2008.
[2] P. Minnaar, S.K. Olesen, F. Christensen, H. Møller:
The importance of head movements for binaural room
synthesis. Proc. 2001 Int. Conf. Auditory Display. Espoo, Finnland.
Brinkmann et al.: Audibility of head orientation in HRTFs
[16] H. Wierstorf, M. Geier, A. Raake, S. Spors: A free
database of head-related impulse response measurements in the horizontal plane with multiple distances.
Proc. 130th AES Convention 2011 (Engineering Brief),
London, UK.
[17] S. Müller, P. Massarani: Transfer function measurement with Sweeps. Directors’s cut including previously
unreleased material and some corrections. J. Audio
Eng. Soc. (Original release) 49(6):443-471 (2001).
[18] F. Brinkmann, A. Lindau, S. Weinzierl, G. Geissler,
S. van de Par: A high resolution head-related transfer
function database including different orientations of
head above the torso. Proc. AIA-DAGA 2013, International Conference on Acoustics, Meran, Italy, 596-599.
[19] A. Lindau, J. Estrella, S. Weinzierl: Individualization
of dynamic binaural synthesis by real time manipulation of the ITD. Proc. 128th AES Convention 2010,
London, UK.
[3] W.R. Thurlow, J.W. Mangels, P.S. Runge: Head movements during sound localization. J. Acoust. Soc. Am.
42(2):489-493 (1967).
[20] P. Minnaar, J. Plogsties, F. Christensen: Directional
resolution of head-related transfer functions required
in binaural synthesis. J. Audio Eng. Soc. 53(10):919929 (2005).
[4] C. Kim, R. Mason, T. Brookes: Head movements made
by listeners in experimental and real-life listening activities. J. Audio Eng. Soc. 61(6):425-438 (2013).
[21] Z. Schärer and A. Lindau: Evaluation of equalization
methods for binaural signals. Proc. 126th AES Convention 2009, Munich, Germany.
[5] V.R. Algazi, R.O. Duda, R. Duraiswami, N.A.
Gumerov, Z. Tang: Approximating the head-related
transfer function using simple geometric models of the
head and torso. J. Acoust. Soc. Am. 112(5):2053-2064
(2002).
[22] M. Slaney: Auditory Toolbox. Technical Report
#1998-010, Interval Research Corporation. Available
at
https://engineering.purdue.edu/~malcolm/
interval/1998-010, accessed May 2014.
[6] M.B. Gardner: Some monaural and binaural facets of
median plane localization. J. Acoust. Soc. Am. 54(6):
1489-1495 (1973).
[7] V.R. Algazi, C. Avendano and R.O. Duda: Elevation
localization and head-related transfer function analysis
at low frequencies. J. Acoust. Soc. Am. 109(3): 11101122 (2001).
[8] M. Guldenschuh, A. Sontacchi, and F. Zotter: HRTF
modelling in due consideration variable torso reflections. Proc. 2008 Acoustics. Paris, France.
[9] K. Genuit: Ein Modell zur Beschreibung von
Außenohrübertragungseigenschaften. Doctoral Thesis.
Techn. Hochsch. Aachen, Germany (1984).
[10] A. Lindau, T. Hohn, S. Weinzierl: Binaural resynthesis for comparative studies of acoustical environments.
Proc. 122th AES Convention 2007. Vienna, Austria.
[11] A. Lindau, S. Weinzierl: On the spatial resolution of
virtual acoustic environments for head movements on
horizontal, vertical and lateral direction. Proc. 2009
EAA Symposium on Auralization.
[12] C.T. Morgan, A. Chapanis, J.S. Cook, M.W. Lund:
Human Engineering Guide to Equipment Design.
McGraw-Hill, New York, 1963.
[13] P. Schöps, N. Seichert, M. Schenk, U. Petri and
E. Senn: Alters- und geschlechtsspezifische Bewegungsausmaße der Halswirbelsäule. Phys. Rehab. Kur.
Med. 7(3): 80-87 (1997).
[14] J. Blauert: Spatial Hearing. The psychophysics of human sound localization. MIT Press, 2nd Edition, Massachusetts, USA, 1997.
[15] D.S. Brungart, W.M. Rabinowitz: Auditory localization of nearby sources. Head-related transfer functions.
J. Acoust. Soc. Am. 106(3):1465-1479 (1999).
[23] World Health Organisation: Grades of hearing impairment. Available at, www.who.int/pbd/deafness/
hearing_impairment_grades/en, accessed May 2014.
[24] L. Leventhal: Type 1 and type 2 errors in the statistical analysis of listening tests. J. Audio Eng. Soc.
34(6): 437-453 (1986).
[25] V. Erbes, F. Schulz, A. Lindau, S. Weinzierl: An extraaural headphone system for optimized binaural reproduction. Proc. 2012 German annual acoustic conference (DAGA), 313-314.
[26] A. Lindau, F. Brinkmann: Perceptual evaluation of
headphone compensation in binaural synthesis based
on non-individual recordings. J. Audio Eng. Soc.
60(1/2):54-62 (2012).
[27] S. Ciba, A. Wlodarski, H.J. Maempel: WhisPER –
A new tool for performing listening tests. Proc. 126th
AES Convention 2009, Munich, Germany. Available at
http://dx.doi.org/10.14279/depositonce-30.
[28] F. Brinkmann, R. Roden, A. Lindau, S. Weinzierl:
Interpolation of head-above-torso orientation in headrelated transfer functions. Proc. 2014 Forum Acusticum. Krakow, Poland (Accepted for publication).