Audibility of head-above-torso orientation in head-related transfer functions Fabian Brinkmann, Reinhild Roden, Alexander Lindau, Stefan Weinzierl Audio Communication Group, TU Berlin, Einsteinufer 17c, 10587 Berlin, Germany Summary Head-related transfer functions (HRTFs) incorporate the fundamental cues for human spatial hearing, and are often applied to auralize sound fields based on numerical room acoustic modeling techniques. HRTFs are typically available for various directions of sound incidence and a fixed head-above-torso orientation (HATO). Consequently, if head movements are tracked in order to exchange HRTFs in interactive auralizations, they correspond to a listener turning head and torso simultaneously, while – in reality – listeners usually turn their head independently above a fixed torso. Earlier studies showed informal evidence that the influence of the head-above-torso orientation might be audible. A convincing perceptual evaluation was, however, not available so far. In the present study we used an ABX listening test design to assess whether correctly accounting for head-above-torso orientations will be audible in a dynamic binaural simulation. Eleven subjects compared interactive simulations with constant and variable head-above-torso orientations for different directions of sound incidence and for different audio stimuli. Results show that differences were audible for all tested source positions and stimuli thereby underlining the perceptual relevance of correct head-above-torso orientations for binaural synthesis. PACS no. 43.60.Sx, 43.66.Pn 1. Introduction Numerical modeling techniques used to simulate the sound field in room acoustical environments are typically auralized by superposing the direct sound and each reflection at a listening position with corresponding HRTFs to obtain binaural room impulse responses (BRIRs, [1, p. 272]). In order to obtain a stable and plausible representation of the simulated environment, it is important to account for head movements of the listener, which help to resolve front-back confusions [2], and are naturally used to judge attributes such as the envelopment, timbre or the width and location of a sound source [3, 4]. However, HRTFs are usually measured for different angles of sound incidence relative to a fixed dummy head or head and torso simulator, so that head movements at the reproduction stage will necessarily correspond to a listener moving head and torso (Fig. 1, left). In a physiologically typical situation, however, the head would be rotated independently above a fixed torso (Fig. 1, right). The effect of the torso on HRTFs was extensively studied by Algazi et al. [5] for static binaural synthesis (c) European Acoustics Association and a neutral head-above-torso orientation. Authors found that in case the direct path from the sound source to the ear is blocked by the torso, shadowing occurs for frequencies above approximately 100 Hz, causing increasing attenuation of up to 25 dB. For other directions of sound incidence, the torso was shown to act as a reflector causing comb-filters with an amplitude of up to ±5 dB to interfere with the magnitude spectrum of the HRTF. The exact positions of peaks and dips of the comb filter thus depends mainly on the source elevation. For a source above the listener the first dip was shown to occur already at a frequency as low as 700 Hz. Above 3 kHz however, the HRTF is dominated by distinct head and pinnae cues with a magnitude of ±20 dB [6, 7]. From an analysis of HRTFs measured for various head-above-torso orientations, Guldenschuh et al. [8] found, that most prominent torso reflections occur if ear, shoulder, and source are approximately aligned, and if the source elevation is within 20◦ below to 40◦ above the horizontal plane. It was further hypothesized, that effects caused by the torso are audible at least for critical source positions. Despite the dominating role of head and pinnae effects on the HRTF, Genuit [9] already assumed torso effects to hold localization cues at frequencies below 3.5 kHz. Algazi et al. [7] conducted localization experiments with stimuli low-passed at 3 kHz, and showed FORUM ACUSTICUM 2014 7-12 September, Krakow Brinkmann et al.: Audibility of head orientation in HRTFs Figure 2. Photo of the HRTF measurement setup taken while adjusting the source position with the help of a laser mounted below FABIAN’s left ear. Figure 1. Illustration of head movements with constant (left) and variable (right) head-above-torso orientation. that such cues aid in detecting the elevation of sound sources outside the median plane. In summary, several previous studies cited above support the hypothesis that accounting for correct headabove-torso orientations is necessary for a perceptually transparent binaural synthesis. Thus, in the present study, we physically and perceptually examined differences between dynamic auralizations allowing for either constant or variable head-above-torso orientations. 2. HRTF measurements Before being able to assess the effect of headabove-torso orientation, an appropriate HRTF data set was measured with the head and torso simulator FABIAN [10]. It is equipped with a softwarecontrolled neck joint, allowing for a precise control of the head-above-torso orientation in multiple degrees of freedom. Transfer functions were measured in the fully anechoic chamber of the TU Berlin (V = 1850 m3 , fc =63 Hz). HRTFs were measured for head orientations in a range of ±82◦ azimuth with a resolution of 0.5◦ for six sound sources each distributed on the frontal hemisphere. Each head orientation was measured (a) for a constant and (b) for a variable head-above-torso orientation. The range of head movements was chosen according to the maximum range of motion [12, 13]. The spatial resolution was assumed to be imperceptibly fine, as it was smaller than the worst-case localization blur of 0.75◦ reported by Blauert [14, p. 39]. In total, 329 HRTFs were measured per source position and for both, constant and variable head-above-torso orientation. Sound source positions were chosen to be both typical (e.g. in front of a listener) and particularly critical with respect to a strong shoulder/torso effect. The latter is the case for source positions causing the torso to act either as a strong reflector or as an obstacle for the sound field at the ears. Hence, sound source positions are particularly critical when being aligned with shoulder and ear, or when resulting in a strongly shadowed contralateral ear. Consequently, sources were placed as follows: (1) ϕ = 0◦ azimuth; ϑ = 90◦ elevation, (2) 315◦ ; 30◦ , (3) 0◦ ; 0◦ , (4) 45◦ ; 0◦ , (5) 90◦ ; 0◦ , and (6) 315◦ ; −30◦ . Hereby, azimuth angles of 90◦ denote sources to the left of FABIAN, while positive elevations refer to sources above the horizontal plane. Exact source positions were determined using a laser attached to FABIAN’s neck joint. Sources were placed at distances between 2.1 m and 2.6 m, thus avoiding proximity effects [15, 16] and ensuring that reflections from the speakers could be removed by windowing. For excitation, we used sine sweeps between 50 Hz and 21 kHz and of an FFT order of 16. A bass emphasis was applied for SNR optimization at low frequencies [17]. A peak-to-tail SNR of about 90 dB was achieved by averaging across two repetitions of the sweep signal. Measurements were conducted using Genelec 8030a active studio speakers, and time variability of the loudspeakers’ frequency response could be reduced to ±0.2 dB by means of an one hour a warm-up procedure. Controlling time variability was particularly important because HRTFs measured at different points in time were to be directly compared to each other in a highly sensitive listening test later on. The measurement setup is shown in Fig. 2. Subsequent to the HRTF measurements, FABIAN FORUM ACUSTICUM 2014 7-12 September, Krakow Brinkmann et al.: Audibility of head orientation in HRTFs was removed and its DPA 4060 miniature electret condenser microphones were detached for conducting reference measurements. The positions of the microphones were adjusted to be identical to FABIAN’s interaural center. Finally, HRTFs were calculated by spectral division of the measured HRTF and the reference spectrum, simultaneously compensating for transfer functions of loudspeaker and microphone. Further processing of HRTFs was conducted as described in Brinkmann et al. [18], however lowfrequency extrapolation was excluded as it was believed to be irrelevant in this context. 3. Physical evaluation This section presents a physical evaluation of the torso’s influence on HRTFs as a function of headabove-torso orientation and source position. Observed differences between head movements with constant and variable head-above-torso orientation are discussed and a critical subset of source positions is selected for perceptual evaluation in a subsequent listening test. 3.1. Method Differences in HRTFs were examined with respect to interaural time and level differences (ITD, ILD), as well as with regard to spectral coloration. Therefore, ILDs were estimated as RMS level differences between left and right ear, whereas ITDs were calculated as differences in onset times between left and right ear using onset detection on ten times up-sampled headrelated impulse responses (HRIRs) [19]. In order to obtain an impression of the spectral differences, the log-ratio of the magnitude responses between the HRTFs for constant and variable headabove-torso conditions was calculated (in dB) as ∆HRTF(f ) = 20lg |HRTFconst (f )| , |HRTFvar (f )| (1) where f is the frequency in Hz. For convenience, the dependency of the HRTF on head orientation, source position, and left and right ear was omitted. For a better comparability across source positions, a single value measure was calculated based on Minnaar et al. [20], who described the error between a reference and an interpolated HRTF by averaging absolute magnitude differences at 94 logarithmically spaced frequencies, and adding results for left and right ear. This physical measure was a good predictor for listening test results, where subjects had to detect differences between original and interpolated HRTFs. However, instead of calculating the error for discrete frequencies we used a Gammatone filter bank, as suggested by Schärer and Lindau [21]. The error level (in dB) in one filter band is given by R C(f, fc ) |HRTFconst (f )| df , (2) ∆HRTF(fc ) = 20lg R C(f, fc ) |HRTFvar (f )| df Figure 3. HRTFs measured for the right ear and a source at (315◦ ; 30◦ ) with constant (top) and variable (middle) head-above-torso orientation. Difference in magnitude responses are shown at the bottom. where C is a Gammatone filter with center frequency fc in Hz as implemented in the Auditory Toolbox [22]. The error level ∆HRTF(fc ) was calculated for N = 39 auditory filters between 70 Hz and 20 kHz. Then, the results for left and right ear were added and averaged across fc resulting in a single value error measure ∆G (in dB) for each pair of HRTFs with constant and variable head-above-torso orientation 1 X ∆G = |HRTFl (fc )| + |HRTFr (fc )| (3) N fc 3.2. Results On average, differences in ITD and ILD between head movements with constant and variable head-abovetorso orientation were well below difference thresholds of 10 µs and 0.6 dB [14, pp. 153], whereas maximum deviations of 11.4 µs and 0.95 dB only slightly exceed it. Two exemplary HRTF sets for head movements with constant and variable head-above-torso orientation are depicted in Fig. 3 for the right ear and a source at (315◦ ; 30◦ ). For both cases, a comb filter caused by the shoulder reflection is visible for frequencies up to approx. 1 kHz. Above 1 kHz, strong peak and notch patterns caused by pinnae resonances are masking the visibility of the shoulder comb filter. However, when calculating the spectral difference according to (1) identical head-to-source orientation cause the high frequency pinna cues to be cancelled out FORUM ACUSTICUM 2014 7-12 September, Krakow (Fig. 3, bottom). Thus, the comb filter becomes visible again over the full frequency range. As expected, observed differences are nearly negligible for head-tosource orientations in the vicinity of 0◦ , as in this case head-above-torso orientations for constant and variable head movements are very similar. For other head orientations comb-filter-like structures stretch across 0.7 to 20 kHz. They result as the difference of two mismatched comb filters which are related from to different distances between ear and shoulder, in the cases of either constant or variable head-above-torso orientations. Similar difference patterns were observed for other source positions, with a tendency for larger deviations to occur at the contralateral ear. Deviations due to shadowing of the torso were present below 700 Hz, but less pronounced even for the source at −30◦ elevation. This is in good accordance with Algazi et al. [5], where strongest influences of shadowing in HRTFs were found for sound sources below −40◦ elevation and the contralateral ear (measurements of a KEMAR mannequin). The differences according to (3) are shown in Fig. 4 for all source positions and head-to-source orientations. As mentioned above, deviations are small in the vicinity of 0◦ whereas elsewise they reach a maximum of up to 2.5 dB. Moreover, a tendency for the error to increase with decreasing source elevation can be seen: The smallest error of ≤ 1 dB is found for the source at (0◦ ; 90◦ ). In this case, the shoulder reflection is weak for both constant and variable head-above-torso orientations as most energy is reflected away from the ear. Intermediate differences of up to 1.5 dB occur for the sources on the horizontal plane and 30◦ elevation, most likely caused by strong shoulder reflections. The largest error of 2.5 dB was found for the source at −30◦ elevation and for head-to-source orientations larger than 45◦ azimuth, because the ear is partly shadowed by the torso in one case (constant head-above-torso orientation). Confirming the observations of Guldenschuh [8], the error for sources on the horizontal plane can be seen to increase with decreasing source azimuth. 4. Perceptual evaluation To test whether or not differences are audible between head movements with either constant or variable head-above-torso orientation, a highly sensitive ABX listening test was conducted. The setup allowed for instantaneous and repeated comparison between HRTF sets using a dynamic binaural auralization accounting for head movements of the listeners. 4.1. Method Seven men and four women with a median age of 29 years and normal hearing participated in the test. Hearing level, assessed by means of fixed tone audiograms averaged across octave frequencies between Brinkmann et al.: Audibility of head orientation in HRTFs Figure 4. Differences between HRTFs with constant and variable head-above-torso orientation calculated according to (3). Results for sources 1 to 6 are shown from top to bottom. 500 Hz and 4 kHz, was below 25 dB(HL) for all subjects (Classified no hearing impairment by the World Organisation [23]). All subjects had a musical background; nine subjects had participated in listening tests before. Following the ABX paradigm, three stimuli (A, B, and X ) were presented to the subjects, whose task was to identify whether A or B equaled X. Subjects were encouraged to switch between stimuli and take their time at will before giving an answer, while head movements were randomly assigned to A, B, and X. A single test consisted of 23 ABX sequences. Across subjects, 23 · 11 = 253 observations were made under each condition. To obtain cumulated type 1 and type 2 error levels of 0.05 after accounting for multiple testing by Bonferroni correction, results for one tested condition will be significantly above chance at ≥147 observed correct answers, indicating audibility between head movements. Insignificant results (<147 correct answers) imply group averaged detection rates to be significantly below 65% [24]. For reproduction of binaural signals, the BKsystem [25] consisting of a low noise DSP-driven amplifier and extraaural headphones was used in conjunction with a Polhemus Patriot tracker for monitoring the subjects head positions. The fast convolution engine fWonder [10] including an approach for post-hoc ITD individualization [19] was used for dynamic binaural simulation, allowing for head movements in the range of ±82◦ azimuth. fWonder was also used for the application of a headphone compensation filter, which was designed using regulated inversion based on headphone transfer functions measured on FABIAN [26]. 4.2. Results Individual results, and results averaged across subjects are shown in Fig. 5 for all test conditions. Average results, as given by the white horizontal bars, indicate a clear distinguishability of head movements with constant and variable head-above-torso orientation. For all tested conditions, even for relatively uncritical source positions at (0◦ ; 90◦ ), the results were significantly above chance. When asked for the most prominent perceived difference between head movements with constant and variable head-above-torso orientation, subjects named coloration in case of the noise content and coloration and/or localization for the speech sample. 5. Discussion In this study, we assessed the audibility of differences occurring during head movements with constant and variable head-above-torso orientation. To this end we conducted a physical evaluation of spectral and temporal deviations, as well as a highly sensitive ABX listening test. Results show that differences were audible for all tested source positions and audio contents, stressing the importance of accounting for correct head-above-torso orientation if aiming at an authentic auralization, i.e. an auralization that is 1 http://dx.doi.org/10.14279/depositonce-31 100 5 2 8 2 11 2 5 5 23 21 87.5 19 75 17 15 62.5 13 2 50 11 9 37.5 noise (0°;90°) (90°;0°) (315°;−30°) (0°;90°) speech number of correct answers In order to limit the duration of the experiment, a subset of three sound sources was selected for perceptual evaluation. By drawing on results of the physical evaluation, particularly critical and uncritical source positions at (0◦ ; 90◦ ), (90◦ ; 0◦ ), and (315◦ ; 30◦ ) were selected. Two different audio stimuli were used: Pink noise was chosen in order to reveal spectral differences, an excerpt of anechoic male speech was used as a familiar and typical ’real-life’ sound with a timevariant behavior. The combination of three sound sources and two audio contents lead to six conditions which were assessed individually by each subject. The experiment was split in two blocks while successively presenting the three source conditions per audio content. Source positions were randomized within blocks, and the sequence of blocks was balanced across subjects. The test was conducted in a quiet listening room (Leq,A = 38 dB SPL). Subjects were placed on a revolving chair to comfortably reach and hold arbitrary head orientations. The whisPER listening test environment1 [27] was used for the test, with the user interface displayed on a touchpad. Training was conducted prior to the listening test in order to familiarize subjects with the interface and stimuli. Subjects maximally needed 1.5 hours for rating and were encouraged to take breaks at will in order to avoid fatigue. Brinkmann et al.: Audibility of head orientation in HRTFs percentage of correct answers FORUM ACUSTICUM 2014 7-12 September, Krakow 7 (90°;0°) (315°;−30°) Figure 5. Listening test results of 11 subjects and for each tested condition. Black dots indicate percentage/number of correct answers; depicted numbers show how many subjects had identical results. Mean scores (given by solid white lines) above dashed line are significantly above chance. indistinguishable from a corresponding real sound field. Differences between head movements with constant and variable head-above-torso orientation were even audible for a source at(0◦ ;90◦ ), which exhibited the smallest deviations in the physical evaluation. Thus, the outcome of the listening test can be expected to hold for all source positions measured in the current study. From results of the physical evaluation, we suppose perceived differences to be mostly due to spectral differences, because observed deviations in ITDs and ILDs were below the threshold of audibility for the vast majority of head-above-torso orientations and source positions. Perceived differences in localization, as reported for the speech stimulus, might also be due to spectral cues, with mismatched comb filters in HRTFs exciting different directional bands [14, pp. 93] and thus evoking differences in perceived elevation. This assumption would be in accordance with Algazi et al. [7], who found torso and shoulder related cues to be involved in the perception of elevation for sources outside the median plane. Moreover, results of our physical evaluation support findings of earlier studies regarding the comb-filter like effect of the shoulder reflection on the HRTF, which was found to be most prominent if sound source, shoulder, and ear are aligned [8, 5, 7]. In summary, the results from our study emphasize the importance of providing correct head-above-torso orientations in dynamic auralizations. However, this would require HRTF data containing various headabove-torso orientations and a fine resolution of sound source positions, as provided by a recently measured HRTF data set [18]. Because such data sets are costly to produce, a perceptually transparent interpolation between HRTFs with different head-above-torso ori- FORUM ACUSTICUM 2014 7-12 September, Krakow entations would be all the more desirable. This was examined by the authors in a follow-up study [28]. Acknowledgement This work was done within the Simulation and Evaluation of Acoustical Environments (SEACEN) research group, and funded by the German Research Foundation (DFG WE 4057/3-1). References [1] M. Vorländer: Auralization. 1st Edition, Springer, Berlin, Heidelberg, 2008. [2] P. Minnaar, S.K. Olesen, F. Christensen, H. Møller: The importance of head movements for binaural room synthesis. Proc. 2001 Int. Conf. Auditory Display. Espoo, Finnland. Brinkmann et al.: Audibility of head orientation in HRTFs [16] H. Wierstorf, M. Geier, A. Raake, S. Spors: A free database of head-related impulse response measurements in the horizontal plane with multiple distances. Proc. 130th AES Convention 2011 (Engineering Brief), London, UK. [17] S. Müller, P. Massarani: Transfer function measurement with Sweeps. Directors’s cut including previously unreleased material and some corrections. J. Audio Eng. Soc. (Original release) 49(6):443-471 (2001). [18] F. Brinkmann, A. Lindau, S. Weinzierl, G. Geissler, S. van de Par: A high resolution head-related transfer function database including different orientations of head above the torso. Proc. AIA-DAGA 2013, International Conference on Acoustics, Meran, Italy, 596-599. [19] A. Lindau, J. Estrella, S. Weinzierl: Individualization of dynamic binaural synthesis by real time manipulation of the ITD. Proc. 128th AES Convention 2010, London, UK. [3] W.R. Thurlow, J.W. Mangels, P.S. Runge: Head movements during sound localization. J. Acoust. Soc. Am. 42(2):489-493 (1967). [20] P. Minnaar, J. Plogsties, F. Christensen: Directional resolution of head-related transfer functions required in binaural synthesis. J. Audio Eng. Soc. 53(10):919929 (2005). [4] C. Kim, R. Mason, T. Brookes: Head movements made by listeners in experimental and real-life listening activities. J. Audio Eng. Soc. 61(6):425-438 (2013). [21] Z. Schärer and A. Lindau: Evaluation of equalization methods for binaural signals. Proc. 126th AES Convention 2009, Munich, Germany. [5] V.R. Algazi, R.O. Duda, R. Duraiswami, N.A. Gumerov, Z. Tang: Approximating the head-related transfer function using simple geometric models of the head and torso. J. Acoust. Soc. Am. 112(5):2053-2064 (2002). [22] M. Slaney: Auditory Toolbox. Technical Report #1998-010, Interval Research Corporation. Available at https://engineering.purdue.edu/~malcolm/ interval/1998-010, accessed May 2014. [6] M.B. Gardner: Some monaural and binaural facets of median plane localization. J. Acoust. Soc. Am. 54(6): 1489-1495 (1973). [7] V.R. Algazi, C. Avendano and R.O. Duda: Elevation localization and head-related transfer function analysis at low frequencies. J. Acoust. Soc. Am. 109(3): 11101122 (2001). [8] M. Guldenschuh, A. Sontacchi, and F. Zotter: HRTF modelling in due consideration variable torso reflections. Proc. 2008 Acoustics. Paris, France. [9] K. Genuit: Ein Modell zur Beschreibung von Außenohrübertragungseigenschaften. Doctoral Thesis. Techn. Hochsch. Aachen, Germany (1984). [10] A. Lindau, T. Hohn, S. Weinzierl: Binaural resynthesis for comparative studies of acoustical environments. Proc. 122th AES Convention 2007. Vienna, Austria. [11] A. Lindau, S. Weinzierl: On the spatial resolution of virtual acoustic environments for head movements on horizontal, vertical and lateral direction. Proc. 2009 EAA Symposium on Auralization. [12] C.T. Morgan, A. Chapanis, J.S. Cook, M.W. Lund: Human Engineering Guide to Equipment Design. McGraw-Hill, New York, 1963. [13] P. Schöps, N. Seichert, M. Schenk, U. Petri and E. Senn: Alters- und geschlechtsspezifische Bewegungsausmaße der Halswirbelsäule. Phys. Rehab. Kur. Med. 7(3): 80-87 (1997). [14] J. Blauert: Spatial Hearing. The psychophysics of human sound localization. MIT Press, 2nd Edition, Massachusetts, USA, 1997. [15] D.S. Brungart, W.M. Rabinowitz: Auditory localization of nearby sources. Head-related transfer functions. J. Acoust. Soc. Am. 106(3):1465-1479 (1999). [23] World Health Organisation: Grades of hearing impairment. Available at, www.who.int/pbd/deafness/ hearing_impairment_grades/en, accessed May 2014. [24] L. Leventhal: Type 1 and type 2 errors in the statistical analysis of listening tests. J. Audio Eng. Soc. 34(6): 437-453 (1986). [25] V. Erbes, F. Schulz, A. Lindau, S. Weinzierl: An extraaural headphone system for optimized binaural reproduction. Proc. 2012 German annual acoustic conference (DAGA), 313-314. [26] A. Lindau, F. Brinkmann: Perceptual evaluation of headphone compensation in binaural synthesis based on non-individual recordings. J. Audio Eng. Soc. 60(1/2):54-62 (2012). [27] S. Ciba, A. Wlodarski, H.J. Maempel: WhisPER – A new tool for performing listening tests. Proc. 126th AES Convention 2009, Munich, Germany. Available at http://dx.doi.org/10.14279/depositonce-30. [28] F. Brinkmann, R. Roden, A. Lindau, S. Weinzierl: Interpolation of head-above-torso orientation in headrelated transfer functions. Proc. 2014 Forum Acusticum. Krakow, Poland (Accepted for publication).
© Copyright 2026 Paperzz