ICaD 2013 6–10 july, 2013, Łódź, Poland The 19th International Conference on Auditory Display (ICAD-2013) international Conference on auditory Display July 6–10, 2013, Lodz, Poland ANALYSIS OF THE SPECTRAL VARIATIONS IN REPEATED HEAD-RELATED TRANSFER FUNCTION Areti Andreopoulou Agnieszka Roginska Hariharan Mohanraj MEASUREMENTS ANALYSIS OF THE SPECTRAL VARIATIONS IN REPEATED HEAD-RELATED TRANSFER FUNCTION MEASUREMENTS ∗ † Music and Audio Research Laboratory (MARL) Areti Andreopoulou1New , Agnieszka Rogińska2, Hariharan Mohanraj York University Music and Audio Research Laboratory (MARL) New York, NY New York University, ∗ † New York, NY [email protected] [email protected] 1 [email protected], [email protected] ABSTRACT This paper discusses the range of spectral variations between HRTF sets measured on the same subjects. The analysis is done in a corpus of 40 HRTF datasets of 4 subjects (10 datasets per subject). Variations are observed as a function of frequency, distance of the ears to the sound source (ipsilateral or contralateral), and location. Assessments of the spatial quality of all datasets were made through a subjective study which confirmed that despite their spectral variations, all individually measured HRTF sets maintained a high degree of spatial realism. An understanding of the variability in HRTFs can offer new intuition on objective binaural filter evaluation, and has significance in the research fields of spatial audio reproduction and virtual auditory display. 1. INTRODUCTION 3D virtual auditory displays are created by applying the spatial cues that are characteristic of each sources intended location on monophonic sounds. Location-depended spatial cues are captured in the Head-Related Impulse Responses (HRIRs) or their frequency domain equivalent Head-Related Transfer Functions (HRTFs). Due to the high variability in people’s head shapes and especially pinnae, HRTFs vary vastly among individuals. The use of non-individualized HRTFs in binaural reproduction may lead to an unconvincing spatial impression, which can compromise the realism, quality, and accuracy of the delivered message. Consequently, various scientific approaches have emerged trying to overcome the need for personalized data, such as HRTF customization [1], [2], [3], database matching [4], [5], and HRTF modeling [6], [7], [8]. Also, in an attempt to create realistically dense auditory environments, much research has been done regarding high-accuracy HRTF interpolation methods [9], [10]. The accuracy of acquired or computed HRTFs can only be evaluated either perceptually through a user study, or objectively based on a defined metric, such as the mean squared error (MSE), the correlation distance, or the signal to distortion ratio (SDR). While in the first case a successful dataset is the one that conveys a convincing spatial image, in the latter it is the one that demonstrates the smallest variation from an originally measured set. Even though objective evaluation processes can be quick, they are oversimplified as they assume uniformity in the perceptual weights of spectral variation across frequency. Nevertheless, the brain has a certain degree of tolerance in HRTF variations, as studies have shown that the human auditory system has the ability to successfully adapt to altered spectral cues, given time [11]. Hence, a more enhanced evaluation process would take into account prior knowledge of the expected variability within HRTF sets, in order to create perceptually driven thresholds. Absolute variations in measured datasets have been reported before by Riederer [12], [13], who offered a detailed overview of the degree of error introduced to an HRTF pair by a number of factors, such as the background noise, reflections from the measurement system, accidental movements or misalignment of the subject, the use of ear plugs, accurate placement of the miniature microphones etc. However, that work explored the effects of a single error-factor at a time, while in real-life situations multiple can contribute to the final measurement outcome concurrently. In a similar study Wersényi, and Illényi investigated the effect of nearthe-head everyday objects like hair, caps, glasses and clothing, on HRTF datasets. The analysis was based on repeated measurements on a computer- controlled dummy head measurement system [14]. The goal for this work is to revisit this topic by discussing the spectral variability of binaural filters, through observation of the MARL database, a corpus of repeated HRTF measurements across multiple subjects. More specifically, this paper presents an assessment of the spectral changes that can be observed between repeated HRTF measurements on the same subject as a function of frequency, location (azimuth / elevation angle), and distance of the ear to the sound source (ipsilateral or contralateral filters). 2. THE DATABASE 2.1. Description of data The database consists of repeated HRTF measurements on 4 subjects. It has a total of 40 datasets (10 sets per subject), all captured by the Music and Audio Research Laboratory, at New York University. The spatial resolution of the database was designed to be uniform at 10◦ horizontal and 15◦ vertical increments from −30◦ to +30◦ in elevation. The measurements were reduced to 256-tap HRIRs to remove room reflections, and was free-field equalized to compensate for the spectral characteristics of the speaker setup. The measurements took place at two different locations: the Spatial Auditory Research Lab, a 4.5 by 3.5 by 2.5 m semianechoic room, and the Dolan Studio live-room, a 9 by 4.6 by 3 m sound treated space. For both cases 5 different Genelec 8030a speakers were positioned in a spiral configuration. The subjects were seated on a rotating stool with adjustable height, at the center of the spiral, at a distance of 1m from the speakers. All data was measured using the blocked meatus tech- 213 ICAD 2013 aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER FUNCTION MEaSUREMENTS The 19th International Conference on Auditory Display (ICAD-2013) July 6–10, 2013, Lodz, Poland 3. SPECTRAL VARIATION IN HRTF SETS 3.1. Data post processing and variation estimation The database contains 256-tap HRIRs in the form of minimum phase filters with incorporated ITD information. The responses are low-pass filtered with a cosine window at 20000 Hz to eliminate high frequency content. For the purposes of this analysis, the length of the HRIRs was shortened to 1.5ms, to include only the pinnae responses, and the amplitude of each filter set was normalized to the left-right ear maximum, to eliminate the impact of overall amplitude differences between datasets on the intended analysis. All binaural filters were band-limited between 500 Hz and 16000 Hz using a rectangular window. In order to reduce the weight of high frequency content on the results, signals were averaged across 12th octave bands. Ipsilateral and contralateral content data was stored independently, and analyzed separately. The spectral variation between HRTF sets was calculated per 12th octave band. When the differences between all possible sets, which belonged to the same subject, were calculated, the frequency-band-dependent results were sorted and averaged out, to indicate the average increase in dB magnitude difference across all repeated measurements. Changes in the average spectral variations were then studied as a function of frequency, proximity to the sound source (ipsilateral or contralateral filters), and location. 3.2. Spectral variation as a function of frequency For this part of the analysis variation in dB magnitude across HRTF sets is studied exclusively as a function of frequency. Changes are observed across binaural filters from all available azimuth elevation locations, and all subjects. Results are reported at the median value, the 75th percentile, at + 2.7 σ and at the maximum value. Figure 1 shows a graphical report of the results. Below 1.5 kHz the median spectral variations are around 1.5 dB, the 75th percentiles 2.5 to 3.5 dB, and the upper boundaries 5.0 to 7.0 dB. From 1.5 kHz to 6 kHz the medians are between 2.5 dB and 4.0 dB, the 75th percentiles from 3.5 to 6.5 dB, and the upper boundaries from 8.0 to 16.0 dB. For frequencies above 6 kHz there is an even larger increase in spectral variation. Median 214 20 15 10 5 15868 12862 10211 8106 6436 5110 4057 3221 2559 2033 1616 1283 1020 812 0 647 The capturing procedure of the collection was designed to explore variability in HRTF sets while maintaining a high degree of accuracy in the measurement process. This was achieved by introducing changes in five otherwise controlled conditions: the space where the measurements took place, the speaker setup, the removal and readjustment of the miniature microphones between measurements of different datasets, the degree of precision in monitoring the alignment of the subjects, and the involvement of 2 moderators with different levels of expertise in the HRTF measurement process. A more detailed description of the database and its acquisition process can be found in [16]. 25 515 2.2. Variability in the data 30 Variation in dB nique with custom-made miniature binaural microphones using Sennheiser’s KE − 4 capsules. For 4 of the HRTF sets the positioning of each subject was monitored with the use of laser pointers, while for the remaining the Polhemus Liberty magnetic tracker was employed. The HRTF filters were captured in scanIR [15] with 2 different one-second excitation signals -a Maximum Length Sequence (MLS), and Golay codes-, sampled at 48000 Hz. Center Frequencies in Hz Figure 1: Box plot of the frequency dependent spectral variations. Box edges indicate the 25th and 75th percentiles and the red line marks the median. Variations are also marked at +2.7σ (black stem), and at the maximum value (red cross). values are around 5.3 dB, 75th percentiles are between 9.0 dB and 11.0 dB, and upper boundaries vary from 18.0 dB to 23.0 dB. As far as maximum variations are concerned below 5 kHz they range between 7.0 dB and 16.9 dB, while for frequency bands above that they raise up to 30.0 dB. Deviations of up to 20 dB have been reported before [14] for dummy head measurements conducted in conditions where unwanted movements, body alignment, and microphone placements were well controlled. The increase of up to 10 dB in our results can be expected since data is collected on human subjects. 3.3. Spectral variation as a function of distance from the ears to the sound source In this part spectral variations are analyzed separately for ipsilateral and contralateral filters across frequency. Results are presented at the same distribution points as above (Figure 2). In the ipsilateral ear filter-banks, for frequencies below 1.5 kHz the median variations are around 1.0 dB, the 75th percentiles around 2.0 dB, and the upper boundaries from 3.0 to 5.0 dB. Between 1.5 kHz and 6 kHz the medians vary between 1.5 dB and 2.9 dB, 75th percentiles between 3.5 dB and 5.0 dB, and the upper boundaries between 5.0 dB and 11.0 dB. For frequencies above 6 kHz, median values are around 3.5 dB, 75th percentiles are between 6.0 dB and 8.0 dB, and upper boundaries vary from 11.5 dB to 17.0 dB. Looking at the maximum variations, octave bands below 5 kHz reach maximum values between 5.0 and 12.75 dB, while high frequency content between 16.0 dB and 23.0 dB. A comparison of the just reported ipsilateral content results to the contralateral spectral variations yields striking similarities between to two binaural filter groups. Both demonstrate almost identical variation patterns, with the contralateral ones being higher up to 0.5 dB for the vast majority of the times. Yet, this is not the case for the maximum variations. Especially for frequency bands above 5 kHz differences between the two groups increase by up to approximately 4.0 dB. However, given that the magnitude of contralateral filters is lower that that of ipsilateral, and the noise floor is significantly higher, it is questionable whether contralateral spectral variations can be equally perceived, and are of the same importance to human listeners. ICAD 2013 Wednesday, july 10 • The 19th International Conference on Auditory Display (ICAD-2013) Ipsilateral Content Filters 30 25 20 15 10 5 15868 12862 10211 8106 6436 5110 4057 3221 2559 2033 1616 1283 1020 812 647 515 Variation in dB 0 Contralateral Content Filters 30 25 20 SESSION 7: HRTF and Spatial audio July 6–10, 2013, Lodz, Poland significantly, especially when the source moves outside of the head shadow (above 30◦ in elevation). An example of this behavior from the MARL databse can be viewed in the right column of Figure 4. The top three plots depict the rate of change in spectral variation across frequency, for azimuth angle 90◦ , at elevations 30◦ , 0◦ , and −30◦ . For frequencies from 1 kHz to 5 kHz one can see how the rate of increase in the spectral variation distribution changes as a function of elevation, being very low for high elevations and higher for upper elevations. Another observation is that, irrespectively of elevation, frontal plane azimuth angles yield higher spectral variations across frequency content above 5 kHz, than interaural or rear plane ones. This can be seen by comparing the plots in the the left to those in the right column of Figure 4. The left column contains azimuth location 0◦ , at elevations 30◦ , 0◦ , and −30◦ , and azimuth location 50◦ on the horizontal plane, while the right column contains azimuth locations 90◦ , at elevations 30◦ , 0◦ , and −30◦ , and 130◦ on the horizontal plane. For example, by looking at frequencies above 5 kHz in just the bottom two plots, one can see a difference in the maximum variation between 4.0 dB and 8.0 dB. 4. SUBJECTIVE EVALUATION OF THE HRTF DATABASE 15 4.1. Experiment overview 10 5 15868 12862 10211 8106 6436 5110 4057 3221 2559 2033 1616 1283 1020 812 647 515 0 Center Frequencies in Hz Figure 2: Box plots of the frequency dependent spectral variations across ipsilateral and contralateral HRTFs. Box edges indicate the 25th and 75th percentiles and the red line marks the median. Variations are also marked at +2.7σ (black stem), and at the maximum value (red cross). The purpose of this experiment was to perceptually assess the spatial quality of the repeated measurements given three criteria: externalization perception, front / back discrimination, and up / down discrimination. This was necessary in order to verify whether, in spite of their spectral and ITD variations, individually measured HRTF sets maintained spatial accuracy. A similar experimental design was first introduced by Roginska et al. in [5]. A bank to 10 HRTF datasets was created for each of the 4 subjects in the study, which consisted of their personally measured sets. All binaural filters were reduced to 128-tap minimum-phase responses with their corresponding interaural time delays. The test signals used were 0.5 second pink noise bursts. An overview of the experimental procedure follows. 3.4. Spectral variation as a function of location In the last part of the analysis spectral variations are observed as a function of location. For every azimuth - elevation pair, average distribution curves are computed in the dB magnitude across all 12th octave bands. Examples of the combined results for a single location can be viewed in Figure 4, where each of the subplots demonstrates the degree of change (from minimum to maximum) in the spectral variations across frequencies. Results indicate that there is a relationship between the rate of increase in spectral variation and the elevation angle, for frequencies between 1 kHz and 5 kHz. In this frequency range highelevation distribution vectors are relatively flat, and their maximum values stay within the boundaries of the 75th percentiles, as reported above for ipsilateral content filters. However, as elevations decrease variations become less flat and the increase more drastic, reaching at −30◦ elevations the previously reported upper boundary rates of spectral variation rates. Similar observations have been reported by Wersényi in his analysis of repeated HRTF measurements on dummy head mannequins [17]. Results have shown that as the elevation of a sound source increases spectral variations become flatter, and re-measurement accuracy improves R Figure 3: The MATLAB interface used in the study 4.2. Experimental procedure The experiment was conducted in the Spatial Auditory Research Lab at New York University. Stimuli were presented to all subjects through the Sennheiser HD650 open headphones. The graphical interphase used for playback and collection of the user responses R was designed in MATLAB 2010 b . (Figure 3). 215 ICAD 2013 The 19th International Conference on Auditory Display (ICAD-2013) July 6–10, 2013, Lodz, Poland aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER FUNCTION MEaSUREMENTS The 19th International Conference on Auditory Display (ICAD-2013) July 6–10, 2013, Lodz, Poland Azimuth Angle: 0 Elevation Angle: 30 Azimuth Angle: 90 Elevation Angle: 30 10 10 Azimuth Angle: 0 Elevation Angle: 30 Azimuth Angle: 90 Elevation Angle: 30 8 10 810 6 8 6 8 4 6 4 6 20 20 18 4 2 18 Azimuth Angle: 0 Elevation Angle: 0 10 11 46 11 0 46 0 15 8 15 86 68 8 51 5 Azimuth Angle: 90 Elevation Angle: 0 16 16 Azimuth Angle: 90 Elevation Angle: 0 10 10 8 57 35 57 35 81 06 81 06 5 51 5 51 51 5 Azimuth Angle: 0 Elevation Angle: 0 20 33 20 33 28 71 28 71 40 57 40 57 2 72 5 72 5 10 20 10 20 14 40 14 40 20 33 20 33 28 71 28 71 40 57 40 57 57 35 57 35 81 06 81 06 11 46 0 11 46 0 15 86 8 15 86 8 2 4 72 5 72 5 10 20 10 20 14 40 14 40 2 10 8 8 8 14 14 6 4 4 2 2 2 2 1212 51 515 5 72 725 5 10 2 10 0 20 14 1440 40 4 4 10 Azimuth Angle: 90 Elevation Angle: −30 Azimuth Angle: 0 Elevation Angle: −30 Azimuth Angle: 0 Elevation Angle: −30 10 10 57 5735 35 81 8106 06 11 11460 46 0 15 15868 86 8 6 20 2033 33 28 7 28 1 71 40 4057 57 6 51 5 51 5 72 5 72 5 10 20 10 20 14 40 14 40 20 33 20 33 28 71 28 71 40 5 40 7 57 57 35 57 35 81 06 81 06 11 4 11 60 46 0 15 8 15 68 86 8 10 Azimuth Angle: 90 Elevation Angle: −30 10 10 8 8 8 8 6 8 6 6 8 6 4 4 4 4 2 2 6 10 8 86 15 8 86 0 15 6 46 11 0 46 10 06 8 81 11 5 73 7 05 35 5 57 1 40 28 57 4 87 71 2 03 33 2 44 3 0 0 02 40 1 20 4 Azimuth Angle: 130 Elevation Angle: 0 10 8 10 8 8 6 8 6 4 4 6 2 4 Azimuth Angle: 130 Elevation Angle: 0 6 10 Azimuth Angle: 50 Elevation Angle: 0 6 14 10 20 1 5 7 25 5 5 15 51 8 86 0 46 011 86 15 46 11 Azimuth Angle: 50 Elevation Angle: 0 815 06 81 35 06 81 40 57 57 35 40 57 57 71 28 33 28 14 71 20 40 40 33 14 10 20 10 72 20 5 5 72 51 5 51 20 2 5 2 72 Spectral Variation Spectral Variation 6 4 4 2 2 2 15 86 8 0 11 46 06 81 35 57 57 40 28 71 33 20 40 14 20 68 15 8 0 46 11 06 81 5 57 3 40 57 71 28 33 20 14 40 10 5 20 0 72 5 51 10 72 51 5 5 8 86 0 46 15 8 2 Center Frequency in Hz 15 86 11 81 06 60 11 4 06 81 35 57 57 5 57 3 28 40 7 40 5 71 28 20 33 20 71 33 40 14 14 40 20 10 20 10 5 72 72 5 51 5 51 5 0 2 Center Frequency in Hz Figure 4: Rate of change in spectral variations as a function of location for frequencies 500 - 16000 Hz. The x-axis denotes center frequencies, the y-axis the indexes of the 10-point distribution vectors, and the color-coding the magnitudes of the spectral variations in dB. 8 different locations are depicted in the graph, azimuth angles 0◦ and 90◦ , across elevations ±30◦ and 0◦ , and azimuth angles 40◦ and 130◦ 4: on Rate the horizontal Figure of changeplane. in spectral variations as a function of location for frequencies 500 - 16000 Hz. The x-axis denotes center frequencies, the y-axis the indexes of the 10-point distribution vectors, and the color-coding the magnitudes of the spectral variations in dB. 8 different locations are depicted in the graph, azimuth angles 0◦ and 90◦ , across elevations ±30◦ and 0◦ , and azimuth angles 40◦ and - E) and were instructed to select all that met the stage-specific The 130◦ on theprocedure horizontalconsisted plane. of three stages designed to test the spatial quality of the involved HRTFs, rather than the perceived localization accuracy. In each of the stages the trials were created by randomly selected datasets convolved with the test signal. The procedure consisted three stages test the(A For each trial subjects were of presented with 5designed different to intervals spatial quality of the involved HRTFs, rather than the perceived localization accuracy. In each of the stages the trials were created by randomly selected datasets convolved with the test signal. For each trial subjects were presented with 5 different intervals (A 216 criterion. HRTF sets were presented 5 times in each of the three experimental stages. Stages 2 and 3 (front / back and up / down discrimination) only used binaural filters that were selected at least - E) and instructed to select all that 60% of were the time in the externalization task.met the stage-specific criterion. HRTF sets were presented 5 times in each of the three experimental stages. Stages 2 and 3 (front / back and up / down discrimination) only used binaural filters that were selected at least 60% of the time in the externalization task. ICAD 2013 The The 19th 19th International International Conference Conference onon Auditory Auditory Display Display (ICAD-2013) (ICAD-2013) Subject Subject AA Wednesday, july 10 Personalized Personalized HRTF HRTF Evaluation Evaluation 100 100 8080 8080 6060 6060 4040 4040 2020 2020 HRTF Evaluation Rate HRTF Evaluation Rate 100 100 0 0 P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10 P10 • SESSION 7: HRTF and Spatial audio July July 6–10, 6–10, 2013, 2013, Lodz, Lodz, Poland Poland Subject Subject HH Extern. Extern. Fr /FrB/ B UpUp / Dn / Dn 0 0 P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10 P10 Subject Subject SS Subject Subject MM 100 100 100 100 8080 8080 6060 6060 4040 4040 2020 2020 0 0 P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10 P10 0 0 P1P1 P2P2 P3P3 P4P4 P5P5 P6P6 P7P7 P8P8 P9P9 P10 P10 HRTF HRTF Dataset Dataset IDID Figure Figure5:5:Evaulation Evaulationofofthetheindividually individuallymeasured measuredHRTF HRTFsets setsofofthetheMARL MARLdatabase, database,forforeach eachcriterion criterionseparately, separately,across acrossallall4 4subjects. subjects. Bar Bar colors colors represent represent thethe 3 criteria 3 criteria ofof thethe study, study, and and identifiers identifiers P1P1 - P10 - P10 thethe 1010 individually individually measured measured HRTFs HRTFs forfor each each subject. subject. More More specifically, specifically,stage stage 1 of 1 of thethe experiment experiment tested tested thethe ability ability ofofeach eachHRTF HRTFsetsettotoconvey conveyexternalized externalizedimages. images.Each Eachinterval interval consisted consistedofofa monophonic a monophonicreference referencesignal, signal,followed followedbybya series a series ofof5 5signals signalsspatialized spatializedat atvarious variouslocations locationsonon5 5elevations elevationsfrom from ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ −30 −30 toto+30 +30 . .The Theazimuth azimuthlocations locationsused usedwere were±30 ±30 , ±60 , ±60 , , ◦ ◦ ◦ ◦ ◦ ◦ ±90 ±90, ±120 , ±120, and , and ±150 ±150. . InInstage stage2 2subjects subjectswere wereasked askedtotoselect selectintervals intervalsforforwhich which they theycould coulddiscriminate discriminatefront frontfrom fromback. back.Each Eachinterval intervalconsisted consisted ofofa monophonic a monophonicreference referencesignal, signal,followed followedbyby3 3pairs pairsofofsignals, signals, alternating alternating onon thethe horizontal horizontal plane plane onon azimuths azimuths locations locations at at oppoopposite site sides sides ofof thethe cone cone ofof confusion. confusion.The The utilized utilized azimuthal azimuthal angles angles ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ ◦ were were 0◦0, ◦±30 , ±30 , ±60 , ±60 , ±120 , ±120 , ±150 , ±150 , and , and 180 180 , , Stage Stage 3 of 3 of thethe experiment experiment presented presented subjects subjects with with anan upup / down / down discrimination discrimination task. task.Stimuli Stimuli consisted consisted ofof a monophonic a monophonic reference reference ◦ ◦ signal, signal,followed followedbyby3 3pairs pairsofofsignals signalsalternating alternatingbetween between±30 ±30 elevations. elevations.Users Userswere wereinstructed instructedtotoselect selectallallintervals intervalsforforwhich which they they perceived perceived a change a change inin elevation. elevation.The The presented presented azimuth azimuth localoca◦ ◦ ◦ ◦ ◦ ◦ tions tions were were 0◦0, ◦±30 , ±30 , ±60 , ±60 , and , and ±90 ±90 . . 4.3. 4.3.Results Results Figure Figure5 5presents presentsthethesubjects’ subjects’assessments assessmentsofoftheir theirpersonalized personalized sets, sets,forforevery everyone oneofofthethethree threeexperimental experimentalcriteria. criteria.InInterms termsofof externalization externalizationperception, perception,allallHRTFs HRTFswere wererated ratedvery veryhighly highlyinin their theirability abilitytotocreate createconvincing convincingauditory auditoryimages imagesoutside outsideofofthethe subjects’ subjects’ heads, heads, receiving receiving a selection a selection rate rate ofof 8080 - 100%. - 100%. These These rates, rates, however, however, dropped dropped byby 2020 - 40% - 40% forfor thethe front front / back / back discrimination discriminationcase. case.ByBylooking lookingat atthethegraph, graph,weweseeseethat thatfront front/ / back back discrimination discrimination was was thethe weakest weakest rated rated task task across across allall subjects. subjects. This Thisbehavior behaviorwas wasrather ratherexpected, expected,asasthetheexperiment experimentconsisted consisted only only ofof static static (not (not following following head head movement) movement) auditory auditory cues, cues, which which areare notnot strong strong enough enough toto contradict contradict visual visual components components reporting reporting nono potential potential sound sound sources sources inin front front ofof thethe subjects. subjects.However, However, despite despite this thisdrop, drop,allallfilter filtersets setsremained remainedat atororabove abovethethe60% 60%evaluation evaluation margin, margin,with withthetheexception exceptionofofmeasurement measurement8 8ininsubject subjectH Hwhich which received received 20% 20% preference preference only only onon thethe given given task. task. Individually Individually measured measured HRTFs HRTFs received received a 20% a 20% increase increase inin thethe ratings ratings ofof thethe upup / down / down discrimination discrimination task, task, compared compared toto thethe front front / back / backstage. stage.However, However,these thesescores scoreswere werestill stillmost mostofofthethetimes times lower lowerthan thanininexternalization externalizationcriterion criterioncase. case.This Thisbehavior behaviorwas was unanimous unanimousacross acrosssubjects subjectsand andforforallalldatasets, datasets,with withvery veryfew fewex-exceptions, ceptions, like like HRTF HRTF 4 in 4 in subject subject M,M, oror HRTF HRTF 1010 inin subject subject S.S. InIngeneral, general,allallsubjects subjectsindicated indicateda strong a strongpreference preferencetowards towards individually individuallymeasured measuredHRTFs, HRTFs,bybyselecting selectingthem themat atleast least60% 60%ofof thethe time time inin most most cases. cases.AllAll binaural binaural filters filters were were consistently consistently evalevaluated uatedabove abovechance chanceasassuccessful successfulmeans meansforforrealistic realisticsound soundspaspatialization, tialization,regardless regardlessofoftheir theirpotential potentialvariations. variations.Such Suchfindings findings support support thethe known known ability ability ofof thethe human human auditory auditory system system toto accept accept / disregard / disregard variations variations inin HRTF HRTF datasets datasets within within certain certain boundaries. boundaries. 5. 5.DISCUSSION DISCUSSION AND AND CONCLUSIONS CONCLUSIONS This Thisstudy studyinvestigated investigatedthetherange rangeofofspectral spectralvariation variationbetween between thth HRTF HRTF datasets datasets originating originating from from thethe same same subject, subject, across across 1212 oc-octave tave bands bands from from 500 500 toto 16000 16000 Hz. Hz.Variations, Variations, observed observed asas a funca function tionofoffrequency, frequency,distance distanceofofthetheears earstotothethesound soundsource, source,and and thth location, location,arearereported reportedat atthethemedian medianvalues, values,thethe7575 percentile, percentile, and and at at ++ 2.72.7 standard standard deviations deviations from from thethe mean. mean.Maximum Maximum values values areare also also provided. provided. It It is is anan extension extension ofof Riederer’s Riederer’s [12] [12] work work onon thethe repeatability repeatability ofofHRTF HRTFmeasurements. measurements.That Thatresearch researchpresented presenteda acase caseforforthethe most most accurate accurate method method ofof capturing capturing binaural binaural filters, filters,byby identifying identifying thetheimpact impactofofdifferent differentcontrolled controlledconditions conditionsononthethemeasurement measurement outcome, outcome, and and thethe appropriate appropriate experimental experimental adjustments adjustments that that could could minimize minimizeit.it.The Thesignificance significanceofofmodifying modifyingeach eachofofthethecontrol control conditions, conditions,quantified quantifiedasasthethechange changeininbinaural binauralfilters, filters,was wasanaanalyzed lyzed separately. separately.However However inin this this work work thethe impact impact ofof allall controlled controlled conditions conditions is is studied studied collectively, collectively, asas inin regular regular HRTF HRTF measurement measurement 217 ICAD 2013 [2] S. Xu, Z. Li, and G. Salvendy, “Improved method to individualize head-related transfer function using anthropometric measurements,” Acoustical Science and Technology, vol. 29, no. 6, pp. 388–390, 2008. [3] K. J. Faller, A. Barreto, and M. Adjouadi, “Decomposition of Head-Related Transfer Functions into Multiple Damped aNaLYSIS OF THE SPECTRaL VaRIATIONS IN REPEaTED HEaD-RELaTED TRaNSFER FUNCTION MEaSUREMENTS The 19th International Conference on Auditory Display (ICAD-2013) July 6–10, 2013, Lodz, and Delayed Sinusoidals,” in Advanced Techniques in Poland Computing Sciences and Software Engineering, K. Elleithy, Ed. Springer Netherlands, 2010, pp. 273–278. routines more than one of them may occur at a time, affecting the data. The analysis is based on a collection of 40 HRTF datasets on The 19th International Conference onon Auditory (ICAD-2013) azimuthal grid, across 4 subjects (10 sets per subject), a 10◦ Display ◦ ◦ 5 elevations from −30 to +30 . It is expected that the more complete data collection of this work will give a broader overview of the variability theofHRTF a wider rangethe of routines more thaninone them spectrum may occuracross at a time, affecting locations, given a greater number of repeated measurements. data. Observations inter-HRTF variations frequency, The analysis isofbased on a collection of across 40 HRTF datasets ear on ◦ (ipsilateral or contralateral), and azimuth-elevation location, can 4 subjects (10 sets per subject), on a 10 azimuthal grid, across ◦ ◦ used to create the 5beelevations frommore −30dynamic to +30ways . Itofisobjectively expected evaluating that the more quality of data non acoustically For overview example, complete collection ofmeasured this work binaural will givefilters. a broader they be applied definition of thresholds the range distance of thecan variability in to thetheHRTF spectrum across a for wider of functions evaluatenumber HRTFs. Even though such thresholdlocations, used giventoa greater of repeated measurements. ing will still be not we believe that it will ear ofObservations ofcognitively inter-HRTFdriven, variations across frequency, fer a better approximation of the perceptual HRTF evaluation pro(ipsilateral or contralateral), and azimuth-elevation location, can cess, than the normally used metrics, as the mean squared be used to create more dynamic ways ofsuch objectively evaluating the error (MSE), which assume uniformity in the perceptual weights quality of non acoustically measured binaural filters. For example, of spectral frequency. they can bevariations applied toacross the definition of thresholds for the distance Knowledge the expected range of though variation in thresholdmeasured functions used toofevaluate HRTFs. Even such HRTF sets can offer new intuition on objective binaural filter ing will still be not cognitively driven, we believe that it willevalofuation. Several spatial audio researchHRTF fields,evaluation such as HRTF fer a better approximation of related the perceptual procustomization HRTFused modeling, which currently primarily rely cess, than the and normally metrics, such as the mean squared on perceptual evaluation tasks, could benefit that, by gainerror (MSE), which assume uniformity in the from perceptual weights ing a more variations robust method of frequency. assessing the quality of their designs. of spectral across ThisKnowledge research canofalso lead to morerange effective metrics for the expected of variation incomputing measured HRTF similarity, which be directly applied binaural to database matchHRTF sets can offer newcan intuition on objective filter evaling tasks. Finally, assessments of the accuracy of different uation. Several spatial audio related research fields, such as HRTF HRTF interpolation algorithms becomewhich more currently effective as filter evalucustomization and HRTFwill modeling, primarily rely ations will be done according to perceptual boundaries, rather than on perceptual evaluation tasks, could benefit from that, by gainstrict mathematical models. ing a more robust method of assessing the quality of their designs. work studying the database distributions. ThisFuture research canwill alsofocus lead toonmore effective metrics for computing This analysis will lead to a new representation of HRTFs will HRTF similarity, which can be directly applied to databasethat matchfavortasks. discrimination between sets belong toof dissimilar ing Finally, assessments of that the accuracy differentpeople, HRTF while at the same time promoting groupings of moreassimilar ones interpolation algorithms will become more effective filter evaluin the same class. Such a representation will have to be supported ations will be done according to perceptual boundaries, rather than by further subjectivemodels. evaluation studies, which will assure that the strict mathematical computed models do contradict cognitive assessments. Future work will not focus on studying the database distributions. This analysis will lead to a new representation of HRTFs that will favor discrimination between sets that belong to dissimilar people, 6. REFERENCES while at the same time promoting groupings of more similar ones F. G.class. Katz,Such “Boundary element method calculation of indiin[1] theB. same a representation will have to be supported vidualsubjective head-related transferstudies, function,” Journal of the that Acousby further evaluation which will assure the tical models Society do of not America, vol.cognitive 110, no.assessments. 5, pp. 2440–2455, computed contradict 2001. [2] S. Xu, Z. Li, and G. “Improved method to individ6. Salvendy, REFERENCES ualize head-related transfer function using anthropometric [1] B. F. G. Katz, “Boundary element method calculation vol. of indimeasurements,” Acoustical Science and Technology, 29, vidual head-related no. 6, pp. 388–390, transfer 2008. function,” Journal of the Acoustical Society of America, vol. 110, no. 5, pp. 2440–2455, [3] K. J. Faller, A. Barreto, and M. Adjouadi, “Decomposition 2001. of Head-Related Transfer Functions into Multiple Damped [2] and S. Xu, Z. Li, and G. Salvendy, “ImprovedTechniques method to in individDelayed Sinusoidals,” in Advanced Comualize using K. anthropometric puting head-related Sciences and transfer Softwarefunction Engineering, Elleithy, Ed. measurements,” Acoustical and Technology, vol. 29, Springer Netherlands, 2010,Science pp. 273–278. no. 6, pp. 388–390, 2008. [4] D. Zotkin, J. Hwang, R. Duraiswaini, and L. Davis, “HRTF [3] personalization K. J. Faller, A. Barreto, and M. Adjouadi, “Decomposition using anthropometric measurements,” in of Head-Related Transfer Functions into Multiple Damped IEEE Workshop on.Applications of Signal Processing to Auand and Delayed Sinusoidals,” Advanced in Comdio Acoustics, Inst. for in Adv. Comput.Techniques Studies, Maryland puting Sciences Engineering, K. Elleithy, Univ. College and Park,Software MD, USA: IEEE, October 2003, Ed. pp. Springer Netherlands, 2010, pp. 273–278. 157–160. [4] D. Zotkin, J. Hwang, R. Duraiswaini, and L. Davis, “HRTF personalization using anthropometric measurements,” in IEEE Workshop on.Applications of Signal Processing to Au218 dio and Acoustics, Inst. for Adv. Comput. Studies, Maryland Univ. College Park, MD, USA: IEEE, October 2003, pp. 157–160. [5] D. A. Zotkin, Roginska, T. Santoro, and G. Wakefield, “Stimulus[4] J. Hwang, R. Duraiswaini, and L. Davis, “HRTF dependent HRTFusing preference,” in 129th measurements,” Audio Engineering personalization anthropometric in Society Convention, San Francisco, CA, USA, 2010. to AuIEEE Workshop on.Applications of Signal Processing July 6–10,Studies, 2013, Lodz, Poland andDurant, Acoustics, Inst. for Adv. Comput. [6] dio E. A. S. Member, and G. H. Wakefield, Maryland “Efficient Univ. College Park, MD, USA: IEEE, October 2003, Appp. Model Fitting Using a Genetic Algorithm : Pole-Zero 157–160. proximations of HRTFs,” Audio, vol. 10, no. 1, pp. 18–27, [5] 2002. A. Roginska, T. Santoro, and G. Wakefield, “Stimulusdependent HRTF preference,” in 129th Audio Engineering [7] A. Kulkarni and H. S. Colburn, “Infinite-impulse-response Society Convention, San Francisco, CA, USA, 2010. models of the head-related transfer function,” The Journal of [6] E. Durant, S. Member, and G. vol. H. Wakefield, theA. Acoustical Society of America, 115, no. 4,“Efficient pp. 1714 Model – 1728,Fitting 2004. Using a Genetic Algorithm : Pole-Zero Approximations of HRTFs,” Audio, vol. 10, no. 1, pp. 18–27, [8] H. Hu, L. Chen, and Z. Y. Wu, “The Estimation of Person2002. alized HRTFs in Individual VAS,” in 4th International Con[7] A. Kulkarni and H. Computation, S. Colburn, “Infinite-impulse-response ference on Natural vol. 1. Ieee, 2008, pp. models of the head-related transfer function,” The Journal of 203–207. the Acoustical Society of America, vol. 115, no. 4, pp. 1714 [9] T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli, “Interpola– 1728, 2004. tion of head related transfer functions considering acoustics,” [8] H. Hu,Audio L. Chen, and Z. Y.Society Wu, “The Estimation of Person118th Engineering Convention, 2005. alized HRTFs in Individual VAS,” in 4th International Con[10] Z. Xiao-Li and X. Bo-Sun, “Spatial characteristics of headference on Natural Computation, vol. 1. Ieee, 2008, pp. related transfer function,” Chinese Physics Letters, vol. 22, 203–207. no. 5, pp. 1166 – 1169, 2005. [9] T. Ajdler, C. Faller, L. Sbaiz, and M. Vetterli, “Interpola[11] P. M. Hofman, J. G. Van Riswick, and A. J. Van Opstal, “Retion of head related transfer functions considering acoustics,” learning sound localization with new ears.” Nature neuro118th Audio Engineering Society Convention, 2005. science, vol. 1, no. 5, pp. 417–421, September 1998. [10] Z. Xiao-Li and X. Bo-Sun, “Spatial characteristics of head[12] K. A. J. Riederer, “Repeatability analysis of head-related related transfer function,” Chinese Physics Letters, vol. 22, transfer function measurements,” in 105 thAudio Engineerno. 5, pp. 1166 – 1169, 2005. ing Society Convention, 9 1998. [11] P. M. Hofman, J. G. Van Riswick, and A. J. Van Opstal, “Re[13] ——, “Head-related transfer function measurements, master learning sound localization with new ears.” Nature neurothesis,,” Ph.D. dissertation, Helsinki University of Technolscience, vol. 1, no. 5, pp. 417–421, September 1998. ogy, 1998. [12] K. A. J. Riederer, “Repeatability analysis of head-related [14] G. Wersényi and A. Illényi, “Differences in dummy-head transfer function measurements,” in 105 thAudio EngineerHRTFs caused by the acoustical environment near the head,” ing Society Convention, 9 1998. Electronic Journal of Technical Acoustics, vol. 1, pp. 1–15, [13] ——, “Head-related transfer function measurements, master 2005. [Online]. Available: http://www.ejta.org thesis,,” Ph.D. dissertation, Helsinki University of Technol[15] B. Boren and A. Roginska, “Multichannel Impulse Response ogy, 1998. Measurement in MATLAB,” in 131st Audio Engineering So[14] G. and New A. Illényi, “Differences in dummy-head cietyWersényi Convention, York, NY, October 2011. HRTFs caused by the acoustical environment near the head,” [16] A. Andreopoulou, A. Roginska, and H. Mohanraj, “A Electronic Journal of Technical Acoustics, vol. 1, pp. 1–15, Database of Repeated Head-Related Transfer Function Mea2005. [Online]. Available: http://www.ejta.org surements,” in 19th International Conference on Auditory [15] B. Boren(ICAD and A.2013), Roginska, “Multichannel Impulse Response Display Lodz, Poland, July 6-10 2013. Measurement in MATLAB,” in 131st Audio Engineering So[17] G. Wersényi, “Directional Properties of the Dummy-Head ciety Convention, New York, NY, October 2011. in Measurement Techniques based on Binaural Evaluation,” [16] A. Andreopoulou, A. Roginska, H. Mohanraj, Journal of Engineering, Computingand & Architecture, vol.“A 1, Database Repeated no. 2, pp. of 1–15, 2007. Head-Related Transfer Function Measurements,” in 19th International Conference on Auditory Display (ICAD 2013), Lodz, Poland, July 6-10 2013. [17] G. Wersényi, “Directional Properties of the Dummy-Head in Measurement Techniques based on Binaural Evaluation,” Journal of Engineering, Computing & Architecture, vol. 1, no. 2, pp. 1–15, 2007. [17] G. in Jou no.
© Copyright 2026 Paperzz