Audio Engineering Society Convention Paper
Presented at the 124th Convention
2008 May 17–20 Amsterdam, The Netherlands

The papers at this Convention have been selected on the basis of a submitted abstract and extended precis that have been peer reviewed by at least two qualified anonymous reviewers. This convention paper has been reproduced from the author's advance manuscript, without editing, corrections, or consideration by the Review Board. The AES takes no responsibility for the contents. Additional papers may be obtained by sending request and remittance to Audio Engineering Society, 60 East 42nd Street, New York, New York 10165-2520, USA; also see www.aes.org. All rights reserved. Reproduction of this paper, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

Room-dependent Preference of Virtual Surround Sound

Frederick S. Scott¹ and Agnieszka Roginska²
¹ Music Technology, New York University, New York, NY, USA [email protected]
² Music Technology, New York University, New York, NY, USA [email protected]

ABSTRACT

A common method for simulating surround sound over headphones, so-called virtual surround sound, is to convolve the audio content with binaural cues. Room information is often included. This paper examines whether using HRTFs with room impulse responses customized to the room the listener is in enhances the listening experience. Perceptual experiments were conducted to evaluate how the room in which listeners were seated affects their preference for the processing technology.

1. INTRODUCTION

In the past, digital signal processing was challenging and expensive, and it required specialized chips or dedicated processors. Now nearly everyone has a computer that can handle numerous computationally taxing processes.
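As the abstract describes, virtual surround sound is produced by convolving program material with binaural cues. A minimal sketch of that operation follows, assuming each loudspeaker channel has a measured pair of left-ear and right-ear impulse responses; each channel is convolved with its pair and the results are summed per ear. The function names `convolve` and `render_binaural` are hypothetical and do not come from the paper. Direct convolution is used only for readability; practical systems use fast (FFT-based) convolution.

```python
# Sketch of headphone virtualization: each loudspeaker feed is convolved with
# the BRIR pair measured for that loudspeaker, and the results are summed per ear.
# Function names are hypothetical; direct convolution is used for clarity only.


def convolve(x, h):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue  # skip silent input samples
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(channels, brirs):
    """Render loudspeaker channels to a two-ear (left, right) signal.

    channels: dict mapping a channel name (e.g. "C", "L") to a list of samples
    brirs:    dict mapping the same names to (left_ear_ir, right_ear_ir)
    """
    length = max(len(x) + max(len(brirs[name][0]), len(brirs[name][1])) - 1
                 for name, x in channels.items())
    left = [0.0] * length
    right = [0.0] * length
    for name, x in channels.items():
        ir_left, ir_right = brirs[name]
        for ear, ir in ((left, ir_left), (right, ir_right)):
            for k, value in enumerate(convolve(x, ir)):
                ear[k] += value  # sum every channel's contribution at this ear
    return left, right
```

For example, `render_binaural({"C": [1.0, 0.0], "L": [0.0, 1.0]}, {"C": ([0.5], [0.5]), "L": ([1.0, 0.2], [0.3])})` returns `([0.5, 1.0, 0.2], [0.5, 0.3, 0.0])`: the center channel reaches both ears equally, while the left channel arrives louder, and with a short tail, at the left ear.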
The ownership of devices that can handle DSP operations has reached critical mass, and MP3 players such as the iPod have created a generation of listeners for whom listening through earbuds is commonplace. Although headphone listening is very unlikely to totally supplant speaker listening in the near future, more attention is being given to enhancing and improving headphone listening.

There are numerous headphone enhancement technologies on the market. These range from hardware-based solutions, to binaural cue replication, to fast convolution algorithms that focus specifically on simulating reverberation in real time. Previous work demonstrates several methods of simulating surround sound over headphones [1] [2] [3]. Commercial examples include Dolby Headphone, Yamaha Silent Cinema, and Creative Labs CMSS-3D. The goal of most of these systems is to replicate a surround sound speaker setup. Ideally, a listener would not be able to tell the difference between a speaker configuration and its virtual surround sound imitation. Among the key elements for this are head-related transfer functions (HRTFs), interaural time differences, interaural level differences, and reverberation. All of these features can be individually measured and combined. Existing technologies typically employ fixed processing, with no presets, or only a limited number of them, for listener (HRTF) and room response selection.

The goal of this paper is to investigate whether there is a correlation between the preference for a virtual surround sound processing method and the room a person is physically located in. A listener seated in a large room while listening to virtual surround sound that uses the acoustics of a small room may perceive the environment as unnatural. Customizing the binaural room responses may therefore lead to an improved virtual surround sound representation. This paper focuses specifically on evaluating subjects' preference for virtual surround sound presented over headphones, using binaural room impulse responses (BRIRs) that either match or do not match the room the subject is seated in.

2. METHODOLOGY

Psychoacoustic experiments were designed and performed to test subjective preference for virtual surround sound reproduction technologies in three studios of the Music Technology program at New York University. The three studios were selected for their general acoustical quality and their proximity to one another. Room A is 2.6 m by 4.3 m by 2.43 m and is used for mastering and mixing. Room B is 5.3 m by 3.9 m by 2.59 m and is used for mixing and for creating multimedia projects. Room C is 9.4 m by 9.4 m by 2.59 m and is used as a classroom. The RT60 values of the three rooms are 0.48 s (room A), 0.44 s (room B), and 0.63 s (room C).

[Figure 1. Room diagrams (Room A, Room B, Room C)]

Five binaural room impulse responses were measured in each of the three rooms, one per speaker position. The existing surround speaker setups were used in rooms A and B: Genelec 1030A monitors in the former and Genelec 1031A monitors in the latter. Impulse response measurements in room C were taken with AuSIM's AuProbe speaker, which has a 3-inch driver. The speaker was placed at positions representing a surround sound setup: 0° for the center channel, ±30° for the front channels, and ±110° for the rear channels. A five-second sweep was used and repeated five times to improve the signal-to-noise ratio. The responses for each speaker position were recorded binaurally with a Neumann KU 100 dummy head and an RME Fireface 400 audio interface at 96 kHz. The dummy head was placed in the sweet spot of the speaker setup in each room.
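The sweep measurement can be sketched as follows. The paper does not state the sweep type or frequency range; this sketch assumes the exponential sine sweep of Farina's swept-sine technique [4], uses hypothetical helper names, and picks small parameters (4 kHz sample rate, 0.25 s sweep) so that pure-Python convolution stays fast. It also assumes the five repetitions were time-aligned and averaged before deconvolution; with uncorrelated noise that gains roughly 10·log10(5) ≈ 7 dB of signal-to-noise ratio. Deconvolution is performed by convolving the recording with a time-reversed, amplitude-compensated copy of the sweep, so the direct sound of a zero-delay system lands at index `len(sweep) - 1`.

```python
import math

# Sketch of swept-sine IR measurement (assumed: exponential sweep per Farina [4]).
# Helper names and parameter values are hypothetical.


def exp_sweep(f1, f2, duration, fs):
    """Exponential sine sweep whose instantaneous frequency rises from f1 to f2 Hz."""
    n = int(round(duration * fs))
    w1 = 2 * math.pi * f1
    rate = math.log(f2 / f1)  # ln(w2 / w1)
    return [math.sin(w1 * duration / rate * (math.exp(i / n * rate) - 1.0))
            for i in range(n)]


def inverse_filter(sweep, f1, f2):
    """Time-reversed sweep with an envelope falling 6 dB/octave toward low
    frequencies, so that sweep convolved with this filter is spectrally flat."""
    n = len(sweep)
    rate = math.log(f2 / f1)
    return [sweep[n - 1 - i] * math.exp(-i / n * rate) for i in range(n)]


def average_takes(takes):
    """Average time-aligned repetitions; for K takes with uncorrelated noise
    the SNR improves by roughly 10*log10(K) dB."""
    return [sum(samples) / len(takes) for samples in zip(*takes)]


def convolve(x, h):
    """Direct-form linear convolution (slow; use FFT-based convolution at 96 kHz)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def deconvolve(recording, inverse):
    """Estimate the impulse response; zero acoustic delay maps to index len(inverse) - 1."""
    return convolve(recording, inverse)
```

With the 5 s, 96 kHz sweeps used in the paper, an FFT-based routine such as `scipy.signal.fftconvolve` would replace the direct-form `convolve` here.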
The impulse responses were deconvolved from the recordings in MATLAB using scripts written by the author, based on the work of Farina [4].

Two audio samples were selected for the listening test with surround ambience in mind. The first was an excerpt of a Haydn string quartet taken from the THX demo disc II. The second was taken from "The Great Pagoda of Funn" by Donald Fagen, from his 2006 album Morph the Cat. The 5.1 AC-3 files from the DVDs were decoded and demultiplexed with Apple's Compressor software and then trimmed to 10-second excerpts. The three BRIR sets were downsampled to 48 kHz and convolved with the two excerpts in MATLAB. The LFE channel was retained and convolved with the center-channel impulse response in both cases. A further pair of listening samples was created by summing the 5.1 mix to stereo.

3. PROCEDURE

Ten subjects, aged 23 to 35, participated in the listening experiment. Their mean age was 28.7 years, with a standard deviation of 4.6 years. Their musical experience ranged from 2 to 30 years, with a mean of 16.9 years and a standard deviation of 8.2 years. Eight of the ten subjects were graduate students in the Music Technology program at New York University. We consider this narrow population advantageous because of its members' more refined listening skills.

The experiment was designed as a two-interval forced-choice task between four simulations: the three sets of measured impulse responses and the 5.1 mix summed to stereo. Subjects listened to the two music samples and were asked to state which of two simulations they preferred. Each pair of simulations was presented twice per music sample, giving a total of 24 trials (6 pairs × 2 samples × 2 repetitions). After listening to a comparison, the subject selected the preferred surround sound simulation using a GUI in MATLAB. The GUI code presented the comparisons in random order, with the two musical styles intermixed.
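The trial count follows from the design: four conditions give six unordered pairs, and each pair is heard twice for each of the two music samples. A sketch of generating such a randomized trial list is shown here. The names are hypothetical, and randomizing which condition plays in the first interval is an assumption, since the paper only states that comparisons were presented in random order with the two styles intermixed.

```python
import itertools
import random

# Hypothetical condition labels: the three BRIR virtualizations plus the
# 5.1 mix summed to stereo.
CONDITIONS = ["A", "B", "C", "Summed"]
SAMPLES = ["classical", "rock"]
REPETITIONS = 2


def build_trials(seed=None):
    """Return a shuffled list of (sample, first_interval, second_interval) trials."""
    rng = random.Random(seed)
    trials = []
    for sample in SAMPLES:
        for pair in itertools.combinations(CONDITIONS, 2):  # 6 unordered pairs
            for _ in range(REPETITIONS):
                first, second = rng.sample(pair, 2)  # assumed: random interval order
                trials.append((sample, first, second))
    rng.shuffle(trials)  # intermix the two musical styles
    return trials
```

`len(build_trials())` is 24, matching the trial count stated above; passing a fixed `seed` makes a session's order reproducible.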
Each subject took the same listening test in each of the three rooms in which the BRIRs had been measured, always starting in room A, continuing in room C, and finishing in room B. Subjects took approximately 30 minutes to complete the test. Playback was through Sennheiser HD650 headphones driven from the headphone output of a MacBook Pro. The laptop was placed so that the listener sat in the same location the dummy head had occupied while the impulse responses were recorded.

[Figure 2. Listening test GUI]

4. RESULTS

Because each comparison was played only twice per listening room and per musical style, any pair of trials with contradictory answers was disregarded. Figure 3 shows that, averaged across all subjects, there is no case in which the virtualization of the room the listener was physically in is the preferred option; however, this picture is muddied by a large amount of variance, as shown in Figure 4.

Any case in which the subject picked the virtualization customized to the listening room they were in at the time was labeled a "correct" room pick. As shown in Figure 6, the mean rate of "correct" room picks across all listeners and all rooms was 29.4%, with a minimal variance of 2.6% among the rooms and 11.1% among the subjects.

Another important statistic was how much a subject's preferences changed between listening rooms. This was examined in two ways. The first took, for each subject, the maximum difference between listening rooms for a single comparison (e.g. the simulation of room B versus the summed version for the rock sample), shown in Figure 7. The second took the mean of the changes between two rooms over all comparisons, shown in Figure 8.
The maximum single change had a mean of 63.3% across subjects, while the mean, across subjects, of each subject's own mean change over all comparisons between all rooms was only 20.83%. Comparing the mean change between pairs of rooms, the largest mean change was 25%, as illustrated in Figure 9.

[Figure 3. Mean preferences for all subjects, in rooms A, B, and C (V. A, V. B, V. C, and Summed; classical and rock)]
[Figure 4. STD of mean preferences for all subjects, in rooms A, B, and C]
[Figure 5. Mean preferences for all subjects without the summed condition (classical and rock)]
[Figure 6. "Correct" room picks per subject (subjects 1 to 10, mean, and STD)]
[Figure 7. Maximum change in choice between rooms per subject (subjects 1 to 10, mean, and STD)]
[Figure 8. Mean change for all comparisons per subject (subjects 1 to 10, mean, and STD)]
[Figure 9. Mean change between rooms (A vs B, A vs C, B vs C)]

5. CONCLUSIONS

The data suggest that simulated surround sound is not significantly enhanced by impulse responses customized to the listening room. In fact, averaged across all subjects, the simple summed-to-stereo samples were generally preferred over most of the simulated rooms. This is further suggested by the fact that "correct" room picks occurred only 29.4% of the time the choice was available.

There is evidence, however, that supports the idea that a listener's preference for simulated rooms can change according to their physical location. Although the mean change across all choices was barely significant, at 20.8%, the mean of the largest change across all subjects was 63.3%.
Looking at individual subjects, there were 3 cases out of 10 in which there was a 100% reversal of opinion on one of the simulations. All of these reversals involved the summed audio, but there was no significant pattern as to which listening rooms they occurred between. The summed samples would be drier than any of the simulations, which may explain this. The reversals could also be an artifact of the testing procedure itself, since subjects listened in all three rooms in rapid succession, and they might disappear if testing were spread out over time. In addition, the order of progression through the listening rooms was kept fixed. Ideally, every subject would run the test an additional five times, once for each remaining room order. Subjects' familiarity with the listening rooms was not investigated. Since 8 of the subjects were students in the program at NYU, they had likely spent a large amount of time in many of the listening rooms, which may have influenced their results.

6. REFERENCES

[1] Cheng, C. I., Wakefield, G. H., "Introduction to Head-Related Transfer Functions (HRTFs): Representations of HRTFs in Time, Frequency, and Space," Journal of the Audio Engineering Society, 2001.
[2] Begault, D., "3D Sound for Virtual Reality and Multimedia," Academic Press Inc., 1994.
[3] Lorho, G., Isherwood, D., Zacharov, N., Huopaniemi, J., "Round Robin Subjective Evaluation of Stereo Enhancement Systems for Headphones," AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
[4] Farina, A., "Simultaneous measurement of impulse response and distortion with a swept-sine technique," 108th AES Convention, Paris, 2000 February 18-22.