Applied Acoustics 73 (2012) 1282–1288

Three-dimensional acoustic sound field reproduction based on hybrid combination of multiple parametric loudspeakers and electrodynamic subwoofer

Yutaro Sugibayashi a, Sota Kurimoto a, Daisuke Ikefuji a, Masanori Morise b,*, Takanobu Nishiura b

a Graduate School of Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan
b College of Information Science and Engineering, Ritsumeikan University, 1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan

Article info
Article history: Available online 12 April 2012
Keywords: Three-dimensional acoustic sound field reproduction; Parametric loudspeaker; Electrodynamic subwoofer; Sound image localization; Sound quality improvement

Abstract
Auditory Mixed Reality (MR) systems that reproduce Three-Dimensional (3-D) acoustic sound fields have recently become a research focus because the combination of visual and auditory MR systems can achieve a greater sense of presence than conventional visual MR systems. General auditory MR systems usually use a headphone-based system with a Head-Related Transfer Function (HRTF), which is a major approach to reproducing 3-D acoustic sound fields. However, the localization accuracy of sound images with an HRTF depends on the individual. We have previously proposed a system for reproducing a 3-D acoustic sound field with parametric loudspeakers instead of headphones. The 3-D acoustic sound field reproduced by this system achieved a highly accurate localization of sound images. However, one problem is that it is difficult to reproduce lower frequency sounds using parametric loudspeakers, which causes a poorer sound quality.
We attempted to achieve a greater sense of presence for 3-D acoustic sound fields by improving the sound quality with a hybrid combination of an electrodynamic subwoofer and the parametric loudspeakers. Sound images were formed at the target locations by the parametric loudspeakers, and the lower frequency sound was compensated for by the electrodynamic subwoofer. Subjective evaluation experiments were conducted to verify the effectiveness of the proposed system. We confirmed that the proposed system improved the sound quality while maintaining a high accuracy of sound image localization. We also determined the optimum parameters of the proposed system for achieving a greater sense of presence.

© 2012 Elsevier Ltd. All rights reserved.

Corresponding author: M. Morise. Tel./fax: +81 77 561 5075.
E-mail address: cm005068@ed.ritsumei.ac.jp (S. Kurimoto).
0003-682X/$ - see front matter.
http://dx.doi.org/10.1016/j.apacoust.2012.03.009

1. Introduction

Mixed Reality (MR) systems that seamlessly merge real and virtual spaces have recently become a research focus as an applied Virtual Reality (VR) technology that presents a visual sense of presence to users [1]. Conventional MR systems have presented only Three-Dimensional (3-D) visual Computer Graphics (CG) objects to users. Auditory MR systems, which reproduce 3-D acoustic sound fields, have also recently attracted attention because the combination of visual and auditory MR can present a greater sense of presence to users than conventional visual MR systems alone [2]. Auditory MR systems must reproduce a 3-D acoustic sound field as virtual sound by forming sound images at target locations, and must seamlessly merge real and virtual sounds.

A two-by-two audio-visual MR system that jointly manages both the visual and auditory MR systems has been proposed [3]. This system merges real and virtual sounds without shutting out the real sound by using open-air headphones, and it reproduces 3-D acoustic sound images surrounding the human head with a headphone-based system using a Head-Related Transfer Function (HRTF) [4], which is a major 3-D acoustic sound reproduction approach. Sound images can be accurately formed at target locations by using a headphone-based system with an HRTF. However, such a system requires the measurement of the personal HRTF of each user because the shapes of the head and ears differ from person to person. Because of the high computational cost of measuring HRTFs [6], it has been proposed to select or generate an optimum HRTF from a database of previously measured HRTFs [5]. However, users often confuse the front-back localization of sound images [7]. Auditory MR systems should overcome these problems.

We have already proposed a system for reproducing a 3-D acoustic sound field by using parametric loudspeakers [8] instead of headphones to overcome these problems [9]. The proposed system achieved a higher accuracy of sound image localization that did not depend on the individual user. However, one problem with this system is that it is difficult to reproduce lower frequency sounds by using the parametric loudspeakers, which causes a poorer sound quality. In this paper, we attempted to achieve a greater sense of presence for 3-D acoustic sound fields by improving the sound quality with a hybrid combination of an electrodynamic subwoofer and parametric loudspeakers.

2. Problems with headphone-based system with HRTF and requirements for auditory MR

Auditory MR systems require the formation of sound images at the target locations and the seamless merging of real and virtual sounds. A two-by-two audio-visual MR system [3] has reproduced a virtual 3-D acoustic sound field by using a headphone-based system with an HRTF and merged real and virtual sounds by using open-air headphones. However, the headphone-based system with an HRTF has the following two problems.

(1) It is difficult to measure the personal HRTF of each user because of the computational costs, although a personal HRTF is required to accurately present sound images. The localization accuracy of sound images depends on the individual when the user uses the HRTF of another person. In particular, users often confuse the front-back localization of sound images.
(2) The users may feel increased tightness on their head because they have to wear headphones in addition to Head Mounted Displays (HMDs).

The MR system should not only overcome these problems in order to reproduce a 3-D acoustic sound field, but also fulfill the requirements of an auditory MR system. Therefore, the MR system should also fulfill the following two requirements.

(1) The 3-D acoustic sound fields have a higher localization accuracy of sound images that does not depend on the individual.
(2) The tightness that users feel on their head because of the headphones is reduced.

We have already proposed a system for reproducing the 3-D acoustic sound field by using parametric loudspeakers instead of headphones to fulfill these requirements [9]. We discuss the underlying principle and characteristics of the parametric loudspeakers used in the proposed system in the next section. We then discuss sound image localization with parametric loudspeakers as the underlying principle of the proposed system.

3. 3-D acoustic sound field reproduction with parametric loudspeakers

3.1. Principle behind parametric loudspeaker

Parametric loudspeakers have sharper directivity and can emit audible sound to a particular area, in contrast with conventional loudspeakers, which emit widely spreading sound. The particular area where a listener can hear the audible sound is defined as an audio spot in this paper. Parametric loudspeakers use ultrasound, which has sharper directivity characteristics, as the carrier sound. Fig. 1 outlines the directional patterns of parametric and electrodynamic loudspeakers.

Fig. 1. Directional patterns of parametric and electrodynamic loudspeakers.

Fig. 2 outlines the principle behind parametric loudspeakers. The amplitude of an ultrasound is modulated with an audible sound. The modulated ultrasound consists of the frequencies of the carrier sound and the adjacent sidebands. Parametric loudspeakers emit an intense modulated ultrasound. A difference tone or combination tone is then generated because of the nonlinear interaction in the air. The difference tone between the carrier sound and each sideband is equal to the original audible sound. In other words, the emitted ultrasound is demodulated into the original audible sound because of a nonlinear interaction in the air. The modulated ultrasound $v_{AM}(t)$ with an audible sound is calculated as

$$ v_{AM}(t) = V_{cm}\,\bigl(1 + m V_S(t)\bigr)\,V_C(t), \tag{1} $$

$$ m = \frac{V_{sm}}{V_{cm}}, \tag{2} $$

where $V_{cm}$ represents the maximum amplitude of the carrier sound, $m$ represents the amplitude modulation factor, and $V_{sm}$ represents the maximum amplitude of the audible sound. Here, $V_S(t)$ represents the audible sound and $V_C(t)$ represents the carrier sound. Based on this method, parametric loudspeakers with sharper directivity characteristics can be created.

Fig. 2. Underlying principle of parametric loudspeaker.

3.2. Sound image localization with parametric loudspeakers

The audible sound emitted from parametric loudspeakers is reflected from the walls while maintaining its sharper directivity [10]. The particular area affected by the reflection, where a listener can hear audible sounds, is defined as a reflective audio spot in this paper. Fig. 3 outlines the concept behind a reflective audio spot, in which the listener perceives an acoustic sound image at the location of the wall, and not at that of the loudspeaker [11]. This is because, owing to the sharper directivity, only the reflected sound, and not the direct sound, reaches the listener. A sound image can thereby be steered by steering the emission angle of the parametric loudspeaker.

Fig. 3. Concept behind reflective audio spot.

3.3. Overview of proposed system

We have therefore proposed a system to reproduce a 3-D acoustic sound field using these characteristics of reflective audio spots with parametric loudspeakers. Fig. 4 outlines the concept behind the proposed system. The proposed system uses a unit with multiple parametric loudspeakers mounted on it. The sounds emitted from the unit are reflected from the walls, ceiling, or floor, similarly to the principle of a light planetarium. Sound reflections with parametric loudspeakers can form sound images at various target locations. Listeners using the proposed system can experience 3-D acoustic sound fields without having to wear headphones.

Fig. 4. Concept behind proposed system.

3.4. Configuration for proposed system

The proposed system consists of a unit on which ten parametric loudspeakers are mounted (as shown in Fig. 5a) and reflectors (as shown in Fig. 5b). The emission direction of every parametric loudspeaker can be adjusted. The sounds emitted from the unit are reflected from the reflectors in addition to the walls, ceiling, or floor. We used acrylic boards (500 × 500 × 5 mm) as reflectors and constructed a 3-D acoustic sound field with them. The directions of the reflectors were adjusted so that the emitted sound arrived at the listening location.

Fig. 5.
Configuration for proposed system.

3.5. Problem with proposed system

The 3-D acoustic sound fields produced by the proposed system have achieved a highly accurate localization of sound images. However, the main problem is that it is difficult to reproduce lower frequency sound by using parametric loudspeakers, which causes poorer sound quality. We attempted to achieve a greater sense of presence with 3-D acoustic sound fields based on a hybrid combination of an electrodynamic subwoofer and parametric loudspeakers by compensating for the low frequency sound.

4. 3-D acoustic sound field reproduction based on hybrid combination of multiple parametric loudspeakers and electrodynamic subwoofer

We proposed a 3-D acoustic sound field reproduction that is based on a hybrid combination of multiple parametric loudspeakers, which reproduce higher frequency sound, and an electrodynamic subwoofer, which reproduces lower frequency sound. Sound images were formed at the target locations with the parametric loudspeakers, and lower frequency sound was compensated for by the electrodynamic subwoofer. Here, we attempted to meet the following three requirements to achieve a greater sense of presence.

(1) The sound quality of the 3-D acoustic sound field is improved.
(2) The higher localization accuracy of the sound image at the target locations, which is formed by the parametric loudspeakers, is maintained even though the parametric loudspeakers and electrodynamic subwoofer are combined.
(3) The sound image is not localized at the electrodynamic subwoofer location because the sound images should be localized at only the target locations.

In other words, the electrodynamic subwoofer should not affect the sound images of the parametric loudspeakers while improving the sound quality to achieve a greater sense of presence. Therefore, we needed to investigate the influence of the electrodynamic subwoofer on the sound images of the parametric loudspeakers.

4.1. Influence of electrodynamic subwoofer on sound images of parametric loudspeakers

Studies have found that the localization accuracy decreases when people localize a lower frequency sound [12] because lower frequency sound spreads more widely [13]. Therefore, people do not generally localize sound images on an electrodynamic subwoofer that emits sound below the 200-Hz frequency band. However, because it is also difficult to reproduce low frequency sound above 200 Hz with parametric loudspeakers, we also need the electrodynamic subwoofer to emit sound above this band to completely compensate for the low frequency band. Therefore, the electrodynamic subwoofer may affect the sound images from the parametric loudspeakers. Consequently, we should measure the frequency characteristics of the parametric loudspeakers, the electrodynamic subwoofer, and the proposed system to find what effect the electrodynamic subwoofer has on the sound images from the parametric loudspeakers.

4.2. Experiment to measure frequency characteristics

Experiments were conducted to confirm the frequency characteristics of the parametric loudspeakers, the electrodynamic subwoofer, and the proposed system. Table 1 summarizes the experimental conditions. Fig. 6 plots the experimental results. We confirmed from its frequency characteristics that the electrodynamic subwoofer used in these experiments could emit sound up to 1 kHz. The proposed system could therefore compensate for the low frequency characteristics of the parametric loudspeakers (as shown in Fig. 6). However, the electrodynamic subwoofer may have affected the sound images from the parametric loudspeakers when it emitted sound at frequencies up to 1 kHz. The higher localization accuracy of the sound image at the target locations, which was formed by the parametric loudspeakers, may therefore have decreased when the proposed system was used. Therefore, for the evaluation experiments, we first evaluated the localization accuracy of the sound image at the target locations, which was formed by the parametric loudspeakers, when the proposed system was used. A sound image may also be localized at the electrodynamic subwoofer location although the sound images should be localized at only the target locations. To overcome this problem, we should optimize the parameters of the parametric loudspeakers and electrodynamic subwoofer that may affect the sound image localization, so that the sound image is localized at only the target location, not at the electrodynamic subwoofer.

Table 1. Experimental conditions.
Parametric loudspeaker      MITSUBISHI, MSP-50E
Electrodynamic subwoofer    YAMAHA, YST-SW225
Microphone                  HOSIDEN, KUC-1333
Loudspeaker amplifier       YAMAHA, P2500S
Microphone amplifier        Thinknet, MA2016
A/D, D/A converter          Roland, UA-101
Sampling                    96.0 kHz, 16 bit
Background noise level      36.7 dBA
Reverberation time          670 ms

Fig. 6. Frequency characteristics of each loudspeaker and proposed system.

4.3. Optimization for cut-off frequency of low pass filter and sound pressure levels

We attempted to optimize the two parameters of the proposed system that may affect the sound image localization, so that the sound image is localized at only the target location, not at the electrodynamic subwoofer. We optimized the cut-off frequency of the Low Pass Filter (LPF) with which the sound emitted by the electrodynamic subwoofer was processed. We also optimized, through the evaluation experiments, the ratio of the Sound Pressure Level (SPL) of the electrodynamic subwoofer to that of the parametric loudspeakers. This ratio is defined as the Low to High frequency Ratio (LHR) in this paper. The observed output signal $Y(\omega)$ and the LHR were calculated as

$$ Y(\omega) = a P(\omega) + b S(\omega) L(\omega), \tag{3} $$

$$ \mathrm{LHR} = 20 \log_{10}(b/a), \tag{4} $$

where $P(\omega)$ represents the output signal of the parametric loudspeaker, $S(\omega)$ represents the output signal of the electrodynamic subwoofer, and $L(\omega)$ represents the LPF. $\omega$ represents the angular frequency, and $a$ and $b$ represent the amplification coefficients of each loudspeaker. Subjective evaluations were conducted to determine the optimum cut-off frequency of the LPF for the electrodynamic subwoofer and the optimum LHR for localizing the sound image at only the target location, not at the electrodynamic subwoofer.

5. Evaluation experiments

We conducted two subjective experiments. The first evaluated the localization accuracy of the sound image to find what influence the electrodynamic subwoofer had on the localization accuracy of the sound image at the target locations, which was formed by the parametric loudspeakers, when using the proposed system. The second used the Mean Opinion Score (MOS) to determine the optimum cut-off frequency for the LPF of the electrodynamic subwoofer and the optimum LHR for improving the sound image localization for humans. The subjects evaluated the sound quality and sound image localization using the MOS.

5.1. Experiments to evaluate localization accuracy of sound image and results

Fig. 7 outlines the experimental environments. Seven sound images were randomly presented to the subjects by using the seven parametric loudspeakers of the proposed system, as shown in Fig. 7. The subjects were asked to select the one of the seven directions in which they localized the sound image. The stimulus was white noise with a duration of 10.0 s. Five subjects (males aged 22–24) with normal hearing took part in the evaluation.

Fig. 7. Experimental environments.

Fig. 8 plots the results obtained from the experiments to evaluate the localization accuracy of the sound image, which indicate a 95.7% accuracy. These results correspond to the results obtained without the use of an electrodynamic subwoofer [9]. Furthermore, the subjects did not confuse the front-back localization of the sound images, whereas front-back confusion was a main problem of the headphone-based system. These results suggest that the higher localization accuracy of the sound image at the target locations, which was formed by the parametric loudspeakers, was maintained when using the proposed system.

Fig. 8. Results from experiments to evaluate localization accuracy of sound image.

5.2. Subjective evaluations on sound quality and sound image localization and results

Subjective evaluations were conducted on the sound quality and sound image localization using the MOS. The subjects evaluated the sound quality and sound image localization according to the standards listed in Table 2. All the subjects listened to a reference sound (Score 1 in Table 2) before the experiments to evaluate the sound quality. The reference sound, which was the stimulus for score 1, was presented by the parametric loudspeaker. The stimuli were randomly presented to the subjects. The stimuli consisted of a voice (male) and music (orchestra and cello). The duration of each stimulus was 10.0 s. Five subjects (males aged 22–24) with normal hearing took part in the evaluation. The cut-off frequencies for the LPF were 200, 400, 600, 800, 1000, and 1 Hz. The slope was 30 dB/oct. for the lower frequency. The LHRs were 20, 15, 10, and 5 dB.

Table 2. Score and opinion.
Score  Sound quality  Sound image localization
5      Excellent      Sound image is at only target direction (not at subwoofer direction)
4      Good           Sound image is at almost only target direction (almost not at subwoofer direction)
3      Fair           Fair
2      Poor           Sound images are at almost two directions (also slightly at subwoofer direction)
1      Worst          Sound images are at two directions (also at subwoofer direction)

Fig. 9. Results from subjective evaluation on sound quality for cut-off frequency of LPF.
Fig. 10. Results from subjective evaluation on sound image localization for cut-off frequency of LPF.
Fig. 11. Results from subjective evaluation of sound quality for LHR.
Fig. 12. Results from subjective evaluation on sound image localization for LHR.

Fig. 9 plots the results obtained by subjectively evaluating the sound quality for the cut-off frequency of the LPF. The error bars in the bar charts represent the standard deviations. We confirmed that the hybrid combination improved the sound quality because score 1 represents the sound quality equivalent to that of the parametric loudspeakers. The sound quality improved particularly with the increasing cut-off frequency of the LPF. These results suggest that the lack of lower frequency sound from the parametric loudspeakers could be compensated for by the electrodynamic subwoofer.

Fig. 10 plots the results obtained from subjectively evaluating the sound image localization for the cut-off frequency of the LPF. We confirmed that the MOS for the sound image localization was considerably higher when using cut-off frequencies from 400 to 800 Hz, although the MOS for the sound image localization differed depending on the sound source. The MOS for the sound image localization was higher for the voice (male) and music (cello).
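The MOS values and error bars reported for these evaluations are means and standard deviations of the subjects' scores. A minimal sketch of that computation follows, assuming one score on the 1 to 5 scale of Table 2 per subject. The function name and the scores are hypothetical, and the use of the sample standard deviation is an assumption, since the paper does not state which estimator was used.

```python
from statistics import mean, stdev


def mean_opinion_score(scores):
    """Return (MOS, sample standard deviation) for one test condition."""
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("scores must lie on the 1-5 scale of Table 2")
    return mean(scores), stdev(scores)


# Hypothetical scores from five subjects for one stimulus and one condition.
hypothetical_scores = [5, 5, 4, 5, 5]
mos, sd = mean_opinion_score(hypothetical_scores)
print(f"MOS = {mos:.2f}, SD = {sd:.2f}")
```

The standard deviation returned here corresponds to the half-length of an error bar in a bar chart of MOS values.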
Although we could not confirm a tendency for improvement with varying cut-off frequencies for the other sound sources, the sound quality and the sound image localization improved on average with cut-off frequencies from 400 to 800 Hz for the electrodynamic subwoofer, as can be seen from Fig. 10.

Fig. 11 plots the results obtained by the subjective evaluation of the sound quality for the LHR. We confirmed that the sound quality especially improved with the increasing LHR from 20 to 10 dB. Moreover, the sound quality improved even under the conditions where the LHR was 20 and 15 dB because score 1 represents the sound quality equivalent to that of the parametric loudspeakers. To achieve a greater sense of presence, we should also take the sound image localization into consideration.

Fig. 12 plots the results obtained from subjectively evaluating the sound image localization for the LHR. We confirmed that the MOS for the sound image localization was considerably higher as the LHR decreased, although the MOS for the sound image localization differed depending on the sound source. The MOS for the voice (male) was 4.9 and 4.7, and that for music (cello) was 4.7 and 4.6, when the LHR was 20 and 15 dB, respectively. The minimum MOS at 20 and 15 dB was 3.1 and 2.9, respectively, for music (orchestra). Score 3 represents fair, as can be seen in Table 2. Therefore, the MOS for the sound image localization was higher when the LHR was 20 or 15 dB.

6. Discussion

Results from the experiments to evaluate the localization accuracy of the sound image suggested that the highly accurate localization of the sound images at the target locations, which were formed by the parametric loudspeakers, was maintained when using the proposed system. Therefore, the proposed system was effective at accurately locating the sound image.
Furthermore, both a higher sound quality and an improved sound image localization were achieved using cut-off frequencies from 400 to 800 Hz for the electrodynamic subwoofer and LHRs of 20 and 15 dB. Therefore, cut-off frequencies from 400 to 800 Hz and LHRs of 20 and 15 dB were optimum for achieving a greater sense of presence. On the other hand, we confirmed that the MOS for the sound image localization differed depending on the sound source. Therefore, to achieve a greater sense of presence in the future, we need to investigate the frequency characteristics of the sound sources with high and low MOSs for sound image localization.

7. Conclusion

We attempted to achieve a greater sense of presence for 3-D acoustic sound fields based on a hybrid combination of an electrodynamic subwoofer and parametric loudspeakers. We found from the experimental results that the higher localization accuracy of the sound images at the target locations, which were formed by using parametric loudspeakers, was maintained even though the parametric loudspeakers and an electrodynamic subwoofer were combined. Furthermore, we attempted to optimize the parameters of the parametric loudspeakers and electrodynamic subwoofer that may affect the sound image localization, in order to improve the sound image localization for humans. We confirmed from the experimental results that cut-off frequencies from 400 to 800 Hz for the electrodynamic subwoofer and LHRs of 20 and 15 dB were optimum for achieving both an improved sound quality and an improved sound image localization. In future studies, we intend to investigate the frequency characteristics of sound sources with high and low MOSs for the sound image localization, and to achieve a greater sense of presence for 3-D acoustic sound fields with the hybrid combination of an electrodynamic subwoofer and parametric loudspeakers by further improving the sound image localization.
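As a worked illustration of Eqs. (3) and (4) and of the two parameters optimized above, the sketch below converts an LHR in dB into the gain ratio b/a and evaluates the hybrid output at a single frequency. It is only a sketch under stated assumptions: the function names are hypothetical, the LPF is modeled by the magnitude response of a 5th-order Butterworth filter (chosen because it rolls off at about 30 dB/oct; the paper does not name the filter type), phase is ignored, and the negative LHR in the usage lines is an assumed sign convention in which the subwoofer is set quieter than the parametric loudspeakers.

```python
import math


def lhr_to_gain_ratio(lhr_db):
    """Invert Eq. (4): LHR = 20*log10(b/a), so b/a = 10**(LHR/20)."""
    return 10.0 ** (lhr_db / 20.0)


def lpf_magnitude(freq_hz, cutoff_hz, order=5):
    """|L(w)| of an assumed Butterworth LPF (roll-off of about 6*order dB/oct)."""
    return 1.0 / math.sqrt(1.0 + (freq_hz / cutoff_hz) ** (2 * order))


def hybrid_output(p, s, freq_hz, cutoff_hz, lhr_db, a=1.0):
    """Eq. (3): Y = a*P + b*S*L, evaluated at one frequency (magnitudes only)."""
    b = a * lhr_to_gain_ratio(lhr_db)
    return a * p + b * s * lpf_magnitude(freq_hz, cutoff_hz)


# Usage with assumed values: 600 Hz cut-off, LHR of -15 dB (assumed sign).
ratio = lhr_to_gain_ratio(-15.0)
# At 100 Hz the parametric loudspeakers are assumed to contribute nothing.
y_low = hybrid_output(p=0.0, s=1.0, freq_hz=100.0, cutoff_hz=600.0, lhr_db=-15.0)
# At 4 kHz the LPF removes almost all of the subwoofer signal.
y_high = hybrid_output(p=1.0, s=1.0, freq_hz=4000.0, cutoff_hz=600.0, lhr_db=-15.0)
print(f"b/a = {ratio:.3f}, Y(100 Hz) = {y_low:.3f}, Y(4 kHz) = {y_high:.3f}")
```

Keeping the LHR and the cut-off frequency as separate arguments mirrors the way the two parameters were varied independently in the subjective evaluations.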
Acknowledgement

This work was partly supported by a Grant-in-Aid for Scientific Research funded by the Ministry of Education, Culture, Sports, and Science (MEXT) of Japan.

References

[1] Ohta Y, Tamura H. Mixed reality-merging real and virtual worlds. Ohm-sha & Springer-Verlag; 1999.
[2] Gaver WW, Smith RB, O'Shea T. Effective sounds in complex systems: the ARKola simulation. In: Proceedings of the CHI'91; 1991. p. 85–90.
[3] Higa K, Nishiura T, Kimura A, Shibata F, Tamura H. A two-by-two mixed reality system that merges real and virtual worlds in both audio and visual senses. In: Proceedings of the ISMAR2007; 2007. p. 203–206.
[4] Kawaura J, Suzuki Y, Asano F, Sone T. Sound localization in headphone reproduction by simulating transfer functions from the sound source to the external ear. J Acoust Soc Jpn (E) 1991;12(5):203–16.
[5] Takeda Laboratory at Nagoya University, <http://www.sp.m.is.nagoya-u.ac.jp/HRTF/>.
[6] Shimada S, Hayashi N, Hayashi S. A clustering method for sound localization transfer functions. J Audio Eng Soc 1994;42(7/8):557–84.
[7] Morimoto M, Ando Y. On the simulation of sound localization. J Acoust Soc Jpn (E) 1980;29(3):167–74.
[8] Yoneyama M, Fujimoto J, Kawamo Y, Sasabe S. The audio spotlight: an application of nonlinear interaction of sound waves to a new type of loudspeaker design. J Acoust Soc Am 1983;73(5):1532–6.
[9] Sugibayashi Y, Kurimoto S, Morise M, Nishiura T. Design of system to reproduce 3-D sound field with multiple parametric loudspeakers. In: Proceedings of the Internoise2011, CD-ROM Proceedings; 2011.
[10] Hirokawa K, Morise M, Nishiura T. The fundamental design of reflective audio spot utilizing ultrasound loudspeaker. In: Proceedings of the WESPAC2009, CD-ROM Proceedings; 2009.
[11] Morise M, Ikefuji D, Tsujii H, Hirokawa K, Nishiura T. A design of reflective audio spot with reflective objects. In: Proceedings of the ICA2010, CD-ROM Proceedings; 2010.
[12] Blauert J. Sound localization in the median plane. Acustica 1969;22:205–13.
[13] Kates JM. Optimum loudspeaker directional patterns. J Audio Eng Soc 1980;28(11):787–94.
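Supplementary sketch for Eqs. (1) and (2) in Section 3.1. The code below evaluates the amplitude-modulated signal for a single audible tone and checks numerically that it equals the carrier plus two sidebands spaced one audio frequency away from the carrier, which is why the difference tone between the carrier and each sideband matches the original audible sound. The function names, the 40 kHz carrier, the 1 kHz tone, the amplitudes, and the 192 kHz sampling rate are all assumptions chosen for illustration; the paper does not specify them, and both $V_S(t)$ and $V_C(t)$ are taken as unit-amplitude cosines.

```python
import math


def modulation_factor(v_sm, v_cm):
    """Eq. (2): m = V_sm / V_cm."""
    return v_sm / v_cm


def am_signal(t, v_cm, m, f_audio_hz, f_carrier_hz):
    """Eq. (1) with unit-amplitude cosines assumed for V_S(t) and V_C(t)."""
    v_s = math.cos(2.0 * math.pi * f_audio_hz * t)
    v_c = math.cos(2.0 * math.pi * f_carrier_hz * t)
    return v_cm * (1.0 + m * v_s) * v_c


def carrier_and_sidebands(t, v_cm, m, f_audio_hz, f_carrier_hz):
    """The same signal written as carrier + upper sideband + lower sideband."""
    w = 2.0 * math.pi
    carrier = v_cm * math.cos(w * f_carrier_hz * t)
    upper = 0.5 * m * v_cm * math.cos(w * (f_carrier_hz + f_audio_hz) * t)
    lower = 0.5 * m * v_cm * math.cos(w * (f_carrier_hz - f_audio_hz) * t)
    return carrier + upper + lower


# Assumed illustration values (not taken from the paper).
F_AUDIO_HZ, F_CARRIER_HZ, SAMPLE_RATE_HZ = 1000.0, 40000.0, 192000.0
m = modulation_factor(v_sm=0.5, v_cm=1.0)
max_err = max(
    abs(am_signal(n / SAMPLE_RATE_HZ, 1.0, m, F_AUDIO_HZ, F_CARRIER_HZ)
        - carrier_and_sidebands(n / SAMPLE_RATE_HZ, 1.0, m, F_AUDIO_HZ, F_CARRIER_HZ))
    for n in range(1000)
)
print(f"m = {m}, largest difference between the two forms = {max_err:.2e}")
```

The demodulation in air itself is a nonlinear acoustic effect and is not modeled here; the sketch covers only the modulation step.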