
Conference Session (A11)
Paper #199
Disclaimer — This paper partially fulfills a writing requirement for first year (freshman) engineering students at the
University of Pittsburgh Swanson School of Engineering. This paper is a student, not a professional, paper. This paper is
based on publicly available information and may not provide complete analyses of all relevant data. If this paper is used for
any purpose other than these authors’ partial fulfillment of a writing requirement for first year (freshman) engineering students
at the University of Pittsburgh Swanson School of Engineering, the user does so at his or her own risk.
AMBISONICS AS A MEANS FOR SOUND DESIGN IN VIRTUAL
REALITY SYSTEMS
Ayem Kpenkaan, [email protected], Mahboobin 10AM, Alex Rosati, [email protected], Mahboobin 10 AM,
Nicholas Carnevali-Doan, [email protected], Sanchez 5 PM
Abstract — Virtual reality (VR) is a computer technology that creates a simulated environment where users interact with images and sounds contoured to the physical motions of the user. For a user to feel immersed in an actual space, sounds must be replicated to behave exactly as they would in the real world. This creates the need for spatialization: the placement of noise sources within a three-dimensional environment. Accurate spatialization encompasses a host of auditory factors which provide a sense of directionality to sounds.
Currently, the most popular method of achieving this effect is Ambisonics. Ambisonics, used in conjunction with
digital audio workstations (DAWs), allows sound designers to
produce soundtracks in a virtual sound field independent of
speakers. Ambisonics effectively mimics the components which
make sound spatialized while also coordinating the multitude
of channels used to play back the sound of multiple emitters
interacting with each other.
3D audio has a diverse array of applications beyond the
obvious one of entertainment; there is potential for
educational, research, and even medical implementations.
These can range from immersive cinematography to
navigational training for the blind. 3D audio and Ambisonics
have the potential to be the great leap forward in sound design.
As Ambisonics progresses, so will its value and sustainability.
The relationship between sound designers and the technical
ability to create spatialized sound will become more symbiotic, leading to advanced realism in all applications.
Key Words–Ambisonics, spatialization, Vector Based
Amplitude Panning (VBAP), Digital Audio Workstations
(DAW), Emitters, Soundfield, Sustainability
INTRODUCTION
Modern virtual reality is capable of placing the user in a
three-dimensional environment and creating entire worlds that
can be explored and interacted with. The visual aspect of VR
has progressed to the point where visuals are nearly
indistinguishable from reality; however, sound is one crucial
component of the experience that remains underdeveloped.
The typical VR headset consists of goggles that the user
wears which display visuals on two individual screens, one
located in front of each eye. The visuals react to the user’s head
motion and create a three-dimensional viewing sphere [1].
This gives the user a visual experience synonymous with real life. VR’s main flaw is that, although the visuals may feel real, the
audio is still being played through a stereo headset with one
audio channel for each ear. Regardless of how impressive the
visuals may be, this lackluster production of sound breaks the
user’s immersion. Two channel audio is two-dimensional; it
can make an otherwise three-dimensional environment feel
flat.
To create an audio environment that is genuinely three-dimensional, there must be a recognition of all the facets that
make up sound sources. This includes an understanding of how
ears perceive sound and how shape and distance affect this
phenomenon. Accounting for the directionality of sound and its environmental interactions creates “spatialization,” or the feeling that a sound is coming from a
source rather than a speaker [2]. Ambisonics is an approach to
sound spatialization that uses a spherical coordinate system to
project periphonic soundfields. This removes limits on the
freedom of developers who have worked with sound projected
on a three-dimensional Cartesian plane [2]. This paper will
discuss the current adoption of Ambisonics in the development
of virtual reality along with the advantages and disadvantages
of this approach to sound design.
SPATIALIZATION
Sound moves uniquely through three-dimensional
spaces. To properly mimic this in virtual reality, designers
have turned to advanced sound generation techniques that can capture the spatialization that characterizes
real-world experiences. Spatialization is a process that
involves the careful placement of sound sources throughout a
3D plane [2]. Spatial hearing, how one perceives sound in open
space, depends on distance from a sound source and the
direction of sound.
Importance of Spatialization
In an absence of visual cues, hearing can inform an
individual of characteristics of the environment they inhabit.
For example, an individual in a crowded restaurant will hear a multitude of sounds coming from everywhere in the room. Hearing alone allows them to focus on any one of these sounds and hear it with greater clarity. This is spatialization,
the fact that these sounds are not just present, they are
distinguishable and unique. In virtual reality, an understanding
of this is crucial for creating immersion. Sound must be as
realistic as possible so that auditory factors do not break the
immersion. Even those who are not particularly aware of the
intricacies of sound design will immediately perceive a sound
being out of place or misrepresented.
An experiment conducted by Iwaya Ohuchi, a researcher whose work was published by the Association for Computing Machinery, explores the precision associated with sound cues and how they affect a person’s understanding of their surroundings [3]. This
experiment measured 8 individuals’ perceptions of where a
sound was coming from. Half of the experiment group was
blind; the other half was sighted. The test subjects were tasked
to localize sound from 12 equally spaced speakers placed
around them at ear level. The blind demonstrated an ability to
localize with half the error of the sighted group [3]. This study helped show the ability of localized sound sources to aid in the perception of environmental stimuli.
Directionality
When dealing with spatialization, directionality also
needs to be taken into account. Field directionality as it
pertains to sound “concentrates acoustic energy into a narrow
beam so that it can be projected to a discrete area” [4].
Generating directionality consists of producing multiple high-frequency sonic beams, most of which cannot be heard. Only
when the sound waves meet the desired destination does one
hear the waves colliding with said object. To have these waves meet at the specific desired point, calculations involving wave physics are performed; a popular method of solving these equations is known as the “Rayleigh Integral” [5]. This computes the desired sound field and how to manipulate the sound to achieve directionality. When a
sound is coming from the left of an individual in the VR
environment, as in real life, it would be heard predominantly by the left ear. Developers achieve this by concentrating the sound waves to have a higher amplitude in the direction of the left ear rather than the right, thus accurately mimicking the differences we hear from sound sources between ears [6].
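To make the idea of concentrating amplitude toward one ear concrete, the following minimal sketch applies a constant-power pan law in Python; the function name and the mapping from azimuth to pan angle are illustrative assumptions, not a method prescribed by the cited sources.

import numpy as np

def constant_power_pan(signal, azimuth_deg):
    # Map azimuth in [-90, +90] degrees (hard left to hard right)
    # to a pan angle in [0, pi/2].
    pan = (azimuth_deg + 90.0) / 180.0 * (np.pi / 2.0)
    left = signal * np.cos(pan)    # amplitude weight toward the left ear
    right = signal * np.sin(pan)   # amplitude weight toward the right ear
    return left, right

# A source at -90 degrees plays almost entirely in the left channel.
left, right = constant_power_pan(np.ones(4), -90.0)

The cosine/sine weighting keeps the total power constant as the source moves, so the pan changes apparent direction without changing apparent loudness.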
Head Related Transfer Functions
Head Related Transfer Functions (HRTFs) are an
integral part of spatialization. They refer to functions that
determine the reception of sound between ears based on the
structure and movements of the head and ears [7]. HRTFs were first studied by attaching powerful ear scanners to a dummy head to determine how sound changes as an individual moves his or her head. Due to the structure of
things like the ear canal, sound is distorted based on its location
in relation to the individual hearing it [7]. Therefore, to gather
the proper information, the experiment was designed to mimic
how sound is perceived in real-life. This provided designers
with insight as to how sound changes due to the shape of the
listener’s head and ears.
To document the effect of HRTFs, measurements were taken in echo chambers for a finite number of source positions to create an HRTF database [7]. These databases were then used in calculations performed by “applying numerical analysis to a wave equation whose boundary models a human head” [7].
HRTFs can be applied in a variety of directions from the source
to the receptor but are only truly accurate at a single
distance. Experiments performed by Otani, a researcher for the Acoustical Society of America, showed that within 1 meter of the head, the HRTF spectrum differed significantly based on the source distance. HRTFs in general were more effective in facilitating spatialization of sounds within one meter [7].
However, the farther from the head the HRTF spectrum was observed, the less it varied. When HRTF measurements of sound sources at different distances are taken, a method known as “distance extrapolation” is utilized. This provides “distance-level decay” to the various HRTF functions [8]. Distance-level decay, which mimics the difference in the time it takes to recognize sound between ears, is related to the interaural time difference.
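As a rough illustration of how a measured HRTF database is applied in practice, the sketch below convolves a mono signal with a left/right head-related impulse response (HRIR) pair, the time-domain counterpart of an HRTF. The random placeholder HRIRs exist only so the example runs end to end; a real system would look them up from a measured database for the source’s direction and distance.

import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    # Convolve the dry mono signal with the left- and right-ear impulse
    # responses measured for one source position, yielding two channels.
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=-1)

# Placeholder HRIRs stand in for entries of a measured HRTF database.
rng = np.random.default_rng(0)
mono = rng.standard_normal(48000)
binaural = render_binaural(mono, rng.standard_normal(256), rng.standard_normal(256))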
COMPONENTS OF SPATIALIZATION
The ability to localize sound relies primarily on distance and directionality.
Distance
Distance, although not as good an indicator of location as direction, can be an important gauge of one’s surroundings.
The further sound sources are, the more time they have to
interact with their environment [4]. For example, the size and
shape of a room along with the materials that make up the
space all change how a sound will reflect and echo. Reflections
are used mostly to determine the distance a sound has traveled.
This is because sounds that come from sources further away
will experience more reflections. Reverberation is a result of
reflections and describes the persistence of a sound after its
source becomes silent. Reverb is found in large, closed empty
rooms, but may not be a helpful tool for perceiving depth in
open spaces. In these situations, attenuation may be a better
indication of how far away a sound source is. Attenuation
describes a wave’s loss of intensity over distances and is a
result of absorption, scattering, and mode conversions [5].
Attenuation is technically a loss of energy, but is often perceived through changes in volume and the damping of frequencies. Frequency damping is caused by friction between the medium through which a sound travels and the waves themselves. It affects higher-end frequencies to a greater degree and can be visualized as the flattening of a wave [6].
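A minimal sketch of the two distance cues just described, inverse-distance attenuation and frequency damping, might look like the following. The inverse-distance law and the cutoff-versus-distance mapping are illustrative assumptions rather than values taken from the cited sources.

import numpy as np

def apply_distance_cues(signal, distance, fs=48000, ref_distance=1.0):
    # Inverse-distance attenuation: intensity falls off as the source recedes.
    out = signal * (ref_distance / max(distance, ref_distance))
    # One-pole low-pass standing in for frequency damping: farther sources
    # get a lower cutoff, i.e., more flattening of high-frequency content.
    cutoff = 18000.0 / max(distance, 1.0)
    alpha = 1.0 - np.exp(-2.0 * np.pi * cutoff / fs)
    state = 0.0
    filtered = np.empty_like(out)
    for i, sample in enumerate(out):
        state += alpha * (sample - state)
        filtered[i] = state
    return filtered

rng = np.random.default_rng(0)
near = apply_distance_cues(rng.standard_normal(48000), distance=1.0)
far = apply_distance_cues(rng.standard_normal(48000), distance=20.0)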
Phasing: Interference of Waves

Channels are the path that sound waves created in a VR
environment take to their intended destinations. When sound
sources are placed throughout the 3D plane, multiple channels
compete and issues such as sound distortion and interference
will occur; this is known as competing channels. To overcome
this, designers usually perform calculations that determine the degrees of separation between sound sources that the ear can accurately account for. The relevant factors include elevation and the physical characteristics of the ear canal. Pablo Hoffmann
and Flemming Christensen (researchers who published with the Acoustical Society of America) performed experiments with these characteristics in mind and determined what the ideal recognition of sound by an individual in real life would be [9]. They utilized three
measurements to accomplish this. The first of which was
figuring out the “response of the microphone when centered
on the baffle…for all directions” [9] (the baffle being the device that filters the sound in a specific direction). They then
calculated the “response at the ideal canal entrance when the
human ear cast was mounted at the center of the baffle” [9].
Finally, they measured the “response of the semi-ideal hear
through device…when mounted on the ear cast” [9]. Putting
all these measurements together, they came up with the
equation shown in Figure 1:
[9] Figure 1: Phasing Equation

The equation calculates the “difference between the directional characteristics of the ideal position and those of the semi-ideal position of the hear through device” [9]. With this calculation, they could adjust sound placement so that any conflicting channels and interference would be minimized.

AMBISONICS: MAPPING SOUND IN A SPHERE

While every sound in a space emanates from a source, as explained in the sections above, real-world sounds consist of factors like reflections and reverberations that come from all directions. Ambisonics, a method of understanding and producing auditory signals, bases its approach to sound on this concept. Ambisonics breaks sounds down into W, X, Y, and Z components [10].

[10] Figure 2: Spherical Coordinate System

“The X/Z plane (or median plane) cuts through the symmetry axis of the listener’s head and separates the acoustical environment into a left and right half, while the Y/Z plane (or frontal plane) is used to distinguish front and rear” [10]. The letters φ and θ are used to represent the azimuth (the horizontal angle) and the zenith (the vertical angle), and the letter r represents the distance of the sound source from the listener.

Coordinates

The W component, whose gain factor does not depend on the location of emitters, holds the constant zeroth-order information. The X, Y, and Z components carry first-order information stored in the variables θ and φ, which depend on the location of emitters. More detailed information is stored in the second- and third-order planes, which expand the decomposition of the sound field by adding spherical harmonics of greater degree. Spherical harmonic configurations of the zeroth- through third-order B-format components are shown here:

[10] Figure 3: 0-3rd Order Spherical Harmonics
The omnidirectional W component accounts for all
sounds in a space equally. This input, with a constant signal
from all directions (isotropic), only accounts for volume and
therefore holds little information regarding where a sound is
coming from. However, in conjunction with the rest of the
components, which split the sound field into front to back (X),
left to right (Y), and top to bottom (Z) directions, this input can
provide a realistic display of sound sources across a field [10].
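As an illustration of how a mono source is distributed across the W, X, Y, and Z components, the sketch below applies the classic first-order B-format encoding equations; the 1/√2 gain on W is a common convention, and the function name is an illustrative choice of ours rather than something prescribed by the sources.

import numpy as np

def encode_first_order_bformat(signal, azimuth, elevation):
    # Classic first-order B-format encoding; angles are in radians.
    w = signal / np.sqrt(2.0)                         # omnidirectional component
    x = signal * np.cos(azimuth) * np.cos(elevation)  # front-back component
    y = signal * np.sin(azimuth) * np.cos(elevation)  # left-right component
    z = signal * np.sin(elevation)                    # top-bottom component
    return w, x, y, z

# Example: a 440 Hz tone placed 90 degrees to the listener's left, at ear level.
fs = 48000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
w, x, y, z = encode_first_order_bformat(tone, azimuth=np.pi / 2, elevation=0.0)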
A realistic display of sound is difficult to sustain without Ambisonics. Modern, channel-based sound emitters are incapable of capturing the true essence of sound, while the spherical coordinate system has proven to be an effective method of representing it. At the same time, technological advancements in VR bring demands for greater CPU power, which set limitations on practicality as audiovisual content grows increasingly complex.
The use of a virtual sound field, or B-format, makes
Ambisonics unique in both the input and encoding of
harmonics, and the decoding and flexibility of the output. A
decoupled sound field becomes necessary with the use of
isotropic input. Unlike traditional systems, which are
explained in greater detail in the next section on vector based
amplitude panning (VBAP), Ambisonics uses every speaker to
ensure that each individual sound is represented in all
directions. An alternative to this would be pairing sources to
speaker channels. This means sound sources would be fixed to
one or two speakers. It is a simpler way to localize sound
sources, but does not contain the omnidirectional (W-field)
information used in Ambisonics B-format. General
applications of soundtracks do not deal with speakers that are
moving, so this does not create a problem. However, virtual
reality headsets consist of two speakers (one for each ear) that
are constantly changing location within the fields they are
representing. B-format runs independently of speakers and is
therefore conducive to use in virtual reality [11].
B-Format
The composition of the W, X, Y, and Z inputs creates a speaker-independent sound field that represents sources in a workable space. This form, known as B-format, provides producers with more freedom to focus on source placement without worrying about speaker configurations, which are constantly changing in virtual reality. B-format is best understood as a simulated map of a desired scene that holds data in terms of Ambisonics’ polar coordinates [11]. When noises or tracks are placed in this field, they are transposed into polar patterns that match the format in which Ambisonic microphones record. Equations for the spherical harmonics which compose B-format are listed below:

[10] Figure 4: Equations for Spherical Harmonics in B-Format

In practice, first-order Ambisonics lacks quality and is not useful for virtual reality applications. Higher-order Ambisonics localizes sources with much more accuracy by providing better coverage of soundfields and holding more directional information on the planes represented [11]. This platform has been adopted by the virtual reality industry.
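One practical consequence of this speaker-independent representation for virtual reality is that a tracked head turn can be compensated by a fixed linear transform of the soundfield itself. The sketch below rotates a first-order B-format signal about the vertical axis; this is the standard first-order rotation, written here as an illustrative helper.

import numpy as np

def rotate_bformat_yaw(w, x, y, z, yaw):
    # W (omnidirectional) and Z (vertical) are unaffected by a turn about
    # the vertical axis; X and Y rotate like an ordinary 2D vector.
    x_rot = x * np.cos(yaw) - y * np.sin(yaw)
    y_rot = x * np.sin(yaw) + y * np.cos(yaw)
    return w, x_rot, y_rot, z

# To keep sources fixed in the world as the head turns by head_yaw,
# rotate the entire field by -head_yaw before decoding to the headphones.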
Channel Optimization
Channel optimization is the selective use of sound
emitters to represent each source. Ambisonics uses the entire
soundfield to represent each source and therefore sacrifices some of its ability to pick and choose which emitters contribute
to a sound source. This is great for spatializing sounds at higher
volumes in medium or long ranges. These are sounds that
interact with their environment. As an example, it makes sense
to use multiple channels when producing a loud bang from a
few meters away. Here, a listener will anticipate significant
scattering and reflection. This sound is heard distinctly through both the listener’s ears, with differences that give context to the source’s location. The performance of Ambisonics declines
with low volume and nearby sources that do not experience as
much environmental interference [10]. It makes less sense to
use multiple channels while playing a whisper to the listener’s
right ear which can only be heard on one side. By using all
available channels in an arrangement to produce sound
sources, Ambisonics can generate a realistic sound field
especially with louder, fuller sounds. However, this technique
sacrifices precision in the context of sounds that are highly
localized or panned heavily to one side. For these sounds,
channel optimization is crucial and can be better executed with
Vector Based Amplitude Panning.
Vector Based Amplitude Panning
Vector Based Amplitude Panning (VBAP) describes
another mathematical approach to display and manipulate
sounds. VBAP utilizes “Cartesian coordinates to depict sound
sources” and is the accepted standard when working with
sound production [10]. This means Ambisonic mixes, which work in polar coordinates, are converted to VBAP format prior to playback. The decoding of this data is complex, but boils down to a relatively direct polar-to-Cartesian conversion. Even without Ambisonics, VBAP constitutes an extensive platform capable of designing spatialized audio. Along with their coordinate systems, there is another major difference between production with Ambisonics and VBAP: Ambisonics encourages the use of an entire sound field for each source, while VBAP aims to minimize the channels that create a sound in order to maximize the sound quality [12]. With VBAP, up to three speakers are used to produce a source. The number used depends on where the source is in relation to the array of speakers.
If a source lies directly in line with a speaker, only that speaker
will be used to play that sound. If a source lies in a direct line
between two speakers, these two channels will share the input
with gain factors concordant with the distance from each
speaker [10]. Here, sound is considered to derive from a
‘phantom source’ created in the space between these channels.
If the source lies outside of this line, a triplet will be formed
with a third speaker creating a plane on which the phantom
source can be imagined. VBAP optimization is best imagined
in a large speaker array, and the use of phantom sources is
depicted below:
[10] Figure 5: Triplet Representing Phantom Sound Source in VBAP
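For the pairwise case described above, the gain factors can be computed by expressing the source direction as a linear combination of the two speaker direction vectors, following Pulkki’s formulation. The sketch below is a minimal two-dimensional version; the constant-power normalization is a common convention rather than the only choice.

import numpy as np

def vbap_pair_gains(source_az, spk_az_1, spk_az_2):
    # Unit vectors for the source and the two loudspeakers (angles in radians).
    p = np.array([np.cos(source_az), np.sin(source_az)])
    L = np.array([[np.cos(spk_az_1), np.sin(spk_az_1)],
                  [np.cos(spk_az_2), np.sin(spk_az_2)]])
    # Express p as a combination of the speaker vectors: g = p @ L^-1.
    g = p @ np.linalg.inv(L)
    # Normalize for constant power so loudness stays even across the pair.
    return g / np.linalg.norm(g)

# Phantom source 10 degrees left of center between speakers at +/-30 degrees;
# the nearer speaker receives the larger gain factor.
gains = vbap_pair_gains(np.radians(10), np.radians(30), np.radians(-30))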
SPATIALIZATION IN AN AUDIO
SUBSYSTEM
Spatialization accomplishes 3D sound in a sequential
process. Sound is first generated from an independent source and then
sent to a network where the virtual reality environment
information will also be gathered. Then the coordinates of the
individual in the VR environment will be configured in
relation to the coordinates of the virtual sound source. Lastly,
the sound will be transformed to match the physical parameters
of the virtual world and transmitted to the device that emits the
sound.
The process can be seen in more detail in the figure
provided below:
[2] Figure 6: Adaptive Process of Sound Production,
Configuration, and Transmission
The figure details a continuous process where the sound source, a data center, and the headphones are part of a system in which information is shared and the emitted sound is tailored to it. What makes spatialization effective is its adaptability; it can only be effective if it responds in real time to the user’s orientation in his or her virtual environment. Components of this system include the room geometry, wherein the structure of the user’s virtual environment is configured and sound is then recreated to match that structure. The head position and orientation describe the user’s positioning in the virtual world; this information is taken from the VR headset in use. Source localization is the process of having the sound mimic a realistic sound coming from the specified position. All this information is compressed and then sent as a 3D signal to the binaural headphones.
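The coordinate-configuration step of this pipeline can be sketched as follows: world-space positions and the tracked head orientation go in, and a head-relative direction and distance come out, ready for the transformation stage. The 2D geometry and the function name are illustrative simplifications, not the cited system’s actual interface.

import numpy as np

def relative_direction(source_pos, listener_pos, listener_yaw):
    # Offset from listener to source in world coordinates (2D for brevity).
    offset = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    distance = np.linalg.norm(offset)
    # Subtract the tracked head yaw to get a head-relative azimuth.
    azimuth = np.arctan2(offset[1], offset[0]) - listener_yaw
    # Wrap into (-pi, pi] so front and back remain distinguishable.
    azimuth = (azimuth + np.pi) % (2 * np.pi) - np.pi
    return azimuth, distance

# After the listener turns 45 degrees to the left, a source that was straight
# ahead now sits 45 degrees to their right.
az, dist = relative_direction([1.0, 0.0], [0.0, 0.0], listener_yaw=np.radians(45))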
Digital Audio Workstation

Digital Audio Workstations (DAWs) are software programs that allow users to record, mix, and edit digital audio files. This is where tracks are created for music, movies, and, more recently, virtual reality. DAWs provide the channels that allow sound to be produced for use in a variety of applications. Digital audio files created in these programs can be used as standalone tracks or be repurposed as sound sources to be coupled with content and used in programs such as Ambisonics.
Binaural Headphones
These headphones, which utilize binaural spatialization, were developed to account for the structure and density of the ears and human head [13]. Developers accomplish this by mounting binaural recording microphones in a dummy head; these recording devices capture differences in how sound is perceived based on placement around the ears and the positioning of the dummy’s ears [7]. By taking these into account, the headphones were developed to change sound based on movement. More specifically, when observing a sound in the virtual reality environment, the headphones allow the user to hear it differently when facing the sound directly than when turning their head in another direction. As opposed to surround sound, which is the placement of speakers to create a 3D field so that sound comes from its corresponding direction, spatialization adapts to the user in the VR environment: when an individual turns left, whatever sound was in front of them is now to their right and will be perceived as such.

Binaural headphones are the optimal vessel for spatialized sound, as traditional headphones cannot accurately convey realistic 3D sound. Binaural headphones’ utilization of head tracking allows them to be a sustainable option for emitting spatialized sound in the future. Before these headphones, the practicality of spatialized sound as a tool for more than extremely specific tasks was in question. As Drazen Bosnjak, a sound designer who collaborated on The Martian VR Experience, stated, “You can sit at the mixing board and plug your headphones in, but how do you simulate yourself looking around?” In short, binaural headphones. With this advancement, strides can now be made to pair spatialized sound with the innovative headphones in a range of applications varying from entertainment to education. Bosnjak went further and stated that the “best thing would be for the VR tech community to embrace the traditional audio engineering community.” These headphones are a successful early example of his statement and a clue as to the sustainability of this technology. More developments are to come that will allow audio in VR to be extremely immersive.

Spatialization in Headphones

The headphones work by exploiting the physical properties of sound as it is perceived by the ears. The first of these is the “interaural level difference,” which is expressed as “the difference in intensity between the ears…usually measured in dB”. At low frequencies, the headphones adjust the intensity that each ear perceives so that it matches that of a real-life experience; at high frequencies, the structure of the head is too great an obstacle for each ear to be adjusted to realistic levels, creating a much larger difference that the headphones cannot make up for. So it is generally agreed that the lower the frequency, the more effective the headphones can be. The next physical property is known as the “interaural time difference,” the “delay of arrival of sound between ears”. It is a very minuscule time difference, often measured in milliseconds, but that difference still contributes to the perception of realistic sound and the overall realism of the virtual environment. As mentioned before, interaural level difference is harder to exploit at higher frequencies, so another method employed is known as “Direction Dependent Filtering (DDF)” [7], which is also a component of the general incorporation of directionality into sounds. DDF filters specific sounds based on the direction from which they are transmitted. This is useful because a sound heard from one direction is clearer in the ear closer to the source, and DDF filters the sound reaching the other ear to generate realism.
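To give a sense of the magnitudes involved in the interaural time difference discussed above, the sketch below uses Woodworth’s classic spherical-head approximation; the head radius of 8.75 cm and speed of sound of 343 m/s are typical textbook values, not parameters taken from the cited experiments.

import numpy as np

def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    # Woodworth's spherical-head approximation, valid for |azimuth| <= 90:
    # ITD = (r / c) * (sin(theta) + theta), with theta in radians.
    theta = np.radians(azimuth_deg)
    return (head_radius / c) * (np.sin(theta) + theta)

# A source 45 degrees off-center arrives about 0.38 ms sooner at the near ear.
itd_seconds = woodworth_itd(45.0)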
Navigational Training for the Blind
In the Virtual Reality industry, an overlooked population
has been the blind. The use of stereo speakers in VR games
does not convey enough of a sound’s information to provide
blind players a means of interacting with the game. A group of electrical engineers at the Delft University of Technology in the Netherlands presented an overview of their project, which aims to include blind players, at the 2015 IEEE 2nd VR Workshop on Sonic Interactions for Virtual Environments (SIVE). Their audio game, Legend of Iris (LOI), will not only
also serve as a means of developing navigation skills for blind children [14].
The product that the developers of LOI are working on
has been done before. Unfortunately, previous attempts have
missed a few crucial components which would have allowed
them to gain traction. Along with targeting the navigational
skills of blind children, the game itself must be enjoyable. The
game should be immersive, engaging, and replayable to
motivate children to play it. Legend of Iris is an audio-only
adventure game with an entertaining story along with puzzles
to teach navigation skills.
If newly developed technologies are to address the
sustainability of VR, applications have to target more practical
uses that reach beyond the niche entertainment markets they
attract now. The aim of the developers was to transform the
challenges that blind players would face in real-life into new
ones in the game’s fantasy setting. The first section of the game
begins with a series of small tasks to help familiarize the player
with the controls of the game. Once the player gets
comfortable, they are then faced with a series of progressively
difficult challenges focused on the different components of
auditory navigation. Examples of these challenges as stated by
the aforementioned paper are the following: “1. Locating the
origin of a sound, 2. Focusing on a specific sound in the
presence of distractions, 3. Following moving objects by sound
only, 4. Avoiding moving objects by sound only” [14].
Legend of Iris utilizes the concepts of Ambisonics to challenge the player. Localization is present to allow them to find the origin of a sound. There needs to be a diversity of sounds for the player to locate one specific sound in the
presence of other sounds. Distance and directionality are
incorporated to allow the player to locate sound sources and
interact with moving objects without visual aid. Legend of Iris must utilize spatialization to create realistic sound and
grant the player full immersion.
Legend of Iris is an example of how Ambisonic sound
technology can be used outside of the strictly entertainment
setting. The use of this audio game opens a realm of possible
therapeutic and medical applications. In near future as a
testament to their sustainability, VR and 3D audio will need to
be taken into consideration for their wide array of useful and
beneficial applications.
SOURCES
[1] T. Siriborvornratanakul. “A Study of Virtual Reality Headsets and Physiological Extension Possibilities.” Cham: Springer. 06.12.2016. Accessed 2.27.2017. https://link.springer.com/chapter/10.1007%2F978-3-319-42108-7_38#aboutcontent
[2] D. Mauro. R. Mekuria. M. Sanna. “Binaural Spatialization for 3D Immersive Audio Communication in a Virtual World.” Association for Computing Machinery. 09.18.2013. Accessed 2.27.2017. http://dl.acm.org/citation.cfm?doid=2544114.2544115
[3] A. Barretto. J. Kenneth. M. Adjouadi. “3D Sound for
Human-Computer Interaction: Regions with Different
Limitations in Elevation Localization.” Association for
Computing Machinery. 10.25.2009. Accessed 2.28.2017.
http://dl.acm.org/citation.cfm?doid=1639642.1639680
[4] K. Chung. A. Neuman. M. Higgings. “Effects of in-the-ear microphone directionality on sound direction identification.” Acoustical Society of America. 2008. Accessed 3.1.2017. http://asa.scitation.org/doi/figure/10.1121/1.2883744
[5] P. Budarapu. T. Narayana. B. Ramohhan. T. Rabczuk. “Directionality of sound radiation from rectangular panels.” Elsevier Ltd. 03.2015. Accessed 2.28.2017. http://www.sciencedirect.com/science/article/pii/S0003682X14002291
[6] B. Matan. “The Science Behind Nx 3D Audio.” Waves Audio Ltd. 10.16.2016. http://www.waves.com/science-behind-nx-3d-audio
[6] S. Vecherin. K. Wilson. V. Ostashev. “Incorporating sound directionality into outdoor sound propagation calculations.” Acoustical Society of America. 12.2011. Accessed 3.1.2017. http://asa.scitation.org/doi/10.1121/1.3655881
[7] M. Otani. T. Hirahara. S. Ise. “Numerical study on source-distance dependency of head-related transfer functions.” Acoustical Society of America. 05.2009. Accessed 3.1.2017. http://asa.scitation.org/doi/full/10.1121/1.3111860
[8] M. Bai. T. Tsao. “Numerical Modeling of Head-Related Transfer Functions Using the Boundary Source Representation.” American Society of Mechanical Engineers. 04.2006. Accessed 3.1.2017. http://vibrationacoustics.asmedigitalcollection.asme.org/article.aspx?articleid=1470886
[9] P. Hoffmann. F. Christensen. D. Hammershoi. “Quantitative assessment of spatial sound distortion by the semi-ideal recording point of a hear-through device.” Acoustical Society of America. 06.07.2013. Accessed 3.1.2017. http://asa.scitation.org/doi/abs/10.1121/1.4799631
[10] V. Pulkki. “Virtual Sound Source Positioning Using Vector Base Amplitude Panning.” Audio Engineering Society. pp. 456-466. 04.05.1997. Accessed 3.2.2017. http://lib.tkk.fi/Diss/2001/isbn9512255324/article1.pdf
[11] R. Nishimura. K. Sonada. “B-format for Binaural Listening of Higher Order Ambisonics.” National Institute of Information and Communications Technology. 05.2013. Accessed 3.2.2017. http://asa.scitation.org/doi/abs/10.1121/1.4800849
[12] S. Astapov. E. Petlenkov. A. Tepljekov. K. Vassiljeva. D. Draheim. “Sound localization and processing for inducing synthetic experiences in virtual reality.” IEEE Computer Society. 11.14.2016. Accessed 3.2.2017.
[13] F. Hollerweger. “Periphonic Sound Spatialization in Multi-User Virtual Environments.” University of California Santa Barbara. 3.14.2006. Accessed 3.1.2017. https://pdfs.semanticscholar.org/c559/b2a68d40aa8f296590e01d7e4de823c0df41.pdf
[14] J. F. P. Cheiran. L. Nedel. M. S. Pimenta. “Inclusive Games: A Multimodal Experience for Blind Players.” 2011 Brazilian Symposium on Games and Digital Entertainment, Salvador. 2011. pp. 164-172. http://ieeexplore.ieee.org/document/6363230/
[15] “Worldwide Revenues for Augmented and Virtual Reality Forecast to Reach $162 Billion in 2020, According to IDC.” International Data Corporation. Accessed 10.30.2016. http://www.idc.com/getdoc.jsp?containerId=prUS41676216
ACKNOWLEDGEMENTS
We would like to show appreciation for our Conference Co-Chair, Samuel Birus, and our Conference Chair, who helped us with the structure and content of the outline. We would also like to mention the helpfulness of the Engineering Library employees, who helped us get a general idea of how to acquire relevant sources, along with our writing instructor, who provided valuable insight on how to improve our paper.
Our parents were also instrumental in allowing us the
opportunity to be in the Engineering school so we could
research this topic.