Introducing Audio Signal Processing & Audio Coding
Dr Michael Mason
Senior Manager, Sound Development
Dolby Australia Pty Limited
Overview
Audio Signal Processing Applications @ Dolby
Audio Signal Processing Basics
• Sampling
• What is an audio signal?
• Signal Processing Domains
Case Study 1 – Headphone Virtualisation
• Frequency Response
• FIR filtering
• Computational Complexity
Case Study 2 – Perceptual Audio Coding
• Psychoacoustics
© 2016-17 DOLBY LABORATORIES, INC.
CONFIDENTIAL
Audio Signal Processing
Applications @ Dolby
Audio Signal Processing Applications @ Dolby
Cinema
• Delivering channel-based audio (5.1 – 7.1)
– Distribute movies to multiple screens in a multiplex
– Cinemas use speaker arrays rather than single speakers, so processing is required to fill the arrays from single-channel feeds
• Rendering immersive audio – Dolby Atmos
– The cinema soundtrack is expressed as individual objects and locations; in every cinema the movie is rendered for that specific cinema’s speaker locations
• Speaker equalisation & protection
– Process the audio sent to each speaker to compensate for the electro-acoustic properties of the speaker (e.g., frequency response, distortion characteristics).
– Ensure that audio sent to the speakers doesn’t overdrive them, which would damage them.
Audio Signal Processing Applications @ Dolby
Broadcast / Home Theatre
• Compression of Audio for Streaming / DVD / Blu-ray Disc
– Perceptual audio coding (case study later)
– Matrix encoding (Pro Logic)
– Multi-channel audio coding
– Multiple languages
– Multiple playback formats (stereo / 5.1 / etc)
• Broadcast end-to-end
– Capture, mixing, coding, transmission, playback
• AV Receivers (AVRs), Set Top Boxes (STBs), Digital Media Adapters (DMAs)
• Games consoles
Audio Signal Processing Applications @ Dolby
Personal Audio
• Devices
– Mobile phones (feature phones & smart phones)
– Tablets
– Music players
– PCs
• Same issues as Home Theatre, but usually more limited acoustic hardware (i.e. cheap speakers)
• Headphone playback is a big use case (case study later)
Audio Signal Processing Applications @ Dolby
Voice Processing
• Many of the ‘same’ basic challenges, but because speech has specifically different characteristics from general audio, different solutions exist
• Speech coders use different approaches than audio codecs
– What makes a good codec is measured differently
– The transmission bandwidth available for the data is much more limited
• Conferencing & Telephony
Audio Signal Processing
Basics
Audio Signal Processing Basics
Sampling
• Digital signals have samples which are discrete in time and magnitude
• Process of converting a continuous signal to the digital domain is Sampling
– Two key questions when sampling are: How often to sample & how precisely?
[Diagram: Analogue to Digital Converter (ADC) → Digital Signal Processing → Digital to Analogue Converter (DAC)]
Audio Signal Processing Basics
Sampling Frequency – 𝑓𝑠 (how often?)
• Number of samples per second
• Nyquist rate:
– The sampling frequency must be greater than twice the highest frequency in the signal
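A minimal sketch of what goes wrong when the sampling frequency is too low, assuming a pure tone and an ideal sampler. The function names `alias_frequency` and `sample_tone` are illustrative and do not come from the slides.

```python
import math

def alias_frequency(f_signal_hz, fs_hz):
    """Apparent frequency of a pure tone after ideal sampling at fs_hz."""
    # Sampling folds every input frequency into the range [0, fs/2].
    f = math.fmod(f_signal_hz, fs_hz)
    return fs_hz - f if f > fs_hz / 2 else f

def sample_tone(f_hz, fs_hz, n):
    """First n samples of cos(2*pi*f*t) taken at fs_hz."""
    return [math.cos(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]

fs = 48_000.0
# 30 kHz is above fs/2 = 24 kHz, so after sampling it looks like a lower tone.
print("30 kHz sampled at 48 kHz appears at", alias_frequency(30_000.0, fs), "Hz")

# The two sample sequences are numerically indistinguishable.
a = sample_tone(30_000.0, fs, 16)
b = sample_tone(18_000.0, fs, 16)
print("largest sample difference:", max(abs(x - y) for x, y in zip(a, b)))
```

Because the samples of the two tones coincide, nothing after the ADC can tell them apart, which is why content above half the sampling frequency has to be removed before sampling.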
Audio Signal Processing Basics
Resolution (how precisely?)
• Each sample is represented by a number, how many bits should we use?
• Converting a continuous value to a discrete value requires quantisation.
• Quantisation Error
– ‘1’ → 0.5
– ‘0’ → -0.5
[Figure: 1-bit quantiser. Analogue axis from -1.0 to +1.0; Digital axis 0 and 1]
Audio Signal Processing Basics
Resolution (how precisely?)
• By using more bits, we reduce the error
… skipping all the math …
• Each additional bit of resolution improves SNR (signal to noise ratio) by 6.02 dB
[Figure: multi-bit quantiser. Analogue axis from -1.0 to +1.0; Digital axis labelled with 3-bit codes (000, 101)]
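The roughly 6 dB per bit rule can be checked numerically. This is a sketch only, assuming a uniform quantiser over [-1.0, +1.0) and a full-scale sine as the test signal; `quantise` and `snr_db` are hypothetical helper names, and the test frequency is arbitrary.

```python
import math

def quantise(x, bits):
    """Uniform quantiser over [-1.0, +1.0) with 2**bits levels.

    Each input is replaced by the centre of the interval it falls in,
    so with 1 bit the two outputs are -0.5 and +0.5.
    """
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, math.floor((x + 1.0) / step)))
    return -1.0 + (index + 0.5) * step

def snr_db(bits, n=10_000):
    """Measured SNR of a full-scale sine after quantisation to `bits` bits."""
    signal_power = 0.0
    noise_power = 0.0
    for k in range(n):
        x = math.sin(2 * math.pi * 0.01234567 * k)  # arbitrary test tone
        e = quantise(x, bits) - x                   # quantisation error
        signal_power += x * x
        noise_power += e * e
    return 10 * math.log10(signal_power / noise_power)

for bits in (4, 8, 12, 16):
    print(bits, "bits ->", round(snr_db(bits), 1), "dB")
```

Each step of four bits in the loop should raise the measured SNR by about 24 dB, in line with the per-bit figure on the slide.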
Audio Signal Processing Basics
Audio Signal
• Sampling Frequency
– Human perception: 20 Hz – 20,000 Hz
– Nyquist says Fs > 40 kHz
• CD Audio: 44.1 kHz
• Blu-ray (and before that DAT): 48 kHz
• Bit depth
– Range of loudness relative to human hearing…
• Threshold of hearing – 0 dB
• Jet Engines – 110-140 dB
• Busy Road (standing at the curb) – 100 dB
• Sustained exposure will cause damage – 85 dB
– 16 bits per sample gives ~ 96 dB of dynamic range
– 24 bits per sample gives ~ 144 dB
When/Where might we use more? (higher sampling rate or more bits?)
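The dynamic range figures follow from the ratio between full scale and one quantisation step. A small sketch, assuming that definition of dynamic range; the helper names are illustrative.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)

def bits_for_range(range_db):
    """Smallest whole number of bits whose dynamic range covers range_db."""
    return math.ceil(range_db / dynamic_range_db(1))

for bits in (16, 24):
    print(bits, "bits ->", round(dynamic_range_db(bits), 1), "dB")

# Covering the loudest figure quoted for jet engines (140 dB)
print("bits needed for 140 dB:", bits_for_range(140))
```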
Audio Signal Processing Basics
Audio Signal
• Raw data rate
– 48 kHz, 16 bits per sample = 768 kbps / ch
– 3.86 GB for a 2hr movie (5.1 channels) (NB: DVD capacity = 4.7GB)
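The raw data rate arithmetic can be reproduced as follows. Two assumptions are made here that the slide does not state: 5.1 is counted as six full-rate channels, and "GB" is read as 2^30 bytes, which is what yields 3.86.

```python
def pcm_bitrate_bps(fs_hz, bits_per_sample, channels=1):
    """Uncompressed PCM bit rate in bits per second."""
    return fs_hz * bits_per_sample * channels

def pcm_size_bytes(fs_hz, bits_per_sample, channels, seconds):
    """Uncompressed PCM storage size in bytes."""
    return pcm_bitrate_bps(fs_hz, bits_per_sample, channels) * seconds // 8

per_channel = pcm_bitrate_bps(48_000, 16)
movie_bytes = pcm_size_bytes(48_000, 16, channels=6, seconds=2 * 60 * 60)

print("per channel:", per_channel / 1000, "kbps")
print("2 hr, 6 channels:", movie_bytes, "bytes")
print("in units of 2**30 bytes:", round(movie_bytes / 2**30, 2))
print("in units of 10**9 bytes:", round(movie_bytes / 10**9, 2))
```

Either way the soundtrack alone would consume most of a single-layer DVD, which motivates the coding case study.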
Audio Signal Processing Basics
Processing domains
• Sampled audio i.e., Pulse Code Modulated (PCM) data is in the time domain
• Not everything we want to do with audio is formulated as a time domain operation
– e.g., Flattening the frequency response of a speaker
• The Fourier Transform expresses a signal in terms of its frequency components (sinusoids). Using it, we can formulate processing in the frequency domain
• Whether processing is implemented in the time or the frequency domain can depend on where it is most efficient.
• Signal processing also has other useful transform domains which may offer advantages for specific types of processing
– e.g., image coding often uses the Discrete Cosine Transform – DCT
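To illustrate the move from the time domain to the frequency domain, here is a sketch that takes a block of PCM samples of a sine wave and finds its frequency with a discrete Fourier transform. The naive O(N²) `dft` helper and the 8 kHz / 64-sample / 1 kHz values are choices made for this example only.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform, adequate for a small demo."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

fs = 8000.0       # sampling frequency in Hz (example value)
n = 64            # block length, giving bins spaced fs / n = 125 Hz apart
tone_hz = 1000.0  # falls exactly on bin 8 with these choices

# Time domain: a list of PCM-style samples
x = [math.sin(2 * math.pi * tone_hz * t / fs) for t in range(n)]

# Frequency domain: magnitude per bin, up to fs / 2
magnitudes = [abs(c) for c in dft(x)[: n // 2 + 1]]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * fs / n
print("strongest component at", peak_hz, "Hz")
```

In the time domain the tone is spread over all 64 samples; in the frequency domain it is concentrated in a single bin, which is what makes operations such as equalisation easy to express there.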
Headphone Virtualisation
Case Study 1
Headphone Virtualisation
How do you get surround sound out of a pair of headphones?
Headphone Virtualisation
Two things we need to achieve:
• Make it sound like the audio is coming from different directions
• Make it sound like the listener is in a room.
Both can be achieved by filtering the signal using the impulse response of the room (RIR) and the
head-related transfer functions (HRTF).
Headphone Virtualisation
Room impulse response
• By measuring how a short impulsive sound is altered by a room, the room’s reflections and echoes can be characterised to
create an impulse response.
https://www.youtube.com/watch?v=PkZjIHTJ4jc
• The impulse response can in turn be used to filter any signal, to make it sound like it was in the room.
• The process of filtering a signal using an impulse response is convolution:
$$y[n] = \sum_{k=-\infty}^{\infty} h[k] \, x[n-k]$$
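For finite-length sequences the convolution sum can be written directly as two nested loops. This is a sketch for illustration; the three-tap "room" and the input values are made up.

```python
def convolve(h, x):
    """Direct convolution y[n] = sum_k h[k] * x[n - k] for finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):   # samples outside x are treated as zero
                y[n] += h[k] * x[n - k]
    return y

h = [1.0, 0.5, 0.25]       # toy impulse response: direct sound plus two echoes
x = [1.0, 0.0, 0.0, 2.0]   # toy input signal

print(convolve(h, x))
```

Every output sample needs one multiply-accumulate per impulse response tap, which is where the computational cost of time-domain filtering comes from.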
Headphone Virtualisation
Room impulse response
• How many points would be required to capture a room? (i.e. how long is the impulse response?)
• Limiting the impulse response to 30 ms gives us 1440 points (@ 48 kHz)
• Considering the computational cost: 1440 * 48k -> 69 MFLOPS
Headphone Virtualisation
Computational load
• On a DSP chip with a single-cycle MAC -> 69 MCPS
• On an ARM, ‘MAC’s take ~ 3.5 cycles each -> ~240 MCPS
• 5.1 channels -> 10 filters = 2,400 MCPS
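The load figures can be reproduced with a few lines of arithmetic. The sketch below takes the 1440-tap filter, the 3.5 cycles per MAC and the 10 filters from the slides as given; the function name is hypothetical. The slides round the results to 69, 240 and 2,400.

```python
def direct_fir_macs_per_second(taps, fs_hz):
    """Multiply-accumulates per second for one time-domain FIR filter."""
    # One MAC per tap for every output sample.
    return taps * fs_hz

TAPS = 1440            # impulse response length from the slides
FS_HZ = 48_000         # sampling frequency
CYCLES_PER_MAC = 3.5   # ARM estimate from the slides
FILTERS = 10           # filter count quoted for 5.1 channels

macs = direct_fir_macs_per_second(TAPS, FS_HZ)
dsp_mcps = macs / 1e6                   # single-cycle MAC
arm_mcps = macs * CYCLES_PER_MAC / 1e6

print("MACs per second per filter:", macs)
print("DSP, one filter:", round(dsp_mcps, 1), "MCPS")
print("ARM, one filter:", round(arm_mcps, 1), "MCPS")
print("ARM, all filters:", round(arm_mcps * FILTERS, 1), "MCPS")
```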
Headphone Virtualisation
The solution?
• Convolution in Time domain <-> Multiplication in Frequency Domain
– Fourier Transform the impulse response & the signal
• Block based, e.g., blocks of 2048
• O[N.log2(N)] -> k * 22,528 ~ 78,848 (N = 2048, k = 3.5)
– Operate in the frequency domain
• Complex multiplies -> 4 * 2048 -> 8,192
– Transform the result back to the time domain.
• Same as forward transform
– Blocks per second?
• 23 blocks/sec … ~4 MFLOPS / filter
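A sketch of the frequency-domain approach: transform both sequences, multiply bin by bin, and transform back. The recursive radix-2 `fft` is a textbook implementation written for this example, not a production routine, and a real-time system would additionally need a block scheme such as overlap-add, which is not shown. The cost estimate uses k = 3.5, a value inferred from the 78,848 figure on the slide.

```python
import cmath
import math

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two (unnormalised)."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fast_convolve(h, x):
    """Convolution via multiplication in the frequency domain."""
    out_len = len(h) + len(x) - 1
    n = 1
    while n < out_len:          # zero-pad so circular convolution is linear
        n *= 2
    H = fft(list(h) + [0.0] * (n - len(h)))
    X = fft(list(x) + [0.0] * (n - len(x)))
    y = fft([a * b for a, b in zip(H, X)], inverse=True)
    return [v.real / n for v in y[:out_len]]

def fft_filter_ops_per_second(block=2048, fs_hz=48_000, k=3.5):
    """Rough cost: forward FFT + complex multiplies + inverse FFT per block."""
    transform = k * block * math.log2(block)
    multiplies = 4 * block
    return (2 * transform + multiplies) * fs_hz / block

print(fast_convolve([1.0, 0.5, 0.25], [1.0, 0.0, 0.0, 2.0]))
print("ops per second per filter:", fft_filter_ops_per_second())
```

The result matches direct convolution up to rounding error, while the estimated cost per filter drops from about 69 million to about 4 million operations per second.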
What about the HRTFs ?
Headphone Virtualisation
Head-related Transfer Function
• Measured on a dummy head
• Applied as filters
• Same computational arguments lead us to the need to apply these in the frequency domain.
NB: we don’t need to go back to the time domain between the two sets of filters
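The NB can be illustrated with a toy example: multiplying the two frequency responses together gives a single combined filter, so the signal is transformed once, multiplied once, and transformed back once. All impulse responses here are invented values, and the naive `dft` is used only to keep the sketch short.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive DFT; the inverse includes the 1/N normalisation."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def convolve(h, x):
    """Direct time-domain convolution, used here as the reference."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hv in enumerate(h):
        for j, xv in enumerate(x):
            y[i + j] += hv * xv
    return y

def pad(x, n):
    return list(x) + [0.0] * (n - len(x))

room = [1.0, 0.0, 0.4]            # hypothetical toy room impulse response
hrtf = [0.9, 0.3]                 # hypothetical toy head-related impulse response
signal = [1.0, -1.0, 0.5, 0.25]

n = 8  # at least len(signal) + len(room) + len(hrtf) - 2 = 7

# Combined response: could be computed once, ahead of time
combined = [r * h for r, h in zip(dft(pad(room, n)), dft(pad(hrtf, n)))]

# One forward transform, one multiply, one inverse transform
S = dft(pad(signal, n))
y_freq = [v.real for v in dft([s * c for s, c in zip(S, combined)], inverse=True)]

# Reference: two separate time-domain filters in sequence
y_time = convolve(hrtf, convolve(room, signal))

print([round(v, 6) for v in y_freq[: len(y_time)]])
print(y_time)
```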
Dolby Atmos for headphones debuted in Blizzard’s Overwatch
Perceptual Audio Coding
Case study 2
Perceptual Audio Coding
How do you reduce the storage and transmission bandwidth requirements of Audio signals?
Bitrates:
• Uncompressed : 768 kbps / ch
• DVD (AC3) : 448 kbps (5.1 channels) (~10:1 compression ratio)
Perceptual Audio Coding
Audio Coding is Lossy
• Lossless compression: must perfectly reconstruct its source (e.g., zip files).
• Lossy compression: can ‘throw away’ data if it isn’t ‘needed’. The reconstruction need only be ‘good enough.’
– Deciding which bits to ‘throw away’ and what is ‘good enough’ is the hard part.
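The distinction can be made concrete with a small sketch. The lossless half uses Python's `zlib`; the "lossy" half simply discards the low 8 bits of each 16-bit sample, which is a deliberately crude stand-in and not how a perceptual codec decides what to throw away.

```python
import math
import struct
import zlib

# 10 ms of a 440 Hz tone as 16-bit PCM (example values)
samples = [round(20000 * math.sin(2 * math.pi * 440 * n / 48000)) for n in range(480)]
raw = struct.pack(f"<{len(samples)}h", *samples)

# Lossless: the decompressed bytes are identical to the original
restored = zlib.decompress(zlib.compress(raw))
print("lossless round trip identical:", restored == raw)

# Crude lossy step: keep only the top 8 bits of every sample
lossy = [(s >> 8) << 8 for s in samples]
max_error = max(abs(a - b) for a, b in zip(samples, lossy))
print("lossy round trip identical:", lossy == samples)
print("largest sample error:", max_error)
```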
Perceptual Audio Coding
[Encoder block diagram: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation, Entropy coding]
Perceptual Audio Coding
Psychoacoustics
• Study of sound perception
– Perception implies the human experience, which includes physiological and psychological factors.
https://auditoryneuroscience.com/McGurkEffect
• Is at the heart of the question of which parts of an audio signal are important or unimportant.
Perceptual Audio Coding
Psychoacoustics
• Most perceptual quantities are non-linear and subjective
• Loudness
– Non-linearly related to sound pressure
– Scales include: sone, phon
• Pitch
– Non-linearly related to frequency
– Scales include: Bark, Mel, ERB
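The slides name the scales without giving formulas. As an illustration, the sketch below uses commonly cited approximations: the 2595·log10(1 + f/700) mel formula, the Zwicker and Terhardt Bark approximation, the Glasberg and Moore ERB bandwidth, and the usual phon-to-sone rule for levels of 40 phon and above. Which variants any particular codec uses is not stated in the source.

```python
import math

def hz_to_mel(f_hz):
    """Mel scale, common 2595 * log10(1 + f / 700) approximation."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def hz_to_bark(f_hz):
    """Bark scale, Zwicker and Terhardt approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def erb_bandwidth_hz(f_hz):
    """Equivalent rectangular bandwidth at f_hz, Glasberg and Moore approximation."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def phon_to_sone(phon):
    """Loudness in sones; valid for about 40 phon and above."""
    return 2.0 ** ((phon - 40.0) / 10.0)

for f in (100.0, 1000.0, 10_000.0):
    print(f, "Hz:", round(hz_to_mel(f)), "mel,",
          round(hz_to_bark(f), 1), "Bark,",
          round(erb_bandwidth_hz(f)), "Hz ERB")

# Every 10 phon doubles the perceived loudness
print([phon_to_sone(p) for p in (40, 50, 60)])
```

A tenfold increase in frequency produces far less than a tenfold increase on the mel or Bark scales, which is the non-linearity the slide refers to.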
Perceptual Audio Coding
Frequency Masking
Perceptual Audio Coding
Temporal Masking
Perceptual Audio Coding
Time/Frequency analysis
• Break the incoming signal into time blocks and transform into the frequency domain
• Coding is always block based
• The frequency representation is analysed in bins of equal perceptual bandwidth (Bark)
Psychoacoustic analysis
• Use the frequency representation of the current block to calculate the masking curve
• Use the frequency masking curves from previous frames to account for temporal masking
[Encoder block diagram: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation]
Perceptual Audio Coding
Masking Curve
• Areas of the spectrum where the masking curve is above the signal energy represent ‘things we can’t hear’
• If we can’t hear them, we shouldn’t spend bits encoding them
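A sketch of how the masking curve is used, assuming the signal energy and the masking threshold are already available per band in dB. The band values are invented, and computing a real masking curve is well beyond this example.

```python
def signal_to_mask_db(signal_db, mask_db):
    """Per-band difference between signal level and masking threshold."""
    return [s - m for s, m in zip(signal_db, mask_db)]

def inaudible_bands(signal_db, mask_db):
    """Indices of bands where the masking curve is at or above the signal."""
    smr = signal_to_mask_db(signal_db, mask_db)
    return [i for i, v in enumerate(smr) if v <= 0]

# Hypothetical levels for four bands, in dB
signal_db = [60.0, 35.0, 48.0, 20.0]
mask_db = [40.0, 42.0, 45.0, 30.0]

print("signal-to-mask ratio:", signal_to_mask_db(signal_db, mask_db))
print("bands needing no bits:", inaudible_bands(signal_db, mask_db))
```

In the remaining bands the positive signal-to-mask ratio indicates how far below the signal the quantisation noise has to stay in order to remain masked.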
Perceptual Audio Coding
Bit allocation
• Using the masking curve, we can calculate the allowed signal to noise ratio in each of the frequency bands
• Knowing that allocating a bit to a quantiser improves SNR by 6 dB, iteratively allocate the bits available in the bit pool to bands until we either run out of bits or meet or exceed the SNR requirements in all bands
• (any leftover bits can be used to code the next frame)
• The bit distribution must be sent to the decoder
Quantiser
• Quantise the frequency domain representation to send to the decoder.
[Encoder block diagram: Time/Frequency analysis, Psychoacoustic analysis, Bit allocation, Quantisation]
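The iterative allocation can be sketched as a greedy loop that always gives the next bit to the band furthest from its SNR target. This is a simplified illustration, assuming one quantiser per band and 6.02 dB per bit; the function name and the example requirements are hypothetical, and real codecs use more elaborate schemes.

```python
DB_PER_BIT = 6.02  # SNR gained per additional quantiser bit

def allocate_bits(required_snr_db, bit_pool):
    """Greedy bit allocation.

    required_snr_db: SNR needed in each band to keep noise below the mask.
    bit_pool: total number of bits available for this frame.
    Returns (bits per band, bits left over for the next frame).
    """
    bits = [0] * len(required_snr_db)
    remaining = bit_pool
    while remaining > 0:
        # How far each band still is from its requirement
        deficits = [req - DB_PER_BIT * b for req, b in zip(required_snr_db, bits)]
        worst = max(range(len(deficits)), key=deficits.__getitem__)
        if deficits[worst] <= 0:
            break                  # every band meets its requirement
        bits[worst] += 1
        remaining -= 1
    return bits, remaining

required = [30.0, 0.0, 12.0, 20.0]   # hypothetical per-band requirements in dB

print(allocate_bits(required, bit_pool=20))  # enough bits: some are left over
print(allocate_bits(required, bit_pool=3))   # too few bits: pool runs out first
```

The returned list corresponds to the bit distribution that has to be sent to the decoder so that it can interpret the quantised values.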
Perceptual Audio Coding
Decoding is ‘simple’
• Recreate the frequency representation of each frame
• Transform back to the time domain
• Additional processing can be used to enhance the reconstructed signal
Summary
Audio Signal Processing Applications
Audio Signal Processing Basics
• Sampling
• What is an audio signal?
• Signal Processing Domains
Case Study 1 – Headphone Virtualisation
• Frequency Response
• FIR filtering
• Computational Complexity
Case Study 2 – Perceptual Audio Coding
• Psychoacoustics
Questions?