Lecture 04, Speech II

Course Presentation
Multimedia Systems
Speech I
Mahdi Amiri
February 2011
Sharif University of Technology
Sound
Basics
waves of pressure which propagates through
compressible media such as air or water.
Digital Representation of
an Analog Signal
Sampling and Quantization
Parameters:
Sound is a sequence of
Sampling Rate (Samples per Second)
Quantization Levels (Bits per Sample)
This is a form of coding too:
Pulse-code modulation (PCM)
Page 1
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Why Call it PCM?
4-bit PCM
Page 2
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Bit per Second (bit/s or bps)
How to choose proper…
Sampling Rate
8 Khz ?
Quantization Level
8 bit/sample ?
Bit per Second for 8000 Hz 8 bit PCM
64 kbit/s
Page 3
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Sampling Rate
Human Hearing Frequency Range
20 Hz to 20 kHz
Play with “Audacity”
tone generator to test
your hearing
Most people will find that their hearing is
most sensitive around 1-4 kHz and that it is less
sensitive at high and low frequencies.
Page 4
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Hearing Range
Porpoise
Ferret
Page 5
Check out
Freq. Alloc. Table
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Sampling Rate
Human Vocal Range
Normal: 80 Hz to 1100 Hz
Charles Kellogg (14 KHz) (not verified)
Guinness Book of Records
Female: Georgia Brown
(Eight octaves, 25087Hz)
Male: Tim Storms
(Six octaves)
Page 6
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Common Sampling Rates
8,000 Hz - Telephone, adequate for human speech.
11,025 Hz – lower quality PCM (one quarter the sampling rate of audio CDs).
22,050 Hz – Radio.
32,000 Hz - miniDV digital video camcorder, DAT (LP mode).
44,100 Hz - Audio CD, also most commonly used with MPEG-1 audio (VCD, SVCD,
MP3) (Originally chosen by Sony, 1979).
48,000 Hz - Digital sound used for miniDV, digital TV, DVD, DAT, films and
professional audio.
96,000 or 192,000 Hz - DVD-Audio, some LPCM DVD tracks, BD-ROM (Blu-ray
Disc) audio tracks, and HD-DVD (High-Definition DVD) audio tracks.
2.8224 MHz - Super Audio CD (SACD), 1-bit sigma-delta modulation process known
as Direct Stream Digital (DSD), co-developed by Sony and Philips.
5.6448 MHz - Double-Rate DSD, 1-bit Direct Stream Digital at 2x the rate of the
SACD. Used in some professional DSD recorders (128 * 44100 Hz).
DXD - 24-bit sampled at 352.8 kHz, suited for editing, eq. with 8.4672 MHz 1-bit DSD
Page 7
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Direct Stream Digital
The trademark name used by Sony and Philips.
Uses pulse-density modulation encoding
Page 8
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Quantization, Images
Page 9
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Uniform Quantizer, Midtread
Simple and popular
midtread
odd number of
reconstruction levels
Quantization error for bounded input
Page 10
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Uniform Quantizer, Midtrise
Simple and popular
midtrise
even number of
reconstruction levels
Page 11
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Quantization Levels
Want to prevent human ear fatigue by
minimizing quantization noise
Signal-to-Noise Ratio = 6.02*B dB
SNR is approximately 6 dB per bit.
16-bit => 96 dB
Above 36 dB is required
Page 12
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
6 dB per bit rule of thump
Average power of a process or signal:

 
   
e n  xˆ n  x n
   e  n  
2
2

2
x


p
x
dx






x
x

2Xm
2B
x :
2
Mean
p  x :
: Variance
Probability density function
SNRdB  B   10log10  x2  e2 
 X m  x  n  X m
Assumption: e  n is uniform over (  ,  ]
2 2

     e  e 
2
e

2
2
2
X m2
1
2
p  e  de    e de 
 2B


12 2  3
2

2
2
SNRdB  B   10 log10  22 B 3 x2 X m2 
  20 log10 2  B  10 log10  3 x2 X m2 
The probability density function of e[n]
Page 13
SNRdB  B  6.02B  10log10  3 x2 X m2 
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
6 dB per bit rule of thump
SNRdB  B  6.02B  10log10  3 x2 X m2 
Example 1: SNR for Uniform Quantization of Uniformly-Distributed Input
 x2  X m2 3
SNRdB  B  6.02B
Example 2: SNR for Uniform Quantization of Sinusoidal Input
 x2  X m2 2
SNRdB  B  6.02B  1.76
Example 3: SNR for Uniform Quantization of Gaussian Input
 x2  X m2 16
Page 14
SNRdB  B  6.02B  7.27
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Good to Know
The average person cannot tell the difference between a bitrate above 192
kbit/s and the original CD/WAV.
Even if your headphones seal really well around your ears, they will
probably only give you about 20 to 25 dB insulation from the external sound
20 ~ 25 dB insulation
Noise level for 192 kbps audio is under -125 db and certainly inaudible
Page 15
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
Nonuniform Quantization
Typical speech signal and its histogram
(a) Uniform and (b) non-uniform quantization Q(x) and quantization error q(x)
Page 16
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
u-law, a-law
Nonuniform quantizers: Difficult to make, Expensive.
Solution: Companding  Uniform Q.  Expanding
Page 17
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
u-law, a-law
Page 18
Multimedia Systems, Speech I
Pulse-code Modulation (PCM)
u-law, a-law
u-law
North America and Japan
Page 19
Multimedia Systems, Speech I
a-law
Europe
Differential PCM (DPCM)
Idea
Take advantage of data redundancy
[… 110 112 111 112 112 114 115 115 114 114… ]
[… +2 -1 +1 0 +2 +1 0 -1 0 …]
Page 20
Multimedia Systems, Speech I
Differential PCM (DPCM)
Basic Scheme
General Predictive Coding
Delta Modulation (DM):  ai x n i  z 1
Problem?
Page 21
Multimedia Systems, Speech I
Differential PCM (DPCM)
Better Structure
Page 22
Multimedia Systems, Speech I
Multimedia Systems
Speech I
Thank You
Next Session: Speech II
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/
Page 23
Multimedia Systems, Speech I