Lec04, Speech II, v1.03.pdf

Course Presentation
Multimedia Systems
Speech II
Mahdi Amiri
February 2011
Sharif University of Technology
Adaptive DPCM (ADPCM)
Idea
Problem?
Multimedia Systems, Speech II
Adaptive DPCM (ADPCM)
Size of Quantization Step
ADM: $\Delta[n] = M \, \Delta[n-1]$
$P = 2, \quad Q = \frac{1}{2}$
$M = P > 1$ if $c[n] = c[n-1]$
$M = Q < 1$ if $c[n] \neq c[n-1]$
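The step-size rule above can be sketched as a small adaptive delta modulator. This is only an illustration under stated assumptions: the function names, the initial step `step0`, and the clamping limits `step_min` and `step_max` are hypothetical and do not come from the slides; only P = 2 and Q = 1/2 do.

```python
def _next_step(step, code, prev_code, P, Q, step_min, step_max):
    """Scale the step by P when the bit repeats, by Q when it flips."""
    if prev_code is None:          # first sample: no previous bit to compare
        return step
    M = P if code == prev_code else Q
    # Clamping is an assumption added here to keep the step bounded.
    return min(max(step * M, step_min), step_max)


def adm_encode(samples, step0=0.05, P=2.0, Q=0.5, step_min=1e-3, step_max=1.0):
    """Return the 1-bit codes c[n] for a sequence of samples."""
    codes, estimate, step, prev = [], 0.0, step0, None
    for x in samples:
        code = 1 if x >= estimate else 0
        step = _next_step(step, code, prev, P, Q, step_min, step_max)
        estimate += step if code else -step
        codes.append(code)
        prev = code
    return codes


def adm_decode(codes, step0=0.05, P=2.0, Q=0.5, step_min=1e-3, step_max=1.0):
    """Rebuild the staircase approximation from the bit stream alone."""
    out, estimate, step, prev = [], 0.0, step0, None
    for code in codes:
        step = _next_step(step, code, prev, P, Q, step_min, step_max)
        estimate += step if code else -step
        out.append(estimate)
        prev = code
    return out
```

Because the decoder repeats the same step update from the bits it receives, no step size has to be transmitted. A run of equal bits (slope overload) doubles the step; alternating bits (granular noise) halve it.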
Speech Compression Concepts
FFT, No Time Localization
Speech Signal
Joseph Fourier, 1768-1830
FFT
(is only localized in frequency)
Speech Compression Concepts
STFT
Speech Signal
Dennis Gabor, 1900-1979
STFT
(fixed time and frequency localization)
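The difference between a single transform over the whole signal and the STFT can be illustrated with a short sketch. It slices the signal into overlapping frames, applies a window, and takes the DFT magnitude of each frame, so each spectrum is tied to a time interval. The function name `stft`, the Hann window, and the frame and hop sizes are assumptions chosen for the example, and a naive DFT is used instead of an FFT to keep the code dependency-free.

```python
import cmath
import math


def stft(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per windowed frame."""
    # Hann window (an assumed choice) to reduce leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):      # non-negative frequencies
            s = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                    for n in range(frame_len))
            spectrum.append(abs(s))
        frames.append(spectrum)
    return frames


def peak_bin(spectrum):
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(spectrum)), key=spectrum.__getitem__)


# Test signal: a low tone followed by a higher tone.
tone_a = [math.cos(2 * math.pi * 4 * n / 64) for n in range(256)]
tone_b = [math.cos(2 * math.pi * 12 * n / 64) for n in range(256)]
frames = stft(tone_a + tone_b)
```

Here the first frame peaks at the low tone's bin and the last frame at the high tone's bin, which a single transform of the whole signal cannot tell apart in time. The frame length fixes the trade-off: longer frames give finer frequency resolution but coarser time resolution.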
Speech Compression Concepts
Spectrogram
3D surface spectrogram of part of a music piece.
Speech Compression Concepts
Spectrogram
Spectrogram of a male voice saying ‘nineteenth century’.
Speech Compression Concepts
Spectrogram, Demonstration
Bat Echolocation Call
Flute by Jean Pierre Rampal
Face!
Singing Voice
Speech Compression Concepts
Formant
Time-domain and frequency-domain representations of the vowels /a/, /i/, and /u/
/a/
/i/
/u/
Speech Compression Concepts
Sample Application
A computing system to answer
questions posed in natural language
www-943.ibm.com/innovation/us/watson/
Dr. David Ferrucci, Watson Principal Investigator
Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson
Linear Predictive Coding (LPC)
Modeling
Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz)
Buzzer → Filter
Chunks: 30 to 50 frames/sec.
Speech = Formants + Residue
Predictor for each frame:
$\hat{x}[n] = \sum_{i=1}^{P} a_i \, x[n-i]$
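One common way to obtain the predictor coefficients a_i for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method is intended, so the sketch below is an assumption, and the function names `autocorrelation`, `levinson_durbin`, and `lpc` are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n + k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the predictor coefficients.

    Returns (coeffs, err): coeffs[i - 1] is a_i in
    x_hat[n] = sum_i a_i * x[n - i], err is the residual energy.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], err


def lpc(frame, order):
    """Predictor coefficients and residual energy for one frame."""
    r = autocorrelation(frame, order)
    if r[0] == 0.0:                  # silent frame: nothing to predict
        return [0.0] * order, 0.0
    return levinson_durbin(r, order)
```

As a sanity check, feeding in the impulse response of a known second-order all-pole filter, for example x[n] = 1.2 x[n-1] - 0.72 x[n-2], should return coefficients close to 1.2 and -0.72. The residual energy is the part of the frame the predictor cannot explain, which matches the "Speech = Formants + Residue" view above.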
Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz)
The human vocal tract as an infinite impulse response (IIR) system
LPC Block Diagram
Vowel /a/
Linear Predictive Coding (LPC)
Original Paper, Atal-Hanauer 1971
Original
Synthetic
Comparison of wide-band sound spectrograms for synthetic and original speech signals for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker
Linear Predictive Coding (LPC)
Voiced Frame Example
Original
Synthetic
Time Domain
180 samples, Pitch period: 75
Frequency Domain
Linear Predictive Coding (LPC)
Unvoiced Frame Example
Original
Synthetic: white noise with uniform distribution
Time Domain
180 samples
Frequency Domain
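The voiced and unvoiced frame examples can be sketched as an LPC decoder for one frame: a "buzz" (impulse train at the pitch period) or a "hiss" (uniformly distributed white noise) drives an all-pole filter built from the predictor coefficients. The frame length of 180 samples and the pitch period of 75 come from the slides; the function names, the gain parameter, the noise range, and the example coefficients are assumptions for illustration.

```python
import random


def excitation(n_samples, voiced, pitch_period=75, seed=0):
    """Buzz (impulse train) for voiced frames, hiss (uniform noise) otherwise."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)        # seeded so the example is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]


def synthesize(excite, coeffs, gain=1.0):
    """All-pole (IIR) synthesis: y[n] = gain * e[n] + sum_i a_i * y[n - i]."""
    out = []
    for n, e in enumerate(excite):
        pred = sum(a * out[n - i]
                   for i, a in enumerate(coeffs, start=1) if n - i >= 0)
        out.append(gain * e + pred)
    return out


# Hypothetical second-order coefficients, used only to exercise the filter.
example_coeffs = [1.2, -0.72]
voiced_frame = synthesize(excitation(180, voiced=True), example_coeffs)
unvoiced_frame = synthesize(excitation(180, voiced=False), example_coeffs)
```

With a 180-sample frame and a pitch period of 75, the buzz contains three impulses (at samples 0, 75, and 150). Only the coefficients, the gain, the voiced/unvoiced flag, and the pitch period need to be sent per frame, which is where the low bit rate of LPC comes from.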
Code Excited Linear Prediction
CELP
Problem of LPC: where there is both hiss and buzz
Solution: encode the residue
Method: vector quantization (codebook)
Encoder
Decoder
Vector Quantization
Block Diagram
Vector Quantization
Example
Vector Quantization
Codebook Design
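Codebook design and encoding can be sketched with a plain Lloyd (k-means style) iteration: assign every training vector to its nearest codeword, then move each codeword to the centroid of its cell. The slides do not name a design algorithm, so this choice, the evenly spaced initialisation, and the function names `nearest` and `train_codebook` are assumptions.

```python
def nearest(codebook, vec):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2
                                 for c, v in zip(codebook[i], vec)))


def train_codebook(training, k, iterations=20):
    """Design a k-entry codebook from a list of training vectors."""
    # Assumed initialisation: k evenly spaced training vectors.
    codebook = [tuple(training[i * len(training) // k]) for i in range(k)]
    for _ in range(iterations):
        cells = [[] for _ in range(k)]
        for vec in training:
            cells[nearest(codebook, vec)].append(vec)
        # Move each codeword to its cell centroid; keep it if the cell is empty.
        codebook = [
            tuple(sum(col) / len(cell) for col in zip(*cell)) if cell
            else codebook[i]
            for i, cell in enumerate(cells)
        ]
    return codebook


training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = train_codebook(training, k=2)
index = nearest(codebook, (9, 12))     # the encoder sends only this index
reconstructed = codebook[index]        # the decoder looks the vector up
```

The encoder and decoder share the same codebook, so each input vector costs only log2(k) bits. In CELP the vectors being quantized are segments of the residue rather than raw samples.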
Comparison of Speech Coders
Sample Speech
A lathe is a big tool. Grab every dish of sugar.
Comparison of Speech Coders
Demonstration
Original
ADPCM
LPC
CELP
Speech Coding
ITU-T Standards
G.711
PCM
u-law, a-law
64, 80 and 96 kbps
G.722
ADPCM
48, 56 and 64 kbps
G.728
A form of CELP
16 kbps
Vocoders
Check out a complete list at
http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs
A comparison of Internet audio compression formats
http://www.sericyb.com.au/audio.html
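The u-law companding behind G.711 can be sketched with the continuous formula F(x) = sgn(x) ln(1 + u|x|) / ln(1 + u) with u = 255, followed by uniform 8-bit quantization. This is a simplified illustration: the actual standard uses a piecewise-linear segment approximation and its own bit layout, and the function names and the 8-bit mapping below are assumptions.

```python
import math

MU = 255.0


def mu_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], expanding small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y, mu=MU):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode_8bit(x):
    """Compress, then quantize uniformly to a code in 0..255."""
    y = mu_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1.0) / 2.0 * 255))


def decode_8bit(code):
    """Map a code in 0..255 back to an amplitude in [-1, 1]."""
    return mu_expand(code / 255 * 2.0 - 1.0)


SAMPLE_RATE_HZ = 8000      # narrowband telephone speech
BITS_PER_SAMPLE = 8
bitrate_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000
```

The logarithmic curve gives quiet samples finer quantization steps than loud ones, which suits speech. The 64 kbps figure listed for G.711 corresponds to 8000 samples per second at 8 bits each.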
Speech Coding
Free and Open Source Code
HawkVoice
http://hawksoft.com/hawkvoice/
Check out voice samples of HawkVoice™ codecs at
http://hawksoft.com/hawkvoice/codecs.shtml
Multimedia Systems
Speech II
Thank You
Next Session: Entropy Coding
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/