Course Presentation
Multimedia Systems
Speech II
Mahdi Amiri
February 2011
Sharif University of Technology

Adaptive DPCM (ADPCM)
Idea
Problem?
Page 1 Multimedia Systems, Speech II

Adaptive DPCM (ADPCM)
Size of Quantization Step
ADM:
$$\Delta[n] = M\,\Delta[n-1], \qquad P = 2,\quad Q = \tfrac{1}{2}$$
$$M = \begin{cases} P > 1 & \text{if } c[n] = c[n-1] \\ Q < 1 & \text{if } c[n] \neq c[n-1] \end{cases}$$

Speech Compression Concepts
FFT, No Time Localization
Joseph Fourier, 1768-1830
[Figure: speech signal and its FFT (localized only in frequency)]

Speech Compression Concepts
STFT
Dennis Gabor, 1900-1979
[Figure: speech signal and its STFT (fixed time and frequency localization)]

Speech Compression Concepts
Spectrogram
[Figure: 3D surface spectrogram of a part from a music piece]

Speech Compression Concepts
Spectrogram
[Figure: spectrogram of a male voice saying ‘nineteenth century’]

Speech Compression Concepts
Spectrogram, Demonstration
Bat Echolocation Call
Flute by Jean Pierre Rampal
Face!
Singing Voice

Speech Compression Concepts
Formant
[Figure: time and frequency domain presentation of the vowels /a/, /i/, and /u/]

Speech Compression Concepts
Sample Application
A computing system to answer questions posed in natural language
www-943.ibm.com/innovation/us/watson/
Dr. David Ferrucci, Watson Principal Investigator
[Figure: Jeopardy! champions Ken Jennings (left) and Brad Rutter (right) versus the IBM computer Watson]

Linear Predictive Coding (LPC)
Modeling

Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz)
Buzzer, Filter
Chunks: 30 to 50 frames/sec.
Speech = Formants + Residue
Predictor for each frame:
$$\hat{x}[n] = \sum_{i=1}^{P} a_i\, x[n-i]$$

Linear Predictive Coding (LPC)
Modeling (Hiss or Buzz)
The human vocal tract as an infinite impulse response (IIR) system
[Figure: LPC block diagram, vowel /a/]

Linear Predictive Coding (LPC)
Original Paper, Atal-Hanauer 1971
[Figure: comparison of wide-band sound spectrograms for synthetic and original speech signal for the utterance "It's time we rounded up that herd of Asian cattle," spoken by a male speaker. Panels: Original, Synthetic]

Linear Predictive Coding (LPC)
Voiced Frame Example
[Figure: Original and Synthetic frames. Panels: Time Domain (180 samples, pitch period: 75), Frequency Domain]

Linear Predictive Coding (LPC)
Unvoiced Frame Example
Synthetic: white noise with uniform distribution
[Figure: Original and Synthetic frames. Panels: Time Domain (180 samples), Frequency Domain]

Code Excited Linear Prediction (CELP)
Problem of LPC: where there is both hiss and buzz
Solution: encode the residue
Method: Vector Quantization (Codebook)
[Figure: Encoder and Decoder]

Vector Quantization
Block Diagram

Vector Quantization
Example

Vector Quantization
Codebook Design

Comparison of Speech Coders
Sample Speech
"A lathe is a big tool. Grab every dish of sugar."
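The per-frame predictor from the LPC modeling slide, a weighted sum of the previous P samples, can be sketched in a few lines. This is a minimal illustration rather than the method used in the course: it assumes the coefficients are estimated with the autocorrelation method and the Levinson-Durbin recursion, and the function names (`autocorrelation`, `lpc_coefficients`, `predict`, `residue`) are hypothetical. Samples before the start of the frame are treated as zero.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], where r[k] = sum over n of frame[n] * frame[n + k]."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Estimate a_1..a_P for one frame with the Levinson-Durbin recursion."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[i] holds a_i; a[0] is unused
    err = r[0]               # prediction error energy so far
    for m in range(1, order + 1):
        if err <= 0.0:       # silent or perfectly predicted frame: stop early
            break
        # Reflection coefficient for model order m.
        k = (r[m] - sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        a[m] = k
        for i in range(1, m):
            a[i] = prev[i] - k * prev[m - i]
        err *= 1.0 - k * k
    return a[1:]


def predict(frame, coeffs):
    """Predicted sample at n: sum of a_i * x[n - i], with x[n] = 0 for n < 0."""
    return [sum(c * frame[n - i]
                for i, c in enumerate(coeffs, start=1) if n - i >= 0)
            for n in range(len(frame))]


def residue(frame, coeffs):
    """Residue e[n] = x[n] - predicted x[n]: the part the predictor cannot explain."""
    return [x - p for x, p in zip(frame, predict(frame, coeffs))]
```

As a sanity check, a 180-sample frame generated by the two-tap recursion x[n] = 1.5 x[n-1] - 0.7 x[n-2], excited by a single impulse at n = 0, should give coefficients close to (1.5, -0.7) and a residue close to that impulse. This mirrors the "Speech = Formants + Residue" split: the coefficients describe the filter and the residue is what remains to be encoded.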

Comparison of Speech Coders
Demonstration
Original, ADPCM, LPC, CELP

Speech Coding
ITU-T Standards
G.711: PCM (µ-law, A-law); 64, 80 and 96 kbps
G.722: ADPCM; 48, 56 and 64 kbps
G.728: a form of CELP; 16 kbps
Vocoders
Check out a complete list at http://en.wikipedia.org/wiki/List_of_codecs#Audio_codecs
A comparison of Internet audio compression formats: http://www.sericyb.com.au/audio.html

Speech Coding
Free and Open Source Code
HawkVoice: http://hawksoft.com/hawkvoice/
Check out voice samples of HawkVoice™ codecs at http://hawksoft.com/hawkvoice/codecs.shtml

Multimedia Systems
Speech II
Thank You
Next Session: Entropy Coding
FIND OUT MORE AT...
1. http://ce.sharif.edu/~m_amiri/
2. http://www.dml.ir/