Manfred R. Schroeder Computer Speech Recognition, Compression, Synthesis With Introductions to Hearing and Signal Analysis and a Glossary of Speech and Computer Terms With 78 Figures Springer Contents 1. Introduction 1.1 Speech: Natural and Artificial 1.2 Voice Coders 1.3 Voiceprints for Combat and for Fighting Crime . 1.4 The Electronic Secretary 1.5 The Human Voice as a Key 1.6 Clipped Speech 1.7 Frequency Division 1.8 The First Circle of Hell: Speech in the Soviet Union 1.9 Linking Fast Trains to the Telephone Network 1.10 Digital Decapitation 1.11 Man into Woman and Back 1.12 Reading Aids for the Blind 1.13 High-Speed Recorded Books 1.14 Spectral Compression for the Hard-of-Hearing 1.15 Restoration of Helium Speech 1.16 Noise Suppression 1.17 Slow Speed for Better Comprehension 1.18 Multiband Hearing Aids and Binaural Speech Processors . . . . 1.19 Improving Public Address Systems 1.20 Raising Intelligibility in Reverberant Spaces 1.21 Conclusion 1 1 2 4 7 8 9 11 12 13 14 16 16 16 17 17 17 19 19 20 20 21 2. A Brief History of Speech 2.1 Animal Talk 2.2 Wolfgang Ritter von Kempelen 2.3 From Kratzenstein to Helmholtz 2.4 Helmholtz and Rayleigh 2.5 The Bells: Alexander Melville and Alexander Graham Bell 2.6 Modern Times 2.7 The Vocal Tract 2.8 Articulatory Dynamics 2.9 The Vocoder and Some of Its Progeny 23 23 24 26 27 ."T 28 29 30 31 34 XXVIII Contents 2.10 2.11 2.12 2.13 2.14 2.15 2.16 2.17 2.18 3. 4. Formant Vocoders Correlation Vocoders The Voice-Excited Vocoder Center Clipping for Spectrum Flattening Linear Prediction Subjective Error Criteria Neural Networks Wavelets Conclusion Speech Recognition and Speaker Identification 3.1 Speech Recognition 3.2 Dialogue Systems 3.3 Speaker Identification 3.4 Word Spotting 3.5 Pinpointing Disasters by Speaker Identification 3.6 Speaker Identification for Forensic Purposes 3.7 Dynamic Programming 3.8 Markov Models 3.9 Shannon's Outguessing Machine - A Hidden Markov Model Analyzer 3.10 Hidden Markov Models in Speech Recognition 3.11 Neural Networks 3.11.1 The Perceptron 3.11.2 Multilayer Networks 3.11.3 Backward Error Propagation 3.11.4 Kohonen Self-Organizing Maps 3.11.5 Hopfield Nets and Associative Memory 3.12 Whole Word Recognition 3.13 Robust Speech Recognition 3.14 The Modulation Transfer Function Speech Compression 4.1 Vocoders 4.2 Digital Simulation 4.3 Linear Prediction 4.3.1 Linear Prediction and Resonances 4.3.2 The Innovation Sequence 4.3.3 Single Pulse Excitation 4.3.4 Multipulse Excitation ^ 4.3.5 Adaptive Predictive Coding 4.3.6 Masking of Quantizing Noise 4.3.7 Instantaneous Quantizing Versus Block Coding 4.3.8 Delays 35 36 36 37 38 38 39 39 40 41 42 44 45 46 47 48 49 49 50 51 52 53 54 54 55 55 56 57 57 63 64 65 66 67 71 72 74 74 75 76 78 Contents XXIX 4.3.9 Code Excited Linear Prediction (CELP) 4.3.10 Algebraic Codes 4.3.11 Efficient Coding of Parameters 4.4 Waveform Coding 4.5 Transform Coding 4.6 Audio Compression 79 79 79 80 81 82 5. Speech Synthesis 5.1 Model-Based Speech Synthesis 5.2 Synthesis by Concatenation 5.3 Prosody 85 87 88 89 6. Speech Production 6.1 Sources and Filters 6.2 The Vocal Source 6.3 The Vocal Tract 6.3.1 Radiation from the Lips 6.4 The Acoustic Tube Model of the Vocal Tract 6.5 Discrete Time Description 91 92 92 95 96 98 102 7. The 7.1 7.2 7.3 7.4 105 106 106 106 107 8. Hearing 8.1 Historical Antecedents 8.2 Thomas Seebeck and Georg Simon Ohm 8.3 More on Monaural Phase Sensitivity 8.4 Hermann von Helmholtz and Georg von Bekesy 8.4.1 Thresholds of Hearing 8.4.2 Pulsation Threshold and Continuity Effect 8.5 Anatomy and Basic Capabilities of the Ear 8.6 The Pinnae and the Outer Ear Canal 8.7 The Middle Ear 8.8 The Inner Ear 8.9 Mechanical to Neural Transduction 8.10 Some Astounding Monaural Phase Effects 8.11 Masking 8.12 Loudness 8.13 Scaling in Psychology 8.14 Pitch Perception and Uncertainty . . . .\ Speech Signal Spectral Envelope and Fine Structure Unvoiced Sounds The Voiced-Unvoiced Classification The Formant Frequencies 109 Ill 113 113 114 114 115 116 116 116 118 125 127 130 130 131 133 XXX 9. Contents Binaural Hearing — Listening with Both Ears 9.1 Directional Hearing 9.2 Precedence and Haas Effects 9.3 Vertical Localization 9.4 Virtual Sound Sources and Quasi-Stereophony 9.5 Binaural Release from Masking 9.6 Binaural Beats and Pitch 9.7 Direction and Pitch Confused 9.8 Pseudo-Stereophony 9.9 Virtual Sound Images 9.10 Philharmonic Hall, New York 9.11 The Proper Reproduction of Spatial Sound Fields 9.12 The Importance of Lateral Sound 9.13 How to Increase Lateral Sounds in Real Halls 9.14 Summary 135 135 137 139 141 144 145 146 150 152 153 154 156 158 161 10. Basic Signal Concepts 10.1 The Sampling Theorem and Some Notational Conventions . . . 10.2 Fourier Transforms 10.3 The Autocorrelation Function 10.4 The Convolution Integral and the Delta Function 10.5 The Cross-Correlation Function and the Cross-Spectrum . . . . 10.5.1 A Bit of Number Theory 10.6 The Hilbert Transform and the Analytic Signal 10.7 Hilbert Envelope and Instantaneous Frequency 10.8 Causality and the Kramers-Kronig Relations 10.8.1 Anticausal Functions 10.8.2 Minimum-Phase Systems and Complex Frequencies . . . 10.8.3 Allpass Systems 10.8.4 Dereverberation 10.9 Matched Filtering 10.10 Phase and Group Delay 10.11 Heisenberg Uncertainty and The Fourier Transform 10.11.1 Prolate Spheroidal Wave Functions and Uncertainty. . 10.12 Time and Frequency Windows 10.13 The Wigner-Ville Distribution 10.14 The Cepstrum: Measurement of Fundamental Frequency . . . . 10.15 Line Spectral Frequencies 163 163 164 167 169 171 173 174 176 180 181 182 183 184 185 186 188 190 194 195 197 200 A. Acoustic Theory and Modeling of the Vocal Tract A.I Introduction A.2 Acoustics of a Hard-Walled, Lossless Tube A.2.1 Field Equations A.2.2 Time-Invariant Case A.2.3 Forrnants as Eigenvalues 203 203 204 204 208 209 Contents XXXI A.2.4 Losses and Nonrigid Walls A.3 Discrete Modeling of a Tube A.3.1 Time-Domain Modeling A.3.2 Frequency-Domain Modeling, Two-Port Theory A.3.3 Tube Models and Linear Prediction A.4 Notes on the Inverse Problem A.4.1 Analytic and Numerical Methods A.4.2 Empirical Methods 211 212 213 216 219 220 221 223 B. Direct Relations Between Cepstrum and Predictor Coefficients 225 B.I Derivation of the Main Result 225 B.2 Direct Computation of Predictor Coefficients from the Cepstrum 227 B.3 A Simple Check 228 B.4 Connection with Algebraic Roots and Symmetric Functions . . 228 B.5 Connection with Statistical Moments and Cumulants 230 B.6 Computational Complexity 230 B.7 An Application of Root-Power Sums to Pitch Detection 231 References 234 General Reading 249 Selected Journals 259 A Sampling of Societies and Major Meetings 260 Glossary of Speech and Computer Terms 261 Name Index 285 Subject Index 293 The Author 315
© Copyright 2026 Paperzz