Benha University
Faculty of Engineering (at Shoubra)
Electrical Engineering Department
Master (Computer Systems Engineering)
Subject: Voice Recognition - CES 607
Fall Semester Final Exam
Date: Sat 14/01/2017
Duration: 3 hours
№ of Questions: 3 in 1 page
Total Marks: 60

Attempt all the following questions:

Question 1: Determine whether the following sentences are True or False: (10 Marks)
1. Without exception, vowels are spoken with more power than other phonemes in normal speech. (True)
2. Consonants convey a greater share of vocal intelligibility than vowels. (True)
3. AMDF may be less immune to confusion by noise than the frame power measure. (True)
4. SB-ADPCM has much higher quality than ADPCM. (True)
5. The fewer bits of quantization, the less audio information is captured. (True)
6. Linear predictive coding (LPC) parameters are never quantized directly. (True)
7. Discrete speech recognition describes a speech recognition system that can recognize continuous sentences of speech. In theory this would not require a user to pause when speaking, and would include dictation and transcription systems. (False)
8. Speech classification is the process of creating artificial speech, whether by mechanical, electrical or other means. (False)
9. Zero-crossing rate (ZCR) works well for a noise-free waveform like a sine wave. (True)
10. On a train, speech level is greater than noise level. (False)

Question 2: Complete the following sentences: (20 Marks)
1. Place of articulation is the location at which two speech organs come together in producing a speech sound.
2. Manner of articulation is the degree of obstruction or the type of channel imposed upon the passage of air at a given place of articulation.
3. The pitch of the voice is defined as the rate of vibration of the vocal folds.
4. A phoneme is the smallest structural unit of speech. There may be several of these comprising a single word.
5. Single or clustered phonemes form units of sound organization called syllables, which generally allow a natural rhythm in speaking.
6. Speech quality is normally measured subjectively, in terms of mean opinion score (MOS).
7. Audio is stored as a vector in MATLAB.
8. The most common window function is the Hamming function.
9. There are many ways to visualize signals, such as a plot of the waveform or the frequency spectrum.
10. Overlapping means that instead of straightforward segmentation of the audio vector into sequential frames, each new frame is made to contain a part of the previous frame and a part of the next.
11. Audio analysis refers to the extraction of information and meaning from audio signals.
12. Frame power is a measure of the signal energy over an analysis frame, and is calculated as the sum of the squared magnitude of the sample values in that frame.
13. The average magnitude difference function (AMDF) is designed to provide much of the information of the frame power measure, but without multiplications.
14. A parametric coder is one in which knowledge of speech features and characteristics is used to parameterize the speech signal.
15. The quantization process reduces the amount of information stored.
16. Audio compression reduces the number of bits required to store the audio, without compromising quality too much.
17. ADPCM tends to match the loudest sounds, leaving quieter sounds to be lost in the quantization noise (i.e. the quiet sounds are lost).
18. SB-ADPCM is able to simultaneously code one very loud and one very quiet sound in different frequency ranges.
19. Parameterization is choosing several values to represent important aspects of the speech signal. These values are then transmitted from coder to decoder, where they are used to recreate a similar (but not identical) waveform.
20. Automatic speech recognition (ASR) describes a system that can recognize speech without additional user input.

Question 3: (17 Marks)

a) One of the basic audio handling operations is normalization. What is the difference between absolute scaling and relative scaling? (3 Marks)
Solution:
Scaling can be done using one of the following methods:
• Absolute scaling considers the format that the audio was captured in, and scales relative to that (so we would divide each element in the input vector by the biggest value in that representation: 32768 for 16-bit signed linear).
• Relative scaling scales relative to the largest value in the sample vector. This is the method we used when playing back audio earlier.

b) What is the difference between quality and intelligibility when focusing on speech understanding? Support your answer with examples. (3 Marks)
Solution:
The two terms are sometimes used interchangeably, but their measurement and dependencies are actually very different.
• Quality is a measure of the fidelity of speech. This includes how well the speech under examination resembles some original speech, but extends beyond that to how nice the speech sounds.
• Intelligibility is a measure of how understandable the speech is. In other words, it concentrates on the information-carrying content of speech.
• Examples:
- A string of nonsense syllables, similar to baby speech, spoken by someone with a good speaking voice can sound very pleasant and be of extremely high quality, yet it contains no verbal information and so has no intelligibility at all.
- A recording of speech with a high-frequency buzzing sound in the background will be rated as having low quality even though the words themselves may be perfectly understandable. In this case the intelligibility is high.

c) State and explain the three basic problems for HMM. (3 Marks)
Solution:
1. Evaluation: given the observation sequence O = O1 O2 O3 … OT and a model λ = (A, B, π), how do we efficiently compute P(O | λ), i.e. the probability of the observation sequence given the model?
2. Recognition: given the observation sequence O = O1 O2 O3 … OT and a model λ = (A, B, π), how do we choose a corresponding state sequence Q = q1 q2 q3 … qT that is optimal, i.e. best explains the observations?
3. Training: given the observation sequence O = O1 O2 O3 … OT, how do we adjust the model parameters λ = (A, B, π) to maximize P(O | λ)?

d) State and explain three basic speech recognition difficulties. (4 Marks)
Solution:
1. Segmentation: dividing speech into smaller units is often required in processing systems.
2. Word stress: it can be very important in determining the meaning of a sentence and, although it is not captured in the written word, it is widely used in vocal communication.
3. Context: although it cannot be relied upon in all cases, it can be used to strengthen recognition accuracy on occasion.

e) State the disadvantages of Pulse Code Modulation (PCM) and Delta Modulation.
(4 Marks)
Solution:
Disadvantage of PCM:
◦ If the maximum frequency in speech is 4 kHz, the sampling frequency is 2 × 4 kHz = 8 ksamples/s (Nyquist theorem).
◦ Since each sample is represented with 16 bits and there are 8 ksamples/s, the total bit rate is 16 × 8k = 128 kbps, which is too high.
Disadvantages of Delta Modulation:
1. The quantization error depends on the step size. To represent audio as accurately as possible, the step height should be small, which means more steps are needed to reach larger waveform peaks.
2. Slew rate, or slope overload, is the limit on the gradient. When rising up to the first peak, and dropping down after it, there is a large gap between the waveform we desire to quantize and the actual step values. This is because delta modulation can only change by a single step at a time, but the gradient of the waveform has exceeded this.

Question 4: (13 Marks)

a) The figure below shows the effect of limiting the speech frequency range on the intelligibility of speech syllables. Illustrate and analyze. (4 Marks)
Solution:
• An analysis of the figure reveals that if a speech signal were low-pass filtered at 1 kHz, around 25% of speech syllables would be recognizable.
• If it were high-pass filtered at 2 kHz, around 70% would be recognizable.
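
The two disadvantages named in the Question 3(e) solution can be checked numerically. The following is a minimal sketch in Python, not part of the original exam: the function names, the 200 Hz test tone, and the two step sizes are illustrative assumptions. It reproduces the 128 kbps PCM figure and shows a 1-bit delta modulator falling behind a sine wave when the step is too small (slope overload), while a larger step tracks the waveform at the cost of an error of up to one step.

```python
import math

def pcm_bit_rate(max_freq_hz, bits_per_sample):
    """Bit rate (bits/s) of uncompressed PCM sampled at the Nyquist rate."""
    sample_rate = 2 * max_freq_hz          # Nyquist: twice the highest frequency
    return sample_rate * bits_per_sample

def delta_modulate(signal, step):
    """1-bit delta modulation: the estimate moves exactly one step per sample."""
    estimate = 0.0
    bits, staircase = [], []
    for x in signal:
        bit = 1 if x >= estimate else 0    # 1 = step up, 0 = step down
        estimate += step if bit else -step
        bits.append(bit)
        staircase.append(estimate)
    return bits, staircase

def max_tracking_error(signal, staircase):
    """Largest gap between the input and the staircase approximation."""
    return max(abs(x - y) for x, y in zip(signal, staircase))

# PCM example from the solution: 4 kHz speech bandwidth, 16-bit samples.
speech_rate = pcm_bit_rate(4000, 16)       # 128000 bits/s = 128 kbps

# Assumed test signal: 200 Hz sine, unit amplitude, sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(80)]

# Step too small for the waveform gradient: slope overload.
_, overloaded = delta_modulate(tone, step=0.02)
# Step larger than the steepest per-sample change: tracks, but coarsely.
_, tracking = delta_modulate(tone, step=0.2)

overload_error = max_tracking_error(tone, overloaded)
tracking_error = max_tracking_error(tone, tracking)
```

With these assumed values, the small step cannot climb fast enough to reach the first peak, so `overload_error` is much larger than `tracking_error`; the larger step removes the overload but its error can be as large as the step itself, which is the trade-off described in the solution.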

b) The figure below shows a measurement of the effects of contextual information on understanding, plotting the percentage of correctly identified digits, syllables or words spoken in the presence of a given degree of background noise. Complete the graph using (words in sentence, spoken digits, isolated words, nonsense syllables), then briefly explain. (6 Marks)
Solution:

c) The figure below demonstrates the parameterisation of the speech signal into components based loosely upon the human speech production system. Complete the graph. (3 Marks)
Solution:

Good Luck
Dr. Shady Yehia Elmashad
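
Supplementary note for Question 3(a): the difference between the two scaling methods can be sketched in a few lines. This is an illustrative Python sketch, not part of the original exam; the function names and the zero-peak handling are assumptions, and the full-scale value 32768 is the 16-bit signed figure quoted in the solution.

```python
def absolute_scale(samples, full_scale=32768):
    """Scale relative to the capture format (32768 for 16-bit signed linear)."""
    return [s / full_scale for s in samples]

def relative_scale(samples):
    """Scale relative to the largest magnitude present in the sample vector."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Assumed behaviour for an all-zero vector: return silence unchanged.
        return [0.0 for _ in samples]
    return [s / peak for s in samples]

# A quiet recording: absolute scaling keeps it quiet,
# relative scaling stretches it to the full range.
quiet = [1024, -2048, 512]
kept_quiet = absolute_scale(quiet)
stretched = relative_scale(quiet)
```

Absolute scaling preserves the loudness relationship between different recordings captured in the same format, whereas relative scaling always brings the largest sample to magnitude 1, which is convenient for playback.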