Benha University
Faculty of Engineering (at Shoubra)
Electrical Engineering Department
Master (Computer Systems Engineering)
Subject: Voice Recognition - CES 607
Fall Semester
Final Exam
Date: Sat 14/01/2017
Duration: 3 hours
№ of Questions: 3 in 1 page
Total Marks: 60
Attempt all the following questions:
Question 1: Determine whether the following sentences are True or False:
(10 Marks)
1. Without exception, vowels are spoken with more power than other phonemes in
normal speech. (True)
2. Consonants convey a greater share of vocal intelligibility than vowels. (True)
3. AMDF may be less immune to confusion by noise than the frame power measure.
(True)
4. SB-ADPCM has much higher quality than ADPCM. (True)
5. The fewer bits of quantization, the less audio information is captured. (True)
6. Linear predictive coding (LPC) parameters are never quantized directly. (True)
7. Discrete speech recognition describes a speech recognition system that can
recognize continuous sentences of speech. In theory this would not require a user to
pause when speaking, and would include dictation and transcription systems. (False)
8. Speech classification is the process of creating artificial speech, whether by
mechanical, electrical or other means. (False)
9. Zero-crossing rate (ZCR) works well for a noisy waveform like a sine wave. (True)
10. On a train, speech level is greater than noise level. (False)
Question 2: Complete the following sentences:
(20 Marks)
1. Place of articulation is the location at which two speech organs
come together in producing a speech sound.
2. Manner of articulation is the degree of obstruction or the type of
channel imposed upon the passage of air at a given place of articulation.
3. The pitch of the voice is defined as the rate of vibration of the vocal folds.
4. A phoneme is the smallest structural unit of speech. There may be several of these
comprising a single word.
5. Single or clustered phonemes form units of sound organization called syllables
which generally allow a natural rhythm in speaking.
6. Speech quality is normally measured subjectively, in terms of mean opinion score
(MOS).
7. Audio is stored as a vector in MATLAB.
8. The most common window function is the Hamming function.
9. There are many ways to visualize signals, such as a plot of the waveform or the
frequency spectrum.
10. Overlapping means that instead of straightforward segmentation of the audio
vector into sequential frames, each new frame is made to contain a part of the
previous frame and a part of the next.
11. Audio analysis refers to the extraction of information and meaning from audio
signals.
12. Frame power is a measure of the signal energy over an analysis frame, and is
calculated as the sum of the squared magnitude of the sample values in that frame.
13. The average magnitude difference function is designed to provide much of the
information of the frame power measure, but without multiplications.
14. A parametric coder is one in which knowledge of speech features and
characteristics is used for parameterization of the speech signal.
15. The quantization process reduces the amount of information stored.
16. Audio compression is reducing the number of bits required to store the audio, but
without compromising quality too much.
17. ADPCM tends to match the loudest sounds, leaving quieter sounds to be lost in the
quantization noise (i.e. the quiet sound is lost).
18. SB-ADPCM is able to simultaneously code one very loud and one very quiet
sound in different frequency ranges.
19. Parameterization is choosing several values to represent important aspects of the
speech signal. These values are then transmitted from coder to decoder, where they
are used to recreate a similar (but not identical) waveform.
20. Automatic speech recognition (ASR) describes a system that can recognize speech
without additional user input.
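Items 8, 10, 12 and 13 above can be tried out with a short sketch. The code below is an illustration in plain Python, not part of the exam answer. The function names, the 160-sample frame length and the 50% overlap are assumptions chosen for the example. The AMDF is assumed here to be the mean absolute sample value of a frame, i.e. a multiplication-free counterpart of frame power as item 13 describes; pitch-detection literature often uses the same name for a lag-based difference measure instead.

```python
import math

def split_frames(samples, frame_len, overlap):
    """Split a sample vector into frames that share `overlap` samples with the next frame."""
    if not 0 <= overlap < frame_len:
        raise ValueError("overlap must satisfy 0 <= overlap < frame_len")
    hop = frame_len - overlap
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def frame_power(frame):
    """Sum of the squared sample magnitudes in the frame (item 12)."""
    return sum(x * x for x in frame)

def amdf(frame):
    """Mean absolute sample value: no multiplications needed (assumed form of item 13)."""
    return sum(abs(x) for x in frame) / len(frame)

# Demo: a 200 Hz tone sampled at 8 kHz (illustrative values only).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
frames = split_frames(tone, frame_len=160, overlap=80)
window = hamming(160)
for frame in frames[:3]:
    windowed = [w * x for w, x in zip(window, frame)]
    print(round(frame_power(windowed), 3), round(amdf(windowed), 3))
```

With 50% overlap, the second half of each frame reappears as the first half of the next one, which is what item 10 describes.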
Question 3:
(17 Marks)
a) One of the basic audio handling operations is normalization. What is the
difference between absolute scaling and relative scaling?
(3 Marks)
Solution:
Scaling can be done using one of the following methods:
• Absolute scaling considers the format that the audio was captured in, and scales
relative to that (so we would divide each element in the input vector by the biggest
value in that representation: 32 768 for 16-bit signed linear).
• Relative scaling scales relative to the largest value in the sample vector. This is
the method we used when playing back audio earlier.
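The difference between the two methods can be seen in a few lines of code. This is a minimal sketch in plain Python, assuming 16-bit signed linear samples held in a list; the function names are made up for the illustration.

```python
def absolute_scale(samples, bits=16):
    """Divide by the full-scale value of the capture format (32768 for 16-bit signed)."""
    full_scale = 2 ** (bits - 1)
    return [s / full_scale for s in samples]

def relative_scale(samples):
    """Divide by the largest magnitude actually present in the sample vector."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return [0.0 for _ in samples]  # silent input: nothing to scale
    return [s / peak for s in samples]

# A quiet recording that uses only a small part of the 16-bit range.
quiet = [100, -200, 50]
print(absolute_scale(quiet))  # stays quiet: values remain far below 1.0
print(relative_scale(quiet))  # peak is brought up to magnitude 1.0
```

Absolute scaling preserves the level differences between recordings, while relative scaling always fills the available range, whatever the original level was.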
b) What is the difference between quality and intelligibility when focusing on
speech understanding? Support your answer with examples.
(3 Marks)
Solution:
The two terms are sometimes used interchangeably, but their measurement and
dependencies are actually very different.
• Quality is a measure of the fidelity of speech. This includes how well the speech
under examination resembles some original speech, but extends beyond that to how
nice the speech sounds.
• Intelligibility is a measure of how understandable the speech is. In other words, it
concentrates on the information-carrying content of speech.
• Examples:
- A string of nonsense syllables, similar to baby speech, spoken by someone with a
good speaking voice can sound very pleasant and of extremely high quality, but it
contains no verbal information and so has no intelligibility at all.
- A recording of speech with a high-frequency buzzing sound in the background will
be rated as having low quality even though the words themselves may be perfectly
understandable. In this case the intelligibility is high.
c) State and explain the three basic problems for HMM.
(3 Marks)
Solution:
1. Evaluation: Given the observation sequence O = O1 O2 O3 … OT and a model
λ = (A, B, π), how do we efficiently compute P(O | λ), i.e., the probability of the
observation sequence given the model?
2. Recognition: Given the observation sequence O = O1 O2 O3 … OT and a model
λ = (A, B, π), how do we choose a corresponding state sequence Q = q1 q2 q3 … qT
which is optimal, i.e., best explains the observations?
3. Training: Given the observation sequence O = O1 O2 O3 … OT, how do we adjust
the model parameters λ = (A, B, π) to maximize P(O | λ)?
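The first two problems can be sketched in code for a discrete HMM. The example below is an illustration, not part of the model answer: it assumes the usual textbook solutions (the forward algorithm for evaluation and the Viterbi algorithm for recognition), and the two-state model values are invented for the demo. The training problem is usually handled with Baum-Welch re-estimation, which is not shown here.

```python
def forward_probability(A, B, pi, obs):
    """Evaluation problem: P(O | lambda) computed with the forward algorithm."""
    n = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

def viterbi(A, B, pi, obs):
    """Recognition problem: most likely state sequence and its joint probability."""
    n = len(pi)
    # delta[i] = probability of the best path that ends in state i
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev = [max(range(n), key=lambda i, j=j: delta[i] * A[i][j])
                for j in range(n)]
        delta = [delta[prev[j]] * A[prev[j]][j] * B[j][o] for j in range(n)]
        backpointers.append(prev)
    state = max(range(n), key=lambda i: delta[i])
    best = delta[state]
    path = [state]
    for prev in reversed(backpointers):  # trace the best path backwards
        state = prev[state]
        path.append(state)
    path.reverse()
    return path, best

# Hypothetical two-state, two-symbol model lambda = (A, B, pi).
A = [[0.7, 0.3], [0.4, 0.6]]   # state transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities per state
pi = [0.6, 0.4]                # initial state distribution
obs = [0, 1, 1]
print(forward_probability(A, B, pi, obs))
print(viterbi(A, B, pi, obs))
```

The forward pass sums over all state sequences, whereas Viterbi keeps only the best one, so the Viterbi path probability can never exceed P(O | λ).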
d) State and explain three basic speech recognition difficulties.
(4 Marks)
Solution:
1. Segmentation of speech into smaller units is often required in processing
systems.
2. Word stress can be very important in determining the meaning of a sentence,
and although it is not captured in the written word, is widely used during vocal
communications.
3. Context: although it cannot be relied upon in all cases, it can be used to
strengthen recognition accuracy on occasion.
e) State the disadvantages of Pulse Code Modulation (PCM) and Delta
Modulation.
(4 Marks)
Solution:
• Disadvantage of PCM:
◦ If the maximum frequency in speech is 4 kHz, the sampling frequency is
2 × 4 kHz = 8 ksamples/s (Nyquist theorem).
◦ Since each sample is represented in 16 bits and we have 8 ksamples/s, the
total bit rate is 16 × 8k = 128 kbps, which is too much.
• Disadvantage of Delta Modulation:
1. The quantization error depends on the step size.
◦ In order to represent audio as accurately as possible, the step height
should be small.
◦ i.e. more steps are then needed to reach up to larger waveform peaks.
2. Slew rate or slope overload is the limit on the gradient.
◦ When rising up to the first peak, and dropping down after it, there is
a large gap between the waveform we desire to quantize and the
actual step values.
◦ This is because delta modulation can only change by a single step at a
time, but the gradient of the waveform has exceeded this.
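Both disadvantages can be reproduced numerically. The sketch below is illustrative only: it assumes a basic 1-bit delta modulator with a fixed step size and a starting estimate of zero, and the function names are invented for the example.

```python
def pcm_bit_rate(max_freq_hz, bits_per_sample):
    """Nyquist-rate sampling (2 x maximum frequency) times bits per sample, in bit/s."""
    return 2 * max_freq_hz * bits_per_sample

def delta_modulate(samples, step):
    """1-bit delta modulation: the estimate moves up or down by one fixed step per sample."""
    estimate = 0.0
    bits, track = [], []
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
        track.append(estimate)
    return bits, track

def max_error(samples, track):
    """Largest gap between the input waveform and the staircase estimate."""
    return max(abs(x - y) for x, y in zip(samples, track))

print(pcm_bit_rate(4000, 16))  # bit rate for 4 kHz speech at 16 bits per sample

# A steep ramp rising by 1.0 per sample.
ramp = [float(k) for k in range(1, 11)]
_, small_step = delta_modulate(ramp, step=0.1)   # step too small: slope overload
_, large_step = delta_modulate(ramp, step=1.0)   # step matches the gradient
print(max_error(ramp, small_step), max_error(ramp, large_step))

# A flat signal: the large step now causes an error as big as the step itself.
flat = [0.0] * 4
_, flat_track = delta_modulate(flat, step=1.0)
print(max_error(flat, flat_track))
```

The small step cannot keep up with the ramp, while the large step follows the ramp but oscillates around the flat signal, which is the trade-off described in points 1 and 2.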
Question 4:
(13 Marks)
a) The figure below shows the effect of limiting the speech frequency
range on the intelligibility of speech syllables. Illustrate and analyze.
(4 Marks)
Solution:
• An analysis of the figure reveals that if a speech signal were low-pass filtered
at 1 kHz, around 25% of speech syllables would be recognizable.
• If it were high-pass filtered at 2 kHz, around 70% would be recognizable.
b) The figure below shows a measurement of the effects of contextual
information on understanding, obtained by plotting the percentage of correctly
identified digits, syllables or words spoken in the presence of the given
degree of background noise. Complete the graph using (words in sentence
– spoken digits – isolated words – nonsense syllables), then briefly explain.
(6 Marks)
Solution:
c) The figure below demonstrates the parameterisation of the speech
signal into components based loosely upon the human speech production
system. Complete the graph.
(3 Marks)
Solution:
Good Luck
Dr. Shady Yehia Elmashad