INTRODUCTION
Automatic recognition of speech by machine has been a goal of research
for more than four decades and has inspired many science fiction wonders. However, in
spite of the glamour of designing an intelligent machine that can recognize the spoken
word and comprehend its meaning, and in spite of the enormous research effort spent in
trying to create such a machine, we have yet to achieve the desired goal: a machine that
can understand spoken discourse on any subject, by all speakers, in all environments.
One of the most difficult aspects of performing research in speech
recognition by machine is its interdisciplinary nature, and the tendency of most
researchers to apply a monolithic approach to individual problems.
Following are some disciplines that have been applied to one or more speech recognition
problems:
Signal processing:
The process of extracting relevant information from the speech signal in an efficient and
robust manner.
Physics (Acoustics):
The science of understanding the relationship between the physical speech signal and the
physiological mechanisms that produce and perceive speech.
Pattern recognition:
The set of algorithms used to cluster data into one or more prototypical patterns of a data
ensemble.
Communication & information theory:
The procedures for estimating parameters of statistical models.
Linguistics :
The relationship between sounds (phonology), words in a language (syntax), meaning of
spoken words (semantics), and sense derived from meanings (pragmatics).
Physiology:
Understanding of the higher-order mechanisms within the human central nervous system
that account for speech production and perception in human beings.
Computer science:
Study of efficient algorithms for implementing the various methods used in a practical
speech recognition system.
SPEECH RECOGNITION
Psychology:
The science of understanding the factors that enable a technology to be used by human
beings in practical tasks.
The process of speech production and perception:
This process begins when the talker formulates a message (in his mind) that he
wants to transmit. The machine counterpart of this step is the creation of the printed
text expressing the words of the message. Next, the message is converted into a
language code; in the machine context, this corresponds to converting the printed text
into a sequence of phonemes. Once the language code is selected, the talker executes a
sequence of neuromuscular commands that cause the vocal cords to vibrate
appropriately, thereby producing an acoustic signal as the final output. Once this signal
is generated and propagated to the listener, the speech perception process begins.
First the signal is processed along the basilar membrane in the inner ear. A neural
transduction process converts the spectral signal into activity signals on the auditory
nerve; in the machine counterpart, this corresponds to the feature extraction process.
The neural activity along the auditory nerve is converted into a language code, and
finally message comprehension is achieved on the listener's side.
Successful speech recognition systems require knowledge and expertise from
a wide range of disciplines. Hence, it is important for a researcher to have a good
understanding of the fundamentals of speech recognition.
EXPERIMENT NO. 1
AIM:
A) Write a program in Matlab/C/C++ to generate the following basic functions:
i) Unit impulse
ii) Unit step
iii) Ramp sequence
iv) Exponential sequence
v) Sine sequence
vi) Cosine sequence
B) Read the .wav file, plot the graph, and play the file.
THEORY:
a. Unit impulse
The unit impulse "function" (or Dirac delta function) is a signal that has infinite
height and infinitesimal width. Considering time on the x-axis and amplitude on the
y-axis, a unit impulse is generated. This function is infinitesimally narrow and
infinitely tall, yet integrates to unity. Its discrete-time counterpart, the unit sample
δ(n), equals 1 at n = 0 and 0 elsewhere.
b. Unit step
The unit step is another basic function, defined as u(n) = 1 for n ≥ 0 and 0 otherwise.
Considering time on the x-axis and amplitude on the y-axis, the unit step function is
implemented. The output is shown in discrete and continuous form.
c. Unit ramp
The unit ramp function is closely related to the unit step function. Where the
unit step goes from zero to one instantaneously, the ramp function better resembles a
real-world signal, where some time is needed for the signal to increase from zero to
its set value. It can be defined as r(n) = n for n ≥ 0, and r(n) = 0 otherwise.
d. Exponential
The complex exponential is one of the most fundamental and important signals in
signal and system analysis. Its importance comes from its role as a basis for periodic
signals, as well as its ability to characterize linear, time-invariant systems. A complex
exponential can be broken up into its real part and its imaginary part. The exponential
function can be derived from its Taylor series, i.e.

e^x = Σ (k = 0 to ∞) x^k / k!
e. Sine
The sine function is a geometric waveform obtained by simple harmonic variation,
and is defined by the function y = sin x. In other words, it is an S-shaped, smooth wave
that oscillates above and below zero. A sine wave is generated by taking time on the
x-axis and amplitude on the y-axis. It can be defined as y(t) = A sin(2*pi*f*t).
f. Cosine
The cosine wave is identical to the sine wave, except that its phase is shifted by 90°
(pi/2 radians). A cosine wave is generated by taking time on the x-axis and amplitude
on the y-axis. It can be defined as y(t) = A cos(2*pi*f*t).
DESCRIPTION:
1. SUBPLOT
SUBPLOT creates axes in tiled positions. H = SUBPLOT(m,n,p), or
SUBPLOT(mnp), breaks the figure window into an m-by-n matrix of small
axes, selects the p-th axes for the current plot, and returns the axes handle. The axes
are counted along the top row of the figure window, then the second row, etc.
2. STEM(Y)
STEM(Y) plots the data sequence Y as stems from the x-axis terminated with circles
for the data value.
3. STEM(X,Y)
STEM(X,Y) plots the data sequence Y at the values specified in X.
4. ZEROS
ZEROS(N) is an N-by-N matrix of zeros.
ZEROS(M,N) or ZEROS([M,N]) is an M-by-N matrix of zeros.
ZEROS(M,N,P,…) or ZEROS([M,N,P,…]) is an M-by-N-by-P-by-… array of zeros.
ZEROS(SIZE(A)) is the same size as A and all zeros.
5. ONES
ONES(N) is an N-by-N matrix of ones.
ONES(M,N) or ONES([M,N]) is an M-by-N matrix of ones.
ONES(M,N,P,…) or ONES([M,N,P,…]) is an M-by-N-by-P-by-… array of ones.
ONES(SIZE(A)) is the same size as A and all ones.
6. FIGURE()
FIGURE creates a figure window.
FIGURE, by itself, creates a new figure window and returns its handle.
FIGURE(H) makes H the current figure, forces it to become visible, and raises it above
all other figures on the screen. If figure H does not exist and H is an integer, a new
figure is created with handle H.
CODE A:
clc;
clear all;
close all;
n=6; % Variable %
% to generate unit impulse signal %
x=-2:1:2;
y=[zeros(1,2),ones(1),zeros(1,2)];
subplot(3,2,1);
stem(x,y);
ylabel('Amplitude---------->');
xlabel('n--------->');
title('Unit Impulse Signal');
%Generate unit step signal%
t=-n:1:n-1;
y1=[zeros(1,n),ones(1,n)];
subplot(3,2,2);
stem(t,y1);
ylabel('Amplitude------>');
xlabel('n------>');
title('Unit Step Signal');
% to generate the ramp sequence%
t=0:1:n;
subplot(3,2,3);
stem(t,t);
ylabel('Amplitude------>');
xlabel('n------->');
title('Ramp sequence');
%to generate exponential sequence%
t=0:1:n;
y=exp(-1*t);
subplot(3,2,4);
stem(t,y);
ylabel('Amplitude------->');
xlabel('n------->');
title('Exponential sequence');
%generate sine sequence%
t1=0:0.01:pi;
y2=sin(2*pi*t1);
subplot(3,2,5);
stem(t1,y2);
ylabel('Amplitude------->');
xlabel('n------->');
title('sine sequence');
%generate cosine sequence%
t2=0:0.01:pi;
y3=cos(2*pi*t2);
subplot(3,2,6);
stem(t2,y3);
ylabel('Amplitude------->');
xlabel('n------->');
title('cosine sequence');
CODE B :
clc;
clear all;
close all;
speechSignal = wavread('above');
plot(speechSignal);
title('Speech: Above');
soundsc (speechSignal);
OUTPUT A :
OUTPUT B :
EXPERIMENT NO. 2
AIM:
A) 1-Dimensional Linear Convolution
Write a program in Matlab to determine the linear convolution of finite-duration
sequences x(n) and h(n). Accept the sequences x(n) and h(n) from the user. Display the
output sequence y(n). Plot the three sequences. Test on input: x(n) = [1 1 1 1 1];
h(n) = [1 2 3 4 5 6 7 8]. {1-Dimensional}
B) 2-Dimensional Linear Convolution
Write a program to determine the 2-D Linear Convolution of a finite duration sequence
x(n1,n2) and h(n1,n2). Accept the sequences x(n1,n2) and h(n1,n2) from the user.
Display the output sequences y(n1,n2). Plot all three sequences. Test on input
x(n1,n2)=[1 2 ; 3 4 ]; h(n1,n2) = [5 6 ; 7 8 ];
C) Circular Convolution:
Write a program in Matlab to determine the circular convolution of the sequences x(n) and
h(n). Accept the sequences x(n) and h(n) from the user and display the output sequence
y(n). Test on input: x(n) = [1 2 4]; h(n) = [1 2].
D) 2-D Cross and Auto Correlation:
Write a program for determining the 2-D cross-correlation and auto-correlation of
sequences x(n1,n2) and h(n1,n2). Accept the sequences from the user and display the
output sequence y(n1,n2). Test on input: x(n1,n2) = [ 1 2 ; 3 4 ]; h(n1,n2) = [5 6 ; 7 8 ];
THEORY:
Convolution is an integral concatenation of two signals. It has many
applications in numerous areas of signal processing. The most popular application is the
determination of the output signal of a linear time-invariant system by convolving the
input signal with the impulse response of the system. The linear convolution of two
continuous-time signals x(t) and h(t) is defined by

y(t) = ∫ x(τ) h(t − τ) dτ,   τ from −∞ to ∞

For discrete-time signals x(n) and h(n), the integration is replaced by a summation:

y(n) = Σ x(k) h(n − k),   k from −∞ to ∞
In the following program the inputs x(n) and h(n) are provided by the user. The conv
function is used to calculate the linear convolution. In case of a 2 dimensional linear
convolution, conv2 is used.
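As a numerical check of the summation above, here is a short Python/NumPy sketch (used as a stand-in for Matlab's conv) evaluating y(n) directly on the test input from the aim:

```python
import numpy as np

# Test input from the aim: x(n) = [1 1 1 1 1], h(n) = [1 2 3 4 5 6 7 8]
x = np.array([1, 1, 1, 1, 1])
h = np.array([1, 2, 3, 4, 5, 6, 7, 8])

# Direct evaluation of y(n) = sum_k x(k) h(n - k)
N = len(x) + len(h) - 1          # output length, as for Matlab's conv
y = np.zeros(N, dtype=int)
for n in range(N):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y[n] += x[k] * h[n - k]

# Must agree with the library convolution
assert np.array_equal(y, np.convolve(x, h))
print(y)  # [ 1  3  6 10 15 20 25 30 26 21 15  8]
```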
If both f and h are periodic with period N, then their linear convolution (as defined
above) will yield an infinite result (assuming the signals are non-zero). However, we can
usefully redefine the convolution of such signals as the circular convolution:

g(n) = Σ (m = 0 to N−1) f(m) h((n − m) mod N)

It is easy to see that g is also periodic, with period N. (Strictly, this only requires that h is
periodic, but we also require that f is periodic to ensure circular convolution commutes.)
In the program below, circular convolution is implemented by folding the tail of the
linear convolution back onto its first N samples, as there is no direct function in Matlab
for the same.
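The folding approach can also be sketched in Python/NumPy, mirroring the Matlab loop in the program below:

```python
import numpy as np

def circular_conv(x, h):
    """Circular convolution via linear convolution followed by folding,
    mirroring the Matlab loop in the program below."""
    N = max(len(x), len(h))
    y = np.convolve(x, h)          # linear convolution, length len(x)+len(h)-1
    r = np.zeros(N)
    for i in range(N):
        r[i] = y[i]
        if N + i < len(y):         # fold the tail back onto the head
            r[i] += y[N + i]
    return r

# Same test input as the circular-convolution output shown later
print(circular_conv([1, 2, 1, 2], [3, 2, 1, 4]))  # [16. 14. 16. 14.]
```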
DESCRIPTION:
CONV(A,B)
C= CONV (A, B) convolves vectors A and B. The resulting vector is length
LENGTH(A)+LENGTH(B)-1. If A and B are vectors of polynomial coefficients,
convolving them is equivalent to multiplying the two polynomials.
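The polynomial remark can be checked directly; a tiny Python/NumPy example (np.convolve behaves like Matlab's conv here):

```python
import numpy as np

# (x + 2)(x + 3) = x^2 + 5x + 6: convolving the coefficient vectors
# multiplies the two polynomials.
p = np.convolve([1, 2], [1, 3])
print(p)  # [1 5 6]
```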
CODE :
A) 1-Dimensional Linear Convolution
%x(n)--> Input Sequence% Given-->[1 1 1 1 1]
%h(n)--> Impulse Response% Given-->[1 2 3 4 5 6 7 8]
clc;
clear all;
close all;
x=[1 1 1 1 1];
h=[1 2 3 4 5 6 7 8];
y=conv(x,h);
figure(1);
subplot(3,1,1);
stem(x);
xlabel('n------->');
ylabel('Amplitude------->');
title('Input Sequence-----');
subplot(3,1,2);
stem(h);
xlabel('n------->');
ylabel('Amplitude->');
title('Impulse Response');
subplot(3,1,3);
stem(y);
xlabel('n------->');
ylabel('Amplitude------->');
title('Output Sequence');
B) 2-Dimensional Linear Convolution
clc
clear all
close all
h=[1 2 3 4 5 6 7 8];
x=[1 2; 3 4];
y=conv2(x,h);
disp(y);
figure(2),subplot(3,1,1);
plot(x);
title('input sequence-->');
subplot(3,1,2);
plot(h);
title('impulse response-->');
subplot(3,1,3);
plot(y);
title('output sequence--->');
C) Circular Convolution:
%x(n) input sequence%Given-->[1 2 4]
%h(n) impulse response%Given-->[1 2]
clc
clear all
close all
%x=[1 2 4]
x=input('Enter the x(n):=');
%h=[1 2];
h=input('Enter the h(n):=');
N1=length(x);
N2=length(h);
N=max(N1,N2);
y=conv(x,h);
ly=length(y);
for i=1:1:N
    if(N+i<=ly)
        r(i)=y(i)+y(N+i);
    else
        r(i)=y(i);
    end
end
disp(r);
D) 2-D Cross and Auto Correlation:
%Two-Dimensional Cross-Correlation And Auto-Correlation%
clc
clear all
close all
x=[ 1 2;3 4];
h=[5 6;7 8];
y=xcorr2(x,h);
%to compute cross correlation
disp('Cross correlation');
disp(y);
y1=xcorr2(x);
%to compute auto correlation
disp('Auto correlation');
disp(y1);
OUTPUT:
A) 1-Dimensional Linear Convolution
B) 2-Dimensional Linear Convolution
     1     4     7    10    13    16    19    22    16
     3    10    17    24    31    38    45    52    32
C) Circular Convolution:
Enter the x(n):=[1 2 1 2]
Enter the h(n):=[3 2 1 4]
    16    14    16    14
D) 2-D Cross and Auto Correlation:
Cross Correlation
8 23 14
30 70 38
18 39 20
Auto Correlation
4 11 6
14 30 14
6 11 4
EXPERIMENT NO. 3
AIM:
Write a program in Matlab/C/C++ to design a Low-Pass FIR Filter.
1. Read the sampling frequency.
2. Read the order of the filter.
3. Input the sine sequence.
4. Input the cut-off frequency.
5. Create an FIR low-pass filter with a Hamming window.
6. Plot the magnitude response of the filter.
7. Plot the frequency response.
8. Plot the phase response.
Design an N-point FIR low-pass filter with cut-off frequency 0.2*pi using a Hamming
window. Plot for N = 16, 32, 64, 128, 256.
Input the Fourier series:
x(n) = 4*sin(2*pi*f*t) + (4/3)*sin(2*pi*3*f*t) + (4/5)*sin(2*pi*5*f*t)
       + (4/7)*sin(2*pi*7*f*t)
and obtain the output when f = 0.8, 0.6, 0.4, 0.2 and display.
Plot the magnitude and phase response of the filter.
THEORY:
HAMMING WINDOW
Compute a Hamming window.
SYNTAX:
w = hamming(n)
w = hamming(n,'sflag')
DESCRIPTION:
w = hamming(n) returns an n-point symmetric Hamming window in the column
vector w. n should be a positive integer. The coefficients of a Hamming window are
computed from the following equation:

w(k+1) = 0.54 − 0.46*cos(2*pi*k/(n−1)),   k = 0, 1, …, n−1
w = hamming(n,'sflag') returns an n-point Hamming window using the window
sampling specified by 'sflag', which can be either 'periodic' or 'symmetric'
(the default). When 'periodic' is specified, hamming computes a length n+1
window and returns the first n points.
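The coefficient equation can be verified numerically; a Python/NumPy sketch (NumPy's np.hamming implements the same symmetric 0.54/0.46 formula):

```python
import numpy as np

def hamming_sym(n):
    # w(k) = 0.54 - 0.46*cos(2*pi*k/(n-1)), k = 0..n-1 (symmetric window)
    k = np.arange(n)
    return 0.54 - 0.46 * np.cos(2 * np.pi * k / (n - 1))

w = hamming_sym(17)
assert np.allclose(w, np.hamming(17))   # matches NumPy's built-in window
assert np.isclose(w[0], 0.08)           # endpoints are 0.54 - 0.46 = 0.08
assert np.isclose(w[8], 1.0)            # peak of 1 at the centre (odd length)
print(w[:3])
```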
CODE :
%To Design Lowpass FIR Filter Using Hamming Windows%
clc;
clear all;
close all;
sn = 0;
f=input('Enter The Sampling Frequency Where f = 0.8 or 0.6 or 0.4 or 0.2 := ');
n=input('Enter The Order Of Filter Where n = 16 or 32 or 64 or 128 or 256 := ');
for i=1:256
sn(i) = 4*sin(2*pi*f*i) + (4/3)*sin(2*pi*3*f*i) + ...
        (4/5)*sin(2*pi*5*f*i) + (4/7)*sin(2*pi*7*f*i);
end
subplot(3,2,1);
plot(sn);grid;
axis([0 150 -10 10]);
%Create a filter and use sn as input
wn = 0.2 * pi; %Cutoff frequency
win=hamming(n+1);
b = fir1(n,wn,win); %Creating FIR LP filter with hamming window
hs = filter(b,1,sn);
%Plot the output
subplot(3,2,2);
plot(hs);grid;
axis([0 150 -10 10]);
title('Output of the FIR LP filter');
%Plot the magnitude response of the filter
subplot(3,2,3);
[H,F] = freqz(b,1,256); % returns the frequency response vector H and
% the corresponding angular frequency vector F
plot(F,20 * log10(abs(H)));grid;
xlabel('Normalized F (rad/sample)');
ylabel('Magnitude (dB)');
title('Magnitude Response');
%Plot the phase response of the filter
subplot(3,2,4);
plot(F,unwrap(angle(H))*180/pi); grid;
xlabel('Normalized F (rad/sample)');
ylabel('Phase (degrees)');
title('Phase Response');
%Plot the filter coefficients
subplot(3,2,5);
plot(b);grid;
xlabel('Filter using hamming window');
fvtool(b);
OUTPUT:
Enter The Sampling Frequency Where f = 0.8 or 0.6 or 0.4 or 0.2 := 0.8
Enter The Order Of Filter Where n = 16 or 32 or 64 or 128 or 256 := 16
EXPERIMENT NO. 4
AIM:
Write a program in Matlab/C/C++ to design a High-Pass Filter
1. Read the sampling frequency.
2. Read the order of the filter.
3. Input the sine sequence.
4. Input the cut-off frequency.
5. Create an FIR high-pass filter with a Hamming window.
6. Plot the magnitude response of the filter.
7. Plot the frequency response.
8. Plot the phase response.
Design an N-point FIR high-pass filter with cut-off frequency 0.2*pi using a Hamming
window. Plot for N = 16, 32, 64, 128, 256.
Input the Fourier series:
x(n) = 4*sin(2*pi*f*t) + (4/3)*sin(2*pi*3*f*t) + (4/5)*sin(2*pi*5*f*t)
       + (4/7)*sin(2*pi*7*f*t)
and obtain the output when f = 0.8, 0.6, 0.4, 0.2 and display.
Plot the magnitude and phase response of the filter.
CODE:
%To Design Highpass FIR Filter Using Hamming Windows%
clc
clear all
close all
sn = 0;
f=input('Enter The Sampling Frequency Where f = 0.8 or 0.6 or 0.4 or 0.2 := ');
n=input('Enter The Order Of Filter Where n = 16 or 32 or 64 or 128 or 256 := ');
for i=1:256
sn(i) = 4*sin(2*pi*f*i) + (4/3)*sin(2*pi*3*f*i) + ...
        (4/5)*sin(2*pi*5*f*i) + (4/7)*sin(2*pi*7*f*i);
end
subplot(3,2,1);
plot(sn);grid;
axis([0 150 -10 10]);
%Create a filter and use sn as input
wn = 0.2 * pi; %Cutoff frequency
win=hamming(n+1);
b = fir1(n,wn,'high',win); %Creating FIR HP filter with hamming window
hs = filter(b,1,sn);
%Plot the output
subplot(3,2,2);
plot(hs);grid;
axis([0 150 -10 10]);
title('Output of the FIR HP filter');
%Plot the magnitude response of the filter
subplot(3,2,3);
[H,F] = freqz(b,1,256); % returns the frequency response vector H and
% the corresponding angular frequency vector F
plot(F,20 * log10(abs(H)));grid;
xlabel('Normalized F (rad/sample)');
ylabel('Magnitude (dB)');
title('Magnitude Response');
%Plot the phase response of the filter
subplot(3,2,4);
plot(F,unwrap(angle(H))*180/pi); grid;
xlabel('Normalized F (rad/sample)');
ylabel('Phase (degrees)');
title('Phase Response');
%Plot the filter
subplot(3,2,5);
plot(b);grid;
xlabel('Filter using hamming window');
fvtool(b);
OUTPUT:
Enter The Sampling Frequency Where f = 0.8 or 0.6 or 0.4 or 0.2 := 0.8
Enter The Order Of Filter Where n = 16 or 32 or 64 or 128 or 256 := 64
EXPERIMENT NO.5
AIM :
Write a program in Matlab/C/C++ to display the spectrogram, narrowband or
wideband.
1. Read the sampling frequency.
2. Read the input speech signal.
3. Use the spectrogram function to plot the spectrogram:
   specgram(y,nfft,Fs,window,noverlap), where y is the speech signal.
4. window is a Hamming window of length nfft.
5. noverlap is the number of overlapping samples, chosen to produce 50%
   overlap between segments.
6. nfft is the FFT length, the maximum of 256 or the next power of 2 greater
   than the length of each segment of x.
THEORY :
Demonstration of the spectrogram, narrowband or wideband. Notice that
the latter has better time-resolution (since it uses a short window in time) but worse
frequency resolution. In the case of the wideband spectrogram, the window is of about
the same duration as the pitch period. Notice the vertical striations due to this fact.
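The time/frequency trade-off can be seen from the STFT bin spacing alone; a minimal Python/NumPy sketch (a hand-rolled stand-in for specgram; the tone frequency and window lengths here are illustrative, not the experiment's values):

```python
import numpy as np

def stft_frames(x, win_len, hop):
    """Split x into Hamming-windowed frames and FFT each one."""
    w = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    return np.array([np.fft.rfft(x[s:s + win_len] * w) for s in starts])

fs = 8000
x = np.sin(2 * np.pi * 500 * np.arange(fs) / fs)   # 1 s of a 500 Hz tone

narrow = stft_frames(x, win_len=400, hop=200)  # long window: fine frequency grid
wide = stft_frames(x, win_len=40, hop=20)      # short window: fine time grid

# Frequency resolution ~ fs / win_len; time resolution ~ number of frames
print(fs / 400, fs / 40)               # 20.0 vs 200.0 Hz per bin
print(narrow.shape[0], wide.shape[0])  # few frames vs many frames
```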
CODE:
clc;
clear all;
close all;
% Long window in time - Narrowband
[y,Fs,NBITS]=wavread('above.wav');
t_window_narrowband = .05; % window duration in seconds
t_overlap_narrowband = .001;
window_narrowband = t_window_narrowband*Fs; % window samps
noverlap_narrowband = t_overlap_narrowband*Fs;
nfft_narrowband = 1024;
window = window_narrowband;
noverlap = noverlap_narrowband;
nfft = nfft_narrowband;
subplot(2,1,1);
specgram(y,nfft,Fs,window,noverlap);
xlabel('Time (sec)');
ylabel('Frequency (Hz)');
title('NarrowBand');
% Spectrogram, Wideband, short window in time
t_window_wideband = .005; % window duration in seconds
t_overlap_wideband = .001;
window_wideband = t_window_wideband*Fs; % window length in samples
noverlap_wideband = 1; %t_overlap_wideband*Fs;
nfft_wideband = 8192;
window = window_wideband;
noverlap = noverlap_wideband;
nfft = nfft_wideband;
subplot(2,1,2)
specgram(y,nfft,Fs,window,noverlap);
xlabel('Time (sec)');
ylabel('Frequency (Hz)');
title('WideBand');
OUTPUT:
EXPERIMENT NO. 6
AIM:
Write a program in Matlab/C/C++ to display the spectrogram and FFT of speech.
1. Representation of speech in the frequency domain is demonstrated.
2. The importance of the FFT as a computational tool also becomes clear.
3. Read the input speech signal.
4. Unwrap the phase of the signal.
5. Window the signal with a Hamming window and apply the Fast Fourier Transform.
THEORY:
The discrete-time, discrete-frequency version of the Fourier transform
(the DFT) converts an array of N sample amplitudes to an array of N complex
harmonic amplitudes. If the sampling rate is Fs, the N input samples are 1/Fs
seconds apart, and the output harmonic frequencies are Fs/N hertz apart. That is,
the N output amplitudes are evenly spaced at frequencies between 0 and
(N−1)*Fs/N hertz.
The Discrete Fourier Transform (DFT) is given by the equation

X(k) = Σ (n = 0 to N−1) x(n) e^(−j*2*pi*k*n/N),   k = 0, …, N−1

In effect, an inner product is computed of a given discrete-time signal x[n] and N
complex exponential signals representing sine and cosine terms at discrete
frequencies 2*pi*k/N. Each of these inner product computations produces a
single complex-valued ``weight'' that indicates the relative strength of a specific
sinusoidal frequency in the signal.
To compute the DFT in MATLAB, we use the function fft(x,n). This function
takes a waveform x and the number of samples n. When n is less than the length
of x, x is truncated; when n is longer than the length of x, x is padded
with zeros. The output is an array of complex amplitudes of length n. You can
obtain the magnitude of each spectral component with abs(), and its phase with
angle() (result in radians).
SIES NERUL, M.Sc. (IT) First Year 2007-2008
How is a spectrum obtained?
Given a discrete-time signal x[n], we can determine its frequency response
using the Discrete Fourier Transform (DFT):
• The DFT determines sinusoidal ``weights'' via the inner product of
sinusoids and the signal.
• The DFT can be interpreted as the sum of projections of x[n] onto a set of k
sampled complex sinusoids, or sinusoidal basis functions, at (normalized)
radian frequencies given by ωk = 2*pi*k/N.
• In this way, the DFT and its inverse provide a ``recipe'' for reconstructing a
given discrete-time signal in terms of sampled complex sinusoids.
• The DFT coefficients are complex values. To plot the magnitude response
of a signal's spectrum, we calculate the magnitude of each coefficient. For
example, if a given coefficient is a + jb, its magnitude is sqrt(a^2 + b^2).
The Matlab function abs performs this calculation.
• The phase response of a signal is given by the ``angles'' of its complex DFT
coefficients. For a coefficient a + jb, the phase angle is atan(b/a).
The Matlab function angle performs this calculation.
In this example, we generate a sinusoidal waveform signal, then calculate
and display the magnitude spectrum and the phase spectrum. Finally, we display
the spectrogram of the generated signal. The fft() function calculates the discrete
Fourier transform in the most efficient way possible, but transforms of length 2^m
(m an integer) are considerably faster to calculate. Use sizes of 512, 1024, etc. for
the fastest speed.
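As a quick check of the Fs/N bin spacing, here is a Python/NumPy version of this experiment's setup (1 kHz tone, Fs = 10 kHz, 1000-point FFT):

```python
import numpy as np

f, Fs, N = 1000, 10000, 1000        # tone, sampling rate, FFT length
n = np.arange(N)
x = np.sin(2 * np.pi * f * n / Fs)  # exactly 100 full cycles in 1000 samples

y = np.fft.fft(x, N)
m = np.abs(y[:N // 2])              # magnitude of the first half

# Bins are Fs/N = 10 Hz apart, so the 1 kHz tone lands in bin 100
peak = int(np.argmax(m))
print(peak, peak * Fs / N)  # 100 1000.0
```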
CODE:
clear all;
close all;
f=1000; % Fundamental frequency f = 1 KHz
Fs=10000; % Sampling Frequency Fs = 10 KHz
n=1:20000; % no. of samples = 2 * Fs
Ts=1/Fs; % Sampling interval = 1/Sampling frequency
x=sin(2*pi*f*n*Ts);
subplot(2,2,1), plot(x);
ylabel('Amplitude'), grid on;
xlabel('Time [Seconds]');
% perform 1000-point transform
y = fft(x,1000); % y contains 1000 complex amplitudes
y = y(1:500); % just look at first half
m = abs(y); % m = magnitude of sinusoids
p = unwrap(angle(y)); % p = phase of sinusoids, unwrap()
% copes with 360 degree jumps
% plot spectrum 0..fs/2
f = (0:499)*Fs/1000; % calculate Hertz values
subplot(2,2,2), plot(f,m); % plot magnitudes
ylabel('Abs. Magnitude'), grid on;
subplot(2,2,3), plot(f,p*180/pi); % plot phase in degrees
ylabel('Phase [Degrees]'), grid on;
xlabel('Frequency [Hertz]');
%Spectrogram plot
subplot(2,2,4);
specgram (x,[],Fs);
colormap(gray);
OUTPUT :
EXPERIMENT NO.7
AIM:
A) Short-Time Speech Measurements, Short-Time energy Calculation
B) Short-Time Speech Measurements, Average Magnitude Calculation
THEORY:
Short-time energy is a simple short-time speech measurement. It is
defined as

E(n) = Σm [x(m) w(n − m)]^2

where x(m) denotes the input speech samples and w(n) is a window function.
This measurement can, in a way, distinguish between voiced and unvoiced speech
segments, since unvoiced speech has significantly smaller short-time energy. A
practical choice for the window length is 10-20 ms, that is, 160-320 samples at a
sampling frequency of 16 kHz.
The average magnitude function, defined as

M(n) = Σm |x(m)| w(n − m)

does not emphasize large signal levels as much as short-time energy, since its
calculation does not include squaring. The signals in the graphs are normalized.
CODE A:
clc;
clear all;
close all;
[y,Fs,nbits]=wavread('above.wav');
% A hamming window is chosen
winLen = 301;
winOverlap = 300;
wHamm = hamming(winLen);
% Framing and windowing the signal without for loops.
sigFramed = buffer(y, winLen, winOverlap, 'nodelay');
sigWindowed = diag(sparse(wHamm)) * sigFramed;
% Short-Time Energy calculation
energyST = sum(sigWindowed.^2,1);
% Time in seconds, for the graphs
t = [0:length(y)-1]/Fs;
subplot(1,1,1);
plot(t,y);
title('speech: Above');
xlims = get(gca,'Xlim');
hold on;
% Short-Time energy is delayed due to lowpass filtering. This delay is
% compensated for the graph.
delay = (winLen - 1)/2;
plot(t(delay+1:end - delay), energyST, 'r');
xlim(xlims);
xlabel('Time (sec)');
legend({'Speech','Short-Time Energy'});
hold off;
CODE B:
clear all;
close all;
clc;
[speechSignal,Fs,nbits]=wavread('above.wav');
winLen = 301;
winOverlap = 300;
wHamm = hamming(winLen);
% Framing and windowing the signal without for loops.
sigFramed = buffer(abs(speechSignal), winLen, winOverlap, 'nodelay');
sigWindowed = diag(sparse(wHamm)) * sigFramed;
% Average Magnitude Calculation
magnitudeAv = sum(sigWindowed,1);
% Time in seconds, for the graphs
t = [0:length(speechSignal)-1]/Fs;
subplot(1,1,1);
plot(t, speechSignal/max(abs(speechSignal)));
title('speech: Above');
hold on;
delay = (winLen - 1)/2;
plot(t(delay+1:end-delay), magnitudeAv/max(magnitudeAv), 'r');
legend('Speech','Average Magnitude');
xlabel('Time (sec)');
hold off;
OUTPUT A:
OUTPUT B:
EXPERIMENT NO. 8
AIM:
A) Short time speech measurement, short time auto-correlation, varying window
length
B) Short time speech measurement, short time auto-correlation, voiced speech
THEORY:
Short-time autocorrelation is defined as

Rn(k) = Σm x(m) w(n − m) x(m + k) w(n − m − k)

and is actually the autocorrelation of a windowed speech segment. The window used is
rectangular. Notice the attenuation of the autocorrelation as the window length
becomes shorter. This is expected, since the number of samples used in the
calculation decreases.
The property of short-time autocorrelation to reveal periodicity in a signal is
demonstrated. Notice how the autocorrelation of the voiced speech segment retains the
periodicity. On the other hand, the autocorrelation of the unvoiced speech segment looks
more like noise. In general, autocorrelation is considered a robust indicator of
periodicity.
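The periodicity property can be demonstrated synthetically; a Python/NumPy sketch contrasting a periodic segment with a noise-like one (the 100 Hz tone and segment length are illustrative assumptions):

```python
import numpy as np

def autocorr(x):
    """Autocorrelation of a (rectangular-windowed) segment, as xcorr returns:
    R(k) = sum_m x(m) x(m + k), for lags -(L-1)..(L-1)."""
    return np.correlate(x, x, mode='full')

fs = 8000
n = np.arange(600)
voiced = np.sin(2 * np.pi * 100 * n / fs)     # period = fs/100 = 80 samples
rng = np.random.default_rng(0)
unvoiced = rng.standard_normal(600)           # noise-like segment

rv = autocorr(voiced)
ru = autocorr(unvoiced)
mid = len(rv) // 2                            # zero-lag index

# Voiced speech: a strong secondary peak one period away from lag 0
lag = 80
print(rv[mid + lag] / rv[mid])   # large for the periodic segment
print(ru[mid + lag] / ru[mid])   # near 0 for the noise-like segment
```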
CODE A:
clc;
clear all;
close all;
[speechSignal,Fs,nbits]=wavread('eee.wav');
figure(1),plot(speechSignal);
% Three speech segments, of different length.
ss1 = speechSignal(16095:16700);
ss2 = speechSignal(16095:16550);
ss3 = speechSignal(16095:16400);
% Calculation of the short time autocorrelation
[ac1, lags1] = xcorr(ss1);
[ac2, lags2] = xcorr(ss2);
[ac3, lags3] = xcorr(ss3);
figure(2);
subplot(3,1,1);
plot(lags1, ac1);
legend('Window Length: 606 Samples')
title('Short-Time Autocorrelation Function')
grid on;
subplot(3,1,2);
plot(lags2, ac2);
xlim([lags1(1) lags1(end)]);
legend('Window Length: 456 Samples')
grid on;
subplot(3,1,3);
plot(lags3, ac3);
xlim([lags1(1) lags1(end)]);
legend('Window Length: 306 Samples')
xlabel('Lag in samples')
grid on;
CODE B:
clc;
clear all;
close all;
% An unvoiced speech segment.
[speechSignal,Fs,nbits]=wavread('eee.wav');
figure(1),plot(speechSignal);
ss1 = speechSignal(16095:16700);
ss4 = speechSignal(12200:12800);
[ac4, lags4] = xcorr(ss4);
figure(2);
subplot(2,2,1);
plot(ss1);
legend('Voiced Speech')
[ac1, lags1] = xcorr(ss1);
subplot(2,2,2);
plot(lags1, ac1);
xlim([lags1(1) lags1(end)]);
legend('Autocorrelation of Voiced Speech')
grid on;
subplot(2,2,3);
plot(ss4);
legend('Unvoiced Speech')
subplot(2,2,4);
plot(lags4, ac4);
xlim([lags1(1) lags1(end)]);
legend('Autocorrelation of Unvoiced Speech')
grid on;
clc;
clear all;
close all;
% An unvoiced speech segment.
[speechSignal,Fs,nbits]=wavread('eee.wav');
ss1 = speechSignal(16095:16700);
ss4 = speechSignal(12200:12800);
[ac4, lags4] = xcorr(ss4);
figure(1);
subplot(1,2,1);
plot(ss1);
legend('Voiced Speech')
[ac1, lags1] = xcorr(ss1);
subplot(1,2,2);
plot(lags1, ac1);
xlim([lags1(1) lags1(end)]);
legend('Autocorrelation of Voiced Speech')
grid on;
OUTPUT A:
OUTPUT B:
EXPERIMENT NO. 9
AIM:
Linear Prediction, Autocorrelation Method
a) Apply Hamming window to the signal
b) Use LPC autocorrelation method of order 20.
c) Use LPC function of MATLAB
d) Estimate the coefficient of speech signal
e) Estimate the prediction error
f) Find autocorrelation of the prediction error
g) Calculate the frequency response of the linear prediction model
h) Calculate the spectrum of the windowed signal
i) Calculate the spectrum of the error signal
j) Display the results
THEORY:
Linear predictive analysis of speech is demonstrated. The methods used are
either the autocorrelation method or the covariance method. The autocorrelation method
assumes that the signal is identically zero outside the analysis interval (0<=m<=N-1).
Then it tries to minimize the prediction error wherever it is nonzero, that is in the interval
0<=m<=N-1+p, where p is the order of the model used. The error is likely to be large at
the beginning and at the end of this interval. This is the reason why the speech segment
analyzed is usually tapered by the application of a Hamming window, for example. For
the choice of the window length it has been shown that it should be on the order of
several pitch periods to ensure reliable results. One advantage of this method is that
stability of the resulting model is ensured. The error autocorrelation and spectrum are
calculated as a measure of the whiteness of the prediction error.
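The autocorrelation method itself can be sketched compactly; a Python implementation of the Levinson-Durbin recursion (the algorithm underlying Matlab's lpc), tested on a synthetic AR(2) signal whose coefficients are illustrative assumptions, not values from the experiment:

```python
import numpy as np

def lpc_autocorr(x, p):
    """LPC by the autocorrelation method (Levinson-Durbin).
    Returns ([1, a1, ..., ap], error power), with the same sign
    convention as Matlab's lpc: A(z) whitens x."""
    # Autocorrelation estimates r(0)..r(p)
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(p + 1)])
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, p + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]                        # update inner coefficients
        a[i] = k
        err *= 1 - k * k                                   # shrink the error power
    return a, err

# Synthetic AR(2): x[n] = 0.5 x[n-1] - 0.3 x[n-2] + e[n]  (true A = [1, -0.5, 0.3])
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for n in range(2, len(x)):
    x[n] = 0.5 * x[n - 1] - 0.3 * x[n - 2] + e[n]

a, errPow = lpc_autocorr(x, 2)
print(a)  # approximately [1, -0.5, 0.3]
```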
CODE:
close all;
clc;
clear all;
% Voiced sound
[x,Fs,nbits] = wavread('above.wav');
len_x = length(x);
% The signal is windowed
w = hamming(len_x);
wx = w.*x;
% Lpc autocorrelation method
order = 20;
% LPC function of MATLAB is used
[lpcoefs, errorPow] = lpc(wx, order);
% The estimated signal is calculated as the output of linearly filtering
% the speech signal with the coefficients estimated above
estx = filter([0 -lpcoefs(2:end)], 1, [wx; zeros(order,1)]);
% The prediction error is estimated in the interval 0<=m<=N-1+p
er = [wx; zeros(order,1)] - estx;
%Prediction error energy in the same interval
erEn = sum(er.^2);
% Autocorrelation of the prediction error
[acs,lags] = xcorr(er);
% Calculate the frequency response of the linear prediction model
[H, W] = freqz(sqrt(erEn), lpcoefs(1:end), 513);
% Calculate the spectrum of the windowed signal
S = abs(fft(wx,1024));
% Calculate the spectrum of the error signal
eS = abs(fft(er,1024));
% Display results
subplot(5,1,1);
plot([wx; zeros(order,1)],'g');
title('Word : Above - Linear Predictive Analysis, Autocorrelation Method');
hold on;
plot(estx);
hold off;
xlim([0 length(er)])
legend('Speech Signal','Estimated Signal');
subplot(5,1,2);
plot(er);
xlim([0 length(er)])
legend('Error Signal');
subplot(5,1,3);
plot(linspace(0,0.5,513), 20*log10(abs(H)));
hold on;
plot(linspace(0,0.5,513), 20*log10(S(1:513)), 'g');
legend('Model Frequency Response','Speech Spectrum')
hold off;
subplot(5,1,4);
plot(lags, acs);
legend('Prediction Error Autocorrelation')
subplot(5,1,5);
plot(linspace(0,0.5,513), 20*log10(eS(1:513)));
legend('Prediction Error Spectrum')
OUTPUT:
EXPERIMENT NO. 10
AIM:
Hidden Markov Model
a) Generate a test sequence.
b) Generate a random sequence of states and emissions from the model.
c) Estimate the state sequence.
d) To test the accuracy of hmmviterbi, compute the percentage of the actual
sequence states that agree with the most likely state sequence.
e) Estimate the posterior state probabilities.
THEORY:
Markov models are mathematical models of stochastic processes, i.e.
processes that generate random sequences of outcomes according to certain probabilities.
A simple example of a stochastic process is a sequence of coin tosses, the outcomes being
heads or tails. People use Markov models to analyze a wide variety of stochastic
processes, from daily stock prices to the positions of genes in a chromosome. You can
construct Markov models very easily using state diagrams, such as the one shown in this
figure.
A State Diagram for a Markov Model
The rectangles in the diagram represent the possible states of the process
you are trying to model, and the arrows represent transitions between states. The label on
each arrow represents the probability of that transition, which depends on the process you
are modeling. At each step of the process, the model generates an output, or emission,
depending on which state it is in, and then makes a transition to another state. For
example, if you are modeling a sequence of coin tosses, the two states are heads and tails.
The most recent coin toss determines the current state of the model and
each subsequent toss determines the transition to the next state. If the coin is fair, the
transition probabilities are all 1/2. In this simple example, the emission at any moment in
time is simply the current state. However, in more complicated models, the states
themselves can contain random processes that affect their emissions. For example, after
each flip of the coin, you could roll a die to determine the emission at that step. A hidden
Markov model is one in which you observe a sequence of emissions, but you do not
know the sequence of states the model went through to generate the emissions. In this
case, your goal is to recover the state information from the observed data.
The next section, Example of a Hidden Markov Model, provides an
example. The Statistics Toolbox includes five functions for analyzing hidden Markov
models:
hmmdecode  -- Calculates the posterior state probabilities of a sequence
hmmgenerate -- Generates a sequence for a hidden Markov model
hmmestimate -- Estimates the parameters for a Markov model
hmmtrain   -- Calculates the maximum likelihood estimate of hidden Markov model
parameters
hmmviterbi -- Calculates the most likely state path for a hidden Markov model sequence
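The idea behind hmmviterbi can be sketched in a few lines; a minimal Python Viterbi decoder for a two-state model (the transition matrix matches the code below, while the two-symbol emission matrix here is a simplified, hypothetical one):

```python
import numpy as np

def viterbi(obs, trans, emis, p0):
    """Most likely state path for an observed emission sequence (log domain)."""
    nstates = trans.shape[0]
    T = len(obs)
    logp = np.log(p0) + np.log(emis[:, obs[0]])
    back = np.zeros((T, nstates), dtype=int)
    for t in range(1, T):
        cand = logp[:, None] + np.log(trans)       # score of each transition
        back[t] = np.argmax(cand, axis=0)          # best predecessor per state
        logp = cand[back[t], np.arange(nstates)] + np.log(emis[:, obs[t]])
    path = [int(np.argmax(logp))]                  # backtrack from the best end state
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

TRANS = np.array([[0.9, 0.1], [0.05, 0.95]])
EMIS = np.array([[0.9, 0.1],          # state 0 mostly emits symbol 0
                 [0.1, 0.9]])         # state 1 mostly emits symbol 1
print(viterbi([0, 0, 1, 1, 1], TRANS, EMIS, p0=np.array([0.5, 0.5])))
# [0, 0, 1, 1, 1]
```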
CODE :
close all;
clear all;
clc;
% Generate a state sequence
disp('Create transition matrix');
TRANS = [.9 .1; .05 .95]
disp('Create Emission matrix');
EMIS = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6;7/12, 1/12, 1/12, 1/12, 1/12, 1/12]
[seq, states] = hmmgenerate(1000, TRANS, EMIS);
%Estimate the state sequence
likelystates = hmmviterbi(seq, TRANS, EMIS);
sum(states==likelystates)/1000
%Estimating transition and emission matrix
disp('Estimated Transition and Emission matrix');
[TRANS_EST, EMIS_EST] = hmmestimate(seq, states)
%Estimating posterior state probabilties
PSTATES = hmmdecode(seq, TRANS, EMIS);
OUTPUT :
Create transition matrix
TRANS =
    0.9000    0.1000
    0.0500    0.9500
Create Emission matrix
EMIS =
    0.1667    0.1667    0.1667    0.1667    0.1667    0.1667
    0.5833    0.0833    0.0833    0.0833    0.0833    0.0833
ans =
    0.8490
Estimated Transition and Emission matrix
TRANS_EST =
    0.9127    0.0873
    0.0515    0.9485
EMIS_EST =
    0.1481    0.1534    0.1931    0.1587    0.1720    0.1746
    0.5772    0.0707    0.1013    0.0788    0.0852    0.0868