Hardware Descriptions of Multi-Layer Perceptrons with Different Abstraction Levels

Paper by
E.M. Ortigosa, A. Cañas, E. Ros,
P.M. Ortigosa, S. Mota, J. Díaz

Paper Review by
Ryan MacGowan
• What does this even mean??
• Speech Recognition
• Artificial Neural Networks
• Solutions
• Results
• Likes and Dislikes
• Conclusion
• Multi-Layer Perceptron (referred to as MLP) is a type of standard feed-forward neural network which uses at least 3 layers (input, hidden, and output)
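As a quick refresher on what each MLP neuron computes, a standard formulation (generic textbook notation, not copied from the paper) is:

y_j = f\left( \sum_{i=1}^{N} w_{ji}\, x_i + b_j \right)

where x_i are the inputs to the neuron, w_ji the connection weights, b_j an optional bias term, and f the activation function (a sigmoid in this design).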
• Different abstraction levels just means that the solution will be realized using 2 different methods, one using low-level VHDL, and another using higher-level Handel-C
• Hardware Descriptions of Multi-Layer Perceptrons with Different Abstraction Levels
…… REALLY MEANS ……
• FPGA implementation of two Neural Networks, using VHDL and Handel-C, to solve a problem
• In this case the problem that is solved is Speech Recognition
• Due to the increasing power of FPGAs, solutions for Speech Recognition can be designed using an Artificial Neural Network built right into the FPGA.
• Useful for applications in cars, GPS, toys, and other embedded systems where control over speech would be useful.
• A computer samples the audio, and this waveform is converted into a vector using vector bank and prediction analysis. This vector is what is sent to the neural network.
• The Neural Network computes which word was spoken as the output.
• All solutions were realized using the Artificial Neural Network presented here.
• 10 vectors with 22 features each give 220 input data values, which are sent to the 24 hidden neurons for computation; their outputs are sent to the output neurons, which classify the input and provide an output.
• If a spoken command falls in a class, we expect that output node to give a high value.
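To make the data flow concrete, below is a minimal software sketch of the forward pass just described. The sizes (220 inputs, 24 hidden neurons, a 10-word vocabulary) come from the slides; the floating-point arithmetic, the sigmoid, and all identifier names are illustrative assumptions rather than the paper's fixed-point hardware.

#include <math.h>

#define NUM_INPUT  220  /* 10 vectors x 22 features (from the slides) */
#define NUM_HIDDEN 24   /* hidden-layer size that gave 96.3% accuracy */
#define NUM_OUTPUT 10   /* one class per word; 10-word vocabulary */

static double sigmoid(double x) { return 1.0 / (1.0 + exp(-x)); }

/* Forward pass: input -> hidden -> output; returns the index of the
 * highest-scoring output node, i.e. the recognized word class. */
int mlp_classify(const double in[NUM_INPUT],
                 const double w_ih[NUM_HIDDEN][NUM_INPUT],
                 const double w_ho[NUM_OUTPUT][NUM_HIDDEN])
{
    double hidden[NUM_HIDDEN], out[NUM_OUTPUT];

    for (int j = 0; j < NUM_HIDDEN; j++) {
        double sum = 0.0;
        for (int i = 0; i < NUM_INPUT; i++)
            sum += w_ih[j][i] * in[i];      /* weighted sum (EQ1) */
        hidden[j] = sigmoid(sum);
    }

    int best = 0;
    for (int k = 0; k < NUM_OUTPUT; k++) {
        double sum = 0.0;
        for (int j = 0; j < NUM_HIDDEN; j++)
            sum += w_ho[k][j] * hidden[j];
        out[k] = sigmoid(sum);
        if (out[k] > out[best]) best = k;   /* winning class = spoken word */
    }
    return best;
}

The hardware versions discussed next replace these floating-point loops with fixed-point multiply-accumulate units.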
• In order to ensure maximum accuracy, a number of neural network structures were tested.
• The best result of 96.3% accuracy was achieved when 24 hidden neurons were used.
• This is the functional unit used in the implementations.
• The first 8-bit input is the input value, and the second represents the connection weight for that input.
• The output of the multiplier is sign-extended due to the maximum size of the summation of weighted values in EQ1.
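A one-line C model of this multiply-accumulate unit, assuming 8-bit signed inputs and weights as stated above; the wide accumulator stands in for the sign-extended hardware register (the 23-bit width is quoted on the next slide), and the function name is made up for illustration:

#include <stdint.h>

/* One step of the functional unit: multiply an 8-bit input by its
 * 8-bit connection weight and add the sign-extended product to the
 * running sum of EQ1.  A 32-bit accumulator comfortably holds the
 * 23-bit worst-case sum of all 220 products. */
int32_t mac_step(int32_t acc, int8_t in, int8_t weight)
{
    int32_t product = (int32_t)in * (int32_t)weight; /* sign-extended */
    return acc + product;
}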
• The output of the functional unit presented on the previous slide is then passed into the sigmoid activation function, which gives an 8-bit output based on the 23-bit input.
• This 8-bit output can either be passed to another layer of hidden neurons, or passed to the output neurons.
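The slides do not say how the sigmoid is realized in hardware; a common fixed-point choice is a small lookup table, so the following is only a sketch of that option, with the table contents, clamping range, and scaling all assumptions:

#include <stdint.h>

#define SIG_TABLE_BITS 8
/* A real implementation would precompute this table by sampling the
 * sigmoid across the clamped input range. */
static uint8_t sigmoid_lut[1 << SIG_TABLE_BITS];

/* Map the 23-bit signed accumulator value to an 8-bit activation by
 * clamping, then indexing the table with the top bits of the sum. */
uint8_t sigmoid_23_to_8(int32_t sum23)
{
    const int32_t max = (1 << 22) - 1;   /* 23-bit signed range */
    const int32_t min = -(1 << 22);
    if (sum23 > max) sum23 = max;
    if (sum23 < min) sum23 = min;
    /* drop the low bits, then offset into [0, 255] */
    uint32_t index = (uint32_t)((sum23 >> 15) + (1 << 7));
    return sigmoid_lut[index & 0xFF];
}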
• We can enhance the speed of the design by placing the RAMs containing the weights, and the functional units, in parallel.
• The output sums from these functional units are stored in a register, and selected by the multiplexer to be sent to the activation function.
• This is only a partially parallel design, as the outputs of each layer are still computed sequentially.
• The Handel-C design is done in both serial and parallel forms (the slide shows the Serial and Parallel code listings; a sketch follows below).
• NumHidden is the number of hidden neurons (24), NumInput is the number of input values (220), W is the array containing the weights, In is the input array, and Sum is the sum of the weights multiplied by the inputs.
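Since the original listings are screenshots, here is a hedged C reconstruction of what the hidden-layer loop plausibly looks like, reusing the names defined above; the loop bodies are an assumption, not the authors' exact code. Handel-C is C-like, and its "par" replicator (a real language construct) is what distinguishes the parallel version, as noted in the comments.

#define NumHidden 24    /* hidden neurons (from the slide) */
#define NumInput  220   /* input values (from the slide) */

int W[NumHidden][NumInput];  /* connection weights */
int In[NumInput];            /* input vector */
int Sum[NumHidden];          /* weighted sums, one per hidden neuron */

/* Serial form: one multiply-accumulate at a time. */
void hidden_layer_serial(void)
{
    for (int j = 0; j < NumHidden; j++) {
        Sum[j] = 0;
        for (int i = 0; i < NumInput; i++)
            Sum[j] += W[j][i] * In[i];
    }
}

/* Parallel form: in Handel-C the outer loop becomes a replicated
 * "par" block, e.g.  par (j = 0; j < NumHidden; j++) { ... },  so
 * all 24 per-neuron accumulations run simultaneously in hardware.
 * Plain C has no direct equivalent, hence the comment-only sketch. */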
• In order to try to test more solutions and come up with an optimal one, different RAM types were used in the Handel-C design:
1) Only distributed RAM blocks
2) A combination of embedded and distributed RAM blocks
3) Only embedded RAM blocks
• Here are the results
• When looking at the throughput, the VHDL design provides the best results in both the serial and parallel cases.
• Among the Handel-C designs, HC(a) provides the highest throughput due to the higher frequency that can be achieved.
• This trend continues as we also examine the performance cost.
• The VHDL design is the best, as it provides the highest throughput while also using the fewest gates.
• The Handel-C designs have at least a 1.6X higher performance cost.
• These graphs show the linear relationship between the number of neurons in the hidden layer and the amount of resources used.
• Since our design only uses 24 hidden neurons, the resources are manageable; however, our design is also very small (a vocabulary of only 10 words).
• While the tables and graphs above show that the VHDL design is superior in terms of throughput and performance cost, there are also drawbacks which we must consider.
• Design time is 10X longer for the VHDL design.
• Exploring different solutions with a VHDL design takes considerably longer, as it requires an entirely new control unit to be designed each time.
• The FPGA used is a Virtex-E 2000
• CLBs contain 4 LCs (Logic Cells), each with a 4-input function generator, carry logic, and a storage element
• The 4 LCs are placed in 2 slices; each slice can also provide 5- and 6-input function generators
• Each LC has a 4-input LUT which can provide a 16x1 memory block
• The device also contains embedded memory (EMB) blocks
• VHDL was coded with FPGA Advantage Tool 5.3 from Mentor Graphics
• DK Design Suite was used for the Handel-C implementation
• Both designs were placed and routed using ISE Foundation Tool 3.5i
• The relevance to labs performed in the course
• The comparison between parallel and serial versions for all types of implementation
• Great description of the neural network, covering all aspects
• The application is very practical in today's electronic culture
• More detail is needed on the actual voice recognition system, specifically the computer used to preprocess the audio.
• The paper is not entirely modern (2006).
• Some sources used are even older (1990-1993).
• The system is unrealistically small (10 different words), with no discussion of viability in a more complex environment.
• Not much information is given on the VHDL design: no pseudocode and no simulations are provided.
• Using Neural Networks for speech recognition allows compact embedded systems to be developed.
• Parallel processing allows a speedup of up to 17X over a serial implementation.
• VHDL results in an implementation which is 1.21-1.24X faster than the Handel-C implementation.
• When using Handel-C, it is important to know the most optimized type of RAM for your application, in this case distributed RAM.
• Handel-C designs have a 1.6X higher performance cost.
• Computation of the output takes only 13-16 ms.