
TABLE OF CONTENTS

CHAPTER                          TITLE                              PAGE

         DECLARATION                                                  ii
         DEDICATION                                                  iii
         ACKNOWLEDGEMENT                                              iv
         ABSTRACT                                                      v
         ABSTRAK                                                      vi
         TABLE OF CONTENTS                                           vii
         LIST OF TABLES                                              xii
         LIST OF FIGURES                                             xiv
         LIST OF ABBREVIATIONS                                       xix
         LIST OF SYMBOLS                                             xxi
         LIST OF APPENDICES                                         xxii

1        INTRODUCTION                                                  1
         1.1   Block-based Neural Networks                             2
         1.2   Problem Statement                                       3
         1.3   Objectives                                              6
         1.4   Scope of Work                                           6
         1.5   Contributions                                           8
         1.6   Dissertation Organization                               9

2        LITERATURE REVIEW AND THEORETICAL BACKGROUND                 11
         2.1   Artificial Neural Networks (ANN)                       11
               2.1.1   Background Theory of ANNs                      11
               2.1.2   Neuron Activation Functions                    13
               2.1.3   Common Criticism of Conventional ANNs          16
         2.2   Background Theory of BbNNs                             18
               2.2.1   Previous Work on BbNNs in Hardware             20
         2.3   Genetic Algorithm                                      22
               2.3.1   Multi-objective Optimization Models
                       Using GA                                       25
         2.4   Previous Work on Selected Case Studies                 26
               2.4.1   Driver Drowsiness Classification Using
                       Heart Rate Variability                         26
               2.4.2   ECG Signal Classification in Detection of
                       Heart Arrhythmia                               28
               2.4.3   Tomita Language Grammar Inference              30

3        METHODOLOGY                                                  31
         3.1   Research Approach                                      31
               3.1.1   System Latency and GA-based Training
                       for the Proposed BbNN                          33
               3.1.2   Design Considerations in Neuron Block
                       Hardware Development                           34
               3.1.3   Design Approach of the Simulation
                       Model for the Proposed BbNN                    35
               3.1.4   Hardware Design: Co-Design and SoC
                       Implementation                                 36
         3.2   Software Tools and Design Environment                  38
               3.2.1   The GAlib Library                              38
               3.2.2   Mentor Graphics Modelsim                       40
               3.2.3   Icarus Verilog                                 40
               3.2.4   Verilator                                      41
               3.2.5   Nios2-Linux Embedded Operating System          42
         3.3   Hardware Design Verification Methodology               44
               3.3.1   Using the Command-line Version of
                       Modelsim for Verification                      45
               3.3.2   Using SystemVerilog DPI-C in Modelsim
                       for Co-Verification                            47
               3.3.3   Viewing Waveforms with GTKWave                 49
               3.3.4   Using Icarus Verilog for Verification          49
               3.3.5   Using Verilator for Advanced Co-Simulation     51
         3.4   Case Studies for Verification of the Proposed BbNN     54
               3.4.1   Test Dataset Creation from HRV Feature
                       Vectors for Case Study 2                       55
               3.4.2   Test Dataset Creation from ECG Signal
                       Feature Vectors for Case Study 3               58
               3.4.3   Test Dataset Creation Using Tomita
                       Grammar for Case Study 4                       61

4        MODELING, ALGORITHMS, AND SIMULATION OF
         PROPOSED BLOCK-BASED NEURAL NETWORKS                         63
         4.1   BbNN as a Structured Neural Network                    63
         4.2   Modeling of a Neuron Block in the Proposed BbNN        65
               4.2.1   System Latency and Synchronous Logic
                       Structure of Neuron Blocks                     68
               4.2.2   Design Considerations: Range of Values
                       for the Synaptic Weights                       71
         4.3   GA-based Training and Optimization of the
               Proposed BbNN                                          72
               4.3.1   Formulation of the GA Fitness Function         75
         4.4   BbNN Array Interconnect Algorithm                      79
         4.5   Implementation of the BbNN Simulation Model            84

5        HARDWARE DESIGN AND IMPLEMENTATION                           87
         5.1   System Architecture of BbNN in Hardware                87
         5.2   Hardware Design Considerations                         89
               5.2.1   Number Representation for Fixed-Point
                       Arithmetic and Precision                       89
               5.2.2   Activation Function of the Hardware
                       Neuron Block                                   92
         5.3   Design of the Proposed Hardware Neuron Block           96
               5.3.1   Neuron Block FSM Control Unit                  98
               5.3.2   Multiplexing Sequence of the Multiplier
                       Inputs                                        100
               5.3.3   Neuron Block Dual-Port On-chip Memory
                       Submodule                                     102
               5.3.4   Neuron Block Input Multiplexer Submodule      104
               5.3.5   Neuron Block ALU Submodule                    105
               5.3.6   Neuron Block Operand Selector Submodule       108
               5.3.7   Top Level BbNN Controller with Avalon
                       Bus Interfacing                               109

6        EXPERIMENTAL RESULTS, CASE STUDIES, AND
         ANALYSIS                                                    113
         6.1   Analysis to Obtain Optimum Range of Weight
               Values                                                113
               6.1.1   Geometric Analysis of a Neuron Block
                       with One Synapse                              114
               6.1.2   Geometric Analysis of a Neuron Block
                       with Two Synapses                             116
               6.1.3   Geometric Analysis of a Neuron Block
                       with Three Synapses                           118
               6.1.4   Weight Boundary Selection and Empirical
                       Analysis                                      121
         6.2   Verification and Analysis of Proposed BbNN
               through Case Studies                                  122
               6.2.1   Functional Verification of Proposed
                       BbNN Using the XOR Classification
                       Problem                                       123
               6.2.2   Case Study 2: Driver Drowsiness
                       Detection Based on HRV                        125
               6.2.3   Case Study 3: Heart Arrhythmia
                       Classification from ECG Signals               127
               6.2.4   Case Study 4: Grammar Inference of the
                       Tomita Language                               130
         6.3   Verification of the BbNN Implementation Model in
               Hardware                                              133
               6.3.1   Simulation Results of the Neuron Block
                       Submodules                                    134
               6.3.2   Simulation Results of the Neuron Block        136
               6.3.3   Co-Simulation of BbNN HW-SW Design            138
         6.4   Resource Utilization of the BbNN Implementation
               Model in Hardware                                     140
         6.5   Performance Analysis of the Proposed BbNN
               Models                                                142
               6.5.1   Single Thread vs Multithreaded BbNN
                       Simulation Models                             143
               6.5.2   Software vs Verilated BbNN Simulation
                       Models                                        144
               6.5.3   Embedded Software vs Hardware BbNN
                       Implementation Models                         144

7        CONCLUSION                                                  147
         7.1   Concluding the Experimental Results                   148
         7.2   Contributions                                         150
         7.3   Future Work                                           151

REFERENCES                                                           153

Appendices A – E                                               160 – 182
LIST OF TABLES

TABLE NO.                        TITLE                              PAGE

5.1    All the required multiplication operands used for
       computing Equations 5.12–5.15 used in this work.              101
5.2    Summary of ALU adder operation based on opcode.               106
5.3    The values of Sel Addr, Sel X and Sel Y based on the
       operand counter that is required by the state machine.        109
5.4    Description of all the registers in the BbNN controller
       module.                                                       111
6.1    Distribution cases for Equation 6.1.                          115
6.2    Distribution cases for Equation 6.2 with an ignored bias
       (b = 0).                                                      117
6.3    All possible combinations of offsets used on Equation 6.2
       with x1 and x2 preset to maximum positive values.             119
6.4    Parameters used for GA evolution for this analysis.           122
6.5    Statistics of generations required by GA to achieve a
       fitness of 0.95 when training a 7 × 1 BbNN for
       classification of heart arrhythmia.                           122
6.6    XOR training dataset for the BbNN structure.                  123
6.7    Parameters used for GA evolution.                             125
6.8    Results on BbNN classification accuracy of heart
       arrhythmia using individual record training.                  129
6.9    Benchmarking results on BbNN classification accuracy of
       the heart arrhythmia case study with all records
       combined.                                                     130
6.10   Results on BbNN classification accuracy for Tomita
       languages.                                                    131
6.11   Results on BbNN configuration and optimum latency
       achieved for each Tomita language.                            132
6.12   Benchmarking results in classification accuracy for the
       grammar inferencing case study.                               133
6.13   The training accuracy and latency of all the case studies
       tested with the verilated BbNN model.                         139
6.14   FPGA logic utilization of the proposed design in relation
       to previous works using various neuron block array
       configurations.                                               141
6.15   Selected FPGA prototyping platform and BbNN bit precision
       for the proposed design in comparison with previous work.     141
6.16   Benchmark of the processors used in this work.                142
6.17   Performance analysis for the training of single and
       multithreaded BbNN simulation models.                         143
6.18   Performance analysis for the training of the software and
       verilated BbNN simulation models.                             144
6.19   Performance analysis for both embedded software and
       hardware implementation BbNN models for various latency
       values.                                                       145
6.20   Performance analysis for the training of the embedded
       software and hardware implementation BbNN models.             145
LIST OF FIGURES

FIGURE NO.                       TITLE                              PAGE

1.1    A multilayer-perceptron ANN with a single hidden layer.         1
1.2    Neuron blocks interconnected as a scalable grid to form a
       BbNN structure [8].                                             2
1.3    Example of a 2 × 2 BbNN structure configured in recurrent
       mode.                                                           4
1.4    Layers of the proposed embedded BbNN SoC.                       8
2.1    A single neuron with three synapses in a conventional
       ANN.                                                           12
2.2    A multilayer-perceptron ANN with a single hidden layer.        12
2.3    The observable bell shaped curve from the plot of a
       sigmoidal function.                                            15
2.4    Depiction of a massively interconnected
       multilayer-perceptron ANN.                                     17
2.5    Neuron blocks interconnected as a scalable grid to form a
       BbNN structure [8].                                            18
2.6    Functional block diagram of a neuron block in hardware as
       described in [20, 36, 37].                                     21
2.7    Behavioral flow chart for Simple GA [38].                      23
2.8    Spectral analysis of heart rate using the three different
       power bands.                                                   27
2.9    ECG signals from a patient depicting (a) a normal
       heartbeat and (b) a heartbeat with premature ventricular
       contraction (PVC).                                             29
3.1    BbNN structure with a latency of three computational
       cycles.                                                        33
3.2    Verification of the neuron block using Modelsim and DPI-C
       interfacing with a golden reference software model as
       comparison.                                                    37
3.3    Methodology for co-simulation of the BbNN hardware design
       using Verilator and C/C++.                                     37
3.4    Layers of the proposed embedded BbNN SoC.                      38
3.5    Diagram of the connection between Nios2-Linux, FPGA,
       Nios II CPU and the external hardware.                         43
3.6    The testbench design environment [69].                         44
3.7    Constrained-random test progress over time vs. directed
       testing [69].                                                  45
3.8    Modelsim compilation and simulation output for the
       TestVerilog module.                                            46
3.9    Verification of a hardware module using Modelsim and
       DPI-C interfacing with a golden reference software model
       as comparison.                                                 47
3.10   GTKWave displaying the waveforms of a simulated module.        50
3.11   The methodology of verifying a design using Verilator by
       converting SystemVerilog hardware modules into
       cycle-accurate C/C++ libraries.                                51
3.12   Output from the C/C++ verification program for testing
       the verilated adder module.                                    54
3.13   HRV sleep state classification using BbNN.                     55
3.14   QRS complex detection.                                         55
3.15   HR plotting at the center of every RR interval.                57
3.16   HRV extraction to obtain power bands.                          57
3.17   Spectral analysis of HR.                                       58
3.18   Overview of ECG signal preprocessing and feature
       extraction.                                                    60
3.19   ECG signal preprocessing flow.                                 60
4.1    Diagram depicting (a) neuron blocks interconnected as a
       scalable grid to form a BbNN structure and (b) an example
       of a configured neuron block.                                  64
4.2    All possible internal configurations of a neuron block.        65
4.3    Conceptual model of a single neuron block.                     66
4.4    BbNN structure with a latency of (a) five computational
       cycles and (b) three computational cycles.                     68
4.5    Example of a recurrent mode 2 × 2 BbNN structure.              70
4.6    A flowchart depicting the (a) basic concept of the BbNN
       training model and (b) multi-population parallel GA with
       migration applied in this work.                                73
4.7    Encoding the chromosome of a BbNN structure.                   74
4.8    An r × c BbNN with detailed labeling of the input/output
       interconnects.                                                 80
4.9    Architecture of the proposed BbNN simulation model.            86
4.10   CPU utilization during BbNN training.                          86
5.1    System architecture of the proposed FPGA-based BbNN SoC.       88
5.2    Layers of the proposed embedded BbNN SoC.                      88
5.3    The Q4.11 fixed-point number format used in this work.         90
5.4    A neuron block with the maximum possible combination of
       weights connected to the output neuron.                        90
5.5    Performing addition (or subtraction) using fixed-point
       arithmetic on Q4.11 type numbers.                              91
5.6    Performing multiplication using fixed-point arithmetic on
       Q4.11 type numbers with truncation.                            92
5.7    Plot of the proposed piecewise quadratic (PWQ) function.       95
5.8    Functional block diagram of the proposed neuron block in
       hardware.                                                      97
5.9    Algorithmic state machine chart for the FSM control unit.      99
5.10   Individual interconnections of the multiplier unit within
       the neuron block.                                             100
5.11   Block diagram of the dual-port on-chip memory of the
       neuron block.                                                 102
5.12   Using the direction bits to perform a conditional
       statement to enable or disable the first operand.             103
5.13   The architecture of the input multiplexer within the
       neuron block.                                                 104
5.14   The architecture of the ALU within the neuron block.          105
5.15   The logic circuit for obtaining the select signal S2.         106
5.16   Logic circuit for determining the positive and negative
       limit of the x value in register A.                           107
5.17   Block diagram of the operand selector module within the
       neuron block.                                                 108
5.18   Functional block diagram of the BbNN controller module
       designed for handling the CPU bus interface.                  110
6.1    A neuron block with one synapse connected to the output
       node.                                                         115
6.2    Geometric structure of Equation 6.1 for both input cases
       with weights ranging from −1 to +1.                           115
6.3    Geometric structure of Equation 6.1 with weights ranging
       from −3 to +3.                                                116
6.4    A neuron block with two synapses connected to the output
       node.                                                         116
6.5    Geometric structure of Equation 6.2 with weights ranging
       from −1 to +1 using (a) ignored bias and (b) maximum bias
       offset.                                                       117
6.6    Geometric structure of Equation 6.2 with weights ranging
       from −3 to +3 with (a) maximum positive bias offset and
       (b) maximum negative bias offset.                             118
6.7    A neuron block with three synapses connected to the
       output node for mathematical analysis.                        118
6.8    Geometric structure of Equation 6.3 for case 7 with
       weights ranging from (a) −1 to +1 and (b) −3 to +3.           120
6.9    Geometric structure of Equation 6.3 for case 8 with
       weights ranging from (a) −1 to +1 and (b) −3 to +3.           120
6.10   Geometric structure of Equation 6.3 for case 9 with
       weights ranging from (a) −1 to +1 and (b) −3 to +3.           121
6.11   Evolved BbNN that solves the XOR problem.                     124
6.12   Fitness trend of the BbNN evolution for the XOR problem
       case study.                                                   124
6.13   Evolved BbNN structure in driver drowsiness detection.        126
6.14   Fitness trend of the BbNN evolution for the driver
       drowsiness detection case study.                              126
6.15   Fitness trend of the BbNN evolution for the heart
       arrhythmia classification case study.                         128
6.16   Evolved BbNN structure for the classification of heart
       arrhythmia using all records combined as training data.       128
6.17   Evolved BbNN structure for the classification of Tomita
       language 3.                                                   131
6.18   Fitness trend of the BbNN evolution for the grammar
       inference of Tomita language 3.                               132
6.19   Simulation results (in dumpfile form) for the dual-port
       on-chip memory.                                               134
6.20   Simulation results (in dumpfile form) for the ALU
       submodule.                                                    135
6.21   Simulation results (in dumpfile form) for the FSM
       submodule.                                                    135
6.22   Simulation results (in dumpfile form) for the operand
       selector submodule.                                           136
6.23   SystemVerilog testbench with DPI-C interfacing for
       co-verification of the neuron block.                          137
6.24   Results of the neuron block co-verification routine.          138
6.25   Co-simulation of the BbNN hardware design using Verilator
       and C/C++.                                                    139
6.26   Terminal output from the training algorithm after
       successfully training the verilated BbNN hardware applied
       on case study 2.                                              140
LIST OF ABBREVIATIONS

ALU        –  Arithmetic Logic Unit
ANN        –  Artificial Neural Network
BPF        –  Band Pass Filter
BbNN       –  Block-based Neural Network
CPU        –  Central Processing Unit
DDR SDRAM  –  Double Data Rate Synchronous Dynamic Random-Access Memory
DPI        –  Direct Programming Interface
DUT        –  Device-Under-Test
DUV        –  Device-Under-Verification
EA         –  Evolutionary Algorithm
ECG        –  Electrocardiogram
FFT        –  Fast Fourier Transform
FPU        –  Floating Point Unit
FPGA       –  Field Programmable Gate Array
FSM        –  Finite State Machine
GA         –  Genetic Algorithm
GCC        –  GNU Compiler Collection
GNU        –  GNU's Not Unix
GUI        –  Graphical User Interface
HDL        –  Hardware Description Language
HF         –  High Frequency
HPF        –  High Pass Filter
HRV        –  Heart Rate Variability
HW         –  Hardware
HW-SW      –  Hardware-Software
IC         –  Integrated Circuit
I/O        –  Input/Output
LE         –  Logic Element
LF         –  Low Frequency
LPF        –  Low Pass Filter
LUT        –  Lookup Table
Mbit       –  Megabit
MDIPS      –  Millions of Dhrystone Instructions Per Second
MHz        –  Megahertz
MLP        –  Multilayer Perceptron
MMU        –  Memory Management Unit
ms         –  Millisecond
MUX        –  Multiplexer
MWIPS      –  Millions of Whetstone Instructions Per Second
NoC        –  Network-on-Chip
ns         –  Nanosecond
OS         –  Operating System
PC         –  Personal Computer
PWL        –  Piecewise Linear
PWQ        –  Piecewise Quadratic
RTL        –  Register Transfer Level
RTOS       –  Real-Time Operating System
SoC        –  System-on-Chip
SOM        –  Self-Organizing Map
SW         –  Software
USB        –  Universal Serial Bus
VHDL       –  Very High Speed Integrated Circuit Hardware Description Language
VLF        –  Very Low Frequency
WFDB       –  Waveform Database
LIST OF SYMBOLS

x0–3    –  Inputs of a neuron block
y0–3    –  Outputs of a neuron block
w0–5    –  Weight values for the synapses inside a neuron block
b0–3    –  Bias values for the nodes inside a neuron block
d0–3    –  Direction values for the neuron block interconnects
r       –  The number of rows in a BbNN structure
c       –  The number of columns in a BbNN structure
fBbNN   –  Fitness value of the BbNN structure
Perr    –  BbNN error parameter
PLQ     –  BbNN latency quality parameter
Ppen    –  BbNN penalty parameter
α       –  General scaling or modifier constant
β       –  General scaling or modifier constant
β1–3    –  BbNN parameter scaling constants
Ls      –  Specified latency value of the BbNN system
Lmin    –  Minimum possible latency value
Lmax    –  Maximum possible latency value
σ       –  Latency modifier for Lmax
φ       –  Latency offset for Lmin
Ninv    –  Number of neuron blocks with invalid configurations
g(x)    –  General representation of an activation function
sp      –  Activation function saturation point
Ca      –  Classification accuracy
TP      –  The number of true positives
FP      –  The number of false positives
TN      –  The number of true negatives
FN      –  The number of false negatives
LIST OF APPENDICES

APPENDIX                         TITLE                              PAGE

A      Publications                                                  160
B      BbNN Simulation Model Source Codes                            162
C      BbNN Hardware Implementation Source Codes                     167
D      Verification Routines                                         176
E      Octave Routines used for Experimentation and Data
       Preparation                                                   182