Developing Function Mapping Neural Networks

UNIVERSITY OF MANCHESTER
SCHOOL OF COMPUTER SCIENCE
Third Year Project Report 2013
BSc (Hons) Computer Science
Author: Cosmin Spataru
Supervisor: Dr. Richard Neville
April 2013
Abstract
Artificial neural networks are mathematical models capable of automatically determining
complex relationships between an input and an output set of data. Due to their ability to learn
by example, they have become a popular tool for data analysis and modelling. Successful
implementations of neural networks are used in a variety of tasks, including credit card fraud
detection, automated trading and optical character recognition systems.
This report presents the engineering of a neural network system, which is capable of learning
to map different user defined functions. The resulting tool provides the functionality to create
different network structures and to perform learning with various network training
parameters. Due to its interactive nature, which lets the user see how the network adapts to
the desired function, the system can be used as a neural network education tool. This report
details all the key aspects of the steps undertaken to develop the system, from its inception, to
the validation and verification phase. Finally, the system is evaluated and a conclusion of the
work done is provided, along with possible future developments.
Acknowledgements
I would like to thank my supervisor, Dr. Richard Neville, for the patience, guidance and
feedback offered throughout this project. I am also grateful to my brother, Adrian Spataru, for
the invaluable support offered during my entire academic experience. Thanks also go to
Corina Preda for criticising this report, and my family for their continued encouragement and
support.
Table of Contents
Chapter 1: Introduction
1.1 Overview
1.2 Project Aim
1.3 Report Structure
Chapter 2: Background
2.1 Chapter Overview
2.2 An Introduction to Artificial Intelligence (AI)
2.3 An Introduction to Machine Learning
2.4 Artificial Neural Networks (ANNs)
2.4.1 Biological background
2.4.2 Artificial neurons
2.5 Learning paradigms
2.5.1 Perceptron
2.5.2 Delta Rule
2.5.3 Backpropagation and multi-layer nets
2.6 Concluding Remarks
Chapter 3: Requirements
3.1 Chapter Overview
3.2 Software Development Methodology
3.3 Requirements Elicitation and Analysis
3.3.1 Stakeholders Identification
3.3.2 System Context Diagram
3.3.3 Use Case Diagram
3.3.4 Scenarios
3.4 Requirements Specification
3.4.1 Functional Requirements
3.4.2 Non-Functional Requirements
3.4.1 Hierarchical Task Analysis
3.5 Concluding Remarks
Chapter 4: Design
4.1 Chapter Overview
4.2 Architectural Patterns
4.2.1 Model-View-Controller paradigm
4.2.2 Model-View-Presenter paradigm
4.2.3 Chosen Architectural Pattern
4.3 Class Responsibility Collaboration model
4.4 System Sequence Diagram
4.5 System Class Diagram
4.6 Concluding Remarks
Chapter 5: Implementation
5.1 Chapter Overview
5.2 Development Technologies
5.2.1 Programming languages
5.2.2 Application Programming Interfaces
5.3 Prototyping
5.3.1 Proof-of-concept prototypes
5.3.2 System prototypes
5.4 Algorithmics
5.4.1 Artificial Neural Network
5.4.2 Feed-forward propagation
5.4.3 Backpropagation
5.5 Additional key implementation aspects
5.5.1 Object Serialization
5.5.2 Multithreading
5.5.3 System architecture implementation
5.6 Concluding Remarks
Chapter 6: Results
6.1 Chapter Overview
6.2 System Walkthrough
6.2.1 System parameters setup
6.2.2 Training results display
6.3 Sensitivity Analysis of ANN mathematical model parameters
6.3.1 Learning rate
6.3.2 Momentum rate
6.3.3 Activation function
6.4 Concluding Remarks
Chapter 7: Testing
7.1 Chapter Overview
7.2 System Verification and Validation Overview
7.3 White Box testing
7.3.1 Unit testing
7.3.2 Regression testing
7.4 Black Box testing
7.4.1 System testing
7.5 Concluding Remarks
Chapter 8: Conclusion
8.1 General Conclusion
8.2 Reflection
8.3 Future Development
References
Appendix A
Appendix B
List of figures
Figure 1.1: Visual representation of project aim.
Figure 2.1: Biological neuron [8].
Figure 2.2: Nonlinear model of a neuron [9].
Figure 2.3: Plot of hyperbolic tangent and logistic activation functions.
Figure 2.4: Feed-forward topology.
Figure 3.1: System Context Diagram.
Figure 3.2: Use case diagram.
Figure 3.3: A hierarchical task analysis of the system functionalities.
Figure 4.1: Model-View-Controller Architectural Pattern.
Figure 4.2: Model-View-Presenter Architectural Pattern.
Figure 4.3: System Sequence Diagram.
Figure 4.4: System Class Diagram.
Figure 5.1: Backpropagation proof-of-concept prototype.
Figure 5.2: Neural network structure (1-3-3-1)
Figure 5.3: Compute neuron output.
Figure 5.4: Feed-forward propagation algorithm.
Figure 5.5: Backpropagation algorithm.
Figure 5.6: XML Neural Network object structure.
Figure 5.7: Model-View-Presenter system architecture.
Figure 6.1: Set function.
Figure 6.2: Set ANN parameters.
Figure 6.3: Set backpropagation parameters.
Figure 6.4: Training result window.
Figure 6.5: ANN mapping versus defined function.
Figure 6.6: Parameters setup tab.
Figure 6.7: Log Window.
Figure 6.8: Error evolutions at various learning rates.
Figure 6.9: Error evolutions at various momentum values.
Figure 6.10: Logistic vs Tanh activation: Training error evolution.
Figure 7.1: An example of test fixture setup attribute.
Figure A.1: CRC model.
Figure B.1: NUnit tests.
List of tables
Table 3.1: Stakeholders
Table 3.2: Functional requirements.
Table 3.3: Non-functional requirements.
Table 4.1: NeuralNetwork class - CRC card
Table 5.1: Main prototypes.
Table 6.1: Learning rate analysis results.
Table 6.2: Momentum analysis results.
Table 6.3: Activation functions analysis results.
Table 7.1: System tests.
INTRODUCTION
Chapter 1: Introduction
1.1 Overview
One of the major aims of the Artificial Intelligence field is to develop systems that are
capable of automatically modelling and learning complex relationships within data. Neural networks
represent an approach towards building such systems and their functionality is inspired by
the biological human brain. They are commonly used for classification, regression and
clustering tasks. This project focuses on the capability of a neural network mathematical
model to perform regression (or function approximation) on a set of inputs and outputs
generated from a user defined function. The following sections present the major aim of the
project and how this report is structured.
1.2 Project Aim
The aim of this project is to learn about the underpinning theory of artificial neural networks
(ANN) and to implement a system that trains an ANN to map a mathematical function. The
system must be able to build different neural network structures and evolve their mapping
knowledge using a supervised learning paradigm in which a set of patterns (inputs from the
domain and desired outputs from the function co-domain) are presented to the network. The
final goal of a trained network is to be able to accurately generalise or produce reasonable
responses for unseen inputs. Figure 1.1 illustrates a visual representation of the project aim.
Figure 1.1: Visual representation of project aim.
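The training patterns described in the project aim can be sketched as pairs of an input from the function domain and the desired output from the co-domain. The snippet below is only an illustration under stated assumptions: the helper name `make_patterns`, the even sampling of the interval and the choice of the sine function are hypothetical and are not taken from the implemented system.

```python
import math


def make_patterns(func, lower, upper, count):
    """Sample `count` evenly spaced inputs in [lower, upper] (count >= 2)
    and pair each input with the desired output func(x)."""
    step = (upper - lower) / (count - 1)
    inputs = [lower + i * step for i in range(count)]
    return [(x, func(x)) for x in inputs]


# Hypothetical user-defined function: sin(x) on [0, pi], sampled at 5 points.
patterns = make_patterns(math.sin, 0.0, math.pi, 5)
for x, target in patterns:
    print(f"input={x:.4f}  target={target:.4f}")
```

A trained network would then be judged on inputs that lie between the sampled points, which is the generalisation goal stated above.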
1.3 Report Structure
The remaining chapters of this report are structured as follows:
Chapter 2 Background: provides essential information related to the neural network
mathematical model and various supervised learning approaches employed in this project.
Chapter 3 Requirements: describes the requirements analysis process and provides a list of
formal and tabulated requirements.
Chapter 4 Design: presents the overall system design and an analysis of various architectural
patterns considered.
Chapter 5 Implementation: describes the algorithms implemented to train a neural network
and other technical implementation aspects.
Chapter 6 Results: highlights the results of the project through a system walkthrough and
an analysis which demonstrates the performance of the implemented neural network.
Chapter 7 Testing: describes the testing and evaluation methods employed throughout
development to ensure the application met its requirements.
Chapter 8 Conclusion: concludes the report by providing a reflection on the work done, as
well as discussing possible further plans for the system.
BACKGROUND
Chapter 2: Background
2.1 Chapter Overview
This chapter outlines the foundations of the research undertaken for this project. It begins
with a brief introduction of the broad Artificial Intelligence field and continues with a
description of one of its branches, Machine Learning, which is concerned with the study of
systems that can learn from data. Furthermore, the biological inspiration and the
mathematical theory of the Artificial Neural Networks model are described. The chapter
concludes with different learning algorithms which can be used to train a single or a multi-layer neural network.
2.2 An Introduction to Artificial Intelligence (AI)
In recent decades, a significant amount of effort has been put into developing technologies
that attempt to resemble the human thinking processes, such as reasoning, learning and
analysing. The Artificial Intelligence field is the branch of Computer Science which tries to
support the development of these technologies. From the variety of definitions of AI, one of
the most accepted and comprehensive is the one provided by Sage, which states that Artificial
Intelligence represents
“the development of paradigms or algorithms that require machines to perform cognitive
tasks, at which humans are currently better” [1].
A machine or program can be called intelligent if it is able to do three main things [2]:
1. Store knowledge;
2. Apply the accumulated knowledge to resolve problems; and
3. Acquire new knowledge through a learning process.
The standard test for evaluating whether a machine possesses intelligence similar to a human’s is the
Turing Test, first proposed by Alan M. Turing in 1950. He stated that a machine is
intelligent only if a person is not capable of distinguishing between the machine and a real
human. So far, no machine has been able to pass the Turing test, although many trials have been
conducted [3].
Nowadays, artificial intelligence systems are more employed than ever in everyday life. The
devices people use on a daily basis, such as mobile phones, cameras or tablets incorporate
intelligent technologies (such as voice and face recognition) which not only make our lives
better, but also greatly augment our human capabilities. Artificial Intelligence systems are
also used to enhance our security and provide us with answers when needed. Examples of this
are the pieces of software which are used by planes to land or the knowledge search engine
websites, such as “WolframAlpha”, which computes question answers from structured data.
Although our lives are currently surrounded by various intelligent systems, there are still
many surprisingly simple tasks (such as playing the board game Go) at which current AI
systems are unable to perform well.
2.3 An Introduction to Machine Learning
Machine learning is the subdivision of artificial intelligence that is mainly concerned with the
development of computer programs that are able to learn from experience to solve problems,
rather than being programmed in advance. Machine learning encapsulates fundamental
concepts from a varied set of disciplines, including probability and statistics, information
theory, physics, neurobiology and optimization [4]. Successful implementations of machine
learning algorithms are currently being used in a variety of tasks, including speech
recognition, optical character recognition, email spam detection, credit card fraud detection
and medical analysis and diagnosis [5].
The machine learning field encompasses a diverse set of learning paradigms. Major
approaches include Artificial Neural Networks (ANNs) and Support Vector Machines
(SVMs). While both approaches can be used for regression and classification, they have a
different theoretical foundation. Artificial Neural Networks are learning paradigms inspired
by the biological nervous system and their implementation is more heuristic, while
Support Vector Machines have a stronger theoretical foundation [6].
2.4 Artificial Neural Networks (ANNs)
An artificial neural network (also known as a neural network or connectionist model) is a
mathematical model inspired by the structure of the human brain that processes information
using a connectionist approach to solve specific problems. A neural network is composed of a
large number of simple processing units (neurons) and an internal representation which
describes the structure of the connections between the units. Similar to people, ANNs
acquire knowledge by example, as they go through a learning process in which a desired
objective is presented to the network. During learning, the connection strengths between the
neurons are adjusted so that the network’s final behaviour matches the expected one. At the end
of the learning process, the connection strengths between the network’s neurons encapsulate
the acquired knowledge. The main benefit of an artificial neural network model is its ability
to use the acquired knowledge to provide generalization; this refers to the neural network’s
capability to produce reasonable responses for inputs that were not used in the learning
process [2].
2.4.1 Biological background
The structure of the human brain and its techniques for solving problems were the source
of inspiration for neural network models. The human brain contains approximately 100
billion nerve cells or neurons and every neuron communicates with other neurons through
an estimated 10,000 or more synaptic connections. Although the events in biological neurons
are processed several orders of magnitude slower than in silicon logic gates, the brain, due to its large
number of neurons and the substantial number of connections between them, currently
surpasses in performance any human-made machine [2]. A biological neuron is the basic
processing unit which is fundamental to the operation of the human brain and consists of the
following three main components (illustrated in Figure 2.1):
1. Dendrites: are the neuron’s receptors which conduct the information or the electrical
“action potentials” received from other neurons to the cell body or soma;
2. Soma: represents the largest part of the nerve cell and contains the neuron’s nucleus,
which is the cell control centre. In the Soma, the information received from the
dendrites is processed and a response or a new electrical “action potential” is
generated, which is then sent to the axon [7]; and
3. Axons: are extensions of the neuron cell whose role is to conduct the
information to the dendrites of neighbouring neurons. The junctions between the axons
and the dendrites are called synapses and they are capable of transferring electrical
signals to other neurons very fast.
Figure 2.1: Biological neuron [8].
2.4.2 Artificial neurons
Artificial neurons are the basic building blocks of an artificial neural network and their
implementation and functionalities are derived from the biological neuron. The standard
model of an artificial neuron is illustrated in Figure 2.2 and its major elements are described
below:
1. Inputs: are equivalent to the biological dendrites and represent the information which
is fed to the neuron during the training process;
2. Weights: represent the connection strengths between neurons and are used to store the
knowledge from the training process. When a signal is propagated through the neuron,
each input is multiplied by its respective weight and this makes it possible for the
neuron to attribute different levels of importance to inputs;
3. Adder function: corresponds to the biological cell body and its responsibility is to
take all the inputs of the neuron and return their weighted sum. An additional
bias value can be added to the weighted sum in order to increase or lower the net
input to the activation function [2]; and
4. Activation function: takes the output of the adder function and maps it to a finite range
using a threshold step, linear or a non-linear (sigmoid) function.
Figure 2.2: Nonlinear model of a neuron [9].
The process for generating a response of an artificial neuron can be described by the
following equation:
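$$y_k = \varphi\left(\sum_{j=1}^{m} w_{kj} x_j\right) \tag{2.1}$$
where $x_1, \dots, x_m$ represent the neuron inputs; $w_{k1}, \dots, w_{km}$ the inputs-related weights of
neuron $k$ and $\varphi$ the activation function [2].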
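The response of a single artificial neuron, as described above, can be sketched in a few lines. This is a minimal illustration and not the system's implementation: the function name `neuron_output` and its parameters are hypothetical, and the optional `bias` argument corresponds to the additional bias value mentioned for the adder function.

```python
import math


def neuron_output(inputs, weights, activation, bias=0.0):
    """Weighted sum of the inputs (adder function) followed by the
    activation function. `bias` is optional and defaults to zero."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(net)


# Two inputs whose weighted sum is 0.5*1.0 + (-0.25)*2.0 = 0.0,
# passed through a tanh activation.
y = neuron_output([1.0, 2.0], [0.5, -0.25], math.tanh)
print(y)
```

Passing the activation as a function argument keeps the adder step independent of the chosen activation, so a step, linear or sigmoid function can be swapped in without changing the rest of the code.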
Non-linear (sigmoidal) activation
The non-linear (sigmoidal) activation functions in a multi-layer network help introduce non-linearity in the model by limiting (squashing) the output of a neuron to a specific range. For
the proposed system, two types of activation functions were analysed and implemented: the
logistic sigmoid and the hyperbolic tangent function (tanh).
1. The logistic function is mathematically described by the following equation:
$$\varphi(v) = \frac{1}{1 + e^{-av}} \tag{2.2}$$
where $e$ is Euler’s number and $a$ is the slope parameter of the logistic function.
Increasing the slope parameter would make the function curve steeper, while
decreasing its value would make the curve flatter. Regardless of the input, the output
of the logistic function is a number between 0 and +1.
2. The hyperbolic tangent function is mathematically described by equation 2.3. Its form
is inspired by the logistic function in equation 2.2, the only difference being that it
contains a linear transformation which makes it antisymmetric with respect
to the origin [2]. The output of the tanh function is always a number between -1 and
+1. The tanh function was implemented to test the empirical hypothesis that using a
tanh function leads to faster convergence of the training algorithm than the
logistic function [10]. Figure 2.3 illustrates graphically the difference between the
logistic and tanh functions.
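The relationship between the two activation functions can be checked numerically. The sketch below assumes the standard definitions of both functions: a logistic function with a slope parameter (here called `slope`, a hypothetical name) and the standard hyperbolic tangent, for which tanh(v) = 2·logistic(2v) − 1 holds when the slope is 1. It is an illustration only and does not reproduce the system's code.

```python
import math


def logistic(v, slope=1.0):
    """Logistic sigmoid with a slope parameter; output lies in (0, 1).
    Intended for moderate inputs; very large negative slope*v would
    overflow math.exp."""
    return 1.0 / (1.0 + math.exp(-slope * v))


def tanh_via_logistic(v):
    """tanh expressed as a scaled and shifted logistic function."""
    return 2.0 * logistic(2.0 * v) - 1.0


for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    # The two columns on the right agree up to floating-point rounding.
    print(v, logistic(v), tanh_via_logistic(v), math.tanh(v))
```

The scaling by 2 and the shift by −1 are the linear transformation that moves the output range from (0, 1) to (−1, +1) and makes the curve antisymmetric about the origin.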
Figure 2.3: Plot of hyperbolic tangent and logistic activation functions.
2.5 Learning paradigms
There are many approaches for training an artificial neural network. Two major learning
paradigms are supervised and unsupervised learning. In supervised learning a set of inputs
and their matching outputs patterns (targets) are provided to the network by an external
teacher, while in unsupervised learning only a set of unlabelled examples (without the output
mapping) is provided. Moreover, the aim of unsupervised learning algorithms is to
determine how the data can be organized based on their collective properties, while the aim
of a supervised learning algorithm is to determine the set of network parameters or weights
that minimise the error between the desired and the actual results. For this project, only
supervised learning methods were analysed and implemented. This section details in a
gradual manner, starting from simple algorithms to more complex ones, the supervised
learning algorithms that were researched and implemented in the system.
2.5.1 Perceptron
The Perceptron is one of the early models of a linear classifier that was based on the ANN
paradigm. The model was introduced by Frank Rosenblatt in 1957 and consisted of a single
artificial neuron that used a threshold step function as the activation function. This means the
output value of the perceptron can take only two possible values (e.g. 1 or -1) as illustrated in
equation 2.4.
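$$y = \begin{cases} +1 & \text{if } \sum_{i} w_i x_i \geq \theta \\ -1 & \text{otherwise} \end{cases} \tag{2.4}$$
where $w_i$ are the weights, $x_i$ the inputs and $\theta$ the threshold.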
The perceptron model may have many inputs and each input must have a related weight
(connection strength). During training, the input weights are adjusted based on the output error,
which is the difference between the desired and the actual perceptron output, each of which can be -1 or 1.
Equation 2.5 illustrates the weights update rule of the perceptron:
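$$w_i^{new} = w_i + \eta (t_j - y_j) x_{ij} \tag{2.5}$$
where $w_i^{new}$ is the new weight, $\eta$ is the learning rate and governs the step size of the update,
$(t_j - y_j)$ is the output error on pattern $j$ and $x_{ij}$ is the $i$-th input of the $j$-th
training pattern.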
The perceptron training process consists of a continuous iteration through the training
samples, in which every training pattern is evaluated; in case of an error in the output, all the
neuron’s weights are updated according to the learning rule defined in equation 2.5. The
learning process stops when all the training patterns are classified correctly. Before starting
the training process it is usually advised to initialise the weights and the threshold to small
random values. If the training data is linearly separable, the perceptron learning algorithm has
been proven to converge on a solution in a finite number of steps; however, if the data is not
linearly separable, the learning algorithm will iterate indefinitely through the training examples
and fail to find a solution [10]. Another limitation of the perceptron model is that its output
can take only two values, therefore the perceptron can only be used to distinguish between
two linearly separable classes.
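The training loop described above can be sketched as follows. This is a minimal illustration rather than the project's implementation: the function names, the bias weight standing in for the threshold, the learning rate and the epoch cap are all assumptions made for the example.

```python
import random

def step(activation):
    """Threshold step function: the output is either 1 or -1."""
    return 1 if activation >= 0 else -1

def predict(weights, inputs):
    # weights[0] is a bias weight that plays the role of the (negated) threshold.
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return step(total)

def train_perceptron(patterns, learning_rate=0.1, max_epochs=1000, seed=0):
    """Iterate over the patterns until every one of them is classified correctly."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    # Initialise the weights (and bias) to small random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for inputs, desired in patterns:
            error = desired - predict(weights, inputs)
            if error != 0:
                mistakes += 1
                # Update rule: w_i <- w_i + eta * (desired - actual) * x_i
                weights[0] += learning_rate * error
                for i, x in enumerate(inputs, start=1):
                    weights[i] += learning_rate * error * x
        if mistakes == 0:
            return weights, epoch  # all patterns classified correctly
    return weights, None  # epoch cap reached (e.g. data not linearly separable)

# Logic AND with targets encoded as -1 / 1 (linearly separable).
AND_PATTERNS = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
and_weights, and_epochs = train_perceptron(AND_PATTERNS)
print(and_weights, and_epochs)
```

The epoch cap is included only so that the sketch terminates on data that is not linearly separable, where the plain algorithm would never stop.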
The perceptron model limitations were first described in detail in 1969 by Minsky and Papert
in their book, “Perceptrons: An Introduction to Computational Geometry” [11]. As a result
of their research, the funding of the artificial neural network field sharply decreased in the
following years, especially in the USA; this is considered to have significantly set back
research in the neural network area [12].
2.5.2 Delta Rule
The Delta Rule (also known as the Least Mean Square rule) algorithm was introduced by
Widrow and Hoff in 1960. Its main idea is to adjust the weights (connections) of a network so
that the difference between the target and the obtained activation is minimised. This is
achieved by using the gradient descent technique to change the value of the weights to reduce
the network error (which is defined as a function of weights). For this project, the mean-square
error (MSE) was used to measure the network’s error and its formula is illustrated by
the following equation:
$$E = \frac{1}{N}\sum_{j=1}^{N} e_j^{2} \qquad (2.6)$$
where $N$ represents the number of training patterns and $e_j$ the difference between the
target and the actual activation on the $j$th training pattern. Unlike the perceptron algorithm, which
uses a threshold step function to compute the output, the Delta Rule algorithm disregards it.
In its simple form, the Delta Rule can be written as
$$\Delta w_i = \eta\,(d - a)\,x_i \qquad (2.7)$$
where $x_i$ represents the $i$th input (with $\eta$ the learning rate, $d$ the target and $a$ the
obtained activation). When the neurons use a sigmoid activation function, an
extra term, the derivative of the sigmoid, must be added as illustrated in the following equation
$$\Delta w_i = \eta\,(d - \sigma(a))\,\sigma'(a)\,x_i \qquad (2.8)$$
where $\sigma'(a)$ is the sigmoid derivative and $a$ the activation value.
The steps of the Delta Rule algorithm are similar to the perceptron algorithm defined in
section 2.5.1, the only differences being the weight update rule and the training stopping
criteria. While the training of the perceptron terminates when all the training patterns are
classified correctly, the Delta Rule training usually stops when the rate of change of the error
is sufficiently small or the MSE falls below a specific value [7]. A key advantage of using
the Delta Rule algorithm over the Perceptron is that the network output can map to any
number of values.
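As an illustration of the simple form of the Delta Rule and of the two stopping criteria, the sketch below trains a single neuron without a threshold function on the logic OR operation. The names, the bias weight, the learning rate and the tolerance values are assumptions chosen for the example and are not taken from the project.

```python
import random

def activation(weights, inputs):
    """Weighted sum of the inputs; weights[0] is a bias weight (an assumption)."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))

def mean_square_error(weights, patterns):
    """MSE over all patterns: average of the squared target/activation differences."""
    return sum((d - activation(weights, x)) ** 2 for x, d in patterns) / len(patterns)

def train_delta_rule(patterns, learning_rate=0.1, target_mse=0.01,
                     min_change=1e-8, max_epochs=10_000, seed=0):
    rng = random.Random(seed)
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(patterns[0][0]) + 1)]
    previous = current = mean_square_error(weights, patterns)
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        for inputs, desired in patterns:
            # Simple Delta Rule: delta_w_i = eta * (d - a) * x_i
            delta = desired - activation(weights, inputs)
            weights[0] += learning_rate * delta
            for i, x in enumerate(inputs, start=1):
                weights[i] += learning_rate * delta * x
        current = mean_square_error(weights, patterns)
        # Stop when the MSE is small enough or has (almost) stopped changing.
        if current <= target_mse or abs(previous - current) < min_change:
            break
        previous = current
    return weights, current, epoch

# Logic OR with targets encoded as -1 / 1.
OR_PATTERNS = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
or_weights, or_mse, or_epochs = train_delta_rule(OR_PATTERNS)
print(or_weights, or_mse, or_epochs)
```

Because the neuron is linear, the MSE on OR cannot reach zero; in this sketch training therefore ends through the rate-of-change criterion, and the sign of the activation still separates the two classes.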
2.5.3 Backpropagation and multi-layer nets
One of the most widely used supervised learning techniques for training multi-layer nets is
the Backpropagation algorithm. A multi-layer net represents an aggregation of at least two
artificial neurons and is able to solve more complex problems (such as non-linear
classification) than a single neuron. There are many different ways of connecting the neurons
in a network, but for this project only the feed-forward topology was analysed. In this
topology, the information can flow only in one direction, from the input neurons to the
outputs, as illustrated in Figure 2.4. An alternative topology is the recurrent network in which
the information can also flow in the opposite direction, forming directed cycles between the
neurons.
Figure 2.4: Feed-forward topology.
The Backpropagation algorithm represents a generalization of the Delta Rule (defined in
section 2.5.2) for multi-layer networks. The aim of the algorithm is still to minimise the
network error, defined as a function of the weights, by using the gradient descent technique.
The major difference is that the error function now depends on the weights from two
or more layers of neurons. While the error of a neuron in the output layer is easily
computed using the Delta Rule formula defined in equation 2.8, the error of a neuron in a
hidden layer is a function of the errors of all the neurons that use its output [13].
The Backpropagation learning algorithm can be split into three main phases:
1. Forward propagation: A training sample is propagated through the network, starting
from the input layer up to the output layer. Upon propagation, each neuron is
“activated” by computing its output based on the inputs received. The neurons in the
hidden layer use a sigmoidal activation function to introduce non-linearity in the
network. After a neuron from the output layer is activated, the network error (delta) is
computed by subtracting the neuron output from the expected target;
2. Backward propagation: The overall network error is propagated backwards,
distributing a part of the blame to each weight of every neuron; and
3. Weight update: Each neuron’s weights are updated based on their contribution to the
overall network output error.
All three phases are repeated until a specific value of the network MSE is achieved or
until a different termination condition is met (such as a maximum number of epochs).
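The three phases can be sketched for a network with one sigmoidal hidden layer as shown below. This is a hedged illustration, not the system's code: the dictionary-based network, the linear output neuron, the bias weights, the squared-error definition and all parameter values are assumptions made to keep the example short.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def make_network(n_inputs, n_hidden, seed=0):
    """One hidden layer and one output neuron; index 0 of each list is a bias weight."""
    rng = random.Random(seed)
    return {
        "hidden": [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                   for _ in range(n_hidden)],
        "output": [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)],
    }

def forward(net, x):
    """Phase 1: propagate a sample from the input layer to the output layer."""
    hidden = [sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)))
              for w in net["hidden"]]
    out = net["output"]
    output = out[0] + sum(w * h for w, h in zip(out[1:], hidden))  # linear output (assumed)
    return hidden, output

def gradients(net, x, target):
    """Phase 2: gradient of E = 0.5 * (target - output)^2 for every weight."""
    hidden, output = forward(net, x)
    delta_out = target - output
    # Each hidden neuron receives a share of the blame through its outgoing weight.
    delta_hidden = [delta_out * net["output"][j + 1] * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    grad_output = [-delta_out] + [-delta_out * h for h in hidden]
    grad_hidden = [[-d] + [-d * xi for xi in x] for d in delta_hidden]
    return grad_hidden, grad_output

def mean_square_error(net, samples):
    return sum((t - forward(net, x)[1]) ** 2 for x, t in samples) / len(samples)

def train(net, samples, learning_rate=0.05, target_mse=1e-3, max_epochs=2000):
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        for x, target in samples:
            grad_hidden, grad_output = gradients(net, x, target)
            # Phase 3: move every weight against its gradient.
            net["output"] = [w - learning_rate * g
                             for w, g in zip(net["output"], grad_output)]
            net["hidden"] = [[w - learning_rate * g for w, g in zip(ws, gs)]
                             for ws, gs in zip(net["hidden"], grad_hidden)]
        if mean_square_error(net, samples) <= target_mse:
            break  # target MSE reached before the epoch limit
    return epoch

# Map f(x) = x^2 on [-1, 1] (an example target function).
samples = [([i / 10.0], (i / 10.0) ** 2) for i in range(-10, 11)]
network = make_network(n_inputs=1, n_hidden=4)
epochs_run = train(network, samples)
print(epochs_run, mean_square_error(network, samples))
```

Whether the target MSE is reached depends on the initial weights and the parameters chosen, which is why the loop also carries an epoch limit.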
2.6 Concluding Remarks
This chapter presented the background research of the artificial neural network mathematical
model together with a list of supervised learning algorithms that would be implemented in the
system. In the next chapter, a clear set of requirements for the proposed system is identified.
REQUIREMENTS
Chapter 3: Requirements
3.1 Chapter Overview
The aim of this chapter is to establish the requirements of the proposed system. These requirements
emerged after going gradually through a series of requirements engineering processes, such
as Requirements Elicitation, Analysis and Specification. The chapter begins with the
introduction of the software development methodology chosen and continues with the
Requirements Elicitation and Analysis section (section 3.3), in which stakeholders, use cases and scenarios are identified. The
chapter concludes with the Requirements Specification (section 3.4) in which the functional
and non-functional requirements gathered from the elicitation and analysis phase are listed.
3.2 Software Development Methodology
Before commencing the development process, several development methodologies
were analysed. Major differences were found between the traditional waterfall lifecycle and
agile methodologies, mainly because waterfall assumes that all
requirements and a thorough design can be defined before programming. However, in
practice, a typical software project experiences a 25% change in requirements, which leads
to a high rate of defects, and possibly failures, in waterfall development. In contrast, agile
practices are backed by evidence of better productivity and lower defect rates [14].
For the proposed system, the agile Extreme Programming (XP) development methodology
was adopted, as it includes an approach to testing which increases the chances of discovering
errors after system changes. The key aspects of testing in XP are [15]:
1. Test driven development;
2. Incremental test development;
3. User involvement in test development, acceptance and validation; and
4. Use of automated testing frameworks.
Extreme Programming also incorporates popular agile practices that have been proven to
increase development productivity and reduce defect rates. These practices include: short
time-boxed iterations, evolutionary development, adaptive planning, incremental delivery and
other values that encourage agility [14].
3.3 Requirements Elicitation and Analysis
Discovering and understanding requirements from different system stakeholders represents a
challenging process. The requirements elicitation and analysis phase facilitates this process
through a specific set of activities. These activities are [15]:
1. Requirements discovery - refers to the interaction with the system stakeholders to
discover their requirements;
2. Requirements classification - refers to the collection of the requirements into
coherent clusters;
3. Requirements prioritization - concerned with how to prioritize requirements and
requirements conflicts through negotiation; and
4. Requirements specification - concerned with the documentation of the elicited
requirements.
The requirements elicitation and analysis activities were followed throughout the entire
system development process and they assisted in identifying and understanding the system
requirements. An important step in the requirements discovery activity is the identification of
the system stakeholders and their roles, which is covered in the following section.
3.3.1 Stakeholders Identification
System stakeholders are external entities or individuals which have a particular interest in a
system. Table 3.1 illustrates the stakeholders of the proposed project along with their title and
a description of their role.
Table 3.1: Stakeholders
| Stakeholder | Title | Role |
|---|---|---|
| Dr Richard Neville | Supervisor | Oversees and reviews the project development process. |
| Cosmin Spataru | Developer | Responsible for all the development aspects, such as design and implementation, of the system. |
| Dr Liping Zhao | Second Marker | Inspects and marks the project. |
| Training User | Virtual User | Constructs mathematical functions, training sets and neural network structures. Sets the training parameters and trains the constructed neural network. |
| Testing User | Virtual User | Tests the mapping accuracy of neural networks. |
3.3.2 System Context Diagram
A System Context Diagram represents a picture of the environment in which the system
should operate. It helps to define the boundary of a system and also to determine the
exchange of events with its external entities (actors) [15]. Figure 3.1 represents a simple
context model for the proposed application.
Figure 3.1: System Context Diagram.
3.3.3 Use Case Diagram
Use cases are an effective technique for eliciting and analysing functional (behavioural)
requirements from the system stakeholders. As software artefacts, use cases identify the
interactions between the system and its external actors [15]. A use case diagram summarises
the entire behaviour of a system and its actors, and provides a good overview of the system
context [14]. After analysing the system requirements, two types of actors were
identified: Training Users and Testing Users. Figure 3.2 illustrates the main use cases for
each actor identified and represents a simple overview of the system interactions. The
complex use cases were modelled using the scenario technique in the following section.
Figure 3.2: Use case diagram.
3.3.4 Scenarios
The scenarios are used to add more detail to complex use cases. A use case scenario
represents a detailed textual description of a user-system interaction session and identifies the
user intent and the flow of the actions performed within the interaction. In XP, scenarios are
also known as user stories [15]. This section will provide a scenario for the major use cases
identified in the Use case diagram in Figure 3.2.
Create new function
Scenario intent: Define a mathematical function.
Main flow of events: Users define a mathematical function using the wizard interface or the
setup parameters tab. A user needs to select the function type (polynomial, logarithmic or
trigonometric) first, and then set the function parameters and input range.
Set training sets’ sizes
Scenario intent: Create the training sets to be used for training a neural network.
Main flow of events: Users may modify the training sets by going to the setup parameters tab
and specifying different sizes for the learning sets. This will cause the system to reset the
training, validation and testing sets.
Create Neural Network (NN) structure
Scenario intent: Define an ANN structure.
Main flow of events: A user may define a new neural network structure from the setup
parameters tab, by setting up the number of hidden layers and the neurons per layer. The
system will generate a new ANN data structure and will randomise its knowledge. After
creation, the current network testing error should be displayed to the user.
Train
Scenario intent: Train a Neural Network.
Main flow of events: The user needs to set up the training regime parameters via the wizard
template or the setup parameters tab. All the parameters should have default values. During
training, the system will display the training progress and log the current results to a file and a
window. When training has finished, the system will output the training results error plot, the
ANN mapping and the network testing error. The system should allow the user to continue to
train the network with different parameters if required.
Test ANN mapping
Scenario intent: Test the ANN mapping by propagating a value throughout the network.
Main flow of events: Users may test a network mapping by performing a feed-forward on a
trained network with a value from the input range. The system should output the network
mapping response to the input.
Export Neural Network structure to file
Scenario intent: The user wants to save a trained network structure to a file.
Main flow of events: The user selects a path where the network structure is to be saved. The system
creates a new file in the selected path containing the trained network structure in a
human-readable format.
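The "Create new function" and "Set training sets’ sizes" scenarios above can be pictured with the short sketch below. It is only an illustration under stated assumptions: the function names, the parameter layout of each function type, the evenly spaced sampling of the input range and the random split are all hypothetical and do not describe how the system builds its learning sets.

```python
import math
import random

def make_function(kind, params):
    """Build a one-variable function from a type name and its parameters (hypothetical layout)."""
    if kind == "polynomial":
        # params are coefficients in increasing order of power: c0 + c1*x + c2*x^2 + ...
        return lambda x: sum(c * x ** k for k, c in enumerate(params))
    if kind == "trigonometric":
        amplitude, frequency = params
        return lambda x: amplitude * math.sin(frequency * x)
    if kind == "logarithmic":
        (scale,) = params
        return lambda x: scale * math.log(x)  # the input range must be strictly positive
    raise ValueError(f"unknown function type: {kind}")

def build_learning_sets(function, low, high, train_size, validation_size, test_size, seed=0):
    """Sample the input range evenly, then split the samples into three disjoint sets."""
    total = train_size + validation_size + test_size
    if total < 2:
        raise ValueError("at least two samples are needed to cover the input range")
    xs = [low + (high - low) * i / (total - 1) for i in range(total)]
    samples = [(x, function(x)) for x in xs]
    random.Random(seed).shuffle(samples)
    training = samples[:train_size]
    validation = samples[train_size:train_size + validation_size]
    testing = samples[train_size + validation_size:]
    return training, validation, testing

square = make_function("polynomial", [0, 0, 1])  # f(x) = x^2
training, validation, testing = build_learning_sets(square, -1.0, 1.0, 14, 3, 3)
print(len(training), len(validation), len(testing))
```

Changing any of the three sizes simply rebuilds all three sets, which mirrors the reset behaviour described in the scenario.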
3.4 Requirements Specification
In order to develop a reliable, extensible and flexible system, an explicit set of requirements
needed to be defined. This set of requirements was discovered through consultation and
analysis with the project stakeholders and it described the scope of the system, the services
required and the constraints on the system’s operation and development. Five major
categories, Functionality, Usability, Reliability, Performance and Supportability (FURPS), were used as a
model to classify the system’s requirements. Following common practice, the requirements
were also categorised as functional (behavioural) in Table 3.2 and non-functional in Table
3.3.
3.4.1 Functional Requirements
Functional requirements describe the services the system should provide and how the system
should behave in particular situations [16]. Because the development approach
was incremental and iterative, the functional requirements emerged in an evolutionary
fashion through a series of iterations, based on the constant feedback from the project
stakeholders. Table 3.2 illustrates the evolved list of functional requirements together with
their associated priority and risk. The priority attribute was used to identify the requirements
which had to be implemented first. A high priority requirement was tackled before a medium
one. The risk attribute was used to categorise the risk of implementing a requirement for the
whole system. A value of one represented a high risk requirement, while a value of three a
minimal risk.
3.4.2 Non-Functional Requirements
Non-functional requirements (NFRs) are the constraints on the services and functions offered
by a system [15]. They represent a list of qualitative attributes such as the usability,
reliability, maintainability or supportability, which may determine the architecture of a
system. Table 3.3 illustrates the list of non-functional requirements identified for this project
together with their corresponding FURPS category and a priority attribute.
Table 3.2: Functional requirements.
| FR Identifier | Description | Priority | Risk [1-3] |
|---|---|---|---|
| FR1.1 | The user should be able to input different types of polynomial, trigonometric or logarithmic mathematical functions and their input range in the system. | Medium | 3 |
| FR1.2 | The user should be able to build a neural network structure using a graphical interface or by importing the structure from a file. | Medium | 2 |
| FR1.3 | The user should be able to set the size of the training, validation and testing sets. | Medium | 3 |
| FR1.4 | The system should automatically generate the training, validation and testing sets using the user defined mathematical function. | Medium | 1 |
| FR1.5 | The user should be able to set the following parameters of the Backpropagation learning algorithm: learning rate, momentum, sigmoid slope, maximum error to be achieved, maximum epoch cycles and activation function. | High | 2 |
| FR2.1 | The user should be able to train a neural network to map a function using the Backpropagation learning algorithm. | High | 1 |
| FR2.2 | During training, the system should display the current progress. | Low | 3 |
| FR3.1 | After training has finished, the system should plot the evolution of the training and validation error on a chart. | Medium | 2 |
| FR3.2 | The user should be able to see through an interactive graph how the neural network adapts to a defined mathematical function. | Medium | 2 |
| FR3.3 | The system should output the training results, such as the testing error and epoch achieved. | Medium | 3 |
| FR4.1 | The system should log the training feedback within a log file and a separate log window. | Low | 3 |
| FR4.2 | The user should be able to test the mapping of the neural network. | Medium | 2 |
| FR4.3 | The user should be able to save a neural network structure to a file. | Low | 2 |
Table 3.3: Non-functional requirements.
| NFR Identifier | Description | Category | Priority |
|---|---|---|---|
| NFR1.1 | The user interface should be displayed in an intuitive and accessible way. Complex input logic (function mapping creation) should be displayed in a sequential manner through a series of well-defined steps (wizard style). | Usability | High |
| NFR1.2 | The system should display error messages when incorrect input is provided. | Usability | Low |
| NFR1.3 | The graphical user interface should be responsive during the training process. | Usability | Medium |
| NFR2.1 | The number of system crashes should be minimal. | Reliability | High |
| NFR3.1 | Increasing or extending the functionality of the software components should be done in an easy manner. | Supportability | Medium |
3.4.3 Hierarchical Task Analysis
Hierarchical Task Analysis (HTA) represents a technique which uncovers a system’s
conceptual model from the perspective of a user. An HTA diagram is a hierarchical model
which specifies the overall goal of a system, together with the sub-tasks or operations
required for achieving the specified goal. The model is used to identify the flow of events and
the conditions and constraints which lead to the overall goal being fulfilled [17]. A diagram
which depicts the overall goal of the proposed system is illustrated in Figure 3.3. This
diagram is part of the system specification and represents a possible flow of actions to train a
network to map a function.
Figure 3.3: A hierarchical task analysis of the system functionalities.
3.5 Concluding Remarks
This chapter focused on describing a set of formal requirements for the proposed system. The
next chapter presents the design techniques employed and the system conceptual model built
based on the requirements established.
DESIGN
Chapter 4: Design
4.1 Chapter Overview
This chapter presents the design artefacts of the proposed system, which were built based on
the requirements identified in Chapter 3. First an analysis of the considered architectural
patterns is presented, followed by a discussion of the various object-oriented techniques used
to build the system’s static and dynamic models. Finally, the system class diagram illustrating
the static structure of the system is presented, together with a detailed description of it.
4.2 Architectural Patterns
Architectural patterns refer to a set of known principles which provide help and guidance in
defining the architecture for a family of systems. They capture good practice in system
organisation which has been successfully tried and tested in different environments [15].
For the proposed system, several architectural patterns were analysed. This section describes
these patterns, focusing on the chosen one.
4.2.1 Model-View-Controller paradigm
The Model-View-Controller architecture separates the user interaction and presentation from
the system data. This is achieved by dividing the system into three logical components
(Model, View and Controller) and defining the interactions between them. Firstly, the Model
component represents the application or business logic and controls the system data and its
associated operations. When there is a change in its state, the Model component must
directly notify its associated Views. Secondly, the View component defines and controls how the
information from the model is presented. Finally, the Controller component manipulates and
interprets the user interactions by sending commands either to the View or to the Model
component [15]. The MVC architecture decouples the Model from the View component by
establishing a publish-subscribe relationship between them. A benefit of this separation is the
ability to attach multiple views to a model without having to implement any changes into it
[18]. Figure 4.1 presents the organisation of the MVC pattern.
Figure 4.1: Model-View-Controller Architectural Pattern.
4.2.2 Model-View-Presenter paradigm
The Model-View-Presenter architectural pattern represents a derivative of the MVC pattern,
aimed at providing a better separation of concerns between the components [19]. The system
is similarly divided in three logical components as in MVC, but with different
responsibilities. The Model component represents the business logic to be acted upon;
nevertheless, when there is a change in its state, the Model notifies the Presenter, not the
View, as in MVC. The View in MVP routes the user events to be handled by the
Presenter and does not contain any presentation logic. The Presenter in MVP is considered to
be the middle-man between the View and Model; additionally, it also incorporates the
presentation logic. Since there is no interaction between the Model and the View, the system
achieves looser coupling, which facilitates testing procedures and the implementation of
different user interfaces. Figure 4.2 presents how the components in MVP interact.
Figure 4.2: Model-View-Presenter Architectural Pattern.
4.2.3 Chosen Architectural Pattern
Since the proposed system involved implementing a dynamic neural network structure and a
complex training algorithm (which involved computationally intensive calculations), the
testing phase represented an important process during development. Therefore, after the
analysis of both architectural patterns, it was decided that the new system would be built
using the Model-View-Presenter pattern because it facilitated the writing of unit tests.
Another benefit of choosing this pattern was the ability to easily change and add new user
interfaces. MVC was rejected due to the direct dependency (notify events)
between the Model and the View components, which increases the complexity of running
unit tests.
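The testing benefit of Model-View-Presenter can be illustrated with a small sketch in which a passive fake view stands in for the graphical interface. All class and method names here are hypothetical and chosen for the example, and the "training" loop is a placeholder rather than a real learning algorithm.

```python
class TrainingModel:
    """Model: holds the business logic and knows nothing about any view."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def train(self, epochs):
        for epoch in range(1, epochs + 1):
            error = 1.0 / epoch  # placeholder for a real training step
            for notify in self._listeners:
                notify(epoch, error)


class FakeView:
    """Passive view: forwards user events and records what it is told to display."""

    def __init__(self):
        self.lines = []
        self.on_start = None  # set by the presenter

    def show_progress(self, text):
        self.lines.append(text)

    def click_start(self):
        self.on_start()  # simulate the user pressing a start button


class Presenter:
    """Presenter: middle-man that handles view events and formats model updates."""

    def __init__(self, model, view):
        self.model = model
        self.view = view
        view.on_start = self.start_training
        model.subscribe(self.on_progress)

    def start_training(self):
        self.model.train(epochs=3)

    def on_progress(self, epoch, error):
        # Presentation logic lives here, not in the view.
        self.view.show_progress(f"epoch {epoch}: error {error:.3f}")


view = FakeView()
presenter = Presenter(TrainingModel(), view)
view.click_start()
print(view.lines)
```

Because the view contains no logic, a unit test can drive the presenter through the fake view and inspect the recorded lines without opening any window.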
4.3 Class Responsibility Collaboration model
The Class Responsibility Collaboration (CRC) cards are a popular Object Oriented (OO)
design technique which provides a useful and simple way of exploring the interactions
between the objects of a system [14]. A CRC model represents a set of textual CRC cards
where each card specifies a system class together with its responsibilities and collaborations.
A class responsibility refers to what the class knows and does, while the collaborations refer
to the interactions with other system classes. In eXtreme Programming (XP), the CRC cards
are used prominently as an OO design technique [20].
For the proposed system, the CRC model was used in the early stages of development, to
justify the existence of software classes in the system. Table 4.1 illustrates a CRC card from
the CRC model representing the NeuralNetwork system class. The rest of the CRC cards
are listed in Appendix A.
Table 4.1: NeuralNetwork class - CRC card
Class: NeuralNetwork
Responsibilities: Reset network knowledge; Build network layer
Collaborators: NetworkLayer; BackPropagation; FMPresenter
4.4 System Sequence Diagram
System Sequence Diagrams (SSDs) are a valuable dynamic modelling artefact which
illustrates actor interactions and the operations initiated by their actions. They facilitate the
analysis of system behaviours in response to particular external events. A single system sequence
diagram depicts the flow of events in one particular use case [15].
For the proposed system, SSDs were drawn for each significant use case identified in
order to understand and precisely identify the external system events and their related
behaviour. Figure 4.3 represents a sequence diagram that illustrates what happens in the
system when the user wants to train a neural network. A detailed description of
the flow of events follows:
1. The training user clicks on the “Start Learning” button from the user interface. The
user interface sends the event to the FMPresenter object which contains the logic to
handle the event;
2. The FMPresenter triggers the set parameters event in the BackPropagation
(BackProp) object, which in turn communicates with the LogPresenter to start the
logging process. Then, the BackProp object gets the training sets from the LearningSets
object and starts the training process;
3. During the training process the BackProp object updates the knowledge (weights) of
the NeuralNetwork object;
4. When the training process ends, the BackProp object returns the results back to the
FMPresenter together with the trained NeuralNetwork;
5. The FMPresenter interprets the results and sends an event to the GraphPlotter, which
contains the logic to plot the result charts; and
6. The LogPresenter returns the log data to the FMPresenter, which in turn displays the
results to the user using the graphical user interface.
Figure 4.3: System Sequence Diagram.
4.5 System Class Diagram
Class Diagrams are structural design diagrams which show the organisation of a system in
terms of its components and their relationships. They are widely used in object-oriented
modelling to illustrate the classes in a system, together with their responsibilities (attributes
and methods) and associations. A System Class Diagram represents the closest architectural
artefact which can be used to translate the design into an object oriented language code [14].
The class diagram in Figure 4.4 illustrates the core part of the overall system diagram. The
user interfaces were omitted from the Class Diagram, as new GUIs can be easily attached
to and detached from the system as a consequence of the Model-View-Presenter architectural
pattern implemented.
As presented in the diagram, most of the system classes collaborate with the FMPresenter.
The role of the FMPresenter in the system is to coordinate the interactions between the
software classes in order to respond accordingly to the user events. To reduce the code
complexity and promote low coupling and high cohesion, some of the application logic from
the main FMPresenter class was transferred to other pure fabrication classes (which were not
in the problem domain), such as the GraphPresenter and the LogPresenter class.
For the implementation of an ANN structure, several classes were used, following the
separation of concerns principle. As can be seen, the NeuralNetwork class is an
aggregation of NetworkLayer objects and it is only concerned with the creation and
initialization of NetworkLayer objects. The NetworkLayer class is an aggregation of Neuron
objects, and it is only concerned with the creation and initialization of Neuron objects. The
Neuron class contains the most important information of the neural network structure, the
knowledge (weights); hence, the Neuron class incorporates the logic to manipulate this data,
such as to compute activation values or reset weights.
Figure 4.4: System Class Diagram.
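The aggregation described above (a network made of layers, and layers made of neurons that own the weights) can be sketched as follows. The class names follow the report, but every method name, the bias weight, the sigmoid activation and the feed-forward helper are assumptions made for illustration and are not taken from the system's source code.

```python
import math
import random

class Neuron:
    """Owns the knowledge (weights) and the logic that manipulates it."""

    def __init__(self, n_inputs, rng):
        self._n_inputs = n_inputs
        self._rng = rng
        self.weights = []
        self.reset_weights()

    def reset_weights(self):
        # weights[0] is a bias weight (an assumption of this sketch).
        self.weights = [self._rng.uniform(-0.5, 0.5) for _ in range(self._n_inputs + 1)]

    def activate(self, inputs):
        total = self.weights[0] + sum(w * x for w, x in zip(self.weights[1:], inputs))
        return 1.0 / (1.0 + math.exp(-total))  # sigmoid activation


class NetworkLayer:
    """Only concerned with creating and initialising its Neuron objects."""

    def __init__(self, n_neurons, n_inputs, rng):
        self.neurons = [Neuron(n_inputs, rng) for _ in range(n_neurons)]


class NeuralNetwork:
    """Only concerned with creating and initialising its NetworkLayer objects."""

    def __init__(self, layer_sizes, seed=0):
        rng = random.Random(seed)
        # layer_sizes = [inputs, hidden..., outputs]; the input layer holds no neurons.
        self.layers = [NetworkLayer(size, previous, rng)
                       for previous, size in zip(layer_sizes, layer_sizes[1:])]

    def reset_knowledge(self):
        for layer in self.layers:
            for neuron in layer.neurons:
                neuron.reset_weights()

    def feed_forward(self, inputs):
        # Helper added for the example: pass the values through each layer in turn.
        values = list(inputs)
        for layer in self.layers:
            values = [neuron.activate(values) for neuron in layer.neurons]
        return values


network = NeuralNetwork([1, 3, 1])
print(network.feed_forward([0.5]))
```

Keeping the weights inside Neuron means that resetting the network knowledge is a simple delegation down the aggregation chain.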
4.6 Concluding Remarks
This chapter outlined the object-oriented techniques used to create a conceptual model of the
system. The next chapter focuses on the implementation aspects of the system.
IMPLEMENTATION
Chapter 5: Implementation
5.1 Chapter Overview
This chapter provides details about the technical implementation aspects of the system. It
begins with an overview of the technologies used and the prototypes developed throughout
the project, followed by algorithms implemented in the current system. Other key features
implemented such as system architecture, multithreading and object serialization are also
discussed within this chapter.
5.2 Development Technologies
To facilitate the implementation of the system various technologies and tools were used. This
section details the programming languages, the development environments and the
application programming interfaces used.
5.2.1 Programming languages
A programming language plays a vital role in the successful implementation of a system.
Since the design of the proposed system was based on OO design principles, an
object-oriented programming language had to be used. C# and Java represented the main options,
but C# was chosen for the following reasons: the author had previous experience in
working with it, and C# supports the use of the DynamicDataDisplay open source library,
which contains graphical display optimisations and facilitates the creation of interactive
graphs using dynamic data. Microsoft Visual Studio 2010 was chosen as the
development environment for building the application because of its extensive set of tools
and features (such as IntelliSense, debugging tools and a XAML editor) and the support for
integrating unit tests.
5.2.2 Application Programming Interfaces
Microsoft’s .NET Framework facilitated the implementation of the system through its
support for creating graphical user interfaces. The .NET Framework 3.0 and its subsequent
releases contain the Windows Presentation Foundation (WPF) API which supports the
separation of the user interfaces from the business logic by using an XML-based language
(XAML) to define and link various user interface elements.
The DynamicDataDisplay (v0.3) API is an open source library which provides a set of WPF
controls for dynamic data visualisation [21]. This library was used to create dynamic charts
which displayed the results from the training process and to provide interactive chart
manipulation tools, such as zooming.
The NUnit 2.0 API is an open source framework which provides a set of functionalities for
writing and running unit tests. The NUnit library was used to create separate unit test cases
for the system’s class files, while the NUnit GUI was used to run the implemented test cases.
The IDE utilised for this project (Microsoft Visual Studio) closely integrated with NUnit to
run the tests and display the result inside the IDE.
5.3 Prototyping
Since the development approach was agile and flexible, prototyping was used to support the
incremental and evolutionary development of the system and help with the understanding of
the problem domain. Before commencing the implementation of the system, all the work was
split into time-boxed iterations, at the end of which a working prototype was produced. For
each planned prototype a number of functional and non-functional requirements (specified in
section 3.4) were allocated. For clarity, the implemented prototypes were categorised into
two types: system prototypes and proof-of-concept prototypes. A system prototype
represented an early instance of a system that could be improved in the subsequent iterations.
A proof-of-concept prototype represented an experimental model which was used to clarify
some of the functional requirements. The idea of experimental model prototyping is to
write a minimal amount of code to clarify aspects of the problem domain and to
understand specific design implementations [22]. Table 5.1 lists the main prototypes
developed, together with their category type and the functional and non-functional
requirements allocated to each. The following sections detail the particularities of the major
prototypes implemented.
Table 5.1: Main prototypes.
No | Milestone            | Type             | FR/NFR        | Interface
1  | Perceptron           | Proof-of-concept | FR2.1         | Command line
2  | Delta Rule           | Proof-of-concept | FR2.1         | Command line
3  | Backpropagation      | Proof-of-concept | FR2.1         | GUI
4  | Core architecture    | System prototype | FR1.1 – FR2.1 | GUI
5  | Dynamic results plot | System prototype | FR3.1 – FR3.3 | GUI
6  | Wizard GUI           | System prototype | NFR1.1        | GUI
7  | Serialisation        | System prototype | FR4.3         | GUI
8  | Multithreading       | Final system     | FR2.2, NFR1.3 | GUI
5.3.1 Proof-of-concept prototypes
Since the artificial neural network field is complex, three proof-of-concept prototypes were
implemented in the early stages of the development process. Each was built to test specific
concepts and so facilitate the understanding of the neural network field. The amount of code
written for a proof-of-concept prototype was minimal because it was not intended to be used
in the final system. Moreover, all the proof-of-concept prototypes were built to fulfil only the
high priority and high risk requirement FR2.1 (Table 3.2), which referred to the implementation
of the Backpropagation algorithm. The proof-of-concept prototypes were developed in order of
increasing complexity, starting from simple learning algorithms and evolving to more complex ones.
The first proof-of-concept prototype encapsulated the simple Perceptron training algorithm
(section 2.5.1) and was used to train a single neuron to learn and map the logic AND
operation with two inputs. The second proof-of-concept prototype represented the
implementation of the Delta Rule algorithm (section 2.5.2) and was used to train a single neuron
to map the logic OR operation. Because they train a single neuron, the first two algorithms were
unable to learn functions that are not linearly separable, such as the logic XOR operation. The third prototype encapsulated the
implementation of the multi-layer network and the Backpropagation algorithm. By using a
non-linear activation function for each neuron, the ANN was able to learn the logic XOR
operation. To further understand how the backpropagation learning algorithm works, research
was undertaken to investigate the sensitivity of the training parameters. To facilitate and
assist the sensitivity analysis process, a graphical user interface was implemented for the third
prototype. Figure 5.1 shows a snapshot of the Backpropagation proof-of-concept prototype.
The first part of the GUI is concerned with the definition of a binary logic function, the
second part with the definition of the network parameters and the third part with the testing of
the neural network mapping.
Figure 5.1: Backpropagation proof-of-concept prototype.
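The behaviour of the first proof-of-concept prototype can be illustrated with a minimal Python sketch (the prototypes themselves were not written in Python, and the function names, learning rate and epoch count below are illustrative assumptions). A single threshold neuron trained with the Perceptron rule learns the AND operation, but cannot classify all four XOR patterns correctly.

```python
def train_perceptron(samples, epochs=50, learning_rate=0.1):
    """Train a single two-input threshold neuron with the Perceptron rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
            error = target - output
            # Weights only change when the neuron misclassifies the pattern.
            weights[0] += learning_rate * error * x1
            weights[1] += learning_rate * error * x2
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, x1, x2):
    """Return the binary response of the trained neuron."""
    return 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_weights, and_bias = train_perceptron(AND)
and_learned = all(predict(and_weights, and_bias, *x) == t for x, t in AND)

xor_weights, xor_bias = train_perceptron(XOR)
xor_learned = all(predict(xor_weights, xor_bias, *x) == t for x, t in XOR)

print("AND learned:", and_learned)
print("XOR learned:", xor_learned)
```

XOR stays out of reach however long the neuron is trained, because no single straight line separates its two classes; this is what motivated the multi-layer third prototype.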
5.3.2 System prototypes
After the Backpropagation training logic was clarified, the work on the final system started.
Each system prototype implemented represented an early model of the ultimate system and
gradually incorporated the system requirements. The priority and risk attributes
from the system’s requirements specification (Table 3.2) had a major role in deciding which
functionalities were implemented first. High priority and high risk features (such as the
Backpropagation algorithm) were implemented in the first prototypes, while low priority and
low risk features were left for later.
The first three system prototypes incorporated the core parts, the high priority features and
some of the usability requirements of the proposed system. This included the Model-View-Presenter
system architecture, the ANN data structure, the Backpropagation training
algorithm and various user interfaces. After refactoring the code to improve its readability
and reduce its complexity, the system was used as a model to be further developed in the fourth
prototype.
Since the project work was ahead of schedule, an addition to the standard
Backpropagation algorithm was analysed and implemented in the fourth prototype. This was an
experimental implementation and was not planned in the system requirements specification. The
experiment consisted of adding a separate learning rate for each of the weights in a network,
adapting their values during the training process and using the learning rates to update the
weights. This technique is known as the “Delta-bar-delta” algorithm. The idea
is to accelerate the learning process by increasing the learning rate of a weight if, in two or more
successive iterations, the network is heading in the same direction (the sign of the partial
derivative of the error function with respect to that weight has not changed) and to decrease it
otherwise [23]. A drawback of this technique was that new constant parameters had
to be set in order to control the magnitude of the learning rate changes, and their values could
depend on the input given to the network [23]. The training results of the experiment were
not as conclusive as expected and did not significantly improve the speed of training. A
possible reason for this might be the insufficient analysis of the set-up of the extra parameters
required. Because of the inconclusive results and the fact that the new addition increased the
system complexity and coupling (for each weight, a new learning rate was added), it was
decided to discard this technique in the final prototype.
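The per-weight learning rate idea can be sketched in Python as follows. This is a hedged illustration of the commonly described Delta-bar-delta rule, not the code of the fourth prototype: the constants kappa (additive increase), phi (multiplicative decrease) and theta (averaging factor), their values, and the toy error surface are all assumptions chosen for the example.

```python
def delta_bar_delta_step(weights, rates, averages, gradient,
                         kappa=0.01, phi=0.5, theta=0.7):
    """Apply one in-place update; every weight carries its own learning rate."""
    for i, g in enumerate(gradient):
        agreement = averages[i] * g
        if agreement > 0:
            # Same direction as the recent average gradient: speed up.
            rates[i] += kappa
        elif agreement < 0:
            # The gradient sign has flipped: slow down.
            rates[i] *= 1.0 - phi
        weights[i] -= rates[i] * g
        # Exponential average of past gradients (the "bar" in the name).
        averages[i] = (1.0 - theta) * g + theta * averages[i]


# Toy error surface E(w) = w0^2 + 10 * w1^2, used only for illustration.
weights = [1.0, 1.0]
rates = [0.02, 0.02]
averages = [0.0, 0.0]
for _ in range(200):
    gradient = [2.0 * weights[0], 20.0 * weights[1]]
    delta_bar_delta_step(weights, rates, averages, gradient)

print("weights:", weights)
print("learning rates:", rates)
```

The sketch also makes the drawback mentioned above visible: three extra constants must be chosen, and values that suit one error surface may suit another poorly.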
The final prototype was concerned with the separation of the training thread from the user
interface thread, validation of the user input and finding the best values for the training
parameters after undertaking a sensitivity analysis. Moreover, although not in the
requirements, an alternative to the logistic activation function was added to the system in
order to analyse the role of an activation function in the training process. This newly introduced
function is the hyperbolic tangent function, which under various training regimes
converged faster than the logistic activation. All the system functional
and non-functional requirements defined in section 3.4 were fulfilled in the final prototype.
5.4 Algorithmics
5.4.1 Artificial Neural Network
An ANN structure was modelled as an aggregation of layers, each of which was modelled as
an aggregation of neurons. The neuron model is mainly composed of a set of inputs, their
related weights and an activation function which is used to determine the neuron output.
Furthermore, the implemented network model made use of two types of neurons:
1. A linear neuron, which is used in the network output layer and has no activation
function, so as not to restrict the output value; and
2. Non-linear neurons in the hidden layers, which contain a sigmoidal activation
function, in order to introduce nonlinearity in the network.
Figure 5.2: Neural network structure (1-3-3-1)
Figure 5.2 illustrates a standard network structure with two hidden layers
and three neurons per layer, emphasising where the linear neuron was used. Figure 5.3
illustrates how the output of a single neuron was computed. Regardless of the user definition,
the standard ANN structure was restricted to contain only one neuron in the input layer and
one linear neuron in the output layer. The reason for this is that the training sets used
consisted of single input-output pairs.
Figure 5.3: Compute neuron output.
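The neuron computation described above can be sketched in Python as a weighted sum of the inputs plus a bias, optionally passed through an activation function. The function names are illustrative, and the form of the logistic function (the slope ρ multiplying the net input) is an assumption rather than a definition taken from the system.

```python
import math


def logistic(x, rho=1.0):
    """Logistic sigmoid; rho is assumed to scale the net input (slope)."""
    return 1.0 / (1.0 + math.exp(-rho * x))


def compute_neuron_output(inputs, weights, bias, activation=None):
    """Weighted sum plus bias; a linear neuron passes activation=None."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return net if activation is None else activation(net)


# A non-linear hidden neuron and a linear output neuron (illustrative values).
hidden_output = compute_neuron_output(
    [0.5], [0.8], 0.1, activation=lambda v: logistic(v, rho=0.7))
linear_output = compute_neuron_output([0.2, 0.4], [1.0, -1.0], 0.5)

print(hidden_output, linear_output)
```

The hidden neuron's response is always squashed into the interval (0, 1), whereas the linear neuron can return any real value, which is why the latter is used in the output layer.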
5.4.2 Feed-forward propagation
Feed-forward refers to the propagation of an input through an ANN, from the input layer to
the output layer, in order to compute the network response to the input provided. Feed-forward propagation is used throughout training to compute the training pattern error
and update the weights. Figure 5.4 represents the pseudo-code of the algorithm implemented
to propagate an input through the network. The algorithm activates each neuron in a layer by
computing its output and then the results are fed as inputs for the neurons in the next layer.
The output obtained by the neuron in the last layer is considered to be the neural network
response to the input provided.
function propagateForward(trainingSample, network)
    for each neuron in firstlayer(network) do
        initialiseNeuronWithInput(network, trainingSample)
    end for
    for each layer in network do
        for each neuron in layer do
            output ← computeNeuronOutput(neuron, useLogisticActivation)
            if type(layer) != outputLayer then
                add output as input for each neuron in next layer
            else
                compute error by subtracting output from target
            end if
        end for
    end for
    return output
end function
Figure 5.4: Feed-forward propagation algorithm.
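As a hedged illustration of the algorithm in Figure 5.4, the following Python sketch propagates a single input through a small network. The data layout (a list of layers, each a list of (weights, bias) pairs) and the treatment of the input layer as a pass-through are simplifying assumptions, not the system's data structure.

```python
import math


def feed_forward(network, x, rho=1.0):
    """Propagate one input value through the layers and return the response.

    network: list of layers; each layer is a list of (weights, bias) pairs.
    Hidden layers use a logistic activation, the last layer is linear.
    """
    signal = [x]  # the single input is passed straight to the first layer
    last = len(network) - 1
    for index, layer in enumerate(network):
        nets = [sum(w * s for w, s in zip(weights, signal)) + bias
                for weights, bias in layer]
        if index == last:
            signal = nets  # linear output neuron
        else:
            signal = [1.0 / (1.0 + math.exp(-rho * n)) for n in nets]
    return signal[0]


# A 1-2-1 structure with hand-picked weights (illustrative values only).
network = [
    [([0.0], 0.0), ([0.0], 0.0)],   # hidden layer: two logistic neurons
    [([1.0, 1.0], 0.0)],            # output layer: one linear neuron
]
response = feed_forward(network, 3.0)
print(response)
```

With all hidden weights set to zero, each hidden neuron outputs 0.5 regardless of the input, so the linear output neuron returns their sum.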
5.4.3 Backpropagation
The Backpropagation algorithm was the learning algorithm implemented in the final system
for network training. The aim of the algorithm is to update the network knowledge (weights)
so that for every training pattern provided a value close to the desired value is returned by the
network. In order to avoid overfitting and reach a better generalisation of the network
mapping, the training data was split into three types of sets:
1. The training set was solely used to train the ANN. During training, the learning
algorithm tried to minimise the error on this set;
2. The validation set was used to evaluate the ANN mapping on unseen patterns. When
the error on this set fell below the specified maximum error, the training stopped; and
3. The testing set was used after the training had stopped in order to check the overall
generalisation or predictive power of the network.
Figure 5.5 shows the pseudo-code of the implemented algorithm, which was derived from the
mathematical description provided in section 2.5.3. The first step of the Learn function is to check
whether the training should start with a new network or should continue from a previously
saved network state. In the case of a new network, the weights are reset and the current epoch is
set to zero. The second step contains the Backpropagation training body, which iterates
through the training samples until the desired validation error or the
maximum epoch is reached. An epoch represents one pass through all the samples in
the training set. Within an epoch, each training sample goes through three main phases, namely a forward, a
backward and a weight update phase. The forward phase represents the propagation of a
training input through the network. The second phase distributes the network error (assigns
blame) to each of the weights in the network. The third phase updates the network weights
based on their contribution to the network output error. The applyMomentum function adds to
each weight update a fraction of the previous update in order to speed up
the training. Once the iteration process ends, the predictive power of the network on unseen patterns
is checked by computing the network’s responses on the testing set. A low error value would
indicate that a good generalisation has been achieved.
function Learn(continueTraining)
    if continueTraining != true then
        epoch ← 0
        resetWeights(network)
    end if
    while epoch < maxEpoch do
        randomise(trainingSet)
        for each sample in trainingSet do
            network ← propagateInputsForward(network, sample)        » Forward
            network ← propagateErrorBackward(network)                » Backward
            network ← applyMomentum(network)
            network ← updateNetworkUsingGradientDescent(network)     » Update
        end for
        trainingError(epoch) ← computeError(network, trainingSet)
        validationError(epoch) ← computeError(network, validationSet)
        if validationError(epoch) < maxError then
            break while
        end if
        epoch ← epoch + 1
    end while
    testingError ← computeError(network, testingSet)    » Check network prediction power
end function
Figure 5.5: Backpropagation algorithm.
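The training loop of Figure 5.5 can be sketched in Python for a network with one hidden layer. This is a minimal illustration under stated assumptions, not the system's implementation: the single hidden layer, the target function sin(x), the parameter values and all names are chosen only to keep the example short.

```python
import math
import random


def learn(training_set, validation_set, testing_set, hidden=4,
          learning_rate=0.05, momentum=0.3, max_epoch=300,
          max_error=0.001, seed=0):
    """Train a 1-hidden-1 network (logistic hidden layer, linear output)."""
    rng = random.Random(seed)
    w_hidden = [rng.uniform(-1, 1) for _ in range(hidden)]
    b_hidden = [rng.uniform(-1, 1) for _ in range(hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden)]
    b_out = 0.0
    # Previous updates, kept for the momentum term.
    dw_hidden = [0.0] * hidden
    db_hidden = [0.0] * hidden
    dw_out = [0.0] * hidden
    db_out = 0.0

    def forward(x):
        h = [1.0 / (1.0 + math.exp(-(w * x + b)))
             for w, b in zip(w_hidden, b_hidden)]
        return h, sum(w * v for w, v in zip(w_out, h)) + b_out

    def mse(samples):
        return sum((t - forward(x)[1]) ** 2 for x, t in samples) / len(samples)

    initial_error = mse(validation_set)
    samples = list(training_set)
    epoch = 0
    while epoch < max_epoch:
        rng.shuffle(samples)
        for x, target in samples:
            h, y = forward(x)                          # forward phase
            delta_out = y - target                     # backward phase
            delta_hidden = [delta_out * w_out[j] * h[j] * (1.0 - h[j])
                            for j in range(hidden)]
            for j in range(hidden):                    # update with momentum
                dw_out[j] = -learning_rate * delta_out * h[j] + momentum * dw_out[j]
                dw_hidden[j] = -learning_rate * delta_hidden[j] * x + momentum * dw_hidden[j]
                db_hidden[j] = -learning_rate * delta_hidden[j] + momentum * db_hidden[j]
                w_out[j] += dw_out[j]
                w_hidden[j] += dw_hidden[j]
                b_hidden[j] += db_hidden[j]
            db_out = -learning_rate * delta_out + momentum * db_out
            b_out += db_out
        epoch += 1
        if mse(validation_set) < max_error:            # early stop on validation error
            break
    return {"epochs": epoch, "initial": initial_error,
            "validation": mse(validation_set), "testing": mse(testing_set)}


def make_samples(count, rng):
    """Sample sin(x) on [-2, 2]; the target function is only an example."""
    return [(x, math.sin(x)) for x in (rng.uniform(-2.0, 2.0) for _ in range(count))]


data_rng = random.Random(1)
result = learn(make_samples(60, data_rng), make_samples(20, data_rng),
               make_samples(20, data_rng))
print(result)
```

Checking the stopping condition on the validation set rather than the training set is what ties the loop to the three-set split described above; the testing error is computed once, after training has ended.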
5.5 Additional key implementation aspects
5.5.1 Object Serialization
For complex neural network structures, training is a time-consuming process. Object
serialization was considered in order to be able to continue training from a previously saved
network state. After analysing two different types of serialization (binary and eXtensible
Markup Language - XML), XML serialization was chosen because it saves the data in
a human-readable format. Figure 5.6 illustrates the XML structure used to store a neural
network object in a file. An alternative to XML file storage was a relational database.
XML serialization was preferred because it accommodates changes to the data
structure more easily, while a relational database is more rigid.
<?xml version="1.0" encoding="utf-8"?>
<NeuralNetworkObj>
  <NoOfLayers></NoOfLayers>
  <NoOfNeurons></NoOfNeurons>
  <Layers>
    <NetworkLayer>
      <Neurons>
        <Neuron>
          <Inputs></Inputs>
          <Weights></Weights>
          <WeightChange></WeightChange>
          <Bias></Bias>
          <BiasChange></BiasChange>
          <Delta></Delta>
          <NeuronType></NeuronType>
        </Neuron>
      </Neurons>
      <LayerType></LayerType>
    </NetworkLayer>
  </Layers>
</NeuralNetworkObj>
Figure 5.6: XML Neural Network object structure.
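A round trip through such a structure can be sketched in Python with the standard xml.etree.ElementTree module. The element names follow Figure 5.6, but the way values are encoded inside each element (space-separated numbers, the "Linear" type label) is an assumption made for the example, and only a subset of the elements is written.

```python
import xml.etree.ElementTree as ET


def neuron_to_xml(weights, bias, neuron_type):
    """Build a <Neuron> element; the value encoding is an assumption."""
    neuron = ET.Element("Neuron")
    ET.SubElement(neuron, "Weights").text = " ".join(repr(w) for w in weights)
    ET.SubElement(neuron, "Bias").text = repr(bias)
    ET.SubElement(neuron, "NeuronType").text = neuron_type
    return neuron


def neuron_from_xml(neuron):
    """Read back the values written by neuron_to_xml."""
    weights = [float(v) for v in neuron.findtext("Weights").split()]
    return weights, float(neuron.findtext("Bias")), neuron.findtext("NeuronType")


# Assemble a document with a single layer holding a single neuron.
root = ET.Element("NeuralNetworkObj")
layers = ET.SubElement(root, "Layers")
layer = ET.SubElement(layers, "NetworkLayer")
neurons = ET.SubElement(layer, "Neurons")
neurons.append(neuron_to_xml([0.1, -0.25], 0.5, "Linear"))

xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
restored = neuron_from_xml(parsed.find("./Layers/NetworkLayer/Neurons/Neuron"))
print(xml_text)
print(restored)
```

Because the weights and the bias survive the round trip unchanged, training can resume from exactly the saved state, and the text form remains readable in any editor.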
5.5.2 Multithreading
In the early system prototypes the application was running on a single thread (UI thread) and
hence during a time-consuming network training process, the system could not respond to
any other operation, such as user interface interaction. Since the user could perceive this as a
frozen application and try to kill it, it was decided to delegate the training operation to a
separate, dedicated thread. Another benefit of this decision was that the system could inform
the GUI about the progress of the training operation. In order to implement this decision, the
.NET BackgroundWorker class was used. The main reason for using a BackgroundWorker
instead of a simple Thread object was that it provides extra functionality, such as an event-driven
API to report the progress of the operation (ProgressChanged) and indicate when the
operation has finished (RunWorkerCompleted). Since the training thread shared objects with the
main UI thread (such as ANN structure), hard-to-debug errors appeared especially when a
network feed-forward was performed from the GUI during training or when the log data was
displayed to the GUI. In order to solve this and to avoid other thread race conditions, locks
around objects and method synchronization attributes (which restrict the access to a particular
method to one thread) had to be added to the system. As a consequence of the
synchronizations and locks added, the overall running time of the training thread increased.
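The threading arrangement described above can be illustrated with a Python analogue (the system itself used the .NET BackgroundWorker class, which this sketch does not reproduce). Here a queue stands in for the ProgressChanged event, a None sentinel stands in for RunWorkerCompleted, and a lock guards the shared network object; the dictionary used as a "network" and the fake training step are placeholders.

```python
import queue
import threading

network_lock = threading.Lock()
network = {"weight": 0.0}          # placeholder for the shared ANN structure
progress = queue.Queue()           # worker -> UI progress messages


def train(epochs):
    """Runs on the worker thread; never touches the UI directly."""
    for epoch in range(1, epochs + 1):
        with network_lock:         # exclusive access while weights change
            network["weight"] += 0.1
        progress.put(epoch)        # report progress after each epoch
    progress.put(None)             # signal completion


worker = threading.Thread(target=train, args=(5,))
worker.start()

reported = []
while True:                        # stands in for the UI thread's event loop
    item = progress.get()
    if item is None:
        break
    with network_lock:             # safe read of the shared object
        snapshot = network["weight"]
    reported.append(item)
worker.join()

print(reported, network["weight"])
```

Every access to the shared object now pays for acquiring the lock, which is the source of the increased training time noted above.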
5.5.3 System architecture implementation
According to the Model-View-Presenter pattern followed, the system classes were split into
three major categories (models, views, presenters), each having a different responsibility.
Figure 5.7 illustrates how the classes were split into categories. The model category was the
largest (9 classes) and contained the system logic classes such as BackPropagation,
NeuralNetwork and MathFunction. The view category consisted of three windows: a main
window which displayed all the system options, a wizard-style alternative to the main
window and a log window. In order to have a good separation of concerns, three different
presenters were implemented. The main presenter (FMPresenter) coordinated the main
operations of the system; these included creating a new thread for training, loading and
manipulating the user interfaces and the delegation of the rest of the tasks to the other
presenters. The second presenter (GraphPlotterPresenter) solely contained the logic for
plotting two charts representing the training results. The third presenter (LogPresenter)
contained the logic to create a log and to save it to file or display it to a window. The file log
was used for running a sensitivity analysis of training parameters since it encapsulated the
error rates of each training epoch.
Figure 5.7: Model-View-Presenter system architecture.
5.6 Concluding Remarks
This chapter was focused on the main implementation aspects of the system. The next chapter
highlights the project results together with the analysis undertaken using the system to check
its performance and to find optimal values for training parameters.
Chapter 6: Results
6.1 Chapter Overview
The aim of this chapter is to demonstrate the capabilities of the final system. It begins with a system
walkthrough in which key features are presented from the perspective of a user. It also
includes an overview of a sensitivity analysis undertaken using the implemented system to
investigate the impact that the training parameters have on the performance of an artificial neural
network. The tabulated results of the analysis illustrate the performance of the implemented
neural network system.
6.2 System Walkthrough
This section presents the final version of the system from the perspective of an end user. The
system can be divided into two main parts: system parameters setup and training results
display.
6.2.1 System parameters setup
The first step when using the application is to set up the parameters for defining and training
a neural network model. This is achieved through a user-friendly wizard-style window, which
guides the user sequentially through five steps:
1. In the first phase (Figure 6.1), the user selects the type of function (such as
polynomial, trigonometric or logarithmic) and inputs its coefficients;
2. The second phase (Figure 6.1) requires the user to specify the input range (or interval)
of the function domain. This range is used to generate the training sets in the
following phase;
Figure 6.1: Set function.
3. In the third phase (Figure 6.2), the user specifies the number of learning samples in
the training, validation and testing sets. The samples in each set are generated randomly
based on the function defined in the earlier steps;
4. The fourth phase (Figure 6.2) requires the definition of the network structure to be
trained. The number of hidden layers and the number of neurons per hidden layer are
specified in this phase; and
Figure 6.2: Set ANN parameters.
5. In the fifth phase (Figure 6.3), various backpropagation parameters (namely the
learning rate, the momentum, the logistic sigmoid shape, the maximum error and the
maximum epoch) need to be set and an activation function selected. At the end of this
phase, the system closes the wizard-style window and starts a new thread (second
thread) which is concerned only with training the defined network structure. The main
thread also opens a new window (Figure 6.4) in which the training progress and the current epoch
reported by the second thread are displayed. Other facilities of the new window are described
in the following section.
Figure 6.3: Set backpropagation parameters.
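The random generation of the training, validation and testing sets in the second and third phases can be sketched in Python as follows. The uniform sampling of the input range, the polynomial, the set sizes and the function name are assumptions made for illustration.

```python
import random


def generate_sets(function, lower, upper, sizes, seed=None):
    """Return one list of (input, target) pairs per requested set size."""
    rng = random.Random(seed)

    def sample(count):
        inputs = (rng.uniform(lower, upper) for _ in range(count))
        return [(x, function(x)) for x in inputs]

    return tuple(sample(count) for count in sizes)


# Example: a polynomial 2x^2 + 1 on the range [-2, 2] (illustrative choice).
training, validation, testing = generate_sets(
    lambda x: 2 * x ** 2 + 1, -2.0, 2.0, (100, 30, 30), seed=1)

print(len(training), len(validation), len(testing))
print(training[0])
```

Drawing the three sets independently from the same range means the validation and testing samples are unseen during training yet follow the same function.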
6.2.2 Training results display
The window displayed after setting the system parameters is illustrated in Figure 6.4. Once
the network training process finishes, the window is updated with the training results, which
are the testing error, the epoch achieved and three tabs containing different views. The first
tab shows a graph representing the evolution of the validation and testing error against
training epochs. The second tab (Figure 6.5) contains a graph which compares the mapping of
the trained neural network to the mapping of the defined mathematical function. A close fit
indicates that the network achieved a good generalisation performance. The third tab
(Figure 6.6) represents a control panel in which the users can change the system parameters
and perform different actions; these include resetting the training sets, resetting the network
weights, and importing a network structure from a file or exporting it to one.
Figure 6.4: Training result window.
At the bottom of the results window (Figure 6.4), functionalities to continue learning and to
test the mapping of the trained network are provided.
Figure 6.5: ANN mapping versus defined function.
Figure 6.6: Parameters setup tab.
Every time a training process finishes, the graphs are updated and a log window containing
the errors achieved in each training epoch appears, as shown in Figure 6.7.
Figure 6.7: Log Window.
6.3 Sensitivity Analysis of ANN mathematical model parameters
Sensitivity analysis refers to the examination of how different values of an independent
variable impact other dependent variables under the same set of conditions. This technique
was used in this project to evaluate the role of the parameters of a neural network model and
to try to optimise these parameters to obtain the lowest error in the shortest training time. The
main reasons for the analysis were:
1. To investigate the impact that the training parameters have on the overall mapping of the
implemented neural network system (e.g. training time, network accuracy); and
2. To find the parameter values for which a network would produce the best output results
after learning the mapping of a function. Although these (optimal) values are dependent
on the function and network topology defined, they were set as default values in the final
implemented system.
The analysis was conducted on three neural network training parameters, namely the learning
rate, the momentum and the activation function. The analysis of a single parameter consisted of
selecting different values of the parameter and, for each value, training a sample of ten
neural networks for 1,000 epochs under the same set of conditions. For each
value of a parameter, the average of the training results was calculated. The purpose
was to determine how a parameter change affects, on average, the performance of a neural network.
The results of the analysis are listed in the following sections.
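The procedure can be sketched in Python as a loop over parameter values that repeats training ten times and summarises the errors. The toy_training_error function below is a stand-in (gradient descent on a one-weight quadratic), not the neural network system, so its numbers bear no relation to the tables that follow; the use of the sample standard deviation is also an assumption.

```python
import random
import statistics


def toy_training_error(learning_rate, seed, epochs=100):
    """Stand-in for one training run: minimise E(w) = w^2 from a random start."""
    rng = random.Random(seed)
    w = rng.uniform(-1, 1)
    for _ in range(epochs):
        w -= learning_rate * 2 * w      # gradient of w^2 is 2w
    return w * w


def analyse(values, runs=10):
    """Average and standard deviation of the final error per parameter value."""
    table = {}
    for value in values:
        errors = [toy_training_error(value, seed) for seed in range(runs)]
        table[value] = (statistics.mean(errors), statistics.stdev(errors))
    return table


results = analyse([0.01, 0.1, 1.5])
for rate, (average, deviation) in results.items():
    print(rate, average, deviation)
```

Using the same seeds for every parameter value keeps the starting weights identical across the compared settings, so differences in the averages can be attributed to the parameter itself.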
6.3.1 Learning rate
The learning rate represents a parameter which controls the training step-size and is used to
update the network weights and bias. Table 6.1 contains the results obtained after training a
sample of ten networks with different weights at various learning rates. The objective of the
network training was to learn the mapping of the function
, with an input range
between -2 and +2. Columns two to five represent the network parameters which were
constant throughout the analysis. The topology column refers to the network structure, where
“1-10-10-1” represents a structure with one input, two hidden layers with ten neurons each
and one output. The “μ” column represents the momentum parameter, the “ρ” column
represents the slope parameter of the logistic activation function and the “Samples” column
refers to the size of the training set used. The next columns (six to nine) represent the results
of the analysis at different epochs. The “Avg | | 500 epochs” column refers to the average of the
training errors (more explicitly, mean-squared errors (MSE)) after 500 epochs. The “SD | |
500 epochs” column refers to the standard deviation of the training errors after 500 epochs. A
small training error corresponds to a better network mapping (generalisation), while a
high value corresponds to a worse one.
Table 6.1: Learning rate analysis results.
Learning rate (α) | Topology  | μ   | ρ   | Samples | Avg | | 500 epochs | SD | | 500 epochs | Avg | | 1000 epochs | SD | | 1000 epochs
0.03              | 1-10-10-1 | 0.3 | 0.7 | 100     | 0.088357           | 0.034377          | 0.0686534           | 0.035358
0.07              | 1-10-10-1 | 0.3 | 0.7 | 100     | 0.109008           | 0.094142          | 0.0378803           | 0.012336
0.1               | 1-10-10-1 | 0.3 | 0.7 | 100     | 0.15174            | 0.183262          | 0.0355612           | 0.01251
0.17              | 1-10-10-1 | 0.3 | 0.7 | 100     | 0.047581           | 0.03376           | 0.0308256           | 0.01195
0.5               | 1-10-10-1 | 0.3 | 0.7 | 100     | 1.075606           | 0.243964          | 1.221888            | 0.097216
As can be seen in Table 6.1, the learning rate had a great impact on the average neural
network performance. On one hand, when small learning rates were used (such as 0.03), the
neural network learned slowly; on the other hand, when high learning rates (such as 0.5) were
used, the network was unable to converge and achieve a good result. Based on the results
obtained, it was decided to set the system default learning rate to 0.17, because the average
error and standard deviation had the lowest values. Figure 6.8 illustrates an example
containing a comparison of how the training error evolved when different learning rates were
used. The errors are plotted on the vertical axis, while the training epochs on the horizontal
axis.
[Plot: training error (vertical axis, 0 to 1) against epoch (horizontal axis, 0 to 300) for α=0.17, α=0.1, α=0.07 and α=0.03.]
Figure 6.8: Error evolutions at various learning rates.
6.3.2 Momentum rate
A momentum parameter is used to speed up the convergence and to maintain a good network
generalisation. This parameter also helps the network to avoid settling in a local minimum.
Table 6.2 illustrates the results of training a sample of ten neural networks using various
momentum values under the same set of conditions. The parameters related to Table 6.2 are
defined in section 6.3.1. From Table 6.2, it can be concluded that using a momentum (μ)
parameter between 0.1 and 0.5 yields a much lower average error, and hence a better network mapping
(generalisation), than a value of 0.9. Based on the results obtained, it was decided to set the system’s default
momentum value to 0.3, as its related average error and standard deviation had the lowest
values after 1,000 epochs.
Table 6.2: Momentum analysis results.
μ   | Topology  | α   | ρ   | Samples | Avg | | 500 epochs | SD | | 500 epochs | Avg | | 1000 epochs | SD | | 1000 epochs
0.1 | 1-10-10-1 | 0.1 | 0.7 | 100     | 0.068405           | 0.028509          | 0.0597602           | 0.030813
0.3 | 1-10-10-1 | 0.1 | 0.7 | 100     | 0.145991           | 0.186729          | 0.0361801           | 0.016414
0.5 | 1-10-10-1 | 0.1 | 0.7 | 100     | 0.075725           | 0.063315          | 0.0643988           | 0.075141
0.9 | 1-10-10-1 | 0.1 | 0.7 | 100     | 1.121849           | 0.228244          | 1.1578406           | 0.264172
Figure 6.9 illustrates an example of how the network training error evolved when the
momentum had different values.
[Plot: training error (vertical axis, 0 to 1) against epoch (horizontal axis, 0 to 300) for Momentum=0.3 and Momentum=0.1.]
Figure 6.9: Error evolutions at various momentum values.
6.3.3 Activation function
An activation function introduces nonlinearity in a neural network. Since the final system
incorporated two activation functions, an analysis of their impact had to be undertaken. Table
6.3 illustrates the results of training a sample of ten different networks (with network topology:
“1-10-10-1”) using first the logistic activation and then the hyperbolic tangent (tanh)
activation function. The parameters related to the table are defined in section 6.3.1.
Table 6.3: Activation functions analysis results.
Activation Function | ρ   | μ   | α    | Samples | Avg | | 500 epochs | SD | | 500 epochs | Avg | | 1000 epochs | SD | | 1000 epochs
Logistic            | 1   | 0.3 | 0.02 | 100     | 0.097613           | 0.060737          | 0.0713352           | 0.058226
Logistic            | 0.7 | 0.3 | 0.02 | 100     | 0.09803            | 0.006554          | 0.062783            | 0.010538
Logistic            | 0.4 | 0.3 | 0.02 | 100     | 0.199817           | 0.04953           | 0.1203151           | 0.037272
Hyperbolic tangent  | -   | 0.3 | 0.02 | 100     | 0.087033           | 0.068283          | 0.0626328           | 0.026168
The results in Table 6.3 indicated that when the tanh function was used, the average network
error dropped faster in the early training epochs (i.e. after 500 epochs); consequently, it can
be claimed that the result supported Guyon’s theory, which stated that the symmetry of the
tanh function about the origin (its output is centred on 0) appears to speed up the learning [24]. Based on the results from
Table 6.3, it was decided to set the system’s default activation function to a logistic sigmoid
with a slope of 0.7, because it seemed to achieve the best trade-off between the average error
and the standard deviation value. The difference between the average error related to the
hyperbolic tangent function and the error related to the logistic sigmoid with a slope of 0.7 was
considered to be insignificant. Figure 6.10 illustrates an example of how the error evolved
when different activation functions were used.
[Plot: training error (vertical axis, 0 to 1) against epoch (horizontal axis, 0 to 250) for the logistic and tanh activation functions.]
Figure 6.10: Logistic vs Tanh activation: Training error evolution
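The relationship between the two activation functions compared above can be checked numerically with a short Python sketch. It relies on the standard identity tanh(x) = 2·logistic(2x) − 1 and shows that, over a symmetric set of inputs, tanh outputs average to 0 while logistic outputs average to 0.5; the input values are arbitrary and the slope parameter is fixed to 1 for simplicity.

```python
import math


def logistic(x):
    """Logistic sigmoid with slope 1."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# tanh is a rescaled and shifted logistic function.
largest_gap = max(abs(math.tanh(x) - (2.0 * logistic(2.0 * x) - 1.0))
                  for x in inputs)

# Mean output over inputs that are symmetric around zero.
logistic_mean = sum(logistic(x) for x in inputs) / len(inputs)
tanh_mean = sum(math.tanh(x) for x in inputs) / len(inputs)

print(largest_gap, logistic_mean, tanh_mean)
```

The two functions therefore have the same shape; what differs is the output range, (0, 1) for the logistic function and (-1, 1) for tanh, which is the zero-centring property referred to above.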
6.4 Concluding Remarks
This chapter presented the results of the project through a system walkthrough and an
analysis which demonstrated the performance of the implemented neural network system. The next
chapter focuses on the verification and validation techniques employed during development.
Chapter 7: Testing
7.1 Chapter Overview
This chapter provides a description of the various testing strategies followed throughout the
implementation process, to verify and validate the system. White Box techniques, such as
unit and regression testing are discussed, highlighting how they have been applied, as well as
their importance in XP. The chapter concludes with the Black Box testing techniques (such as
system testing), used in the final stage of the development to ensure the system adhered to its
specification.
7.2 System Verification and Validation Overview
Within software development projects, verification and validation processes are used in
determining if the developed software conforms to the user specification and delivers the
required functionality. Although they are commonly confused, the verification process is
different from the validation one. Verification is used to determine if the software fulfils its
functional and non-functional requirements, while validation is used to establish if the final
software meets the user requirements [15]. The verification of the system was achieved by
implementing unit and regression tests, while system validation was achieved by
implementing system tests.
7.3 White Box testing
White Box Testing, also known as structural testing, refers to designing test cases based on
the internal structure, design and implementation of the tested system. In white box testing,
the tester must have full knowledge of the source code. A benefit of this approach is that
more tests can be written for those parts of the system that the developer knows to be more prone
to failure [25]. The White Box Testing technique was used throughout development to examine
each component of the implemented system. The following sections focus on the testing
strategies employed to investigate and check the internal structure of the system.
7.3.1 Unit testing
Unit testing (or component testing) is a technique used to ensure that a single component
performs its designated function correctly in isolation. It is a core technique in XP and it is
frequently applied in iterative software development [26]. In object-oriented programming, a
unit testing component is typically a single class or method, but it can also be a small cluster
of classes. A key aspect of unit tests is that their test cases are re-usable and can be easily
re-run in later stages to check if a component has changed its own behaviour or the behaviour
of other components. Unit testing also makes the code refactoring process easier since unit
tests can verify if a small change in the system structure introduces any change in its
functionality.
The unit testing technique was used frequently from the commencement of the system
implementation. The CRC model (section 4.3) had a major role in determining the
inputs and expected outputs of the implemented unit test cases. Since writing unit tests by hand is a
laborious process, a popular framework (NUnit) which handles test automation was used.
This also facilitated the regression testing procedure defined in section 7.3.2, because
each time a software bug was fixed, a large number of test cases could be easily run again.
The NUnit testing framework facilitated the process of writing test cases and also
provided an automated environment for running the tests. NUnit also provides various
attributes (which are added before a method declaration) to make the writing of test
cases easier. Two examples are [TestFixtureSetUp] and [TestFixtureTearDown], which
allow the execution of a specific code block before or after the execution of all the tests in a
test class. The [TestFixtureSetUp] attribute was heavily used in the implemented test classes
(especially to initialise a neural network structure), so that several tests could access the
same object. Figure 7.1 illustrates an example of how the [TestFixtureSetUp] attribute was
used. The final prototype of the system incorporated 24 unit tests, many of which were
written according to the XP “writing test before code” methodology. Appendix B contains a
snapshot of a run of all the unit tests implemented, together with an example of unit test
code.
using NUnit.Framework;

[TestFixture] // marks a class as a test class
public class NeuralNetworkTests
{
    // shared by every test in this fixture
    private NeuralNetwork network;

    [TestFixtureSetUp] // executed once, before any test in the class
    public void InitialiseNetwork() // made public so that NUnit can invoke it
    {
        NeuralNetwork ann = new NeuralNetwork(1, 2, 1);
        // network initialisation
        ann.NetworkLayer[0].Neurons[0].Weights[0] = 0.1;
        // . . .
        ann.SetRho(1);
        network = ann;
    }

    [Test] // marks a method as a unit test method
    public void checkgetNNStructure_Return_1_2_1()
    {
        // check the structure of the newly created neural network
        NeuralNetwork ann = network.Clone();
        StringAssert.AreEqualIgnoringCase("1-2-1", ann.GetStructure());
    }
} // NeuralNetworkTests class
Figure 7.1: An example of the [TestFixtureSetUp] attribute.
7.3.2 Regression testing
Regression testing is a technique which discovers new software bugs in a modified system by
re-running previously implemented tests. This ensures that a modification to a system (such
as adding new functionality or fixing an existing bug) does not cause any unexpected side
effects [27]. Regression testing supports iterative and incremental development and,
because of that, it represents a fundamental technique in any agile methodology. In XP,
regression tests are usually automated and the technique is frequently used during code
refactoring or component integration [26].
Regression testing was used regularly throughout development by re-running the previously
automated unit tests after each major code modification. In the early stages of development,
a black box variant of this technique was also used: every system action (such as a new
object instantiation or an exception) was logged to a file and the log was compared with
previous logs. While this technique was useful when the system did not provide much
functionality, it was not practical in later stages due to the large size of the log file.
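The log comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the project: the LogRegression class, its Diff method and the message format are hypothetical, and each log is assumed to be available as a list of lines with one system action per line.

```csharp
using System;
using System.Collections.Generic;

public static class LogRegression
{
    // Compares a baseline log with the current log, line by line, and
    // returns one message per differing position (1-based line numbers).
    // A line that exists in only one of the logs is reported as "<missing>".
    public static List<string> Diff(IList<string> baseline, IList<string> current)
    {
        var differences = new List<string>();
        int longest = Math.Max(baseline.Count, current.Count);

        for (int i = 0; i < longest; i++)
        {
            string expected = i < baseline.Count ? baseline[i] : "<missing>";
            string actual = i < current.Count ? current[i] : "<missing>";

            if (!string.Equals(expected, actual, StringComparison.Ordinal))
            {
                differences.Add(string.Format(
                    "line {0}: expected '{1}', got '{2}'", i + 1, expected, actual));
            }
        }

        return differences;
    }
}
```

An empty result means that the two runs logged identical actions. In practice the two arguments could be obtained with File.ReadAllLines on two log files (the file names would be chosen by the tester). Because the comparison is positional, a single inserted action shifts every later line, which is one reason such a check becomes impractical as the log grows.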
7.4 Black Box testing
Black box testing refers to the technique of testing a system without knowing its internal
logic or code structure. In black box testing, the tester is concerned only with the
functionality of the system, regardless of its implementation [28]. After the implementation
of each software prototype, black box testing techniques were used to verify whether the system
adhered to the user requirements and specification. The next section details the black box
testing technique used to evaluate a prototype from a user perspective.
7.4.1 System testing
System testing (also known as functional testing) refers to the evaluation of an application as
a whole, to ensure that all components integrate well and the system complies with its
specified requirements. This approach is used to discover the system bugs that are revealed
only when the system components interact with each other to produce a desired behaviour
[15].
With the help of a test plan, system testing was conducted after the implementation of each
software prototype. The test plan contained a series of tests designed to establish whether a
prototype fulfilled its allocated functional and non-functional requirements. Each test
comprised a description, the requirements tested, an expected outcome and the actual test
result. Table 7.1 shows a subset of the system tests conducted, together with their related
functional requirements (FR).
Table 7.1: System tests.
Test Description | FR | Expected Outcome | Pass/Fail
Train neural network using Backpropagation | F2.1, F3.1, F3.3 | The value of the weights should be different and the testing error and current epoch achieved should be displayed. Also a plot with the error evolution throughout training and another one with the network mapping should be displayed. | Pass
Export network to file | F4.4 | A file with an XML format containing the network structure should be created in the desired location. | Pass
Import network from file | F4.4 | Network structure and testing error should update. Also the current epoch should be set to 0. | Pass
7.5 Concluding Remarks
This chapter outlined the testing techniques employed during the system implementation. The
following chapter provides an overall conclusion of the project.
Chapter 8: Conclusion
8.1 General Conclusion
The overall goal of the project was achieved: a neural network function mapping
system that fulfils all the functional and non-functional requirements defined in section
3.4 was successfully implemented. The system uses one of the most popular supervised
learning techniques, the backpropagation algorithm, to train a user-defined neural network
structure to map a function. Because the application is interactive, showing how the
network adapts to the desired function and allowing a training parameter to be changed
easily and its impact monitored, the implemented system can be used as a teaching tool for
students to observe how the network knowledge changes throughout the learning process.
Moreover, the network structure and the backpropagation training source code of the system
were translated from C# to Java and were effectively used to predict the response of the
follower in the Stackelberg strategic economic competition game. The architecture of the
system, which was based on the Model-View-Presenter pattern, greatly facilitated the
process of adapting the code to a different language and purpose.
8.2 Reflection
This project represented a great opportunity to research mathematical artificial neural
network models. The important lessons learned from the implementation and analysis of the
desired system are the following:
1. The performance of mathematical neural network models is problem-dependent.
Finding optimal training parameters is a difficult process; however, it is an
essential task in achieving accurate generalisation performance;
2. When using multiple threads that depend on each other, extreme care needs to be
taken to avoid race conditions, which might lead to hard-to-debug errors;
3. Following prototyping and agile methodologies helps to ensure that a system evolves
gradually in iterations and that it fulfils the desired requirements; and
4. More time should be allocated to tasks which involve dealing with external APIs. The
external DynamicDataDisplay library used in the final system proved to have minor
issues and insufficient documentation.
8.3 Future Development
The implementation of the neural network function mapping system can be extended to solve
classification problems, such as handwritten character recognition. This would
involve research into image processing techniques (such as skeletonization), which would
allow the extraction of a smaller set of features that represent an image structure.
Moreover, although opinions differ on whether stock markets are predictable, the
implemented neural network system can be tested to see if it can find a
correlation between the price of a financial instrument and other historical indicators, such as
the trading volume and previous closing price. Various feature extraction
techniques could be analysed in order to determine the overall mapping or prediction
performance of the network.
Finally, different enhancements to the neural network learning algorithm, such as adaptive
learning rates or ensemble learning, can be explored. In ensemble learning, multiple networks
are trained to solve the same problem and their solutions are appropriately combined
to provide better generalisation accuracy.
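One simple way of combining the solutions of several networks is to average their outputs for the same input. The sketch below illustrates only this combination step and makes several assumptions: the Ensemble class and its AverageOutput method are hypothetical, each trained network is represented by a plain Func<double, double> mapping one input to one output, and unweighted averaging is just one of several possible combination rules.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Ensemble
{
    // Returns the unweighted mean of the members' outputs for input x.
    // Each member stands in for an independently trained network.
    // Throws InvalidOperationException if the ensemble has no members.
    public static double AverageOutput(
        IEnumerable<Func<double, double>> members, double x)
    {
        return members.Select(member => member(x)).Average();
    }
}
```

With the system described in this report, each member could wrap a separately trained network, for instance networks trained from different random initial weights. Averaging tends to help when the members make different errors, so networks that are trained identically would be expected to bring little benefit.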
References
[1] S. Haykin, Neural Networks: A Comprehensive Foundation, Prentice-Hall, 1999, p. 34.
[2] S. Haykin, Neural networks: A Comprehensive Foundation, New Jersey: Prentice-Hall,
1999.
[3] Oracle - ThinkQuest, “AI Basics,” [Online]. Available:
http://library.thinkquest.org/05aug/01158/. [Accessed 28 04 2013].
[4] T. Mitchell, Machine Learning, Singapore: McGraw Hill, 1997.
[5] M. Mohri, A. Rostamizadeh and A. Talwalkar, Foundations of Machine Learning,
Cambridge: MIT Press, 2012.
[6] F. Orabona, C. Castellini, B. Caputo, L. Jie and G. Sandini, “On-line Independent
Support Vector Machines,” 2009. [Online]. Available:
http://www.idiap.ch/~bcaputo/publik/09pr.pdf. [Accessed 27 04 2013].
[7] K. Gurney, An Introduction to Neural Networks, London: CRC Press, 1997.
[8] McGraw-Hill, “Neuron Diagram,” McGraw-Hill, [Online]. Available:
http://ygraph.com/chart/1600. [Accessed 27 04 2013].
[9] “Diagram of an artificial neuron,” Creative Commons, [Online]. Available:
http://commons.wikimedia.org/wiki/File:ArtificialNeuronModel_english.png. [Accessed
27 04 2013].
[10] C. M. Bishop, Neural Networks for Pattern Recognition, New York: Oxford University
Press, 1995.
[11] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry,
Cambridge: MIT Press, 1969.
[12] C. Stergiou and D. Siganos, “Neural Networks,” [Online]. Available:
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#The Learning
Process. [Accessed 27 04 2013].
[13] D. Leverington, “A Basic Introduction to Feedforward Backpropagation Neural
Networks,” 2009. [Online]. Available:
http://www.webpages.ttu.edu/dleverin/neural_network/neural_networks.html. [Accessed
27 04 2013].
[14] C. Larman, Applying UML and Patterns - An Introduction to Object-Oriented Analysis
and Design and Iterative Development (3rd Edition), NJ, USA: Prentice Hall PTR,
2004.
[15] I. Sommerville, Software Engineering (9th Edition), Pearson, 2011.
[16] I. Sommerville and P. Sawyer, Requirements Engineering - A good practice guide,
Wiley, 1997.
[17] P. Salmon, D. Jenkins, N. Stanton and G. Walker, “Hierarchical task analysis vs.
cognitive work analysis: comparison of theory, methodology and contribution to system
design,” Theoretical Issues in Ergonomics Science, vol. 11:6, pp. 504-531, 2010.
[18] E. Gamma, R. Helm, R. Johnson and J. Vlissides, Design Patterns - Elements of
Reusable Object-Oriented Software, Addison-Wesley, 1994.
[19] M. Fowler, “GUI Architectures,” [Online]. Available:
http://martinfowler.com/eaaDev/uiArchs.html. [Accessed 28 04 2013].
[20] S. W. Ambler, “Class Responsibility Collaborator (CRC) Models,” [Online]. Available:
http://www.agilemodeling.com/artifacts/crcModel.htm. [Accessed 28 04 2013].
[21] CodePlex, “D3 Dynamic Data Display,” 2009. [Online]. Available:
http://dynamicdatadisplay.codeplex.com/. [Accessed 28 04 2013].
[22] S. McConnell, Code Complete - A practical handbook of software construction (2nd
edition), Microsoft Press, 2004.
[23] R. Rojas, Neural Networks: A Systematic Introduction, Berlin: Springer, 1996.
[24] I. Guyon, “Applications of Neural Networks to Character Recognition,” International
Journal of Pattern Recognition and Artificial Intelligence, vol. 5, pp. 353-382, 1991.
[25] L. Williams, “White-Box Testing,” 2006. [Online]. Available:
http://agile.csc.ncsu.edu/SEMaterials/WhiteBox.pdf. [Accessed 28 04 2013].
[26] D. Wells, “Unit Tests,” 2009. [Online]. Available:
http://www.extremeprogramming.org/rules/unittests.html. [Accessed 28 04 2013].
[27] Microsoft, “Regression Testing,” 2013. [Online]. Available:
http://msdn.microsoft.com/en-us/library/aa292167(v=vs.71).aspx. [Accessed 28 04
2013].
[28] G. Myers, The Art of Software Testing, John Wiley & Sons, Inc., 2004.
Appendix A
NeuralNetwork
  Responsibilities: Reset network knowledge; Build network layer
  Collaborators: NetworkLayer; BackPropagation; FMPresenter
NetworkLayer
  Responsibilities: Build neuron
  Collaborators: Neuron; NeuralNetwork
Neuron
  Responsibilities: Randomise weights; Compute neuron output
  Collaborators: NetworkLayer
BackPropagation
  Responsibilities: Train Neural Network; Log results
  Collaborators: NeuralNetwork; FMPresenter; LearningSets; LogPresenter
FMPresenter
  Responsibilities: Build Neural Network; Build Propagation; Update GUI; Start training; Report training progress
  Collaborators: MathFunction; NeuralNetwork; BackPropagation; LogPresenter; GraphPresenter
GraphPlotterPresenter
  Responsibilities: Plot ANN function; Plot math function; Plot training error; Plot validation error
  Collaborators: FMPresenter
MathFunction
  Responsibilities: Compute function output
  Collaborators: FMPresenter
LoggerPresenter
  Responsibilities: Add log entry; Update file log; Update GUI log
  Collaborators: BackPropagation
LearningSets
  Responsibilities: Create TrainingSet
  Collaborators: BackPropagation; TrainingSet
TrainingSet
  Responsibilities: Shuffle training data; Sort training data; Store training data
  Collaborators: LearningSets
LearningResults
  Responsibilities: Store training results
  Collaborators: BackPropagation
Figure A.1: CRC model.
Appendix B
Figure B.1: NUnit tests.