Artificial Neural Networks
This is lecture 15 of the module ‘Biologically Inspired Computing’
An introduction to Artificial Neural Networks
Recall this from the first (overview) lecture:
Some things that classical computing is not good at.
Pattern Recognition
Classification
In week 1, we recognised that these were the same thing,
and defined the problem of classification: to
see a complex pattern and assign the correct label to it.
Well, classical computational methods are fine if we
know the rules that underpin a classification task – but
when we don’t, they are useless.
‘Brains’, however, seem to be very good at this task
Artificial Neural Networks
• An ANN is a bio-inspired machine learning
technique very, very widely applicable in almost
every area of industry and science.
• In this lecture we will look at the basic ideas
involved, which are really quite simple.
• Understanding how they are usually ‘trained’
needs a certain level of maths, but we won’t go
into that. Anyway, it turns out that we can train
them with EAs instead (in fact, PSO is particularly
good at it …)
Real Neural Networks
[Images: a single neuron, and many neurons joined into a network.]
The business end of the brain is made of lots of neurons, joined together in networks.
Our own computations are performed in/by this network.
This type of ‘computer’ is fabulous at pattern recognition.
Real Neural Networks II
[Cartoon: a part of my brain. Feature neurons for ‘black stripes’, ‘yellow stripes’ and ‘buzzing’ are wired, via excitatory and inhibitory connections, to an ‘Aarghhh!!!’ neuron and a ‘mower’ neuron; a high level of excitation leads to a neuron firing.]
An individual neuron receives electrical signals from other (excited)
neurons. If the total input is enough, it will become active, and send
signals out to those it is connected to.
Artificial Neural Networks
[Diagrams: an artificial neuron (a node), and an ANN (a network of such nodes).]
Nodes abstractly model neurons; they do very simple number crunching
Numbers flow from left to right: the numbers arriving at the input layer get
“transformed” to a new set of numbers at the output layer.
There are many kinds of nodes, and many ways of combining them into
a network, but we need only be concerned with the types described here,
which turn out to be sufficient for any (consistent) pattern classification task.
A single node (artificial neuron)
works like this
[Diagram: a node with three input lines, weighted 3, 1 and 2, and two output lines, weighted 2 and -2.]
Numbers come along (inputs from us, or from other nodes): here 4, -3 and 0.
They get multiplied by the strengths on the input lines: 4×3 = 12, -3×1 = -3, 0×2 = 0.
The node adds up its inputs, and applies a simple function f to the total: f(12 - 3 + 0) = f(9).
It sends the result out along its output lines, where it will in turn get
multiplied by the line weights before being delivered: 2 × f(9) and -2 × f(9).
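
To make the arithmetic concrete, here is a minimal sketch of one node in Python (the names, and the choice of f, are mine; the identity is used for f so the numbers are easy to follow):

def node_output(inputs, weights, f=lambda s: s):
    """Multiply each input by its line weight, add up, apply the node's function f."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return f(total)

# The worked example above: inputs 4, -3, 0 arriving on lines weighted 3, 1, 2.
out = node_output([4, -3, 0], [3, 1, 2])   # f(12 - 3 + 0) = f(9); identity f gives 9
print(out)                                 # 9
print(2 * out, -2 * out)                   # 18, -18: scaled by the two output-line weights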
Simple ANN Example
[Diagram: inputs A and B each connect directly to the output node (weight 1) and to a hidden node (weight 0.5); the hidden node has an inhibitory connection to the output node.]
This one calculates XOR of the inputs A and B.
Each non-input node is an LTU (linear threshold unit), with a threshold of 1.
Which means: if the weighted sum of its inputs is >= 1, it fires out a 1;
otherwise it fires out a 0.
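
A minimal Python sketch of such a network: the hidden node acts as an AND detector and inhibits the output node. Note the inhibitory weight is -2 here so the sums clear the threshold of 1 correctly; this is one standard construction, and the diagram’s weights may be drawn slightly differently.

def ltu(inputs, weights, threshold=1.0):
    """Linear threshold unit: fires 1 if the weighted sum is >= threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def xor_net(a, b):
    hidden = ltu([a, b], [0.5, 0.5])        # fires only when A and B are both 1 (an AND)
    return ltu([a, b, hidden], [1, 1, -2])  # direct excitation, inhibited when both fire

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))    # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0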
Computing AND with a NN
[Diagram: inputs A and B connect to the output node, each with weight 0.5.]
The blue node is the output node.
It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0.
With weights of 0.5, the sum reaches 1 only when both A and B are 1, so the
output is A AND B.
Computing OR with a NN
[Diagram: inputs A and B connect to the output node, each with weight 1.]
The blue node is the output node.
It adds the weighted inputs, and outputs 1 if the result is >= 1, otherwise 0.
With these weights, only one of the inputs needs to be a 1 for the output
to be 1. The output will be 0 only if both inputs are zero.
Computing NOT with a NN
[Diagram: input A connects to the output node with weight -1; a bias unit, which always sends a fixed signal of 1, connects with weight 1.]
This NN computes the NOT of input A.
The blue unit is a threshold unit with a threshold of 1 as before.
So if A is 1, the weighted sum at the output unit is -1 + 1 = 0, hence output is 0;
if A is 0, the weighted sum is 1, so output is 1.
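
For concreteness, here are the three gate networks just described, sketched in Python as single LTUs (the function names are mine):

def ltu(inputs, weights, threshold=1.0):
    """Fires 1 if the weighted input sum reaches the threshold, else 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def nn_and(a, b):
    return ltu([a, b], [0.5, 0.5])   # 0.5 + 0.5 reaches 1 only when both inputs fire

def nn_or(a, b):
    return ltu([a, b], [1, 1])       # either input alone reaches the threshold

def nn_not(a):
    return ltu([a, 1], [-1, 1])      # the constant 1 is the bias unit's fixed signal

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", nn_and(a, b), "OR:", nn_or(a, b))
print("NOT 0:", nn_not(0), "NOT 1:", nn_not(1))   # 1, 0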
So, an NN can compute AND,
OR and NOT – so what?
It is straightforward to combine ANNs together, with outputs from some becoming
the inputs of others, etc. That is, we can combine them just like logic gates on
a microchip.
E.g. this one computes (A AND B) OR NOT(A OR C)…
[Diagram: a network of threshold units wired up to compute (A AND B) OR NOT(A OR C).]
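
The diagram’s exact weights aren’t fully recoverable here, but wiring together the gate constructions above gives the same circuit; a sketch:

def ltu(inputs, weights, threshold=1.0):   # as in the previous sketch
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

def circuit(a, b, c):
    a_and_b = ltu([a, b], [0.5, 0.5])       # A AND B
    a_or_c  = ltu([a, c], [1, 1])           # A OR C
    not_aoc = ltu([a_or_c, 1], [-1, 1])     # NOT(A OR C), via the bias unit
    return ltu([a_and_b, not_aoc], [1, 1])  # OR of the two parts

for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            print(a, b, c, "->", circuit(a, b, c))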
And you’re telling me this because … ?
Imagine this.
Image of handwritten character converted into array of grey levels (inputs)
26 outputs, one for each character
[Diagram: the grey-level numbers (7, 2, 0, 0, 3, 0, …) feed the input layer; the 26 output nodes are labelled a, b, c, d, e, f, …]
Weights on the links are chosen such that the output corresponding to the
correct letter emits a 1, and all the others emit a 0.
This sort of thing is not only possible, but routine:
Medical diagnosis, wine-tasting, lift-control, sales prediction, …
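
To give a feel for that left-to-right flow, here is a sketch of the letter recogniser in Python. Everything in it is a placeholder: the layer sizes are invented and the weights are random rather than trained, so its answers are meaningless until training has set the weights.

import random

def ltu(inputs, weights, threshold=1.0):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

n_inputs, n_hidden, n_outputs = 25, 10, 26    # e.g. a 5x5 grid of grey levels, 26 letters
w_hidden = [[random.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
w_output = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_outputs)]

def classify(grey_levels):
    hidden  = [ltu(grey_levels, w) for w in w_hidden]   # first layer of crunching
    outputs = [ltu(hidden, w) for w in w_output]        # one number per letter
    return outputs   # with trained weights: a 1 at the correct letter, 0s elsewhere

image = [random.uniform(0, 7) for _ in range(n_inputs)]  # a fake grey-level array
print(classify(image))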
Getting the Right Weights
Clearly, an application will only be accurate if the weights are right.
An ANN starts with randomised weights
And with a database of known examples for training
[Diagram: if this input pattern (7, 2, 0, 0, 3, 0, …) corresponds to a “c”, we want these outputs: 0, 0, 1, 0, 0, 0, … (a 1 at the “c” node, 0 everywhere else).]
If wrong, weights are adjusted in a simple way which makes it more likely
that the ANN will be correct for this input next time
Training an NN
It works like this:
[Flowchart: Send training pattern in → Crunch to outputs → if some wrong, Adjust weights and go round again; if all correct, STOP.]
Present a pattern as a series of numbers at the
first layer of nodes.
Each node in the next layer does its simple processing, and
sends its results to the next layer, and so on, until numbers
come out at the output layer.
Compare the NN’s output pattern with the known correct pattern
for this input. If different, adjust the weights somehow to make it more
likely to be correct on this pattern next time.
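
Here is that loop as a minimal Python sketch, for the smallest possible case: a single LTU learning OR from four examples. The “adjust the weights somehow” used below is a simple perceptron-style nudge; it is one standard choice, not the only one.

import random

def ltu(inputs, weights, threshold=1.0):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

training_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR
weights = [random.uniform(-1, 1) for _ in range(2)]                   # start randomised
rate = 0.1    # how strongly each mistake nudges the weights

for epoch in range(1000):                 # safety cap on passes through the data
    wrong = 0
    for inputs, target in training_data:  # send each training pattern in
        output = ltu(inputs, weights)     # crunch to the output
        if output != target:              # some wrong: adjust the weights
            wrong += 1
            for i, x in enumerate(inputs):
                weights[i] += rate * (target - output) * x
    if wrong == 0:                        # all correct: STOP
        break

print(weights)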
`Classical’ NN Training
An algorithm called backpropagation (BP) is the classic way of
training a neural network.
Based on partial differentiation, it prescribes a way to
adjust the weights so that the error on the latest pattern would
probably be reduced next time.
However, we can instead use an EA to evolve the weights for an
NN. In this context, we can see BP as similar to a constructive
heuristic approach; it will provide fast results, but these results
will usually be at a poor local minimum.
The first ever application of particle swarm optimisation, on the
other hand, showed that it was faster than BP, with better results.
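
For contrast, here is a sketch of the EA idea in its simplest form: a (1+1)-style hill-climber over the weight vector (not the PSO mentioned above), with fitness measured as the number of training patterns the network gets wrong.

import random

def ltu(inputs, weights, threshold=1.0):
    return 1 if sum(x * w for x, w in zip(inputs, weights)) >= threshold else 0

training_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]  # OR again

def errors(weights):
    return sum(ltu(x, weights) != target for x, target in training_data)

parent = [random.uniform(-1, 1) for _ in range(2)]      # a random weight vector
for generation in range(1000):
    child = [w + random.gauss(0, 0.2) for w in parent]  # mutate every weight a little
    if errors(child) <= errors(parent):                 # keep the child if no worse
        parent = child
    if errors(parent) == 0:                             # perfect on the training set
        break

print(parent, errors(parent))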
Generalisation
The ANN is learning during its training phase.
When it is in use, providing decisions/classifications for live cases it
hasn’t seen before, we expect a reasonable decision from it. I.e. we
want it to generalise well.
Suppose a network was trained with the black As and Bs; here, the black line is a
visualisation of its decision space; it will think anything on one side is an A, and
anything on the other side is a B. The white A represents an unseen test case.
In the third example below, the network thinks the white A is a B.
[Three plots of the training As and Bs with the learned decision line, labelled ‘Good generalisation’, ‘Fairly poor generalisation’ and ‘Stereotyping?’]
Coverage and extent of the training data help to avoid poor generalisation.
Main point: when an NN generalises well, its results seem sensible, intuitive,
and generally more accurate than a person’s.
More next time
There are certain things you need to know about ANNs:
example applications
how to avoid poor generalisation
other useful types of NNs, with applications
You will have to wait until Thursday to find out