Neural Networks

[Figure: model of a neuron. Input signals (or data) x1, x2, ..., xm are
multiplied by synaptic weights w1, w2, ..., wm and combined by a summing
function, together with a bias b, to give the induced local field v. An
activation function φ(·) is applied to v to produce the output y.]
Bias is an external parameter of the neuron.
It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

    v = Σ_{j=0}^{m} wj xj,   where w0 = b

[Figure: the same neuron model with the bias drawn as an extra input
x0 = +1 weighted by w0 = b, feeding the summing function together with
x1, ..., xm; the local field v passes through the activation function
φ(·) to give the output y.]

• Bias b has the effect of applying an affine transformation to u:
      v = u + b
• v is the induced local field of the neuron, where
      u = Σ_{j=1}^{m} wj xj
Biological Neuron

• Synapse
• Dendrites (input)
• Cell body
• Axon (output)

Equilibrium: membrane potential
Dendrites: passive conductance
Axon: spikes (Hodgkin-Huxley equations)

[Figure: membrane potential vs. time, showing a spike once the
threshold is crossed.]
A simple threshold unit:

    inputs X and Y, each with weight W = 1;
    bias input -1 with weight W = 1.5; threshold t = 0.0

    output = { 1 if Σ wi xi > t
             { 0 otherwise

[Figure: inputs X and Y (each W = 1) and a bias input -1 (W = 1.5)
feeding a threshold unit with t = 0.0.]
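The threshold unit above can be sketched directly; the function name is an illustrative choice:

```python
# Threshold unit from the slide: inputs X and Y with weight 1, plus a
# bias input fixed at -1 with weight 1.5, and threshold t = 0.
def threshold_unit(x, y, w_bias=1.5, w_x=1.0, w_y=1.0, t=0.0):
    s = (-1) * w_bias + w_x * x + w_y * y   # weighted sum; bias input is -1
    return 1 if s > t else 0

# With these weights the unit fires only when both inputs are 1
# (sum = -1.5 + 1 + 1 = 0.5 > 0), i.e. it computes AND:
for x, y in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, y, threshold_unit(x, y))
```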
For AND:

    A B | Output
    0 0 | 0
    0 1 | 0
    1 0 | 0
    1 1 | 1

[Figure: inputs x and y (weights W = ?) and a bias input -1 (W = ?)
feeding a threshold unit with t = 0.0.]

• What are the weight values?
• Initialize with random weight values.
AND problem with initial weights: W = 0.3 (bias input -1), W = 0.5 (x1),
W = -0.4 (x2), threshold t = 0.0.

    A B | Target
    0 0 | 0
    0 1 | 0
    1 0 | 0
    1 1 | 1

    b   I1  I2 | Summation                              | Output
    -1  0   0  | (-1*0.3) + (0*0.5) + (0*-0.4) = -0.3   |   0
    -1  0   1  | (-1*0.3) + (0*0.5) + (1*-0.4) = -0.7   |   0
    -1  1   0  | (-1*0.3) + (1*0.5) + (0*-0.4) =  0.2   |   1
    -1  1   1  | (-1*0.3) + (1*0.5) + (1*-0.4) = -0.2   |   0

The outputs do not yet match the targets.
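The table above can be reproduced with a short forward-pass sketch using the initial weights from the slide:

```python
# Forward passes for the AND problem with the initial weights from the
# table: 0.3 on the bias input (-1), 0.5 on I1, -0.4 on I2.
weights = [0.3, 0.5, -0.4]          # [w_bias, w1, w2]

def forward(i1, i2, t=0.0):
    s = (-1) * weights[0] + i1 * weights[1] + i2 * weights[2]
    return s, (1 if s > t else 0)

for i1, i2, target in [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]:
    s, out = forward(i1, i2)
    print(f"I1={i1} I2={i2}  sum={s:+.1f}  output={out}  target={target}")
```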
Epoch: one presentation of the entire training set to the neural
network. In the case of the AND function, an epoch consists of four
sets of inputs being presented to the network (i.e. [0,0], [0,1],
[1,0], [1,1]).

Error: the amount by which the value output by the network differs
from the target value. For example, if we required the network to
output 0 and it output a 1, then Error = -1.

Target value, T: when training a network, we present it not only with
the input but also with the value we require the network to produce.
For example, if we present the network with [1,1] for the AND
function, the target value will be 1.

Output, O: the output value from the neuron.

Ij: the inputs being presented to the neuron.

Wj: the weight from input neuron Ij to the output neuron.

LR: the learning rate. This dictates how quickly the network
converges. It is set by experimentation; typically 0.1.
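The terms above fit the standard perceptron learning rule, Wj ← Wj + LR * (T - O) * Ij (the rule itself is not written out on the slide, so this is the usual formulation). A minimal sketch that learns AND, starting from the initial weights of the earlier example, with the bias handled as an extra input fixed at -1:

```python
LR = 0.1                                       # learning rate, as suggested above
weights = [0.3, 0.5, -0.4]                     # initial weights from the example
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

def output(i1, i2):
    s = -1 * weights[0] + i1 * weights[1] + i2 * weights[2]
    return 1 if s > 0 else 0

for epoch in range(50):                        # a handful of epochs suffices here
    for (i1, i2), target in data:
        o = output(i1, i2)
        error = target - o                     # Error = T - O
        weights[0] += LR * error * (-1)        # bias input is -1
        weights[1] += LR * error * i1
        weights[2] += LR * error * i2

print([output(i1, i2) for (i1, i2), _ in data])   # now matches the AND targets
```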
• In simple cases, divide the feature space by drawing a
hyperplane across it.
• This is known as a decision boundary.
• A discriminant function returns different values on opposite
sides of the boundary (here, a straight line).
• Problems which can be classified this way are linearly
separable.
Linear Separability

[Figure: feature space with axes X1 and X2; class-A points and class-B
points separated by a straight decision boundary.]
<Training examples>

    x1  x2 | output
    0   0  |  -1
    0   1  |  -1
    1   0  |  -1
    1   1  |   1

Decision hyperplane:
    w0 + w1 x1 + w2 x2 = 0
    -0.8 + 0.5 x1 + 0.5 x2 = 0

<Test results>

    x1  x2 | Σ wi xi | output
    0   0  |  -0.8   |  -1
    0   1  |  -0.3   |  -1
    1   0  |  -0.3   |  -1
    1   1  |   0.2   |   1

[Figure: the line -0.8 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane;
only the point (1,1) lies on the positive side.]

• These weights (w0 = -0.8, w1 = w2 = 0.5) implement the AND function.
• The two-input perceptron can implement the OR function when
we set the weights: w0 = -0.3, w1 = w2 = 0.5

<Training examples>

    x1  x2 | output
    0   0  |  -1
    0   1  |   1
    1   0  |   1
    1   1  |   1

Decision hyperplane:
    w0 + w1 x1 + w2 x2 = 0
    -0.3 + 0.5 x1 + 0.5 x2 = 0

<Test results>

    x1  x2 | Σ wi xi | output
    0   0  |  -0.3   |  -1
    0   1  |   0.2   |   1
    1   0  |   0.2   |   1
    1   1  |   0.7   |   1

[Figure: the line -0.3 + 0.5 x1 + 0.5 x2 = 0 in the (x1, x2) plane;
only the point (0,0) lies on the negative side.]
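The two discriminants can be checked with a small sketch, using outputs in {-1, +1} as in the tables:

```python
# Two-input perceptron: output is +1 if w0 + w1*x1 + w2*x2 > 0, else -1.
def perceptron(w0, w1, w2):
    return lambda x1, x2: 1 if (w0 + w1 * x1 + w2 * x2) > 0 else -1

AND = perceptron(-0.8, 0.5, 0.5)   # weights from the AND slide
OR = perceptron(-0.3, 0.5, 0.5)    # weights from the OR slide

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, AND(x1, x2), OR(x1, x2))
```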
It's impossible to implement the XOR function with a single
perceptron.

<Training examples>

    x1  x2 | output
    0   0  |  -1
    0   1  |   1
    1   0  |   1
    1   1  |  -1

A two-layer network of perceptrons can represent the XOR function.
Refer to the decision-hyperplane equation above.

[Figure: the four points in the (x1, x2) plane; the two positive
points (0,1) and (1,0) cannot be separated from the two negative
points (0,0) and (1,1) by a single straight line.]
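One concrete instance of such a two-layer construction (an illustrative choice, not the only one): reuse the OR and AND units from the previous slides as the hidden layer, and let the output unit fire when OR is on but AND is off.

```python
# A perceptron unit with outputs in {-1, +1}.
def unit(w0, w1, w2):
    return lambda a, b: 1 if (w0 + w1 * a + w2 * b) > 0 else -1

h_or = unit(-0.3, 0.5, 0.5)    # OR unit from the slides
h_and = unit(-0.8, 0.5, 0.5)   # AND unit from the slides
out = unit(-0.3, 0.5, -0.5)    # fires iff OR = +1 and AND = -1

def xor(x1, x2):
    return out(h_or(x1, x2), h_and(x1, x2))

print([xor(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# → [-1, 1, 1, -1]
```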
• In the most common family of feed-forward networks, called
"multilayer perceptrons", neurons are organized into layers with
unidirectional connections between them.
• Feed-forward networks are memory-less in the sense that their
output is independent of the previous network state.
• Recurrent or feedback networks, on the other hand, are dynamic
systems.
• In dynamic systems, when a new input pattern is presented, the
neuron outputs are computed. Because of the feedback paths, the
inputs to each neuron are modified, which leads the network to
enter a new state.
• Different network architectures require appropriate learning
algorithms.
• The architecture of a neural network is linked with the
learning algorithm used to train it.
• Three different classes of network architectures:
  – single-layer feed-forward
  – multi-layer feed-forward
  – recurrent
In the feed-forward classes, neurons are organized in acyclic layers.
1) Feed Forward Networks
2) Feed Forward Back Propagation Networks
3) Recurrent (or feedback) Networks
4) Adaptive Resonance Learning Network Model
5) Hopfield Network
6) Self Organising Maps (SOM)
7) Learning Vector Quantisation (LVQ) Networks
Feed Forward Networks:
There are no feedback paths, so the outputs cannot be modified
based on the error between the actual output and the desired
output.

Feed Forward Back Propagation Networks:
These networks are the most common, and enough time has already
been devoted to this network architecture.
[Figure: a single-layer feed-forward network - an input layer of
source nodes connected to an output layer of neurons.]
A collection of neurons forms a "layer".

[Figure: inputs X1-X4 feeding an input layer, a hidden layer, and an
output layer producing y1 and y2; information flows left to right.]

Input layer
  - Each neuron gets ONLY one input, directly from outside.
Hidden layer
  - Connects the input and output layers.
Output layer
  - The output of each neuron goes directly to the outside.
• The input layer
  – Introduces input values into the network.
  – No activation function or other processing.
• The hidden layer(s)
  – Perform classification of features.
  – Two hidden layers are sufficient to solve any problem.
  – More features may imply that more layers are better.
• The output layer
  – Functionally just like the hidden layers.
  – Outputs are passed on to the world outside the neural network.
[Figure: a multilayer network - inputs x1, x2, ..., xn feed layer 1;
each unit applies a summation Σ followed by an activation function f;
layer n produces the outputs y1, y2, ..., yn. The layers are labeled
input layer, hidden layer, and output layer.]
Worked example. Model: Y = f(X1, X2, X3).

Inputs: X1 = 1, X2 = -1, X3 = 2

Hidden neuron 1 (weights 0.5, -0.1, -0.2):
    0.2 = 0.5 * 1 - 0.1 * (-1) - 0.2 * 2
    f(0.2) = 0.55
Hidden neuron 2:
    f(0.9) = 0.71

Activation function:
    f(x) = e^x / (1 + e^x)
    f(0.2) = e^0.2 / (1 + e^0.2) = 0.55

Output neuron (weights 0.1 and -0.2 on the hidden outputs):
    0.1 * 0.55 - 0.2 * 0.71 = -0.087
    f(-0.087) = 0.478

Predicted Y = 0.478
Suppose actual Y = 2.
Then prediction error = (2 - 0.478) = 1.522
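The forward pass above can be sketched as follows. The weights for hidden neuron 1 and the output neuron are taken from the slide; the weights for hidden neuron 2 are an assumption chosen only to reproduce its pre-activation of 0.9.

```python
import math

def f(x):                       # logistic activation f(x) = e^x / (1 + e^x)
    return math.exp(x) / (1 + math.exp(x))

X = [1, -1, 2]                  # X1, X2, X3
w_h1 = [0.5, -0.1, -0.2]        # from the slide
w_h2 = [0.6, -0.1, 0.1]         # ASSUMED: gives 0.6*1 + 0.1 + 0.2 = 0.9
w_out = [0.1, -0.2]             # from the slide

h1 = f(sum(w * x for w, x in zip(w_h1, X)))   # f(0.2) ≈ 0.55
h2 = f(sum(w * x for w, x in zip(w_h2, X)))   # f(0.9) ≈ 0.71
y = f(w_out[0] * h1 + w_out[1] * h2)          # f(-0.087) ≈ 0.478

print(round(y, 3), round(2 - y, 3))           # predicted Y and prediction error
```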
• This 8x3x8 network was trained to learn the identity function.
• 8 training examples are used.
• After 5000 training iterations, the three hidden-unit values
encode the eight distinct inputs using the encoding shown on the
right.

Training pattern: one of n inputs, each with 21 bits.
(From Bioinformatics by David W. Mount, p. 453.)
[Figure: an OCR example - a network with input, hidden, and output
layers mapping a pixel pattern (inputs A1, ..., Am, each 1 or -1) to
one of the letters A-E.]
Recurrent network with hidden neuron(s): the unit-delay operator
z^-1 implies a dynamic system.

• In these networks, feedback loops are present.
• These networks can learn from their mistakes and are highly
adaptive in nature.
• These kinds of networks train slowly and work well with noisy
inputs.
[Figure: a two-layer recurrent network - inputs x1, x2, ..., xn feed
layer 1 and layer 2 (each unit a summation Σ followed by an activation
f), producing outputs y1, y2, ..., yn, with feedback connections
between the layers.]
• Various types of neurons
• Various network architectures
• Various learning algorithms
• Various applications

Example: a network trained for learning head pose and recognizing
1-of-20 faces was 90% accurate.
• A problem-solving paradigm modeled after the physiological
functioning of the human brain.
• A typical training set contains over 100 non-homologous protein
chains comprising more than 15,000 training patterns.
• The number of training patterns is equal to the total number of
residues in the 100 proteins.
• For example, if there are 100 proteins and 150 residues per
protein, there would be 15,000 training patterns.
• Hopfield networks use an energy function as a tool for designing
recurrent networks.
• A Hopfield network is a single-layered network. Its key property
is that it produces a content-addressable memory, which correctly
yields the output from any sub-part (data) of sufficient size.
However, the storage capacity is limited to about 15% of the
number of nodes. This is an important limitation.
• This architecture is used for identification from partially
visible and/or noisy information, e.g. military target
identification and robotic control systems.
• The output of each processing element (PE) is normally binary.
The final output can be binary or continuous.
[Figure: a Hopfield network - a single layer whose outputs feed back
as inputs.]

Hopfield Network
The Hopfield neural network is perhaps the simplest type of neural
network. It is a fully connected, single-layer, autoassociative
network: it has a single layer in which each neuron is connected to
every other neuron. Autoassociative means that if the neural network
recognizes a pattern, it will return that pattern.
Connections in a four-neuron Hopfield network:

                   Neuron 1 (N1)  Neuron 2 (N2)  Neuron 3 (N3)  Neuron 4 (N4)
    Neuron 1 (N1)    (n/a)          N2->N1         N3->N1         N4->N1
    Neuron 2 (N2)    N1->N2         (n/a)          N3->N2         N4->N2
    Neuron 3 (N3)    N1->N3         N2->N3         (n/a)          N4->N3
    Neuron 4 (N4)    N1->N4         N2->N4         N3->N4         (n/a)

Weights used to recall the patterns 0101 and 1010:

    Neuron 1 (N1)    0  -1   1  -1
    Neuron 2 (N2)   -1   0  -1   1
    Neuron 3 (N3)    1  -1   0  -1
    Neuron 4 (N4)   -1   1  -1   0
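Recall with this weight matrix can be sketched as follows: states are mapped to bipolar values (0 → -1, 1 → +1) and each neuron is repeatedly updated to the sign of its weighted input until the state settles.

```python
import numpy as np

# Weight matrix from the table above (stores the patterns 0101 and 1010).
W = np.array([[ 0, -1,  1, -1],
              [-1,  0, -1,  1],
              [ 1, -1,  0, -1],
              [-1,  1, -1,  0]])

def recall(bits, steps=10):
    s = np.where(np.array(bits) == 1, 1, -1)      # 0/1 -> -1/+1
    for _ in range(steps):
        for i in range(len(s)):                   # asynchronous updates
            s[i] = 1 if W[i] @ s > 0 else -1
    return np.where(s == 1, 1, 0).tolist()        # back to 0/1

# A corrupted version of 0101 settles back to the stored pattern:
print(recall([0, 1, 1, 1]))   # → [0, 1, 0, 1]
```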
• Sub-type of recurrent neural nets
  – Fully recurrent
  – Weights are symmetric
  – Nodes can only be on or off
  – Random updating
• Learning: Hebb rule (cells that fire together wire together)
  – Biological equivalent to LTP and LTD
• Can recall a memory if presented with a corrupt or incomplete
version → auto-associative or content-addressable memory
The Kohonen network (or "self-organizing map", SOM for short) was
developed by Teuvo Kohonen.

The basic idea behind the Kohonen network is to set up a structure
of interconnected processing units ("neurons") which compete for the
signal.

While the structure of the map may be quite arbitrary, this package
supports only rectangular and linear maps.
• The SOM defines a mapping from the input data space spanned by
x1..xn onto a one- or two-dimensional array of nodes.
• The mapping is performed in such a way that the topological
relationships in the n-dimensional input space are maintained when
mapped to the SOM.
• In addition, the local density of the data is also reflected by
the map: areas of the input data space which are represented by
more data are mapped to a larger area of the SOM.
Each node of the map is defined by a vector wij whose elements are
adjusted during the training.

The basic training algorithm is quite simple:
 1. select an object from the training set
 2. find the node which is closest to the selected data (i.e. the
    distance between wij and the training data is a minimum)
 3. adjust the weight vectors of the closest node and the nodes
    around it so that the wij move towards the training data
 4. repeat from step 1 for a fixed number of repetitions

The amount of adjustment in step 3, as well as the range of the
neighborhood, decreases during the training. This ensures coarse
adjustments in the first phase of the training, while fine tuning
occurs towards the end.
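The steps above can be sketched for a one-dimensional (linear) map; the map size, learning-rate and neighborhood schedules, and the toy data are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 2))            # toy 2-D training data
nodes = rng.random((10, 2))            # 10 map nodes, vectors w_ij

n_steps = 1000
for t in range(n_steps):
    x = data[rng.integers(len(data))]                       # 1. pick an object
    winner = np.argmin(np.linalg.norm(nodes - x, axis=1))   # 2. closest node
    lr = 0.5 * (1 - t / n_steps)                            # shrinking adjustment
    radius = 3 * (1 - t / n_steps) + 1e-9                   # shrinking neighborhood
    for i in range(len(nodes)):        # 3. move the winner and its neighbors
        h = np.exp(-((i - winner) ** 2) / (2 * radius ** 2))
        nodes[i] += lr * h * (x - nodes[i])
# 4. repeat for a fixed number of repetitions (the loop above)

# Distances between neighboring nodes in data space after training:
print(np.round(np.linalg.norm(np.diff(nodes, axis=0), axis=1), 2))
```

Because coarse adjustments happen while the neighborhood is still wide, neighboring map nodes end up close together in data space, which is the topology preservation described above.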
A special feature of this particular package's implementation is the
availability of cyclic maps. This means that the neighborhood is
extended beyond the map borders and wrapped to the opposite boundary.
In this case a rectangular map becomes a torus, and a linear map
becomes a circle.
The Kohonen map reflects the inner structure of the training data.
However, one cannot say in advance which neurons will be activated by
which input vectors. In addition, the neurons corresponding to some
input vectors after one training run may correspond to another set of
vectors after another training run.

So the SOM has to be calibrated. This can be achieved by presenting
well-known examples to the net and recording which neuron is activated
by a given example vector. As Kohonen maps tend to form a kind of
elastic surface over the range of input vectors of the training data,
neurons which are not activated in the calibration process may be
interpreted by interpolation.
Step 1: Input & Output

Input: X1 X2 X3    Output: Y    Model: Y = f(X1, X2, X3)

# Input neurons  = # inputs  = 3
# Hidden layers  = ???
# Neurons in hidden layer = ???
# Output neurons = # outputs = 1

There is no fixed strategy for choosing the hidden layers and neurons;
proceed by trial and error (Try 1, Try 2, ...).

The architecture is now defined. How do we get the weights?

Given the architecture, there are 8 weights to decide:
    W = (W1, W2, ..., W8)
Training data: (Yi, X1i, X2i, ..., Xpi), i = 1, 2, ..., n
Given a particular choice of W, we get predicted Y's: (V1, V2, ..., Vn).
They are functions of W.
Choose W such that the overall prediction error E is minimized:
    E = Σ (Yi - Vi)²
Step 2: Training the Model (feed forward / back propagation)

• Start with a random set of weights.
• Feed forward the first observation through the net:
      X1 → Network → V1 ;  Error = (Y1 - V1)
• Back propagation: adjust the weights so that this error is
reduced (the network fits the first observation well).
• Feed forward the second observation, and adjust the weights to
fit the second observation well.
• Keep repeating until you reach the last observation.
• This finishes one CYCLE through the data.
• Perform many such training cycles until the overall prediction
error E = Σ (Yi - Vi)² is small.
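The cycle above can be sketched for the 3-2-1 network (8 weights, no bias terms) with the logistic activation from the worked example. The data, learning rate, and cycle count here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    return 1 / (1 + np.exp(-x))    # same as f(x) = e^x / (1 + e^x)

X = rng.random((20, 3))            # 20 observations of (X1, X2, X3)
A1, A2 = rng.normal(size=(3, 2)), rng.normal(size=2)
Y = f(f(X @ A1) @ A2)              # synthetic targets from a hidden "true" net

W1 = rng.normal(size=(3, 2))       # 6 input-to-hidden weights (random start)
W2 = rng.normal(size=2)            # 2 hidden-to-output weights

def E(W1, W2):                     # E = sum_i (Yi - Vi)^2
    return np.sum((Y - f(f(X @ W1) @ W2)) ** 2)

E_before = E(W1, W2)
lr = 0.2
for cycle in range(1000):          # many training cycles
    for x, y in zip(X, Y):         # one cycle = one pass over all observations
        h = f(x @ W1)              # feed forward
        v = f(h @ W2)
        dv = (y - v) * v * (1 - v) # back-propagated error signal
        W1 += lr * np.outer(x, dv * W2 * h * (1 - h))
        W2 += lr * dv * h
E_after = E(W1, W2)
print(E_before, "->", E_after)     # the overall error E shrinks over the cycles
```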
Questions?