CHAPTER 4
Perceptron Learning Rule
Ming-Feng Yeh

Objectives

How do we determine the weight matrix and bias for perceptron networks with many inputs, where it is impossible to visualize the decision boundaries?

The main objective is to describe an algorithm for training perceptron networks, so that they can learn to solve classification problems.

History: 1943

Warren McCulloch and Walter Pitts introduced one of the first artificial neurons in 1943.

The main feature of their neuron model is that a weighted sum of input signals is compared to a threshold to determine the neuron output.

They went on to show that networks of these neurons could, in principle, compute any arithmetic or logic function.

Unlike biological networks, the parameters of their networks had to be designed by hand, as no training method was available.

History: 1950s

Frank Rosenblatt and several other researchers developed a class of neural networks called perceptrons in the late 1950s.

Rosenblatt's key contribution was the introduction of a learning rule for training perceptron networks to solve pattern recognition problems.

The perceptron could learn even when initialized with random values for its weights and biases.

History: ~1980s

Marvin Minsky and Seymour Papert (1969) demonstrated that perceptron networks were incapable of implementing certain elementary functions (e.g., the XOR gate).

It was not until the 1980s that these limitations were overcome with improved (multilayer) perceptron networks and associated learning rules.

The perceptron network remains a fast and reliable network for the class of problems that it can solve.

Learning Rule

Learning rule: a procedure (training algorithm) for modifying the weights and the biases of a network.

The purpose of the learning rule is to train the network to perform some task.

Learning rules fall into three broad categories: supervised learning, unsupervised learning, and reinforcement (graded) learning.

Supervised Learning

The learning rule is provided with a set of examples (the training set) of proper network behavior: {p1, t1}, {p2, t2}, ..., {pQ, tQ}, where pq is an input to the network and tq is the corresponding correct (target) output.

As the inputs are applied to the network, the network outputs are compared to the targets. The learning rule is then used to adjust the weights and biases of the network in order to move the network outputs closer to the targets.

Reinforcement Learning

The learning rule is similar to supervised learning, except that, instead of being provided with the correct output for each network input, the algorithm is only given a grade.

The grade (score) is a measure of the network performance over some sequence of inputs.

It appears to be most suited to control system applications.

Unsupervised Learning

The weights and biases are modified in response to network inputs only. There are no target outputs available.

Most of these algorithms perform some kind of clustering operation. They learn to categorize the input patterns into a finite number of classes. This is especially useful in applications such as vector quantization.

Two-Input / Single-Neuron Perceptron

[Figure: a two-input, single-neuron perceptron with inputs p1 and p2, weights w11 and w12, bias b, and output a, together with its decision boundary n = 0 in the (p1, p2) plane; W points toward the region where a = 1.]

n = Wp + b = w11 p1 + w12 p2 + b
a = hardlim(n)

Decision boundary: n = w11 p1 + w12 p2 + b = 0

With w11 = 1, w12 = 1, b = -1:
n = p1 + p2 - 1 = 0

The boundary is always orthogonal to W.

For p = [2 0]^T:
a = hardlim([1 1][2 0]^T - 1) = hardlim(1) = 1

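As a quick sanity check, here is a minimal Python/NumPy sketch (the hardlim helper and variable names are mine, not from the slides) that evaluates this two-input perceptron for p = [2 0]^T:

import numpy as np

def hardlim(n):
    # hard-limit transfer function: 1 if n >= 0, otherwise 0
    return 1 if n >= 0 else 0

W = np.array([1.0, 1.0])    # weight vector
b = -1.0                    # bias

p = np.array([2.0, 0.0])    # input vector from the slide
n = W @ p + b               # net input: 1*2 + 1*0 - 1 = 1
print(n, hardlim(n))        # expected: 1.0 1
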
Perceptron Network Design

The input/target pairs for the AND gate are

p1 = [0 0]^T, t1 = 0;  p2 = [0 1]^T, t2 = 0;  p3 = [1 0]^T, t3 = 0;  p4 = [1 1]^T, t4 = 1.

[Figure: the four AND-gate input vectors in the (p1, p2) plane with the chosen decision boundary and W drawn orthogonal to it. Dark circle: target 1; light circles: target 0.]

Step 1: Select a decision boundary.
Step 2: Choose a weight vector W that is orthogonal to the decision boundary, e.g., W = [2 2].
Step 3: Find the bias b, e.g., by picking a point on the decision boundary and requiring n = Wp + b = 0.
For p = [1.5 0]^T: b = -Wp = -3.

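A short check script (again Python/NumPy, with my own helper names) that verifies this hand-designed AND perceptron on all four input/target pairs:

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

W = np.array([2.0, 2.0])     # chosen orthogonal to the decision boundary
b = -3.0                     # from the boundary point [1.5, 0]^T

inputs  = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [0, 0, 0, 1]       # AND gate

for p, t in zip(inputs, targets):
    a = hardlim(W @ np.array(p, dtype=float) + b)
    print(p, "->", a, "target", t)    # every output should equal its target
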
Test Problem

The given input/target pairs are

p1 = [1 2]^T, t1 = 1;  p2 = [-1 2]^T, t2 = 0;  p3 = [0 -1]^T, t3 = 0.

[Figure: p1, p2, p3 plotted in the input plane. Dark circle: target 1; light circles: target 0.]

A two-input, one-output network without a bias: the decision boundary must pass through the origin.

The length of the weight vector does not matter; only its direction is important.

Constructing Learning Rule

Training begins by assigning some initial values for the network parameters.

p1 = [1 2]^T, t1 = 1, W = [1.0 -0.8]

a = hardlim(W p1) = hardlim([1.0 -0.8][1 2]^T) = hardlim(-0.6) = 0

The network has not returned the correct value: a = 0 but t1 = 1.

[Figure: p1, p2, p3 and the decision boundary produced by the initial W.]

The initial weight vector results in a decision boundary that incorrectly classifies the vector p1.

Constructing Learning Rule

One approach would be to set W equal to p1, so that p1 would be classified properly in the future.

[Figure: the decision boundary obtained by setting W = p1.]

Unfortunately, it is easy to construct a problem for which this rule cannot find a solution.

Constructing Learning Rule

Another approach would be to add p1 to W. Adding p1 to W would make W point more in the direction of p1.

If t = 1 and a = 0, then W_new = W_old + p.

W_new = W_old + p1^T = [1.0 -0.8] + [1 2] = [2.0 1.2]

[Figure: the updated weight vector and the corresponding decision boundary.]

Constructing Learning Rule

The next input vector is p2.

a = hardlim([2.0 1.2][-1 2]^T) = hardlim(0.4) = 1

A class-0 vector was misclassified as a 1: a = 1 but t2 = 0.

If t = 0 and a = 1, then W_new = W_old - p.

W_new = W_old - p2^T = [2.0 1.2] - [-1 2] = [3.0 -0.8]

[Figure: the weight vector after subtracting p2, with the corresponding decision boundary.]

Constructing Learning Rule

Present the third vector p3.

a = hardlim([3.0 -0.8][0 -1]^T) = hardlim(0.8) = 1

A class-0 vector was misclassified as a 1: a = 1 but t3 = 0.

If t = 0 and a = 1, then W_new = W_old - p.

W_new = W_old - p3^T = [3.0 -0.8] - [0 -1] = [3.0 0.2]

[Figure: the weight vector after subtracting p3, with the corresponding decision boundary.]

Constructing Learning Rule

If we now present any of the input vectors to the neuron, it will output the correct class for that input vector. The perceptron has finally learned to classify the three vectors properly.

The third and final rule: If t = a, then W_new = W_old.

If t = 1 and a = 0, then W_new = W_old + p.
If t = 0 and a = 1, then W_new = W_old - p.
If t = a, then W_new = W_old.

Training sequence: p1 p2 p3 p1 p2 p3 ... (one pass through p1, p2, p3 is one iteration).

Unified Learning Rule

Perceptron error: e = t - a

If t = 1 and a = 0, then W_new = W_old + p.
If t = 0 and a = 1, then W_new = W_old - p.
If t = a, then W_new = W_old.

Equivalently, in terms of the error e:

If e = 1, then W_new = W_old + p.
If e = -1, then W_new = W_old - p.
If e = 0, then W_new = W_old.

Unified form:
W_new = W_old + e p = W_old + (t - a) p
b_new = b_old + e (1) = b_old + (t - a)

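To make the unified rule concrete, here is a minimal Python/NumPy sketch (variable names and the pass cap are mine) that applies it to the three-vector test problem above, starting from the same initial weights W = [1.0 -0.8] and cycling through the inputs until there are no errors:

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

P = [np.array([1.0, 2.0]),    # p1, t1 = 1
     np.array([-1.0, 2.0]),   # p2, t2 = 0
     np.array([0.0, -1.0])]   # p3, t3 = 0
T = [1, 0, 0]

W = np.array([1.0, -0.8])     # initial weight vector (no bias in this problem)

for iteration in range(10):               # a few passes are plenty here
    errors = 0
    for p, t in zip(P, T):
        a = hardlim(W @ p)                # network output
        e = t - a                         # perceptron error
        W = W + e * p                     # unified learning rule
        errors += abs(e)
    print("after pass", iteration + 1, "W =", W)
    if errors == 0:                       # stop once every vector is classified correctly
        break

The first pass reproduces the three updates worked out on the previous slides, ending at W = [3.0 0.2]; the second pass makes no changes, so training stops.
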
Training Multiple-Neuron Perceptron

W_new = W_old + e p^T = W_old + (t - a) p^T
b_new = b_old + e = b_old + (t - a)

With a learning rate α, 0 < α ≤ 1:
W_new = W_old + α e p^T
b_new = b_old + α e

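A brief sketch of a single multi-neuron update step in matrix form (Python/NumPy; the code and variable names are mine). For concreteness it uses the same initial values and first training pair that appear later in Problem P4.5:

import numpy as np

def hardlim(n):
    return (n >= 0).astype(int)

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])        # 2 neurons x 2 inputs
b = np.array([1.0, 1.0])
alpha = 1.0                       # learning rate, here simply 1

p = np.array([1.0, 1.0])          # one input vector
t = np.array([0, 0])              # its target vector

a = hardlim(W @ p + b)            # network output, one element per neuron
e = t - a                         # error vector
W = W + alpha * np.outer(e, p)    # W_new = W_old + alpha * e * p^T
b = b + alpha * e                 # b_new = b_old + alpha * e
print(W, b)

With e = [-1 -1]^T, the outer product e p^T subtracts p^T from every row, giving W = [0 -1; -1 0] and b = [0 0]^T, which matches W(1) and b(1) in Problem P4.5 below.
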
Apple/Orange Recognition Problem

p1 = [1 -1 -1]^T, t1 = 0;  p2 = [1 1 -1]^T, t2 = 1

Initial values: W = [0.5 -1 -0.5], b = 0.5

Present p1:
a = hardlim(Wp1 + b) = hardlim(2.5) = 1,  e = t1 - a = -1
W_new = W_old + e p1^T = [-0.5 0 0.5],  b_new = b_old + e = -0.5

Present p2:
a = hardlim(Wp2 + b) = hardlim(-1.5) = 0,  e = t2 - a = 1
W_new = W_old + e p2^T = [0.5 1 -0.5],  b_new = b_old + e = 0.5

Present p1 again:
a = hardlim(Wp1 + b) = hardlim(0.5) = 1,  e = t1 - a = -1
W_new = W_old + e p1^T = [-0.5 2 0.5],  b_new = b_old + e = -0.5

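The same updates can be reproduced with a short Python/NumPy sketch (the code is mine); it cycles through the two prototype vectors with the unified rule until both are classified correctly:

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

P = [np.array([1.0, -1.0, -1.0]),   # p1, t1 = 0
     np.array([1.0,  1.0, -1.0])]   # p2, t2 = 1
T = [0, 1]

W = np.array([0.5, -1.0, -0.5])     # initial weights
b = 0.5                             # initial bias

for _ in range(20):                  # more passes than needed
    errors = 0
    for p, t in zip(P, T):
        a = hardlim(W @ p + b)
        e = t - a
        W = W + e * p                # W_new = W_old + e p^T
        b = b + e                    # b_new = b_old + e
        errors += abs(e)
    if errors == 0:
        break
print("W =", W, "b =", b)

With these two inputs it stops after the same three updates shown above, ending at W = [-0.5 2 0.5], b = -0.5.
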
Limitations

The perceptron can be used to classify input vectors that can be separated by a linear boundary, like the AND gate example.

Linearly separable: AND, OR and NOT gates.
Not linearly separable: e.g., the XOR gate.

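As an illustration of this limitation (not part of the original slides), the following Python/NumPy sketch applies the perceptron rule to the XOR pairs; because they are not linearly separable, some input is still misclassified no matter how many passes are made:

import numpy as np

def hardlim(n):
    return 1 if n >= 0 else 0

P = [np.array([0.0, 0.0]), np.array([0.0, 1.0]),
     np.array([1.0, 0.0]), np.array([1.0, 1.0])]
T = [0, 1, 1, 0]                      # XOR targets

W = np.zeros(2)
b = 0.0

for epoch in range(100):              # arbitrary cap; the rule never converges on XOR
    errors = 0
    for p, t in zip(P, T):
        a = hardlim(W @ p + b)
        e = t - a
        W, b = W + e * p, b + e
        errors += abs(e)
    if errors == 0:
        break
print("misclassifications in last pass:", errors)   # stays above 0 for XOR
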
Solved Problem P4.3

Design a perceptron network to solve the following four-class problem:

Class 1: (1,1), (1,2), target t = (0,0)
Class 2: (2,-1), (2,0), target t = (0,1)
Class 3: (-1,2), (-2,1), target t = (1,0)
Class 4: (-1,-1), (-2,-2), target t = (1,1)

[Figure: the eight points plotted in the input plane, grouped by class.]

A two-neuron perceptron creates two decision boundaries.

Solution of P4.3

Class 1: p1 = [1 1]^T, p2 = [1 2]^T, t1 = t2 = [0 0]^T
Class 2: p3 = [2 -1]^T, p4 = [2 0]^T, t3 = t4 = [0 1]^T
Class 3: p5 = [-1 2]^T, p6 = [-2 1]^T, t5 = t6 = [1 0]^T
Class 4: p7 = [-1 -1]^T, p8 = [-2 -2]^T, t7 = t8 = [1 1]^T

The first neuron must separate classes 1 and 2 (first target element 0) from classes 3 and 4 (first target element 1); the second neuron must separate classes 1 and 3 from classes 2 and 4. Choosing one decision boundary for each neuron:

1w = [-3 -1]^T, b1 = 1
2w = [1 -2]^T, b2 = 0

W = [1w^T; 2w^T] = [-3 -1; 1 -2],  b = [1 0]^T

[Figure: the two decision boundaries drawn in the input plane, each weight vector pointing toward the region where its neuron outputs 1.]

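A small Python/NumPy check (my own code, not from the slides) that the chosen W and b classify all eight training points correctly:

import numpy as np

def hardlim(n):
    return (n >= 0).astype(int)

W = np.array([[-3.0, -1.0],
              [ 1.0, -2.0]])
b = np.array([1.0, 0.0])

points  = [(1, 1), (1, 2), (2, -1), (2, 0), (-1, 2), (-2, 1), (-1, -1), (-2, -2)]
targets = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]

for p, t in zip(points, targets):
    a = hardlim(W @ np.array(p, dtype=float) + b)
    print(p, "->", a.tolist(), "target", list(t))   # each output pair should equal its target
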
Solved Problem P4.5

Train a perceptron network to solve the P4.3 problem using the perceptron learning rule.

W(0) = [1 0; 0 1],  b(0) = [1 1]^T.   Present p1 = [1 1]^T, t1 = [0 0]^T.

a = hardlim(W(0)p1 + b(0)) = hardlim([2 2]^T) = [1 1]^T
e = t1 - a = [-1 -1]^T

W(1) = W(0) + e p1^T = [0 -1; -1 0]
b(1) = b(0) + e = [0 0]^T

Solution of P4.5

W(1) = [0 -1; -1 0],  b(1) = [0 0]^T.   Present p2 = [1 2]^T, t2 = [0 0]^T.

a = hardlim(W(1)p2 + b(1)) = hardlim([-2 -1]^T) = [0 0]^T
e = t2 - a = [0 0]^T

W(2) = W(1) + e p2^T = [0 -1; -1 0]
b(2) = b(1) + e = [0 0]^T

Solution of P4.5

W(2) = [0 -1; -1 0],  b(2) = [0 0]^T.   Present p3 = [2 -1]^T, t3 = [0 1]^T.

a = hardlim(W(2)p3 + b(2)) = hardlim([1 -2]^T) = [1 0]^T
e = t3 - a = [-1 1]^T

W(3) = W(2) + e p3^T = [-2 0; 1 -1]
b(3) = b(2) + e = [-1 1]^T

Presenting p4 through p8 produces no further errors, so
W(8) = W(7) = W(6) = W(5) = W(4) = W(3)
b(8) = b(7) = b(6) = b(5) = b(4) = b(3)

Solution of P4.5

W(8) = [-2 0; 1 -1],  b(8) = [-1 1]^T.   Present p1 = [1 1]^T, t1 = [0 0]^T again.

a = hardlim(W(8)p1 + b(8)) = hardlim([-3 1]^T) = [0 1]^T
e = t1 - a = [0 -1]^T

W(9) = W(8) + e p1^T = [-2 0; 0 -2]
b(9) = b(8) + e = [-1 0]^T

[Figure: the decision boundaries after this update, plotted in the input plane.]

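The whole P4.5 training run can be reproduced, and carried on to convergence, with a short Python/NumPy sketch; the code is mine, but the initial values and input/target pairs are the ones used above:

import numpy as np

def hardlim(n):
    return (n >= 0).astype(int)

P = [(1, 1), (1, 2), (2, -1), (2, 0), (-1, 2), (-2, 1), (-1, -1), (-2, -2)]
T = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]

W = np.eye(2)                    # W(0)
b = np.array([1.0, 1.0])         # b(0)

for _ in range(50):              # cycle through the training set until no errors occur
    errors = 0
    for p, t in zip(P, T):
        p, t = np.array(p, dtype=float), np.array(t)
        a = hardlim(W @ p + b)
        e = t - a                            # error vector
        if np.any(e != 0):
            W = W + np.outer(e, p)           # W(k+1) = W(k) + e p^T
            b = b + e                        # b(k+1) = b(k) + e
            errors += 1
            print("update: W =", W.tolist(), "b =", b.tolist())
    if errors == 0:
        break

print("final W =", W.tolist(), "b =", b.tolist())

Starting from W(0) and b(0) it prints exactly the updates worked out above, W(1), W(3) and W(9), and then stops: W(9) = [-2 0; 0 -2] and b(9) = [-1 0]^T already classify all eight vectors correctly.
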