
Neural Network Assignment
Due December 10th
The first two questions are required. Question three is 10% extra credit.
1. You are given a simple Perceptron with 3 inputs, A, B, and C, that uses a step
function with a threshold of 0.5 (on the step boundary the output of the step
function is not defined). Find weights between 0 and 1 such that the output is:
(There are multiple correct answers; the weights below are just one possibility.)
a. (A and B) or C    A = B = 0.3, C = 0.6
b. A and (B or C)    A = 0.4, B = C = 0.2
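As a quick check of parts (a) and (b), the sketch below (plain Python; the helper name step_perceptron is mine) enumerates all eight input combinations and confirms that each weight set reproduces its boolean function under the 0.5 threshold:

from itertools import product

def step_perceptron(weights, inputs, threshold=0.5):
    """Step-function Perceptron; the output is undefined exactly on the boundary."""
    total = sum(w * x for w, x in zip(weights, inputs))
    assert total != threshold, "on the step boundary the output is not defined"
    return int(total > threshold)

# part (a): (A and B) or C with weights A = B = 0.3, C = 0.6
# part (b): A and (B or C) with weights A = 0.4, B = C = 0.2
checks = [
    ([0.3, 0.3, 0.6], lambda a, b, c: (a and b) or c),
    ([0.4, 0.2, 0.2], lambda a, b, c: a and (b or c)),
]
for weights, target in checks:
    for a, b, c in product([0, 1], repeat=3):
        assert step_perceptron(weights, [a, b, c]) == int(target(a, b, c))
print("both weight sets implement their boolean functions")

Neither weight set ever produces a weighted sum of exactly 0.5, so the undefined boundary case is never hit.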
2. You are given the following Perceptron to train on the XOR function. Fill out the
table provided in order to determine what the new weights will be after two
training iterations. Use the weight modification rules provided in class for a
sigmoidal excitation function, with the following change: for the hidden and
output neurons, there is a bias that is applied to the sigmoid function. This can be
interpreted as an implied neuron that always fires and whose weight is set equal to
that bias, which allows the biases to be learned along with the weights. Use β = 0.5.
[Figure: the Perceptron to be trained, with inputs a and b, hidden neurons h1 and h2, and output neuron o]
             Initial    t = 1                                    New        t = 2
Case         weights    ab=00    ab=01    ab=10    ab=11         weights    ab=00    ab=01    ab=10    ab=11
erro                    -0.6328   0.3562   0.3572  -0.6512                  -0.6027   0.3888   0.3896  -0.6169
errh1                   -0.0279   0.0155   0.0156  -0.0281                  -0.0203   0.0130   0.0131  -0.0205
errh2                   -0.0279   0.0155   0.0156  -0.0281                  -0.0207   0.0132   0.0133  -0.0209
wbias,h1     0.68       -0.0031   0.0012   0.0015  -0.0016       0.6779     -0.0023   0.0010   0.0012  -0.0012
wa,h1        0.38        0        0        0.0015  -0.0016       0.3799      0        0        0.0012  -0.0012
wb,h1        0.83        0        0.0012   0       -0.0016       0.8295      0        0.0010   0       -0.0012
wbias,h2     0.5        -0.0033   0.0016   0.0014  -0.0019       0.4978     -0.0024   0.0013   0.0012  -0.0014
wa,h2        0.71        0        0        0.0014  -0.0019       0.7095      0        0        0.0012  -0.0014
wb,h2        0.43        0        0.0016   0       -0.0019       0.4297      0        0.0013   0       -0.0014
wbias,o      0.30       -0.0735   0.0408   0.0410  -0.0740       0.2344     -0.0722   0.0462   0.0463  -0.0729
wh1,o        0.19       -0.0488   0.0335   0.0305  -0.0643       0.1409     -0.0479   0.0378   0.0344  -0.0633
wh2,o        0.19       -0.0457   0.0293   0.0316  -0.0619       0.1432     -0.0449   0.0331   0.0357  -0.0610

(For each weight row, the ab=... columns give the per-case weight change ∆w computed with the weights in force at the start of that iteration; the New weights column is the initial weight plus the four ∆w values from t = 1. The erro, errh1, and errh2 rows give the per-case errors at the output and hidden neurons.)
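The table can be regenerated with a short script. The sketch below is a minimal reconstruction under two assumptions consistent with the numbers above: the four per-case changes are accumulated and applied once per iteration (a batch update), and each change is computed as ∆w = β · o_i · o_j(1 − o_j) · err_j, with err_o = c − o at the output and err_h = w_{h,o} · o_o(1 − o_o) · err_o at the hidden neurons, as in the attached derivation. All variable names are my own.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Initial weights from the table; "bias" is the implied always-firing neuron.
w = {("bias", "h1"): 0.68, ("a", "h1"): 0.38, ("b", "h1"): 0.83,
     ("bias", "h2"): 0.50, ("a", "h2"): 0.71, ("b", "h2"): 0.43,
     ("bias", "o"):  0.30, ("h1", "o"): 0.19, ("h2", "o"): 0.19}

beta = 0.5
cases = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]   # xor targets

for t in (1, 2):
    delta = {k: 0.0 for k in w}                 # accumulate over the four cases
    for (a, b), c in cases:
        inp = {"bias": 1.0, "a": a, "b": b}
        # forward pass
        h1 = sigmoid(sum(inp[i] * w[(i, "h1")] for i in inp))
        h2 = sigmoid(sum(inp[i] * w[(i, "h2")] for i in inp))
        o = sigmoid(w[("bias", "o")] + h1 * w[("h1", "o")] + h2 * w[("h2", "o")])
        # errors at the output and hidden neurons
        err = {"o": c - o,
               "h1": w[("h1", "o")] * o * (1 - o) * (c - o),
               "h2": w[("h2", "o")] * o * (1 - o) * (c - o)}
        out = {"h1": h1, "h2": h2, "o": o}
        src = {"h1": inp, "h2": inp, "o": {"bias": 1.0, "h1": h1, "h2": h2}}
        # per-case weight changes: beta * o_i * o_j * (1 - o_j) * err_j
        for (i, j) in w:
            delta[(i, j)] += beta * src[j][i] * out[j] * (1 - out[j]) * err[j]
    for k in w:
        w[k] += delta[k]
    print(f"weights after iteration {t}:", {k: round(v, 4) for k, v in w.items()})

After the first iteration its output should match the New weights column above to within rounding.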
3. Attached is a derivation for the backpropagation rules for Perceptrons that use a
sigmoidal excitation function. Derive an appropriate error function and weight
update rule for Perceptrons that use a tanh excitation function. Make the rules as
simple as possible by choosing appropriate substitutions – such as $o_{i[h]}$ for
$\tanh(\vec{x}_i \cdot \vec{w}_{[i],i[h]})$. (Hint: $\frac{\partial \tanh(x)}{\partial x} = \operatorname{sech}^2(x) = 1 - \tanh^2(x)$.)
From

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[i],j[h]}} = 2\,\tilde{f}\big(f(\vec{x}\cdot\vec{w}_{in})\cdot\vec{w}_{hid}\big)\, w_{hid}\, \tilde{f}(\vec{x}\cdot\vec{w}_{j[h]})\, x_i\, \delta_{out},$

we get

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[i],j[h]}} = 2\, x_i \big(1 - o_{k[o]}^2\big)\, w_{j[h],k[o]} \big(1 - o_{j[h]}^2\big)\, \delta_{out}.$

Comparing that to

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[h],j[o]}} = 2\, f(\vec{x}_i\cdot\vec{w}_{[i],i[h]})\, \tilde{f}\big(f(\vec{x}\cdot\vec{w}_{in})\cdot\vec{w}_{hid}\big)\, \delta_{out} = 2\, o_{i[h]} \big(1 - o_{j[o]}^2\big)\, \delta_{out},$

we see that if you define $\delta_{j[h]} = w_{j[h],k[o]} \big(1 - o_{k[o]}^2\big)\, \delta_{out}$, then

$\frac{\partial P_{\vec{x},c}}{\partial w_{i,j}} = 2\, o_i \big(1 - o_j^2\big)\, \delta_j,$

where $\Delta w_{i,j} = \beta \sum_{(\vec{x},c)\in T} \frac{\partial P_{\vec{x},c}}{\partial w_{i,j}}$ still holds.
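The tanh rule can be checked numerically. The sketch below is a minimal example: a made-up 2-2-1 tanh network (all weights, inputs, and helper names here are invented purely for the check) compares $2\, o_i (1 - o_j^2)\, \delta_j$ against a centered finite difference of the performance $P = -(F - c)^2$:

from math import tanh

def forward(w_in, w_hid, x):
    """2-2-1 tanh network: w_in[i][j] feeds input i to hidden j, w_hid[j] feeds hidden j to the output."""
    h = [tanh(sum(xi * w_in[i][j] for i, xi in enumerate(x))) for j in range(2)]
    o = tanh(sum(hj * wj for hj, wj in zip(h, w_hid)))
    return h, o

def perf(w_in, w_hid, x, c):
    """Performance P = -(F(w, x) - c)^2, as in the attached derivation."""
    _, o = forward(w_in, w_hid, x)
    return -(o - c) ** 2

# made-up training pair and weights, purely for the check
x, c = [0.3, -0.8], 1.0
w_in  = [[0.2, -0.4], [0.5, 0.1]]
w_hid = [0.7, -0.3]

h, o = forward(w_in, w_hid, x)
delta_o = -(o - c)                                        # output error
delta_h = [wj * (1 - o ** 2) * delta_o for wj in w_hid]   # hidden errors for tanh

# claimed rule: dP/dw_{i,j} = 2 * o_i * (1 - o_j^2) * delta_j
analytic_hid = 2 * h[0] * (1 - o ** 2) * delta_o          # weight hidden-0 -> output
analytic_in  = 2 * x[0] * (1 - h[1] ** 2) * delta_h[1]    # weight input-0 -> hidden-1

eps = 1e-6
w_hid[0] += eps;      p_hi = perf(w_in, w_hid, x, c)
w_hid[0] -= 2 * eps;  p_lo = perf(w_in, w_hid, x, c); w_hid[0] += eps
print(analytic_hid, (p_hi - p_lo) / (2 * eps))

w_in[0][1] += eps;      p_hi = perf(w_in, w_hid, x, c)
w_in[0][1] -= 2 * eps;  p_lo = perf(w_in, w_hid, x, c); w_in[0][1] += eps
print(analytic_in, (p_hi - p_lo) / (2 * eps))

Each printed pair should agree to several decimal places.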
Derivation of Backpropagation for Sigmoid
For each time step, we present all training data to our multi-layered Perceptron.
The change in the weight from “neuron” i to “neuron” j (where i’s output is being fed
into j) is given by

$\Delta w_{i,j} = \beta \sum_{(\vec{x},c) \in T} \frac{\partial P_{\vec{x},c}}{\partial w_{i,j}},$

where β is a parameter that controls how quickly the weights change, and $P_{\vec{x},c}$ is the
performance of the network for the training data with input $\vec{x}$ and correct output c. This
performance is defined as

$P_{\vec{x},c} = -\big(F(\vec{w},\vec{x}) - c\big)^2.$
If we define the output error (i.e., the error at the last level of our Perceptron) to be

$\delta_{out} = -\big(F(\vec{w},\vec{x}) - c\big),$

then clearly we can rewrite the performance as

$P_{\vec{x},c} = -\delta_{out}^{\,2}.$
We can now calculate the derivative of our performance with respect to a given weight
$w_{i,j}$ as

$\frac{\partial P_{\vec{x},c}}{\partial w_{i,j}} = -2\big(F(\vec{w},\vec{x}) - c\big)\,\frac{\partial F(\vec{w},\vec{x})}{\partial w_{i,j}} = 2\,\delta_{out}\,\frac{\partial F(\vec{w},\vec{x})}{\partial w_{i,j}}.$
We will now adopt the convention that neuron i will be identified as an input
neuron by the addition of the symbol [i], as a hidden neuron by the addition of the symbol
[h] (we could use [h1] and [h2] if we had two hidden layers), and as an output neuron by
the addition of the symbol [o]. Therefore a weight connecting input neuron i to hidden
neuron j would be represented by $w_{i[i],j[h]}$. Additionally, all inputs to a hidden neuron i
will be represented as $\vec{x}_i$, and the weights associated with those inputs by $\vec{w}_{[i],i[h]}$. The output
of this neuron is therefore $f(\vec{x}_i \cdot \vec{w}_{[i],i[h]})$, or $o_{i[h]}$. We will also use the convention that for
an output neuron j, the inputs can be represented as $f(\vec{x} \cdot \vec{w}_{in})$, so that the output from that
neuron is $f\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)$, or $o_{j[o]}$. This notation allows us to see that there are two
special cases to consider when taking the derivative of $F(\vec{w},\vec{x})$ with respect to $w_{i,j}$.
The first case is if $w_{i,j} \in \vec{w}_{hid}$. Using the chain rule we find

$\frac{\partial F(\vec{w},\vec{x})}{\partial w_{i[h],j[o]}} = \tilde{f}\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\, f(\vec{x}_i \cdot \vec{w}_{[i],i[h]}),$

where $\tilde{f}(x) = \frac{\partial f(x)}{\partial x}$. For the sigmoid function, $\tilde{f}(x) = f(x)\big(1 - f(x)\big)$.
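For reference, this identity follows in one line, assuming the usual logistic form of the sigmoid, $f(x) = 1/(1 + e^{-x})$ (the standard choice, not stated explicitly above):

$\tilde{f}(x) = \frac{\partial}{\partial x}\,\frac{1}{1 + e^{-x}} = \frac{e^{-x}}{(1 + e^{-x})^2} = \frac{1}{1 + e^{-x}}\cdot\Big(1 - \frac{1}{1 + e^{-x}}\Big) = f(x)\big(1 - f(x)\big).$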
The second case is if $w_{i,j} \in \vec{w}_{in}$. After a double application of the chain rule we find

$\frac{\partial F(\vec{w},\vec{x})}{\partial w_{i[i],j[h]}} = \tilde{f}\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\, w_{hid}\, \tilde{f}(\vec{x} \cdot \vec{w}_{j[h]})\, x_i.$
This leads to performance derivatives that can be written as
$\frac{\partial P_{\vec{x},c}}{\partial w_{i[h],j[o]}} = 2\, f(\vec{x}_i \cdot \vec{w}_{[i],i[h]})\, \tilde{f}\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\, \delta_{out},$ or

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[h],j[o]}} = 2\, f(\vec{x}_i \cdot \vec{w}_{[i],i[h]})\, f\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\Big(1 - f\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\Big)\, \delta_{out},$

for the hidden neurons. If we use the alternative identifications $o_{i[h]} = f(\vec{x}_i \cdot \vec{w}_{[i],i[h]})$ and
$o_{j[o]} = f\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)$, this becomes

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[h],j[o]}} = 2\, o_{i[h]}\, o_{j[o]}\big(1 - o_{j[o]}\big)\, \delta_{out}.$
Likewise, for the input neurons, we find

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[i],j[h]}} = 2\, \tilde{f}\big(f(\vec{x} \cdot \vec{w}_{in}) \cdot \vec{w}_{hid}\big)\, w_{hid}\, \tilde{f}(\vec{x} \cdot \vec{w}_{j[h]})\, x_i\, \delta_{out},$ or

$\frac{\partial P_{\vec{x},c}}{\partial w_{i[i],j[h]}} = 2\, x_i\, o_{j[h]}\big(1 - o_{j[h]}\big)\, w_{j[h],k[o]}\, o_{k[o]}\big(1 - o_{k[o]}\big)\, \delta_{out}.$

If we then define the error for the hidden neurons to be $\delta_{j[h]} = w_{j[h],k[o]}\, o_{k[o]}\big(1 - o_{k[o]}\big)\, \delta_{out}$,
and consider the inputs $x_i$ to be the outputs from the input neurons $o_{i[i]}$, then we can
finally combine both equations into

$\frac{\partial P_{\vec{x},c}}{\partial w_{i,j}} = 2\, o_i\, o_j\big(1 - o_j\big)\, \delta_j.$
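To make the combined rule concrete, here is a minimal sketch of these gradients as code for a single-hidden-layer sigmoid Perceptron (no bias inputs, a single output neuron; the function and variable names are mine). It computes $\delta_{out}$, the hidden $\delta_j$, and $\partial P/\partial w$ for every weight exactly as derived above:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_gradients(w_in, w_hid, x, c):
    """dP/dw for every weight of a single-hidden-layer sigmoid Perceptron,
    using the combined rule dP/dw_{i,j} = 2 * o_i * o_j * (1 - o_j) * delta_j."""
    # forward pass: w_in[i][j] connects input i to hidden j, w_hid[j] connects hidden j to the output
    o_hid = [sigmoid(sum(xi * w_in[i][j] for i, xi in enumerate(x)))
             for j in range(len(w_hid))]
    o_out = sigmoid(sum(oh * wh for oh, wh in zip(o_hid, w_hid)))
    # errors: delta_out at the output, delta_j for each hidden neuron
    d_out = -(o_out - c)
    d_hid = [w_hid[j] * o_out * (1 - o_out) * d_out for j in range(len(w_hid))]
    # gradients, treating each input x_i as the output of input neuron i
    g_hid = [2 * o_hid[j] * o_out * (1 - o_out) * d_out for j in range(len(w_hid))]
    g_in = [[2 * x[i] * o_hid[j] * (1 - o_hid[j]) * d_hid[j]
             for j in range(len(w_hid))] for i in range(len(x))]
    return g_in, g_hid

Summing β times these derivatives over the training pairs then gives the weight change $\Delta w_{i,j}$ from the update rule at the start of the derivation.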