323-670 Artificial Intelligence

Chapter 6
Neural Networks
Hopfield Networks
• Hopfield [1982]: a theory of memory, p. 490
• Model of content-addressable memory, p. 491
– distributed representation
– distributed, asynchronous control
– content-addressable memory
– fault tolerance
• Figure 18.1, p. 490
– black unit = active
– white unit = inactive
Hopfield Networks
– units are connected to each other by weighted, symmetric connections
– a positive weighted connection indicates that the two units tend to activate each other
– a negative weighted connection allows an active unit to deactivate a neighboring unit
Parallel relaxation algorithm
The network operates as follows:
– a random unit is chosen
– if any of its neighbors are active, the unit computes the sum of the weights on the connections to those active neighbors
– if the sum is positive, the unit becomes active; otherwise it becomes inactive
– another random unit is chosen, and the process repeats until the network reaches a stable state (i.e., until no unit can change state)
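The steps above can be sketched in Python (a minimal illustration, not the text's code; the `weights` dictionary and the 0/1 state encoding are assumptions made here):

```python
import random

def unit_update(weights, state, i):
    """New activation for unit i: threshold the sum of weights on
    connections to its *active* neighbors. A unit with no active
    neighbors keeps its current state."""
    n = len(state)
    active = [j for j in range(n)
              if j != i and state[j] == 1 and (i, j) in weights]
    if not active:
        return state[i]
    return 1 if sum(weights[(i, j)] for j in active) > 0 else 0

def parallel_relaxation(weights, state, max_steps=10000):
    """Repeatedly update randomly chosen units until no unit would change."""
    state = list(state)
    n = len(state)
    for _ in range(max_steps):
        i = random.randrange(n)
        state[i] = unit_update(weights, state, i)
        if all(unit_update(weights, state, k) == state[k] for k in range(n)):
            return state  # stable: no unit can change state
    return state
```

With a single positive symmetric connection, e.g. `weights = {(0, 1): 2, (1, 0): 2}` and initial state `[1, 0]`, the network settles into the stable state `[1, 1]`.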
Hopfield Networks
• Figure 18.1, p. 490: A Hopfield network
– a black (active) unit with a positive connection will attempt to activate the units connected to it
• Figure 18.2, p. 491: Four stable states, storing the patterns
– given any set of weights and any initial state, the parallel relaxation algorithm will settle into one of these four states
Hopfield Networks
• Figure 18.3, p. 491: Model of content-addressable memory
– to retrieve a pattern, we apply a portion of it by setting the activities of the units to correspond to the partial pattern
– the network will then settle into the stable state that best matches the partial pattern
– this shows the local minimum = nearest stable state
• Figure 18.4, p. 492: What a Hopfield network computes
– how the network moves from one state to another
Hopfield Networks
– Problem: sometimes the network cannot find the global solution; it gets stuck in a local minimum because the units settle into stable states via a completely distributed algorithm.
– For example, in Figure 18.4, if the network reaches stable state A, no single unit is willing to change its state in order to move uphill, so the network will never reach the globally optimal state B.
Perceptron
 A perceptron (Rosenblatt, 1962)
 models a neuron by taking a weighted sum of its inputs and outputting 1 if the sum is greater than some adjustable threshold value (otherwise it outputs 0)
 Figures 18.5-18.7, p. 493-494: Threshold functions
 Figure 18.8: Intelligent system
 g(x) = Σ (i = 1 to n) wi xi
 output(x) = 1 if g(x) > 0, 0 otherwise
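In code, the threshold unit above is a single comparison (a sketch; the input encoding is an assumption):

```python
def perceptron_output(weights, x):
    """g(x) = sum_i w_i * x_i; output 1 if the weighted sum exceeds the
    (zero) threshold, otherwise 0."""
    g = sum(w * xi for w, xi in zip(weights, x))
    return 1 if g > 0 else 0
```

An adjustable threshold t can be folded into the weights by adding a weight w0 = -t on a constant input x0 = 1, which is the convention the next slide uses.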
Perceptron
 in the case of a zero threshold with two inputs:
 g(x) = w0 + w1x1 + w2x2 = 0
 x2 = -(w1/w2)x1 - (w0/w2)  the equation of a line
 the location of the line is determined by the weights w0, w1, and w2
 if an input vector lies on one side of the line, the perceptron outputs 1
 if it lies on the other side, the perceptron outputs 0
 Decision surface: a line that correctly separates the training instances corresponds to a perfectly functioning perceptron
 See Figure 18.9 p. 496
Decision surface
 the absolute value of g(x) tells how far a given input vector x lies from the decision surface
 so it tells us how good a set of weights is
 let w be the weight vector (w0, w1, ..., wn)
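As a sketch (using the weight-vector convention above), |g(x)| ranks inputs by distance from the surface; dividing by the length of (w1, ..., wn) turns it into a true Euclidean distance:

```python
import math

def g(w, x):
    """g(x) = w0 + w1*x1 + ... + wn*xn, with weight vector w = (w0, ..., wn)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def distance_to_surface(w, x):
    """Perpendicular distance from x to the decision surface g(x) = 0."""
    return abs(g(w, x)) / math.sqrt(sum(wi * wi for wi in w[1:]))
```

For example, w = (0, 0, 1) describes the line x2 = 0, and the point (5, 2) lies at distance 2 from it.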
Multilayer perceptron
Figure 18.10, p. 497: Adjusting the weights by gradient descent (hill-climbing downhill on the error surface)
See the Fixed-Increment Perceptron Learning algorithm
Figure 18.11, p. 499: A perceptron learning to solve a classification problem: K = 10, K = 100, K = 635
 Figure 18.12, p. 500: XOR is not linearly separable
 We need a multilayer perceptron to solve the XOR problem
 See Figure 18.13, p. 500, where x1 = 1 and x2 = 1
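A minimal sketch of the fixed-increment rule named above (the 0/1 target encoding and the leading bias input are assumptions made here): whenever the perceptron misclassifies an input, add the input vector to the weights if the target was 1 and subtract it if the target was 0.

```python
def fixed_increment_learning(samples, n_weights, max_epochs=100):
    """Train a zero-threshold perceptron by fixed-increment updates.

    samples : list of (x, target) pairs; x carries a leading 1 so that
              w[0] acts as the (negated) threshold.
    """
    w = [0.0] * n_weights
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if out != target:
                errors += 1
                delta = 1 if target == 1 else -1
                w = [wi + delta * xi for wi, xi in zip(w, x)]
        if errors == 0:        # every sample classified correctly: done
            return w
    return w

# AND is linearly separable, so this converges; XOR (Figure 18.12) would not.
and_data = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
weights = fixed_increment_learning(and_data, 3)
```

The loop stops as soon as an epoch produces no errors, which the perceptron convergence theorem guarantees will happen for linearly separable data.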
Backpropagation Algorithm
 Parker 1985; Rumelhart et al. 1986
 a fully connected, feedforward, multilayer network
 Figure 18.14, p. 502
 fast, resistant to damage, learns efficiently
 see Figure 18.15, p. 503
 used for classification problems
 uses a sigmoid activation function (S-shaped): it produces a real value between 0 and 1 as output; see Figure 18.16, p. 503
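The sigmoid itself is tiny (a sketch):

```python
import math

def sigmoid(z):
    """S-shaped activation: maps any real value into the interval (0, 1),
    with sigmoid(0) = 0.5 and smooth saturation at the extremes."""
    return 1.0 / (1.0 + math.exp(-z))
```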
Backpropagation Algorithm
 Figure 18.14, p. 502
 start with a random set of weights
 the network adjusts its weights each time it sees an input-output pair
 each pair requires two stages:
1) a forward pass: present a sample input to the network and let activations flow until they reach the output layer
2) a backward pass: the network's actual output (from the forward pass) is compared with the target output, and error estimates are computed for the output units
Backpropagation Algorithm
 The weights connected to the output units can be adjusted to reduce the errors.
 We can then use the error estimates of the output units to derive error estimates for the units in the hidden layers.
 Finally, errors are propagated back to the connections stemming from the input units.
Backpropagation Algorithm
 p. 504-506: initialize the weights to small random values in [-0.1, 0.1], initialize the activation of the thresholding unit, and choose a learning rate η
 choose an input-output pair
 oj = the network's actual output (the value the network computes)
 yj = the target output (the true value of the training data)
 adjust the weights between the hidden layer and the output layer (w2ij)
 adjust the weights between the input layer and the hidden layer (w1ij)
 input layer (xi) --w1ij--> hidden layer (hj) --w2ij--> output layer (oj)
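The two passes and both weight adjustments can be sketched for one hidden layer as follows (an illustrative implementation, not the text's exact pseudocode; the squared-error deltas o(1-o)(y-o) are the standard sigmoid form, and explicit bias units are omitted for brevity):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_pair(x, y, w1, w2, eta=0.25):
    """One backpropagation update for a single input-output pair.

    w1[i][j] connects input i to hidden unit j; w2[j][k] connects hidden
    unit j to output unit k. Weight matrices are updated in place.
    """
    # forward pass: activations flow from input to output
    h = [sigmoid(sum(x[i] * w1[i][j] for i in range(len(x))))
         for j in range(len(w1[0]))]
    o = [sigmoid(sum(h[j] * w2[j][k] for j in range(len(h))))
         for k in range(len(w2[0]))]

    # backward pass: error estimates for output units, then hidden units
    d_out = [o[k] * (1 - o[k]) * (y[k] - o[k]) for k in range(len(o))]
    d_hid = [h[j] * (1 - h[j]) * sum(w2[j][k] * d_out[k] for k in range(len(o)))
             for j in range(len(h))]

    # adjust hidden-to-output weights (w2), then input-to-hidden weights (w1)
    for j in range(len(h)):
        for k in range(len(o)):
            w2[j][k] += eta * d_out[k] * h[j]
    for i in range(len(x)):
        for j in range(len(h)):
            w1[i][j] += eta * d_hid[j] * x[i]
    return o
```

Repeatedly presenting the same input-output pair drives the network's actual output oj toward the target yj.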
Backpropagation Algorithm
 Backpropagation updates its weights after seeing each input-output pair. After it has seen all the input-output pairs (and adjusted its weights that many times), one epoch has been completed.
 training for more epochs generally improves the network's performance
 we can speed up training by adding a momentum term α
 see the equation on p. 506
 perceptron convergence theorem (Rosenblatt, 1962): guarantees that the perceptron will find a solution...
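A momentum step in its standard form (a sketch; the text's own equation is the one on p. 506, and `eta` and `alpha` here are the conventional names for the learning rate and momentum term):

```python
def momentum_update(w, grad, prev_step, eta=0.25, alpha=0.9):
    """One weight update with momentum: each step adds a fraction alpha of
    the previous step, so updates accelerate along persistent gradient
    directions and damp out oscillations."""
    step = [eta * g + alpha * v for g, v in zip(grad, prev_step)]
    new_w = [wi + s for wi, s in zip(w, step)]
    return new_w, step
```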
Backpropagation Algorithm
 Generalization
 Figure 18.17 p.508
 A good network should be capable of storing entire training sets and should have a setting of weights that describes the mapping in general, for all cases, not just the individual input-output pairs.
Reinforcement Learning
 uses a punishment-and-reward system (as with animals)
1) the network is presented with a sample input from the training set
2) the network computes what it thinks the output should be
3) the network is supplied with a real-valued judgment by a teacher:
   a positive value indicates good performance
   a negative value indicates bad performance
4) the network adjusts its weights, and the process repeats
 the goal is to receive positive values, i.e., to perform well
 a form of supervised learning (a teacher provides feedback)
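One very simple reward-guided scheme (an illustrative sketch, not the text's specific rule; `judge` stands in for the teacher's real-valued judgment):

```python
import random

def reinforcement_step(weights, judge, scale=0.1):
    """Perturb the weights at random, ask the teacher for a judgment of
    the result, and keep the change only if the judgment improved."""
    trial = [w + random.uniform(-scale, scale) for w in weights]
    return trial if judge(trial) > judge(weights) else weights
```

With a judge that rewards weights close to some target setting, repeated steps climb toward it: each accepted change strictly improves the judgment.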
Unsupervised Learning
 no feedback on its outputs
 no teacher required
 given a set of input data, the network is allowed to discover regularities and relations between the different parts of the input
 feature discovery: Figure 18.18, p. 511, Data for unsupervised learning
 3 types of animals: 1) mammals 2) reptiles 3) birds
Unsupervised Learning
 we need to make sure that only one of the three output units becomes active for any given input
 see Figure 18.19, p. 512: A competitive learning network
 uses winner-take-all behavior
Unsupervised Learning
 simple competitive learning algorithm, p. 512-513:
1) present an input vector
2) calculate the initial activation for each output unit
3) let the output units fight until only one is active
4) adjust the weights on the input lines that lead to the single active output unit: increase the weights on connections between the active output unit and active input units (this makes it more likely that the output unit will be active the next time the pattern is presented)
5) repeat steps 1 to 4 for all input patterns for many epochs
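Steps 1 to 5 can be sketched as follows (one standard form of the step-4 adjustment, the normalized Rumelhart-Zipser rule, in which the increase on active connections is balanced by a matching decrease elsewhere; the binary input encoding is an assumption made here):

```python
def competitive_learning(patterns, weights, eta=0.2, epochs=10):
    """weights[k][i] connects input line i to output unit k."""
    for _ in range(epochs):                                    # step 5
        for x in patterns:                                     # step 1
            acts = [sum(wk[i] * x[i] for i in range(len(x)))
                    for wk in weights]                         # step 2
            winner = acts.index(max(acts))                     # step 3: winner-take-all
            n_active = sum(x)
            # step 4: shift the winner's weights toward the active inputs
            weights[winner] = [(1 - eta) * w + eta * (xi / n_active)
                               for w, xi in zip(weights[winner], x)]
    return weights
```

With two output units and two non-overlapping input patterns, each unit comes to specialize in one pattern: its weight mass concentrates on that pattern's active input lines.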
Recurrent Networks
 Jordan 1986
 used in temporal AI tasks: planning, natural language processing
 we need more than a single output vector; we need a series of output vectors
 Figure 18.22, p. 518: A Jordan network
 Figure 18.23, p. 519: A recurrent network with a mental model
The End