
Activations, attractors, and
associators
Jaap Murre
Universiteit van Amsterdam and Universiteit Utrecht
[email protected]
Quiz
• In what way does a neural network neuron ('node') abstract from a biological neuron?
• Name at least 5 characteristics
Overview
• Interactive activation model
• Hopfield networks
• Constraint satisfaction
• Attractors
• Traveling salesman problem
• Hebb rule and Hopfield networks
• Bidirectional associative networks
• Linear associative networks
Much of perception is dealing with ambiguity
[Figure: ambiguous letter string, e.g., LAB]
Many interpretations are processed in parallel
[Figure: ambiguous letter string, e.g., CAB]
The final interpretation must
satisfy many constraints
In the recognition of letters and words:
i. Only one word can occur at a given position
ii. Only one letter can occur at a given
position
iii. A letter-on-a-position activates a word
iv. A feature-on-a-position activates a letter
[Figure: interactive activation network with word nodes LAP, CAP, and CAB and letter-on-position nodes L.., C.., .A., ..P, and ..B, illustrating each of the four constraints]

Recognition of a letter is a process of constraint satisfaction

[Figure: the same network settling step by step as the constraints are satisfied]
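The constraint-satisfaction idea can be sketched in a few lines of Python/NumPy. This is a minimal illustration, not the original interactive activation model: the connection strengths, update rate, and the tiny LAP/CAP/CAB lexicon are made-up values chosen only to show the four constraints at work.

```python
import numpy as np

# Letter-on-position nodes and word nodes from the example above
letters = ["L1", "C1", "A2", "P3", "B3"]          # letter at position 1, 2, or 3
words   = ["LAP", "CAP", "CAB"]
spelling = {"LAP": ["L1", "A2", "P3"],
            "CAP": ["C1", "A2", "P3"],
            "CAB": ["C1", "A2", "B3"]}
n_l, n_w = len(letters), len(words)

EXC, INH = 0.2, -0.2   # hypothetical connection strengths (not the original values)

# Letter<->word weights: excitatory when consistent, inhibitory otherwise (constraints iii and iv)
W_lw = np.full((n_w, n_l), INH)
for wi, w in enumerate(words):
    for li, l in enumerate(letters):
        if l in spelling[w]:
            W_lw[wi, li] = EXC

# Within-level inhibition: only one word, and only one letter per position (constraints i and ii)
W_ww = INH * (np.ones((n_w, n_w)) - np.eye(n_w))
same_pos = np.array([[l1[-1] == l2[-1] for l2 in letters] for l1 in letters], dtype=float)
W_ll = INH * (same_pos - np.eye(n_l))

def step(a_l, a_w, ext, rate=0.1):
    """One synchronous update: net input = external evidence + between- and within-level input."""
    net_l = ext + W_lw.T @ a_w + W_ll @ a_l
    net_w = W_lw @ a_l + W_ww @ a_w
    a_l = np.clip(a_l + rate * net_l, 0, 1)
    a_w = np.clip(a_w + rate * net_w, 0, 1)
    return a_l, a_w

# Ambiguous input: weak evidence for L or C at position 1, strong evidence for A2 and B3
ext = np.array([0.5, 0.5, 1.0, 0.0, 1.0])
a_l, a_w = np.zeros(n_l), np.zeros(n_w)
for _ in range(50):
    a_l, a_w = step(a_l, a_w, ext)
print(dict(zip(words, a_w.round(2))))   # CAB should end up as the most active word
```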
Hopfield (1982)
• Bipolar activations
– -1 or 1
• Symmetric weights (no self-weights)
– $w_{ij} = w_{ji}$
• Asynchronous update rule
– Select one neuron randomly and update it
• Simple threshold rule for updating
Energy of a Hopfield network
Energy $E = -\tfrac{1}{2}\sum_{i,j} w_{ji} a_i a_j$
The terms of $E$ involving node $j$: $E_j = -\tfrac{1}{2}\sum_i (w_{ji} a_i + w_{ij} a_i)\, a_j = -\sum_i w_{ji} a_i\, a_j$ (using $w_{ij} = w_{ji}$)
Net input to node $j$ is $\sum_i w_{ji} a_i = \text{net}_j$
Thus, we can write $E_j = -\text{net}_j\, a_j$
Given a net input $\text{net}_j$, find $a_j$ so that $-\text{net}_j\, a_j$ is minimized
• If $\text{net}_j$ is positive, set $a_j$ to 1
• If $\text{net}_j$ is negative, set $a_j$ to -1
• If $\text{net}_j$ is zero, don't care (leave $a_j$ as is)
• This activation rule ensures that the energy never increases
• Hence, eventually the energy will reach a minimum value
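A minimal Python/NumPy sketch of the asynchronous update rule and the energy function described above; the small random symmetric weight matrix is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(W, a):
    """E = -1/2 * sum_ij w_ji a_i a_j"""
    return -0.5 * a @ W @ a

def update_async(W, a, steps=100):
    """Asynchronous updates: pick a random node and set it to the sign of its net input."""
    a = a.copy()
    n = len(a)
    for _ in range(steps):
        j = rng.integers(n)
        net_j = W[j] @ a            # net input to node j
        if net_j > 0:
            a[j] = 1
        elif net_j < 0:
            a[j] = -1
        # if net_j == 0: leave a[j] as is
    return a

# Small illustrative network: symmetric weights, zero self-weights
n = 6
W = rng.normal(size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0)

a = rng.choice([-1, 1], size=n)     # bipolar activations
print("energy before:", energy(W, a))
a = update_async(W, a)
print("energy after: ", energy(W, a))   # never higher than before
```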
Attractor
• An attractor is a stationary network state
(configuration of activation values)
• This is a state where it is not possible to
minimize the energy any further by just
flipping one activation value
• It may be possible to reach a deeper
attractor by flipping many nodes at once
• Conclusion: The Hopfield rule does not
guarantee that an absolute energy minimum
will be reached
Attractor
[Figure: energy landscape with a local minimum and a global minimum]
Example: 8-Queens problem
• Place 8 queens on a chess board such that no queen can capture another
• This implies the following three constraints:
– 1 queen per column
– 1 queen per row
– at most 1 queen on any diagonal
• This encoding of the constraints ensures
that the attractors of the network correspond
to valid solutions
The constraints are satisfied by inhibitory connections
[Figure: each board-position node inhibits all nodes in its column, its row, and its diagonals]
Problem: how to ensure that
exactly 8 nodes are 1?
• A term may be added to control for this in
the activation rule
• Binary nodes may be used with a bias
• It is also possible to use continuous-valued nodes with Hopfield networks (e.g., between 0 and 1)
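As a sketch of how this could be set up in Python/NumPy: one binary node per board square, inhibitory weights between any two squares that attack each other, and a bias term to encourage nodes to switch on. The inhibition strength, bias value, and number of update steps are illustrative assumptions; as the slides note, a run may end in a local minimum with fewer than 8 queens.

```python
import numpy as np

N = 8
rng = np.random.default_rng(1)

def attacks(r1, c1, r2, c2):
    """True if two squares attack each other (same row, column, or diagonal)."""
    return r1 == r2 or c1 == c2 or abs(r1 - r2) == abs(c1 - c2)

# One node per square; inhibitory weight between any two attacking squares
n = N * N
W = np.zeros((n, n))
INH = -2.0                          # illustrative inhibition strength
for i in range(n):
    for j in range(n):
        if i != j and attacks(*divmod(i, N), *divmod(j, N)):
            W[i, j] = INH

bias = 1.0                          # illustrative bias encouraging nodes to be 1

# Binary nodes (0/1), asynchronous threshold updates
a = rng.integers(0, 2, size=n).astype(float)
for _ in range(20000):
    j = rng.integers(n)
    a[j] = 1.0 if W[j] @ a + bias > 0 else 0.0

board = a.reshape(N, N)
print(board.astype(int))
print("queens placed:", int(board.sum()))   # 8 only if a global minimum was reached
```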
Traveling Salesman Problem
The energy minimization question
can also be turned around
• Given $a_i$ and $a_j$, how should we set the weight $w_{ji} = w_{ij}$ so that the energy is minimized?
• $E = -\tfrac{1}{2}\sum_{i,j} w_{ji} a_i a_j$, so that
– when $a_i a_j = 1$, $w_{ji}$ must be positive
– when $a_i a_j = -1$, $w_{ji}$ must be negative
• For example, $w_{ji} = \mu\, a_i a_j$, where $\mu$ is a learning constant
Hebb and Hopfield
• When used with Hopfield type activation
rules, the Hebb learning rule places patterns
at attractors
• If a network has n nodes, 0.15n random
patterns can be reliably stored by such a
system
• For complete retrieval it is typically
necessary to present the network with over
90% of the original pattern
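A Python/NumPy sketch of Hebbian storage in a Hopfield network: the weights are the sum of Hebbian outer products of the stored bipolar patterns (scaled by 1/n, an illustrative choice of the learning constant), and retrieval uses the asynchronous threshold rule from earlier. The network size, number of patterns, and 10% cue corruption are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_patterns = 100, 10            # 0.1*n patterns, within the ~0.15*n capacity limit

# Random bipolar (-1/1) patterns
patterns = rng.choice([-1, 1], size=(n_patterns, n))

# Hebbian storage: scaled sum of outer products, no self-weights
W = patterns.T @ patterns / n
np.fill_diagonal(W, 0)

def retrieve(W, a, steps=5000):
    """Asynchronous threshold updates, as in the Hopfield sketch above."""
    a = a.copy()
    for _ in range(steps):
        j = rng.integers(len(a))
        net_j = W[j] @ a
        if net_j != 0:
            a[j] = 1 if net_j > 0 else -1
    return a

# Cue: one stored pattern with 10% of its elements flipped
cue = patterns[0].copy()
flip = rng.choice(n, size=n // 10, replace=False)
cue[flip] *= -1

out = retrieve(W, cue)
print("fraction of elements recovered:", (out == patterns[0]).mean())   # typically 1.0
```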
Bidirectional Associative
Memories (BAM, Kosko 1988)
• Uses binary nodes (0 or 1)
• Symmetric weights
• Input and output layer
• Layers are updated in order, using a threshold activation rule
• Nodes within a layer are updated synchronously
BAM
• BAM is in fact a Hopfield network with two
layers of nodes
• Within a layer, weights are 0
• These neurons are not dependent on each
other (no mutual inputs)
• If updated synchronously, there is therefore
no danger of increasing the network energy
• BAM is similar to the core of Grossberg’s
Adaptive Resonance Theory (Lecture 4)
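A minimal Python/NumPy sketch of the BAM update scheme for a single stored pair: two layers, symmetric weights between them, none within a layer, and the layers updated in turn with each layer updated synchronously. The pattern pair and the Hebb-style weight construction are made-up illustrations.

```python
import numpy as np

# One illustrative binary pattern pair to associate
x = np.array([1, 0, 1, 1, 0])        # input-layer pattern
y = np.array([0, 1, 1])              # output-layer pattern

# Hebb-style weights between the layers (built from bipolar copies of the binary patterns)
W = np.outer(2 * y - 1, 2 * x - 1)   # W[j, i] connects input node i to output node j

def threshold(net, old):
    """Binary threshold rule; leave a node unchanged when its net input is exactly 0."""
    return np.where(net > 0, 1, np.where(net < 0, 0, old))

# Start from a degraded input and an empty output layer, then update the layers in turn
a_x = np.array([1, 0, 1, 0, 0])
a_y = np.zeros(3, dtype=int)
for _ in range(5):
    a_y = threshold(W @ a_x, a_y)      # input layer -> output layer (synchronous)
    a_x = threshold(W.T @ a_y, a_x)    # output layer -> input layer (symmetric weights)

print(a_x, a_y)   # settles on the stored pair (1,0,1,1,0) and (0,1,1)
```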
Linear Associative Networks
• Invented by Kohonen (1972), Nakano (1972), and Anderson (1972)
• Two layers
• Linear activation rule
– Activation is equal to net input
• Can store patterns
• Their behavior is mathematically tractable
using matrix algebra
Associating an input vector p with an
output vector q
Storage: W = qpT
with  = (pTp)-1
Recall: Wp = qpTp = pTpq = q
Inner product pTp gives a scalar
With input vector $p = (3, 0, 1, 4, 0, 1)^T$:
$p^T p = 3^2 + 0^2 + 1^2 + 4^2 + 0^2 + 1^2 = 9 + 0 + 1 + 16 + 0 + 1 = 27$
$\alpha = (p^T p)^{-1} = 1/27$
Outer product qpT gives a matrix
With output vector $q = (1, 2, 0, 2, 4, 1)^T$ and input vector $p^T = (3\ 0\ 1\ 4\ 0\ 1)$:

$q p^T = \begin{pmatrix} 3 & 0 & 1 & 4 & 0 & 1 \\ 6 & 0 & 2 & 8 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 6 & 0 & 2 & 8 & 0 & 2 \\ 12 & 0 & 4 & 16 & 0 & 4 \\ 3 & 0 & 1 & 4 & 0 & 1 \end{pmatrix}$

The weight matrix is this matrix divided by the constant $p^T p = 27$.
Final weight matrix:

$W = \frac{1}{27}\, q p^T = \begin{pmatrix} 0.11 & 0 & 0.04 & 0.15 & 0 & 0.04 \\ 0.22 & 0 & 0.07 & 0.30 & 0 & 0.07 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0.22 & 0 & 0.07 & 0.30 & 0 & 0.07 \\ 0.44 & 0 & 0.15 & 0.59 & 0 & 0.15 \\ 0.11 & 0 & 0.04 & 0.15 & 0 & 0.04 \end{pmatrix}$
Recall: $W p = q$

$W p = \begin{pmatrix} 0.11 & 0 & 0.04 & 0.15 & 0 & 0.04 \\ 0.22 & 0 & 0.07 & 0.30 & 0 & 0.07 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0.22 & 0 & 0.07 & 0.30 & 0 & 0.07 \\ 0.44 & 0 & 0.15 & 0.59 & 0 & 0.15 \\ 0.11 & 0 & 0.04 & 0.15 & 0 & 0.04 \end{pmatrix} \begin{pmatrix} 3 \\ 0 \\ 1 \\ 4 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 0 \\ 2 \\ 4 \\ 1 \end{pmatrix}$

For example, for the first two elements of the output vector:
$0.11 \cdot 3 + 0 \cdot 0 + 0.04 \cdot 1 + 0.15 \cdot 4 + 0 \cdot 0 + 0.04 \cdot 1 = 1$
$0.22 \cdot 3 + 0 \cdot 0 + 0.07 \cdot 1 + 0.30 \cdot 4 + 0 \cdot 0 + 0.07 \cdot 1 = 2$
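The same worked example can be checked in a few lines of NumPy (a sketch; values are rounded to two decimals to match the matrix shown above).

```python
import numpy as np

p = np.array([3, 0, 1, 4, 0, 1], dtype=float)   # input vector
q = np.array([1, 2, 0, 2, 4, 1], dtype=float)   # output vector

alpha = 1 / (p @ p)                              # (p^T p)^-1 = 1/27
W = alpha * np.outer(q, p)                       # storage: W = alpha * q p^T

print(np.round(W, 2))                            # the weight matrix shown above
print(W @ p)                                     # recall: W p = q -> [1. 2. 0. 2. 4. 1.]
```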
Storing n patterns
Storage: $W_k = \alpha_k\, q_k p_k^T$, with $\alpha_k = (p_k^T p_k)^{-1}$
$W = W_1 + W_2 + \dots + W_k + \dots + W_n$
Recall: $W p_k = \alpha_k\, q_k p_k^T p_k + \text{Error} = q_k + \text{Error}$
$\text{Error} = W_1 p_k + \dots + W_h p_k + \dots + W_n p_k$ (all terms with $h \neq k$),
which is 0 only if $p_h^T p_k = 0$ for all $h \neq k$
Conclusion
• LANs only work well if the input patterns are (nearly) orthogonal
• If an input pattern overlaps with others, recall will be contaminated with the output patterns of those overlapping patterns
• It is therefore important that input patterns are orthogonal (i.e., have little or no overlap)
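A small NumPy sketch of this point, using made-up pattern pairs: recall is exact when the stored input vectors are orthogonal and contaminated by cross-talk when they overlap.

```python
import numpy as np

def store(pairs):
    """Sum of alpha_k * q_k p_k^T over all stored pattern pairs."""
    return sum(np.outer(q, p) / (p @ p) for p, q in pairs)

# Orthogonal input patterns: recall is exact
p1, q1 = np.array([1., 0., 0., 0.]), np.array([1., 2., 3.])
p2, q2 = np.array([0., 1., 0., 0.]), np.array([4., 5., 6.])
W = store([(p1, q1), (p2, q2)])
print(W @ p1)          # [1. 2. 3.] = q1, no error term

# Overlapping (non-orthogonal) input patterns: recall is contaminated
p3 = np.array([1., 1., 0., 0.])
W = store([(p1, q1), (p3, q2)])
print(W @ p1)          # q1 plus a fraction of q2 (cross-talk)
```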
LANs have limited
representational power
• For each three-layer LAN, there exists an equivalent two-layer LAN
• Proof: Suppose that $q = W p$ and $r = V q$; then $r = V q = V W p = X p$, with $X = V W$
[Figure: a three-layer network $p \to q \to r$ with weight matrices $W$ and $V$, and the equivalent two-layer network $p \to r$ with weight matrix $X$]
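The proof amounts to associativity of matrix multiplication, which a few lines of NumPy illustrate (the matrix sizes and values here are arbitrary examples).

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.normal(size=(4, 5))     # first layer of weights:  p (5 nodes) -> q (4 nodes)
V = rng.normal(size=(3, 4))     # second layer of weights: q (4 nodes) -> r (3 nodes)
p = rng.normal(size=5)

X = V @ W                                  # the equivalent two-layer network
print(np.allclose(V @ (W @ p), X @ p))     # True: r = V(Wp) = (VW)p = Xp
```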
Summing up
• There is a wide variety of ways to store and
retrieve patterns in neural networks based
on the Hebb rule
– Willshaw network (associator)
– BAM
– LAN
– Hopfield network
• In Hopfield networks, stored patterns can be
viewed as attractors
Summing up
• Finding an attractor is a process of constraint satisfaction. It can be used as:
– A recognition model
– A memory retrieval model
– A way of solving the traveling salesman
problem and other difficult problems