
What is Unsupervised Learning?
• Learning without a teacher.
• No feedback to indicate the desired outputs.
• The network must by itself discover the relationships of interest from the input data.
The Nearest Neighbor Classifier
[Figure: four stored samples x(1), x(2), x(3), x(4), one per class 1–4, plotted in a 2-D feature space; a new input of unknown class ("? Class") is assigned the class of its nearest stored sample.]
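To make the idea concrete, here is a minimal sketch of a 1-nearest-neighbor classifier in Python (the sample data and names below are illustrative, not from the slides):

import numpy as np

def nearest_neighbor_classify(x, samples, labels):
    # Assign x the label of the closest stored sample (Euclidean distance).
    dists = np.linalg.norm(samples - x, axis=1)   # distance from x to every stored sample
    return labels[np.argmin(dists)]               # label of the nearest one

# Four stored samples x(1)..x(4), one per class, as in the figure.
samples = np.array([[0.0, 1.0], [1.0, 0.0], [4.0, 4.0], [5.0, 5.0]])
labels = np.array([1, 2, 3, 4])
print(nearest_neighbor_classify(np.array([0.8, 0.3]), samples, labels))  # -> 2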
The Hamming Network
• Stores a set of classes represented by a set of binary prototypes.
• Given an incomplete binary input, finds the class to which it belongs.
• Uses the Hamming distance as the distance measurement.
• Distance vs. similarity.
The Hamming Net
[Figure: the input vector x feeds a similarity-measurement layer, whose outputs drive a winner-take-all MAXNET.]
The Hamming Distance
Example (m = 7): let x and y be bipolar vectors with components in {−1, 1}. Comparing them component by component, they agree in 4 positions and disagree in 3, so the component-wise products x_i y_i sum to 4 − 3 = 1, and
HD(x, y) = (1/2)(7 − 1) = 3
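This example can be verified in code. A minimal Python sketch, using the general identity HD(x, y) = (1/2)(m − x^T y) that the next slide derives (the function names and the particular sign pattern are mine):

import numpy as np

def hamming_distance(x, y):
    # HD(x, y) = (1/2)(m - x^T y) for bipolar vectors with entries in {-1, 1}.
    m = len(x)
    return 0.5 * (m - np.dot(x, y))

def similarity(x, y):
    # Similarity(x, y) = m - HD(x, y) = (1/2)m + (1/2) x^T y.
    return 0.5 * len(x) + 0.5 * np.dot(x, y)

# A 7-component example with 4 agreements and 3 disagreements, so x^T y = 1.
x = np.array([ 1, -1, -1,  1,  1,  1,  1])
y = np.array([ 1,  1, -1,  1, -1,  1, -1])
print(hamming_distance(x, y))  # 3.0, matching HD = (1/2)(7 - 1)
print(similarity(x, y))        # 4.0, matching m - HD = 7 - 3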
The Hamming Distance
y = (y_1, y_2, …, y_m)^T, y_i ∈ {−1, 1}
x = (x_1, x_2, …, x_m)^T, x_i ∈ {−1, 1}
HD(x, y) = ?
Similarity(x, y) = ?

HD(x, y) = (1/2)(m − x^T y)
Similarity(x, y) = m − (1/2)(m − x^T y) = (1/2)m + (1/2) x^T y
The Hamming Net
[Figure: inputs x_1, …, x_m feed n similarity-measurement nodes; their outputs drive an n-node winner-take-all MAXNET with outputs y_1, …, y_n.]
The Hamming Net
[Figure: the same architecture with the weights still to be determined: WS = ? for the similarity-measurement layer and WM = ? for the MAXNET.]
Stored n patterns s^1, s^2, …, s^n, where s^k = (s_1^k, s_2^k, …, s_m^k)^T with s_i^k ∈ {−1, 1}.
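The slides leave WM open here; a common convention for the MAXNET is self-excitation +1 and mutual inhibition −ε with 0 < ε < 1/n, iterated until a single node stays active. A minimal sketch under that assumption (ε and the names are my own choices):

import numpy as np

def maxnet(y0, eps=0.1, max_iters=100):
    # Iterate y_i <- max(0, y_i - eps * sum of the OTHER activations)
    # until only one node remains positive (the winner).
    y = np.array(y0, dtype=float)
    for _ in range(max_iters):
        y = np.maximum(0.0, y - eps * (y.sum() - y))
        if np.count_nonzero(y) <= 1:
            break
    return y

print(maxnet([3.0, 4.0, 2.5, 3.9]))  # only the node that started at 4.0 stays positive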
The Stored Patterns
Stored n patterns s^1, s^2, …, s^n, where s^k = (s_1^k, s_2^k, …, s_m^k)^T with s_i^k ∈ {−1, 1}.
Similarity(x, s^k) = (1/2)m + (1/2) x^T s^k = (1/2)m + (1/2) Σ_{i=1}^{m} x_i s_i^k
[Figure: the same network (WS = ?, WM = ?), with similarity node k computing Similarity(x, s^k) and the winner-take-all MAXNET selecting the largest.]
The Stored Patterns
Similarity(x, s^k) = (1/2)m + (1/2) x^T s^k = (1/2)m + (1/2) Σ_{i=1}^{m} x_i s_i^k
Hence similarity node k can be implemented as a linear unit with weights (1/2)s_1^k, (1/2)s_2^k, …, (1/2)s_m^k on inputs x_1, …, x_m and a bias of m/2.
[Figure: similarity-measurement node k with connection weights (1/2)s_i^k and bias m/2.]
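Putting the two layers together: below is a minimal Python sketch of the similarity layer with the weights s_i^k / 2 and bias m/2 derived above, using argmax as a stand-in for the MAXNET iteration (the names and the example patterns are illustrative):

import numpy as np

def hamming_net_classify(x, prototypes):
    # prototypes: (n, m) matrix whose rows are the stored bipolar patterns s^1..s^n.
    n, m = prototypes.shape
    WS = 0.5 * prototypes              # similarity-layer weights: s_i^k / 2
    bias = m / 2.0                     # every similarity node has bias m/2
    sims = WS @ x + bias               # Similarity(x, s^k) = m/2 + (x^T s^k)/2
    return int(np.argmax(sims))        # winner-take-all in place of MAXNET

# Two stored 7-bit prototypes; classify a one-bit-corrupted copy of the second.
protos = np.array([[ 1,  1,  1, -1, -1, -1,  1],
                   [-1, -1,  1,  1,  1, -1, -1]])
x = np.array([-1,  1,  1,  1,  1, -1, -1])
print(hamming_net_classify(x, protos))  # -> 1 (the second prototype)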
• Weight update: w_j ← w_j + Δw_j
  – Method 1: Δw_j = η(i_l − w_j)
  – Method 2: Δw_j = η·i_l
[Figure: vector diagrams of the two updates, w_j + η(i_l − w_j) and w_j + η·i_l; in each method, w_j is moved closer to i_l.]
  – Normalize the weight vector to unit length after it is updated: w_j ← w_j / ||w_j||
  – Sample input vectors are also normalized: i_l ← i_l / ||i_l||
  – Distance: ||i_l − w_j|| = sqrt( Σ_i (i_{l,i} − w_{j,i})^2 )
• w_j moves toward the center of a cluster of sample vectors after repeated weight updates.
  – Node j wins for three training samples: i_1, i_2, and i_3.
  – Initial weight vector: w_j(0).
  – After being trained successively on i_1, i_2, and i_3, the weight vector changes to w_j(1), w_j(2), and w_j(3).
[Figure: w_j(0) is pulled toward i_1, i_2, and i_3 in turn, ending at w_j(3) near the center of the cluster.]
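Both update rules, the normalization step, and this repeated-win behavior in one minimal Python sketch (η = 0.1, the sample values, and the function name are my own choices):

import numpy as np

def competitive_update(w_j, i_l, eta=0.1, method=1):
    # Move the winning weight vector w_j toward the input i_l, then renormalize.
    if method == 1:
        w_j = w_j + eta * (i_l - w_j)   # Method 1: delta w_j = eta * (i_l - w_j)
    else:
        w_j = w_j + eta * i_l           # Method 2: delta w_j = eta * i_l
    return w_j / np.linalg.norm(w_j)    # keep w_j at unit length, as required

# Three unit-length samples from one cluster, like the slide's i_1, i_2, i_3.
cluster = [np.array([0.6, 0.8]), np.array([0.5, 0.866]), np.array([0.707, 0.707])]
w_j = np.array([1.0, 0.0])              # initial weight vector w_j(0)
for i_l in cluster * 5:                 # node j repeatedly wins for these samples
    w_j = competitive_update(w_j, i_l)
print(w_j)                              # ends up near the cluster's center direction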
Example
[Figure: w_1 lies between the two sample clusters while w_2 is far from both.]
• w_1 will always win, no matter which class the sample comes from.
• w_2 is stuck and will not participate in learning.
• To unstick it: let output nodes have some "conscience" and temporarily shut off nodes that have had a very high winning rate (though it is hard to determine what that rate should be).
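One way to read the "conscience" idea as code: track each node's winning rate and penalize frequent winners during the competition. A minimal sketch under that assumption (the additive penalty form and the constant gamma are illustrative, not from the slides):

import numpy as np

def conscience_winner(x, W, win_counts, total, gamma=1.0):
    # Pick the winner by distance plus a penalty on nodes that win too often.
    dists = np.linalg.norm(W - x, axis=1)     # distance from x to each weight vector
    win_rates = win_counts / max(total, 1)    # fraction of wins per node so far
    j = int(np.argmin(dists + gamma * win_rates))
    win_counts[j] += 1                        # update the winner's count
    return j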
Example
[Figure: two different orders of sample presentation lead w_1 and w_2 to different final positions.]
• Results depend on the sequence of sample presentation.
• Solution: initialize each w_j to a randomly selected input vector i_l, choosing vectors that are far away from each other.
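One concrete reading of "far away from each other" is greedy farthest-point selection; a minimal sketch under that assumption (the function name is mine):

import numpy as np

def far_apart_init(samples, n_nodes, rng=np.random.default_rng(0)):
    # Pick n_nodes input vectors as initial weights, each far from those already chosen.
    chosen = [samples[rng.integers(len(samples))]]   # first vector at random
    while len(chosen) < n_nodes:
        # distance from every sample to its nearest already-chosen vector
        d = np.min([np.linalg.norm(samples - c, axis=1) for c in chosen], axis=0)
        chosen.append(samples[np.argmax(d)])         # take the farthest sample
    return np.array(chosen)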