Unsupervised Learning
Neural networks for unsupervised learning attempt to discover interesting structure in the data, without making use of information about the class of an example.
K-means Clustering
– Initialize K weight vectors, e.g. to randomly chosen examples. Each weight vector represents a cluster.
– Assign each input example x to the cluster c(x) with the nearest corresponding weight vector:
  c(x) = \arg\min_j \| x - w_j(n) \|
– Update the weights:
  w_j(n+1) = \frac{1}{P_j} \sum_{x : c(x) = j} x
  with P_j the number of examples assigned to cluster j.
– Increment n by 1 and repeat until no noticeable changes of the weight vectors occur.
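A minimal NumPy sketch of this procedure; the function and variable names (kmeans, X, K) are illustrative and not part of the slides.

import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    # X: (N, d) array of examples; K: number of clusters.
    rng = np.random.default_rng(seed)
    # Initialize the K weight vectors to randomly chosen examples.
    w = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each example to the cluster with the nearest weight vector.
        d = np.linalg.norm(X[:, None, :] - w[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # Move each weight vector to the mean of the examples assigned to it.
        w_new = np.array([X[c == j].mean(axis=0) if np.any(c == j) else w[j]
                          for j in range(K)])
        if np.allclose(w_new, w):        # stop when the weights no longer change
            break
        w = w_new
    return w, c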
Example I
[Figures: initial data and seeds; final clustering]
Example II
[Figures: initial data and seeds; final clustering]
Problems
• How many clusters?
  – Use a given parameter K
• What similarity measure? (two of these are sketched after this list)
  – Euclidean distance
  – Correlation coefficient
  – Ad-hoc similarity measure
• How to assess the quality of a clustering?
  – Compact and well separated clusters are better … many different quality measures have been introduced.
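Two of the similarity measures listed above, as a small hedged sketch (the function names are illustrative):

import numpy as np

def euclidean_distance(x, y):
    # Smaller distance means higher similarity.
    return np.linalg.norm(np.asarray(x) - np.asarray(y))

def correlation_similarity(x, y):
    # Pearson correlation coefficient between two feature vectors.
    return np.corrcoef(x, y)[0, 1]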
What Is Good Clustering?
• A good clustering method will produce high quality
clusters with
– high intra-class similarity
– low inter-class similarity
• The quality of a clustering result depends on both
the similarity measure used by the method and its
implementation.
• The quality of a clustering method is also measured
by its ability to discover hidden structures.
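As one hedged illustration of such a quality measure (an illustrative choice, not one prescribed by the slides), the ratio of the average within-cluster distance to the smallest distance between cluster centers rewards compact, well separated clusters; lower is better.

import numpy as np

def compactness_over_separation(X, labels, centers):
    # Average distance of each example to its own cluster center (compactness).
    within = np.mean([np.linalg.norm(x - centers[c]) for x, c in zip(X, labels)])
    # Smallest distance between any two cluster centers (separation).
    between = min(np.linalg.norm(centers[i] - centers[j])
                  for i in range(len(centers)) for j in range(i + 1, len(centers)))
    return within / between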
Self Organizing Maps (SOM)
• SOM is an unsupervised neural network that approximates an arbitrarily large amount of input data by a finite set of nodes arranged in a grid, where neighboring nodes correspond to more similar input data.
• The model is produced by a learning
algorithm that automatically orders the inputs
on a one or two-dimensional grid according
to their mutual similarity.
Biological Motivation
Nearby areas of the cortex correspond to related
brain functions
Brain’s self-organization
The brain maps the external multidimensional representation of the world into a similar 1- or 2-D internal representation.
That is, the brain processes external signals in a topology-preserving way, and our computational system should be able to do the same.
Self-Organized Map: idea
[Figure: data points (o) and network parameters W (') in a 3-D feature space with axes x, y, z; the input neurons' weights W are assigned to processors arranged on a 2-D grid.]
– Data: vectors X = (X¹, X², X³) from a 3-dimensional space.
– SOM: a grid (lattice) of nodes, with a local processor (called a neuron) in each node; each local processor j has d = 3 adaptive parameters w₁, w₂, w₃ ≡ W_j.
– Goal: change W_j to recover the data clusters in X space.
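These structures can be sketched directly; the grid size, initialization range and names below are illustrative assumptions.

import numpy as np

grid_h, grid_w, d = 10, 10, 3            # 2-D grid of nodes, d = 3 input components
rng = np.random.default_rng(0)
# One adaptive weight vector W_j = (w1, w2, w3) per node of the lattice.
W = rng.uniform(-0.1, 0.1, size=(grid_h, grid_w, d))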
Concept of the SOM I.
[Figure: the input space (input layer) is mapped to a reduced feature space (map layer); the cluster centers (code vectors) are clustered and ordered on a two-dimensional grid, which fixes their place in the reduced space.]
SOM Formalization
• Every input data component is connected to each neuron of the lattice.
• The topology of the lattice makes it possible to define a neighborhood structure on the neurons, like those illustrated below.
[Figures: a 2-dimensional topology with two possible neighborhoods; a 1-dimensional topology with a small neighborhood.]
A 2-dimensional SOM with 3D inputs
[Figure: a 2-dimensional lattice of neurons fully connected to a layer of input nodes carrying the 3-D input.]
SOM: interpretation
• Each SOM neuron can be seen as the representative of a cluster containing all the input examples that are mapped to that neuron.
• For a given input, the output of the SOM is the neuron whose weight vector is most similar to that input.
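A hedged sketch of this mapping, where W holds one weight vector per neuron (flattened to shape (n_neurons, d)); the function name is illustrative.

import numpy as np

def winning_neuron(W, x):
    # Index of the neuron whose weight vector is closest to the input x.
    return int(np.argmin(np.linalg.norm(W - x, axis=1)))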
SOM: learning
• Through repeated presentations of the
training examples, the weight vectors of
the neurons are adapted and tend to
follow the distribution of the examples.
• This results in a topological ordering of the
neurons, where neurons adjacent to each
other in the map tend to have similar
weight vectors.
• The input space of patterns is mapped
into a discrete output space of neurons.
SOM: learning algorithm
• Initialization. n = 0. Choose small random values for the weight vector components.
• Sampling. Select a pattern x from the input examples.
• Similarity matching. Find the winning neuron i(x) at iteration n:
  i(x) = \arg\min_j \| x(n) - w_j(n) \|
• Updating. Adjust the weight vectors of all neurons using the rule
  w_j(n+1) = w_j(n) + \eta(n) \, h_{i,j} \, (x(n) - w_j(n))
• Continuation. n = n + 1. Go to the Sampling step until no noticeable changes in the weights are observed.
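A compact sketch of this loop for a 1-dimensional lattice, using the exponential decay schedules introduced on the following slides; all parameter values and names are illustrative assumptions.

import numpy as np

def train_som(X, n_neurons=100, n_iter=5000, eta0=0.1, sigma0=10.0,
              T1=1000.0, T2=1000.0, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-0.1, 0.1, size=(n_neurons, X.shape[1]))   # initialization
    pos = np.arange(n_neurons)                                  # lattice coordinates
    for n in range(n_iter):
        x = X[rng.integers(len(X))]                             # sampling
        i = np.argmin(np.linalg.norm(W - x, axis=1))            # similarity matching
        sigma = sigma0 * np.exp(-n / T1)                        # neighborhood width decay
        eta = eta0 * np.exp(-n / T2)                            # learning rate decay
        h = np.exp(-(pos - i) ** 2 / (2 * sigma ** 2))          # Gaussian neighborhood
        W += eta * h[:, None] * (x - W)                         # update all neurons
    return W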
Neighborhood Function
Gaussian neighborhood function:
  h_{i,j} = \exp\left( - \frac{\| r_j - r_i \|^2}{2 \sigma^2} \right)
where r_i and r_j are the lattice positions of the winning neuron i and of neuron j; in a 1-dimensional lattice this distance is simply | j - i |.
σ measures the degree of cooperation in the
learning process of the excited neurons in the
vicinity of the winning neuron.
In the learning phase σ is updated at each iteration during the ordering phase using the following exponential decay update rule:
  \sigma(n) = \sigma_0 \exp(-n / T_1)
with parameters \sigma_0 and T_1.
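For a 2-dimensional lattice, r_i and r_j are grid coordinates; a minimal sketch (the example values are illustrative):

import numpy as np

def gaussian_neighborhood(r_i, r_j, sigma):
    # h_{i,j} = exp(-||r_j - r_i||^2 / (2 sigma^2))
    diff = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2 * sigma ** 2)))

# Two neurons one grid step apart with sigma = 2:
print(gaussian_neighborhood((3, 4), (3, 5), sigma=2.0))   # ≈ 0.88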
Cooperation
[Plot: the Gaussian neighborhood function h_{i,j} against the lattice distance k, for σ = 50, 30, 20 and 10; larger σ gives a wider neighborhood.]
Update Rule
  w_j(n+1) = w_j(n) + \eta(n) \, h_{i,j}(n) \, (x - w_j(n))
The learning rate parameter also has an exponential decay update:
  \eta(n) = \eta_0 \exp(-n / T_2)
Weight update
[Diagram: the weight vector w_j(n) is moved by \eta(n) h_{i,j}(n) (x - w_j(n)), i.e. a fraction of the difference x - w_j(n), toward the input x, giving w_j(n+1).]
Two-phase learning approach
1- Self-organizing or ordering phase. The learning rate and the spread of the Gaussian neighborhood function are adapted during the execution of SOM, using for instance the exponential decay update rules.
2- Convergence phase. The learning rate and the Gaussian spread are kept at small fixed values during the execution of SOM.
Training: Ordering Phase
• Self-organizing or ordering phase:
  – The topological ordering of the weight vectors takes place during this phase.
  – It may take 1000 or more iterations of the SOM algorithm.
• The choice of the parameter values is important.
• With a proper initial setting of the parameters, the neighborhood of the winning neuron initially includes almost all neurons in the network and then shrinks slowly with time.
Training: Convergence Phase
• Convergence phase:
  – Fine-tunes the weight vectors.
  – The number of iterations must be at least 500 times the number of neurons in the network ⇒ thousands or tens of thousands of iterations.
• Choice of parameter values:
  – η(n) is maintained on the order of 0.01.
  – The neighborhood function is chosen so that the neighborhood of the winning neuron contains only its nearest neighbors; it eventually reduces to one or zero neighboring neurons.
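A hedged sketch of a schedule consistent with these guidelines; the specific constants are illustrative assumptions, not values given by the slides.

import math

def two_phase_schedule(n, eta0=0.1, sigma0=10.0, T1=1000.0, T2=1000.0,
                       n_ordering=1000):
    if n < n_ordering:
        # Ordering phase: learning rate and neighborhood width decay exponentially.
        return eta0 * math.exp(-n / T2), sigma0 * math.exp(-n / T1)
    # Convergence phase (run for at least 500 x the number of neurons iterations):
    # small fixed learning rate, neighborhood reduced to the nearest neighbors.
    return 0.01, 1.0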
Visualization
• Neurons are visualized as changing
positions in the weight space as learning
takes place. Each neuron is described by
the corresponding weight vector consisting
of the weights of the links from the input
layer to that neuron.
• Two neurons are connected by an edge if
they are direct neighbors in the NN lattice.
Example 1
A two-dimensional lattice driven by a two-dimensional distribution:
• 100 neurons arranged in a 2D lattice of 10 x 10 nodes.
• Input is two-dimensional: x = (x1, x2), drawn from a uniform distribution over the region { (-1 < x1 < +1); (-1 < x2 < +1) }.
• Weights are initialized with small random values.
Example 1
[Figure: visualization of the lattice in weight space.]
Initial h function (Example 1)
Example 2
A one-dimensional lattice driven by a two-dimensional distribution:
• 100 neurons arranged in a one-dimensional lattice.
• The input space is the same as in Example 1.
• Weights are initialized with random values (again as in Example 1).
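This setup matches the one-dimensional training sketch given earlier, so it can be reproduced directly; the sample count and seed are illustrative, and train_som is the earlier illustrative function.

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(5000, 2))       # uniform over (-1, 1) x (-1, 1)
W = train_som(X, n_neurons=100, n_iter=5000)      # 100 neurons on a 1-D lattice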
Example 2
[Figures: visualization of the one-dimensional lattice in weight space.]
Application: Italian olive oil
572 samples of olive oil were collected from 9 Italian provinces. The content of 8 fats was determined for each oil.
A 20 x 20 SOM network maps the 8-dimensional data to 2 dimensions.
Note that topographical relations are preserved.
Other real-life applications
Helsinki University of Technology web site
http://www.cis.hut.fi/research/refs/
has a list of > 5000 papers on SOM and its applications!
• Brain research: creation of various topographical maps in
motor, auditory and visual areas.
• AI and robotics: analysis of data from sensors, control of
robot’s movement (motor maps), spatial orientation maps.
• Information retrieval and text categorization.
• Bioinformatics: clustering of genes, protein properties, chemical compounds.
• Business: economic data, business and financial data ....
• Data compression (images and audio), information filtering.
• Medical and technical diagnosis.
And some more ..
• Natural language processing: linguistic analysis, parsing, learning
languages, hyphenation patterns.
• Optimization: configuration of telephone connections, VLSI design,
time series prediction, scheduling algorithms.
• Signal processing: adaptive filters, real-time signal analysis, radar, sonar, seismic, USG, EKG, EEG and other medical signals ...
• Image recognition and processing: segmentation, object
recognition, texture recognition ...
• Content-based retrieval