Genetic Algorithms
for the clustering problem
Pasi Fränti
7.4.2016
General structure
Genetic Algorithm:
  Generate S initial solutions
  REPEAT Z iterations
    Select best solutions
    Create new solutions by crossover
    Mutate solutions
  END-REPEAT
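The skeleton above can be made concrete in a few lines. Below is a minimal sketch (not the author's implementation, and all names are illustrative) in Python with numpy: solutions are codebooks sampled from the data, selection is simple truncation to the best half, mutation is omitted, and the crossover is passed in as a function so that any of the variants described later can be plugged in.

```python
import numpy as np

def distortion(X, C):
    """Mean squared error of the nearest-centroid partition."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).mean()

def genetic_algorithm(X, M, crossover, S=10, Z=50, seed=0):
    rng = np.random.default_rng(seed)
    # generate S initial solutions: random codebooks drawn from the data
    pop = [X[rng.choice(len(X), M, replace=False)] for _ in range(S)]
    for _ in range(Z):                              # REPEAT Z iterations
        pop.sort(key=lambda C: distortion(X, C))    # select best solutions
        parents = pop[: max(2, S // 2)]
        pop = [crossover(parents[rng.integers(len(parents))],
                         parents[rng.integers(len(parents))])
               for _ in range(S)]                   # create new solutions
    return min(pop, key=lambda C: distortion(X, C))
```

Any of the crossover sketches later in these notes (for example `random_crossover`) can be passed as the `crossover` argument.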
Main principle

Components of GA:
• Representation of solution
• Selection method
• Crossover method (the most critical!)
• Mutation
Representation
Representation of solution
• Partition (P):
– Optimal centroid can be calculated from P
– Only local changes can be made
• Codebook (C):
– Optimal partition can be calculated from C
– Calculation of P takes O(NM) time (slow)
• Combined (C, P):
– Both data structures are needed anyway
– Computationally more efficient
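A small sketch of the two conversions these bullets rely on, assuming numpy arrays; X is the N×d data matrix and M the number of clusters:

```python
import numpy as np

def optimal_partition(X, C):
    """P from C: nearest centroid for each vector; this is the O(NM) step."""
    d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def optimal_centroids(X, P, M):
    """C from P: each centroid is the mean of the vectors assigned to it."""
    C = np.empty((M, X.shape[1]))
    for j in range(M):
        members = X[P == j]
        C[j] = members.mean(axis=0) if len(members) else X[0]  # guard empty cluster
    return C
```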
Selection method
• To select which solutions will be used in
crossover for generating new solutions
• Main principle: good solutions should be
used rather than weak solutions
• Two main strategies:
– Roulette wheel selection
– Elitist selection
• Exact implementation not so important
Roulette wheel selection
• Select two candidate solutions for the
crossover randomly.
• Probability for a solution to be selected is
weighted according to its distortion:
$$w(C, P) = \frac{1}{1 + \text{distortion}(C, P)}$$

$$p(C_i, P_i) = \frac{w(C_i, P_i)}{\sum_{j=1}^{S} w(C_j, P_j)}$$
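A minimal sketch of this selection step, assuming the distortion of each solution is precomputed; `random.choices` performs the weighted draw:

```python
import random

def roulette_select_pair(solutions, distortions):
    """Pick two parents with probability proportional to 1/(1+distortion)."""
    weights = [1.0 / (1.0 + d) for d in distortions]
    return random.choices(solutions, weights=weights, k=2)
```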
Elitist selection
• Main principle: select all possible pairs
among the best candidates.
Select next pair(i, j):
  REPEAT
    IF (i+j) MOD 2 = 0
      THEN i ← max(1, i-1); j ← j+1;
      ELSE j ← max(1, j-1); i ← i+1;
  UNTIL i < j
  RETURN (i, j)
[Figure: elitist approach using zigzag scanning among the best solutions; pairs (i, j) are enumerated over the indices i, j = 1, 2, 3, 4, 5, ...]
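A minimal sketch of pair enumeration in this spirit: all pairs among the Z best candidates, ordered over the anti-diagonals so that the best solutions are crossed first (the exact zigzag order of the slide may differ slightly; 0-based indices here):

```python
def elitist_pairs(Z):
    """Enumerate all pairs (i, j), i < j, in zigzag order over diagonals."""
    pairs = [(i, j) for i in range(Z) for j in range(i + 1, Z)]
    pairs.sort(key=lambda p: (p[0] + p[1], p[0]))
    return pairs

# elitist_pairs(4) -> [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```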
Crossover
Crossover methods
Different variants for crossover:
• Random crossover
• Centroid distance
• Pairwise crossover
• Largest partitions
• PNN

Local fine-tuning:
• All methods give a new allocation of the centroids.
• Local fine-tuning must be done by K-means.
• Two iterations of K-means is enough.
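A minimal sketch of this fine-tuning step, reusing `optimal_partition` and `optimal_centroids` from the representation sketch above:

```python
def kmeans_fine_tune(X, C, iterations=2):
    """Two K-means iterations applied to a child codebook."""
    for _ in range(iterations):
        P = optimal_partition(X, C)             # reassign vectors
        C = optimal_centroids(X, P, len(C))     # recompute centroids
    return C, optimal_partition(X, C)
```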
Random crossover
Select M/2 centroids randomly from each of the two parents.
[Figure: parent solutions A and B, each with centroids c1–c4 (M = 4); the new solution combines centroids from both parents.]
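A minimal sketch of this operation on two parent codebooks (numpy assumed):

```python
import numpy as np

def random_crossover(C_a, C_b, rng=np.random.default_rng()):
    """Take M/2 centroid rows from each parent codebook at random."""
    M = len(C_a)
    from_a = rng.choice(M, M // 2, replace=False)
    from_b = rng.choice(M, M - M // 2, replace=False)
    return np.vstack([C_a[from_a], C_b[from_b]])
```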
How to create a new solution? Pick M/2 randomly chosen cluster centroids from each of the two parents in turn. Some possibilities:

  Parent A    Parent B    Rating
  c2, c4      c1, c4      Optimal
  c1, c2      c3, c4      Good (K-means needed)
  c2, c3      c2, c3      Bad

How many solutions are there? 36 possibilities to create a new solution.

Probability to select a good one? Not high: some are good but K-means is needed, most are bad. Rough statistics: Optimal: 1, Good: 7, Bad: 28.
[Figure: parent solutions A and B with centroids c1–c4, and example child solutions rated optimal, good and bad. Legend: data point, centroid; M = number of clusters.]
Centroid distance crossover
[Pan, McInnes, Jack, 1995: Electronics Letters]
[Scheunders, 1996: Pattern Recognition Letters]
• For each centroid, calculate its distance to
the center point of the entire data set.
• Sort the centroids according to the distance.
• Divide into two sets: central vectors (M/2
closest) and distant vectors (M/2 furthest).
• Take central vectors from one codebook and
distant vectors from the other.
Example (M = 4):

1) Distances d(ci, Ced), where Ced is the centroid of the entire data set:
   A: d(c4, Ced) < d(c2, Ced) < d(c1, Ced) < d(c3, Ced)
   B: d(c1, Ced) < d(c3, Ced) < d(c2, Ced) < d(c4, Ced)

2) Sort centroids according to the distance:
   A: c4, c2, c1, c3
   B: c1, c3, c2, c4

3) Divide into two sets:
   A: central vectors c4, c2; distant vectors c1, c3
   B: central vectors c1, c3; distant vectors c2, c4

Variant (a): take central vectors from parent solution A and distant vectors from parent solution B, OR
Variant (b): take distant vectors from parent solution A and central vectors from parent solution B.

[Figure: parent solutions A and B, and the child solutions of variants (a) and (b). Legend: data point, centroid, centroid of entire data set (Ced); M = number of clusters.]
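A minimal sketch of variant (a), assuming numpy arrays; variant (b) follows by swapping the roles of the parents:

```python
import numpy as np

def centroid_distance_crossover(C_a, C_b, X):
    """Central vectors from parent A, distant vectors from parent B."""
    M = len(C_a)
    c_ed = X.mean(axis=0)                        # centroid of the entire data set
    order_a = np.argsort(((C_a - c_ed) ** 2).sum(axis=1))
    order_b = np.argsort(((C_b - c_ed) ** 2).sum(axis=1))
    central_a = C_a[order_a[: M // 2]]           # M/2 closest in A
    distant_b = C_b[order_b[M // 2 :]]           # M/2 furthest in B
    return np.vstack([central_a, distant_b])
```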
Pairwise crossover
[Fränti et al, 1997: Computer Journal]
Greedy approach:
• For each centroid, find its nearest centroid in the
other parent solution that is not yet used.
• From each pair, select one of the two centroids randomly.
Small improvement:
• No reason to consider the parents as separate
solutions.
• Take union of all centroids.
• Make the pairing independent of parent.
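A minimal sketch of the improved variant: pool all centroids, greedily pair each unused centroid with its nearest unused neighbour, and keep one centroid of each pair at random (numpy assumed):

```python
import numpy as np

def pairwise_crossover(C_a, C_b):
    """Greedy pairing over the union of both parents' centroids."""
    pool = np.vstack([C_a, C_b])
    M = len(C_a)
    unused = list(range(len(pool)))
    child = []
    while len(child) < M and unused:
        i = unused.pop(0)
        if not unused:
            child.append(pool[i]); break
        d = ((pool[unused] - pool[i]) ** 2).sum(axis=1)
        j = unused.pop(int(d.argmin()))               # nearest unused centroid
        child.append(pool[np.random.choice([i, j])])  # keep one of the pair
    return np.array(child)
```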
Pairwise crossover example

Initial parent solutions: MSE = 8.79×10⁹ and MSE = 11.92×10⁹.
Pairing between parent solutions: MSE = 7.34×10⁹.
Pairing without restrictions: MSE = 4.76×10⁹.

[Figures: the parent solutions and the child solutions produced by the two pairing strategies.]
Largest partitions
[Fränti et al, 1997: Computer Journal]
Crossover algorithm (a sketch follows below):
• Each cluster in the solutions A and B is assigned a number, the cluster size S, indicating how many data objects belong to it.
• In each phase we pick the centroid of the largest cluster.
• Assume that cluster i was chosen from A. The cluster centroid ci is removed from A to avoid its reselection.
• For the same reason we update the cluster sizes of B by removing the effect of those data objects in B that were assigned to the chosen cluster i in A.
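A minimal sketch of this procedure, with cluster sizes taken from the partitions; numpy assumed, and this is a simple interpretation of the size-update rule, not the authors' exact bookkeeping:

```python
import numpy as np

def largest_partitions_crossover(C_a, P_a, C_b, P_b, M):
    size_a = np.bincount(P_a, minlength=len(C_a)).astype(float)
    size_b = np.bincount(P_b, minlength=len(C_b)).astype(float)
    child = []
    for _ in range(M):
        if size_a.max() >= size_b.max():       # pick the largest cluster overall
            i = int(size_a.argmax())
            child.append(C_a[i])
            size_a[i] = -1                     # prevent reselection
            for j in P_b[P_a == i]:            # discount those points in B
                size_b[j] -= 1
        else:
            i = int(size_b.argmax())
            child.append(C_b[i])
            size_b[i] = -1
            for j in P_a[P_b == i]:            # discount those points in A
                size_a[j] -= 1
    return np.array(child)
```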
Largest partitions
[Fränti et al, 1997: Computer Journal]

[Figure: parent solutions A and B with cluster sizes S = 100, 50, 30 and 20; the centroid of the largest remaining cluster is picked in each phase. Legend: data point, centroid.]
PNN crossover for GA
[Fränti et al, 1997: The Computer Journal]
[Figure: the centroids of the two initial solutions are combined into their union, after which PNN merges clusters until M remain.]
The PNN crossover method (1)
[Fränti, 2000: Pattern Recognition Letters]
CrossSolutions(C1, P1, C2, P2) → (Cnew, Pnew)
  Cnew ← CombineCentroids(C1, C2)
  Pnew ← CombinePartitions(P1, P2)
  Cnew ← UpdateCentroids(Cnew, Pnew)
  RemoveEmptyClusters(Cnew, Pnew)
  PerformPNN(Cnew, Pnew)

CombineCentroids(C1, C2) → Cnew
  Cnew ← C1 ∪ C2

CombinePartitions(Cnew, P1, P2) → Pnew
  FOR i ← 1 TO N DO
    IF ||xi − C1[P1[i]]||² < ||xi − C2[P2[i]]||² THEN
      Pnew[i] ← P1[i]
    ELSE
      Pnew[i] ← P2[i]
  END-FOR
The PNN crossover method (2)
UpdateCentroids(Cnew, Pnew) → Cnew
  FOR j ← 1 TO |Cnew| DO
    Cnew[j] ← CalculateCentroid(Pnew, j)

PerformPNN(Cnew, Pnew)
  FOR i ← 1 TO |Cnew| DO
    qi ← FindNearestNeighbor(ci)
  WHILE |Cnew| > M DO
    a ← FindMinimumDistance(Q)
    b ← qa
    MergeClusters(ca, pa, cb, pb)
    UpdatePointers(Q)
  END-WHILE
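A minimal sketch of the merging loop, assuming non-empty clusters (RemoveEmptyClusters has run) and using a plain O(M²) re-scan per merge instead of the pointer structure Q; the merge cost is the standard size-weighted PNN criterion:

```python
import numpy as np

def perform_pnn(C, n, M):
    """Merge the cheapest pair of clusters until only M remain."""
    C, n = C.astype(float).copy(), n.astype(float).copy()
    while len(C) > M:
        best, pair = np.inf, (0, 1)
        for a in range(len(C)):
            for b in range(a + 1, len(C)):
                # size-weighted distance between centroids a and b
                cost = n[a] * n[b] / (n[a] + n[b]) * ((C[a] - C[b]) ** 2).sum()
                if cost < best:
                    best, pair = cost, (a, b)
        a, b = pair
        C[a] = (n[a] * C[a] + n[b] * C[b]) / (n[a] + n[b])  # merged centroid
        n[a] += n[b]
        C, n = np.delete(C, b, axis=0), np.delete(n, b)
    return C, n
```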
Importance of K-means
(Random crossover)
[Figure: distortion (160–260) vs. generation (0–50) on Bridge. The best and worst curves without K-means stay far above the curve with K-means.]
Effect of crossover method
(with k-means iterations)
[Figure: distortion (160–190) vs. generation (0–50) on Bridge. From worst to best: Random, Largest partitions, Centroid distance, Pairwise, PNN.]
Effect of crossover method
(with k-means iterations)
Binary data

[Figure: distortion (1.25–1.50) vs. generation (0–50) on Bridge2. From worst to best: Random, Largest partitions, Centroid distance, Pairwise, PNN.]
Mutations

• Purpose is to make small random changes to the solutions.
• Happens with a small probability.
• Sensible approach: change the location of one centroid by a random swap!
• Role of mutations is to simulate local search.
• If mutations are needed, the crossover method is not very good.
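A minimal sketch of such a mutation (numpy assumed):

```python
import numpy as np

def mutate(C, X, probability=0.01):
    """Random swap: with small probability, replace one centroid
    by a randomly chosen data vector."""
    if np.random.rand() < probability:
        C = C.copy()
        C[np.random.randint(len(C))] = X[np.random.randint(len(X))]
    return C
```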
Effect of k-means and mutations
[Figure: distortion (160–180) vs. number of iterations (0–50) on Bridge. From worst to best: random crossover + K-means, mutations + K-means, PNN, PNN crossover + K-means.]

Observations: K-means improves the result but is less vital with a good crossover; mutations alone work better than random crossover!
GAIS – Going extreme
Agglomerative clustering:
• PNN: Pairwise Nearest Neighbor method
  – Merges two clusters
  – Preserves the hierarchy of clusters
• IS: Iterative shrinking method
  – Removes one cluster
  – Repartitions the data vectors of the removed cluster
Iterative shrinking
[Figure: before and after cluster removal. The data vectors (x) of the removed cluster are repartitioned among the remaining clusters S1–S5. Legend: code vectors, vector to be removed, data vectors of the cluster to be removed, other data vectors (+).]
Pseudo code
IS(X, M) → (C, P)
  m ← N;
  FOR i ← 1 TO m:
    ci ← xi; pi ← i; ni ← 1;
  FOR i ← 1 TO m:
    qi ← FindSecondNearestCluster(C, xi);
  REPEAT
    CalculateRemovalCosts(C, P, Q, d);
    a ← SelectClusterToBeRemoved(d);
    RemoveCluster(P, Q, a);
    UpdateCentroids(C, P, a);
    UpdateSecondaryPartitions(C, P, Q, a);
    m ← m - 1;
  UNTIL m = M.
Local optimization of IS
Finding secondary cluster:

$$q_i = \arg\min_{\substack{1 \le j \le m \\ j \ne p_i}} \frac{n_j}{n_j + 1}\,\lVert x_i - c_j \rVert^2$$

Removal cost of a single vector:

$$D_i = \frac{n_{q_i}}{n_{q_i} + 1}\,\lVert x_i - c_{q_i} \rVert^2 - \lVert x_i - c_a \rVert^2$$
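A minimal sketch of the removal-cost computation for one candidate cluster a, directly following the two formulas above (numpy assumed; n holds the cluster sizes). Since every vector considered belongs to cluster a, excluding j = a also covers the j ≠ p_i condition:

```python
import numpy as np

def removal_cost(X, C, P, n, a):
    """Total cost of removing cluster a: sum of D_i over its vectors."""
    cost = 0.0
    for i in np.where(P == a)[0]:
        # secondary cluster: size-weighted nearest cluster other than a
        d = ((C - X[i]) ** 2).sum(axis=1) * n / (n + 1.0)
        d[a] = np.inf
        q = int(d.argmin())
        cost += d[q] - ((X[i] - C[a]) ** 2).sum()
    return cost
```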
Example (1)
[Figure: F-ratio (0.000060–0.000100) vs. number of clusters (25 down to 5) on data set S3. IS stays below PNN over the whole range; the minimum is marked.]
Example (2)
[Figure: F-ratio (0.000080–0.000120) vs. number of clusters (25 down to 5) on data set S4. Again IS stays below PNN; the minimum is marked.]
Pseudo code of GAIS
[Virmajoki & Fränti, 2006: Pattern Recognition]
GeneticAlgorithm(X) → (C, P)
  FOR i ← 1 TO Z DO
    Ci ← RandomCodebook(X);
    Pi ← OptimalPartition(X, Ci);
  SortSolutions(C, P);
  REPEAT
    {C, P} ← CreateNewSolutions({C, P});
    SortSolutions(C, P);
  UNTIL no improvement;

CreateNewSolutions({C, P}) → {Cnew, Pnew}
  Cnew[1], Pnew[1] ← C1, P1;
  FOR i ← 2 TO Z DO
    (a, b) ← SelectNextPair;
    Cnew[i], Pnew[i] ← Cross(Ca, Pa, Cb, Pb);
    IterateK-Means(Cnew[i], Pnew[i]);

Cross(C1, P1, C2, P2) → (Cnew, Pnew)
  Cnew ← CombineCentroids(C1, C2);
  Pnew ← CombinePartitions(P1, P2);
  Cnew ← UpdateCentroids(Cnew, Pnew);
  RemoveEmptyClusters(Cnew, Pnew);
  IS(Cnew, Pnew);

CombineCentroids(C1, C2) → Cnew
  Cnew ← C1 ∪ C2

CombinePartitions(Cnew, P1, P2) → Pnew
  FOR i ← 1 TO N DO
    IF ||xi − C1[P1[i]]||² < ||xi − C2[P2[i]]||² THEN
      Pnew[i] ← P1[i]
    ELSE
      Pnew[i] ← P2[i]
  END-FOR

UpdateCentroids(Cnew, Pnew) → Cnew
  FOR j ← 1 TO |Cnew| DO
    Cnew[j] ← CalculateCentroid(Pnew, j);
PNN vs. IS crossovers
Further improvement of about 1%
[Figure: MSE (160–166) vs. number of iterations (0–50) on Bridge. From worst to best: PNN crossover, IS crossover, PNN crossover + K-means, IS crossover + K-means.]
Optimized GAIS variants
GAIS short (optimized for speed):
- Create new generations only as long as the best solution keeps improving (T = *).
- Use a small population size (Z = 10).
- Apply two iterations of K-means (G = 2).

GAIS long (optimized for quality):
- Create a large number of generations (T = 100).
- Use a large population size (Z = 100).
- Iterate K-means relatively long (G = 10).
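For concreteness, the two presets could be written as configuration dictionaries; the key names here are illustrative, not from the slides:

```python
# Hypothetical presets mirroring the two GAIS variants above; None for
# generations stands for "iterate until no improvement" (T = *).
GAIS_SHORT = dict(generations=None, population_size=10, kmeans_iterations=2)
GAIS_LONG = dict(generations=100, population_size=100, kmeans_iterations=10)
```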
Comparison with image data

Image sets (M = 256), distortion (MSE):

  Method             Bridge   House   Miss America
  Random             251.32   12.12   8.34
  K-means            179.68    7.81   5.96   (popular)
  SOM                173.63    7.59   5.92
  FCM                177.43    7.72   6.18
  Split              170.22    6.18   5.40
  Split + K-means    165.77    6.06   5.28
  RLS                164.64    5.96   5.28   (simplest of the good ones)
  Split-and-Merge    163.81    5.98   5.19
  SR                 162.45    6.03   5.26
  PNN                168.92    6.27   5.36
  PNN + K-means      165.04    6.07   5.24
  IS                 163.38    6.09   5.19
  IS + K-means       162.38    6.02   5.17
  GAPNN              162.37    5.92   5.17   (previous GA)
  GAIS (short)       161.59    5.92   5.11
  GAIS (long)        160.73    5.89   5.07   (BEST!)
What does it cost?

Running times on Bridge:
- Random:       ~0 s
- K-means:      8 s
- SOM:          6 minutes
- GA-PNN:       13 minutes
- GAIS (short): ~1 hour
- GAIS (long):  ~3 days
Comparison of algorithms

Distortion (MSE) on image sets (Bridge, House, Miss America), Birch data sets (B1–B3) and synthetic data sets (S1–S4), with running time:

  Method            Bridge  House  Miss.A    B1     B2     B3     S1     S2     S3     S4    Time
  Random            251.32  12.12   8.34   14.44  35.73   8.20  78.55  72.91  55.42  47.05     <1
  K-means (aver.)   179.87   7.81   5.96    5.52   7.99   2.53  20.53  20.91  21.37  16.78      5
  K-means (best)    176.95   7.35   5.93    5.13   6.87   2.16  13.23  16.07  18.96  15.71     50
  SOM               173.63   7.59   5.92   13.50  10.03  15.18  20.11  13.28  21.10  15.71    376
  FCM               178.39   7.79   6.22    5.02   5.29   2.48   8.92  13.28  16.89  15.71    166
  Split             170.22   6.18   5.40    4.81   2.29   1.91   8.95  13.33  17.50  16.01     13
  Split + k-means   165.77   6.06   5.28    4.64   2.28   1.91   8.92  13.28  16.92  15.77     17
  RLS               164.64   5.96   5.28    4.64   2.28   1.86   8.92  13.28  16.89  15.71   1146
  Split-n-Merge     163.81   5.98   5.19    4.64   2.28   1.93   8.92  13.28  16.91  15.75     85
  SR (average)      162.45   6.02   5.27    4.84   3.39   1.99   9.52  13.68  17.31  15.80    213
  SR (best)         161.96   5.98   5.25    4.76   3.12   1.98   8.93  13.28  16.89  15.71   2130
  PNN               168.92   6.27   5.36    4.73   2.28   1.96   8.93  13.44  17.70  17.52    272
  PNN + k-means     165.04   6.07   5.24    4.64   2.28   1.88   8.92  13.28  16.89  16.87    285
  GKM – fast 10     164.12   5.94   5.34    4.64   2.28   1.92   8.92  13.28  16.89  15.71  91721
  IS                163.38   6.09   5.19    4.70   2.28   1.89   8.92  13.29  16.96  15.79    717
  IS + k-means      162.38   6.02   5.17    4.64   2.28   1.86   8.92  13.28  16.89  15.71    719
  GA (k-means)      174.91   6.61   5.54    6.58   5.96   2.45  11.66  15.99  19.22  16.14    654
  GA (PNN)          162.37   5.92   5.17    4.98   2.28   1.98   8.92  13.28  16.89  15.71    404
  SAGA              161.22   5.86   5.10    4.64   2.28   1.86   8.92  13.28  16.89  15.71  74554
  GAIS (short)      161.59   5.92   5.11    4.64   2.28   1.86   8.92  13.28  16.89  15.72   1311
  GAIS (long)       160.73   5.89   5.07    4.64   2.28   1.86   8.92  13.28  16.89  15.71 387533
Variation of the result
[Figure: frequency histograms of the final MSE (160–190) over repeated runs for k-means, PNN, IS, IS + k-means and GAIS; GAIS gives the lowest MSE.]
Time vs. quality comparison
[Figure: MSE (160–190) vs. running time (1 s to 100 000 s, log scale) on Bridge, comparing repeated K-means, RLS, PNN, IS, SAGA and GAIS.]
Conclusions
• Best clustering is obtained by GA.
• The crossover method is the most important component.
• Mutations are not needed.
References
1. P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.
2. P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, January 2000.
3. P. Fränti, J. Kivijärvi, T. Kaukoranta and O. Nevalainen, "Genetic algorithms for large scale clustering problems", The Computer Journal, 40 (9), 547-554, 1997.
4. J. Kivijärvi, P. Fränti and O. Nevalainen, "Self-adaptive genetic algorithm for clustering", Journal of Heuristics, 9 (2), 113-129, 2003.
5. J.S. Pan, F.R. McInnes and M.A. Jack, "VQ codebook design using genetic algorithms", Electronics Letters, 31, 1418-1419, August 1995.
6. P. Scheunders, "A genetic Lloyd-Max quantization algorithm", Pattern Recognition Letters, 17, 547-556, 1996.