K-Means Clustering
CMPUT 615
Applications of Machine Learning
in Image Analysis
K-means Overview
• A clustering algorithm
• An approximation to an NP-hard combinatorial optimization problem
• It is unsupervised
• “K” stands for the number of clusters; it is a user input to the algorithm
• From a set of data points or observations (all numerical), K-means attempts to partition them into K clusters
• The algorithm is iterative in nature
K-means Details
• X1, …, XN are data points or vectors or observations
• Each observation will be assigned to one and only one cluster
• C(i) denotes the cluster number for the ith observation
• Dissimilarity measure: Euclidean distance metric
• K-means minimizes the within-cluster point scatter:

W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(j)=k} \| x_i - x_j \|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \| x_i - m_k \|^2

where
m_k is the mean vector of the kth cluster
N_k is the number of observations in the kth cluster
K-means Algorithm
• For a given assignment C, compute the cluster means m_k:

m_k = \frac{\sum_{i:C(i)=k} x_i}{N_k}, \quad k = 1, \ldots, K.

• For the current set of cluster means, assign each observation as:

C(i) = \arg\min_{1 \le k \le K} \| x_i - m_k \|^2, \quad i = 1, \ldots, N

• Iterate the above two steps until convergence
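The two alternating steps can be sketched in a few lines of Python. This is a minimal illustration on 1-D data; the function name, toy data, and random initialization are illustrative assumptions, not from the slides:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    # Lloyd's algorithm: alternate the assignment and update steps above.
    rng = random.Random(seed)
    means = rng.sample(points, k)      # initialize means from the data
    assign = [None] * len(points)
    for _ in range(iters):
        # Assignment step: nearest mean under squared Euclidean distance.
        new_assign = [min(range(k), key=lambda c: (x - means[c]) ** 2)
                      for x in points]
        if new_assign == assign:       # converged: assignments are stable
            break
        assign = new_assign
        # Update step: recompute each cluster mean.
        for c in range(k):
            members = [x for x, a in zip(points, assign) if a == c]
            if members:
                means[c] = sum(members) / len(members)
    return means, assign

means, labels = kmeans([1.0, 1.2, 0.8, 5.0, 5.3, 4.9], 2)
```

On these well-separated toy points the iteration settles quickly into the two obvious groups; with less separated data, the local minimum reached depends on the initialization.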
Image Segmentation Results
An image (I); three-cluster image (J) on gray values of I.
Matlab code:
I = double(imread('…'));
J = reshape(kmeans(I(:),3),size(I));
Note that the K-means result is “noisy”
Summary
• K-means converges, but it finds a local minimum of the cost function
• Works only for numerical observations (for categorical and mixed observations, K-medoids is a clustering method)
• Fine tuning is required when applied to image segmentation, mostly because there is no imposed spatial coherency in the K-means algorithm
• Often works as a starting point for sophisticated image segmentation algorithms
Otsu’s Thresholding Method (1979)
• Based on the clustering idea: find the threshold that minimizes the weighted within-cluster point scatter.
• This turns out to be the same as maximizing the between-class scatter.
• Operates directly on the gray level histogram [e.g. 256 numbers, P(i)], so it’s fast (once the histogram is computed).
Otsu’s Method
• Assumes the histogram (and the image) is bimodal.
• No use of spatial coherence, nor any other notion of object structure.
• Assumes uniform illumination (implicitly), so the bimodal brightness behavior arises from object appearance differences only.
The weighted within-class variance is:

\sigma_w^2(t) = q_1(t)\,\sigma_1^2(t) + q_2(t)\,\sigma_2^2(t)
Where the class probabilities are estimated as:

q_1(t) = \sum_{i=1}^{t} P(i), \qquad q_2(t) = \sum_{i=t+1}^{I} P(i)
And the class means are given by:

\mu_1(t) = \sum_{i=1}^{t} \frac{i\,P(i)}{q_1(t)}, \qquad \mu_2(t) = \sum_{i=t+1}^{I} \frac{i\,P(i)}{q_2(t)}
Finally, the individual class variances are:

\sigma_1^2(t) = \sum_{i=1}^{t} [i - \mu_1(t)]^2 \,\frac{P(i)}{q_1(t)}, \qquad \sigma_2^2(t) = \sum_{i=t+1}^{I} [i - \mu_2(t)]^2 \,\frac{P(i)}{q_2(t)}
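These definitions translate directly into a brute-force Python sketch; the toy histogram values below are hypothetical, chosen only to be bimodal:

```python
def within_class_variance(P, t):
    # Weighted within-class variance sigma_w^2(t); P is a normalized
    # histogram indexed 1..I (index 0 unused), t the candidate threshold.
    I = len(P) - 1
    q1 = sum(P[1:t + 1])
    q2 = sum(P[t + 1:I + 1])
    mu1 = sum(i * P[i] for i in range(1, t + 1)) / q1
    mu2 = sum(i * P[i] for i in range(t + 1, I + 1)) / q2
    v1 = sum((i - mu1) ** 2 * P[i] for i in range(1, t + 1)) / q1
    v2 = sum((i - mu2) ** 2 * P[i] for i in range(t + 1, I + 1)) / q2
    return q1 * v1 + q2 * v2

# Bimodal toy histogram over gray levels 1..6 (hypothetical values).
P = [0.0, 0.25, 0.20, 0.05, 0.05, 0.20, 0.25]
best_t = min(range(1, 6), key=lambda t: within_class_variance(P, t))
```

For this symmetric histogram the minimizer is the threshold in the valley between the two modes.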
Now, we could actually stop here. All we need to do is just run through the full range of t values [1, 256] and pick the value that minimizes \sigma_w^2(t).
But the relationship between the within-class and between-class variances can be exploited to generate a recursion relation that permits a much faster calculation.
Finally...
Initialization:

q_1(1) = P(1); \quad \mu_1(0) = 0

Recursion:

q_1(t+1) = q_1(t) + P(t+1)

\mu_1(t+1) = \frac{q_1(t)\,\mu_1(t) + (t+1)\,P(t+1)}{q_1(t+1)}

\mu_2(t+1) = \frac{\mu - q_1(t+1)\,\mu_1(t+1)}{1 - q_1(t+1)}

where \mu is the total mean of the histogram.
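The recursion for q_1 and \mu_1 is easy to check numerically; this Python sketch (with a hypothetical toy histogram) accumulates both quantities in a single pass, which is exactly what makes the fast computation possible:

```python
def recursive_stats(P):
    # Running q1(t) and mu1(t) via the recursion; P is a normalized
    # histogram indexed 1..I (index 0 unused). Starts from q1(0) = 0,
    # mu1(0) = 0, matching the initialization above.
    I = len(P) - 1
    q1 = [0.0] * (I + 1)
    mu1 = [0.0] * (I + 1)
    for t in range(I):
        q1[t + 1] = q1[t] + P[t + 1]
        mu1[t + 1] = (q1[t] * mu1[t] + (t + 1) * P[t + 1]) / q1[t + 1]
    return q1, mu1

P = [0.0, 0.25, 0.20, 0.05, 0.05, 0.20, 0.25]  # hypothetical histogram
q1, mu1 = recursive_stats(P)
```

Each q1[t] and mu1[t] agrees with the direct sums \sum_{i \le t} P(i) and \sum_{i \le t} i P(i) / q_1(t), but the pass is O(I) overall instead of O(I) per threshold.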
After some algebra, we can express the total variance as...

\sigma^2 = \sigma_w^2(t) + q_1(t)\,[1 - q_1(t)]\,[\mu_1(t) - \mu_2(t)]^2

The first term is the within-class variance from before; the second term is the between-class variance, \sigma_B^2(t).
Since the total is constant and independent of t, the effect of
changing the threshold is merely to move the contributions of
the two terms back and forth.
So, minimizing the within-class variance is the same as
maximizing the between-class variance.
The nice thing about this is that we can compute the quantities in \sigma_B^2(t) recursively as we run through the range of t values.
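The equivalence is easy to confirm numerically: maximizing \sigma_B^2(t) selects the same threshold as minimizing \sigma_w^2(t). A Python sketch, using a hypothetical bimodal toy histogram:

```python
def between_class_variance(P, t):
    # sigma_B^2(t) = q1(t) [1 - q1(t)] [mu1(t) - mu2(t)]^2 for a
    # normalized histogram P indexed 1..I (index 0 unused).
    I = len(P) - 1
    q1 = sum(P[1:t + 1])
    q2 = 1.0 - q1                     # histogram sums to 1
    mu1 = sum(i * P[i] for i in range(1, t + 1)) / q1
    mu2 = sum(i * P[i] for i in range(t + 1, I + 1)) / q2
    return q1 * q2 * (mu1 - mu2) ** 2

# Bimodal toy histogram over gray levels 1..6 (hypothetical values).
P = [0.0, 0.25, 0.20, 0.05, 0.05, 0.20, 0.25]
best_t = max(range(1, 6), key=lambda t: between_class_variance(P, t))
```

Unlike \sigma_w^2(t), this objective needs only q_1(t) and \mu_1(t), both of which update in O(1) per threshold via the recursion, so the whole scan over t is linear in the number of gray levels.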
Result of Otsu’s Algorithm
An image; binary image by Otsu’s method; gray level histogram.
Matlab code:
I = double(imread('…'));
I = (I-min(I(:)))/(max(I(:))-min(I(:)));
J = I>graythresh(I);