Validation of index SV

國立雲林科技大學
National Yunlin University of Science and Technology
N.Y.U.S.T.
I. M.
Validity index for clusters of different sizes and densities
Presenter: Jun-Yi Wu
Authors: Krista Rizman Zalik, Borut Zalik
Pattern Recognition Letters (PRL), 2011
Intelligent Database Systems Lab
Outline

Motivation

Objective

Methodology

Experiments

Conclusion

Comments
Motivation

Most previous validity indices depend heavily on the number of
data objects in clusters, on cluster centroids, and on average
values.

Most popular validity measures tend to ignore low-density
clusters and are not effective at validating partitions whose
clusters differ in size and density.
Objective

Two cluster validity indices are proposed for efficient
validation of partitions containing clusters that widely
differ in sizes and densities.

To design a cluster validity index that is suitable for the
validation of partitions having different sizes and densities.
A good partition:
- Overlap
- Compactness
- Separation distance
Methodology
Review several popular validity indices
Dunn index; D index
XiE index
Davies-Bouldin’s index; DB index
C index
G index
G+ index
Partition coefficient; PC index
Classification entropy; CE index
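
As a concrete reference for two of the crisp indices listed above, the following is a minimal sketch of the Dunn index and the Davies-Bouldin index using their standard textbook definitions with Euclidean distance. The function names and the toy data are illustrative assumptions, not taken from the slides.

```python
import math
from itertools import combinations

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def dunn_index(clusters):
    """Smallest between-cluster distance divided by the largest cluster diameter.

    Needs at least two clusters and at least one cluster with two points.
    """
    min_separation = min(
        math.dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_separation / max_diameter

def davies_bouldin_index(clusters):
    """Mean over clusters of the worst (scatter_i + scatter_j) / centroid distance."""
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance of its points to its centroid.
    scatter = [
        sum(math.dist(p, v) for p in c) / len(c)
        for c, v in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k

# Toy partition (made up for illustration): two tight, well-separated clusters.
toy_clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(5.0, 0.0), (5.0, 1.0)],
]
print(dunn_index(toy_clusters))            # 5.0
print(davies_bouldin_index(toy_clusters))  # 0.2
```

A larger Dunn value and a smaller Davies-Bouldin value both indicate a better partition.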
Methodology

Review several popular validity indices.
D Index
G+ Index
PC
CE
DB Index
C Index
G Index
XiE
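
The slide's formulas are not preserved in this text. For the two fuzzy indices, a minimal sketch based on the standard definitions of the partition coefficient (PC) and classification entropy (CE) may help. The membership-matrix layout `u[k][i]` (object k, cluster i) and the function names are assumptions made for illustration.

```python
import math

def partition_coefficient(u):
    """PC: mean over objects of the sum of squared memberships."""
    n = len(u)
    return sum(m * m for row in u for m in row) / n

def classification_entropy(u):
    """CE: mean membership entropy; zero memberships contribute nothing."""
    n = len(u)
    return -sum(m * math.log(m) for row in u for m in row if m > 0) / n

# Toy membership matrices (made up): a crisp partition and a maximally fuzzy one.
crisp = [[1.0, 0.0], [0.0, 1.0]]
fuzzy = [[0.5, 0.5], [0.5, 0.5]]

print(partition_coefficient(crisp), partition_coefficient(fuzzy))  # 1.0 0.5
print(classification_entropy(fuzzy))  # about 0.693, i.e. ln 2
```

PC is highest (1) and CE is lowest (0) for a crisp partition, so a higher PC and a lower CE are read as a better partition.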
Methodology

New clustering validity indices

SV-index

Validation of index SV

Fuzzification of the SV index

The proposed index OS exploiting overlap and separation measures

Overlap measure

Separation measure and validity index OS

Validation of index OS
Methodology

SV-index
A measure of validity for partitions consisting of clusters that differ widely in
density or size
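
The SV formula itself is not preserved in this text, so the following is only a sketch under stated assumptions. It treats SV as a separation-to-variation ratio. Separation is assumed to be the sum of each centroid's distance to its nearest other centroid, and variation is assumed to be the sum, over clusters, of the mean centroid distance of each cluster's farthest objects. The 10% tail fraction, the function names, and the toy data are all illustrative choices and may differ from the authors' definition.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def separation(centers):
    """Assumed form: sum of each centroid's distance to its nearest other centroid."""
    return sum(
        min(math.dist(v, w) for j, w in enumerate(centers) if j != i)
        for i, v in enumerate(centers)
    )

def variation(clusters, centers, tail_fraction=0.1):
    """Assumed form: per cluster, mean centroid distance of its farthest objects."""
    total = 0.0
    for points, v in zip(clusters, centers):
        distances = sorted((math.dist(p, v) for p in points), reverse=True)
        # Use at least one object so small clusters still contribute.
        tail = max(1, math.ceil(tail_fraction * len(points)))
        total += sum(distances[:tail]) / tail
    return total

def sv_sketch(clusters):
    """Separation / variation; needs two or more clusters and non-zero variation."""
    centers = [centroid(c) for c in clusters]
    return separation(centers) / variation(clusters, centers)

# Toy partitions (made up): same centroids, tight versus loose clusters.
tight = [[(0.0, 0.0), (0.0, 1.0)], [(5.0, 0.0), (5.0, 1.0)]]
loose = [[(0.0, 0.0), (0.0, 3.0)], [(5.0, 0.0), (5.0, 3.0)]]
print(sv_sketch(tight), sv_sketch(loose))
```

Because each cluster contributes one term regardless of how many objects it holds, a ratio of this shape does not let large or dense clusters dominate the score.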
Methodology

Validation of index SV
Methodology

Fuzzification of the SV index
A fuzzy version of the index SV is obtained by integrating the membership
values in the variation measure.
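
One plausible reading of "integrating the membership values in the variation measure" is to weight each object's distance to a centroid by its membership in that cluster. The sketch below shows that idea only. The weighting scheme, the function name, and the `memberships[k][i]` layout (object k, cluster i) are assumptions, not the formula from the slide.

```python
import math

def fuzzy_variation(points, centers, memberships):
    """Assumed form: membership-weighted mean distance to each centroid, summed."""
    total = 0.0
    for i, v in enumerate(centers):
        weights = [row[i] for row in memberships]
        weighted = sum(w * math.dist(p, v) for w, p in zip(weights, points))
        total += weighted / sum(weights)
    return total

# Toy data (made up): two objects that coincide with the two centroids.
points = [(0.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
print(fuzzy_variation(points, centers, [[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(fuzzy_variation(points, centers, [[0.5, 0.5], [0.5, 0.5]]))  # 2.0
```

With crisp memberships this reduces to a mean distance within each cluster; the fuzzier the memberships, the more distant objects inflate each cluster's variation.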
Methodology

The proposed index OS exploiting overlap and separation
measures

Experimental results suggest that inter-cluster separation plays a more
important role in cluster validation.

Existing indices are limited in their ability to compute the compactness
and the separation in partitions having overlapping clusters and clusters
of different sizes, which leads to incorrect validation results.

Considering these results, a cluster validity index based on overlap and
separation measures is proposed.
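
The overlap and separation formulas are not preserved in this text, so the following is a sketch under stated assumptions rather than the authors' definition. It assumes OS is overlap divided by separation. An object's overlap is assumed to be a/b, where a is its mean distance to the other members of its own cluster and b is its smallest mean distance to the members of any other cluster, counted only when a and b are close. The closeness test, the 0.4 threshold, and all names are illustrative.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))

def mean_distance(p, others):
    """Mean Euclidean distance from p to every point in others."""
    return sum(math.dist(p, q) for q in others) / len(others)

def overlap(clusters, threshold=0.4):
    """Assumed form: sum of a/b over objects whose a and b are close."""
    total = 0.0
    for i, own in enumerate(clusters):
        for idx, p in enumerate(own):
            mates = own[:idx] + own[idx + 1:]
            if not mates:
                continue  # a singleton cluster has no within-cluster distance
            a = mean_distance(p, mates)
            b = min(
                mean_distance(p, other)
                for j, other in enumerate(clusters)
                if j != i
            )
            # Count the object only if it is not clearly closer to its own cluster.
            if (b - a) / (b + a) < threshold:
                total += a / b
    return total

def separation(clusters):
    """Assumed form: mean distance from each centroid to its nearest other centroid."""
    centers = [centroid(c) for c in clusters]
    nearest = [
        min(math.dist(v, w) for j, w in enumerate(centers) if j != i)
        for i, v in enumerate(centers)
    ]
    return sum(nearest) / len(nearest)

def os_sketch(clusters):
    """Overlap / separation; smaller values suggest a better partition."""
    return overlap(clusters) / separation(clusters)

# Toy partitions (made up): clearly separated versus interleaved clusters.
separated = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
interleaved = [[(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0), (3.0, 0.0)]]
print(os_sketch(separated), os_sketch(interleaved))
```

In this sketch, well-separated clusters contribute no overlap at all, so the index only penalizes objects that sit near or inside a neighbouring cluster.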
Methodology

Overlap measure
Methodology

Separation measure and validity index OS
Methodology

Validation of index OS
Experiments

To demonstrate the effectiveness of the proposed SV and
OS indices for determining the optimal number of
clusters.

Artificial data set A1

Artificial data set A2

Artificial data set A3

Iris data set

Wine data set

Glass data set
Experiments-Artificial data set A1
Experiments-Artificial data set A2
Experiments-Artificial data set A3
Experiments-Artificial data set A3
Experiments-Iris data set
Experiments-Wine data set
Experiments-Wine data set
Conclusion

The experimental results showed that the new indices outperform
the other considered indices, especially when clusters differ widely
in size or density.

A good partition is expected to have a low degree of overlap, a
large separation distance, and high compactness.

The maximum value of the SV index ratio and the minimum
value of the OS index indicate the optimal partition.
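
The selection rule above can be sketched as follows, assuming each index has already been computed for every candidate number of clusters. The function names and the index values are hypothetical and chosen only to illustrate the rule; they are not results from the paper.

```python
def best_k_by_sv(sv_by_k):
    """SV is maximized: return the cluster count with the largest SV value."""
    return max(sv_by_k, key=sv_by_k.get)

def best_k_by_os(os_by_k):
    """OS is minimized: return the cluster count with the smallest OS value."""
    return min(os_by_k, key=os_by_k.get)

# Hypothetical index values per candidate cluster count (not from the paper).
sv_by_k = {2: 1.8, 3: 3.1, 4: 2.4}
os_by_k = {2: 0.9, 3: 0.2, 4: 0.6}

print(best_k_by_sv(sv_by_k))  # 3
print(best_k_by_os(os_by_k))  # 3
```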
Comments

Advantage

Drawback


….
Application

Clustering

Validity index