
CLASSIFICATION OF MIXED DATA POINTS
FOR COUPLED CIRCLES ESTIMATION
A Thesis presented to
the Faculty of the Graduate School
at the University of Missouri
In Partial Fulfillment
of the Requirements for the Degree
Master of Science
by
YUE ZHANG
Dr. DOMINIC K.C. HO, Thesis Supervisor
May 2015
The undersigned, appointed by the Dean of the Graduate School, have examined
the dissertation entitled:
Classification of Mixed Data Points for Coupled Circle Estimation
presented by Yue Zhang,
a candidate for the degree of Master of Science and hereby certify that, in their
opinion, it is worthy of acceptance.
Dr. Dominic K.C. Ho
Dr. Justin Legarsky
Dr. Jianlin Cheng
ACKNOWLEDGMENTS
I would like to take this opportunity to gratefully acknowledge everyone who made
the thesis possible. First and foremost, I want to show my deepest gratitude to my
advisor, Dr. Dominic K.C. Ho, a respectable, responsible and resourceful scholar, who
provides me with valuable guidance in every stage of writing this thesis. Without his
enlightening instruction, impressive kindness and patience, I could not have completed
my thesis. His keen and vigorous academic observation enlightens me not only in the
thesis but also in my future study.
I would also like to express my sincere gratitude to my committee members Dr.
Justin Legarsky and Dr. Jianlin Cheng for spending their valuable time and giving
me important suggestions that improved the quality of the thesis.
Thank you to all my friends, especially Yanlin and Sasa, for their help and advice. Thanks to Dr. Zhenhua Ma, who provided the asymptotically efficient estimator algorithm that helped me greatly in completing the thesis.
Last but not least, I would like to thank the University of Missouri for giving me the opportunity to pursue my Master of Science degree and receive a high quality education that enriched my experience.
TABLE OF CONTENTS
ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 Introduction
   1.1 Background and Motivations
   1.2 Previous Work on Circle Fitting
   1.3 Data Clustering
   1.4 Data Classification
   1.5 Organization of the Thesis
   1.6 Major Contribution

2 Circles Fitting Basis
   2.1 Single Circle Estimator
      2.1.1 Kasa Estimator
      2.1.2 Maximum Likelihood Estimator [10]
   2.2 Concentric Circles Estimator
   2.3 KCR Lower Bound for Concentric Circles Fitting
   2.4 Simulation
      2.4.1 Single Circle
      2.4.2 Concentric Circles
   2.5 Summary

3 Circles Estimate with Known Number of Concentric Circles
   3.1 Distance Division Method
   3.2 K-Means Method
   3.3 Naive Bayes Classifier
   3.4 Simulation
   3.5 Summary

4 Circles Estimation with Unknown Number of Concentric Circles
   4.1 Distance Threshold Method
   4.2 Mean Shift Method
   4.3 Concentric Circles Estimation Without Knowing Number of Circles
   4.4 Simulation
   4.5 Summary

5 Conclusion and Future Work
   5.1 Conclusion of Research
   5.2 Future Work

BIBLIOGRAPHY
LIST OF TABLES
2.1 Summary of the MSE of the Kasa method and the ML method with the KCR lower bound for different selections of the number of data points, the radius or the range of sampled arc β when the noise power is σ² = 10
2.2 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound when the ranges of sampled arc are different. The noise power is σ² = 10
2.3 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound when the difference of the radii is increasing. The number of data points is N = 10, 10, 10. The noise power is σ² = 1
3.1 Classification accuracy with different replication times of the K-Means method. The number of data points of each circle is N = 10, 10, 10 and the radius of each circle is r0 = 140, 120, 100
3.2 Classification accuracy with different numbers of training groups for the Naive Bayes classifier. The number of data points of each circle is N = 10, 10, 10 and the radius of each circle is r0 = 140, 120, 100
3.3 Comparison of the classification accuracy of the Naive Bayes classifier with the calculated prior probability and with the uniform prior probability. The radius of each circle is r0 = 140, 120, 100
3.4 Classification accuracy of the Naive Bayes classifier with the cross-validation technique. The number of data points of each circle is N = 10, 20, 30. The radius of each circle is r0 = 140, 120, 100
3.5 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with an equal number of data points on each circle. The radius of each circle is r0 = 140, 120, 100
3.6 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with a different number of data points on each circle. The radius of each circle is r0 = 140, 120, 100
3.7 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different radii differences. The number of data points of each circle is N = 10, 10, 10
3.8 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different numbers of concentric circles when the number of data points and the radii difference of each circle are the same
3.9 Comparison of the classification accuracy of each circle for the K-Means method, the Distance Division method and the Naive Bayes method when the number of data points and the radii difference of each circle are the same. The number of circles rises from 2 to 4
4.1 Estimation accuracy of the number of circles with the appropriate window width of the Mean Shift method. The number of concentric circles is 1 to 4. The number of data points of each circle is 20. The difference of radii is 20
4.2 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with an equal number of data points on each circle. The radii are r0 = 140, 120, 100
4.3 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with a different number of data points on each circle. The radii are r0 = 140, 120, 100
4.4 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with different radii. The number of data points of each circle is N = 20, 20, 20. The difference of radii rises from 10 to 40
4.5 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with different numbers of concentric circles. The number of circles rises from 1 to 4
4.6 Comparison of the classification accuracy of each circle for the Mean Shift method and the Distance Threshold method when the number of data points and the radii difference of each circle are the same. The number of circles rises from 2 to 4
4.7 Summary of the MSE of the circles center and radii, the estimation accuracy of the number of circles and the classification accuracy of the data points for the Meanshift-Naivebayes estimator and for estimation with truly classified data points, when the numbers of data points and the radii are different
4.8 Summary of the MSE of the circles center and radii, the estimation accuracy of the number of circles and the classification accuracy of the data points for the Meanshift-Naivebayes estimator and for estimation with truly classified data points, when the number of concentric circles is different
LIST OF FIGURES
2.1 Fitted circles of the Kasa method and the ML method when N = 10, r0 = 20, range of sampled arc 0 to 2π and noise power σ² = 10
2.2 Comparison of the MSE of the Kasa and the ML methods with the KCR lower bound when N = 10, r0 = 20, range of sampled arc 0 to π
2.3 Comparison of the MSE of the Kasa and the ML methods with the KCR lower bound when N = 10, r0 = 20, range of sampled arc 0 to 2π
2.4 Comparison of the MSE of the Kasa and the ML methods with the KCR lower bound when N = 20, r0 = 20, range of sampled arc 0 to 2π
2.5 Comparison of the MSE of the Kasa and the ML methods with the KCR lower bound when N = 10, r0 = 10, range of sampled arc 0 to 2π
2.6 Comparison of the MSE of the Kasa and the ML methods with the KCR lower bound when N = 10, r0 = 50, range of sampled arc 0 to 2π
2.7 Fitted circles of the Kasa method and the ML method when N = 12, 10, 8, r0 = 50, 30, 20, range of sampled arc 0 to 2π and noise power σ² = 10
2.8 Comparison of the MSE of the Asymptotically Efficient method and the Kasa method with the KCR lower bound when N = 12, 10, 8, r0 = 50, 30, 20, range of sampled arc 0 to π
2.9 Comparison of the MSE of the Asymptotically Efficient method and the Kasa method with the KCR lower bound when N = 12, 10, 8, r0 = 50, 30, 20, range of sampled arc 0 to 2π
2.10 Comparison of the MSE of the Asymptotically Efficient method and the Kasa method with the KCR lower bound when changing the difference of two radii. N = 10, 10, 10, r1^0 = 120:20:200, r2^0 = 110:10:150 and r3^0 = 100. The range of sampled arc is 0 to 2π and the noise power is σ² = 1
2.11 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound when one data point is wrongly assigned to another circle. N = 10, 10, r0 = 50, 30. The range of sampled arc is 0 to π
2.12 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound when the coupled circles are estimated as a single circle. N = 10, 10, r0 = 50, 30. The range of sampled arc is 0 to π
2.13 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound for the estimation of the single circle center when a single circle is estimated as coupled circles. N1 = 20, r1^0 = 50. The range of sampled arc is 0 to π
2.14 Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound for the estimation of the single circle radius r1^0 = 50 when a single circle is estimated as coupled circles. N1 = 20, and the range of sampled arc is 0 to π
3.1 Classification accuracy with different replication times of the K-Means method. The number of data points of each circle is N = 10, 10, 10 and the radius of each circle is r0 = 140, 120, 100
3.2 Classification accuracy with different numbers of training subsets. The number of data points of each circle is N = 10, 10, 10 and the radius of each circle is r0 = 140, 120, 100
3.3 Comparison of the classification accuracy with the calculated prior probability and with the uniform prior probability. The number of data points of each circle is N = 10, 10, 10 or N = 10, 20, 30 and the radius of each circle is r0 = 140, 120, 100
3.4 Classification accuracy when applying cross-validation to the Naive Bayes classifier. The number of data points of each circle is N = 10, 20, 30 and the radius of each circle is r0 = 140, 120, 100
3.5 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with an equal number of data points on each circle. The radius of each circle is r0 = 140, 120, 100
3.6 Comparison of the MSE of the K-Means method, the Distance Division method and the Naive Bayes classifier with the KCR lower bound when the number of data points of the circles is equal. The number of data points is N = 20, 20, 20. The radius of each circle is r0 = 140, 120, 100
3.7 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with a different number of data points on each circle. The radius of each circle is r0 = 140, 120, 100
3.8 Comparison of the MSE of the K-Means method, the Distance Division method, the Naive Bayes classifier and the KCR lower bound with a different number of data points on each circle. The number of data points is N = 30, 15, 10. The radius of each circle is r0 = 140, 120, 100
3.9 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different radii differences. The number of data points of each circle is N = 10, 10, 10
3.10 Comparison of the normalized MSE of the K-Means method, the Distance Division method and the Naive Bayes classifier with the KCR lower bound when the spacing between the radii is different. The number of data points of each circle is N = 10, 10, 10. The noise power is 10
3.11 Comparison of the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different numbers of concentric circles when the number of data points and the radii difference of each circle are the same. The number of data points of each circle is 10. The difference of radii is 20
3.12 Classification accuracy of the circle with r0 = 140 for the K-Means method, the Distance Division method and the Naive Bayes method. The number of data points of each circle is 10. The difference of radii is 40
4.1 Appropriate window width of the Mean Shift method. The number of concentric circles is 1 to 4. The number of data points of each circle is 20. The difference of radii is 20
4.2 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with an equal number of data points on each circle. The radii of the circles are r0 = 140, 120, 100
4.3 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with a different number of data points on each circle. The radii of the circles are r0 = 140, 120, 100
4.4 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with the difference of radii rising from 10 to 40. The number of data points of each circle is N = 20, 20, 20
4.5 Estimation accuracy of the number of circles for the Mean Shift method and the Distance Threshold method with different numbers of concentric circles. The number of circles rises from 1 to 4
4.6 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 20, 20, 20, r0 = 140, 120, 100
4.7 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 10, 20, 30, r0 = 140, 120, 100
4.8 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 20, 20, 20, r0 = 120, 110, 100
4.9 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 20, r0 = 100
4.10 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 20, 20, r0 = 120, 100
4.11 Comparison of the MSE of the Meanshift-Naivebayes estimation method and the estimation with truly classified data with the KCR lower bound when N = 20, 20, 20, 20, r0 = 160, 140, 120, 100
ABSTRACT
Concentric circles fitting is a challenging task because the nonlinear fitting problem requires finding the implicit relationship between the noisy measurement data points and the unknown parameters to be estimated, namely the circles center and radii. Most concentric circles estimators require knowledge of the number of circles and of which data points belong to which circle. However, this information is often not available in practice. This thesis addresses these two problems. When the number of concentric circles is available, we propose and compare three classification methods, the K-Means method, the Distance Division method and the Naive Bayes classifier, to assign the data points to the circles. If the number of concentric circles is not known, non-parametric data clustering methods, namely the Mean Shift method and the Distance Threshold method, are developed in this thesis to estimate the number of circles before the subsequent estimation.
A new method is proposed that combines the Mean Shift method and the Naive Bayes classifier to improve the joint estimation of the number of circles and the classification of the data points. The performance of the proposed solutions is supported by simulations using synthetic data.
Chapter 1
Introduction
1.1 Background and Motivations
Fitting circles to noisy data points is a nonlinear estimation problem. Given a collection of data points in the plane, an important goal of circle fitting research is to locate the circle center and to estimate the radius. Unlike line fitting, circle fitting is a much more challenging topic because of the nonlinear relationship between the measurements and the unknown parameters. The estimation can be evaluated by the mean square error (MSE) between the parameter estimates and the true parameter values, and the KCR lower bound serves as a performance benchmark. There are two basic approaches to fitting circles: one comes from statistics, where the noisy circle data points are treated as a list of measurements, usually in real-valued coordinates [1]; the other is based on image processing, such as the circular Hough Transform [2]. The thesis mainly uses the first approach.
Circle fitting is an important and basic research field which has attracted a wide variety of interest in many application areas. For instance, pattern recognition and computer vision need to extract circular shapes from image data; in these applications the noise is usually small and a large number of data points are available. [3] provides a multiple circles fitting algorithm based on connectivity that does not require an accurate initial guess of the curves and effectively avoids false circle detection; however, the method cannot deal with sparsely distributed edge points. [4] applies circle fitting to an iris recognition system based on the Delogne-Kasa circle fitting technique to extract a more precise iris area from an eye image. Target tracking applications receive sequential measurement data points and estimate a circle whose center and radius change over time. [5] proposes a tracking circle fitting algorithm using the Bayesian theorem that requires no assumption on the locations of the true points on the circle; it assumes that the measurement noise is known and can obtain more precise estimation results than traditional Bayesian approaches under high noise. Circle fitting can also be used for localizing circle-shaped landmarks in mobile robotics, where laser range readings are provided in the polar coordinate system. [6] introduces a circle fitting algorithm in the Cartesian coordinate space that is able to deal with errors in both coordinates of the range readings that are not independent. A modified cylindrical reference target positioning system is proposed by [7], inspired by the voting procedure of the Hough Transform. In microwave engineering, circle fitting problems often arise when dealing with variable delays. The paper [8] uses a semi-parametric technique to find more accurate values of the variable positions and the parameters of the circle.
For multiple circles estimation, one of the difficulties given a series of noisy data points is to determine the number of circles. According to the examples shown in Chapter 2, a large error will be produced if a single circle is estimated as coupled circles, and vice versa. To the best of the author's knowledge, not many studies in the literature address the estimation of the number of circles for multiple concentric circles. If the number of circles is already known, we can directly separate the data points into groups and perform the estimation; otherwise, we first need to find the number of circles. The thesis aims at providing a new method to estimate the number of circles and classify the data points that belong to the different circles. The method is based on a data clustering technique, the Mean Shift algorithm, which groups the data points in order to determine the number of circles, and on the Naive Bayes classifier, which assigns the data points to the right circles so that the asymptotically efficient estimator can fit the concentric circles. More details will be provided in the following chapters.
1.2 Previous Work on Circle Fitting
Many methods have been proposed over the years for single circle estimation; they fall into two categories: iterative and non-iterative. The Kasa method [9] is a closed form solution and can reach the CRLB accuracy when the noise is white Gaussian and the noise level is small. The Full Least Squares (FLS) method is able to achieve the Cramer-Rao Lower Bound (CRLB) performance, but it needs an iterative numerical solution. The Maximum Likelihood (ML) method, based on the probability density function (PDF) of the noisy measurements, has better performance than the FLS method when the signal to noise ratio (SNR) is high [10]. It can be solved by the Newton-Raphson method or a convolution method [11]. Nevertheless, the estimation accuracy of the ML method relies heavily on the initial values. The Branch and Bound principle can be used to compute the lower and upper bounds of the ML objective function for each subspace, which leads to an efficient search algorithm [12]. The Average of Intersections (AI) method, the Reduced Least Squares (RLS) method and the Modified Least Squares (MLS) method yield closed form solutions. However, an obvious disadvantage of the AI method is that it fails if any three of the circle points are collinear. It is also very unstable, since small changes of relatively close points drastically change the estimated center and thus result in a very different circle. The RLS method is also sensitive to small changes in the data points. The MLS method is robust against measurement error. When the outliers are outside the circle, the MLS method performs better than the Kasa method, which provides a better circle fit when the outliers are inside the circle [13].
The multiple circles estimation problem, such as concentric circles, is more complex than single circle estimation. A few studies present solutions to the concentric circles estimation problem. O'Leary gave a quadratically constrained total least-squares method for coupled geometric object fitting which provides good results when the noise level is low [14]. The method proposed by Marot and Bourennane can be extended to concentric circles fitting and depends on the radii estimates only [15]. [16] combines the gradient Hough Transform and the one-dimensional Hough Transform to obtain a concentric circles estimator that has good reliability and is highly adaptive to the noise. Al-Subaihi proposed a new Least Squares norm modified by Orthogonal Distance Regression to fit concentric circles [17]. Ma and Ho presented an asymptotically efficient estimator for coupled circles and ellipses fitting which has explicit solutions and is derived through an equation error formulation and a non-linear parameter transformation [18].
1.3 Data Clustering
Data clustering is a technique for assigning data points into clusters according to similar characteristics. A cluster is a collection of data points that are similar to one another and dissimilar to the data points in other clusters. Clustering, which is an unsupervised method, differs from classification: a clustering method makes no assumption about the number of clusters and tries to find an optimal number of clusters based on an objective function, whereas for a classification method the number of groups is a known parameter. Data clustering has been applied in many research fields. [19] maps a sensor network onto a cluster model to compress the data and reduce the amount of data efficiently. A novel clustering algorithm based on the minimum volume ellipsoid proposed by [20] can be successfully applied to several computer vision problems formulated in the feature space paradigm.
Data clustering algorithms can be classified into several categories, such as partitioning, hierarchical, density-based and grid-based. Partitioning is the most popular clustering approach and includes several famous algorithms such as K-Means and K-medoids; the number of clusters must be decided at the beginning. Hierarchical methods, including CURE, BIRCH and Chameleon, construct a hierarchical tree structure to perform the clustering [21]. Density-based methods, such as DBSCAN and OPTICS, place data points into the same cluster when the density of data points is higher than a set threshold. Grid-based clustering methods segment the data space into grids and perform the clustering on the data points within the grids. STING and WaveCluster are both grid-based clustering algorithms.
The partitioning approach is efficient but unstable with noisy data points. The hierarchical approach has high clustering accuracy but is time consuming, since each instance has to be compared with all the objects, which leads to a high computational complexity [22]. The density-based algorithms can filter noise and are suitable for spatial clustering, but they are inefficient. The clustering time of a grid-based method does not depend on the size of the data set but on the number of grids, which reduces the clustering time but limits the clustering accuracy to some extent.
The thesis uses the K-Means and Mean Shift methods to cluster the data points of the circles. K-Means is a widely used data clustering algorithm since it is easy to implement and very efficient. It assumes that the number of clusters is known a priori, and the cluster centers are iteratively updated by optimizing the objective function [23]. The performance of K-Means clustering depends heavily on the selection of the cluster number k and the initial cluster centers; if the selected initial centers are near the optimal solution, convergence is guaranteed. Considering these weaknesses, and in order to improve the performance of K-Means in different situations, many variations of K-Means have been proposed. Bradley and Fayyad introduced a modified K-Means method that can handle large databases and reduces the dependence on the selection of cluster centers [24]. Dhillon proposed a novel weighted kernel K-Means algorithm that can monotonically decrease the normalized cut [25]. K-Modes, presented by [26], is analogous to the traditional K-Means method and is able to derive clusters from categorical (nominal scale) data.
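To make the iterative center update concrete, the following is a minimal sketch of the standard K-Means loop in Python/NumPy. The function name, the random initialization and the convergence test are illustrative assumptions rather than details of any cited work; in the setting of this thesis the feature would simply be the distance of each data point from the approximate common center (see Chapter 3), passed as an N x 1 array.

    import numpy as np

    def kmeans(points, k, n_iter=100, seed=0):
        """Minimal K-Means (Lloyd's algorithm): alternate assignment and update."""
        rng = np.random.default_rng(seed)
        points = np.asarray(points, dtype=float)          # shape (N, d)
        # Initialize the centers with k distinct data points chosen at random.
        centers = points[rng.choice(len(points), size=k, replace=False)]
        labels = np.zeros(len(points), dtype=int)
        for _ in range(n_iter):
            # Assignment step: each point joins its nearest center.
            d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            # Update step: each center moves to the mean of its assigned points.
            new_centers = np.array([points[labels == j].mean(axis=0)
                                    if np.any(labels == j) else centers[j]
                                    for j in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels

For the circle classification problem, points would be dist.reshape(-1, 1), the distances of the N data points from the approximate center, and k the known number of circles.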
The Mean Shift algorithm is an unsupervised, non-parametric, iterative clustering method proposed by Fukunaga and Hostetler for locating the maxima of an estimated probability density function [27]. It was later adopted and generalized by Cheng as a gradient ascent method and can be used for clustering and global optimization [28]. Unlike the K-Means method, Mean Shift clustering does not rely on prior knowledge of the number of clusters. Mean Shift is widely used in many imaging applications such as image and video segmentation, object tracking and texture classification. [29] proposed a general non-parametric technique based on Mean Shift to analyze complex multimodal feature spaces and delineate arbitrarily shaped clusters, which can be applied to discontinuity preserving smoothing and image segmentation. [30] proposed an anisotropic kernel Mean Shift technique in which the shape, scale and orientation of the kernels adapt to the local structure of the image or video. The Mean Shift algorithm is suitable for gradient-based optimization and can be used to solve the target localization problem [31]. [32] exploits a locality-sensitive hashing technique to reduce the computational complexity of the adaptive Mean Shift algorithm for a texture classification study. The main disadvantage of the Mean Shift method applied to data clustering is its computational complexity for large data sets. Several techniques have been developed to increase the computation speed: [33] uses the Fast Gauss Transform to speed up the sums in the Mean Shift iteration, and [34] proposes a fast Mean Shift procedure that applies a new iteration strategy based on updating the cluster centers according to a dynamically updated sample set.
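As a concrete illustration of the basic iteration, here is a minimal one-dimensional Mean Shift sketch with a Gaussian kernel, applied to scalar features such as the point-to-center distances used later in this thesis. The function name, the stopping rule and the mode-merging rule are illustrative assumptions; the bandwidth plays the role of the window width discussed in Chapter 4.

    import numpy as np

    def mean_shift_1d(x, bandwidth, n_iter=200, tol=1e-6):
        """Minimal Mean Shift on scalar features (e.g. point-to-center distances).

        Each sample climbs the kernel density estimate; samples converging to
        the same mode form one cluster, so the number of clusters need not be
        known in advance.
        """
        x = np.asarray(x, dtype=float)
        modes = x.copy()
        for _ in range(n_iter):
            # Gaussian kernel weights between every current mode and every sample.
            w = np.exp(-0.5 * ((modes[:, None] - x[None, :]) / bandwidth) ** 2)
            shifted = (w * x[None, :]).sum(axis=1) / w.sum(axis=1)
            done = np.max(np.abs(shifted - modes)) < tol
            modes = shifted
            if done:
                break
        # Merge modes that ended up closer than the bandwidth into one cluster.
        centers = []
        labels = np.empty(len(x), dtype=int)
        for i, m in enumerate(modes):
            for j, c in enumerate(centers):
                if abs(m - c) < bandwidth:
                    labels[i] = j
                    break
            else:
                centers.append(m)
                labels[i] = len(centers) - 1
        return np.array(centers), labels

The number of distinct converged modes is then an estimate of the number of clusters, which is how the method is used in Chapter 4 to estimate the number of circles.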
1.4 Data Classification
Data classification is a fundamental issue in many research fields such as machine learning and data mining. Given a series of input data with class labels, called the training set, containing multiple attributes or features, the goal is to construct a classifier by developing an accurate description or model of each class using the features present in the data. Once derived, the classifier can be used to categorize future test data for which the class labels are unknown, as well as to develop a better understanding of the database contents. Data classification has numerous applications including image classification, bioinformatics, pattern recognition, etc. [35] used the Back Propagation Network and the Probabilistic Neural Network to classify brain tumors in MRI images by their texture features. [36] proposed a rotation-invariant classifier that can achieve a balance between privacy quality and performance accuracy. The least squares SVM with an RBF kernel is used by [37] to classify microarray data in order to optimize the performance of clinical predictions. [38] developed a deep convex network for speech recognition from the Deep-Neural-Network/Hidden-Markov-Model framework which can deal with a virtually unlimited amount of training data. Most classifiers construct their classification models by a learning process, but in many applications not many training examples are available, and various classification algorithms have been designed to tackle this problem. The Naive Bayes classifier is a simple and easy learning algorithm that assumes the attributes are independent given the classes [39]. In practice the Naive Bayes method often competes well with more sophisticated algorithms, although independence is generally a poor assumption. It is based on the Bayes theorem to train a classifier that can output the probability of the possible classes for new data. A detailed introduction of the Naive Bayes classifier for the classification of circles data points is given in Chapter 3. Compared with other classification methods, the Decision Tree classifier is relatively fast and can be converted into simple classification rules [40]. A decision tree is built from the training data and consists of internal nodes and leaf nodes; the internal nodes represent decisions on attributes and the leaf nodes are the classes. The decision path of a new data point is traced from the root to a leaf node that holds the class prediction. The Decision Tree has many extended algorithms with better classification accuracy, such as ID3, CART and C5.0.
The Decision Tree classifier suffers from the fragmentation problem: the data are split at each test, and very little data is left to base decisions on after two dozen levels [41]. The Neural Network algorithm, inspired by the central nervous system, is a layered graph with one node feeding into one or many other nodes in the next layer. The classification process concentrates on the structure of the graph and the weights assigned to the links between the nodes, which are computed in an iterative training procedure [42]. The Neural Network model is computationally complex, needs a lot of training data and is slow to converge. It gives a lower classification error rate than the Decision Tree but requires a longer learning time. The Support Vector Machine (SVM) is a relatively new classification algorithm with remarkably robust performance on noisy data [43]. The SVM non-linearly maps the data into a higher dimensional space and constructs an optimal separating hyperplane in that space. It can not only correctly separate data into the right classes, but also identify instances whose classification is not supported by the data. The SVM yields better classification results than the Neural Network regarding accuracy, simplicity and robustness [44]. The k Nearest Neighbours (KNN) method is one of the oldest and simplest methods for data classification [45]. Majority voting among the k data points in the neighbourhood is used to decide the classification, and the success of classification depends highly on the selection of k. One simple way of choosing k is to run the program many times with different k and choose the value that obtains the best performance. The KNN method has a high cost for classifying new instances, since all computation happens at classification time, unlike other classifiers where the training examples are processed first. The Bayesian Network is a graphical model that encodes probabilistic relationships among variables [46]. It is able to handle incomplete data sets and avoid over-fitting. The Bayesian Network can be used to learn causal relationships and is an ideal representation for combining prior knowledge and data.
The Naive Bayes classifier is the simplest form of Bayesian Network, in which all features are conditionally independent. [47] uses the Naive Bayes classifier to calculate membership probabilities of individual sources based on their X-ray and visual properties. [48] develops a non-parametric Naive Bayes approach that can tackle the issue of the non-normality of numerical continuous data. A new learning algorithm proposed by [49] combines the Naive Bayes classifier and the Decision Tree method; it performs balanced detection and keeps the false positive rate at an acceptable level for different network types. The modified algorithm can handle continuous attributes, deal with missing attribute values and reduce noise in the training data. [50] aggregates the Naive Bayes predictions of correlated flows for traffic classification, which can improve the classification performance when little training data is available.
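To illustrate the principle on the kind of data used in this thesis, the following is a minimal Gaussian Naive Bayes sketch for a single continuous feature (the distance of a data point from the approximate circles center). The class name, the variance floor and the training interface are illustrative assumptions, not the exact classifier developed in Chapter 3.

    import numpy as np

    class GaussianNaiveBayes1D:
        """Minimal Gaussian Naive Bayes for one continuous feature.

        Training estimates a prior, mean and variance per class; prediction
        applies Bayes' theorem and returns the most probable class.
        """
        def fit(self, x, y):
            x = np.asarray(x, dtype=float)
            y = np.asarray(y)
            self.classes_ = np.unique(y)
            self.prior_ = np.array([np.mean(y == c) for c in self.classes_])
            self.mean_ = np.array([x[y == c].mean() for c in self.classes_])
            self.var_ = np.array([x[y == c].var() + 1e-9 for c in self.classes_])
            return self

        def predict(self, x):
            x = np.asarray(x, dtype=float)[:, None]
            # Class-conditional Gaussian likelihoods multiplied by the priors.
            lik = np.exp(-0.5 * (x - self.mean_) ** 2 / self.var_) \
                  / np.sqrt(2 * np.pi * self.var_)
            post = lik * self.prior_
            return self.classes_[post.argmax(axis=1)]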
1.5 Organization of the Thesis
This section gives a brief overview of the research work included in the thesis. Chapter 2 introduces several single circle fitting methods and multiple circles fitting methods. The ML solution based on Taylor-series linearization, which has good performance even when the noise power is large, is mainly introduced, and we compare its performance with the Kasa method, which is non-iterative and has a closed form solution. The concentric circles estimator proposed in [18], an asymptotically efficient estimator that needs only one round of iteration and avoids the initialization problem, is presented in detail. All the introduced methods are tested in the simulation section. The KCR lower bound expressions for the estimated parameters of the concentric circles are also developed as an evaluation benchmark.
Chapter 3 addresses the situation where the number of circles M is known and what remains is to classify the data points into groups for the circles estimation. Three classification methods, the Distance Division method, the K-Means method and the Naive Bayes classifier, are presented. All the methods first assume that all the data points belong to a single circle in order to estimate an approximate circle center, and then calculate the distance between each individual data point and this approximate center. The subsequent steps of the three methods are different. The K-Means method uses the known number of circles k to perform unsupervised data clustering. The Distance Division method equally divides the data points into 2M − 1 groups and estimates the concentric circles from the 1st, 3rd, ..., (2M − 1)th groups to obtain temporary circle radii; it then compares the distances of the data points in the 2nd, ..., (2M − 2)th groups to the neighboring radii to decide the assignment. The Naive Bayes classifier builds a classification model from a training subset and classifies the new data subset based on the model. A cross-validation method is applied to the Naive Bayes classifier to improve the utilization of the training data subset when the amount of received circle data is limited. The K-Means algorithm is fast and easy to implement; it performs better than the Distance Division method when the number of data points of each circle is different. The Distance Division method has higher accuracy than the K-Means method when the number of data points of each circle is the same. The Naive Bayes classifier usually achieves the best classification result when the training data subset is large enough. The classification accuracy of the Naive Bayes classifier is determined by the training procedure, which can be supplied by a non-parametric data clustering algorithm such as the Mean Shift method.
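As a rough sketch of the Distance Division assignment rule just described (the full method is developed in Chapter 3), the code below sorts the points by their distance from the approximate center, splits them into 2M − 1 equal groups, uses the mean distances of the odd-numbered groups as temporary radii, and assigns the points of the even-numbered groups to the nearer neighboring radius. The function name and the use of group means as temporary radii are assumptions made for illustration only.

    import numpy as np

    def distance_division_assign(dist, M):
        """Sketch of the Distance Division assignment for M concentric circles.

        dist : 1-D array of distances from the approximate common center.
        Returns circle labels 0..M-1, where 0 is the innermost circle.
        """
        dist = np.asarray(dist, dtype=float)
        order = np.argsort(dist)
        groups = np.array_split(order, 2 * M - 1)
        labels = np.empty(len(dist), dtype=int)
        # Odd-numbered groups (1st, 3rd, ...) define the temporary radii.
        radii = [dist[groups[2 * m]].mean() for m in range(M)]
        for m in range(M):
            labels[groups[2 * m]] = m
        # Even-numbered groups are assigned to the closer neighboring radius.
        for m in range(M - 1):
            for idx in groups[2 * m + 1]:
                closer_inner = abs(dist[idx] - radii[m]) <= abs(dist[idx] - radii[m + 1])
                labels[idx] = m if closer_inner else m + 1
        return labels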
The solutions for estimating the number of concentric circles are presented in Chapter 4. Two different methods, the Distance Threshold method and the Mean Shift method, are introduced and compared. As in the previous chapter, both methods depend on the distance between the data points and the assumed single circle center. The first method computes a threshold from the probability density function of the distances, for a given false alarm probability, to separate the data. The Mean Shift algorithm clusters the data according to the distance distribution of the data points. The Distance Threshold approach works better when the number of data points is limited, but it cannot handle more than three concentric circles. The Mean Shift method is not good for a small number of data points, but it provides better results when the number of data points or the number of concentric circles is large. Both methods can be used as the training procedure required by the Naive Bayes classifier. A new method combining the Mean Shift algorithm and the Naive Bayes classifier is proposed for multiple concentric circles without knowledge of the number of circles. It improves the accuracy of the circles number estimate to almost 100 percent and achieves higher classification accuracy at high noise power levels.
Chapter 5 summarizes the thesis and discusses ideas for future research.
1.6 Major Contribution
The main contributions of the thesis are listed as follows:
i. Three methods to cluster the data points of concentric circles for estimation are presented for the case where the number of circles is known a priori. The K-Means method is a fast and robust method that is suitable whether the individual circles have the same or different numbers of data points, and the MSE of the estimated circles center and radii can reach the KCR lower bound. The Distance Division approach gives higher classification accuracy when the number of data points on every circle is equal or nearly equal. The Naive Bayes classifier is fast and usually obtains the highest classification accuracy, but it requires a large number of data points to train the classification model. The cross-validation technique can be used to enhance the stability of the performance of the Naive Bayes classifier when the number of data points is limited.
ii. Two solutions are proposed to estimate the number of concentric circles when the number of circles is unknown. The Mean Shift algorithm is able to deal with the situation where the number of circles is large or the number of data points is large enough. The Distance Threshold approach performs better than the Mean Shift algorithm when the number of data points is limited. Although both methods can also assign the data points, their performance is worse than that of the three classification methods above, since the number of circles is unavailable.
iii. A new method is proposed that combines the Mean Shift method and the Naive Bayes classifier to produce a new concentric circles estimator that does not require the number of circles to be known. The cross-validation technique is applied to reduce the influence of the limited number of received data points. The new method improves the accuracy of the circles number estimation to 100% and raises the accuracy of the data point assignment when the noise power is high. The MSE of the estimated parameters, center and radii, can reach the KCR lower bound performance when the received data are not heavily affected by the noise.
Chapter 2
Circles Fitting Basis
2.1 Single Circle Estimator
From the viewpoint of implementation, circle fitting solutions can be separated into iterative and non-iterative methods. This section gives an example of each. The Kasa method, which is simple and intuitive, is a non-iterative method, while the Maximum Likelihood (ML) estimator can be solved by Gauss-Newton iteration. Let N be the number of noisy data points defined as
s_i = s_i^0 + n_i                                   (2.1)

where s_i = [x_i, y_i]^T, i = 1, 2, ..., N, and s_i^0 = [x_i^0, y_i^0]^T is the true data point collected from the circle whose center is c^0 = [a^0, b^0]^T and radius is r^0. They satisfy the equation

||s_i^0 − c^0|| = r^0                               (2.2)
where || · || is the Euclidean norm. n_i is the noise, modeled as zero mean Gaussian with a block diagonal covariance matrix Q = diag{R, R, ..., R}, where

R = σ² [ 1  δ ;  δ  1 ],

σ² is the noise power and δ is the correlation coefficient between the x and y noise components. The value of a correlation coefficient ranges between −1 and 1. A value close to 1 indicates that the two variables are positively linearly related and the scatter plot falls almost along a straight line with positive slope; δ = −1 means that the variables are negatively linearly related, and a value near 0 indicates a weak linear relationship. Our goal is to find the unknown parameter estimate θ̂ = [ĉ^T, r̂]^T.
The circle can be represented by the Chan & Thomas parametric form

x_i^0 = a^0 + r^0 cos(φ_i^0)
y_i^0 = b^0 + r^0 sin(φ_i^0)                        (2.3)
where φ_i^0 = tan⁻¹[(y_i^0 − b^0)/(x_i^0 − a^0)] is the phase of the point s_i^0.

2.1.1 Kasa Estimator
The Kasa method is formulated as

(ĉ, r̂) = arg min Σ_{i=1}^{N} [(x_i − a)² + (y_i − b)² − r²]².                  (2.4)
The minimum can be obtained by differentiation. Since the derivatives of (2.4) are non-linear with respect to a, b and r, a simple change of variables makes them linear:

A = −2a
B = −2b
C = a² + b² − r².                                   (2.5)

Then (2.4) can be written as

(ĉ, r̂) = arg min Σ_{i=1}^{N} [A x_i + B y_i + z_i + C]²                        (2.6)
where z_i = x_i² + y_i². Differentiating (2.6) with respect to A, B and C yields the linear equations

A Σ x_i² + B Σ x_i y_i + C Σ x_i = − Σ x_i z_i
A Σ x_i y_i + B Σ y_i² + C Σ y_i = − Σ y_i z_i
A Σ x_i + B Σ y_i + C N = − Σ z_i                   (2.7)

where all the sums run over i = 1, 2, ..., N. To solve (2.7), rewrite it as

X^T X H = −X^T Z                                    (2.8)

where X is the N × 3 matrix whose i-th row is [x_i, y_i, 1], H = (A, B, C)^T and Z = [z_1, z_2, ..., z_N]^T. Then the solution of (2.8) is

H = −X^† Z                                          (2.9)

where X^† is the pseudo-inverse of X, which can be computed by the singular value decomposition (SVD) of the matrix [51]. The SVD solution works stably even in the singular case, where det(X^T X) = 0.
Finally we obtain the estimated vector θ̂:

â = −A/2
b̂ = −B/2
r̂ = √(A² + B² − 4C) / 2.                            (2.10)

The Kasa algorithm is very simple and fast; it costs less than the ML method solved by the Gauss-Newton iteration. However, the direct solution is usually not the best option: (2.7) is an analogue of the normal equations and may cause numerical instability [52]. A more stable solution is described in the next section.
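The closed form solution (2.5)–(2.10) takes only a few lines; the following minimal Python/NumPy sketch solves the linear system with an SVD-based least squares routine, in the spirit of the pseudo-inverse remark above. The function name is an illustrative assumption.

    import numpy as np

    def kasa_fit(x, y):
        """Kasa circle fit: solve the linear system (2.8) and map back via (2.10)."""
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        X = np.column_stack([x, y, np.ones_like(x)])   # rows [x_i, y_i, 1]
        z = x ** 2 + y ** 2                            # z_i = x_i^2 + y_i^2
        # SVD-based least squares solution of X [A, B, C]^T = -z.
        (A, B, C), *_ = np.linalg.lstsq(X, -z, rcond=None)
        a = -A / 2.0
        b = -B / 2.0
        r = np.sqrt(A ** 2 + B ** 2 - 4.0 * C) / 2.0
        return a, b, r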
2.1.2 Maximum Likelihood Estimator [10]
The ML estimator for the circle fitting problem can be formulated as

(ĉ, r̂) = arg min Σ_{i=1}^{N} ||s_i − s_i^0(θ)||².                              (2.11)

The above formulation is an equality constrained optimization problem and the ML solution is the value of θ that minimizes (2.11). This function cannot be solved
directly, since s_i^0(θ) is related to θ in a highly nonlinear manner. The Taylor-series linearization approach can be used to obtain a suitable value of θ through iteration.
Let θ_0 = [a_0, b_0, r_0]^T be an initial solution guess; it can be set to random values in the simulation. According to the Taylor series, s_i^0(θ) can be expanded as

s_i^0(θ) ≈ s_i^0(θ_0) + G_i(θ_0)(θ − θ_0)                                      (2.12)

where G_i(θ_0) = ∂s_i^0(θ)/∂θ |_{θ_0} is the gradient matrix. Then we can obtain

(ĉ, r̂) ≈ arg min Σ_{i=1}^{N} ||s_i − s_i^0(θ_0) − G_i(θ_0)(θ − θ_0)||².        (2.13)

(2.13) is a quadratic function of θ whose minimum is achieved when

θ_1 = θ_0 + [ Σ_{i=1}^{N} G_i(θ_0)^T G_i(θ_0) ]⁻¹ · [ Σ_{i=1}^{N} G_i(θ_0)^T (s_i − s_i^0(θ_0)) ].      (2.14)

(2.14) can be iterated to obtain the optimized value of θ:

θ_{k+1} = θ_k + [ Σ_{i=1}^{N} G_i(θ_k)^T G_i(θ_k) ]⁻¹ · [ Σ_{i=1}^{N} G_i(θ_k)^T (s_i − s_i^0(θ_k)) ]   (2.15)

where k = 0, 1, ... is the iteration count. The iteration stops when ||θ_{k+1} − θ_k|| < δ, where δ is a very small value.
We now need to determine s_i^0(θ_k) and G_i(θ_k) for the iterative solution. Using (2.3) and θ_k = [a_k, b_k, r_k]^T, s_i^0(θ_k) can be represented as

s_i^0(θ_k) = [ a_k ; b_k ] + r_k [ cos(φ_i) ; sin(φ_i) ]                       (2.16)

where φ_i = tan⁻¹[(y_i^0 − b_k)/(x_i^0 − a_k)]. Since s_i^0 = [x_i^0, y_i^0]^T is unavailable, we replace it by s_i = [x_i, y_i]^T to obtain the s_i^0(θ_k) expression. The gradient matrix is

G_i(θ_k) = ∂s_i^0(θ)/∂θ |_{θ_k} = [ ∂x_i/∂a  ∂x_i/∂b  ∂x_i/∂r ; ∂y_i/∂a  ∂y_i/∂b  ∂y_i/∂r ] |_{θ_k}     (2.17)

where

∂x_i/∂a |_{θ_k} = (x_i − a_k)²/r_k²,        ∂x_i/∂b |_{θ_k} = (x_i − a_k)(y_i − b_k)/r_k²,
∂y_i/∂b |_{θ_k} = (y_i − b_k)²/r_k²,        ∂y_i/∂a |_{θ_k} = (x_i − a_k)(y_i − b_k)/r_k²,
∂x_i/∂r |_{θ_k} = (x_i − a_k)/r_k,          ∂y_i/∂r |_{θ_k} = (y_i − b_k)/r_k.                          (2.18)
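A compact sketch of the Gauss-Newton iteration (2.15)–(2.18) is given below; the starting point can be the Kasa estimate or, as mentioned above, random values. The function name, the iteration cap and the stopping tolerance are illustrative assumptions.

    import numpy as np

    def ml_circle_fit(x, y, theta0, max_iter=50, tol=1e-8):
        """Gauss-Newton ML circle fit following (2.15)-(2.18).

        theta0 = (a0, b0, r0) is the initial guess, e.g. from the Kasa method.
        """
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        theta = np.array(theta0, dtype=float)
        for _ in range(max_iter):
            a, b, r = theta
            phi = np.arctan2(y - b, x - a)              # phase of each point, eq (2.16)
            sx = a + r * np.cos(phi)                    # model point s_i^0(theta_k)
            sy = b + r * np.sin(phi)
            res = np.concatenate([x - sx, y - sy])      # residuals s_i - s_i^0(theta_k)
            # Gradient matrix G_i stacked for all points, eqs (2.17)-(2.18).
            dxa = (x - a) ** 2 / r ** 2
            dxb = (x - a) * (y - b) / r ** 2
            dyb = (y - b) ** 2 / r ** 2
            dxr = (x - a) / r
            dyr = (y - b) / r
            G = np.vstack([np.column_stack([dxa, dxb, dxr]),
                           np.column_stack([dxb, dyb, dyr])])
            # Gauss-Newton step (G^T G)^{-1} G^T res via least squares.
            step = np.linalg.lstsq(G, res, rcond=None)[0]
            theta = theta + step
            if np.linalg.norm(step) < tol:
                break
        return theta  # [a_hat, b_hat, r_hat]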
2.2 Concentric Circles Estimator
Concentric circles are a set of circles with the same center and different radii. Let N_i be the number of data points of the i-th circle, and let the collection of noisy data points be S = {s_11, s_12, ..., s_1N_1, s_21, s_22, ..., s_2N_2, ..., s_M1, s_M2, ..., s_MN_M}, where
M is the number of circles. The data from the concentric circles are modeled as
s_ij = s_ij^0 + n_ij,   i = 1, 2, ..., M,   j = 1, 2, ..., N_i                 (2.19)

where s_ij = [x_ij, y_ij]^T is the noisy data point and s_ij^0 = [x_ij^0, y_ij^0]^T is the true data point sampled from the concentric circles with center c^0 = [a^0, b^0]^T and radii r_i^0, i = 1, 2, ..., M. They satisfy the equations

||s_ij^0 − c^0|| = r_i^0                            (2.20)

where || · || is the Euclidean norm. n_ij is the noise, modeled as zero mean Gaussian with covariance matrix Q = diag{R, R, ..., R}, a block diagonal matrix where R = σ² [ 1  δ ;  δ  1 ], σ² is the noise power and δ is the correlation coefficient between the x and y noise components. The unknown parameters are θ̂ = [â, b̂, r̂_i]^T, i = 1, 2, ..., M.
The solution can be easily extended to more than two circles or reduced to one circle. The thesis uses the asymptotically efficient estimator for coupled circles fitting proposed by [18].
From (2.19) and (2.20) we get the equation ||s_ij − c^0 − n_ij||² = (r_i^0)². Expanding the square on the left and rearranging the terms yields

2(s_ij − c^0)^T n_ij − n_ij^T n_ij = ||s_ij||² − 2 s_ij^T c^0 − ((r_i^0)² − ||c^0||²).        (2.21)

The left side collects the error and noise terms and the right side involves the concentric circles parameters.
Collecting all N equations for i = 1, 2, ..., M and j = 1, 2, ..., Ni gives
B_c n − η = h_c − G_c φ_c^0                         (2.22)

where B_i = diag{(s_i1 − c^0)^T, (s_i2 − c^0)^T, ..., (s_iN_i − c^0)^T} is a block diagonal matrix whose off-diagonal entries are 0^T = [0, 0]^T, and B_c = 2 · diag{B_1, B_2, ..., B_M};

h_i = [ ||s_i1||², ||s_i2||², ..., ||s_iN_i||² ]^T   and   h_c = [ h_1^T, h_2^T, ..., h_M^T ]^T;

G_c is the matrix whose row corresponding to the data point s_ij is [2 s_ij^T, e_i^T], where e_i is the M × 1 indicator vector with a one in the i-th position and zeros elsewhere, i.e.

G_c = [ 2 s_11^T     1  0  ···  0
        ⋮            ⋮  ⋮       ⋮
        2 s_1N_1^T   1  0  ···  0
        2 s_21^T     0  1  ···  0
        ⋮            ⋮  ⋮       ⋮
        2 s_2N_2^T   0  1  ···  0
        ⋮            ⋮  ⋮       ⋮
        2 s_M1^T     0  0  ···  1
        ⋮            ⋮  ⋮       ⋮
        2 s_MN_M^T   0  0  ···  1 ];

φ_c^0 = [ (c^0)^T, (r_1^0)² − ||c^0||², ..., (r_M^0)² − ||c^0||² ]^T;

n_i = [ n_i1^T, n_i2^T, ..., n_iN_i^T ]^T, n = [ n_1^T, n_2^T, ..., n_M^T ]^T, and η is an N × 1 vector whose elements are n_ij^T n_ij. φ_c^0 is a re-parameterization of θ_c^0 and there is a unique one-to-one mapping between them; φ_c^0 is the unknown vector to be found. The second order noise components are negligible when the SNR is sufficient, so B_c can be treated as nearly
noiseless and η is insignificant. Then minimizing the negative log-likelihood function for (2.22) gives

(ĉ, r̂) = arg min (h_c − G_c φ_c^0)^T W_c (h_c − G_c φ_c^0)                     (2.23)

where W_c = (B_c Q B_c^T)⁻¹ is the weighting matrix. The solution is

φ_c = (G_c^T W_c G_c)⁻¹ G_c^T W_c h_c.              (2.24)

Since W_c contains unknown parameters (the circles center), the solution strategy is to set W_c to an identity matrix to obtain an initial estimate of φ_c^0. A suitable W_c is then generated after the first iteration, and the performance can be improved by repeating the solution. From the simulations later, we find that two iterations are sufficient; further iterations do not generate better results.
Since G_c includes the noisy data points s_ij, an ill-conditioned problem may occur when performing the inverse in (2.24). In that case the pseudo-inverse needs to be used instead. The pseudo-inverse is a matrix inverse-like operator that can be defined for any matrix, even one that is not square, and it can be computed by the singular value decomposition (SVD) [51].
From the φ_c calculated in (2.24), the estimated parameters can be
expressed as

θ̂_c = [ â, b̂, r̂_1, ..., r̂_M ]^T
     = [ φ_c(1), φ_c(2), √(φ_c(1)² + φ_c(2)² + φ_c(3)), ..., √(φ_c(1)² + φ_c(2)² + φ_c(3 + M − 1)) ]^T.     (2.25)
2.3 KCR Lower Bound for Concentric Circles Fitting
KCR Lower Bound for Concentric Circles Fitting
The KCR lower bound presented by Chernov and Lesort is extended from the lower
bound proposed by Kanatani which can be applied to evaluate any biased curve fitting
estimator by the leading term of the mean square error of estimated parameters [53].
For the concentric circles fitting, the KCR lower bound of the unknown parameter
vector θ̂ = [â, b̂, r̂i ]T , i = 1, 2, ..., M is
h
i−1
T
T
KCR(θ̂) = P0c (B0c QB0c )−1 P0c
24
(2.26)


T
0
0 T
0

(si1 − c ) · · ·


.
.
.
0
 and B0c = 2·diag{B01 , B02 , ..., B0M }, P0c =
..
..
..
where Bi = 




T
0
0 T
0
· · · (siNi − c )


0
0 T
0
r1 0 0 · · · 0 
 (s11 − c )

..
.. .. .. ..
.. 

.
.
.
.
.
. 





 0
 (s1N1 − c0 )T r10 0 0 · · · 0 



 0
0
T
0
 (s21 − c )

0
·
·
·
0
0
r
2



..
.. .. .. ..
.. 

.
. . . .
. 


. In B0i , 0T = [0, 0]T .
2


 (s0 − c0 )T
0 r20 0 · · · 0 
 2N2



..
.. .. .. ..
.. 

.
. . . .
. 




0 
 (s0 − c0 )T
0
0
0
·
·
·
r
M
 M1

..
.. .. .. ..
.. 

.
. . . .
. 




0 T
0
0
0 0 0 · · · rM
(sM NM − c )
N ×(M +1)
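A minimal sketch of evaluating (2.26) from the true data points is given below. Since B_c^0 Q (B_c^0)^T is diagonal with entries 4 (s_ij^0 − c^0)^T R (s_ij^0 − c^0), the inverse weighting reduces to a per-point scalar. The function name and interface are illustrative assumptions.

    import numpy as np

    def kcr_bound(true_points_per_circle, center, radii, sigma2=1.0, delta=0.0):
        """KCR lower bound (2.26) for the concentric circles parameters.

        Returns the (M+2) x (M+2) bound matrix for [a, b, r_1, ..., r_M].
        """
        M = len(true_points_per_circle)
        s0 = np.vstack(true_points_per_circle)
        N = len(s0)
        d = s0 - np.asarray(center, dtype=float)          # rows (s_ij^0 - c^0)^T
        # P_c^0: row [ (s_ij^0 - c^0)^T, 0, ..., r_i^0, ..., 0 ].
        P = np.zeros((N, 2 + M))
        P[:, :2] = d
        start = 0
        for i, p in enumerate(true_points_per_circle):
            P[start:start + len(p), 2 + i] = radii[i]
            start += len(p)
        # Diagonal of B_c^0 Q B_c^0^T: 4 (s_ij^0 - c^0)^T R (s_ij^0 - c^0).
        R = sigma2 * np.array([[1.0, delta], [delta, 1.0]])
        w = 4.0 * np.einsum('ij,jk,ik->i', d, R, d)
        F = P.T @ np.diag(1.0 / w) @ P                    # P^T (B Q B^T)^{-1} P
        return np.linalg.inv(F)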
2.4 Simulation
We will test the performance of the presented estimators and compare it with the
KCR lower bound.
2.4.1 Single Circle
We randomly collect N data points S^0 = {s_1^0, s_2^0, ..., s_N^0} from an arc of the circle with center c^0 = [0, 0]^T, radius r^0 and angular range from 0 to β. Zero mean Gaussian noise is added to the true data points to generate the noisy data points S = {s_i | i = 1, 2, ..., N}. The noise power ranges over σ² = [10⁻², 10²] and the correlation coefficient δ between the x and y noise components is 0.8. In practice, the noise power σ² and the correlation coefficient δ can be computed from the received data. The estimation accuracy is evaluated by the mean square error (MSE) of the estimated parameter vector θ̂,

MSE(θ) = (1/L) Σ_{l=1}^{L} ||θ̂(l) − θ^0||²          (2.27)

where θ̂(l) is the solution of ensemble run l, θ^0 is the true value and L = 1000 ensemble runs are sufficient.
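A minimal sketch of the ensemble simulation for one noise power is shown below, reusing the kasa_fit routine sketched in Section 2.1.1. Drawing the sample angles uniformly on the arc and using a Cholesky factor to generate the correlated x-y noise are assumptions made for illustration.

    import numpy as np

    def simulate_mse(N=10, r0=20.0, beta=2 * np.pi, sigma2=10.0, delta=0.8,
                     L=1000, seed=0):
        """Monte Carlo MSE (2.27) of the Kasa fit for one noise power."""
        rng = np.random.default_rng(seed)
        theta_true = np.array([0.0, 0.0, r0])            # c^0 = [0, 0]^T, radius r0
        R = sigma2 * np.array([[1.0, delta], [delta, 1.0]])
        chol = np.linalg.cholesky(R)                     # correlated x-y noise
        err = 0.0
        for _ in range(L):
            phi = rng.uniform(0.0, beta, N)              # points on the arc [0, beta]
            s0 = np.column_stack([r0 * np.cos(phi), r0 * np.sin(phi)])
            s = s0 + rng.standard_normal((N, 2)) @ chol.T
            theta_hat = np.array(kasa_fit(s[:, 0], s[:, 1]))
            err += np.sum((theta_hat - theta_true) ** 2)
        return err / L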
Fig 2.1 shows the circle fitting results of the Kasa method and the ML method and compares them with the true circle when the number of data points is N = 10, the radius is r0 = 20, the range of the sampled arc is [0, 2π] and the noise power is σ² = 10. Because of the noisy measurement data points, the fitted circles of the Kasa method and the ML method nearly overlap but deviate from the true circle. The result shows that the estimation accuracy is heavily affected by the noise.
Fig 2.2 illustrates the MSE of the Kasa method and the ML method and compares them with the KCR lower bound when the number of data points is N = 10, the radius is r0 = 20, the range of the arc is [0, π] and the noise power changes from 10⁻² to 10². When the noise power is less than 1, both methods achieve the KCR lower bound. After the noise power reaches 10, both methods start to deviate from the bound, but the estimation accuracy is still acceptable.
Fig 2.3 changes the range of the sampled arc and keeps the other parameters the same. The MSE and the KCR lower bound are smaller when the range of the sampled arc becomes larger. This is an expected result, since we can obtain more information from the whole circle than from half of the circle.
[Figure: plot of x coordinate vs. y coordinate showing the true circle, the Kasa and ML fitted circles, the true and estimated centers and the N sample points.]
Figure 2.1: Fitted circles of the Kasa method and the ML method when N = 10, radius is r0 = 20, range of sampled arc is [0, 2π] and noise power is σ² = 10.
Figure 2.2: Comparison of the MSE of the Kasa and the ML methods with the KCR
lower bound when N = 10, r0 = 20, range of sampled arc is [0, π].
Figure 2.3: Comparison of the MSE of the Kasa and the ML methods with the KCR
lower bound when N = 10, r0 = 20, range of sampled arc is [0, 2π].
Fig 2.4 adds 10 more data points and keeps the other parameters the same as fig 2.3. Both the Kasa method and the ML method achieve lower MSE as N increases, since more data points carry more information.
In generating fig 2.5 and fig 2.6, the other parameters are kept the same (N = 20, range of arc [0, 2π], noise power σ² varying over [10⁻², 10²] in multiplicative steps of 10) while the radius is changed from 10 to 50. When the radius is too small, the MSE curves of both methods deviate more from the bound than when the radius is large. The reason is that data points distributed on a small circle are more easily corrupted by the noise than data points collected from a large circle.
Figure 2.4: Comparison of the MSE of the Kasa and the ML methods with the KCR
lower bound when N = 20, r0 = 20, range of sampled arc is [0, 2π].
Figure 2.5: Comparison of the MSE of the Kasa and the ML methods with the KCR
lower bound when N = 10, r0 = 10, range of sampled arc is [0, 2π].
Figure 2.6: Comparison of the MSE of the Kasa and the ML methods with the KCR
lower bound when N = 10, r0 = 50, range of sampled arc is [0, 2π].
Table 2.1 summarizes the cases with different parameter settings when the noise power is 10. For both methods, the mean square error and the KCR lower bound (in the log scale) become smaller when more data points are collected or when the points are collected from a longer arc. Moreover, the MSE rises slightly as the radius is enlarged, but it increases sharply when the radius is too small.
Table 2.1: Summary of the MSE of the Kasa method and the ML method with the
KCR lower bound with different selections of number of data points, radius or the
range of sampled arc when the noise power is 10.
MSE (10 log(∗)) | N = 10, r⁰ = 20, β = π | N = 10, r⁰ = 20, β = 2π | N = 20, r⁰ = 20, β = 2π | N = 10, r⁰ = 10, β = 2π | N = 10, r⁰ = 50, β = 2π
Kasa            | 12.24 | 7.60 | 5.32 | 9.49 | 7.71
ML              | 12.24 | 7.30 | 4.79 | 8.77 | 7.64
KCR             | 11.53 | 7.16 | 4.07 | 7.66 | 7.55
2.4.2  Concentric Circles

In this simulation section, we use both the Kasa method and the introduced asymptotically efficient estimator to estimate the concentric circles. Unlike the asymptotically efficient estimator, which estimates all the concentric circles simultaneously, the Kasa method fits each circle separately and averages the individual centers to obtain the concentric circles center. The number of circles is M = 3 and the numbers of data points on the circles are N₁ = 12, N₂ = 10, N₃ = 8. The noisy data points sᵢⱼ, i = 1, 2, ..., M, j = 1, 2, ..., Nᵢ, are generated by adding zero mean Gaussian noise to the true data points S⁰ = [s₁₁⁰, ..., s⁰_1N₁, s₂₁⁰, ..., s⁰_2N₂, s₃₁⁰, ..., s⁰_3N₃]. The concentric circles center is c⁰ = [0, 0]ᵀ and the radii are r⁰ = [50, 30, 20]. The noise power σ² varies from 10⁻² to 10 and the correlation coefficient δ between the x and y noise components is 0.8. The number of ensemble runs is L = 1000.
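As a reference for the Kasa baseline, a per-circle algebraic least-squares fit followed by averaging the centers can be sketched as below. This is an illustrative sketch of the standard Kasa algebraic fit, assuming the data points have already been grouped by circle; it is not the thesis implementation.

```python
import numpy as np

def kasa_fit(pts):
    """Algebraic (Kasa) circle fit: returns the center (2,) and the radius."""
    pts = np.asarray(pts, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([x, y, np.ones_like(x)])
    b = x**2 + y**2
    # Solve x^2 + y^2 = 2a*x + 2b*y + c in the least-squares sense.
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:2] / 2.0
    radius = np.sqrt(sol[2] + center @ center)
    return center, radius

def kasa_concentric(groups):
    """Fit each circle separately and average the centers (Kasa baseline)."""
    fits = [kasa_fit(g) for g in groups]
    center = np.mean([c for c, _ in fits], axis=0)
    radii = [r for _, r in fits]
    return center, radii
```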
Fig 2.7 gives an estimation instance when N₁ = 12, N₂ = 10, N₃ = 8, r₁⁰ = 50, r₂⁰ = 30, r₃⁰ = 20, the range of the sampled arc is [0, 2π] and the noise power is σ² = 10. The dots ‘•’ are the randomly selected data points with noise, ‘◦’ is the true circle center, ‘×’ is the circle center estimated by the asymptotically efficient estimator and ‘△’ is the center estimated by the Kasa method. Both estimates deviate somewhat from the true circles because of the noisy data points.
Fig 2.8 compares the two estimators with the KCR lower bound when N₁ = 12, N₂ = 10, N₃ = 8, r₁⁰ = 50, r₂⁰ = 30, r₃⁰ = 20 and the range of the sampled arc is [0, π]. The Kasa method cannot reach the KCR lower bound since it does not estimate the circles jointly as a set of concentric circles. The asymptotically efficient estimator achieves the KCR lower bound when the noise power is below 1.
Figure 2.7: Fitted circles of the asymptotically efficient (AE) method and the Kasa method when N = [12, 10, 8], r⁰ = [50, 30, 20], range of sampled arc is [0, 2π] and noise power is σ² = 10.
Figure 2.8: Comparison of the MSE of the Asymptotically Efficient method and the
Kasa method with the KCR lower bound when N = [12, 10, 8], r0 = [50, 30, 20], range
of sampled arc is [0, π] and noise power is σ 2 = [10−2 , 10].
The parameters of the simulation in fig 2.9 are the same as in fig 2.8 except that the range of the arc is changed to [0, 2π]. This change does not affect the performance of the Kasa method; however, the asymptotically efficient method obtains a smaller MSE. Table 2.2 shows the MSE of both methods and the KCR lower bound in the log scale when the noise power is 10.
Figure 2.9: Comparison of the MSE of the Asymptotically Efficient method and the
Kasa method with the KCR lower bound when N = [12, 10, 8], r0 = [50, 30, 20], range
of sampled arc is [0, 2π] and noise power is σ 2 = [10−2 , 10].
Table 2.2: Comparison of the MSE of the Kasa method and the Asymptotically
Efficient method with the KCR lower bound when the ranges of sampled arc β are
different. The noise power is σ 2 = 10.
MSE (10 log(∗))                    | N = [12, 10, 8], r⁰ = [50, 30, 20], β = π | N = [12, 10, 8], r⁰ = [50, 30, 20], β = 2π
Kasa                               | 13.34 | 13.32
Asymptotically Efficient estimator | 11.20 | 7.57
KCR                                | 9.76  | 6.36
Fig 2.10 shows the results obtained by changing the difference between the three radii from 10 to 50: the radius r₁⁰ changes from 120 to 200 with a step of 20, r₂⁰ changes from 110 to 150 with a step of 10, and r₃⁰ = 100 is fixed. The noise power is σ² = 1. The numbers of data points are N₁ = 10, N₂ = 10, N₃ = 10. The Kasa method cannot reach the KCR lower bound. When the difference between the radii is at least 20, the asymptotically efficient estimator achieves the KCR lower bound. The MSE of the two methods and the KCR lower bound for increasing radii difference are listed in Table 2.3.
Figure 2.10: Comparison of the MSE of the Asymptotically Efficient method and
the Kasa method with the KCR lower bound by changing the difference of two radii.
N = [10, 10, 10], r10 = 120 : 20 : 200,r20 = 110 : 10 : 150 and r30 = 100. The range of
sampled arc is [0, 2π] and noise power is σ 2 = 1.
Table 2.3: Comparison of the MSE of the Kasa method and the Asymptotically Efficient method with the KCR lower bound when the difference of the radii is increasing.
The number of data point is N = [10, 10, 10]. The noise power is σ 2 = 1.
MSE (10 log(∗))      | Kasa  | Asymptotically Efficient Estimator | KCR
r⁰ = [120, 110, 100] | 2.171 | −0.870 | −0.964
r⁰ = [140, 120, 100] | 2.076 | −0.964 | −0.964
r⁰ = [160, 130, 100] | 2.113 | −0.964 | −0.964
r⁰ = [180, 140, 100] | 2.107 | −0.930 | −0.964
r⁰ = [200, 150, 100] | 2.088 | −0.916 | −0.964
Fig 2.11 gives an instance of coupled concentric circles with N₁ = 10, N₂ = 10, r₁⁰ = 50, r₂⁰ = 30 and the range of sampled arc [0, π] when one point of the outer circle is assigned to the inner circle. Both methods perform poorly. This shows that data classification is an important intermediate step if we do not know which circle the data points belong to; a single assignment mistake can heavily affect the final estimation result.
The example in fig 2.12 shows the MSE of coupled concentric circles with N₁ = 10, N₂ = 10, r₁⁰ = 50, r₂⁰ = 30 and the range of sampled arc [0, π] when they are estimated as a single circle, and compares it with the MSE of the coupled circles estimation and the KCR lower bound. The MSE of the coupled circles estimated as a single circle is much larger than the MSE of the correct estimation.
Fig 2.13 and fig 2.14 show the results of estimating a single circle as coupled circles and compare them with the MSE of the asymptotically efficient estimator when N₁ = 20, r₁⁰ = 50 and the range of sampled arc is [0, π]. From fig 2.13, the MSE of the single circle center of both estimators achieves the KCR lower bound accuracy, since the coupled circles estimator estimates the single circle as coupled circles with two equal radii. The MSE of the radii of the coupled circles estimator is calculated by MSE_c = ‖[r̂₁, r̂₂] − [r₁⁰, r₁⁰]‖², while the MSE of the radii of the single circle estimator is calculated by MSE_s = ‖[r̂₁, r̂₁] − [r₁⁰, r₁⁰]‖². Fig 2.14 shows that the MSE of the radii of the coupled circles estimation cannot reach the KCR lower bound. The results of fig 2.12, fig 2.13 and fig 2.14 show that a large error is generated if the number of concentric circles is falsely estimated.
Figure 2.11: Comparison of the MSE of the Kasa method and Asymptotically Efficient
method with the KCR lower bound when one data point is wrongly assigned to
another circle. N = [10, 10], r0 = [50, 30]. The range of sampled arc is [0, π] and
noise power is σ 2 = [10−2 , 10].
Figure 2.12: Comparison of the MSE of the Kasa method and the Asymptotically
Efficient method with the KCR lower bound when the coupled circles are estimated
as a single circle. N = [10, 10], r0 = [50, 30]. The range of sampled arc is [0, π].
Figure 2.13: Comparison of the MSE of the Kasa method and the Asymptotically
Efficient method with the KCR lower bound for the estimations of the single circle
center c0 = [0, 0]T when a single circle is estimated as the coupled circles. N1 = 20,
r10 = 50. The range of sampled arc is [0, π].
Figure 2.14: Comparison of the MSE of the Kasa method and the Asymptotically
Efficient method with the KCR lower bound for the estimations of the single circle
radius r10 = 50 when a single circle is estimated as the coupled circles. N1 = 20, and
the range of sampled arc is [0, π] and noise power is σ 2 = [10−2 , 10].
2.5  Summary

This chapter mainly reviews several circle fitting methods. For single circle estimation, a non-iterative algorithm (the Kasa method) and an iterative algorithm (the ML method) are introduced. The MSE of the estimated circle parameters becomes smaller when more data points are sampled or when they are sampled from a larger arc of the circle. The iterative ML method provides better estimation performance but costs more time than the non-iterative Kasa method. An asymptotically efficient estimator is introduced for multiple concentric circles estimation. This estimator requires more computation time but gives better estimates than the Kasa method. The asymptotically efficient estimator works better when the radii difference of the circles is large, and it reduces to the single circle estimator as a special case. At the end of the simulation section, two problems are raised: if the number of circles is known but the assignment of the data points to the circles is not available, how to classify the data points and improve the classification accuracy; and if the number of circles is unknown, how to estimate it. Methods for each problem will be proposed in Chapter 3 and Chapter 4.
Chapter 3

Circles Estimate with Known Number of Concentric Circles
In this chapter, we discuss the situation in which the number of concentric circles is known but the assignment of the data points to the circles is unknown. Three methods, the Distance Division, the K-Means and the Naive Bayes classifier, are introduced to solve the data point classification problem. The simulation at the end of Chapter 2 shows that the classification accuracy of the data points heavily affects the performance of the circles estimator, so a classification method with high accuracy is necessary before circle fitting.
3.1  Distance Division Method

The Distance Division method is based on a simple idea. Taking a couple of concentric circles as an instance, the data set can be divided into three groups by the distance of the data points to the circles center: the farthest group, the median group and the nearest group. The data points of the farthest and nearest groups are considered to belong to the outer and inner circles, respectively. What remains is to decide which circle the data points of the median group belong to.
Given a set of noisy data points S = {s₁, s₂, ..., s_N}, N = N₁ + N₂ + ··· + N_M, we first assume that all the data points belong to a single circle and estimate an approximate center ĉ_s = [â_s, b̂_s]ᵀ, which provides the distances between the data points and the concentric circles center. The distance of data point s_j to the approximate single circle center ĉ_s is

d_j = ‖s_j − ĉ_s‖,  j = 1, 2, ..., N.                                   (3.1)

Sort the data set S in descending order of distance to obtain S′ = {s′₁, s′₂, ..., s′_N} with d′₁ > d′₂ > ··· > d′_N. Divide S′ into 2M − 1 equal parts, S′ = {S′₁, S′₂, ..., S′_{2M−1}}, with the leftover points added to the last subset. Estimate the approximate concentric circles parameters θ̂_c = [â_c, b̂_c, r̂_c1, ..., r̂_cM]ᵀ from the odd-indexed subsets S′_o = {S′₁, S′₃, ..., S′_{2M−3}, S′_{2M−1}}, where ĉ_c = [â_c, b̂_c]ᵀ is the approximate concentric circles center and r̂_ci, i = 1, 2, ..., M, are the radii.

For the data points of the even-indexed subsets S′_e = {S′₂, S′₄, ..., S′_{2M−4}, S′_{2M−2}}, compute the distances to the two neighboring circles,

p_{l±1,k} = | r̂_{c,l±1} − ‖s′_{l,k} − ĉ_c‖ |,  l = 2, 4, ..., 2M − 2,  k = 1, 2, ..., ⌊N/(2M − 1)⌋,      (3.2)

where l labels the data sets of S′_e, k labels the data points of the l-th subset, and p_{l±1,k} is the distance between the data point s′_{l,k} and the (l ± 1)-th circle. If p_{l+1,k} > p_{l−1,k}, the data point s′_{l,k} is assigned to S′_{l−1}; otherwise it is assigned to S′_{l+1}.

After the classification we obtain new data sets S̃ = {S̃₁, S̃₂, ..., S̃_M} that contain all the data points, and the concentric circles estimator is applied once more to obtain the final result θ̂ = [â, b̂, r̂₁, ..., r̂_M]ᵀ. The method achieves high classification accuracy when the numbers of data points of the circles are the same or nearly the same (that is, when the differences in the numbers of data points are negligible compared with the number of points per circle); otherwise the classification accuracy declines. It is also time-consuming, since the circles estimation is carried out twice during the data classification. A sketch of the procedure is given below.
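The following Python sketch outlines the Distance Division procedure just described. It is illustrative only: `fit_single` and `fit_concentric` stand for the single-circle and concentric-circles estimators of Chapter 2, the handling of leftover points by `np.array_split` is a simplification, and the absolute distance to each neighboring circle is used for the assignment.

```python
import numpy as np

def distance_division(points, M, fit_single, fit_concentric):
    """Distance Division classification (sketch).

    fit_single(points) -> center (2,)           e.g. a single-circle estimator
    fit_concentric(groups) -> (center, radii)   e.g. the concentric-circles estimator
    Returns a list of M arrays of points, one per circle."""
    points = np.asarray(points, dtype=float)
    c_s = np.asarray(fit_single(points))            # approximate single-circle center
    d = np.linalg.norm(points - c_s, axis=1)        # distances d_j of eq. (3.1)
    order = np.argsort(-d)                          # sort in descending distance
    parts = np.array_split(order, 2 * M - 1)        # 2M - 1 nearly equal parts
    odd, even = parts[0::2], parts[1::2]            # provisional circles / ambiguous groups
    c_c, radii = fit_concentric([points[idx] for idx in odd])
    groups = [list(idx) for idx in odd]
    for g, idx in enumerate(even):                  # resolve each ambiguous group
        for j in idx:
            dist = np.abs(np.asarray(radii) - np.linalg.norm(points[j] - c_c))
            groups[g if dist[g] <= dist[g + 1] else g + 1].append(j)
    return [points[np.array(grp)] for grp in groups]
```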
3.2  K-Means Method

The K-Means algorithm is one of the best known clustering methods. It is a partitioning method that requires the number of clusters to be known in advance. The K-Means algorithm groups the data by comparing a chosen distance measure, such as the Euclidean distance, the sum of absolute differences (city-block), the cosine of the angle between points, the sample correlation between points, or the percentage of differing bits (Hamming). The Euclidean distance used here is the most common choice. The data points of each group are represented by the cluster centroid, so the K-Means algorithm attempts to find the best points to serve as cluster centroids. The K-Means method is an iterative algorithm and the cluster centroids need to be initialized.

In our application, the feature used for the Euclidean comparison is the distance between a data point and the circles center. Given a set of noisy data points S = {s₁, s₂, ..., s_N}, N = N₁ + N₂ + ··· + N_M, we estimate the approximate center coordinates ĉ_s = [â_s, b̂_s]ᵀ and use equation (3.1) to calculate the distances d_j, j = 1, 2, ..., N, between the data points s_j and the approximate single circle center ĉ_s. The steps of the K-Means method for data point clustering are summarized as follows:
Step 1. Randomly initialize the cluster centroids

E = {e₁, e₂, ..., e_M}                                                  (3.3)

where M is the number of concentric circles. Since the process is repeated for 1000 ensemble runs to obtain a relatively stable result, the random selection of the initial cluster centroids is not an influential factor.

Step 2. Calculate the distances between the data points and each cluster centroid,

d_ij = ‖s_j − e_i‖,  i = 1, 2, ..., M,  j = 1, 2, ..., N,               (3.4)

where d_ij is the distance of the j-th data point s_j to the i-th cluster centroid e_i. Each data point is assigned to the centroid that minimizes this distance.

Step 3. Recalculate the centroids based on the definition of the centroid,

e_i = (1/N̂_i) Σ_{s_j ∈ S_i} s_j                                         (3.5)

where S_i is the subset of data points belonging to cluster i and N̂_i is the number of data points in this cluster.

Repeat Step 2 and Step 3 until one of the stopping criteria is satisfied (a brief sketch of the procedure is given after the list):
1. The change of the cluster centroids is negligible.
2. No data point changes its cluster.
3. The predefined maximum number of iterations is reached.
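A compact sketch of these steps on the one-dimensional distance feature is given below. This is an illustrative Python sketch with names chosen for illustration; it applies the same scheme with the distance d_j as the clustered quantity.

```python
import numpy as np

def kmeans_1d(d, M, n_replicates=5, max_iter=100, tol=1e-6, rng=None):
    """K-Means on the 1-D distance feature d (sketch, not the thesis code).

    Returns labels in {0, ..., M-1} for the best of n_replicates random starts."""
    d = np.asarray(d, dtype=float)
    rng = np.random.default_rng(rng)
    best_labels, best_cost = None, np.inf
    for _ in range(n_replicates):
        centroids = rng.choice(d, size=M, replace=False)                 # Step 1: random init
        labels = np.zeros(len(d), dtype=int)
        for _ in range(max_iter):
            labels = np.argmin(np.abs(d[:, None] - centroids[None, :]), axis=1)   # Step 2
            new_centroids = np.array([d[labels == i].mean() if np.any(labels == i)
                                      else centroids[i] for i in range(M)])       # Step 3
            if np.max(np.abs(new_centroids - centroids)) < tol:          # stopping criterion 1
                centroids = new_centroids
                break
            centroids = new_centroids
        cost = np.sum((d - centroids[labels]) ** 2)
        if cost < best_cost:
            best_cost, best_labels = cost, labels
    return best_labels
```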
3.3  Naive Bayes Classifier

The Naive Bayes algorithm, based on the Bayes theorem, is an extremely fast and simple probabilistic classifier which assumes that all the features X = {x₁, x₂, ..., x_N} are conditionally independent of each other given the class C. This assumption simplifies the representation of the probability p(X|C). The Naive Bayes classifier requires a number of parameters that is linear in the number of features for training. Although the conditional independence assumption is rarely true in real-world situations, the Naive Bayes classifier often works quite well in complex applications. Compared with iterative classification methods, the Naive Bayes classifier requires only a single pass through the data if all attributes are discrete. In the classification of the data points of circles, the attributes are S = {s₁, s₂, ..., s_N}, N = N₁ + N₂ + ··· + N_M.

According to the Bayes theorem, the posterior probability model can be written as

p(C | s₁, ..., s_N) = p(C) p(s₁, ..., s_N | C) / p(s₁, ..., s_N)          (3.6)

where p(C) is the prior probability, p(s₁, ..., s_N | C) is the likelihood component and p(s₁, ..., s_N) is the evidence. Using the chain rule and repeated application of the conditional independence assumption, p(s₁, ..., s_N | C) can be simplified as
p(s₁, ..., s_N | C) = p(s₁ | C) p(s₂, ..., s_N | C, s₁)
                    = p(s₁ | C) p(s₂ | C, s₁) ··· p(s_N | C, s₁, ..., s_{N−1})
                    = ∏_{j=1}^{N} p(s_j | C).                              (3.7)
The goal is to train a classifier model that can output the probability of each possible class of C for every new data point s_j. The probability that a data point is classified to the k-th class can be expressed as

p(C = c_k | s₁, ..., s_N) = p(C = c_k) ∏_{j=1}^{N} p(s_j | C = c_k) / [ Σ_{i=1}^{M} p(C = c_i) ∏_{j=1}^{N} p(s_j | C = c_i) ]      (3.8)
where the sum in the denominator, taken over all possible values of the class C, equals p(s₁, ..., s_N). Given a new set of data points S′ = {s_j | j = 1, 2, ..., N}, N = N₁ + N₂ + ··· + N_M, we can calculate the posterior probability from the prior probability p(c_k) and the likelihood p(s_j | c_k) estimated from the training data. The prior probability distribution p(c_k) can be set to the uniform distribution or calculated directly from the counts produced by the Mean Shift method introduced in Chapter 4. We are interested only in the most probable value of c_k, which maximizes the posterior probability
class(c₁, ..., c_M) ∝ arg max_{c_k} p(C = c_k) ∏_{j=1}^{N} p(s_j | C = c_k) / [ Σ_{i=1}^{M} p(C = c_i) ∏_{j=1}^{N} p(s_j | C = c_i) ].      (3.9)
Since the denominator of the above function does not depend on c_k, (3.9) can be simplified to

class(c₁, ..., c_M) ∝ arg max_{c_k} p(C = c_k) ∏_{j=1}^{N} p(s_j | C = c_k).      (3.10)
Although the independence assumption is unrealistic, the Naive Bayes classifier
has good performance in practice since its classification decision may often be correct
even if the estimated probability from training is inaccurate [54]. It means that the
estimated posterior probabilities of each class may be incorrect but the class with the
maximum posterior probability is often correct. The Naive Bayes classifier works well
if the features are completely independent or functionally dependent [39].
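For illustration, a Gaussian Naive Bayes model on the distance feature can be sketched as below. The Gaussian class-conditional density is an assumption made here for the sketch; in the thesis the training quantities come from a clustering step or the Chapter 4 methods, and the function names are chosen for illustration.

```python
import numpy as np

def nb_train(d, labels, M):
    """Estimate priors and Gaussian likelihood parameters per circle from
    labelled distances (sketch; the Gaussian model is an illustrative assumption)."""
    d, labels = np.asarray(d, dtype=float), np.asarray(labels)
    priors = np.array([(labels == k).mean() for k in range(M)])
    means = np.array([d[labels == k].mean() for k in range(M)])
    stds = np.array([d[labels == k].std() + 1e-9 for k in range(M)])
    return priors, means, stds

def nb_classify(d, priors, means, stds):
    """Assign each distance to the class with the maximum posterior, cf. eq. (3.10)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))[:, None]
    log_lik = -0.5 * ((d - means) / stds) ** 2 - np.log(stds)   # log Gaussian pdf (up to a constant)
    return np.argmax(np.log(priors) + log_lik, axis=1)
```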
Unlike other classification applications, which have a known training subset from which to build the classifier model, prior knowledge of the concentric circles is unavailable here. The Naive Bayes classifier therefore relies on other algorithms, such as a non-parametric data clustering method, to calculate the prior probability, and this requires a sufficiently large training subset. When the number of received data points is limited, a method that maximizes the use of the data points of the circles is necessary. The simulations below show that the Naive Bayes classifier achieves the best results when the noise level is low. We then try to improve the classification performance at large noise power by applying a cross-validation technique to the Naive Bayes classifier.

Cross validation partitions a sample of data into complementary subsets, performing the analysis on one subset, called the training set, and validating the analysis on the other subsets, called the testing sets. To reduce the variability, cross validation can be performed over several partitions and the validation results averaged. For the classification of the circle data points, we can partition the L groups of data points into several subsets and carry out the cross validation as follows:
Step 1. Partition the L groups of data into k subsets.
Step 2. Train on k − 1 subsets to calculate the prior probability, i.e. the probability of the data points belonging to each circle, and the clustering centroids, which are the approximate radii (distances of the data points to the circles center).
Step 3. Test the remaining subset with the built classification model, and calculate the probability of the data points belonging to each circle and the radii of the data points to the circles center.
Step 4. Repeat Step 2 and Step 3 for all the subsets and compute the prior probability and the radii from the classification results.
Step 5. Classify each subset once again using the averaged probability and radii of the other k − 1 subsets obtained from the classification groups of Step 4. A sketch of this scheme is given after the list.
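The following sketch shows a simplified variant of this scheme that only averages the fold-wise priors and mean radii (the Step 5 re-classification is omitted); it reuses the illustrative nb_train sketch above and is not the thesis implementation.

```python
import numpy as np

def cross_validated_priors(groups_d, groups_lab, M, k=5):
    """Average the priors and per-class mean radii learned on each training fold.

    groups_d   : array of shape (L, N) with the distances of the L received groups
    groups_lab : array of shape (L, N) with the corresponding circle labels"""
    groups_d = np.asarray(groups_d, dtype=float)
    groups_lab = np.asarray(groups_lab)
    folds_d = np.array_split(groups_d, k)        # Step 1: partition the L groups
    folds_lab = np.array_split(groups_lab, k)
    priors_list, means_list = [], []
    for i in range(k):                           # Steps 2-4: train on the other folds
        d_train = np.concatenate([f for j, f in enumerate(folds_d) if j != i]).ravel()
        lab_train = np.concatenate([f for j, f in enumerate(folds_lab) if j != i]).ravel()
        priors, means, _ = nb_train(d_train, lab_train, M)
        priors_list.append(priors)
        means_list.append(means)
    # The averaged prior probabilities and radii can then be reused for re-classification.
    return np.mean(priors_list, axis=0), np.mean(means_list, axis=0)
```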
3.4  Simulation

In this section, we test the classifiers separately to find appropriate parameters and then compare their classification performance. Unless specified otherwise, the number of concentric circles is M = 3 and N = N₁ + N₂ + N₃ is the number of data points randomly sampled from the concentric circles. The noisy data points s_j, j = 1, 2, ..., N, are generated by adding zero mean Gaussian noise to the true data points. The concentric circles center is c⁰ = [0, 0]ᵀ. The noise power ranges over σ² = [10⁻², 10²] and the correlation coefficient δ between the x and y noise components is 0.8. The number of ensemble runs is L = 1000. To ensure the methods work stably, we show results averaged over five runs.
For the K-Means method, the number of replicates is the number of times the cluster centroids are re-initialized in the clustering. More replicates generate better results until a threshold is reached. From Table 3.1 and fig 3.1, when the numbers of data points are N₁ = 10, N₂ = 10, N₃ = 10, the radii are r₁⁰ = 140, r₂⁰ = 120, r₃⁰ = 100 and the range of the sampled circle is [0, 2π], the classification accuracy improves considerably once the number of replicates reaches three. Five replicates give a small additional gain over three replicates, and the accuracy with 5 replicates already matches that with 7 replicates. When the number of replicates is one or three, the classification accuracy at σ² = 10⁻² is lower than at σ² = 1; one reason is that the initial cluster centroids are selected randomly and too few replicates are used to reach an optimized clustering result. Once the number of replicates is increased to 5 this problem no longer occurs and the K-Means method performs better at lower noise power. More than 5 replicates cost much more computation for little improvement, so we set the number of replicates of the K-Means method to 5.
Table 3.1: Classification accuracy with different replication times of the K-Means
method for fig 3.1. The number of concentric circles is M = 3, circles center is
c0 = [0, 0]T , and the range of sampled circle is [0, 2π]. The number of data points of
each circle is N = [10, 10, 10] and the radius of each circle is r0 = [140, 120, 100].
replicate times | 1      | 3      | 5      | 7
σ² = 10⁻²       | 0.8998 | 0.9468 | 0.9548 | 0.9496
σ² = 10⁻¹       | 0.8994 | 0.9430 | 0.9516 | 0.9484
σ² = 1          | 0.9142 | 0.9500 | 0.9518 | 0.9510
σ² = 10         | 0.9090 | 0.9258 | 0.9282 | 0.9284
σ² = 10²        | 0.7296 | 0.7398 | 0.7408 | 0.7374
Figure 3.1: Classification accuracy with different replication times of the K-Means
method. The number of data points of each circle is N = [10, 10, 10] and the radius
of each circle is r0 = [140, 120, 100].
Table 3.2 shows the classification accuracy of the Naive Bayes classifier with different numbers of training subsets. When the number of training subsets is reduced from 100 to 50, the classification accuracy remains steady, but the classification result becomes unstable as the training data set shrinks to 30 groups. Unlike other classification settings, knowledge of the concentric circles is unavailable here, and the classification model is trained by the Mean Shift method or the Distance Threshold method proposed in Chapter 4. Both methods require a sufficiently large training subset to compute the prior probability and the likelihood. The table indicates that the Naive Bayes classifier needs plenty of data for training. When the number of received data points is limited, cross validation is a good technique to make the most efficient use of the data.
Table 3.2: Classification accuracy with different number of training groups for the
Naive Bayes classifier for fig 3.2. The number of concentric circles is M = 3, circles
center is c0 = [0, 0]T , and the range of sampled circle is [0, 2π]. The number of
data points of each circle is N = [10, 10, 10] and the radius of each circle is r0 =
[140, 120, 100].
size of training datasets | 100    | 50     | 40     | 30     | 20
σ² = 10⁻²                 | 1      | 1      | 1      | 0.8728 | 1
σ² = 10⁻¹                 | 1      | 1      | 1      | 0.8028 | 1
σ² = 1                    | 1      | 1      | 1      | 0.9999 | 0.6587
σ² = 10                   | 0.9788 | 0.9490 | 0.9651 | 0.9895 | 0.9513
σ² = 10²                  | 0.7674 | 0.7858 | 0.8028 | 0.7721 | 0.7325
Figure 3.2: Classification accuracy with different number of training subset. The
number of data points of each circle is N = [10, 10, 10] and the radius of each circle
is r0 = [140, 120, 100].
The prior probability of the Naive Bayes classification model can be calculated from the training process or simply set to the uniform distribution. Table 3.3 and fig. 3.3 give the classification accuracy of the two models. When the number of data points is N₁ = 10, N₂ = 10, N₃ = 10, the uniform prior obtains higher classification accuracy, since the number of data points of each circle is equal. The calculated prior probability performs better when the numbers of data points differ among the circles, such as N₁ = 10, N₂ = 20, N₃ = 30. The reason is that a prior probability that better matches the actual distribution of the data points yields higher classification accuracy.
Table 3.3: Comparison of classification accuracy of the Naive Bayes classifier with the calculated prior probability and prior probability with the uniform distribution for fig 3.3. The number of concentric circles is M = 3, circles center is c0 = [0, 0]T, and the range of sampled circle is [0, 2π]. The radius of each circle is r0 = [140, 120, 100].

10log(σ²)                              | -20    | -10    | 0      | 10     | 20
N = [10, 10, 10], calculated pc        | 1      | 1      | 1      | 0.9788 | 0.7674
N = [10, 10, 10], pc = [1/3, 1/3, 1/3] | 1      | 1      | 1      | 0.9842 | 0.7804
N = [10, 20, 30], calculated pc        | 1      | 1      | 1      | 0.9707 | 0.7491
N = [10, 20, 30], pc = [1/3, 1/3, 1/3] | 0.9741 | 0.9994 | 0.9983 | 0.9306 | 0.7308
Table 3.4 and fig. 3.4 show the classification results obtained by applying cross-validation to the Naive Bayes classifier. The number of received data sets is L = 500, the numbers of data points of each circle are N₁ = 10, N₂ = 20, N₃ = 30, and the radii are r₁⁰ = 140, r₂⁰ = 120, r₃⁰ = 100. From Table 3.2 we know that the training process requires enough data points to obtain a good classification model, so we partition the data set into several subsets and make sure that each subset includes at least 100 groups of data. Table 3.4 gives the classification accuracy of each subset and the averaged accuracy. The “first testing” of “subset 1” means the classification accuracy obtained by building the classification model from subsets 2 to 5 and testing on subset 1. The “second testing” of “subset 1” means using the test results of subsets 2 to 5 from the “first testing” as the training model to classify subset 1 again. From fig. 3.4 we see that the “second testing” achieves higher classification accuracy than the “first testing” and than the normal Naive Bayes classifier when the noise power is large. This indicates that the cross-validation technique not only makes efficient use of a limited number of received data points but also improves the classification performance.
Table 3.4: The classification accuracy of the Naive Bayes classifier with the crossvalidation technique for fig 3.4. The number of concentric circles is M = 3, circles
center is c0 = [0, 0]T , and the range of sampled circle is [0, 2π]. The number of data
points of each circle is N = [10, 20, 30]. The radius of each circle is r0 = [140, 120, 100].
10log(σ²)                         | -20    | -10    | 0      | 10     | 20
first testing   subset 1          | 1      | 1      | 1      | 0.9765 | 0.7588
                subset 2          | 0.9996 | 1      | 1      | 0.9747 | 0.7727
                subset 3          | 1      | 0.9989 | 0.9990 | 0.9733 | 0.7668
                subset 4          | 1      | 1      | 1      | 0.9850 | 0.7637
                subset 5          | 1      | 0.9992 | 1      | 0.9748 | 0.7642
                averaged accuracy | 1      | 1      | 0.9998 | 0.9769 | 0.7652
second testing  subset 1          | 1      | 1      | 1      | 0.9812 | 0.7910
                subset 2          | 1      | 1      | 0.9997 | 0.9815 | 0.7905
                subset 3          | 1      | 1      | 1      | 0.9810 | 0.7908
                subset 4          | 1      | 0.9998 | 1      | 0.9812 | 0.7910
                subset 5          | 1      | 1      | 1      | 0.9812 | 0.7912
                averaged accuracy | 1      | 1      | 1      | 0.9812 | 0.7909
normal Naive Bayes classifier     | 1      | 1      | 1      | 0.9707 | 0.7491
Table 3.5 and fig 3.5 show the classification results when the number of data points of the concentric circles is changed, with each circle having the same number of data points. All the classification methods achieve higher accuracy as the number of data points increases, because all three methods depend on the distances calculated between the data points and the approximate single circle center, and more data points produce a more accurate single circle center, in agreement with the conclusion of Chapter 2.
Table 3.5: Comparison of classification accuracy of the K-Means method, the Distance
Division method and the Naive Bayes method with the equal number of data points
number of the circles for fig 3.5. The number of the concentric circles is M = 3,
circles center is c0 = [0, 0]T , and the range of sampled circle is [0, 2π]. The radius of
each circle is r0 = [140, 120, 100].
10log(σ²)                                | -20    | -10    | 0      | 10     | 20
K-Means                 N = [10, 10, 10] | 0.9490 | 0.9536 | 0.9436 | 0.9250 | 0.7392
                        N = [20, 20, 20] | 0.9986 | 0.9912 | 0.9920 | 0.9766 | 0.7718
                        N = [30, 30, 30] | 0.9974 | 0.9984 | 0.9988 | 0.9868 | 0.7868
Distance Division       N = [10, 10, 10] | 0.9814 | 0.9836 | 0.9744 | 0.9436 | 0.7548
                        N = [20, 20, 20] | 0.9988 | 0.9990 | 0.9982 | 0.9790 | 0.7772
                        N = [30, 30, 30] | 1      | 1      | 1      | 0.9868 | 0.7884
Naive Bayes classifier  N = [10, 10, 10] | 1      | 1      | 0.9960 | 0.9842 | 0.7904
                        N = [20, 20, 20] | 1      | 0.9993 | 1      | 0.9880 | 0.7922
                        N = [30, 30, 30] | 1      | 1      | 1      | 0.9890 | 0.8127
Figure 3.3: Compare the classification accuracy with calculated prior probability and
uniform prior probability. The number of data points of each circle is N = [10, 10, 10]
or N = [10, 20, 30] and the radius of each circle is r0 = [140, 120, 100].
Figure 3.4: The classification accuracy of applied the cross-validation to the Naive
Bayes classifier. The number of data points number of each circle is N = [10, 20, 30]
and the radius of each circle is r0 = [140, 120, 100].
Figure 3.5: Comparison of classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with the equal number of data
points of the circles. The radius of each circle is r0 = [140, 120, 100].
54
Fig 3.6 illustrates the estimation results of the concentric circles with the ungrouped data points by using the three classified methods and compares them with
the KCR lower bound and the MSE of circles with the true classification of data
points when N1 = 20, N2 = 20, N3 = 20, r10 = 140, r20 = 120, r30 = 100 and the
range of sampled circles [0, 2π]. The MSE of circles center and radii estimated by the
Distance Division method can reach the KCR lower bound accuracy when the noise
power is not higher than 1. The K-Means method can not achieve the KCR lower
bound except for the noise power is lower than σ 2 = 10−2 . It is reasonable since the
classification accuracy of the K-Means method is lower than the classification accuracy of the Distance Division method when each circle has equal number of data points.
The data points classified by the Naive Bayes classifier obtain the best estimation results since they get the best classification accuracy. From the wrong classification
example given at the end of the simulation section of Chapter 2, we know that the
asymptotically efficient estimator is so sensitive to the wrong classification that one
wrongly assigned point will produce large estimation error.
Table 3.6 and fig 3.7 show the classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method when the circles have unequal numbers of data points, N₁ = 20, N₂ = 10, N₃ = 10, r₁⁰ = 140, r₂⁰ = 120, r₃⁰ = 100 and the range of the sampled circles is [0, 2π]. The classification accuracy of the K-Means method is higher than that of the Distance Division method. The Distance Division method assumes that the number of data points of each circle is equal when it separates the distances between the data points and the approximate center into groups, so it performs badly when the numbers of data points of the circles are actually unequal. After increasing the difference in the numbers of data points to N₁ = 30, N₂ = 15, N₃ = 10, the classification accuracy of the Distance Division method decreases, while the K-Means method obtains higher classification accuracy because more data points are sampled. This shows that the K-Means method does not require equal numbers of data points. The Naive Bayes classifier still achieves the highest classification accuracy and is not affected by the difference in the number of data points per circle, and neither is its training process.
Table 3.6: Comparison of classification accuracy of the K-Means method, the Distance
Division method and the Naive Bayes method with different number of data points
of each circle for fig 3.7. The number of concentric circles is M = 3, circles center
is c0 = [0, 0]T , and the range of sampled circle is [0, 2π]. The radius of each circle is
r0 = [140, 120, 100].
10log(σ²)                                | -20    | -10    | 0      | 10     | 20
K-Means                 N = [20, 10, 10] | 0.9806 | 0.9786 | 0.9784 | 0.9594 | 0.7378
                        N = [30, 15, 10] | 0.9916 | 0.9886 | 0.9912 | 0.9798 | 0.7320
Distance Division       N = [20, 10, 10] | 0.8344 | 0.8330 | 0.8356 | 0.8372 | 0.6980
                        N = [30, 15, 10] | 0.7532 | 0.7534 | 0.7598 | 0.7486 | 0.6426
Naive Bayes classifier  N = [20, 10, 10] | 1      | 1      | 0.9969 | 0.9831 | 0.7836
                        N = [30, 15, 10] | 1      | 1      | 1      | 0.9780 | 0.7397
The MSE of the circles center and radii shown in fig 3.8 is based on the classification results when N₁ = 30, N₂ = 15, N₃ = 10. The MSE of the K-Means method and the Distance Division method keeps a certain gap from the KCR lower bound because the classification accuracy is not high enough. The MSE of the Naive Bayes classifier achieves the KCR lower bound when the noise power is low. The Distance Division method performs worst since it is not suited to unequal numbers of data points on the circles. If more data points can be received, the performance of the K-Means method improves and the gap between its MSE and the KCR lower bound is reduced.
Figure 3.6: Comparison of the MSE of the K-Means method, the Distance Division
method, the Naive Bayes classifier with the KCR lower bound when the number of
data points of each circle is equal. The number of data points is N = [20, 20, 20]. The
radius of each circle is r0 = [140, 120, 100].
Figure 3.7: Comparison of classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with a different number of data
points of each circle. The radius of each circle is r0 = [140, 120, 100].
Figure 3.8: Comparison of the MSE of the K-Means method, the Distance Division
method, the Naive Bayes classifier and the KCR lower bound with a different data
points number of each circle. The number of data points is N = [30, 15, 10]. The
radius of each circle is r0 = [140, 120, 100].
How does the difference between the radii of the concentric circles affect the classification result? Table 3.7 and fig 3.9 give the results. When the noise power is no larger than 1, the radii difference has no influence on the classification accuracy of any of the three methods. The difference between the radii becomes an important factor when the noise power is large: from Table 3.7, when the noise power equals 10 or 10², all the classification methods perform much better with a large spacing between circles than with a small spacing.
Table 3.7: Comparison of classification accuracy of the K-Means method, the Distance
Division method and the Naive Bayes method with different radii differences for fig
3.9. The number of concentric circles is M = 3, circles center is c0 = [0, 0]T , and
the range of sampled circle is [0, 2π]. The number of data points of each circle is
N = [10, 10, 10].
10log(σ²)                                    | -20    | -10    | 0      | 10     | 20
K-Means                 r⁰ = [120, 110, 100] | 0.9547 | 0.9527 | 0.9420 | 0.8450 | 0.5713
                        r⁰ = [140, 120, 100] | 0.9543 | 0.9557 | 0.9490 | 0.9260 | 0.7380
                        r⁰ = [180, 140, 100] | 0.9520 | 0.9490 | 0.9470 | 0.9437 | 0.8747
Distance Division       r⁰ = [120, 110, 100] | 0.9853 | 0.9807 | 0.9687 | 0.8617 | 0.5813
                        r⁰ = [140, 120, 100] | 0.9850 | 0.9827 | 0.9790 | 0.9483 | 0.7540
                        r⁰ = [180, 140, 100] | 0.9810 | 0.9820 | 0.9770 | 0.9697 | 0.8907
Naive Bayes classifier  r⁰ = [120, 110, 100] | 1      | 1      | 0.9937 | 0.8725 | 0.5847
                        r⁰ = [140, 120, 100] | 1      | 1      | 1      | 0.9742 | 0.7804
                        r⁰ = [180, 140, 100] | 1      | 1      | 1      | 0.9847 | 0.9064
Figure 3.9: Comparison of classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different radii differences.
The number of data points of each circle is N = [10, 10, 10].
As the radii of the concentric circles are enlarged, the MSE of the radii increases even though the classification accuracy remains high. To examine the influence of the radii difference on the MSE of the radii, we use the normalized MSE

MSE′(θ) = MSE(θ) / (r̂_{l,k+1} − r̂_{l,k})

shown in fig 3.10, where the denominator is the spacing between two neighboring radii. The number of data points is N₁ = 10, N₂ = 10, N₃ = 10, the radii are r⁰ = {[120, 110, 100], [140, 120, 100], ..., [200, 150, 100]} and the noise power is σ² = 10. We notice that the normalized MSE of the radii of the Distance Division method and the Naive Bayes method decreases as the radii difference grows. The MSE of the radii of the K-Means method is not monotonically decreasing since its classification accuracy is not high enough.
Figure 3.10: Comparison of the normalized MSE of the K-Means method, the Distance Division method, the Naive Bayes classifier with the KCR lower bound when
the space between radii is different. The number of data points of each circle is
N = [10, 10, 10]. The noise power is 10.
The classification accuracy for different numbers of concentric circles is shown in Table 3.8 and fig 3.11, where the number of data points and the radii difference per circle are kept the same while the number of concentric circles increases. For all three classification methods, fewer circles give higher classification accuracy. At the same noise power, the accuracy of the K-Means method and of the Naive Bayes method declines faster than that of the Distance Division method.
Table 3.8: Comparison of classification accuracy of the K-Means method, the Distance
Division method and the Naive Bayes method with different number of concentric
circles for fig 3.11 when the number of data points and radii difference of each circle
is the same. The number of concentric circles is M = 3, circles center is c0 = [0, 0]T
and range of the sampled circle is [0, 2π].
10log(σ²)                                                              | -20    | -10    | 0      | 10     | 20
K-Means                N = [10, 10], r⁰ = [140, 100]                   | 0.9933 | 0.9340 | 0.9923 | 0.9913 | 0.9380
                       N = [10, 10, 10], r⁰ = [180, 140, 100]          | 0.9520 | 0.9490 | 0.9470 | 0.9437 | 0.8747
                       N = [10, 10, 10, 10], r⁰ = [220, 180, 140, 100] | 0.9070 | 0.9033 | 0.9010 | 0.8937 | 0.8380
Distance Division      N = [10, 10], r⁰ = [140, 100]                   | 0.9983 | 0.9983 | 0.9970 | 0.9957 | 0.9430
                       N = [10, 10, 10], r⁰ = [180, 140, 100]          | 0.9810 | 0.9820 | 0.9770 | 0.9697 | 0.8907
                       N = [10, 10, 10, 10], r⁰ = [220, 180, 140, 100] | 0.9603 | 0.9607 | 0.9557 | 0.9417 | 0.8597
Naive Bayes classifier N = [10, 10], r⁰ = [140, 100]                   | 1      | 1      | 1      | 0.9917 | 0.7216
                       N = [10, 10, 10], r⁰ = [180, 140, 100]          | 1      | 1      | 1      | 0.9737 | 0.7736
                       N = [10, 10, 10, 10], r⁰ = [220, 180, 140, 100] | 0.9970 | 0.9957 | 0.9893 | 0.9437 | 0.5370
Figure 3.11: Comparison of classification accuracy of the K-Means method, the Distance Division method and the Naive Bayes method with different number of concentric circles when the number of data points and radii difference of each circle is the
same. The number of data points of each circle is 10. The difference of radii is 20.
Table 3.9 gives the classification accuracy of each individual circle when the number of data points and the radii difference of each circle are the same. As the number of concentric circles increases, the classification accuracy of each circle decreases for all the methods. For the K-Means method, the classification accuracy of the outer circle is higher than that of the inner circle, and the accuracy of the inner circle is higher than that of the median circle. The classification accuracy of the Distance Division method is nearly uniform across the circles. Fig 3.12 is drawn by selecting the classification accuracy of the circle whose radius equals 140. For a circle with the same radius, all the methods perform worse as the number of concentric circles grows. The gap between the curves of the K-Means method is larger than that between the curves of the Distance Division method, which confirms the conclusion drawn above. The classification accuracy of the Naive Bayes classifier decreases markedly as the noise power increases.
Table 3.9: Comparison of classification accuracy of each circle of the K-Means method,
the Distance Division method and the Naive Bayes method for fig 3.12 when the
number of data points and radii difference of each circle is the same. The circles
center is c0 = [0, 0]T , and range of sampled circle is [0, 2π]. The number of circles is
raising from 2 to 4.
N = [10, 10]
10log(σ²)                        | -20    | -10    | 0      | 10     | 20
K-Means                r₁⁰ = 140 | 0.9983 | 0.9988 | 0.9995 | 0.9967 | 0.9443
                       r₂⁰ = 100 | 0.9996 | 0.9996 | 0.9993 | 0.9976 | 0.9504
Distance Division      r₁⁰ = 140 | 0.9999 | 0.9999 | 0.9999 | 0.9977 | 0.9508
                       r₂⁰ = 100 | 0.9973 | 0.9999 | 0.9999 | 0.9945 | 0.9337
Naive Bayes classifier r₁⁰ = 140 | 1      | 1      | 1      | 1      | 0.8949
                       r₂⁰ = 100 | 1      | 1      | 1      | 0.9898 | 0.9636

N = [10, 10, 10]
K-Means                r₁⁰ = 180 | 0.9752 | 0.9823 | 0.9776 | 0.9781 | 0.9220
                       r₂⁰ = 140 | 0.9227 | 0.9234 | 0.9253 | 0.9216 | 0.8437
                       r₃⁰ = 100 | 0.9475 | 0.9463 | 0.9353 | 0.9287 | 0.8819
Distance Division      r₁⁰ = 180 | 0.9843 | 0.9802 | 0.9786 | 0.9735 | 0.9051
                       r₂⁰ = 140 | 0.9753 | 0.9787 | 0.9819 | 0.9680 | 0.9093
                       r₃⁰ = 100 | 0.9772 | 0.9824 | 0.9761 | 0.9607 | 0.8695
Naive Bayes classifier r₁⁰ = 180 | 1      | 1      | 0.9488 | 0.9294 | 0.9358
                       r₂⁰ = 140 | 1      | 0.9926 | 1      | 0.9924 | 0.8206
                       r₃⁰ = 100 | 1      | 1      | 1      | 1      | 0.9500

N = [10, 10, 10, 10]
K-Means                r₁⁰ = 220 | 0.9372 | 0.9330 | 0.9537 | 0.9516 | 0.8912
                       r₂⁰ = 180 | 0.8793 | 0.8694 | 0.8768 | 0.8830 | 0.8032
                       r₃⁰ = 140 | 0.8781 | 0.8809 | 0.8754 | 0.8695 | 0.7963
                       r₄⁰ = 100 | 0.9187 | 0.9123 | 0.9160 | 0.9087 | 0.8631
Distance Division      r₁⁰ = 220 | 0.9443 | 0.9496 | 0.9592 | 0.9457 | 0.8528
                       r₂⁰ = 180 | 0.9452 | 0.9493 | 0.9498 | 0.9505 | 0.8572
                       r₃⁰ = 140 | 0.9690 | 0.9624 | 0.9644 | 0.9517 | 0.8723
                       r₄⁰ = 100 | 0.9564 | 0.9633 | 0.9628 | 0.9436 | 0.8567
Naive Bayes classifier r₁⁰ = 220 | 1      | 1      | 1      | 1      | 0.9885
                       r₂⁰ = 180 | 1      | 1      | 0.9947 | 0.9854 | 0.8923
                       r₃⁰ = 140 | 1      | 1      | 0.9864 | 0.8742 | 0.8015
                       r₄⁰ = 100 | 1      | 1      | 1      | 1      | 0.8681
Figure 3.12: Classification accuracy of the circle with r0 = 140 by the K-Means
method, the Distance Division method and the Naive Bayes method. The number of
data points of each circle is 10. The difference of radii is 40.
3.5  Summary

This chapter considers the situation in which the number of concentric circles is known but the assignment of the data points to the circles is not. Three methods are proposed to classify the data points to the different circles. The Distance Division method assumes that the number of data points of each circle is equal and divides them equally into 2M − 1 groups based on the distance between the data points and an approximate single circle center; this assumption is a weakness, since the method performs badly if the numbers of data points of the circles differ. The K-Means method is a data clustering method that groups data points with similar characteristics. We use the distance of the data points to the approximate single circle center as the clustering feature and randomly initialize the cluster centroids, finding the best centroids through an iterative process. The K-Means method is not sensitive to differences in the number of data points and gives acceptable results when the difference in the number of data points per circle is large, although it performs worse than the Distance Division method when the circles have equal numbers of data points. The Naive Bayes classifier usually achieves the best classification accuracy as long as the training data set is large enough, since the training is in effect an additional pass of data clustering that improves the classification performance. To make more efficient use of the limited data points, we apply the cross-validation technique to the Naive Bayes classifier, which also improves the classification accuracy when the noise power is high. All the methods depend on calculating the distances between the points and the approximate single circle center, which introduces an unavoidable error: the random selection of data points from the concentric circles affects the approximate single circle center estimate, which in turn causes errors in the distance calculation.
Chapter 4

Circles Estimation with Unknown Number of Concentric Circles
The previous chapter discussed data classification, for which the number of concentric circles is a precondition. Two methods, the Distance Threshold and the Mean Shift, are proposed in this chapter to estimate the number of concentric circles.
4.1  Distance Threshold Method

The core of the Distance Threshold method is the calculation of a threshold that separates the data points based on the probability density distribution of the distance between the data points and the estimated approximate single circle center. The method is inspired by hypothesis testing, which is a way of testing a hypothesis about a parameter of the population: the process chooses between competing hypotheses based on the probability density distribution of the observed data, and the decision is determined by the likelihood of the observed sample if the hypothesis about the parameter is true [55].
A threshold needs to be determined among the M radii of the concentric circles: H₁ is the hypothesis that the radius belongs to the i-th circle, i = 1, 2, ..., M, and H₀ corresponds to the hypothesis that it belongs to the radii of the other circles. To maximize the detection probability p_D for a given false alarm probability p_FA, H₁ is chosen if the likelihood ratio satisfies

L(r) = p(r; H₁) / p(r; H₀) > γ                                            (4.1)

where p(r; H₀) is the probability density function of r under the hypothesis H₀, p(r; H₁) is the probability density function of r under the hypothesis H₁, and the threshold γ is calculated from

p_FA = ∫_{r: L(r) > γ} p(r; H₀) dr.                                       (4.2)

This formulation can be solved by p_FA = Φ((γ − r) / √(σ²/N)), i.e. γ = Φ⁻¹(p_FA) √(σ²/N) + r, where Φ(·) is the tail probability of the standard normal distribution and r is the radius of the single circle estimated from the N data points. Strictly speaking, the probability density p(r; H₀) is not a normal distribution but a Rician distribution; however, it can be treated as approximately normal.
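The threshold can be computed from the standard normal tail quantile as in the short sketch below. It assumes SciPy's norm for Φ⁻¹ and uses the mean distance as the estimated radius r, which is an approximation chosen here for illustration rather than the thesis's estimator.

```python
import numpy as np
from scipy.stats import norm

def distance_threshold(d, p_fa, sigma2):
    """Threshold gamma = Phi^{-1}(p_FA) * sqrt(sigma^2 / N) + r for a group of
    N distances d, with r taken here as the mean distance (sketch only)."""
    n = len(d)
    r = np.mean(d)                                  # approximate single-circle radius
    return norm.isf(p_fa) * np.sqrt(sigma2 / n) + r

# Example use: flag the points lying beyond the threshold of the current circle.
# d = np.linalg.norm(points - c_hat, axis=1)
# outer = d[d > distance_threshold(d, p_fa=0.01, sigma2=1.0)]
```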
Given a set of noisy data points S = {s₁, s₂, ..., s_N}, N = N₁ + N₂ + ··· + N_M, estimate the approximate center coordinates ĉ_s = [â_s, b̂_s]ᵀ and use equation (3.1) to calculate the distances d_j, j = 1, 2, ..., N, between the data points s_j and the approximate single circle center ĉ_s. The method is iterative. If the distance d_j is smaller than the threshold γ, the data point s_j belongs to data set S′₁; otherwise it belongs to the other data set S′₂. If the size of one data set is smaller than a preset value α, that data set is dropped. A new radius is then estimated from the data points of S′₁ to generate a new threshold γ′, and the same steps are applied to the data set S′₂. The procedure is repeated until fewer than α data points are left. At the end we obtain groups of data points S̃ = {S̃₁, S̃₂, ..., S̃_M̂}, where M̂ is the estimated number of concentric circles.

The Distance Threshold method has a high computation cost, since it requires M − 1 circle estimations. Another drawback is the choice of the stopping condition of the iteration: it is not easy to find an optimized value of α, the size below which a data group is discarded, that suits all situations. At present this method can only distinguish no more than 3 concentric circles and still needs improvement.
4.2  Mean Shift Method

The Mean Shift algorithm is a non-parametric, iterative, gradient-based clustering method for locating the modes of a probability density function. Unlike the K-Means method, the Mean Shift clustering method does not need prior knowledge of the number of clusters. It iteratively shifts each data point towards a stationary point using a weighted average of the neighboring points under the estimated probability density function. The Mean Shift method differs from other gradient-based algorithms in that it is not necessary to specify the step size explicitly as the window moves along its path: the Mean Shift step is large in low density regions and small in high density regions [29].

Given a set of data points S = {s₁, s₂, ..., s_N}, N = N₁ + N₂ + ··· + N_M, in a p-dimensional feature space (the circle fitting itself is in the 2-D plane), estimate the approximate center coordinates ĉ_s = [â_s, b̂_s]ᵀ and use equation (3.1) to calculate the distances d_j, j = 1, 2, ..., N, between the data points s_j and the approximate single circle center ĉ_s. The kernel density estimator of the parameter d_j with a symmetric kernel function K(·) can then be written as
f̂_{h,K}(d) = (1 / (N hᵖ)) Σ_{j=1}^{N} k(‖(d − d_j)/h‖²)                   (4.3)

where f̂_{h,K}(d) is the estimated space-averaged density at the point d, h is the fixed bandwidth to be initialized and k(·) is the profile of the kernel K, such that the kernel function can be expressed as

K(d) = c k(‖d‖²)                                                          (4.4)

where c is a normalization constant. The modes of f̂_{h,K}(d) are located among the zeros of the gradient

∇f̂_{h,K}(d) = (2c / (N hᵖ⁺²)) Σ_{j=1}^{N} (d − d_j) k′(‖(d − d_j)/h‖²).    (4.5)
Let g(d) = −k′(d) to yield

∇f̂_{h,K}(d) = (2c / (N hᵖ⁺²)) [ Σ_{j=1}^{N} g(‖(d − d_j)/h‖²) ] [ ( Σ_{j=1}^{N} d_j g(‖(d − d_j)/h‖²) / Σ_{j=1}^{N} g(‖(d − d_j)/h‖²) ) − d ].      (4.6)

The first factor of the above product is proportional to the PDF f_{h,K}(d) and the second factor is the Mean Shift vector

m(d) = Σ_{j=1}^{N} d_j g(‖(d − d_j)/h‖²) / Σ_{j=1}^{N} g(‖(d − d_j)/h‖²) − d.      (4.7)
According to (4.3) and (4.6), the Mean Shift vector can be obtained as

m(d) = (1/2) h² c ∇f̂_{h,K}(d) / f̂_{h,K}(d)                                (4.8)

which shows that the Mean Shift vector at the point d is proportional to the estimated density gradient, so m(d) always points in the direction of increasing gradient of f_{h,K}(d). A path leading to a stationary point of the estimated density can thus be defined. This property of the Mean Shift, namely that it walks along the direction of increasing gradient to the nearest mode, makes it an ideal data clustering method.
The performance of the Mean Shift method depends heavily on the selection of the
window size h. A small value of h easily results in too many clusters, while a large
window width may miss details and give an incorrect number of clusters. In this
simulation, we take the standard deviation of all the distances dj, j = 1, 2, ..., N as
the basis for generating the bandwidth, since it reflects the spread of the distance
distribution. The window width h will be small if the distances are densely
distributed, so that their standard deviation is small, and h will be large when the
standard deviation is large, which means the distances are sparsely distributed.
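For reference, this bandwidth rule amounts to a one-line helper; the scale parameter below is only a convenience for switching between the settings compared in the next section (0.5 for std/2, 1.0 for std, and 0.75 for the averaged rule (std + std/2)/2).

    import numpy as np

    def bandwidth_from_distances(d, scale=0.75):
        # Window width h proportional to the standard deviation of the distances d_j.
        return scale * np.std(d)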
4.3 Concentric Circles Estimation Without Knowing Number of Circles
We now come back to the question raised at the end of Chapter 2: how to perform the
estimation if the number of concentric circles is unavailable. Combining the circle
number estimation methods proposed in this chapter with the data point classification
methods proposed in Chapter 3, we can now solve this problem. Given a series of noisy
measurement data points of concentric circles, we first estimate the number of circles
through the Mean Shift method or the Distance Threshold method, and then use the
K-Means method, the Distance Division method or the Naive Bayes method to classify
the data points into groups in order to estimate the unknown parameters: the radii and
the circles center. Since the number of circles is not known, a circle number estimation
method such as the Mean Shift can be used not only to estimate the number of circles
but also as a training tool to construct the classification model for the Naive Bayes
method, i.e., to calculate the prior probability and likelihood components introduced in
Section 3.3. The estimation algorithm for concentric circles without knowledge of the
number of circles proceeds as follows.
1. Partition all the received data points into several pieces.
2. Train on all but one of the pieces of data points with the Mean Shift method to obtain a training model.
3. Test the remaining piece of data points with the Naive Bayes classifier; repeat steps 2 and 3 until all the pieces have been tested, yielding the estimated number of circles and a new training (testing) model.
4. Test every piece again with the Naive Bayes classifier to obtain the classified data points.
5. Estimate the concentric circles with the asymptotically efficient estimator.
This circle estimation method combines the Mean Shift method and the Naive Bayes
classifier through the cross-validation technique to estimate the number of circles and
classify the data points, which increases the estimation performance and improves the
classification accuracy when the noise power is large. The estimator with the
cross-validation technique is much slower than a plain estimator combining the Mean
Shift and Naive Bayes methods, since the training and testing procedures are repeated
several times.
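A simplified Python rendering of these steps is sketched below. It reuses the helper sketches given earlier in this chapter (fit_circle_kasa, bandwidth_from_distances, mean_shift_1d), uses scikit-learn's GaussianNB as a stand-in for the Naive Bayes classifier of Chapter 3, glosses over label alignment across folds, and omits the final asymptotically efficient circle fit.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def estimate_circles_cv(points, n_folds=5, h_scale=0.75):
        # Distance of every point to an approximate single-circle center (the clustering feature).
        center, _ = fit_circle_kasa(points)
        d = np.linalg.norm(points - center, axis=1)

        folds = np.array_split(np.random.permutation(len(points)), n_folds)
        m_estimates, model = [], None
        for k in range(n_folds):
            train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
            h = bandwidth_from_distances(d[train], h_scale)
            _, train_labels = mean_shift_1d(d[train], h)      # Mean Shift training model
            m_estimates.append(train_labels.max() + 1)
            model = GaussianNB().fit(d[train].reshape(-1, 1), train_labels)
            model.predict(d[folds[k]].reshape(-1, 1))         # test the held-out piece
        m_hat = int(round(np.mean(m_estimates)))              # averaged circle number estimate
        labels = model.predict(d.reshape(-1, 1))              # test every piece again
        return m_hat, labels                                   # followed by the concentric circles fit

Averaging the circle number over the folds is what makes the combined estimator more stable, at the price of repeating the training and testing procedures several times.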
4.4 Simulation
Since the standard deviation quantifies the spread of the distance distribution, we will
try the standard deviation and half of the standard deviation as the Mean Shift window
width in order to find a suitable h for clustering the data points. Then we will test the
performance of the Mean Shift method and the Distance Threshold method by
changing several parameters, such as the number of data points Ni, the radii ri^0, the
number of concentric circles M, etc. The single circle is a special case with
h = dmax − dmin, where d is the distance between the data points and the estimated
approximate single circle center. The simulation results for the question raised in
Chapter 2, estimating the concentric circles without knowledge of the number of
circles, are presented at the end of this section.

Unless specified otherwise, the N = N1 + N2 + · · · + NM data points are randomly
sampled from the whole concentric circles. The noisy data points sj, j = 1, 2, ..., N are
generated by adding zero mean Gaussian noise to the true data points. The concentric
circles center is c0 = [0, 0]^T and the range of the sampled circle is [0, 2π]. The range
of noise power is σ^2 ∈ [10^-2, 10^2] and the correlation coefficient δ between the x
and y noise components is 0.8. The number of ensemble runs is L = 1000. In order to
ensure that both methods work stably, we show the results averaged over five runs.
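A sketch of how such measurements could be generated is shown below; it follows the stated setup (center at the origin, sampling range [0, 2π], zero-mean Gaussian noise with correlated x and y components) but is not the thesis's own simulation code.

    import numpy as np

    def noisy_concentric_points(radii, counts, sigma2, delta=0.8, center=(0.0, 0.0), rng=None):
        # Sample counts[i] points on the circle of radius radii[i] and add correlated noise.
        rng = np.random.default_rng() if rng is None else rng
        cov = sigma2 * np.array([[1.0, delta], [delta, 1.0]])   # correlated x/y noise covariance
        pts = []
        for r, n in zip(radii, counts):
            theta = rng.uniform(0.0, 2.0 * np.pi, n)
            true = np.column_stack([center[0] + r * np.cos(theta),
                                    center[1] + r * np.sin(theta)])
            pts.append(true + rng.multivariate_normal([0.0, 0.0], cov, n))
        return np.vstack(pts)

    # Example: the M = 3 setting of Table 4.2 at noise power 10^-1.
    points = noisy_concentric_points([140, 120, 100], [20, 20, 20], sigma2=0.1)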
Table 4.1 gives the estimation accuracy of the number of circles when the Mean Shift
window width is equal to half of the standard deviation or to the standard deviation.
The number of concentric circles is M = 1, 2, 3, 4 and the number of data points in
each circle is Ni = 20, i = 1, 2, 3, 4. The difference between neighboring radii is 20.
From Table 4.1, we can see that the Mean Shift method with window width h = std/2
performs better than h = std when the number of circles is large, but h = std has
higher estimation accuracy of the number of circles when the number of circles is
small. Considering these two conditions, we choose the window width equal to the
average of the standard deviation and half of the standard deviation. It always
achieves an acceptable result as the number of circles changes. None of the three
window widths obtains a good result when the noise power is 10^2. Fig. 4.1 plots the
circle number estimation accuracy for the three window widths. In the following
simulations, the window size h = (std + std/2)/2 will be used.
Table 4.2 and Fig. 4.2 show the estimation accuracy of the number of circles obtained
by varying the number of data points per circle while keeping it the same for every
circle. The number of circles is M = 3, and the radii are r1^0 = 140, r2^0 = 120,
r3^0 = 100. The performance of the Mean Shift method improves as the number of
data points of each circle increases from 10 to 30. The same holds for the Distance
Threshold method, which is easy to understand since more data points provide more
information for the estimation. When the amount of data points is small, the Distance
Threshold method works better than the Mean Shift method, but the estimation
accuracy of the number of circles by the Mean Shift method improves faster as the
number of data points increases.
Figure 4.1: Appropriate window width of the Mean Shift Method. The number of
concentric circles is 1 to 4. The number of data points of each circle is 20. The
difference of radii is 20.
Table 4.1: Estimation accuracy of the number of circles with appropriate window
width of the Mean Shift Method for fig 4.1. The number of concentric circles is 1 to
4. The circles center is c0 = [0, 0]^T, and the range of sampled circle is [0, 2π]. The
number of data points of each circle is 20. The difference of radii is 20.

                                                               10log(σ^2)
  h              N                  r0                      -20      -10      0        10       20
  std/2          20                 100                     1        1        1        0.0532   0
                 [20, 20]           [120, 100]              0.6444   0.6360   0.6196   0.1264   0
                 [20, 20, 20]       [140, 120, 100]         0.7752   0.7612   0.7536   0.5112   0.0008
                 [20, 20, 20, 20]   [160, 140, 120, 100]    0.8212   0.8024   0.7908   0.6012   0.0236
  std            20                 100                     1        1        1        0.0532   0
                 [20, 20]           [120, 100]              0.9792   0.9776   0.9804   0.8868   0.1872
                 [20, 20, 20]       [140, 120, 100]         0.9540   0.9440   0.9348   0.9156   0.5128
                 [20, 20, 20, 20]   [160, 140, 120, 100]    0.3340   0.3164   0.2972   0.1220   0.0776
  (std/2+std)/2  20                 100                     1        1        1        0.0532   0
                 [20, 20]           [120, 100]              0.9164   0.9008   0.9100   0.6728   0.0112
                 [20, 20, 20]       [140, 120, 100]         0.9472   0.9364   0.9312   0.8484   0.2644
                 [20, 20, 20, 20]   [160, 140, 120, 100]    0.8924   0.8984   0.8732   0.8292   0.4672
Table 4.2: Estimation accuracy of the number of circles of the Mean Shift method and
the Distance Threshold method with the equal number of data points of each circle
for fig 4.2. The number of concentric circles is M = 3, circles center is c0 = [0, 0]^T,
and range of sampled circle is [0, 2π]. The radii are r0 = [140, 120, 100].

                                           10log(σ^2)
  Method              N               -20      -10      0        10       20
  Mean Shift          [10, 10, 10]    0.7848   0.7688   0.7672   0.6264   0.1916
                      [20, 20, 20]    0.9512   0.9396   0.9332   0.8408   0.2504
                      [30, 30, 30]    0.9868   0.9860   0.9796   0.9196   0.3116
  Distance Threshold  [10, 10, 10]    0.9136   0.8992   0.8940   0.8524   0.0616
                      [20, 20, 20]    0.8912   0.9260   0.9788   0.8860   0.0718
                      [30, 30, 30]    0.9356   0.9580   0.9764   0.8636   0.3260
Figure 4.2: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with the equal number of data points of each
circle. The radii of each circle are r0 = [140, 120, 100].
The Mean Shift clustering method is not affected by the difference in the number of
data points of each circle, as Table 4.3 and Fig. 4.3 show. After we enlarge the
difference in the number of data points by changing N1 = 20, N2 = 10, N3 = 10 to
N1 = 30, N2 = 15, N3 = 10, the Mean Shift method obtains better performance since
the total number of data points increases. On the other hand, the Distance Threshold
method performs well when N1 = 20, N2 = 10, N3 = 10, but works poorly when
N1 = 30, N2 = 15, N3 = 10. This indicates that the Distance Threshold method can
tolerate a small difference between the numbers of data points of each circle, but it
cannot handle a large imbalance.
Table 4.3: Estimation accuracy of the number of circles of the Mean Shift method and
the Distance Threshold method with the different number of data points of each circle
for fig 4.3. The number of concentric circles is M = 3, circles center is c0 = [0, 0]^T,
and range of sampled circle is [0, 2π]. The radii are r0 = [140, 120, 100].

                                           10log(σ^2)
  Method              N               -20      -10      0        10       20
  Mean Shift          [20, 10, 10]    0.9288   0.9188   0.9124   0.8276   0.2552
                      [30, 15, 10]    0.9844   0.9784   0.9760   0.8752   0.2668
  Distance Threshold  [20, 10, 10]    0.9868   0.9828   0.9684   0.9276   0.5148
                      [30, 15, 10]    0.0148   0.0112   0.0108   0.1180   0.5868
Table 4.4 and Fig. 4.4 illustrate the data clustering results when N1 = 20, N2 = 20,
N3 = 20 and the difference of radii is increased from 10 to 40. For the Mean Shift
method, a large difference of radii does not help the accuracy when the noise power is
small, but it is helpful if the noise power is larger than 10^-1. The enlarged difference
between the radii of the circles greatly improves the Distance Threshold method,
especially when the noise power is larger than 10.
Table 4.4: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with different radii for fig 4.4. The number of
concentric circles is M = 3, circles center is c0 = [0, 0]^T, and range of sampled circle
is [0, 2π]. The number of data points of each circle is N = [20, 20, 20]. The difference
of radii rises from 10 to 40.

                                              10log(σ^2)
  Method              r0                 -20      -10      0        10       20
  Mean Shift          [120, 110, 100]    0.9536   0.9428   0.9164   0.5012   0.1172
                      [140, 120, 100]    0.9548   0.9468   0.9360   0.8484   0.2476
                      [180, 140, 100]    0.9561   0.9540   0.9463   0.9260   0.6562
  Distance Threshold  [120, 110, 100]    0.8984   0.8988   0.9752   0.4568   0
                      [140, 120, 100]    0.9400   0.9364   0.9640   0.9696   0.0224
                      [180, 140, 100]    0.9788   0.9752   0.9816   0.9920   0.6236
Figure 4.3: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with the different number of data points of each
circle. The radii of each circle are r0 = [140, 120, 100].
Figure 4.4: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with the difference of radii rising from 10 to 40.
The number of data points of each circle is N = [20, 20, 20].
Table 4.5 and Fig. 4.5 present the data clustering results of both methods with
different numbers of concentric circles. Considering the Mean Shift clustering method
first, we do not regard the single circle situation since it is based on a separate window
width. The estimation accuracy of the number of circles of the Mean Shift method
rises from 2 concentric circles to 3 concentric circles and falls from 3 concentric circles
to 4 concentric circles. This is determined by the window width h. From Table 4.1 we
know that a small window width performs better when the number of circles is large
and a large window width is more suitable when the number of circles is small, so the
appropriate window width needs to be selected according to the situation. The
estimation accuracy of the Distance Threshold method decreases markedly as the
number of circles grows. The Distance Threshold method operates better than the
Mean Shift method if the number of circles is small.
Table 4.5: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with different numbers of concentric circles for fig
4.5. The circles center is c0 = [0, 0]^T, and the range of sampled circle is [0, 2π]. The
number of circles rises from 1 to 4.

                                                                    10log(σ^2)
  Method              N                   r0                      -20      -10      0        10       20
  Mean Shift          20                  100                     1        1        1        0.0532   0
                      [20, 20]            [140, 100]              0.9164   0.9004   0.9008   0.6592   0.0084
                      [20, 20, 20]        [180, 140, 100]         0.9416   0.9492   0.9416   0.8512   0.2472
                      [20, 20, 20, 20]    [220, 180, 140, 100]    0.9064   0.8780   0.8752   0.8208   0.4696
  Distance Threshold  20                  100                     1        1        1        1        1
                      [20, 20]            [140, 100]              0.9996   0.9996   0.9936   0.8868   0.5488
                      [20, 20, 20]        [180, 140, 100]         0.8844   0.9172   0.9772   0.9016   0.7140
                      [20, 20, 20, 20]    [220, 180, 140, 100]    0.1536   0.2220   0.3028   0.9080   0.6716
Figure 4.5: Estimation accuracy of the number of circles of the Mean Shift method
and the Distance Threshold method with different numbers of concentric circles.
The number of circles rises from 1 to 4.
Setting M = 2, 3, 4, Ni = 20, i = 1, 2, ..., M, with the inner circle radius equal to 100
and the difference between the radii equal to 40, we examine the classification
accuracy of both data clustering methods, shown in Table 4.6. Different from the
simulations above, which only consider the circle number estimation, this test focuses
on which circle each data point is assigned to. Because this experiment is specific to
each data point, the classification accuracy is naturally lower than the estimation
accuracy of the number of circles. Compared with the three data classification
methods introduced in Chapter 3, the K-Means method, the Distance Division method
and the Naive Bayes classifier, the classification accuracy of the Mean Shift method
and the Distance Threshold method is lower since the number of concentric circles is
not known.
Table 4.6: Comparison of the classification accuracy of each circle of the Mean Shift
method and the Distance Threshold method for fig 4.6 when the number of data points
and the radii difference of each circle are the same. The circles center is c0 = [0, 0]^T,
and the range of sampled circle is [0, 2π]. The number of circles rises from 2 to 4.

                                     10log(σ^2)
  N = [10, 10]                    -20      -10      0        10       20
  Mean Shift          r1^0 = 140  0.9020   0.9320   0.8960   0.8737   0.2430
                      r2^0 = 100  0.9020   0.9320   0.8960   0.8739   0.2440
  Distance Threshold  r1^0 = 140  0.9260   0.9300   0.9260   0.9016   0.1666
                      r2^0 = 100  0.8400   0.8284   0.8157   0.8339   0.0915

  N = [10, 10, 10]
  Mean Shift          r1^0 = 180  0.9375   0.9406   0.9385   0.9158   0.6282
                      r2^0 = 140  0.9272   0.9244   0.9248   0.8931   0.5559
                      r3^0 = 100  0.9338   0.9393   0.9322   0.9096   0.6152
  Distance Threshold  r1^0 = 180  0.9507   0.9433   0.9220   0.9300   0.6066
                      r2^0 = 140  0.8380   0.8552   0.8642   0.8820   0.3580
                      r3^0 = 100  0.8225   0.8227   0.8424   0.8277   0.3149

  N = [10, 10, 10, 10]
  Mean Shift          r1^0 = 220  0.8756   0.8781   0.8738   0.8398   0.6515
                      r2^0 = 180  0.8301   0.8271   0.8290   0.7920   0.5580
                      r3^0 = 140  0.8450   0.8421   0.8325   0.7947   0.5360
                      r4^0 = 100  0.8683   0.8684   0.8782   0.8382   0.6441
  Distance Threshold  r1^0 = 220  0.9040   0.8900   0.8960   0.8660   0.7031
                      r2^0 = 180  0.1006   0.1143   0.1560   0.2615   0.4839
                      r3^0 = 140  0.1243   0.1466   0.1916   0.3014   0.3633
                      r4^0 = 100  0.2089   0.2516   0.4145   0.5825   0.4456
Now we test the results of the estimator combining the Mean Shift method and the
Naive Bayes classifier through the cross-validation technique, in terms of the MSE of
the estimated parameters: the circles center and radii. The numbers of data points are
N1 = 20, N2 = 20, N3 = 20, the radii are r1^0 = 140, r2^0 = 120, r3^0 = 100 and the
range of the sampled circle is [0, 2π]. The range of noise power is σ^2 ∈ [10^-2, 10^2].
From Fig. 4.6 we can see that the MSE curve of the proposed method is very close to
the MSE curve of the truly classified data points and to the KCR lower bound. It can
reach the KCR lower bound when the noise power is lower than 10^-2.
Figure 4.6: Comparison of the MSE of the Meanshift-Naivebayes estimate method and
the truly classified data estimation with the KCR lower bound when N = [20, 20, 20],
r0 = [140, 120, 100].
Then we change the number of data points of each circle to N1 = 10, N2 = 20,
N3 = 30 to see how the proposed estimator performs, as shown in Fig. 4.7. The radii
of the concentric circles are r1^0 = 140, r2^0 = 120, r3^0 = 100. The MSE of the
circles center and radii of the proposed method cannot reach the KCR lower bound
unless the noise power is lower than 10^-2.
We then decrease the difference of the circle radii to r1^0 = 120, r2^0 = 110,
r3^0 = 100 and keep the number of data points of each circle at N1 = 20, N2 = 20,
N3 = 20; the estimation results are given in Fig. 4.8. Similar to the results above, the
MSE of the circles center and radii of the proposed estimator reaches the KCR lower
bound if the noise power is smaller than 10^-2.
Figure 4.7: Comparison of the MSE of the Meanshift-Naivebayes estimate method and
the truly classified data estimation with the KCR lower bound when N = [10, 20, 30],
r0 = [140, 120, 100].
Figure 4.8: Comparison of the MSE of the Meanshift-Naivebayes estimate method and
the truly classified data estimation with the KCR lower bound when N = [20, 20, 20],
r0 = [120, 110, 100].
Table 4.7: Summary of MSE of the circles center and radii, estimation accuracy of
circles number and classification accuracy of data points with the Meanshift-Naivebayes
estimator and truly classified data points estimation, when the number of data points
and radii are different.

                                                  10log(σ^2)
  N = [20, 20, 20], r0 = [140, 120, 100]      -20       -10       0        10       20
  MSE of Meanshift-Naivebayes                 -27.513   -14.446   -2.551   5.885    18.367
  MSE of True classification                  -27.521   -18.020   -8.212   2.220    12.710
  KCR                                         -27.606   -18.055   -8.212   1.413    11.492
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         1         1        0.980    0.798

  N = [10, 20, 30], r0 = [140, 120, 100]
  MSE of Meanshift-Naivebayes                 -26.885   -2.451    -0.166   6.504    16.428
  MSE of True classification                  -27.653   -16.944   -6.959   2.708    13.093
  KCR                                         -28.040   -17.091   -6.928   2.222    11.906
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         0.998     0.999    0.982    0.797

  N = [20, 20, 20], r0 = [120, 110, 100]
  MSE of Meanshift-Naivebayes                 -28.023   -13.585   -4.461   6.096    21.808
  MSE of True classification                  -28.925   -18.275   -8.273   1.410    13.339
  KCR                                         -28.943   -18.283   -8.345   1.347    11.997
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         0.999     0.992    0.897    0.538
Table 4.7 summarizes the MSE, the classification accuracy and the estimation
accuracy when the number of data points and the radii are different, for Figs. 4.6, 4.7
and 4.8. The proposed estimator performs better with an equal number of data points
in each circle than with different numbers, because the situation with equal numbers of
data points has higher classification accuracy. Although the concentric circles with a
large difference of radii have high classification accuracy of the data points, the MSE
of the circles center and radii of the proposed method with a larger radii difference is
not always smaller, since the MSE of the radii is affected by the length of the radii.
The estimation accuracy of the circle number reaches 100% for the proposed circles
estimator, since the cross-validation technique repeats the Mean Shift clustering
method and averages the results.
Figs. 4.9, 4.10, 4.6 and 4.11 show the MSE of the circles center and radii of the
proposed estimation method and compare it with the MSE of the truly classified data
points and the KCR lower bound when the number of concentric circles rises from 1 to
4. The number of data points of each circle is 20, the radius of the inner circle is 100
and the difference between neighboring radii is 20. As the number of circles increases,
the performance of the Meanshift-Naivebayes estimation method becomes worse since
the classification accuracy of the data points decreases. Although most of the data
points are correctly classified, a few wrong assignments can strongly affect the
averaged MSE. Table 4.8 summarizes the MSE of the circles center and radii of the
Meanshift-Naivebayes circle estimation method, of the truly classified data points and
of the KCR lower bound, together with the estimation accuracy of the circle number
and the classification accuracy of the data points of the Meanshift-Naivebayes method,
when the number of circles changes from 1 to 4 and the range of noise power is 10^-2
to 10^2. The estimation accuracy of the number of circles of the Meanshift-Naivebayes
method is able to reach 100% for the different numbers of circles.
Table 4.8: Summary of MSE of the circles center and radii, estimation accuracy of the
number of circles and classification accuracy of points with the Meanshift-Naivebayes
estimator and truly classified data points estimation, when the numbers of concentric
circles are different.

                                                  10log(σ^2)
  N = [20], r0 = [100]                        -20       -10       0        10       20
  MSE of Meanshift-Naivebayes                 -26.215   -16.682   -6.140   3.221    13.779
  MSE of True classification                  -26.215   -16.682   -6.140   3.221    13.779
  KCR                                         -26.933   -17.118   -7.147   1.676    12.910
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         1         1        1        1

  N = [20, 20], r0 = [120, 100]
  MSE of Meanshift-Naivebayes                 -27.888   -17.886   -7.730   2.513    18.990
  MSE of True classification                  -27.888   -17.875   -8.062   1.844    12.621
  KCR                                         -27.985   -17.960   -8.562   1.616    11.914
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         1         1        0.992    0.737

  N = [20, 20, 20], r0 = [140, 120, 100]
  MSE of Meanshift-Naivebayes                 -27.513   -14.446   -2.551   5.885    18.367
  MSE of True classification                  -27.521   -18.020   -8.212   2.220    12.710
  KCR                                         -27.606   -18.055   -8.212   1.413    11.492
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           1         1         1        0.980    0.798

  N = [20, 20, 20, 20], r0 = [160, 140, 120, 100]
  MSE of Meanshift-Naivebayes                 -5.052    -2.711    5.929    9.819    26.879
  MSE of True classification                  -27.927   -17.455   -7.470   2.375    13.288
  KCR                                         -28.191   -17.871   -7.481   2.102    12.376
  accuracy of the number of circles           1         1         1        1        1
  classification accuracy of points           0.999     0.995     0.991    0.962    0.562
Figure 4.9: Comparison of the MSE of the Meanshift-Naivebayes estimation method
and the truly classified data estimation with the KCR lower bound when N = [20],
r0 = [100].
Figure 4.10: Comparison of the MSE of the Meanshift-Naivebayes estimation method
and the truly classified data estimation with the KCR lower bound when N = [20, 20],
r0 = [120, 100].
Figure 4.11: Comparison of the MSE of the Meanshift-Naivebayes estimation method
and the truly classified data estimation with the KCR lower bound when N =
[20, 20, 20, 20], r0 = [160, 140, 120, 100].
4.5 Summary
This chapter introduces two methods to estimate the number of concentric circles and
classify the data points to the circles. The Distance Threshold method is similar to a
hypothesis testing technique in which a test statistic is computed to make a decision
regarding the desired hypothesis; a threshold is calculated from the false alarm
probability to cluster the circle data points. The Mean Shift method does not depend
on prior knowledge of the number of concentric circles but clusters the data points by
their distances to an initial estimate of the single circle center. The Distance Threshold
method performs better than the Mean Shift method when the number of circles is
small and the difference in the number of data points of each circle is not large. The
Mean Shift method is not affected by the difference in the number of data points of
each circle and still works well if the number of circles is large. The estimation
accuracy of the number of circles using the Mean Shift method is determined by the
window width, and an appropriate window width produces a good clustering result.
The data assignment performance of the Mean Shift method is worse than that of the
K-Means method, since the Mean Shift method does not use the information about
the number of circles. Both the Mean Shift method and the Distance Threshold
method are iterative.

The Mean Shift method and the Distance Threshold method can be used as training
procedures to estimate the prior probability and the likelihood for the Naive Bayes
classifier. A new concentric circles estimator is proposed that combines the Mean Shift
method and the Naive Bayes classifier together with the cross-validation algorithm. It
can raise the estimation accuracy of the number of circles to 100% and also increase
the classification accuracy, which helps to improve the performance of the circle
estimation.
Chapter 5
Conclusion and Future Work
In this chapter, we summarize the research work presented in this thesis. Possible
future research directions that may improve the performance of this work are also
presented.
5.1 Conclusion of Research
Circles fitting is an important nonlinear estimation problem in the digital signal
processing research field. The main objective is to estimate the unknown circle
parameters from a number of noisy data points on the circles. The KCR lower bound,
which applies to any unbiased estimator, is used as a performance reference to evaluate
the MSE of the estimated parameters such as the circles center and radii.
The non-iterative Kasa estimator and the iterative ML estimator are introduced in
Chapter 2. The Kasa method is easy to implement and costs less than the iterative
method, but it may not always be a good choice since it solves the normal equations,
which can lead to numerical instability. The ML estimator, realized by the
Taylor-series approach, performs better than the Kasa method when the noise power is
large. However, the performance of the ML method depends on the initialization,
which may lead to a local optimum instead of the global optimum solution. In the
simulations, both estimators are compared with the same parameters and measurement
data points with white Gaussian noise. The ML method achieves higher estimation
accuracy than the Kasa method when the data points are sampled from the whole
circle, which provides more analyzable information. The estimation accuracy of the
Kasa method is close to that of the ML method if the data points are sampled from a
smaller arc. Both estimators can reach the KCR lower bound when the noise power is
low, and the ML method performs better than the Kasa method as the noise power
becomes large.
Single circle estimation is the basis of circles fitting, and concentric circles estimation
is also an important and popular problem in many applications. An asymptotically
efficient estimator for concentric circles is introduced in Chapter 2; it is based on two
digital signal processing techniques, the weighted equation error formulation and the
nonlinear parameter transformation [18]. The weighting matrix depends on the
unknown values, and acceptable estimation results can be achieved after one repetition
of the calculation with the weighting matrix updated. The asymptotically efficient
estimator can be applied to multiple concentric circles and reduces to single circle
fitting. We compare the asymptotically efficient estimator with the Kasa method and
evaluate the estimation accuracy of the parameters, the circles center and radii,
against the KCR lower bound. The asymptotically efficient estimator requires more
computation time but gives better results than the Kasa method. It can reach the
KCR lower bound when the noise power is small or the radii difference is large. At the
end of Chapter 2, two problems of concentric circles estimation are put forward: data
point classification with a known number of concentric circles and with an unknown
number of concentric circles. Several solutions have been proposed for each problem in
Chapter 3 and Chapter 4.
The Distance Division method classifies the data points by comparing the estimated
radii with the distances between the data points and the circles center. Since the
circles center is unavailable, we assume that all the data points are from a single circle
in order to estimate the circle center. The Distance Division method performs the
circle estimation twice to classify the data points belonging to different circles. The
K-Means method is a widely used data clustering method that is efficient and easy to
implement. For the data point classification, the known number of circles is the
clustering number k. K-Means is an iterative algorithm and the cluster centroids are
randomly selected or initialized. An appropriate number of repetitions can be chosen
to balance the clustering accuracy and the calculation time. The Naive Bayes classifier
is an efficient and inductive learning algorithm for data classification. The
classification model is constructed from the prior probability and the likelihood
component calculated in the training process. The Distance Division method performs
better than the K-Means method when each circle has an equal or nearly equal
number of data points. The Naive Bayes classifier usually obtains the best and most
stable performance as long as the training data set is large enough. The classification
accuracy of the K-Means method improves as the number of data points becomes
large and is not influenced by the difference in the number of data points in each
circle. For all three classification methods, the more concentric circles there are, the
lower the classification accuracy.
We must estimate the number of concentric circles before concentric circles fitting if it
is not known. Two methods are proposed in Chapter 4 to estimate the number of
concentric circles. The Distance Threshold method repeatedly computes a threshold to
cluster the data points and thereby estimate the number of circles. The Mean Shift is
a gradient-based clustering method that clusters the data points without knowledge of
the number of clusters, and it can be used to estimate the number of concentric
circles. It is an iterative clustering method and the window width needs to be
initialized, which heavily affects the clustering performance. The Distance Threshold
method achieves higher estimation accuracy of the number of circles when there is a
limited number of data points and the difference in the number of data points of each
circle is not very large. The clustering accuracy of the Mean Shift method rises
markedly if the number of data points is large, and it is not sensitive to differences in
the number of data points across the concentric circles. The Mean Shift method can
also be applied to data classification, but its performance is worse than that of the
K-Means method. However, the Mean Shift method does not need to know the
number of concentric circles, while the K-Means method does. Although the
classification accuracy of the Mean Shift method and the Distance Threshold method
is not high enough, they can be used to train the Naive Bayes classifier, providing the
prior probability and likelihood components. This reduces the computation when the
number of concentric circles is unavailable. All five data clustering methods actually
cluster the distances of the data points to an estimated circle center, which is based on
the assumption that all the data points belong to a single circle. This assumption
introduces an unavoidable estimation error that leads to inaccuracy in the distance
computation and consequently affects the clustering results.
After comparing all the data classification methods and circle number estimation
methods, we propose a new concentric circles estimator which can fit the concentric
circles when the number of circles is not known. This estimator combines the Mean
Shift method and the Naive Bayes method and applies the cross-validation algorithm
to improve performance. It can improve the accuracy of the circle number estimation
to nearly 100% and also increase the accuracy of the data classification when the noise
power is large. Compared with an estimation method that simply uses the Mean Shift
method to estimate the circle number and then classifies the data points with the
Naive Bayes classifier, the newly proposed estimator has better performance but
requires much more calculation time, since the cross-validation algorithm uses the
Mean Shift method and the Naive Bayes classifier repeatedly and averages the results,
which improves the stability of the estimator.
5.2 Future Work
From Chapter 3 we find that the concentric circles estimator we used is highly
sensitive to the data classification: the MSE of the circle parameters estimated from
the classified data points is higher than that from the truly classified data points, even
though the classification accuracy is roughly 99%. Since it may be difficult to raise the
classification accuracy further, finding a concentric circles estimator that is less heavily
affected by data point classification errors may be possible.
The Distance Division method for data point classification simply separates the data
points into several groups with an equal number of data points in each circle. This
solution has a serious problem when the number of data points of each circle is
different. Separating the data points through a statistical, probabilistic approach
instead of an equal split may improve the classification performance for different
numbers of data points in each circle.
The cluster centroids of the K-Means method are randomly selected, which may result
in a local optimum, and we use repeated runs to find an optimized clustering. If the
number of concentric circles is unavailable and the Mean Shift method is used to
estimate the number of circles, the cluster centroids of the Mean Shift method may be
used as the initial cluster centroids for the K-Means method to improve the
classification accuracy and reduce the computation time.
The data clustering accuracy of the Mean Shift method is determined by the window
width selection, and a fixed rule based on the standard deviation cannot adapt to all
situations. Our hope is to find an adaptive window width that achieves more accurate
data clustering as the number of concentric circles changes.
BIBLIOGRAPHY
[1] W. Li, J. Zhong, T.A. Gulliver and B. Rong, “Fitting Noisy Data to A Circle: A
Simple Iterative Maximum Likelihood Approach”, IEEE Int. Conf. Communications
(ICC), pp. 1-5, Jun. 2011.
[2] J. Illingworth and J. Kittler, “A Survey of the Hough Transform”, Computer
Vision, Graphics, and Image Processing, vol. 44, pp. 87-116, Oct. 1988.
[3] Y. Qiao and S.H. Ong, “Connectivity-based Multiple-circle Fitting”, Pattern
Recognition, vol. 37, pp. 755-765, Apr. 2004.
[4] P. Chung, S. Yu, C.M. Lyu and J. Liu, “An Iris Segmentation Scheme Using
Delogne-Kasa Circle Fitting Based on Orientation Matching Transform”, IEEE Int.
Symposium Computer, Consumer and Control, pp. 127-130, 2014.
[5] M. Baum, V. Klumpp and U.D. Hanebeck, “A Novel Bayesian Method for Fitting
a Circle to Noisy Points” , IEEE Int. Conf. Information Fusion (FUSION), pp. 1-6,
Jul. 2010.
[6] X. Huang, T. Sasaki, H. Hashimoto and F. Inoue, “Circle Detection and Fitting
Based Positioning System Using Laser Range Finder”, IEEE/SICE Int. Symposium.
System Integration (SII), pp. 442-447, Dec. 2010.
[7] X. Huang, T. Sasaki, H. Hashimoto and F. Inoue, “Circle Detection and Fitting Using Laser Range Finder for Positioning System”, IEEE Int. Conf. Control
Automation and Systems (ICCAS), pp. 1366-1370, Oct. 2010.
[8] G. Vandersteen, J. Schoukens, Y. Rolain and A. Verschueren, “A Circle Fitting
Procedure Using Semi-Parametric Modeling: Toward An Improved Sliding Load Calibration Procedure”, IEEE Int. Conf. Instrumentation and Measurement Technology,
vol. 2, pp. 1254-1258, Jun. 1996.
[9] I. Kasa, “A Circle Fitting Procedure and Its Error Analysis”, IEEE Trans. Instrumentation and Measurement, vol. IM-25, pp. 8-14, Mar. 1976.
[10] Z. Ma, K.C. Ho and L. Yang, “Solutions and Comparison of Maximum Likelihood
and Full-Least-Squares Estimations for Circle Fitting”, IEEE Int. Conf. Acoustics,
Speech and Signal Processing, pp. 3257-3260, Apr. 2009.
[11] E.E. Zelniker and I.V.L. Clarkson, “Maximum Likelihood Estimation of Circle
Parameters Via Convolution”, IEEE Trans. Image Processing, vol. 15, pp. 865-876,
Apr. 2006.
[12] E.E. Zelniker, B.C. Appleton and I.V.L. Clarkson, “Optimal Circle Fitting Via
Branch and Bound”, IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol.
4, pp. 709-712, 2005.
[13] D. Umbach and K.N. Jones, “A Few Methods for Fitting Circles to Data”, IEEE
Trans. Instrumentation and Measurement, vol. 52, pp. 1881-1885, Dec. 2003.
[14] P. O'Leary, M. Harker and P. Zsombor-Murray, “Direct and Least Square Fitting
of Coupled Geometric Objects for Metric Vision”, Vision, Image and Signal Processing, vol. 152, pp. 687-694, Dec. 2005.
[15] J. Marot and S. Bourennane, “Subspace-Based and DIRECT Algorithms for
Distorted Circular Contour Estimation”, IEEE Trans. Image Processing, vol. 16, pp.
2369-2378, Sept. 2007.
[16] X. Chen, L. Lu and Y. Gao, “A New Concentric Circle Detection Method Based
on Hough Transform”, IEEE Int. Conf. Computer Science & Education (ICCSE),
pp. 753-758, Jul. 2012.
[17] I.A. Al-Subaihi, “Fitting Two Concentric Circles and Spheres to Data by Orthogonal Distance Regression”, International Mathematical Forum, no. 21, pp. 1021-1032,
2009.
[18] Z. Ma and K.C. Ho, “Asymptotically Efficient Estimators for the Fittings of
Coupled Circles and Ellipses”, Digital Signal Processing, vol. 25, pp. 28-40, Feb.
2014.
[19] T. Wang, L. Wang, Z. Xie and R. Yang, “Data Compression Algorithm Based on
Hierarchical Cluster Model for Sensor Networks”, IEEE Int. Conf. Future Generation
Communication and Networking, vol. 2, pp. 319-323, Dec. 2008.
[20] J.M. Jolion, P. Meer and S. Bataouche, “Robust Clustering with Applications in
Computer Vision”, Trans. Pattern Analysis and Machine Intelligence, vol. 13, pp.
791-802, Aug. 1991.
[21] Y. Cheng, S. Huang, T. Lv and G. Liu, “A New Data Clustering Algorithm”,
IEEE Int. Conf. Internet Computing for Science and Engineering, pp. 106-111, 2010.
[22] J. Agrawal, S. Soni, S. Sharma and S. Agrawal, “Modification of Density Based
Spatial Clustering Algorithm for Large Data base Using Naive Bayes Theorem”, IEEE
Int. Conf. Communication Systems and Network Technologies (CSNT), pp. 419-423,
Apr. 2014.
[23] R.V. Singh and M.P.S. Bhatia, “Data Clustering with Modified K-Means Algorithm”, IEEE Int. Conf. Recent Trends in Information Technology (ICRTIT), pp.
717-721, Jun. 2011.
[24] P.S. Bradley, U. Fayyad and C. Reina, “Scaling Clustering Algorithm to Large
Databases”, American Association for Artificial Intelligence, pp. 1-7, 1998.
[25] I.S. Dhillon, Y. Guan, B. Kulis, “Kernel K-Means, Spectral Clustering and Normalized Cuts”, Int. Conf. Knowledge Discovery and Data Mining, pp. 551-556, Aug.
2004.
[26] A. Chaturvedi, P.E. Green and J.D. Caroll, “K-Modes Clustering”, Journal of
Classification, vol. 18, pp. 35-55, Jun. 2011.
[27] K. Fukunaga and L. Hostetler, “The Estimation of The Gradient of Density Function, with Applications in Pattern Recognition”, IEEE Trans. Information Theory,
vol. 21, pp. 32-40, Jan. 1975.
[28] Y. Cheng, “Mean Shift, Mode Seeking, and Clustering”, IEEE Trans. Pattern
Analysis and Machine Intelligence, vol. 17, pp. 790-799, Aug. 1995.
[29] D. Comaniciu and P. Meer, “Mean Shift: A Robust Approach Toward Feature
Space Analysis”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24,
pp. 603-619, May 2002.
[30] J. Wang, B. Thiesson, Y. Xu and M. Cohen, “Image and Video Segmentation by
Anisotropic Kernel Mean Shift”, European Conference on Computer Vision (ECCV),
pp. 238-249, May 2004.
[31] D. Comaniciu, V. Ramesh and P. Meer, “Kernel-Based Object Tracking”, IEEE
Trans. Pattern Analysis and Machine Intelligence, vol. 25, pp. 564-577, May 2003.
[32] B. Georgescu, I. Shimshoni and P. Meer, “Mean Shift Based Clustering in High
Dimensions: A Texture Classification Example”, IEEE Int. Conf. Computer Vision,
vol. 1, pp. 456-463, Oct. 2003.
[33] C. Yang, R. Duraiswami, N.A. Gumerov and L. Davis, “Improved Fast Gauss
Transform and Efficient Kernel Density Estimation”, IEEE Int. Conf. Computer
Vision, vol. 1, pp. 664-671, Oct. 2003.
[34] H. Guo, P. Guo and H. Lu, “A Fast Mean Shift Procedure with New Iteration
Strategy and Re-sampling”, IEEE Int. Conf. Systems, Man and Cybernetics, vol. 3,
pp. 2385-2389, Oct. 2006.
[35] S. Jain and S. Mishra, “ANN Approach Based on Back Propagation Network
and Probabilistic Neural Network to Classify Brain Cancer”, International Journal
of Innovative Technology and Exploring Engineering (IJITEE), vol. 3, pp. 101-105,
Aug. 2013.
[36] K. Chen and L. Liu, “Privacy Preserving Data Classification with Rotation Perturbation”, IEEE Int. Conf. Data Mining, Nov. 2005.
[37] N. Pochet, F.D. Smet, J.A.K. Suykens and B. Moor, “Systematic Benchmarking
of Microarray Data Classification: Assessing the Role of Non-linearity and Dimensionality Reduction”, Bioinformatics, vol. 20, pp. 3185-3195, Jul. 2004.
[38] L. Deng and D. Yu, “Deep Convex Net: A Scalable Architecture for Speech
Pattern Classification”, Interspeech, pp. 2285-2288, Aug. 2011.
[39] I. Rish, “An Empirical Study of the Naive Bayes Classifier”, IJCAI workshop
Empirical Methods in AI, pp. 41-46, 2001.
[40] M. Mehta, R. Agrawal and J. Rissanen, “SLIQ: A Fast Scalable Classifier for
Data Mining”, Advances in Database Technology, pp. 18-32, Jun. 2005.
[41] R. Kohavi, “Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree
Hybrid”, Int. Conf. Knowledge Discovery and Data Mining, pp. 202-207, 1996.
[42] J.A. Benediktsson, P.H. Swain and O.K. Ersoy, “Neural Network Approaches
Versus Statistical Methods in Classification of Multisource Remote Sensing Data”,
IEEE Trans. Geoscience and Remote Sensing, vol. 28, pp. 540-552, Jul. 1990.
[43] T.S. Furey, N. Cristianini, N. Duffy, D.W. Bednarski, M. Schummer and D.
Haussler, “Support Vector Machine Classification and Validation of Cancer Tissue
Samples Using Microarray Expression Data”, Bioinformatics, vol. 16, pp. 906-914,
May 2000.
[44] G. Camps-Valls, L. Gomez-Chova, J. Calpe-Maravilla and J.D. Martin-Guerrero,
“Robust Support Vector Method for Hyperspectral Data Classification and Knowledge Discovery”, IEEE Trans Geoscience and Remote Sensing, vol. 42, pp. 1530-1542,
Jul. 2004.
[45] G. Guo, H. Wang, D. Bell, Y. Bi and K. Greer, “KNN Model-Based Approach
in Classification”, On the Move to Meaningful Internet Systems, pp. 986-996, 2003.
[46] D. Heckerman, “Bayesian Networks for Data Mining”, Data Mining and Knowledge Discovery, vol. 1, pp. 79-119, 1997.
[47] P.S. Broos, K.V. Getman, M.S. Povich, L.K. Townsley, E.D. Feigelson and G.P.
Garmire, “A Naive Bayes Source Classifier for X-ray Sources”, Astrophysical Journal
Supplement Series, Feb. 2011.
[48] D. Soria, J.M. Garibaldi, F. Ambrogi, E.M. Biganzoli and I.O. Ellis, “A Nonparametric Version of the Naive Bayes Classifier”, Knowledge Based Systems, vol.
24, pp. 775-784, Aug. 2011.
[49] D.M. Farid, N. Harbi and M.Z. Rahman, “Combining Naive Bayes and Decision
Tree for Adaptive Intrusion Detection”, International Journal of Network Security &
Its Applications, 2010.
[50] J. Zhang, C. Chen, Y. Xiang and W. Zhou, “Internet Traffic Classification by
Aggregating Correlated Naive Bayes Predictions”, IEEE Trans. Information Forensics
and Security, vol. 8, pp. 5-15, Oct. 2012.
[51] G.H. Golub and C. Reinsch, “Singular Value Decomposition and Least Squares
Solutions”, Numerische Mathematik, vol. 14, pp. 403-420, 1970.
[52] N. Chernov, “Circular and Linear Regression: Fitting Circles and Lines by Least
Squares”, Boca Raton, 2011.
[53] N. Chernov and C. Lesort, “Statistical Efficiency of Curve Fitting Algorithms”,
Computational Statistics & Data Analysis, vol. 47, pp. 713-728, Nov. 2004.
[54] P. Domingos, M. Pazzani, “Beyond independence: Conditions for the Optimality
of the Simple Bayesian Classifier”, Int. Conf. Machine Learning, 1996.
[55] F.J. Gravetter and L.B. Wallnau, “Statistics for the Behavioral Sciences”, 9th
edition, Casebound, 2013.