Comparison of Supervised Learning Algorithms for
RF-Based Breast Cancer Detection
Sudarshan Nayak and Dipanjan Gope
Indian Institute of Science, Bangalore, 560055, India
[email protected]
Abstract— Three-dimensional imaging based on radio frequency that exploits the contrast in dielectric properties of
tissues may be used as a low-cost, non-invasive, and non-ionizing methodology for breast cancer detection. This paper
demonstrates the use of various supervised machine learning
algorithms in classification of breast tissues into less-dense fatty
and dense fibroglandular or malignant classes from the measured
scattered electric field data obtained through antennas placed
around the breast tissue. A comparison of the performance of
these algorithms is also presented. Such a classification step may
be followed by a quantitative non-linear optimization scheme to
obtain a more precise reconstruction of the tissue profile.
Fig. 1. RF imaging antennas placed in a hemisphere around the breast profile.
I. INTRODUCTION
Breast cancer is one of the leading causes of death in
women across the globe. Early detection is key to reducing the
mortality rate [1]. Among existing techniques, mammography
relies on ionizing radiation and MRI is expensive; neither is therefore
conducive to regular monitoring. A three-dimensional (3-D) imaging scheme that is based on radio frequency (RF) and
exploits the contrast in the dielectric properties of benign and
malignant tissues may provide a low-cost, non-invasive, non-ionizing alternative [2–4].
In an RF-based imaging scheme, several antennas are placed
in a hemisphere around the breast tissue. While one of these
antennas is excited, the scattered electric field is measured at
all antenna locations. This scattered field depends
on the distribution of tissue types in the breast. While
the permittivity of malignant (M) tissue differs markedly from that of benign fatty (B-F) tissue, the permittivities of benign fibroglandular (B-FG) and
malignant tissue are nearly the same. In this work, tissue is therefore
classified into a less-dense fatty class and a dense fibroglandular or
malignant class using different supervised learning
algorithms, and the performance of these
algorithms is compared.
II. RF-BASED TISSUE CLASSIFICATION
Common supervised learning algorithms are applied to the
measured data obtained from an array of antennas placed
around the breast tissue.
A. Hardware Setup
An array of antennas is placed around the breast profile
as shown in Fig. 1. Each antenna is excited in turn, and
the scattered electric field is measured at all antenna
locations. For $N_a$ antennas, $N_a^2$ data points are
obtained. In this work, 20 antennas are used, each acting as
both transmitter and receiver; hence, 400 scattered fields
are measured. An in-house forward electromagnetic solver is
employed to generate the scattered field results.
B. Breast Model
Breast phantoms are discretized into 3-D grids of voxels,
which represent the smallest unit of discretization. The voxels
are identical in shape and size, resulting in a uniform
grid. The dielectric values are assigned according to [5] and
are listed in Table I for an operating frequency of 2 GHz.
TABLE I
DIELECTRIC PROPERTIES OF BREAST TISSUE
Type                    Relative Permittivity   Conductivity (S/m)
Benign Adipose          5 ± 1                   0.1 ± 0.02
Benign Fibroglandular   46 ± 5                  1.56 ± 0.1
Malignant               53 ± 1                  1.78 ± 0.1
Skin                    39                      1.2
Medium                  10                      0.2
C. Dataset Generation
Breast phantoms of size 50×50×1 voxels are generated with a voxel
dimension of 4 mm in each direction. Each phantom is generated
by placing a random seed point for FG/M tissue in the
breast phantom and then selecting a random shape and size for that tissue region.
A total of 5,250 breast phantoms are generated. Simulated
scattered electric fields are obtained for each phantom using
the forward solver. The data are partitioned into 4,000 training
cases and 1,250 test cases.
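The generation procedure above can be sketched roughly as follows. This is a minimal illustration and not the authors' generator: the elliptical inclusion, the radius range, the 0/1 label encoding, and the names `make_phantom` and `make_dataset` are all assumptions. Only the 50×50 grid and the 4,000/1,250 split ratio come from the text, and the forward solver that turns each phantom into scattered fields is not modeled here.

```python
import random

GRID = 50  # phantom is 50 x 50 x 1 voxels (from the text)

def make_phantom(rng, grid=GRID):
    """Return a grid x grid label map: 0 = fatty, 1 = FG/M (assumed encoding)."""
    # Random seed point for the dense (FG/M) region.
    cx, cy = rng.randrange(grid), rng.randrange(grid)
    # Random size and shape; an axis-aligned ellipse is an assumption.
    rx, ry = rng.randint(2, 8), rng.randint(2, 8)
    labels = [[0] * grid for _ in range(grid)]
    for y in range(grid):
        for x in range(grid):
            if ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0:
                labels[y][x] = 1
    return labels

def make_dataset(n_total, n_train, seed=0):
    """Generate n_total phantoms and split them into training and test sets."""
    rng = random.Random(seed)
    phantoms = [make_phantom(rng) for _ in range(n_total)]
    return phantoms[:n_train], phantoms[n_train:]

# Scaled down tenfold from 5,250 / 4,000 so the pure-Python sketch runs quickly.
train, test = make_dataset(n_total=525, n_train=400)
```

Fixing the seed makes the split reproducible. A real pipeline would also pass each label map through the dielectric assignment of Table I and the forward solver before training.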
D. Supervised Learning Algorithms for Classification
The performance of a supervised learning algorithm is generally
problem dependent [6]. Naïve Bayes, decision tree, neural
network (NN), and support vector machine (SVM) are some
of the supervised learning algorithms that can be used to
classify the data. These algorithms are applied to the
scattered field datasets and their performance is compared.
Naïve Bayes is a simple classification algorithm that learns
probability distributions from the training data and then applies
Bayes' rule to classify the test data. A decision tree
is built from the entropy of the training data and
classifies the test data accordingly. NN and SVM seek
an optimal decision boundary from the training data
and classify the test data into the corresponding classes
relative to the learned boundary.
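As an illustration of the simplest of these classifiers, the following is a minimal sketch of naïve Bayes. The text does not say which variant is used, so Gaussian likelihoods are an assumption here, as are the toy feature values, the 0/1 label encoding, and the function names. In the setting described above, the features would be derived from the measured scattered fields.

```python
import math

def fit_gaussian_nb(X, y):
    """Learn class priors and per-feature Gaussian parameters from training data."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor on the variance avoids division by zero.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(zip(*rows), means)]
        model[c] = (n / len(y), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the largest log-posterior for feature vector x."""
    best, best_score = None, -math.inf
    for c, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy features standing in for scattered-field samples (hypothetical values).
X_train = [[0.1, 0.2], [0.2, 0.1], [0.0, 0.3], [0.9, 1.1], [1.0, 0.8], [1.2, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]  # 0 = fatty, 1 = FG/M (assumed encoding)
model = fit_gaussian_nb(X_train, y_train)
pred_fatty = predict_gaussian_nb(model, [0.15, 0.2])
pred_dense = predict_gaussian_nb(model, [1.0, 1.0])
```

Working in log-probabilities keeps the product of many per-feature likelihoods from underflowing, which matters once the feature vector has hundreds of entries.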
E. Accuracy Prediction
The metrics used to assess prediction accuracy are
$$\theta = \frac{P_T}{P_T + N_F}$$
and
$$\mu = \frac{N_T}{N_T + P_F},$$
where $P_T$ (true positives) is the number of voxels whose
actual dielectric value is in the B-FG/M tissue range and that are correctly
classified; $N_F$ (false negatives) is the number of voxels whose
actual dielectric value is in the B-FG/M tissue range but that
are incorrectly classified; $N_T$ (true negatives) is the number
of voxels whose actual dielectric value is in the B-F tissue
range and that are correctly classified; and $P_F$ (false positives) is the
number of voxels whose actual dielectric value is in the B-F
tissue range but that are incorrectly classified. Results over the
1,250 test cases are shown in Table II. The trained classifiers
are then applied to a realistic test case obtained from [5], as
demonstrated in Fig. 2.
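The two metrics, commonly called sensitivity (θ) and specificity (µ), can be computed from per-voxel labels as in the sketch below. The 0/1 label encoding, the function name `theta_mu`, and the example label lists are assumptions made for illustration; they are not data from the paper.

```python
def theta_mu(actual, predicted):
    """Compute theta = PT / (PT + NF) and mu = NT / (NT + PF) over voxels.

    Labels: 1 = B-FG/M (positive), 0 = B-F (negative); this encoding is assumed.
    """
    pairs = list(zip(actual, predicted))
    pt = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    nf = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    nt = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    pf = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    theta = pt / (pt + nf) if (pt + nf) else float("nan")
    mu = nt / (nt + pf) if (nt + pf) else float("nan")
    return theta, mu

# Hypothetical labels for ten voxels.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
theta, mu = theta_mu(actual, predicted)
```

Here three of the four dense voxels are recovered, so θ = 3/4, and five of the six fatty voxels are recovered, so µ = 5/6. Reporting both values matters because fatty voxels typically outnumber dense ones, so a single overall accuracy figure could hide poor detection of the dense class.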
TABLE II
COMPARISON OF SUPERVISED LEARNING ALGORITHMS
Algorithm       Accuracy in Prediction     Training Time
SVM             θ = 0.972, µ = 0.987       92.26 min with 12 parallel processes
NN              θ = 0.931, µ = 0.989       90.61 min with 12 parallel processes
Decision Tree   θ = 0.928, µ = 0.972       18.41 min with 12 parallel processes
Naïve Bayes     θ = 0.894, µ = 0.925       191.46 min with 12 parallel processes
Fig. 2. The real component of the complex dielectric profile of the numerical breast
phantom: (a) actual test case, (b) naïve Bayes output, (c) decision tree output,
(d) NN output, and (e) SVM output.
III. CONCLUSIONS
A comparison of supervised learning algorithms for classifying breast
tissue into fatty benign or dense fibroglandular/malignant classes is
presented. Although there is a tradeoff between accuracy and
training time across the algorithms, SVM appears to offer the best
overall performance for this application. Such a classification
scheme can be used in conjunction with a non-linear optimization framework to obtain a quantitative tissue profile.
REFERENCES
[1] The American Cancer Society medical and editorial content
team, Breast Cancer Survival Rates [Online]. Available:
http://www.cancer.org/Cancer/BreastCancer/DetailedGuide/breastcancersurvival-by-stage
[2] M. Pastorino, Microwave Imaging. John Wiley & Sons, 2010.
[3] N. K. Nikolova, “Microwave imaging for breast cancer,” IEEE Microw.
Mag., vol. 12, no. 7, pp. 78–94, Dec. 2011.
[4] S. C. Hagness, E. C. Fear, and A. Massa, “Guest editorial: special cluster
on microwave medical imaging,” IEEE Antennas Wireless Propag. Lett.,
vol. 11, pp. 1592–1597, 2012.
[5] M. Lazebnik, D. Popovic, L. McCartney, C. B. Watkins, M. J. Lindstrom,
J. Harter, S. Sewall, T. Ogilvie, A. Magliocco, T. M. Breslin, W. Temple,
D. Mew, J. H. Booske, M. Okoniewski, and S. C. Hagness, “A large-scale
study of the ultrawideband microwave dielectric properties of normal,
benign, and malignant breast tissues obtained from cancer surgeries,”
Phys. Med. Biol., vol. 52, no. 20, pp. 6093–6115, Oct. 2007.
[6] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of
supervised learning algorithms,” International Conference on Machine
Learning, Pittsburgh, Pennsylvania, USA, June 2006.