Comparison of Supervised Learning Algorithms for RF-Based Breast Cancer Detection

Sudarshan Nayak and Dipanjan Gope
Indian Institute of Science, Bangalore, 560055, India
[email protected]

Abstract— Three-dimensional imaging based on radio frequency, which exploits the contrast in the dielectric properties of tissues, may be used as a low-cost, non-invasive, and nonionizing methodology for breast cancer detection. This paper demonstrates the use of various supervised machine learning algorithms to classify breast tissue into a less-dense fatty class and a dense fibroglandular or malignant class from the measured scattered electric field data obtained through antennas placed around the breast tissue. A comparison of the performance of these algorithms is also presented. Such a classification step may be followed by a quantitative non-linear optimization scheme to obtain a more precise reconstruction of the tissue profile.

Fig. 1. RF imaging antennas placed in a hemisphere around the breast profile.

I. INTRODUCTION

Breast cancer is one of the leading causes of death in women across the globe. Early detection is key to reducing the mortality rate [1]. Among existing techniques, mammography is based on ionizing radiation and MRI is expensive; neither is therefore conducive to regular monitoring. A three-dimensional (3D) imaging scheme that is based on radio frequency (RF) and exploits the contrast in the dielectric properties of benign and malignant tissues may provide a low-cost, non-invasive, nonionizing alternative [2–4]. In an RF-based imaging scheme, several antennas are placed in a hemisphere around the breast tissue. While one of these antennas is excited, the scattered electric field is measured at all antenna locations. This scattered field depends on the distribution of the different tissue types in the breast.
While the permittivity values of malignant (M) and benign fatty (B-F) tissue differ considerably, those of benign fibroglandular (B-FG) and malignant tissue are nearly the same. In this work, classification of tissue into a less-dense fatty class and a dense fibroglandular or malignant class is proposed using different supervised learning algorithms, and the performance of these algorithms is compared.

II. RF-BASED TISSUE CLASSIFICATION

Common supervised learning algorithms are applied to the measured data obtained from an array of antennas placed around the breast tissue.

A. Hardware Setup

An array of antennas is placed around the breast profile as shown in Fig. 1. Each antenna is excited in turn, and the scattered electric field is measured at all antenna locations. For $N_a$ antennas, $N_a^2$ data points are obtained. In this work, 20 antennas are used, each working as both transmitter and receiver; hence, 400 scattered fields are measured. An in-house forward electromagnetic solver is employed to generate the scattered field results.

B. Breast Model

Breast phantoms are discretized into 3-D grids of voxels, which represent the smallest unit of discretization. The voxels are identical in shape and size, resulting in a uniform grid. The dielectric values are assigned according to [5] and are listed in Table I for an operating frequency of 2 GHz.

TABLE I
DIELECTRIC PROPERTIES OF BREAST TISSUE

Type                     Relative Permittivity   Conductivity (S/m)
Benign Adipose           5 ± 1                   0.1 ± 0.02
Benign Fibroglandular    46 ± 5                  1.56 ± 0.1
Malignant                53 ± 1                  1.78 ± 0.1
Skin                     39                      1.2
Medium                   10                      0.2

C. Dataset Generation

Breast phantoms of size 50×50×1 are generated with a voxel dimension of 4 mm in each direction. Datasets are generated by introducing a random seed point for FG/M tissue into the breast phantom, followed by a random shape and size selection. A total of 5,250 breast phantoms are generated.
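The phantom generation step described above can be illustrated with a minimal sketch. It is not the authors' generator: the elliptical inclusion shape, the radius range, the uniform sampling of permittivity within the Table I ranges, and all names (`make_phantom`, `draw`, `min_radius`, `max_radius`) are assumptions made for illustration. Only the fibroglandular range is used for the dense inclusion here; a malignant inclusion would use its own Table I range in the same way.

```python
import random

NY, NX = 50, 50            # 50×50×1 phantom: a single layer of voxels
EPS_FATTY = (5.0, 1.0)     # benign adipose permittivity: mean, spread (Table I)
EPS_DENSE = (46.0, 5.0)    # benign fibroglandular permittivity: mean, spread (Table I)


def draw(rng, mean_spread):
    """Draw a value uniformly from mean ± spread (the distribution is an assumption)."""
    mean, spread = mean_spread
    return rng.uniform(mean - spread, mean + spread)


def make_phantom(rng, min_radius=2, max_radius=8):
    """Return (permittivity grid, label grid) for one hypothetical phantom.

    Label 1 marks dense (FG/M) voxels, label 0 marks fatty voxels.
    """
    # Random seed point for the dense inclusion.
    cy, cx = rng.randrange(NY), rng.randrange(NX)
    # Random shape and size: an axis-aligned ellipse is assumed here.
    ry = rng.randint(min_radius, max_radius)
    rx = rng.randint(min_radius, max_radius)
    eps, labels = [], []
    for y in range(NY):
        eps_row, label_row = [], []
        for x in range(NX):
            inside = ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2 <= 1.0
            eps_row.append(draw(rng, EPS_DENSE if inside else EPS_FATTY))
            label_row.append(1 if inside else 0)
        eps.append(eps_row)
        labels.append(label_row)
    return eps, labels


# Usage: one reproducible phantom and its number of dense voxels.
rng = random.Random(0)
eps, labels = make_phantom(rng)
dense_voxels = sum(map(sum, labels))
```

The label grid plays the role of the per-voxel ground truth against which a classifier's output would be scored, while the permittivity grid is what a forward solver would take as input.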
Simulated scattered electric fields are obtained for each phantom using the forward solver. The data is partitioned into 4,000 training datasets and 1,250 test datasets.

D. Supervised Learning Algorithms for Classification

The performance of supervised learning algorithms is generally problem dependent [6]. Naïve Bayes, decision tree, neural network (NN), and support vector machine (SVM) are some of the supervised learning algorithms that can be used for classification of the data. These algorithms are applied to the scattered field datasets and their performance is compared. Naïve Bayes is a simple classification algorithm that uses a Bayesian method to classify the test data after learning probability distributions from the training data. A decision tree builds a tree from the entropy of the training set data and classifies the test data accordingly. NN and SVM seek the optimal decision boundary from the training set data and classify the test set data into the corresponding classes with respect to the obtained boundary.

E. Accuracy Prediction

The metrics used for accuracy prediction are

$$\theta = \frac{P_T}{P_T + N_F}$$

and

$$\mu = \frac{N_T}{N_T + P_F},$$

where $P_T$ (true positives) is the number of voxels whose actual dielectric value is in the B-FG/M tissue range and that are correctly classified; $N_F$ (false negatives) is the number of voxels whose actual dielectric value is in the B-FG/M tissue range but that are incorrectly classified; $N_T$ (true negatives) is the number of voxels whose actual dielectric value is in the B-F tissue range and that are correctly classified; and $P_F$ (false positives) is the number of voxels whose actual dielectric value is in the B-F tissue range but that are incorrectly classified. The results obtained over the 1,250 test cases are shown in Table II. The trained classifiers are then applied to a realistic test case obtained from [5], as shown in Fig. 2.

TABLE II
COMPARISON OF SUPERVISED LEARNING ALGORITHMS

Algorithm       Accuracy in Prediction     Training Timing
SVM             θ = 0.972, µ = 0.987       92.26 min with 12 parallel processes
NN              θ = 0.931, µ = 0.989       90.61 min with 12 parallel processes
Decision Tree   θ = 0.928, µ = 0.972       18.41 min with 12 parallel processes
Naïve Bayes     θ = 0.894, µ = 0.925       191.46 min with 12 parallel processes

Fig. 2. The real component of the complex dielectric profile of the numerical breast phantom: (a) actual test case, (b) naïve Bayes output, (c) decision tree output, (d) NN output, and (e) SVM output.

III. CONCLUSIONS

A comparison of supervised learning algorithms for classifying breast tissue into benign fatty or dense fibroglandular/malignant classes is presented. Although there is a tradeoff between accuracy and training time across the algorithms, the overall performance of SVM appears to be the best for this application. Such a classification scheme can be used in conjunction with a non-linear optimization framework to obtain a quantitative tissue profile.

REFERENCES

[1] The American Cancer Society medical and editorial content team, Breast Cancer Survival Rates [Online]. Available: http://www.cancer.org/Cancer/BreastCancer/DetailedGuide/breastcancersurvival-by-stage
[2] M. Pastorino, Microwave Imaging. John Wiley & Sons, 2010.
[3] N. K. Nikolova, “Microwave imaging for breast cancer,” IEEE Microw. Mag., vol. 12, no. 7, pp. 78–94, Dec. 2011.
[4] S. C. Hagness, E. C. Fear, and A. Massa, “Guest editorial: special cluster on microwave medical imaging,” IEEE Antennas Wireless Propag. Lett., vol. 11, pp. 1592–1597, 2012.
[5] M. Lazebnik, D. Popovic, L. McCartney, C. B. Watkins, M. J. Lindstrom, J. Harter, S. Sewall, T. Ogilvie, A. Magliocco, T. M. Breslin, W. Temple, D. Mew, J. H. Booske, M. Okoniewski, and S. C. Hagness, “A large-scale study of the ultrawideband microwave dielectric properties of normal, benign, and malignant breast tissues obtained from cancer surgeries,” Phys. Med. Biol., vol. 52, no. 20, pp. 6093–6115, Oct. 2007.
[6] R. Caruana and A. Niculescu-Mizil, “An empirical comparison of supervised learning algorithms,” International Conference on Machine Learning, Pittsburgh, Pennsylvania, USA, June 2006.
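Supplementary note to Section II-E: the metrics θ and µ defined there correspond to what are commonly called sensitivity and specificity. The sketch below shows one way they could be computed from per-voxel labels. The function name `theta_mu` and the 0/1 label encoding (1 for the B-FG/M class, 0 for the B-F class) are assumptions for illustration and do not come from the paper.

```python
def theta_mu(actual, predicted):
    """Return (theta, mu) from flat sequences of 0/1 voxel labels.

    1 = dense (B-FG/M) class, 0 = fatty (B-F) class (assumed encoding).
    A metric is NaN when its denominator is zero.
    """
    pt = nf = nt = pf = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            pt += 1          # true positive: dense voxel classified as dense
        elif a == 1:
            nf += 1          # false negative: dense voxel classified as fatty
        elif p == 0:
            nt += 1          # true negative: fatty voxel classified as fatty
        else:
            pf += 1          # false positive: fatty voxel classified as dense
    theta = pt / (pt + nf) if (pt + nf) else float("nan")
    mu = nt / (nt + pf) if (nt + pf) else float("nan")
    return theta, mu


# Usage on a toy example with 3 dense and 5 fatty voxels.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 0, 0, 0, 1]
theta, mu = theta_mu(actual, predicted)   # theta = 2/3, mu = 4/5
```

Reporting the two values separately, as Table II does, matters when dense voxels are a small fraction of each phantom, since a single overall accuracy figure would then be dominated by the fatty class.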