NEURAL AND DECISION THEORETIC APPROACHES FOR THE AUTOMATED SEGMENTATION OF RADIODENSE TISSUE IN DIGITIZED MAMMOGRAMS R. Eckert1, J.T. Neyhart1, L. Burd1, R. Polikar1, S.A. Mandayam1, andM. Tseng2 Department of Electrical and Computer Engineering Rowan University, Glassboro, NJ 08028, USA 2 Fox Chase Cancer Center, Philadelphia, PA 19111, USA ABSTRACT. Mammography is the best method available as a non-invasive technique for the early detection of breast cancer. The radiographic appearance of the female breast consists of radiolucent (dark) regions due to fat and radiodense (light) regions due to connective and epithelial tissue. The amount of radiodense tissue can be used as a marker for predicting breast cancer risk. Previously, we have shown that the use of statistical models is a reliable technique for segmenting radiodense tissue. This paper presents improvements in the model that allow for further development of an automated system for segmentation of radiodense tissue. The segmentation algorithm employs a two-step process. In the first step, segmentation of tissue and non-tissue regions of a digitized X-ray mammogram image are identified using a radial basis function neural network. The second step uses a constrained NeymanPearson algorithm, developed especially for this research work, to determine the amount of radiodense tissue. Results obtained using the algorithm have been validated by comparing with estimates provided by a radiologist employing previously established methods. INTRODUCTION The usefulness of mammography as a non-invasive, low-cost technique for the early detection of breast cancer in women has been established. The procedure of diagnosing mammography X-rays has normally been performed by a trained radiologist. Breast cancer is currently the second leading cause of cancer related deaths among American women [1]. There is a strong desire for a consistently accurate technique for the early detection of breast cancer using the analysis of risk factors. The radiographic appearance of the female breast consists of radiolucent (dark) regions due to fat and radiodense (light) regions due to connective and epithelial tissue. Studies show that the amount of radiodense tissue can be used as a marker for predicting breast cancer risk. For example, women with radiodense tissue in more than 60-75% of the breast have been shown to be at a four to six times greater risk of developing breast cancer than those with lesser densities [2]. The estimation of radiodense tissue has traditionally been a subjective determination by trained radiologists, with very few published work describing quantitative measures [3-6]. This paper presents improvements of our previous algorithm that employs objective measures for quantifying dense tissue in digitized mammograms. The proposed technique has been validated using a set of ten images with percentage of radiodense tissue estimated by a trained radiologist using previously established methods. This research project, conducted CP657, Review of Quantitative Nondestructive Evaluation Vol. 22, ed. by D. O. Thompson and D. E. Chimenti © 2003 American Institute of Physics 0-7354-0117-9/03/S20.00 1735 at Rowan University is intended to support an investigation conducted at Fox Chase Cancer Center (FCCC), examining the correlation between dietary patterns and breast density. Typically, a trained radiologist, upon review of the mammogram, will give a succinct description of the overall breast composition. This description is usually given in the form of the ACR-BIRAD system [7]. Yaffe, Boyd et al have developed a method for determining the percentage of dense tissue in the female breast, often referred to as the "Toronto" method. This method builds on the work of Wolfe by separating mammograms into six categories. These categories are 0%, 0 to 10%, 10 to 25%, 25 to 50%, 50 to 75% and 75% to 100%, representing the percentage of radiodense tissue [8]. They offer a more quantitative approach to classification attempts than those offered by Wolfe. A trained radiologist performs classification of a mammogram into one of the six categories. Although the system allows for a quantitative measure, the classification is still based on the radiologist's qualitative assessment of the mammogram under inspection. The intent of our algorithm is to mimic the expert system (the radiologist) with greater accuracy and to quantify the percentage of radiodense tissue in the mammogram using objective methods and measures. In addition, estimation results from our algorithm are compared with the previously established Toronto method. This paper is organized as follows; we first give an introduction to the problem. This is followed by our research goals and objectives. The approach outlining the technique we developed is presented. In the results section some of the typical results are presented from exercising our algorithm on the digitized mammogram images. Finally, a summary of the work is provided. RESEARCH OBJECTIVES The research objectives are to develop a completely automated algorithm that: (a) Automatically scans digitized mammogram images to locate the breast tissue region in the X-ray. (b) Segments the tissue into radiodense and radiolucent indications. (c) Quantifies the amount and percentage of radiodense tissue. APPROACH The overall approach of this research is shown in Fig. 1. The approach taken addresses two major issues. The first issue is the actual segmentation of the arbitrarily shaped breast tissue region from within the rectangular shaped X-ray film. This is an edge-detection problem that is accomplished using a dynamically generated segmentation mask. After the breast tissue is segmented, radiodense tissue indications within the breast region can be identified and quantified. The difficulty here arises from the gray-level intensities varying among X-rays and locally across the same X-ray. For example, if two mammograms are being analyzed that are both low in percentage of radiodense tissue, the same threshold cannot be applied to both mammograms in order to distinguish between radiodense and radiolucent tissue. One mammogram may be overall brighter in gray-level intensity requiring a larger gray-level value for the threshold while at the same time, the other low radiodense mammogram may be overall darker in gray-level intensity requiring a smaller gray-level value for the threshold. This is a threshold estimation problem. This threshold is obtained by processing each of these eight discrete steps shown in Fig. 1. 1736 ____Digitized Mammogram____ * ____Image Pre-processing____ [_____Mask Generation______| r= T == I_____Tissue Segmentation_____| * I Threshold Determination___| * I_____Density Estimation_____| * ____Image Post-processing____ * I Radiodense Tissue Percentage | FIGURE 1: Overall approach. Segmentation Mask Generation After the mammogram is scanned into the database, the mammogram is still not ready to be directly analyzed for radiodensity percentage. There is a need to remove unwanted noise within the film region of the mammogram. The objective of the segmentation mask is to segment the tissue region from the unwanted film region. This will allow the algorithm to properly quantify the amount and percentage of radiodense tissue present in the breast. The normal mask template is a binary matrix of equal size to that of the original mammogram image. The segmentation mask will consist of binary values of '1's (white) to the corresponding tissue region and values of 'O's (black) to the corresponding film region. This process will allow us to subsequently identify radiodense regions in the image by concentrating on the tissue region only. A previous attempt at the segmentation mask generation using the wavelet method did not provide sufficiently smooth contours [9]. Each image is 1000 x 700 pixels in size. For analysis in generating a segmentation mask, the first step of the segmentation algorithm is to perform an adaptive threshold technique. This is done by segmenting the entire image into smaller blocks of size 50 x 50 pixels. Within each of these sub blocks, the respective histogram is analyzed. If the threshold is bimodal, then a threshold is determined as the midpoint between the two peaks of the threshold. Any pixel value above this threshold will be assigned to a value of 1, while any pixel below this threshold will be assigned a value of 0. For a better viewing representation of the output after the threshold calculation, any histogram with only an upper peak will be assigned to a value of 1. This will essentially predict the likelihood that a given region lays within the breast tissue, within the film, or on the boundary. The threshold of the sub blocks will only be bimodal if there is both tissue region and film region present. However, this new image is still a poor representation of a segmentation mask as shown in Fig. 2c. To better this output, each of the outermost pixels will be used as training data into a radial basis function (RBF) neural network. There is approximately 1000 coarse boundary points. This coarse image is then sub sampled and used as the input to the RBF neural network. The output of the RBF neural network is an optimal prediction of the breast tissue edge, which is used to generate the binary segmentation mask. The reason for using an RBF neural network is because of the known ability for an RBF neural network to approximate a function. A Gaussian is used as the activation function within the RBF neural network. This will smooth the jagged edges that were calculated after block processing. Fig. 2 shows a pictorial representation of this process [10]. 1737 (a) (b) (c) (d) FIGURE 2: Mask generation Process. 2a: Digitized image; 2b: Block process view of image; 2c: Output after threshold determination in block processing; 2d: Approximation of edge of breast tissue region. This mask is then placed over the original image and effectively removes all unwanted information outside of the breast tissue region. Any pixel within the region of the function approximation will be stored, while information outside of the function approximation will be ignored. This step is vital for subsequent image processing techniques to be employed for radiodense tissue percentage determination. Threshold Determination Determination of the estimated threshold relies on assumptions made about the images under test. The following assumptions are made in developing the density estimation algorithms: a) Pixel gray-level is considered as a deciding factor in segmenting radiodense tissue from radiolucent tissue regions. b) The location and shape of the segmented tissue in the mammograms are ignored. Identifying the radiodense tissue region in a segmented gray-level mammogram essentially involves converting the 256 gray-level images to binary format. Radiodense tissue pixels will be assigned a gray-level value of 1 and all others will be 0. However, determining an appropriate gray-level threshold for the conversion process is a non-trivial task. The threshold cannot be an absolute value, for it must respond to variations in signal intensity from image to image and local variations within the same image. Several techniques for generating a dynamic threshold for detecting radiodense indications have been developed. A two-step process is employed: Step 1: Generate mathematical models of the mammogram image by studying the statistics of the gray-level variations. Step 2: Apply hypotheses testing (detection theory) techniques for segmenting radiodense and radiolucent pixels. The digitized mammogram image is modeled as a stationary Gaussian random field using the equation where f(x,y) is the gray-level value in the mammogram image at location (x,y), m/and cTf are the local mean and standard deviation of f(x,y) respectively and w(x,y) is a zero-mean, unit variance, Gaussian random field. Empirical evidence suggests that such a model is reasonable for typical images [11]. 1738 The original image is subdivided into blocks of size 8 x 8 pixels; the local mean and standard deviation for the gray-levels are estimated for these blocks. The Gaussian field is synthesized using a pseudo-random number generator and the mammogram image model is created. This model is completely mathematically tractable and can be used for subsequent processing in place of the original image. Because the mammogram image is now modeled as a Gaussian random field using its gray-level statistics, the segmentation of radiodense tissue may be recast into a problem in hypothesis testing (detection theory). In a detection problem, an observation of a random variable is used to make decisions about a finite number of outcomes. In this case, the pixel gray-level under consideration, f(xty)9 is the random variable, and the two possible outcomes for that gray-level are radiodense or radiolucent. This two-class situation is also known as binary hypothesis testing. To test the hypothesis, the value of the random variable (pixel gay-level) is compared with a threshold. This threshold is dynamically generated, taking into account the variation in gray-level statistics from image to image and the local statistics within each image. A dynamic threshold for segmentation of the radiodense tissue inside the mammogram film is described by the equation -* global + •* nom + Y\*- local ~~*nom) (2) where Tjocai represents the local gray-level variations, Tnom represents the random image modeling, 7 is a parametric weight parameter and Tgi0bai is the global threshold. If the mammogram were a zero-mean Gaussian random image then Tglobal=Tlocal=Tnom=50%. (3) Since this is a real image, then 7/OCfl/ must be determined. To do this, the segmentation threshold is varied across the entire gray-scale range. The image is converted to a binary matrix with all pixels with a value above the threshold being assigned to a value of 1 and all pixels with a value below the threshold being assigned to a value of 0. This resembles the cumulative distribution function (CDF) of a Gaussian random variable. The probability density function (PDF) can be calculated by differentiating the CDF. The mean value of the random variable can be calculated using the PDF. This mean is the local segmentation threshold, Tiocai. The parameter 7 allows the equation to be tuned using a single variable. Implementation of this technique will involve the determination of 7. The resulting segmentation threshold, Tgiobai, is used to segment the radiodense tissue [12]. A constrained Neyman-Pearson function has been developed. This algorithm is based on the heuristics of the cohort of images provided for analysis. Analysis of these images by a strict Bayesian classifier yielded consistent results but provided non-ideal percentage calculations. To overcome this issue a constrained algorithm has been developed that biases the Bayesian classifier based on the local variance of the image and the means of the Gaussian distributions that model the radiodense and radiolucent tissue. The segmentation threshold is now given by the equation: a where TCNP is the constrained Neyman-Pearson threshold, a is a scaling parameter, tf2 is the local variance of the image and /// and jU2 are the means of the radiolucent and radiodense regions respectively. If a is chosen such that ^2max<-oi, then the bias will constrain the 1739 algorithm to values between the Bayesian classifier output and µ2. Image heuristics show that this bias range would be sufficient to model the threshold of the radiodense tissue algorithm to values between the Bayesian classifier output and ^ Image heuristics show region. As the variance of the image approaches α, the term in the first parenthesis tends that this bias range would be sufficient to model the threshold of the radiodense tissue towards and the threshold approaches thata,ofthea term pure inBayesian classifier. As the region. zero As the variance of the image approaches the first parenthesis tends variance of the image decreases, the amount of bias increases. As the variance approaches towards zero and the threshold approaches that of a pure Bayesian classifier. As the zero, the threshold will move towards the mean theincreases. radiodense region,approaches µ2 [13]. variance of the image decreases, the amount of of bias Astissue the variance zero, the threshold will move towards the mean of the radiodense tissue region, jLi2 [13]. Density Estimation and Image Post-Processing Density Estimation and Image Post-Processing Using Tglobal the image is segmented into a binary matrix. All gray-level values that lie aboveUsing TglobalTgiobai are set 1 andisall other values set tomatrix. 0. Using new binary matrix and thetoimage segmented into are a binary Allthis gray-level values that lie theabove segmentation theother percentage of radiodense tissue can be binary determined Tgi0bai are mask set to matrix 1 and all values are set to 0. Using this new matrixusing and the segmentation mask matrix the percentage of radiodense tissue can be determined using Pwhite % RadiodenseTissue = x100% (5) %RadiodenseTissue = -Ptotal − Pfilm-jclOO% (5) P — -Pfilm - total where Pwhite is the total number of white pixels in the matrix, Ptotal is the total number of whereinFinite is the and totalPnumber oftotal whitenumber pixels of in the matrix, tai is the total number of pixels the matrix pixels in thePtofilm-only region, as found film is the pixels the matrix and P/?/m is the total number of pixels in the film-only region, as found using theinsegmentation mask. using the segmentation mask. IMPLEMENTATION RESULTS IMPLEMENTATION RESULTS Validation data was provided by the Channing Laboratory, Brigham and Women’s Validation dataSchool was provided by the This Charming Laboratory, anddrawn Women's Hospital, Harvard of Medicine. set consisted of Brigham ten images from Hospital, Harvard School of Medicine. This set consisted of ten images drawn from hospitals across the country. These images are gray-scale scans of mammogram X-rays and hospitals across the country. These images are gray-scale scans of mammogram X-rays and have been used for validating the algorithm proposed in this paper. It is assumed that the have been used for validating the algorithm proposed in this paper. It is assumed that the images or adjusted adjusted ininany anymatter matterafter after imagesareareuncompressed uncompressed and and have have not not been been enhanced enhanced or acquisition from the film scanner. Each of the raw images is of a different patient and acquisition from the film scanner. Each of the raw images is of a different patient and contains different image characteristics. contains different image characteristics. Our tissue within withineach eachmammogram. mammogram. Ouralgorithm algorithmpredicted predictedthe the percentage percentage of of radiodense radiodense tissue These predictions were then compared with results from using the Toronto method,asas These predictions were then compared with results from using the Toronto method, shown using the the algorithms algorithmsdescribed describedinin shownininFig. Fig.3.3. Fig. Fig.44shows shows typical typical results results obtained obtained using this paper. this paper. 1 1 7070 —————————————————— 6060 Percent 50OVJ —als—Automated Automated segmentation segmentation 40^\J Q) £ />-''" Q.30 __ ^ 2020 1010 0 0 // _,»N^; - - •» - -"Toronto" method "Toronto" method ^^^7^ ^* 1 1 2 2 3 3 4 4 5 6 7 5 6 7 Image Number Image Number 8 8 9 9 1 0 10 FIGURE 3. Percentage of radiodense tissue calculated using our algorithm compared with that calculated FIGURE Percentage of radiodense tissue calculated using our algorithm compared with that calculated using the3.Toronto method. using the Toronto method. 1740 Image 3 (a) (b) (c) (d) FIGURE 4. Typical results of the algorithm showing (a) original image, (b) tissue segmentation mask, (c) segmented breast region and (d) radiodense tissue segmented image. SUMMARY A completely automated algorithm has been presented in this paper that automatically scans digitized mammogram images to locate the breast tissue region in the X-ray. The breast tissue region in the X-ray is then segmented into radiodense and radiolucent indications. Finally, this algorithm quantifies the amount and percentage of radiodense tissue. The results from the algorithm are compared with those obtained using the Toronto method. The differences in performance could be due to the subjective nature of the prediction in the Toronto method and the possible inability of our algorithm to properly model the underlying distributions that make up the image. Further study with a larger database of mammograms is underway to address these issues. ACKNOWLEDGEMENTS This research is supported by grants from the American Institute for Cancer Research and the American Cancer Society. We would also like to thank Dr. Katherine Evers, radiologist, at Fox Chase Cancer Center, Dr. Celia Byrne of the Charming Laboratory, Brigham and Women's Hospital, Harvard School of Medicine, and Mr. Dave Chezem of the Rowan University Information Technology Center for their assistance during the course of this project. 1741 REFERENCES 1. S. H. Landis, T. Murray, S. Bolden and P. A. Wingo, "Cancer statistics," CA Cancer Journal, Vol. 48, pp. 6-29, 1998. 2. N. F. Boyd, G. A. Lockwood, J. W. Byng, D. L. Tritchler and M. J. Yaffe, "Mammographic densities and breast cancer risk," Cancer Epidemiology, Biomarkers and Prevention, Vol. 7, pp. 1133-1144, 1998. 3. M. J. Yaffe, N. F. Boyd, J. W. Byng, R. A. long, E. Fishell, G. A. Lockwood, L. E. Little, D. L. Tritchler, "Breast cancer risk and measured mammographic density," European Journal of Cancer Prevention, Vol. 7 (suppl 1), pp. S47-S55, 1998. 4. P. K. Saha, J. K. Udupa, E. F. Conant, D. P. Chakraborty and D. Sullivan, "Breast tissue density quantification via digitized mammograms," IEEE Transactions on Medical Imaging, Vol. 20, No. 8, 2001. 5. J. Suckling, D. R Lewis and S. G. Blacker, "Paranchymal delineation by human and computer observers," 2nd International Workshop on Digital Mammography, A.G. Gale etal, Eds., York, UK 1994. 6. R. Highnam, M. Brady, and B. Shepstone, "A representation for mammographic image processing," Medical Image Analysis, Vol. l,pp. 1-18, 1996. 7. <http://www.acr.org/departments/stand_accre^irads/reporting.html>, American College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS™), American College of Radiology, 1998. 8. N. F. Boyd, H. M. Jensen, G. Cooke, H. Lee Han, "Relationship between mammographic and histological risk factors for breast cancer," Journal of the National Cancer Institute, Vol. 84, pp. 1170-1179,1992 9. J. Neyhart, M. Ciocco, R. Polikar, S. Mandayam and M. Tseng, "Dynamic segmentation of breast tissue in digitized mammograms using the discrete wavelet transform," Proceedings of the 23rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Istanbul, Turkey, October 2001. 10. J. Neyhart, R. Eckert, R. Polikar, S. Mandayam and M. Tseng, "A modified NeymanPearson technique for radiodense tissue estimation in digitized mammograms," Proceedings of the 24th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Houston, Texas, October 2002. 11. J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1989. 12. J. Neyhart, M. Kirlakovsky, S. Mandayam and M. Tseng, "Automated segmentation and quantitative characterization of radiodense tissue in digitized mammograms," Proceedings of the 28th Annual Review of Progress in Quantitative NDE, American Institute of Physics, New York, July 2001. 13. J. Neyhart, Automated Segmentation of Radiodense Tissue in Digitized Mammograms using a Constrained Neyman-P earson Classifier, M.S. Thesis, Rowan University, Glassboro, NJ, 2002. 1742
© Copyright 2026 Paperzz