1735_1.pdf

NEURAL AND DECISION THEORETIC APPROACHES FOR THE
AUTOMATED SEGMENTATION OF RADIODENSE TISSUE IN
DIGITIZED MAMMOGRAMS
R. Eckert1, J.T. Neyhart1, L. Burd1, R. Polikar1, S.A. Mandayam1, andM. Tseng2
Department of Electrical and Computer Engineering
Rowan University, Glassboro, NJ 08028, USA
2
Fox Chase Cancer Center, Philadelphia, PA 19111, USA
ABSTRACT. Mammography is the best method available as a non-invasive technique for the early
detection of breast cancer. The radiographic appearance of the female breast consists of radiolucent
(dark) regions due to fat and radiodense (light) regions due to connective and epithelial tissue. The
amount of radiodense tissue can be used as a marker for predicting breast cancer risk. Previously, we
have shown that the use of statistical models is a reliable technique for segmenting radiodense tissue.
This paper presents improvements in the model that allow for further development of an automated
system for segmentation of radiodense tissue. The segmentation algorithm employs a two-step process.
In the first step, segmentation of tissue and non-tissue regions of a digitized X-ray mammogram image
are identified using a radial basis function neural network. The second step uses a constrained NeymanPearson algorithm, developed especially for this research work, to determine the amount of radiodense
tissue. Results obtained using the algorithm have been validated by comparing with estimates provided
by a radiologist employing previously established methods.
INTRODUCTION
The usefulness of mammography as a non-invasive, low-cost technique for the early
detection of breast cancer in women has been established. The procedure of diagnosing
mammography X-rays has normally been performed by a trained radiologist. Breast cancer
is currently the second leading cause of cancer related deaths among American women [1].
There is a strong desire for a consistently accurate technique for the early detection of
breast cancer using the analysis of risk factors. The radiographic appearance of the female
breast consists of radiolucent (dark) regions due to fat and radiodense (light) regions due to
connective and epithelial tissue. Studies show that the amount of radiodense tissue can be
used as a marker for predicting breast cancer risk. For example, women with radiodense
tissue in more than 60-75% of the breast have been shown to be at a four to six times
greater risk of developing breast cancer than those with lesser densities [2]. The estimation
of radiodense tissue has traditionally been a subjective determination by trained
radiologists, with very few published work describing quantitative measures [3-6]. This
paper presents improvements of our previous algorithm that employs objective measures
for quantifying dense tissue in digitized mammograms. The proposed technique has been
validated using a set of ten images with percentage of radiodense tissue estimated by a
trained radiologist using previously established methods. This research project, conducted
CP657, Review of Quantitative Nondestructive Evaluation Vol. 22, ed. by D. O. Thompson and D. E. Chimenti
© 2003 American Institute of Physics 0-7354-0117-9/03/S20.00
1735
at Rowan University is intended to support an investigation conducted at Fox Chase Cancer
Center (FCCC), examining the correlation between dietary patterns and breast density.
Typically, a trained radiologist, upon review of the mammogram, will give a succinct
description of the overall breast composition. This description is usually given in the form
of the ACR-BIRAD system [7].
Yaffe, Boyd et al have developed a method for determining the percentage of dense
tissue in the female breast, often referred to as the "Toronto" method. This method builds
on the work of Wolfe by separating mammograms into six categories. These categories are
0%, 0 to 10%, 10 to 25%, 25 to 50%, 50 to 75% and 75% to 100%, representing the
percentage of radiodense tissue [8]. They offer a more quantitative approach to
classification attempts than those offered by Wolfe. A trained radiologist performs
classification of a mammogram into one of the six categories. Although the system allows
for a quantitative measure, the classification is still based on the radiologist's qualitative
assessment of the mammogram under inspection. The intent of our algorithm is to mimic
the expert system (the radiologist) with greater accuracy and to quantify the percentage of
radiodense tissue in the mammogram using objective methods and measures. In addition,
estimation results from our algorithm are compared with the previously established
Toronto method.
This paper is organized as follows; we first give an introduction to the problem. This is
followed by our research goals and objectives. The approach outlining the technique we
developed is presented. In the results section some of the typical results are presented from
exercising our algorithm on the digitized mammogram images. Finally, a summary of the
work is provided.
RESEARCH OBJECTIVES
The research objectives are to develop a completely automated algorithm that:
(a) Automatically scans digitized mammogram images to locate the breast tissue
region in the X-ray.
(b) Segments the tissue into radiodense and radiolucent indications.
(c) Quantifies the amount and percentage of radiodense tissue.
APPROACH
The overall approach of this research is shown in Fig. 1. The approach taken addresses
two major issues. The first issue is the actual segmentation of the arbitrarily shaped breast
tissue region from within the rectangular shaped X-ray film. This is an edge-detection
problem that is accomplished using a dynamically generated segmentation mask. After the
breast tissue is segmented, radiodense tissue indications within the breast region can be
identified and quantified. The difficulty here arises from the gray-level intensities varying
among X-rays and locally across the same X-ray. For example, if two mammograms are
being analyzed that are both low in percentage of radiodense tissue, the same threshold
cannot be applied to both mammograms in order to distinguish between radiodense and
radiolucent tissue. One mammogram may be overall brighter in gray-level intensity
requiring a larger gray-level value for the threshold while at the same time, the other low
radiodense mammogram may be overall darker in gray-level intensity requiring a smaller
gray-level value for the threshold. This is a threshold estimation problem. This threshold
is obtained by processing each of these eight discrete steps shown in Fig. 1.
1736
____Digitized Mammogram____
*
____Image Pre-processing____
[_____Mask Generation______|
r=
T
==
I_____Tissue Segmentation_____|
*
I
Threshold Determination___|
*
I_____Density Estimation_____|
*
____Image Post-processing____
*
I
Radiodense Tissue Percentage
|
FIGURE 1: Overall approach.
Segmentation Mask Generation
After the mammogram is scanned into the database, the mammogram is still not ready
to be directly analyzed for radiodensity percentage. There is a need to remove unwanted
noise within the film region of the mammogram. The objective of the segmentation mask is
to segment the tissue region from the unwanted film region. This will allow the algorithm
to properly quantify the amount and percentage of radiodense tissue present in the breast.
The normal mask template is a binary matrix of equal size to that of the original
mammogram image. The segmentation mask will consist of binary values of '1's (white) to
the corresponding tissue region and values of 'O's (black) to the corresponding film region.
This process will allow us to subsequently identify radiodense regions in the image by
concentrating on the tissue region only. A previous attempt at the segmentation mask
generation using the wavelet method did not provide sufficiently smooth contours [9].
Each image is 1000 x 700 pixels in size. For analysis in generating a segmentation mask,
the first step of the segmentation algorithm is to perform an adaptive threshold technique.
This is done by segmenting the entire image into smaller blocks of size 50 x 50 pixels.
Within each of these sub blocks, the respective histogram is analyzed. If the threshold is
bimodal, then a threshold is determined as the midpoint between the two peaks of the
threshold.
Any pixel value above this threshold will be assigned to a value of 1, while any pixel
below this threshold will be assigned a value of 0. For a better viewing representation of
the output after the threshold calculation, any histogram with only an upper peak will be
assigned to a value of 1. This will essentially predict the likelihood that a given region lays
within the breast tissue, within the film, or on the boundary. The threshold of the sub
blocks will only be bimodal if there is both tissue region and film region present.
However, this new image is still a poor representation of a segmentation mask as shown in
Fig. 2c. To better this output, each of the outermost pixels will be used as training data into
a radial basis function (RBF) neural network. There is approximately 1000 coarse
boundary points. This coarse image is then sub sampled and used as the input to the RBF
neural network. The output of the RBF neural network is an optimal prediction of the
breast tissue edge, which is used to generate the binary segmentation mask. The reason for
using an RBF neural network is because of the known ability for an RBF neural network to
approximate a function. A Gaussian is used as the activation function within the RBF
neural network. This will smooth the jagged edges that were calculated after block
processing. Fig. 2 shows a pictorial representation of this process [10].
1737
(a)
(b)
(c)
(d)
FIGURE 2: Mask generation Process. 2a: Digitized image; 2b: Block process view of image; 2c: Output
after threshold determination in block processing; 2d: Approximation of edge of breast tissue region.
This mask is then placed over the original image and effectively removes all unwanted
information outside of the breast tissue region. Any pixel within the region of the function
approximation will be stored, while information outside of the function approximation will
be ignored. This step is vital for subsequent image processing techniques to be employed
for radiodense tissue percentage determination.
Threshold Determination
Determination of the estimated threshold relies on assumptions made about the images
under test. The following assumptions are made in developing the density estimation
algorithms:
a) Pixel gray-level is considered as a deciding factor in segmenting radiodense
tissue from radiolucent tissue regions.
b) The location and shape of the segmented tissue in the mammograms are
ignored.
Identifying the radiodense tissue region in a segmented gray-level mammogram
essentially involves converting the 256 gray-level images to binary format. Radiodense
tissue pixels will be assigned a gray-level value of 1 and all others will be 0. However,
determining an appropriate gray-level threshold for the conversion process is a non-trivial
task. The threshold cannot be an absolute value, for it must respond to variations in signal
intensity from image to image and local variations within the same image.
Several techniques for generating a dynamic threshold for detecting radiodense
indications have been developed. A two-step process is employed:
Step 1: Generate mathematical models of the mammogram image by studying the
statistics of the gray-level variations.
Step 2: Apply hypotheses testing (detection theory) techniques for segmenting
radiodense and radiolucent pixels.
The digitized mammogram image is modeled as a stationary Gaussian random field
using the equation
where f(x,y) is the gray-level value in the mammogram image at location (x,y), m/and cTf
are the local mean and standard deviation of f(x,y) respectively and w(x,y) is a zero-mean,
unit variance, Gaussian random field. Empirical evidence suggests that such a model is
reasonable for typical images [11].
1738
The original image is subdivided into blocks of size 8 x 8 pixels; the local mean and
standard deviation for the gray-levels are estimated for these blocks. The Gaussian field is
synthesized using a pseudo-random number generator and the mammogram image model is
created. This model is completely mathematically tractable and can be used for subsequent
processing in place of the original image. Because the mammogram image is now modeled
as a Gaussian random field using its gray-level statistics, the segmentation of radiodense
tissue may be recast into a problem in hypothesis testing (detection theory). In a detection
problem, an observation of a random variable is used to make decisions about a finite
number of outcomes. In this case, the pixel gray-level under consideration, f(xty)9 is the
random variable, and the two possible outcomes for that gray-level are radiodense or
radiolucent. This two-class situation is also known as binary hypothesis testing. To test the
hypothesis, the value of the random variable (pixel gay-level) is compared with a threshold.
This threshold is dynamically generated, taking into account the variation in gray-level
statistics from image to image and the local statistics within each image.
A dynamic threshold for segmentation of the radiodense tissue inside the mammogram
film is described by the equation
-* global + •* nom + Y\*- local ~~*nom)
(2)
where Tjocai represents the local gray-level variations, Tnom represents the random image
modeling, 7 is a parametric weight parameter and Tgi0bai is the global threshold. If the
mammogram were a zero-mean Gaussian random image then
Tglobal=Tlocal=Tnom=50%.
(3)
Since this is a real image, then 7/OCfl/ must be determined. To do this, the segmentation
threshold is varied across the entire gray-scale range. The image is converted to a binary
matrix with all pixels with a value above the threshold being assigned to a value of 1 and
all pixels with a value below the threshold being assigned to a value of 0. This resembles
the cumulative distribution function (CDF) of a Gaussian random variable. The probability
density function (PDF) can be calculated by differentiating the CDF. The mean value of the
random variable can be calculated using the PDF. This mean is the local segmentation
threshold, Tiocai. The parameter 7 allows the equation to be tuned using a single variable.
Implementation of this technique will involve the determination of 7. The resulting
segmentation threshold, Tgiobai, is used to segment the radiodense tissue [12].
A constrained Neyman-Pearson function has been developed. This algorithm is based on
the heuristics of the cohort of images provided for analysis. Analysis of these images by a
strict Bayesian classifier yielded consistent results but provided non-ideal percentage
calculations. To overcome this issue a constrained algorithm has been developed that
biases the Bayesian classifier based on the local variance of the image and the means of the
Gaussian distributions that model the radiodense and radiolucent tissue. The segmentation
threshold is now given by the equation:
a
where TCNP is the constrained Neyman-Pearson threshold, a is a scaling parameter, tf2 is the
local variance of the image and /// and jU2 are the means of the radiolucent and radiodense
regions respectively. If a is chosen such that ^2max<-oi, then the bias will constrain the
1739
algorithm to values between the Bayesian classifier output and µ2. Image heuristics show
that this bias range would be sufficient to model the threshold of the radiodense tissue
algorithm to values between the Bayesian classifier output and ^ Image heuristics show
region. As the variance of the image approaches α, the term in the first parenthesis tends
that this bias range would be sufficient to model the threshold of the radiodense tissue
towards
and
the threshold
approaches
thata,ofthea term
pure inBayesian
classifier. As
the
region. zero
As the
variance
of the image
approaches
the first parenthesis
tends
variance
of
the
image
decreases,
the
amount
of
bias
increases.
As
the
variance
approaches
towards zero and the threshold approaches that of a pure Bayesian classifier. As the
zero,
the threshold
will move
towards
the mean
theincreases.
radiodense
region,approaches
µ2 [13].
variance
of the image
decreases,
the amount
of of
bias
Astissue
the variance
zero, the threshold will move towards the mean of the radiodense tissue region, jLi2 [13].
Density Estimation and Image Post-Processing
Density Estimation and Image Post-Processing
Using Tglobal the image is segmented into a binary matrix. All gray-level values that lie
aboveUsing
TglobalTgiobai
are set
1 andisall
other values
set tomatrix.
0. Using
new binary
matrix
and
thetoimage
segmented
into are
a binary
Allthis
gray-level
values
that lie
theabove
segmentation
theother
percentage
of radiodense
tissue
can
be binary
determined
Tgi0bai are mask
set to matrix
1 and all
values are
set to 0. Using
this
new
matrixusing
and
the segmentation mask matrix the percentage of radiodense tissue can be determined using
Pwhite
% RadiodenseTissue =
x100%
(5)
%RadiodenseTissue = -Ptotal − Pfilm-jclOO%
(5)
P — -Pfilm
- total
where Pwhite is the total number of white pixels in the matrix, Ptotal is the total number of
whereinFinite
is the and
totalPnumber
oftotal
whitenumber
pixels of
in the
matrix,
tai is the total
number
of
pixels
the matrix
pixels
in thePtofilm-only
region,
as found
film is the
pixels
the matrix and
P/?/m is the total number of pixels in the film-only region, as found
using
theinsegmentation
mask.
using the segmentation mask.
IMPLEMENTATION RESULTS
IMPLEMENTATION RESULTS
Validation data was provided by the Channing Laboratory, Brigham and Women’s
Validation
dataSchool
was provided
by the This
Charming
Laboratory,
anddrawn
Women's
Hospital,
Harvard
of Medicine.
set consisted
of Brigham
ten images
from
Hospital, Harvard School of Medicine. This set consisted of ten images drawn from
hospitals across the country. These images are gray-scale scans of mammogram X-rays and
hospitals across the country. These images are gray-scale scans of mammogram X-rays and
have been used for validating the algorithm proposed in this paper. It is assumed that the
have been used for validating the algorithm proposed in this paper. It is assumed that the
images
or adjusted
adjusted ininany
anymatter
matterafter
after
imagesareareuncompressed
uncompressed and
and have
have not
not been
been enhanced
enhanced or
acquisition
from
the
film
scanner.
Each
of
the
raw
images
is
of
a
different
patient
and
acquisition from the film scanner. Each of the raw images is of a different patient and
contains
different
image
characteristics.
contains different image characteristics.
Our
tissue within
withineach
eachmammogram.
mammogram.
Ouralgorithm
algorithmpredicted
predictedthe
the percentage
percentage of
of radiodense
radiodense tissue
These
predictions
were
then
compared
with
results
from
using
the
Toronto
method,asas
These predictions were then compared with results from using the Toronto method,
shown
using the
the algorithms
algorithmsdescribed
describedinin
shownininFig.
Fig.3.3. Fig.
Fig.44shows
shows typical
typical results
results obtained
obtained using
this
paper.
this
paper.
1
1
7070 ——————————————————
6060
Percent
50OVJ
—als—Automated
Automated
segmentation
segmentation
40^\J
Q)
£
/>-''"
Q.30
__ ^
2020
1010
0
0
// _,»N^;
- - •» - -"Toronto"
method
"Toronto" method
^^^7^
^*
1
1
2
2
3
3
4
4
5
6
7
5
6
7
Image Number
Image Number
8
8
9
9
1 0
10
FIGURE 3. Percentage of radiodense tissue calculated using our algorithm compared with that calculated
FIGURE
Percentage
of radiodense tissue calculated using our algorithm compared with that calculated
using the3.Toronto
method.
using the Toronto method.
1740
Image 3
(a)
(b)
(c)
(d)
FIGURE 4. Typical results of the algorithm showing (a) original image, (b) tissue segmentation mask, (c)
segmented breast region and (d) radiodense tissue segmented image.
SUMMARY
A completely automated algorithm has been presented in this paper that automatically
scans digitized mammogram images to locate the breast tissue region in the X-ray. The
breast tissue region in the X-ray is then segmented into radiodense and radiolucent
indications. Finally, this algorithm quantifies the amount and percentage of radiodense
tissue. The results from the algorithm are compared with those obtained using the Toronto
method. The differences in performance could be due to the subjective nature of the
prediction in the Toronto method and the possible inability of our algorithm to properly
model the underlying distributions that make up the image. Further study with a larger
database of mammograms is underway to address these issues.
ACKNOWLEDGEMENTS
This research is supported by grants from the American Institute for Cancer Research
and the American Cancer Society. We would also like to thank Dr. Katherine Evers,
radiologist, at Fox Chase Cancer Center, Dr. Celia Byrne of the Charming Laboratory,
Brigham and Women's Hospital, Harvard School of Medicine, and Mr. Dave Chezem of
the Rowan University Information Technology Center for their assistance during the course
of this project.
1741
REFERENCES
1. S. H. Landis, T. Murray, S. Bolden and P. A. Wingo, "Cancer statistics," CA Cancer
Journal, Vol. 48, pp. 6-29, 1998.
2. N. F. Boyd, G. A. Lockwood, J. W. Byng, D. L. Tritchler and M. J. Yaffe,
"Mammographic densities and breast cancer risk," Cancer Epidemiology, Biomarkers
and Prevention, Vol. 7, pp. 1133-1144, 1998.
3. M. J. Yaffe, N. F. Boyd, J. W. Byng, R. A. long, E. Fishell, G. A. Lockwood, L. E.
Little, D. L. Tritchler, "Breast cancer risk and measured mammographic density,"
European Journal of Cancer Prevention, Vol. 7 (suppl 1), pp. S47-S55, 1998.
4. P. K. Saha, J. K. Udupa, E. F. Conant, D. P. Chakraborty and D. Sullivan, "Breast
tissue density quantification via digitized mammograms," IEEE Transactions on
Medical Imaging, Vol. 20, No. 8, 2001.
5. J. Suckling, D. R Lewis and S. G. Blacker, "Paranchymal delineation by human and
computer observers," 2nd International Workshop on Digital Mammography, A.G. Gale
etal, Eds., York, UK 1994.
6. R. Highnam, M. Brady, and B. Shepstone, "A representation for mammographic image
processing," Medical Image Analysis, Vol. l,pp. 1-18, 1996.
7. <http://www.acr.org/departments/stand_accre^irads/reporting.html>, American
College of Radiology (ACR) Breast Imaging Reporting and Data System (BIRADS™), American College of Radiology, 1998.
8. N. F. Boyd, H. M. Jensen, G. Cooke, H. Lee Han, "Relationship between
mammographic and histological risk factors for breast cancer," Journal of the National
Cancer Institute, Vol. 84, pp. 1170-1179,1992
9. J. Neyhart, M. Ciocco, R. Polikar, S. Mandayam and M. Tseng, "Dynamic
segmentation of breast tissue in digitized mammograms using the discrete wavelet
transform," Proceedings of the 23rd Annual International Conference of the IEEE
Engineering in Medicine and Biology Society, Istanbul, Turkey, October 2001.
10. J. Neyhart, R. Eckert, R. Polikar, S. Mandayam and M. Tseng, "A modified NeymanPearson technique for radiodense tissue estimation in digitized mammograms,"
Proceedings of the 24th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, Houston, Texas, October 2002.
11. J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice Hall, 1989.
12. J. Neyhart, M. Kirlakovsky, S. Mandayam and M. Tseng, "Automated segmentation
and quantitative characterization of radiodense tissue in digitized mammograms,"
Proceedings of the 28th Annual Review of Progress in Quantitative NDE, American
Institute of Physics, New York, July 2001.
13. J. Neyhart, Automated Segmentation of Radiodense Tissue in Digitized Mammograms
using a Constrained Neyman-P earson Classifier, M.S. Thesis, Rowan University,
Glassboro, NJ, 2002.
1742