A Method for Identification and Visualization of
Histological Image Structures Relevant to the
Cancer Patient Conditions
Vassili Kovalev1, Alexander Dmitruk1, Ihar Safonau1, Mikhail Frydman2, and
Sviatlana Shelkovich3
1 Biomedical Image Analysis Department, United Institute of Informatics Problems,
Surganova St., 6, 220012 Minsk, Belarus
2 Department of Morbid Anatomy, Minsk City Hospital for Oncology, Nezavisimosti
av. 64-3, 220013 Minsk, Belarus
3 Oncology Department, Belarusian Medical Academy of Post-Graduate Education,
Brovki St., 3-B3, 220013, Minsk, Belarus
Abstract. A method is suggested for identification and visualization
of histology image structures relevant to the key characteristics of the
state of cancer patients. The method is based on a multi-step procedure
which includes calculating image descriptors, extracting their principal
components, correlating them to known object properties, and mapping
the disclosed regularities all the way back to the corresponding image
structures they are found to be linked with. The image descriptors employed
are extended 4D color co-occurrence matrices counting the occurrence
of all possible pixel triplets located at the vertices of equilateral
triangles of different sizes. The method is demonstrated on a sample of 952
histology images taken from 68 women with a clinically confirmed diagnosis
of ovarian cancer. As a result, a number of associations between
the patients’ conditions and morphological image structures were found,
including both easily explainable ones and ones whose biological substrate
remains obscure.
1 Introduction
It is well known that visual examination of histological images taken from tissue
samples remains the gold standard in the definitive diagnosis, staging, and treatment
of a number of cancer types [1], [2]. However, the histological image analysis
problem has not been adequately explored and remains underdeveloped compared
to other branches of image analysis [3], [4], [5]. This is
mostly because histological image data stand apart from the main body of
biomedical images owing to their remarkable morphological complexity [3], [4], [6]. This
holds true for the majority of conventional methods and is worsened even further by
newly emerging techniques for preprocessing tissue probes and by advanced imaging
technologies such as whole-slide scanning, which produces hyper-large images [7].
Motivation. The motivation of this work stems from the biomedical problem
of discovering implicit links between the morphological structure of histological
images and features describing the state of cancer patients. In particular, we
are interested in attributing certain conditions of ovarian cancer patients to
morphological structures observed in routine diagnostic tissue samples as well as
in probes immuno-histochemically processed to highlight tissue lympho- and
angiogenesis.
Ovarian cancer is a devastating disease which is known as one of the major
causes of female gynaecological death worldwide [8]. In Western countries about
1-2% of all women develop epithelial ovarian cancer at some time during their
lives. A further problem is that most patients present to a hospital too late, being
already at an advanced stage of the disease. This is because the first indication of
ovarian cancer is not pain but simply swelling of the abdomen, which can be
easily missed. As a result, five-year survival rates remain as low as 20% [8],
[9]. This work is part of a larger project aimed at studying malignant tumor
angiogenesis in the ovary [10]. Angiogenesis, the development of new blood vessels
from the existing vasculature, is an important factor in solid tumor growth and
metastasis [9]. Without angiogenesis, tumor expansion is naturally limited
to only 1-2 mm because, in order to grow, the tumor needs to be supplied with
oxygen and nutrients and to have its waste removed [11]. Recently, hope has emerged
for cancer treatment by inhibiting angiogenesis. Thus, disclosing links
between the tumor structure, its growth characteristics, and patient conditions
is of paramount importance for oncology [8], [9].
The technical problem. In a typical setup there is a patient database available
which contains both image data of different modalities and non-visual
patient characteristics such as general social data, clinical observations, results
of laboratory tests, history of personal and family diseases, etc. Technically,
the problem is then posed as finding statistically significant associations between the
morphological image structures, represented by suitable quantitative features,
and the database variables containing the patient records. Such correlations
can be found in a straightforward manner using, for example, the conventional
approach of feature extraction followed by a multivariate statistical analysis that
identifies significant links between the two. However, this is only possible
with an a priori research hypothesis in hand which presumes certain connections
between specific, pre-defined image structures and some patient characteristics.
Once developed, implemented, and successfully applied to the input data,
this approach leaves the researcher with results limited to the particular image structures
that have been extracted and examined. For instance, our preliminary study
exploiting this approach attempted to attribute tumor vessel development,
visualized with the help of the D2-40 marker, to patients’ conditions. To this end,
the vessel network was segmented, characterized by five quantitative features,
and correlated to the patient state. However, although considerable time and other
resources were spent, it gave very particular and rather modest results [10].
Thus, in this context it is worth considering an alternative, exploratory
approach which aims first at detecting the whole set of objectively existing
correlations between the histology image structures and the patient state, and
only afterwards at separately investigating their novelty together with the
underlying biomedical substrate. Such an approach may conditionally be assigned to the
image mining research area. In much the same way as data mining, image
mining can be understood as the process of extracting hidden patterns from
images [12], [13]. More specifically, image mining deals with the extraction of implicit
knowledge, image data relationships, or other patterns not explicitly stored in
the image database (e.g., [14], [15], [16], [17]). Given that histological image
analysis is a task that is difficult to automate because of the structural
complexity of the images, it appears promising to examine broad image mining techniques for
discovering the links we are interested in.
Basic requirements. In order to produce the desired result, a method for
identification and visualization of histological image structures which correlate with
the cancer patient conditions should fulfill the following major requirements.
(a) The image descriptors should be powerful and flexible enough to capture
a broad range of morphological image properties and be applicable to both color
and grayscale images.
(b) The quantitative features which are derived from the descriptors and
correlated to patient state records should allow mapping selected correlations back to
the original images for isolating and visualizing the underlying morphological structures.
(c) The number of features used for describing the image content should be
limited to a few dozen to satisfy the well-known statistical requirement (i.e.,
kept below the number of patients), which helps avoid correlations arising purely by chance.
In this paper, we introduce a method for discovering important histology
image structures of cancer tissue that fulfills these requirements. We demonstrate
its abilities on a database of 68 ovarian cancer patients.
2 Materials and Methods
Image Data. A database containing patient records and histological tissue images
of 68 ovarian cancer patients (women, mean age 59.8 years, STD=11.2) was
used in this study. The image data part consisted of 952 color images of
2048×1536 pixels in size which were acquired under ×200 magnification using
a Leica DMD108 microscope. They included 272 routine hematoxylin-eosin
stained diagnostic images (4 images for each patient) and 680 images of tissue
probes (10 per patient) immuno-histochemically processed with the D2-40 marker
highlighting lymphogenesis. Examples are provided in Fig. 1. Patients’ state
records included about 80 characteristics such as the international TNM cancer
staging, medical history, tumor dissemination, surgery and chemotherapy details,
the current value of the alive/died flag, and some others.
Fig. 1. Examples of original histological images of tissue routinely stained with
hematoxylin-eosin (top two rows) and D2-40 endothelial marker (bottom two rows).
The Method. Owing to the characteristic textural appearance of histology images,
color texture descriptors are the most common type of features used in histology image analysis
when describing the image as a whole. Among them are the color co-occurrence
matrices introduced independently in [18] and [19], first under the term color
correlogram [18] and a year later as co-occurrence matrices [19]. There are also
several allied approaches for describing spatial image structure, such as
simultaneous autoregressive models [15] and some others. Here we continue to exploit
the co-occurrence approach. However, taking into account the first requirement
given in the introduction, we develop the co-occurrence approach further and
use extended 4D matrices. Namely, we consider triplets of pixels located
at the vertices of equilateral triangles instead of conventional pixel pairs. Note
that such an extension is not just a mechanical addition of one more dimension
to the co-occurrence matrix array, as it might appear at first sight. The
consequences are far deeper, and they relate to the problem of discriminating
different sorts of textures with the help of first (pixel intensities/colors alone),
second (gradients), and higher order spatial statistics. This problem was
thoroughly studied by Bela Julesz (e.g., [20]). More recently, this line of research on
visual texture perception has been pursued with the help of fMRI brain scanning. In
particular, it was shown experimentally [21] that patterns of brain activity are
significantly different when observing textures with low and high order spatial
correlations. Note that high order statistics in the spatial domain should not be
confused with those in the intensity domain [22].
Let $F_G = \{I(x, y)\} = \{I(i)\} = \{I(j)\} = \{I(k)\}$ be a gray-scale image of
$M \times N$ pixels in size. Let us suppose all the image pixels are indexed with the help
of indices $i$, $j$, and $k$, where $i = \overline{1, MN}$, $j = \overline{1, MN}$, and
$k = \overline{1, MN}$, and their
intensity levels are $I(i)$, $I(j)$, and $I(k)$ respectively. The indices are naturally
defined by pixel coordinates as $i = (x_i, y_i)$, $j = (x_j, y_j)$, and $k = (x_k, y_k)$. Then
the 4D gray-scale intensity co-occurrence matrix of IIID type defined on the
triplets of pixels $(i, j, k)$ which are located at the vertices of equilateral triangles
with the side of $d$ pixels can be defined as follows:
$$
\begin{aligned}
& W_{IIID} = \| I(i), I(j), I(k), d \|, \\
& d(i, j) = d(i, k) = d(j, k), \quad d \in D, \\
& i < j, \quad i < k, \\
& \forall i :\; y_j \ge y_i, \quad y_k < y_i .
\end{aligned}
$$
Note that the last two lines of the above equation formalize the requirement of
enumerating all possible triangles without repetition. The equation describes
an algorithm for covering the whole image with equilateral triangles. As can be
inferred, the procedure consists of successively placing the basic (seed) triangle
vertex at image position i so that the second vertex j falls in the same row,
d pixels ahead, with the vertex k pointing down. This gives the first, initial
position with the seed vertex fixed at i, whereas the remaining ones are obtained by
rotating the triangle clockwise around i so that its third, i.e. k-th, vertex neither
crosses nor rises above the current image row.
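The covering procedure can be illustrated by a minimal sketch. It is not the authors' implementation: it counts only the initial (seed) orientation of the triangle and omits the rotations, it rounds the vertex k to the nearest pixel, and it stores each triplet in sorted order. The function name `triangle_cooccurrence` and its parameters are hypothetical.

```python
import numpy as np

def triangle_cooccurrence(index_image, n_levels, d):
    """Count level triplets at the vertices of seed-oriented triangles of side d.

    Assumptions (not taken from the paper): only the seed orientation is used,
    vertex k is rounded to the pixel grid, and triplets are stored sorted.
    """
    img = np.asarray(index_image)
    rows, cols = img.shape
    dx_k = int(round(d / 2.0))                  # horizontal offset of vertex k
    dy_k = int(round(d * np.sqrt(3.0) / 2.0))   # vertical offset of vertex k (down)
    counts = np.zeros((n_levels, n_levels, n_levels), dtype=np.int64)
    if rows <= dy_k or cols <= d:
        return counts                           # image too small for this triangle
    vi = img[:rows - dy_k, :cols - d]           # seed vertex i
    vj = img[:rows - dy_k, d:]                  # vertex j: same row, d pixels ahead
    vk = img[dy_k:, dx_k:dx_k + cols - d]       # vertex k: below the row of i and j
    triplets = np.stack([vi.ravel(), vj.ravel(), vk.ravel()], axis=1)
    triplets = np.sort(triplets, axis=1)        # sorted storage of each triplet
    # np.add.at accumulates correctly even when the same cell occurs many times
    np.add.at(counts, (triplets[:, 0], triplets[:, 1], triplets[:, 2]), 1)
    return counts

# Usage: a 4D descriptor is the stack of 3D matrices over the side lengths D
demo = np.random.default_rng(0).integers(0, 4, size=(32, 32))
descriptor = np.stack([triangle_cooccurrence(demo, 4, d) for d in (1, 3, 5)])
```

The same function applies to gray levels I(·) or to reduced color indices C(·), since it only requires an image of integer labels.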
When image colors are to be considered, the color space is suitably
reduced first, and the corresponding color co-occurrence matrix of CCCD type can
be defined in exactly the same manner using the image color indices C(i), C(j),
and C(k) instead of intensity levels.
Once the co-occurrence matrices are calculated, a very common strategy is
to calculate Haralick’s features next and to use them for image characterization,
clustering, etc. However, this traditional procedure cannot be followed here,
not least because Haralick’s features cannot be mapped back to the original images as
the second requirement of the introduction demands. In contrast, the matrix elements
themselves can be mapped back [23], but there are too many of them to satisfy
the third and final requirement. The solution is to apply principal component
analysis (PCA) to extract a limited number of uncorrelated features from the matrices.
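A minimal sketch of this feature reduction step is given below. It assumes a patients-by-elements table of vectorized matrices, mean-centering only (whether the authors also standardized the columns is not stated), and PCA computed through a singular value decomposition. The function name `pca_features` is hypothetical.

```python
import numpy as np

def pca_features(table, variance_to_keep=0.95):
    """Return (scores, loadings, explained) for the leading principal components.

    table: array of shape (n_patients, n_matrix_elements).
    Assumption: columns are mean-centered but not rescaled.
    """
    X = np.asarray(table, dtype=float)
    Xc = X - X.mean(axis=0)                         # center every matrix element
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)             # variance fraction per component
    cumulative = np.cumsum(explained)
    n_keep = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    n_keep = min(n_keep, len(s))
    scores = U[:, :n_keep] * s[:n_keep]             # one feature vector per patient
    loadings = Vt[:n_keep]                          # weights of the matrix elements
    return scores, loadings, explained[:n_keep]

# Usage on synthetic data with 20 "patients" and 50 "matrix elements"
rng = np.random.default_rng(1)
scores, loadings, explained = pca_features(rng.normal(size=(20, 50)))
```

Each row of `loadings` has one weight per co-occurrence matrix element, which is what makes a selected component traceable back to matrix elements and hence to image structures.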
Thus, the method involves calculating 4D co-occurrence matrices, extracting
principal components, correlating them to the patients’ state, selecting the significant
ones, projecting the selected components back to co-occurrence matrix elements,
and finally using them for visualizing the image structures we are looking for.
Note that since principal components are uncorrelated, there is no need to
apply complicated and somewhat risky multivariate statistical analysis methods.
Searching for significant links can be done by straightforward univariate
correlations or with the help of Student’s t-test, depending on the feature type.
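The last three steps of this chain can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the Pearson coefficient is converted to a t statistic with n - 2 degrees of freedom, matrix elements are ranked by absolute loading, and the back-mapping uses only seed-oriented triangles with sorted triplets. All function names are hypothetical.

```python
import numpy as np

def correlate_component(scores, patient_variable):
    """Pearson r between one PC and one patient variable, with its t statistic."""
    x = np.asarray(scores, dtype=float)
    y = np.asarray(patient_variable, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    r = float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))
    t = r * np.sqrt((len(x) - 2) / max(1.0 - r * r, 1e-300))
    return r, float(t)

def strongest_elements(loading, n_top=5):
    """Positions of the matrix elements contributing most to one component."""
    return np.argsort(-np.abs(np.asarray(loading)))[:n_top]

def highlight_triplet(index_image, triplet, d):
    """Mask of pixels that are vertices of a seed triangle with the given levels."""
    img = np.asarray(index_image)
    rows, cols = img.shape
    dx_k = int(round(d / 2.0))
    dy_k = int(round(d * np.sqrt(3.0) / 2.0))
    mask = np.zeros(img.shape, dtype=bool)
    if rows <= dy_k or cols <= d:
        return mask
    vi = img[:rows - dy_k, :cols - d]           # seed vertex i
    vj = img[:rows - dy_k, d:]                  # vertex j, same row
    vk = img[dy_k:, dx_k:dx_k + cols - d]       # vertex k, below
    levels = np.sort(np.stack([vi, vj, vk], axis=-1), axis=-1)
    hit = np.all(levels == np.sort(np.asarray(triplet)), axis=-1)
    mask[:rows - dy_k, :cols - d] |= hit        # mark all three vertices
    mask[:rows - dy_k, d:] |= hit
    mask[dy_k:, dx_k:dx_k + cols - d] |= hit
    return mask

r, t = correlate_component([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
top = strongest_elements([0.1, -0.9, 0.3], n_top=2)
mask = highlight_triplet(np.array([[0, 1], [1, 0]]), (0, 1, 1), d=1)
```

The resulting mask can be overlaid on the original image to display the structures associated with a selected matrix element.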
3 Results
Original RGB images were converted into the Lab space with the Euclidean color
dissimilarity metric, and the number of colors was reduced to 24 bins using
the median cut algorithm, which preserves the most important colors. Thus, the 3D
color co-occurrence sub-matrices CCC with a fixed inter-pixel distance d
contain 24^3 = 13824 cells. Given that elements above the leading diagonal are zeros, the
number of effective matrix elements was N_E = 2600. Equilateral triangles with
side lengths D = {1, 3, 5} were considered, so that the total number of elements
of the complete CCCD matrices was 7800. Cumulative CCCD matrices computed
over all the images of each patient were vectorized, constituting an input PCA
data table with 68 rows and 7800 columns. PCA resulted in extracting 27
principal components (PCs) in the case of matrices of routine images and 38 PCs in
the case of D2-40 images under the condition of covering 95.0% of the variance. The first
components cover 55.7% and 26.5% of the variance respectively. These results
suggest that the structural variability of D2-40 images is substantially higher compared
to routine ones. When correlated with the patients’ data, the 27 PCs of routine images
produced a total of 43 correlations significant at p < 0.01. The same
procedure applied to the 38 PCs derived from descriptors of D2-40 images with
highlighted lymphatic vessels resulted in detecting 47 significant links between
these features and patient state records.
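The matrix sizes quoted above can be checked with a few lines of arithmetic. One reading that reproduces the reported N_E is that each color triplet is stored in sorted order, so that only unordered triplets (with repetition) of the 24 colors occupy non-zero cells; this interpretation is an assumption and is not stated explicitly in the text.

```python
from math import comb

n_colors = 24
side_lengths = (1, 3, 5)

# All cells of one 3D sub-matrix: 24 * 24 * 24
cells_per_distance = n_colors ** 3

# Unordered triplets with repetition: C(n + 3 - 1, 3) = C(26, 3)
unordered_triplets = comb(n_colors + 2, 3)

# One sub-matrix per triangle side length
total_features = unordered_triplets * len(side_lengths)

print(cells_per_distance, unordered_triplets, total_features)
# prints: 13824 2600 7800
```

With 68 patients and 7800 columns, the table has far more variables than observations, which is why the reduction to a few dozen principal components is needed before any correlation analysis.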
Detailed investigation of the significant correlations has revealed that some of
them were easily deducible from existing knowledge, whereas others suggest
novelty and are certainly interesting from both scientific and practical points of
view. For instance, in the case of routine images the significant links between PCs and
the following patient data appear to be very promising: development of distant
metastases (p < 0.001), the degree of cancer tissue differentiation (p < 0.007),
the number of miscarriages (p < 0.0001), and the number of chemotherapy
trials (p < 0.000002, r = −0.543) (see the visualization of related image structures in
the top row of Fig. 2). The negative correlation of the length of the borders
highlighted in the figure with the number of trials may be explained by the fact that
a more spacious tumor structure is typical of relatively "young" tumors, which
are treated chemically first, compared to "old" ones, which are removed immediately.
Images of tissue processed with the D2-40 endothelial marker have demonstrated
similar behavior, disclosing a number of interesting links. The bottom row of Fig. 2
demonstrates one of them, which displays stromal structures (automatically
extracted and visualized in the bottom-right picture) affected by proteins of
endothelial cells. The fraction of these structures correlates with the tumor
differentiation rate (p < 0.009), patient survival time (p < 0.010), and presence
of a relapse (p = 0.017).
Finally, the abilities of IIID co-occurrence matrices computed using a grayscale
version of the images were also assessed. Although some promising correlations
were found, an ambiguity was revealed. In particular, when a certain IIID matrix
element was mapped back to the grayscale images, it highlighted structures of
biologically different sorts. This is because two or more substantially different
image colors were converted to one single gray level.
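The source of this ambiguity can be shown with a toy example. The sketch assumes the common Rec. 601 luma weights for the color-to-gray conversion, which may differ from the conversion actually used, and the two colors are arbitrary illustrations rather than colors taken from the histological images.

```python
def to_gray(rgb):
    """Integer Rec. 601 luma of an 8-bit RGB color (assumed conversion)."""
    r, g, b = rgb
    # Weights 0.299, 0.587, 0.114 scaled by 1000; +500 rounds to nearest
    return (299 * r + 587 * g + 114 * b + 500) // 1000

dark_red = (60, 0, 0)
dark_blue = (0, 0, 158)
print(to_gray(dark_red), to_gray(dark_blue))
# prints: 18 18
```

Two clearly different colors collapse to the same gray level, so a gray-level matrix element cannot distinguish the structures they belong to, whereas a matrix built on color indices keeps them apart.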
Fig. 2. Examples of original images (left column) and their key structures visualized
(right column) for routine (top row) and D2-40 (bottom row) images.
4 Conclusions
The results reported in this study allow us to draw the following conclusions.
1. The method presented in this paper may be considered a promising tool
capable of automatic identification and visualization of histological image
structures relevant to the cancer patient conditions.
2. Since there is no intrinsic mechanism for semantically assessing the resultant
links detected by the method, an expert-based evaluation of the novelty and
biological substrate of the results is necessary.
3. Future work should include the development of an automatic procedure
for selecting the set of matrix elements to be mapped back to the original images
once an interesting principal component is identified.
Acknowledgments. This work was funded by the ISTC project B–1682.
References
1. Schwab, M.: Encyclopedia of Cancer. 2 edn. Springer, Heidelberg (2009) 4 volumes,
3235 p.
2. Hayat, M.: Methods of Cancer Diagnosis, Therapy and Prognosis. Springer, Heidelberg (2009-2010) 6 volumes.
3. Wootton, R., Springall, D., Polak, J.: Image Analysis in Histology: Conventional
and Confocal Microscopy. Cambridge University Press, Cambridge (1995) 425 p.
4. Gurcan, M.N., Boucheron, L.E., Can, A., Madabhushi, A., Rajpoot, N.M., Yener,
B.: Histopathological image analysis: A review. IEEE Reviews in Biomedical
Engineering (1) (2009) 147–171
5. Sertel, O., Kong, J., Catalyurek, U., Lozanski, G., Saltz, J., Gurcan, M.:
Histopathological image analysis using model-based intermediate representations
and color texture: Follicular lymphoma grading. Journal of Signal Processing Systems 55(1) (2009) 169–183
6. Yu, F., Ip, H.: Semantic content analysis and annotation of histological images.
Computers in Biology and Medicine 38(6) (2008) 635–649
7. Rojo, M.G., Garcia, G.B., Mateos, C.P., Garcia, J.G., Vicente, M.C.: Critical
comparison of 31 commercially available digital slide systems in pathology. International Journal of Surgical Pathology 14(4) (2006) 285–305
8. Stack, M.S., Fishman, D.A.: Ovarian Cancer (Cancer Treatment and Research). 2
edn. Springer, New York (2009) 409 p.
9. Bamberger, E., Perrett, C.: Angiogenesis in epithelian ovarian cancer (review).
Molecular Pathology 55 (2002) 348–359
10. Sprindzuk, M., Dmitruk, A., Kovalev, V., Bogush, A., Tuzikov, A., Liakhovski, V.,
Fridman, M.: Computer-aided image processing of angiogenic histological samples
in ovarian cancer. Journal of Clinical Medicine Research 1(5) (2009) 249–261
11. Folkman, J.: What is the evidence that tumors are angiogenesis dependent? Journal of the National Cancer Institute 82(1) (1990) 4–6
12. Hsu, W., Lee, M., Zhang, J.: Image mining: Trends and developments. Journal of
Intelligent Information Systems 19(1) (2002) 7–23
13. Herold, J., Loyek, C., Nattkemper, T.W.: Multivariate image mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1(1) (2011) 2–13
14. Perner, P.: Image mining: Issues, framework, a generic tool and its application to
medical image diagnosis. Engineering Applications of Artificial Intelligence 15(2)
(2002) 205–216
15. Chen, W., Meer, P., Georgescu, B., He, W., Goodell, L.A., Foran, D.J.: Image
mining for investigative pathology using optimized feature extraction and data
fusion. Computer Methods and Programs in Biomedicine 79 (2005) 59–72
16. Kovalev, V., Prus, A., Vankevich, P.: Mining lung shape from x-ray images. In:
Machine Learning and Data Mining in Pattern Recognition (MLDM-2009). Volume
5632., Leipzig, Germany, Springer (2009) 554–568
17. Kovalev, V., Safonau, I., Prus, A.: Histological image mining for exploring textural
differences in cancerous tissue. In: Swedish Symposium on Image Analysis (SSBA2010), 11-12 March 2010, Uppsala, Sweden, Uppsala University (2010) 113–116
18. Huang, J., Kumar, R., Mitra, M., Zhu, W.J., Zabih, R.: Image indexing using
color correlograms. In: IEEE Comp. Soc. Conf. on Computer Vision and Pattern
Recognition, San Juan, Puerto Rico, IEEE Comp. Soc. Press (1997) 762–768
19. Kovalev, V., Volmer, S.: Color co-occurrence descriptors for querying-by-example.
In: Int. Conf. on Multimedia Modelling, Lausanne, Switzerland, IEEE Comp. Soc.
Press (1998) 32–38
20. Julesz, B.: Foundations of Cyclopean Perception. The MIT Press, Cambridge,
Massachusetts (2006) 426 p.
21. Beason-Held, L.L., Purpura, K.P., Krasuski, J.S., et al.: Cortical regions involved
in visual texture perception: a fMRI study. Cognitive Brain Research 7 (1998)
111–118
22. Petrou, M., Kovalev, V., Reichenbach, J.: Three-dimensional nonlinear invisible
boundary detection. IEEE Trans. Image Processing 15(10) (2006) 3020–3032
23. Kovalev, V., Petrou, M., Suckling, J.: Detection of structural differences between
the brains of schizophrenic patients and controls. Psychiatry Research: Neuroimaging 124 (2003) 177–189