What is the Role of Color in Dermoscopy Analysis?
Ana Margarida Caetano Ruela
Thesis to obtain the Master of Science Degree in
Biomedical Engineering
Examination Committee
Chairperson:
Professor Patrícia Margarida Piedade Figueiredo
Supervisor:
Professor Jorge dos Santos Salvador Marques
Co-Supervisor:
Professor Teresa Maria de Gouveia Torres Feio Mendonça
Member of the Committee:
Professor João Miguel Raposo Sanches
November 2012
Acknowledgments
This master thesis is integrated in the project PTDC/SAUBEB/103471/2008, funded by FCT. First of all, I would like to thank Professor Jorge Salvador Marques for giving me the opportunity to work on this project and for guiding and supporting me throughout this journey. I would also like to thank Professor Teresa Mendonça for the advice and support given during this work. For all her valuable help during this thesis, I leave here my words of appreciation to Catarina Barata. I also want to acknowledge Dr. Jorge Rozeira for providing the dermoscopy images used in this work.
I cannot thank my parents enough for their unconditional support and for doing absolutely everything to provide me with the best education possible. I also have to thank Lira, Zuca and Teca for their affection during the different parts of this process.
I must also thank my friends and classmates from IST for the lunch breaks which helped me relax. My friends from Almada and Barreiro also deserve some words of appreciation for all the funny moments that helped me recharge my batteries. In particular, I would like to thank Afonso Maria, Ana Arêde and Cecília Nunes for their help and words of encouragement.
Abstract
Over the last two decades, several Computer-Aided Diagnosis (CAD) systems have been developed with the aim of aiding dermatologists in classifying Pigmented Skin Lesions (PSL). The majority of these systems rely on shape-, color- and texture-based features to classify the lesions. Even though some systems have already achieved good results, the ideal set of features to extract from the images remains unknown. A possible course of action to determine the best set of features is to assess their roles separately.
The aim of this thesis is to assess the role of color in dermoscopy analysis and to determine which descriptor performs best, considering a rich set of color spaces and color descriptors.
In order to fulfill this goal, two different CAD systems, whose classification is based solely on color features, were developed. The first is based on the principle that lesions are uniform and can be represented by a set of global parameters. This approach has been adopted by the majority of the works described in the literature. In contrast, the second system does not regard lesions as homogeneous and, thus, does not use a single model to represent them. In this system, lesions are divided into several patches which are assumed to contain uniform parts of the lesions.
Both systems were tested using three different types of color descriptors and two different classifiers. Furthermore, the dermoscopy images were represented, separately, in seven different color spaces. The best performance (SE = 100% and SP = 93%) was achieved by both CAD systems using the unidimensional color histogram as the image descriptor and images represented in the opponent color space. These results show that color features play a major role in dermoscopy analysis.
Keywords
Skin lesions, Melanoma, Computer-Aided Diagnosis (CAD) system, Color Features, Feature Extraction, Lesion Classification
Resumo
Over the last two decades, several computer-aided diagnosis systems have been developed with the aim of helping dermatologists diagnose skin lesions. The majority of these systems rely on image features derived from shape, color and texture to classify the lesions. Although some systems already show good performance, the ideal set of features has not yet been found. A possible course of action to determine the best set of features is to assess their roles individually.
In this context, the goal of this thesis is to determine the role of color in the diagnosis of melanomas from dermoscopy images. It also aims to determine which descriptor leads to the best performance.
To fulfill this goal, two computer-aided diagnosis systems whose decision is based solely on color features were developed. The first system assumes that lesions are homogeneous and can be represented by a set of global features. This is the approach taken in the majority of the works described in the literature. In contrast, the second system assumes that lesions are not homogeneous and, consequently, cannot be represented by a global model. Accordingly, in systems based on this method, lesions are divided into several blocks which are assumed to contain homogeneous regions of the lesion.
Both systems were tested using, separately, seven color spaces for image representation, three types of descriptors and two classifiers. The best performance (SE = 100% and SP = 93%) was obtained by both systems, using the unidimensional color histogram as the descriptor and images represented in the opponent color space. The results show that color plays a fundamental role in the analysis of dermoscopy images.
Keywords
Skin lesions, Melanoma, Computer-Aided Diagnosis system, Color Features, Feature Extraction, Lesion Classification
Contents

1 Introduction
  1.1 Motivation
  1.2 Thesis goal and structure
  1.3 Contributions

2 Vision and Color
  2.1 Color Vision
  2.2 Physiology of the eye
  2.3 Color Vision Model
  2.4 Color Representation
  2.5 Color Spaces
    2.5.1 RGB
    2.5.2 HSV, HSL and HSI
    2.5.3 Opponent
    2.5.4 CIE System
    2.5.5 CIELAB and CIELUV
  2.6 Conclusions

3 Dermoscopy analysis
  3.1 Skin Lesions
    3.1.1 Melanomas
  3.2 Dermoscopy
  3.3 Medical Diagnostic Methods
    3.3.1 ABCD rule of dermoscopy
    3.3.2 7-point checklist
  3.4 Computer Aided Diagnosis system
    3.4.1 Image Segmentation
    3.4.2 Feature Extraction
    3.4.3 Lesion Classification
    3.4.4 Related works
  3.5 Color Features in CAD systems
  3.6 Conclusions

4 Lesion classification using global features
  4.1 Introduction
  4.2 Lesion Segmentation
  4.3 Feature extraction
    4.3.1 Color Histogram
    4.3.2 Generalized Color Moments
  4.4 Classification
    4.4.1 k Nearest Neighbors
      4.4.1.A Distance measures
    4.4.2 AdaBoost
  4.5 Conclusion

5 Lesion classification using local features
  5.1 Introduction
  5.2 Keypoint extraction
  5.3 Feature description
  5.4 Vocabulary construction
  5.5 Image Representation and Classification
  5.6 Conclusion

6 Results
  6.1 Database
  6.2 Preprocessing
  6.3 Evaluation metrics
  6.4 Cost function
  6.5 Global Methods
    6.5.1 Unidimensional Color Histogram
    6.5.2 Tridimensional Color Histogram
    6.5.3 Generalized Color Moments
  6.6 Local Methods
    6.6.1 Unidimensional Color Histogram
    6.6.2 Mean Color
    6.6.3 Generalized Color Moments
  6.7 Comparison between the global and local systems

7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future work

Bibliography
List of Figures

2.1 Electromagnetic spectrum [13].
2.2 Cones' response to light [63].
2.3 Anatomy of the human eye [47].
2.4 Color Vision Model proposed by W. Frei and B. Baxter (adapted from [20]).
2.5 R, G and B color matching functions [46].
2.6 CIE xyY chromaticity diagram [33].
3.1 Skin Lesion Classification.
3.2 Examples of melanomas.
3.3 Dysplastic Nevus.
3.4 Blue Nevus.
3.5 Seborrheic keratosis.
3.6 Classic Dermatoscope.
3.7 Modern Dermatoscope.
3.8 Dermatoscopic structures used as criteria in the 7-point checklist diagnostic algorithm [3].
3.9 Overall scheme of a CAD system for melanoma detection.
4.1 Overall description of the CAD system for melanoma classification using global features.
4.2 Dermoscopy image (left) and its corresponding binary mask (right). The white area of the mask corresponds to the skin lesion, whereas the black one represents the healthy tissue.
4.3 Regions defined by the two binary masks. The inner region, R1, is represented by the red region of the image. The border, R2, is represented by the yellow region of the image.
5.1 Overall description of the CAD system for melanoma classification using local features.
5.2 Representation of the regular grid over the dermoscopy image.
5.3 Local feature extraction: regular grid and segmentation mask (left) and valid patches (right).
6.1 ROC of the random guess classifier (red line) and cost line (dotted line).
6.2 Performance of the best classifier of each color space using the unidimensional color histogram as a global image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.3 Representation of the feature vectors obtained using the unidimensional color histogram as the image descriptor and the corresponding best set of parameters. The feature vectors that correspond to melanomas are shown on the left, those that correspond to non-melanomas on the right. The misclassified lesions are represented in magenta (left) and blue (right).
6.4 Performance of the best classifier in each color space using the tridimensional color histogram as a global image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.5 Representation of the feature vectors obtained by the best classifier based on the color histogram. The feature vectors that correspond to melanomas are shown on the left, those that correspond to non-melanomas on the right. These results were achieved using the tridimensional color histogram as a global image descriptor.
6.6 Scatter plot of features x555 and x665 for melanomas (red) and non-melanomas (blue). On the right, the lesions misclassified by the system are represented in green and black.
6.7 Performance of the best classifier in each color space using the generalized color moments as a global image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.8 Representation of the feature vectors obtained using the generalized color moments as the image descriptor and the corresponding best set of parameters. The feature vectors that correspond to melanomas are shown on the left, those that correspond to non-melanomas on the right. The misclassified lesions are represented in magenta (left) and blue (right).
6.9 Performance of the best classifier in each color space using the unidimensional color histogram as a local image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.10 Performance of the system as a function of K, using the unidimensional color histogram as a local image descriptor.
6.11 Performance of the system as a function of δ, using the unidimensional color histogram as a local image descriptor.
6.12 Performance of the system as a function of the number of bins of the histogram, using the unidimensional color histogram as a local image descriptor.
6.13 Performance of the best classifier in each color space using the mean color vector as a local image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.14 Performance of the system as a function of K, using the mean color vector as a local image descriptor.
6.15 Performance of the system as a function of δ, using the mean color vector as a local image descriptor.
6.16 Performance of the best classifier in each color space using the generalized color moments as a local image descriptor and kNN (left) or AdaBoost (right) as the classifier.
6.17 Performance of the system as a function of K, using the generalized color moments of the patch as a local image descriptor.
6.18 Performance of the system as a function of δ, using the generalized color moments as a local image descriptor.
6.19 Examples of images classified by the best global system: a TN (top left), an FN (bottom left) and a TP (bottom right). There was no FP classification.
6.20 Examples of images classified by the best local system: a TN (top left), an FN (bottom left) and a TP (bottom right). There was no FP classification.
List of Tables

3.1 Previous CAD systems for melanoma classification.
3.2 Color spaces and color features used in previous CAD systems for melanoma detection.
6.1 Best performance of the system for each color space, obtained using the unidimensional color histogram as a global image descriptor and kNN as the classifier. The measures used to evaluate the performance are the SE and SP values as well as their associated cost.
6.2 Best performance of the system for each color space, obtained using the unidimensional color histogram as a global image descriptor and AdaBoost as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.3 Best performance of the system for each color space, obtained using the tridimensional color histogram as a descriptor. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost. Parameters filled with '-' indicate that more than two possible configurations lead to the same result.
6.4 Best performance of the system for each color space, obtained using the tridimensional color histogram as a descriptor. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.5 Correspondence between the L*, a* and b* values and features. The variable i represents the number of the bin and goes from 1 up to 10.
6.6 Best performance of the system for each color space, obtained using the generalized color moments as a global image descriptor. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.7 Best performance of the system for each color space, obtained using the generalized color moments as a global image descriptor. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.8 Best performance of the system for each color space, obtained using the unidimensional color histogram as a local descriptor and kNN as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.9 Best performance of the system for each color space, obtained using the unidimensional color histogram as a local descriptor and AdaBoost as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.10 Best performance of the system for each color space, obtained using the mean color vector as the local feature descriptor and kNN as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.11 Best performance of the system for each color space, obtained using the mean color vector as the local feature descriptor and AdaBoost as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.12 Best performance of the system for each color space, obtained using the generalized color moments as local image descriptors and kNN as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost. Parameters filled with '-' indicate that more than two possible configurations lead to the same result.
6.13 Best performance of the system for each color space, obtained using the generalized color moments as the local feature descriptor and AdaBoost as the classifier. The measures used to evaluate the system's performance are the SE and SP values as well as their associated cost.
6.14 Best performance achieved for each descriptor of both CAD systems.
Abbreviations
CAD Computer-Aided Diagnosis
ADDI Automated Diagnosis of Dermoscopy Images
PSL Pigmented Skin Lesions
RGB Red, Green and Blue
S Short
M Median
L Long
HVS Human Visual System
HSV Hue, Saturation and Value
HSI Hue, Saturation and Intensity
HSL Hue, Saturation and Lightness
O1/2/3 Opponent
CIE Commission Internationale de L’Eclairage
CBIR content-based image retrieval
ED Euclidean Distance
KLD Kullback-Leibler Divergence
HI Histogram Intersection
DS Decision Stump
TP True Positive
TN True Negative
FP False Positive
FN False Negative
SE Sensitivity
SP Specificity
LOOCV Leave One Out Cross-Validation
kNN k Nearest Neighbors
SVM Support Vector Machines
LR Logistic Regression
LDA Linear Discriminant Analysis
ANN Artificial Neural Networks
BoF Bag-of-features
BoW Bag-of-words
ROC Receiver Operating Characteristic
1 Introduction
Contents
1.1 Motivation
1.2 Thesis goal and structure
1.3 Contributions
1.1 Motivation
Melanoma is a malignant lesion which results from the abnormal proliferation of melanocytes. As melanocytes are the cells which produce melanin, the levels of this substance are increased in the affected areas. For this reason, melanomas are usually associated with asymmetric lesions with irregular borders and increased diameters, which show some changes over time. Furthermore, melanomas are often characterized by the presence of more than one color, as well as of some differential structures [3]. The presence of these specific characteristics eases the classification task.
One of the major concerns about these lesions is their ability to metastasize. Once a melanoma has metastasized, the survival rate drops drastically and, in most cases, the disease is considered incurable. Conversely, if the melanoma is detected at the beginning of its development, it can easily be cured by a simple excision. Therefore, the early detection of melanomas is one of the major concerns of this field of research. However, for this to be achieved, sophisticated diagnosis techniques are required, since naked-eye examination is not meticulous enough to differentiate the specific characteristics of a melanoma from those of other types of lesions [3, 29].
Dermoscopy has been a popular choice among dermatologists for diagnosing Pigmented Skin Lesions (PSL). In dermoscopy, dermatologists use a dermatoscope to obtain magnified images of the lesions. Furthermore, it is possible to visualize the morphological structures of the lesion, since the reflective properties of the skin are temporarily eliminated by applying a liquid solution or cross-polarized light to the lesion. Even though this technique provides a better visualization of the lesions, it may still be difficult to reach a diagnosis. In order to overcome these difficulties, some medical diagnostic methods were proposed to aid dermatologists. Two of the most popular methods are the ABCD rule of dermoscopy and the 7-point checklist [6, 41], which are further explained in Sections 3.3.1 and 3.3.2.
Nevertheless, the use of these methods does not always lead to satisfactory results. The interpretation of dermoscopy images is subjective and, even if performed by experienced dermatologists,
may lead to an incorrect or inconclusive diagnosis.
For these reasons, Computer-Aided Diagnosis (CAD) systems for melanoma classification have been developed over the last years. These systems aim to aid dermatologists in the diagnosis of skin lesions [12, 31].
The majority of the developed systems perform a binary classification of the lesions into melanoma or non-melanoma. So far, classification has mainly been based on the ABCD rule of dermoscopy: symmetry-, border-, color- and/or texture-related features are usually extracted from the dermoscopy images and used for classification.
However, it is not yet known which features lead to the best system performance. The majority of systems use more than one type of feature for image description, and there is not much research on how each type of feature contributes to melanoma classification.
1.2 Thesis goal and structure
This master thesis is integrated in the Automated Diagnosis of Dermoscopy Images (ADDI) project
(PTDC/SAUBEB/103471/2008), funded by FCT, which is being developed by the Faculdade de Ciências
da Universidade do Porto in consortium with Instituto Superior Técnico, in Lisbon, and Hospital Pedro
Hispano, in Matosinhos.
As mentioned in Section 1.1, it is not yet known which features are best suited to distinguish between melanomas and non-melanomas. Even though features such as shape, size and texture, among others, also provide relevant information about skin lesions, this thesis focuses only on the information given by color features. The goal of this thesis is to evaluate the contribution of color to melanoma detection and to choose, from a set of different color features and color spaces, the ones which perform best. Furthermore, a comparison between global and local extraction of color features is also intended.
Two different color-based CAD systems are developed in this thesis: one performs classification based on global feature extraction, whereas the other bases its decision on local features. The former assumes that lesions are homogeneous and can be described by global parameters (e.g., color histograms and color moments). On the contrary, the latter assumes that lesions are not homogeneous and, thus, cannot be represented by a single model or by average properties. As a result, this second strategy divides the lesion into several patches (or blocks), which are assumed to be homogeneous. Each patch is then described by a set of parameters (e.g., color histograms, color moments and average color). Finally, the fusion of the parameters of all the patches describes the image.
This master thesis is divided into six chapters, excluding the introductory one. Chapter 2, named Vision and Color, explores the basic color and vision theories in order to give an insight into color properties and their importance. This chapter is organized in the following manner: Section 2.1 highlights the most remarkable discoveries about color vision, Section 2.2 briefly describes how light enters the eye, Section 2.3 explains a non-linear model of color vision, Section 2.4 shows how almost every color can be obtained by a mixture of three colors, and Section 2.5 describes several color spaces.
Chapter 3, entitled Dermoscopy analysis, gives an introduction to PSL (Section 3.1) and to dermoscopy (Section 3.2). Furthermore, it explains the two most popular medical diagnostic methods used by dermatologists to identify melanomas in dermoscopy images (Section 3.3) and introduces CAD systems for melanoma detection (Section 3.4), as well as the color features and color spaces which have already been used in these systems (Section 3.5).
Chapter 4 describes the CAD system developed for melanoma detection using globally extracted color features. Each section of this chapter describes a stage of the system, which is divided into three main stages: image segmentation (Section 4.2), feature extraction (Section 4.3) and classification (Section 4.4).
Chapter 5, in turn, describes the second CAD system, which performs image classification based on locally extracted color features. This second system is based on the Bag-of-features (BoF) model and is slightly more complex. The system is divided into six main stages: image segmentation, keypoint extraction (Section 5.2), feature description (Section 5.3), vocabulary construction (Section 5.4), and image representation and classification (Section 5.5). As in Chapter 4, each section of the chapter describes a stage of the system.
Chapter 6 discusses the results obtained by applying both methods. Section 6.1 describes the database used in this thesis and Section 6.2 describes the image pre-processing operations. Section 6.3 explains the metrics chosen to evaluate the system's performance and Section 6.4 describes the criterion used to determine the best performance of the system. Sections 6.5 and 6.6 show the results obtained by applying the systems proposed in Chapters 4 and 5, respectively. In the final section of this chapter (Section 6.7), the two systems are compared.
Finally, the conclusions of this thesis are presented in Chapter 7, along with some proposals for future work.
1.3 Contributions
In this thesis, two different algorithms for the classification of pigmented skin lesions were developed. Both systems have the particularity of solely using color features to describe the images. However, the systems differ in the feature extraction stage. While the first considers that the lesions are homogeneous and can be represented by a set of global features, the second considers the lesions non-homogeneous and uses several local feature vectors to represent the images. These algorithms make the following contributions:
• Assessment of the role of color as a global descriptor - The first algorithm makes use of different color descriptors to assess the importance of color as a global descriptor in CAD systems
for PSL classification.
• Assessment of the role of color as a local descriptor - The second algorithm makes use of
different color descriptors to assess the importance of color as a local descriptor in CAD systems
for PSL classification. This is a novel approach since it has not been adopted before in the
literature.
• Introduction of the opponent color space for PSL classification - In addition to some color
spaces already used in previous studies of CAD systems for PSL classification, this work also
uses the opponent color space. This color space had already been used for image processing
in other fields of research.
• Introduction of the generalized color moments as descriptors in CAD systems for melanoma classification - In addition to the statistical measures and color histograms, which had already been used in previous studies of CAD systems for PSL classification, this work introduces the generalized color moments, developed by Mindru et al., as color descriptors in these systems.
2 Vision and Color
Contents
2.1 Color Vision
2.2 Physiology of the eye
2.3 Color Vision Model
2.4 Color Representation
2.5 Color Spaces
2.6 Conclusions
Color and vision are closely related and, thus, it would be inaccurate to introduce color concepts without mentioning the Human Visual System (HVS). Even though the representation of objects in the human brain is colored, the light emitted by the objects is itself devoid of color. Color is therefore a sensation created by the HVS [14].
Accordingly, this chapter aims not only to introduce some color concepts but also to familiarize the reader with the HVS. It is organized as follows: Section 2.1 introduces the main color vision theories and mentions the major discoveries about the HVS. Section 2.2 contains a broad introduction to the physiology of the eye. Section 2.3 presents a nonlinear color vision model. Section 2.4 explains how almost every color can be obtained by a combination of the three primary colors, and Section 2.5 introduces the most important color spaces.
2.1 Color Vision
Even though other color theories had already been presented, the first major color theory was the one introduced by Newton in the 18th century. Newton observed that, when passing through a prism, sunlight was refracted and divided into seven different colors. Through this experiment, Newton showed that sunlight is not pure; instead, it results from the mixture of seven different colors. Furthermore, by using another prism, Newton tested whether these colors were pure or whether they could be further separated into more colors. He observed that the colors did not undergo further division and, thus, they were considered pure colors, or hues. The seven hues, red, orange, yellow, green, blue, indigo and violet, are denominated spectral hues. Additional hues can be obtained by mixing these spectral hues. A color that can be obtained both by the division of sunlight and by the mixture of other hues is denominated a metameric color [14].
The spectrum of visible light is a part of the electromagnetic spectrum: light corresponds to electromagnetic radiation with a wavelength between 380 and 750 nm. Each of the spectral hues observed by Newton belongs to a particular region of the visible spectrum. Figure 2.1 shows which region of the spectrum corresponds to each hue.
Figure 2.1: Electromagnetic spectrum [13].
However, the question of how humans perceive light as a mixture of different colors was only answered in 1802 by Thomas Young, who postulated that the human eye is composed of three types of photoreceptor cells, each absorbing in a particular region of the light spectrum. According to his theory, the information collected by the three photoreceptors together enables color vision. Later, in 1850, Hermann von Helmholtz continued Young's work and stated that the three photoreceptors have their absorption peaks in the red, green and blue regions of the visible spectrum. Hence, from their discoveries, it was found that almost every color can be obtained from a mixture of red, green and blue [14, 45]. Their theory became known as the trichromatic theory.
Around 1830, a retina was observed for the first time under a microscope by a group of German scientists, who identified two groups of cells, later named cones and rods due to their shape. Subsequent studies found that these cells were sensitive to light. Nevertheless, only in 1866, after the trichromatic theory had been proposed, did Max Schultze observe through his experiments that the cones were sensitive to colored glass and, thus, to color [56, 62].
The first experiments to measure the absorbance spectra of the human photoreceptors were performed in 1964 by W. B. Marks, W. H. Dobelle and E. F. MacNichol [36], and by P. K. Brown and G. Wald [10], using a microspectrophotometer. From these experiments the absorption curves of the three cones were deduced; they are shown in Figure 2.2. The three types of cones are denominated Short (S), Median (M) or Long (L), according to the gamut of wavelengths they absorb. As had already been stated by Helmholtz, the S, M and L cones have their absorbance peaks in the blue, green and red regions of the visible spectrum [14].
Figure 2.2: Cones’ response to light [63].
Around 1870, Ewald Hering proposed the opponent-color theory. According to his observations, no color results from a mixture of red and green, nor from a mixture of yellow and blue. Furthermore, he observed that dichromat observers could not distinguish between red and green or between blue and yellow. These and some other observations led him to argue that, instead of three cones absorbing in the regions of the spectrum corresponding to red, green and blue, humans have two chromatic cones and one achromatic cone with opposite responses. The responses of the chromatic cones would vary, separately, from red to green and from yellow to blue, whereas that of the achromatic cone would vary from black to white [11, 45].
In 1881 and 1905, two-stage color vision models that combined the Young-Helmholtz and Hering theories were proposed by Donders and von Kries, respectively. However, their work did not receive sufficient credit. Only in 1955 and 1956 were Jameson and Hurvich able to show quantitatively that the HVS can in fact be represented by two stages. The first stage occurs at a receptoral level and is in agreement with the trichromatic theory, whereas the second stage occurs at a post-receptoral level and is consistent with the opponent color theory [15].
2.2 Physiology of the eye
Light enters the eye through the cornea and passes through the crystalline lens, whose shape is adapted by the ciliary muscle in order to focus the light on the retina. The iris controls the amount of light that enters the eye through the pupil by adjusting its diameter. When the light reaches the retina, it is absorbed by the rods and the cones, which are, as mentioned in Section 2.1, responsible for achromatic and chromatic vision, respectively. In the retina, light is converted into neural signals, which are conducted through the optic nerve to the brain. A scheme of the eye's anatomy is shown in Figure 2.3 [14, 50].
Figure 2.3: Anatomy of the human eye [47].
There are only about 800 000 nerve fibers for more than 100 000 000 photoreceptors and, thus, each nerve fiber has to be connected to several photoreceptors. However, not all photoreceptors contribute equally to the generation of the nerve impulse; this mechanism is denominated lateral inhibition. Each photoreceptor has an associated weight, which can be positive or negative, and negative weights represent inhibitory responses of the photoreceptors [50].
2.3 Color Vision Model
Several experiments have confirmed the nonlinearity of the HVS [25]. It has been shown that the photoreceptors' response is not proportional to the intensity of the incident light. Experimental measurements have shown that the spatial frequency response of the HVS can be approximated by a logarithmic function, although this approximation is inaccurate for very high or very low spatial frequencies.
Color vision models that consider both the nonlinear response of the retina and the lateral inhibition mechanism explained in the previous section have already been proposed. An example of a nonlinear color vision model is the one proposed by W. Frei and B. Baxter in 1977 [20]; an adapted scheme of this model is given in Figure 2.4.
Figure 2.4: Color Vision Model proposed by W. Frei and B. Baxter (adapted from [20]).
The proposed model is a two-stage model, since it combines the trichromatic and opponent color vision theories. In the first stage, based on the trichromatic theory, the light that reaches the retina is absorbed by the three types of cones: red (or L), green (or M) and blue (or S). In response, the cones produce the signal given by equation (2.1), in which $i \in \{1, 2, 3\}$, $C$ is the spectral energy distribution of the incident light and $s_i$ is the spectral sensitivity function of each cone. The nonlinear response of the cones is then approximated by a logarithmic function, so the resulting signals are given by equation (2.2).
In the second stage, based on the opponent color theory, the cones' response is separated into chromatic and achromatic information. The achromatic information is given by equation (2.3), whereas the chromatic information is given by the two weighted linear differences shown in equations (2.4) and (2.5). Finally, each signal is subjected to a spatial band-pass filter which simulates the lateral inhibition effect.
$$T_i = \int C(\lambda)\, s_i(\lambda)\, d\lambda \tag{2.1}$$
$$D_i^* = \log T_i \tag{2.2}$$
$$G_1^* = c_1 D_1^* \tag{2.3}$$
$$G_2^* = c_2 (D_2^* - D_1^*) \tag{2.4}$$
$$G_3^* = c_3 (D_3^* - D_1^*) \tag{2.5}$$
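As a minimal numerical sketch, equations (2.1)-(2.5) can be evaluated as follows. The cone sensitivity curves and the incident spectrum used here are illustrative Gaussian stand-ins, not measured data, and the weights $c_i$ are set arbitrarily; the spatial band-pass stage is omitted.

```python
import numpy as np

wavelengths = np.arange(380, 751)  # visible range in nm

def gaussian(peak, width):
    # Hypothetical bell-shaped spectrum; a stand-in for real data.
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

# Assumed sensitivities for the L (red), M (green) and S (blue) cones.
s = np.stack([gaussian(560, 50), gaussian(530, 45), gaussian(420, 30)])
C = gaussian(550, 80)  # an arbitrary incident light spectrum

# First stage (trichromatic): cone responses T_i, eq. (2.1),
# followed by the logarithmic nonlinearity D_i* = log T_i, eq. (2.2).
T = np.trapz(s * C, wavelengths, axis=1)
D = np.log(T)

# Second stage (opponent): one achromatic and two chromatic signals,
# eqs. (2.3)-(2.5); the weights c_i are set to 1 for illustration.
c1 = c2 = c3 = 1.0
G1 = c1 * D[0]            # achromatic channel, eq. (2.3)
G2 = c2 * (D[1] - D[0])   # first chromatic channel, eq. (2.4)
G3 = c3 * (D[2] - D[0])   # second chromatic channel, eq. (2.5)
print(G1, G2, G3)
```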
2.4 Color Representation
Color perception results from the response of the HVS to a light stimulus. As mentioned in the previous section, the response of the cones depends on the light spectrum $C(\lambda)$ which reaches the retina and is given by equation (2.1), in which $s_i$ is the sensitivity function of each cone and $i \in \{1, 2, 3\}$ corresponds to the red, green and blue cones, respectively.
Since the perception of color does not only depend on the incident light spectrum, but also on the
response of the cones, which only respond to particular wavelengths, it is possible for two different
spectra to originate the same visual perception. However, for this to happen, the response of the
three cones has to be the same for both spectra. Therefore, two different spectra C1 and C2 will only
achieve the same cones’ response if
$$\int C_1(\lambda)\, s_i(\lambda)\, d\lambda = \int C_2(\lambda)\, s_i(\lambda)\, d\lambda \quad \forall i \in \{1, 2, 3\}. \tag{2.6}$$
In fact, it is this property of the HVS that enables the representation of almost every color by a mixture of the spectra of the three primaries red, green and blue. Even though a light spectrum is usually represented by a complex function that cannot be matched by a linear combination of the spectra of the three primaries, most of the time it is possible to combine the primaries so that both spectra are perceived as equal.
Considering $P_i(\lambda)$ the spectrum of primary $i$ and $c_i$ a constant that represents the amount of each primary, two spectra are perceived as the same if [37, 45]
$$\int C(\lambda)\, s_i(\lambda)\, d\lambda = \int \sum_{j=1}^{3} c_j P_j(\lambda)\, s_i(\lambda)\, d\lambda \quad \forall i \in \{1, 2, 3\}. \tag{2.7}$$
Condition (2.7) can be represented by a system of three linear equations:
$$\begin{bmatrix}
\int P_1(\lambda)s_1(\lambda)d\lambda & \int P_2(\lambda)s_1(\lambda)d\lambda & \int P_3(\lambda)s_1(\lambda)d\lambda \\
\int P_1(\lambda)s_2(\lambda)d\lambda & \int P_2(\lambda)s_2(\lambda)d\lambda & \int P_3(\lambda)s_2(\lambda)d\lambda \\
\int P_1(\lambda)s_3(\lambda)d\lambda & \int P_2(\lambda)s_3(\lambda)d\lambda & \int P_3(\lambda)s_3(\lambda)d\lambda
\end{bmatrix}
\begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}
=
\begin{bmatrix}
\int C(\lambda)s_1(\lambda)d\lambda \\
\int C(\lambda)s_2(\lambda)d\lambda \\
\int C(\lambda)s_3(\lambda)d\lambda
\end{bmatrix} \tag{2.8}$$
Two spectra can only be perceived as the same if this system has a solution with $c_i \geq 0$ for all $i \in \{1, 2, 3\}$ [37, 45].
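As an illustration, the following sketch sets up and solves the linear system (2.8) for hypothetical primaries and cone sensitivities; all spectra are placeholder Gaussians, so the resulting amounts are purely illustrative.

```python
import numpy as np

wavelengths = np.arange(380, 751)

def gaussian(peak, width):
    return np.exp(-0.5 * ((wavelengths - peak) / width) ** 2)

# Assumed cone sensitivities s_i and primary spectra P_j (illustrative only).
s = np.stack([gaussian(560, 50), gaussian(530, 45), gaussian(420, 30)])
P = np.stack([gaussian(610, 20), gaussian(545, 20), gaussian(450, 20)])
C = gaussian(500, 60)  # target spectrum to be matched

# A[i, j] = integral of P_j(lambda) s_i(lambda) dlambda, as in eq. (2.8);
# b[i] = integral of C(lambda) s_i(lambda) dlambda.
A = np.trapz(s[:, None, :] * P[None, :, :], wavelengths, axis=2)
b = np.trapz(s * C, wavelengths, axis=1)

c = np.linalg.solve(A, b)
# A match with physical lights additionally requires c_i >= 0 for every i.
print(c, np.all(c >= 0))
```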
2.5 Color Spaces
This section gives an introductory overview of the most important color spaces. The color spaces discussed here can be divided into two distinct groups [57]: HVS-based color spaces and CIE color spaces. The former includes all color spaces with a physiological background, as well as the color spaces derived from them. This group includes the RGB color space, which is based on the trichromatic theory; the Opponent (O1/2/3) color space, which derives from the opponent color theory; and the phenomenal color spaces, which derive from the RGB color space and include the Hue, Saturation and Value (HSV), Hue, Saturation and Lightness (HSL) and Hue, Saturation and Intensity (HSI) color spaces.
The second group is composed of the color spaces defined by the CIE and includes the CIELAB and CIELUV color spaces.
2.5.1 RGB
There are several color spaces which can be used for image representation. One of the most popular color spaces is based on the trichromatic theory and, thus, defines each color as a combination
of the three primary colors. This color space is denominated RGB.
The RGB color space is geometrically defined by a cube in which each coordinate can take values from 0 up to 255. The axis that links the vertices defined by the coordinates (0, 0, 0) and (255, 255, 255), which respectively represent pure black and pure white, is denominated the achromatic axis. All the colors along this axis are represented by equal values of R, G and B and are considered neutral colors [50].
The devices which capture images in the RGB color space have three sensors, one for each primary color. The information captured by each sensor depends not only on the incident light but also on the sensitivity function intrinsic to the sensor. Since the obtained image depends on the sensitivity functions of the sensors, and since these functions differ between devices, the RGB color space is considered device-dependent [57]. This is an undesirable characteristic, since it complicates the comparison between images obtained with different devices.
Another drawback of this color space, when applied to natural images, is the high correlation between the red, green and blue channels. Studies have shown correlations of 0.78 between the blue and red channels, 0.98 between the red and green channels, and 0.94 between the green and blue channels [44]. Furthermore, this color space is perceptually non-uniform, which means that two different pairs of colors separated by the same distance may not have the same visual importance [57].
2.5.2 HSV, HSL and HSI
The RGB color space can be transformed into other color spaces. A possible transformation is to convert the rectangular coordinates into cylindrical ones, so that the new color space is represented by a cylinder instead of a cube. In this cylindrical color space, each color is described as a combination of hue, saturation and brightness rather than as a combination of red, green and blue. Hue is the angular coordinate and varies from 0° to 360°; it represents the predominant color of a given region. The primaries red, green and blue have hues of 0°, 120° and 240°, respectively. Saturation is the amount of pure color present in the region with respect to the amount of white. Saturation ranges from 0 to 1, and colors with saturation 1 are denominated pure colors. All the primary and secondary colors, as well as the combinations of adjoining pairs of them, are considered pure colors. Brightness, in turn, measures the amount of white present in the color. It varies from 0 to 255, where 0 corresponds to pure black and 255 to pure white. There are several brightness functions, among which the most common are value, lightness and intensity. Depending on the chosen brightness function, the cylindrical color space is denominated HSV, HSL or HSI, respectively [59].
Although these color spaces are described here as cylindrical, the primary geometrical structures obtained from these transformations are not cylinders; they are hex-cones or double hex-cones, depending on the brightness function. If the chosen function is value or intensity, a hex-cone is obtained. On the other hand, if the chosen function is lightness, a double hex-cone is obtained. To better understand how these geometric structures arise, the transformations are explained in further detail below.
Firstly, the vertical axis of the new color space is defined. This axis corresponds to the achromatic axis of the RGB color space, and the values along it represent the brightness values. Each brightness value is intersected by a chromatic plane. These planes are obtained differently for the HSV and HSI color spaces than for the HSL color space. In the former two, for each brightness value $l$, the chromatic plane is defined by the projection onto a plane of the three faces of the cube which contain the vertex $(l, l, l)$. These projections have a hexagonal shape and all together form the hex-cone. In the latter, a cube with principal diagonal between $(0, 0, 0)$ and $(2l, 2l, 2l)$ is defined for brightness $l_{HSL} \leq \frac{1}{2}$, whereas for $l_{HSL} > \frac{1}{2}$ a cube with principal diagonal between $(2l - 1, 2l - 1, 2l - 1)$ and $(1, 1, 1)$ is defined. In this case, the chromatic planes result from the projection of the triangles obtained by linking the point $(l, l, l)$ to the six vertices of the defined cube. The chromatic plane with the largest surface occurs for $l_{HSL} = \frac{1}{2}$ and, thus, the tridimensional shape of this color space is a double hex-cone [26, 59].
The HSV, HSL and HSI are considered phenomenal color spaces. They are based on Newton's color circle, in which colors are discriminated by their hue and saturation. Because they derive from the RGB color space, they are also based on the HVS [57].
The problem with this tridimensional representation is the high risk of choosing values outside the range of valid colors. Therefore, in order to define a clearer color space, a cylindrical geometry was chosen. Nonetheless, this cylindrical geometry also has some shortcomings. Firstly, pure black and pure white can achieve high saturation values, which does not make sense given the definitions of saturation and brightness. Furthermore, the same numerical difference between two colors represents a small color variation in a low-brightness region but a large one in a high-brightness region. As a result, these color spaces are also perceptually non-uniform.
More than one transformation can be used to convert the RGB color space into the HSV, HSI or HSL color space. In this work, the transformations proposed in [59] are used; see equations (2.9), (2.10), (2.11), (2.12) and (2.13).
$$H = \arccos\left(\frac{\frac{1}{2}\left[(R - G) + (R - B)\right]}{\sqrt{(R - G)^2 + (R - B)(G - B)}}\right) \tag{2.9}$$
$$S = 1 - 3\,\frac{\min(R, G, B)}{R + G + B} \tag{2.10}$$
$$I = \frac{R + G + B}{3} \tag{2.11}$$
$$V = \max(R, G, B) \tag{2.12}$$
$$L = \frac{\max(R, G, B) + \min(R, G, B)}{2} \tag{2.13}$$
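A direct Python transcription of these equations for a single pixel might look as follows. The handling of the B > G case and the guard against a zero denominator are standard conventions added here; they are not part of equation (2.9) as printed.

```python
import numpy as np

def hue(R, G, B):
    # Eq. (2.9); for B > G the usual convention takes H = 2*pi - H.
    num = 0.5 * ((R - G) + (R - B))
    den = np.sqrt((R - G) ** 2 + (R - B) * (G - B)) + 1e-12  # avoid 0/0
    H = np.arccos(np.clip(num / den, -1.0, 1.0))
    return 2 * np.pi - H if B > G else H

def saturation(R, G, B):
    return 1 - 3 * min(R, G, B) / (R + G + B)          # eq. (2.10)

def intensity(R, G, B):
    return (R + G + B) / 3                             # eq. (2.11)

def value(R, G, B):
    return max(R, G, B)                                # eq. (2.12)

def lightness(R, G, B):
    return (max(R, G, B) + min(R, G, B)) / 2           # eq. (2.13)

print(hue(200.0, 80.0, 40.0), saturation(200.0, 80.0, 40.0))
```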
2.5.3 Opponent
The opponent color space described in this thesis can also be obtained by a linear transformation of the RGB space. This color space is based on Hering's opponent process color theory, proposed around 1870. According to Hering, no mixture of the three standard primary colors defines the color yellow; hence, he argued that yellow should also be considered a primary color. Furthermore, Hering also stated that no single color results from the combination of red and green or of yellow and blue. As a consequence, he claimed the existence of two opposite color channels in our visual system, one going from green to red (O1) and another going from blue to yellow (O2). These channels are independent of each other; for instance, colors resulting from the combinations green-yellow and red-yellow exist. In addition to these two chromatic channels, there is an achromatic channel (O3). This channel goes from black to white and thus encodes the lightness of the color [11].
In fact, years after this theory had been proposed, a type of cells named ganglion cells was found on the surface of the retina. It is believed that the ganglion cells are responsible for converting the red, green and blue signals output by the cones into variations between red and green, yellow and blue, and black and white (lightness) [14].
This color space is geometrically defined by a parallelepiped. The coordinates of the opponent
color space can be obtained from the RGB coordinates through equations (2.14), (2.15) and (2.16)
[58].
$$O_1 = \frac{R - G}{\sqrt{2}} \tag{2.14}$$
$$O_2 = \frac{R + G - 2B}{\sqrt{6}} \tag{2.15}$$
$$O_3 = \frac{R + G + B}{\sqrt{3}} \tag{2.16}$$
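Since the transformation is linear, it can be applied to a whole image with a single matrix product, as in the following sketch; the placeholder random image only stands in for a real dermoscopy image.

```python
import numpy as np

# Rows of M encode equations (2.14)-(2.16) applied to an (R, G, B) vector.
M = np.array([[1 / np.sqrt(2), -1 / np.sqrt(2),  0.0],
              [1 / np.sqrt(6),  1 / np.sqrt(6), -2 / np.sqrt(6)],
              [1 / np.sqrt(3),  1 / np.sqrt(3),  1 / np.sqrt(3)]])

def rgb_to_opponent(img):
    """img: H x W x 3 float array; returns the O1, O2, O3 channels."""
    return img @ M.T  # applies the 3x3 transform to every pixel

img = np.random.rand(4, 4, 3)  # placeholder image
print(rgb_to_opponent(img)[0, 0])
```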
2.5.4 CIE System
In 1931, the Commission Internationale de L'Eclairage (CIE) developed, for the first time, a scientific method to define colors. Before that, colors were described by comparison, which made color discrimination a difficult task. The CIE system is based on the findings of W. David Wright and John Guild. In their experiments, a group of subjects had to match a monochromatic test light of wavelength λ, projected onto a screen, by using a combination of three lights emitting at the wavelengths of the primaries red, green and blue. By averaging the subjects' results, a color matching function was determined for each primary [2]. These functions aim to represent the cones' response as a function of the wavelength of the incident light. Figure 2.5 shows the behavior of these matching functions. A particularly interesting fact about the red color matching function is that it can sometimes take negative values. A negative value occurs when a primary has to be added to the test light in order to obtain a match [33]. The color space defined by these functions is denominated CIE RGB.
In order to avoid the possible complications of simultaneously using positive and negative values, the CIE developed the CIE XYZ color space, which is obtained by applying a linear transformation to the CIE RGB. In this space, colors are identified by their X, Y and Z values, which are imaginary primaries. The variable Y corresponds to the color's luminance, whereas the variables X and Z define the color's chromaticity. According to [50], the X, Y and Z coordinates can be obtained from the RGB color space by using equations (2.17), (2.18) and (2.19).
Figure 2.5: R, G and B color matching functions [46].
$$X = 0.490R + 0.310G + 0.200B \tag{2.17}$$
$$Y = 0.177R + 0.812G + 0.011B \tag{2.18}$$
$$Z = 0.010G + 0.990B \tag{2.19}$$
An alternative to the CIE XYZ space is the CIE xyY. This color space is obtained from the X, Y and Z variables using equations (2.20), (2.21) and (2.22). The variables x and y represent the color's chromaticity, whereas Y represents the luminance. Figure 2.6 shows the CIE x-y chromaticity diagram. In this chart, the monochromatic colors lie along the curvilinear edge, named the spectral locus, whereas the straight edge, denominated the line of purples, only defines the limits of the color space. The values around the spectral locus are the wavelengths of the monochromatic colors, in nanometers. The triangle limits the possible range of colors in the CIE RGB color space [33].
Figure 2.6: CIE xyY chromaticity diagram [33].
$$x = \frac{X}{X + Y + Z} \tag{2.20}$$
$$y = \frac{Y}{X + Y + Z} \tag{2.21}$$
$$Y = Y \tag{2.22}$$
2.5.5 CIELAB and CIELUV
The CIELAB and the CIELUV are perceptually uniform color spaces, which means that they follow
the non-linear response of the HVS: color transitions with the same visual importance are separated
by the same Euclidean distance [57]. These color spaces were adopted by the CIE in 1976.
The L*a*b* color space results from a non-linear transformation of the CIE XYZ color space, see
equations (2.23), (2.24), (2.25) and (2.26). This non-linear transformation tries to mimic the non-linear
response of the human eye.
On the other hand, the L*u*v* color space was created for the cases in which the perceived difference
between two colors is proportional to their distance in the color space. In this case, the CIE XYZ
coordinates undergo the transformation of equations (2.27), (2.28), (2.29) and (2.30) [32].
Both the L*a*b* and the L*u*v* color spaces have three main variables: L*, a* and b* in the former
and L*, u* and v* in the latter. L* represents lightness, whereas a* and b*, and u* and v*, represent both
the hue and saturation of a color. These color spaces are also based on Hering's opponent
process color theory, since L* goes from black to white, a* and u* go from green to red, and b* and v*
go from blue to yellow [32].
\[ L^* = 116\, f\!\left(\frac{Y}{Y_n}\right) - 16 \tag{2.23} \]

\[ a^* = 500 \left[ f\!\left(\frac{X}{X_n}\right) - f\!\left(\frac{Y}{Y_n}\right) \right] \tag{2.24} \]

\[ b^* = 200 \left[ f\!\left(\frac{Y}{Y_n}\right) - f\!\left(\frac{Z}{Z_n}\right) \right] \tag{2.25} \]

\[ f(x) = \begin{cases} \sqrt[3]{x} & x > 0.008856 \\ 7.787\,x + \frac{16}{116} & \text{otherwise} \end{cases} \tag{2.26} \]

\[ u^* = 13 L^* (u' - u'_n) \tag{2.27} \]

\[ v^* = 13 L^* (v' - v'_n) \tag{2.28} \]

\[ u' = \frac{4X}{X + 15Y + 3Z} \tag{2.29} \]

\[ v' = \frac{9Y}{X + 15Y + 3Z} \tag{2.30} \]
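To make the chain of transformations concrete, the following minimal sketch converts RGB to L*a*b* through CIE XYZ, using the matrix of equations (2.17)-(2.19) and the non-linearity of equation (2.26). The equal-energy reference white (Xn = Yn = Zn = 1), which is consistent with that matrix for R = G = B = 1, is an assumption made here for illustration:

```python
import numpy as np

def f(t):
    """Non-linearity of equation (2.26)."""
    return np.where(t > 0.008856, np.cbrt(t), 7.787 * t + 16.0 / 116.0)

def rgb_to_lab(rgb, white=(1.0, 1.0, 1.0)):
    """Convert RGB (floats in [0, 1]) to CIELAB via CIE XYZ."""
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    X = 0.490 * R + 0.310 * G + 0.200 * B   # eq. (2.17)
    Y = 0.177 * R + 0.812 * G + 0.011 * B   # eq. (2.18)
    Z = 0.010 * G + 0.990 * B               # eq. (2.19)
    Xn, Yn, Zn = white
    L = 116.0 * f(Y / Yn) - 16.0            # eq. (2.23)
    a = 500.0 * (f(X / Xn) - f(Y / Yn))     # eq. (2.24)
    b = 200.0 * (f(Y / Yn) - f(Z / Zn))     # eq. (2.25)
    return np.stack([L, a, b], axis=-1)
```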
2.6 Conclusions
To sum up, color is a sensation created by the HVS as a response of the retina to a visible light
spectrum. The retina contains two types of photoreceptors: the rods and the cones. Color perception
is based on the response of the cones to a light stimulus. At the receptoral level, there are three types
of cones (red, green and blue), which have their peaks of absorption in the red, green and blue regions
of the spectrum. At a post-receptoral level, the ganglion cells in the retina transform the information
provided by the cones into three opponent channels whose signals go from black to white, yellow to
blue and red to green.
In order to represent the retina's behavior, several color vision models have been proposed. The
more recent models assume a non-linear response of the cones to the incident light and use a
logarithmic function to mimic this response.
Colors can be represented by several alternative spaces. In this thesis, color spaces were separated
into two groups. The first includes the color spaces based on the HVS, such as the RGB, the
O1/2/3 color space and the phenomenal (HSV, HSL and HSI) color spaces. The second is formed
by the color spaces proposed by the CIE, such as the CIE XYZ and the perceptually uniform color
spaces (CIELAB and CIELUV).
3 Dermoscopy analysis

Contents
3.1 Skin Lesions
3.2 Dermoscopy
3.3 Medical Diagnostic Methods
3.4 Computer Aided Diagnosis system
3.5 Color Features in CAD systems
3.6 Conclusions
3.1 Skin Lesions
Skin lesions can appear at birth or later in life and depend not only on genetics but also on sunlight
exposure. There are several types of skin lesions, and their classification depends on their location
and origin.
Skin can be roughly divided into three layers: the epidermis, the dermis and the hypodermis.
The epidermis is the most external layer, whereas the hypodermis is the most internal one. Located
between them we have the dermis. According to its location, a lesion might be classified as
junctional, intradermal or compound. A lesion is considered to be junctional if it is placed in the
junction between the epidermis and the dermis, is considered to be intradermal if located within the
dermis and is considered a compound if it results from the clustering of junctional and intradermal
lesions.
Furthermore, lesions can be classified as melanocytic or non-melanocytic according to their origin.
If a lesion derives from an increased number of melanocytes it is denominated a melanocytic lesion.
The melanocytes produce a dark pigment named melanin. For this reason, the majority of melanocytic
lesions present a darker color than the surrounding skin. The main classes of melanocytic lesions
are melanocytic nevi, lentigines and melanomas. While the former two are benign, melanomas are
malignant [3].
On the other hand, if the lesion has any other origin, it is classified as a non-melanocytic lesion.
Non-melanocytic lesions can be mainly divided into basal cell carcinoma (BCC), seborrheic keratosis,
vascular lesions and dermatofibroma. Like melanoma, BCC is also a malignant lesion. However,
unlike melanomas, BCCs have a very low growth rate and represent a considerably smaller threat.
Nevertheless, if not treated, they might cause severe damage to the surrounding tissues or even death.
Figure 3.1 shows a scheme of the main lesion classes.
Figure 3.1: Skin Lesion Classification
3.1.1 Melanomas
A melanoma results from an uncontrolled division of melanocytes. The two major concerns regarding melanomas are their increased growth rate and their ability to metastasize. This high growth
rate is responsible for the major characteristics of a melanoma. Melanomas grow asymmetrically and
present a high variegation and some differential structures as a consequence of the irregular distribution of melanin across the lesion. These particular features ease the classification of melanomas.
Medical diagnostic algorithms based on the presence or absence of these features, such as the ABCD
rule of dermoscopy [41](see Section 3.3.1), were developed to aid dermatologists. Figure 3.2 shows
two examples of melanomas which exhibit these traits.
Another common feature in melanomas is the presence of differential structures. The differential
structures that more strongly indicate a melanoma are: atypical pigment network, blue-whitish veil and
atypical vascular pattern. Furthermore, melanomas may also have irregular streaks, dots or globules
and an irregular pigmentation. In fact, the presence or absence of these differential structures is also
used to classify a lesion in the medical diagnostic algorithm named 7-point checklist (see section
3.3.2) [3, 6].
Though melanomas have some specific features, they might be mistaken for other lesions or vice-versa.
Some lesions that might present some of the same features as melanomas are the dysplastic
nevus, the seborrheic keratosis and the blue nevus [17]. Dysplastic nevi also have a melanocytic origin
and are one of the precursors of melanoma. Therefore, it is important to perform regular check-ups
on these lesions. Variations in a lesion's size, color, thickness and/or shape are strong indicators of
a nevus' degeneration into melanoma.
Seborrheic keratoses are non-melanocytic lesions, but their appearance might sometimes be similar
to melanomas, especially in the case of reticulated seborrheic keratoses. Finally, though more
rarely, blue nevi can also be mistaken for melanomas.
The misdiagnosis of melanomas as other lesions is a major problem in dermoscopy. If melanomas
are not detected and treated at an early stage, metastasis occurs and the chances of survival are
greatly diminished. Therefore, it is increasingly important to develop diagnostic techniques with a
smaller probability of misdiagnosis.
Figure 3.2: Examples of melanomas.
Figure 3.3: Dysplastic Nevus.
Figure 3.4: Blue Nevus.
Figure 3.5: Seborrheic keratosis.

3.2 Dermoscopy
A popular alternative to naked eye diagnosis has been, for the past three decades, dermoscopy.
In dermoscopy, diagnosis is performed using a dermatoscope. This instrument contains a light source
and a magnifying lens. The lens is placed over the lesion in order to magnify its morphological structures. However, the traditional dermatoscope by itself cannot provide a good visualization of these
structures due to the reflective properties of the skin. In order to cancel these reflective properties a
liquid solution is spread over the skin or on the lens [3, 4, 12]. More recently, a dermatoscope with
integrated cross-polarized light has been developed. The cross-polarized light cancels the reflective
properties and dispenses with the liquid solution [4].
Nonetheless, the acquisition of a good image alone is not enough for a successful diagnosis. Images
are often unclear, and certain features, such as the number of colors and differential structures,
are difficult to identify. Therefore, the dermatologist's experience is an important factor in diagnostic
accuracy [30]. However, even when performed by an experienced dermatologist, the diagnosis might
be inconclusive. In such cases, a histological examination, which is an invasive procedure, must be
performed.
Figure 3.6: Classic Dermatoscope.
Figure 3.7: Modern Dermatoscope.

3.3 Medical Diagnostic Methods
Due to the subjectivity of the diagnosis, and in order to avoid unnecessary histological examinations,
some algorithms have been developed to aid dermatologists in interpreting dermoscopy images. The
two most used algorithms are the ABCD rule of dermoscopy, introduced by Stolz et al. in 1994 [41],
and the 7-point checklist, introduced by Argenziano et al. in 1998. Both of these methods are only
appropriate for diagnosing melanocytic lesions [3].
3.3.1 ABCD rule of dermoscopy
A melanoma can be identified using the ABCD rule of dermoscopy, a diagnostic algorithm used
by dermatologists when analyzing dermoscopy images. This rule classifies a melanocytic lesion
as a nevus or a melanoma according to its asymmetry (A), border (B), color (C) and the presence
of differential dermatoscopic structures (D).
The lesion is considered to have two perpendicular axes of symmetry. If both axes are asymmetric,
score A will be two. If there is only an asymmetric axis, score A will be one. Otherwise, score A will
be zero.
Score B depends on the characteristics of the border of the lesion. Firstly, the lesion is divided into
eighths. Secondly, the transition between the lesion and the skin in each eighth is analyzed. If there
is a great variation of the pigment pattern, that eighth will have score 1. If, on the other hand, there
is a smooth transition between the lesion and the healthy skin, that eighth will score zero. Score B is
then obtained by summing the scores of all eighths.
Score C depends on the number of colors present on the lesion, in a range of six colors. The
considered colors are: white, red, light-brown, dark-brown, blue-gray and black. Each observed color
contributes with one point to score C. Therefore, the maximum score C is six, whereas the minimum
is one.
Finally, score D depends on the number of differential structures present on the lesion. These
differential structures might be: pigment network, structureless or homogeneous areas, streaks, dots
and globules. Each identified structure adds one point to score D. Thus, score D ranges from one to
five.
The total dermoscopy score (TDS) is computed according to [3]
\[ TDS = 1.3 \times \text{score A} + 0.1 \times \text{score B} + 0.5 \times \text{score C} + 0.5 \times \text{score D}. \tag{3.1} \]
According to the TDS, a lesion is classified as benign if the value is lower than 4.75, suspicious if
the value lies between 4.8 and 5.45, and highly suspicious if a higher value is obtained.
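As an illustration of the scoring arithmetic, consider the minimal sketch below; the small gap the rule leaves between 4.75 and 4.8 is treated here as suspicious, which is an interpretation choice of this example:

```python
def tds(score_a, score_b, score_c, score_d):
    """Total dermoscopy score, equation (3.1)."""
    return 1.3 * score_a + 0.1 * score_b + 0.5 * score_c + 0.5 * score_d

def classify_tds(value):
    if value < 4.75:
        return "benign"
    if value <= 5.45:
        return "suspicious"
    return "highly suspicious"

# Example: both axes asymmetric (A = 2), abrupt borders in 3 eighths (B = 3),
# 4 colors (C = 4) and 3 differential structures (D = 3).
print(classify_tds(tds(2, 3, 4, 3)))  # TDS = 6.4 -> "highly suspicious"
```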
3.3.2 7-point checklist
The 7-point checklist is another diagnostic algorithm used by dermatologists to identify melanomas.
It is considered a simplified scored pattern analysis since it only requires the identification of seven
dermoscopic criteria [3]. These criteria can be divided into two groups: the major criteria group and
the minor criteria one. The former includes atypical pigment network, blue-whitish veil and atypical
vascular pattern, whereas the latter comprises irregular streaks, irregular pigmentation, irregular dots
or globules, and regression structures, see Figure 3.8.
Every major criterion contributes with a score of two to the final seven-point score, while each
minor criterion adds a score of one to the final score.
A lesion is considered a melanoma if the total score is equal to or greater than three.
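For comparison with the TDS computation above, a minimal sketch of the 7-point scoring follows; the criterion names are illustrative strings chosen for this example:

```python
MAJOR = {"atypical pigment network", "blue-whitish veil", "atypical vascular pattern"}
MINOR = {"irregular streaks", "irregular pigmentation",
         "irregular dots or globules", "regression structures"}

def seven_point_score(findings):
    """Major criteria score 2 points each, minor criteria 1 point each."""
    return sum(2 if f in MAJOR else 1 for f in findings if f in MAJOR | MINOR)

findings = {"blue-whitish veil", "irregular streaks"}
print(seven_point_score(findings) >= 3)  # True: 2 + 1 = 3 -> melanoma
```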
Figure 3.8: Dermatoscopic structures used as a criteria in the 7-point checklist diagnostic algorithm [3].
3.4 Computer Aided Diagnosis system
As aforementioned, the process of lesion classification is a subjective and time-consuming task
which requires experience to be performed with satisfactory accuracy. Thus, in order to simplify the
classification problem, automatic diagnostic techniques have been developed over the last two decades
[12, 29, 31].
However, these techniques cannot be used alone in the diagnostic process. They still have some
unresolved problems, such as limited tumor class acceptability. Furthermore, most automated techniques
impose several restrictions on the input images' characteristics. Therefore, computer-based
techniques are only accepted as support for the diagnosis performed by a dermatologist [29].
An automatic diagnostic technique must contain the following three stages: image segmentation,
feature extraction and lesion classification, as shown in Figure 3.9. Each of these stages will be briefly
discussed in the next sections.
Figure 3.9: Overall scheme of a CAD system for melanoma detection.
3.4.1 Image Segmentation
Image segmentation is the first of the three previously defined stages of a CAD system. In this
stage, the skin lesion is separated from the healthy skin present in the image. This is an important
task because these two regions have different features and classification is usually solely based on
the skin lesion. Therefore, if the features extracted from healthy skin areas were also used to describe
the lesion, the classification of the PSL would not be as reliable.
Segmentation is also important to define the contour of the lesion, which is important to evaluate
its symmetry [12].
Segmentation can be roughly divided into region-based or border-based methods. In the former
the goal is to identify all the pixels within the lesion whereas the latter has its focus on detecting the
contours of the lesion. As examples of region-based segmentation one has the adaptive thresholding
method [53], the watershed technique [61], the region growing algorithm [12, 30], among others. On
the other hand, adaptive snakes [42] and gradient vector flow [53] are two examples of border-based
segmentation methods.
In the majority of cases, the segmentation performed by automated methods differs from the ones
performed by dermatologists. Usually, the former results in smaller extracted tumor areas than the
latter, which might lead to a loss of information. Some authors [12, 30] use region growing algorithms
in order to minimize this difference. Region-growing enlarges the area of segmentation, which results
in larger extracted lesion areas.
Currently, there is no consensus about which method performs best.
3.4.2 Feature Extraction
Each image is represented by a feature vector, i.e., a vector whose values reflect the characteristics
of the image. As aforementioned, this vector will depend on the image segmentation.
The extracted features, based on the ABCD dermoscopy rule, are usually shape-, size-, texture-
and color-related. They are typically extracted from the lesion area. In most works, this extraction can
be done globally [12, 28, 30, 49] or locally [51, 54, 65]. In the former, features are extracted from the
whole lesion whereas in the latter, the lesion is divided into blocks, and the features are extracted from
each block or from a comparison between blocks. For instance, features might be related with color
differences between blocks [51]. A local feature extraction method which has already been applied in
some of these systems is based on the BoF model [54, 65]. This model will also be applied in one
of our CAD systems (see Chapter 5). However, the great majority of works adopts a global extraction of features.
Nonetheless, some information can also be obtained from the periphery of the lesion and from the
regions of healthy skin. These features are usually used to evaluate the sharpness of the transitions
between healthy skin and lesion, since sharp transitions might be an evidence of melanoma [12, 28].
3.4.3 Lesion Classification
In this stage, lesions are classified as melanomas or non-melanomas according to their extracted
features. The classification process is divided into two phases: the training and testing phases. In the
training phase a supervised learning algorithm is used to infer a function (classifier) from the training
data (training set). The training set is composed of a group of dermoscopy images alongside
their known classification (label), assigned by a specialist. In the testing phase, the inferred classifier
is used to assign labels to a set of images (test set).
In order to determine the performance of the system, a set of images with known labels (validation
set) is given to the classifier. The performance of the system is then inferred by comparing the labels
assigned by the classifier with their true labels.
The accuracy of the classification is affected by the size of the training set. In general, a larger
training set provides more information to the classifier leading to a more accurate decision [28, 48].
Furthermore, it is important to guarantee a balance between the classes in the training set: the
difference between the number of melanomas and non-melanomas should be as small as possible.
Since this is not always the case, a possible solution for a training set with class disparities is the
repetition of images of the disadvantaged class [12].
Another factor that influences the classification task is the classifier itself. There are several types
of classifiers that can be chosen according to the feature distribution. Performance might be measured
in terms of Sensitivity (SE) and Specificity (SP). The former measures the percentage of correct
decisions in melanoma images whereas the latter measures the percentage of correct decisions
in non-melanomas. A lower SE is more alarming than a lower SP since a misclassification of a
melanoma may delay its excision and worsen the patient’s prognosis [22].
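In terms of the usual confusion-matrix counts, these two measures can be sketched as follows (label 1 denotes melanoma, 0 non-melanoma; a minimal illustrative helper, not part of any system described here):

```python
def sensitivity_specificity(y_true, y_pred):
    """SE = TP / (TP + FN); SP = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)
```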
3.4.4 Related works
Several CAD systems for melanoma classification have been developed for the past two decades.
The first system was developed by Green et al. in 1991 [23]. This system adopted similar criteria to
the ABCD rule of dermoscopy. Lesions were represented by a global set of features which comprised
shape-, color- and size-related features. Classification was then obtained by performing an LDA. The
overall scheme of this CAD system is represented in Figure 3.9.
The majority of the systems developed afterwards have also followed this model. Nevertheless,
different strategies of segmentation, feature extraction and classification have been adopted. However, the great majority of the systems [12, 28, 31, 49] holds to a global feature extraction strategy
based on the ABCD rule of dermoscopy. Most of them use shape-, color- and texture-related features.
However, more recently, a new approach has been adopted to describe the lesions [35, 54, 55].
Instead of considering that lesions exhibit uniform properties across their area, some systems assume
that lesions are non-uniform after all. Therefore, rather than using a global set of features to represent
lesions, these systems divide the lesions into several regions, which are then described by local sets
of features.
Though all of these studies have contributed to the evolution of automated melanoma detection,
the study performed by Oka et al., and recently updated by Iyatomi et al. [31, 43], played a special
role. They developed an Internet-based algorithm which can be used by certified dermatologists
all over the world. The system receives a dermoscopy input image of the lesion and extracts its
color-, symmetry-, border- and texture-related features. Afterwards, according to those features, an
ANN classifies the lesion as a melanoma or a non-melanoma. Later on, if the user introduces the
histological information about the lesion, the labeled image can be added to the database and used
for training.
In order to more easily compare different systems, Table 3.1 contains a description of some of the
performed studies. For each study, the following information was collected: authors and year of
publication, type of features used for classification, chosen classifier, size of the database and
performance of the system.
Some inferences can be drawn from Table 3.1. Firstly, up until now, classification has been based
on the information provided by various types of features used together. The majority of these works
gathers color-, texture-, shape- and size-related features in order to achieve a more accurate classi-
fication. Concerning the chosen classifiers, while in the earlier studies LDA was a popular choice
[5, 23, 24], later on, Artificial Neural Networks (ANN) became one of the most used classifiers
[17, 28, 31, 48, 49]. Furthermore, Logistic Regression (LR), Support Vector Machines (SVM) and
k Nearest Neighbors (kNN) have also already been used as melanoma classifiers in [9], [12] and [22],
respectively.
The databases used in these studies exhibit size disparities. The size of the databases varies
between 70 [23] and 5363 [22] lesions, and between 5 [23] and 407 [54] melanomas.
The performance achieved by these classifiers was compared in terms of SE and SP. The methods
which attained the best performance were the ones developed by Rubegni et al. [48], Celebi et
al. [12] and Iyatomi et al. [31]. From these studies, the results obtained by the last two are considered
more reliable since they were based on considerably larger databases. Results obtained with smaller
databases might not be as reproducible.
3.5 Color Features in CAD systems
Color is an important feature for image description and classification. Even before CAD, color
was already used by dermatologists for melanoma classification in techniques such as the ABCD rule
of dermoscopy (see Section 3.3.1). According to these diagnostic techniques, a lesion with a greater
color variation has an increased chance of being a melanoma. In 1991, Green et al. introduced
the first CAD system for melanoma classification. This system was based on the ABCD rule of
dermoscopy and, thus, considered color features for lesion classification. In their system, Green et
al. extracted features from the R, G and B channels, computing the mean value and the variance of
each channel.
As in the systems developed by Green et al., almost all of these CAD systems [5, 9, 23, 24, 28,
48, 49, 51] only use color features directly extracted from the R, G and B channels and, hence, they
only consider the images represented in the RGB color space. The first color features to be extracted
were the channels' intensity means and variances. The variance of the channels gives considerable
information about the lesion. As previously mentioned, a melanoma usually has three or more colors
and, thus, a great color variance is associated with melanomas. However, in most cases, dysplastic
nevi also show a high variegation when compared to other skin lesions. Nevertheless, since the
majority of skin lesions usually have one or two colors and, as a result, a small variance in each
channel, this color feature plays an important role in shortening the range of possible diagnoses [17].
The maximum and minimum values of each channel can also provide information about the variation
of the lesion's coloration. Additionally, more information is obtained by computing the mean values
of the channels: since melanomas are associated with darker colors, lesions with darker mean colors
can also indicate a melanoma.
In 1994, Ercal et al. [17] transformed the RGB color space coordinates into L*a*b* coordinates.
The introduction of this new color space led to a decrease in the number of false negatives. Later, in
2001, Ganster et al. [22] converted the RGB color space into the HSI color space, in which H stands
for hue, S stands for saturation and I stands for intensity. Maximum, minimum, mean and variance
were extracted from the normalized H and I channels.
In 2007, Celebi et al. [12] extracted color features using six different color spaces. In addition
to the features extracted from the R, G and B channels, the authors extracted color features from
the normalized R, G and B channels (rgb). Furthermore, the RGB color space was transformed into
the HSV, I1/2/3 (Ohta space), l1/2/3 and the CIE L*u*v* color spaces. In this system, in addition to
the usual statistical features (mean and variance), centroidal distances were computed using all the
mentioned color spaces. The centroidal distance measures the difference between the geometric and
the brightness centroid. Smaller distances are achieved for homogeneous lesions. Another extracted
feature was color asymmetry in the RGB color space. Finally, by using the CIE L*u*v* color space,
4x8x8 color histograms were extracted.
Though in the majority of the mentioned works color features were extracted globally, some CAD
systems [12, 22, 51] divide the lesion into more than one region and extract color features from all
those regions. For instance, in [22] and [12] the lesion is divided into its center and its periphery.
More recently, some CAD systems are based on the Bag-of-features model (see Chapter 5) and color
features, among others, are extracted from each region. For instance, [54] uses color moments as
local image descriptors.
Table 3.2 summarizes the color spaces and color features used in previous works.
3.6 Conclusions
Melanomas are one of the most dangerous pathologies in the world. Their increased growth rate
and metastatic capability make the development of more efficient diagnostic algorithms increasingly
important.
Dermatologists base their diagnosis on dermoscopy images and rely on medical diagnostic methods,
such as the ABCD rule of dermoscopy and the 7-point checklist, to reach this diagnosis. Nevertheless,
dermoscopy images might be difficult to interpret and may lead to an incorrect or inconclusive
diagnosis. In case of doubt, the solution is to perform a histological examination.
Therefore, alternative diagnostic methods have been studied. For the past years, CAD systems
have been developed. These systems take advantage of the visual properties of the dermoscopy
images. Color-, texture- and shape-related features, among others, are extracted from melanoma and non-melanoma images and used to teach the system to distinguish melanomas from non-melanomas.
The final goal is to develop a system that receives a dermoscopy image of a skin lesion and returns
its correct diagnosis. These systems would then be used to aid dermatologists to reach a diagnosis.
However, the group of features which performs best has not yet been found. Researchers all
over the world have been testing several groups of features in order to obtain systems with better
performances.
Nevertheless, color features, which are widely used in other areas of image analysis, have been
neglected. Only in recent years have CAD systems using color spaces other than RGB been
developed. Moreover, regarding color, the majority of systems is based on statistical features, such
as the mean intensity and variance of each color channel.
In this thesis, one will further explore the role of color features in the automated detection of
melanomas.
| Authors (Year) | Features | Classifier | Mel | Total | SE | SP | Comments |
|---|---|---|---|---|---|---|---|
| Green et al. (1991) [23] | Shape, size, boundary and color | LDA | 5 | 70 | 80% | 90.8% | |
| Green et al. (1994) [24] | Shape, size, boundary and color | LDA | 18 | 164 | 100% | 70.8% | |
| Ercal et al. (1994) [17] | Shape, boundary and color | ANN | 120 | 240 | 89% | 83.1% | |
| Andreassi et al. (1999) [5] | Geometry, color, texture and islands of color | LDA | 57 | 147 | 88% | 81% | |
| Ganster et al. (2001) [22] | Geometry, color, border and symmetry | kNN | 96 | 5363 | 87% | 92% | |
| Rubegni et al. (2002a) [48] | Geometry, color, texture and islands of color | ANN | 57 | 147 | 93% | 92.75% | |
| Rubegni et al. (2002b) [49] | Geometry, color, texture and islands of color | ANN | 200 | 550 | 94.3% | 93.8% | |
| Hoffman et al. (2003) [28] | Symmetry, border, color and texture | ANN | 95 | 2218 | AUC = 80% | | |
| Faziloglu et al. (2003) [28] | Color | ANN | 128 | 256 | 87.2% | 82.1% | |
| Blum et al. (2004) [9] | Geometry, border, symmetry, color and texture | LR | 84 | 837 | 88.1% | 82.7% | |
| Seidenari et al. (2005) [51] | Color | LDA | 95 | 459 | 87.5% | 85% | |
| Celebi et al. (2007) [12] | Shape, texture and color | SVM | 88 | 564 | 93.33% | 92.34% | |
| Iyatomi et al. (2008b) [31] | Color, symmetry, border and texture | ANN | 198 | 1258 | 85.9% | 86% | Internet-based |
| Situ et al. (2008) [55] | Texture | SVM / Naive Bayes | 30 | 100 | AUC = 93.3% / 82.21% | | |
| Zortea et al. (2010) [35] | Texture | SVM | 80 | 164 | 73.3% | 73.9% | |
| Situ et al. (2011) [54] | Color and texture | SVM | 407 | 1505 | 86.17% | 84.68% | |

Table 3.1: Previous CAD systems for melanoma classification (Mel: number of melanomas; Total: total number of lesions in the database).
| Authors (Year) | Color Space | Color Features |
|---|---|---|
| Green et al. (1991) [23] | RGB | Means and variances of the color channels. |
| Green et al. (1994) [24] | RGB | Means and standard deviations of the color channels. |
| Ercal et al. (1994) [17] | RGB, L*a*b*, spherical color space | Variances in the color channels; relative chromaticity. |
| Andreassi et al. (1999) [5] | RGB | Lesion mean values, deciles, quartiles and gradient mean value; healthy skin mean values. |
| Ganster et al. (2001) [22] | HSI | Maximum, minimum, average and variance of the normalized I and H channels; number of different colors within the lesion and percentage of each of the 15 colors. |
| Rubegni et al. (2002a) [48] | RGB | Mean values of the channels inside and around the lesion. |
| Rubegni et al. (2002b) [49] | RGB | Deciles and quartiles of R, G and B inside the lesion. |
| Hoffman et al. (2004) [28] | RGB | Variances of the color channels. |
| Faziloglu et al. (2003) [28] | RGB | Melanoma colors determined using color histograms. |
| Blum et al. (2004) [9] | RGB | Red, blue, green and grey values. |
| Seidenari et al. (2005) [51] | RGB | Usage of color blocks: Euclidean distances between blocks in the RGB color space; mean, variance and maximum color differences. |
| Celebi et al. (2007) [12] | RGB, rgb, HSV, I1/2/3, l1/2/3, CIE L*u*v* | Color asymmetry in the RGB color space; mean and standard deviation of 3 regions (lesion, inner periphery, outer periphery); histogram distances in the CIE L*u*v* color space; centroidal distances; ratios and differences between the means and standard deviations over the three regions. |
| Iyatomi et al. (2008b) [31] | RGB, HSV | Minimum, average, maximum, standard deviation and skewness for the whole tumor area, the periphery of the tumor area and the surrounding healthy skin; number of colors within the tumor area and peripheral tumor area; average color differences between the peripheral tumor area and the inside of the tumor area. |
| Situ et al. (2011) [54] | RGB | Color moments. |

Table 3.2: Color spaces and color features used in previous CAD systems for melanoma detection.
4 Lesion classification using global features

Contents
4.1 Introduction
4.2 Lesion Segmentation
4.3 Feature extraction
4.4 Classification
4.5 Conclusion
4.1 Introduction
In the first part of this thesis, lesions are assumed to be homogeneous and, according to this
assumption, can be represented by a set of global parameters. The great majority of CAD systems
is based on the same assumption, but the ideal set of global parameters has not been found yet.
In this chapter, one develops a CAD system in which lesions are solely represented by color-based
parameters. The purpose of this study is to determine the importance of global color features
for lesion description.
An overall description of this system is shown in Figure 4.1. As in the majority of works developed
in this area, the system has three main stages: image segmentation, feature extraction and lesion
classification.
Figure 4.1: Overall description of the CAD system for melanoma classification by using global features.
When an image is inserted into the system, it is first submitted to the image segmentation block.
In this block, each input image is used to compute a binary output image (segmentation mask),
which defines the region of the lesion. The segmentation masks were manually performed under the
supervision of an experienced dermatologist.
Since the border region of the lesion also plays an important role in medical dermoscopy analysis
(see ABCD rule), the lesion is divided into two disjoint regions: an inner region and the border.
Afterwards, the image proceeds to the feature extraction block. In this block, color features are
extracted from one of the existing regions of the lesion: the interior, the border or from the whole
lesion. Uni- and tridimensional color histograms, as well as generalized color moments are used,
separately, as global image descriptors. From this point on, lesions are represented by vectors of
features instead of by the images themselves.
Classification is the last stage of the system. In this block, two different classifiers, kNN and
AdaBoost, were trained to distinguish between melanomas and non-melanomas. The aim of this
block is to build a classifier which is able to mimic a dermatologist’s decision.
4.2 Lesion Segmentation
A dermoscopy image is a discrete color image defined as I : Ω → Z³, where Ω is a subset of
Z² and each point of the image is defined by (p1, p2) ∈ Ω. Each point (p1, p2) is associated with a
three-dimensional vector which contains the intensities of the three color components, (I1(p1, p2),
I2(p1, p2), I3(p1, p2)).
A good dermoscopy image must include the whole lesion, because all the information provided by
the lesion is relevant. For instance, when performing the ABCD rule of dermoscopy, it is not possible
to draw a conclusion about any of the four parameters without having the whole lesion and, thus, it is
not possible to reach a diagnosis.
However, in order to obtain the whole lesion, some surrounding healthy skin will also be captured
in the image. Therefore, since we are only interested in the information provided by the lesion, it is
important to perform an image segmentation before any further image processing.
Thus, when a dermoscopy image is delivered to the system, it is first submitted to the image
segmentation block. Several segmentation algorithms are described in the literature [12, 30, 42, 53,
61]. See [53] for a comparison among several methods. However, all methods produce segmentation
errors when there is a low contrast between the healthy skin and the lesion. In this thesis, we preferred
to use segmentation masks manually performed under the supervision of an expert.
The output of the segmentation block is a discrete binary image B : Ω → {0, 1} which represents
the segmentation mask of image I and is computed according to
\[ B(p_1, p_2) = \begin{cases} 1 & (p_1, p_2) \in \text{Skin Lesion} \\ 0 & \text{otherwise.} \end{cases} \tag{4.1} \]
Figure 4.2 shows a dermoscopy image and its respective binary mask.
Figure 4.2: Dermoscopy image (left) and its correspondent binary mask (right). The white area of the mask
corresponds to the skin lesion, whereas the black one represents the healthy tissue.
The set of active pixels of B defines region R. In this system, features can be extracted from
R or from two disjoint subregions of R: R1 and R2 . Region R1 includes the inner part of the lesion,
whereas region R2 comprises the lesion’s border, see Figure 4.3. The border of a lesion also contains
valuable information for its diagnosis. For instance, the color transition between the lesion and the
healthy skin is one of the criteria used in the ABCD rule of dermoscopy. A sharp color transition is a
typical melanoma feature. Therefore, the division of R into R1 and R2 aims to take advantage of this
characteristic to improve the classification task.
The division was performed by applying an erosion to B, using a disk E as the structuring element
[27]
\[ (B \ominus E)(p_1, p_2) = \{ (p_1, p_2) \in \Omega \mid (p_1 + u, p_2 + v) \in B \ \ \forall (u, v) \in E \}. \tag{4.2} \]

The active pixels of B1, where B1 = B ⊖ E, define region R1. B2, in turn, is obtained as follows:
\[ B_2 = B - B_1. \tag{4.3} \]

Figure 4.3: Regions defined by the two binary masks. The inner region, R1, is represented by the red region of the image. The border, R2, is represented by the yellow region of the image.
The active pixels of B2 define region R2. Figure 4.3 shows the two regions, R1 (red)
and R2 (yellow).
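A minimal sketch of this region split, assuming B is a binary NumPy mask and that scipy is available; the disk radius is an illustrative parameter rather than the value used in this work:

```python
import numpy as np
from scipy.ndimage import binary_erosion

def split_lesion_regions(B, radius=10):
    """Inner region B1 via erosion with a disk E (eq. 4.2); border B2 = B - B1 (eq. 4.3)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    E = x**2 + y**2 <= radius**2          # disk-shaped structuring element
    B1 = binary_erosion(B, structure=E)   # active pixels define R1
    B2 = B.astype(bool) & ~B1             # active pixels define R2
    return B1, B2
```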
4.3 Feature extraction
Color has played a major role in image processing and has been successfully used in content-based
image retrieval (CBIR), image segmentation and object recognition. CBIR appeared as a
response to the rapid growth of technology, and the consequent increase of information, as well as to
the emergence of the World Wide Web. In order to manage this new amount of information, CBIR
emerged as a new alternative for image storage and retrieval [64]. Formerly, image retrieval was
performed using text-based methods, in which images are stored alongside one or more keywords.
However, these keywords have to be assigned manually, which is a very time-consuming task.
Furthermore, a great amount of unnecessary data is produced because the complete image and
respective labels have to be stored.
On the other hand, CBIR uses visual information instead of keywords as image descriptors. The
visual information usually comprises color, texture and/or shape features, which are extracted by
using automated methods. These features are a compact representation of the images and, thus, are
easier to store and to retrieve. Therefore, when compared to text-based techniques, CBIR methods
are cheaper, faster and more efficient.
One of the most important steps is to choose which features to use. For instance, in a database
of football teams, the color of the equipment is the most intuitive manner of identifying which team is
represented in the image [34]. For this reason, in this case, it is important to rely on color features.
Shape or texture features are similar between the images and, hence, not appropriate to identify the
different football teams.
CBIR has also played an important role in medical applications. Hospitals have to store an ever-increasing
amount of medical images from a great variety of specialties and for a large number of
patients.
The great majority of medical images lack color or are obtained under very controlled conditions.
As a consequence, color information is not relevant in those cases and images are mainly represented
by texture features. However, in the case of dermoscopy, photographs are used and, thus,
color features provide valuable information [40].
Color-based features are the ones which have shown the most promising results, due to their invariance
properties, which include invariance to noise, to image degradation and to variations in size,
resolution and orientation [64].
Color histograms have become popular descriptors because, despite their simplicity, they achieve
good results. For these reasons, the uni- and tridimensional color histograms are the first descriptors
to be used in this system [58]. Furthermore, a set of generalized color moments, introduced by
Mindru et al. [38], is also used as an image descriptor because, in addition to the photometric
information provided by the color histogram, it also provides spatial information.
4.3.1 Color Histogram
One of the advantages of choosing color histograms as image descriptors arises from their invariance
properties. Color histograms do not provide any local spatial information; they only describe
the image's color distribution. Therefore, they are invariant to image rotation and translation. Color
histograms have also been shown to be tolerant to changes in image scale, partial occlusions and
blurred regions of the image [58].
A color histogram is an image descriptor that splits the color range into disjoint bins and counts
the number of times a pixel color falls in each bin, thus providing an estimate of the color probability
distribution of the image.
Color histograms can be uni-, bi- and tri-dimensional according to the number of color channels
used to compute the histogram. In this work, only uni- and tri-dimensional color histograms were used
as descriptors.
• Unidimensional histogram: Histogram computed by only considering the intensity values of
one color channel. Considering Nc the number of bins of channel c, the unidimensional histogram is computed as follows
\[ h_c(i) = \sum_{(p_1, p_2) \in \mathbb{Z}^2} b_i^c(I_c(p_1, p_2)), \qquad i = 1, \ldots, N_c, \tag{4.4} \]

where \( b_i^c \) is the characteristic function of the i-th bin of channel c,

\[ b_i^c(I_c(p_1, p_2)) = \begin{cases} 1 & I_c(p_1, p_2) \in i\text{-th bin of channel } c \\ 0 & \text{otherwise,} \end{cases} \tag{4.5} \]
and Ic (p1 , p2 ) is the intensity of pixel (p1 , p2 ), in color channel c. Afterwards, each histogram is
normalized so that the sum of its elements equals 1.
Each image is then represented by a feature vector x, composed of the color histograms of the
three color channels:

\[ x = [\, h_1(1), \ldots, h_1(N_1),\ h_2(1), \ldots, h_2(N_2),\ h_3(1), \ldots, h_3(N_3) \,]^T, \tag{4.6} \]

in which \( h_c \in \mathbb{R}^{N_c} \) for c = 1, 2, 3 and \( x \in \mathbb{R}^{N_1 + N_2 + N_3} \).
• Tridimensional histogram: A tridimensional histogram simultaneously considers the intensity
values of the three color channels I1, I2 and I3 and is computed according to

\[ x(i, j, k) = \sum_{(p_1, p_2) \in \mathbb{Z}^2} b_i^1(I_1(p_1, p_2)) \times b_j^2(I_2(p_1, p_2)) \times b_k^3(I_3(p_1, p_2)). \tag{4.7} \]

Each channel is quantized into Mc bins and, thus, \( x \in \mathbb{R}^{M_1 \times M_2 \times M_3} \). As for the unidimensional
histogram, color histograms are computed in seven different color spaces.
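Both descriptors can be sketched in a few lines. The sketch below assumes integer channel intensities in [0, 255] and uses illustrative bin counts, neither of which is prescribed by the text above:

```python
import numpy as np

def unidimensional_histogram(img, mask, bins=(16, 16, 16)):
    """Concatenated, normalized per-channel histograms (eqs. 4.4-4.6)."""
    pixels = img[mask.astype(bool)]                  # (num_pixels, 3) array
    feats = []
    for c, n_bins in enumerate(bins):
        h, _ = np.histogram(pixels[:, c], bins=n_bins, range=(0, 256))
        feats.append(h / h.sum())                    # normalize to sum 1
    return np.concatenate(feats)

def tridimensional_histogram(img, mask, bins=(8, 8, 8)):
    """Joint 3-D color histogram of equation (4.7), flattened to a vector."""
    pixels = img[mask.astype(bool)]
    h, _ = np.histogramdd(pixels, bins=bins, range=[(0, 256)] * 3)
    return (h / h.sum()).ravel()
```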
4.3.2 Generalized Color Moments
Generalized color moments were defined by Mindru et al. in 1999 [38]. These moments are
derived from the traditional color moments, but, in addition to the photometric information, they also
provide spatial information about the image. The generalized color moments, \( M_{pq}^{abc} \), were proposed
for RGB images and are defined as

\[ M_{pq}^{abc} = \iint_R p_1^p\, p_2^q\, [I_1(p_1, p_2)]^a\, [I_2(p_1, p_2)]^b\, [I_3(p_1, p_2)]^c \, dp_1\, dp_2, \tag{4.8} \]
where I1 (p1 , p2 ), I2 (p1 , p2 ), I3 (p1 , p2 ) are, respectively, the intensities of pixel (p1 , p2 ) in the red, green
and blue channels [38, 58]. In this thesis, one also applied the generalized color moments to other
color spaces.
The feature \( M_{pq}^{abc} \) is the generalized color moment of order p + q and degree a + b + c. Since color
moments of high order and degree are unstable, only moments of degree smaller than or equal to 2 and
order smaller than or equal to 1 are considered. Furthermore, moments of degree 0 are excluded because
no color information is provided. Therefore, (a, b, c) = {(1, 0, 0) , (0, 1, 0), (0, 0, 1) , (1, 1, 0), (1, 0, 1),
(0, 1, 1), (2, 0, 0), (0, 2, 0), (0, 0, 2)} and (p, q) = {(0, 0), (0, 1), (1, 0)}. Thus, the descriptor based on the
generalized color moments has 27 features. Color moments of degree 1 are (p,q)-intensity moments
whereas the ones of order 0 are non-central moments of the color distribution [38, 58].
In practice, this expression needs to be normalized, since the size of the lesion varies and it is not
fair to compare the moments of a large lesion with those of a smaller one. Therefore, in order to
guarantee scale invariance, the color moments undergo the following transformation:
\[ (M_{pq}^{abc})' = \frac{M_{pq}^{abc}}{\#Q^{\,1 + \frac{p + q}{2}}}, \tag{4.9} \]
where #Q is the number of pixels within region R.
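A direct discrete implementation of equations (4.8) and (4.9), with the integral replaced by a sum over the lesion pixels, might look as follows; looping over the allowed exponent combinations yields the 27 features:

```python
import numpy as np
from itertools import product

ABC = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1),
       (0, 1, 1), (2, 0, 0), (0, 2, 0), (0, 0, 2)]
PQ = [(0, 0), (0, 1), (1, 0)]

def generalized_color_moments(img, mask):
    """27 scale-normalized generalized color moments (eqs. 4.8-4.9)."""
    p1, p2 = np.nonzero(mask)                       # pixel coordinates of region R
    I1, I2, I3 = (img[p1, p2, c].astype(float) for c in range(3))
    n = p1.size                                     # #Q: number of pixels in R
    feats = []
    for (p, q), (a, b, c) in product(PQ, ABC):
        m = np.sum(p1**p * p2**q * I1**a * I2**b * I3**c)
        feats.append(m / n**(1 + (p + q) / 2))      # normalization of eq. (4.9)
    return np.array(feats)
```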
4.4 Classification
In the classification block a classifier automatically assigns a label to a lesion. However, the system
is only able to perform the classification task after a training phase. In the training phase, a supervised
learning algorithm is used to infer a function (classifier) from the training data. The training data is
composed of a set of skin lesions and their respective labels. The classifier extrapolates the relation
between the training samples and their respective labels to new and different samples [8].
It is not advisable to feed whole lesion images to the classifier. The images' large size
makes the training process much slower and less reliable: it is not possible to learn a classification rule
from such a huge amount of information (millions of color coefficients). Therefore, the chosen alternative
is to represent each image by a vector containing relevant image features. Since a feature
vector is much smaller than the image, the training process becomes faster and cheaper.
The goal of the classifier is to learn how to associate the features of the image to a label, melanoma
or non-melanoma, mimicking the decision of a dermatologist. However, it is important to select discriminant features of the image in order to have a more accurate classification.
There are several classifiers that could be used in the system, such as SVM, ANN, kNN and
AdaBoost, among many others. Here, one chose to use kNN and AdaBoost. Even though
kNN is computationally complex, it is simple to implement and achieves good performances.
AdaBoost is more sophisticated, since it bases its decision on multiple classifiers. It also performs
feature selection, since each weak classifier bases its decision on the most discriminative feature.
Furthermore, AdaBoost is also simple to implement.
4.4.1 k Nearest Neighbors
The kNN classifier is a simple algorithm which bases its decision on the labels of the training
samples closest to the test sample. This classifier relies on the assumption that the closer the test
sample is to the training samples of a certain class, the higher the probability that it belongs to that
class. To use this classifier, one has to define the number of neighbors, k, considered in the decision.
The smaller the volume occupied by the test sample and its k nearest neighbors, the higher the
probability of making the right decision. The ideal situation is a small volume that still contains a
large number of neighbors, which becomes attainable as the number of samples grows to infinity.
The 1-NN error rate can be compared to the minimum possible error rate (the Bayes error rate) as
follows. When the number of samples goes to infinity, the maximum error rate is bounded by
twice the Bayes error rate, which can still be considered small [16]. Let \( P_e^* \) be the Bayes error
rate; the bounds on the nearest-neighbor error rate \( P_e \) are

\[ P_e^* \leq P_e \leq P_e^* \left( 2 - \frac{c}{c - 1}\, P_e^* \right), \tag{4.10} \]
where c is the number of classes. In this thesis we are solely dealing with two classes: melanomas
and non-melanomas.
As k increases, the upper bound decreases and gets closer to the Bayes error rate. However,
especially for a limited number of samples, a large k may include neighbors which are far from the test
sample, compromising the decision.
Let T = {(x^(1), y^(1)), ..., (x^(M), y^(M))} be a training set, in which x^(i) ∈ R^d denotes the i-th feature
vector and y^(i) ∈ {0, 1} the corresponding label. The classification of a new pattern x^(te) ∈ R^d
is performed by computing the distances between x^(te) and all the training vectors. The k closest
training feature vectors and their respective labels are selected. Afterwards, the classifier assigns to
x^(te) the most voted label y^(te) among the selected training vectors.
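A compact sketch of this decision rule, here with the Euclidean distance (the other measures of the next subsection can be substituted); choosing an odd k avoids ties:

```python
import numpy as np

def knn_classify(x_test, X_train, y_train, k=5):
    """Majority vote among the k training vectors closest to x_test."""
    dists = np.linalg.norm(X_train - x_test, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                   # k closest training samples
    return int(y_train[nearest].sum() > k / 2)        # majority of {0, 1} labels
```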
In this thesis, three different distance measures are used in the kNN algorithm. In the case of the
color histograms, one uses the Euclidean Distance (ED), the Kullback-Leibler Divergence (KLD) and
Histogram Intersection (HI). In the case of the generalized color moments only the former distance
measure is used. These measures are further explained in the following subsection.
4.4.1.A Distance measures
Several distortion measures have been used, in image retrieval and object recognition, to compare images [39]. In this thesis, two histogram comparison measures (HI and KLD) and an image
comparison measure (ED) were chosen to compute similarities between images.
• Euclidean distance
The ED is the most popular metric used to measure the distance between two points in a
N -dimensional space. The ED between the N -dimensional vectors h = {h1 , h2 , ..., hN } and
g = {g1 , g2 , ..., gN } is given by:
\[ d_E(h, g) = \sqrt{\sum_{i=1}^{N} (h_i - g_i)^2}. \tag{4.11} \]
• Histogram intersection
HI was proposed by Swain et al. [58] and it is a similarity measure used to compare two histograms. The similarity is determined by computing the intersection between the intensity of
the bins of the histograms. Therefore, the similarity between h1 and h2 , where both histograms
have the same number of bins N , is computed as follows:
\[ s_{HI}(h_1, h_2) = \sum_{i=1}^{N} \min(h_1(i), h_2(i)). \tag{4.12} \]
The greater the value of sHI , the more similar are the histograms. We can convert this similarity
measure into a distance by computing
\[ d_{HI} = 1 - s_{HI}. \tag{4.13} \]
• Kullback-Leibler divergence
Considering two probability distributions (histograms) h and g, the KLD [52] is given by

\[ d_{KL}(h, g) = \sum_{i=1}^{N} h(i) \ln \frac{h(i)}{g(i)}, \tag{4.14} \]

where it is assumed that g(i) > 0 for i = 1, ..., N and that h(i) ln(h(i)/g(i)) = 0 if h(i) = 0. However,
since color histograms are very likely to contain zeros, the following perturbation was introduced:

\[ g(i) = \begin{cases} 10^{-4} & \text{if } g(i) = 0 \\ g(i) & \text{otherwise.} \end{cases} \tag{4.15} \]
The lower dKL is, the more similar are h and g.
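The three measures translate directly into code; a minimal sketch over normalized histogram vectors:

```python
import numpy as np

def euclidean(h, g):
    """Equation (4.11)."""
    return np.sqrt(np.sum((h - g) ** 2))

def hist_intersection_distance(h, g):
    """Equations (4.12)-(4.13): one minus the histogram intersection."""
    return 1.0 - np.sum(np.minimum(h, g))

def kullback_leibler(h, g, eps=1e-4):
    """Equations (4.14)-(4.15); zero bins of g are perturbed to eps."""
    g = np.where(g == 0, eps, g)
    nz = h > 0                 # terms with h(i) = 0 contribute zero
    return np.sum(h[nz] * np.log(h[nz] / g[nz]))
```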
4.4.2 AdaBoost
AdaBoost constructs a strong classifier by using a weighted combination of simple learning algorithms,
also denominated weak classifiers. The weight assigned to each weak classifier depends
on its performance over the training set: the weak classifiers with higher accuracy will also have
higher weights. The names weak and strong derive from their performance; a weak classifier may
perform only slightly better than chance (e.g., 51% accuracy) [60]. The strong classifier, on the other
hand, results from the combination of these weak classifiers and their associated weights and can
achieve high performances.
The weak classifiers are iteratively trained. In each iteration, the parameters used to construct the
weak classifier are the ones that minimize the training error. In this work, a binary classifier based on
a single feature, known as a Decision Stump (DS), was used as the weak classifier. A DS works as follows:
\[ h(x_j, p, \theta) = \begin{cases} 1 & \text{if } p\, x_j < p\, \theta \\ 0 & \text{otherwise,} \end{cases} \tag{4.16} \]
where \( x_j \) is the j-th feature of the feature vector x, p is a polarity and θ a threshold.
The parameters \( x_j \), p and θ of the first weak classifier are chosen by considering that every training
example has the same weight \( w_{t,i} \), where \( w_{t,i} \) represents the weight given to the i-th training sample
in the t-th iteration (in this case, t = 1). After the first iteration the examples are re-weighted. A higher
weight is assigned to the misclassified examples, so that the next classifier prioritizes their correct
classification, and so on. The error of the strong classifier is bounded by
\( \exp\!\big(-2 \sum_{t=1}^{T} (\tfrac{1}{2} - \epsilon_t)^2\big) \),
where \( \epsilon_t \) is the error of the t-th weak classifier. If the majority of the weak classifiers have \( \epsilon_t < \tfrac{1}{2} \), the
classification error will decrease exponentially.
From all the features within the feature vector, the weak classifier only selects a single feature xj to
perform the classification task. This is the one that minimizes the training error. Therefore, depending
on the number of weak classifiers, AdaBoost may also perform feature selection.
Consider the training set {(x^(1), y^(1)), (x^(2), y^(2)), ..., (x^(M), y^(M))}, where M is the number of
training samples and x^(i) and y^(i) are, respectively, the feature vector and the label of the i-th image.
A strong classifier with T weak classifiers is constructed as follows [60].
1. The weights w are initialized according to
\[ w_{1,i} = \begin{cases} \frac{\alpha}{2m} & \text{if } y^{(i)} = 1 \\ \frac{1}{2n} & \text{if } y^{(i)} = 0, \end{cases} \tag{4.17} \]
where i = 1, ..., M, m is the number of melanomas in the training set and n is the number
of non-melanomas. Parameter α allows us to assign a different weight to each class. In this
thesis, one uses α ≥ 1 and, thus, assigns a higher or equal weight to melanomas, granting
them greater or equal importance.
The weight given to each sample represents the importance of assigning a correct label to that
sample. Whereas in the first iteration all the samples within a class are assigned the same
weight, in the following iterations a higher weight is given to the misclassified samples, so that
the next weak classifier prioritizes their classification.
2. For t = 1, ..., T
• The weights are normalized as follows
\[ w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{M} w_{t,j}}, \qquad i = 1, \ldots, M. \tag{4.18} \]
• The training error, \( \epsilon_t \), is computed according to

\[ \epsilon_t = \min_{x_j, p, \theta} \sum_i w_i \left| h(x_j^{(i)}, p, \theta) - y^{(i)} \right|. \tag{4.19} \]
From all the possible combinations of the parameters \( x_j \), p and θ, the one that generates the
smallest training error is used to define the weak classifier \( h_t(x) = h(x, p_t, \theta_t) \).
• Thereafter, the training examples are re-weighted, so that misclassified examples end up with
relatively greater weights:

\[ w_{t+1,i} = w_{t,i}\, \beta_t^{1 - e_i}, \tag{4.20} \]

where \( \beta_t = \frac{\epsilon_t}{1 - \epsilon_t} \) and

\[ e_i = \begin{cases} 0 & \text{if example } i \text{ was correctly classified} \\ 1 & \text{otherwise.} \end{cases} \tag{4.21} \]
3. Finally, the strong classifier is defined by combining the output of the weak classifiers
\[ C(x) = \begin{cases} 1 & \text{if } \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right) h_t(x) \geq \frac{1}{2} \sum_{t=1}^{T} \log\!\left(\frac{1}{\beta_t}\right) \\ 0 & \text{otherwise,} \end{cases} \tag{4.22} \]
where \( h_t \) corresponds to the weak classifier trained in the t-th iteration.
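A condensed sketch of this training loop with decision stumps follows. Scanning thresholds only at observed feature values, and assuming 0 < error < 1 at every iteration, are implementation shortcuts of this example, not part of the algorithm as stated:

```python
import numpy as np

def train_adaboost(X, y, T=50, alpha=1.0):
    """AdaBoost with decision stumps (eqs. 4.16-4.22); y holds labels in {0, 1}."""
    M, d = X.shape
    m, n = np.sum(y == 1), np.sum(y == 0)
    w = np.where(y == 1, alpha / (2 * m), 1 / (2 * n)).astype(float)  # eq. (4.17)
    stumps = []
    for _ in range(T):
        w /= w.sum()                                   # eq. (4.18)
        best = None
        for j in range(d):                             # stump search, eq. (4.19)
            for theta in np.unique(X[:, j]):
                for p in (1, -1):
                    pred = (p * X[:, j] < p * theta).astype(int)
                    err = np.sum(w * np.abs(pred - y))
                    if best is None or err < best[0]:
                        best = (err, j, p, theta, pred)
        err, j, p, theta, pred = best
        beta = err / (1 - err)                         # eqs. (4.20)-(4.21)
        w = w * np.where(pred == y, beta, 1.0)         # down-weight correct samples
        stumps.append((j, p, theta, beta))
    return stumps

def adaboost_classify(x, stumps):
    """Strong classifier of equation (4.22)."""
    total = sum(np.log(1 / b) for _, _, _, b in stumps)
    score = sum(np.log(1 / b) for j, p, t, b in stumps if p * x[j] < p * t)
    return int(score >= total / 2)
```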
4.5 Conclusion
In this chapter, a new CAD system for melanoma detection is proposed. Like other systems described
in the literature [12, 22, 31, 49], it considers the lesions homogeneous and uses a set of
global parameters to describe each lesion. However, it differs from the others mainly because it only
uses color features to represent the images.
Three color descriptors are proposed to, separately, describe the lesions. These descriptors are
the unidimensional and tridimensional color histograms, as well as the generalized color moments.
Whereas the former two are promising due to their invariance properties, the latter provides additional
spatial information.
Classification is performed using one of two classifiers, kNN or AdaBoost. kNN is simple to
implement. Furthermore, whether or not the classes are linearly separable does not influence
this classifier. Another advantage is the fact that it does not depend on many parameters: only the
distance measure and the number of neighbors, k, have to be defined.
On the other hand, kNN may present some problems when dealing with descriptors with a large
number of features, because it does not perform feature selection. As a consequence, computing the
distances between vectors becomes more complex and time-consuming. Moreover, noisy features affect
the computed distances and, thus, the classifier's decision [19].
Like kNN, AdaBoost is simple to implement and has few parameters to tune (α and T) [21, 60].
Furthermore, this classifier offers the possibility of assigning a higher weight to one of the classes
by changing parameter α. This property is particularly beneficial in this study, because the misclassification
of a melanoma is costlier than the misclassification of a non-melanoma. Another great
characteristic of AdaBoost is that it also performs feature selection, since for each weak classifier only
the most discriminative feature is selected.
On the downside, AdaBoost is also sensitive to noisy data and outliers. Furthermore, its performance
is highly dependent on the chosen weak classifier.
5 Lesion classification using local features

Contents
5.1 Introduction
5.2 Keypoint extraction
5.3 Feature description
5.4 Vocabulary construction
5.5 Image Representation and Classification
5.6 Conclusion
5.1
Introduction
The Bag-of-features (BoF) method, used in image retrieval, is inspired on the Bag-of-words (BoW)
model used in natural language processing. BoW has been used to classify text and works as follows.
Firstly, a vocabulary is constructed by using a set of training sentences. The vocabulary will contain
all the different words present in the training sentences. After the construction of the vocabulary, the
training and test sentences will be represented by a histogram, in which each bin represents a word
of the vocabulary. The intensity of each bin is proportional to the number of times that specific word
is repeated in the sentence. Classification is based on these histograms [1].
The BoF model uses the same principles of BoW, but instead of being applied to text, it is applied to content-based image descriptors. BoF constructs a vocabulary by using image descriptors,
denominated visual words, instead of actual words. Models based on BoF are simple to apply and
achieve good results [34]. As a result, BoF became a popular tool in CBIR.
In this chapter, one proposes a CAD system based on the BoF model. As in the previous chapter,
the system solely relies on color features to perform the decision. Figure 5.1 shows the overall configuration of the developed system. This system has six main stages: (1) image segmentation, (2)
keypoint extraction, (3) feature description, (4) vocabulary construction, (5) image representation and
(6) classification.
Figure 5.1: Overall description of the CAD system for melanoma classification by using local features.
The system works as follows. Firstly, a dermoscopy image is inserted into the system and proceeds to the image segmentation block. This block computes a binary image (segmentation mask)
which discriminates lesion pixels from healthy ones. In this thesis, the segmentation masks were
manually obtained under the supervision of an experienced dermatologist. Afterwards, the image
proceeds to the keypoint extraction block where a set of pixels is selected. These pixels (keypoints)
are equally spaced in the image domain according to a regular grid. A set of rectangular patches are
then defined, each of them being centered at a different keypoint. These patches do not overlap since
their sizes are equal to the smallest distance between the keypoints. Moreover, the patches which do
not considerably overlap with the lesion are discarded.
In the feature extraction block color features are extracted from each patch. Each figure is, therefore, represented by as many local feature vectors as patches. In this system the patches are represented by unidimensional color histograms, generalized color moments or by their average color, in
seven different color spaces.
The next stage is the construction of the color-based visual vocabulary. The visual vocabulary is
computed by using a k-means clustering algorithm, which selects the K most representative visual
44
patterns of all the images in the training set.
Each image can now be represented by a histogram of visual words. This histogram is built as
follows. Each local feature of the image is associated to its most similar visual word. The intensity of
each bin is proportional to the number of associated local features.
The final stage is the lesion classification. In this system, kNN and AdaBoost were trained to
classify images according to their histogram of visual words.
5.2
Keypoint extraction
Keypoint extraction is the second stage of this process. In this stage, a set of image pixels is
selected. The selection can be performed randomly or by choosing interest points of the image, such
as edges and corners. These methods of keypoint extraction are, respectively, denominated random
sampling and interest point detection. Nevertheless, there are other methods which can be used in
these stage.
For this system, one chose to use a method named regular grid detection, which is a simple
method in which the keypoints are obtained at regular intervals based on a regular grid defined in the
image. This method may have the disadvantage of selecting keypoints placed in regions that do not
contain any relevant information about the image. Furthermore, images may contain different points
of interest close to each other which will be included in the same patch and whose information will not
be discriminated. Nevertheless, this keypoint extraction method is simple and has already proven to
provide a good performance in some systems [54].
The image features will not solely rely on the information provided by the keypoints. Around each
keypoint, a rectangular patch containing the keypoint and its surrounding pixels is defined and the
local features are extracted from the whole patch.
In this work, the image is split into a set of non-overlaping blocks with a height and width of δi and
δj , respectively. Figure 5.2 shows the local regions of a dermoscopy image determined by using the
implemented regular grid detector.
Figure 5.2: Representation of the regular grid over the dermoscopy image.
Since we are solely interested in the skin lesion, we discard the blocks which do not have a
45
significant overlap with the lesion by using the segmentation mask computed before. The condition
established to determine which blocks should be discarded is given by
Area(R ∩ P (j) ) >
Area(P )
,
2
(5.1)
where R and P (j) are, respectively, the region defined by the active pixels of the segmentation mask
B and the j th patch of the image. Figure 5.3 (left) shows a regular grid applied to a segmentation
mask, where the white region of the image represents R. If at least half of the area of the patch does
not overlap with R, the patch is discarded. Figure 5.3 (right) shows the dermoscopy image and a
representation of all its non discarded patches.
Figure 5.3: Local feature extraction: regular grid and segmentation mask (left) and valid patches (right).
5.3
Feature description
In this stage, local feature extraction is performed. Each one of the previously selected patches
is described by a feature vector. In general, feature vectors might include one or more descriptors,
which can be shape-, texture- or color-related, among others. As the aim of this thesis is to perform
melanoma classification based on color, only color features will be considered.
In this work, three different color descriptors were separately used. For the same reasons stated
in the previous chapter, unidimensional color histograms as well as generalized color moments were
used as color descriptors. However, due to time limitations, the tridimensional color histogram was
not used as an image descriptor in this system. Instead, we chose to use mean color vectors, as
image descriptors, since they are the simplest features which can be defined in a block. As the first to
descriptors are already explained in Section 4.3, only the mean color vector descriptor is explained in
further detail in this section.
• Mean color vector: A mean color vector is a vector containing the mean intensities of the three
color channels. Let Ic (p1 , p2 ) be the intensity of pixel (p1 , p2 ) of channel c, the mean intensity of
channel c of a patch is given by:
46
Pδi +i Pδj+j
fc =
p2 =j Ic (p1 , p2 )
p1 =i
δi × δj
(5.2)
,
where δi and δj represent the patch dimensions. Therefore the descriptor x ∈ R3 is defined as
f1
x = f2 .
f3
(5.3)
(i)
Considering that the feature vector of region j of I (i) is represented by xj ∈ Rd , where d is the
vector dimensionality, i = 1, ..., N and j = 1, ..., M i , all the local features of the image i will be given
by
h
i
(i)
(i)
(i)
X(i) = x1 , x2 , ..., xM i ,
(5.4)
where M i is the total number of valid patches of image i.
5.4
Vocabulary construction
The next step is the construction of the visual vocabulary. This stage only occurs during the
training phase. The visual vocabulary must contain the K prototypes representing all the visual
features extracted from the training images. This can be achieved by using the k-means clustering
algorithm. This algorithm groups the visual features into K clusters according to their spatial pattern.
Visual features with a similar spatial pattern are within the same cluster. The visual vocabulary will be
formed by a representative visual features from each cluster [35].
1. Feature extraction and feature description are performed on the training set and a global matrix
containing all the local features of all the training images is obtained. This global matrix is
denominated by G and is given by
h
i
G = X(1) , X(2) , ..., X(N ) ,
(5.5)
where X(i) is given by (5.4).
2. Each column of G is a vector in a P -dimensional space, in which P is the number of features of
the descriptor.
3. K random vectors of length P are generated as the initial centroids of the K-means clustering
algorithm.
4. K-means clustering:
(a) K clusters are computed based on the euclidean distance between the centroids and the
descriptors
47
v
uM
uX
=t
(x̂i − xi )2
dEuclidean
(5.6)
i=1
where x̂i is the ith feature of a centroid and xi is the ith feature of a local descriptor. Each
descriptor is included in the cluster which generates the smallest euclidean distance.
(b) The new centroids result from computing the mean of all the vectors within each cluster
and are obtained as follows
PNj
x̂j =
x(i)
i=1
nj
∀x(i) ∈ clusterj,
(5.7)
where Nj is the total number of feature vectors belonging to cluster j.
(c) Clustering stops when the following condition is verified
max(d1 , d2 , ..., dK ) ≤ θ
(5.8)
where θ is a predefined threshold and
dj =
M
X
∗
x̂j (i) − x̂j (i),
(5.9)
i=1
here x̂j and x̂∗j are, respectively, the current and the former centroids of the j th cluster.
5. The final group of centroids forms the visual vocabulary
W = {w1 , . . . , wK } ,
(5.10)
where wj = x̂j .
5.5
Image Representation and Classification
Images are represented by an histogram of visual words, in which each bin represents a visual
word. Each local feature vector of an image is associated to its closest visual word of the visual
(i)
(i)
vocabulary, computed during the training phase. Let us define the closest visual word to xj as wj ,
(i)
hence wj = wk if
(i)
(i)
dE (wk , xj ) ≤ dE (wl , xj )
∀l = 1, ..., K.
(5.11)
The intensity of a each bin depends on the number of local feature vectors associated to that
visual word (5.12) [34] and is computed as follows
i
n(wk |I i ) =
M
X
j=1
48
(i)
δ(wk − wj ),
(5.12)
where
δ(p) =
1,
0,
if p = 0
.
if p 6= 0
(5.13)
Classification is performed by comparing the histograms of visual words of each image. As in
Section 4.4, both the kNN and Adaboost were used as classifiers.
5.6
Conclusion
This chapter proposes a CAD system for melanoma classification by using local features. This
system differs from the one proposed in the previous chapter because it rejects the idea of lesions
being uniform across all their area. Thereby, lesions cannot be represented by a set of global parameters, instead they are divided into several patches which are considered to be homogeneous. Each
patch is then represented by a set of color features.
From all the patches of the training set, by using the K-means algorithm, the K most representative patches are selected and form the constructed visual vocabulary. An image is represented by
associating each of its patches to a word of the visual vocabulary computed during the training phase.
Then, a histogram of visual words is computed to represent the lesion. Thus, the comparison between
lesions is performed by comparing their histograms of visual words.
Once again, three color descriptors are proposed to separately describe the patches. The unidimensional color histogram and the generalized color moments are now used as local descriptors.
However, because using the tridimensional histogram as a local descriptor is a very time consuming
task, one chose to use a mean color vector as the third local descriptor.
Implementing a CAD system by applying the BoF model is a more complex task, however some
good results have already been achieved [55]. A disadvantage of this system, regarding the previous
one, is that it has more parameters to tune (δ and K in addition to the ones of the first system) and,
thus, parameter optimization is more time consuming.
49
50
6
Results
Contents
6.1
6.2
6.3
6.4
6.5
6.6
6.7
Database . . . . . . . . . . . . . . . . . . . . . . . . .
Preprocessment . . . . . . . . . . . . . . . . . . . . .
Evaluation metrics . . . . . . . . . . . . . . . . . . .
Cost function . . . . . . . . . . . . . . . . . . . . . . .
Global Methods . . . . . . . . . . . . . . . . . . . . .
Local Methods . . . . . . . . . . . . . . . . . . . . . .
Comparison between the global and local systems
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
52
52
52
53
54
62
70
51
This chapter describes an experimental evaluation of the methods proposed in this thesis, using a
database of dermoscopy images.
6.1
Database
The dataset used in this work is composed by dermoscopy images collected from the database
of Hospital Pedro Hispano, in Matosinhos. From this database, 148 images of melanocytic lesions
were selected. The dataset comprises 14 (≈ 9%) melanomas and 134 (≈ 91%) non-melanomas. The
images were obtained by using a dermatoscope with an amplification of 20× during routine clinical
examinations.
The images were compressed into the JPEG format. Each image is represented in the RGB color
space and, thus, is composed by three channels, one for each primary. Each color channel contains
information about the intensity distribution of that color throughout the image.
Image classification was performed by an experient dermatologist and, in most cases, the diagnosis was confirmed by histological examination.
6.2
Preprocessment
The preprocessement of the images included hair and reflection removal. Hair removal was performed by using directional filters, whereas reflection removal was performed by using a thresholding
algorithm. The gaps caused by the artifacts removal were filled by using an inpainting algorithm [7].
6.3
Evaluation metrics
The assessment of the performance of the system is done by comparing its decision regarding a
set of lesions with their true diagnosis (ground truth). This diagnosis was performed by an experienced
dermatologist.
Two statistical techniques were used to assess the performance of the systems: LOOCV and
10-fold cross-validation, according to the complexity and time consumption of the task. LOOCV is
performed by using all the images in the dataset and their respective labels to train the system,
except the one we intend to classify. This technique can be quite time consuming since the classifier
is trained as many times as the number of images to classify.
In a k-fold cross-validation the dataset is divided into k disjoint test sets. For each test set there
is a complementary training set that contains the remaining lesions. In fact, LOOCV is a k-fold crossvalidation where k equals the length of the dataset.
The dataset is greatly unbalanced, there is a total of 14 melanomas for 134 non-melanomas. The
unbalanced data can be rather problematic when training some classifiers, such as the kNN. For this
reason, we balanced the data by repeating the patterns of the minority class (melanomas) until both
classes had the same number of training patterns.
52
Statistical methods were used to assess the performance of the system. When comparing the
ground truth with the decision of the classifier we can determine the number of True Positives (TPs),
False Positives (FPs), True Negatives (TNs) and False Negatives (FNs).
The melanomas which are accurately classified by the system as such are named TPs, whereas
the misclassified ones are denominated FNs. On the other hand, the non-melanomas that are correctly classified by the system are referred to as TNs, while the ones that were wrongly classified as
melanomas are denominated FPs.
To statistically classify the system’s performance, the Sensitivity (SE) and Specificity (SP) of the
system are computed. These measures are, respectively, the percentage of correctly classified
melanomas among all melanomas in the dataset and the percentage of correctly classified nonmelanomas among all the non-melanomas in the dataset, and are computed as follows [18]:
SE =
#T P
#T P + #F N
(6.1)
SP =
#T N
.
#T N + #F P
(6.2)
and
The best descriptor can be chosen based on these measures. The Receiver Operating Characteristic (ROC) space plots the sensitivity SE (correct detection probability) as a function of 1-SP (false
alarm probability) and helps to perform a comparison between the performance of the systems.
An ideal system would assign the correct label to all data and its performance would be represented by point (0,100) in the ROC space. On the contrary, the worst performance occurs when the
wrong label is assigned to all the data, in this case the performance is represented by point (100,0) in
the ROC space [18].
6.4
Cost function
To evaluate which combination of SE and SP values indicates a better performance of the system
a cost function was established. In the context of this work, having a FN is more severe than having
a FP. Even though the misdiagnosis of a non-melanoma may subject a patient to unnecessary treatment and emotional distress, misdiagnosing a melanoma delays a possible treatment and decreases
the patient’s survival chances.
Therefore, it is not accurate to consider that a FN and a FP have the same cost in the system’s
performance. In these experiments, it was considered that a system’s FN costs twice more than a FP.
Denoting by CF N the cost of a false negative and by CF P the cost of a false positive, then
CF N = 2 × CF P .
(6.3)
CT = CF N × P (F N ) + CF P × P (F P ) ⇔
(6.4)
The average cost (CT ) is
53
CT = 2 × CF P × (1 − SE) + CF P × (1 − SP ) ⇔
CT = 3 − 2 × SE − SP,
if CF P = 1.
(6.5)
(6.6)
For the sake of simplicity we will normalize this expression, leading to
CT = 1 −
2
1
× SE − × SP .
3
3
(6.7)
The best performance is achieved for the lowest CT value.
The cost as a function of SE and 1 − SP is represented in the ROC space (see Figure 6.1) by a
blue dotted line. At each point, the cost is determined by drawing perpendicular line which connects
that point to the cost line. The closer the intersection point is from the best performance, point (0,100),
the better is the performance.
6.5
Global Methods
In this section we will analyze the performance of the CAD system based on the global feature
extraction method (global system). This system was tested by using three global image descriptors
and two classifiers, which amounts to six different systems that are covered in this section.
Each one of the three steps of the system requires the optimization of a large number of parameters. However, for time and simplicity’s sake, only certain parameters were optimized.
At the feature extraction level, one of the parameters to be optimized was the region from which
features were extracted. There are four possible regions: R, R1 , R2 and the fusion (concatenation) of
R1 and R2 , see Section 4.2.
Regarding descriptor related parameters, the number of bins of both the uni- and the tridimensional
histograms was also optimized.
At the classification level the parameters to be optimized depend on the chosen classifier. The first
classifier to be used was the kNN. For this classifier two parameters were tuned, the number of neighbors and the distance measure used to compute the similarity between feature vectors. Concerning
the former parameter, one chose to test ten different k values, k ∈{5, 7, 9,11, 13,15, 17,19, 21, 23}. Regarding the distance measures, three different distances were used: the ED, KLD and HI. However,
because the last two can only be used to compute the distance between histograms and since the
generalized color moments, as image descriptors, do not generate histograms, only the ED is applied
in this case.
The second classifier to be applied was AdaBoost. This classifier also requires the optimization of
two parameters: T and α, which are, respectively, the number of weak classifiers and the weight given
to the class of melanomas. For each system, we tested four different values of T , T ∈ {2, 5, 10, 20},
and two different α, α ∈ {1, 2}.
In each system, the best set of parameters is determined for seven color spaces: RGB, HSV, HSI,
HSL, O1/2/3, L∗ a∗ b∗ and L∗ uv.
54
After parameter optimization and determination of the best configuration of each system, we will
analyze the features obtained by using this best configuration and compare the differences between
melanoma’s and non-melanoma’s feature vectors.
6.5.1
Unidimensional Color Histogram
In the case of choosing the unidimensional color histogram as the image descriptor one has to
optimize the number of bins of the histogram. The optimization was performed by testing the system
for all numbers of bins within 15-50.
Therefore, in the system using kNN as the classifier the performance was computed for 7 color
spaces, a range of 36 numbers of bins, 4 regions, 3 kNN similarity measures and 10 k values. This
means that we have considered 30204 different configurations, each of them trained and tested using
the LOOCV.
In the system using AdaBoost as the classifier, the performance was computed for the same
number of color spaces, bins and regions. Furthermore, it was computed for 2 values of α and 4
different T values. Therefore, this system was trained and tested, using 10-fold cross-validation, for
8064 different configurations. We used 10-fold cross validation instead of LOOCV because AdaBoost
training is considerably slower.
Tables 6.1 and 6.2 show, for each color space, the best performance achieved by the system along
with the associated cost and used parameters by using, respectively, the kNN and Adaboost in the
classification block. These results are also shown in Figure 6.2.
According to the tables and to Figure 6.2 both systems achieve their best performance by using
O1/2/3 color space and approximately the same number of bins. However, the system using kNN as
classifier reaches a better result.
Figure 6.3 shows the features used by the best classifier. Figure 6.3 (left) shows the feature
vectors associated to melanoma lesions, whereas in Figure 6.3 (right) are represented the feature
vectors associated to non-melanomas. Additionally, in both figures are also represented the mean
vectors and the mean vectors plus and minus two times the standard deviation. The mean and
standard deviation values of each figure were computed by solely using the feature vectors that are
present on the correspondent figure. These statistical measures are useful to perform a comparison
between the unbalanced classes. Lastly, the misclassified non-melanomas are represented in dark
blue on the right.
The first 126 features represented in the figure correspond to region R1 , in which the first 42 stand
for the color histogram representing the O1 channel, whereas the next 84 features correspond to
the ones representing the O2 and O3 channels, respectively. The remaining 126 features, by their
turn, are extracted from R2 and are displayed in the same manner. Features belonging to the same
channel were normalized to sum one.
As previously mentioned in Section 2.4, channels O1 and O2 contain color information and channel O3 contains information about the lightness. The color information provided by channel O1 corresponds to green if the intensity of the pixel is bellow a certain threshold and corresponds to red if
55
Figure 6.1: ROC of random guess classifier (red line) and cost line (dotted line).
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
93
93
93
93
100
100
93
SP
87
90
90
90
93
83
87
Cost
0.09
0.08
0.08
0.08
0.02
0.06
0.09
kNN
Number of bins
15/17/18
15/17/18
15/17/18
42
19/31
16/17
Region
R
R2
R2
R2
R1 + R2
R2
R1 + R2
Distance
KLD
KLD
KLD
KLD
ED
KLD
HI
k
21/23
21
21
21
5
19
7
Table 6.1: Best performance of the system for each color space. The results were obtained by using the unidimensional color histogram as a global image descriptor and kNN as the classifier. The measures used to
evaluate the performance were the SE and SP values as well as their associated cost.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
86
93
93
93
100
93
93
SP
88
89
89
89
85
93
87
Cost
0.13
0.08
0.08
0.08
0.05
0.07
0.09
AdaBoost
Number of bins
19
50
50
50
41
34
41
Region
R
R1
R1
R1
R2
R1
R1 + R2
α
1
2
2
2
1/2
1
1
T
10
10
10
10
2
10
2
Table 6.2: Best performance of the system for each color space. The results were obtained by using the unidimensional color histogram as a global image descriptor and AdaBoost as the classifier. The measures used to
evaluate the system’s performance are the SE and SP values as well as their associated cost.
56
Figure 6.2: Performance of the best classifier of each color space by using the unidimensional color histogram
as a global image descriptor and kNN(left) or AdaBoost(right) as the classifier.
the intensity is above that threshold. The same occurs in channel O2, if the pixel intensity is above a
certain threshold the information corresponds to yellow and if it is bellow corresponds to blue.
By comparing the mean feature vector of both figures one observes that in both cases the color
histograms reveal that channel O1 predominantly contains pixels on the red region, whereas channel
O2 predominantly contains pixels on the yellow region. It is important to remember that the intensity of
each bin is proportional to the number of pixels counted in the bin. However, the non-melanomas components corresponding to channels O1 and O2 comprise a larger gamut of values than melanomas.
The values from component O1 of non-melanomas are within the 20th and the 33th bin, whereas
the ones from melanomas are within the 20th and the 29th . Similarly, the values of component O2
are within the 20th and the 35th in the case of melanomas and within 20th and 28th in the case of
melanomas. In fact, the misclassified non-melanomas exhibit a behaviour more similar to melanomas
as their components O1 and O2 are within the same range.
Regarding channel O3, the smaller the intensity value the darker the color and vice-versa. Figure
6.3 shows that melanomas contain a larger amount of darker pixels than non-melanomas, which might
be a consequence of the abnormal proliferation of melanocytes and consequent increased production
of melanin.
6.5.2
Tridimensional Color Histogram
Similarly to the unidimensional color histogram, the number of bins of the tridimensional color
histogram must be optimized. Since, for the tridimensional histogram, computing the feature vectors
is more time consuming, optimization was performed by using a smaller set of bins. In this case, each
channel was quantized into M bins, where M can only take the following values M ∈ {5, 10, 15, 20}.
For simplicity’s sake, we only computed the histograms in which all channels are quantized by the
same number of bins.
To sum up, the performance of the systems was computed for 7 color spaces, a range of 4 numbers
of bins and 4 regions. In the system using kNN as the classifier, performance was also computed for
3 kNN similarity measures and 10 k values, whereas in the one that uses AdaBoost as the classifier
57
it was computed for 2 α values and 4 T values. This amounts to 3360 different configurations of
the first system, which were independently assessed by LOOCV, and 448 different configurations of
the second one, which were independently assessed by 10-fold cross-validation. Table 6.3 and 6.4
summarize the results achieved by the first and second systems, respectively, by showing, for each
color space, the best performance, the associated cost and the respective best set of parameters.
As shown in Tables 6.3 and 6.4 and in the ROC space (see Figure 6.4), the best configuration
was achieved by the system that uses AdaBoost and a histogram of 10×10×10 bins, in the HSV or
L∗ a∗ b∗ color space. Nevertheless, the difference between the best performance of the first system
(SE = 93% and SP = 91%) and the one of the second system(SE = 93% and SP = 92%) is very
small.
Even though both HSV and L∗ a∗ b∗ color spaces reached the best performance by using AdaBoost as the classifier, let us analyze the features extracted by using the L∗ a∗ b∗ color space, since
this is also the color space with which kNN performed best. Therefore, Figure 6.5 shows the features
extracted from region R1 by using a tridimensional histogram of 10×10×10 bins in the L∗ a∗ b∗ color
space.
The feature vectors obtained by using this descriptor are represented in Figure 6.5. As previously
mentioned, each feature corresponds to a bin of the histogram. However, unlike the unidimensional
histogram where each feature contains information regarding a single channel, in this case each
feature contains information regarding all three channels. Two pixels are only counted in the same
bin if all three components are within the same intervals. In order to provide the reader an easier
interpretation of the graphics, Table 6.5 shows which intervals of L∗ , a∗ and b∗ are associated to each
feature. To determine which features correspond to each bin, one has simply to replace i by the
intended bin.
Figure 6.5 shows the intensity of each feature, which is proportional to the number of pixels
counted in the correspondent bins. The features which exhibit the greatest intensities are within
the following intervals: L∗ ∈ [30.3, 90.9[, a∗ ∈ [50.5, 100[ and b∗ ∈ [0, 10.1[∪[20.2, 100[, in the case
of non-melanomas, and L∗ ∈ [20.2, 80.8[, a∗ ∈ [50.5, 90.9[ and b∗ ∈ [0, 10.1[∪[20.2, 100[ in the case
of melanomas. Some important conclusions can be drawn from this observation. Firstly, melanomas
reach smaller lightness values than non-melanomas, whereas non-melanomas reach higher lightness
values. As explained in Section 2.5, L∗ is the lightness channel, in which L∗ = 0 corresponds to pure
black and L∗ = 100 corresponds to pure white. Therefore, the higher the value of L∗ the brighter are
the pixels.
These observations are interesting because melanomas are usually associated with darker lesions, due to the increased amounts of melanin.
Regarding the a∗ and b∗ values, there is no apparent difference between melanomas and non
melanomas. In both cases, channel a∗ shows the predominance of the red component in the images,
whereas channel b shows that lesions may contain the yellow or blue components.
However, impressively, AdaBoost only selected two features to perform the classification. These
features were x555 and x665 . The former corresponds to the sixth interval (i = 6) of all three channels,
58
Figure 6.3: Representation of the feature vectors obtained by using the unidimensional color histogram as
the image descriptor and the corresponding best set of parameters. The feature vectors that correspond to
melanomas are represented on the left, whereas the ones that correspond to non-melanomas are represented
on the right. The misclassified lesions are represented in magenta (left) and blue (right).
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
93
93
93
93
100
93
100
SP
84
84
87
86
73
91
72
Cost
0.10
0.10
0.09
0.09
0.09
0.08
0.09
kNN
Number Bins
10×10×10
10×10×10/15×15×15
10×10×10
5×5×5
-
Region
R2
R1 + R2
R1 + R2 /R2
R2
R2
-
Distance
HI
ED/KLD
ED
KLD
-
k
7
5/21
21
19
-
Table 6.3: Best performance of the system for each color space. The results were obtained by using the tridimensional color histogram as a descriptor. The measures used to evaluate the system’s performance are the SE
and SP values as well as their associated cost. The parameters filed by ’-’ indicate that there are more than two
possible configurations which lead to the same result.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
86
93
93
93
86
93
100
SP
93
92
82
81
93
92
64
Cost
0.12
0.07
0.11
0.11
0.12
0.07
0.12
AdaBoost
Number Bins
15×15×15
10×10×10
15×15×15
5×5×5
15×15×15
10×10×10
10×10×10
Region
R
R1 + R2 /R1
R
R1 + R2 /R1
R1
R1 + R2 /R1 /R
R2
α
1
1
2
1
2
1
1
T
2
2
2
2
5
2
20
Table 6.4: Best performance of the system for each color space. The results were obtained by using the tridimensional color histogram as a descriptor. The measures used to evaluate the system’s performance are the SE
and SP values as well as their associated cost.
59
Figure 6.4: Performance of the best classifier in each color space by using the tridimensional color histogram as
a global image descriptor and kNN(left) or AdaBoost(right) as the classifier.
whereas the latter corresponds to the seventh interval (i = 7) of channels L∗ and a∗ , and to the sixth
interval of channel b∗ .
Figure 6.6 shows a scatter plot computed by using x555 (horizontal axis) and x665 (vertical axis).
On the left, melanomas e non-melanomas are represented in red and blue, respectively. On the right,
the melanomas and non-melanomas which the system failed to classify are, respectively, represented
in green and black. The intensity of feature x555 is inferior to 0.002 in almost all non-melanomas
(122 out of 134). The non-melanomas which exhibited a higher intensity were misclassified. On the
other hand, the intensity of this feature on melanomas exhibits values greater than 0.002. The only
melanoma with an inferior intensity of this feature was misclassified.
On the other hand, it is harder to solely draw conclusions from feature x665 . In effect, the weak
classifier whose selected feature was x665 contributes with a much smaller weight (1.5851) to the
weak classifier than the one whose selected feature was x555 (2.4453).
6.5.3
Generalized Color Moments
The last system differs from the previous two because it uses generalized color moments, instead
of color histograms, as global image descriptors. No further parameters, besides the ones mentioned
in the beginning, were optimized. Therefore, the performance of the systems was computed for 7
color spaces, 4 regions and 1 kNN similarity measures and 10 k values, if kNN is the classifier, and
2 α and 4 T values if the classifier is AdaBoost. Which means that we have considered 280 different
configurations, each of them trained and tested using the LOOCV, in the first system and 224 different
configurations, each of them trained and tested using the 10-fold cross-validation, in the second one.
The best performance reached by this descriptor (SE = 93% and SP = 93%) was achieved by
kNN by using the images in the RGB color space and features extracted from R (see Tables 6.6
and 6.7 and Fig. 6.7). By using AdaBoost as the classfier, the best perfomance(SE = 93% and
SP = 89%) is attained using the features extracted from regions R or R1 in the L∗ a∗ b∗ color space.
However, once again, we will only analyze the features which led to the best performance. Therefore, Figure 6.8 represents the features extracted from region R in the RGB color space. In the figure,
60
Figure 6.5: Representation of the feature vectors obtained by using the based on color histogram best classifier.
The feature vectors that correspond to melanomas are represented on the left, whereas the ones that correspond
to non-melanomas are represented on the right. These results were achieved by using the tridimensional color
histogram as a global image descriptor.
Figure 6.6: Scatter plot of features x555 and x665 for melanomas (red) and non melanomas (blue). On the right,
the lesions which were misclassified by the system are now represented in green and black.
L*
[10.1(i − 1), 10.1i[
a*
b*
x1+100(i−1) , . . . , x100i
x1+10(i−1) , . . . , x10i ,
x101+10(i−1) , . . . , x100+10i ,
x201+10(i−1) , . . . , x200+10i ,
..
.
x901+10(i−1) , . . . , x900+10i
xi , x10+i , x20+i , x30+i , . . . , x990+i
Table 6.5: Correspondence between the L∗ , a∗ and b∗ values and features. The variable i, represents the
number of the bin and it goes from 1 up to 10.
61
Figure 6.7: Performance of the best classifier in each color space by using the generalized color moments as a
global image descriptor and kNN(left) or AdaBoost(right) as the classifier.
melanomas are represented on the left, whereas non-melanomas are represented on the right. Notice that all feature vectors were normalized to have a zero mean and standard deviation equal to
1.
The most noticeable difference between melanomas and non-melanomas is related to the vectors’
mean. Melanomas exhibit a negative mean for all the features, whereas non-melanomas exhibit a
positive one.
The misclassified melanoma is represented in magenta (left), and the misclassified non-melanomas
are represented in dark blue (right). The reason for which the melanoma was misclassified is clear,
it is the only one whose features are above zero and considerably distant from the mean vector. It
is also possible to observe that three of the misclassified non-melanomas are the ones whose majority of features have the smallest values and are closer to the mean of melanomas than the one of
non-melanomas. The remaining melanomas were misclassified due to their proximity with the misclassified melanoma, which is used as a training sample. Furthermore, due to the repetition of the
patterns of the minority class, this training sample counts as ten neighbors.
6.6
Local Methods
In this section we will analyze the results achieved by the CAD system based on the local feature extraction method (local system). This system was also tested by performing all the possible
combinations between three color descriptors and two classifiers.
Similarly to the systems created by using global feature extraction methods, these systems also
used unidimensional color histograms and generalized color moments as image descriptors. However, because using the tridimensional color histogram as a local descriptor is considerably time
consuming, we chose to use a vector containing the mean intensity value of each channel instead.
Regarding the classification block, kNN and AdaBoost were once again chosen as classifiers.
As in the previous section, each system firstly requires a parameter optimization. The first parameter to be optimized was the size of the image patches (or blocks), δi × δj . For simplicity’s sake, the
62
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
93
93
93
93
93
93
93
SP
93
88
90
90
90
91
86
Cost
0.07
0.09
0.08
0.08
0.08
0.08
0.09
kNN
Region
R
R1 /R
R
R1 + R2
R1 + R2 /R/R1 + R2
R
R
Distance
HI
ED
HI
ED
ED
ED
ED
k
19
21
7/9
7
21/21/23
21
23
Table 6.6: Best performance of the system for each color space. The results were obtained by using the generalized color moments as a global image descriptor. The measures used to evaluate the system’s performance
are the SE and SP values as well as their associated cost.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
86
86
79
79
86
93
86
SP
93
93
94
93
94
89
95
AdaBoost
Cost Region
0.12
R1
0.12
R
0.16
R1
0.16
R1 /R
0.11
R
0.08
R/R1
0.11
R2
α
1
2
1
1
1
1
1/2
T
5
5/10
20
10/20
5
2
2
Table 6.7: Best performance of the system for each color space. The results were obtained by using the generalized color moments as a global image descriptor. The measures used to evaluate the system’s performance
are the SE and SP values as well as their associated cost.
Figure 6.8: Representation of the feature vectors obtained by using the generalized color moments as the image
descriptor and the corresponding best set of parameters. The feature vectors that correspond to melanomas are
represented on the left, whereas the ones that correspond to non-melanomas are represented on the right. The
misclassified lesions are represented in magenta (left) and blue (right).
63
same value (δ) was assigned to δi and δj . The size of the blocks defines the amount of patches in
which the image is divided. The larger the blocks the closer we are to global feature extraction methods. For each system, a total amount of five different δ values was tested, δ ∈ {20, 40, 60, 80, 100}.
The next parameter to be optimized is the size of the visual vocabulary, K. The final histogram of
visual words will have as many bins as K and thus one can determine how many visual words can
accurately describe the images. Four different K values were tested, K ∈ {50, 100, 150, 200}.
Classification was also performed by using kNN and AdaBoost. The former classifier was optimized by choosing one out of ten values for k, k ∈ {5, 7, 9, 11, 13, 15, 17, 19, 21, 23}, and one of the
three distance measures used to compute the similarity between the histograms of visual words: ED,
KLD and HI. Regarding AdaBoost, four different values of T , T ∈ {2, 5, 10, 20}, and two different
values of α were tested.
In each subsection, the best configuration of the system in each color space is discriminated on a
table and the best performances are presented in the ROC space.
Furthermore, for the best configuration of each descriptor we will analyze how the performance of
the system varies with δ and K.
6.6.1
Unidimensional Color Histogram
The first system to be tested uses the unidimensional color histogram as a local descriptor. In
addition to the above mentioned parameters the number of bins of the histogram also requires an
optimization. However, applying the BoF model is a more time consuming task and, thus, while for
the global methods one tested all number of bins in the range 15, ..., 50, in this case one will test the
color histogram for a smaller subset of bins {5, 10, 15, 20, 25, 30} bins.
Therefore, the performance of the system was tested for 4 K values, 5 δ values, 6 different number
of bins and, regarding the kNN classifier, 3 distance measures and 10 different k values and 2 α and
4 T values regarding AdaBoost. This amounts to 4200 different configurations for the first scenario
and 1120 for the second one, each of which was independently assessed by using a 10-fold crossvalidation.
Tables 6.8 and 6.9 show the system’s best configuration for each color space and for each classifier. The best performance for each color space is also represented in the ROC space, see Figure
6.9, for a better comparison of the systems. As in the global methods, the color space which provides
the best result (= 100% and = 93%) is the opponent color space.
In order to analyze how some parameters affect the system, we kept the best configuration and
varied, separately, K, δ and the number of bins of the histogram. These were the selected parameters
since they are the ones related with the stage of feature selection and feature extraction.
Figure 6.10 shows how the system’s performance varies with K. When K increases from 50 to
100, there is a major improvement in the performance of the system, being the best result achieved
for K = 100. Afterwards, the performance drops as K goes from 100 to 150 and drops even more
when K becomes 200. One possible explanation is that, because each patch is associated to its
most similar visual word, if the number of visual words is too small (coarse quantization), patches
64
Figure 6.9: Performance of the best classifier in each color space by using the unidimensional color histogram
as a local image descriptor and kNN(left) or AdaBoost(right) as the classifier.
that are not really similar might be associated with the same visual word and mislead the classifier.
Nevertheless, at a certain point, the performance of the system will stop increasing as K increases,
because by using a large number of visual words (fine quantization), similar patches might not be
associated to the same visual word, due to the large number of visual words.
Performing the same analysis, but now fixating K = 100 and changing δ, Figure 6.11 shows that
the best result is achieved for δ = 60 whereas the worst is obtained for δ = 100. From this analysis
one can conclude that the patches defined by a δ inferior to 60 are very small and do not provide
the necessary information, while patches defined by a greater δ value contain too much, and varied,
information and should not be considered homogeneous nor described as a single patch.
Lastly, Figure 6.12 shows how the performance of the system is deeply affected by the number of
bins of the unidimensional color histogram. As we can observe, there is a major difference between
the cost of using histograms defined by 30 bins and the ones defined by any of the other five numbers
of bins. In fact patches are expected to have a small color variation and the necessity of using a larger
number of bins to discriminate the colors of the patch is understandable.
6.6.2
Mean Color
In this section we will analyze how the system performs by using a mean color vector as a local
image descriptor together and kNN or AdaBoost as classifiers.
Both systems were tested for 4 values of K, 5 values of δ, 7 color spaces and, regarding the
classifier, 3 distance measures and 10 different values of k using kNN as classifier, and 2 values of δ
and 4 values of T using AdaBoost. This amounts to 4200 configurations in the first system and 1120
in the second, all independently assessed by 10-fold cross-validation.
Tables 6.10 and 6.11 show the best configuration of the systems for each color space as well as
its corresponding SE, SP and cost values. Additionally, each one of these best performances are
represented on the ROC space, see Figure 6.13.
The best performance was achieved by selecting patches with a δ = 20 and by converting the
image into the L∗ a∗ b∗ color space. For this configuration, among the four possible K values, the
65
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
93
100
100
100
100
100
100
SP
84
75
79
76
90
78
71
Cost
0.10
0.08
0.07
0.08
0.03
0.07
0.10
kNN
Number of centroids
50
200
150
50
50
100
100
δ
20
20
20
100/60
80
80
100
Number of bins
5
10
5
5
25
15
5
Distance
KLD
ED
ED
ED
HI
KLD
ED
k
23
7
7
7/9
5
17
7
Table 6.8: Best performance of the system for each color space. The results were obtained by using the unidimensional color histogram as a local descriptor and kNN as classifier. The measures used to evaluate the
system’s performance are the SE and SP values as well as their associated cost.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
93
100
93
93
100
100
100
SP
94
91
93
96
93
88
81
Cost
0.07
0.03
0.07
0.06
0.02
0.04
0.06
AdaBoost
Number of centroids
δ
50
20
200
40
50/100
20/60
200
20
100
60
100
80
200
20
Number of bins
5
5
5/15
30
30
30
5
Distance
2
2
1/2
2
2
1
2
k
10
5
2/10
5
10
5
2
Table 6.9: Best performance of the system for each color space. The results were obtained by using the unidimensional color histogram as a local descriptor and AdaBoost as the classifier. The measures used to evaluate
the system’s performance are the SE and SP values as well as their associated cost.
Figure 6.10: Performance of the system as a function of K. These results were achieved by using the unidimensional color histogram as a local image descriptor.
66
Figure 6.11: Performance of the system as a function of δ. These results were achieved by using the unidimensional color histogram as a local image descriptor.
Figure 6.12: Performance of the system as a function of the number of bins of the histogram. These results
were achieved by using the unidimensional color histogram as a local image descriptor.
67
visual dictionary with 200 visual words was the one that lead to the best performance (SE = 93% and
SP = 95%).
Figure 6.13: Performance of the best classifier in each color space by using the mean color vector as a local
image descriptor and kNN(left) or AdaBoost(right) as the classifier.
Let us now analyze the effect of K and δ over the system’s performance. Adopting the best configuration for all parameters except K, Figure 6.14 shows how the system’s performance changes
as this variable increases. As one can observe, the size of the visual dictionary affects the performance. Even though, in general, the performance decreases as K increases, the best performance
is achieved for K = 200 and, thus, no overall conclusion can be drawn from these observations.
Performing the same analysis, but now fixating K = 200 and changing δ, Figure 6.15 shows that
the best result is achieved for δ = 20. For δ values greater than 20 the performance drops markedly.
Therefore, patches defined by δ values above 20 may contain too much, and varied, information and
should not be considered homogeneous nor described as a single patch.
6.6.3
Generalized Color Moments
The last system uses a set of generalized color moments to represent the lesions and, as in the
other systems, kNN and AdaBoost as classifiers. It was tested for the exact same configurations as
the previous descriptor.
Tables 6.12 and 6.13 show the best performance of each color space, and respective configuration, by using kNN and AdaBoost, respectively, as classifiers. The best result (SE = 100% and
SP = 91%) was obtained by using a dictionary of 200 visual words, patches defined by δ = 20 in the
HSI color space and AdaBoost as the classifier. Interistingly, both AdaBoost and kNN achieved the
best performance in the HSI color space. Figure 6.16 complements the Tables by representing the
performances in the ROC space.
Let us now, once again, analyze how variables K and δ affect the best classifier. Figures 6.17 and
6.18 show the performance of the system for the different values of K and δ, respectively. As in the
other descriptors, the best configuration was kept while varying K and δ, separately.
Firstly, let us analyze the role of K. As shown in Figure 6.17, performance increases as K increases. The generalized color descriptors not only provide information about the color distribution,
68
SE
100
86
93
93
100
100
86
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SP
68
93
82
84
65
74
90
Cost
0.11
0.12
0.11
0.10
0.12
0.09
0.13
kNN
Number of centroids
100
50/100/150
50
50
100
200
50
δ
80
40/20/20
60
40
100
60
20
Distance
ED
KLD
KLD
KLD
ED
KLD
HI/KLD
k
21
7/5/5
13
11
23
11
5
Table 6.10: Best performance of the system for each color space. The results were obtained by using the mean
color vector as the local feature descriptor and kNN as the classifier. The measures used to evaluate the system’s
performance are the SE and SP values as well as their associated cost.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
86
93
86
86
79
93
93
SP
93
95
95
94
92
94
84
Cost
0.12
0.06
0.11
0.11
0.17
0.07
0.10
AdaBoost
Number of centroids
150
200
50
50
150
50
50
δ
80
20
40
20
40
40
20
α
2
2
2
1
2
2
2
T
5
10
20
5
10
5
2
Table 6.11: Best performance of the system for each color space. The results were obtained by using the mean
color vector as the local feature descriptor and AdaBoost as the classifier. The measures used to evaluate the
system’s performance are the SE and SP values as well as their associated cost.
Figure 6.14: Performance of the system as a function of K. These results were achieved by using the mean
color vector as a local image descriptor.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
100
100
100
100
100
100
100
SP
80
87
89
87
81
81
71
Cost
0.07
0.04
0.04
0.04
0.06
0.06
0.10
kNN
Number of centroids
100
100/150
200
100
200
150
δ
100
20
40
40
20
100
Distance
HI
HI/KLD
HI
HI
HI
HI
k
23
21/11
7
21/23
11
21
Table 6.12: Best performance of the system for each color space. The results were obtained by using the
generalized color moments as local image descriptors and kNN as the classifier. The measures used to evaluate
the system’s performance are the SE and SP values as well as their associated cost. The parameters filed by ’-’
indicate that there are more than two possible configurations which lead to the same result.
69
but also provide spatial information. As a consequence, there will be a greater variation among spatial patterns and a larger number of visual words will be required to accurately represent all spatial
patterns.
Regarding δ, Figure 6.18 shows that the best result is achieved for the smallest δ (δ = 20). Even
though there are some cost variations as δ increases, globally, the cost increases and the performance
worsens.
6.7
Comparison between the global and local systems
Finally, in this section, a comparison between skin lesion classification using global and local
feature extraction methods is performed. Table 6.14 shows the best performance achieved by each
system using the different image descriptors.
Interestingly, the best performance (SE = 100% and SP = 93%) was reached by the two developed CAD systems. Furthermore, both systems were able to achieve this result by using the
unidimensional color histogram in the O1/2/3 color space as the color descriptor.
These results reveal that the unidimensional histogram, in the O1/2/3 color space, is both a great
global and local descriptor. However, while kNN performed a better classification using the global
features, AdaBoost performed better by basing its decision on the features obtained by using the BoF
model.
The set of generalized color moments performs slightly better as a local descriptor. Even though
the SP decreases a bit, the system is able to correctly classify all melanomas. Regarding the color
space, it is not possible to draw conclusions regarding a broadly suitable color space to use with this
descriptor. For the global methods the best performance was achieved by using the RGB color space,
whereas for the local methods the HSL color space performed the best.
Using the tridimensional histogram as an image descriptor is a very time consuming task and,thus,
it was only used as a global descriptor. Even though its best performance was achieved by using
AdaBoost as classifier, kNN also performs well (SE = 93% and SP = 91%). Its performance is
also really close to the one of the generalized color moments, they only differ in the classification
of two non-melanomas. Furthermore, both misclassify one of the melanomas which places their
performance far behind from the one achieved by the unidimensional color histogram, regarding the
global method.
The mean color vector is the simplest of the descriptors used and it was solely used in the local
system. The features extracted by using this descriptor lead to a better performance of AdaBoost
rather than kNN. From the local descriptors, the mean color vector performed the worst. Its performance was severely afected by the misdiagnosis of a melanoma as it correctly identified a higher
number of non-melanomas.
Both CAD systems performed well regardless of the used color decriptor or classifier. However, in
general, the CAD system based on the local feature extraction method reached better performances.
Therefore, in the future, this system should be studied in more detail.
70
Figure 6.15: Performance of the system as a function of δ. These results were achieved by using the mean color
vector as a local image descriptor.
RGB
HSV
HSL
HSI
O1/2/3
L*a*b*
L*uv
SE
86
93
100
93
86
86
86
SP
59
82
91
81
60
80
94
Cost
0.23
0.11
0.03
0.11
0.23
0.16
0.11
AdaBoost
Number of centroids
100
50
200
150
50/200
150/200
100
δ
40
40
20
20
40/20
20
60
α
2
1
1
1
2
1
2
T
2
5
2
2
2/10
2
20
Table 6.13: Best performance of the system for each color space. The results were obtained by using the
generalized color moments as the local feature descriptor and AdaBoost as the classifier. The measures used to
evaluate the system’s performance are the SE and SP values as well as their associated cost.
Global
Local
Descriptor
Unidimensional Color Histogram
Tridimensional Color Histogram
Generalized Color Moments
Unidimensional Color Histogram
Generalized Color Moments
Mean Color Vector
Color Space
O1/2/3
HSV
L*a*b
RGB
O1/2/3
HSL
HSV
Classifier
kNN
AdaBoost
SE
100
93
SP
93
92
Cost
0.02
0.07
kNN
AdaBoost
AdaBoost
AdaBoost
93
100
100
93
93
93
91
95
0.07
0.02
0.03
0.06
Table 6.14: Best performance achieved for each descriptor of both CAD systems.
71
Figure 6.16: Performance of the best classifier in each color space by using the generalized color moments as
a local image descriptor and kNN(left) or AdaBoost(right) as the classifier.
Figures 6.19 and 6.20 show some examples of TNs (top row, left), FNs (down row, left) and TPs
(down row, right) images, classified by using these best configurations. Both systems accurately
classified all melanomas and failed to classify 10 out of the 134 non-melanomas. From the 10 misclassified lesions, 5 were misclassified by both. These 5 lesions are showed in Figure 6.21.
72
Figure 6.17: System’s performance as a function of K. These results were achieved by using the vector containing the generalized color moments of the patch as a local image descriptor.
Figure 6.18: System’s performance as a function of δ. These results were achieved by using the vector containing the generalized color moments as a local image descriptor.
Figure 6.19: Examples of images classified by the best global system: a TN (top row, left), FN (down row, left)
and TP (down row, right). There was no FP classification.
73
Figure 6.20: Examples of images classified by the best local system: a TN (top row, left), FP (bottom row, left)
and TP (bottom row, right). There were no FN classifications.
Figure 6.21: False Positives detected by the best configuration of both global and local systems.
7
Conclusions and Future Work
Contents
7.1 Conclusions
7.2 Future work
7.1 Conclusions
Melanoma is one of the most aggressive types of cancer and, if not detected at an early stage, is
often incurable. Dermatologists rely on dermoscopy images to perform the diagnosis. Even though
there are medical diagnostic techniques, such as the ABCD rule of dermoscopy and the 7-point
checklist, which help dermatologists analyze the images, some images are difficult to interpret and
likely to be misdiagnosed. Therefore, in order to simplify classification and to avoid histological
examination, CAD systems have been developed to aid dermatologists in the classification task.
Several such systems already exist and some have achieved good performances. Nevertheless,
there is still room for improvement: for instance, the ideal set of features used to represent the
images remains unknown.
In effect, the goal of this thesis was to help improve CAD systems for melanoma classification
by assessing the contribution of color features as image descriptors. In addition, we also intended
to determine which of the considered color descriptors performed best.
In order to determine what is the role of color in CAD systems, we developed two different CAD
systems in which the image representation relies solely on color descriptors. The difference between
the systems lies mainly at the feature extraction level. In the system that adopts the global feature
extraction method (global system), lesions are considered to be uniform and are represented by
a set of global features. On the contrary, in the system that adopts the local feature extraction
method (local system), lesions are regarded as non-uniform and are divided into several patches,
each of which is considered to contain a uniform region of the lesion. Each patch is represented by
a local descriptor, and the lesion is represented by the combination of these descriptors.
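A minimal Python sketch of this local decomposition is given below. The patch size delta, the rule of keeping only patches fully inside the lesion mask, and the simple stacking of patch descriptors are illustrative assumptions; in the developed system the patch descriptors are combined differently (e.g., through the centroids reported in Table 6.13).

import numpy as np

def extract_patches(image, mask, delta):
    # Yield delta-by-delta patches lying entirely inside the lesion mask,
    # so each patch can be assumed to be a uniform part of the lesion.
    H, W = mask.shape
    for r in range(0, H - delta + 1, delta):
        for c in range(0, W - delta + 1, delta):
            if mask[r:r + delta, c:c + delta].all():
                yield image[r:r + delta, c:c + delta]

def local_representation(image, mask, describe, delta=20):
    # A lesion is represented by the combination of its patch descriptors.
    return np.stack([describe(p) for p in extract_patches(image, mask, delta)])

With describe set to a patch-level descriptor such as the mean color vector, this produces an (n_patches, d) array that a subsequent classification stage can consume.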
To better assess the role of color, three different color descriptors were used, separately, in each
system. Global feature extraction was performed using unidimensional and tridimensional histograms,
as well as a set of generalized color moments, as global image descriptors. Local feature extraction
was performed using the unidimensional color histogram, the set of generalized color moments
and a mean color vector as local image descriptors. In both cases, the descriptors were applied,
separately, to images represented in seven color spaces: RGB, HSV, HSL, HSI, O1/2/3, L*a*b* and
L*u*v*.
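For concreteness, we read the unidimensional color histogram as the concatenation of one histogram per channel of the chosen color space; the bin count and the unit value range in the Python sketch below are assumptions.

import numpy as np

def unidimensional_histogram(pixels, bins=16, value_range=(0.0, 1.0)):
    # pixels: (N, 3) array of lesion pixels, already converted to the
    # target color space and scaled to value_range.
    parts = [np.histogram(pixels[:, ch], bins=bins, range=value_range)[0]
             for ch in range(3)]
    hist = np.concatenate(parts).astype(float)
    return hist / hist.sum()  # normalized 3 * bins feature vector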
Lesion classification was performed by using two classifiers: kNN, with three different distance
measures (ED, HI and KLD), and AdaBoost. The achieved results were presented in Chapter 6.
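For normalized histograms p and q, the three distance measures can be sketched in Python as follows; the symmetrization of the KL divergence and the epsilon smoothing are assumptions of this sketch, as the exact variants are defined earlier in the thesis.

import numpy as np

def euclidean_distance(p, q):
    return float(np.linalg.norm(p - q))

def histogram_intersection_distance(p, q):
    # HI is a similarity in [0, 1] for normalized histograms,
    # so 1 - HI is used as a kNN distance.
    return 1.0 - float(np.minimum(p, q).sum())

def kl_distance(p, q, eps=1e-12):
    # Symmetrized Kullback-Leibler divergence, smoothed to avoid
    # division by zero on empty bins.
    p, q = p + eps, q + eps
    return float((p * np.log(p / q)).sum() + (q * np.log(q / p)).sum())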
The experimental results showed that a CAD system performing melanoma classification based
solely on a single color descriptor is able to reach a SE of 100% and a SP of 93%, which carries a
cost value of only 0.03 on the unit scale. Furthermore, all systems reached performances with a cost
value no greater than 0.09, which in this case corresponds to a SE of 100% and a SP of 74%.
Therefore, we are able to answer the first and main question of this thesis: whether or not color-based
features make an important contribution to melanoma classification. From these findings, it becomes
clear that color-based features play a major role in dermoscopy analysis.
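As a sanity check, the cost values quoted in this chapter can be reproduced by a weighted combination of the two error rates. The weights of 2/3 and 1/3 in the Python sketch below are inferred from the (SE, SP, cost) triples in Tables 6.13 and 6.14 and are an assumption of this sketch, not the definition given earlier in the thesis.

def cost(se, sp, w_fn=2.0 / 3.0):
    # Weighted cost on the unit scale: w_fn penalizes missed melanomas.
    return w_fn * (1.0 - se) + (1.0 - w_fn) * (1.0 - sp)

print(round(cost(1.00, 0.74), 2))  # 0.09, matching the value above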
The best performance (SE = 100% and SP = 93%) was achieved by the two CAD systems.
Interestingly, both systems reached this result by using the unidimensional color histogram in the
O1/2/3 color space as the image descriptor. Hence, we may conclude that, among the descriptors
used, the unidimensional color histogram, associated with the O1/2/3 color space, is the best color
descriptor regardless of the chosen feature extraction method.
It had already been stated that color histograms are powerful descriptors and, thus, the good
results they achieved did not come as a surprise. Even though the unidimensional color histogram
performed better than the tridimensional one, we cannot conclude that the former is a better
descriptor: while the unidimensional histogram was tested with 36 different numbers of bins, the
tridimensional histogram was tested with only four. Furthermore, it was not possible to test the
tridimensional color histogram as a local descriptor due to time limitations.
Concerning the color space, it is not surprising that the best performance was achieved by using
images in the O1/2/3 color space, because, of all these color representations, it is the most similar
to the second stage of the two-stage color vision model (see Section 2.3). Among the tested color
spaces, this representation is the closest to the representation of color in the human brain, which
possibly explains why classification using images in the O1/2/3 color space better mimics the
decision of a dermatologist.
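One common definition of the opponent color space (used, e.g., in [58]) pairs a red-green channel and a yellow-blue channel with an intensity channel, mirroring that second stage. Whether this work uses exactly these scale factors is specified in Chapter 2, so the constants in this Python sketch should be read as an assumption.

import numpy as np

def rgb_to_opponent(rgb):
    # rgb: (..., 3) floating-point array.
    R, G, B = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    O1 = (R - G) / np.sqrt(2.0)            # red-green opponency
    O2 = (R + G - 2.0 * B) / np.sqrt(6.0)  # yellow-blue opponency
    O3 = (R + G + B) / np.sqrt(3.0)        # intensity
    return np.stack([O1, O2, O3], axis=-1)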
Regarding which feature extraction method performs best, the conclusion is not as straightforward
as the others. Both methods originated a system that led to the best performance. However, in
general, the systems based on the local feature extraction method reached better performances,
despite the simplicity of some of the descriptors used, such as the mean color vector. On the other
hand, the system based on the global feature extraction method has the advantage of being simpler
to implement and less time consuming. There is, therefore, a trade-off between simplicity and
performance. Nevertheless, in the context of this work, where the patients’ health and survival
chances are at stake, performance must carry the greater weight.
7.2 Future work
Both of the systems developed in this thesis can be further tested and improved. In this section,
we present some possible guidelines for future work concerning both systems.
• Tridimensional color histogram as a local image descriptor: Color histograms proved once
again to be good descriptors. However, due to time limitations, it was not possible to experiment
with the tridimensional histogram as a local image descriptor. Therefore, it would be interesting
to assess the role of the tridimensional histogram, as a local descriptor, in the developed CAD
system.
• Larger and more balanced database: The dataset used in this thesis to train and test the
systems is small and unbalanced (14 melanomas to 134 non-melanomas). A larger dataset
is expected to lead to better performance, since the classifier can learn from a higher number
of examples. In this case, it is especially important to increase the number of melanomas, not
only to decrease the class imbalance but also because a FN is more severe than a FP. In effect,
increasing the number of instances of a class is equivalent to increasing the misclassification
cost of that class (a minimal illustration is given in the sketch after this list).
• More classifiers: In this work, only two classifiers (kNN and AdaBoost) were used to assess the
performance of the system. It would also be interesting to compute the performances obtained
with other classifiers, such as the ANN and the SVM, which have already led to good
performances in other systems [12, 49].
• Combination of features: The goal of this thesis was solely to evaluate the contribution of color
features. A future step will be to combine the best color features with other types of features,
such as shape- and texture-based features. Furthermore, the color descriptors were not
combined with each other in this work; thus, another possibility is to combine different color
descriptors.
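As a minimal illustration of the equivalence noted in the second bullet, consider AdaBoost: replicating every melanoma m times has the same effect as multiplying the initial weight of each melanoma by m. The label convention and the factor in this Python sketch are illustrative.

import numpy as np

def initial_weights(y, melanoma_label=1, factor=2.0):
    # Up-weighting each melanoma by 'factor' is equivalent to inserting
    # 'factor' copies of it into the training set.
    w = np.where(y == melanoma_label, factor, 1.0)
    return w / w.sum()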
Bibliography
[1] Bag of words model. http://en.wikipedia.org/wiki/Bag_of_words_model.
[2] Cie 1931 color space. http://www.colorbasics.com/CIESystem/.
[3] Dermoscopy. http://www.dermoscopy.org/.
[4] Dermoscopy. http://www.dermoscopy.co.uk.
[5] L. Andreassi, R. Perotti, P. Rubegni, M. Burroni, G. Cevenini, M. Biagioli, P. Taddeucci,
G. Dell’Eva, and P. Barbini. Digital dermoscopy analysis for the differentiation of atypical nevi and
early melanoma: a new quantitative semiology. Archives of Dermatology, 135:1459–
1465, 1999.
[6] G. Argenziano, G. Fabbrocini, P. Carli, V. De Giorgi, E. Sammarco, and M. Delfino. Epiluminescence microscopy for the diagnosis of doubtful melanocytic skin lesions. comparison of the abcd
rule of dermatoscopy and a new 7-point checklist based on pattern analysis. Arch Dermatol,
134:1563–1570, 1998.
[7] C. Barata, J. S. Marques, and J. Rozeira. A system for the detection of pigment network in
dermoscopy images using directional filters. Biomedical Engineering, IEEE Transactions on,
59(10):2744–2754, oct. 2012.
[8] Christopher M. Bishop. Pattern recognition and machine learning. Springer, 1st ed., corr. 2nd
printing, October 2006.
[9] A. Blum, H. Luedtke, U. Ellwanger, R. Schwabe, G. Rassner, and C. Garbe. Digital image analysis for diagnosis of cutaneous melanoma. development of a highly effective computer algorithm
based on analysis of 837 melanocytic lesions. British Journal of Dermatology, 151:1029–
1038, 2004.
[10] J. K. Bowmaker and H. J. Dartnall. Visual pigments of rods and cones in a human retina. The
Journal of Physiology, 298(1):501–511, 1980.
[11] M. Bratkova, P. Shirley, and R. S. Boulos. oRGB: a practical opponent color space for computer
graphics. IEEE Computer Graphics and Applications, 29(1):42–55, Jan.-Feb. 2009.
[12] M. E. Celebi, H. A. Kingravi, B. B. Uddin, H. Iyatomi, Y. A. Aslandogan, W. V. Stoecker, and R. H.
Moss. A methodological approach to the classification of dermoscopy images. Computerized
Medical Imaging and Graphics, 31(6):362–371, 2007.
[13] Wikimedia Commons. File:em spectrum.svg.
[14] Symon D’o. Cotton. Colour, colour spaces and the human visual system, 1995.
[15] R. L. De Valois and K. K. De Valois. A multi-stage color model. Vision Res, 33(8):1053–1065,
1993.
[16] R.O. Duda, P.E. Hart, and D.G. Stork. Pattern classification. Pattern Classification and Scene
Analysis: Pattern Classification. Wiley, 2001.
[17] F. Ercal, A. Chawla, W. V. Stoecker, H.-C. Lee, and R. H. Moss. Neural network diagnosis of malignant melanoma from color images. IEEE Transactions on Biomedical Engineering, 41(9):837–
845, September 1994.
[18] T. Fawcett. An introduction to roc analysis. Pattern Recogn. Lett., 27(8):861–874, June 2006.
[19] E. Fix and J. L. Hodges. Discriminatory analysis, nonparametric discrimination: Consistency
properties. US Air Force School of Aviation Medicine, Technical Report 4(3):477+, January
1951.
[20] W. Frei and B. Baxter. Rate-distortion coding simulation for color images. IEEE Transactions on
Communications, COM-25(11):1385–1392, November 1977.
[21] Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and
an application to boosting. J. Comput. Syst. Sci., 55(1):119–139, August 1997.
[22] H. Ganster, A. Pinz, R. Rohrer, E. Wildling, M. Binder, and H. Kittler. Automated melanoma
recognition. IEEE Transactions on Medical Imaging, 20(3):233–239, March 2001.
[23] A. Green, N. Martin, G. McKenzie, J. Pfitzner, F. Quintarell, B. W. Thomas, M. O’Rourk, and
N. Knight. Computer image analysis of pigmented skin lesions. Melanoma Research, 1:231–236,
1991.
[24] A. Green, N. Martin, J. Pfitzner, G. McKenzie, M. O’Rourk, and N. Knight. Computer image
analysis in the diagnosis of melanoma. American Academy of Dermatology, 31(6):958–964,
1994.
[25] C. F. Hall and E. L. Hall. A nonlinear model for the spatial characteristics of the human visual system. IEEE Transactions on Systems, Man, and Cybernetics, SMC-7(3):161–169, March 1977.
[26] A. Hanbury and J. Serra. Colour image analysis in 3d-polar coordinates. In In Proceedings of
the DAGM03 conference, pages 124–131. Springer-Verlag, 2003.
[27] R. M. Haralick, S. R. Sternberg, and X. Zhuang. Image analysis using mathematical morphology.
IEEE Trans. Pattern Anal. Mach. Intell., 9(4):532–550, April 1987.
[28] K. Hoffmann, T. Gambichler, A. Rick, M. Kreutz, M. Anschuetz, T. Grünendick, A. Orlikov,
S. Gehlen, R. Perotti, L. Andreassi, J. Newton Bishop, J.-P. Césarini, T. Fischer, P. J. Frosch,
R. Lindskov, R. Mackie, D. Nashan, A. Sommer, M. Neumann, J. P. Ortonne, P. Bahadoran, P. F.
Penas, U. Zoras, and P. Altmeyer. Diagnostic and neural analysis of skin cancer (danaos). a
multicentre study for collection and computer-aided analysis of data from pigmented skin lesions
using digital dermoscopy. British Journal of Dermatology, 149:801–809, 2004.
[29] H. Iyatomi. New Developments in Biomedical Engineering. InTech, 2010.
[30] H. Iyatomi, M. Celebi, H. Oka, and M. Tanaka. An internet-based melanoma screening system
with acral volar lesion support. In Conf Proc IEEE Eng Med Biol Soc, pages 5156–5159, 2008.
[31] H. Iyatomi, M. E. Celebi, H. Oka, and M. Tanaka. An improved internet-based melanoma screening system with dermatologist-like tumor area extraction algorithm. Computerized Medical Imaging and Graphics, 32:566–579, 2008.
[32] D. A. Kerr. Color and color spaces. http://dougkerr.net/pumpkin/articles/Color_Models.pdf,
November 2005.
[33] D. A. Kerr. The cie xyz and xyy color spaces. http://dougkerr.net/pumpkin/articles/CIE_XYZ.pdf,
March 2010.
[34] F. S. Khan, J. van de Weijer, and M. Vanrell. Top-down color attention for object recognition.
pages 979–986. IEEE, 2009.
[35] M. Zortea, S. O. Skrøvseth, and F. Godtliebsen. Automatic learning of spatial patterns for diagnosis of skin lesions. 32nd Annual International Conference of the IEEE EMBS, Buenos Aires,
Argentina, pages 5601–5604, Aug.-Sept. 2010.
[36] P. Marks, W. H. Dobelle, and E. F. Macnichol. Visual pigments of single primate cones. Science,
143:1181–1183, 1964.
[37] J. S. Marques. Notes for image processing and vision, 2007.
[38] F. Mindru, T. Moons, and L. Van Gool. Recognizing color patterns irrespective of viewpoint and
illumination. pages 368–373, 1999.
[39] H. Müller and D. Keysers. Tutorial on medical image retrieval: Features.
http://www-i6.informatik.rwth-aachen.de/~keysers/MedicalImageRetrieval/slidesFeatures.pdf,
September 2004.
[40] Henning Müller, Nicolas Michoux, David Bandon, and Antoine Geissbuhler. A review of content-based image retrieval systems in medical applications – clinical benefits and future directions.
International Journal of Medical Informatics, 73:1–23, 2004.
[41] F. Nachbar, W. Stolz, T. Merkle, A. B. Cognetta, T. Vogt, M. Landthaler, P. Bilek, O. Braun-Falco,
and G. Plewig. The abcd rule of dermatoscopy. high prospective value in the diagnosis of doubtful
melanocytic skin lesions. J Am Acad Dermatol, 30(4):551–559, April 1994.
[42] J. C. Nascimento and J. S. Marques. Adaptive snakes using the em algorithm. IEEE Transactions
on Image Processing, 14(11):1678–1686, November 2005.
[43] H. Oka, M. Hashimoto, H. Iyatomi, G. Argenziano, H.P. Soyer, and M. Tanaka. Internet-based
program for automatic discrimination of dermoscopic images between melanomas and clark
naevi. British Journal of Dermatology, 150(5):1041–1041, 2004.
[44] Henryk Palus. Colour spaces. Chapman and Hall, 1998.
[45] Anahit Pogosova. Modeling of human color vision system, 2007.
[46] Marco Polo. Cie1931 rgbcmf. http://www.colorbasics.com/CIESystem/, 2007.
[47] Rhcastilhos. Schematic diagram of the human eye in english, 2007.
[48] P. Rubegni, M. Burroni, G. Cevenini, R. Perotti, G. Dell’Eva, P. Barbini, M. Fimiani, and L. Andreassi. Digital dermoscopy analysis and artificial neural network for the differentiation of clinically atypical pigmented skin lesions: a retrospective study. Journal of Investigative Dermatology, 119(2):471–474, 2002.
[49] P. Rubegni, G. Cevenini, M. Burroni, R. Perotti, G. Dell’Eva, P. Sbano, C. Miracco, P. Luzi, P. Tosi,
P. Barbini, and L. Andreassi. Automated diagnosis of pigmented skin lesions. International
Journal of Cancer, 101:576–580, 2002.
[50] S.J. Sangwine and R. E. N. Horne, editors. The Colour Image Processing Handbook (Optoelectronics, Imaging and Sensing). Chapman and Hall, 1 edition, 1998.
[51] S. Seidenari, G. Pellacani, and C. Grana. Pigment distribution in melanocytic lesion images: a
digital parameter to be employed for computer-aided diagnosis. Skin Research and Technology,
11:236–241, 2005.
[52] J. Shlens. Notes on kullback-leibler divergence and likelihood theory.
http://www.snl.salk.edu/~shlens/kl.pdf, August 2007.
[53] M. Silveira, J. Nascimento, J. S. Marques, A. R. S. Marçal, T. Mendonça, S. Yamauchi, J. Maeda,
and J. Rozeira. Comparison of segmentation methods for melanoma diagnosis in dermoscopy
images. IEEE Journal of Selected Topics in Signal Processing, 3(1):35–45, February 2009.
[54] N. Situ, T. Wadhawan, R. Hu, K. Lancaster, X. Yuan, and G. Zouridakis. Evaluating sampling
strategies of dermoscopic interest points. In ISBI’11, pages 109–112, 2011.
[55] Ning Situ, Xiaojing Yuan, Ji Chen, and George Zouridakis. Malignant melanoma detection by
bag-of-features classification. Conf Proc IEEE Eng Med Biol Soc, 2008:3110–3113, 2008.
[56] D. M. Szaflarski. How we see: The first steps of human vision.
http://www.accessexcellence.org/AE/AEC/CC/vision_background.php, September 2000.
[57] M. Tkalcic and J. Tasic. Colour spaces - perceptual, historical and applicational background.
Proc. Eurocon 2003, September 2003.
[58] K. E. A. van de Sande, T. Gevers, and C. G. M. Snoek. Evaluating color descriptors for object and
scene recognition. IEEE Transactions on pattern analysis and machine intelligence, 32(9):1582–
1596, September 2010.
[59] V. Vezhnevets, V. Sazonov, and A. Andreeva. A survey on pixel-based skin color detection
techniques. In Proc. Graphicon-2003, pages 85–92, 2003.
[60] Paul Viola and Michael Jones. Robust real-time face detection. International Journal of Computer
Vision, 57:137–154, 2004.
[61] H. Wang, X. Chen, R. H. Moss, R. J. Stanley, W. V. Stoecker, M. E. Celebi, T. M. Szalapski, J. M.
Malters, J. M. Grichnik, A. A. Marghoob, H. S. Rabinovitz, and S. W. Menzies. Watershed segmentation of dermoscopy images using a watershed technique. Skin Res Technol, 16(3):378–84,
2010.
[62] J. S. Werner, S. K. Donnely, and R. Kliegl. Aging and human macular pigment intensity. Vision
Research, 27(2):257–268, 1987.
[63] Wikipedia. Photoreceptor cell. http://en.wikipedia.org/wiki/Photoreceptor_cell#cite_note-6.
[64] Hua Zhang, Ruimin Hu, Jun Chang, Qingming Leng, and Yi Chen. Research of image retrieval
algorithms based on color. In Proceedings of the Third international conference on Artificial
intelligence and computational intelligence - Volume Part II, AICI’11, pages 516–522, Berlin,
Heidelberg, 2011. Springer-Verlag.
[65] H. Zhou, Mei Chen, and J.M. Rehg. Dermoscopic interest point detector and descriptor. In
Biomedical Imaging: From Nano to Macro, 2009. ISBI ’09. IEEE International Symposium on,
pages 1318–1321, July 2009.