Pattern Recognition 34 (2001) 727–739
Texture discrimination with multidimensional distributions of signed gray-level differences

Timo Ojala*, Kimmo Valkealahti, Erkki Oja, Matti Pietikäinen

Machine Vision and Media Processing Unit, Infotech Oulu and Department of Electrical Engineering, University of Oulu, P.O. Box 4500, FIN-90014 University of Oulu, Finland
Nokia Research Center, P.O. Box 407, FIN-00045 Nokia Group, Finland
Laboratory of Computer and Information Science, Helsinki University of Technology, FIN-02015 HUT, Finland
Received 25 January 1999; received in revised form 11 November 1999; accepted 24 November 1999
Abstract

The statistics of gray-level differences have been successfully used in a number of texture analysis studies. In this paper we propose to use signed gray-level differences and their multidimensional distributions for texture description. The present approach has important advantages compared to earlier related approaches based on gray-level cooccurrence matrices or histograms of absolute gray-level differences. Experiments with difficult texture classification and supervised texture segmentation problems show that our approach provides very good and robust performance in comparison with mainstream paradigms such as cooccurrence matrices, Gaussian Markov random fields, or Gabor filtering. © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Texture analysis; Classification; Segmentation; Local binary pattern; Brodatz textures
1. Introduction

Texture analysis is important in many applications of computer image analysis for the classification or segmentation of images based on local spatial variations of intensity or color. A wide variety of measures for discriminating textures have been proposed [1–5].

A class of simple image properties that can be used for texture analysis is the first-order statistics of local property values, i.e., the means, variances, etc. In particular, a class of local properties based on absolute differences between pairs of gray levels or average gray levels has performed well; for example, in the comparative studies of Weszka et al. [6] and Conners and Harlow [7], in the application study of Siew et al. [8], and in the analysis of texture anisotropy by Chetverikov [9].
Usually different kinds of measures are derived from difference histograms, such as contrast, angular second moment, entropy, mean, and inverse difference moment.

* Corresponding author. Tel.: +358-40-567-6646; fax: +358-8533-2612. E-mail address: timo.ojala@oulu.fi (T. Ojala).
Whole distributions of gray-level differences were used by Unser [10]. He considered the approximation of a second-order distribution by a product of sum and difference histograms as an alternative to the usual cooccurrence matrices. Experiments with Brodatz's [11] textures showed that the sum and difference histograms jointly performed about as well as the cooccurrence matrices. Difference histograms appeared to be much more powerful than sum histograms and performed, even on their own, about as well as the cooccurrence matrices.
Ojala et al. [12] introduced some new spatial operators for texture classification and conducted a comparative study of various texture measures with nonparametric classification based on distributions of single features or joint pairs of features. Their experiments showed that very good texture discrimination can be obtained with simple texture measures, such as absolute gray-level differences and local binary patterns.
The analysis of several cooccurring pixel values may improve texture discrimination, but the exponential growth of histogram size becomes a problem. Valkealahti
and Oja [13] developed methods for reducing the size of multidimensional cooccurrence histograms, obtaining higher classification accuracies with reduced multidimensional histograms than with channel histograms and wavelet packet signatures.
In this work we propose to use signed gray-level differences, instead of absolute differences, and their multidimensional distributions for texture description, together with the earlier introduced local binary pattern (LBP) operator, which can be regarded as a simplification of signed differences. Section 2 describes the method and demonstrates its advantages over earlier related approaches:

• in comparison with gray-level cooccurrences, signed differences describe texture in a more compact and efficient form,
• signed differences are not affected by changes in mean luminance,
• in comparison with (multidimensional) absolute differences, signed differences provide more information about image texture and consequently are much more powerful.

We evaluate the performance of signed differences with two experimental setups. Section 3 presents results obtained in a difficult classification problem involving 32 Brodatz textures. For comparison purposes, results for multidimensional cooccurrence histograms and absolute difference histograms, as well as for the Gaussian Markov random field model and Gabor filtering, are presented. In Section 4 the performance of signed differences is evaluated in supervised texture segmentation, using the problems of a recent large comparative study by Randen and Husøy [14].
2. Signed gray-level difference histograms
In this section we demonstrate the advantages of signed gray-level differences over the traditional gray-level cooccurrences. As an example, we model the dependence between successive pixels in a monochrome texture image having G gray levels. The joint probability distribution of the gray levels of successive pixels is denoted by p(g₁, g₂), g₁, g₂ = 0, 1, …, G − 1. The two-dimensional gray-level histograms estimating p(g₁, g₂), that is, the cooccurrence matrices, are very popular descriptors of texture. Without losing information, the gray level g₁ can be subtracted from g₂, giving the distribution p(g₁, g₂ − g₁). Assuming that g₁ is independent of the difference g₂ − g₁, the distribution can be factorized as

p(g₁, g₂ − g₁) = p(g₁) p(g₂ − g₁).  (1)
Although exact independence is not warranted in practice, the factorized distribution may still approximate the joint distribution accurately. Fig. 1(a) shows the average of the distributions p(g₁, g₂ − g₁) computed from the 32 natural textures used in the texture classification experiments in Section 3. Fig. 1(b) shows the average error between p(g₁, g₂ − g₁) and p(g₁)p(g₂ − g₁). The average error is rather small in proportion to the average distribution; hence, for this example the assumption of independence seems quite reasonable.
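The factorization check above can be sketched in a few lines of NumPy. This is an illustrative experiment on synthetic correlated noise, since the Brodatz samples are not bundled here; the structure of the computation (joint histogram of a pixel and its signed difference to the right neighbor, compared against the product of its marginals) follows Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(0)
G = 16

# Synthetic "texture": horizontally smoothed noise quantized to G gray levels,
# so adjacent pixels are correlated (a stand-in for a real texture sample).
img = rng.random((128, 128))
img = (img + np.roll(img, 1, axis=1) + np.roll(img, 2, axis=1)) / 3.0
img = np.floor(img / img.max() * (G - 1e-9)).astype(int)

g1 = img[:, :-1].ravel()          # gray level of the left pixel
g2 = img[:, 1:].ravel()           # gray level of its right neighbor

# Joint distribution p(g1, g2 - g1); the difference ranges over -(G-1)..(G-1).
joint = np.zeros((G, 2 * G - 1))
np.add.at(joint, (g1, g2 - g1 + G - 1), 1.0)
joint /= joint.sum()

# Factorized approximation p(g1) * p(g2 - g1) of Eq. (1).
p_g1 = joint.sum(axis=1)
p_d = joint.sum(axis=0)
approx = np.outer(p_g1, p_d)

err = np.abs(joint - approx).sum()   # L1 error of the factorization
print(f"L1 error of factorization: {err:.3f}")
```

For correlated data of this kind the error is small relative to the total probability mass of 1, mirroring the observation made for Fig. 1(b).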
The distribution p(g₁) in Eq. (1) describes the overall luminance of the image, which is unrelated to local image texture and consequently does not provide useful information for texture analysis. Hence, much of the information in p(g₁, g₂) about the textural characteristics is conveyed by the difference distribution p(g₂ − g₁). The advantages of gray-level differences over gray levels are thus clear: (1) the differences fall mainly within a narrower range than the gray levels, due to the high correlation between the gray levels of adjacent pixels, consequently providing a more compact description of texture; (2) the signed differences are not affected by changes in mean luminance.
Cooccurring differences provide more information about local interpixel dependencies than the one-dimensional difference distribution p(g₂ − g₁). The present study evaluates the classification performance of two-, four-, and eight-dimensional difference distributions.

Fig. 1. (a) The histogram p(g₁, g₂ − g₁) representing gray-level cooccurrences of horizontally adjacent pixels (G = 16), pooled from the 32 Brodatz textures used in the texture classification experiments in Section 3. (b) The average error produced by the factorization of p(g₁, g₂ − g₁) of each texture into p(g₁)p(g₂ − g₁).
Computing cooccurring differences within 3×3-pixel subimages,

g₁ g₂ g₃
g₄ g₅ g₆
g₇ g₈ g₉

we estimate the following distributions of signed differences between the neighbors and the center pixel g₅:

p₂(g₆ − g₅, g₈ − g₅),  (2)
p₄(g₂ − g₅, g₄ − g₅, g₆ − g₅, g₈ − g₅),  (3)
p₈(g₁ − g₅, g₂ − g₅, g₃ − g₅, g₄ − g₅, g₆ − g₅, g₇ − g₅, g₈ − g₅, g₉ − g₅).  (4)
Even though signed differences are not affected by changes in mean luminance, great care should be taken to ensure that gray-scale properties remain constant throughout the analysis procedure. This requirement applies to most well-known paradigms; e.g., cooccurrence matrices are by definition very sensitive to all changes in the gray scale. For this reason histogram equalization or normalization of the gray scale, either global or local, is often used prior to feature extraction.
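Extracting the difference vectors of Eqs. (2)-(4) is a purely local operation. The sketch below shows one way to do it with NumPy; the row-major labeling g₁…g₉ of the 3×3 window, with g₅ as the center, follows the subimage shown above.

```python
import numpy as np

def signed_difference_vectors(img):
    """Return the d2, d4 and d8 signed-difference vectors for every 3x3
    neighborhood of `img` (borders excluded), following Eqs. (2)-(4):
    each component is a neighbor gray level minus the center gray level."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]                     # center pixel g5 of each window
    # Eight neighbors in row-major order g1..g9 (g5 skipped).
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    diffs = np.stack(
        [img[1 + dy : img.shape[0] - 1 + dy, 1 + dx : img.shape[1] - 1 + dx] - c
         for dy, dx in offs], axis=-1)
    d8 = diffs.reshape(-1, 8)
    d4 = d8[:, [1, 3, 4, 6]]                # up, left, right, down neighbors
    d2 = d8[:, [4, 6]]                      # right and down neighbors
    return d2, d4, d8

img = np.arange(25).reshape(5, 5) % 7      # tiny toy image
d2, d4, d8 = signed_difference_vectors(img)
print(d8.shape)   # (9, 8): nine 3x3 windows fit in a 5x5 image
```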
2.1. Quantization of the multidimensional difference space with vector quantization
Multidimensional difference distributions are advantageous over multidimensional gray-level cooccurrence distributions for the same reasons as described above. The volume of the difference space equals (2G − 1)ᵏ, where k = 2, 4, 8, corresponding to the distribution being estimated. If we described the difference space directly with a k-dimensional histogram, we would obtain, even with modest values of G, very large histograms that are computationally expensive and statistically unreliable.
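The growth of the naive histogram is easy to make concrete. With G = 16, each signed difference takes one of 2G − 1 = 31 values, so:

```python
# Number of cells a straightforward k-dimensional difference histogram would
# need: each of the k signed differences takes a value in -(G-1)..(G-1).
G = 16
for k in (2, 4, 8):
    cells = (2 * G - 1) ** k
    print(f"k={k}: {cells:,} cells")
```

Already at k = 4 the histogram has nearly a million cells, and at k = 8 almost 10¹² cells, far beyond what a 64×64 sample (about 3844 difference vectors) can populate reliably.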
Instead of reducing G, for example, with a simple requantization of each coordinate, we partition the k-dimensional difference space using vector quantization, which in terms of classification accuracy has been shown to be superior [13,15]. For this purpose we employ a codebook of N k-dimensional codewords with indices n = 0, 1, …, N − 1. The codebook is trained with the optimized LVQ1 training algorithm [16], by selecting 100 random vectors from each of the 1024 samples in the training set (see Section 3.1 for a detailed description of how the image data is divided into training and testing sets). The small black and white rectangles in Fig. 2(a) correspond to the locations of the codewords when the difference space of p₂ is quantized with a codebook of 384 codewords.
Fig. 2. (a) The difference space of p₂ and its quantization with a codebook of 384 codewords, and (b) the difference histogram of a 64×64 texture sample. The indices of the 384 codewords correspond to the 384 bins in the histogram.
We describe the difference information of a texture sample with a difference histogram. The mapping from the difference space to a difference histogram is straightforward: given a particular k-dimensional difference vector, the index of the nearest codeword corresponds to the bin index in the difference histogram. In other words, a codebook of N codewords produces a histogram of N bins. The difference histogram of a texture sample is obtained by searching for the nearest codeword to each vector present in the sample and incrementing the bin denoted by the index of this nearest codeword. The difference histogram of a 64×64 texture sample is illustrated in Fig. 2(b).
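The codebook-to-histogram mapping can be sketched as follows. Note that plain k-means is used here as a simple stand-in for the paper's optimized LVQ1 trainer (both place codewords so as to tile the difference space); the data is synthetic.

```python
import numpy as np

def train_codebook(vectors, n_codes, n_iter=20, seed=0):
    """Toy codebook trainer: plain k-means, used here as a stand-in for the
    optimized LVQ1 algorithm referenced in the text."""
    rng = np.random.default_rng(seed)
    codes = vectors[rng.choice(len(vectors), n_codes, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each vector to its nearest codeword (Euclidean distance).
        d = ((vectors[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
        nearest = d.argmin(1)
        for j in range(n_codes):
            members = vectors[nearest == j]
            if len(members):
                codes[j] = members.mean(0)
    return codes

def difference_histogram(vectors, codes):
    """Map each difference vector to its nearest codeword and accumulate
    a normalized N-bin histogram (N = number of codewords)."""
    d = ((vectors[:, None, :] - codes[None, :, :]) ** 2).sum(-1)
    hist = np.bincount(d.argmin(1), minlength=len(codes)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
vecs = rng.integers(-15, 16, size=(2000, 2)).astype(float)  # toy 2-D differences
codes = train_codebook(vecs, n_codes=16)
hist = difference_histogram(vecs, codes)
print(hist.shape)   # (16,)
```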
Similar texture transforms have been proposed in the past. Ojala et al. [12] suggested DIFFX/DIFFY, which corresponds to the joint distribution of absolute gray-level differences in the horizontal and vertical directions. In other words, the only difference to p₂ is that absolute gray-level differences were used instead of signed differences. To realize the importance of this difference, consider the following. Assuming d_x and d_y are the signed differences of adjacent gray levels in the horizontal and vertical directions, there are four distinct pairs of signs of d_x and d_y, corresponding to four different texture patterns: {sgn(d_x), sgn(d_y)} = {(+,+), (+,−), (−,+), (−,−)}. Whereas DIFFX/DIFFY interprets these four patterns as identical, p₂ treats them as distinct patterns, consequently providing more information about local image texture. This obvious advantage of signed gray-level differences over absolute differences will be verified quantitatively in the classification experiments in Section 3.
2.2. Important simplification: local binary pattern (LBP)

Ojala et al. [12] also proposed the local binary pattern (LBP) operator, which is a simplification of p₈. In LBP the signs of the eight differences are recorded into an 8-bit number:

LBP = Σ s(gᵢ, g₅) 2^(i−1),  i = 1, …, 8,  (5)

where s(gᵢ, g₅) = 1 if gᵢ ≥ g₅ and s(gᵢ, g₅) = 0 if gᵢ < g₅; here g₅ denotes the center pixel of the 3×3 neighborhood and the index i runs over its eight neighbors in a fixed order.
It is obvious that LBP contains less textural information than p₈. The motivation for using LBP instead of p₈ is twofold: LBP's gray-scale invariance and its computational simplicity. Effectively, whereas signed differences measure both the spatial organization (pattern) and the contrast (amount) of local image texture, with LBP we intentionally focus only on the spatial structure and discard contrast, as it depends on the gray scale. LBP is by definition invariant against any monotonic transformation of the gray scale; i.e., as long as the order of the pixel values stays the same, the output of the LBP operator remains constant. This makes LBP very attractive in situations where the gray scale is subject to changes due to, e.g., varying illumination conditions, which have to be coped with in many applications, for example in visual inspection. The usefulness of the gray-scale invariance will be demonstrated in the texture segmentation experiments in Section 4. Computational simplicity is another obvious advantage, as there is no need for quantization of the feature space or other time-consuming computations; the easily calculated 8-bit LBP numbers are simply accumulated into a histogram of 256 bins. This results in a very straightforward and efficient implementation, which may come in handy in time-critical applications.
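A minimal LBP implementation follows directly from Eq. (5). The neighbor ordering below (row-major around the window) is one concrete choice; any fixed order yields an equivalent 256-bin descriptor. The monotonic-transformation invariance discussed above can be verified directly on the result.

```python
import numpy as np

def lbp_image(img):
    """8-bit LBP code for every interior pixel: bit i is set when the i-th
    neighbor (row-major order around the 3x3 window) is >= the center pixel."""
    img = np.asarray(img, dtype=np.int32)
    c = img[1:-1, 1:-1]
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    code = np.zeros_like(c)
    for i, (dy, dx) in enumerate(offs):
        nb = img[1 + dy : img.shape[0] - 1 + dy, 1 + dx : img.shape[1] - 1 + dx]
        code |= (nb >= c).astype(np.int32) << i
    return code

def lbp_histogram(img):
    """Accumulate the LBP codes into the 256-bin texture model."""
    h = np.bincount(lbp_image(img).ravel(), minlength=256).astype(float)
    return h / h.sum()

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64))
h = lbp_histogram(img)
print(h.shape)    # (256,)
```

Doubling every gray level is a monotonic transformation, so it leaves the LBP histogram unchanged, which is exactly the invariance exploited in Section 4.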
3. Texture classification experiments
3.1. Image data
The 32 Brodatz [11] textures used in the experiments are shown in Fig. 3. The images are 256×256 pixels in size and have 256 gray levels. Each image was divided into 16 disjoint 64×64 samples, which were independently histogram-equalized to remove luminance differences between textures. To make the classification problem more challenging and generic, three additional samples were generated from each sample: a sample rotated by 90°, a 64×64 scaled sample obtained from the 45×45 pixels in the middle of the 'original' sample, and a sample that was both rotated and scaled. Consequently, the classification problem involved a total of 2048 samples, 64 samples in each of the 32 texture categories [13].

Fig. 3. The 32 Brodatz textures used in the classification experiments.
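The per-sample preprocessing can be sketched as below. This is an illustrative partial pipeline: it covers the CDF-based histogram equalization and the 90° rotation, while the 45×45-to-64×64 rescaling step (which requires interpolation) is omitted.

```python
import numpy as np

def histogram_equalize(img, levels=256):
    """Map gray levels through the image's own CDF so that the output
    histogram is approximately uniform over `levels` gray levels."""
    img = np.asarray(img)
    hist = np.bincount(img.ravel(), minlength=levels).astype(float)
    cdf = hist.cumsum() / hist.sum()
    lut = np.floor(cdf * (levels - 1)).astype(img.dtype)
    return lut[img]

rng = np.random.default_rng(0)
sample = rng.integers(0, 128, size=(64, 64))   # dark, low-contrast toy sample
eq = histogram_equalize(sample)                # equalized 'original' sample
rotated = np.rot90(eq)                         # the 90-degree rotated variant
```

After equalization the sample spans the full gray-level range, so differences in overall luminance between textures no longer contribute to discrimination.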
3.2. Classification principle
The performance of a particular classifier was evaluated with 10 different randomly chosen training and test sets. The texture classifier was trained by randomly choosing, in each texture class, eight 'original' samples, together with the corresponding 24 transformed samples, as models. The other half of the data, eight 'original' samples and the corresponding 24 transformed samples in each texture class, was used for testing the classifier. In the classification phase a test sample S was assigned to the class of the model M that maximized the log-likelihood measure

L(S, M) = Σ Sₙ ln Mₙ,  n = 0, 1, …, N − 1,  (6)

where Sₙ and Mₙ correspond to the sample and model probabilities of bin n, respectively.
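The classification rule of Eq. (6) is a one-liner over histogram bins. The sketch below adds a small epsilon to guard against empty model bins, a practical detail not specified in the text; the toy 4-bin models are hypothetical.

```python
import numpy as np

def log_likelihood(sample, model, eps=1e-10):
    """Eq. (6): L(S, M) = sum_n S_n * ln(M_n).  The epsilon guards against
    empty model bins (an implementation detail, not from the paper)."""
    return float(np.sum(sample * np.log(model + eps)))

def classify(sample, models):
    """Assign the sample histogram to the model maximizing L(S, M)."""
    scores = [log_likelihood(sample, m) for m in models]
    return int(np.argmax(scores))

# Two toy 4-bin models and a sample drawn to resemble the second one.
models = [np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.1, 0.1, 0.1, 0.7])]
sample = np.array([0.05, 0.15, 0.1, 0.7])
print(classify(sample, models))   # 1
```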
3.3. Experimental results
We estimated the distributions p₂, p₄, and p₈ by partitioning the difference space with a codebook of 384 codewords. The codebook was trained with the standard optimized LVQ1 training algorithm, by selecting 100 random vectors from each of the 1024 samples in the training set. In other words, 102,400 training vectors in total were presented to the codebook. As a rule of thumb, the statistics literature often suggests 10 entries per bin for a histogram to be statistically reliable. Therefore, we chose a codebook of 384 codewords, for it produces difference histograms of 384 bins, which corresponds nicely to roughly 10 entries per bin, given the effective sample size of 62×62 pixels (a one-pixel border is excluded in the computation of differences).
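The entries-per-bin arithmetic behind the choice of 384 codewords checks out directly:

```python
# Rule of thumb: ~10 entries per histogram bin.  A 64x64 sample loses its
# one-pixel border when 3x3 differences are computed, leaving 62x62 vectors.
vectors_per_sample = 62 * 62
bins = 384
print(vectors_per_sample / bins)
```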
To demonstrate the advantages of signed gray-level differences over the traditional gray-level cooccurrences, we repeated the experiment using the following cooccurrence distributions:

H₃(g₅, g₆, g₈),  (7)
H₅(g₂, g₄, g₅, g₆, g₈),  (8)
H₉(g₁, g₂, g₃, g₄, g₅, g₆, g₇, g₈, g₉).  (9)

The cooccurrence distribution H₃ corresponds to p₂, H₅ to p₄, and H₉ to p₈, meaning that they utilize the same interpixel information as the corresponding difference histograms. Other factors of the experiment remained exactly the same; i.e., a codebook of 384 codewords was used to partition the cooccurrence space, etc.
To demonstrate the superiority of signed gray-level differences over absolute gray-level differences, we repeated the experiment estimating the distributions

p₂ᵃᵇˢ(|g₆ − g₅|, |g₈ − g₅|),  (10)
p₄ᵃᵇˢ(|g₂ − g₅|, |g₄ − g₅|, |g₆ − g₅|, |g₈ − g₅|),  (11)
p₈ᵃᵇˢ(|g₁ − g₅|, |g₂ − g₅|, …, |g₉ − g₅|),  (12)

all other factors of the experiment remaining exactly the same. Note that p₂ᵃᵇˢ corresponds to the DIFFX/DIFFY operator proposed by Ojala et al. [12].
Average classification accuracies over the 10 experiments are listed in Table 1 for each method. The corresponding SD of the 10 scores is given in parentheses.
We see that each signed difference histogram outperforms its corresponding cooccurrence matrix. This owes to the fact that the signed differences provide a more compact presentation of interpixel relationships: the differences fall mainly within a narrower range than the gray levels, due to the high correlation between the gray levels of adjacent pixels. Further, we observe that, for example, with p₂ᵃᵇˢ an average accuracy of 85.3% (SD 0.7%) is achieved; hence treating the four different signed patterns as the same results in a significant loss in classification accuracy in this problem. LBP, the simplification of p₈, produces an average accuracy of 91.2% (SD 0.7%); hence discarding the contrast of the local image texture is penalized with about a 5% loss in the average classification accuracy. The gray-scale invariance of LBP has no impact here, as the gray-scale properties of the texture images are strictly controlled.
3.4. Comparison to GMRF and Gabor features
We tackled the classification problem also with the Gaussian Markov random field (GMRF) and the Gabor

Table 1
Average classification accuracies (%) over the 10 experiments for signed difference histograms, cooccurrence matrices, and absolute difference histograms. The number in parentheses is the SD of the 10 scores

  Signed differences           Cooccurrence matrices        Absolute differences
  p₂        p₄        p₈       H₃        H₅        H₉       p₂ᵃᵇˢ     p₄ᵃᵇˢ     p₈ᵃᵇˢ
  93.3      95.7      96.8     90.8      93.8      94.4     85.3      92.1      93.3
  (0.7)     (0.7)     (0.7)    (0.8)     (0.6)     (0.7)    (0.7)     (0.5)     (0.6)

Table 2
Average classification accuracies (%) over the 10 experiments for the GMRF features. The number in parentheses is the SD of the 10 scores

  Model order      Accuracy
  First order      34.5 (1.6)
  Second order     44.0 (0.9)
  Third order      62.7 (1.2)
  Fourth order     68.3 (1.6)
  Fifth order      69.1 (1.2)
  Sixth order      76.7 (2.0)
  Seventh order    71.3 (2.4)
  All              90.0 (1.2)
features, which are widely regarded as state-of-the-art methods in texture analysis. For feature computation we used implementations that are publicly available on the WWW. The implementation of the GMRF features was obtained from the MeasTex site, which is a framework for measuring the performance of texture classification algorithms, providing large image databases and source code for standard paradigms [17]. For the Gabor features we used two different approaches: the generic design provided at the MeasTex site and the 'optimized' filter design of Manjunath and Ma [18].
The MeasTex GMRF features were computed using the standard symmetric masks, and all models from the first order to the seventh order were attempted. Additionally, the features of all seven models were combined into one large set of 73 GMRF features (row 'all' in Table 2).
The Gabor filter design at the MeasTex site includes three different wavelengths (2, 4, and 8 pixels) and four different orientations (0°, 45°, 90°, and 135°), providing a filter bank of 12 filters. The width of the Gaussian window was set to wavelength/2. The filter design strategy of Manjunath and Ma is to reduce the redundancy in the representation by ensuring that the half-peak magnitude supports of the filter responses cover the frequency spectrum in a desired manner. We used their filter parameters (four scales, six orientations, and the lower and upper center frequencies of interest set to 0.05 and 0.4, respectively), which produced a filter bank of 24 filters. In order to explore the effect of the spatial support of the filter designs, all odd filter sizes from 3×3 to 19×19 pixels were attempted. In the classification we used both the mean and the SD of the magnitude of the filtered images together as texture features. Hence a particular filter size produced 12 (mean) and 24 (mean and SD) features for the MeasTex design, and 24 (mean) and 48 (mean and SD) features for Manjunath and Ma's design. A common approach is to use only the mean, but as we observe from the experimental results, employing also the SD improves the classification accuracy. Again, the features obtained with the nine different mask sizes were combined into large sets of 108/216 (MeasTex) and 216/432 (Manjunath and Ma) features (row 'all' in Table 3).
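The mean-plus-SD feature extraction can be sketched as below. This uses one common Gabor parameterization and the MeasTex-style bank geometry (wavelengths 2, 4, 8; four orientations; sigma = wavelength/2); it is not a reproduction of either the exact MeasTex or the Manjunath-Ma filter code, and the FFT-based circular convolution is a shortcut chosen for brevity.

```python
import numpy as np

def gabor_kernel(wavelength, theta, sigma, size):
    """Complex Gabor kernel: a Gaussian envelope times a complex sinusoid
    with the given wavelength and orientation."""
    half = size // 2
    y, x = np.mgrid[-half : half + 1, -half : half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + yr ** 2) / (2.0 * sigma ** 2))
    carrier = np.exp(2j * np.pi * xr / wavelength)
    return envelope * carrier

def gabor_features(img, wavelengths=(2, 4, 8),
                   thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4), size=11):
    """Mean and SD of the filter-response magnitude for each filter in the
    bank -- the two per-filter statistics used as texture features above."""
    feats = []
    for lam in wavelengths:
        for th in thetas:
            k = gabor_kernel(lam, th, sigma=lam / 2.0, size=size)
            # FFT-based circular convolution keeps the sketch short.
            resp = np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, img.shape))
            mag = np.abs(resp)
            feats.extend([mag.mean(), mag.std()])
    return np.array(feats)

rng = np.random.default_rng(0)
img = rng.random((64, 64))
f = gabor_features(img)
print(f.shape)    # (24,): 12 filters x (mean, SD)
```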
Both the multivariate Gaussian discriminant and the 3-NN classifier were used for classification. When the 3-NN classifier was used, the features were normalized to have unit variance. The results reported in Tables 2 and 3 are for the Gaussian discriminant, as it provided slightly better performance. Because the GMRF and Gabor features extracted with a particular model or filter design are fairly correlated, the best classification accuracy is not necessarily obtained by using all features simultaneously, due to the curse of dimensionality. To maximize
Table 3
Average classification accuracies (%) over the 10 experiments for the Gabor features. The number in parentheses is the SD of the 10 scores

                      MeasTex filter design [17]      Manjunath and Ma's filter design [18]
  Spatial filter size Mean         Mean & SD          Mean         Mean & SD
  3×3                 58.0 (2.1)   83.6 (1.9)         93.9 (0.9)   95.1 (0.7)
  5×5                 86.4 (1.0)   92.3 (0.7)         93.7 (1.1)   94.7 (0.5)
  7×7                 86.1 (0.8)   92.2 (0.5)         93.4 (1.0)   94.5 (0.6)
  9×9                 89.4 (0.8)   93.4 (0.7)         93.0 (0.9)   93.8 (0.6)
  11×11               90.0 (0.6)   93.8 (0.6)         93.4 (0.8)   93.8 (0.8)
  13×13               89.5 (0.8)   93.1 (0.5)         93.1 (0.8)   93.5 (0.6)
  15×15               89.6 (0.7)   92.7 (0.6)         92.4 (0.8)   93.9 (1.0)
  17×17               88.2 (0.7)   91.6 (0.6)         92.7 (1.1)   93.7 (0.7)
  19×19               87.8 (0.9)   91.3 (0.5)         92.3 (1.0)   93.6 (0.6)
  All                 91.8 (1.1)   94.8 (0.5)         95.3 (1.1)   95.9 (0.9)
classification accuracy, a stepwise search for the best feature combinations was performed, including both forward and backward selection of features. Since feature selection was optimized with respect to the classification accuracy on the test data, the results obtained can be slightly biased.
The poor results for the individual GMRF models indicate that they are not suitable for this problem, mainly due to the crude normalization of the gray scale. A subset including on average 13–14 features from the combined feature set provides a decent average accuracy of 90.0% (SD 1.2%). The results for the Gabor features are more interesting. We observe that the 'optimized' filter design of Manjunath and Ma does indeed do better than MeasTex's generic design and is also less sensitive to changes in spatial support, although it is difficult to say to what extent the better performance can be attributed to the larger number of filters in Manjunath and Ma's design. Interestingly, in their design the best average accuracy is achieved with the smallest spatial filter size of 3×3 pixels. With this small kernel the filters mostly detect primitives such as edges and lines occurring at a particular orientation and scale, rather than frequency information. Our results raise the question of whether too-large filters (e.g. 17×17) are often used by default, considering that a larger filter is computationally more expensive as well.
As expected, using the deviation of the magnitude of the filtered images in addition to the mean improves classification accuracy, especially in the case of MeasTex's design. This is easy to understand, as the magnitude distributions of several filtered images may have identical means, but not necessarily identical deviations. The utilization of the magnitude distribution can be extended further by using the complete distribution instead of some parameters computed from it, as is done by Puzicha et al. [19], for example.
Regarding feature selection, the best result is obtained on average with 9 (mean) and 11 (mean and SD) features for the MeasTex design, and with 11 (mean) and 12 (mean and SD) features for Manjunath and Ma's design. In the case of the combined feature sets the best subset determined by the stepwise search algorithm contained on average 13 or 14 features.
The results for the large combined feature sets show that using several filter sizes simultaneously improves the classification accuracy somewhat over individual filter banks. However, it is difficult to judge whether the small improvement justifies the increase in computational overhead. In comparison to signed differences, none of the filter combinations can quite match the accuracy of operator p₈. The best subsets of the combined set of 432 features of Manjunath and Ma's design come closest, with an average of 95.9% (SD 0.9%), but it goes without saying that there is no comparison in the computational complexity of these two approaches.
4. Texture segmentation experiments
As another test bench for our approach we employed the supervised segmentation problems used in the recent comparative study of Randen and Husøy [14]. They reviewed most major filtering approaches to texture feature extraction: Laws masks, ring/wedge filters, dyadic Gabor filter banks, wavelet transforms, wavelet packets and frames, quadrature mirror filters, the discrete cosine transform, eigenfilters, optimized Gabor filters, linear predictors, and optimized finite impulse response filters. For reference they also included two classical nonfiltering approaches, cooccurrence and autoregressive features. In order to obtain comparable results we followed the experimental setup of Randen and Husøy as closely as possible.
4.1. Image data
The experiment involves images from three different sources: the Brodatz album [11], the MIT Vision Texture database [20], and the MeasTex database [17]. Consequently, images captured with different equipment and under different conditions are used. In order to minimize discrimination by gray-level properties, the source images were globally histogram-equalized before the separate portions used for training and testing purposes were extracted.

Fig. 4 shows the 12 texture segmentation problems that were constructed from the source images. We see that the mosaics vary in terms of texture content and layout. Mosaics C6 and C7 are notably challenging 16-class problems with a very complicated boundary structure. The size, number of texture classes, and image source of each mosaic are given in Table 4. For each texture present in a test mosaic there is a 256×256 training image that is extracted from a different area of the source image, so that an unbiased error estimate is obtained.
Since the source images were globally histogram-equalized prior to being used, the gray-level mean and SD of a training image and the corresponding texture in the test mosaic are roughly equal. However, since the training and testing portions were extracted from different locations in the large source image, there are a few notable, even visible, differences in the gray-level properties between the training and testing portions of some texture classes in certain mosaics. For example, in mosaic C6 the textures in the lower left-hand corner are clearly lighter than their training versions. A different mean gray level is not a problem if the texture feature is invariant against changes in the mean, as signed differences are, but changes in gray scale can result in serious performance loss. In terms of exact numbers, for example, in mosaic C5 the gray-scale deviation of the training portion of a particular texture is 53.1, whereas the deviation of the same texture in the test mosaic is 75.9. This difference may seem insignificant, but in fact it easily throws off balance any texture feature that is not invariant against changes in gray scale. This will become apparent when we examine the segmentation results in more detail in Section 4.3.

Fig. 4. Texture mosaics used in the segmentation experiments. All mosaics are printed at 200 dpi.

We can conclude that the 12 mosaics constitute a very realistic and challenging set of segmentation problems.
This was already demonstrated by the results of Randen and Husøy, as none of the several dozen texture features they evaluated could get below a 20% error rate in five of the 12 cases.
4.2. Segmentation principle
The segmentation process comprises three steps: partitioning of the difference space, computation of texture models from the training images, and pixel-wise segmentation of the test mosaic. The difference space was partitioned with a codebook of N codewords. The codebook was trained with the standard optimized LVQ1 training algorithm, by selecting one fourth of the vectors present in the training images. Texture models were obtained by searching for the nearest codeword to each difference vector present in the training images and incrementing the bin denoted by the index of the nearest codeword. Pixel-wise segmentation was done by centering a circular disk of radius r at the pixel being segmented, computing the sample histogram over the disk, and assigning the pixel to the class whose model was most similar to the sample (Eq. (6)). Near the image borders only the portion of the disk that was inside the image was considered. No postprocessing, e.g. removal of improbably small regions, was applied.
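The pixel-wise step can be sketched as follows. The sketch assumes the codebook stage is already done, so the input is a map of per-pixel codeword indices plus the class model histograms; the two-texture toy data below is hypothetical, and the brute-force loops stand in for a more efficient sliding-histogram implementation.

```python
import numpy as np

def segment(code_map, models, radius, eps=1e-10):
    """Pixel-wise segmentation sketch: for each pixel, build the histogram of
    codeword indices over a disk of the given radius (clipped at the image
    borders) and pick the class maximizing the log-likelihood of Eq. (6)."""
    h, w = code_map.shape
    n_bins = models.shape[1]
    log_m = np.log(models + eps)
    ys, xs = np.mgrid[-radius : radius + 1, -radius : radius + 1]
    disk = [(dy, dx) for dy, dx in zip(ys.ravel(), xs.ravel())
            if dy * dy + dx * dx <= radius * radius]
    out = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            hist = np.zeros(n_bins)
            for dy, dx in disk:
                yy, xx = y + dy, x + dx
                if 0 <= yy < h and 0 <= xx < w:   # clip disk at image borders
                    hist[code_map[yy, xx]] += 1
            out[y, x] = int(np.argmax(log_m @ hist))  # Eq. (6) per class
    return out

# Toy example: left and right halves prefer different codewords.
rng = np.random.default_rng(0)
code_map = np.where(np.arange(32)[None, :] < 16,
                    rng.choice(4, (32, 32), p=[0.7, 0.1, 0.1, 0.1]),
                    rng.choice(4, (32, 32), p=[0.1, 0.1, 0.1, 0.7]))
models = np.array([[0.7, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.7]])
labels = segment(code_map, models, radius=5)
```

Using raw counts in the histogram rather than probabilities leaves the argmax of Eq. (6) unchanged, since the two differ only by a positive scale factor.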
The relationship between N, the number of bins (codewords), and r, the radius of the sampling disk, is of special importance, as we want the sample histograms to be statistically reliable. If f is the required average number of entries per histogram bin, we can establish the following relationship:

N = πr² / f.  (13)
In the experiments we relaxed f to five from the value of 10 used in the texture classification experiments. A related and equally important factor is the size of the sampling disk. A disk that is too small is not able to capture the properties of local image texture, thus producing a noisy segmentation result. On the other hand, a disk that is too large, in addition to being computationally more expensive, is not able to locate texture boundaries accurately. There is no single optimal size for the disk, since it depends on the structure of the mosaic and the properties of the textures. Fig. 5 illustrates the segmentation error (proportion of mislabeled pixels) in mosaic C1 as a function of r for the operators p₂ and LBP. We see that the error rate first decreases as the histogram computed over the disk becomes more discriminative, reaches a minimum, and then starts slowly increasing as the larger disk results in more mislabeled pixels along texture boundaries. The slightly more ragged appearance of p₂'s curve is attributable to each different-sized disk having a distinct quantization of the difference space due to the varying N (Eq. (13)). We used the value of 19 for r in the experiments. Consequently, the number of codewords was 226.

Fig. 5. Segmentation error in mosaic C1 as a function of the radius of the sampling disk for the operators p₂ (solid) and LBP (dashed).
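The choice of 226 codewords follows directly from Eq. (13) with the values stated above:

```python
import math

# Eq. (13): a disk of radius r contains about pi*r^2 difference vectors;
# with f entries required per bin this supports N = pi*r^2 / f bins.
r, f = 19, 5
N = math.pi * r * r / f
print(int(N))   # 226
```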
In the case of LBP there is no need to quantize the feature space; texture models are obtained straightforwardly by computing the LBP histograms of the training images. Pixel-wise segmentation was performed by comparing the LBP histogram of the disk to the models.
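The LBP-based procedure can be sketched as follows. This is an illustrative simplification, not the paper's implementation: the 3×3 LBP coding is standard, but the histogram-intersection similarity used here merely stands in for the comparison statistic defined in Section 4.2, and border handling is ignored.

```python
import numpy as np

def lbp_codes(img):
    """Basic 3x3 LBP: threshold the 8 neighbors of each interior pixel at the
    value of the center pixel and weight the resulting bits by powers of two,
    giving one code in 0..255 per interior pixel."""
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        n = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes += (n >= c).astype(np.int32) << bit
    return codes

def classify_pixel(disk_codes, models):
    """Label a pixel by comparing the normalized LBP histogram of its sampling
    disk to each training-model histogram; histogram intersection is used here
    as a stand-in similarity measure."""
    h = np.bincount(disk_codes.ravel(), minlength=256).astype(float)
    h /= h.sum()
    scores = [np.minimum(h, m).sum() for m in models]
    return int(np.argmax(scores))
```

A model histogram is simply `np.bincount(lbp_codes(train_img).ravel(), minlength=256)` normalized to unit sum; at segmentation time each pixel receives the label of the most similar model.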
4.3. Experimental results
The 12 mosaics illustrated in Fig. 4 were segmented using the supervised principle described in Section 4.2. As the criterion for measuring the goodness of a segmentation result we used the percentage of mislabeled pixels. The error rates for p₂, p₄, p₈ and LBP are given in Table 4. For reference, the best result obtained by Randen and Husoy is given, together with the name of the corresponding feature extractor. The best result achieved for each mosaic is highlighted, and the average error rate over the 12 problems is also provided.
We observe that the best signed-difference operator or LBP outperforms the best result of Randen and Husoy in 11 of the 12 cases. The lone exception is mosaic C11, for which the 32-tap FIR filter ranks first. Even the simple LBP ranks better than any approach of Randen and Husoy for 10 mosaics. In many cases the improvement over the results of Randen and Husoy is considerable.

Regarding the relative performance of p₂, p₄, and p₈, we would generally expect p₈ to be the most powerful, as it incorporates the largest number of cooccurring differences, and p₂ correspondingly the least powerful. According to the average error rates this indeed seems to be the case, even though there are some disparities with certain mosaics.

Table 4
Description of mosaics and segmentation results (error rates in %; the best result for each mosaic is marked with an asterisk)

No.   Size     Classes  Source    p₂     p₄     p₈     LBP     Best of Randen & Husoy [14]
C1    256×256  5        Brodatz   8.0    7.1    7.4    6.0*    7.2   Opt. repr. Gabor filter bank
C2    256×256  5        MIT       30.0   12.9   12.8*  18.0    18.9  f16b (d) (full rate)
C3    256×256  5        MIT       13.6   16.3   15.9   12.1*   20.6  F-2-1-smpl (d) (full rate)
C4    256×256  5        MIT       26.7   21.4   18.4   9.7*    16.8  f32d (d) (full rate)
C5    256×256  5        MeasTex   25.9   25.7   16.6   11.4*   17.2  f16b (d) (full rate)
C6    512×512  16       Brodatz   31.1   28.8   27.7   17.0*   34.7  Prediction error filter
C7    512×512  16       MIT       36.6   35.3   33.3   20.7*   41.7  f16b (d) (full rate)
C8    256×640  10       Brodatz   22.2   18.6   17.6*  22.7    32.3  Dyadic Gabor filter bank
C9    256×640  10       MIT       29.0   20.3   18.2*  19.4    27.8  F-2-1-smpl (d) (full rate)
C10   256×512  2        Brodatz   0.7    0.7    0.8    0.3*    0.7   J, J, J
C11   256×512  2        Brodatz   1.8    1.6    1.5    1.0     0.2*  f32d (a) (full rate)
C12   256×512  2        Brodatz   2.2    1.7*   1.9    10.6    2.5   DCT
Average error rate                19.0   15.9   14.3   12.4    18.4
We see that LBP, despite its simplicity, provides the lowest error rate of all operators in seven of the 12 cases. This impressive result is attributable to the gray-scale invariance of the LBP operator, an understandably useful property when the gray-scale properties of the unknown test sample differ from those of the training data, as is the case in most of the 12 mosaics used in this study. Signed differences, in contrast, are by definition sensitive to changes in gray scale, as are most of the feature extractors evaluated by Randen and Husoy.
This is visualized in Fig. 6, which shows the mislabeled pixels when mosaic C5 is segmented using LBP (11.4% error) and p₈ (16.6% error). We see that in comparison with LBP, p₈ generally does better elsewhere but fails badly in the middle of the mosaic, where the texture in the test mosaic has a considerably different gray-level deviation (75.9) from that of the training image (53.1). We can address this problem by normalizing the gray scales of the training and testing images, either globally or locally. Fig. 6(c) shows the improved segmentation result of 8.9% for p₈ when the images are globally normalized to have identical gray-scale deviation. Similarly, Fig. 6(d) illustrates how local normalization of the gray scale within the area of the disk decreases the segmentation error to 7.7%. Unfortunately, this type of normalization is not a universal solution to the gray-scale dependency of signed gray-level differences, because the outcome depends on the gray-scale properties of the training and testing images. For example, in the case of mosaic C4 global normalization increases the segmentation error considerably, while local normalization does not produce any significant change in the overall segmentation accuracy. Normalization also introduces computational overhead, and it is up to the application whether this overhead is acceptable.

Fig. 6. The mislabeled pixels in the segmentation of mosaic C5 using (a) LBP (11.4% error) and (b) p₈ (16.6% error). The result of p₈ can be improved by normalizing the gray scale either (c) globally (8.9% error) or (d) locally (7.7% error).
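The two normalization strategies discussed above can be sketched as follows. This is our own illustrative formulation, not the paper's code: global normalization rescales the whole image to a target deviation (e.g. that of the training image), while the local variant standardizes each pixel within a sliding window as a crude stand-in for normalization over the sampling disk.

```python
import numpy as np

def normalize_global(img, target_std):
    """Rescale the image so its gray-level standard deviation equals
    target_std (e.g. the deviation of the training image); the mean
    is preserved."""
    m = img.mean()
    return (img - m) * (target_std / img.std()) + m

def normalize_local(img, half):
    """Zero-mean, unit-deviation normalization of each pixel within a
    sliding (2*half+1)^2 window; a small epsilon guards flat windows."""
    out = np.zeros_like(img, dtype=float)
    eps = 1e-9
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            w = img[max(0, y - half):y + half + 1,
                    max(0, x - half):x + half + 1]
            out[y, x] = (img[y, x] - w.mean()) / (w.std() + eps)
    return out
```

For example, normalizing the test image globally to the training deviation of 53.1 removes the mean/deviation mismatch that caused the mid-mosaic failure in Fig. 6, at the cost of one extra pass over the image (or, for the local variant, one windowed pass per pixel).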
Fig. 7 presents the segmentation of the 16-class mosaic C7 using LBP (20.7% error) and p₈ (33.3% error). In terms of error rates, mosaic C7 appears to be the most difficult problem, together with the other 16-class mosaic C6. We see that p₈ fails particularly badly with the texture between the two right-hand circles. It hardly comes as a surprise that the training and testing portions of this texture have quite different deviations (51.8 vs. 71.1). Local and global normalization of the gray scale lower the segmentation error of p₈ to 25.4% and 25.0%, respectively.
Some may find the performance of p₂, p₄, p₈ and LBP surprisingly good, given the small support of 3×3 pixels. One might argue that this operator size is by no means adequate in comparison with, e.g., the much larger Gabor filter masks that are often used. Actually, the 'built-in' support of our operators is inherently larger than 3×3 pixels, as only a specific limited set of difference vectors or binary patterns can occur next to a particular difference vector or binary pattern. Further, the histogram of local operator responses incorporates larger-scale texture properties. If larger-scale analysis is required, it is accomplished simply by increasing the predicate (i.e., the neighborhood size) of the operators. Multi-scale analysis is achieved by combining the outputs of operators computed at different scales, e.g. by concatenating their histograms or by aggregating the similarity scores of the individual scales into the final similarity score.
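The histogram-concatenation route to multi-scale analysis mentioned above can be sketched in a few lines; the per-scale normalization before concatenation is our assumption (it keeps each scale's contribution comparable), not something the paper specifies.

```python
import numpy as np

def multiscale_histogram(code_maps, n_bins=256):
    """Combine operator outputs computed at several spatial scales by
    concatenating their individually normalized histograms into one
    feature vector of length len(code_maps) * n_bins."""
    parts = []
    for codes in code_maps:
        h = np.bincount(np.ravel(codes), minlength=n_bins).astype(float)
        parts.append(h / h.sum())
    return np.concatenate(parts)
```

The alternative mentioned in the text, aggregating per-scale similarity scores, would instead compare each scale's histogram separately and, e.g., sum the resulting scores.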
The importance of spatial scale is demonstrated by the segmentation of mosaic C12, which may appear to be an easy two-class problem. It is actually surprisingly difficult, considering that many of the feature extractors evaluated by Randen and Husoy scored error rates over 10%. The difficulty is attributable to the left-hand texture, which despite its distinct general appearance is confusingly similar to the right-hand texture when examined in small patches. In this case an alternative to multi-scale analysis is to increase the size of the disk, so that it captures the large-scale structure of the textures. Fig. 8 illustrates the segmentation result for LBP with three different disk sizes (r = 20, 40, 60). Since there is only one texture boundary present in the image, we are not penalized much for using a very large disk, and the lowest segmentation error is obtained with the largest disk.
The quantization of the difference space is randomized in the sense that the code vectors are initialized with random values. To explore the stability of the quantization, which naturally affects the repeatability of the segmentation process, we repeated the segmentation of particular mosaics with nine different random seeds, in addition to the constant reference seed used generally in the experiments. We chose three cases with different levels of segmentation error: C7 for its high error rate (p₈ with 33.3%), C10 for its low error rate (p₂ with 0.7%) and C3 as an example of an average error rate (p₂ with 13.6%). In the case of mosaic C7 the error rates of the ten experiments ranged from 33.12 to 33.54, with an average of 33.32 and a deviation of 0.13. For mosaic C10 the scores ranged from 0.54 to 0.76, averaging 0.66 with a deviation of 0.06. Segmentation of mosaic C3 was equally stable, with error rates ranging from 13.31 to 13.80, an average of 13.54 and a deviation of 0.16.
5. Discussion and conclusions

In this paper we proposed a method based on distributions of signed gray-level differences for texture discrimination. As cooccurring differences provide more information about local interpixel dependencies, we propose the use of joint two-, four- and eight-dimensional differences. In comparison with traditional cooccurrence matrices, the multidimensional histograms of signed differences provide a more compact texture description and are not affected by changes in mean luminance. In comparison with the previously used absolute differences, signed differences by definition contain more information about local image texture.

As we observed in Section 3, signed differences are very powerful if the gray-scale properties of the image data are strictly defined. Similarly, the segmentation experiments in Section 4 exposed the gray-scale dependency of signed differences, which can be eliminated to a certain extent with either global or local normalization of the gray scale. However, a more robust analysis can be achieved with the local binary pattern (LBP) operator. LBP is a simplification of the eight-dimensional signed-difference operator p₈, being by definition invariant against any monotonic change in the gray scale. LBP is also computationally efficient, as it requires no a priori quantization of the feature space.

Despite its theoretical simplicity, our approach is very powerful, as the experimental results in difficult texture classification and supervised segmentation problems have demonstrated. Even though we used only one spatial scale of 3×3 pixels to compute the signed differences, the results compare favorably to those of the two mainstream paradigms, Gaussian Markov random fields and Gabor filtering.

In order to improve our method we are currently working on incorporating multiple spatial scales in the representation and on developing a powerful rotation-invariant modification.

Acknowledgements

Financial support provided by the Academy of Finland is gratefully acknowledged. The authors also wish to thank Dr. Trygve Randen from Schlumberger Geco-Prakla for providing the texture mosaics and Mr. Juha Kyllönen from the University of Oulu for his help with the experiments.

Note

Image data used in the experiments can be downloaded from http://www.ee.oulu.fi/research/imag/texture/.
References

[1] R.M. Haralick, Statistical and structural approaches to texture, Proc. IEEE 67 (1979) 786–804.
[2] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Vol. 1, Addison-Wesley, Reading, MA, 1992.
[3] T.R. Reed, J.M.H. Du Buf, A review of recent texture segmentation and feature extraction techniques, CVGIP: Image Understanding 57 (1993) 359–372.
[4] M. Tuceryan, A.K. Jain, Texture analysis, in: C.H. Chen, L.F. Pau, P.S.P. Wang (Eds.), Handbook of Pattern Recognition and Computer Vision, World Scientific, Singapore, 1993, pp. 235–276.
[5] L. Van Gool, P. Dewaele, A. Oosterlinck, Texture analysis anno 1983, Comput. Vision Graphics Image Process. 29 (1985) 336–357.
[6] J. Weszka, C. Dyer, A. Rosenfeld, A comparative study of texture measures for terrain classification, IEEE Trans. Systems Man Cybernet. 6 (1976) 269–285.
[7] R.W. Conners, C.A. Harlow, A theoretical comparison of texture algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 2 (1980) 204–222.
[8] L. Siew, R. Hodgson, E. Wood, Texture measures for carpet wear assessment, IEEE Trans. Pattern Anal. Mach. Intell. 10 (1988) 92–105.
[9] D. Chetverikov, GLDH based analysis of texture anisotropy and symmetry: an experimental study, Proceedings of the 12th International Conference on Pattern Recognition, Jerusalem, Israel, Vol. 1, 1994, pp. 444–448.
[10] M. Unser, Sum and difference histograms for texture classification, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1986) 118–125.
[11] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, 1966.
[12] T. Ojala, M. Pietikäinen, D. Harwood, A comparative study of texture measures with classification based on feature distributions, Pattern Recognition 29 (1996) 51–59.
[13] K. Valkealahti, E. Oja, Reduced multidimensional cooccurrence histograms in texture classification, IEEE Trans. Pattern Anal. Mach. Intell. 20 (1998) 90–94.
[14] T. Randen, J.H. Husoy, Filtering for texture classification: a comparative study, IEEE Trans. Pattern Anal. Mach. Intell. 21 (1999) 291–310, http://www.ux.his.no/~tranden/.
[15] T. Ojala, M. Pietikäinen, J. Kyllönen, Cooccurrence histograms via learning vector quantization, Proceedings of the 11th Scandinavian Conference on Image Analysis, Kangerlussuaq, Greenland, 1999, pp. 103–108.
[16] T. Kohonen, J. Kangas, J. Laaksonen, K. Torkkola, LVQ-PAK: a program package for the correct application of learning vector quantization algorithms, Proceedings of the International Joint Conference on Neural Networks, Baltimore, 1992, pp. 1725–1730.
[17] G. Smith, I. Burns, Measuring texture classification algorithms, Pattern Recognition Lett. 18 (1997) 1495–1501, http://www.cssip.elec.uq.edu.au/~guy/meastex/meastex.html.
[18] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (1996) 837–842, http://vivaldi.ece.ucsb.edu/users/wei/code_gabor/.
[19] J. Puzicha, T. Hofmann, J.M. Buhmann, Histogram clustering for unsupervised segmentation and image retrieval, Pattern Recognition Lett. 20 (1999) 899–909.
[20] MIT Vision and Modelling Group, 1998, http://www.media.mit.edu/vismod.
About the Author – TIMO OJALA received his Dr.Tech. degree in 1997 from the University of Oulu, where he is currently working as the associate director of the MediaTeam Oulu research group and as a research manager in the Machine Vision and Media Processing Unit. Dr. Ojala's research interests include pattern recognition, machine vision, and multimedia communications.

About the Author – KIMMO VALKEALAHTI received his Dr.Tech. in computer science from Helsinki University of Technology in 1998. Currently, he is with the Radio Communications Laboratory at Nokia Research Center, Helsinki, Finland. He is the author of several journal and conference papers on pattern recognition, image analysis, and neural networks. His present research interests include the optimization of radio resource management in third-generation mobile networks.

About the Author – ERKKI OJA received his Dr.Tech. degree in 1977 from Helsinki University of Technology, Finland, where he is presently Professor of Computer Science at the Laboratory of Computer and Information Science. Dr. Oja is the author of over 200 articles and book chapters on pattern recognition, computer vision, and neural computing, and the book "Subspace Methods of Pattern Recognition", which has been translated into Chinese and Japanese. His research interests are in the areas of principal components, independent components, self-organization, statistical pattern recognition, and applying artificial neural networks to computer vision and signal processing. Dr. Oja is a member of the Finnish Academy of Sciences, past chairman of the Finnish Pattern Recognition Society, member of the Governing Board of the International Association for Pattern Recognition (IAPR), IAPR Founding Fellow, President of the European Neural Network Society (ENNS), and IEEE Fellow. He is a member of the editorial boards of several journals, including "Neural Computation", "IEEE Transactions on Neural Networks", and "Int. Journal of Pattern Recognition and Artificial Intelligence".

About the Author – MATTI PIETIKÄINEN received his Dr.Tech. degree in Electrical Engineering from the University of Oulu, Finland, in 1982. Currently he is Professor of Information Technology, Scientific Director of the Infotech Oulu research center, and Head of the Machine Vision and Media Processing Unit at the University of Oulu. From 1980 to 1981 and from 1984 to 1985 he visited the Computer Vision Laboratory at the University of Maryland, USA. His research interests cover wide aspects of machine vision, including texture analysis, color vision and document analysis. His research has been widely published in journals, books and conferences. He is the editor of the books "Machine Vision for Advanced Production" (with L.F. Pau) and "Texture Analysis in Machine Vision", published by World Scientific in 1996 and 2000, respectively. Prof. Pietikäinen is a Founding Fellow of the International Association for Pattern Recognition (IAPR) and a Senior Member of IEEE, and serves as a Member of the Governing Board of IAPR. He also serves on the program committees of several international conferences.