The Effect of Prior Probabilities in the
Maximum Likelihood Classification on
Individual Classes: A Theoretical Reasoning
and Empirical Testing
Zheng Mingguo, Cai Qianguo, and Qin Mingzhou
Abstract
The effect of prior probabilities in the maximum likelihood
classification on individual classes has received little attention;
this paper addresses it. Prior probabilities are designed only for
overlapping spectral signatures. Accordingly, their effect on an
individual class is independent of the classes that are spectrally
separable from this class. The theoretical reasoning reveals that
an increased prior probability, which shifts the decision boundary
away from the class mean, will increase the number of pixels
assigned to the class and boost its producer's accuracy as compared
to the use of equal priors; though the change of the user's accuracy
is not constant, it is expected to decrease in most cases. The
tendency is just the opposite when a lower prior probability is
used. A case study was conducted using Landsat TM data provided
along with ERDAS Imagine® software. Two further pieces of evidence
derived from the published literature are also presented.
Introduction
The maximum likelihood classification (MLC) is the most
widely used method of classifying remotely sensed data
(Maselli et al., 1992). In standard digital image processing, MLC has been considered to be the most advanced
classification strategy for a long time (Maselli et al.,
1992). One of the superior properties of the MLC algorithm is that it can make use of the prior probabilities
derived from ancillary information concerning the area to
be classified, so that remotely sensed data can be integrated with data collected conventionally (Maselli et al.,
1990). By helping to resolve confusion among classes
that are poorly separable (McIver and Friedl, 2002),
prior probabilities can be a powerful and effective aid to
improve classification accuracy (Strahler, 1980). It is
desirable to obtain a reliable prior probability for each
class and use it to at least classify the pixels likely to be
misclassified (Ediriwickrema and Khorram, 1997). However, it is
common practice to perform the MLC with equal prior probabilities,
as reliable prior probabilities are not always available.

Zheng Mingguo and Cai Qianguo are with the Key Laboratory of Water
Cycle and Related Land Surface Processes, Institute of Geographic
Sciences & Natural Resources Research, Chinese Academy of Sciences,
Beijing 100101, China ([email protected]).

Qin Mingzhou is with the College of Environment and Planning,
Henan University, Kaifeng 475001, China.
Up until now, a large number of studies have been
carried out on the incorporation of prior probabilities
into MLC (e.g., Maselli et al., 1990; Maselli et al., 1992;
Strahler, 1980; Gorte and Stein, 1998; Pedroni, 2003).
These studies are generally dedicated to an improvement
in the general performance of the classifier over the
entire image under examination and frequently culminate
in reporting an improved overall accuracy or Kappa
coefficient. However, overall accuracy or Kappa coefficient represents an average accuracy for all classes, and
provides little information about individual classes. As
compared to the use of equal priors, a higher overall
classification accuracy is expected if prior probabilities are
properly estimated. However, this does not mean a synchronous improvement in the classification accuracy for
all individual classes. In many cases, though an increased
Kappa coefficient or overall accuracy can be achieved by
inserting prior probabilities into MLC, the producer's or
user's accuracy decreases for some individual classes while
it increases for others. Obviously, not all
classes in a case are of interest; sometimes the purpose of
a classification is simply to obtain information about one
or several individual classes. Therefore, the effect of prior
probabilities on individual classes deserves some attention. The objective of this paper is to understand how
prior probabilities affect the MLC discrimination process
and particularly to investigate the effect of prior probabilities on individual classes. One section of the present
paper is dedicated to the discussion of the effect of prior
probabilities on the MLC discrimination process. Further, a
mathematical reasoning is carried out and some general
rules are established on the variation in the classification
accuracy of individual classes after the incorporation of
prior probabilities into MLC. Finally, a case study is
conducted to test these rules using Landsat TM data
provided along with ERDAS Imagine® software.
Photogrammetric Engineering & Remote Sensing, Vol. 75, No. 9, September 2009, pp. 1109–1117.
0099-1112/09/7509–1109/$3.00/0
© 2009 American Society for Photogrammetry and Remote Sensing
Maximum Likelihood Classification
MLC Assuming Equal Prior Probabilities (MLCEP)
The maximum likelihood decision rule is based on a normalized
(Gaussian) estimate of the probability density function of each
class. The discriminant function for MLC without considering any
prior probabilities, or assuming equal prior probabilities, can
be expressed as (Strahler, 1980):

g1i(X) = P(X|vi)    (1)

where vi stands for class i, g1i(X) stands for the discriminant
function for vi, X is a pixel vector, and P(X|vi) is the
probability density function for pixel vector X to be a member
of vi. For classification of a pixel vector X using MLC, g1i(X)
must be calculated for each class; X is then assigned to the
class for which g1i(X) is the largest.
This process takes into account not only the marginal
properties of the data sets but also their internal relationships. This is one of the reasons for the great robustness of
the process and for its relative insensitivity to distribution
anomalies (Maselli et al., 1992). However, this process
assigns a specific pixel to a class based only on the similarities between the pixel vector and predefined prototypes.
This leads to a suboptimal performance of the MLC, especially in the presence of a large spectral overlap among the
classes being considered. This will be detailed below.
MLC using Actual Prior Probabilities (MLCPP)
The maximum likelihood decision rule can be modified
easily to take into account prior probabilities, which are
simply the expected area proportions of the classes in a
particular scene or stratum (Pedroni, 2003). These prior
probabilities are sometimes termed “weights” since the
modified classification rule will tend to weigh more heavily
those classes with higher prior probabilities (Strahler, 1980).
P(vi), the prior probability of vi, can be inserted into a
maximum likelihood process by modifying the discriminant
function in the following way (Strahler, 1980):

g2i(X) = P(X|vi)·P(vi).    (2)

When prior probabilities are assumed equal or removed, Equation 2
is identical to Equation 1. g2i(X) is directly related to the
a posteriori probability, which is a relative probability that a
pixel belongs to a given class (Strahler, 1980; Huang and Ridd,
2002). In this way, pixels are allocated to their most likely
class of membership.
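To make Equations 1 and 2 concrete, the decision rule can be sketched as follows. This is an illustrative sketch only, not code from the paper: the Gaussian form of P(X|vi), the function name, and the log-domain formulation are our own choices, and taking logarithms does not change which class attains the maximum.

```python
import numpy as np

def mlc_classify(X, means, covs, priors=None):
    """Assign each pixel vector in X to the class with the largest
    discriminant value. With priors=None this corresponds to MLCEP
    (Equation 1); with class priors supplied, to MLCPP (Equation 2)."""
    n_pixels, dim = X.shape
    n_classes = len(means)
    g = np.empty((n_pixels, n_classes))
    for i in range(n_classes):
        d = X - means[i]                       # deviations from the class mean
        inv = np.linalg.inv(covs[i])
        # log of the multivariate Gaussian density P(X | v_i)
        g[:, i] = (-0.5 * np.einsum('nj,jk,nk->n', d, inv, d)
                   - 0.5 * np.log(np.linalg.det(covs[i]))
                   - 0.5 * dim * np.log(2 * np.pi))
        if priors is not None:                 # Equation 2: add log P(v_i)
            g[:, i] += np.log(priors[i])
    return np.argmax(g, axis=1)

# Two overlapping one-dimensional classes; the pixel 0.9 lies between them.
means = [np.array([0.0]), np.array([2.0])]
covs  = [np.array([[1.0]]), np.array([[1.0]])]
pixel = np.array([[0.9]])
label_equal  = mlc_classify(pixel, means, covs)               # MLCEP
label_priors = mlc_classify(pixel, means, covs, [0.1, 0.9])   # MLCPP
```

With equal priors the pixel goes to the nearer class; a strong prior for the second class pulls the decision boundary past the pixel and flips the label, which is exactly the boundary-shifting behavior discussed below.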
When prior probabilities are used in the maximum
likelihood decision rule, even the class mean vector may be
classified as a different class. The more extreme the values
of the prior probabilities, the less important are the actual
observation values (Strahler, 1980). Strahler (1980) and
Pedroni (2003) hold that prior probabilities tend to produce
larger volumes for classes that are expected to be large and
smaller volumes for classes expected to be small. However, the
following analysis does not fully support this view. For example,
even for the class with the next-largest anticipated size, the
number of pixels assigned to it may still decrease if it happens
to have large spectral confusion with the class with the largest
anticipated size.
Theoretical Analysis of the Effect of Prior Probabilities
on the Maximum Likelihood Classification
The Effect of Prior Probabilities on the Discrimination Process
Figure 1. An example illustrating the effects of prior
probabilities in one-dimensional feature space (modified from
Zheng et al. (2005)). Classes v1 and vj are two classes considered
during the classification procedure. P(X|v1) and P(X|vj) are their
probability density functions. X, Xa, Xt, and Xe are
one-dimensional feature vectors in this case. Points z and y are
points on the curves of P(X|v1) and P(X|vj), respectively. Line
XeXe represents the decision boundary when prior probabilities are
assumed equal; line XaXa represents the decision boundary when the
decision boundary shifts away from the mean of class vj; line XtXt
represents the decision boundary when the decision boundary shifts
towards the mean of class vj. The value of P(X|vj) at point a is
extremely small, so point a can be considered as the lower limit
of the distribution of class vj in this feature (for instance, a
spectral band); also, point b can be considered as the upper limit
of the distribution of class v1. Thus, the spectral overlap between
these two classes can be considered to occur roughly within the
interval (a, b). A more detailed explanation can be found in
the text.

Figure 1 illustrates the effects of prior probabilities in
one-dimensional feature space. Classes v1 and vj are the two
classes considered during the classification procedure, and in this
section it is assumed that class vj is only significantly mixed
spectrally with class v1. It should be noted that P(X|v1) and
P(X|vj) are actually discrete, and they are plotted in the style
of a continuous function simply for a clear presentation; this
amounts to the assumption of an infinitesimal radiometric
resolution. P(X|v1) is equal to P(X|vj) at Xe; the decision
boundary is therefore located at XeXe when MLCEP is performed. It
is assumed that Xz = 2Xy, i.e., that P(X|v1) = 2P(X|vj) at vector
X; this implies that the probability that vector X occurs in v1
is twice as large as the probability that it occurs in vj. As a
result, pixel vector X is assigned to v1. However, this decision
cannot lead to an optimal result. For example, suppose that the
expected area proportions of v1 and vj are 10 percent and
30 percent, respectively. Since the expected number of pixels of
a class at vector X is proportional to the product of its area
proportion and its probability density, the number of pixels in
vj at vector X is expected to be 0.30·P(X|vj) / 0.10·P(X|v1) = 1.5
times the number in v1; vj is therefore favored for vector X
over v1.
Suppose that the expected area proportions above of v1 and vj are
assigned as their prior probabilities. In addition, simply for a
clear presentation in Figure 1, we expand the discriminant
functions of the MLCPP for all classes by a factor of ten, which
obviously has no effect on the discrimination process or the
classification results. Now the discriminant functions for v1 and
vj are P(X|v1) and 3P(X|vj), respectively. Consequently, as shown
in Figure 1, the decision boundary will shift away from the mean
of class vj, from XeXe to XaXa. This implies that to the right of
line XaXa, class vj is expected to possess more pixels than class
v1 at any pixel vector, and to the left of line XaXa, class v1 is
expected to possess more pixels than class vj at any vector.
Indeed, the right-hand term of Equation 2, if multiplied by the
total number of pixels in the image, is simply the expected number
of pixels of one class at the examined vector. Pixel vector X is
then assigned to vj; the MLCPP procedure is therefore expected to
produce more correct assignments than the MLCEP (Zheng et al., 2005).

In Figure 1, the spectral overlap between classes v1 and vj can be
considered to occur roughly within the interval (a, b). Hence, it
is reasonable to assume that the decision boundary will not go
beyond the interval (a, b) unless an extreme prior probability is
assigned. Accordingly, in Figure 2, though any modification in
prior probabilities will always be accompanied by a change of the
border between classes A and B on the classified image derived from
MLCPP, the border will generally not move outside the zone aa′b′b.
For the same reason, the assigned class labels for points a1 and b1
in Figure 2 remain unchanged regardless of the assigned values of
the prior probabilities, unless they have an overlapping spectral
signal as in the zone aa′b′b. Therefore, it can be concluded that
prior probabilities in MLC are designed only for overlapping
spectral signatures and have no effect where the spectral
signatures are widely separated (Pedroni, 2003). As a result, the
effect of prior probabilities on one individual class is
independent of the classes that are spectrally separable from it,
and is determined by both its assigned prior probability
(anticipated size) and the prior probabilities assigned to the
other classes having overlapping signatures with it.

Figure 2. A virtual image illustrating the effects of prior
probabilities. The area to the left of line aa′ is attributed to
Class A and the area to the right of line bb′ is attributed to
Class B. The uncertainty occurs in the zone aa′b′b. Points a1
and b1 are isolated pixels.

The Effect of the Shift of the Decision Boundary on the Classification Result
Producer's accuracy and user's accuracy are often used to measure
the classification accuracy for an individual class. Producer's
accuracy indicates the probability of a reference pixel being
correctly classified and is a measure of omission error. User's
accuracy is indicative of the probability that a pixel classified
on the map actually represents that class on the ground and is a
measure of commission error (Congalton, 1991).

Clearly, the effect of prior probabilities is achieved by shifting
the decision boundary in the feature space, as will be discussed
below in terms of two sub-cases, i.e., shifting away from the
class mean and shifting towards the class mean. Class vj in
Figure 1 will be used to illustrate the effect in the following
discussion. However, in this section, it is assumed that more than
one class is spectrally confused with vj and only one of them,
namely v1, is plotted in Figure 1. For most individual classes,
there are expected to be two decision boundaries in the
one-dimensional case. To decrease the complexity of the subsequent
reasoning, the right boundary of class vj in Figure 1 is assumed
to be fixed and located at positive infinity along the X axis.
Accordingly, the decision boundary of class vj in the discussion
below refers simply to its left decision boundary. Actually, the
conclusions remain the same wherever the right boundary of class
vj is assumed to be located.

It is also assumed that when MLCEP is performed, the decision
boundary of class vj is located at XeXe as shown in Figure 1. The
resultant user's accuracy (UA(j)) and producer's accuracy (PA(j))
for vj are:

UA(j) = Xjj / Σ_{i=1}^{n} Xji = N·aj·Pj / Σ_{i=1}^{n} N·ai·Pi = Pj / Σ_{i=1}^{n} (ai/aj)·Pi    (3a)

PA(j) = Xjj / (N·aj) = Pj    (3b)
where N is the total number of pixels in the image to be
classified and n is the total number of classes considered; ai is
the relative area proportion of class vi; Xji represents the
number of pixels classified as vj that truly belong to vi; and Pi
represents the area under the probability density curve of class
vi in the interval (Xe, +∞), i.e., Pi = ∫_{Xe}^{+∞} P(X|vi) dX.

Also, we define Pi^a to represent the area under the probability
density curve of class vi in the interval (Xa, Xe), i.e.,
Pi^a = ∫_{Xa}^{Xe} P(X|vi) dX. When the decision boundary for vj
shifts from line XeXe to XaXa (away from its mean), the user's
accuracy and producer's accuracy for class vj become:

UA_a(j) = (Pj + Pj^a) / Σ_{i=1}^{n} (ai/aj)·(Pi + Pi^a)    (4a)

PA_a(j) = Pj + Pj^a.    (4b)
With the shift of the decision boundary away from the class mean,
class vj will receive more assignments, including more correct
assignments and more incorrect assignments. This will lead to a
larger area and a higher producer's accuracy in the classification
result. Equations 3b and 4b reveal that the producer's accuracy of
class vj is improved by Pj^a when the decision boundary shifts
from line XeXe to XaXa. Unlike the change of the producer's
accuracy, however, since both the numerator and the denominator in
Equation 3a increase, the change of the user's accuracy is not
constant: only when Pj·Σ_{i=1}^{n} ai·Pi^a > Pj^a·Σ_{i=1}^{n} ai·Pi
is UA(j) > UA_a(j); otherwise UA(j) < UA_a(j). Clearly, class vj
generally presents a higher occurrence probability in the interval
(Xe, +∞) than in the interval (Xa, Xe), i.e.,
N·aj·Pj / Σ_{i=1}^{n} N·ai·Pi > N·aj·Pj^a / Σ_{i=1}^{n} N·ai·Pi^a
(N·ai·Pi represents the number of pixels of class vi in the
interval (Xe, +∞), and N·ai·Pi^a represents the number of pixels
of class vi in the interval (Xa, Xe)). Consequently, though the
change of the user's accuracy is data-dependent, the user's
accuracy is expected to decrease in most cases when the decision
boundary shifts away from the class mean.
Especially when only one class, e.g., class v1 in Figure 1,
spectrally overlaps class vj, its user's accuracy will surely
decrease in response to the shift of the decision boundary from
line XeXe to XaXa. That is simply because Pj/P1 is larger than 1
and Pj^a/P1^a is smaller than 1 in this case.
When the decision boundary shifts towards the mean of class vj,
from XeXe to XtXt (Figure 1), the user's accuracy and producer's
accuracy are:

UA_t(j) = (Pj − Pj^t) / Σ_{i=1}^{n} (ai/aj)·(Pi − Pi^t)    (5a)

PA_t(j) = Pj − Pj^t    (5b)

where Pi^t = ∫_{Xe}^{Xt} P(X|vi) dX.
Obviously, with the shift of the decision boundary towards the
class mean, class vj will receive fewer assignments, including
fewer correct assignments and fewer incorrect assignments. This
will lead to a smaller area and a lower producer's accuracy in the
classification result. Equations 3b and 5b reveal that the
producer's accuracy of class vj is decreased by Pj^t in this case.
The change in the user's accuracy is also data-dependent, as both
the numerator and the denominator in Equation 3a become smaller
simultaneously: only when
Pj·Σ_{i=1}^{n} ai·Pi^t > Pj^t·Σ_{i=1}^{n} ai·Pi is
UA(j) < UA_t(j); otherwise UA(j) > UA_t(j). In the interval
(Xe, +∞), P(X|vj) is the greatest among all classes. In the
neighborhood of point Xe, however, at least one class, for example
class v1 in Figure 1, has more or less the same probability
density as vj; in the interval (Xe, +∞), P(X|vj) is significantly
larger than the probability density of all other classes only in
the region somewhat far away from point Xe. This implies that in
the interval (Xe, +∞), the probability that class vj occurs in the
neighborhood of point Xe is not as large as that in the region
somewhat far away from point Xe. Hence, class vj generally
possesses a higher probability of occurring in the interval
(Xe, +∞) than in the interval (Xe, Xt), i.e.,
N·aj·Pj / Σ_{i=1}^{n} N·ai·Pi > N·aj·Pj^t / Σ_{i=1}^{n} N·ai·Pi^t.
Consequently, in most cases, a higher user's accuracy is expected
when the decision boundary shifts towards the class mean.
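The two tendencies above can be checked numerically. The sketch below is our own construction, not from the paper: it models v1 and vj as unit-variance Gaussians with hypothetical means 0 and 3 and assumed true area proportions 0.7 and 0.3, evaluates Pj and P1 as Gaussian tail areas, and varies vj's prior.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def boundary(p1, pj, m1=0.0, mj=3.0):
    """Decision boundary between two unit-variance Gaussian classes
    N(m1, 1) and N(mj, 1) under priors (p1, pj): the point where
    p1*phi(x; m1) = pj*phi(x; mj)."""
    return (m1 + mj) / 2.0 + math.log(p1 / pj) / (mj - m1)

def pa_ua(pj, a1=0.7, aj=0.3, m1=0.0, mj=3.0):
    """PA and UA of class vj when the region [x*, +inf) is assigned
    to vj; a1 and aj are the true area proportions."""
    x = boundary(1.0 - pj, pj, m1, mj)
    Pj = 1.0 - Phi(x - mj)   # fraction of true vj pixels in vj's region
    P1 = 1.0 - Phi(x - m1)   # fraction of true v1 pixels in vj's region
    pa = Pj
    ua = aj * Pj / (a1 * P1 + aj * Pj)
    return pa, ua

# Raising vj's prior moves the boundary away from vj's mean:
pa_low, ua_low = pa_ua(0.2)
pa_high, ua_high = pa_ua(0.5)
# PA rises, while in this two-class case UA falls.
```

The ordering pa_high > pa_low and ua_high < ua_low reproduces the conclusion of this section for the two-class case, where the decrease in user's accuracy is guaranteed.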
The Effect of Prior Probabilities on the Classification Result
As previously discussed, the effect of prior probabilities on
an individual class depends on both its anticipated size and
the anticipated sizes of the other classes spectrally mixed
with it. Provided that prior probabilities are acceptably
estimated, or at least that the estimated prior probabilities
adequately indicate the rank order of the class sizes, if one
class is mostly spectrally mixed with classes having a smaller
size (class MMWSS), the decision boundary will shift away from its
mean when MLCPP is performed; on the contrary, the decision
boundary will shift towards the class mean if one class is mostly
spectrally mixed with classes having a larger size (class MMWLS).
Consequently, for class MMWSS, MLCPP will assign more pixels to
it, accompanied by a higher producer's accuracy and in most cases
a lower user's accuracy compared to MLCEP; for class MMWLS, MLCPP
will assign fewer pixels to it, accompanied by a lower producer's
accuracy and in most cases a higher user's accuracy.
In practice, the true values of prior probabilities are
often unknown. Based on the analysis above, it is known
that the overestimation of a prior probability for a class will
boost the producer’s accuracy and in most cases reduce the
user’s accuracy; and the underestimation will reduce the
producer’s accuracy and in most cases boost the user’s
accuracy, provided that prior probabilities of the other
classes, or at least the classes spectrally mixed with it,
remain constant. Only when the true values of prior probabilities are used, can a global optimization be achieved. This
can also be illustrated using the two-class case in Figure 1.
Suppose that the decision boundary between classes v1 and vj is at
line XaXa when the true values of the prior probabilities are
used. As the decision boundary shifts leftwards, away from the
mean of class vj, the assignment to class vj increases and the
assignment to class v1 decreases all the time, but the cases are
different on the two sides of line XaXa. To the right of line
XaXa, for every shift of the decision boundary, most of the
increased assignments to vj are correct and most of the decreased
assignments to v1 are incorrect; as a result, the total number of
correct assignments keeps increasing. However, once the decision
boundary crosses to the left side of line XaXa, the increased
incorrect assignments outnumber the increased correct assignments
for class vj, and the decreased correct assignments outnumber the
decreased incorrect assignments for class v1; the total number of
correct assignments therefore decreases. Consequently, the maximum
number of correct assignments, and thus the global optimization,
is expected when the decision boundary is located exactly at
line XaXa.
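This global-optimum argument can also be verified numerically. The sketch below is our own two-Gaussian construction (not from the paper), with hypothetical class means 0 and 3, unit variances, and assumed true area proportions 0.7 and 0.3: the boundary implied by the true priors yields a higher overall accuracy than boundaries implied by distorted priors.

```python
import math

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def boundary(p1, pj, m1=0.0, mj=3.0):
    """Decision boundary implied by priors (p1, pj) for two
    unit-variance Gaussian classes N(m1, 1) and N(mj, 1)."""
    return (m1 + mj) / 2.0 + math.log(p1 / pj) / (mj - m1)

def overall_accuracy(t, a1=0.7, aj=0.3, m1=0.0, mj=3.0):
    """Fraction of correctly labeled pixels when everything to the
    left of boundary t is labeled v1 and everything to its right
    is labeled vj; a1 and aj are the true area proportions."""
    return a1 * Phi(t - m1) + aj * (1.0 - Phi(t - mj))

# The boundary derived from the true priors (0.7, 0.3) beats the
# boundaries derived from distorted priors:
acc_true = overall_accuracy(boundary(0.7, 0.3))
for p1, pj in [(0.5, 0.5), (0.9, 0.1), (0.3, 0.7)]:
    assert acc_true >= overall_accuracy(boundary(p1, pj))
```

The derivative of the overall accuracy with respect to the boundary vanishes exactly where the weighted densities are equal, i.e., at the true-prior boundary, which is why the loop's assertion holds for any distorted priors.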
Experiment
The “lanier.img” file provided along with ERDAS Imagine®
software was selected to test the validity of the established
rules. A reference classification map, the “inlandc.img” file,
is also provided with the software. Lanier.img is a Landsat
TM image of Lake Lanier, Georgia, obtained by the Landsat-5
sensor. It is 512 columns and 512 rows in size and there are
ten classes present on it: deciduous, pine, deciduous/pine
mixed, water, agriculture, bare ground, grass, urban/developed, shade, and cloud. Actually, the three vegetation
classes are mixed with each other and are difficult to
distinguish on the image. To ensure the correctness of the
reference, they were merged into one single class named
“vegetation.” Hence, the reference data involves eight classes
and so do the two classification processes of MLCEP and
MLCPP. For MLCPP, prior probabilities were derived from the
class area proportions in the recoded inlandc.img file. The
two classifications were undertaken using the same training
set, which was selected directly from the recoded
inlandc.img file. The classified images derived from these
two classification procedures are presented in Figure 3. No
cosmetic filtering was applied to them. In view of the fact
that the sampled reference data may not give a completely
faithful reflection of the classification, the accuracy assessment was carried out with the whole image. By referring to
the whole recoded inlandc.img file, two confusion matrices
were generated (Tables 1 and 2).
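As a quick cross-check, the producer's and user's accuracies can be recomputed directly from a confusion matrix. The sketch below transcribes the Table 1 (MLCEP) counts; the orientation (rows as classified classes, columns as reference classes) is inferred from the accuracy values reported in Table 3.

```python
import numpy as np

# Table 1 confusion matrix of the MLCEP result, class order:
# vegetation, water, agriculture, bare ground, grass,
# urban/developed, shade, cloud.
cm = np.array([
    [152113,   184,  1779,    0,    0,    0,   26,   0],
    [   110, 20864,     0,    0,    0,    0,  659,   0],
    [ 14557,    34, 23712,   71, 1707,  998,    0,   0],
    [  6526,  1418,   457, 6229, 2147, 1184,  104,   1],
    [   435,     3,   699,  110, 7829,  147,    0,   0],
    [  1085,     2,   311,  976,  767, 9217,   11,   0],
    [   305,   409,     0,    0,    0,    2, 2397,   0],
    [  1856,   156,    28,   14,   68,   88,    0, 349],
])

diag = np.diag(cm)
user_acc = diag / cm.sum(axis=1)      # correct / all pixels given that label
producer_acc = diag / cm.sum(axis=0)  # correct / all reference pixels

print(round(100 * user_acc[0], 1))      # vegetation UA: 98.7
print(round(100 * producer_acc[0], 1))  # vegetation PA: 85.9
```

The row and column totals reproduce the assigned-pixel counts and accuracies listed for the MLCEP in Table 3, and the matrix total equals the 512 × 512 image size.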
Figure 3. The classified images derived from (a) the MLCEP and
(b) the MLCPP of the lanier.img file. A color version of this
figure is available at the ASPRS website: www.asprs.org.

TABLE 1. CONFUSION MATRIX OF THE MLCEP OF THE LANIER.IMG FILE
(rows: classified class; columns: reference class)

                  vegetation   water  agriculture  bare ground  grass  urban/dev.  shade  cloud
vegetation            152113     184         1779            0      0           0     26      0
water                    110   20864            0            0      0           0    659      0
agriculture            14557      34        23712           71   1707         998      0      0
bare ground             6526    1418          457         6229   2147        1184    104      1
grass                    435       3          699          110   7829         147      0      0
urban/developed         1085       2          311          976    767        9217     11      0
shade                    305     409            0            0      0           2   2397      0
cloud                   1856     156           28           14     68          88      0    349

TABLE 2. CONFUSION MATRIX OF THE MLCPP OF THE LANIER.IMG FILE
(rows: classified class; columns: reference class)

                  vegetation   water  agriculture  bare ground  grass  urban/dev.  shade  cloud
vegetation            157504     338         3254            0      0           0    188      0
water                    112   20950            0            0      0           0    798      0
agriculture            11066      48        22595           84   2101        1147      0      0
bare ground             5609    1381          348         5965   1788         917    106      1
grass                    520       3          521          136   7751         132      0      0
urban/developed         1305       4          265         1208    860        9380     12      0
shade                     99     276            0            0      0           1   2093      0
cloud                    772      70            3            7     18          59      0    349

TABLE 3. VARIATION OF THE AREAS AND PRODUCER'S AND USER'S ACCURACIES
BETWEEN THE TWO CLASSIFICATION RESULTS FOR THE LANIER.IMG FILE

                    Area     Rank    Number of assigned pixels    Producer's accuracy      User's accuracy
Class             proportion order      MLCEP     MLCPP             MLCEP   MLCPP            MLCEP   MLCPP
vegetation          67.5%      1       154102    161284  ↑          85.9%   89.0%  ↑         98.7%   97.7%  ↓
water                8.8%      3        21633     21860  ↑          90.4%   90.8%  ↑         96.4%   95.8%  ↓
urban/developed      4.4%      5        12369     13034  ↑          79.2%   80.6%  ↑         74.5%   72.0%  ↓
agriculture         10.3%      2        41079     37041  ↓          87.9%   83.7%  ↓         57.7%   61.0%  ↑
grass                4.8%      4         9223      9063  ↓          62.5%   61.9%  ↓         84.9%   85.5%  ↑
bare ground          2.8%      6        18066     16115  ↓          84.2%   80.6%  ↓         34.5%   37.0%  ↑
shade                1.2%      7         3113      2469  ↓          75.0%   65.5%  ↓         77.0%   84.8%  ↑
cloud                0.1%      8         2559      1278  ↓          99.7%   99.7%  –         13.6%   27.3%  ↑

Area proportions are derived from the recoded inlandc.img file and used as prior probabilities in MLCPP.
↑ means increase, ↓ means decrease, – means no variation.

The changes in the number of assigned pixels and the producer's
and user's accuracies for each class are displayed in Table 3.
Class area proportions derived from the recoded inlandc.img file
are also listed in Table 3. Of all the eight classes, "vegetation"
has the maximum area and thus the highest assigned prior
probability, implying that it can only be spectrally confused with
smaller-sized classes and thus acts as the class MMWSS. As shown
in Table 3, the number of pixels labeled as "vegetation" increases
from 154,102 to 161,284, accompanied by an increase of about
3 percent in its producer's accuracy and a decrease of about
1 percent in its user's accuracy. Class "cloud" has the minimum
area of all classes, implying that it is expected to act as the
class MMWLS. As expected, prior probabilities assign fewer pixels
to it. However, its producer's accuracy, remaining unchanged, does
not show the expected decrease, although its user's accuracy does
show the expected increase. This may be attributed to its narrow
spectral distribution. When equal prior probabilities are assumed,
almost all of the pixels belonging to "cloud" (349 out of 350)
have been correctly classified (Table 1), suggesting the decision
boundary may be quite far from its mean. After the MLCPP is
performed, the incorrect assignment to "cloud" from each of the
other classes decreases unless it is equal to zero (see the last
rows of Tables 1 and 2); this results in the increase in its
user's accuracy and suggests the decision boundary has shifted
toward the mean of class "cloud." However, there may be no pixels
belonging to "cloud" between the two decision boundaries of the
MLCPP and MLCEP in the feature space. Consequently, no pixels
belonging to "cloud" were relabeled as other classes, and the
producer's accuracy of class "cloud" thus holds constant.
When there are a number of classes in the image, any class is
likely to be spectrally overlapped with both larger-sized and
smaller-sized classes. However, the variation in the number of
assigned pixels indicates the net effect of spectral confusion: a
class that receives more assignments after prior probabilities are
used is expected to belong to the class MMWSS, and a class that
receives fewer assignments is expected to belong to the class
MMWLS. According to this, the classes "vegetation," "water," and
"urban/developed" are grouped as the class MMWSS, and the other
classes are grouped as the class MMWLS. Except for "cloud," as
expected, both the number of assigned pixels and the producer's
accuracy increase for the class MMWSS and decrease for the class
MMWLS, and the user's accuracy decreases for the class MMWSS and
increases for the class MMWLS.
Classes "shade" and "vegetation" are selected to investigate the
effect of overestimated or underestimated prior probabilities on
individual classes. Class "shade" represents the class MMWLS and
class "vegetation" represents the class MMWSS. The producer's and
user's accuracies for them in Figure 4 result from the
modification of the assigned values of their prior probabilities
while the prior probabilities of the other classes remained the
same as used in the MLCPP above. As shown in Figure 4, both for
class "shade" and for class "vegetation," the producer's accuracy
increases and the user's accuracy decreases with increasing prior
probability. Consequently, compared to the MLCPP above, an
overestimated prior probability boosted the producer's accuracy
and reduced the user's accuracy, and an underestimated prior
probability reduced the producer's accuracy and boosted the user's
accuracy. Note that in Figure 4b, class "vegetation" presents more
or less the same producer's and user's accuracies as the ones
resulting from MLCEP at the value of 0.1. This value is indicative
of the size of the class mostly spectrally overlapped with class
"vegetation." As shown in Tables 1 and 2, class "vegetation" is
mostly spectrally confused with class "agriculture," and the
relative area proportion of class "agriculture" is just
10.3 percent (Table 3). This suggests that when the prior
probability of class "vegetation" was set to 0.1, the decision
boundary between classes "vegetation" and "agriculture" was
located at approximately the same place in the feature space as
when equal priors were used. For the same reason, class "shade"
presents approximately the same producer's and user's accuracies
as the ones resulting from MLCEP at the value of 0.1, and this
value is indicative of the size of class "water" (8.8 percent),
which has substantial spectral confusion with class "shade."

Figure 4. The effect of overestimated or underestimated prior
probabilities on individual classes: (a) Class "shade," and
(b) Class "vegetation." The dotted lines indicate the true values
of the prior probabilities derived from the recoded inlandc.img
file.

Other Evidence
The data in Table 4 are derived from Tables 2 and 3 in
Maselli et al. (1990), and the data in Table 5 are from Tables
2 and 4 in Maselli et al. (1992). MLCPP in Maselli et al. (1990)
was carried out using the previously classified images resulting from MLCEP as references and the error probabilities
derived from the confusion matrix found on the training
pixels as priors. In Maselli et al. (1992), prior probabilities
were derived from a nonparametric process proposed by
Skidmore and Turner (1988).
Except for Class 4 in Table 5, the number of the
assigned pixels and the producer’s accuracy show the
same change tendency for each of the other classes, and as
expected, the user’s accuracy shows an inverse change. One
possible reason for this single exception is that during the MLC
in Maselli et al. (1992), pixels were not assigned to any class if
their gray value fell outside the range of the mean ±3 standard
deviations of the relevant spectral signature in all features
considered. Substantial spectral confusion is expected to occur at
these pixels, so that these pixels, though accounting for only a
small number, have the potential to confound the interpretation of
the effect of the prior probabilities.
For a large-sized class in the image, the classes smaller than it
are expected to outnumber the classes larger than it.
Statistically, a large-sized class is therefore more subject to
spectral confusion with smaller-sized classes and thus tends to
act as the class MMWSS. For example, the largest-sized class will
surely be confused with smaller-sized classes unless it is
spectrally distinct. Likewise, a small-sized class is more subject
to confusion with larger-sized classes and thus tends to act as
the class MMWLS. This can be seen clearly in Tables 3, 4, and 5.
However, when a large-sized class spectrally overlaps a
larger-sized class, or a small-sized class overlaps a
smaller-sized class, the inverse tendency will be observed. For
example, the class "agriculture" is significantly spectrally mixed
with the largest-sized class "vegetation" in the lanier.img case.
Hence, though class "agriculture" is the second largest of all
eight classes, it acts as the class MMWLS rather than the class
MMWSS.
Conclusions
When prior probabilities are inserted into the maximum
likelihood process, the absolute number of pixels at the
examined vector, rather than the relative probability that
the vector occurs in each class, is used as the discriminant
criterion. In this way, an improved classification, or a
greater number of correct assignments, is expected compared to the
use of equal priors if the prior probabilities are properly
estimated. Prior probabilities are designed only for overlapping
spectral signatures, and they have no effect where the
spectral signatures are highly separable in the feature
space. Accordingly, the effect of the prior probability on
September 2009 1115
TABLE 4. VARIATION OF AREAS AND PRODUCER’S AND USER’S ACCURACIES BETWEEN MLCEP
AND MLCPP FOR TM DATA OF THE ISLAND OF SARDINIA

        Number of assigned pixels   Producer’s accuracy      User’s accuracy
Class   MLCEP   MLCPP               MLCEP    MLCPP           MLCEP    MLCPP
1       3223    4254   ↑            63.6%    76.3%   ↑       83.7%    76.0%   ↓
2       2351    2393   ↑            88.8%    89.6%   ↑       92.0%    91.3%   ↓
3       1660    1651   ↓            82.9%    82.9%   =       71.0%    71.4%   ↑
4       2300    1902   ↓            61.7%    57.7%   ↓       56.7%    64.1%   ↑
5       2277    1608   ↓            80.6%    68.5%   ↓       56.7%    68.2%   ↑

↑/↓/= indicate the change from MLCEP to MLCPP.
Derived from Tables 2 and 3 in Maselli et al. (1990).
TABLE 5. VARIATION OF AREAS AND PRODUCER’S AND USER’S ACCURACIES BETWEEN MLCEP
AND MLCPP FOR TM DATA OF THE SOUTH OF FLORENCE, ITALY

        Number of assigned pixels   Producer’s accuracy      User’s accuracy
Class   MLCEP   MLCPP               MLCEP    MLCPP           MLCEP    MLCPP
1       26932   10720  ↓            82.3%    74.2%   ↓       27.8%    62.9%   ↑
2       24863   44309  ↑            48.5%    81.7%   ↑       92.5%    86.9%   ↓
3       4206    3623   ↓            80.6%    73.1%   ↓       60.8%    63.2%   ↑
4       13112   15794  ↑            68.1%    67.2%   ↓       68.9%    64.3%   ↓
5       6226    2356   ↓            43.7%    20.2%   ↓       16.7%    19.9%   ↑

↑/↓ indicate the change from MLCEP to MLCPP.
Derived from Tables 2 and 4 in Maselli et al. (1992).
an individual class is independent of the classes that are
spectrally separable from it and is determined by both its
anticipated size and the anticipated sizes of the other
classes significantly mixed spectrally with it. In most cases,
prior probabilities will not produce an improvement in
both the producer’s and user’s accuracies for individual
classes at the same time. An increase in the assigned value
of the class prior probability will shift the decision boundary
away from the class mean and therefore boost the producer’s
accuracy. Though the change of the user’s accuracy is
data-dependent, it is expected to decrease in most cases. On the
contrary, a decrease in the assigned value of the class prior
probability will shift the decision boundary towards the
class mean and therefore reduce the producer’s accuracy
and, in most cases, boost the user’s accuracy. Consequently,
a large-sized class spectrally mixed with small-sized
classes generally tends to receive more assigned pixels and
presents a higher producer’s accuracy and a lower user’s
accuracy when prior probabilities are used in the classification.
For a small-sized class spectrally mixed with large-sized
classes, the tendency is just the opposite. In
classification practice, if modifying the assigned
value of the prior probability does not produce the expected
change for an individual class in the classified result, the
validity of the reference data, and thus the reliability of the
derived user’s and producer’s accuracies involving this
class or the classes spectrally mixed with it, can be considered
suspect. In this way, the established rule can find
use in operational applications.
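For the equal-variance, one-dimensional Gaussian case, the boundary shift described above has a closed form. The following sketch, with hypothetical means, variance, and priors, shows that raising a class’s prior moves the decision boundary away from that class’s mean, enlarging its decision region and hence its number of assigned pixels.

```python
import math

def gaussian_discriminant(x, mean, var, prior):
    # ln p(x | class) + ln(prior) for a 1-D Gaussian class signature.
    return (-0.5 * math.log(2 * math.pi * var)
            - (x - mean) ** 2 / (2 * var)
            + math.log(prior))

def boundary(mean1, mean2, var, p1, p2):
    # Point where the two discriminants are equal (equal variances).
    # Solving g1(x) = g2(x) gives:
    #   x* = (mean1 + mean2)/2 + var * ln(p1/p2) / (mean2 - mean1)
    # so increasing p1 pushes x* away from mean1, toward mean2.
    return (mean1 + mean2) / 2 + var * math.log(p1 / p2) / (mean2 - mean1)

m1, m2, var = 100.0, 120.0, 64.0
print(boundary(m1, m2, var, 0.5, 0.5))  # equal priors: midpoint, 110.0
print(boundary(m1, m2, var, 0.8, 0.2))  # boundary shifts toward mean2
```

With the larger prior for class 1, the interval newly assigned to class 1 lies in the overlap region, which is exactly why its producer’s accuracy rises while its user’s accuracy tends to fall.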
Acknowledgments
This research was supported by the National Natural Science
Foundation of China (40871138), the Innovation Project of
the Institute of Geographic Sciences and Natural Resources
Research, Chinese Academy of Sciences (O7V70030SZ)
and National Basic Research Program of China (No.
2007CB407207). The authors are grateful to two PE&RS
referees, also to the five IJRS referees. Their comments were
invaluable in improving the manuscript; some referees even
rewrote portions of the manuscript and greatly improved its
English. Dr. Xiaojun Zhang’s efforts with the written
English and Mr. Guanfu Han’s work on Figure 2 in the
manuscript are also appreciated.
References
Congalton, R.G., 1991. A review of assessing the accuracy of
classifications of remotely sensed data, Remote Sensing of
Environment, 37:35–46.
Ediriwickrema, J., and S. Khorram, 1997. Hierarchical
maximum-likelihood classification for improved accuracy,
IEEE Transactions on Geoscience and Remote Sensing,
35:810–816.
Gorte, B., and A. Stein, 1998. Bayesian classification and class
area estimation of satellite images using stratification, IEEE
Transactions on Geoscience and Remote Sensing, 36(3):803–812.
Huang, M.H., and M.K. Ridd, 2002. A subpixel classifier for urban
land-cover mapping based on a maximum-likelihood approach
and expert system rules, Photogrammetric Engineering &
Remote Sensing, 68(11):1173–1180.
Maselli, F., C. Conese, and G. Zipoli, 1990. Use of error probabilities to improve area estimates based on maximum likelihood
classification, Remote Sensing of Environment, 31:155–160.
Maselli, F., C. Conese, L. Petkov, and R. Resti, 1992. Inclusion of
prior probabilities derived from a nonparametric process into the
maximum likelihood classifier, Photogrammetric Engineering &
Remote Sensing, 58:201–207.
McIver, D.K., and M.A. Friedl, 2002. Using prior probabilities in
decision-tree classification of remotely sensed data, Remote
Sensing of Environment, 81:253–261.
Pedroni, L., 2003. Improved classification of Landsat Thematic
Mapper data using modified prior probabilities in large and
complex landscapes, International Journal of Remote Sensing,
24:91–113.
Skidmore, A.K., and B.J. Turner, 1988. Forest mapping accuracies
are improved using a supervised nonparametric classifier with
SPOT data, Photogrammetric Engineering & Remote Sensing,
54(12):1415–1421.
Strahler, A.H., 1980. The use of prior probabilities in maximum
likelihood classification of remotely sensed data, Remote
Sensing of Environment, 10:135–163.
Zheng, M.G., Q.G. Cai, and Z.J. Wang, 2005. Effect of prior
probabilities on maximum likelihood classifier, Proceedings
of the 25th IEEE International Geoscience and Remote
Sensing Symposium, 25–29 July, Seoul, Korea:
pp. 3753–3756.