Land-Use Classification of Remotely Sensed Data Using Kohonen Self-organizing Feature Map Neural Networks

C.Y. Ji

C.Y. Ji was with the Institute of Remote Sensing and GIS Applications, Peking University, Beijing 100871, China. He can presently be contacted c/o Xinning Jia, Asian Development Bank, P.O. Box 789, 0980 Manila, The Philippines ([email protected]).

Photogrammetric Engineering & Remote Sensing, Vol. 66, No. 12, December 2000, pp. 1451-1460.
© 2000 American Society for Photogrammetry and Remote Sensing

Abstract
The use of Kohonen Self-organizing Feature Map (KSOFM, or feature map) neural networks for land-use/land-cover classification from remotely sensed data is presented. Different from traditional multi-layer neural networks, the KSOFM is a two-layer network that creates class representations by self-organizing the connection weights from the input patterns to the output layer. A test of the algorithm is conducted by classifying a Landsat Thematic Mapper (TM) scene into seven land-use/land-cover types, benchmarked against the maximum-likelihood method and the Back Propagation (BP) network. The network outperforms the maximum-likelihood method for per-pixel classification when four spectral bands are used. A further increase in classification accuracy is achieved when neighborhood pixels are incorporated. A similar accuracy is obtained using the BP networks for classifications both with and without neighborhood information. The feature map network has the advantage of faster learning but the drawback of slower classification. Learning by the feature map is affected by a number of factors, such as the network size, the partitioning of the codebooks, the available training samples, and the selection of the learning rate. The feature map size controls the accuracy with which class borders are formed, and a large map may be used to obtain an accurate class representation. It is concluded that the feature map method is a viable alternative for land-use classification of remotely sensed data.
Introduction
Artificial neural networks have been widely studied for the land-use classification of remotely sensed data (e.g., Heermann and Khazenie, 1992; Bischof et al., 1992; Civco, 1993), and are now accepted alternatives to statistical classification techniques (Paola and Schowengerdt, 1997). Non-parametric neural network classifiers have numerous advantages over the statistical methods, such as requiring no assumption about the probabilistic model of the data, the ability to generalize in noisy environments, and the ability to learn complex patterns. Therefore, neural networks may perform well in cases where data are strongly non-Gaussian, such as classification that incorporates textural measures (e.g., Lee et al., 1990; Augusteijn et al., 1995) and multi-source data classification (e.g., Benediktsson et al., 1990; Gong, 1996; Bruzzone et al., 1997).
This paper reports on the use of the Kohonen Self-Organizing Feature Map (KSOFM) for land-use/land-cover classification. The algorithm is described within the context of artificial neural networks in many textbooks. Basically, the feature map neural network is a vector quantizer that creates class representations on a two-dimensional map by self-organizing the connection weights from a series of input patterns to output nodes. Figure 1 depicts the network configuration. Each node in the output layer is fully connected to
its adjacent ones and to the input signals. The weight vectors are adjusted so that the density function of the codebook clusters approximates the probability density function of the input vectors. This is referred to as topographic representation (Bose and Liang, 1996). The algorithm is biologically motivated, as maps of sensory surfaces are found in many parts of the brain (Kohonen, 1982). Applications of the algorithm have been made in many areas of pattern recognition, such as speech recognition (Kohonen, 1988), robotics (Graf and LaLonde, 1988), and image compression (Nasrabadi and Feng, 1988). A number of studies are also found in remote sensing classification. Orlando et al. (1990) used the method to classify four ground-cover classes (three types of sea ice and one shadow class) from a radar image. They found that the KSOFM performed nearly as well as the multi-layer perceptron and the Gaussian classifiers when networks contained at least 20 nodes in either one- or two-dimensional configurations. A study by Hara et al. (1994) on cloud classification from SAR images concluded that the method yielded results comparable to those of the Learning Vector Quantization and Migrating Means methods but had the advantage of classifying data with complex texture. Other applications of the feature map algorithm in remote sensing include feature selection (Iivarinen et al., 1994) and data preprocessing for neural network classification (Yoshida and Omatu, 1994).
The current study focuses on implementation issues of the algorithm, such as network design and training. A Landsat Thematic Mapper scene is used to classify seven land-use/land-cover types. Classification using the feature map method is benchmarked against the maximum-likelihood statistical classifier and the Back Propagation (BP) neural networks, for both pixel and window classification (i.e., classification that uses the neighborhood pixels). All the image processing and classification work is carried out using an image processing system developed in the MS-Windows environment.
Photogrammetric Engineering & Remote Sensing
Vol. 66, No. 12, December 2000, pp. 1451-1460.
0099-111210016612-1451$3.0010
8 2000 American Society for Photogrammetry
and Remote Sensing
December 2000
1451
Figure 1. A typical configuration of the KSOFM: an input layer of spectral bands (Band 1, Band 2, Band 3, ...) fully connected to a two-dimensional output layer with neighborhood relations among the output nodes.
Algorithm
Details of the algorithm are found in Kohonen (1988; 1989) and Muller and Reinhardt (1990). The task is to determine a D-dimensional vector of "synaptic coefficients" m_n for each neuron n so that each neuron is "responsible" for a range of input vectors for which the distance ‖x − m_n‖ takes on the smallest value (Muller and Reinhardt, 1990). A neighborhood is defined around each neuron so that the neurons within the neighborhood are updated at each learning step; this neighborhood often starts with a large radius and shrinks gradually with time.

Let X = (x_1, x_2, ..., x_n) be an observation input vector, where n indicates the dimension of vector X (e.g., the number of spectral bands). Training of the network is accomplished in two stages: coarse tuning and fine tuning. Coarse tuning of the map proceeds as follows:

Step 1: For each neuron, the synaptic coefficients are randomized to real numbers within the range of 0.0 to 255.0 (i.e., the dynamic range of digital numbers).

Step 2: Feed the network with an input vector X; the distances of vector X to all the neurons are computed according to

    d_j = ‖X − m_j‖    (1)

Step 3: The neuron c that has the minimum distance to the input vector X is chosen, and the synaptic coefficients of the neurons in its neighborhood are updated as

    m_j(t + 1) = m_j(t) + α(t)[x(t) − m_j(t)],  j ∈ N_c(t)    (2)

where N_c(t) is the set of nodes within the neighborhood radius of the winning node c at time t, and α(t) is a scalar-valued "adaptation gain" with 0.0 < α(t) < 1.0. The learning rate α must decrease slowly with time, α(t) = α(t − 1)·σ, where σ is a decay factor; the initial value of α can be chosen from 0.5 to 0.9.

Step 4: Feed in new inputs and loop over Steps 2 and 3 until the network converges.

Step 5: Feed in vectors of known class and label each neuron by majority voting.

Fine tuning of the map is accomplished by Learning Vector Quantization (LVQ1 to LVQ3) (Kohonen, 1990). The converged network is provided with samples of known class identities, and the weight vectors are updated according to the following if LVQ1 is used:

    m_c(t + 1) = m_c(t) + α(t)[x(t) − m_c(t)]  if x is correctly classified
    m_c(t + 1) = m_c(t) − α(t)[x(t) − m_c(t)]  if x is incorrectly classified    (3)
    m_i(t + 1) = m_i(t)  for i ≠ c

LVQ2 uses a "window" of specified width to locate a sample point that falls on the map between two adjacent neurons m_i and m_j, and the weights of both neurons are updated at the same time. The effect is to push the decision border toward the Bayesian decision border. The vector X is defined to lie in the window if

    min(d_i/d_j, d_j/d_i) > (1 − w)/(1 + w)    (4)

where d_i and d_j are the distances of X to m_i and m_j, and w is the width of the window. If x falls inside the window, and C_i is the nearest class but x belongs to the runner-up class C_j ≠ C_i, the weights are updated as

    m_i(t + 1) = m_i(t) − α(t)[x(t) − m_i(t)]
    m_j(t + 1) = m_j(t) + α(t)[x(t) − m_j(t)]    (5)

In all other cases, the weight vectors remain unchanged. LVQ2 tends to overcorrect, and LVQ3 was therefore developed, with a minor modification made when m_i, m_j, and x all belong to the same class: i.e.,

    m_k(t + 1) = m_k(t) + ε·α(t)[x(t) − m_k(t)],  k ∈ {i, j}    (6)

Applicable values of ε can be chosen between 0.1 and 0.5.
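To make the training procedure concrete, the following is a minimal NumPy sketch of the coarse-tune steps (Equations 1 and 2) and one round of LVQ1 (Equation 3). It is an illustration under assumptions, not the code used in this study: the square neighborhood, the per-epoch decay of both α and the radius, and all function names are assumed here.

```python
import numpy as np

def coarse_tune(samples, rows=25, cols=25, alpha=1.0, decay=0.99,
                alpha_min=0.01, seed=0):
    """Coarse tuning of a Kohonen feature map (Steps 1 through 4)."""
    rng = np.random.default_rng(seed)
    dim = samples.shape[1]
    # Step 1: randomize the synaptic coefficients over the DN range 0.0-255.0.
    m = rng.uniform(0.0, 255.0, size=(rows, cols, dim))
    radius = max(rows, cols) // 2                # neighborhood starts large
    while alpha > alpha_min:                     # one pass over the samples = one epoch
        for x in samples:
            # Step 2: Euclidean distance from x to every neuron (Equation 1).
            d = np.linalg.norm(m - x, axis=2)
            # Step 3: winner and its square neighborhood move toward x (Equation 2).
            r, c = np.unravel_index(np.argmin(d), d.shape)
            r0, r1 = max(0, r - radius), min(rows, r + radius + 1)
            c0, c1 = max(0, c - radius), min(cols, c + radius + 1)
            m[r0:r1, c0:c1] += alpha * (x - m[r0:r1, c0:c1])
        alpha *= decay                           # alpha(t) = alpha(t - 1) * sigma
        radius = max(1, int(radius * decay))     # the neighborhood shrinks with time
    return m

def lvq1_round(m, labels, samples, classes, alpha=0.01):
    """One round of LVQ1 fine tuning (Equation 3) over labeled samples."""
    for x, y in zip(samples, classes):
        d = np.linalg.norm(m - x, axis=2)
        r, c = np.unravel_index(np.argmin(d), d.shape)
        sign = 1.0 if labels[r, c] == y else -1.0  # reward correct, punish incorrect
        m[r, c] += sign * alpha * (x - m[r, c])
    return m
```

The neuron labels used by `lvq1_round` come from Step 5: after coarse tuning, each neuron is assigned the majority class among the training vectors it wins.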
Classification is conducted using the nearest-neighbor principle. For each input vector, the Euclidean distances to all the neurons are calculated, and the class identity of the input is taken from the neuron to which the distance is minimum. A classification is rejected if the minimum distance is greater than a threshold, which is set to the largest distance from the training samples of a class to all the neurons labeled for that class.
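A sketch of this decision rule, assuming a trained weight array and neuron labels as above; `thresholds` is a hypothetical per-class lookup holding the rejection distances described in the text.

```python
import numpy as np

def classify_pixel(m, labels, x, thresholds):
    """Nearest-neighbor classification with distance-based rejection.

    m: (rows, cols, dim) trained weights; labels: (rows, cols) neuron labels;
    thresholds: dict mapping class label -> rejection distance for that class.
    Returns the class label, or None if the pixel is rejected.
    """
    d = np.linalg.norm(m - x, axis=2)
    r, c = np.unravel_index(np.argmin(d), d.shape)
    cls = labels[r, c]
    # Reject when the minimum distance exceeds the class threshold.
    return None if d[r, c] > thresholds[cls] else cls
```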
Data Acquisition and Preparation
A TM scene acquired on 16 October 1995 over a northern suburban area of Beijing was used in this study (path/row: 32/123). An extract of 1024 rows by 1024 columns was taken, which corresponds to an area of approximately 944 km². Figure 2 shows the band 5 (mid-infrared) image of the extract. The area covers the northern part of the Haidian District and the southern part of Changping County. Major land-use types of the area are urban areas, orchards, forests, and arable land. The area has undergone a dramatic change in land-use patterns due to rapid urban expansion in recent years, and the image has been used previously in an urban expansion study for the period of 1991 to 1995 (Ji et al., 1996). The latest land resource inventory was completed in 1991, and the land-use map was compiled at a scale of 1:50,000 at the district level from the original detailed survey maps at 1:10,000 scale. This map was used for sample selection and classification accuracy assessment. To reduce the data volume, a principal components transform was applied to bands 1, 2, and 3; the first principal component captured nearly 97 percent of the total variation. This component was then used together with bands 4, 5, and 7 for classification.

Figure 2. Band 5 image showing the study area.
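The band 1-3 transform can be illustrated with a covariance-based principal components sketch (an assumed implementation; the paper does not state which PCA routine was used).

```python
import numpy as np

def first_principal_component(bands):
    """Project pixels onto the first principal component of the given bands.

    bands: (n_pixels, n_bands) array, e.g., TM bands 1, 2, and 3 flattened.
    Returns the first-PC values and the fraction of total variance they carry
    (the paper reports nearly 97 percent for bands 1-3 of this scene).
    """
    x = bands - bands.mean(axis=0)              # center each band
    cov = np.cov(x, rowvar=False)               # 3 x 3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    pc1 = x @ eigvecs[:, -1]                    # projection onto the largest component
    return pc1, eigvals[-1] / eigvals.sum()
```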
The classification system used in the previous urban expansion study was adopted for the current study, and seven land-use/land-cover classes in the study area were under consideration (Table 1). Two sets of sample data were compiled according to the land-use maps and the information gathered during the ground visits. The training data were picked (see Heermann and Khazenie, 1992) to ensure that each sample was unique and should belong to only one class.

Table 1. The number of pixels in the training data set and the test data set.

Class ID | Land-use/cover | Descriptions | Training Set | Test Set
1 | Arable Land | mainly winter wheat and paddy fields, and bare soil | 8216 | 8660
2 | Barren Land | quarries, bare rocks, and cleared land for property development | 1145 | 390
3 | Forest | coniferous forest in the mountainous areas, deciduous woods elsewhere, and nurseries | 8849 | 3295
4 | Orchard | mainly apple and peach orchards | 2320 | 2029
5 | Urban Features | rural village clusters, roads, and continuous urban fabric | 9652 | 6865
6 | Horticulture | vegetable fields and greenhouse bases | 657 | 793
7 | Open Water Surface | lakes, canals, and fish ponds | 4021 | 1268
Total | | | 34833 | 23300
Network Selection and Training
The Feature Map Size
Because the KSOFM is a two-layer network, the feature map size is the only concern when designing the network. The size of the feature map, i.e., the number of codebooks to use, can be interpreted as the overall resolution at which the piecewise-linear decision boundaries are formed. It is expected that large-sized maps will produce better class separation, because this allows a more accurate piecewise-linear approximation to the Bayesian decision boundary. However, larger maps require more computation time for network learning. It is also likely that the global ordering of the feature map may be difficult to achieve for larger maps. An experiment was conducted to examine the impact of the feature map size on the network learning performance. The network size ranged from 25 neurons up to 900. Figure 3 shows the relationship of the labeling accuracy against the map size using the samples of four spectral bands. The accuracy shown in the figure was obtained after fine-tuning each feature map, as described in the next section. One can see that the network learning improved as the size increased; the increase was fast at the beginning and gradually stabilized as the network size became larger. The 25 by 25 neuron network was large enough that further expansion of the network size was not necessary, and this map was therefore used for classification.

Figure 3. Relationship between the map size and the labeling accuracy.

As the size of the feature map increases, the improvement in sample labeling accuracy is non-uniform across classes. For easily separable classes, the improvement is rather marginal. For instance, horticultural land does not show much improvement: it is well separated even when the map size is small. For classes whose data distributions substantially overlap (e.g., forests and orchards), the improvement is much more evident. This suggests that large-sized feature maps may have to be used when dealing with data that overlap strongly.
Training of the Feature Map
A coarse tune forms the global ordering of the weights in D dimensions, and partitioning of the codebooks among the classes is accomplished once the coarse tune is completed. The mathematics behind the ordering of weights and the formation of class representations is complicated, and a detailed study can be found in Bose and Liang (1996). Ripley (1996) argued that it is unclear how many codebook vectors should be selected for each class, because the number needed depends as much on how well they are employed as on the proportions of the class. Preliminary studies show that the partitioning of codebooks is affected by the sequence of the samples fed to the network, the number of samples drawn from each cover class, and how the data are distributed. Two forms of codebook partitioning were employed in the experiment. One was to partition the codebooks evenly among the classes. By alternating the samples among the classes at each learning step, the codebooks were divided nearly evenly among the classes. However, as the probability density functions differ from class to class, classes with higher variation will be portrayed less accurately, because the number of codebooks they need should in theory be greater than for classes with little variation. Therefore, this type of partitioning was discarded from further study. The other is to partition the codebooks so that the number allocated to one particular class is the number needed for that class to form accurate class boundaries. To achieve this, the following approach was taken. The proportion of samples for each class was calculated, and a pattern was defined specifying the number of samples that should be fed to the network consecutively from each class, according to the proportions. By feeding the samples to the network according to the specified pattern, partitioning of the codebooks is accomplished so that the number of codebooks assigned to each class is the result of a "winner-take-all" process. Figure 4 shows a feature map with 15 by 15 neurons partitioned by the latter form. It can be seen from Figure 4 that similar patterns in the input space are mapped onto adjacent neurons on the feature map. Apparently, a cluster of three neurons assigned to forests in the lower-left corner is cut off from the main block of forest neurons. This is not a "parcellation" caused by using a small neighborhood during the coarse tune process, as mentioned by Kohonen (1990), because the initial neighborhood used is 14. In fact, these neurons are allocated to deciduous woodlands, which have a higher spectral reflectance in the near-infrared band than do coniferous woodlands, while their reflectance is lower than that of horticultural land but higher than that of orchards.

Figure 4. A 15 by 15 feature map labeled for the seven land-use classes (arable, barren, forest, orchard, urban, water, and horticulture).

The learning parameters concerned are the learning rate α and the decreasing factor σ for α. Coarse tuning is easily accomplished, and the tuning is insensitive to the initial value of α, provided that α decreases slowly with time. For the 25 by 25 map, an initial value of 1.0 is used and the decay factor is set to 0.99. Convergence of the network (when α drops below 0.01) is reached after 1644 epochs. Once the coarse tune is completed, the samples are classified and the error rate is checked. This error rate serves as the starting point for the fine adjustment procedure.

For the fine tune process, the first step is to apply LVQ1. This step is optional, but it often turns out to be efficient in correcting weights that were over-tuned or under-tuned by the coarse tune process. A small, sensible value of α should be used. Three rounds of LVQ1 are carried out with the learning rate set to 0.01. LVQ2/3 is then applied, and the width of the window is set to 20 percent, as suggested by Kohonen (1990). Varying the width has shown partial improvement in labeling accuracy for certain classes, while other classes deteriorate. This is clearly caused by the difference in the number of samples falling inside the window. Therefore, the learning rate α is varied in order to reduce this effect. For classes with a large bulk of training samples, a relatively larger value of α is used, while a much smaller value is used for classes with a smaller quantity of samples.
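The proportional feeding pattern described above might be scheduled as in the following sketch; the cycle length of 100 and the random draws within each class are assumptions, since the paper specifies only that samples are fed consecutively from each class in proportion to the class shares.

```python
import numpy as np

def feeding_pattern(class_counts, cycle=100):
    """Consecutive samples to draw from each class per cycle, by class share."""
    total = sum(class_counts.values())
    return {cls: max(1, round(cycle * n / total))
            for cls, n in class_counts.items()}

def feed_samples(samples_by_class, cycle=100, seed=0):
    """Yield training vectors class by class according to the pattern."""
    rng = np.random.default_rng(seed)
    counts = {cls: len(s) for cls, s in samples_by_class.items()}
    pattern = feeding_pattern(counts, cycle)
    while True:                                   # one full cycle per pass
        for cls, k in pattern.items():
            pool = samples_by_class[cls]
            for i in rng.choice(len(pool), size=k):
                yield pool[i]                     # k consecutive draws from this class
```

With the Table 1 proportions, for example, arable land (8216 of 34833 training samples) would contribute about 24 of every 100 consecutive presentations.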
It is observed that, once LVQ2 is applied, the use of LVQ3 may smooth out the decision border. Therefore, LVQ3 is no longer used, and LVQ2 is applied repeatedly. To ensure that tuning is properly carried out, the sample labeling accuracy is evaluated by forming a confusion matrix after each complete round of tuning. If one particular class is over-tuned due to an inadequate value of α being used, the results are discarded and another run of LVQ2 is carried out. At the beginning of the fine tuning process, a relatively large value of α can be used for LVQ2 (e.g., 0.3) without hampering the overall adjustment of weights. As this process continues, smaller values may be applied. Tuning is continued until no further increase in labeling accuracy is obtained. Six rounds of iterative tuning were conducted, after which the network is considered to have fully converged.

Inclusion of Textural Information
Studies (e.g., Lee et al., 1990; Bischof et al., 1992; Augusteijn et al., 1995) show that neural networks are able to extract textural information directly from the neighborhood of a pixel and use it to enhance classification performance; no explicit definition of textural measures is required. In the current study, the neighborhood is defined with windows of varying sizes, i.e., 3 by 3, 5 by 5, and 7 by 7. The center of the window is placed at the same image coordinates as used before. Smaller feature maps (20 by 20 and 18 by 18, respectively) are used for training the samples taken with the 5 by 5 and 7 by 7 windows, in an effort to reduce the computation time. This is based on the assumption that the added neighborhood information helps classes separate; therefore, larger feature maps are not necessarily required. All feature maps are able to learn the samples much faster, as the samples are well separated when the coarse tune is terminated. Three rounds of fine tuning are carried out for each of the networks. The time used for training the networks increases substantially (see Table 3). With the neighborhood information added, an increase in sample labeling accuracy is seen across classes.

Benchmarking Studies
Two other classification methods are used for benchmarking: the maximum-likelihood method and the Back-Propagation neural network. The maximum-likelihood classification is implemented to classify 12 classes by splitting arable land into three subclasses (bright bare soil, dark bare soil, and vegetated fields), forests into three subclasses (coniferous, deciduous, and shadow), and urban features into rural villages and metropolitan areas. The distribution of each of these classes is assumed to be Gaussian. The results are then amalgamated into the seven land-use classes to align with the results from the other classifications.

Three-layer BP neural networks (with one hidden layer) are used for the comparative study. Coarse Coding (Bischof et al., 1992) is used to map digital values before they are used as input to the networks. As described by Bischof et al. (1992), the number of coarse coding units does not have a profound impact on network learning behavior, but coding does allow a relatively large learning rate to be used in order to quickly reduce the network error. Table 2 presents the network architectures for the classifications. Ten coding units are used when four bands are used. Fewer units are used when textural information is incorporated, to reduce the learning time required for the networks to converge. The output layer has seven neurons, which correspond to the seven land-use classes. The size of the hidden layer can be a crucial question in network design. For each network, the number of hidden units is varied, and the network with the best performance in terms of the final network error is used for classification.

Table 2. Configurations for the four BP networks.

Data | Coding | Input | Hidden | Output
Four Bands | 10 units | 40 units | 18 units | 7 units
Window 3 by 3 | 7 units | 252 units | 20 units | 7 units
Window 5 by 5 | 4 units | 400 units | 15 units | 7 units
Window 7 by 7 | 4 units | 784 units | 15 units | 7 units

The same training data set is used to train the BP networks. Connection weights are updated after each sample is fed to the network. The initial learning rate is set to 0.5, and the momentum term is set to 0.2. An initial run of 30 epochs is conducted without changing the learning rate and the momentum term. Modifications of these parameters are then made by examining the dynamics of the error, as suggested by Heermann and Khazenie (1992). The mean square error (MSE) is calculated every five epochs, and the MSE decay as a function of time is shown in Figure 5. The learning rate may have to be reduced to a smaller value before the learning is terminated in order to achieve good separation between the samples that are difficult to learn (typically, the final learning rate is reduced to 0.001). The sharp turning points in Figure 5 correspond to the abrupt changes in the learning rate. The networks with neighborhood information defined by 5 by 5 and 7 by 7 windows are able to converge to the desired error tolerance (which is set to 0.01), and the network with a 3 by 3 window finishes close to convergence with a final error of 0.0134. Table 3 presents the training time and classification time for both networks.
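The construction of a BP input vector for the window classifications can be sketched as below. The triangular receptive fields are an assumed variant of coarse coding (Bischof et al. (1992) describe the general scheme; the exact response function used in this study is not given). The unit counts reproduce the input sizes in Table 2, e.g., 5 by 5 pixels by 4 bands by 4 units = 400 inputs.

```python
import numpy as np

def coarse_code(dn, n_units, dn_max=255.0):
    """Map one digital number onto n_units overlapping triangular responses."""
    centers = np.linspace(0.0, dn_max, n_units)
    width = dn_max / (n_units - 1)               # adjacent receptive fields overlap
    return np.clip(1.0 - np.abs(dn - centers) / width, 0.0, None)

def window_input(image, row, col, size, n_units):
    """Build one BP input vector from a size-by-size pixel neighborhood.

    image: (height, width, n_bands); the window must lie inside the image.
    Returns size * size * n_bands * n_units activations.
    """
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1, :]
    return np.concatenate([coarse_code(dn, n_units) for dn in patch.ravel()])
```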
Table 3. Training and classification time for both neural networks running on a Pentium PC.

Method / Data | Coarse Tune | Fine Tune | Total Training | Classification
Feature Map: Four Bands | 17.33 minutes | 3.6 minutes | 0.349 hour | 217 seconds
Feature Map: Window 3 by 3 | 37.2 minutes | 10.8 minutes | 0.8 hour | 220 seconds
Feature Map: Window 5 by 5 | 1.45 hours | 22.3 minutes | 1.82 hours | 2.3 hours
Feature Map: Window 7 by 7 | 1.083 hours | 1.05 hours | 2.133 hours | 2.87 hours
BP Method: Four Bands | — | — | 0.546 hour | 91 seconds
BP Method: Window 3 by 3 | — | — | 4.4 hours | 722 seconds
BP Method: Window 5 by 5 | — | — | 5.9 hours | 763 seconds
BP Method: Window 7 by 7 | — | — | 6.3 hours | 1980 seconds
MLC: Four Bands | — | — | 7 seconds | 18 seconds
Figure 5. The learning curve of the BP networks (MSE versus number of epochs).

Results
Table 4 presents the classification accuracy using the maximum-likelihood classifier, the feature map method, and the BP algorithm. The classification accuracy for each individual class is calculated as the correctly classified pixels divided by the total number of pixels in the test set. Confusion matrices for the classifications using a window size of 5 by 5 for BP and KSOFM are presented in Tables 5 and 6, respectively. Others are not presented here in order to reduce the length of the paper. All confusion matrices are standardized before the accuracy is evaluated.

With the use of four-band data, the feature map network produced an overall accuracy of 89.7 percent, 2 percent higher than that of the maximum-likelihood classifier at 87.7 percent. The increase in classification accuracy is mainly attributed to better identification of arable land, urban features, and water bodies, because the data distributions of these classes are far from normal. The standard deviation of barren land is the highest in all spectral bands. Although many outliers in the training data were deleted, the assumption of a Gaussian distribution still resulted in a great deal of misclassification by the maximum-likelihood method. For orchards, which have a near-normal distribution, the Kohonen map method also produced better classification results. In fact, the subdivision of the arable land class reduced its internal variability, forcing many orchard pixels to be assigned to arable land by the maximum-likelihood method. The feature map network produced better results because the decision boundaries are formed more accurately, in a piecewise manner, by using a cluster (or clusters) of neurons. In a sense, the map method may be able to achieve results equally as accurate as the maximum-likelihood method when the data distribution is normal. However, this requires that the map be well trained using the data available. For classes with fewer samples, it is possible that the weights are not well tuned to favor these classes. This is in fact the case with Class 6, where the maximum-likelihood method scored higher classification accuracy than did the feature map method (the data distribution of Class 6 is near-normal). Figure 6 shows a segment of the classified images. It is evident that all three classifiers produced noisy results, although the results generated by the two neural networks are slightly less noisy than that of the maximum-likelihood method.
Table 4. Individual land-use/land-cover category accuracy and overall accuracy for all the classifications. Keys: MLC—maximum-likelihood classifier; KH—KSOFM; BP—Back Propagation; T MxM—textural classification defined by a window with a size of M by M. All values are percentage correct (%).

Classification | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 | Overall Accuracy
MLC | 86.7 | 91.5 | 92.8 | 81.2 | 87.6 | 95.6 | 78.0 | 87.65
KH 4 | 91.0 | 91.8 | 90.6 | 89.3 | 89.4 | 91.8 | 83.8 | 89.67
BP 4 | 92.5 | 86.7 | 91.8 | 85.5 | 90.6 | 93.3 | 85.3 | 89.38
KH T3x3 | 96.2 | 92.6 | 94.4 | 95.3 | 95.9 | 93.3 | 94.2 | 94.54
BP T3x3 | 96.0 | 96.4 | 97.4 | 91.4 | 95.5 | 94.8 | 95.1 | 95.22
KH T5x5 | 96.6 | 98.5 | 94.3 | 96.0 | 95.8 | 91.7 | 95.4 | 95.47
BP T5x5 | 96.1 | 93.6 | 97.7 | 94.2 | 93.8 | 92.8 | 94.2 | 94.63
KH T7x7 | 96.5 | 95.6 | 94.8 | 97.0 | 97.2 | 91.2 | 93.5 | 95.12
BP T7x7 | 97.0 | 93.1 | 99.7 | 93.7 | 95.4 | 90.0 | 96.3 | 95.04
Table 5. Confusion matrix (diagonal and marginal entries) for classification using a 5 by 5 window by the BP algorithm. Proportions are standardized.

Ground Truth | Correctly Classified | Omission Error % | Row Total (pixels) | Commission Error % | Column Total (pixels)
1 (Arable) | 0.961 | 3.9 | 8660 | 11.6 | 8655
2 (Barren) | 0.936 | 6.4 | 390 | 1.2 | 458
3 (Forest) | 0.977 | 2.3 | 3295 | 9.7 | 3549
4 (Orchard) | 0.942 | 5.8 | 2029 | 3.9 | 1990
5 (Urban) | 0.938 | 6.2 | 6865 | 8.1 | 6689
6 (Horticulture) | 0.928 | 7.2 | 793 | 0.8 | 739
7 (Water) | 0.942 | 5.8 | 1268 | 0.5 | 1219
Overall Accuracy: 94.63%
Table 6. Confusion matrix for classification using a 5 by 5 window by KSOFM.

Figure 6. Extracts of the classified images: (1) original band 5 image, (2) MLC, (3) KH, four bands, (4) BP, four bands, (5) KH, window 3 by 3, (6) BP, window 3 by 3, (7) KH, window 5 by 5, (8) BP, window 5 by 5, (9) KH, window 7 by 7, (10) BP, window 7 by 7. Classes shown: arable, barren, forest, orchard, urban, horticulture, and water.
By incorporating the neighborhood information, significant increases in overall classification accuracy are achieved. The improvement is seen across classes, partly due to the homogeneity of the test sites chosen. Forests are much better identified, as the confusion with arable land and with orchards is reduced. Urban areas are also better classified, particularly in the metropolitan areas, where fewer pixels are misclassified as water bodies. For open water surfaces, the identification is also improved. It is evident from Figure 6 that classification with textural information is much more noise free. The results indicate that the feature map network is able to utilize the neighborhood information to enhance classification performance without explicitly defining textural measures.
The neural network trained with the Back Propagation algorithm achieved an overall accuracy similar to that of the feature map networks when four spectral bands are used. The major differences between them in the classified results are in the identification of Class 3 and Class 4. The feature map method identified Class 4 more accurately than did the BP network, while the BP network identified Class 3 more accurately. Class 4 is mostly confused with Class 3, because the spectral signature of orchards is very similar to that of the deciduous woodlands. Although there are differences between different runs of the BP network (sometimes the difference can be large) due to the randomizing of the initial weights and the selection of the learning rate, the same effect on the separation of the two mentioned classes is observed. This may be attributed to the use of the Coarse Coding procedure, which is designed to favor generalization, because similar digital numbers produce similar patterns at the network input. The separation between the two classes might be better achieved if coding were done using bit patterns of digital numbers (e.g., Heermann and Khazenie, 1992). In this case, similar DNs would be mapped completely differently; therefore, it would be easier for the network to distinguish similar patterns. For the feature map algorithm, the separation between the two classes is also not well achieved when the coarse tune is completed. The fine tuning procedures, particularly LVQ2, are able to form accurate decision boundaries between them. One can see that the identification of both classes is well achieved and balanced.
For all the window classifications, again the same level of accuracy is obtained by both neural networks. The differences are still in the two classes mentioned above. Over-learning of the BP networks is observed when the 7 by 7 window classification is conducted. The network error continues to decrease (the final MSE is 0.00745 at epoch 120), but the overall classification accuracy decreases by more than half of a percent. Clearly, as the window size increases and the samples become unique, the BP network loses its ability to generalize, resulting in poorer classification performance.
Paola and Schowengerdt (1997) used the majority filtering technique to compare single-pixel classification against window classification. They found that majority filtering can simply smooth out the fine details of a classification. In the current study, it is found that window classification has the same effect, though not as dramatic. Fine details such as the internal variability (i.e., noise) within the land-use/land-cover parcels are smoothed out, resulting in improved classification. However, other details such as linear features (e.g., canals, ditches, and roads) are either eliminated or misclassified. In addition, the land-use parcels appear to be distorted around the edges. The problem is more evident when the window size becomes larger, because the edges are where the mixed pixels are located, while the samples chosen are from more homogeneous areas. Moreover, the generalization ability of both types of networks is weakened by the uniqueness of the samples, as mentioned earlier. It can be seen from Figure 6 that the feature map has misclassified some of the forest areas, while the BP network has misclassified many land-use parcel edges as barren land. Window sizes of 3 by 3 and/or 5 by 5 for incorporating neighborhood information seem to be suitable choices because, beyond that, no further improvement is seen, and the distortion of classification around the edges has just begun to emerge. Furthermore, network training time can be significantly reduced (refer to Table 3). Even if small windows are used, training samples may have to be selected in areas along the edges so that the mixed samples are represented. The classification results also suggest that, even when the window size is quite large, single-pixel classifications can still be seen. This indicates that the neural networks are better techniques than majority filtering, by which the fine details of a classification are largely removed, especially when a 7 by 7 filter is used, where the details can barely be seen.
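For reference, the majority filtering examined by Paola and Schowengerdt (1997) can be sketched as a simple mode filter over the classified label image (an illustration of the general technique, not their implementation):

```python
from collections import Counter
import numpy as np

def majority_filter(label_image, size=3):
    """Replace each label with the most common label in its size-by-size window."""
    pad = size // 2
    padded = np.pad(label_image, pad, mode="edge")   # repeat edge labels
    out = np.empty_like(label_image)
    rows, cols = label_image.shape
    for r in range(rows):
        for c in range(cols):
            window = padded[r:r + size, c:c + size].ravel()
            out[r, c] = Counter(window).most_common(1)[0][0]
    return out
```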
Discussions
Comparison with the BP Method
The KSOFM and BP neural networks differ in a number of ways. The most notable is their network architecture. The former requires only two layers: one input layer and one output layer. The network design is relatively simple, provided that the output layer is large enough to form accurate class representations. For the latter, three layers (or more, if more than one hidden layer is used) are usually needed, and the size of the hidden layer can have a profound impact on the success of the network. The dilemma of the hidden layer size is well addressed in the literature (Paola and Schowengerdt, 1997). An applicable size for a particular problem may have to be determined by trial and error. In this study, it was found that increasing the size of the hidden layer helps improve class separation, especially when fewer data are used for classification.

A subtler difference between the two networks lies in the learning mechanism, i.e., the decision-making process. The feature map network creates decision boundaries in a linear piecewise manner. To achieve the same level of accuracy as that of the BP network, a fairly large map has to be employed.
Even so, the decision boundaries may not portray the data distributions very accurately with linear borders that demarcate the classes. The BP network, on the other hand, due to the complexity of its architecture plus its powerful learning mechanism, forms class borders in a non-linear, arbitrary fashion (the network is able to mimic distributions that are disjoint when two hidden layers are used (Lippmann, 1987)); hence, it may be more robust and potentially capable of achieving better classification results. This is reflected in network training, because the BP networks are able to learn the samples with higher labeling accuracy, although over-learning by the networks has to be controlled by a proper convergence term. The feature map, however, is not able to further reduce the confusion among the samples when training is terminated, but it offers a good degree of compromise among the samples by forming linear decision borders. Both types of networks perform well as compared to statistical classifiers such as the maximum likelihood.
The inconsistency in training by the BP network is noted by many researchers. For the feature map, the same phenomenon is seen, because a different set of weights is obtained by varying the learning rate. But the same final labeled map may be obtained even if a different learning rate is used (the labeled maps may only differ in orientation). For the BP network, convergence controls the generalization ability, and too-well-trained networks may not function well on the entire image. For the feature map, the over-correcting effect of LVQ2 is not observed, and the experiment conducted shows that classification accuracy continuously increases as fine tuning continues. To this end, the feature map is more consistent than the BP network.
Computation-wise, the feature map is a faster learner but a slower classifier. The BP algorithm, on the other hand, suffers from the fact that learning is time-consuming (Heermann and Khazenie, 1992; Tzeng et al., 1994), but classification by the BP network is fast. It is evident from Table 3 that training of the feature map is usually two to three times faster than training the BP network. When more channels are used for training, the advantage of using the feature map network is even more pronounced, because training on a large volume of data is much slower for the BP method, although fewer epochs are usually required. It should be mentioned that BP network training is controlled by many factors, such as the hidden layer size, the selection of the learning rate, the convergence term, and so on. The BP learning process can be significantly accelerated by reducing the hidden layer size and by using fewer nodes for coarse coding. Therefore, the figures in Table 3 should serve only as a relative comparison between the two networks. However, it appears that the BP network may need more hidden neurons to form class borders accurate enough to achieve the same level of classification as the feature map, and the increased size of the hidden layer prolongs the learning phase significantly. The improved performance from using more hidden neurons could be due to the coarse coding of the input data, because the network has a huge number of input neurons (for instance, coding a 7 by 7 textural input results in 980 input nodes when using five units per DN); the network may therefore require more hidden neurons to accommodate the input in order to effectively reduce the network error. For the classification using a 3 by 3 window, a BP network with a 252 by 15 by 7 topology produced an overall accuracy of merely 93.74 percent, 1.5 percent lower than the accuracy presented in Table 4. For the classification phase, however, the BP networks are much faster than the feature map networks. For the feature map, finding the neuron to which an input vector belongs among a vast number of nodes (e.g., a 25 by 25 map results in 625 neurons) is a very time-consuming process. The method proposed by Hardin and Thomson (1992) has been implemented to speed up this process. Still, the time required for classification is tremendous, especially when there are 196 channels of data (a window size of 7 by 7 for the classification).
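The cost of the winner search can also be reduced by computing all pixel-to-neuron distances in a single matrix pass, as in the sketch below. This is a generic vectorization using ‖x − m‖² = ‖x‖² − 2x·m + ‖m‖², not the Hardin and Thomson (1992) method used in the study.

```python
import numpy as np

def classify_image(pixels, codebook, labels):
    """Label every pixel with the class of its nearest codebook vector.

    pixels: (n_pixels, dim); codebook: (n_neurons, dim); labels: (n_neurons,).
    """
    cross = pixels @ codebook.T                  # (n_pixels, n_neurons) dot products
    d2 = ((pixels ** 2).sum(axis=1)[:, None]
          - 2.0 * cross
          + (codebook ** 2).sum(axis=1)[None, :])
    return labels[np.argmin(d2, axis=1)]         # squared distance suffices for argmin
```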
Training of the Feature Map Network
Classification by the feature map neural network is affected by at least the following factors: first, the partitioning of the codebooks; second, the size of the feature map; third, the tuning of the feature map; and fourth, the number of samples selected for each class.

Partitioning of codebooks among classes determines the number of neurons allocated to each class. This process can be controlled, to some extent, by using non-equal numbers of training samples for each class and by the sequence in which the samples are provided to the network. The experiments conducted in this study show that, by alternating samples from each class according to a predefined pattern, the codebooks are partitioned in a near-optimal fashion. The size of the feature map should be large enough to form accurate class representations. When classifying four-band data, the 10 by 10 feature map produced an overall accuracy of merely 87 percent, 2 percent lower than that of the 25 by 25 feature map. At small sizes, it is observed that, no matter how well the feature maps are partitioned and tuned, the overall labeling accuracy cannot reach the level produced by larger maps. For two similar classes, it is often the case that one class is better tuned than the other, leaving many patterns mixed which would otherwise be separated by larger networks. The number of samples selected for each class affects the network learning at various stages, such as the partitioning of the feature map and the fine tuning process. Classes with bulky training samples tend to occupy more codebooks and therefore are more accurately discriminated. Fewer samples result in fewer codebooks being activated during training; class separation may therefore be underachieved. Civco (1993) stated that the selection of training samples is important for BP network learning. For the feature map network, it is equally important, and the samples may have to be pre-processed before the network training begins. For instance, redundant samples may be deleted, and samples may be balanced among the classes through a certain procedure.
The use of large feature maps trades off computation time. From a map size of 10 by 10 to 25 by 25, the time used for the coarse tune increased about 20 times (training of the 10 by 10 feature map took less than one minute). A more challenging situation might arise when a large number of ground-cover types are to be classified. For example, classification of agricultural crops over a fairly large coverage may involve dozens of crop types. In this case, a large quantity of codebooks would be required, and the training time would increase substantially. Methods for speeding up the training phase, such as the "Quick Reaction" approach (Monnerjahn, 1994), are therefore necessary.
Conclusions
The results of this study show that the KSOFM algorithm is capable of achieving higher classification accuracy than the maximum-likelihood method. There are no significant differences between the classification performance of the feature map networks and the Back-Propagation networks. The feature map neural network is also able to utilize textural information to enhance classification performance. The size of the feature map and the partitioning of the codebooks affect class representation. Large-sized feature maps may have to be used to achieve more accurate class separation, and the algorithm may therefore encounter difficulties when a large number of classes are to be represented. Nevertheless, the feature map method is a viable alternative for the classification of remotely sensed data.
Acknowledgment
The author wishes to acknowledge the China National Natural Science Foundation for sponsoring this research through Project No. 49601014.
References
Augusteijn, M.F., L.E. Clemens, and K.A. Shaw, 1995. Performance of texture measures for ground cover identification in satellite images by means of a neural network classifier, IEEE Transactions on Geoscience and Remote Sensing, 33(3):616-626.
Benediktsson, J.A., P.H. Swain, and O.K. Ersoy, 1990. Neural network approaches versus statistical methods in classification of multisource remote sensing data, IEEE Transactions on Geoscience and Remote Sensing, 28(4):540-552.
Bischof, H., W. Schneider, and A.J. Pinz, 1992. Multispectral classification of Landsat images using neural networks, IEEE Transactions on Geoscience and Remote Sensing, 30(3):482-490.
Bose, N.K., and P. Liang, 1996. Neural Networks Fundamentals with Graphs, Algorithms, and Applications, McGraw-Hill, Inc., New York, N.Y., ISBN 0-07-006618-3, p. 361, p. 367.
Bruzzone, L., C. Conese, F. Maselli, and F. Roli, 1997. Multisource classification of complex rural areas by statistical and neural network approaches, Photogrammetric Engineering & Remote Sensing, 63(5):523-533.
Civco, D.L., 1993. Artificial neural networks for land-cover classification and mapping, Int. J. Geographical Information Systems, 7(2):173-186.
Gong, P., 1996. Integrated analysis of spatial data from multiple sources: Using evidential reasoning and artificial neural network techniques for geological mapping, Photogrammetric Engineering & Remote Sensing, 62(5):513-523.
Graf, D.H., and W.R. LaLonde, 1988. A neural controller for collision-free movement of general robot manipulators, Proc. IEEE Int. Conf. on Neural Networks, ICNN-88, 24-27 July, San Diego, California, I-77-I-84.
Hara, Y., R.G. Atkins, S.H. Yueh, R.T. Shin, and J.A. Kong, 1994. Application of neural networks to radar image classification, IEEE Transactions on Geoscience and Remote Sensing, 32:100-109.
Hardin, P.J., and C.N. Thomson, 1992. Fast nearest neighbor classification methods for multi-spectral imagery, The Professional Geographer, 44(2):191-201.
Heermann, P.D., and N. Khazenie, 1992. Classification of multispectral remote sensing data using a back-propagation neural network, IEEE Transactions on Geoscience and Remote Sensing, 30(1):81-88.
Iivarinen, J., K. Valkealahti, A. Visa, and O. Simula, 1994. Feature selection with self-organizing feature map, Proceedings of the International Conference on Artificial Neural Networks, 26-29 May, Sorrento, Italy, 1:334-337.
Ji, C.Y., D.F. Sun, and S. Wang, 1996. Dynamic Land-use Change Detection Using TM Images, Technical Report 111 to the State Land Administration of China, The State Land Administration, Beijing, 65 p.
Kim, J.W., J.S. Ahn, C.S. Kim, H.Y. Hwang, and S.W. Cho, 1994. Multiple self-organizing neural networks with the reduced input dimension, Proceedings of the International Conference on Artificial Neural Networks, 26-29 May, Sorrento, Italy, 1:310-313.
Kohonen, T., 1982. Self-organizing formation of topologically correct feature maps, Biological Cybernetics, 43:59-69.
———, 1988. The 'neural' phonetic typewriter, Computer, 21:11-22.
———, 1990. The self-organizing map, Proceedings of the IEEE, 78(9):1464-1480.
Kohonen, T., G. Barna, and R. Chrisley, 1988. Statistical pattern recognition with neural networks: Benchmarking studies, Proceedings of International Joint Conference on Neural Networks, IJCNN-88, 24-27 July, San Diego, California, I-61-I-68.
Kohonen, T., T. Kangas, J. Laaksonen, and K. Torkkola, 1992. LVQ_PAK: The Learning Vector Quantization Program Package, Version 2.1, Laboratory of Computer & Information Science, Helsinki University of Technology, Helsinki, Finland.
Lee, J., R.C. Weger, S.K. Sengupta, and R. Welch, 1990. A neural network approach to cloud classification, IEEE Transactions on Geoscience and Remote Sensing, 28(5):846-855.
Lippmann, R.P., 1987. An introduction to computing with neural networks, IEEE ASSP Magazine, (April):4-22.
Liu, Z.K., and J.Y. Xiao, 1992. Multiple Kohonen networks and their application to remote sensing pattern recognition, Proceedings of the 6th China National Conference on Image and Graphics, July, Beijing, China, pp. 428-431.
Monnerjahn, J., 1994. Speeding-up self-organizing maps: The quick reaction, Proceedings of the International Conference on Artificial Neural Networks, 26-29 May, Sorrento, Italy, 1:326-329.
Muller, B., and J. Reinhardt, 1990. Neural Networks: An Introduction, Springer-Verlag, New York, N.Y., pp. 245-248.
Nasrabadi, N.M., and Y. Feng, 1988. Vector quantization of images based upon the Kohonen self-organizing feature maps, Proceedings of the IEEE International Conference on Neural Networks, ICNN-88, 24-27 July, San Diego, California, I-101-I-108.
Orlando, R., R. Mann, and S. Haykin, 1990. Radar classification of sea-ice using traditional and neural classifiers, Proceedings of International Joint Conference on Neural Networks, IJCNN-90, 15-19 January, Washington, D.C., II-263-II-266.
Paola, J.D., and R.A. Schowengerdt, 1997. The effect of neural network structure on a multispectral land-use/land-cover classification, Photogrammetric Engineering & Remote Sensing, 63(5):535-544.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge, United Kingdom, ISBN 0-521-46086-7, p. 206, p. 323.
Tzeng, Y.C., K.S. Chen, W.L. Kao, and A.K. Fung, 1994. A dynamic learning neural network for remote sensing applications, IEEE Transactions on Geoscience and Remote Sensing, 32(5):1096-1102.
Yoshida, T., and S. Omatu, 1994. Neural network approach to land cover mapping, IEEE Transactions on Geoscience and Remote Sensing, 32(5):1103-1109.
(Received 25 January 1999; revised and accepted 06 December 1999)