The first step in comparing the results of an unsupervised classifier versus a supervised
classifier was the selection of appropriate test data. A spatial and spectral subset of a
MISI image generated during a fly-by of Durand Park on the Lake Ontario shore just
north of Rochester was selected (see Figure 3). The subset was 540 pixels wide and 1639
pixels high, with 25 spectral bands per pixel. The second step in the comparison was
writing code to execute the unsupervised and supervised classification algorithms along
with ancillary data processing algorithms (confusion matrices, pseudo-color
classification maps, etc.).
In general, the code that performed the classification did three things. First, it
executed the desired algorithm and generated a classification map. Second, it
calculated all possible combinations of the Jeffries-Matusita distance and the transformed
divergence using the class information developed within the algorithm. Third, from these
calculations, it determined the best three-band combination for display.
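
The separability step described above can be sketched as follows. This is a minimal illustration and not the report's actual code. It assumes Gaussian class statistics (a mean vector and covariance matrix per class), uses the common textbook forms of the two distances scaled to the range 0 to 2, and ranks three-band combinations by their average pairwise class separability. The ranking criterion and all function names are assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def jeffries_matusita(m1, c1, m2, c2):
    """Jeffries-Matusita distance between two Gaussian classes (range 0 to 2)."""
    d = m1 - m2
    c = 0.5 * (c1 + c2)
    # Bhattacharyya distance: a mean-separation term plus a covariance term.
    mean_term = 0.125 * d @ np.linalg.solve(c, d)
    _, logdet_c = np.linalg.slogdet(c)
    _, logdet_1 = np.linalg.slogdet(c1)
    _, logdet_2 = np.linalg.slogdet(c2)
    cov_term = 0.5 * (logdet_c - 0.5 * (logdet_1 + logdet_2))
    return 2.0 * (1.0 - np.exp(-(mean_term + cov_term)))


def transformed_divergence(m1, c1, m2, c2):
    """Transformed divergence between two Gaussian classes (range 0 to 2)."""
    d = (m1 - m2)[:, None]
    inv1, inv2 = np.linalg.inv(c1), np.linalg.inv(c2)
    divergence = (0.5 * np.trace((c1 - c2) @ (inv2 - inv1))
                  + 0.5 * np.trace((inv1 + inv2) @ d @ d.T))
    return 2.0 * (1.0 - np.exp(-divergence / 8.0))


def best_band_triple(means, covs, metric=jeffries_matusita):
    """Return the three-band combination with the highest mean pairwise score.

    means: (n_classes, n_bands) array; covs: (n_classes, n_bands, n_bands) array.
    Averaging over class pairs is an assumed selection rule.
    """
    n_classes, n_bands = means.shape
    best_bands, best_score = None, -np.inf
    for bands in combinations(range(n_bands), 3):
        idx = np.array(bands)
        sub = np.ix_(idx, idx)  # selects the 3x3 covariance sub-matrix
        score = np.mean([
            metric(means[i, idx], covs[i][sub], means[j, idx], covs[j][sub])
            for i, j in combinations(range(n_classes), 2)
        ])
        if score > best_score:
            best_bands, best_score = bands, score
    return best_bands, best_score
```

For a 25-band image there are 2300 three-band combinations, so an exhaustive search of this kind is inexpensive.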
For the unsupervised algorithm, ISODATA was selected. As such, code was developed to
execute the ISODATA algorithm and compute the distance metric it required, the
minimum distance to the mean. For the supervised algorithm, the Gaussian Maximum
Likelihood (GML) classification scheme was selected. To use this scheme, a discriminant
function had to be modeled after an assumed underlying statistical distribution of the
data. A discriminant function is evaluated once per class, using the first-order
statistics of each cluster, and has the property that its output value is greatest for
the class to which the input hyperspectral pixel belongs. For this data set, it was assumed
that each cluster, or class, of data had a multivariate Gaussian distribution.
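
As an illustration of the discriminant function described above, the sketch below evaluates the standard log-form Gaussian discriminant for every class and assigns each pixel to the class with the largest value. It is not the report's code. It assumes the class statistics are a mean vector and a covariance matrix per class and, unless priors are supplied, that all classes are equally likely. The function names are hypothetical.

```python
import numpy as np


def class_statistics(pixels, labels, n_classes):
    """Mean vector and covariance matrix of each class from labeled pixels.

    pixels: (n_pixels, n_bands) array; labels: (n_pixels,) integer array.
    """
    means = np.array([pixels[labels == k].mean(axis=0) for k in range(n_classes)])
    covs = np.array([np.cov(pixels[labels == k], rowvar=False)
                     for k in range(n_classes)])
    return means, covs


def gml_discriminants(pixels, means, covs, priors=None):
    """Log-form Gaussian discriminant value of every pixel for every class."""
    n_classes = means.shape[0]
    if priors is None:
        # Assumption: equal prior probability for every class.
        priors = np.full(n_classes, 1.0 / n_classes)
    scores = np.empty((pixels.shape[0], n_classes))
    for k in range(n_classes):
        diff = pixels - means[k]
        _, logdet = np.linalg.slogdet(covs[k])
        # Squared Mahalanobis distance of each pixel to the class mean.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(covs[k]), diff)
        scores[:, k] = np.log(priors[k]) - 0.5 * logdet - 0.5 * maha
    return scores


def gml_classify(pixels, means, covs, priors=None):
    """Assign each pixel to the class whose discriminant value is greatest."""
    return np.argmax(gml_discriminants(pixels, means, covs, priors), axis=1)
```

In a workflow like the one described, `class_statistics` would be computed from the pixels inside the training regions, and `gml_classify` would then be applied to every pixel of the image reshaped to `(n_pixels, n_bands)`.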
Once the code-writing phase had ended, the unsupervised algorithm was executed on the
input MISI image. For the ISODATA algorithm, six classes were requested. After
execution, regions of interest were selected in the input image using the ENVI software
package. The regions of interest that were selected can be seen in Figure 4.
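
The unsupervised step can be illustrated with a deliberately simplified sketch. It shows only the core of the procedure: assign each pixel to the nearest class mean, then recompute the means. A full ISODATA implementation also splits and merges clusters between iterations, which is omitted here. The function names, the random initialization, and the fixed iteration count are assumptions for the example and are not taken from the report.

```python
import numpy as np


def assign_to_nearest_mean(pixels, means):
    """Label each pixel with the index of the closest class mean (Euclidean)."""
    # One column of squared distances per class keeps memory use modest.
    d2 = np.stack([((pixels - m) ** 2).sum(axis=1) for m in means], axis=1)
    return np.argmin(d2, axis=1)


def iterate_means(pixels, n_classes=6, n_iter=20, seed=0):
    """Simplified clustering loop: minimum-distance assignment and mean update.

    pixels: (n_pixels, n_bands) array. Returns (labels, means).
    """
    rng = np.random.default_rng(seed)
    start = rng.choice(len(pixels), size=n_classes, replace=False)
    means = pixels[start].astype(float)
    for _ in range(n_iter):
        labels = assign_to_nearest_mean(pixels, means)
        for k in range(n_classes):
            members = pixels[labels == k]
            if len(members) > 0:  # an empty cluster keeps its previous mean
                means[k] = members.mean(axis=0)
    return assign_to_nearest_mean(pixels, means), means
```

An image cube of shape `(rows, cols, bands)` would be flattened with `cube.reshape(-1, bands)` before the call, and the returned labels reshaped to `(rows, cols)` to form the classification map.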
Figure 3. Original MISI image to be classified
Figure 4. Regions of interest selected for input into the supervised algorithm
Using these regions of interest and the MISI image as inputs, the supervised classifier
algorithm was executed. The supervised classifier created three unique pieces of data: an
unsupervised classification map, a supervised classification map, and a classification map
containing the original regions of interest selected through ENVI.
From these pieces of data, a number of metrics were to be calculated. The first
metric is a qualitative analysis of the output classification maps from the supervised and
unsupervised classification algorithms. By comparing each classification map to the
original image, a qualitative determination of its accuracy can be made.
The second metric is the confusion matrix. Two confusion matrices were calculated
using the selected regions of interest as ground truth, one for the supervised
classification and one for the unsupervised classification. The third metric is a set of
scatter plots created using the Jeffries-Matusita distance and the transformed divergence.
By generating scatter plots using the three bands specified by the two distance
measures, a qualitative determination of the classification can be made. If, upon plotting,
the pixels belonging to each class do not appear separable, then it can be said with a
qualified level of certainty that the classification accuracy was poor.
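
The confusion-matrix metric can be sketched as below. This is an illustrative example and not the report's code. It assumes the regions of interest are stored as an integer map in which pixels outside any region carry the value -1, and that both maps use the same class numbering. The names and the -1 convention are hypothetical.

```python
import numpy as np


def confusion_matrix(truth, predicted, n_classes):
    """Count matrix with ground-truth classes as rows, predictions as columns."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly when the same (row, column) pair repeats.
    np.add.at(matrix, (truth, predicted), 1)
    return matrix


def confusion_from_roi(roi_map, class_map, n_classes, unlabeled=-1):
    """Confusion matrix restricted to pixels inside the regions of interest."""
    mask = roi_map != unlabeled
    return confusion_matrix(roi_map[mask], class_map[mask], n_classes)


def overall_accuracy(matrix):
    """Fraction of ground-truth pixels assigned to the correct class."""
    return np.trace(matrix) / matrix.sum()
```

For an unsupervised result the cluster numbers are arbitrary, so each cluster has to be matched to a ground-truth class before the diagonal of the matrix can be read as agreement.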