Analysis of Image Classification and Accuracy Assessment

Introduction

Image classification is an important process for recognizing geographical features in digital remotely sensed images. It is the process of categorizing all pixels in an image into classes (Campbell, 2006), and it can be performed automatically by mathematical programs that usually treat each pixel as an individual unit with a combination of values in several spectral bands. Mather (2004) divides the classification process into two stages: first, defining the categories of objects in the real world; second, labelling the entities (pixels).

Many classification methods, or strategies, have been developed by scientists over the years. No single strategy is best for all situations, because the circumstances of each study and the characteristics of each image differ. It is therefore vital that each analyst understand the classification strategies in order to select the most appropriate method for the task at hand (Campbell, 2006).

Image classification is implemented by a computer program called a classifier. Some classifiers isolate the spectral values of each pixel from its neighbours by treating each pixel as an independent observation; these are generally called spectral or point classifiers. Alternatively, the classification can become more complex by considering groups of pixels and using both spectral and textural information; this is called spatial or neighbourhood classification (Campbell, 2006).

Broadly, image classification can be separated into two main types: supervised and unsupervised. When the analyst interacts with the program and guides the classification by identifying each category, the classification is called supervised.
On the other hand, the classification is considered unsupervised when the analyst's interaction is minimal. When a classifier shares the characteristics of both supervised and unsupervised methods, it is called a hybrid classification (Campbell, 2006).

Unsupervised Classification

Unsupervised classification identifies natural groups of spectral values, labels them, and then maps these natural groups as classes. Campbell (2006) outlines the advantages and disadvantages of unsupervised classification as follows:

Advantages:
- No detailed prior knowledge of the region is required, although interpreting the results does require such knowledge.
- The opportunity for human error is minimized.
- Unique classes are recognized as distinct units.

Disadvantages:
- Unsupervised classification groups pixels according to similarities in their spectral values, so some pixels are assigned to classes that do not necessarily relate to the informational categories the analyst is interested in.
- Unsupervised classification is not suitable for generating a specific menu of informational classes, because the analyst cannot control the menu of classes.
- Because the spectral properties of specific informational classes change over time, the relationships between spectral classes and informational classes are not constant; the relationships established for one image cannot be extended to another.

AMOEBA

AMOEBA is a basic strategy for unsupervised classification. It relies on a contiguity constraint that takes the location of values into account when forming spectral classes, using a tolerance specified by the analyst to group each pixel with its neighbours. For this reason it is criticized for increasing the spectral diversity of classes. It was designed for large homogeneous regions.
It seems to work well in such regions, but it is less effective in more complex landscapes that consist of smaller parcels (Campbell, 2006).

Supervised Classification

Supervised classification uses samples of pixels of known identity to classify pixels of unknown identity. These samples, called training areas or training fields, are selected by the analyst and under his or her supervision, which is why this type of classification is called supervised. The classification is controlled by these samples, which the analyst selects according to prior knowledge that allows each sample to be assigned to a specific category.

Advantages:
- It is controlled by the analyst, so it is suitable for generating classifications for comparing areas or dates.
- It is tied to specific areas (the training areas).
- There is no problem of matching spectral categories with informational categories.
- Serious errors can be detected and removed by examining the training data.

Disadvantages:
- The analyst imposes the classification structure.
- Training areas are defined by informational categories first and only then by spectral properties. Applied to forests, for example, this can ignore differences in density, age and shadowing.
- Training areas may not be representative of conditions if the area to be classified is large.
- Selecting good training areas is tedious, expensive and time-consuming.

Campbell (2006) outlines the key characteristics of training areas, which should be considered, as follows:

Number of Pixels
As a general guideline, at least 100 pixels should be selected for each category.

Size of Training Area
The training area must be large enough to allow accurate estimates of the properties of each informational class. For reliable estimates of the spectral properties of each class, the training data as a group must include enough pixels, but each area should not be too large.
Joyce (1978), as cited by Campbell (2006), recommends 4–65 ha (10–165 acres), with 16 ha (40 acres) preferred. The analyst should also consider the differences in pixel size among sensors.

Shape
The shape of a training area is not important in itself, but shapes that minimize the number of vertices, such as squares or rectangles, are preferable.

Location
The training data are intended to represent the variation within the image, so training areas must not be clustered in a favoured region; their location is therefore important.

Number
Because of the diversity of the spectral properties of the informational classes, and the necessity of representing all the spectral properties of each category, it is important to provide an adequate number of training areas. Campbell (2006) suggests five to ten per category at a minimum.

Placement
Training areas should be placed in accurately known locations with respect to distinctive features such as water bodies.

Uniformity
Uniformity is the most important property of a good training area. The frequency distribution of the pixel values in each spectral band should be unimodal; training areas that show bimodal histograms should be replaced.

The Importance of Training Data

Scholz et al. (1979) and Hixon et al. (1980), as cited by Campbell (2006), considered the selection of training data to be more important than the choice of classification algorithm. Scholz et al. (1979), as cited in Campbell (2006), also concluded that representing all cover types in the scene is the most important aspect of training areas.

Specific Methods of Supervised Classification

Several methods of supervised classification are used to classify the pixels not assigned to training fields. Some of the most common are described below.

Parallelepiped Classification
This method requires the least information from the user. It is sometimes known as the box decision rule.
It works by defining a region within the multidimensional data space for each category, projecting the spectral values of unclassified pixels into that data space, and assigning the pixels that fall within a region to the corresponding category (Campbell, 2006).

Minimum Distance Classification
The spectral data for the training areas are plotted in multidimensional data space, and the position of each pixel is determined relative to the clusters formed by the training data for each category, according to its values in several bands. After the pixels are clustered into groups, the centre of each group is computed; each unassigned pixel is then assigned to the group whose centre is closest (Campbell, 2006).

ISODATA
ISODATA differs from the minimum distance method in that it shares characteristics of both supervised and unsupervised classification, so it is considered a hybrid classification (Campbell, 2006). Both the minimum distance method and ISODATA cluster pixels into groups according to the class means, but in ISODATA the classifier repeats the clustering after assigning new pixels. Iteration stops when the average distance falls below a specified threshold, or when the change in the average distance between iterations falls below a specified amount (Mather, 2004).

Maximum Likelihood
This strategy estimates the means and variances of the classes from the training data, and then uses them to estimate the probability that a pixel of a given brightness belongs to each class. Some values show that a certain pixel could plausibly be assigned to more than one class; it is assigned to the class for which its membership is most probable.

K-Nearest Neighbours (KNN)
KNN relies on inverse distance weighting. It examines each pixel to be classified in relation to its neighbouring pixels, so that nearer neighbours have more influence than more distant ones. K is set by the analyst, and the algorithm locates the K nearest labelled pixels (Campbell, 2006).
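The minimum-distance rule described above can be sketched in a few lines of NumPy. The band values and class names below are hypothetical, not taken from any real image; each class mean stands in for the centre of a cluster of training-area pixels in four spectral bands.

```python
import numpy as np

# Hypothetical training statistics: one mean spectral vector per class,
# as would be estimated from training-area pixels in four bands.
class_means = {
    "water":      np.array([60.0, 25.0, 20.0, 10.0]),
    "vegetation": np.array([50.0, 40.0, 35.0, 120.0]),
    "urban":      np.array([90.0, 85.0, 80.0, 70.0]),
}

def minimum_distance_classify(pixel, means):
    """Assign a pixel to the class whose mean vector is closest,
    measured by Euclidean distance in multispectral data space."""
    return min(means, key=lambda c: np.linalg.norm(pixel - means[c]))

# An unassigned pixel with high near-infrared reflectance (fourth band)
# falls closest to the vegetation mean.
pixel = np.array([52.0, 41.0, 33.0, 110.0])
print(minimum_distance_classify(pixel, class_means))  # vegetation
```

The same skeleton extends naturally to an ISODATA-style classifier: recompute each class mean after assigning pixels and repeat until the means stop moving by more than a threshold.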
Classification Accuracy Assessment

Lillesand (2007) insists on accuracy assessment as a condition for considering a classification complete. One of the most common methods of accuracy assessment is the classification error matrix, sometimes called a contingency table or confusion matrix (Lillesand, 2007). Error matrices compare the classification results with known reference data. The overall accuracy is calculated by dividing the number of correctly classified pixels by the total number of reference pixels. Another important descriptive measure is the producer's accuracy, obtained by dividing the number of correctly classified pixels in each category by the number of reference (training-set) pixels used for that category. A further descriptive measure is the user's accuracy, computed by dividing the number of correctly classified pixels in each category by the total number of pixels that were classified into that category (Lillesand, 2007).

Applying Classification Methods

Having explored the concept of classification, we now turn to the practical side to give a simple idea of how it works. This practical part is applied to an image of Leicester City, UK, taken by Landsat TM in 1992 (figure 1).

The Landsat programme consists of seven satellites. The first was launched in 1972 by the US National Aeronautics and Space Administration (NASA) and is considered the first civilian Earth-observation satellite. It was followed by Landsat 2, 3, 4, 5, 6 and 7. These satellites differ in their operational periods, types of sensors, number and types of bands, and resolution. Landsat 4 and 5 carry the TM (Thematic Mapper) sensor, whose resolution is generally 30 metres, except for band 6, which is 120 metres. Each band measures a different wavelength range. For the TM sensor, the bands are as follows:

Band 1: 0.45–0.52 micrometres.
Band 2: 0.52–0.60 micrometres.
Band 3: 0.63–0.69 micrometres.
Band 4: 0.76–0.90 micrometres.
Band 5: 1.55–1.75 micrometres.
Band 6: 10.4–12.5 micrometres.
Band 7: 2.08–2.35 micrometres.
(Lillesand, 2007)

Figure 1: Satellite image of Leicester City, Landsat TM, 1992.

All the classification tools can be found under the Classifier icon in the main menu. Both unsupervised and supervised classification are applied in this part of the work, as follows.

First: Applying Unsupervised Classification

Figure 2 shows the main window of the unsupervised classification, through which the analyst interacts with the classifier to classify the input raster. The most important input here is the number of classes, which depends on what the analyst needs to classify; we chose 6 classes. Another important input is the number of iterations, which prevents the classifier from running too long or from getting "stuck" in a cycle without reaching the threshold (ERDAS Inc., 2009).

Figure 2: The main window of the unsupervised classification.

The result of this classification is displayed in black and white, as in figure 3.

Figure 3: The result of the unsupervised classification.

To interpret the classes, it is necessary to change the colours of the classes and geo-link the output image with the input image. Figure 4 shows the output after changing the colours.

Figure 4: Changing the colours of the classes.

Linking the two images together and examining the spectral profile is essential for interpreting the result. Figure 5 shows 6 points (pixels) selected to represent the classes in the spectral profile shown in figure 6.

Figure 5: 6 points selected as a sample to check the spectral values of the classes.

Figure 6: Spectral values of the 6 points in the Landsat TM bands.
It is clear from figure 6 that classes 2 and 3 have high reflectance in band 4, which is sensitive to vegetation (Lillesand, 2007). It is also clear that the classifier has split the vegetation into a class and a sub-class.

Second: Applying Supervised Classification

To begin, the Signature Editor must be opened to choose training areas. Figure 7 shows some of the training areas; characteristics such as shape, number and location were considered when selecting them.

Figure 7: Some of the training areas.

The Signature Editor (figure 8) shows the training areas taken to represent 4 classes: 17 training areas in total, as follows:
- 5 training areas representing the vegetation category.
- 5 training areas representing the urban category.
- 4 training areas representing the water category.
- 3 training areas representing the roads category.

Figure 8: The Signature Editor with 17 training areas.

After this stage, supervised classification can be applied through the window shown in figure 9. In this window the analyst can choose the decision rule: a non-parametric rule such as parallelepiped or feature space, or a parametric rule such as maximum likelihood, Mahalanobis distance or minimum distance. The result, using the maximum likelihood rule, shows 17 classes representing 4 categories, as in figure 10.

Figure 9: Applying supervised classification.

Figure 10: Classifying the image into 17 classes representing 4 categories (maximum likelihood).

The next stage is the accuracy assessment of this result. It is done by running the accuracy assessment tool, opening the classification result, creating random points on the image (figure 11), and then assigning each point to the class represented by its colour. This is done by filling in the class column of the accuracy assessment table (figure 12). After that, the accuracy can be reported by displaying the error matrix, as in figure 13.
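The error-matrix measures computed by this report (overall, producer's and user's accuracy) can be sketched in NumPy as follows. The 3-class matrix below is a hypothetical example, not the 25-point matrix produced in this exercise; by convention here, rows hold the classified labels and columns the reference labels.

```python
import numpy as np

# Hypothetical 3-class error matrix (25 assessment points in total).
# Rows = classified category, columns = reference (true) category.
matrix = np.array([
    [9, 1, 0],   # classified as vegetation
    [2, 5, 1],   # classified as urban
    [1, 0, 6],   # classified as water
])

correct = np.diag(matrix)                          # correctly classified points
overall_accuracy = correct.sum() / matrix.sum()    # diagonal total / grand total

# Producer's accuracy: correct points / reference points in each column.
producers_accuracy = correct / matrix.sum(axis=0)

# User's accuracy: correct points / points classified into each row.
users_accuracy = correct / matrix.sum(axis=1)

print(overall_accuracy)    # 0.8  (20 of the 25 points correct)
print(producers_accuracy)
print(users_accuracy)
```

Producer's accuracy tells the map producer how well reference pixels of a class were captured; user's accuracy tells the map user how reliable a class label on the map is.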
Figure 11: Creating 25 random points on the output image.

Figure 12: Filling in the class column according to how the points are actually represented on the output image.

Figure 13: Error matrix of the 25 points.

The analyst must know how to read and analyse the error matrix report. In this case, the report shows an overall accuracy of 64%, together with the producer's accuracy and the user's accuracy for each class. The user's accuracy of the class urban-5 is very low (20%). We will try to increase the accuracy of the classification by repeating it without urban-5, the class with the lowest accuracy, to see the change in the overall accuracy. To do this, we delete urban-5 from the signature file by highlighting it, as in figure 14, and deleting it, and then repeat the accuracy report.

Figure 14: Highlighting the urban-5 class in order to delete it.

We then repeat the supervised classification without this class and examine the overall result. We ran the accuracy assessment again, created 25 random points, and assigned each point to the correct class (figure 15).

Figure 15: Displaying 25 points on the output image and assigning each point to the correct class.

The next stage is to report the accuracy, as in figure 16. It shows a significant increase in the overall accuracy, to 76% from 64% in the previous report.

Figure 16: Error matrix after deleting the class urban-5.

Conclusion and Recommendations

Image classification is an important process for recognizing geographical features in digital remotely sensed images. It relies on a variety of algorithmic methods and is commonly separated into two main types: unsupervised classification, which requires minimal interaction from the analyst, and supervised classification, which requires the analyst's interaction. Each type has advantages and disadvantages.
Supervised classification requires training areas, and characteristics such as size, shape, location, number, placement and uniformity should be considered when selecting them. There are several methods of supervised classification, for instance parallelepiped, minimum distance, ISODATA, maximum likelihood and K-nearest neighbours. The analyst should understand how these methods work in order to choose the most appropriate one, because no method is optimal for all circumstances.

Accuracy assessment is not only an important step at the end of the classification process; experts such as Lillesand (2007) insist that the classification process cannot be considered complete without it.

This report applied both unsupervised and supervised classification. It also repeated the supervised classification in an attempt to increase the overall accuracy by deleting the training class with the lowest user's accuracy. The result after deleting this class shows a significant increase in the overall accuracy. However, this report recommends further analysis and investigation of this increase, because its cause is still not clear: it might be related to deleting the lowest-accuracy class, or it might be related to the distribution of the 25 assessment points, which were distributed randomly. This report therefore also recommends repeating the supervised classification with the same criteria but with more points and with a different distribution.

References

Campbell, J., 2002, Introduction to Remote Sensing (3rd ed.). London: Taylor and Francis.

Lillesand, T., R. Kiefer and J. W. Chipman, 2004, Remote Sensing and Image Interpretation (5th ed.). New York: Wiley.

Mather, P., 2004, Computer Processing of Remotely-Sensed Images (3rd ed.). Chichester, UK: Wiley.