USING PURE AND MIXED OBJECTS IN THE TRAINING OF OBJECT-BASED IMAGE CLASSIFICATIONS

H. Costa a, *, G. M. Foody a, D. S. Boyd a

a School of Geography, University of Nottingham, Nottingham NG7 2RD, UK - (lgxhag, giles.foody, doreen.boyd)@nottingham.ac.uk

* Corresponding author

KEY WORDS: Artificial neural networks, OBIA, Over-segmentation, Sample size, Under-segmentation, WorldView

ABSTRACT:

Training of object-based land cover classifications is often performed with objects generated via image segmentation. The objects are commonly assumed to be thematically pure or excluded from training if a mixture of classes is associated with them. However, excluding mixed objects has several consequences, such as reducing the size of the training data sets. In this study, it is hypothesized that mixed objects may be used in the training stage of a classification to increase the accuracy with which land cover may be mapped from remotely sensed data, with outputs evaluated in relation to a conventional analysis using only pure objects in training. WorldView-2 data covering the University Park campus of the University of Nottingham were submitted to a series of segmentation analyses in which a range of under- to over-segmentation outputs were intentionally produced. Training objects representing four classes (bare soil, impervious surfaces, vegetation, and water) were selected from the segmentation outputs, resulting in training samples of varying size and proportion of mixed objects. A single-layer artificial neural network equivalent to multinomial logistic regression and able to use both pure and mixed training units was adopted as the classifier. A visual inspection of the results shows that using mixed training objects produced land cover maps of higher quality. Furthermore, the overall and class-specific accuracy of the classifications was systematically higher when mixed training was used (e.g. up to 48% in overall accuracy).
Using mixed objects in training was beneficial even when the size of the mixed training samples was equivalent to that of the pure training samples.

1. INTRODUCTION

In object-based image analysis it is common for some of the objects generated via image segmentation to be of mixed thematic composition as a result of under-segmentation error (Clinton et al. 2010). This violates the assumption of object purity that is implicit in a conventional classification analysis, and thus can be a problem throughout the analysis. For instance, mixed objects can result in degraded training statistics and act to reduce mapping accuracy (Wang et al. 2004; Gao et al. 2011; Hirata and Takahashi 2011). Therefore, mixed objects are often excluded from the training stage so that they are not used in the derivation of training statistics (Dean and Smith 2003; Smith 2010; Dronova et al. 2011; Cai and Liu 2013).

Excluding mixed objects has, however, several consequences. Using only pure objects has the drawback of representing only homogeneous areas in training, whereas the classifier may afterwards be confronted with mixed objects. Thus pure objects may not be totally representative of the characteristics of the objects that a classifier has to classify. Furthermore, excluding mixed objects inevitably reduces the size of the training samples.

Deviation from the object purity assumption can, however, sometimes be accommodated throughout a classification analysis (e.g. Foody 1999a). Specifically, impure units can be accounted for in the training (Foody 1997; Zhang and Foody 2001; Eastman and Laney 2002; Matthew 2012), class allocation (Wang 1990; Foody 1996; Dronova et al. 2011), and testing stages of a supervised image classification (Foody 1995; Binaghi et al. 1999; Stehman et al. 2007). However, little research has been undertaken on the use of mixed units in training object-based image classifications.
This paper sets out to test the hypothesis that mixed objects may be used in the training stage of a classification to increase the accuracy with which land cover may be mapped from remotely sensed data, with outputs evaluated in relation to a conventional analysis using only pure objects in training.

2. MATERIALS

2.1 Study area

A square area of 2502 m × 2502 m over the University Park campus of the University of Nottingham in the United Kingdom was used to undertake the analyses (Figure 1). The central coordinates of the study area are 52°56'28.212"N and 1°11'44.3217"W. A total of four land cover classes were defined: bare soil, impervious surface, vegetation, and water.

2.2 Spectral data

A WorldView-2 (WV2) image acquired on 26 May 2012 was used (Figure 1). This image comprises eight spectral wavebands (coastal, blue, green, yellow, red, red edge, NIR1, and NIR2) with a spatial resolution of 2 m and a radiometric resolution of 11 bits.

Figure 1. Subset of the WorldView-2 image acquired on 26 May 2012 over the study area (University Park campus of the University of Nottingham and its environs). The circular areas of 100 m radius outlined in orange are randomly located clusters used for selecting training objects. The red points locate the pixels of the random testing sample used to assess the accuracy of the classification produced using segmentation scale=50 and pure training.

2.3 Reference data

Reference data were produced to assist the production of training and testing samples. Visual interpretation of the WV2 image (including the spectral wavebands and a panchromatic band of 0.5 m spatial resolution) was the basis for populating training and testing samples with land cover class labels. Imagery available through Google Earth was also inspected. The training samples were produced via cluster sampling (Whiteside et al., 2014) in which the primary sampling unit, the clusters, was defined to be ten circular areas of 100 m radius randomly located (Figure 1). Thus, a total of ~5% of the study area was allocated to training purposes. The land cover in the clusters was delimited by visual interpretation as explained above. The secondary sampling unit was the objects generated via image segmentation that intersected the clusters (see 3.2).

Testing samples were produced to assess the accuracy with which the WV2 data were classified based on each of the segmentation outputs produced. Therefore, a total of eight samples were produced. They comprised 50 randomly selected pixels per mapped class (Figure 1 shows an example). Although the classifications were object-based, the pixel is a legitimate and practical option for accuracy assessment (Stehman and Wickham, 2011). Land cover in the testing pixels was determined by visual interpretation. Although it may be beneficial to address the potentially mixed thematic nature of the pixels, only the dominant class was considered for simplicity.

3. METHODS

A series of analyses were undertaken to explore the potential of using mixed objects in training object-based classifications. The methods used included segmentation of the WV2 image, production of training samples, classification of the segmentation outputs, and assessment of the classification accuracy.

3.1 Image segmentation

The eight spectral bands of the WV2 image were segmented to generate objects using the multiresolution algorithm implemented in GeoDMA software (Körting et al., 2013), version 0.2.1, which is based on the popular algorithm of Baatz and Schäpe (2000). This is a region-based algorithm that uses spectral and shape properties of the objects being generated. The most influential parameter of the algorithm, and hence the most often manipulated, is scale. A series of segmentation analyses were undertaken in which the scale parameter was manipulated in order to produce a range of under- to over-segmented outputs.
The value of the scale parameter was set at 30, 50, 70, and 90, while the remaining parameters were set at 50. As a result, four segmentation outputs were obtained, ranging from over-segmented results (mostly composed of small and possibly pure objects) to under-segmented results (mostly composed of large and possibly mixed objects). In this study an object was taken to be pure if the dominant class covered more than 90% of the object's area, similar to Cai and Liu (2013).

3.2 Training

Training samples were produced to classify each of the segmentation outputs generated. The training samples were formed from the objects, generated with each of the parameter settings, that intersected the primary sampling units defined, i.e. the ten random clusters (Section 2.3). Therefore, the set of training objects varied between the segmentation outputs (ranging from over-segmented to under-segmented training objects) while the same geographical area was used in training each classification.

The eight spectral bands were used to calculate training statistics: the mean and standard deviation of the pixel values associated with each training object. The mean digital number provides a value of central tendency whereas the standard deviation provides a value of variability (texture). As a result, 16 bands were used as discriminating variables in the classifications.

The training objects were assigned reference class labels according to the land cover visually delimited in the clusters. The proportion of the area that each class occupied in a training object was calculated. The proportions were 0.0 if the class was absent and 1.0 if the object was pure; intermediate values for at least two classes were obtained when the object was of mixed class composition.

Two training strategies were followed. First, the traditional procedure of using only pure objects at the training stage was tested (i.e. the mixed objects, whose dominant class covered <90% of the object, were excluded).
Second, all of the training objects were used even if the dominant class covered <90% of the area. The fractional coverage of the classes found in the objects was used as a measure of class membership, and objects were allowed multiple and partial memberships.

3.3 Classification

A multinomial log-linear classifier was applied via a neural network with no hidden layer (R Core Team, 2014; Venables and Ripley, 2002) to produce the land cover maps. This neural network allows objects of mixed class composition to be used in training in a form similar to that explained in Foody (1997) for per-pixel classification. The mean and standard deviation of the objects across the WV2 spectral bands were used as discriminating variables. Although the classifier produces soft classifications, traditional hard land cover maps were obtained by allocating to each object the label of the class with which it had the greatest membership. Each segmentation output generated was thus used to produce hard land cover maps based on the two training strategies: pure and mixed.

3.4 Accuracy assessment

The testing samples were used to assess the accuracy of the classifications. Confusion matrices comparing the reference labels of the testing pixels and the classification labels of the corresponding objects were constructed. Overall accuracy and per-class estimates of accuracy (user's and producer's accuracy) were calculated in terms of the proportion of area correctly classified, as described in Olofsson et al. (2014), including confidence intervals at the 95% confidence level.

The accuracy of the four segmentation outputs produced was also assessed. In this case, the accuracy assessment essentially aimed at determining whether the training data sets used were over-segmented, under-segmented, or balanced, as the magnitude of under- and over-segmentation errors is associated with the presence of mixed objects. An empirical discrepancy method proposed by Möller et al.
(2013) and slightly refined by Costa et al. (2015) was used in this study to assess segmentation accuracy. This method essentially compares the objects generated to a reference data set to measure the geometric match between them. The land cover delimited in the clusters by visual interpretation (previously used for producing the training data sets) was reused as reference data. The outcome of the method is a metric, Mg, that measures the strength and type of segmentation error. Negative Mg values indicate that under-segmentation error dominates, while positive Mg values represent the opposite case in which over-segmentation error dominates. Mg ~ 0 is thus deemed indicative of optimal segmentation accuracy, as the two types of error are balanced (Möller et al., 2013).

4. RESULTS

The segmentation outputs were, as expected, notably over-segmented when the scale parameter was small, as the objects generated were noticeably smaller than when the scale parameter was set at a large value. Figure 2 shows the level of under-/over-segmentation error of the training samples. The Mg value closest to 0 was that of scale 50, indicating the most balanced segmentation.

Figure 2. Image segmentation accuracy.

The difference between the classifications produced with pure and mixed training was evident, with mixed training affording higher classification accuracy. Figure 3 shows the classifications of the segmentation output produced with scale=50. The estimated overall accuracy of the maps is 43.0±10.4% (pure training, Figure 3a) and 91.5±5.3% (mixed training, Figure 3b) respectively. Figure 4 shows the overall accuracy of all the classifications. Mixed training enabled classification to achieve higher accuracy values than pure training for all of the segmentation settings used.
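The accuracy estimates of this kind can be reproduced from a confusion matrix of pixel counts. The sketch below is a minimal illustration using simple proportion estimators with a normal-approximation 95% interval; the full area-weighted estimators of Olofsson et al. (2014), as used in this study, additionally weight each mapped class by its area proportion. The matrix values are illustrative, not the study's data.

```python
import math

def accuracy_estimates(cm):
    """Accuracy metrics from a confusion matrix of counts.

    cm[i][j] = number of testing pixels mapped as class i whose
    reference label is class j (rows: map, columns: reference).
    Returns overall accuracy with a 95% confidence half-width,
    plus user's and producer's accuracy per class.
    """
    n = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    overall = correct / n
    # Normal-approximation half-width of the 95% interval
    half_width = 1.96 * math.sqrt(overall * (1 - overall) / n)
    users = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    producers = [cm[j][j] / sum(row[j] for row in cm) for j in range(len(cm))]
    return overall, half_width, users, producers

# Illustrative 2-class example (values are not the study's data)
cm = [[45, 5],
      [10, 40]]
oa, hw, ua, pa = accuracy_estimates(cm)
print(f"overall = {oa:.2f} +/- {hw:.2f}")
```

User's accuracy divides by the row (map) totals and producer's accuracy by the column (reference) totals, matching the conventional definitions behind the per-class figures reported here.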
The per-class estimates of accuracy were also higher when mixed training was used (Figure 5). For example, the user's accuracy of the class vegetation in the map of Figure 3b (produced with mixed training) was larger than that of the map shown in Figure 3a (produced with pure training): 94.0±6.3% and 48.0±14.0% respectively.

Figure 3. Segmentation output produced using scale=50 and classified using a) pure and b) mixed training. Yellow, red, green, and blue represent bare soil, impervious surfaces, vegetation, and water, respectively.

Figure 4. Accuracy of the classifications.

Figure 5. User's and producer's accuracy of the class vegetation.

Figure 6. User's and producer's accuracy of the class impervious surface.

5. DISCUSSION

The common practice of using only pure objects in training may compromise the accuracy of object-based classifications. Excluding mixed objects from training reduces the value of a training data set and thus should not be adopted. One of the advantages of using mixed objects in the derivation of training statistics is that the training data sets are larger than when only pure objects are used. Because in this study the mixed training strategy did not exclude mixed objects, the mixed training data sets were larger than the pure training data sets (Table 1). Therefore, the difference between the accuracy values of the classifications trained with pure and mixed data sets is partly caused by the different sizes of the training samples.

Scale   30   50   70   90
Pure   544  278  171  130
Mixed  767  420  262  207

Table 1. Size (number of objects) of the training samples.

The size of the training data sets is not, however, the only factor explaining the results. Note that the accuracy of the classification of scale=30 that used pure training (544 training objects) was lower than that of the classification of scale=90 with mixed training (207 training objects).
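The way mixed objects enter training (Section 3.3) can be sketched as a softmax (multinomial logistic) regression trained by gradient descent against fractional class memberships. The code below is an illustrative sketch assuming numpy and toy data, not the authors' R (nnet) implementation; pure objects contribute one-hot target rows and mixed objects contribute soft rows.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_soft_labels(X, Y, lr=0.5, epochs=2000):
    """Single-layer network (equivalent to multinomial logistic regression).

    X: (n_objects, n_features) object statistics, e.g. per-band
       mean and standard deviation (16 variables in this study).
    Y: (n_objects, n_classes) class proportions per object; a pure
       object is a one-hot row, a mixed object a fractional row.
    Minimises cross-entropy between predicted memberships and Y.
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = softmax(X @ W + b)
        G = (P - Y) / len(X)  # gradient of cross-entropy w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

# Toy example: two pure objects and one half-and-half mixed object
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
Y = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
W, b = train_soft_labels(X, Y)
memberships = softmax(X @ W + b)       # soft output per object
hard_map = memberships.argmax(axis=1)  # class with the greatest membership
```

The final argmax mirrors the hardening step of Section 3.3, in which each object is allocated the label of the class with which it has the greatest membership.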
The results thus suggest that pure training data sets have the drawback that only homogeneous areas are represented at the training stage, whereas the classifier may afterwards be confronted with mixed objects. Mixed training, on the contrary, informs the classifier of the occurrence of mixed objects.

To highlight the advantage of mixed training over pure training independently of training sample size, an additional analysis was performed with training samples of equal size. The size of the mixed training data set for the segmentation output produced with scale=50 was reduced from 420 to 228 objects, the latter being the size of the corresponding pure training data set (Table 1). This was achieved by excluding randomly selected training objects, with all of the objects, pure and mixed, having the same probability of being excluded. This allowed the size of the training data set to be reduced without substantially changing the inherent ratio of pure to mixed objects. The reduced mixed training data set enabled a classification accuracy of 84.2±7.2%, which is lower than the accuracy of the classification that used the entire mixed training data set, but still significantly higher than that obtained with the pure training data set (Figure 4).

6. CONCLUSIONS

Image segmentation is a necessary stage of object-based image classification. Segmentation errors commonly produce objects of mixed class composition, which are often excluded from the derivation of statistics at the training stage. However, including mixed objects in training is advantageous: the size of the training data set is not reduced, and mixed objects better represent the characteristics of the objects that the classifier will be confronted with when producing a map via image classification.

ACKNOWLEDGEMENTS

The WorldView-2 data used were provided by the Earth Observation Technology Cluster, a knowledge transfer initiative funded by the Natural Environment Research Council (grant NE/H003347/1).
REFERENCES

Baatz, M., Schäpe, A., 2000. Multiresolution segmentation: an optimization approach for high quality multi-scale image segmentation. In: Strobl, J., Blaschke, T., Griesebner, G. (Eds.), Angewandte Geographische Informationsverarbeitung XII. Beiträge zum AGIT-Symposium Salzburg 2000. Herbert Wichmann Verlag, Heidelberg, Germany, pp. 12-23.

Costa, H., Foody, G.M., Boyd, D.S., 2015. Integrating user needs on misclassification error sensitivity into image segmentation quality assessment. Photogrammetric Engineering and Remote Sensing, 81(6), pp. 451-459.

Foody, G.M., 1997. Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network. Neural Computing & Applications, 5(4), pp. 238-247.

Körting, T.S., Fonseca, L.M.G., Câmara, G., 2013. GeoDMA: Geographic Data Mining Analyst. Computers & Geosciences, 57, pp. 133-145.

Möller, M., Birger, J., Gidudu, A., Gläßer, C., 2013. A framework for the geometric accuracy assessment of classified objects. International Journal of Remote Sensing, 34(24), pp. 8685-8698.

Olofsson, P., Foody, G.M., Herold, M., Stehman, S.V., Woodcock, C.E., Wulder, M.A., 2014. Good practices for estimating area and assessing accuracy of land change. Remote Sensing of Environment, 148, pp. 42-57.

R Core Team, 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

Stehman, S.V., Wickham, J.D., 2011. Pixels, blocks of pixels, and polygons: choosing a spatial unit for thematic accuracy assessment. Remote Sensing of Environment, 115, pp. 3044-3055.

Venables, W.N., Ripley, B.D., 2002. Modern Applied Statistics with S. Fourth edition. Springer, New York.

Whiteside, T.G., Maier, S.W., Boggs, G.S., 2014. Area-based and location-based validation of classified image objects. International Journal of Applied Earth Observation and Geoinformation, 28, pp. 117-130.