Finding essential features for tracking starfish in a video sequence

V. Di Gesú*, F. Isgró*, D. Tegolo*, E. Trucco+
* Università di Palermo, Italy
+ Heriot-Watt University, Edinburgh, U.K.
{digesu, fisgro, tegolo}@math.unipa.it, {f.isgro, e.trucco}@hw.ac.uk

Abstract

This paper introduces a software system for detecting and tracking starfish in underwater video sequences. The aim of such a system is to help biologists estimate the number of starfish present in a particular area of the sea-bottom. The input images are characterised by a low signal-to-noise ratio and by a noisy background of pebbles; this makes the detection a non-trivial task. The procedure is a chain of several steps that starts with the extraction of areas of interest and ends with a classifier and a tracker, providing the information necessary for counting the starfish present in the scene.

1 Introduction

Underwater images have recently been used for a variety of inspection tasks, in particular for military purposes such as mine detection, for the inspection of underwater pipelines [1], cables or platforms [12], and for the detection of man-made objects [11]. A number of underwater missions are for biological studies, such as the inspection of underwater life. Despite the large number of such missions, and although image analysis techniques are starting to be adopted in fish farming [10], most of the video footage recorded during the missions is still inspected manually, as research applying image analysis techniques to biological missions is relatively new. In [16] a system for classifying plankton images is described. A camera model was developed by Holland et al. for measuring near-shore fluid processes, fore-shore topography and drifter motions [6]. Active contour models (snakes) are adopted in [7] for tracking and identifying plankton. Colour analysis is used in [15] for the classification of coral reef components.

In this paper we present a simple system for the analysis of underwater video streams for biological studies. The particular task considered here is detecting and tracking starfish in underwater video sequences. The aim of such a system is to determine the number of starfish in a particular area of the sea-bottom. The problem we tackle in this work is non-trivial for a number of reasons; in particular: the low quality of underwater images, which results in a very low signal-to-noise ratio, and the variety of possible backgrounds, as starfish can be found on various classes of sea-bottoms (e.g., sand, ..., rock).

The system we present here is a chain of several modules (see Figure 1): it starts with a simple module extracting areas of interest (connected components) from the binarised input image; a second module extracts for each component a set of features describing its shape; these features are then used to decide whether a component can be classified as a starfish, and to track the component across the sequence; a last module takes care of counting the starfish detected and tracked. Experiments performed on the classification module on a sample of 1090 candidates report an average detection success rate of 96%. The tracker returns satisfactory results, tracking correctly most of the components extracted.

The paper is structured as follows. The next section gives an overview of the system. The method adopted for selecting areas of interest is described in section 3.
In section 4 we describe the features that we extract from the areas of interest for the classification, and section 5 briefly discusses the classification methodology used for this system. Section 6 describes the tracking module, and the counting procedure is described in section 7. Experimental results are reported and discussed in section 8, and section 9 is left to final remarks and future developments.

2 System overview

The system, whose logical structure is depicted in Figure 1, works as a pipeline of the following modules:

1. Data acquisition: each single frame of the underwater video sequence (live video or recorded off-line) is read by the system for processing;

2. Extraction of areas of interest: the current frame of the video sequence is first binarised, and then candidate starfish are extracted as large connected components (section 3);

3. Computation of shape indicators (features): for each candidate a set of features is computed. The features chosen are a set of shape descriptors (section 4);

4. Classification: this module discriminates the candidates between starfish and non-starfish, using the features extracted by the previous module (section 5);

5. Tracker: all the components extracted are tracked across the video sequence, in order to ensure that already detected starfish are not counted more than once (section 6);

6. Counter: this last module implements the core of the counting of the starfish, by considering components which have been classified as starfish for most of their trajectory.

Figure 1. Schematic representation of the system: video stream → extraction of ROIs → computation of features → classification → tracker → starfish counter.

3 Selection of areas of interest

This first module detects areas of the image that are likely to include starfish. The objective of this module is to select everything that could be a starfish, regardless of the number of false positives extracted: it is the classification module that takes care of discarding the false positives.

The method adopted is very simple. We first binarise the image using a simple adaptive threshold [3], which computes local statistics for each pixel (mean value $\mu$ and standard deviation $\sigma$) in a window of size $w \times w$, and assigns a binary value $b$ to the pixel of grey level $g$ according to the following rule:

$$b(x,y) = \begin{cases} 1 & \text{if } g(x,y) > \mu(x,y) + k\,\sigma(x,y) \\ 0 & \text{otherwise} \end{cases}$$

where $k$ is a constant. From the binary images all the connected components are extracted using a standard iterative algorithm [5], and those having a small area $A_i$ are filtered out using a size filter based on the simple X84 rejection rule [4], an efficient outlier rejection method for robust estimation. Assuming a Gaussian distribution of the areas, the rule removes all the components such that $A_i < \mathrm{med}_j\, A_j - 5.2\,\xi$, where $\xi = \mathrm{med}_i \left| A_i - \mathrm{med}_j\, A_j \right|$ is the median absolute deviation of the areas (5.2 MADs correspond to about 3.5 standard deviations under the Gaussian assumption).
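To make the binarisation and size-filtering steps concrete, the following is a minimal Python sketch (assuming NumPy and SciPy). The window size w, the constant k, the threshold polarity (foreground brighter than the background) and the synthetic test frame are illustrative assumptions, not values from the paper.

```python
import numpy as np
from scipy import ndimage

def binarise_adaptive(gray, w=15, k=1.0):
    """Adaptive threshold: a pixel is set to 1 when its grey level exceeds
    the local mean by k local standard deviations (w x w window)."""
    g = gray.astype(float)
    mean = ndimage.uniform_filter(g, size=w)
    sq_mean = ndimage.uniform_filter(g ** 2, size=w)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    return g > mean + k * std

def x84_size_filter(areas, k=5.2):
    """X84 rejection rule on component areas: discard components whose area
    falls more than k MADs below the median area (5.2 MADs ~ 3.5 sigma
    under the Gaussian assumption); only small components are removed."""
    areas = np.asarray(areas, dtype=float)
    med = np.median(areas)
    mad = np.median(np.abs(areas - med))
    return areas >= med - k * mad

# Usage: label the binary frame and keep the surviving components.
frame = np.random.rand(240, 320)                  # synthetic stand-in frame
binary = binarise_adaptive(frame)
labels, n = ndimage.label(binary)
areas = ndimage.sum(binary, labels, index=range(1, n + 1))
keep = x84_size_filter(areas)                     # boolean mask over components
```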
4 Feature extraction

The definition of suitable shape descriptors is essential for the classification phase. In our case the shape descriptors have been suggested by the morphological structure [14] of the starfish. We identified three indicators that are combined into a feature vector to discriminate the extracted connected components between starfish and noise.

Figure 2. Examples of input images.

Geometric indicator. The convex hull [9] of the connected component is computed, then a geometric shape indicator, $G$, is defined as

$$G = \frac{A_{cc}}{A_{ch}}$$

where $A_{cc}$ is the area of the connected component and $A_{ch}$ is the area of its convex hull. Small values of $G$ will mostly indicate starfish, as we expect the area of the convex hull to be much larger than the area of the starfish because of the tentacles.

Morphological indicator. The morphological shape indicator, $M$, is computed by applying the morphological opening operator to the connected component:

$$M = \frac{A_{op}}{A_{cc}}$$

where $A_{op}$ is the area of the result obtained by applying the opening to the connected component. Starfish are likely to return small values for the $M$ indicator, as we expect the opening operator to remove most of the tentacles, so that the area of the resulting component will be smaller than the area of the original connected component.

Histogram indicator. This indicator, $H$, is based on the means and variances of the histograms by row and by column of the component under analysis. Small values of this indicator characterise a uniform distribution of the pixels of the component. Therefore starfish components will have small values of $H$.
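To fix ideas, here is one plausible Python implementation of the three indicators for a binary component mask (NumPy/SciPy). The structuring-element size and, in particular, the exact combination used for H are our assumptions, since the source does not fully specify them.

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import ConvexHull

def shape_indicators(mask, opening_size=5):
    """Compute (G, M, H) for a binary connected-component mask."""
    ys, xs = np.nonzero(mask)
    # Work on the component's bounding box.
    box = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    area = box.sum()
    # G: component area over convex-hull area (for 2-D point sets,
    # ConvexHull.volume is the enclosed area).
    hull = ConvexHull(np.column_stack([xs, ys]))
    G = area / hull.volume
    # M: area surviving a morphological opening over the original area.
    opened = ndimage.binary_opening(
        box, structure=np.ones((opening_size, opening_size)))
    M = opened.sum() / area
    # H: spread of the row/column pixel-count histograms; this particular
    # variance-to-mean combination is an illustrative assumption.
    rows, cols = box.sum(axis=1), box.sum(axis=0)
    H = rows.var() / rows.mean() + cols.var() / cols.mean()
    return G, M, H
```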
5 The classifier

For the classification module we adopted a simple Bayesian classifier [2]. Let $\omega_s$ and $\omega_n$ represent the starfish class and the non-starfish class respectively, and let $x$ be a vector in the feature space. Given a feature vector $x$, we want to compute the a posteriori probability $P(\omega_i \mid x)$ that the vector belongs to class $\omega_i$, and assign the vector to the class having the largest $P(\omega_i \mid x)$. Bayes' formula states that

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)} \quad (1)$$

We assume a Gaussian model for the class-conditional densities $p(x \mid \omega_i)$ in the $(G, M, H)$ feature space. The evidence $p(x)$ can be neglected as it is constant for all classes; therefore $P(\omega_i \mid x)$ can be computed from the prior of class $\omega_i$ and from $p(x \mid \omega_i)$, which is estimated from a training set of feature vectors.

It is worth noticing that what we consider as the non-starfish class is not everything that is not a starfish, but only material that can be found on the sea-bottom together with starfish (mainly pebbles). Our non-starfish class is therefore well defined, and it can be seen from Figure 4 that most of its feature vectors fall in a bounded region of the feature space. This justifies the use of a Gaussian distribution to model the non-starfish class as well, although the cluster formed by the feature vectors of this class is not as well shaped as the one formed by the starfish class.
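As a concrete illustration, a two-class Gaussian Bayes classifier of this kind could be implemented as follows. This is a Python/NumPy sketch; the interface and the maximum-likelihood parameter estimates are our assumptions, not the authors' code.

```python
import numpy as np

class GaussianBayes:
    """Two-class Bayes classifier with one multivariate Gaussian per class,
    fitted on labelled (G, M, H) feature vectors."""

    def fit(self, X, y):
        # X: (n_samples, 3) array of feature vectors; y: class labels.
        self.classes = np.unique(y)
        self.params = {}
        for c in self.classes:
            Xc = X[y == c]
            # Per-class mean, covariance and prior P(omega_c).
            self.params[c] = (Xc.mean(axis=0), np.cov(Xc.T), len(Xc) / len(X))
        return self

    def _log_posterior(self, x, c):
        # log p(x | omega_c) + log P(omega_c), dropping the evidence p(x),
        # which is constant across classes (equation (1)).
        mu, cov, prior = self.params[c]
        d = x - mu
        maha = d @ np.linalg.solve(cov, d)   # squared Mahalanobis distance
        return -0.5 * (maha + np.log(np.linalg.det(cov))) + np.log(prior)

    def predict(self, x):
        # Assign x to the class with the largest a posteriori probability.
        return max(self.classes, key=lambda c: self._log_posterior(x, c))
```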
6 The tracker

Video tracking is the problem of following moving targets through an image sequence. In our case the targets we are tracking are regions, defined as connected parts of the image. Region-based tracking systems have been reported for several tasks, such as surveillance [8] and vehicle guidance and servoing [13], just to name a few. In general these regions have distinguishing intensity or colour properties (e.g., colour histogram, texture statistics, statistical differences from an adaptive or fixed background) that can be used as features for the tracking. In our case we might have used the grey-level texture of the connected component from the original images for matching the components between consecutive frames, but because of the low quality of the video sequences the texture does not discriminate well between the different components. We therefore preferred to perform the tracking on binary connected components, using the shape descriptors extracted by the feature extraction module as the distinguishing property for the tracking. The solution we adopted is again quite simple.

We associate each component $R_i$ extracted from a frame $t$ with a vector $v_i = (x_i, y_i, G_i, M_i, H_i)$, where $(x_i, y_i)$ are the coordinates of the centroid of the connected component, and the other entries are the shape descriptors already described. For each component extracted from frame $t$ we define in the next frame a search window $W$, centred at $(x_i, y_i)$. For each component $R_j$ extracted from frame $t+1$ whose centroid $(x_j, y_j)$ falls inside the window $W$, we compute the Euclidean distance $d(R_i, R_j)$ between the feature vectors $(G_i, M_i, H_i)$ and $(G_j, M_j, H_j)$, and associate $R_i$ with the component $R_j$ having the smallest distance $d(R_i, R_j)$.

A typical problem for video tracking is the ghosting effect: in our case, a component is not matched with any component in the next frame, because the connected component has not been extracted, but it appears again after two frames or more. Because of this we cannot remove a trajectory from the tracker map as soon as a region disappears. When a component detected in frame $t$ is not tracked in frame $t+1$, we double the size of the search window in frame $t+2$. If the component is still not tracked in frame $t+2$, we make the hypothesis that the component has disappeared from the field of view and remove the trajectory temporarily. For new regions appearing in the next 4 frames we test closeness and similarity with the removed region, and if a match is found we reintegrate the trajectory; otherwise we remove the trajectory for good.

7 Starfish counter

Figure 3. Examples of the components extracted from the video sequences. The first row shows examples of starfish; the second row shows a selection of elements from the non-starfish class.

This last module collects the results from the tracker and from the classifier and updates a counter of the number of different starfish detected in the video sequence. The classification module returns, for each component in a frame, a label marking the component as starfish or non-starfish, and the tracker module returns trajectories of connected components. The counter module merges this information once a trajectory has been returned by the tracker, i.e., once the associated component has gone out of the field of view. The trajectory associated with a connected component is a list of pairs $(t, v)$, where $t$ is the frame number and $v$ is the vector defined in Section 6. Let $L$ be the length of the trajectory. The counter checks the number $n_s$ of frames in the trajectory where the component has been classified as starfish. If $n_s > L/2$ the component is classified as starfish and the counter updates the number of starfish detected. It is worth mentioning that the normalised difference $(n_s - n_n)/L$, where $n_n = L - n_s$ is the number of frames where the component has been classified as non-starfish, might be used as a confidence measure for the classification. In fact, assuming no occlusion in the field of view, we expect a well imaged starfish to be recognised as such for most of the frames in the trajectory (a value close to 1), whereas noisy and dubious cases can be classified in different ways across the sequence, returning a value close to 0. Finally, if a component is a non-starfish, its value must be close to $-1$.
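The following sketch (Python/NumPy) combines the two steps just described: nearest-neighbour matching of components between consecutive frames (section 6) and the majority-vote counting decision with its confidence measure (section 7). The window size and the data layout are illustrative assumptions.

```python
import numpy as np

def match_components(prev, curr, win=40):
    """For each vector v_i = (x, y, G, M, H) from frame t, find the frame t+1
    component whose centroid falls inside the search window and whose
    (G, M, H) part is closest in Euclidean distance."""
    matches = {}
    for i, (x, y, *f_i) in enumerate(prev):
        best, best_d = None, np.inf
        for j, (xj, yj, *f_j) in enumerate(curr):
            if abs(xj - x) <= win / 2 and abs(yj - y) <= win / 2:
                d = np.linalg.norm(np.subtract(f_i, f_j))
                if d < best_d:
                    best, best_d = j, d
        matches[i] = best   # None means not re-detected (possible ghosting)
    return matches

def count_decision(labels):
    """Decide a finished trajectory. `labels` holds the per-frame classifier
    output (True = starfish). The component is counted as a starfish when
    n_s > L/2; the confidence (n_s - n_n)/L is close to 1 for clear
    starfish, near 0 for dubious cases, close to -1 for non-starfish."""
    L = len(labels)
    n_s = sum(labels)
    confidence = (n_s - (L - n_s)) / L
    return n_s > L / 2, confidence
```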
8 Experimental results

We tested our system on different video sequences obtained as different chunks of a long video from an underwater mission. We performed different experiments in order to test the performance of the classifier, of the tracker, and of the starfish counting.

8.1 Experiments on the classifier

We manually classified a number of connected components from three different video sequences. A set of 394 components (197 starfish and 197 non-starfish) from the first video sequence was used as training set in order to estimate the two Gaussian distributions. The two clusters of points in the feature space relative to the training set are shown in Figure 4. A second set of 348 components, divided into 174 starfish and 174 non-starfish, and a third set of 742 components, divided into 371 starfish and 371 non-starfish, were used as test sets. The two sets were extracted from the second and third video sequence respectively. The results are reported in Table 1.

Figure 4. Plot of the distribution of the training set in the feature space. The dark points represent elements in the non-starfish class, the grey crosses elements in the starfish class.

Table 1. Results of the experiments on the two test sets.

Test set     # of components   Total errors   Mis-classified starfish   Mis-classified non-starfish
Test1024b    348 (2 × 174)     3.7% (14)      1.72% (3)                 6.3% (11)
Tes21550b    742 (2 × 371)     4.8% (36)      2.1% (8)                  7.5% (28)

In general we can observe that the success rate in classifying elements from the starfish class is high (in the order of 98%), which is a very good result for such a simple classifier. The error in classifying elements from the non-starfish class is higher (in the order of 7%). This is due to the fact that we included among the non-starfish some components that are small parts of a starfish (such as tentacles), and these have morphological properties similar to the starfish. A way to overcome this problem is to identify a feature discriminating between starfish and this sub-class and adopt a multi-step classifier, or to add this feature to the feature space if it differs from the three adopted.

8.2 Experiments on the tracker and counter

We performed two experiments on two sequences of 200 and 300 frames respectively, where we manually counted the number of starfish in the scene. The counting procedure returns a result of 4 starfish for the first sequence, against a ground truth of 5, and a result of 17 against a ground truth of 20. The graphs showing the evolution of the tracking are shown in Figure 6. The errors are mainly due to a strong ghosting effect for a pair of components. These components are starfish with poorly defined contours, due to the low quality of underwater images and the small size of the starfish. This makes the extraction of the components quite hard at times, so that some regions cannot be extracted from all the frames. If a component is not extracted for more than 4 frames, as happens in this case, the tracker returns the trajectory to the counter module and a new starfish is added. In Figure 5 we show a sample of three frames from the sequence used for this experiment. The temporal distance between two consecutive frames in the figure is three frames. The tracked regions are marked with different red symbols.

Figure 5. Results of the tracking procedure. Tracked regions are marked with different red symbols.

Figure 6. Evolution of the starfish counting across two test sequences.

9 Conclusions

This paper presented a system for the detection and counting of starfish in underwater video sequences. The system is composed of a chain of modules including a Bayesian classifier, which decides whether an area of interest extracted from the input image represents a starfish or not, and a tracker for tracking components across the sequence. A last counter module takes care of merging the information from the tracker and the classifier and gives an estimate of the number of starfish present in the scene. We carried out separate sets of experiments to check the performance of the classifier and of the tracker/counter. Experiments performed on a number of images (more than 1000) show that the classifier has a classification success rate of 96%. For the tracker/counter, an experiment on a sequence of 50 frames with 10 starfish returns a count of 12 starfish.

The system can be developed and improved in a number of ways. Most of them concern the classification module. First, the classification module could implement modern and sophisticated learning techniques (e.g., support vector machines). We might also associate a confidence level with each classification (for instance, a candidate is classified as a starfish with 90% confidence). Moreover, we might extend the classification to more classes, discriminating among different species of starfish. This would require more than the three features described in section 4, and it might be useful to use more than one classifier. The tracker can be improved, especially in the way it deals with ghosting. In particular we plan to include a Kalman filter in order to predict the position of the component in the next frame. The filter might also use motion information for the other features, as we are tracking static points on a planar surface.

Acknowledgements

We thank Dr. Benedetto Ballarò for providing some useful Matlab code. This work has been partially supported by the following projects: the EIERO project under EU Contract HPRI-CT-2001-00173; the international project for universities scientific cooperation CORI May 2001-EF2001; COST Action 283. The test data were provided by Dr. Anthony Grehan (Martin Ryan Marine Science Institute, University College, Galway, Ireland).

References

[1] G. Conte, S. Zanoli, A. Perdon, G. Tascini, and P. Zingaretti. Automatic analysis of visual data in submarine pipeline inspection. In Proceedings of the IEEE OCEANS Conference, pages 1213-1219, 1996.

[2] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, 2001.

[3] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison Wesley, 1993.

[4] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel. Robust Statistics: The Approach Based on Influence Functions. John Wiley & Sons, 1986.

[5] R. Haralick and L. Shapiro. Computer and Robot Vision, volume I. Wiley, 1992.

[6] K. Holland, R. Holman, T. Lippmann, and J. Stanley. Practical use of video imagery in nearshore oceanographic field studies. IEEE Journal of Oceanic Engineering, 22:81-92, 1997.

[7] D. Kocak, N. da Vitoria Lobo, and E. Widder. Computer vision techniques for quantifying, tracking, and identifying bioluminescent plankton. IEEE Journal of Oceanic Engineering, 24(1):81-95, 1999.

[8] W. Lu and Y. Tan. A color histogram based people tracking system. In Proceedings of the IEEE International Symposium on Circuits and Systems, 2001.

[9] S. Marchand-Maillet and Y. Sharaiha. Binary Digital Image Processing. Academic Press, 2000.

[10] F. Odone, E. Trucco, and A. Verri. Visual learning of weight from shape using support vector machines. In Proceedings of the British Machine Vision Conference, 1998.

[11] A. Olmos and E. Trucco. Detecting man-made objects in unconstrained subsea videos. In Proceedings of the British Machine Vision Conference, 2002.

[12] A. Ortiz, M.
Simo, and G. Oliver. Image sequence analysis for real-time underwater cable tracking. In Proceedings of the IEEE Workshop on Applications of Computer Vision, pages 230-236, 2000.

[13] J. Orwell, P. Remagnino, and G. Jones. Multi-camera colour tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1999.

[14] J. Serra. Image Analysis and Mathematical Morphology. Academic Press, 1982.

[15] M. Soriano, S. Marcos, C. Saloma, M. Q. and P. Alino. Image classification of coral reef components from underwater color video. In Proceedings of the MTS/IEEE OCEANS Conference, volume 2, pages 1008-1013, 2001.

[16] X. Tang and W. Stewart. Plankton image classification using novel parallel-training learning vector quantization network. In Proceedings of the IEEE OCEANS Conference, 1996.