Finding essential features for tracking starfish in a video sequence
V. Di Gesù+, F. Isgrò+*, D. Tegolo+, E. Trucco*
+ Università di Palermo, Italy
* Heriot-Watt University, Edinburgh, U.K.
{digesu,fisgro,tegolo}@math.unipa.it, {f.isgro,e.trucco}@hw.ac.uk
Abstract
This paper introduces a software system for detecting and tracking starfish in underwater video sequences. The purpose of such a system is to help biologists estimate the number of starfish present in a particular area of the sea-bottom. The input images are characterised by a low signal-to-noise ratio and by a noisy background of pebbles, which makes the detection a non-trivial task. The procedure we use is a chain of several steps that starts with the extraction of areas of interest and ends with a classifier and a tracker, providing the information necessary for counting the starfish present in the scene.
1 Introduction
Underwater images have recently been used for a variety of inspection tasks, in particular for military purposes such as mine detection, for the inspection of underwater pipelines [1], cables or platforms [12], and for the detection of man-made objects [11].

A number of underwater missions are for biological studies, such as the inspection of underwater life. Despite the large number of such missions, and although image analysis techniques are starting to be adopted in the fish farming field [10], most of the inspection of the video footage recorded during the missions is still done manually, as research applying image analysis techniques to biological missions is relatively new. In [16] a system for classifying plankton images is described. A camera model is developed by Holland for measuring near-shore fluid processes, fore-shore topography and drifter motions [6]. Active contour models (snakes) are adopted in [7] for tracking and identifying plankton. Colour analysis is used in [15] for the classification of coral reefs.
In this paper we present a simple system for the analysis of underwater video streams for biological studies. The particular task considered here is detecting and tracking starfish in underwater video sequences. The target of such a system is determining the number of starfish in a particular area of the sea-bottom.

The problem we tackle in this work is non-trivial for a number of reasons, in particular: the low quality of underwater images, which results in a very low signal-to-noise ratio, and the different kinds of possible backgrounds, as starfish can be found on various classes of sea-bottoms (e.g., sand, ..., rock).
The system we present here is a chain of several modules (see Figure 1): it starts with a simple module extracting areas of interest (connected components) from the binarised input image; a second module extracts for each component a set of features describing its shape; these features are then used for checking whether a component can be classified as a starfish and for tracking the component across the sequence; a last module takes care of counting the starfish detected and tracked. Experiments performed on the classification module on a sample of 1090 candidates report an average success rate for the detection of 96%. The tracker returns satisfactory results, tracking correctly most of the components extracted.
The paper is structured as follows. The next section gives
an overview of the system. The method adopted for selection areas of interest is described in section 3. In section
4 we describe the features that we extract from the areas
of interest for the classification, and section 5 briefly discusses the classification methodology used for this system.
Section 6 describes the tracking module, and the counting
procedure is described in section 7. Experimental results
are reported and discussed in section 8, and section 9 is left
to final remarks and future developments.
2 System overview
The system, whose logical structure is depicted in Figure 1, works as a pipeline of the following modules:

1. Data acquisition: each frame of the underwater video sequence (live video or recorded off-line) is read by the system for processing;

2. Extraction of areas of interest: the current frame of the video sequence is first binarised, and candidate starfish are then extracted as large connected components (section 3);

3. Computation of shape indicators (features): for each candidate a set of features is computed. The features chosen are a set of shape descriptors (section 4);

4. Classification: this module discriminates between starfish and non-starfish candidates, using the features extracted by the previous module (section 5);

5. Tracker: all the components extracted are tracked across the video sequence so that already detected starfish are not counted more than once (section 6);

6. Counter: this last module implements the actual counting of the starfish, considering components which have been classified as starfish for most of their trajectory (section 7).
3 Selection of areas of interest
This first module detects areas of the image that are likely to include starfish. The objective of this module is to select everything that could be a starfish, regardless of the number of false positives extracted: the classification module will take care of discarding the false positives.
The method adopted is very simple. We first binarise the image using a simple adaptive threshold [3], which computes local statistics for each pixel (mean value $\mu(i,j)$ and standard deviation $\sigma(i,j)$) in a window of size $W \times W$, and assigns a binary value $b(i,j)$ to the pixel of grey level $g(i,j)$ according to the following rule:

$$b(i,j) = \begin{cases} 1 & \text{if } g(i,j) > \mu(i,j) + k\,\sigma(i,j) \\ 0 & \text{otherwise} \end{cases}$$
From the binary image all the connected components are extracted using a standard iterative algorithm [5], and the ones having a small area $A_i$ are filtered out using a size filter based on the simple X84 rejection rule [4], an efficient outlier rejection method for robust estimation. Assuming a Gaussian distribution of the areas, the rule removes all the components such that $A_i < m - k\,\mathrm{MAD}$, where $m$ is the median of the areas and the median absolute deviation is given by $\mathrm{MAD} = \mathrm{med}_i\,|A_i - \mathrm{med}_j\,A_j|$.
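As a concrete illustration, the sketch below (Python with NumPy/SciPy; a minimal sketch, not the authors' code) implements this selection step: a local mean/standard-deviation threshold, connected-component labelling, and the X84 size filter. The window size w, the threshold constant k and the value 5.2 for the rejection constant are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy import ndimage

def extract_candidates(gray, w=15, k=0.2, k_x84=5.2):
    """Binarise with a local mean/std threshold, label connected
    components, and drop small components with the X84 rule.
    w, k and k_x84 are illustrative values, not from the paper."""
    g = gray.astype(np.float64)
    mean = ndimage.uniform_filter(g, size=w)               # local mean
    sq_mean = ndimage.uniform_filter(g * g, size=w)
    std = np.sqrt(np.maximum(sq_mean - mean * mean, 0.0))  # local std dev
    binary = g > mean + k * std                            # threshold rule

    labels, n = ndimage.label(binary)                      # connected components
    if n == 0:
        return labels, []
    areas = ndimage.sum(binary, labels, index=range(1, n + 1))

    # X84 rejection: discard components whose area lies far below the
    # median, with the median absolute deviation as robust scale.
    med = np.median(areas)
    mad = np.median(np.abs(areas - med))
    keep = [i + 1 for i, a in enumerate(areas) if a >= med - k_x84 * mad]
    return labels, keep
```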
4 Feature extraction
The definition of suitable shape descriptors is essential for the classification phase. In our case the shape descriptors have been suggested by the morphological structure [14] of the starfish. We identified three indicators that are combined into a feature vector to discriminate the connected components extracted between starfish and noise.

Figure 2. Examples of input images.
Geometric indicator. The convex hull [9] of the connected component is computed, then a geometric shape indicator, $G$, is defined as:

$$G = \frac{A_{cc}}{A_{ch}}$$

where $A_{cc}$ is the area of the connected component, and $A_{ch}$ represents the area of the convex hull.

Small values of $G$ will mostly represent starfish, as we expect the area of the convex hull to be much larger than the area of the starfish because of the tentacles.
Morphological indicator. The morphological shape indicator, $M$, is computed by applying the morphological opening operator to the connected component:

$$M = \frac{A_{op}}{A_{cc}}$$

where $A_{op}$ is the area of the result obtained by applying the opening to the connected component.
Starfish are likely to return small values for the $M$ indicator, as we expect that the opening operator removes most of the tentacles, so that the area of the resulting component will be smaller than the area of the original connected component.

Figure 1. Schematic representation of the system: video stream → extraction of ROIs → computation of features → classification → tracker → starfish counter.
Histogram indicator. This indicator is based on the statistics (mean values, $\mu$, and variances, $\sigma^2$) of the histograms by row and by column of the component to be analysed:

$$H = \frac{\sigma_r^2}{\mu_r^2} + \frac{\sigma_c^2}{\mu_c^2}$$

where $(\mu_r, \sigma_r^2)$ and $(\mu_c, \sigma_c^2)$ are the mean and variance of the row and column histograms respectively.

Small values of this indicator characterise a uniform distribution of the pixels of the component. Therefore starfish components will have small values for $H$.
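A possible implementation of the three indicators for a single binary component, sketched in Python with OpenCV under the definitions reconstructed above; the size of the structuring element used for the opening is an assumption, as it is not specified here.

```python
import numpy as np
import cv2

def shape_indicators(mask):
    """Compute the indicators G, M and H of section 4 for one binary
    component (uint8 mask, component pixels set to 1). The size of the
    structuring element for the opening is an assumption."""
    a_cc = float(mask.sum())                        # component area

    # Geometric indicator G = A_cc / A_ch.
    pts = cv2.findNonZero(mask)
    hull = cv2.convexHull(pts)
    a_ch = cv2.contourArea(hull)
    g = a_cc / a_ch if a_ch > 0 else 1.0

    # Morphological indicator M = A_op / A_cc: the opening removes thin
    # tentacles, so starfish should yield small values.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (9, 9))
    a_op = float(cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel).sum())
    m = a_op / a_cc

    # Histogram indicator H: normalised variances of the row and column
    # histograms of the component (small for uniform components).
    rows = mask.sum(axis=1).astype(np.float64)
    cols = mask.sum(axis=0).astype(np.float64)
    rows, cols = rows[rows > 0], cols[cols > 0]
    h = rows.var() / rows.mean() ** 2 + cols.var() / cols.mean() ** 2
    return g, m, h
```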
5 The classifier
For the classification module we adopted a simple Bayesian classifier [2]. Let $\omega_s$ and $\omega_n$ represent the starfish class and the non-starfish class respectively, and let $v$ be a vector in the feature space. Given a feature vector $v$, what we want is to compute the a posteriori probabilities $P(\omega_i \mid v)$ of the vector belonging to the class $\omega_i$, and assign the vector to the class having the largest $P(\omega_i \mid v)$.

Bayes' formula states that

$$P(\omega_i \mid v) = \frac{p(v \mid \omega_i)\,P(\omega_i)}{p(v)} \qquad (1)$$

We assume a Gaussian model for the class-conditional densities $p(v \mid \omega_i)$ in the feature space $(G, M, H)$. The probability $p(v)$ can be neglected, as it is constant for all classes; therefore $P(\omega_i \mid v)$ can be computed from the prior of class $\omega_i$ and from $p(v \mid \omega_i)$, which is estimated from a training set of feature vectors.
It is worth noticing that what we consider as the non-starfish class is not everything that is not a starfish, but only material that can be found on the sea-bottom together with starfish (mainly pebbles). Therefore our non-starfish class is well defined, and it can be seen from Figure 4 that most of the feature vectors fall in a bounded region of the feature space. This justifies the use of a Gaussian distribution to model the non-starfish class, although the cluster formed by the feature vectors for this class is not as well shaped as the one formed by the starfish class.
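The classifier of this section can be sketched as follows (Python/NumPy); a minimal illustration of a two-class Gaussian Bayes rule, not the authors' implementation.

```python
import numpy as np

class GaussianBayes:
    """Two-class Bayes classifier with one Gaussian density per class,
    sketching section 5; a minimal illustration, not the authors' code."""

    def fit(self, x_starfish, x_nonstarfish):
        # Estimate mean, covariance and prior of each class from
        # training feature vectors (rows are (G, M, H) vectors).
        n_total = len(x_starfish) + len(x_nonstarfish)
        self.params = []
        for x in (x_starfish, x_nonstarfish):
            mu = x.mean(axis=0)
            cov = np.cov(x, rowvar=False)
            prior = len(x) / n_total
            self.params.append((mu, np.linalg.inv(cov),
                                np.linalg.det(cov), prior))

    def log_posteriors(self, v):
        # log p(v|w_i) + log P(w_i); p(v) is constant across classes
        # and can be neglected, as in the text.
        scores = []
        for mu, cov_inv, cov_det, prior in self.params:
            d = v - mu
            log_lik = -0.5 * (d @ cov_inv @ d
                              + np.log(cov_det)
                              + len(v) * np.log(2 * np.pi))
            scores.append(log_lik + np.log(prior))
        return scores

    def is_starfish(self, v):
        s = self.log_posteriors(v)
        return s[0] > s[1]   # index 0 is the starfish class
```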
6 The tracker
Video tracking is the problem of following moving targets through an image sequence. In our case the targets
we are tracking are regions, defined as connected parts of
the image. Region-based tracking systems have been reported for several tasks, such as surveillance [8], vehicle
guidance and servoing [13], just to name a few. In general
these regions have distinguishing intensity or colour properties (e.g., colour histogram, texture statistics, statistical
differences from an adaptive or fixed background) that can
be used as features for the tracking.
In our case we might have used the grey-level texture of the connected components from the original images for matching the components between consecutive frames, but because of the low quality of the video sequences the texture does not discriminate well between the different components. We therefore preferred to perform the tracking on the binary connected components, using the geometric descriptors extracted by the feature extraction module as the distinguishing property for the tracking.
The solution we adopted is again quite simple. We associate each component $C_i^t$ extracted from frame $t$ with a vector $v_i^t = (x_i^t, y_i^t, G_i^t, M_i^t, H_i^t)$, where $x_i^t$ and $y_i^t$ are the coordinates of the centroid of the connected component, and the other entries are the shape descriptors already described.

For each component extracted from frame $t$ we define in the next frame a search window of fixed size, centred at $(x_i^t, y_i^t)$.
Figure 3. Examples of the components extracted from the video sequences. The first row shows examples of starfish; the second row shows a selection of elements from the non-starfish class.

For each one of the components $C_j^{t+1}$ extracted from frame $t+1$ whose centroid $(x_j^{t+1}, y_j^{t+1})$ falls inside the search window, we compute the Euclidean distance $d(C_i^t, C_j^{t+1})$ between the feature vectors $(G_i^t, M_i^t, H_i^t)$ and $(G_j^{t+1}, M_j^{t+1}, H_j^{t+1})$, and associate the component $C_i^t$ with the component $C_j^{t+1}$ having the smallest distance.
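The frame-to-frame association just described can be sketched as follows (Python/NumPy); the search-window size is an assumed value, and the ghosting logic of the next paragraph is omitted here.

```python
import numpy as np

def match_components(prev, curr, half_win=20):
    """Frame-to-frame association sketched from section 6. Components are
    (x, y, G, M, H) tuples; half_win (half the search-window side) is an
    assumed value. Ghosting handling is omitted."""
    matches = {}
    for i, (x, y, *feat_i) in enumerate(prev):
        best, best_d = None, np.inf
        for j, (xj, yj, *feat_j) in enumerate(curr):
            # Only components whose centroid falls inside the search
            # window centred at (x, y) are candidates.
            if abs(xj - x) > half_win or abs(yj - y) > half_win:
                continue
            # Euclidean distance between the shape-feature vectors.
            d = np.linalg.norm(np.array(feat_i) - np.array(feat_j))
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = best
    return matches
```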
A typical problem for video tracking is the ghosting effect: in our case, a component is not matched with any component in the next frame, because the connected component has not been extracted, but it appears again after two frames or more. Because of this we cannot remove a trajectory from the tracker map as soon as a region disappears. When a component detected in frame $t$ is not tracked in frame $t+1$, we double the size of the search window in frame $t+2$. If the component is still not tracked in frame $t+2$, we make the hypothesis that the component has disappeared from the field of view and remove the trajectory temporarily. For new regions in the next 4 frames we test closeness and similarity with the removed region; if a match is found we re-integrate the trajectory, otherwise we remove the trajectory for good.
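The ghosting handling might be sketched as a small state update per trajectory; the dictionary fields and the exact thresholds below are illustrative assumptions consistent with the description above, not the authors' data structures.

```python
def update_trajectory(track, matched, max_gap=4):
    """Per-trajectory state update sketching the ghosting handling above.
    The fields, the window-doubling factor and max_gap are assumptions."""
    if matched:
        track["misses"] = 0
        track["win_scale"] = 1             # search window back to normal
        track["status"] = "active"         # a suspended track is re-integrated
    else:
        track["misses"] = track.get("misses", 0) + 1
        if track["misses"] == 1:
            track["win_scale"] = 2         # double the search window once
        elif track["misses"] <= 1 + max_gap:
            track["status"] = "suspended"  # removed temporarily; new regions
                                           # are still tested for a match
        else:
            track["status"] = "closed"     # disappeared for good: hand the
                                           # trajectory over to the counter
    return track
```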
7 Starfish counter

This last module collects the results from the tracker and from the classifier and updates a counter for the number of different starfish detected in the video sequence.

The classification module returns for each component in the frame a label marking the component as starfish or non-starfish, and the tracker module returns trajectories of connected components. The counter module merges this information once a trajectory has been returned by the tracker, i.e., once the associated component has gone out of the field of view. The trajectory associated to a connected component is a list of pairs $(t, v^t)$, where $t$ is the frame number and $v^t$ is the vector defined in Section 6.

Let $L$ be the length of the trajectory. The counter checks the number $n_s$ of frames in the trajectory where the component has been classified as starfish. If $n_s > L/2$ the component is classified as starfish and the counter updates the number of starfish detected.
It is worth mentioning that the normalised difference $c = (n_s - n_n)/L$, where $n_n$ is the number of frames in which the component has been classified as non-starfish, might be used as a confidence measure for the classification. In fact, assuming no occlusion in the field of view, we expect that a well imaged starfish is recognised as such for most of the frames in the trajectory ($c$ close to 1), whereas noisy and dubious cases can be classified in different ways across the sequence, returning a $c$ close to 0. Finally, if a component is a non-starfish its $c$ must be close to -1.
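The counting rule and the confidence measure can be summarised in a few lines; a sketch in which each trajectory is reduced to its per-frame starfish/non-starfish labels (this input format is an assumption).

```python
def count_starfish(trajectories):
    """Counting rule of section 7, sketched: a trajectory is counted as a
    starfish when it is labelled as such in more than half of its frames.
    Each trajectory is a list of per-frame booleans (True = starfish)."""
    count, confidences = 0, []
    for labels in trajectories:
        L = len(labels)
        n_s = sum(labels)                      # frames labelled starfish
        confidences.append((2 * n_s - L) / L)  # c = (n_s - n_n)/L, in [-1, 1]
        if n_s > L / 2:                        # majority-vote rule
            count += 1
    return count, confidences
```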
8 Experimental results
We tested our system on different video sequences obtained as different chunks of a long video from an underwater mission. We performed different experiments in order to
test the performance of the classifier, of the tracker and of
the starfish counting.
8.1 Experiments on the classifier
We classified manually a number of connected components from three different video sequences.
A set of 394 components (197 starfish and 197 non-starfish) from the first video sequence was used as a training set in order to estimate the two Gaussian distributions. The
two clusters of points in the feature space relative to the
training set are shown in Figure 4.
A second set of 348 components, divided into 174 starfish and 174 non-starfish, and a third set of 742 components, divided into 371 starfish and 371 non-starfish, have been used as test sets. The two sets were extracted from the second and third video sequences respectively. The results are reported
in Table 1. In general we can observe that the success rate in classifying elements from the starfish class is high (in the order of 98%), which is a very good result for such a simple classifier. The error is higher for elements from the non-starfish class (in the order of 7%). This is due to the fact that we included among the non-starfish some components that are small parts of a starfish (such as tentacles), and these have morphological properties similar to the starfish. A way to overcome this problem is to identify a feature discriminating between starfish and this sub-class of starfish and adopt a multistep classifier, or to add this feature to the feature space if different from the three adopted.

Table 1. Results of the experiments on the two test sets

                 # of            Overall         Mis-classified    Mis-classified
Test set         components      errors          starfish          non-starfish
                                 %      #        %       #         %       #
Test1024b        348 (2 x 174)   3.7    14       1.72    3         6.3     11
Test21550b       742 (2 x 371)   4.8    36       2.1     8         7.5     28

Figure 4. Plot of the distribution of the training set in the feature space. The dark points represent elements in the non-starfish class, the grey crosses elements in the starfish class.
8.2 Experiments on the tracker and counter
We performed two experiments on two sequences of 200 and 300 frames respectively, where we manually counted the number of starfish in the scene. The counting procedure returns a result of 4 starfish for the first sequence, against a ground truth of 5, and a result of 17 against a ground truth of 20. The graphs showing the evolution of the tracking are shown in Figure 6. The errors are mainly due to a strong ghosting effect for a pair of components. These components are starfish with poorly defined contours, due to the low quality of underwater images and the small size of the starfish. This makes the extraction of the components quite hard at times, so that some regions cannot be extracted from all the frames. If a component is not extracted for more than 4 frames, as happens in this case, the tracker returns the trajectory to the counter module and a new starfish is added.

Figure 6. Evolution of the starfish counting across two test sequences.
In Figure 5 we show a sample of three frames from the sequence used for this experiment. The temporal distance between two consecutive frames in the figure is three frames. The tracked regions are marked with different red symbols.

Figure 5. Results of the tracking procedure. Tracked regions are marked with different red symbols.

9 Conclusions
This paper presented a system for the detection and counting of starfish in underwater video sequences. The system is composed of a chain of modules including a Bayesian classifier, which decides whether an area of interest extracted from the input image represents a starfish or not, and a tracker for tracking components across the sequence. A last counter module takes care of merging the information from the tracker and the classifier and gives an estimate of the number of starfish present in the scene.

We carried out separate sets of experiments for checking the performance of the classifier and of the tracker/counter. Experiments performed on a number of images (more than 1000) show that the classifier has a classification success rate of 96%. For the tracker/counter, an experiment on a sequence of 50 frames with 10 starfish returns a result of 12 starfish counted.
The system can be developed and improved in a number of ways, most of which regard the classification module. First, the classification module could implement modern and sophisticated learning techniques (e.g., support vector machines). We might also associate a confidence level with each classification (for instance, a candidate is classified as a starfish with 90% confidence). Moreover, we might extend the classification to more classes, discriminating among different species of starfish. This will require more than the three features described in section 4, and it might be useful to use more than one classifier.
The tracker can be improved, especially in the way it deals with ghosting. In particular we plan to include a Kalman filter in order to predict the position of the component in the next frame. The filter might also use motion information for the other features, as we are tracking static points on a planar surface.
Acknowledgements
We thank Dr. Benedetto Ballarò for providing some useful Matlab code. This work has been partially supported by the following projects: the EIERO project under EU Contract HPRI-CT-2001-00173; the international project for university scientific cooperation CORI May 2001-EF2001; COST Action 283. The test data were provided by Dr. Anthony Grehan (Martin Ryan Marine Science Institute, University College, Galway, Ireland).
References
[1] G. Conte, S. Zanoli, A. Perdon, G. Tascini, and P. Zingaretti.
Automatic analysis of visual data in submarine pipeline. In
Proceedings of the IEEE OCEANS Conference, pages 1213–
1219, 1996.
[2] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. Wiley, 2001.
[3] R. C. Gonzalez and R. E. Woods. Digital image processing.
Addison Wesley, 1993.
[4] F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A.
Stahel. Robust Statistics: the approach based on influence
functions. John Wiley & Sons, 1986.
[5] R. Haralick and L. Shapiro. Computer and Robot Vision,
volume I. Wiley, 1992.
[6] K. Holland, R. Holman, T. Lippmann, and J. Stanley. Practical use of video imagery in nearshore oceanographic field
studies. IEEE Journal of Oceanic Engineering, 22:81–92,
1997.
[7] D. Kocak, N. da Vitoria Lobo, and E. Widder. Computer
vision techniques for quantifying, tracking, and identifying
bioluminescent plankton. IEEE Journal of Oceanic Engineering, 24(1):81–95, 1999.
[8] W. Lu and Y. Tan. A color histogram based people tracking
system. In Proceedings of the IEEE International Symposium on
Circuits and Systems, 2001.
[9] S. Marchand-Maillet and Y. Sharaiha. Binary digital image processing. Academic Press, 2000.
[10] F. Odone, E. Trucco, and A. Verri. Visual learning of weight
from shape using support vector machine. In Proceedings of
the British Machine Vision Conference, 1998.
[11] A. Olmos and E. Trucco. Detecting man-made objects in
unconstrained subsea videos. In Proceedings of the British
Machine Vision Conference, 2002.
[12] A. Ortiz, M. Simo, and G. Oliver. Image sequence analysis for real-time underwater cable tracking. In Proceedings
of the IEEE Workshop on Applications of Computer Vision,
pages 230–236, 2000.
[13] J. Orwell, P. Remagnino, and G. Jones. Multi-camera colour tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1999.
[14] J. Serra. Image analysis and mathematical morphology. Academic Press, 1982.
[15] M. Soriano, S. Marcos, C. Saloma, M. Quibilan, and P. Alino. Image classification of coral reef components from underwater color video. In Proceedings of the MTS/IEEE OCEANS Conference, volume 2, pages 1008–1013, 2001.
[16] X. Tang and W. Stewart. Plankton image classification using
novel parallel-training learning vector quantization network.
In Proceedings of the IEEE OCEANS conference, 1996.