SVM Spatial Pyramid Kernel with Lines and Ellipses Feature Hierarchy for
Autonomous Classification of Similar Insect Species
(Mabel) Mengzi Zhang
Department of Computer Science & Engineering
University of California, San Diego
[email protected]
Figure 1. Process of the hierarchical shape descriptor's construction, from horizontally-oriented insect image (left-most) to medial axis extraction (second from left), ellipse extraction (third from left), and hierarchy construction (right-most).
ABSTRACT

This project focuses on automating the classification of 4700+ photographs from 29 insect species, developing ideas in computer vision and machine learning. Autonomous categorization aims at reducing the manual cost and the number of mistakes, even when compared to experts; to be competitive, therefore, improving accuracy is important. Challenges to accuracy in the 29-category dataset lie in the wide intra-class variations, which impose the need for robust appearance and shape feature descriptors. We overcome these difficulties by stacking both kinds of descriptors while preserving spatial information through a spatial pyramid kernel for the support vector machine (SVM). This kernel achieved 88% overall accuracy, compared to 87% with a histogram intersection kernel and 84.5%, at a higher computational cost, with an RBF kernel.

On the vision side, ongoing work attempts to produce a new shape descriptor by extracting medial axis line segments and ellipses from the image to form a paired hierarchy that describes relative shape, location, and angles in neighboring parts of an insect's body. The initial run of this descriptor alone obtained an accuracy of 66%. We hope to achieve even higher accuracies by improving the algorithm and combining this descriptor with other existing ones, as described in [11].
1. INTRODUCTION

Species is one of three diversities in measuring biodiversity, the other two being ecosystem and genetic diversities. Biodiversity loss is closely related to extinction rates and to ecosystem services, including air and water quality and climate, which greatly affect human health. Autonomously classifying insects is significant because it can be adapted to counting species specimens, which can be statistically analyzed to perform biomonitoring. This helps to efficiently investigate biodiversity and the interaction between human activity and the environment.

As this is an ongoing project, our current goal is to achieve an accuracy of 90% or higher. Our highest accuracy so far is 88%, and with the continuous development of the hierarchical shape descriptor, we hope to combine a more mature version of it with the three existing descriptors that produced the 88% result, so that the overall accuracy can be boosted to over 90%.

We explain the methods used for the machine learning and vision aspects in sections 3.1 and 3.2, respectively. The results for the pyramid kernel and the initial results for the shape descriptor are presented in sections 4.1 and 4.2, respectively. As further background, previous related work is presented in section 2. Finally, section 5 briefly discusses the plan for further development in the near future.
2. RELATED WORK

A fair amount of exploration has been done on the spatial pyramid kernel. Ever since the kernel was presented by Grauman and Darrell, it has been applied to datasets such as Caltech-101, Graz, and natural scene photographs by Lazebnik, Schmid, and Ponce [8, 14].
Even more interesting developments have evolved from the original pyramid kernel, including an approach from Bosch, Zisserman, and Munoz, whose algorithm learns the weight parameters in the kernel rather than assigning the weights according to the pyramid level number; in addition, their algorithm learns to choose between global and class-specific kernels [1]. Their approach was also tested on Caltech-101 and TRECVID.
In our experiments, we chose an implementation of the spatial pyramid kernel that matches that of Lazebnik et al. [14], but we use it in a novel way: the spatial pyramid kernel serves as the base for a new type of stacking, which in combination was able to preserve information at both global and local scopes, as captured by multiple shape and appearance descriptors [11].
For shape descriptors, lines and ellipses are not a new combination. They have been tested and verified to produce sound results in classification and recognition, as shown in experiments such as those of Chia, Rahardja, Rajan, and Leung [3]. Beyond the choice of these two primitive shapes, however, the way our descriptor is formed is fundamentally different. Instead of using line segments to describe shape boundaries in complement to ellipses, as in [3], we first extract the lines to represent the medial axes, which serve as the major axes of the ellipses, and then extract the ellipses separately later in the process, using the line segments as a basis.
On the more general topic of insect classification, previous research has been done on a range of insects using various automation techniques, such as in [10] and additional works mentioned in [11].
3. METHOD

The main part of the development consists of two aspects, vision and machine learning. We first discuss the machine learning aspect, as it is fundamental to the pyramid kernel we used to train the classification model. We then present the vision aspect, where we have developed a hierarchical shape descriptor based on lines and ellipses extracted from the insect images. This descriptor is still ongoing work to be improved.

3.1. Machine learning: Pyramid Kernel

The pyramid kernel, as brought forth by [8] and [14], is adapted to construct descriptors for our database. We expand [14] to contain 29 channels, one channel for each insect species. The kernel was used on three existing descriptors and again, with a new method of stacking, on the combined descriptor. These three descriptors were Histogram of Oriented Gradients (HOG) local features, salient points of high curvature with a beam angle, and Scale-Invariant Feature Transform (SIFT) descriptors [11, 5].

Initially, we used a histogram intersection kernel, whose main difference from a pyramid kernel is that it is simply the finest base level of the pyramid. Depending on the number of levels in the pyramid, the pyramid kernel preserves at least as much spatial information as the histogram kernel (for number of levels = 1), and more as the number of levels increases. For each new level, the four values in a box of non-overlapping two-by-two cells in the previous level are summed to obtain the value of the corresponding single cell in the new level. See Figure 2.

Figure 2. Graphical representation of the levels in a 3-level pyramid whose base level is 4 by 4 cells. At each new level, the sum of each box of 2x2 cells in the previous level forms the value of one corresponding cell in the new level. As the level increases, the coarser grid tolerates more changes in the local area, making the descriptor more robust to differences in orientations and degrees of bending in the insect. A histogram intersection kernel does not provide this flexibility, because it is equivalent to the base level only.

We calculate the pyramid kernel's values using an equation adapted from those presented in [8, 14]. Instead of calculating the channels last, as in equation (4) of [14], we include all channels in each cell of the pyramid. Thus, the intersection of two samples X, Y sums over the number of channels M in addition to the number of cells D:

$I(H_X^l, H_Y^l) = \sum_{i=1}^{D} \sum_{m=1}^{M} \min\left( H_{X,i}^l(m),\; H_{Y,i}^l(m) \right)$

Equation 1. Calculation of the intersection of the histograms of examples X and Y, where channels are summed within each cell, followed by summing over cells. m is the current channel out of M = 29 channels, i is the current cell out of D total cells, and $H_{X,i}^l(m)$ denotes the histogram value of example X at level l, cell i, channel m.
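To make the computation concrete, the following Python sketch evaluates Equation 1 at every level of the pyramid and sums the results. It is an illustrative reconstruction, not the project's actual code: the array shapes and the absence of per-level weights are our assumptions, following the unweighted form of Equation 1 above.

import numpy as np

def pyramid_intersection(base_x, base_y, levels=5):
    # base_x, base_y: arrays of shape (S, S, M), the finest S x S grid
    # of per-channel histograms (here S = 16, M = 29 species channels)
    total = 0.0
    hx, hy = base_x.astype(float), base_y.astype(float)
    for level in range(levels):
        # Equation 1 at this level: sum the element-wise minimum
        # over all cells i and channels m
        total += np.minimum(hx, hy).sum()
        if level < levels - 1:
            s = hx.shape[0] // 2
            # Sum each non-overlapping 2x2 box of cells to form the
            # next, coarser level (see Figure 2)
            hx = hx.reshape(s, 2, s, 2, -1).sum(axis=(1, 3))
            hy = hy.reshape(s, 2, s, 2, -1).sum(axis=(1, 3))
    return total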
To use the descriptors produced by the pyramid kernel, since we have multiple separate feature descriptors, we combine them by stacking, and then we feed the concatenated result to the SVM again to obtain the final overall accuracy. Our stacking differs slightly from traditional stacking: instead of simply concatenating the result matrices of two feature descriptors and still having 29 classes, we consider the first descriptor's results as 29 classes and the second's results as a second set of 29 classes, combining them into a total of 58 classes [11, 4]. This concatenation is performed in the class-channel dimension, as opposed to the cell dimension, thus retaining the spatial arrangement of each cell in the two pyramids. These 58 classes are used to obtain the final combined accuracy.

With traditional stacking, where the matrices are simply concatenated and the number of classes undergoes no special change, we experimented by converting each descriptor's 1-29 result into an n x 29 binary matrix, where n is the number of insect images, and then concatenating the two descriptors' binary matrices to form the input. We tried feeding this input to various learning models in OpenCV and Weka, including decision trees, random forests, SVMs, and others [6, 9]. The resulting accuracies from all these models were about the same as the higher accuracy of the two original descriptors, with no significant increase. A possible experiment we have not tried is to use a posterior probabilities matrix to represent all original descriptors' results, as presented in [4]. In our original attempt, we only used probabilities in the stacked histograms, which accumulated local feature classification scores; elsewhere, we used binary class labels.
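As a sketch of the difference between the two stacking schemes, assuming each descriptor yields a base grid of per-class score histograms (the names, shapes, and random contents below are hypothetical placeholders):

import numpy as np

# Two pyramid descriptors for one image, e.g. from HOG-based and
# SIFT-based features: a 16 x 16 base grid with 29 class channels
desc_a = np.random.rand(16, 16, 29)
desc_b = np.random.rand(16, 16, 29)

# Class-channel stacking: concatenate along the channel axis, giving
# 58 channels per cell; the spatial layout of both pyramids survives
stacked = np.concatenate([desc_a, desc_b], axis=2)  # (16, 16, 58)

# Traditional stacking: per-image binary class indicators (n x 29 per
# descriptor) concatenated side by side, with no spatial information
n = 4700
binary_a = np.zeros((n, 29))
binary_b = np.zeros((n, 29))
traditional = np.hstack([binary_a, binary_b])       # (n, 58)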
3.2. Vision: Lines and Ellipses Hierarchical Descriptor
In the 29-class dataset, wide intra-class variations make it challenging to generate robust descriptors from the insect images. Many species have images of insects at different developmental stages, including cocoons, which are shaped like a simple polygon, very different from an actual insect; infants, which are much smaller in size, usually oval-like, and have no wings or legs; and adults, which have full-length legs and sometimes wings. Even when most images in a species are of adult insects, the insects have different degrees of bending, orientations, and positions. Some images do not even capture the entire insect but only a segment of the insect's body. These variations make it more difficult to construct a descriptor based on shape. The descriptor must represent coherence within a given species and as much distinction as possible among different species.
We chose ellipses to represent the nodes in the hierarchy because of their flexibility to fit well into many types of irregular shapes. In order to extract ellipses that fit well to an object's shape structure, we use the medial axis, originally referred to as the "topological skeleton" in [7], to form the major axis of each ellipse. Using ellipses found from medial axis line segments and from contour points extracted from an image, we have developed a hierarchical shape descriptor.
First, we extract a set of medial axes from the image and divide them into groups that roughly correspond to parts of the insect body: head, legs, abdomen, etc. Each group represents a small part of the insect, and when all groups are combined, they approximate a skeleton-like medial axis tree running through the insect. Using the points on both the axis and the contours around each axis, we fit an ellipse that roughly circumscribes this set of points, representing the holistic shape of the given group. After the ellipses are found for all groups, we construct a tree hierarchy whose root is the ellipse closest to the insect's center of mass, and we form subsequent tree levels by taking the ellipses within a certain radius of each existing node as that node's children. This process repeats until all ellipses are in the hierarchy. Each node and each of its newly found children form a parent-child pair. Finally, all pairs combined represent the entire tree's structure. Properties of each ellipse and of its paired parent in the hierarchy are recorded in a file that we use as the final descriptor to feed to our SVM framework. See Figures 1 and 3 for graphical representations of each stage.

In this section, we describe the feature extractions and hierarchy formation in further detail.
Figure 3. Intermediate image representations of the processing stages, arranged in order from top to bottom.
i. Original erected insect image.
ii. Extract medial axes in the body.
iii. Extract ellipses that each fit to a medial axis and its nearby contours.
iv. Start building the hierarchy by obtaining a main body blob to filter out a possible region for candidate root ellipses.
v. Build a paired hierarchy using the ellipses. The hierarchy is rooted at the center of mass and expands outwards.
3.2.1. Medial axes extraction
For extraction of medial axes in each image, we use OpenCV to obtain an initial set of medial axis pixels, which represents the object's overall shape structure, in this case a skeleton-like representation of the insect's body [6]. Then, using the relative distances among the pixels of the medial axes, we cluster these pixels and fit a line segment through each cluster, so that the object's shape structure can be represented mathematically. In the insect dataset, the line segments are portions of the insect's different body parts, such as antennae, torso, abdomen, and legs, where each body part is usually represented by multiple small segments.

After the initial clustering, if a sanity check finds that a cluster of pixels is too coarse to be represented by a line, then the Object Recognition Toolkit (ORT) [2] is used to subdivide that cluster. New line segments are then refitted to the newly formed finer clusters. A cluster is considered too coarse if it is too wide for a single line segment to describe, i.e., if the line fitted through it covers less than a threshold, currently set at 0.5, of the horizontal and vertical ranges of the cluster, or if the cluster is too wide diagonally for a single line to cover accurately, as determined by testing the minor axis length of the ellipse fitted to the cluster.
We describe our algorithm in detail below.
1. Preprocessing:

// Smooth the image
bin_img = get_binary_image (image);
cvErode (bin_img);
cvDilate (bin_img);
// Filter out a rough shape of the medial axis
cvDistance_transform (bin_img);
cvLaplace_transform (bin_img);
cvThresholding (bin_img);

2. Find line segments:

// Find line segments in the medial axis image [13]
cvHough_lines (bin_img);
med_axis = get_foreground_pts (bin_img);
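For concreteness, a rough Python/OpenCV analogue of steps 1 and 2 is sketched below. It is a sketch under assumptions, not the project's code: the exact transforms, mask polarity, and thresholds may differ, the constants are placeholders, and the Hough line-segment detection of step 2 is omitted.

import cv2
import numpy as np

def medial_axis_points(gray):
    # Binarize and smooth the 8-bit grayscale image (erode then
    # dilate removes small noise, as in step 1)
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    mask = cv2.dilate(cv2.erode(mask, kernel), kernel)
    # Ridges of the distance transform approximate the medial axis;
    # the Laplacian is strongly negative along those ridges
    dist = cv2.distanceTransform(mask, cv2.DIST_L2, 5)
    lap = cv2.Laplacian(dist, cv2.CV_32F)
    ys, xs = np.where(lap < -1.0)   # illustrative cutoff
    return np.column_stack([xs, ys])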
3. Cluster the medial axis pixels into groups:

set_thres_dist (THRES_DIST);
set_thres_pts (THRES_PTS);
groups = init_new_sequence ();
// Loop through all points on the medial axes
for each ith point on med_axis
{
    // Add the point to the first group containing a point
    // close enough to it
    added = false;
    for each group in groups
        for each jth point in group
        {
            dist = get_distance (med_axis.at (i), group.at (j));
            if (dist < THRES_DIST)
            {
                group.push (med_axis.at (i));
                added = true;
                break;
            }
        }
    // Otherwise start a new group with this point
    if (! added)
    {
        group = init_new_group ();
        group.push (med_axis.at (i));
        groups.push (group);
    }
}
// Discard groups that are too small
for each ith group
    if (groups.at (i).getNumPoints () < THRES_PTS)
        groups.remove (i);

4. Fit a line through each group:

for each ith group in groups
{
    line = cvFitLine (group);
    // Cut the line into a segment that spans the group's
    // exact boundaries
    seg = line2segment (line);
    if (groupIsTooCoarse (group, seg))
    {
        ORT_reprocess (group, strings);
        regroupAndRefitLines (strings);
    }
}

5. ORT operations we used [2]:

ORT_reprocess (group, strings)
{
    // Make all foreground points 8-connected
    clean (group);
    // Extract connected strings with open ends
    strings.push (link_open (group));
    // Extract connected strings with closed ends, like in a hoop
    strings.push (link_close (group));
}
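One possible reading of the groupIsTooCoarse check from step 4 is sketched below in Python. The text above does not specify whether the horizontal and vertical conditions combine with AND or OR, so the AND form here is an assumption, and the diagonal minor-axis variant of the test is omitted.

import numpy as np

def group_is_too_coarse(points, seg, thresh=0.5):
    # points: (k, 2) array of a cluster's pixels;
    # seg: the fitted segment's endpoints ((x0, y0), (x1, y1))
    (x0, y0), (x1, y1) = seg
    x_range = points[:, 0].max() - points[:, 0].min()
    y_range = points[:, 1].max() - points[:, 1].min()
    x_cover = abs(x1 - x0) / x_range if x_range > 0 else 1.0
    y_cover = abs(y1 - y0) / y_range if y_range > 0 else 1.0
    # Too coarse if the segment covers less than `thresh` of the
    # cluster's horizontal and vertical extents
    return x_cover < thresh and y_cover < thresh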
3.2.2. Ellipse extraction

To extract ellipses from the image, we use the medial axis segments previously found as a basis. Our main extraction uses parts of the algorithm in OpenCV's fitEllipse() method, which currently offers two algorithms; we use the one contributed by Weiss, whose original algorithm uses three singular value decomposition (SVD) calculations [6]:

$\begin{bmatrix} -x_0^2 & -y_0^2 & -x_0 y_0 & x_0 & y_0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ -x_i^2 & -y_i^2 & -x_i y_i & x_i & y_i \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ -x_n^2 & -y_n^2 & -x_n y_n & x_n & y_n \end{bmatrix} \begin{bmatrix} a' \\ b' \\ c' \\ d' \\ e' \end{bmatrix} = \begin{bmatrix} 10000 \\ \vdots \\ 10000 \end{bmatrix}$

Equation 2. Solves for (a', b', c', d', e'), a vector of values used later to calculate (a, b, c, d, e) in the general equation of the ellipse. $(x_i, y_i)$ is the ith coordinate pair in the given list of points around all of which the ellipse must fit.

$\begin{bmatrix} 2a' & c' \\ c' & 2b' \end{bmatrix} \begin{bmatrix} c_x \\ c_y \end{bmatrix} = \begin{bmatrix} d' \\ e' \end{bmatrix}$

Equation 3. Solves for $(c_x, c_y)$, the center point of the ellipse, using the vector (a', b', c', d', e') found.

$\begin{bmatrix} (x_0 - c_x)^2 & (y_0 - c_y)^2 & (x_0 - c_x)(y_0 - c_y) \\ \vdots & \vdots & \vdots \\ (x_n - c_x)^2 & (y_n - c_y)^2 & (x_n - c_x)(y_n - c_y) \end{bmatrix} \begin{bmatrix} a' \\ b' \\ c' \end{bmatrix} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$

Equation 4. Refits the parameters (a', b', c') around the center point $(c_x, c_y)$ found.
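For illustration, Equation 4 reduces to an ordinary least-squares problem once the center is fixed. The minimal NumPy sketch below is our recasting of that system, not the OpenCV internals:

import numpy as np

def fit_ellipse_fixed_center(points, cx, cy):
    # Solve Equation 4: rows [(x-cx)^2, (y-cy)^2, (x-cx)(y-cy)]
    # times (a', b', c') should equal 1 for every input point
    dx = points[:, 0] - cx
    dy = points[:, 1] - cy
    A = np.column_stack([dx ** 2, dy ** 2, dx * dy])
    coeffs, *_ = np.linalg.lstsq(A, np.ones(len(points)), rcond=None)
    ap, bp, cp = coeffs
    # Radii and angle then follow from the eigen-decomposition of
    # the quadratic form [[a', c'/2], [c'/2, b']]
    return ap, bp, cp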
In the usual case, we only use Equation 4 and its related post-calculations, because by this time in the process we already have a sense of where the center of the ellipse should be. Thus, instead of using Equations 2 and 3 to calculate the center of an ellipse that fits a given group, we simply use the midpoint of the group's medial axis as the center of the ellipse, and then use Equation 4 to calculate the radii and angle parameters.

The input list of points to the fitting algorithm consists of all the points on the group's medial axis and the nearby contour. The nearby contour points are obtained by first partitioning the set of all contour points, such that all points contained in the parallelogram formed by a group's medial axis are considered to be in the same group as that medial axis. After all possible contour points are distributed in this fashion, the remaining contour points, which fall in no parallelogram, are assigned to the closest group by distance, calculated between the contour point and each group's center point. For each group, these contour points are combined with the medial axis points and passed in as input for finding a fitting ellipse.

Occasionally, an ellipse found is elongated and relatively much larger than the other ellipses, due to unknown causes. In such cases, we call the original fitEllipse method, which calculates a new center from the points and uses all three equations. The original fitEllipse method, however, estimates extremely large ellipses for medial axes with slope 0, so we check for these and shrink them down to the branch's bounding box size.

We have tried using the original fitEllipse method alone, without using our medial axis midpoints as centers, and the resulting ellipses' rotations and elongations are not as well suited as with the current method; they tend to be more upright and more regular, with fewer transformations and less elongation, which means they approximate the shapes less accurately.

Another approach we tried was Hough circles, but the results were not suitable for our needs. The circles were not calculated based on a medial axis or a center point, but instead on a binary image of the insect, and the resulting circles do not seem to follow any salient feature of the insect. Moreover, circles are by nature not as flexible as ellipses.
Our ellipse extraction in detail:

// Group the contour points
loop for each group
{
    find contour points within the bounding box of the group;
    add these points to that group;
}
loop for each leftover contour point
{
    find the group closest to this contour point;
    add this point to that group;
}

// Now each group has an exclusive set of nearby contour
// points. Fit an ellipse for each group.
loop for each group
{
    // Using the midpoint of the medial axis as the center of the
    // ellipse, fit an ellipse around the medial axis & contour points
    fitEllipse_mod (group.getMedialAxisPts (),
                    group.getContourPts (),
                    group.getMedialAxisMidPt (),
                    group.getBoundingBox ());
}

fitEllipse_mod (medAxisPts, contPts, midPt, bbox)
{
    use Equation 4 to fit an ellipse around medAxisPts
    and contPts, with midPt being the center;
    if (ellipse size is more than 200% of bbox)
        // Refit using all three equations
        cvFitEllipse2 ();
    // When the fitted line has slope 0
    if (ellipse is still more than 200% of bbox)
        shrink_ellipse_to_boundingbox (ellipse, bbox);
    return ellipse;
}

// Now each group has an ellipse fitted; we are ready to use the
// ellipses to construct the hierarchy for the insect image.
3.2.3. Hierarchy formation
Using the ellipses found and their spatial relationships to each other, we construct a hierarchy whose root starts from approximately the center of mass of the insect and whose subsequent levels expand radially from each existing node; see Figures 4 and 5. The hierarchy is represented by pairs of a parent ellipse and a child ellipse.

Before determining the root, we use erosions and dilations to obtain a main blob of the image, excluding any noise, in this case the legs and antennae. This way, the center ellipse of the image can be more accurately determined, eliminating cases where an ellipse on the edge of the image is closer to the center of mass (CoM) due to noise. Thus, the root is usually found to be the ellipse closest to the center of the object.

A special case occurs when the CoM falls outside of the object, such as in hollow images, e.g. a curled insect, where the CoM falls inside the hole. When this happens, we use the exact pixels of the main blob, instead of the bounding box used in normal cases, to calculate the intersection with each ellipse. This addresses the problem of the bounding box being deceiving when it is drawn around an object with holes. In addition, we also set a more lenient threshold, 50%, on the percentage of overlap between the ellipse and the blob, rather than the 80% used in normal cases.

Figure 4. Step-by-step process showing the hierarchy formation from an insect image.
i. The root ellipse is found as the ellipse in which the center of mass (magenta dot) falls.
ii. Looping through the remaining ellipses, the first child ellipse, A, of the root is found within radius r of the root.
iii. The second child ellipse, B, of the root is found within radius r of the root.
iv. No ellipses are found within radius r of A, so the radius is increased to 1.5r to try again. Child C of ellipse A is found.
v. Child D of ellipse B is found within radius r of B.
vi. The hierarchy relationship among all five ellipses is obtained.
vii. In a tree structure, the root has two children, A and B, each of which has one child, C and D, respectively.

Algorithm for finding the root of the hierarchy:

get_CoM (image);
// Determine the root ellipse
if (CoM falls within exactly 1 ellipse)
    root = this ellipse;
else if (CoM falls into more than 1 ellipse)
    root = the one closest to CoM;
else // CoM falls into no ellipse
{
    // Calculate a blob representing the image's ballpark
    erodeAndDilate ();
    for each ellipse
        if (>= 80% of its bounding box is within the
            bounding box of the main blob)
            root_cand_list.add (this ellipse);
    root = get_ellp_closest_to_CoM (root_cand_list);
}

Algorithm for forming the remaining hierarchy:

// Find the remaining nodes of the hierarchy by a loop
// starting from the root
radius = find_average_radius (all_ellipses);
// Sort all ellipses by distance to the root ellipse, short to long
sorted_list = sort (all_ellipses);
curr_parent = root;
queue.add (root);
curr_radius = radius;
// Each iteration does 1 level of the hierarchy, until the queue
// is empty, i.e. all ellipses are in the hierarchy
loop
{
    // Ellipses within curr_radius of curr_parent are its children
    children = find_all_children (curr_parent, curr_radius);
    // If the parent is a leaf, retry with a larger radius
    // (for the root, increase until a child is found, unless there
    // is only 1 ellipse total; for other nodes, increase just once)
    if (children.size () == 0)
    {
        curr_radius = 1.5 * radius;
        continue;
    }
    queue.pop (curr_parent);
    if (thisLevelDone ())
        lvl_num++;
    if (queue.size () > 0)
        curr_parent = queue.at (0);
    else if (! allEllpsAreInHier ())
    {
        if (! increased_radius)
        {
            // Push all nodes in this level back into the queue
            repeatThisLevel (queue);
            curr_radius = 1.5 * radius;
            continue;
        }
        else
        {
            // Find an ellipse that is not in the hierarchy but whose
            // previous 3 neighbors in the sorted list (closer to the
            // root) are in the hierarchy
            orphan = find_orphan (sorted_list);
            // Its parent is found by taking the average parent level
            // of its previous 3 neighbors
            find_parent (orphan);
            curr_parent = queue.at (0);
        }
    }
    curr_radius = radius;
}
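Stripped of the radius-growing retries and orphan adoption, the core pairing step is a breadth-first traversal. The Python sketch below is a simplification under those assumptions, with each ellipse reduced to its center point:

import math
from collections import deque

def build_pairs(root, centers, radius):
    # centers: list of (x, y) ellipse centers, including root;
    # returns the (parent, child) pairs representing the hierarchy
    remaining = set(centers) - {root}
    pairs = []
    queue = deque([root])
    while queue and remaining:
        parent = queue.popleft()
        # Ellipses within `radius` of the parent become its children
        children = {c for c in remaining
                    if math.dist(parent, c) <= radius}
        for child in children:
            pairs.append((parent, child))
            queue.append(child)
        remaining -= children
    return pairs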
Figure 5. An insect image and its hierarchy extraction from the Amphin species. In the hierarchy image, the magenta dot marks the center of mass, and the dark blue ellipse marks the root. As the level increases, the ellipse color follows a gradient of blue, green, yellow, and red (in this case, the deeper levels were not reached). The line segments connect the centers of hierarchical pairs of ellipses.

3.2.4. Outputting descriptor

The ellipses obtained need to be output in matrix format to a text file in order to be passed to SVM training. For each image, we output all the ellipses to a text file, each row being an ellipse represented by a 5-tuple (u, v, a, b, c) satisfying the ellipse equation

$a(x - u)^2 + 2b(x - u)(y - v) + c(y - v)^2 = 1$

[12]. This file is used as input to a program whose binaries are available on the Oxford vision group's website [12]. After taking the file as input, the program performs an affine transform on each ellipse to obtain a circle, and then it computes a 128-column SIFT descriptor around each circle [5]. In other words, from this program we obtain a SIFT descriptor for each ellipse we found on the insect image.
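For reference, the geometric parameters of each stored ellipse can be recovered from its 5-tuple. The helper below is a hypothetical illustration of that mapping, not part of the published pipeline, and assumes a valid (positive-definite) ellipse:

import numpy as np

def conic_to_geometric(u, v, a, b, c):
    # The row (u, v, a, b, c) encodes
    # a(x-u)^2 + 2b(x-u)(y-v) + c(y-v)^2 = 1; the quadratic form
    # [[a, b], [b, c]] has eigenvalues 1/r^2 for each semi-axis r
    M = np.array([[a, b], [b, c]])
    eigvals, eigvecs = np.linalg.eigh(M)   # ascending eigenvalues
    semi_axes = 1.0 / np.sqrt(eigvals)     # major axis first
    angle = np.arctan2(eigvecs[1, 0], eigvecs[0, 0])
    return (u, v), semi_axes, angle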
We take this SIFT descriptor matrix and concatenate it with a matrix containing the pair-hierarchy information we constructed. The hierarchy properties we use include the center and SIFT descriptor of each ellipse and of its parent ellipse, the distance and midpoint between them, and the angle between them with respect to the child ellipse.

Using this format for the final descriptor, we pass the descriptor of each image to the existing classification framework to assess the effectiveness of this new shape descriptor. The framework is also capable of combining this descriptor with other existing appearance and shape descriptors to obtain the combined overall accuracy.

4. RESULTS & DISCUSSIONS

More than 4700 images are in the 29-species insect dataset. Each species contains between 27 and 328 images, averaging 163 images per species. Due to this inconsistency in the number of samples, the resulting accuracy for each species may be affected by the lack of samples, in addition to the descriptor's shortcomings and the machine learning frameworks' robustness.

4.1. Spatial Pyramid Kernel

Before the development of the new hierarchical descriptor, the spatial pyramid kernel was used on the three existing descriptors, as explained in section 3.1, to achieve a combined accuracy of 88.06%. Table 1 lists the results of using other SVM kernels on the stacked combination, all of which achieved lower accuracies than the spatial pyramid kernel.

Kernel     RBF      Hist Int   Pyramid
Accuracy   84.50%   87%        88.06%

Table 1. Accuracies obtained by the Radial Basis Function (RBF), histogram intersection, and spatial pyramid kernels.

Compared to the histogram intersection kernel, which achieved an accuracy of 87%, the spatial pyramid kernel reached 88%. This improvement is small but not accidental. On the 16-by-16 base grid we used, the pyramid preserved four additional levels of 8-by-8, 4-by-4, 2-by-2, and 1-by-1 cells, for a total of 5 levels of spatial information, while the histogram intersection kernel has only 1 level, the base. In simpler terms, the pyramid kernel captures information at a local level on the 16x16 base, but also preserves spatial arrangements at a more global level on the coarser layers. Therefore, the pyramid kernel can only obtain an accuracy at least equal to that of the histogram intersection kernel, if not higher. The coarser grids might have helped tolerate insect species that have large intra-class variations, where the 16-by-16 grid over-specifies the insect. Thus, even though the pyramid kernel has a slightly longer running time than the histogram intersection kernel, the cost is reasonable and worthwhile for the increase in accuracy. The gain may be larger still for the new hierarchical descriptor, which we have not yet tried with both kernels.

In earlier stages, we tried the sigmoid kernel; it produced accuracies similar to the pyramid kernel but was 3 to 4 times slower.

We have also experimented with 32x32, 8x8, and 4x4 base pyramids, whose results were much lower than with a 16x16 base; the 32x32 base pyramid is also several times slower to compute. The same held for attempts to use only some of the levels, such as using only the 32x32, 16x16, 8x8, and 4x4 levels in a 32x32 base pyramid, skipping the 2x2 and 1x1 levels.
4.2. Hierarchical Descriptor

Our latest developments are the ellipse extraction and the hierarchy construction. The purpose of constructing a hierarchy is to record an order of arrangement among the descriptors for the elliptical regions. Thus, we plan to concatenate each parent-child pair of SIFT descriptors in the final matrix, which currently contains only the child and its distance, angle, and midpoint relative to the parent, without the parent's actual SIFT descriptor.

We have conducted an initial test on the hierarchical descriptor alone. In the initial run, it took a total of about 8.5 hours to process and generate the hierarchical descriptor for all 4700+ images. The descriptor matrix is integrated into a random-trees local feature classifier in the current SVM framework [11] for training and testing. We obtained an accuracy of 66.16%, which shows ample room for improvement. See Figure 6 for the confusion matrix, Figure 7 for some results of the hierarchy construction, and Figure 8 for its current deficits.

In later stages, we will modify the algorithm and test this descriptor alone again, and then combine it with other existing descriptors [11] to boost the overall combined accuracy.

Figure 6. Confusion matrix for the initial run of the hierarchical descriptor, which averaged to 66%. Values are rounded to the nearest percentage of (predicted samples / total samples). Each row represents a species class, and each column is the predicted species; for each row, <row title> was predicted as <col title>.

Figure 7. (Best viewed in color) Hierarchy extraction for the 29 species in the dataset, one most representative image each. Images are presented in pairs, with the odd columns being the original images and the even columns the hierarchy images. In the hierarchy images, a magenta dot indicates the center of mass; dark blue indicates the root ellipse; subsequent levels follow the order of blue, green, yellow, red. Line segments connect the centers of parent and child ellipses in a hierarchical pair. As can be seen, some species are very similar to others in shape and/or appearance.

Figure 8. Shortcomings of the algorithm include an occasionally wrong root, no clear route of hierarchy, and intra-class inconsistencies.
5. CONCLUSION & FUTURE WORK

We have developed a spatial pyramid kernel and combined it with a new stacking method that achieved an overall accuracy of 88.06%, outperforming other classifiers [11, 14, 8].

In addition, a new hierarchical descriptor based on pairs of ellipses calculated from medial axis line segments extracted from the image has achieved an initial accuracy of 66.16%, even though the pairing information was not yet fully used. Improvements are underway.

As mentioned in section 3.1, we have not tried using a posterior probabilities matrix instead of a binary matrix in traditional stacking. Instead, we are taking a new stacking approach, as described in [11]. However, if one were to use traditional stacking, a matrix with probability values for each class instead of a binary one may offer differences in the results [4].

In the near future, we propose to improve the hierarchical descriptor by increasing the number of ellipses and thus medial axes, modifying the hierarchy construction to avoid shortfalls such as those shown in Figure 8, and possibly modifying the medial axes extraction. Depending on how much each of these improves the accuracy, we may combine two or more of them into the final descriptor. Then, after these attempts, we will combine this shape descriptor with other existing descriptors in the framework to see how much this new descriptor can boost the overall stacked accuracy. Ultimately, we hope to achieve an accuracy of 90% or beyond.
6. ACKNOWLEDGEMENTS

We would especially like to acknowledge the excellent mentorship of Natalia Larios and Dr. Linda Shapiro at the University of Washington, both of whom actively helped to push the progress along and kindly made valuable contributions and suggestions. Appreciation also goes to the program coordinators and sponsors of the Distributed Research Experiences for Undergraduates (DREU), the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W), and the Computer Science and Engineering Department at the University of Washington. Accomplishments presented thus far and future progress would not be possible without the support of every party listed.
7. REFERENCES

[1] A. Bosch, A. Zisserman, and X. Munoz. Representing Shape with a Spatial Pyramid Kernel. CIVR 2007.
[2] A. Etemadi. Object Recognition Toolkit. http://www.cs.washington.edu/education/courses/cse576/10sp/software/index.html
[3] A. Y.-S. Chia, S. Rahardja, D. Rajan, and M. K. Leung. Object Recognition by Discriminative Combinations of Line Segments and Ellipses. CVPR 2010.
[4] D. A. Lisin, M. A. Mattar, M. B. Blaschko, M. C. Benfield, and E. G. Learned-Miller. Combining Local and Global Image Features for Object Class Recognition.
[5] D. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV 60(2):91-110, 2004.
[6] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, Volume 25, Number 11, November 2000, pp. 120, 122-125. http://opencv.willowgarage.com/wiki/
[7] H. Blum. A Transformation for Extracting New Descriptors of Shape. Models for the Perception of Speech and Visual Form, 1967.
[8] K. Grauman and T. Darrell. Pyramid Match Kernels: Discriminative Classification with Sets of Image Features. In Proc. ICCV, 2005.
[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1, 2009.
[10] N. Larios, B. Soran, L. G. Shapiro, G. M. Munoz, J. Lin, and T. G. Dietterich. Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification. In ICPR, 2010.
[11] N. Larios, J. Lin, M. Zhang, L. G. Shapiro, and T. G. Dietterich. Stacked Spatial-Pyramid Kernel: An Object-Class Recognition Method to Combine Scores from Random Trees. Pending, WACV 2010.
[12] Region Descriptors Linux binaries. Visual Geometry Group, University of Oxford. http://www.robots.ox.ac.uk/~vgg/research/affine/index.html
[13] R. O. Duda and P. E. Hart. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Comm. ACM, Vol. 15, pp. 11-15, January 1972.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.