SVM Spatial Pyramid Kernel with Lines and Ellipses Feature Hierarchy for Autonomous Classification of Similar Insect Species

(Mabel) Mengzi Zhang
Department of Computer Science & Engineering
University of California, San Diego
[email protected]

Figure 1. Process of the hierarchical shape descriptor's construction, from horizontally-oriented insect image (left-most) to medial axis extraction (second from left), ellipse extraction (third from left), and hierarchy construction (right-most).

ABSTRACT

This project focuses on automating the classification of 4700+ photographs from 29 insect species, developing ideas in computer vision and machine learning. Autonomous categorization aims to reduce the manual cost and the number of mistakes even when compared to experts; therefore, to be competitive, improving accuracy is important. The challenges to accuracy in the 29-category dataset lie in the wide intra-class variations, which impose the need for robust appearance and shape feature descriptors. We overcome these difficulties by stacking both kinds of descriptors while preserving spatial information with a spatial pyramid kernel for the support vector machine (SVM). This kernel achieved 88% overall accuracy, compared to 87% with a histogram intersection kernel and 84.5% (with a longer running time) with an RBF kernel. On the vision side, ongoing work attempts to produce a new shape descriptor by extracting medial axis line segments and ellipses from the image to form a paired hierarchy that describes relative shape, location, and angles in neighboring parts of an insect's body. The initial run of this descriptor alone obtained an accuracy of 66%. We hope to achieve even higher accuracies by improving the algorithm and combining this descriptor with other existing ones as described in [11].

1. INTRODUCTION

Species is one of the three diversities measured in biodiversity, the other two being ecosystem and genetic diversities. Biodiversity loss is closely related to extinction rates and ecosystem services, including air and water quality and climate, which greatly affect human health. Autonomously classifying insects is significant because it can be adapted to counting species specimens, which can be statistically analyzed to perform biomonitoring. This helps to efficiently investigate biodiversity and the interaction between human activity and the environment.

As this is an ongoing project, our current goal is to achieve an accuracy of 90% or higher. Our highest accuracy so far is 88%, and with the continuing development of the hierarchical shape descriptor, we hope to combine a more mature version of it with the three existing descriptors that produced the 88%, so that the overall accuracy can be boosted to over 90%. We explain the methods used for the machine learning and vision aspects separately in sections 3.1 and 3.2, respectively. The results for the pyramid kernel and the initial results for the shape descriptor are presented in sections 4.1 and 4.2, respectively. As further background, previous related work is presented in section 2. Finally, in section 5, we briefly discuss the plan for further development in the near future.

2. RELATED WORK

A fair amount of exploration has been done on the spatial pyramid kernel. Since the kernel was presented by Grauman and Darrell, it has been applied to datasets such as Caltech-101, Graz, and natural scene photographs by Lazebnik, Schmid, and Ponce [8, 14]. Even more interesting developments evolved from the original pyramid kernel, including an approach from Bosch, Zisserman, and Munoz, whose algorithm learns the weight parameters in the kernel rather than assigning the weights according to the level number of the pyramid; in addition, their algorithm learns to choose between global and class-specific kernels [1]. Their approach was also tested on Caltech-101 and TRECVID.
In our experiments, we chose an implementation of the spatial pyramid kernel similar to that of Lazebnik et al. [14], but used in a novel way: the spatial pyramid kernel serves as a base for a new type of stacking, which in combination preserves information at both global and local scopes, as captured by multiple shape and appearance descriptors [11].

For shape descriptors, lines and ellipses are not a new combination. They have been tested and verified to produce sound results in classification and recognition, as shown in experiments such as those of Chia, Rahardja, Rajan, and Leung [3]. Beyond the choice of these two primitive shapes, however, the way our descriptor is formed is fundamentally different. Instead of using line segments to describe shape boundaries in complement to ellipses as in [3], we first extract lines to represent the medial axes and the major axes of the ellipses, and then extract the ellipses separately later in the process, using the line segments as a basis.

On the more general topic of insect classification, previous research has been done on a range of insects using various automation techniques, such as in [10] and additional works mentioned in [11].

A histogram intersection kernel's main difference from a pyramid kernel is that it is simply the finest base level of the pyramid. Depending on the number of levels in the pyramid, the pyramid kernel preserves at least as much spatial information as the histogram kernel (for number of levels = 1), and more as the number of levels increases. For each new level, the four values in each box of non-overlapping two-by-two cells in the previous level are summed to obtain the value for the corresponding single cell in the new level; see Figure 2.

Figure 2. Graphical representation of the levels in a 3-level pyramid whose base level is 4 by 4 cells. At each new level, the sum of each box of 2x2 cells in the previous level forms the value for one corresponding cell in the new level.
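The 2x2 summation that builds each coarser level can be written in a few lines. The following is an illustrative sketch (the function name and toy data are hypothetical, not the project's implementation):

```python
def build_pyramid(base, levels):
    """Build coarser pyramid levels by summing each non-overlapping
    2x2 box of cells into one cell of the next level (sketch)."""
    pyramid = [base]
    for _ in range(levels - 1):
        prev = pyramid[-1]
        size = len(prev) // 2
        # Sum each non-overlapping 2x2 box into a single cell
        coarser = [[prev[2 * r][2 * c] + prev[2 * r][2 * c + 1]
                    + prev[2 * r + 1][2 * c] + prev[2 * r + 1][2 * c + 1]
                    for c in range(size)] for r in range(size)]
        pyramid.append(coarser)
    return pyramid

base = [[1] * 4 for _ in range(4)]   # a 4-by-4 base level of cell counts
levels = build_pyramid(base, 3)      # levels of size 4x4, 2x2, 1x1
```

With a base of all ones, each 2x2 cell at the middle level holds 4 and the single top cell holds 16, matching the summation in Figure 2.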
3. METHOD

The main part of the development consists of two aspects: vision and machine learning. We first discuss the machine learning aspect, as it is fundamental to the pyramid kernel we used to train the classification model. We then present the vision aspect, where we have developed a hierarchical shape descriptor based on lines and ellipses extracted from the insect images. This descriptor is still ongoing work to be improved.

3.1. Machine Learning: Pyramid Kernel

The pyramid kernel, as brought forth by [8] and [14], is adapted to construct descriptors for our database. We extend [14] to contain 29 channels, one channel for each insect species. The kernel was used on three existing descriptors, and again with a new method of stacking on the combined descriptor. The three descriptors were Histogram of Oriented Gradients (HOG) local features, salient points of high curvature with a beam angle, and Scale-Invariant Feature Transform (SIFT) descriptors [11, 5].

As the level increases, the coarser grid tolerates more changes in the local area, making the descriptor more robust to differences in orientation and degree of bending in the insect. A histogram intersection kernel does not provide this flexibility, because it is equivalent to the base level only. We calculate the pyramid kernel's values using an equation adapted from those presented in [8, 14]. Instead of summing the channels last, as in equation (4) of [14], we include all channels in each cell of the pyramid. Thus, the intersection of two samples X, Y sums over the number of channels M in addition to the number of cells D:

\[ I\big(H_X^l, H_Y^l\big) = \sum_{i=1}^{D} \sum_{m=1}^{M} \min\big(H_X^l(i, m),\; H_Y^l(i, m)\big) \]

Equation 1. Calculation of the intersection of the histograms of examples X and Y, where channels are summed within each cell, followed by summing over the cells. m is the current channel out of M = 29 channels, and i is the current cell out of D total cells. H_X^l denotes the histogram of example X at level l.
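Equation 1 translates directly into code. The sketch below is illustrative only (toy histogram sizes, hypothetical function name), but the double sum over cells and channels is the same computation:

```python
def histogram_intersection(hist_x, hist_y):
    """Equation 1 as code: sum, over all D cells and M channels, of the
    element-wise minimum of two level-l histograms (illustrative sketch).
    hist_x, hist_y: D-element lists of M-element channel lists."""
    return sum(min(x, y)
               for cell_x, cell_y in zip(hist_x, hist_y)
               for x, y in zip(cell_x, cell_y))

# Toy example with D = 2 cells and M = 3 channels
hx = [[2, 0, 1], [1, 1, 0]]
hy = [[1, 1, 1], [2, 0, 0]]
score = histogram_intersection(hx, hy)
```

Here the first cell contributes min(2,1) + min(0,1) + min(1,1) = 2 and the second contributes 1, so the intersection is 3.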
Initially, we used a histogram intersection kernel. To use the descriptors produced by the pyramid kernel, since we have multiple separate feature descriptors, we combine them by stacking, and then we feed the concatenated result to the SVM again to obtain the final overall accuracy. Our stacking is done slightly differently from traditional stacking: instead of simply concatenating the result matrices of two feature descriptors and still having 29 classes, we consider the first descriptor's results as 29 classes and the second's results as a second set of 29 classes, combining into a total of 58 classes [11, 4]. This concatenation is performed in the class-channel dimension, as opposed to the cell dimension, thus retaining the spatial arrangement of each cell in the two pyramids. These 58 classes are used to obtain the final combined accuracy.

With traditional stacking, where the matrices are simply concatenated and the number of classes undergoes no special change, we experimented by converting each descriptor's 1-29 result into an n x 29 binary matrix, where n is the number of insect images, and then concatenating the two descriptors' binary matrices to form the input. We tried feeding this input to various learning models in OpenCV and Weka, including decision trees, random forest, SVM, and others [6, 9]. The resulting accuracies from all these models were about the same as the higher accuracy of the two original descriptors, with no significant increase. A possible experiment we have not tried is to use a posterior probabilities matrix to represent all the original descriptors' results, as presented in [4]. In our original attempt, we only used probabilities in the stacked histograms, which accumulated local feature classification scores; elsewhere, we used binary class labels.
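The class-channel concatenation can be sketched as follows. This is an illustrative sketch with hypothetical names and toy shapes, not the project's code; the point is that each cell keeps its spatial position while its class channels double from 29 to 58:

```python
def stack_class_channels(desc_a, desc_b):
    """Concatenate two per-cell class histograms along the class-channel
    dimension: each cell's 29 values from descriptor A are followed by the
    29 values from descriptor B, giving 58 'classes' per cell while the
    cell (spatial) layout is untouched. Illustrative sketch."""
    return [row_a + row_b for row_a, row_b in zip(desc_a, desc_b)]

# Toy shapes: 4 cells, 29 class channels per descriptor
desc_a = [[0.0] * 29 for _ in range(4)]
desc_b = [[1.0] * 29 for _ in range(4)]
stacked = stack_class_channels(desc_a, desc_b)   # 4 cells x 58 channels
```

Contrast this with traditional stacking, which would concatenate whole result matrices and leave the number of classes at 29.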
3.2. Vision: Lines and Ellipses Hierarchical Descriptor

In the 29-class dataset, wide intra-class variations make it challenging to generate robust descriptors from the insect images. Many species have images of insects at different developmental stages, including cocoons, which are shaped like a simple polygon, very different from an actual insect; infants, which are much smaller, usually oval-like, and have no wings or legs; and adults, which have full-length legs and sometimes wings. Even when most images in a species are of adult insects, the insects differ in degree of bending, orientation, and position. Some images do not even capture the entire insect but only a segment of the insect's body. These variations make it more difficult to construct a descriptor based on shape. The descriptor must represent coherence within a given species and as much distinction as possible among different species. We chose ellipses to represent the nodes in the hierarchy because of their flexibility to fit well into many types of irregular shapes.

In order to extract ellipses that fit well to an object's shape structure, we use the medial axis, originally referred to as the "topological skeleton" in [7], to form the major axis of the ellipses. Using ellipses found from medial axis line segments and from contour points extracted from an image, we have developed a hierarchical shape descriptor. First, we extract a set of medial axes from the image and divide them into groups that roughly correspond to parts of the insect body: head, legs, abdomen, etc. Each group represents a small part of the insect, and when all groups are combined, they approximate a skeleton-like medial axis tree running through the insect. Using the points on both the axis and the contours around each axis, we fit an ellipse that roughly circumscribes this set of points, representing the holistic shape of the given group. After the ellipses are found for all groups, we construct a tree hierarchy whose root is the ellipse closest to the insect's center of mass, and we form subsequent tree levels by taking the ellipses within a certain radius of each existing node as that node's children. This process repeats until all ellipses are in the hierarchy. Each node and each of its newly found children form a parent-child pair, and all pairs combined represent the entire tree's structure. Properties of each ellipse and of its paired parent in the hierarchy are recorded in a file that we use as the final descriptor to feed to our SVM framework. See Figures 1 and 3 for graphical representations of each stage. In this section, we describe the feature extraction and hierarchy formation in further detail.

Figure 3. Intermediate image representations of the processing stages, arranged from top to bottom.
i. Original erected insect image.
ii. Extract medial axes in the body.
iii. Extract ellipses, each fitted to a medial axis and its nearby contours.
iv. Start building the hierarchy by obtaining a main body blob to filter out a possible region for candidate root ellipses.
v. Build a paired hierarchy using the ellipses. The hierarchy is rooted at the center of mass and expands outwards.

3.2.1. Medial axes extraction

To extract the medial axes in each image, we use OpenCV to obtain an initial set of medial axis pixels, which represents the object's overall shape structure, in this case a skeleton-like representation of the insect's body [6]. Then, using the relative distances among the pixels of the medial axes, we cluster these pixels and fit a line segment through each cluster, so that the object's shape structure can be represented mathematically. In the insect dataset, the line segments are portions of the insect's different body parts, such as antennae, torso, abdomen, and legs, where each body part is usually represented by multiple small segments.
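The distance-threshold grouping of medial axis pixels can be sketched as a greedy single pass. This is a simplified illustration with hypothetical names, not the project's OpenCV-based implementation:

```python
import math

def cluster_points(points, thres_dist):
    """Greedy single-pass clustering sketch: a point joins the first
    existing group that contains a point within thres_dist of it;
    otherwise it starts a new group."""
    groups = []
    for p in points:
        for group in groups:
            # Join the group if any member is close enough
            if any(math.dist(p, q) < thres_dist for q in group):
                group.append(p)
                break
        else:
            groups.append([p])   # no group was close: start a new one
    return groups

pts = [(0, 0), (1, 0), (2, 0), (10, 10), (11, 10)]
groups = cluster_points(pts, thres_dist=2.0)
```

With this threshold, the three collinear points form one group and the two distant points form another; a real skeleton yields one group per body-part fragment.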
After the initial clustering, if a sanity check finds that a cluster of pixels is too coarse to be represented by a line, the Object Recognition Toolkit (ORT) [2] is used to subdivide that cluster, and new line segments are refitted to the newly formed finer clusters. A cluster is considered too coarse if it is too wide for a single line segment to describe, i.e. if the line fitted through it covers less than a threshold, currently set at 0.5, of the horizontal and vertical ranges of the cluster, or if the cluster is too wide diagonally for a single line to cover accurately, determined by testing the minor axis length of the ellipse fitted to the cluster. We describe our algorithm in detail below.

1. Preprocessing:

   // Smooth image
   bin_img = get_binary_image (image);
   cvErode (bin_img);
   cvDilate (bin_img);
   // Filter out a rough shape of the medial axis
   cvDistance_transform (bin_img);
   cvLaplace_transform (bin_img);
   cvThresholding (bin_img);

2. Find line segments:

   // Find line segments in the medial axis image [13]
   cvHough_lines (bin_img);
   med_axis = get_foreground_pts (bin_img);

3. Cluster the medial axis pixels into groups:

   set_thres_dist (THRES_DIST);
   set_thres_pts (THRES_PTS);
   groups = init_new_sequence ();
   // Loop through all points on the medial axes
   for each ith point on med_axis {
     group = init_new_group ();
     groups.push (group);
     // Put the first point in the pool into this group
     if (i == 0)
       group.push (med_axis.at (i));
     else
       for each jth point in group {
         // If the new point is close enough to a point in the
         // group, add the new point to the group.
         dist = get_distance (med_axis.at (i), group.at (j));
         if (dist < THRES_DIST) {
           group.push (med_axis.at (i));
           break;
         }
       }
   }
   // Discard groups with too few points
   for each ith group
     if (groups.at (i).getNumPoints () < THRES_PTS)
       groups.remove (i);

4. Fit a line through each group:

   for ith group in groups sequence {
     line = cvFitLine (group);
     // Cut the line into a segment that spans the group's exact boundaries
     seg = line2segment (line);
     if (groupIsTooCoarse (group, seg)) {
       ORT_reprocess (group, strings);
       regroupAndRefitLines (strings);
     }
   }

5. ORT operations we used [2]:

   ORT_reprocess (group, strings) {
     // Make all foreground points 8-connected
     clean (group);
     // Extract connected strings with open ends
     strings.push (link_open (group));
     // Extract connected strings with closed ends, like in a hoop
     strings.push (link_close (group));
   }

3.2.2. Ellipse extraction

To extract ellipses from the image, we use the medial axis segments previously found as a basis. Our main extraction uses parts of an algorithm in OpenCV's fitEllipse() method, which currently contains two algorithms; we use the one contributed by Weiss, whose original algorithm uses three singular value decomposition (SVD) calculations [6]:

\[ \begin{pmatrix} -x_0^2 & -y_0^2 & -x_0 y_0 & x_0 & y_0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ -x_i^2 & -y_i^2 & -x_i y_i & x_i & y_i \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ -x_n^2 & -y_n^2 & -x_n y_n & x_n & y_n \end{pmatrix} \begin{pmatrix} a' \\ b' \\ c' \\ d' \\ e' \end{pmatrix} = \begin{pmatrix} 10000 \\ \vdots \\ 10000 \end{pmatrix} \]

Equation 2. Solves for (a', b', c', d', e'), a vector of values used later to calculate (a, b, c, d, e) in the general equation of the ellipse. (x_i, y_i) is the ith coordinate pair in the given list of points around all of which the ellipse must fit.

\[ \begin{pmatrix} 2a' & c' \\ c' & 2b' \end{pmatrix} \begin{pmatrix} c_x \\ c_y \end{pmatrix} = \begin{pmatrix} d' \\ e' \end{pmatrix} \]

Equation 3. Solves for (c_x, c_y), the center point of the ellipse, using the vector (a', b', c', d', e') found.

\[ \begin{pmatrix} (x_0-c_x)^2 & (y_0-c_y)^2 & (x_0-c_x)(y_0-c_y) \\ \vdots & \vdots & \vdots \\ (x_n-c_x)^2 & (y_n-c_y)^2 & (x_n-c_x)(y_n-c_y) \end{pmatrix} \begin{pmatrix} a' \\ b' \\ c' \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \]

Equation 4. Refits the parameters (a', b', c') around the center point (c_x, c_y) found.

Our ellipse extraction in detail:

   // Group the contour points:
   loop for each group
     find contour points within the bounding box of the group
     add these points to that group
   loop for each of the left-over contour points
     find the group closest to this contour point
     add this point to that group
   // Now each group has an exclusive set of nearby contour
   // points. Fit an ellipse for each group.
   loop for each group {
     // Using the midpoint of the medial axis as the center of the ellipse,
     // fit an ellipse around the medial axis & contour points
     fitEllipse_mod (group.getMedialAxisPts (), group.getContourPts (),
                     group.getMedialAxisMidPt (), group.getBoundingBox ());
   }

In the usual case, we use only Equation 4 and the related post-calculations, because by this point in the process we already have a sense of where the center of the ellipse should be. Thus, instead of using Equations 2 and 3 to calculate the center of an ellipse that fits a given group, we simply use the midpoint of the group's medial axis as the center of the ellipse, and then use Equation 4 to calculate the radii and angle parameters. The input list of points to the fitting algorithm consists of all the points on the group's medial axis and the nearby contour. The nearby contour points are obtained by first separating the set of all contour points, such that all points contained in the parallelogram formed by a group's medial axis are considered to be in the same group as that medial axis. After all the possible contour points are distributed in this fashion, the remaining contour points not in any parallelogram are assigned to the closest group by distance, calculated between the contour point and each group's center point. For each group, these contour points are combined with the medial axis points and passed in as input for finding a fitting ellipse.

Occasionally, an ellipse found is elongated and relatively much larger than the other ellipses, due to unknown causes. In such cases, we call the original fitEllipse method, which calculates a new center from the points and uses all three equations. The original fitEllipse method, however, estimates extremely large ellipses for medial axes with slope 0, so we check for these and shrink them down to the branch's bounding box size.
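The fixed-center fit of Equation 4 is an ordinary least-squares problem in the three unknowns (a', b', c'). The following stdlib-only sketch (hypothetical function name, toy data) solves the 3x3 normal equations; it illustrates the idea rather than reproducing the SVD-based OpenCV code:

```python
import math

def fit_ellipse_fixed_center(points, center):
    """Least-squares fit of (a', b', c') in
    a'(x-cx)^2 + b'(y-cy)^2 + c'(x-cx)(y-cy) = 1
    with the center fixed, in the spirit of Equation 4 (sketch).
    Solves the 3x3 normal equations by Gaussian elimination."""
    cx, cy = center
    rows = [((x - cx) ** 2, (y - cy) ** 2, (x - cx) * (y - cy))
            for x, y in points]
    # Normal equations: (R^T R) coeffs = R^T 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] for r in rows) for i in range(3)]
    m = [ata[i] + [atb[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    coeffs = [0.0] * 3
    for i in (2, 1, 0):
        coeffs[i] = (m[i][3] - sum(m[i][j] * coeffs[j]
                                   for j in range(i + 1, 3))) / m[i][i]
    return coeffs  # (a', b', c')

# Points sampled from an axis-aligned ellipse with semi-axes 3 and 2
pts = [(3 * math.cos(t), 2 * math.sin(t))
       for t in (2 * math.pi * k / 40 for k in range(40))]
a, b, c = fit_ellipse_fixed_center(pts, (0.0, 0.0))
```

For this synthetic input the fit recovers a' = 1/9, b' = 1/4, c' = 0, i.e. the ellipse x^2/9 + y^2/4 = 1.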
We have tried using the original fitEllipse method alone, without using our medial axis midpoints as centers, and the resulting ellipses' rotations and elongations are not as good as with the current method; they tend to be more upright and more regular, with fewer rotations and less elongation, which means they approximate the shapes less accurately. Another approach we tried was Hough circles, but the results were not suitable for our needs: the circles were calculated not from a medial axis or a center point but from a binary image of the insect, and the resulting circles did not seem to follow any salient feature of the insect. Moreover, circles are by nature not as flexible as ellipses.

void fitEllipse_mod (medAxisPts, contPts, midPt, bbox)
{
  use Equation 4 to fit an ellipse around medAxisPts and contPts,
    with midPt being the center;
  if (ellipse size is more than 200% of bbox)
    // Refit using all three equations
    cvFitEllipse2 ();
  // When the fitted line has slope 0
  if (ellipse size is more than 200% of bbox)
    shrink_ellipse_to_boundingbox (ellipse, bbox);
  return ellipse;
}
// Now each group has an ellipse fitted; we are ready to use the ellipses
// to construct the hierarchy for the insect image.

3.2.3. Hierarchy formation

Using the ellipses found and their spatial relationships to each other, we construct a hierarchy whose root starts from approximately the center of mass of the insect and whose subsequent levels expand radially from each existing node; see Figures 4 and 5. The hierarchy is represented by pairs of a parent ellipse and a child ellipse. Before determining the root, we use erosions and dilations to obtain a main blob of the image, excluding any noise, in this case the legs and antennae. This way, the center ellipse of the image can be determined more accurately, eliminating cases where an ellipse on the edge of the image is closer to the center of mass (CoM) due to noise.
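The radial, level-by-level expansion can be sketched as a breadth-first construction over the ellipse centers. This simplified sketch (hypothetical names, toy coordinates) omits the radius-growing and orphan-adoption refinements of the full algorithm:

```python
from math import dist

def build_hierarchy(centers, root_idx, radius):
    """Simplified sketch of the radius-based hierarchy construction:
    breadth-first from the root, each unassigned ellipse center within
    `radius` of a node becomes that node's child."""
    pairs = []                 # (parent index, child index)
    in_hier = {root_idx}
    queue = [root_idx]
    while queue:
        parent = queue.pop(0)
        for i, c in enumerate(centers):
            if i not in in_hier and dist(centers[parent], c) <= radius:
                pairs.append((parent, i))
                in_hier.add(i)
                queue.append(i)
    return pairs

centers = [(0, 0), (1, 0), (0, 1), (2, 0)]
pairs = build_hierarchy(centers, root_idx=0, radius=1.2)
```

Here the root adopts its two nearest neighbors, and the remaining center is adopted one level deeper, yielding the parent-child pairs [(0, 1), (0, 2), (1, 3)].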
Thus, the root is usually the ellipse closest to the center of the object. A special case occurs when the CoM falls outside of the object, such as in hollow images, e.g. a curled insect, where the CoM falls inside the hole. When this happens, we use the exact pixels of the main blob, instead of the bounding box used in normal cases, to calculate the intersection with each ellipse. This addresses the problem of the bounding box being deceiving when it is drawn around an object with holes. In addition, we set a more lenient threshold, 50%, on the percentage of overlap between the ellipse and the blob, rather than the 80% used in normal cases.

Figure 4. Step-by-step process showing the hierarchy formation from an insect image.
i. The root ellipse is found as the ellipse in which the center of mass (magenta dot) falls.
ii. Looping through the remaining ellipses, the first child ellipse, A, of the root is found within radius r of the root.
iii. The second child ellipse, B, of the root is found within radius r of the root.
iv. No ellipses are found within radius r of A, so the radius is increased to 1.5r to try again, and child C of ellipse A is found. (For the root, the radius is increased until a child is found, unless there is only 1 ellipse total; for other nodes, it is increased just once.)
v. Child D of ellipse B is found within radius r of B.
vi. The hierarchy relationship among all five ellipses is obtained.
vii. In a tree structure, the root has two children, A and B, each of which has one child, C and D, respectively.

Algorithm for finding the root of the hierarchy:

   get_CoM (image);
   // Determine root ellipse
   if (CoM falls within exactly 1 ellipse)
     root = this ellipse;
   else if (CoM falls into more than 1 ellipse)
     root = the one closest to CoM;
   else  // CoM falls into no ellipses
   {
     // Calculate a blob representing the image's ballpark
     erodeAndDilate ();
     for each ellipse
       if (>= 80% of its bounding box is within the bounding box of the main blob)
         root_cand_list.add (this ellipse);
     root = get_ellp_closest_to_CoM (root_cand_list);
   }

Algorithm for forming the remaining hierarchy:

   // Find the remaining nodes of the hierarchy by a recursive loop
   // starting from the root
   radius = find_average_radius (all_ellipses);
   // Sort all ellipses by distance to root ellipse, short to long
   sorted_list = sort (all_ellipses);
   curr_parent = root;
   queue.add (root);
   curr_radius = radius;
   // Each iteration does 1 level of hierarchy, until the queue is empty,
   // i.e. all ellipses are in the hierarchy
   loop {
     // Ellipses within curr_radius of curr_parent are its children
     children = find_all_children (curr_parent, curr_radius);
     // If parent is a leaf
     if (children.size () == 0) {
       curr_radius = 1.5 * radius;
       continue;
     }
     queue.pop (curr_parent);
     if (thisLevelDone ())
       lvl_num++;
     if (queue.size () > 0)
       curr_parent = queue.at (0);
     else if (! allEllpsAreInHier ()) {
       if (! increased_radius) {
         // Push all nodes in this level back into the queue
         repeatThisLevel (queue);
         curr_radius = 1.5 * radius;
         continue;
       } else {
         // Find an ellipse that is not in the hierarchy but whose previous
         // 3 neighbors in the list (closer to the root) are in the hierarchy
         orphan = find_orphan (sorted_list);
         // Its parent is found by taking the average parent level of its
         // previous 3 neighbors.
         find_parent (orphan);
         curr_parent = queue.at (0);
       }
     }
     curr_radius = radius;
   }

4.1. Spatial Pyramid Kernel

Before the development of the new hierarchical descriptor, the spatial pyramid kernel was used on the three existing descriptors explained in section 3.1 to achieve a combined accuracy of 88.06%. Table 1 lists the results of using other SVM kernels on the stacked combination, all of which achieved lower accuracies than the spatial pyramid kernel.

Figure 5. An insect image and its hierarchy extraction from the Amphin species.
In the hierarchy image, the magenta dot marks the center of mass, and the dark blue ellipse marks the root. As the level increases, the ellipse color follows a gradient of blue, green, yellow, and red (in this case, the deeper levels were not reached). The line segments connect the centers of hierarchical pairs of ellipses.

Kernel     | RBF    | Hist Int | Pyramid
Accuracy   | 84.50% | 87%      | 88.06%

Table 1. Accuracies obtained by the Radial Basis Function (RBF), histogram intersection, and spatial pyramid kernels.

Compared to the histogram intersection kernel, which achieved an accuracy of 87%, the spatial pyramid kernel achieved 88%. This improvement is small but not accidental. On the 16-by-16 grid base we used, the pyramid preserved three additional levels of 4-by-4, 2-by-2, and 1-by-1 cells, for a total of 4 levels of spatial information, while the histogram intersection kernel has only 1 level, the base. In simpler terms, the pyramid kernel captures information at a local level in the 16x16 base, but also preserves spatial arrangements at a more global level in the 4x4 and 2x2 layers. Therefore, the pyramid kernel can only obtain an accuracy at least as high as that of the histogram intersection kernel. The coarser 4-by-4 and 2-by-2 grids may have helped tolerate insect species with large intra-class variations, where the 16-by-16 grid over-specifies the insect. Thus, even though the pyramid kernel has a slightly longer running time than the histogram intersection kernel, the increase in accuracy makes it worthwhile, and the gain may grow further for the new hierarchical descriptor, which we have not yet tried with both kernels. In earlier stages, we tried the sigmoid kernel; it produced accuracies similar to the pyramid kernel but was 3 to 4 times slower.
We have also experimented with 32x32, 8x8, and 4x4 base pyramids, whose results were much lower than those of the 16x16 one; the 32x32 base pyramid is also several times slower to compute. The same held for attempts to use only some of the levels, such as using only the 32x32, 16x16, 8x8, and 4x4 levels of a 32x32 base pyramid, skipping the 2x2 and 1x1 levels.

3.2.4. Outputting descriptor

The ellipses obtained need to be output in matrix format in a text file in order to be passed to SVM training. For each image, we output all the ellipses to a text file, each row being an ellipse represented by a 5-tuple (u, v, a, b, c) such that the ellipse equation

\[ a(x-u)^2 + 2b(x-u)(y-v) + c(y-v)^2 = 1 \]

is satisfied [12]. This file is used as input to a program whose binaries are available on the Oxford vision group's website [12]. After taking the file as input, the program performs an affine transform on each ellipse to obtain a circle, and then computes a 128-column SIFT descriptor around each circle [5]. In other words, from this program we obtain a SIFT descriptor for each ellipse we found on the insect image. We take this SIFT descriptor matrix and concatenate it with a matrix containing the pair-hierarchy information we constructed. The hierarchy properties we used include: the center and SIFT descriptor of each ellipse and of its parent ellipse, the distance and midpoint between them, and the angle between them with respect to the child ellipse. Using this format for the final descriptor, we pass the descriptor of each image to the existing classification framework to assess the effectiveness of this new shape descriptor. The framework is also capable of combining this descriptor with other existing appearance and shape descriptors to obtain the combined overall accuracy.

4. RESULTS & DISCUSSIONS

More than 4700 images are in the 29-species insect dataset. Each species contains between 27 and 328 images, averaging 163 images per species. Due to this inconsistency in the number of samples, the resulting accuracy for each species may be affected by the lack of samples, in addition to the descriptor's shortcomings and the machine learning framework's robustness.
4.2. Hierarchical Descriptor

We have conducted an initial test on the hierarchical descriptor alone. In the initial run, it took a total of about 8.5 hours to process the images and generate the hierarchical descriptor for all 4700+ images. The descriptor matrix is integrated into a random-trees local feature classifier in the current SVM framework [11] for training and testing. We obtained an accuracy of 66.16%, which shows ample room for improvement. See Figure 6 for the confusion matrix, Figure 7 for some results of the hierarchy construction, and Figure 8 for its current deficits. In later stages, we will modify the algorithm and test this descriptor alone again, and then combine it with other existing descriptors [11] to boost the overall combined accuracy.

Our latest developments are the ellipse extraction and the hierarchy construction. We chose to construct a hierarchy so that it can record an order of arrangement among the descriptors for the elliptical regions. Thus, we plan to concatenate the parent's and child's SIFT descriptors as a pair in the final matrix, which currently contains only the child's descriptor and its distance, angle, and midpoint to the parent, without the parent's actual SIFT descriptor.

Figure 6. Confusion matrix result of the initial run of the hierarchical descriptor, which averaged to 66%. Values are rounded to the nearest percentage of (predicted samples / total samples). Each row represents a class of species, and each column is the species predicted: for each row, <row title> was predicted as <col title>.

Figure 7. (Best viewed in color) Hierarchy extraction for the 29 species in the dataset, one most representative image each. Images are presented in pairs, with the odd columns being the original images and the even columns the hierarchy images. In the hierarchy images, a magenta dot indicates the center of mass; dark blue indicates the root ellipse; subsequent levels follow the order of blue, green, yellow, red.
Line segments connect the centers of parent and child ellipses in a hierarchical pair. As can be seen, some species are very similar to others in shape and/or appearance.

5. CONCLUSION & FUTURE WORK

We have developed a spatial pyramid kernel and combined it with a new stacking method that achieved an overall accuracy of 88.06%, outperforming other classifiers [11, 14, 8]. In addition, a new hierarchical descriptor, based on pairs of ellipses calculated from medial axis line segments extracted from the image, has achieved an initial accuracy of 66.16%, even though the pairing information was not yet fully used. Improvements are underway.

In the near future, we propose to improve the hierarchical descriptor by increasing the number of ellipses and thus medial axes, modifying the hierarchy construction to avoid shortfalls such as those shown in Figure 8, and possibly modifying the medial axes extraction. Depending on how much each of these improves the accuracy, we may combine two or more of them into the final descriptor. After these attempts, we will combine this shape descriptor with the other existing descriptors in the framework to see how much the new descriptor can boost the overall stacked accuracy. Ultimately, we hope to achieve an accuracy of 90% or beyond.

As mentioned in section 3.1, we have not tried using a posterior matrix instead of a binary matrix in traditional stacking; instead, we are taking a new stacking approach, as described in [11]. However, if one were to use traditional stacking, a matrix with probability values for each class, instead of a binary one, may offer differences in the results [4].

Figure 8. Shortcomings of the algorithm include an occasionally wrong root, no clear route of hierarchy, and intra-class inconsistencies.

6. ACKNOWLEDGEMENTS
We would especially like to acknowledge the excellent mentorship of Natalia Larios and Dr. Linda Shapiro at the University of Washington, both of whom actively helped push the progress along and kindly made valuable contributions and suggestions. Appreciation also goes to the program coordinators and sponsors of the Distributed Research Experiences for Undergraduates (DREU), the Computing Research Association's Committee on the Status of Women in Computing Research (CRA-W), and the Computer Science and Engineering Department at the University of Washington. Accomplishments presented thus far, and future progress, would not be possible without the support of each party listed.

7. REFERENCES

[1] A. Bosch, A. Zisserman, and X. Munoz. Representing Shape with a Spatial Pyramid Kernel. CIVR 2007.
[2] A. Etemadi. Object Recognition Toolkit. http://www.cs.washington.edu/education/courses/cse576/10sp/software/index.html
[3] A. Y.-S. Chia, S. Rahardja, D. Rajan, and M. K. Leung. Object Recognition by Discriminative Combinations of Line Segments and Ellipses. CVPR 2010.
[4] D. A. Lisin, M. A. Mattar, M. B. Blaschko, M. C. Benfield, and E. G. Learned-Miller. Combining Local and Global Image Features for Object Class Recognition.
[5] D. Lowe. Distinctive image features from scale-invariant keypoints. IJCV 60(2):91-110, 2004.
[6] G. Bradski. The OpenCV Library. Dr. Dobb's Journal of Software Tools, Volume 25, Number 11, November 2000, pp. 120, 122-125. http://opencv.willowgarage.com/wiki/
[7] H. Blum. A transformation for extracting new descriptors of shape. Models for the Perception of Speech and Visual Form, 1967.
[8] K. Grauman and T. Darrell. The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features. In Proc. ICCV, 2005.
[9] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA Data Mining Software: An Update. SIGKDD Explorations, Volume 11, Issue 1, 2009.
[10] N. Larios, B. Soran, L.
G. Shapiro, G. M. Munoz, J. Lin, and T. G. Dietterich. Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification. In ICPR, 2010.
[11] N. Larios, J. Lin, M. Zhang, L. G. Shapiro, and T. G. Dietterich. Stacked Spatial-Pyramid Kernel: An Object-Class Recognition Method to Combine Scores from Random Trees. Pending, WACV 2010.
[12] Region Descriptors Linux binaries. Visual Geometry Group, University of Oxford. http://www.robots.ox.ac.uk/~vgg/research/affine/index.html
[13] R. O. Duda and P. E. Hart. Use of the Hough Transformation to Detect Lines and Curves in Pictures. Comm. ACM, Vol. 15, pp. 11-15, January 1972.
[14] S. Lazebnik, C. Schmid, and J. Ponce. Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories.