Pattern Recognition 36 (2003) 69–78
www.elsevier.com/locate/patcog

Evaluation of a hypothesizer for silhouette-based 3-D object recognition

Boaz J. Super*, Hao Lu
Computer Vision and Robotics Laboratory, Department of Computer Science, University of Illinois at Chicago, 851 S. Morgan St., Chicago, IL 60607, USA

Received 1 May 2001; accepted 2 January 2002

Abstract

Shape retrieval and shape-based object recognition are closely related problems; however, they have different task contexts, performance criteria, and database characteristics. In previous work, we proposed a method for similarity-based 2-D shape retrieval using scale-space part decompositions, part-frequency distributions, and structural indexing. In this paper, we evaluate the use of that shape retrieval method as the hypothesis generation component of silhouette-based 3-D object recognition systems, using a performance criterion and test database appropriate for the new application.
© 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Object recognition; Hypothesis generation; Silhouette; View-based; Scale-space; Index vector; Structural indexing; Shape retrieval

1. Introduction

Shape retrieval and shape-based object recognition are closely related problems with similar solution methods. However, the task context and performance criteria are different. Shape retrieval is a user tool for information retrieval; for example, to help users search a pictorial catalog of botanical varieties [1], or to index images and video based on the shapes contained in them [2,3]. A shape retrieval method is successful if many of the retrieved results are similar to the query (high precision), or many of the similar items in the database are retrieved (high recall). Because the ultimate arbiter of similarity is the human user, these performance measures are necessarily subjective. Object recognition, on the other hand, is typically part of an automated system.
Success can be objectively measured as the proportion of images of test objects correctly recognized.

*Corresponding author. Tel.: +1-312-413-8719; fax: +1-312-413-0024. E-mail address: [email protected] (B.J. Super).

To avoid the high cost of comparing an image to every object model in a database, object recognition systems often employ a fast hypothesizer to retrieve a set of likely candidate objects, followed by a more costly and more accurate verifier which reduces the set of hypotheses. Ideally, the reduced set will consist of one correct match. In a silhouette view-based system, a set of object silhouettes imaged from different viewpoints is stored for each model object, and a silhouette of the object to be recognized is compared to the stored silhouettes. A shape retrieval engine can be used as a hypothesizer to return a set of silhouettes similar to the input, but it may be the case that only a few, or none, of these silhouettes belong to the correct object model. Thus, high precision or recall is not sufficient; what matters is whether any one of the retrieved similar silhouettes is from a view of the correct object model. The well-known distinction between object shape and object identity prevents these two criteria from being the same. An alternative scheme is for the hypothesizer to rank the database silhouettes and the verifier to process them in rank order, stopping when a good match is found. In this case, the performance of a particular hypothesizer should be evaluated by measuring the reduction in the expected number of silhouettes that the verifier must process. This is a measure of the reduction in the cost of the verification stage obtained by using the hypothesizer. We adopt the hypothesis ranking approach in this paper. In Refs.
[4–6], we developed a fast 2-D shape retrieval system based on scale-space decompositions of silhouettes, part-frequency histograms coded as index vectors, and structural indexing. In experimental evaluations, the method retrieved at least one similar shape in the top three retrieved items 99–100% of the time on two test databases. Good scalability was observed over database sizes from O(10^2) to O(10^3), with average retrieval times from 0.7 to 7 ms, respectively. Graceful degradation under occlusion was also demonstrated. The aim of the present paper is to evaluate our 2-D shape retrieval engine as a hypothesizer for silhouette-based recognition of isolated 3-D objects, using performance measures appropriate to this new application. Since silhouette databases for 3-D object recognition tend to be qualitatively different from databases used in 2-D shape retrieval applications (e.g., [1,5,7,8]), we use a new test database constructed for this evaluation. By "isolated object," we refer to a single object placed against a contrasting background. This case is of course much easier to handle than recognition of objects in clutter, and therefore can be solved more effectively using the computer vision techniques available today. There are numerous applications in which processing of isolated objects is of interest; for example, digitization of museum collections, archaeological artifacts, and zoological/botanical specimens. Automatic acquisition of silhouettes is particularly reliable in these controlled situations, and can be performed simultaneously with regular image acquisition or 3-D scanning. Object recognition systems that match an input silhouette against a database of stored silhouette views have a long history (e.g., [9–13]).
Methods of this type differ in the representation of the silhouette (e.g., Fourier descriptors [9,10], moments [10], positions of maxima in curvature scale space [13], attributed strings [14]), and also differ in the indexing scheme (e.g., geometric hashing [15], structural indexing [16], hierarchical indexing [14]). A detailed discussion of silhouette-based methods for both shape retrieval and object recognition was given in Ref. [6]. However, because both the hypothesizer evaluated here and the method of Ref. [13] use curvature scale space, it will be helpful to describe the differences briefly here. In Ref. [13], each significant segment of the silhouette contour is represented by a point in curvature scale space. Matching is performed by alignment of these point sets in scale space. Our method employs curvature scale space in a very different way. First, curvature scale space is used to partition the silhouette contour into overlapping local parts at all scales; then, contour scale space is used to classify the parts into types. Matching is performed not by alignment but by comparing part-type histograms. Efficiency is achieved by indexing on local parts instead of by prefiltering. Silhouettes have also been employed for purposes besides view-based object recognition: they have been used for recovery of 3-D shape by shape-from-contour [17] and by volume intersection [18,19]. Although there are many object recognition methods based on matching images to 3-D models, or matching 3-D models directly to 3-D models, we find that object recognition based on matching image silhouettes is fast and has low storage requirements. Furthermore, acquisition of silhouettes for view-based object recognition can be performed easily and inexpensively using an uncalibrated camera (see Section 2.1 for a description of our imaging platform). The remainder of the paper is organized as follows. Section 2 describes the object recognition system design.
This section includes a summary of the shape retrieval engine from Refs. [5,6] that will be evaluated as the hypothesizer. Section 3 describes the test database we constructed to perform the evaluation. Section 4 quantitatively evaluates the hypothesizer.

2. System design

A block diagram of a typical silhouette-based object recognition system is shown in Fig. 1. It has four parts: acquisition, silhouette contour extraction, hypothesis generation, and hypothesis verification. The shape retrieval engine from Refs. [5,6] will be used as the hypothesis-generation stage. As the goal of this paper is to evaluate the hypothesizer independently of any specific verifier, we use a general model of a verifier as the last stage. The following sections describe the system.

2.1. Acquisition

Objects were placed on a turntable that could be rotated through 360°, and imaged with a single camera mounted on a stand. The turntable top surface and the background had the same color, and the illumination was controlled, in order to obtain good discrimination between object and background. (An alternative arrangement would be to use backlighting.) Fig. 2 illustrates the imaging station and geometry. Views of the object at different azimuths could be acquired by rotating the turntable. Views at different elevations required adjustment of the camera height. In an environment for automatic digitization of large numbers of objects it would be advantageous to have computer control of both azimuth and elevation. Fig. 3 shows examples of object images. Fig. 5 shows 16 silhouettes of object 14 at elevation 0° and azimuth angles 22.5° apart.

2.2. Silhouette contour extraction

Iterative thresholding was performed to automatically separate the object from the background. Then a boundary-following algorithm extracted the contour corresponding to the object's silhouette [20]. The contour was stored as an ordered list of points. Figs. 4–5 show examples of silhouettes. The contours were expanded to fit the individual panels in the figures to make the silhouettes legible. This normalization is for display only; the method itself does not require normalization.

Fig. 1. Block diagram of silhouette-based object recognition system (object image → acquisition → silhouette extraction → hypothesis generation → ranking → hypothesis verification → recognized object; the hypothesis-generation stage consults the silhouette database).

Fig. 2. Image acquisition platform (turntable controlling azimuth; camera height controlling elevation).

2.3. Hypothesizer

The hypothesizer is the shape retrieval engine we developed in Refs. [4–6]. As this is the module evaluated in this paper, we summarize the method in this section to make the paper self-contained. For a detailed description and a discussion of its relation to alternative methods in the literature, see Ref. [6]. The hypothesizer consists of three components. The first component decomposes the silhouette contour into a collection of scale-space parts. The second component constructs histograms of the part types and represents them as index vectors, which are stored in the database. The third component compares the input silhouette's index vector with the index vectors in the database, using structural indexing to perform the comparison efficiently. We review each of the components below. The term "feature" is used in the literature to mean either a part or a measurement. To prevent confusion, in this paper "feature" will always refer to a measurement.

2.3.1. Scale-space part-based representation of silhouettes

A scale-space method is used to generate a rich and redundant set of parts at multiple scales of the silhouette contour. The method consists of three stages: generation of the contour scale space and the curvature scale space, segmentation of the silhouette contour into scale-space regions, and linking of sequences of neighboring scale-space regions to form shape parts. The advantage of the scale-space approach is that it does not presuppose a single scale of shape analysis, and that the embedded segmentation method is essentially parameter-free.

We assume that an object silhouette is represented by a planar contour c(u) = (x(u), y(u)), where u is the parameter. A two-parameter contour scale space c(u, σ) is generated by convolving the coordinate functions of c(u) with 1-D Gaussians over a range of scales σ:

    c(u, σ) = (x(u, σ), y(u, σ)) ≡ c(u) ∗ g_σ(u) = (x(u) ∗ g_σ(u), y(u) ∗ g_σ(u)),

where '∗' denotes convolution and g_σ(u) = (2πσ²)^(−1/2) e^(−u²/2σ²) is a 1-D Gaussian with scale σ [21]. As σ increases, the contour becomes progressively smoother, a process called contour evolution. Fig. 6a shows an example.

At any one scale, the silhouette contour is segmented at zero crossings of its curvature function, which correspond to the contour's inflection points. These points are invariant to translation, rotation, and scaling. By definition, the segments between zero crossings have no internal changes of sign, so they bend in only one direction, forming an alternating sequence of protrusions and concavities. Other types of critical points, such as negative curvature minima [22], could also be used. To find the zero crossings, a curvature scale space is computed [21]:

    κ(u, σ) = [x′(u, σ) y″(u, σ) − x″(u, σ) y′(u, σ)] / [x′(u, σ)² + y′(u, σ)²]^(3/2),

where x′(u, σ) = x(u) ∗ g′_σ(u), x″(u, σ) = x(u) ∗ g″_σ(u), y′(u, σ) = y(u) ∗ g′_σ(u), and y″(u, σ) = y(u) ∗ g″_σ(u) are smoothed versions of the first and second derivatives of the contour's coordinate functions. κ(u, σ) is the curvature at position u on the contour at scale σ. A zero-crossing graph shows the u-values of the zero crossings as functions of σ, as shown in Fig. 6b. (The zero-crossing graph is called a curvature scale space image in Ref. [21].) At any one scale σ, the contour is partitioned into segments, represented in the zero-crossing graph as horizontal spans between successive zero-crossing traces.

Fig. 3. Twenty-five objects in the test set.

Fig. 4. Silhouette contours of the views in Fig. 3.

Fig. 5. Sixteen views of object 14 (dinosaur toy), at azimuth intervals of 22.5°.

Fig. 6. (a) Evolution of a contour. The bold line indicates the original contour. (b) The corresponding zero-crossing graph showing the positions (u) of zero crossings as functions of scale σ.

The segments at adjacent scales are generally not independent. A contour segment typically exists over some range of scales until it is finally smoothed away. This process will now be explained more precisely. As σ increases, zero crossings shift position, and eventually merge and disappear in pairs, forming the loops in the zero-crossing graph [21]. (The loops in Fig. 6b are not closed because, in a practical implementation, σ is sampled discretely.) Zero crossings can never be created, so the loops always point downwards. When two zero crossings merge and disappear at some scale σ1, the three segments containing them are merged into one new segment. That segment persists until it is merged with other segments at a larger scale σ2. The range of scales (σ1, σ2) over which a segment exists will be called the lifetime of that segment. A scale-space segment corresponds to a region in the zero-crossing graph, bounded on the left and right by two zero crossings and above and below by the scales at which zero-crossing merging events occur (see Fig. 7). This is a special case of Witkin's [23] definition of regions in general 1-D signal scale spaces. In the following, the term segment without qualification will refer to a scale-space segment, and contour segment will refer to a section of the original contour c(u). The database matching method is based on comparing histograms of the part types occurring in two silhouette shapes.
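The contour evolution and curvature zero-crossing computation described above can be sketched in a few lines. The following is a minimal illustration in Python/NumPy; the function names, kernel truncation, and treatment of the closed contour by circular convolution are our assumptions, not the authors' implementation:

```python
# Minimal sketch of contour scale space and curvature zero crossings,
# assuming a closed contour sampled at N points (circular convolution).
import numpy as np

def gaussian_kernel(sigma, order=0):
    """1-D Gaussian g_sigma(u), or its first/second derivative."""
    half = int(4 * sigma) + 1
    u = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-u**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    if order == 1:
        g = -u / sigma**2 * g
    elif order == 2:
        g = (u**2 - sigma**2) / sigma**4 * g
    return g

def circular_conv(f, k):
    """Convolve a periodic (closed-contour) signal with kernel k."""
    n, half = len(f), len(k) // 2
    fp = np.concatenate([f[-half:], f, f[:half]])   # periodic padding
    return np.convolve(fp, k, mode='same')[half:half + n]

def curvature(x, y, sigma):
    """kappa(u, sigma) from smoothed first and second derivatives."""
    x1 = circular_conv(x, gaussian_kernel(sigma, 1))
    x2 = circular_conv(x, gaussian_kernel(sigma, 2))
    y1 = circular_conv(y, gaussian_kernel(sigma, 1))
    y2 = circular_conv(y, gaussian_kernel(sigma, 2))
    return (x1 * y2 - x2 * y1) / (x1**2 + y1**2)**1.5

def zero_crossings(kappa):
    """Indices u where curvature changes sign (inflection points)."""
    s = np.sign(kappa)
    return np.nonzero(s != np.roll(s, -1))[0]
```

On a sampled circle of radius R, for example, the curvature comes out near 1/R at every u and there are no zero crossings, since a circle has no inflection points.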
Thus, parts that can distinguish among different shapes are needed. Individual segments may not have sufficient discriminating power, so sequences of segments are used instead. Specifically, a part of length L is defined as a spatial sequence of L adjacent scale-space segments existing at a common scale (Fig. 7). If the same sequence of scale-space segments (regions) exists at more than one scale, it is counted only once. A part persists as long as none of the segments composing it undergo any merges. Parts overlap both in space and in scale, providing redundancy which enhances robustness. In the following, P(S) will denote the set of parts generated from silhouette S. For convenience, only parts of a fixed length L will be used, although this is not necessary. In Ref. [6], we showed that the number of parts of length L is bounded above by (L/2 + 1) times the number of segments in the original silhouette contour. Since L is typically small (L = 3 in this paper), the use of scale-space parts instead of contour segments does not incur a substantial additional cost. Furthermore, the finest scales (σ ≤ 3 pixels) are not used, since structures at those scales are often due to discretization noise. As a result, the number of parts may be less than the number of segments on the original silhouette contour.

Fig. 7. Diagram of a typical portion of a zero-crossing graph. Six zero crossings (z1, …, z6) and their traces as a function of σ are shown. Regions of scale space corresponding to scale-space segments (s1, …, s7) are indicated by shading. Parts consist of sequences of regions that are adjacent at a common scale. The parts of length 3 in this example, indicated by the double-headed arrows, are p1 = (s1, s2, s3), p2 = (s2, s3, s4), p3 = (s3, s4, s5), and p4 = (s6, s4, s5).

2.3.2. Part-type histograms and index vectors

The shape parts are sequences of scale-space regions. This raw multiscale representation is not convenient for database indexing. Instead, the system classifies parts into types, and the database is indexed by the part types. Each silhouette shape will then be represented in compact form by the frequencies with which the part types occur in the silhouette. The parts are classified into part types by first representing them with local shape features and then quantizing those features into a finite number of classes.

The part feature vector is defined as follows. The total curvature of a segment, curv(s), is measured by integrating the derivative of local orientation along the segment. The length contrast of two neighboring segments si, si+1 is measured as

    LC(si, si+1) = [l(si+1) − l(si)] / [l(si+1) + l(si)],

where l(s) is the arc length of segment s. Length contrast is between −1 and 1. LC < 0 indicates a long segment followed by a short segment; LC > 0 indicates the reverse; and LC ≈ 0 indicates that the two segments are commensurate in size. Then, denoting the segments of a part of length L by s1, …, sL, the part is represented by the (2L−1)-dimensional feature vector

    H = (curv(s1), LC(s1, s2), curv(s2), …, LC(sL−1, sL), curv(sL)).

The feature vector is then quantized by quantizing each feature dimension. Each distinct possible quantized feature vector is one part type, corresponding to one hyper-rectangle in the feature space. T will denote the set of part types used by the hypothesizer, which may be a subset of all possible part types. T(S) ⊆ T will denote the set of part types occurring in silhouette S. Note that T(S) is a set of part types, while P(S) is a set of parts, i.e., individual occurrences of part types. The use of coarse quantization to reduce quantization errors is recommended by Califano and Mohan [24] and by Stein and Medioni [16].
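The interleaved feature vector H can be sketched directly from the definitions above. In this illustration a segment is reduced to a (total curvature, arc length) pair; that representation and the function names are our assumptions for the sketch:

```python
# Sketch of the (2L-1)-dimensional part feature vector H, which interleaves
# per-segment total curvatures with pairwise length contrasts.
def length_contrast(l_a, l_b):
    """LC in [-1, 1]: negative when a long segment precedes a short one."""
    return (l_b - l_a) / (l_b + l_a)

def part_feature_vector(segments):
    """segments: list of (total_curvature, arc_length) pairs, one per
    scale-space segment of the part, in spatial order."""
    H = []
    for i, (curv, length) in enumerate(segments):
        if i > 0:
            H.append(length_contrast(segments[i - 1][1], length))
        H.append(curv)
    return H
```

For a part of length L = 3 the result has 2·3 − 1 = 5 components, in the order (curv, LC, curv, LC, curv).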
Accordingly, segment curvature is quantized into eight non-uniform ranges: (−∞, −π), [−π, −π/2), [−π/2, −π/4), [−π/4, 0), [0, π/4), [π/4, π/2), [π/2, π), and [π, ∞); and length contrast is quantized into three ranges: [−1, −1/3), [−1/3, 1/3), and [1/3, 1]. The number of part types of length L is thus 8^L · 3^(L−1).

The silhouette as a whole is represented by the number of occurrences of each part type in that silhouette. Let T = {t1, …, tn} be the set of part types used by the system and let PS(ti) = {pi1, …, pini} denote the set of occurrences of part type ti in silhouette S. Define the saliency-weighted part-type histogram of a silhouette S given T as

    D*T(ti, S) = Σ_{pij ∈ PS(ti)} w(pij),   for i = 1, …, n,   (1)

where the saliency weight w(pij) is a function of the part and/or the part type. When w ≡ 1, D*T is simply the histogram of the part types. In the current implementation, the saliency of a part p is defined as

    w(p) = Σ_{i=1}^{L} log2 [σ2(si) / σ1(si)],   (2)

where s1, …, sL are the scale-space segments in the part and σ1(s), σ2(s) are the smallest and largest scales of segment s, respectively. This saliency function emphasizes parts that have a long scale-space lifetime.

The part-type histogram of a silhouette S will be represented by an index vector v = (v1, …, vn), where vi = D*T(ti, S) for i = 1, …, n. Since the same set T is used for all silhouettes handled by the system, all index vectors have the same length, and corresponding components refer to the same part type. The similarity of two silhouettes is measured by comparing part-type histograms using a normalized histogram intersection measure

    M(v, q) = [Σ_{i=1}^{n} min(vi, qi)] / max(Σ_{i=1}^{n} vi, Σ_{i=1}^{n} qi),   (3)

where v, q are the index vectors of two silhouettes.
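The quantization ranges and the similarity measure of Eq. (3) can be sketched as follows. The bin edges are those stated above; the dense-vector representation and function names are our assumptions for illustration:

```python
# Sketch of part-type quantization (8 curvature bins, 3 length-contrast
# bins) and the normalized histogram intersection M of Eq. (3).
import math

# Half-open bins: values below the first edge fall in bin 0.
CURV_EDGES = [-math.pi, -math.pi/2, -math.pi/4, 0, math.pi/4, math.pi/2, math.pi]
LC_EDGES = [-1/3, 1/3]

def quantize(value, edges):
    """Index of the half-open bin containing value."""
    return sum(value >= e for e in edges)

def part_type(H):
    """Map a feature vector (curv, LC, curv, ...) to a discrete type tuple:
    even positions are curvatures, odd positions are length contrasts."""
    return tuple(quantize(f, CURV_EDGES if i % 2 == 0 else LC_EDGES)
                 for i, f in enumerate(H))

def similarity(v, q):
    """Normalized histogram intersection M(v, q) over dense index vectors."""
    num = sum(min(a, b) for a, b in zip(v, q))
    return num / max(sum(v), sum(q))
```

For L = 3 the type tuples range over 8 · 3 · 8 = 192 of the 8^3 · 3^2 possible combinations per (curv, LC, curv) triple, consistent with the 8^L · 3^(L−1) count.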
The normalization factor is the maximum of the sums over the two histograms, so that silhouettes that are very dissimilar in complexity will not produce a high similarity score. To summarize: the part feature vectors are used to classify silhouette parts, and to define the meaning of individual components of the index vectors. Silhouettes are compared by matching index vectors, not feature vectors. We note that the method is invariant to translation, rotation, and scaling, since invariant features are used, and is invariant to the starting point on the silhouette contour due to the use of part-type histograms for matching.

2.3.3. Fast ranking of silhouette match hypotheses

This section summarizes an efficient method for computing the similarity scores between the input silhouette's index vector and the index vectors of the stored silhouettes. Let v1, …, vm denote the index vectors of all the silhouettes of all the objects in the database, and let q be the index vector of the input silhouette. The similarity score M(vj, q) is computed for all j = 1, …, m and the stored silhouettes are ranked. The sums in the denominator of M (Eq. (3)) can be precomputed for the stored silhouettes; therefore, the expensive part of the computation is the numerator of M, since every component of the input index vector is compared to the corresponding component of every index vector in the database. Typically, the part-type set T is large but the number of part types occurring in any one silhouette is small. Thus, most components of the index vectors are zero and contribute nothing to the similarity score. High speed can be achieved by computing only the non-zero terms of M. This is accomplished by indexing the database independently by each part type (structural indexing [16]). Each part type t ∈ T that appears in the database points to a list of the silhouettes in the database that contain at least one occurrence of t.
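The structural-indexing idea can be sketched with an inverted index from part types to silhouettes. Here index vectors are sparse dictionaries mapping part types to (weighted) counts; the data layout and names are our assumptions, not the authors' code:

```python
# Sketch of structural indexing: each part type maps to the silhouettes
# containing it, so only non-zero terms of M are ever computed.
from collections import defaultdict

def build_index(database):
    """database: {silhouette_id: {part_type: count}} -> inverted index
    {part_type: [(silhouette_id, count), ...]}."""
    index = defaultdict(list)
    for sid, vec in database.items():
        for t, count in vec.items():
            index[t].append((sid, count))
    return index

def rank(query, database, index):
    """Accumulate numerators of M over the query's non-zero components,
    normalize by the max of the histogram sums, and sort best-first."""
    scores = defaultdict(float)
    for t, q_count in query.items():
        for sid, count in index.get(t, []):
            scores[sid] += min(q_count, count)
    totals = {sid: sum(v.values()) for sid, v in database.items()}
    q_total = sum(query.values())
    ranked = [(sid, s / max(totals[sid], q_total)) for sid, s in scores.items()]
    return sorted(ranked, key=lambda p: -p[1])
```

Note that silhouettes sharing no part type with the query are never touched, which is the source of the speedup.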
The similarity scores M(vj, q) for all stored silhouettes j = 1, …, m are then computed as follows. For each non-zero component i of the input index vector q, the list of stored silhouettes containing part type ti is fetched using a hash table. For each entry on the list, corresponding to some stored silhouette Sj, the ith term of the numerator of M(vj, q) is computed and added to the score for silhouette Sj. Normalization of the scores for each silhouette can be performed after the sums have been completed, and all the scores are then ranked. The process is fast: in Ref. [6], the average time to match an input vector to a 1310-silhouette database and to rank all 1310 silhouettes was 7 ms on a Sun Ultra-10 333 MHz computer.

2.4. Model of verifier

Since our goal is to evaluate the hypothesizer independently of any specific verifier, we use a general model of a verifier, which satisfies two properties: (1) the verifier checks candidate matches in the rank order produced by the hypothesizer, and (2) the verifier makes no errors. This model will allow us to measure the hypothesizer's contribution to performance without confounding it with verifier errors. Specifically, we will compute the expected number of verifications performed by the ideal verifier as a function of different hypothesizers.

3. Test database

We constructed a test database for evaluation of the hypothesizer, as follows. The twenty-five common objects and toys shown in Fig. 3 were used. Silhouettes of the views in Fig. 3 are shown in Fig. 4. Sixteen views were taken of each object, at elevation 0° and azimuth spacing of 22.5°. Thus, there were a total of 400 images stored in the database. Each image was 320 × 243 pixels and was in color to enhance the reliability of automatic thresholding. The silhouette contours were represented at one-pixel resolution. Database index vectors were generated for the silhouettes as described in Section 2.3 above.
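The ideal-verifier cost model of Section 2.4 reduces, for one query, to the rank of the first correct match in the hypothesizer's ordering. A minimal sketch of this evaluation measure (the function names and data layout are our assumptions):

```python
# Sketch of the verifier model: an error-free verifier checks candidates
# in hypothesizer rank order, so its cost per query is the 1-based rank
# of the first silhouette belonging to the correct object.
def first_correct_rank(ranked_ids, correct_object, object_of):
    """ranked_ids: silhouette ids best-first; object_of: id -> object label."""
    for rank, sid in enumerate(ranked_ids, start=1):
        if object_of[sid] == correct_object:
            return rank
    return None  # correct object absent from the ranking

def mean_verification_cost(queries):
    """queries: iterable of (ranked_ids, correct_object, object_of)."""
    ranks = [first_correct_rank(*q) for q in queries]
    return sum(ranks) / len(ranks)
```

Averaging this rank over all test inputs gives the expected number of verifications, the quantity reported in Section 4.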
Significant storage compression was achieved: the original images required 77,760 pixels each; the contours required 685 points on average; and the index vectors contained 70 non-zero components on average.

4. Evaluation

Each of the 400 images was used as a test image in turn. The self-match of the test image to its copy in the database is excluded from all results reported below. Thus, each test image is a minimum of 22.5° of separation from the nearest view of the correct object. This is a fairly stringent test, since a 22.5° viewpoint change can result in a significant change in the silhouette in some cases. We consider the database to be a difficult one due to high levels of noise in the silhouette contour extraction (see the dinosaur's leg in view 13 of Fig. 5 for an example), and due to the similarity of some of the views of different objects (e.g., objects 20 and 24, or objects 16 and 18, in Fig. 3). Cross-object matches are always counted as incorrect, even in cases where a human observer would confuse views from two similar objects.

Table 1. Statistics of the rank of first correct match for different hypothesizers. For the random ordering case only, the expected value is reported instead of the sample mean.

    Hypothesizer                           Mean    Median
    Part-based (histogram intersection)     5.2       1
    None (random ordering)                 25.0      18
    Global (aspect ratio)                  11.7       6
    Combined (part-based & global)          3.6       1

Table 2. Statistics of the rank of first correct match for different hypothesizers, for viewpoint sampling at 45° instead of 22.5° (8 views per object instead of 16). For the random ordering case only, the expected value is reported instead of the sample mean.

    Hypothesizer                           Mean    Median
    Part-based (histogram intersection)     7.3       2
    None (random ordering)                 25.0      19
    Global (aspect ratio)                  12.2       6
    Combined (part-based & global)          6.5       2

Fig. 8. Cumulative distribution of rank of first correct hypothesis for the four hypothesizer cases.

The hypothesizer ranked all 399 database silhouettes in descending order of similarity to the test silhouette, where similarity is defined as in Eq. (3). The ideal verifier would test each silhouette in this ordering, stopping when the first correct match is found. The mean rank of the first correct match, computed over all 400 test inputs, was 5.2; the median was 1 (Table 1). The cumulative distribution of the rank of the first correct hypothesis is shown in Fig. 8 (the solid line).

To provide a basis for comparison, the system was compared to an object recognition system with no hypothesizer, and also to an object recognition system with a hypothesizer that used a perceptually salient global feature of silhouettes, the aspect ratio. The system without a hypothesizer was modeled by presenting the verifier with database silhouettes in arbitrary order. We assumed all orderings of database silhouettes to be equally likely. The cumulative distribution of the rank of the first correct hypothesis is plotted as a dotted line in Fig. 8. The expected rank of the first correct match is 25, and the median is 18 (Table 1). Thus, the part-based hypothesizer reduced the expected number of candidates the verifier must process by approximately 20 (25 − 5.2), yielding a verification-stage speedup factor of 4.81.

In the system with the global-feature hypothesizer, the aspect ratio was computed as √(Emin/Emax), where Emin, Emax are the minimum and maximum (invariant) second moments. The mean rank of the first correct match was 11.7 and the median was 6. This hypothesizer thus reduced the mean number of hypotheses processed by the verifier by about 13 instead of 20. The verification-stage speedup is more than a factor of two (2.25) greater for the part-based hypothesizer than for the global-feature hypothesizer.
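The global aspect-ratio feature √(Emin/Emax) can be sketched from the eigenvalues of the 2×2 covariance of the silhouette points, which are rotation- and translation-invariant second central moments. This is an illustrative computation; the paper's exact moment definitions may differ:

```python
# Sketch of the aspect-ratio feature: sqrt(Emin/Emax), with Emin, Emax the
# minimum and maximum eigenvalues of the covariance of silhouette points.
import numpy as np

def aspect_ratio(points):
    """points: (N, 2) array of silhouette pixel coordinates."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)       # 2x2 second moments
    e_min, e_max = np.sort(np.linalg.eigvalsh(cov)) # ascending eigenvalues
    return np.sqrt(e_min / e_max)
```

A circular silhouette gives a ratio of 1, and the ratio decreases toward 0 as the silhouette becomes more elongated, independently of its orientation in the image.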
We also evaluated a combination of the global-feature and part-based hypothesizers [6]. Specifically, the ranking of the database silhouettes was modified so that silhouettes with aspect ratios within 0.2 of the input silhouette's aspect ratio were ranked before all of the others. Within each group (≤ 0.2 and > 0.2 aspect-ratio differences), the ordering based on the similarity measure in Eq. (3) was maintained. The mean rank of the first correct match was 3.6 and the median was 1. The combination of the two similarity measures thus yielded the best results, providing a small but significant additional reduction of 1.6 silhouettes in the mean number of silhouettes processed by the verifier. Comparison of the distributions of all four cases in Fig. 8 shows that the largest contribution to the performance of the combined hypothesizer came from the part-based component.

As viewpoint sampling resolution becomes sparse, the performance of any silhouette-based object recognition system is expected to decrease. To determine whether the hypothesizer still provides a benefit under these conditions, we repeated the evaluation using views spaced 45° apart in azimuth instead of 22.5° apart. Thus, each test object was represented by 8 views in the database instead of 16. As expected, overall performance decreased; however, the hypothesizer still provided a large benefit, as shown in Table 2. The part-based hypothesizer reduced the mean number of verifications by approximately 18, from 25 to 7.3, yielding a verification-stage speedup of 3.42. The combined method provided an additional reduction of 0.8 verifications.

5. Conclusion

The aim of this paper was to quantitatively evaluate the use of a part-based shape retrieval engine as a hypothesizer for a silhouette-based object recognition system, using a performance measure and test database appropriate to the new application.
We constructed a test database of 25 objects with a total of 400 views, and measured the speedup of the verification stage obtained by using the hypothesizer. Substantial speedups were obtained relative to a recognition system without a hypothesizer, and relative to a recognition system with a global-feature-based hypothesizer. Best results were obtained by combining both global and local (part-based) information, with the greater contribution coming from the local information. Substantial speedups were also obtained on a database with sparse viewpoint sampling.

Acknowledgements

The first author was supported in part by NIH grant 1-RO1 EY11747 during the period of this project. The figures are used by permission of CVRL and SPIE.

References

[1] G. van der Heijden, M. Worring, Domain concept to feature mapping for a plant variety image database, in: A. Smeulders, R. Jain (Eds.), Image Databases and Multi-Media Search, World Scientific, New Jersey, 1997, pp. 301–308.
[2] C. Faloutsos, R. Barber, M. Flickner, J. Hafner, W. Niblack, D. Petkovic, W. Equitz, Efficient and effective querying by image content, J. Intell. Inf. Syst. 3 (1994) 231–262.
[3] M. Flickner, et al., Query by image and video content: the QBIC system, IEEE Comput. 28 (1995) 23–32.
[4] X. Li, B.J. Super, Fast shape retrieval using term frequency vectors, Proceedings of the IEEE Workshop on Content-Based Access of Image and Video Libraries, Ft. Collins, Colorado, 1999, pp. 18–22.
[5] B.J. Super, Visual shape retrieval using multiscale term distributions, Proceedings of the SPIE Conference on Storage and Retrieval for Media Databases, SPIE Proceedings, Vol. 3972, San Jose, CA, 2000, pp. 222–233.
[6] B.J. Super, Fast retrieval of isolated visual shapes, in preparation.
[7] F. Mokhtarian, S. Abbasi, J. Kittler, Efficient and robust retrieval by shape content through curvature scale space, in: A. Smeulders, R. Jain (Eds.), Image Databases and Multi-Media Search, World Scientific, New Jersey, 1997, pp. 51–58.
[8] S. Sclaroff, Distance to deformable prototypes: encoding shape categories for efficient search, in: A. Smeulders, R. Jain (Eds.), Image Databases and Multi-Media Search, World Scientific, New Jersey, 1997, pp. 149–164.
[9] T.P. Wallace, P.A. Wintz, An efficient three-dimensional aircraft recognition algorithm using normalized Fourier descriptors, Comput. Graphics Image Process. 13 (1980) 99–126.
[10] A.P. Reeves, R.J. Prokop, S.E. Andrews, F.P. Kuhl, Three-dimensional shape analysis using moments and Fourier descriptors, IEEE Trans. Pattern Anal. Mach. Intell. 10 (6) (1988) 937–943.
[11] P.J. Besl, R.C. Jain, Three-dimensional object recognition, Comput. Surv. 17 (1985) 75–145.
[12] B. Vijayakumar, D. Kriegman, J. Ponce, Invariant-based recognition of complex curved 3D objects from image contours, Comput. Vision Image Understanding 72 (3) (1998) 287–303.
[13] S. Abbasi, F. Mokhtarian, Affine-similar shape retrieval: application to multiview 3-D object recognition, IEEE Trans. Image Process. 10 (1) (2001) 131–139.
[14] A. Del Bimbo, P. Pala, Shape indexing by multi-scale representation, Image Vision Comput. 17 (1999) 245–261.
[15] Y. Lamdan, H.J. Wolfson, Geometric hashing: a general and efficient model-based recognition scheme, Proceedings of the Second International Conference on Computer Vision, Tarpon Springs, FL, 1988, pp. 238–249.
[16] F. Stein, G. Medioni, Structural indexing: efficient 2-D object recognition, IEEE Trans. Pattern Anal. Mach. Intell. 14 (12) (1992) 1198–1204.
[17] F. Ulupinar, R. Nevatia, Shape from contour, IEEE Trans. Pattern Anal. Mach. Intell. 17 (2) (1995) 120–135.
[18] A. Laurentini, The visual hull concept for silhouette-based image understanding, IEEE Trans. Pattern Anal. Mach. Intell. 16 (2) (1994) 150–162.
[19] K.N. Kutulakos, S.M. Seitz, A theory of shape by space carving, Int. J. Comput. Vision 38 (3) (2000) 199–218.
[20] R. Jain, R. Kasturi, B. Schunck, Machine Vision, McGraw-Hill, New York, 1995.
[21] F. Mokhtarian, A. Mackworth, Scale-based description and recognition of planar curves and two-dimensional shapes, IEEE Trans. Pattern Anal. Mach. Intell. 8 (1) (1986) 34–43.
[22] D.D. Hoffman, W.A. Richards, Parts of recognition, Cognition 18 (1985) 65–96.
[23] A. Witkin, Scale space filtering, Proceedings of the International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, 1983, pp. 1019–1022.
[24] A. Califano, R. Mohan, Multidimensional indexing for recognizing visual shapes, IEEE Trans. Pattern Anal. Mach. Intell. 16 (4) (1994) 373–392.

About the Author—BOAZ J. SUPER received the Ph.D. degree in Computer Science from the University of Texas at Austin in 1992. He then became a Research Associate and co-founder of the Center for Vision and Image Sciences at the University of Texas at Austin. In 1997 he joined the University of Illinois at Chicago, where he is currently an Assistant Professor in the Department of Computer Science. Dr. Super's current research interests in computer vision and visual perception include perceptual organization, shape matching, object recognition, and multimedia retrieval.

About the Author—HAO LU received an M.S. degree in Electrical Engineering and Computer Science from the University of Illinois at Chicago in 2000 and an M.S. degree in Rhetorical and Technical Communication from Michigan Technological University in 1999. In 2000 he joined Tellabs in Lisle, Illinois, where he is currently a software engineer in the Optical Networking Group. Hao Lu's current work is focused on SONET and other broadband technologies.