3D Shape-based retrieval within the MPEG-7 framework

Titus Zaharia and Françoise Prêteux
ARTEMIS Project Unit, Institut National des Télécommunications, 9, rue Charles Fourier, 91011 Evry Cedex

ABSTRACT

Because of the continuous development of multimedia technologies, virtual worlds and augmented reality, 3D content is becoming a common feature of today's information systems. Hence, standardizing tools for content-based indexing of visual data is a key issue for computer vision related applications. Within the framework of the future MPEG-7 standard, tools for intelligent content-based access to 3D information, targeting applications such as search & retrieval and browsing of 3D model databases, have recently been considered and evaluated. In this paper, we present the 3D Shape Spectrum Descriptor (3D SSD), recently adopted within the current MPEG-7 Committee Draft (CD). The proposed descriptor aims at providing an intrinsic shape description of a 3D mesh and is defined as the distribution of the shape index over the entire mesh. The shape index is a local geometric attribute of a 3D surface, expressed as the angular coordinate of a polar representation of the principal curvature vector. Experiments have been carried out on the MPEG-7 3D model database, consisting of about 1300 meshes in VRML 2.0 format. Objective retrieval results, based upon the definition of a ground truth subset, are reported in terms of Bull Eye Percentage (BEP) score.

Keywords: 3D shape-based retrieval, MPEG-7, similarity measures, 3D meshes, VRML, principal curvatures, shape spectrum, subdivision schemes.

1. INTRODUCTION

The continuous development of multimedia software and hardware technologies leads to an increasing interest in the use of 3D content. Therefore, methods for efficiently representing, coding and accessing such 3D data have become a challenging issue for today's multimedia technologies.
Most of the time, 3D data are represented by polygonal meshes (Figure 1), and the Virtual Reality Modeling Language (VRML) has become one of the most commonly used standards for representing such 3D meshes1. Basically, a 3D mesh is defined by a set of vertices and a face set. The vertex position is given by its coordinates in 3D space, represented in a Cartesian coordinate system. A face is defined as an ordered sequence of vertex indices. The geometry of the mesh refers to the position of the vertices, while the mesh connectivity refers to the relationships between vertices, edges and faces of the mesh. The mesh may also include some photometric information, such as color, texture and normal vectors. The present work proposes a descriptor for shape-based indexing and retrieval of 3D mesh models and therefore exploits exclusively the geometry and connectivity information.

The concept of shape has been under consideration within various visual perception theories aiming at understanding the way that humans perceive visual shape. The Gestalt psychology2 focuses on the study of 2D projections of the 3D world. In his work on the neuropsychological theory of behavior, Hebb3 claims that form is not perceived as a whole but consists of parts. Moreover, the author points out that the relative spatial relations between parts must be learned in order to ensure a successful recognition. Among recent developments more closely related to computer vision applications, Koenderink and van Doorn4 propose a hierarchical approach for describing the evolution of shapes through multiple resolutions. Significant contributions related to modules of the human visual system have been made in Marr's work5, 6. Such theories set the general framework and useful criteria for computer-vision object recognition applications.
Nevertheless, there is no unique definition of the shape concept that fully covers a common and intuitive meaning, including geometry, connectivity, locality and globality properties.

The notion of shape is involved in 3D object recognition schemes reported in the computer vision related literature. Let us first mention the global representations based on the normal vector distributions on the unit sphere, such as extended Gaussian images (EGI)7 or other orientation information-based descriptors8, 9, 10, 11, 12. Such approaches require an alignment procedure during the matching stage, which increases the computational complexity of the matching process. Surface-based representations are also reported. Dorai and Jain13, 14 propose to model surface elements as maximal patches of constant shape index for representing 3D objects acquired from range data. The authors recommend a graph matching-based approach for recognizing such objects. Identification of individual volumetric primitives derived from superquadric equations, called parametric geons, is reported in 15. Such geons correspond to object parts that may be approximated by distinct elementary shapes, such as ellipsoids, cuboids and cylinders (that may be additionally tapered or curved). Nastar et al.16 use deformable mesh manifolds for face-recognition purposes. Face images are represented as 3D surfaces and principal component analysis is used for learning a set of representative deformations within the class of face objects. Sclaroff and Pentland17 also use physically-based modeling techniques and modal matching, and measure shape similarity in terms of forces and strain, within a space of "extremal shapes". Volumetric-based approaches to 3D object recognition include multiresolution binary pyramids18, sets of centres of maximal spheres and distance transformations19.

Figure 1. Examples of 3D meshes (wireframe representation).
When the shape concept is considered within similarity retrieval applications, descriptors should have reasonable size and should support simple similarity measures, enabling effective search and browsing of very large databases. Thus, recognition methods involving search in high dimensional spaces or iterative matching and registration techniques become prohibitively expensive in terms of retrieval time for such applications.

Previous work reported in the literature in the area of shape-based retrieval of 3D mesh models mainly includes global shape representations. In 20, spherical moment functions are considered. In 21, several 3D shape descriptors are proposed, including bounding boxes, distributions of orientations and lengths of cord vectors connecting the gravity center of the object to the gravity center of each model face, statistical moments and a 3D wavelet-based representation. The authors deal with the issue of invariance to 3D rotations by representing the 3D object within its proper coordinate system, defined by the three eigenvectors of the tensor of inertia, with labels x, y and z set according to the decreasing order of the corresponding eigenvalues. The invariance to scale is then achieved by normalizing the 3D bounding box with axes parallel to the principal axes of inertia.
Such solutions to the issue of rotation and scale invariance have several limitations: (1) labeling the principal axes according to the decreasing order of the associated eigenvalues does not always guarantee optimal spatial correspondences (for example, imagine a tall glass and a flat cup on a horizontal table: in the case of the glass, the maximal eigenvalue corresponds to the vertical direction, while for the cup the maximal eigenvalue corresponds to a horizontal one; by defining the coordinate system using the above-mentioned approach, any further attempt to find similarities between the two objects would fail); (2) normalizing the objects to a unit-sized bounding box is highly sensitive to local deformations; (3) the selection of the coordinate system raises an additional mirroring ambiguity that is not straightforward to handle.

We note that computer vision applications, such as object recognition, generally use low-level descriptors (features) without any related semantics. In contrast, humans often associate shape with semantic concepts. To illustrate this, let us consider the concept of humanoid. This concept covers an entire class of geometrical shapes with high variability. In addition, an element of this class may have several appearances, since we are dealing here with an articulated object: a sitting man and a standing man are both humanoids. Are their associated geometrical shapes really similar? The answer depends on whether or not the relative positions of the components of the articulated object are taken into account. Humans have the ability to decide whether this information is relevant or not, depending on specific purposes. Thus, a human can recognize other humans without any difficulty, regardless of posture, and can also respond to specific tasks such as gesture recognition, where the interpretation of relative position information is essential.
Computers do not have any mechanism for making such a decision. Consequently, the specific objectives of such systems should be clearly specified from the beginning in order to adopt appropriate technical solutions. Reconsidering our example from the point of view of a computer vision system, local descriptors seem better suited to recognizing articulated objects in different postures, while for gesture recognition applications, global representations are more appropriate.

In this paper, our objective is to provide a descriptor for locally characterizing free-form surfaces represented as discrete polygonal 3D meshes. The surface-based approach is chosen for the sake of generality, since 3D meshes may include open surfaces that have no associated volume. The requirements to be fulfilled are invariance with respect to scale and Euclidean transforms, robustness with respect to different triangulations of the same object, and the ability to successfully retrieve articulated objects in different postures.

Considering the key issue of standardizing tools for efficiently representing 3D content, the Moving Picture Expert Group (MPEG) first addressed the issue of 3D mesh coding (3DMC)22. The recent MPEG-4 standard includes technology for both single-resolution and progressive coding of 3D mesh models. More recently, within the framework of the development of the future MPEG-7 standard23, 24, 25, tools for content-based indexing and retrieval of 3D data have been considered and extensively evaluated. The 3D SSD has been under evaluation within the MPEG-7 Core Experiments (CEs) and was finally adopted into the MPEG-7 CD, which is the kernel of the future standard at the current stage of the standardization process. The results presented in this paper are those obtained and reported26, 27, 28, 29, 30, 31 within the framework of the MPEG-7 3D Shape CE over 5 successive MPEG meetings (October 1999 - July 2000).

The paper is organized as follows.
Section 2 describes the 3D model database and the associated categorization used for evaluating 3D shape descriptors within the MPEG-7 standardization process. In Section 3, we introduce the mathematical definition of the proposed 3D SSD, within a continuous framework. Section 4 deals with the extraction of the 3D SSD from discrete polygonal meshes. We first analyze the issue of 3D object representation in terms of uniqueness, regularity of the distribution of polygonal faces and smoothness of polyhedral mesh surfaces. Then, we deduce a set of geometrical and topological properties that define the so-called minimal and regular mesh representation. Finally, the extraction method of the 3D SSD is described in detail. Experimental results are reported and discussed in Section 5. Section 6 concludes the paper and opens perspectives for future work.

2. THE 3D TEST DATA SET

Evaluating 3D shape descriptors requires the availability of a sufficiently large 3D model test data set. Existing commercial 3D data repositories, such as the Viewpoint collection32, are still expensive for our academic purposes. First, we have used two databases already existing within MPEG: (1) the MPEG-4 3D model data set, consisting of 293 models previously used within the MPEG-4 3DMC CEs33, and (2) the "Letters" data set, consisting of 50 3D models representing letters from "A" to "E" and donated for the 3D Shape MPEG-7 CEs by IBM Japan34. Each letter is represented here by 10 differently triangulated 3D meshes (Figures 1.a and b show two examples for letter "B"), the "Letters" data set being specifically created for testing the robustness of 3D shape descriptors with respect to mesh triangulation. We have then gathered 3D mesh models freely available over the Internet, specifically from the 3D Cafe web site35. As the initial models were heterogeneously represented under various 3D formats (such as 3DS, LWO, DXF, COB, ...), our initial work focused on converting these models to VRML 2.0, in order to obtain a data set of meshes represented in a unique format. The converted models have been carefully examined, and considerable effort has been spent on repairing models exhibiting full or partial orientation problems or broken connectivity (see Section 4 for a synopsis of representation problems). Finally, 947 corrected 3D models have been selected for our purposes. The resulting data set thus includes about 1300 3D mesh models represented in VRML 2.0 format.

Objectively evaluating similarity retrieval results also requires the definition of an appropriate ground truth. Therefore, a subset consisting of 228 models has been selected and categorized into 15 distinct categories, listed in Table 1.

"4 LIMBS" (31)     CARS (17)         AERODYNAMIC (36)
TREES (21)         MISSILES (10)     BALLOONS (7)
BUILDINGS (10)     SOMA (7)          E1_Mx (9)
FINGER (30)        LETTER A (10)     LETTER B (10)
LETTER C (10)      LETTER D (10)     LETTER E (10)

Table 1. The categorization of a subset of the 3D database, used as ground truth. The number of items within each category is specified under brackets.

The proposed categorization, approved by the MPEG-7 group, aims at taking into account the local geometrical similarity of shapes. Thus, models such as humans, animals, monsters and aliens have been merged within the "4-Limbs" category, while the "Aerodynamic" category includes models of airplanes, helicopters and fishes. "E1_Mx" gathers some engine components, while "Soma" includes seven models made of cubic components in various relative positions. The "Balloons" category includes 7 mesh models of one or more spherical objects with various triangulations, while the "Cars", "Trees" and "Buildings" categories have been created upon an exclusively semantic basis.

3. THE 3D SHAPE SPECTRUM DESCRIPTOR

The 3D SSD aims at providing an intrinsic shape description of 3D mesh models, by exploiting some local geometrical attributes of the 3D surfaces.
The 3D SSD has been previously used for indexing and retrieval of 2D images36 and 3D range data13. The shape index, first introduced by Koenderink37, is defined as a function of the two principal curvatures. Let p be a point on a regular38 3D surface S and let $k_p^1$ and $k_p^2$ denote the principal curvatures38, 39 associated with point p. The shape index at point p, denoted by $I_p$, is defined as expressed in Equation 1:

$$I_p = \frac{1}{2} - \frac{1}{\pi} \arctan \frac{k_p^1 + k_p^2}{k_p^1 - k_p^2}, \quad \text{with } k_p^1 \geq k_p^2. \tag{1}$$

The shape index is a local geometrical attribute of a 3D surface, expressed as the angular coordinate of a polar representation of the principal curvature vector. The shape index ranges in the interval [0,1] and is not defined for planar surfaces. The shape index provides a scale for representing salient elementary shapes such as convex, concave, rut, ridge and saddle (Figure 2), and is invariant with respect to scale and Euclidean transforms.

Figure 2. Elementary shapes and their corresponding shape index (under brackets): a. Spherical cup (0.0), b. Rut (0.25), c. Minimal saddle (0.5), d. Ridge (0.75), e. Spherical cap (1.0).

Let us recall that the principal curvatures are defined as the eigenvalues of the Weingarten map (W) given by the following expression:

$$W = I^{-1} II, \tag{2}$$

where I and II denote respectively the first and the second fundamental differential forms.
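As an illustration of Equations 1 and 2, the following minimal sketch computes the principal curvatures from the two fundamental forms and then the shape index. The function names `principal_curvatures` and `shape_index` are hypothetical and do not come from the paper or from the MPEG-7 reference software; the eigenvalues of the 2x2 Weingarten map are obtained in closed form from its trace and determinant, and `atan2` is used so that the umbilic case (equal curvatures) is handled without a division by zero.

```python
import math


def principal_curvatures(first, second):
    """Eigenvalues (k1 >= k2) of W = I^-1 II for 2x2 symmetric forms.

    `first` is ((E, F), (F, G)) and `second` is ((L, M), (M, N)).
    """
    (E, F), (_, G) = first
    (L, M), (_, N) = second
    det_first = E * G - F * F
    # Half the trace and the determinant of W (mean and Gaussian curvature).
    mean = (G * L - 2.0 * F * M + E * N) / (2.0 * det_first)
    gauss = (L * N - M * M) / det_first
    delta = math.sqrt(max(mean * mean - gauss, 0.0))
    return mean + delta, mean - delta


def shape_index(k1, k2):
    """Shape index of Equation 1; None for a planar point (k1 = k2 = 0)."""
    if k1 < k2:
        k1, k2 = k2, k1  # enforce the convention k1 >= k2
    if k1 == 0.0 and k2 == 0.0:
        return None  # the shape index is not defined for planar surfaces
    # atan2(y, x) equals arctan(y / x) for x > 0 and gives +/- pi/2 for x = 0.
    return 0.5 - math.atan2(k1 + k2, k1 - k2) / math.pi


if __name__ == "__main__":
    k1, k2 = principal_curvatures(((1.0, 0.0), (0.0, 1.0)),
                                  ((2.0, 0.0), (0.0, 1.0)))
    print(k1, k2, shape_index(k1, k2))
```

With this sign convention, the curvature pairs (1, 1), (1, 0), (1, -1), (0, -1) and (-1, -1) map to the five reference values of Figure 2, and multiplying both curvatures by a positive factor leaves the result unchanged, which reflects the scale invariance stated above.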
Considering a Cartesian coordinate system and a Monge parametrization $S = (x, y, z) = (x, y, f(x, y))$ of the surface, with f of class $C^2$, the fundamental differential forms I and II can be expressed as symmetric matrices, I being moreover positive definite, and are given by the following equations:

$$I = \begin{pmatrix} \langle S_x, S_x \rangle & \langle S_x, S_y \rangle \\ \langle S_x, S_y \rangle & \langle S_y, S_y \rangle \end{pmatrix} = \begin{pmatrix} 1 + f_x^2 & f_x f_y \\ f_x f_y & 1 + f_y^2 \end{pmatrix} \quad \text{and} \quad II = \begin{pmatrix} \langle S_{xx}, N \rangle & \langle S_{xy}, N \rangle \\ \langle S_{xy}, N \rangle & \langle S_{yy}, N \rangle \end{pmatrix} = \frac{1}{\sqrt{1 + f_x^2 + f_y^2}} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}, \tag{3}$$

where N denotes the normal vector to the surface at the considered surface point p, and $S_{xy}$ is the standard Monge notation for the partial derivatives of the surface S with respect to variables x and y. Here, the notations used correspond to:

$$S_x = \frac{\partial S}{\partial x}, \quad S_y = \frac{\partial S}{\partial y}, \quad S_{xx} = \frac{\partial^2 S}{\partial x^2}, \quad S_{xy} = \frac{\partial^2 S}{\partial x \partial y}, \quad S_{yy} = \frac{\partial^2 S}{\partial y^2}. \tag{4}$$

The shape spectrum of the 3D mesh is defined as the distribution of the shape index calculated over the entire mesh. We note that the above definitions refer to surfaces in the continuous space $\Re^3$. The extraction of the 3D SSD from discrete, polyhedral 3D meshes is described in detail in the following section.

4. ESTIMATION OF THE 3D SHAPE SPECTRUM DESCRIPTOR

VRML data are initially intended for graphics purposes. Hence, the first step when applying differential geometry-based analysis is to transform such rough 3D data into useful geometrical surfaces.

4.1. 3D MESH FILTERING

Because the 3D SSD is derived from a surface-based analysis, the representation of a 3D object is a crucial issue. Indeed, the 3D SSD should not be sensitive to the mesh representation. When considering arbitrary mesh models, we have to deal with:
• non-orientable meshes,
• non-uniqueness of the representation,
• degenerated or duplicated polygons,
• presence of sharp features,
• irregularly sampled meshes.

Let us introduce several definitions, useful in the further developments.
Two different vertices that are successive (modulo cyclical permutations) within the sequence of vertices defining a face of the mesh constitute an edge. An edge belonging to a single face of the mesh is called a border edge. Border faces are defined as faces including at least one border edge. A non-border edge belongs to at least 2 different faces and is called an internal edge. Two faces are said to be E-neighbor faces if they share at least one common edge. A sub-set of mesh faces is connected if, between any two of its faces, a path of successive E-neighbor faces can be found. Hereafter, we will denote by connected components the maximal connected sub-sets that can be defined with the faces of the mesh.

In order to illustrate the concept of orientable mesh, let us consider the example in Figure 3. With each triangle of the mesh, we associate a normal vector with a direction given by the traversal order of the triangle vertices (Figure 3.a). A mesh is orientable if any two E-neighbor faces have consistent orientations. Figure 3.b shows a non-orientable mesh: the common edge (v1, v2) is traversed in the same direction in the two neighbor triangles. Figure 3.c illustrates the concept of orientable mesh: the common edge (v1, v2) is traversed in opposite directions in the two neighbor triangles. If a mesh has at least two E-neighbor faces with different orientations, the mesh is considered non-orientable. We note that an edge of an orientable surface will belong to at most two faces.

Figure 3. The concept of orientable mesh.

The issue of non-uniqueness of the representation (i.e. visually identical 3D meshes that correspond to completely different geometrical 3D surfaces) is illustrated in Figure 4.
Figure 4. A mesh and its different topological representations. Thicker line segments illustrate cuts. Face definitions:
(a) 1={v0, v1, v2}, 2={v1, v4, v3, v2}, 3={v6, v7, v3, v4}, 4={v7, v6, v5}
(b) 1={v0, v1, v2}, 2={v6, v4, v3, v7}, 3={v8, v9, v13, v12}, 4={v11, v10, v5}
(c) 1={v0, v1, v2}, 2={v2, v1, v4, v3}, 3={v6, v7, v3, v4}, 4={v9, v8, v5}
(d) 1={v0, v1, v2}, 2={v1, v4, v3, v2}, 3={v6, v7, v9, v8}, 4={v7, v6, v5}

Each of the orientable meshes shown in Figure 4 represents the same physical object, consisting of 4 faces (denoted by 1, 2, 3 and 4, and defined in Figure 4), with the same orientation. A VRML viewer will display them identically. Nevertheless, from a geometrical point of view, these meshes are completely different. The mesh (a) corresponds to a minimal representation, in the sense that it is composed of a single connected component. Meshes (b-d) are obtained from the mesh (a) by cutting it in several ways (a cut is the duplication of an edge and of its corresponding vertices). Thus, the mesh (b) is represented as a "puzzle" of completely non-connected polygons (the number of connected components of this mesh is equal to the number of polygons). Meshes (c) and (d) are made of two different connected components, defined by the faces (1, 2 and 3) and (4) for mesh (c), and faces (1, 2) and (3, 4) for mesh (d).

Analyzing such surfaces in terms of shape index (or more generally, in terms of local characteristics) will yield completely different results. In order to illustrate this issue, let us first define two different types of edges of interest within an orientable 3D mesh (Figure 5). An internal edge is said to be a rut edge (resp. a ridge edge) if the two faces containing it form an angle, within the union of the two half-spaces pointed to by the normal vectors, less (resp. greater) than π. Such rut and ridge edges, together with the associated faces, represent discrete versions of the surfaces presented in Figures 2.b and d, respectively.

Figure 5. Rut and ridge edges.

Considering the meshes in our example, mesh (a) includes two rut edges, (v1, v2) and (v6, v7), and a ridge edge (v3, v4). Mesh (b) contains exclusively border edges. Mesh (c) includes a rut edge (v1, v2) and a ridge edge (v3, v4). Finally, mesh (d) includes two rut edges, (v1, v2) and (v6, v7).

The possible presence of duplicated or degenerated polygons (polygons of zero area) also affects both the global (such as gravity center, axes of inertia, statistical moments) and local characteristics of the considered 3D surface.

This analysis shows the necessity of a pre-filtering stage aiming at bringing, whenever possible, 3D meshes to a minimal and regular topological representation, in terms of orientable surfaces, with a minimal number of connected components and without degenerated or duplicated polygons. In our work, a preliminary regularizing filtering40, 41 of the 3D mesh models has been applied. Related approaches for repairing 3D mesh models to form manifold surfaces and solid regions are reported in 42, 43, 44. However, none of the existing approaches guarantees the correctness of the resulting surfaces in terms of surface orientation consistency of all the connected components. How can we obtain models in which the normal vectors of each connected component point towards the exterior of the surface? We solved this issue as follows: first, 3D meshes have been visually inspected and the connected components with reversed orientation have been identified. Then, the traversal order of each face within the previously determined connected components has been reversed. This semi-automatic procedure efficiently recovers 3D models exhibiting such orientation degeneracies.
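The definitions used in this section (edges, E-neighbor faces, connected components, consistent orientation) can be checked mechanically on a face list. The sketch below is only an illustration under our own assumptions, not the regularizing filter of 40, 41: the helper names are hypothetical, faces are given as sequences of vertex indices, and the test data are the face definitions of meshes (a), (b) and (d) as read from the legend of Figure 4.

```python
from collections import defaultdict


def directed_edges(face):
    """Yield the directed edges (u, v) of a face, following its traversal order."""
    n = len(face)
    for i in range(n):
        yield face[i], face[(i + 1) % n]


def edge_to_faces(faces):
    """Map each undirected edge to the (face index, directed edge) pairs using it."""
    table = defaultdict(list)
    for index, face in enumerate(faces):
        for u, v in directed_edges(face):
            table[frozenset((u, v))].append((index, (u, v)))
    return table


def is_consistently_oriented(faces):
    """True if no edge has more than two faces and every edge shared by two
    faces is traversed in opposite directions (the criterion of Figure 3)."""
    for uses in edge_to_faces(faces).values():
        if len(uses) > 2:
            return False
        if len(uses) == 2 and uses[0][1] == uses[1][1]:
            return False
    return True


def count_connected_components(faces):
    """Number of maximal sets of faces linked by paths of E-neighbor faces."""
    parent = list(range(len(faces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for uses in edge_to_faces(faces).values():
        for index, _ in uses[1:]:
            parent[find(index)] = find(uses[0][0])
    return len({find(i) for i in range(len(faces))})


# Face definitions read from the legend of Figure 4 (vertex vN written as N).
MESH_A = [(0, 1, 2), (1, 4, 3, 2), (6, 7, 3, 4), (7, 6, 5)]
MESH_B = [(0, 1, 2), (6, 4, 3, 7), (8, 9, 13, 12), (11, 10, 5)]
MESH_D = [(0, 1, 2), (1, 4, 3, 2), (6, 7, 9, 8), (7, 6, 5)]

if __name__ == "__main__":
    for name, mesh in (("a", MESH_A), ("b", MESH_B), ("d", MESH_D)):
        print(name, count_connected_components(mesh),
              is_consistently_oriented(mesh))
```

Such a check only detects inconsistencies between E-neighbor faces; it cannot tell whether a whole connected component points inwards, which is why the visual inspection step described above is still needed.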
Concerning the irregularly sampled meshes, for a given 3D model, the distributions of face area and mean elongation ratio may vary widely (Figure 6.a). In addition, such a 3D mesh shows sharp features, violating the smoothness hypothesis necessary when considering surface derivatives.

Figure 6. Initial model "a00" (a) and its refined versions after one (b) and two (c) subdivision levels.

In order to obtain smoother representations of such 3D meshes, a pre-processing step consisting of a mesh subdivision algorithm based on Loop's subdivision scheme45 is applied. The subdivision algorithm consists of two successive steps: (1) mesh re-sampling by mid-edge vertex insertion (Figure 7), and (2) low pass filtering of the vertex coordinates. By choosing appropriate filter coefficients, the limit surface can be shown to be tangent plane smooth46. In addition, such a subdivision scheme has the advantage of drastically reducing the relative area of the border faces. Let us note that such a scheme requires a preliminary triangulation of the mesh models.

Figure 7. Mesh re-sampling by mid-edge insertion.

Figures 6.b and c illustrate the effect of applying such a subdivision scheme to model "a00".

4.2. ESTIMATION OF THE PRINCIPAL CURVATURES

Estimating the principal curvatures is the key step of the 3D SSD extraction. Indeed, the performance of the 3D SSD strongly depends on the accuracy of the estimates and on the robustness of the estimation technique with respect to the triangulation of the 3D model. Our approach is based upon a second degree polynomial surface fitting, and the principal curvatures are associated with each face of the mesh.

First, the mean normal vector to face $f_i$ of the mesh, denoted by $\tilde{N}_{f_i}$, is defined as the weighted average of the normal vectors of the 0-adjacent faces of the considered face $f_i$ (Equation 5). Two faces of the mesh are said to be 0-adjacent if and only if they share a common vertex.
$$\tilde{N}_{f_i} = \frac{\sum_{f_k \in F_0\{f_i\}} w_k N_{f_k}}{\left\| \sum_{f_k \in F_0\{f_i\}} w_k N_{f_k} \right\|}. \tag{5}$$

Here, $F_0\{f_i\}$ denotes the set of 0-adjacent faces of $f_i$, $N_{f_k}$ denotes the normal vector to face $f_k$, $w_k$ is the weighting coefficient associated with the 0-adjacent face $f_k$, and $\| \cdot \|$ the L2 norm of a vector in $\Re^3$. Each weighting coefficient $w_k$ is set equal to the area of the face $f_k$, in order to take into account some irregularities in the mesh face distribution.

A local Cartesian coordinate system (x, y, z) is defined such that its origin coincides with the gravity center of the face $f_i$ and the z-axis is associated with the previously defined mean normal vector $\tilde{N}_{f_i}$. Let $\{(x_i, y_i, z_i)\}_{i=1}^{N}$ denote the cloud of points made of the centroids of the considered face $f_i$ and its 0-adjacent faces, with coordinates expressed in the local coordinate system.

The parametric surface approximation is achieved by fitting a quadric surface through the set of points $\{(x_i, y_i, z_i)\}_{i=1}^{N}$. Let $S = (x, y, z) = (x, y, f_a(x, y))$ denote the second degree polynomial surface, expressed as:

$$f_a(x, y) = a_0 x^2 + a_1 y^2 + a_2 xy + a_3 x + a_4 y + a_5, \tag{6}$$

where the $a_i$'s are real coefficients. By denoting $a = (a_0 \; a_1 \; a_2 \; a_3 \; a_4 \; a_5)^t$ and $b(x, y) = (x^2 \; y^2 \; xy \; x \; y \; 1)^t$, Equation 6 can be rewritten more compactly by using standard matrix notations:

$$f_a(x, y) = a^t b(x, y). \tag{7}$$

The parameter vector $a = (a_0 \; a_1 \; a_2 \; a_3 \; a_4 \; a_5)^t$ is determined by applying a least square fit procedure. Given the data points $\{(x_i, y_i, z_i)\}_{i=1}^{N}$, with associated weights $\{w_i\}_{i=1}^{N}$, the parameter vector $\hat{a}$ corresponding to the optimal (in the weighted mean square error sense) fit is expressed as:

$$\hat{a} = \left( \sum_{i=1}^{N} w_i \, b(x_i, y_i) \, b^t(x_i, y_i) \right)^{-1} \sum_{i=1}^{N} w_i z_i \, b(x_i, y_i) = \arg\min_{a \in \Re^6} \sum_{i=1}^{N} w_i \left( z_i - f_a(x_i, y_i) \right)^2. \tag{8}$$

Note that the representation of the gravity centers $\{(x_i, y_i, z_i)\}_{i=1}^{N}$ in local coordinates guarantees the invariance of the approximation with respect to Euclidean transforms.

Once the parametric approximation is available, the principal curvatures $k_1$ and $k_2$ can be easily determined as the eigenvalues of the Weingarten map W, since the first and second differential forms I and II at (x, y) = (0, 0) take simple expressions, given in Equation 9.

$$I = \begin{pmatrix} 1 + a_3^2 & a_3 a_4 \\ a_3 a_4 & 1 + a_4^2 \end{pmatrix}, \quad II = \frac{1}{\sqrt{1 + a_3^2 + a_4^2}} \begin{pmatrix} 2 a_0 & a_2 \\ a_2 & 2 a_1 \end{pmatrix}. \tag{9}$$

We finally note that it is also possible to consider the set $F_n\{f_i\}$ of the n-adjacent faces (n>0). Such a set is recursively defined from $F_0\{f_i\}$ by considering successively wider neighborhoods. In this case, the computational complexity increases proportionally to the number of n-adjacent faces without providing a significant improvement of the final estimate of the principal curvatures.

4.3. 3D SSD COMPUTATION

The 3D SSD of a 3D mesh M is defined as the histogram of the shape index values, calculated over the entire mesh. The histogram has $N_{bins}$ bins, uniformly quantizing the range of the 3D shape index values (the [0, 1] interval) into a set of $N_{bins}$ intervals $\{\Delta_k\}_{k=1}^{N_{bins}}$, where

$$\Delta_k = \left[ \frac{k-1}{N_{bins}}, \frac{k}{N_{bins}} \right) \text{ for } k = 1, \ldots, N_{bins} - 1, \quad \text{and} \quad \Delta_{N_{bins}} = \left[ \frac{N_{bins} - 1}{N_{bins}}, 1 \right].$$

The relative area (with respect to the total area of the mesh) of each face of the mesh whose shape index belongs to the interval $\Delta_k$ is added to the kth component of the 3D SSD.

Two additional components, denoted by PlanarSurface and SingularSurfaces, and representing the relative surfaces of planar and border patches, respectively, are introduced in the 3D SSD. The planar surfaces must be separately considered since the 3D shape index is not defined for planar patches. Let $k_a$ denote the L2 norm of the curvature vector $(k_1, k_2)$: $k_a = \sqrt{k_1^2 + k_2^2}$.
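Returning to the estimation step of Section 4.2, the weighted least-squares fit of Equation 8 and the evaluation of the curvatures at the origin can be sketched as follows. This is a minimal illustration under our own assumptions rather than the implementation evaluated in the Core Experiments: the names `fit_quadric` and `curvatures_at_origin` are hypothetical, the 6x6 normal equations are solved by plain Gaussian elimination, and the second derivatives of $f_a$ at the origin are taken as $2a_0$, $a_2$ and $2a_1$.

```python
import math


def basis(x, y):
    """Monomial vector b(x, y) of Equation 7."""
    return [x * x, y * y, x * y, x, y, 1.0]


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with
    partial pivoting. Raises ZeroDivisionError if the system is singular,
    e.g. with fewer than 6 points or points in a degenerate configuration."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_quadric(points, weights):
    """Weighted least-squares estimate of (a0, ..., a5) from local (x, y, z)."""
    normal = [[0.0] * 6 for _ in range(6)]
    rhs = [0.0] * 6
    for (x, y, z), w in zip(points, weights):
        b = basis(x, y)
        for i in range(6):
            rhs[i] += w * z * b[i]
            for j in range(6):
                normal[i][j] += w * b[i] * b[j]
    return solve(normal, rhs)


def curvatures_at_origin(a):
    """Principal curvatures (k1 >= k2) of z = f_a(x, y) at (x, y) = (0, 0)."""
    a0, a1, a2, a3, a4, _ = a
    e, f, g = 1.0 + a3 * a3, a3 * a4, 1.0 + a4 * a4      # first form
    s = math.sqrt(1.0 + a3 * a3 + a4 * a4)
    l, m, n = 2.0 * a0 / s, a2 / s, 2.0 * a1 / s          # second form
    det_first = e * g - f * f
    mean = (g * l - 2.0 * f * m + e * n) / (2.0 * det_first)
    gauss = (l * n - m * m) / det_first
    delta = math.sqrt(max(mean * mean - gauss, 0.0))
    return mean + delta, mean - delta


if __name__ == "__main__":
    # Synthetic centroids sampled from z = 0.5 x^2 + 0.25 y^2.
    pts = [(x, y, 0.5 * x * x + 0.25 * y * y)
           for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
    print(curvatures_at_origin(fit_quadric(pts, [1.0] * len(pts))))
```

The fit needs at least six points in a non-degenerate configuration, which is one reason why faces with too few adjacent faces have to be handled separately.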
We define the degree of curvedness of the surface as $C = \mathrm{Area}(F_0\{f_i\}) \cdot k_a$. Each face with curvedness C less than a threshold T will be considered as part of a planar patch and its relative area will increment the PlanarSurface component of the 3D SSD.

Unreliable estimates of the principal curvatures may occur in the case of border faces with a number of 0-adjacent faces smaller than a pre-defined number $N_{min}$. Hence, the border faces are not taken into account when computing the shape index and their relative area is stored into the SingularSurfaces component.

5. EXPERIMENTAL RESULTS

The 3D SSD has been applied to the 3D test data previously described (Section 2). For 3D shape-based similarity retrieval, we used L1 or L2-based distances as similarity measures between the 3D SSDs associated with the meshes under study. Such similarity measures may or may not take into account the planar surfaces or the singular surfaces, depending on each targeted application. In our experiments, the singular and planar histogram bins have not been taken into account, and the 3D SSD has been re-normalized such that $\sum_{i=1}^{N_{bins}} SSD(i) = 1$.

Figures 8 and 9 show some examples of similarity retrieval corresponding to queries from the "Cars" (8.a), "Aerodynamic" (8.b), "4-Limbs" (9.a and 9.b) and "Trees" (9.c) categories. The first 6 retrieved results are presented in decreasing order of similarity, from left to right and from top to bottom. The query is presented in the upper left cell.

Figure 8. Similarity retrieval results for queries within the "Cars" (a) and "Aerodynamic" (b) categories.

Figure 9. Similarity retrieval results for queries within the "4-Limbs" (a, b) and "Trees" (c) categories.
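The histogram construction of Section 4.3 and the L1 comparison used in these experiments can be sketched as follows. This is a simplified illustration under our own assumptions: the function names are hypothetical, faces classified as planar or singular are simply passed with a shape index of `None`, and, as in the experiments reported here, only the regular bins are kept and re-normalized to sum to 1.

```python
def shape_spectrum(face_areas, shape_indices, n_bins=100):
    """Area-weighted histogram of shape index values over [0, 1].

    shape_indices[i] is None for faces left out of the histogram
    (planar patches or border faces). The result sums to 1 whenever
    at least one face contributes.
    """
    histogram = [0.0] * n_bins
    for area, index in zip(face_areas, shape_indices):
        if index is None:
            continue
        # Bin k covers [k / n_bins, (k + 1) / n_bins); the last bin also
        # contains the value 1.
        k = min(int(index * n_bins), n_bins - 1)
        histogram[k] += area
    total = sum(histogram)
    if total == 0.0:
        return histogram
    return [value / total for value in histogram]


def l1_distance(first, second):
    """L1 distance between two descriptors with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(first, second))


if __name__ == "__main__":
    query = shape_spectrum([1.0, 1.0, 2.0], [0.0, 1.0, None], n_bins=4)
    other = shape_spectrum([3.0], [0.1], n_bins=4)
    print(query, other, l1_distance(query, other))
```

Ranking a database then amounts to sorting its descriptors by increasing distance to the query descriptor.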
Figures 9.a and b show that good results are obtained for articulated objects in the "4-Limbs" category, where elements in various postures are retrieved (the humanoid with arms and legs spread out in the second row of Figure 9.a, the sitting dog in Figure 9.b).

For objective evaluation of the proposed 3D SSD in terms of retrieval performance, we compute the Bull-Eye Percentage (BEP) score. The BEP is defined for each query as the percentage of correct matches (i.e. items within the same category as the query) among the 2Q top retrieved results, where Q denotes the size of the considered query category. After computing the BEP scores for each categorized item, an average BEP is computed for each category. BEP scores for each category are presented in Table 2 (here, two levels of mesh subdivision have been applied prior to the computation of the 3D SSD).

Category        Q    BEP Score (%)
"4 LIMBS"       31   73
CARS            17   66
AERODYNAMIC     36   71
TREES           21   56
MISSILES        10   87
BALLOONS        7    95
BUILDINGS       10   39
SOMA            7    100
E1_Mx           9    100
FINGER          30   100
LETTER A        10   90
LETTER B        10   100
LETTER C        10   100
LETTER D        10   100
LETTER E        10   100
Global mean BEP score: 85%

Table 2. Similarity retrieval rates. Here, Q denotes the number of items within each category.

The global mean score of 85% demonstrates the good discriminatory properties of the 3D SSD. A more detailed study of the results shows excellent BEP scores, between 90% and 100%, for shape property-based categories. Good BEP scores (between 70% and 90%) are obtained for categories that mix shape properties with semantic concepts ("Missiles", "4-Limbs" and "Aerodynamic" categories), while lower performance is reported for categories mainly based on semantic concepts, such as the "Buildings" or "Trees" categories. Robustness with respect to mesh triangulation is demonstrated by the high retrieval rates achieved on the "Letters" corpus.
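The BEP score defined above can be computed from a ranked list of category labels. The sketch below is an illustration only: the function names are hypothetical, and it assumes that the number of correct matches among the top 2Q results is divided by Q, which is consistent with the 100% scores of Table 2 but is not spelled out in the definition.

```python
def bull_eye_percentage(query_category, ranked_categories, category_size):
    """BEP of one query.

    ranked_categories lists the category of each retrieved item, most
    similar first; category_size is Q, the size of the query's category.
    Assumption: hits among the top 2Q results are divided by Q.
    """
    top = ranked_categories[: 2 * category_size]
    hits = sum(1 for category in top if category == query_category)
    return 100.0 * hits / category_size


def mean_bep(scores):
    """Average of the per-query BEP scores of one category."""
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy ranking for a query of a category "A" containing Q = 2 items.
    print(bull_eye_percentage("A", ["A", "B", "A", "C", "A"], 2))
```

Whether the query itself appears in the ranked list depends on how the retrieval is run; the sketch leaves that choice to the caller.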
We note that applying the mesh subdivision technique described in Section 4 (with two levels of subdivision) increased the global mean BEP score by 10% with respect to the score obtained without preliminary subdivision.

In order to reduce the complexity of the 3D SSD as much as possible, we study how quantizing the 3D SSD and decreasing the number of histogram bins of the representation affect the similarity retrieval performance. The 3D SSD values lie within the [0, 1] interval. We perform a uniform quantization of the [0, 1] interval with a number of quantization steps $N = 2^b$, where b denotes the number of bits of the representation. The retrieval results are listed in Table 3 for floating point precision (FPP) and for coarser representations, corresponding to values of b varying from 12 bits down to 7 bits.

Precision    Global mean BEP score (%)
FPP          85
b=12         84
b=11         84
b=10         84
b=9          84
b=8          83
b=7          82

Table 3. Similarity retrieval rates with respect to the number of quantization bits.

Experimental results show that quantizing the values on 12 bits yields hit rates approaching those of the floating point representation. More compact representations, down to 8 or 7 bits, are also possible with a degradation of at most 3% in the mean retrieval rate.

In order to evaluate the retrieval rate degradation with respect to the number of histogram bins (Nbins) used for the 3D SSD representation, we have computed the hit rates corresponding to 4 different numbers of bins (100, 50, 25 and 10). Results are presented in Table 4. Here, the number of quantization bits of the descriptor values is b=12.

Nbins    Global mean BEP score (%)
100      84
50       84
25       83
10       80

Table 4. Similarity retrieval rates with respect to the number of histogram bins for b=12.

Scores listed in Table 4 show that good results are obtained over a range varying from 100 down to 10 histogram bins, with a degradation of the mean retrieval rate of only 5%.
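The two compaction steps studied above can be sketched as follows. This is an illustration under assumptions, not the procedure used in the experiments: the text specifies a uniform quantization of [0, 1] with $2^b$ steps but not the rounding rule, and it does not say how the number of bins is reduced. Here values are rounded to the nearest of $2^b$ integer codes, and bins are reduced by summing groups of adjacent bins. The names `quantize`, `dequantize` and `merge_bins` are hypothetical.

```python
def quantize(ssd, bits):
    """Map values in [0, 1] to integer codes in 0 .. 2**bits - 1."""
    top = (1 << bits) - 1
    # Clamp so that slightly out-of-range inputs still yield a valid code.
    return [min(top, max(0, round(value * top))) for value in ssd]


def dequantize(codes, bits):
    """Map integer codes back to approximate values in [0, 1]."""
    top = (1 << bits) - 1
    return [code / top for code in codes]


def merge_bins(ssd, n_bins):
    """Reduce the histogram to n_bins by summing groups of adjacent bins."""
    if n_bins <= 0 or len(ssd) % n_bins != 0:
        raise ValueError("n_bins must be positive and divide the number of bins")
    step = len(ssd) // n_bins
    return [sum(ssd[i:i + step]) for i in range(0, len(ssd), step)]
```

Under this scheme a descriptor with Nbins bins coded on b bits each occupies Nbins × b bits, and the round-trip error per bin is at most half a quantization step.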
Thus, a compact representation of the 3D shape spectrum descriptor can be used while keeping retrieval rates close to those of the full-precision representation. The locality of the descriptor is a useful feature for characterizing local shape information, but could raise some scalability problems when dealing with large amounts of 3D data. However, the compactness achieved by quantizing the descriptor values and decreasing the number of histogram bins makes it possible to foresee future extensions aiming at embedding the 3D SSD into more global representations (e.g. within a histogram refinement framework).

6. CONCLUSION

This paper presents a surface-based approach for indexing and similarity retrieval of 3D content represented by 3D mesh models, based upon the characterization of the local shape of a 3D geometric surface. After analyzing the representation and sampling problems that may occur when dealing with discrete 3D surfaces, solutions are proposed and a complete method for extracting the 3D SSD from polygonal meshes is discussed in detail. The descriptor is applied to a 3D data set donated to the MPEG-7 community for search and retrieval experiments, and the results obtained are reported and discussed. The objective evaluation of the descriptor, based upon a ground truth of 15 categories including 228 3D meshes, shows a mean Bull-Eye retrieval rate of 85%. The analysis of the effects of quantization and of the number of histogram bins of the representation shows that the proposed approach provides a very compact descriptor with a minimum size of 100 bits per 3D mesh model, allowing fast browsing and search of 3D model databases. Future work will address the issue of combining such a simple local descriptor with some global representation schemes.

7. REFERENCES

1. International standard "The Virtual Reality Modeling Language", ISO/IEC 14772-1, 1997.
2. L. Zusne, "Contemporary theory of visual form perception: III. The global theories", Academic Press, 1970.
3. D. O. Hebb, "The organization of behavior", John Wiley, 1949.
4. J. Koenderink and A. Van Doorn, "Dynamic Shape", Biological Cybernetics, Vol. 53, pp. 383-396, 1986.
5. D. Marr, "Early processing of visual information", Proc. of the Royal Society of London, B275, pp. 483-519, 1976.
6. D. Marr, "Vision", Freeman, San Francisco, 1982.
7. B.K.P. Horn, "Extended Gaussian Image", Proc. of the IEEE, Vol. 72, pp. 1671-1686, 1984.
8. S.B. Kang, K. Ikeuchi, "The complex EGI: a new representation for 3D pose determination", IEEE Trans. on PAMI, Vol. 16, No. 3, pp. 249-258, 1994.
9. H. Matsuo, A. Iwata, "3D Object Recognition using MEGI model from range data", Proc. 12th International Conference on Pattern Recognition, pp. 843-846, 1994.
10. Y. Li, R. J. Woodham, "Orientation-based representations of 3D shape", Proc. IEEE Conference on Computer Vision and Pattern Recognition, pp. 182-187, June 1994.
11. V.S. Nalwa, "Representing oriented piecewise C2 surfaces", International Journal of Computer Vision, Vol. 3, pp. 131-153, 1989.
12. P. Liang, C. H. Taubes, "Orientation-based differential geometric representations for computer vision applications", IEEE Trans. on PAMI, Vol. 16, No. 7, pp. 707-721, 1993.
13. C. Dorai, A.K. Jain, "Shape Spectrum-Based View Grouping and matching of 3D Free-Form Objects", IEEE Trans. on PAMI, Vol. 19, No. 10, pp. 1139-1145, 1997.
14. C. Dorai, A.K. Jain, "Shape-Spectra Based View Grouping for Free-Form Objects", Int'l. Conf. Image Processing '95, Washington, D.C., pp. 340-343, Oct. 1995.
15. K. Wu, M.D. Levine, "Recovering of parametric geons from multiview range data", IEEE Conference on Computer Vision and Pattern Recognition, 1994.
16. C. Nastar, B. Moghaddam, A. Pentland, "Generalized image matching: statistical learning of physically-based deformations", M.I.T. Technical Report No. 368, 1996.
17. S. Sclaroff, A. Pentland, "Physically-based combinations of views: Representing rigid and nonrigid motion", Work. Motion of Non-rigid & Articulated Objects, pp. 158-164, Austin, 1994.
18. G. Borgefors, G. Ramella, G. Sanniti di Baja, S. Svensson, "On the multiscale representation of 2D and 3D shapes", Graphical Models and Image Processing 61, pp. 44-62, 1999.
19. G. Borgefors, I. Nyström, "Efficient shape representation by minimizing the set of centres of maximal discs/spheres", Pattern Recognition Letters, Vol. 18, pp. 465-472, 1997.
20. T. Murao, "Descriptors of polyhedral data for 3D-shape similarity search", Proposal P177, MPEG-7 Proposal Evaluation Meeting, Lancaster, UK, February 1999.
21. E. Paquet, M. Rioux, A. Murching, T. Naveen, A. Tabatabai, "Description of shape information for 2-D and 3-D objects", Signal Processing: Image Communications, Vol. 16, pp. 103-122, 2000.
22. International standard ISO/IEC 14496, "Information technology - Coding of audio-visual objects".
23. "Overview of the MPEG-7 Standard", ISO/IEC JTC1/SC29/N3752, October 2000, La Baule, France.
24. "MPEG-7 Requirements Document v. 12", ISO/IEC JTC1/SC29/N3548, July 2000, Beijing, China.
25. "MPEG-7 Applications, Demos and Projects", ISO/IEC JTC1/SC29/N3546, July 2000, Beijing, China.
26. T. Zaharia, F. Preteux, M. Preda, "3D Shape spectrum descriptor", ISO/IEC JTC1/SC29/WG11, MPEG99/M5242, Melbourne, Australia, October 1999.
27. T. Zaharia, F. Preteux, "3D Shape descriptors: Results and performance evaluation", ISO/IEC JTC1/SC29/WG11, MPEG99/M5592, Maui, December 1999.
28. T. Zaharia, F. Preteux, "Crosscheck of the 3D shape spectrum descriptor", ISO/IEC JTC1/SC29/WG11, MPEG00/M5917, Noordwijkerhout, The Netherlands, March 2000.
29. T. Zaharia, F. Preteux, "3D Shape Core Experiment: The influence of mesh representation", ISO/IEC JTC1/SC29/WG11, MPEG00/M6103, Geneva, June 2000.
30. T. Zaharia, F. Preteux, "3D Shape Core Experiment: Semantic versus geometric categorization of 3D mesh models", ISO/IEC JTC1/SC29/WG11, MPEG00/M6104, Geneva, Switzerland, June 2000.
31. T. Zaharia, F. Preteux, "The influence of the quantization step on the 3D shape spectrum descriptor performances", ISO/IEC JTC1/SC29/WG11, MPEG00/M6316, Beijing, China, July 2000.
32. http://www.viewpoint.com.
33. Frank Bossen (Editor), "Description of Core Experiments on 3D model coding", ISO/IEC JTC1/SC29/WG11, M4312, December 1998, Rome.
34. T. Murao, E. Paquet, "A Report of Results in 3D Shape Descriptor Core Experiment Stage One", ISO/IEC JTC1/SC29/WG11, MPEG99/M5021, Melbourne, Australia, October 1999.
35. http://www.3dcafe.com.
36. C. Nastar, "The Image Shape Spectrum for Image Retrieval", INRIA Technical Report RR-3206, July 1997.
37. J. Koenderink, "Solid shape", The MIT Press, Cambridge, Massachusetts, 1990.
38. M. Do Carmo, "Differential geometry of curves and surfaces", Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1976.
39. M. Spivak, "A comprehensive introduction to differential geometry", 2nd ed., Houston: Publish or Perish, 1979.
40. P. Hoogvorst, "Filtering of the 3D mesh models", ISO/IEC JTC1/SC29/WG11, MPEG00/M6101, Geneva, June 2000.
41. A. Gueziec, G. Taubin, F. Lazarus, W. Horn, "Converting Sets of Polygons to Manifold Surfaces by Cutting and Stitching", IEEE Visualization'98, 1998.
42. T.M. Murali, Thomas A. Funkhouser, "Consistent solid and boundary representations", Computer Graphics (1997 SIGGRAPH Symposium on Interactive 3D Graphics), pp. 155-162, March 1997.
43. G. Butlin, C. Stops, "CAD data repair", Proc. of the 5th Int. Meshing Roundtable, October 1996.
44. G. Barequet, S. Kumar, "Repairing CAD models", Proc. of IEEE Visualization'97, October 1997.
45. C. T. Loop, "Smooth subdivision surfaces based on triangles", Master's Thesis, Dep. of Mathematics, Univ. of Utah, August 1987.
46. M. Lounsberry, T. DeRose, J. Warren, "Multiresolution analysis for surfaces of arbitrary topological type", Technical Report No. 93-10-05b, Dept. of CS & Eng., Univ. of Washington, January 1994.