3D Shape-based retrieval within the MPEG-7 framework
Titus Zaharia and Françoise Prêteux
ARTEMIS Project Unit
Institut National des Télécommunications,
9, rue Charles Fourier, 91011 Evry Cedex
email: [email protected], [email protected]
ABSTRACT
Because of the continuous development of multimedia technologies, virtual worlds and augmented reality, 3D content
is becoming a common feature of today's information systems. Hence, standardizing tools for content-based indexing of visual
data is a key issue for computer vision related applications. Within the framework of the future MPEG-7 standard, tools for
intelligent content-based access to 3D information, targeting applications such as search and retrieval and browsing of 3D
model databases, have recently been considered and evaluated.
In this paper, we present the 3D Shape Spectrum Descriptor (3D SSD), recently adopted within the current MPEG-7
Committee Draft (CD). The proposed descriptor aims at providing an intrinsic shape description of a 3D mesh and is defined
as the distribution of the shape index over the entire mesh. The shape index is a local geometric attribute of a 3D surface,
expressed as the angular coordinate of a polar representation of the principal curvature vector.
Experiments have been carried out on the MPEG-7 3D model database, which consists of about 1300 meshes in VRML
2.0 format. Objective retrieval results, based upon the definition of a ground truth subset, are reported in terms of the Bull-Eye
Percentage (BEP) score.
Keywords: 3D shape-based retrieval, MPEG-7, similarity measures, 3D meshes, VRML, principal curvatures, shape
spectrum, subdivision schemes.
1. INTRODUCTION
The continuous development of multimedia software and hardware technologies leads to an increasing interest in the use of
3D content. Therefore, methods for efficiently representing, coding and accessing such 3D data have become a challenging issue
for today's multimedia technologies. Most of the time, 3D data are represented by polygonal meshes (Figure 1), and the
Virtual Reality Modeling Language (VRML) has become one of the most commonly used standards for representing such 3D
meshes [1]. Basically, a 3D mesh is defined by a set of vertices and a set of faces. The position of a vertex is given by its
coordinates in 3D space, expressed in a Cartesian coordinate system. A face is defined as an ordered sequence of vertex indices.
The geometry of the mesh refers to the positions of the vertices, while the mesh connectivity refers to the relationships between
vertices, edges and faces of the mesh. The mesh may also include some photometric information, such as color, texture and
normal vectors.
The present work proposes a descriptor for shape-based indexing and retrieval of 3D mesh models and therefore exploits
exclusively the geometry and connectivity information.
The concept of shape has been considered within various visual perception theories aiming at understanding the
way that humans perceive visual shape. Gestalt psychology [2] focuses on the study of 2D projections of the 3D world. In
his work on the neuropsychological theory of behavior, Hebb [3] claims that form is not perceived as a whole but consists of
parts. Moreover, the author points out that the relative spatial relations between parts must be learned in order to ensure
successful recognition. Among recent developments more closely related to computer vision applications, Koenderink and
van Doorn [4] propose a hierarchical approach for describing the evolution of shapes through multiple resolutions. Significant
contributions related to modules of the human visual system have been made in Marr's work [5, 6]. Such theories set the general
framework and useful criteria for computer-vision object recognition applications. Nevertheless, there is no unique definition
of the shape concept that captures a common and intuitive meaning including geometry, connectivity, locality and
globality properties.
The notion of shape is involved in the 3D object recognition schemes reported in the computer vision literature. Let us
first mention the global representations based on the distribution of normal vectors on the unit sphere, such as extended
Gaussian images (EGI) [7] or other orientation information-based descriptors [8, 9, 10, 11, 12]. Such approaches require an
alignment procedure during the matching stage, which increases the computational complexity of the matching process.
Surface-based representations are also reported. Dorai and Jain [13, 14] propose to model surface elements as maximal patches of
constant shape index for representing 3D objects acquired from range data. The authors recommend a graph
matching-based approach for recognizing such objects. Identification of individual volumetric primitives derived from
superquadric equations, called parametric geons, is reported in [15]. Such geons correspond to object parts that may be
approximated by distinct elementary shapes, such as ellipsoids, cuboids and cylinders (that may be additionally tapered or
curved). Nastar et al. [16] use deformable mesh manifolds for face-recognition purposes. Face images are represented as 3D
surfaces and principal component analysis is used for learning a set of representative deformations within the class of
face objects. Sclaroff and Pentland [17] also use physically-based modeling techniques and modal matching, and measure
shape similarity in terms of forces and strain, within a space of "extremal shapes". Volumetric approaches to 3D
object recognition include multiresolution binary pyramids [18], sets of centres of maximal spheres and distance
transformations [19].
Figure 1. Examples of 3D meshes (wireframe representation), panels (a) to (h).
When the shape concept is considered within similarity retrieval applications, descriptors should have a reasonable size and
should support simple similarity measures, enabling effective search and browsing of very large databases. Thus, recognition
methods involving search in high-dimensional spaces or iterative matching and registration techniques become prohibitively
expensive in terms of retrieval time for such applications. Previous work reported in the literature in the area of shape-based
retrieval of 3D mesh models mainly concerns global shape representations. In [20], spherical moment functions are
considered. In [21], several 3D shape descriptors are proposed, including bounding boxes, distributions of orientations and
lengths of cord vectors connecting the gravity center of the object to the gravity center of each model face, statistical
moments and a 3D wavelet-based representation. The authors deal with the issue of invariance to 3D rotations by representing
the 3D object within its proper coordinate system, defined by the three eigenvectors of the tensor of inertia, with labels x, y
and z set according to the decreasing order of the corresponding eigenvalues. Invariance to scale is then achieved by
normalizing the 3D bounding box with axes parallel to the principal axes of inertia. Such solutions to the issue of rotation and
scale invariance have several limitations: (1) labeling the principal axes according to the decreasing order of the associated
eigenvalues does not always guarantee optimal spatial correspondences (for example, imagine a tall
glass and a flat cup on a horizontal table: in the case of the glass, the maximal eigenvalue corresponds to the vertical
direction, while for the cup the maximal eigenvalue corresponds to a horizontal one; by defining the coordinate system using
the above-mentioned approach, any further attempt to find similarities between the two objects would fail); (2) normalizing
the objects to a unit-sized bounding box is highly sensitive to local deformations; (3) the selection of the coordinate system
raises an additional mirroring ambiguity that is not straightforward to handle.
We note that computer vision applications, such as object recognition, generally use low-level descriptors (features) without
any related semantics. Humans, on the contrary, often associate shape with semantic concepts. To illustrate this
point, let us consider the concept of humanoid. This concept covers an entire class of geometrical shapes with a high
variability. In addition, an element of this class may have several appearances, since we are dealing here with an
articulated object: a sitting man and a standing man are both humanoids. Are their associated geometrical shapes really
similar? The answer depends on whether or not the relative positions of the components of the articulated object are taken
into account. Humans have the ability to decide whether this information is relevant or not, depending on the specific purpose.
Thus, a human can recognize other humans without any difficulty, regardless of their posture, and can also respond to specific
tasks such as gesture recognition, where the interpretation of relative position information is essential. Computers do not have any
mechanism for making this decision. Consequently, the specific objectives of such systems should be clearly specified from the
beginning in order to adopt appropriate technical solutions. Reconsidering our example from the point of view of a computer
vision system, local descriptors seem better suited to recognizing articulated objects in different postures, while for gesture
recognition applications, global representations are more appropriate.
In this paper, our objective is to provide a descriptor for locally characterizing free-form surfaces represented as discrete
polygonal 3D meshes. The surface-based approach is chosen for the sake of generality, since 3D meshes may include
open surfaces that have no associated volume. The requirements to be fulfilled are invariance with respect
to scale and Euclidean transforms, robustness with respect to different triangulations of the same object, and the ability to
successfully retrieve articulated objects in different postures.
Considering the key issue of standardizing tools for efficiently representing 3D content, the Moving Picture Experts Group
(MPEG) first addressed the issue of 3D mesh coding (3DMC). The recent MPEG-4 standard [22] includes technology for both
single-resolution and progressive coding of 3D mesh models. More recently, within the framework of the development of the
future MPEG-7 standard [23, 24, 25], tools for content-based indexing and retrieval of 3D data have been considered and
extensively evaluated. The 3D SSD has been evaluated within the MPEG-7 Core Experiments (CEs) and finally
adopted into the MPEG-7 CD, which is the kernel of the future standard at the current stage of the standardization process.
The results presented in this paper are those obtained and reported [26, 27, 28, 29, 30, 31] within the framework of the MPEG-7
3D Shape CE over 5 successive MPEG meetings (October 1999 - July 2000).
The paper is organized as follows. Section 2 describes the 3D model database and the associated categorization used for
evaluating 3D shape descriptors within the MPEG-7 standardization process. In Section 3, we introduce the mathematical
definition of the proposed 3D SSD, within a continuous framework. Section 4 deals with the extraction of the 3D SSD from
discrete polygonal meshes. We first analyze the issue of 3D object representation in terms of uniqueness, regularity of the
distribution of polygonal faces and smoothness of polyhedral mesh surfaces. Then, we deduce a set of geometrical and
topological properties that define the so-called minimal and regular mesh representation. Finally, the extraction method of
the 3D SSD is described in detail. Experimental results are reported and discussed in Section 5. Section 6 concludes the
paper and opens perspectives for future work.
2. THE 3D TEST DATA SET
Evaluating 3D shape descriptors requires the availability of a sufficiently large 3D model test data set. Existing commercial
3D data repositories, such as the Viewpoint collection [32], are still too expensive for our academic purposes.
First, we have used two databases already existing within MPEG: (1) the MPEG-4 3D model data set, consisting of 293
models previously used within the MPEG-4 3DMC CEs [33], and (2) the "Letters" data set, consisting of 50 3D models
representing letters from "A" to "E" and donated for the 3D Shape MPEG-7 CEs by IBM Japan [34]. Each letter is represented
here by 10 differently triangulated 3D meshes (Figures 1.a and b show two examples for letter "B"), the "Letters" data set
having been specifically created for testing the robustness of 3D shape descriptors with respect to mesh triangulation.
We have then gathered 3D mesh models freely available over the Internet, specifically from the 3D Cafe web site [35]. As the
initial models were heterogeneously represented in various 3D formats (such as 3DS, LWO, DXF, COB, ...), our initial
work focused on converting these models to VRML 2.0, in order to obtain a data set of meshes represented in a
single format. The converted models have been carefully examined, and considerable effort has been spent on repairing models
exhibiting full or partial orientation problems or broken connectivity (see Section 4 for a synopsis of
representation problems). Finally, 947 corrected 3D models have been selected for our purposes.
The resulting data set thus includes about 1300 3D mesh models represented in VRML 2.0 format.
Objectively evaluating similarity retrieval results also requires the definition of an appropriate ground truth. Therefore, a subset
consisting of 228 models has been selected and categorized into 15 distinct categories, listed in Table 1.

Category       Number of items
"4 LIMBS"      31
CARS           17
AERODYNAMIC    36
TREES          21
MISSILES       10
BALLOONS       7
BUILDINGS      10
SOMA           7
E1_Mx          9
FINGER         30
LETTER A       10
LETTER B       10
LETTER C       10
LETTER D       10
LETTER E       10

Table 1. The categorization of a subset of the 3D database, used as ground truth, with the number of items within each category.
The proposed categorization, approved by the MPEG-7 group, aims at taking into account the local geometrical similarity of
shapes. Thus, models such as humans, animals, monsters and aliens have been merged within the "4-Limbs" category, while the
"Aerodynamic" category includes models of airplanes, helicopters and fish. "E1_Mx" gathers some engine components, while
"Soma" includes seven models made of cubic components in various relative positions. The "Balloons" category includes 7
mesh models of one or more spherical objects with various triangulations, while the "Cars", "Trees" and "Buildings"
categories have been created upon an exclusively semantic basis.
3. THE 3D SHAPE SPECTRUM DESCRIPTOR
The 3D SSD aims at providing an intrinsic shape description of 3D mesh models, by exploiting some local geometrical
attributes of the 3D surfaces. The 3D SSD has previously been used for indexing and retrieval of 2D images [36] and 3D range
data [13].
The shape index, first introduced by Koenderink [37], is defined as a function of the two principal curvatures. Let p be a point
on a regular [38] 3D surface S and let $k_p^1$ and $k_p^2$ denote the principal curvatures [38, 39] associated with point p. The shape index at
point p, denoted by $I_p$, is defined as expressed in Equation 1:

$$I_p = \frac{1}{2} - \frac{1}{\pi} \arctan \frac{k_p^1 + k_p^2}{k_p^1 - k_p^2}, \quad \text{with } k_p^1 \geq k_p^2. \qquad (1)$$
The shape index is a local geometrical attribute of a 3D surface, expressed as the angular coordinate of a polar
representation of the principal curvature vector. The shape index ranges over the interval [0,1] and is not defined for planar
surfaces. It provides a scale for representing salient elementary shapes such as convex, concave, rut, ridge and
saddle (Figure 2), and is invariant with respect to scale and Euclidean transforms.
Figure 2. Elementary shapes and their corresponding shape index (in brackets): (a) spherical cup (0.0), (b) rut (0.25), (c) minimal saddle (0.5), (d) ridge (0.75), (e) spherical cap (1.0).
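As an illustration, Equation 1 can be evaluated numerically as in the following minimal Python sketch. The function name shape_index and the use of atan2 (which handles the umbilic case of equal curvatures, where the ratio in Equation 1 is infinite) are our own choices, not part of the descriptor definition. Which of the cup and cap shapes corresponds to positive curvatures depends on the orientation chosen for the surface normal.

```python
import math


def shape_index(k1: float, k2: float) -> float:
    """Shape index of Equation 1 for two principal curvatures (in any order)."""
    k1, k2 = max(k1, k2), min(k1, k2)  # enforce k1 >= k2, as required by Equation 1
    if k1 == 0.0 and k2 == 0.0:
        raise ValueError("the shape index is not defined for planar points")
    # atan2(y, x) with x >= 0 equals arctan(y / x) and stays finite when x == 0.
    return 0.5 - math.atan2(k1 + k2, k1 - k2) / math.pi
```

With this convention, the curvature pairs (1, 1), (1, 0), (1, -1), (0, -1) and (-1, -1) map to the five elementary shape values 0, 0.25, 0.5, 0.75 and 1 of Figure 2, and multiplying both curvatures by a positive factor leaves the value unchanged.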
Let us recall that the principal curvatures are defined as the eigenvalues of the Weingarten map (W) given by the following
expression:

$$W = I^{-1} \, II, \qquad (2)$$

where I and II denote respectively the first and the second fundamental differential forms.
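Considering a Cartesian coordinate system and a Monge parametrization $S = (x, y, z) = (x, y, f(x, y))$ of the surface, with
f being $C^2$-differentiable, the fundamental differential forms I and II can be expressed as symmetric matrices (I being, in
addition, positive definite), and are given by the following equations:

$$I = \begin{pmatrix} \langle S_x, S_x \rangle & \langle S_x, S_y \rangle \\ \langle S_x, S_y \rangle & \langle S_y, S_y \rangle \end{pmatrix} = \begin{pmatrix} 1 + f_x^2 & f_x f_y \\ f_x f_y & 1 + f_y^2 \end{pmatrix}
\quad \text{and} \quad
II = \begin{pmatrix} \langle S_{xx}, N \rangle & \langle S_{xy}, N \rangle \\ \langle S_{xy}, N \rangle & \langle S_{yy}, N \rangle \end{pmatrix} = \frac{1}{\sqrt{1 + f_x^2 + f_y^2}} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{xy} & f_{yy} \end{pmatrix}, \qquad (3)$$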
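where N denotes the normal vector to the surface at the considered surface point p, and $S_x$, $S_y$, $S_{xx}$, $S_{xy}$ and $S_{yy}$ are the standard Monge notations
for the partial derivatives of the surface S with respect to the variables x and y:

$$S_x = \frac{\partial S}{\partial x}, \quad S_y = \frac{\partial S}{\partial y}, \quad S_{xx} = \frac{\partial^2 S}{\partial x^2}, \quad S_{xy} = \frac{\partial^2 S}{\partial x \partial y}, \quad S_{yy} = \frac{\partial^2 S}{\partial y^2}. \qquad (4)$$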
The shape spectrum of the 3D mesh is defined as the distribution of the shape index calculated over the entire mesh.
We note that the above definitions refer to surfaces in the continuous space $\mathbb{R}^3$. The extraction of the 3D SSD from
discrete, polyhedral 3D meshes is described in detail in the following section.
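As a quick sanity check of Equations 1 to 3, the following worked example (ours, not taken from the experiments) considers two Monge patches at the point (x, y) = (0, 0), with the normal pointing along the positive z-axis. The scale parameter R > 0 is an assumption introduced only for this illustration.

```latex
% Paraboloid f(x, y) = (x^2 + y^2) / (2R): f_x = f_y = 0, f_{xx} = f_{yy} = 1/R, f_{xy} = 0 at the origin.
\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
II = \frac{1}{R} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
W = I^{-1} II = \frac{1}{R} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\;\Rightarrow\; k^1 = k^2 = \frac{1}{R}, \quad
I_p = \frac{1}{2} - \frac{1}{\pi} \cdot \frac{\pi}{2} = 0 .
\]
% Saddle f(x, y) = xy / R: f_x = f_y = 0, f_{xx} = f_{yy} = 0, f_{xy} = 1/R at the origin.
\[
I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
II = \frac{1}{R} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
\;\Rightarrow\; k^1 = \frac{1}{R}, \; k^2 = -\frac{1}{R}, \quad
I_p = \frac{1}{2} - \frac{1}{\pi} \arctan 0 = \frac{1}{2} .
\]
```

In the first case the ratio in Equation 1 is taken as its limit when the two curvatures become equal. Both values are independent of R, which illustrates the scale invariance of the shape index, and they match the spherical cup (0.0) and minimal saddle (0.5) entries of Figure 2.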
4. ESTIMATION OF THE 3D SHAPE SPECTRUM DESCRIPTOR
VRML data are initially intended for graphics purposes. Hence, the first step when applying differential geometry-based
analysis is to transform such rough 3D data into some useful geometrical surfaces.
4.1. 3D MESH FILTERING
Because the 3D SSD is derived from a surface-based analysis, the representation of a 3D object is a crucial issue. Indeed, the
3D SSD should not be sensitive to the mesh representation.
When considering arbitrary mesh models, we have to deal with:
• non-orientable meshes,
• non-uniqueness of the representation,
• degenerated or duplicated polygons,
• presence of sharp features,
• irregularly sampled meshes.
Let us introduce several definitions that will be useful in the following. Two different vertices that are successive (modulo cyclic
permutations) within the sequence of vertices defining a face of the mesh constitute an edge. An edge belonging to a
single face of the mesh is called a border edge. Border faces are defined as faces including at least one border edge. A
non-border edge belongs to at least two different faces and is called an internal edge. Two faces are said to be E-neighbor faces
if they share at least one common edge. A subset of mesh faces is connected if any two of its faces are linked by
a path of successive E-neighbor faces. Hereafter, we will denote by connected components the maximal connected
subsets that can be defined with the faces of the mesh.
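The definitions above translate directly into a few lines of code. The following Python sketch is only an illustration under our own assumptions: faces are given as lists of vertex indices, and the helper names edge_faces, border_edges and connected_components are hypothetical. Connected components are obtained with a standard union-find over E-neighbor faces.

```python
from collections import defaultdict


def edge_faces(faces):
    """Map each undirected edge to the indices of the faces containing it."""
    table = defaultdict(list)
    for index, face in enumerate(faces):
        # Successive vertices, modulo cyclic permutation, constitute an edge.
        for a, b in zip(face, face[1:] + face[:1]):
            table[frozenset((a, b))].append(index)
    return table


def border_edges(faces):
    """Edges that belong to a single face of the mesh."""
    return [tuple(sorted(e)) for e, fs in edge_faces(faces).items() if len(fs) == 1]


def connected_components(faces):
    """Maximal sets of face indices linked by paths of E-neighbor faces."""
    parent = list(range(len(faces)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for sharing in edge_faces(faces).values():
        for other in sharing[1:]:
            parent[find(other)] = find(sharing[0])
    groups = defaultdict(list)
    for i in range(len(faces)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

For instance, four faces that share no vertex index at all yield four components and only border edges, whereas a strip of faces glued along common edges yields a single component.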
In order to illustrate the concept of orientable mesh, let us consider the example in Figure 3. With each triangle of the mesh,
we associate a normal vector whose direction is given by the traversal order of the triangle vertices (Figure 3.a). A mesh is
orientable if any two E-neighbor faces have consistent orientations. Figure 3.b shows a non-orientable mesh: the common
edge (v1, v2) is traversed in the same direction in the two neighbor triangles. Figure 3.c illustrates the concept of orientable
mesh: the common edge (v1, v2) is traversed in opposite directions in the two neighbor triangles. If a mesh has at least two
E-neighbor faces with different orientations, the mesh is considered non-orientable.
We note that an edge of an orientable surface belongs to at most two faces.
Figure 3. The concept of orientable mesh: (a) normal vector associated with a triangle, (b) non-orientable mesh, (c) orientable mesh.
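The orientation criterion illustrated in Figure 3 can be tested by counting directed edges. The sketch below is a minimal illustration under our own assumptions (faces as lists of vertex indices, hypothetical function name); it checks the consistency of the orientation as stored in the mesh, in the sense of the definition used in this paper.

```python
from collections import defaultdict


def is_consistently_oriented(faces):
    """True if no directed edge (a, b) is traversed by more than one face.

    Two E-neighbor faces are consistent when their common edge is traversed
    in opposite directions, i.e. as (a, b) in one face and (b, a) in the other.
    An edge shared by three or more faces necessarily repeats one direction,
    so such meshes are rejected as well.
    """
    directed = defaultdict(int)
    for face in faces:
        for a, b in zip(face, face[1:] + face[:1]):
            directed[(a, b)] += 1
    return all(count == 1 for count in directed.values())
```

Reversing the traversal order of an offending face (for example with face[::-1]) flips the direction of all its edges, which is the elementary operation used when a connected component with reversed orientation has to be corrected.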
The issue of non-uniqueness of the representation (i.e. visually identical 3D meshes that correspond to completely different
geometrical 3D surfaces) is illustrated in Figure 4.
Figure 4. A mesh and its different topological representations. Thicker line segments illustrate cuts. The faces are defined as follows:
(a) 1={v0, v1, v2}, 2={v1, v4, v3, v2}, 3={v6, v7, v3, v4}, 4={v7, v6, v5};
(b) 1={v0, v1, v2}, 2={v6, v4, v3, v7}, 3={v8, v9, v13, v12}, 4={v11, v10, v5};
(c) 1={v0, v1, v2}, 2={v2, v1, v4, v3}, 3={v6, v7, v3, v4}, 4={v9, v8, v5};
(d) 1={v0, v1, v2}, 2={v1, v4, v3, v2}, 3={v6, v7, v9, v8}, 4={v7, v6, v5}.
Each of the orientable meshes shown in Figure 4 represents the same physical object, consisting of 4 faces (denoted by 1, 2,
3 and 4, and defined in Figure 4), with the same orientation. A VRML viewer will display them identically. Nevertheless,
from a geometrical point of view, these meshes are completely different. The mesh (a) corresponds to a minimal
representation, in the sense that it is composed of a single connected component. Meshes (b-d) are obtained from the mesh
(a) by cutting the former in several ways (a cut is the duplication of an edge and of its corresponding vertices). Thus, the
mesh (b) is represented as a "puzzle" of completely non-connected polygons (the number of connected components of this
mesh is equal to the number of polygons). Meshes (c) and (d) are made of two different connected components, defined by
the faces (1, 2 and 3) and (4) for mesh (c), and faces (1, 2) and (3, 4) for mesh (d).
Analyzing such surfaces in terms of shape index (or, more generally, in terms of local characteristics) will yield completely
different results. In order to illustrate this issue, let us first define two different types of edges of interest within an orientable
3D mesh (Figure 5). An internal edge is said to be a rut edge (resp. a ridge edge) if the two faces containing it form an
angle, measured within the union of the two half-spaces pointed to by the normal vectors, less (resp. greater) than π. Such rut
and ridge edges, together with the associated faces, represent discrete versions of the surfaces presented in Figures 2.b and d,
respectively.
Figure 5. Rut (θ < π) and ridge (θ > π) edges.
Considering the meshes in our example, mesh (a) includes two rut edges, (v1, v2) and (v6, v7) and a ridge edge (v3, v4). Mesh
(b) contains exclusively border edges. Mesh (c) includes a rut edge (v1, v2) and a ridge edge (v3, v4) . Finally, mesh (d)
includes two rut edges, (v1, v2) and (v6, v7).
The possible presence of duplicated or degenerate polygons (polygons of zero area) also affects both the global (such as
gravity center, axes of inertia, statistical moments) and local characteristics of the considered 3D surface.
This analysis shows the necessity of a pre-filtering stage aiming at bringing, whenever possible, 3D meshes to a minimal and
regular topological representation, that is, an orientable surface with a minimal number of connected components and without
degenerate or duplicated polygons. In our work, a preliminary regularizing filtering [40, 41] of the 3D mesh models has been
applied. Related approaches for repairing 3D mesh models to form manifold surfaces and solid regions are reported
in [42, 43, 44]. However, none of the existing approaches guarantees the correctness of the resulting surfaces in terms of
consistent surface orientation across all the connected components. How can we obtain models in which the normal vectors of
each connected component point towards the exterior of the surface? We solved this issue as follows: first, the 3D meshes have
been visually inspected and the connected components with reversed orientation have been identified. Then, the traversal order
of each face within these connected components has been reversed. This semi-automatic procedure efficiently repairs
3D models exhibiting such orientation degeneracies.
Concerning the irregularly sampled meshes, for a given 3D model, the face area and mean elongation ratio distributions may
widely vary (Figure 6.a). In addition, such a 3D mesh shows sharp features, violating the smoothness hypothesis necessary
when considering surface derivatives.
Figure 6. Initial model "a00" (a) and its refined versions after one (b) and two (c) subdivision levels.
In order to obtain smoother representations of such 3D meshes, a pre-processing step consisting of a mesh subdivision
algorithm based on Loop's subdivision scheme [45] is applied. The subdivision algorithm consists of two successive steps: (1)
mesh re-sampling by mid-edge vertex insertion (Figure 7), and (2) low-pass filtering of the vertex coordinates. By choosing
appropriate filter coefficients, the limit surface can be shown to be tangent plane smooth [46]. In addition, such a subdivision
scheme has the advantage of drastically reducing the relative area of the border faces. Let us note that such a scheme
requires a preliminary triangulation of the mesh models.
Figure 7. Mesh re-sampling by mid-edge insertion.
Figures 6.b and c illustrate the effect of applying such a subdivision scheme to model "a00".
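The re-sampling step (1) of the subdivision algorithm can be sketched as follows. This Python fragment is an illustration under our own assumptions (vertices as coordinate tuples, triangles as index triples, hypothetical function name) and covers only the mid-edge vertex insertion of Figure 7. The low-pass filtering step (2), with the filter coefficients of Loop's scheme, is deliberately left out.

```python
def midpoint_subdivide(vertices, triangles):
    """Split every triangle into four by inserting one vertex per edge midpoint."""
    vertices = [tuple(v) for v in vertices]  # work on a copy of the input
    midpoint_index = {}  # undirected edge -> index of its midpoint vertex

    def midpoint(a, b):
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            pa, pb = vertices[a], vertices[b]
            vertices.append(tuple((x + y) / 2 for x, y in zip(pa, pb)))
            midpoint_index[key] = len(vertices) - 1
        return midpoint_index[key]

    refined = []
    for a, b, c in triangles:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        # The four children keep the traversal order of the parent triangle.
        refined += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return vertices, refined
```

Because the midpoint of a shared edge is created once and reused, the refined mesh keeps the connectivity of the original one: each subdivision level multiplies the number of triangles by four without introducing cuts.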
4.2. ESTIMATION OF THE PRINCIPAL CURVATURES
Estimating the principal curvatures is the key step of the 3D SSD extraction. Indeed, the 3D SSD performances strongly
depend on the accuracy of estimates and the robustness of the estimation technique with respect to the triangulation of the
3D model. Our approach is based upon a second degree polynomial surface fitting.
In our approach, the principal curvatures are associated with each face of the mesh. First, the mean normal vector to face $f_i$ of
the mesh, denoted by $\tilde{N}_{f_i}$, is defined as the weighted average of the normal vectors of the 0-adjacent faces of the considered
face $f_i$ (Equation 5). Two faces of the mesh are said to be 0-adjacent if and only if they share a common vertex.

$$\tilde{N}_{f_i} = \frac{\sum_{f_k \in F_0\{f_i\}} w_k N_{f_k}}{\left\| \sum_{f_k \in F_0\{f_i\}} w_k N_{f_k} \right\|}. \qquad (5)$$

Here, $F_0\{f_i\}$ denotes the set of 0-adjacent faces of $f_i$, $N_{f_k}$ denotes the normal vector to face $f_k$, $w_k$ is the weighting coefficient
associated with the 0-adjacent face $f_k$, and $\| \cdot \|$ the L2 norm of a vector in $\mathbb{R}^3$. Each weighting coefficient $w_k$ is set equal to
the area of the face $f_k$, in order to take into account irregularities in the distribution of the mesh faces.
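Equation 5 can be illustrated by the short Python sketch below. The helper names are hypothetical, triangles are assumed to be non-degenerate (which the pre-filtering stage is meant to ensure), and whether the face itself is included in its own neighborhood is left to the caller.

```python
import math


def face_normal_and_area(p0, p1, p2):
    """Unit normal (given by the vertex traversal order) and area of a triangle."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in cross))
    return tuple(c / length for c in cross), length / 2


def mean_normal(normals, weights):
    """Equation 5: weighted sum of the neighbor normals, rescaled to unit length."""
    total = [sum(w * n[i] for n, w in zip(normals, weights)) for i in range(3)]
    norm = math.sqrt(sum(t * t for t in total))
    return tuple(t / norm for t in total)
```

Passing the face areas as weights gives large faces more influence on the mean normal than small slivers, which is the purpose of the weighting described above.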
A local Cartesian coordinate system (x, y, z) is defined such that its origin coincides with the gravity center of the face $f_i$ and
the z-axis is associated with the previously defined mean normal vector $\tilde{N}_{f_i}$. Let $\{(x_i, y_i, z_i)\}_{i=1}^{N}$ denote the cloud of points
made of the centroids of the considered face $f_i$ and its 0-adjacent faces, with coordinates expressed in the local coordinate
system.
The parametric surface approximation is achieved by fitting a quadric surface through the set of points $\{(x_i, y_i, z_i)\}_{i=1}^{N}$.
Let $S = (x, y, z) = (x, y, f_a(x, y))$ denote the second degree polynomial surface, expressed as:

$$f_a(x, y) = a_0 x^2 + a_1 y^2 + a_2 xy + a_3 x + a_4 y + a_5, \qquad (6)$$

where the $a_i$'s are real coefficients. By denoting $a = (a_0\ a_1\ a_2\ a_3\ a_4\ a_5)^t$ and $b(x, y) = (x^2\ y^2\ xy\ x\ y\ 1)^t$, Equation 6 can be rewritten more compactly by using standard matrix notations:

$$f_a(x, y) = a^t b(x, y). \qquad (7)$$
The parameter vector $a = (a_0\ a_1\ a_2\ a_3\ a_4\ a_5)^t$ is determined by applying a least squares fit procedure. Given the data points
denoted by $\{(x_i, y_i, z_i)\}_{i=1}^{N}$, with associated weights $\{w_i\}_{i=1}^{N}$, the parameter vector $\hat{a}$ corresponding to the optimal (in the
weighted mean square error sense) fit is expressed as:

$$\hat{a} = \left( \sum_{i=1}^{N} w_i \, b(x_i, y_i) \, b^t(x_i, y_i) \right)^{-1} \left( \sum_{i=1}^{N} w_i z_i \, b(x_i, y_i) \right) = \arg\min_{a \in \mathbb{R}^6} \sum_{i=1}^{N} w_i \left( z_i - f_a(x_i, y_i) \right)^2. \qquad (8)$$

Note that the representation of the gravity centers $\{(x_i, y_i, z_i)\}_{i=1}^{N}$ in local coordinates guarantees the invariance of the
approximation with respect to Euclidean transforms.
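Once the parametric approximation is available, the principal curvatures $k_1$ and $k_2$ can be easily determined as the eigenvalues
of the Weingarten map W, since the first and second differential forms I and II at (x, y) = (0, 0) take simple expressions,
given in Equation 9. Indeed, from Equation 6, $f_x(0,0) = a_3$, $f_y(0,0) = a_4$, $f_{xx} = 2a_0$, $f_{xy} = a_2$ and $f_{yy} = 2a_1$, so that Equation 3 yields:

$$I = \begin{pmatrix} 1 + a_3^2 & a_3 a_4 \\ a_3 a_4 & 1 + a_4^2 \end{pmatrix}, \quad II = \frac{1}{\sqrt{1 + a_3^2 + a_4^2}} \begin{pmatrix} 2a_0 & a_2 \\ a_2 & 2a_1 \end{pmatrix}. \qquad (9)$$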
We finally note that it is also possible to consider the set $F_n\{f_i\}$ of the n-adjacent faces (n>0). Such a set is recursively
defined from $F_0\{f_i\}$ by considering successively wider neighborhoods. In this case, the computational complexity increases
proportionally to the number of n-adjacent faces without providing a significant improvement of the final principal curvature
estimation.
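The estimation procedure of this section can be summarized by the following self-contained Python sketch. It is an illustration under our own assumptions, not the implementation evaluated in the experiments: the function names are hypothetical, the points are assumed to be already expressed in the local coordinate system, the normal equations of Equation 8 are solved by plain Gaussian elimination, and the second fundamental form is built from the second derivatives of Equation 6 ($f_{xx} = 2a_0$, $f_{xy} = a_2$, $f_{yy} = 2a_1$). The eigenvalues of the 2x2 Weingarten map are obtained in closed form from its trace and determinant.

```python
import math


def basis(x, y):
    """Vector b(x, y) of Equation 7."""
    return [x * x, y * y, x * y, x, y, 1.0]


def solve(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    u = [0.0] * n
    for r in reversed(range(n)):
        u[r] = (M[r][n] - sum(M[r][c] * u[c] for c in range(r + 1, n))) / M[r][r]
    return u


def fit_quadric(points, weights):
    """Weighted least-squares coefficients (a0, ..., a5) of Equation 8."""
    A = [[0.0] * 6 for _ in range(6)]
    rhs = [0.0] * 6
    for (x, y, z), w in zip(points, weights):
        b = basis(x, y)
        for i in range(6):
            rhs[i] += w * z * b[i]
            for j in range(6):
                A[i][j] += w * b[i] * b[j]
    return solve(A, rhs)


def curvatures_at_origin(a):
    """Principal curvatures (k1 >= k2) of z = f_a(x, y) at (x, y) = (0, 0)."""
    a0, a1, a2, a3, a4, _ = a
    E, F, G = 1 + a3 * a3, a3 * a4, 1 + a4 * a4        # first fundamental form
    s = math.sqrt(1 + a3 * a3 + a4 * a4)
    L, M, N = 2 * a0 / s, a2 / s, 2 * a1 / s            # second fundamental form
    det_I = E * G - F * F
    mean = (E * N - 2 * F * M + G * L) / (2 * det_I)    # trace(W) / 2
    gauss = (L * N - M * M) / det_I                     # det(W)
    root = math.sqrt(max(mean * mean - gauss, 0.0))
    return mean + root, mean - root
```

At least six points in general position are needed for the 6x6 system to be invertible, which is one reason why faces with too few 0-adjacent neighbors give unreliable estimates.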
4.3. 3D SSD COMPUTATION
The 3D SSD of a 3D mesh M is defined as the histogram of the shape index values, calculated over the entire mesh. The
histogram is represented on $N_{bins}$ bins, uniformly quantizing the range of the 3D shape index values (the [0, 1]
interval) into a set of $N_{bins}$ intervals $\{\Delta_k\}_{k=1}^{N_{bins}}$, where

$$\Delta_k = \left[ \frac{k-1}{N_{bins}}, \frac{k}{N_{bins}} \right) \text{ for } k = 1, \dots, N_{bins} - 1, \quad \text{and} \quad \Delta_{N_{bins}} = \left[ \frac{N_{bins} - 1}{N_{bins}}, 1 \right].$$

The relative area (with respect to the total area of the mesh) of each face whose shape index
belongs to the interval $\Delta_k$ is added to the k-th component of the 3D SSD.
Two additional components, denoted by PlanarSurface and SingularSurfaces, and representing the relative surfaces of
planar and border patches, respectively, are introduced in the 3D SSD. The planar surfaces must be considered separately
since the 3D shape index is not defined for planar patches. Let $k_a$ denote the L2 norm of the curvature vector $(k_1, k_2)$:
$k_a = \sqrt{k_1^2 + k_2^2}$. We define the degree of curvedness of the surface as $C = \mathrm{Area}(F_0\{f_i\}) \cdot k_a$. Each face with curvedness C
less than a threshold T will be considered as part of a planar patch and its relative area will increment the PlanarSurface
component of the 3D SSD.
Unreliable estimates of the principal curvatures may occur in the case of border faces with a number of 0-order adjacent
faces smaller than a pre-defined number Nmin. Hence, the border faces are not taken into account when computing the shape
index and their relative area is stored into the SingularSurfaces component.
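The accumulation rules of this section can be sketched as follows. The Python fragment below is an illustration under our own assumptions: each face is described by a hypothetical tuple (area, k1, k2, neighborhood_area, is_singular), where is_singular flags border faces with too few 0-adjacent faces, and the curvedness is computed exactly as written in the text above.

```python
import math


def shape_spectrum(faces, n_bins, planar_threshold):
    """Return (histogram, planar, singular) as fractions of the total mesh area."""
    total_area = sum(face[0] for face in faces)
    histogram = [0.0] * n_bins
    planar = singular = 0.0
    for area, k1, k2, neighborhood_area, is_singular in faces:
        share = area / total_area
        if is_singular:  # unreliable curvature estimate: SingularSurfaces
            singular += share
            continue
        curvedness = neighborhood_area * math.hypot(k1, k2)
        if curvedness < planar_threshold:  # PlanarSurface
            planar += share
            continue
        hi, lo = max(k1, k2), min(k1, k2)
        index = 0.5 - math.atan2(hi + lo, hi - lo) / math.pi
        # Half-open bins, except the last one, which also contains the value 1.
        histogram[min(int(index * n_bins), n_bins - 1)] += share
    return histogram, planar, singular
```

By construction, the histogram bins and the two additional components sum to 1 for any mesh with a non-zero total area.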
5. EXPERIMENTAL RESULTS
The 3D SSD has been applied to the 3D test data previously described (Section 2). For 3D shape-based similarity
retrieval, we used L1- or L2-based distances as similarity measures between the 3D SSDs associated with the meshes
under study. Such similarity measures may or may not take into account the planar surfaces or the singular surfaces,
depending on the targeted application. In our experiments, the singular and planar histogram bins have not been taken into
account, and the 3D SSD has been re-normalized such that $\sum_{i=1}^{N_{bins}} SSD(i) = 1$.
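The similarity measure used in our experiments can be sketched in a few lines of Python. The function names are hypothetical, and the histograms are assumed to contain only the regular bins, the planar and singular components having been dropped beforehand.

```python
import math


def renormalize(histogram):
    """Rescale the regular bins so that they sum to 1."""
    total = sum(histogram)
    if total == 0:
        raise ValueError("the mesh has no regular (non-planar, non-singular) area")
    return [h / total for h in histogram]


def l1_distance(p, q):
    """L1 distance between two descriptors with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_distance(p, q):
    """L2 distance between two descriptors with the same number of bins."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
```

Ranking the database by increasing distance to the query descriptor then gives the retrieval order.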
Figures 8 and 9 show some examples of similarity retrieval corresponding to queries from the "Cars" (8.a), "Aerodynamic"
(8.b), "4-Limbs" (9.a and 9.b) and "Trees" (9.c) categories. The first 6 retrieved results are presented in
decreasing order of similarity, from left to right and from top to bottom. The query is presented in the upper left cell.
Figure 8. Similarity retrieval results for queries within the "Cars" (a) and "Aerodynamic" (b) categories.
Figure 9. Similarity retrieval results for queries within the "4-Limbs" (a, b) and "Trees" (c) categories.
Figures 9.a and b show that good results are obtained for articulated objects in the "4-Limbs" category, where elements in
various postures are retrieved (the humanoid with arms and legs spread out in the second row of Figure 9.a, the sitting dog in
Figure 9.b).
For an objective evaluation of the retrieval performance of the proposed 3D SSD, we compute the Bull-Eye Percentage
(BEP) score. The BEP is defined for each query as the percentage of correct matches (i.e. items within the same category as
the query) among the 2Q top retrieved results, where Q denotes the size of the considered query category. After computing
the BEP score for each categorized item, an average BEP is computed for each category. BEP scores for each category are
presented in Table 2 (here, two levels of mesh subdivision have been applied prior to the computation of the 3D SSD).
Category        Q     BEP Score (%)
"4 LIMBS"       31    73
CARS            17    66
AERODYNAMIC     36    71
TREES           21    56
MISSILES        10    87
BALLOONS        7     95
BUILDINGS       10    39
SOMA            7     100
E1_Mx           9     100
FINGER          30    100
LETTER A        10    90
LETTER B        10    100
LETTER C        10    100
LETTER D        10    100
LETTER E        10    100
Global mean BEP score: 85%
Table 2. Similarity retrieval rates. Here, Q denotes the number of items within each category.
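The BEP score defined above can be sketched in Python as follows. The function name and its arguments are hypothetical. The sketch assumes that the ranked list contains the query itself and that the number of correct matches among the top 2Q results is divided by Q, so that a perfect retrieval scores 100%. This normalization is an interpretation and is not stated explicitly in the definition.

```python
def bull_eye_percentage(query_id, ranked_ids, category_of):
    """BEP of one query.

    ranked_ids  -- database items sorted by decreasing similarity to the query
    category_of -- dict mapping each categorized item to its category label
    """
    category = category_of[query_id]
    # Q: number of items in the category of the query.
    q = sum(1 for c in category_of.values() if c == category)
    # Count correct matches among the 2Q best ranked items.
    top = ranked_ids[: 2 * q]
    hits = sum(1 for item in top if category_of.get(item) == category)
    return 100.0 * hits / q


def mean_bep(scores):
    """Average of a list of BEP scores."""
    return sum(scores) / len(scores)
```

Items without a ground truth category are simply looked up with `category_of.get` and never count as matches.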
The global mean score of 85% demonstrates the good discriminatory properties of the 3D SSD. A more detailed study of the
results shows excellent BEP scores, between 90% and 100%, for shape property-based categories. Good BEP scores
(between 70% and 90%) are obtained for categories that mix shape properties with
semantic concepts ("Missiles", "4-Limbs" and "Aerodynamic" categories), while lower performance is reported for
categories mainly based on semantic concepts, such as the "Buildings" or "Trees" categories. Robustness with respect to
mesh triangulation is demonstrated by the high retrieval rates achieved on the "Letters" corpus.
We mention that applying the mesh subdivision technique described in Section 4 (with two levels of subdivision) increased
the global mean BEP score by 10%, with respect to the score obtained without preliminary subdivision.
In order to reduce the complexity of the 3D SSD as much as possible, we study how quantizing the 3D SSD values and
decreasing the number of histogram bins of the representation affect similarity retrieval performance.
The 3D SSD values range within the [0, 1] interval. We perform a uniform quantization of the [0, 1] interval with a number
of quantization steps $N = 2^b$, where b denotes the number of bits of the representation. The retrieval results are listed in
Table 3 for floating point precision (FPP) and coarser representations, corresponding to values of b varying from 12 bits
down to 7 bits.
Representation    Global Mean BEP Score (%)
FPP               85
b=12              84
b=11              84
b=10              84
b=9               84
b=8               83
b=7               82
Table 3. Similarity retrieval rates with respect to the number of quantization bits.
Experimental results show that 12-bit quantization yields hit rates approaching those of the floating point
representation. More compact representations, down to 8 or 7 bits, are also possible, with at most a 3% degradation of the
mean retrieval rate.
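The uniform quantization of descriptor values on b bits can be sketched as below. This is one possible realization, given here as an assumption: values in [0, 1] are mapped to the 2^b integer codes 0 to 2^b - 1 by rounding to the nearest level. The function names are hypothetical, and the exact quantizer used in the experiments may differ (for instance in how the value 1 is handled).

```python
def quantize(hist, bits):
    """Map each value in [0, 1] to an integer code in [0, 2**bits - 1]."""
    max_code = (1 << bits) - 1
    # Adding 0.5 before truncation rounds to the nearest code.
    return [int(v * max_code + 0.5) for v in hist]


def dequantize(codes, bits):
    """Map integer codes back to values in [0, 1]."""
    max_code = (1 << bits) - 1
    return [c / max_code for c in codes]
```

Under this scheme the reconstruction error of each value is at most half a quantization step, that is 0.5 / (2^b - 1).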
In order to evaluate the retrieval rate degradation with respect to the number of histogram bins (Nbins) used for the 3D SSD
representation, we have computed the hit rates corresponding to 4 different numbers of bins (100, 50, 25 and 10). Results are
presented in Table 4. Here, the number of quantization bits of the descriptor values is b=12.
Nbins    Global Mean BEP Score (%)
100      84
50       84
25       83
10       80
Table 4. Similarity retrieval rates with respect to the number of histogram bins for b=12.
Scores listed in Table 4 show that good results are obtained over a range from 100 down to 10 histogram bins, with a
degradation of the mean retrieval rate of only 5%.
Thus, a compact representation of the 3D shape spectrum descriptor can be used while preserving retrieval rates close to those of the floating point representation.
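One simple way to obtain a coarser histogram from a finer one is to sum groups of adjacent bins. The sketch below illustrates this idea only. It is an assumption that the reduced descriptors could be derived this way: the experiments may instead have recomputed each histogram directly at the target resolution. The function name `merge_bins` is hypothetical.

```python
def merge_bins(hist, n_bins):
    """Reduce a histogram to n_bins by summing groups of adjacent bins."""
    if n_bins <= 0 or len(hist) % n_bins != 0:
        raise ValueError("n_bins must be a positive divisor of len(hist)")
    group = len(hist) // n_bins
    return [sum(hist[i:i + group]) for i in range(0, len(hist), group)]
```

Merging preserves the total mass of the histogram, so a descriptor normalized to 1 stays normalized. Each of the bin counts 50, 25 and 10 divides 100, so all three can be derived from a 100-bin histogram in this manner.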
The locality of the descriptor is a useful feature for characterizing the local shape information, but could raise some
scalability problems when dealing with large amounts of 3D data. However, the compactness achieved by quantizing the
descriptor values and decreasing the number of histogram bins makes it possible to foresee some future extensions, aiming
at embedding the 3D SSD into some more global representations (e.g. within a histogram refinement framework).
6. CONCLUSION
This paper presents a surface-based approach for indexing and similarity retrieval of 3D content represented by 3D mesh
models, based upon the characterization of the local shape of a 3D geometric surface. After analyzing the representation and
sampling problems that may occur when dealing with discrete 3D surfaces, solutions are proposed and a complete method
for extracting the 3D SSD from polygonal meshes is discussed in detail.
The approach is applied to a 3D data set donated to the MPEG-7 community for search and retrieval experiments, and the
results obtained are reported and discussed. The objective evaluation of the descriptor, based upon a ground truth of 15 categories including 228
3D meshes, shows a mean Bull-Eye retrieval rate of 85%. The analysis of the effects of quantization and of the number of histogram
bins of the representation shows that the proposed approach provides a very compact descriptor, with a minimum size of 100
bits per 3D mesh model, allowing fast browsing and search of 3D model databases.
Future work will address the issue of combining such a simple local descriptor with some global representation schemes.
