Generating Segmented Meshes from Textured Color Images

Mario A. S. Lizier1
David C. Martins-Jr2
Alex J. Cuadros-Vargas1
Roberto M. Cesar-Jr2
Luis G. Nonato1
1 ICMC – Universidade de São Paulo, Brazil
2 IME – Universidade de São Paulo, Brazil
Abstract
This paper presents a new framework for generating triangular meshes from textured color images. The proposed framework combines a texture classification technique, called W-operator, with Imesh, a method originally conceived to generate simplicial meshes from grayscale images. An extension of W-operators to handle textured color images is proposed, which employs a combination of RGB and HSV channels and Sequential Floating Forward Search guided by a mean conditional entropy criterion to extract features from the training data. The W-operator is built into the local error estimation used by Imesh to choose the mesh vertices. Furthermore, the W-operator also makes it possible to assign a label to each triangle during mesh construction, thus yielding a segmented mesh at the end of the process. The presented results show that the combination of W-operators with Imesh gives rise to a texture-classification-based triangle mesh generation framework that outperforms pixel-based methods.
Key words: Mesh Generation, Delaunay Triangulation, Feature Evaluation
and Selection, Texture Classification, W-operators
Preprint submitted to Visual Communication and Image Representation, November 4, 2008
1. Introduction
The problem of generating meshes from images has been investigated in
two different contexts, namely, image representation and numerical simulation. The techniques devoted to image representation (also called image
mesh modeling) aim at modeling a given image as a mesh, envisioning mainly
a more compact representation instead of the pixel representation. Image
meshing methods toward numerical simulation, however, intend to generate
meshes suitable for numerical solution of physical phenomena, as for example,
blood flow simulation and mechanical simulation of lung behavior.
Although targeting distinct applications, both approaches make use of a similar set of conventional image processing tools, such as high-pass filters combined with thresholding, iso-values, and interpolation, in order to decide where the vertices of the mesh must be placed. As expected, the use of conventional tools limits the effectiveness of image-based mesh generation methods to the restrictive class of non-textured images. In fact, most techniques strongly rely on user intervention when dealing with textured images. This is the case of most techniques in the context of numerical simulation, as many mesh generation techniques take as input well-defined curves representing the boundary of the domain to be meshed, which is very difficult to obtain through conventional segmentation methods.
In order to circumvent these difficulties, this work proposes a new methodology for meshing textured color images that drastically reduces user intervention, thus being a valuable alternative to existing methods. In fact, by exploiting the concept of W-operators [1] and the image-based mesh generation technique called Imesh [2], a new framework for handling textured color images has been conceived. Besides dealing with color and texture, our approach offers a very flexible mesh generation mechanism that can be employed both for image mesh modeling and for image mesh generation for numerical simulation. W-operators are employed both for vertex placement and for cell labeling, thus resulting in a unified scheme to generate a segmented mesh. A quality mesh is then obtained by applying an adaptation of Ruppert's algorithm [3] to the segmented mesh. Therefore, the framework proposed in this work produces a segmented quality mesh representing structures contained in textured color images, a feature not found in other image-based mesh generation techniques.
Before presenting the main aspects of our approach, we briefly describe
the related work in the next section. The Imesh technique, W-operators
and their combination are presented in section 3. Section 4 describes the
methodologies employed to segment and to improve the mesh. Results and
conclusions are discussed in section 5 and 6, respectively.
2. Related Work
In this section we present a review of relevant methods for generating triangular meshes in domains defined by images. Our discussion is focused on two-dimensional algorithms, although some of the presented techniques can be extended to handle 3D images. The related techniques are divided into two main groups, as follows:
Image Mesh Modeling techniques intend to build a mesh that minimizes an error measurement, usually the approximation error between the original image and the one represented by the triangular mesh. Garcia et al. [4], for example, have presented an algorithm that controls the maximum root-mean-square (RMS) error by choosing the vertices of the mesh from a curvature image, i.e., more vertices are placed in areas with high curvature. The mesh model is built by generating the Delaunay triangulation [5] from the chosen vertices. Regions with high RMS error are re-sampled and new vertices are added to the Delaunay triangulation. Garcia's method is a typical
example of an adaptive approach, which is characterized by beginning with
an initial mesh that is iteratively refined in order to reduce the interpolation
error. Many algorithms devoted to represent images by meshes are based
on adaptive approaches [6, 7, 8, 9]. Alternatively, some techniques have
adopted an opposite strategy, i.e., a fine mesh is successively coarsened until the approximation error reaches a tolerance [10]. Mixed approaches that
combine refinement and coarsening [11, 12] as well as optimization schemes
for positioning the vertices have also been developed [13, 14].
Still envisioning mesh modeling, Yang et al. [15] proposed a one-pass
method that makes use of zero-crossing jointly with error diffusion in order to
choose and to position a set of vertices from which the Delaunay triangulation
is built. Besides reducing the approximation error, the authors argue that
this strategy produces meshes of good quality.
A main drawback of the above methods in the context of textured images
is the difficulty of defining an appropriate approximation error that does not
generate an excessive number of elements. This effect happens when internal
details of the texture are detected and new elements are created to represent
them.
In general, Image Modeling for Simulation methods divide the mesh
generation process into two main steps: pre-processing and mesh generation.
The pre-processing step aims at filtering and segmenting the image in order
to detect the regions of interest, which are “meshed” in the mesh generation
step. Cebral and Lohner [16] binarize the original image in order to extract
well defined contours from which the mesh is built. Binarization has also been
employed as a pre-processing strategy by Zhang et al. [17] and Berti [18]. In
both algorithms the mesh is generated by defining an implicit function from
the binary images that guides a space partitioning strategy (quadtree) and
thus the mesh generation. They also add a post-processing step to improve
the quality of the mesh.
By making use of pre-processing to reduce noise and to highlight sharp
features, Hale [19] proposed the use of a potential energy function to align a
lattice of points with the image features. The mesh is then generated by Delaunay triangulation from the aligned points. The main problem with Hale’s
strategy is that distinct regions are not identified. Therefore, a mesh segmentation post-processing step is required in order to distinguish the different
structures contained in the image.
To the best of our knowledge, the work by Hermes and Buhmann [14] is one of the few examples of a mesh generation technique that can deal with textured images. The triangulation scheme proposed in [14] is based on a minimization mechanism that relies on an iterative refinement approach and on vertex movement. Differently from our approach, the quality of the generated elements is not considered and the resulting mesh is not Delaunay. Moreover, the cost function to be minimized is defined without any previous knowledge of each texture region, and all pixels related to each evaluated triangle must be visited. Furthermore, the methodology proposed in [14] is based on pixel values, disregarding neighborhood information.
The approach proposed in this paper is based on a supervised texture classification technique, called W-operator [1], and it is guided by a Delaunay mesh created by the Imesh [2] algorithm. The W-operator-based approach proposed in the present paper is intrinsically local, since the image is seen through a window that observes only a limited part of the image, which makes it possible to take texture information into account. In fact, our supervised approach allows a straightforward partitioning of the resulting mesh without requiring a user-defined parameter to specify the number of partitioned regions [14, 2]. Furthermore, the robustness and effectiveness supplied by the Delaunay triangulation, jointly with an effective vertex insertion mechanism that avoids bad-quality triangles, render Imesh a very reliable mesh generation scheme. Moreover, Imesh makes use of a theoretically guaranteed mesh refinement scheme [20], a characteristic not found in other methods devoted to generating meshes from images.
3. Imesh and W-operators
This section introduces the methodology employed in our image-based mesh generation scheme. Before presenting the mathematical and computational basis of Imesh and W-operators, we briefly overview the proposed scheme. The connection between Imesh and W-operators is discussed at the
end of the section.
3.1. Algorithm Overview
The algorithm takes as input a textured color image and a set of samples taken from the image. The samples are used as the training set for the W-operators. They can be interactively provided by the user or taken from a specific pre-defined set of training images.
An initial Delaunay mesh is built from points chosen on the image border (top, bottom, left and right). An error measure (derived from W-operators) is assigned to each triangle by traversing the three medians of the triangle. The error measure indicates whether a triangle must be subdivided. If so, a new vertex is inserted inside the "bad" triangle via Delaunay triangulation. W-operators are also employed to label each non-subdivided triangle, giving rise to a segmented triangular mesh whose segments make up the regions of the original image.
Therefore, W-operators offer a mechanism that computes new vertices to
be inserted and triangle labels simultaneously, leading to a segmented mesh
fitted to image features.
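The overview above amounts to a refinement loop: classify pixels, test each triangle along its medians, and subdivide heterogeneous triangles. The toy sketch below illustrates only the control flow; the helper names are hypothetical, a synthetic two-class label image stands in for the W-operator output, and a simple centroid split replaces true Delaunay vertex insertion.

```python
import numpy as np

# Toy 2-class "classification" of a 16x16 image: left half label 0, right half label 1.
# This stands in for the W-operator output; a simplified sketch, not the paper's classifier.
labels = np.zeros((16, 16), dtype=int)
labels[:, 8:] = 1

def median_pixels(a, b, c):
    """Integer pixels sampled along the three medians of triangle (a, b, c)."""
    pts = []
    verts = [np.array(a, float), np.array(b, float), np.array(c, float)]
    for i in range(3):
        start = verts[i]
        end = (verts[(i + 1) % 3] + verts[(i + 2) % 3]) / 2.0  # opposite edge midpoint
        for t in np.linspace(0.0, 1.0, 20):
            p = (1 - t) * start + t * end
            pts.append((int(round(p[0])), int(round(p[1]))))
    return pts

def triangle_is_homogeneous(tri):
    seen = {labels[y, x] for x, y in median_pixels(*tri)
            if 0 <= x < 16 and 0 <= y < 16}
    return len(seen) == 1

# Initial mesh: two triangles covering the image domain.
mesh = [((0, 0), (15, 0), (15, 15)), ((0, 0), (15, 15), (0, 15))]

# One refinement pass: split triangles whose medians cross a label boundary.
refined = []
for tri in mesh:
    if triangle_is_homogeneous(tri):
        refined.append(tri)
    else:  # centroid split (Imesh would insert a Delaunay vertex instead)
        a, b, c = tri
        m = tuple((np.array(a) + np.array(b) + np.array(c)) // 3)
        refined += [(a, b, m), (b, c, m), (c, a, m)]

print(len(refined))  # both initial triangles straddle the boundary -> 6
```

In the real algorithm the loop repeats until every triangle is homogeneous (or labeled), and new vertices keep the triangulation Delaunay.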
3.2. Imesh
The reasoning behind Imesh is to generate, from a given image, a mesh
that fits features contained in this image [2]. More specifically, let T be
the set of triangles of a Delaunay mesh M whose vertices are points in the
domain of an image I, and E : T → R+ be a function that associates an error measure to each triangle in T. Function E takes into account a specific approximation property to decide how good a triangle is.
Differently from other strategies described in the literature, which usually evaluate the error associated with a triangle σ ∈ T by traversing all pixels inside σ, Imesh iteratively builds the mesh by calculating E over the medians of each triangle and evaluating a local error measure at each pixel along the medians. By traversing only the medians, the algorithm reduces its computational effort while still being effective in detecting triangles that do not satisfy the desired approximation.
Let h_1, h_2, h_3 be the three medians of a given triangle σ ∈ T. Consider the sets of points P_{h_j} = {p ∈ h_j | E(p) ≥ c_E}, where E is a local error function and c_E is a user-defined scalar. Therefore, P_{h_j} is the set of points (pixels) on the median h_j whose local error E is higher than a tolerance c_E. The local error function E is used as a stop criterion and is further detailed in section 3.4.

By denoting as d_M(p) the square Euclidean distance from a point p to its closest vertex in M, i.e., d_M(p) = min_{v∈M} {d²(p, v)}, where d(·,·) is the Euclidean distance and v is a vertex in M, we can define p_c as the point maximizing d_M(p) for all p ∈ P_{h_j}, j = 1, 2, 3. From these definitions, we can state the error function E as follows:

E(σ) = E(p_c) if P_{h_j} ≠ ∅ for some j = 1, 2, 3, and E(σ) = 0 otherwise.   (1)
The pixel (p_c) that satisfies the local error measure and lies farthest from the triangulation vertices is chosen to be inserted into the Delaunay triangulation. By
choosing the most distant point, the algorithm avoids concentrating points
around particular regions of the image, which is a common problem in other
methods.
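The selection of p_c can be sketched directly from the definitions above: filter the median pixels by the local error tolerance, then maximize the squared distance to the nearest mesh vertex. All arrays below are illustrative stand-ins, not the paper's data.

```python
import numpy as np

# Sketch of choosing p_c: among median pixels whose local error exceeds c_E,
# pick the one farthest (squared Euclidean distance) from existing mesh vertices.
vertices = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # current mesh vertices
median_pts = np.array([[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]])   # pixels on a median
local_err = np.array([0.2, 0.6, 0.9])                          # E(p) for each pixel
c_E = 0.5

def farthest_error_point(pts, err, verts, tol):
    cand = pts[err >= tol]                       # the set P_hj
    if cand.size == 0:
        return None
    # d_M(p): squared distance from p to its closest vertex in M
    d2 = ((cand[:, None, :] - verts[None, :, :]) ** 2).sum(-1).min(1)
    return cand[int(np.argmax(d2))]

pc = farthest_error_point(median_pts, local_err, vertices, c_E)
print(pc.tolist())  # [5.0, 5.0]: farthest from every vertex among the candidates
```

Choosing the farthest candidate is exactly what prevents new points from clustering around existing vertices.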
3.3. W-operators
A W-operator is an image transformation that is locally defined inside a
window W and translation invariant [21]. This means that the W-operator
depends just on shapes of the input image seen through the window W and
that the applied transformation rule is the same for all image pixels. A technique based on information theory concepts to estimate a good W-operator
to perform binary image transformations (e.g. noisy image filtering) and
gray scale analysis (e.g. gray-level texture recognition) has been introduced
in [1].
In order to use W-operators for color texture classification, we propose in the present paper an extension of the approach introduced in [1] to handle color images. The fact that each pixel is represented by a vector of color channels aggravates the problem of lack of training data, making a strategy for dealing with scarce training data even more necessary. By constraining the search space of operators, less training data is necessary to obtain good estimates of the best candidate operator [22]. The main idea of our approach, instead, is to select a subset of W that is as informative as possible about the texture classes according to the training set. As the number of subsets is exponential in the cardinality of W, a feature selection algorithm is required. Our approach does not constrain the search space and, at the same time, does not require large training samples to satisfactorily estimate the joint probability distributions between the observed patterns and the classes based on the selected features.
3.3.1. W-operator definition and properties
Let Z² denote the integer plane and + denote vector addition on Z². The inverse of + is denoted −. An image is a function f from Z² to L = {1, ..., k}^m, where k is the number of tones of each color channel and m is the number of channels considered. For example, m = 3 for RGB or HSV images, and m = 6 if we consider a color image as the combination of the RGB and HSV channels.
The translation of an image f(x) by a vector h ∈ Z² is the image f(x − h), denoted by f_h(x). By denoting as L the set of images taking values in L, an image transformation, or operator, can be defined as a mapping Ψ from L onto Y, where Y = {1, ..., c} is a set of labels (classes).

An operator Ψ is called translation invariant iff, for every h ∈ Z² and f ∈ L,

Ψ(f_h)(x) = Ψ(f)(x − h).   (2)
Let W be a finite subset of Z². The constraint class of f over W, denoted C_{f|W}, is the family of functions whose restriction to W equals f|W, i.e.,

C_{f|W} = {g ∈ L : g|W = f|W}.   (3)

An operator Ψ : L → Y is locally defined in the window W iff, for every x ∈ Z² and f ∈ L,

Ψ(f)(x) = Ψ(g)(x), ∀g ∈ C_{f|W_x},   (4)

where W_x is the window translated by x ∈ Z².
An operator is called a W-operator if it is both translation invariant and
locally defined in a finite window W .
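Both properties can be demonstrated with a minimal 1-D operator: a lookup table over a 1×3 window applied identically at every position. This is a sketch under the simplest possible setting (binary values, an arbitrary rule), not the paper's operator.

```python
import numpy as np

# Minimal W-operator on a binary signal with a 1x3 window: the output at x depends
# only on the pattern seen through the window (local definition), and the same
# rule is applied everywhere (translation invariance).
rule = {(0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
        (1, 0, 0): 0, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 1}

def w_op(f):
    g = np.zeros_like(f)
    padded = np.pad(f, 1)            # zero padding at the borders
    for x in range(len(f)):
        g[x] = rule[tuple(padded[x:x + 3])]
    return g

f = np.array([0, 1, 1, 0, 0, 1, 0, 0])
shifted = np.roll(f, 2)
# Translation invariance (Equation 2): Psi(f_h) == Psi(f)_h for this instance.
print(bool(np.array_equal(w_op(shifted), np.roll(w_op(f), 2))))
```

In general the equality holds away from the padded border; for this particular signal it happens to hold everywhere.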
3.3.2. Color W-operator design
The design of W-operators depends on a collection of ideal/observed image pairs. In texture classification, there are observed texture images and the corresponding ideal labeled images, i.e., images containing the label of each texture at the respective locations. From the observed images, each translation of the window W gives an observed pattern (feature vector). The label associated with this pattern is retrieved from the ideal images at the location of the W origin (usually the origin is the central pixel of W). Thus, the observed feature vector associated with its label is one sample of the training set.
In order to build the training set, a window of fixed dimensions collects m feature vectors (vectors of integers representing band levels) for each pixel, one for each considered color channel of the image. These m vectors are concatenated into a single feature vector. An observed sample is given by the pair formed by this vector and the label indicating the texture to which it belongs. Furthermore, each observed feature vector is replicated in r rotated versions of d degrees each, where (r + 1) · d = 360. This strategy is important not only to increase the number of training samples, but also to achieve a more accurate recognition of textures that may appear rotated in test images (in landscape images, for example). It is thus assumed that the textures present rotational symmetry.
Finally, each feature vector is quantized to k < 256 levels in the following way. As each intensity value belongs to the range from 0 to 255, this interval is subdivided into k intervals of equal size. Then, given an intensity value v, the quantized value v′ is defined as v′ = (v · k)/256 (integer division). It is important to notice that quantization is an important step to avoid estimation errors and to allow the inclusion of more features (window points).
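The quantization rule is a one-liner; the sketch below transcribes it directly, assuming integer arithmetic as suggested by the equal-size-interval description.

```python
# Quantization of intensities in [0, 255] to k levels: v' = (v * k) // 256.
def quantize(v, k):
    return (v * k) // 256

# With k = 4, the 256 intensities collapse into 4 bands of 64 values each.
print([quantize(v, 4) for v in (0, 63, 64, 255)])  # [0, 0, 1, 3]
```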
Here, we consider the input image as being the combination of RGB
and HSV channels (m = 6). The process of generation, quantization and
replication of the samples is illustrated by Figure 1. The window can be
viewed as a 3D-structure, where the first and second dimensions are spatial
and the third indicates the channel (red, green, blue, hue, saturation, value).
We have explored two different color spaces in order to give more flexibility
to the feature selection procedure to identify the most discriminative features
for different possible colored textures.
Figure 1: Generation and replication of the training samples by rotations of 45 degrees
(d = 45).
Designing the W-operator based on the entire 3D-window is not a good solution in general, due to the lack of training samples to estimate the joint probability distributions appropriately. The idea is hence to select a subset of points of the 3D-window, called a sub-window, which is more informative with respect to the classes. A possible sub-window is schematically shown in Figure 2.
Figure 2: An example of a sub-window from which the W-operator may be designed. The
chosen sub-window is composed by the black positions.
In order to obtain the sub-window, the same feature selection method
used in [1] was adopted, i.e., Sequential Floating Forward Search (SFFS)
[23] guided by mean conditional entropy criterion. Nevertheless, we propose
a different criterion to reflect the curse of dimensionality (refer to [1] for a complete discussion). The proposal consists in penalizing rarely observed instances instead of non-observed instances (as originally proposed in [1]). The new criterion is defined
in Equation 5.
Ê[H(Y|X_Z)] = (N/t) · H(F(1), ..., F(c)) + Σ_{x_Z ∈ X_Z : P(x_Z) > 1/t} P(x_Z) · H(Y|x_Z),   (5)

where F : {1, ..., c} → [0, 1] is the probability distribution given by

F(i) = α for i = y, and F(i) = (1 − α)/(c − 1) for i ≠ y,

H(X) = − Σ_{x∈X} P(x) log_c P(x) (Shannon entropy with logarithm base equal to c), t is the number of training samples, Z is the subset of the feature set indexes, and N is the number of instances x_Z with P(x_Z) = 1/t. In this equation, the rarely observed instances are those with only one observation in the training set, which means that the total probability mass of these instances is N/t. Without any penalization, such instances would have conditional entropy equal to zero (the minimum possible), since the probability mass is all concentrated in a single class. This penalization consists in distributing
concentrated in a unique class. This penalization consists in distributing
probability mass among all possible classes. The amount of distributed mass
is given by 1 − α. In this work, α = 0.9 has been adopted. In other words,
our confidence that the right class is the observed one for the given instance
with only one occurrence is 90%.
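The criterion of Equation 5 can be computed directly from a list of (instance, label) samples. The sketch below is illustrative only: instances are assumed to be hashable tuples, labels to lie in {0, ..., c−1}, and the toy sample set is invented for the example.

```python
import math
from collections import Counter, defaultdict

# Penalized mean conditional entropy of Equation 5.
def cond_entropy_penalized(samples, c, alpha=0.9):
    t = len(samples)
    by_inst = defaultdict(list)
    for x, y in samples:
        by_inst[x].append(y)

    def H(probs):  # Shannon entropy with logarithm base c
        return -sum(p * math.log(p, c) for p in probs if p > 0)

    # Rare instances (a single observation): penalize with distribution F.
    # F puts mass alpha on the observed class; H(F) is permutation-invariant,
    # so the observed class may sit at any index.
    N = sum(1 for ys in by_inst.values() if len(ys) == 1)
    F = [alpha] + [(1 - alpha) / (c - 1)] * (c - 1)
    total = (N / t) * H(F)

    # Instances with P(x_Z) > 1/t: the usual P(x) * H(Y|x) terms.
    for ys in by_inst.values():
        if len(ys) > 1:
            p_x = len(ys) / t
            cond = [n / len(ys) for n in Counter(ys).values()]
            total += p_x * H(cond)
    return total

# Toy training set: two repeated instances and one rare instance.
samples = [((0, 1), 0), ((0, 1), 0), ((1, 1), 1), ((1, 1), 0), ((2, 0), 1)]
print(round(cond_entropy_penalized(samples, c=2), 4))  # 0.4938
```

Note how the rare instance ((2, 0)) contributes a strictly positive term instead of the zero entropy it would get without penalization.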
Using the criterion function defined in Equation 5, feature selection can
be defined as an optimization problem where we search for Z* ⊆ I such that

Z* : Ê[H(Y|X_{Z*})] = min_{Z⊆I} {Ê[H(Y|X_Z)]},   (6)

with I = {1, 2, ..., n}.
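SFFS explores this search space heuristically. The sketch below shows only a plain greedy forward pass (the "floating" backward steps of SFFS are omitted), with a toy criterion standing in for Equation 5; both the criterion and the stopping rule are simplifications.

```python
# Greedy forward selection: repeatedly add the feature that most reduces
# a user-supplied criterion, stopping when no addition improves it.
def forward_select(I, criterion, max_size):
    Z = []
    while len(Z) < max_size:
        best = min((f for f in I if f not in Z), key=lambda f: criterion(Z + [f]))
        if Z and criterion(Z + [best]) >= criterion(Z):
            break  # no improvement: stop
        Z.append(best)
    return Z

# Toy criterion: distance to the "ideal" subset {0, 2}
# (stands in for the entropy criterion of Equation 5).
target = {0, 2}
crit = lambda Z: len(target.symmetric_difference(Z))
print(forward_select(range(4), crit, 3))  # [0, 2]
```

Full SFFS additionally tries to remove previously selected features after each addition, which helps escape the nesting effect of plain forward selection.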
The W-operator is represented by a table of conditional probabilities where each row is a possible instance x_{Z*} of X_{Z*}, each column is a possible class Y = y, and each cell of the table represents P(Y|X_{Z*} = x_{Z*}). This table is used as a Bayesian classifier: for each given instance, the chosen label Y = y is the one with maximum conditional probability for the considered instance. For instances with two or more labels of maximum probability (including non-observed instances), the nearest neighbors according to the Euclidean distance are visited successively; the occurrences of each label are accumulated until only one label has the maximum number of occurrences, and that label is chosen as the class of the considered instance.
The output of the W-operator classification used by Imesh is a probability matrix where each cell (i, j) contains c conditional probabilities, one for each class, corresponding to the pattern observed around the pixel (i, j) of the image.
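The table-plus-tie-breaking rule can be sketched as follows. The table contents and instances are invented for illustration; the fallback visits observed instances in order of Euclidean distance and accumulates label votes, as described above.

```python
import numpy as np

# Conditional-probability table used as a Bayesian classifier: rows are observed
# instances, columns are classes; ties and unseen instances fall back to the
# nearest observed instances.
table = {
    (0, 0): np.array([0.9, 0.1]),
    (0, 3): np.array([0.2, 0.8]),
    (3, 3): np.array([0.5, 0.5]),   # tie: needs the fallback
}

def classify(x):
    probs = table.get(x)
    if probs is not None and np.sum(probs == probs.max()) == 1:
        return int(np.argmax(probs))
    # Tie or unseen instance: accumulate label votes from nearest neighbors.
    insts = sorted(table, key=lambda z: sum((a - b) ** 2 for a, b in zip(x, z)))
    votes = np.zeros(2)
    for z in insts:
        votes += (table[z] == table[z].max())   # winning label(s) of neighbor z
        if np.sum(votes == votes.max()) == 1:
            return int(np.argmax(votes))
    return int(np.argmax(votes))

print([classify((0, 0)), classify((3, 3)), classify((1, 3))])  # [0, 1, 1]
```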
3.4. Local Error and W-operators
The local error E used by the function E (Equation 1) is the mechanism that allows W-operators to be exploited in the Imesh framework.
Let p be a pixel on the median h_j of the triangle σ, y(p) be the label assigned to p (by the W-operator), and Dy(p) be the directional "derivative" of y(p) at p in the h_j direction. Let α_i, i = 1, 2, 3, be the barycentric coordinates of p with respect to σ, and A(p) = min_i {α_i} be the function that associates to p its smallest barycentric coordinate. The local error E is defined as follows:

E(p) = 0 if Dy(p) = 0, and E(p) = A(p) if Dy(p) ≠ 0.   (7)
The local error defined in Equation 7 assigns to each boundary point
(regarding the labels) its smallest barycentric coordinate. The barycentric
coordinates of a point p are related to the areas of the triangles formed by
p and the vertices of σ. As p is a transition point between different image
regions, the function E can be seen as a measure of how well fitted a triangle
is regarding an image region.
In fact, a small value of A(p) indicates that h_j intersects an image edge close to the boundary of the triangle. Therefore, values of E(σ) close to zero indicate that the triangle σ is well fitted to an image region. Hence, a triangle σ is considered unsuitable if E(σ) > c_E, where 0 ≤ c_E ≤ 1 is a user-defined scalar. Unsuitable triangles are eliminated by inserting, in the Delaunay triangulation, the point p such that E(σ) = A(p). Since p is a point chosen to be as far as possible from the vertices of M, the problem of dense accumulation of points around existing vertices is reduced.
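Equation 7 reduces to a barycentric-coordinate computation at label-transition pixels. The sketch below is self-contained; the boolean `label_changes` stands in for the test Dy(p) ≠ 0, which the real algorithm evaluates from the W-operator labels along the median.

```python
import numpy as np

# Local error E(p) of Equation 7: at label-transition pixels, E(p) is the
# smallest barycentric coordinate A(p) of p in the triangle; elsewhere it is 0.
def barycentric(p, a, b, c):
    T = np.array([[a[0] - c[0], b[0] - c[0]],
                  [a[1] - c[1], b[1] - c[1]]], float)
    l12 = np.linalg.solve(T, np.array([p[0] - c[0], p[1] - c[1]], float))
    return np.array([l12[0], l12[1], 1.0 - l12.sum()])

def local_error(p, tri, label_changes):
    if not label_changes:                        # Dy(p) == 0: not a transition
        return 0.0
    return float(barycentric(p, *tri).min())     # A(p)

tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
print(round(local_error((0.25, 0.25), tri, True), 2))   # 0.25
print(local_error((0.25, 0.25), tri, False))            # 0.0
```

A transition point deep inside the triangle yields a large A(p), flagging the triangle for subdivision; a transition near an edge or vertex yields a value close to zero.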
Figure 3b) shows the mesh generated from the image shown in Figure 3a).
Notice that the triangles tend to be contained into similar regions of the
image. Such characteristic will be exploited during the mesh segmentation
step described in the following section.
Figure 3: a) Original image; b) Mesh generated from a).
4. Partitioning and Mesh Improvement
In [2], the partitioning step does not have any information about the definition of each region. Instead of the k-means approach used there, we can take advantage of W-operators to label each triangle generated during the refinement process based on texture properties. Thus, a labeled mesh is naturally obtained, which is not possible with other image-based mesh generation algorithms.
However, the mesh produced by the generation process may not be appropriate for some applications due to poor-quality triangles. In order to ensure a good-quality mesh, we have adapted Ruppert's algorithm [3] to work on the segmented mesh. Details about mesh segmentation and quality improvement are presented in the following sections.
4.1. Mesh Partitioning
Mesh partitioning (segmentation) is a well-known procedure in the computer graphics context, mainly for simplification and compression of meshes [24, 25, 26, 27]. However, the problem of segmenting a mesh generated from an image has not been deeply investigated, the work by Bertin et al. [28] being one of the few examples described in the literature.

Segmentation can play an important part when the mesh is generated for numerical simulation, as each object within the image may be associated with different physical parameters that interfere in the simulation. The use of W-operators makes it possible to carry out mesh generation and segmentation simultaneously.
In fact, while traversing the medians, the algorithm may store the most frequent label found on the medians. If the most frequent label is predominant, then it is assigned to the corresponding triangle (in our implementation we consider a label predominant if it appears in more than 90% of the traversed pixels, although this value has little influence on the overall performance). When the most frequent label is not predominant, no label is assigned to the triangle and the refinement proceeds. At the end of the refinement process, very few triangles (if any) are left unlabeled. Such remaining triangles assume the most frequent label present on their medians. Figure 4 shows four steps of
the refinement and labelling process. Colored triangles have already been labeled while white triangles have not. Notice in figure 4d) the resulting mesh
segmentation.
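The predominance rule is compact enough to state in a few lines. A minimal sketch, assuming the labels observed along a triangle's medians have been collected into a list; the 90% threshold follows the implementation note above.

```python
from collections import Counter

# Assign a label to a triangle only if the most frequent label along its
# medians exceeds the predominance threshold; otherwise return None
# (meaning: keep refining this triangle).
def triangle_label(median_labels, threshold=0.9):
    label, count = Counter(median_labels).most_common(1)[0]
    return label if count / len(median_labels) > threshold else None

print(triangle_label([1] * 95 + [2] * 5))   # 1 (predominant)
print(triangle_label([1] * 60 + [2] * 40))  # None (keep refining)
```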
Figure 4: Four steps of the mesh generation process.
4.2. Mesh Improvement
Mesh improvement aims at ensuring mesh quality, i.e., that each triangle satisfies a minimum angle criterion. Furthermore, the mesh improvement step should also preserve the mesh partitioning (segmentation) already computed. A variant of Ruppert's algorithm [3] is employed
to achieve such a good quality triangulation while still preserving the mesh
segmentation.
Ruppert’s algorithm refines a Delaunay mesh by inserting the circumcenters of “poor” quality triangles. The quality of the triangles is measured by
the circumradius-to-shortest edge ratio, i.e. the radius of the circumcircle
divided by the length of the shortest edge of the triangle. It can be shown
that the circumradius-to-shortest edge ratio r/d of a triangle is related to its
smallest angle α by sin α = d/(2r) [29]. As the insertion of the circumcenters
tends to generate triangles with shorter circumradius, the smallest angle of
the new triangles tends to be “better” than the old ones, thus improving
the triangulation quality. Ruppert’s algorithm inserts circumcenters until
all triangles satisfy a quality constraint, i.e., all triangles have the ratio r/d
limited by a constant.
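The quality measure and the angle relation above can be checked numerically. The sketch below computes r/d from the three side lengths via Heron's formula and recovers the smallest angle from sin α = d/(2r).

```python
import math

# Circumradius-to-shortest-edge ratio r/d and the smallest-angle relation
# sin(alpha) = d / (2r), for a triangle given by its side lengths.
def ratio_and_min_angle(a, b, c):
    s = (a + b + c) / 2.0
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
    r = a * b * c / (4.0 * area)                        # circumradius
    d = min(a, b, c)
    return r / d, math.degrees(math.asin(d / (2.0 * r)))

rd, alpha = ratio_and_min_angle(1.0, 1.0, 1.0)          # equilateral triangle
print(round(rd, 4), round(alpha, 1))                    # 0.5774 60.0
```

An equilateral triangle attains the minimum possible ratio r/d = 1/√3 ≈ 0.577; larger ratios correspond to smaller minimum angles, which is why bounding r/d bounds triangle quality.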
The strategy to insert new vertices is controlled by two main rules. Let
G be the set of triangle edges making up the boundary of a mesh M (edges
shared by only one triangle). The first rule of Ruppert’s algorithm verifies,
for each edge e in G, if a vertex of M lies strictly inside the diametral circle
(the smallest circle enclosing the edge) of e. In positive case, the edge e is
split in two new edges by inserting a vertex at its midpoint. The two new
edges replace e in G and the process follows until the diametral circles of
every edge (or subdivided segments) in G are empty. The second rule aims
at inserting a vertex at the circumcenter of each triangle whose circumradius-to-shortest edge ratio is larger than a bound B. However, if the new vertex (circumcenter) lies inside the diametral circle of an edge in G, then such a vertex is not inserted and the encroached edge is split as in the first rule.

It can be shown that if B ≥ √2 then Ruppert's algorithm terminates [29].
Furthermore, a consequence of keeping empty diametral circles for the edges
in G is that the circumcenter of any triangle contained in M lies within
M . This fact ensures that boundary edges of M will be preserved when
circumcenters are inserted, thus conserving the original partitioning.
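The diametral-circle test of the first rule has a simple geometric form: a point lies strictly inside the circle having segment ab as diameter exactly when the angle it subtends at a and b is obtuse, i.e., when the dot product of the vectors to the endpoints is negative. A minimal sketch:

```python
import numpy as np

# Encroachment test: vertex v encroaches edge (a, b) if it lies strictly
# inside the diametral circle of the edge, i.e., (a - v) . (b - v) < 0.
def encroaches(v, a, b):
    v, a, b = (np.asarray(p, float) for p in (v, a, b))
    return float(np.dot(a - v, b - v)) < 0.0

a, b = (0.0, 0.0), (2.0, 0.0)
print(encroaches((1.0, 0.5), a, b))   # inside the diametral circle -> True
print(encroaches((1.0, 2.0), a, b))   # far outside -> False
```

When the test fires for a would-be circumcenter, the circumcenter is rejected and the encroached edge is split at its midpoint instead.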
Ruppert's algorithm ensures that the smallest angle of every triangle in M is at least 22°. An exception occurs when two adjacent boundary edges form a small angle. In this case, the small angle formed by boundary edges must be detected at the beginning of the process and receive special treatment. In our implementation we adopt Shewchuk's strategy to handle small angles [29]. This strategy consists in dividing the edges that comprise the small angle according to concentric circles (centered at the vertex where the small angle occurs) whose radii are all powers of two. Details of
this small angle treatment can be found in [29].
A new scheme to handle boundary edges has been developed in order
to adapt Ruppert’s algorithm to produce a good quality mesh while still
preserving the mesh partitioning. The graph G has been extended in such a
way that boundary edges (edges shared by only one triangle) as well as edges
between two different submeshes (meshes with distinct labels) are included
in G. In this way, by ensuring empty diametral circles for all the edges in G, one can preserve the partitioning, as these edges will be kept during the
refinement.
5. Experimental Results
This section presents results obtained from two sets of experiments in
order to illustrate the effectiveness of the proposed methodology (W-operator
and Imesh).
5.1. Berkeley Segmentation Dataset images
Figure 5 shows four images (obtained from the Berkeley Segmentation Dataset2) used in our experiments and the respective 15 × 15 blocks extracted to compose the training samples, while Figure 6 shows the respective manual segmentations (groundtruth). Blocks with the same color in each image belong to the same texture class (five blocks per class). The manually labeled training regions (Figures 5a), b), c) and d)) represent 2.91%, 2.91%, 3.64% and 2.19% of the respective input images.
As discussed in section 3.3, the W-operator technique requires three parameters. The first is the size of the window (denoted |W|) that is translated over the training images to collect the samples. The second parameter is the quantization degree k. The last parameter is the degree of rotation d used for sample replication. We evaluated the classification results over the four images illustrated in Figure 5 for several combinations of values of these parameters, considering four window sizes |W| = {1 × 1, 3 × 3, 5 × 5, 7 × 7}, three quantization degrees k = {4, 8, 16}, and two rotation values d = {0, 45}. Besides, the utility of using both RGB and HSV channels was also evaluated, i.e., the results obtained by using both RGB and HSV channels have been compared against those using just the RGB bands. Note that the window size 1 × 1
2 http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds
Figure 5: Training sets taken from the Berkeley images chosen for the experiments: a) Pyramid; b) Horse; c) Lake; d) Snake. Blocks of the same color inside a particular image belong to the same texture class.
means pixel-wise classification, and d = 0 means no sample replication by rotation.
The MAE error, i.e., the percentage of misclassified pixels according to the groundtruths presented in Figure 6, was adopted as the classification error measurement. Tables 1, 2, 3 and 4 show the MAE errors corresponding to the classification of the pyramid, horse, lake and snake images, respectively, for every parameter combination. A first conclusion that can be drawn from these results is that rotation and HSV channels usually improve the classification performance considerably. Hence, we concentrate the analysis on the bottom-right quadrant of the tables, which means fixed RGB+HSV and rotation,
a) Pyramid
b) Horse
c) Lake
d) Snake
Figure 6: Respective manual segmentations (groundtruth) supplied by the Berkeley images
repository for the images shown in the Figure 5.
and variable k and |W |. For the pyramid image, higher k leads to better
results, but for the horse image the best results are achieved with lower k. For the other two images, all values of k give similar results. Hence, it is difficult to tell which k is the best, since it depends on the particularities of each image. There is no definitive answer
for the best window size as well. Theoretically, larger windows could achieve
better classification results. In practice, this does not occur for two reasons.
First, larger windows collect fewer samples near the borders of the images (in the
case of the training sets used in this set of experiments, the images supplied
for training are just 15×15 sized blocks). Second, larger windows increase the
search space. In this way, suboptimal feature selection algorithms like SFFS
tend to achieve results far from optimal for large windows. Considering all these observations and paying careful attention to the four tables, the set of parameters that yields competitive results for all images except the lake image is RGB+HSV, rotation (d = 45), k = 16 and |W| = 3 × 3. These values are therefore fixed from now on.
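The evaluation just described is essentially an exhaustive sweep over the parameter grid scored by MAE. A minimal sketch of that loop, assuming NumPy label images; `classify` is a hypothetical stand-in for training and applying the W-operator, not the actual implementation:

```python
import itertools
import numpy as np

def mae(predicted, groundtruth):
    """MAE as used in the experiments: fraction of misclassified pixels."""
    return float(np.mean(predicted != groundtruth))

def evaluate_grid(classify, image, groundtruth):
    """Score every parameter combination considered in the experiments.
    `classify(image, w, k, d)` stands in for training and applying the
    W-operator with a w x w window, quantization k and rotation step d."""
    errors = {}
    for w, k, d in itertools.product([1, 3, 5, 7], [4, 8, 16], [0, 45]):
        if w == 1 and d == 45:   # rotation has no effect pixel-wise
            continue
        errors[(w, k, d)] = mae(classify(image, w, k, d), groundtruth)
    return errors

# dummy classifier that is always right, just to exercise the loop
gt = np.zeros((8, 8), dtype=int)
errors = evaluate_grid(lambda img, w, k, d: gt, gt, gt)
```

The skipped 1 × 1 rotation combinations correspond to the dashes in the tables below.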
Table 1: W-operator classification errors (MAE) for the pyramid image obtained by all combinations of the considered parameter values.

                             no rotation                  rotation
                        k=4     k=8     k=16       k=4     k=8     k=16
RGB       |W| = 1×1    0.2268  0.1870  0.1077       –       –       –
          |W| = 3×3    0.1613  0.1377  0.0960     0.1473  0.1259  0.0895
          |W| = 5×5    0.1459  0.1383  0.1040     0.1271  0.1249  0.0875
          |W| = 7×7    0.1651  0.1838  0.1051     0.1316  0.1372  0.0949
RGB+HSV   |W| = 1×1    0.1839  0.1694  0.1025       –       –       –
          |W| = 3×3    0.1727  0.1296  0.0918     0.1571  0.1144  0.0853
          |W| = 5×5    0.1509  0.1238  0.1026     0.1214  0.1135  0.0865
          |W| = 7×7    0.1873  0.1299  0.1037     0.1497  0.1115  0.0913
Table 2: W-operator classification errors (MAE) for the horse image obtained by all combinations of the considered parameter values.

                             no rotation                  rotation
                        k=4     k=8     k=16       k=4     k=8     k=16
RGB       |W| = 1×1    0.3728  0.2843  0.2793       –       –       –
          |W| = 3×3    0.3062  0.3048  0.2835     0.2892  0.2796  0.2891
          |W| = 5×5    0.3121  0.3113  0.2778     0.2834  0.2792  0.2915
          |W| = 7×7    0.2904  0.2928  0.3410     0.2843  0.3022  0.2970
RGB+HSV   |W| = 1×1    0.3438  0.2959  0.2802       –       –       –
          |W| = 3×3    0.2627  0.2743  0.2838     0.2515  0.2669  0.2759
          |W| = 5×5    0.2557  0.2907  0.2876     0.2463  0.2864  0.2757
          |W| = 7×7    0.2857  0.2779  0.3029     0.2421  0.2903  0.2860
For the fixed parameters, the sub-window points chosen by the adopted feature selection method (SFFS), guided by Equation 5, are illustrated in Figure 7. The results of applying the classifier (W-operator) using the selected points are presented in Figure 8. Using just pixel-wise classification, i.e., a 1 × 1 window, the MAE errors increase by 12.45%, 1.02%, 7.19% and 25.16%, respectively. Therefore, using local information
Table 3: W-operator classification errors (MAE) for the lake image obtained by all combinations of the considered parameter values.

                             no rotation                  rotation
                        k=4     k=8     k=16       k=4     k=8     k=16
RGB       |W| = 1×1    0.4050  0.3088  0.2789       –       –       –
          |W| = 3×3    0.3462  0.2456  0.2342     0.3236  0.2365  0.2125
          |W| = 5×5    0.3591  0.2629  0.2440     0.3306  0.2759  0.2280
          |W| = 7×7    0.3895  0.2980  0.2650     0.3373  0.2569  0.2366
RGB+HSV   |W| = 1×1    0.2666  0.2145  0.2291       –       –       –
          |W| = 3×3    0.2604  0.2075  0.2186     0.2191  0.2048  0.2065
          |W| = 5×5    0.2492  0.2294  0.2343     0.2276  0.2048  0.2218
          |W| = 7×7    0.2579  0.2166  0.2022     0.2342  0.2228  0.2498
Table 4: W-operator classification errors (MAE) for the snake image obtained by all combinations of the considered parameter values.

                             no rotation                  rotation
                        k=4     k=8     k=16       k=4     k=8     k=16
RGB       |W| = 1×1    0.5086  0.6013  0.4989       –       –       –
          |W| = 3×3    0.5632  0.5544  0.4473     0.5467  0.5000  0.4343
          |W| = 5×5    0.6051  0.5158  0.4468     0.6286  0.4978  0.4455
          |W| = 7×7    0.5692  0.5078  0.4813     0.6538  0.4832  0.4371
RGB+HSV   |W| = 1×1    0.4655  0.4982  0.4855       –       –       –
          |W| = 3×3    0.5025  0.4370  0.4061     0.5086  0.3876  0.3879
          |W| = 5×5    0.5676  0.4775  0.4407     0.5859  0.3835  0.4144
          |W| = 7×7    0.5886  0.4190  0.4073     0.5650  0.3621  0.4065
instead of pixel-wise classification captures texture information better, as expected. Also notice that in Figure 8 the W-operator is employed without any combination with Imesh, i.e., the W-operator assigns a label to each pixel based only on the values of the conditional probability matrices.
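The stand-alone W-operator labeling just described reduces to taking, at each pixel, the class of largest conditional probability. A minimal sketch of that rule, assuming the probabilities are stacked in a NumPy array; the function name is illustrative:

```python
import numpy as np

def label_pixels(prob_maps):
    """Pixel-wise labeling from the conditional probability matrices:
    each pixel receives the class of largest conditional probability.
    `prob_maps` has shape (n_classes, H, W); the per-pixel maximum is
    the kind of confidence map visualized in Figure 9."""
    labels = prob_maps.argmax(axis=0)
    confidence = prob_maps.max(axis=0)
    return labels, confidence

# 2 classes over a 1 x 2 image: left pixel favors class 0, right favors 1
p = np.array([[[0.9, 0.2]],
              [[0.1, 0.8]]])
labels, conf = label_pixels(p)
```

Pixels whose winning probability is low are exactly those prone to the noisy labels discussed next.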
As depicted in Figure 9, most pixels receive a high conditional probability, ensuring reliable labels. However, in some regions (see, for example, Figures 9c) and d)), the labels are produced from low conditional probability values, causing a noisy label distribution, as seen in Figures 8c) and d).
On the other hand, the combination of W-operator with Imesh reduces the noisy label distribution problem drastically, as labels are assigned to triangles instead of pixels. This can clearly be observed in Figures 10a)–d), where the MAE error has been reduced by 25.2%, 22.1%, 70.0% and 45.4% with respect to the images in Figures 8a)–d), respectively.
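One plausible reading of the triangle-level labeling, sketched below under the assumption that a triangle takes the majority label of the pixels it covers (the paper assigns labels during mesh construction; the majority-vote rule and names here are illustrative only):

```python
from collections import Counter

def label_triangle(pixel_labels, covered):
    """Assign a triangle the most frequent label among the pixels it
    covers; isolated misclassified pixels are outvoted, which is how
    triangle-level labeling can suppress noisy pixel labels."""
    votes = Counter(pixel_labels[r][c] for r, c in covered)
    return votes.most_common(1)[0][0]

# 3 x 3 toy label image with a single noisy pixel inside a class-1 region
labels = [[1, 1, 1],
          [1, 0, 1],
          [1, 1, 1]]
tri = label_triangle(labels, [(0, 0), (1, 1), (2, 2), (0, 2), (2, 0)])
```

The lone class-0 pixel is outvoted, illustrating why triangle labels are more robust than pixel labels.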
Figure 7: Sub-windows resulting from applying SFFS with the criterion given by Equation 5 on the training sets shown in Figure 5: a) Pyramid, b) Horse, c) Lake, d) Snake.
Figure 11 shows the triangular meshes from which the images in Figures 10a)–
d) have been generated. Figures 12a)–d) depict the result of applying the
mesh improvement procedure described in Section 4.2 to the triangulations
shown in Figures 11a)–d), respectively.
In order to compare our results with another color texture classification technique, we also implemented a Multi-Layer Perceptron (MLP) neural network tuned to classify textures. Our implementation is based on an MLP commonly used in pattern and texture recognition [30]. The MLP has been trained with the standard backpropagation algorithm, using the same training set as the W-operators. The topology of the MLP comprises three layers: the first is defined by the size of the mask representing the training window (13 × 13 per channel); the intermediate layer is defined empirically, depending on the MLP convergence (we have adopted 11 neurons); the output layer contains a single neuron that supplies the scalar characterizing the class to which the texture belongs. After the learning step, the MLP contains the weights used to classify the whole texture.
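The stated topology can be sketched as a single forward pass, assuming NumPy; the random weights (trained by backpropagation in practice) and the tanh activation are assumptions for illustration, as the text does not specify the activation:

```python
import numpy as np

rng = np.random.default_rng(0)

# topology from the text: a 13 x 13 input window per RGB channel,
# 11 hidden neurons, one output neuron giving the class scalar
n_in, n_hidden = 13 * 13 * 3, 11
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
W2 = rng.normal(scale=0.1, size=(1, n_hidden))

def mlp_forward(window):
    """One forward pass; training would adjust W1 and W2 via backprop."""
    x = window.reshape(-1)            # flatten the 13 x 13 x 3 window
    h = np.tanh(W1 @ x)               # hidden-layer activations
    return float((W2 @ h)[0])         # scalar class response

out = mlp_forward(np.zeros((13, 13, 3)))
```

The scalar output is then mapped to the nearest class index when labeling pixels for Imesh.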
The result of applying the MLP classifier is used to define the labels during the Imesh algorithm. The final results for the pyramid, horse, lake and snake images are shown in Figure 13. As can be seen, the MAE errors obtained with the MLP are much higher than those presented in Figure 10, showing that the combination of W-operator and Imesh gives rise to a more effective texture classifier. Furthermore, the proposed approach supplies, besides a segmented image, a quality triangular mesh, a characteristic not found in other textured color image segmentation techniques.
Figure 8: Results of the W-operator application on the images of Figure 5 with their respective Mean Absolute Errors (MAE): a) Pyramid (MAE = 8.53%), b) Horse (MAE = 27.59%), c) Lake (MAE = 20.65%), d) Snake (MAE = 38.79%). The parameters adopted are: RGB+HSV, |W| = 3 × 3, k = 16 and rotation of 45 degrees (d = 45).
Figure 9: Images (with legend) representing the largest probability values of the conditional probability matrices corresponding to the W-operator classification results shown in Figure 8: a) Pyramid, b) Horse, c) Lake, d) Snake.
Figure 10: Images generated from the triangulations presented in Figure 11: a) Pyramid (MAE = 6.16%), b) Horse (MAE = 21.50%), c) Lake (MAE = 6.20%), d) Snake (MAE = 21.17%).
Figure 11: Triangular meshes generated from the combination of W-operator and Imesh: a) Pyramid, b) Horse, c) Lake, d) Snake.
Figure 12: Mesh improvement applied to the triangulations shown in Figures 11a)–d): a) Pyramid, b) Horse, c) Lake, d) Snake.
Figure 13: Results generated from the combination of the MLP classifier and Imesh: a) Pyramid (MAE = 9.67%), b) Horse (MAE = 34.69%), c) Lake (MAE = 30.15%), d) Snake (MAE = 22.18%).
5.2. VisTex dataset images
Here we present results obtained by applying our method to four images from the VisTex database3. The W-operator parameters are the same as those fixed in the previous section: RGB+HSV, |W| = 3 × 3, k = 16 and d = 45. Figure 14 shows the four chosen images and their respective training sample images. The results also show that the meshes generated by Imesh from the probability matrices produced by the W-operator eliminate almost completely the classification errors committed by the W-operator.
Figure 15 shows the results of the W-operator classification, the MLP classification, and the W-operator+Imesh technique for the images of Figure 14. The visual results indicate the better performance of the W-operator-based technique.
6. Conclusion
In this work we presented a new framework for image-based mesh generation from textured color image segmentation, which combines W-operators with Imesh. The presented results show that the proposed unified framework is an effective mechanism to accomplish textured color image segmentation as well as image meshing.
Although other methods have been proposed to generate triangular meshes from images, to the best of our knowledge there is no technique devoted to combining a color texture classifier with mesh generation, which is hence an important contribution of our work. Furthermore, because the proposed framework assigns labels to triangles, it tends to reduce the number of misclassified elements, being more effective than a classifier that operates at the pixel level.
Another interesting aspect of the proposed approach is that it can be extended to 3D images. An important consequence of this extension is that, if successful, it will give rise to a new technique to generate meshes directly from volumetric data, which is of paramount importance in numerical simulation from medical and biological data.
As future work, an interactive tool could be implemented in which the user supplies an initial training set by simply clicking on the objects of

3 http://vismod.media.mit.edu/vismod/imagery/VisionTexture/vistex.html
Figure 14: Original images selected from the VisTex database and their respective training images (128 × 128 size): Brick Paint, Grass Land, Mt Valley, Valley Water.
Figure 15: From left to right: original images, W-operator results, MLP-based classification results, and W-operator+Imesh results for a) Brick Paint, b) Grass Land, c) Mt Valley, d) Valley Water.
interest, and the system displays a first result from the W-operator+Imesh technique. If the results are not satisfactory, the user may continue clicking on objects of interest to enlarge the training set and refine the results until satisfied.
Acknowledgment
The authors are grateful to FAPESP, CNPq and CAPES for financial
support. The test images have been obtained from [31].
References
[1] D. C. Martins-Jr, R. M. Cesar-Jr, J. Barrera, W-operator window design by minimization of mean conditional entropy, Pattern Analysis &
Applications 9 (2006) 139–153.
[2] A. Cuadros-Vargas, L. Nonato, R. Minghim, T. Etiene, Imesh: An image based quality mesh generation technique, in: IEEE Proceedings
SIBGRAPI’05, 2005, pp. 341–348.
[3] J. Ruppert, A delaunay refinement algorithm for quality 2-dimensional
mesh generation, Journal of Algorithms 18 (3) (1995) 548–585.
[4] M. García, A. Sappa, B. Vintimilla, Efficient approximation of gray-scale images through bounded error triangular meshes, in: IEEE Intern. Conf. on Image Processing, 1999, pp. 168–170.
[5] S. Fortune, Voronoi diagrams and delaunay triangulation, in: F. K. Hwang, D.-Z. Du (Eds.), Computing in Euclidean Geometry, Vol. 1 of Lecture Notes Series on Computing, World Scientific, Singapore, 1992, pp. 193–233.
[6] C. Huang, C. Hsu, A new motion compensation method for image sequence coding using hierarchical grid interpolation, IEEE Trans. Circuits Syst. Video Technol. 4 (1994) 44–51.
[7] M. Garland, P. Heckbert, Fast polygonal approximation of terrains and
height fields, Tech. Rep. CMU-CS-95-181, Carnegie Mellon University
(1995).
[8] T. Gevers, A. Smeulders, Combining region splitting and edge detection
through guided delaunay image subdivision., in: IEEE Proceedings of
CVPR, 1997, pp. 1021–1026.
[9] S. Coleman, B. Scotney, Mesh modeling for sparse image data set, in:
IEEE ICIP, IEEE Computer Society, 2005, pp. 1342–1345.
[10] A. Ciampalini, P. Cignoni, C. Montani, R. Scopigno, Multiresolution
decimation based on global error, The Visual Computer 13 (5) (1997)
228–246.
[11] H. Pedrini, An improved refinement and decimation method for adaptive
terrain surface approximation, in: WSCG’2001, 2001, pp. 5–9.
[12] P. Kocharoen, K. Ahmed, R. Rajatheva, W. Fernando, Adaptive mesh
generation for mesh-based image coding using node elimination approach, in: IEEE ICIP, 2005, pp. 2052–2056.
[13] D. Terzopoulos, M. Vasilescu, Sampling and reconstruction with adaptive meshes, in: IEEE Int. Conf. Comp. Vision, Pattern Recog., 1992,
pp. 829–831.
[14] L. Hermes, J. Buhmann, A minimum entropy approach to adaptive image polygonization, IEEE Trans. on Image Processing 12 (10) (2003) 1243–1258.
[15] Y. Yang, M. Wernick, J. Brankov, A fast approach for accurate contentadaptive mesh generation, IEEE Trans. on Image Processing 12 (8)
(2003) 866–881.
[16] J. Cebral, R. Lohner, From medical images to cfd meshes, in: Proceedings of the 8th International Meshing Roundtable, 1999, pp. 321–331.
[17] Y. Zhang, C. Bajaj, B.-S. Sohn, Adaptive and quality 3d meshing from
imaging data, in: SM ’03: Proceedings of the eighth ACM symposium
on Solid modeling and applications, 2003, pp. 286–291.
[18] G. Berti, Image-based unstructured 3d mesh generation for medical
applications, in: ECCOMAS - European Congress On Computational
Methods in Applied Sciences and Engeneering, 2004.
[19] D. Hale, Atomic images - a method for meshing digital images, in: 10th
International Meshing Roundtable, 2001, pp. 185–196.
[20] J. Shewchuk, Delaunay refinement algorithms for triangular mesh generation, Computational Geometry: Theory and Applications 22 (2-3)
(2002) 21–74.
[21] E. R. Dougherty, J. Barrera, Pattern recognition theory in nonlinear
signal processing, J. Math. Imaging Vis. 16 (3) (2002) 181–197.
[22] E. R. Dougherty, J. Barrera, G. Mozelle, S. Kim, M. Brun, Multiresolution analysis for optimal binary filters, J. Math. Imaging Vis. 14 (1)
(2001) 53–72.
[23] P. Pudil, J. Novovičová, J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters 15 (1994) 1119–1125.
[24] Z. Yan, S. Kumar, C. Kuo, Mesh segmentation schemes for error resilient
coding of 3-d graphic models, IEEE Trans. Cir. Sys. Video 15 (1) (2005)
138–144.
[25] H. Zhang, R. Liu, Mesh segmentation via recursive and visually salient
spectral cuts, in: Vision, Modeling, and Visualization, 2005, pp. 429–
436.
[26] D. Cohen-Steiner, P. Alliez, M. Desbrun, Variational shape approximation, ACM Trans. Graph. 23 (3) (2004) 905–914.
[27] L. Guillaume, D. Florent, B. Atilla, Curvature tensor based triangle
mesh segmentation with boundary rectification, in: Computer Graphics
International, 2004, pp. 10–17.
[28] E. Bertin, S. Marchand-Maillet, J. Chassery, Optimization in voronoi
diagrams, in: J. Serra, P. Soille (Eds.), Mathematical Morphology and
Its Applications to Image Processing, Kluwer Academic, 1994, pp. 209–
216.
[29] J. Shewchuk, Lecture notes on delaunay mesh generation, Tech. Rep.
CA 94720, Department of Electrical Engineering and Computer Science
- Berkeley (2000).
[30] N. N. Kachouie, J. Alirezaie, Texture segmentation using gabor filter and
multi-layer perceptron, in: IEEE International Conference on Systems,
Man and Cybernetics, 2003, pp. 2897–2902.
[31] D. Martin, C. Fowlkes, D. Tal, J. Malik, A database of human segmented
natural images and its application to evaluating segmentation algorithms
and measuring ecological statistics, in: Proc. 8th Int’l Conf. Computer
Vision, Vol. 2, 2001, pp. 416–423.