JOURNAL OF MULTIMEDIA, VOL. 3, NO. 2, JUNE 2008
Interactive face generation from verbal
description using conceptual fuzzy sets
Hafida Benhidour, Takehisa Onisawa
Humanistic systems Lab., Department of Intelligent Interaction Technologies, Graduate School of Systems and
Information Engineering, University of Tsukuba, 1-1-1, Tennodai, Tsukuba, 305-8573 Japan
Email: [email protected], [email protected]
Abstract— In this article, a human centered approach to interactive face generation is presented. Users of the system can interactively generate faces from verbal descriptions. The system automatically generates an average face of the same race as the target face, using anthropometric measurements of the face and the head, and displays it to the user as an initial face. This face is then transformed into the desired face according to the verbal description given by the user. The user describes each facial feature using words, and the feature changes according to the description. Depending on the race of the user and the gender of the face drawn, the relation between the size of each facial feature and the set of linguistic words used to describe it is obtained using Conceptual Fuzzy Sets (CFS), which represent the meaning of the words. CFSs can represent context dependent meaning; the context here is the race of the user and the gender of the target face. We describe the encouraging results of the experiments and discuss future directions for our face generation system.
Index Terms— anthropometric measurements of the face,
face generation, context dependent meaning of words, conceptual fuzzy sets
I. INTRODUCTION
Human faces have been the subject of interest of
numerous studies [1], [2], [3], [4], [5], [6]. Broadly, there are two types of studies dealing with faces: face recognition, which detects patterns and shapes in the face, and the reverse process, face reconstruction, which is a classic problem in criminology.
In 2D face reconstruction systems, the image processing approach and the human centered approach, both with
different drawing styles, are the main approaches adopted
to generate faces.
The first approach uses a 2D image of the face as input
to generate a face drawing [5], [7], [8], [9], [10].
The face can be drawn exactly by this approach. However, even if different users draw the same face with it, only the same face is generated. Furthermore, although a person who wants to draw a face usually has some subjective impressions of this face, such as strong, cheerful, small eyes, or normal mouth, it is difficult to reflect these impressions in the generated face by image processing alone, since impressions are subjective and depend on the person who describes the face.

This paper is based on "Drawing human faces using words and conceptual fuzzy sets" by H. Benhidour and T. Onisawa, which appeared in the Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, ISIC, Montreal, QC, Canada, October 2007. © 2007 IEEE.
© 2008 ACADEMY PUBLISHER
In the human centered approach, a user is a part of
the system and interacts with it using words until he/she
obtains a desired face [11]. A face is generated based
on the user’s verbal description of facial features and the
impressions of the target face. The user can retouch the obtained face using linguistic words until a satisfactory face is drawn. If different users draw
the same target face, various types of faces are obtained
reflecting users’ impressions. Our face generation system
is based on the human centered approach and considers
user’s subjectivity in face drawing.
This paper is an extension of [12], with more analysis and experiments. The paper presents our approach to drawing faces of people belonging to different races from a verbal description of the face, adapting the meaning of the words to the user's race and the gender of the drawn face. Using anthropometric measurements of the face and the head, the system automatically generates an average face of the same race as the target face
and displays it to the user as an initial face. This face
is then transformed to the desired face according to the
verbal description given by the user. The user describes
each facial feature using words, and the feature changes
according to the user’s description. Depending on the race
of the user and the gender of the face to be drawn, the
relation between the size of each facial feature and the
set of the linguistic terms used to describe it is obtained
by conceptual fuzzy sets. These fuzzy sets represent the
meaning of the words.
Such a system has potential applications, such as helping police investigators draw suspects' faces from the verbal description of the target face that a witness has given to them, or simply drawing faces for fun.
The remainder of the paper describes our approach in more detail. Section II reviews related work.
Section III provides a description of the proposed method
and Section IV covers the generation of the average faces
according to the anthropometric measurements of the face
and the head. Section V explains the use of conceptual
fuzzy sets to obtain the meaning of the linguistic words
according to the race and the gender of the target face.
Section VI shows evaluation results while conclusion and
future work are discussed in Section VII.
II. RELATED WORKS
There have been many attempts to automatically generate facial drawings. Most of them are based on the image
processing approach, such as the systems proposed by
Fujiwara et al. [7] and Chen et al. [8]. The face can be drawn exactly by this approach, but the approach cannot reflect the user's impressions, as mentioned in the Introduction.
Caricature Generator, the first computer graphics technique for facial caricatures developed by Brennan [1],
generates a caricature by comparing the proportions of a
particular face to those of a mean face, and exaggerating
any differences found. Though this formalization remains relevant in recent works, it does not let the user give a verbal description, which is easier for humans to deal with.
Our face generation system deals with drawing faces
from the same perspective as Hirasawa et al. [13], [14] did for drawing face caricatures with words. They use a database of facial feature parameters, described with and without impressions and collected for each user, and then compute new values for each parameter according to the input words and the values stored in that user's database. However, their approach does not allow a word to have multiple meanings and can draw faces of young Japanese men only.
Another approach similar to [13] is described in [6], where a system for drawing facial caricatures and exaggerating them is realized using the concept of Fuzzy Information Granulation. The linguistic terms describing the features of the face are represented as fuzzy sets, following the concept of Computing with Words. A face space consisting of some parameters of the face is defined; drawing a caricature can then be seen as seeking a point in the face space that satisfies the restrictions given by the words expressing the impressions of the target face. These restrictions are fuzzy subsets in the face space. Exaggerating parts of the facial caricature is realized as rules on the linguistic labels.
In our approach, anthropometric measurements of the
face [15] are used to generate an average face for each
race and conceptual fuzzy sets (CFS) [16], [17], [18] are
used to compute the meaning of each word according to
the race of the user and the gender of the target face. Our
approach does not require the user to introduce individual
parameters before using the system. In addition, it allows users to draw faces of men and women belonging to different races and lets each word have multiple meanings, which is closer to reality when users come from different races. Indeed, the meaning of the words used to describe a face may change from one person to another depending on his/her race. For example, the subjective impressions expressed by an American describing an American face differ from those expressed by a Japanese for the same face: the Japanese may consider the eyes big, whereas the American may consider the same eyes normal.
III. SYSTEM FRAMEWORK
The system is designed to generate faces interactively, allowing a user to see the obtained face after each description and to retouch, using words, the parts of the face that do not correspond to the target face. Each time the user gives a verbal description for a facial feature, the corresponding changes are applied to the feature and the new face is displayed.
The system consists of an input part, a processing part
and an output part, as shown in Fig. 1. In the input part, the user selects the gender and race of the target face and the user's own race. The user uses words to describe the
facial features and the face. Each facial feature can be
described by a set of linguistic terms. Table I summarizes
the linguistic descriptors used in the system. The user
can choose a hairstyle from a set of hair styles, manually
constructed, and a face shape among six kinds of shapes:
oval, round, long, square, heart and diamond.
In the processing part, a mesh of the face, prepared
using the modeling tool Blender [19] is loaded into the
system. Then, using the anthropometric measurements
as constraints on the mesh and applying the appropriate
geometric transformations, an average face of the same race and gender as the target face is generated automatically. There is one average face per race and gender.
The average face is shown to the user as an initial face to
which modifications are applied according to the user’s
description until the desired face is obtained. Conceptual
fuzzy sets are used to determine the relation between each
word of the verbal description given by the user and the
new measurements of the corresponding facial feature.
Then, appropriate geometric transformations are applied
to the corresponding facial feature of the average face to
modify it according to the new measurements and obtain
a new facial feature. The corresponding facial feature changes each time the user gives a description.
The face obtained at the output part can be retouched via the verbal description of the input part of the system. The user repeats this step until he/she obtains
the desired face.
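The interactive loop of the three parts described above can be sketched as follows. All function names here (generate_average_face, apply_description, etc.) are hypothetical placeholders for illustration, not the system's actual API:

```python
# Hypothetical sketch of the input/processing/output loop described above.
# All callables are illustrative placeholders, not the system's real API.

def run_session(get_user_input, generate_average_face, apply_description, display):
    """Interactively refine a face until the user accepts it."""
    race, gender, user_race = get_user_input("context")   # input part
    face = generate_average_face(race, gender)            # processing part
    display(face)                                         # output part
    while True:
        description = get_user_input("description")       # e.g. ("eye height", "little big")
        if description is None:                           # user accepts the face
            return face
        face = apply_description(face, description, user_race, gender)
        display(face)
```

The loop mirrors the paper's flow: one average face per race and gender is generated once, then each verbal description triggers one modification and one redisplay.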
IV. GENERATION OF AVERAGE FACES
Faces differ from one race to another in their sizes, colors and facial features. However, humans can easily recognize race and gender because they detect the similarities between a given face and the average face of its race. Thus, an average face is used
for each race. When the user gives a description of a
face including the race, the corresponding average face is
displayed and modified until it matches the desired face.
To automatically generate an average face for each race, the system needs measurements of the features and the
face. The anthropometry of the face and the head provides
such measurements.
A. Face anthropometry
Anthropometry is the science of human body measurement [15]. Anthropometric evaluation begins with the
TABLE I.
FACIAL FEATURES AND THEIR LINGUISTIC DESCRIPTORS

Feature    Element      Descriptors
Face       shape        oval, round, square, diamond, long, heart
           skin color   a set of colors
           width        large, normal, narrow
           height       big, normal, small
           forehead     large, normal, small
Eyebrows   length       long, short, normal
           width        thin, medium, thick
           position     more up, more down, close, wide
           shape        curved, angled, soft angled, round, flat
           color        a set of colors
Eye        height       big, small, normal
           length       big, normal, small
           position     more up, more down
           eye and eye  close-set, wide-set eyes
           shape        a set of eye shapes
           color        a set of colors
Nose       width        big, small, normal
           position     more up, more down, to left, to right
           shape        a set of nose shapes
Mouth      length       big, small, normal
           position     more up, more down, to left, to right
           shape        a set of mouth shapes
           color        a set of colors
Ear        size         small, normal, big
           position     more up, more down
Hair       style        a set of hairstyles
           color        a set of colors

Figure 1. Face generation system framework

identification of particular locations on a subject, called landmark points, defined in terms of visible or palpable features (skin or bone) on the subject. A series of measurements between these landmarks is then taken using carefully specified procedures and measuring instruments. Farkas's system [15] uses a total of 47 landmark points to describe the face; Fig. 2 shows some of them. The system uses linear and angular measurements to describe the human face. The linear measurements are projective or tangential, while the angular measurements record inclinations or angles.

In this work, projective measurements and some angular measurements are used. A projective measurement is the shortest distance between two landmarks, for example, the mandible width (go-go) in Fig. 2. The only angular measurement used is the inclination of the eye, i.e., the inclination of the line (ex-en) in Fig. 2. In [15], the mean values of the measurements and the standard deviations for each feature of the face and the head are provided for North American Caucasian, Chinese and African-American subjects, both men and women.

Figure 2. Example of facial landmarks and anthropometric measurements
B. Face Modeling
B-spline surfaces are used as shape representations to model a face. A mesh of the face and a mesh for each facial feature are prepared using Blender, a 2D and 3D modeling
tool [19]. The meshes of the face and the head are then
loaded into the system. According to the race specified
by the user, the anthropometric landmarks are assigned
to fixed locations on these meshes. The mesh of each
feature is then loaded in the appropriate position given
by the landmarks. Then, the system computes the ratio
between the measurement of each feature of the mesh
and the mean value provided by the anthropometry study.
The ratios are used to scale each facial feature and the face so as to automatically generate the average face for the race. Fig. 3 and Fig. 4 show examples of the generated average faces.

Figure 3. Example of average faces of men (from the left to the right: (a) White race, (b) Black race, (c) Asian race)

Figure 4. Example of average faces of women (from the left to the right: (a) White race, (b) Black race, (c) Asian race)

These average faces are different
in the position and the size of their features.
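The ratio computation described above can be sketched as follows. The means table below holds only two illustrative entries (the eye-height means for White and Asian men, which match the values reported in Section VI); the real system uses the full tables from Farkas [15]:

```python
# Sketch of the scaling step: for each feature, compute the ratio between
# the anthropometric mean and the measurement taken on the loaded mesh.
# Scaling the mesh feature by this ratio brings it to the average size.
# The table below is a tiny illustrative excerpt, not Farkas's full data.

ANTHROPOMETRIC_MEANS = {            # (race, gender, feature) -> mean (cm)
    ("white", "man", "eye_height"): 1.08,
    ("asian", "man", "eye_height"): 0.94,
}

def scale_ratios(mesh_measurements, race, gender):
    """Per-feature scale factors that map mesh measurements to the race/gender means."""
    return {
        feature: ANTHROPOMETRIC_MEANS[(race, gender, feature)] / measured
        for feature, measured in mesh_measurements.items()
    }
```

For a mesh whose eye height measures 1.2 cm, the White-man ratio is 1.08 / 1.2 = 0.9, i.e., the eye mesh is shrunk to 90% of its size.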
V. CONTEXT DEPENDENT MEANING OF VERBAL DESCRIPTION
Once the average face is generated, the system computes the meaning of the words used to describe the
face. By the expression "meaning of the words", we mean the relation between the size of each facial feature, including the face itself, and the set of linguistic words used to describe it.
In the field of fuzzy sets, a linguistic value generally has a fixed membership function once it is determined. However, a label of a fuzzy set can have a variety of meanings
depending on various conditions [18]. For example, the
label tall has different meanings depending on whether the
context is Japanese or American. In our face generation
system, the meaning of the words used to describe the
face and the facial features depends on the race of the
user and the gender of the target face. Indeed, subjective
impressions expressed by an American when describing
an American face are different from those expressed by
a Japanese for the same face. The Japanese may consider
the eyes big. On the other hand, the American may
consider the same eyes as being normal. Thus, the shape of a fuzzy set representing the meaning of a linguistic word differs from one race to another and from one gender to another, depending on the current context.
Fig. 5 shows the different fuzzy sets corresponding to the
meaning of the words, Small, Normal and Big for the
height of the eyes depending on the race of the user and
the gender of the target face.
The race of the target face is used to generate the
corresponding average face to be displayed as an initial
face to the user. However, this race is not used to compute
the meaning of the words describing the face. Only the
race of the user and the gender of the target face have
influence on the meaning of the words. Indeed, suppose a user A from the white race describes the eyes of a white woman B as normal. User A considers the eyes normal because he is comparing the height of the eyes with the height of normal eyes of his race (white).
However, normal eyes in women of his race are different
from those in men, thus the user is comparing the eyes of
the woman B with the normal eyes of women in the white
race and not with those of men. The same user A describes
the eyes of an Asian woman C as small, because the user
A is comparing them with the normal eyes in women of
his race, which is the white race, and not with the normal
eyes in women of the Asian race, which is the race of the
woman C. Thus, for the user A, the race of the described
face does not affect his/her judgment, since the user is
comparing the eyes with the normal eyes of his race for
the same gender as that of the target face.
For a user from a race r, eyes having their height close
to the mean value of heights of eyes in the same race r are
considered normal eyes. So, the user A from the White
race compares the height of the eyes of the target face
to the mean of the heights µ_white(g) of the eyes in his race for the gender g of the target face. The mean µ_r(g) and the standard deviation σ_r(g) are obtained from the anthropometry study, where r is the race and g is the gender. In Fig. 5, the width of the fuzzy set for the word Normal is defined as 4σ_r(g). Thus, if the height h of the eye is in the interval [µ_r(g) − 2σ_r(g), µ_r(g) + 2σ_r(g)], the eye is considered normal with a degree of membership equal to F_{r,g}(h), where F_{r,g}(·) is the membership function of Normal eyes for race r and gender g.
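A membership function with this race- and gender-dependent mean and the stated 4σ width can be sketched as follows. The triangular shape is an assumption made here for concreteness; the text fixes only the width of the set, not its exact curve:

```python
# Sketch of a membership function for "Normal" centered on the race- and
# gender-dependent mean mu_r(g), with total support width 4*sigma_r(g),
# as in Fig. 5.  The triangular shape is an assumption; the paper only
# specifies the width of the fuzzy set.

def normal_membership(h, mu, sigma):
    """Degree to which height h counts as Normal given mean mu and std sigma."""
    half_width = 2.0 * sigma                   # support is [mu - 2s, mu + 2s]
    distance = abs(h - mu)
    if distance >= half_width:
        return 0.0                             # outside the interval: not Normal
    return 1.0 - distance / half_width         # peaks at 1 when h == mu
```

With µ = 1.08 cm and σ = 0.1 cm (White men, illustrative σ), an eye of height 1.08 cm is Normal with degree 1, and the degree falls off linearly toward the interval ends.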
Thus, considering the race and the gender leads to a
context dependent meaning of the words which can be
represented by Conceptual fuzzy sets (CFS), where the
context is the race r of the user and the gender g of the
face. The CFS provides the relation between the size s of a facial feature and the linguistic word w used to describe it according to the relation

    s = F_cfs(r, g, w).

Figure 5. Fuzzy sets for Big, Normal and Small depending on race of user and gender of face
A. Conceptual fuzzy sets
Conceptual fuzzy sets are a type of fuzzy set used to represent context dependent meaning and complex
concepts [16], [17], [18]. A label of a fuzzy set is the
name of a concept, such as tall and short for the height
of a person, and the fuzzy set represents the meaning of
the concept. The shape of a fuzzy set is determined from
the meaning of the label depending on various situations.
A CFS represents the meaning of a concept by the
distribution of activation of its labels. The distribution
changes depending on the actual context and thus on the
activated labels. When two or more labels are activated in a CFS, the resulting CFS contains overlapped propagations of activations and assigns a degree of connection between each pair of labels. The activations produced through the CFS show a context dependent meaning. Indeed, since the shape of the CFS depends on the activation of the labels indicating the context as well as on the activation of the label naming a concept, different situations yield different shapes, thus representing the meaning of the label (the concept) for each context.
In our system, CFSs are realized using Bidirectional
Associative Memories (BAM) [20]. Let W = [w_ij]_{m×n} be the weight matrix of the BAM, where m and n denote the numbers of nodes in the input and output layers, and w_ij denotes the synaptic strength of the connection from the i-th input node to the j-th output node. The network structure of the Bidirectional Associative Memory model is similar to that of the linear associator, with an input layer X and an output layer Y, but the connections are bidirectional: w_ij = w_ji for i = 1, 2, ..., m and j = 1, 2, ..., n. The units in both layers serve as both input and output units, depending on the direction of propagation.
In CFS, a node in the BAM represents a concept and a link represents the strength of the relation between the two connected concepts. The network propagates an input vector of concepts X = {x_1, ..., x_m} to the Y layer; the units in the Y layer are computed using

    y_j = Σ_{i=1}^{m} x_i w_ij + J_j .

The pattern Y produced on the output layer is then propagated back to the X layer using

    x_i = Σ_{j=1}^{n} y_j w_ij + I_i ,

where I_i and J_j are extra external inputs for each unit in the input and the output layer, respectively. Activation of the concepts corresponding to a particular situation produces reverberation; after stabilization, other nodes in the Y layer are activated.

Bidirectional Associative Memories are capable of learning associations from examples by computing the correlation matrix W using the Hebbian rule with the optimal gradient-descent method and the dummy augmentation method [21]. These two methods considerably increase the storage capacity of the BAM.

B. CFSs used for face generation

The number of CFSs used per facial feature depends on the words used to describe that feature. For example, eyes need two CFSs, one for the length and another for the height. The user can also choose a shape for the eyes and the color of the pupil. The CFS for the height of an eye is shown in Fig. 6. It consists of two layers. The upper layer includes nodes describing the race of the user (White, Black and Asian are considered here), the gender of the face and the linguistic words used to describe the eye (big, small and normal). The lower layer includes nodes representing the height of the eye. Each CFS is trained to learn instances given during an off-line learning process by computing the corresponding correlation matrix of the bidirectional associative memory of the CFS. The following are examples of instances for the CFS for the height of the eye:

• The height of person A's eyes is 1.1 cm. Person A is a man. These eyes are normal with a grade 0.9 for a White user.
• The height of person A's eyes is 1.1 cm. Person A is a man. These eyes are big with a grade 0.95 for an Asian user.

Figure 6. Conceptual Fuzzy Set for the height of the eye

The user usually describes the face and the facial features using linguistic words and adverbs. The adverbs used in the system are little, almost and very, defined by the respective values 0.3, 0.6 and 1. The activation value of the node in the CFS corresponding to the linguistic word is the value of the adverb used in the description.

Suppose a user from the white race wants to draw the eyes of a man and describes the eyes as little big. The context here is "white race" and "man". Thus, the nodes "white" for the user's race, "man" for the gender, and "big" with the value 0.3 for the height are activated in the CFS of the height of the eyes; the activated nodes in the corresponding CFS are shown in Fig. 6. If another user describes the eyes as almost big, the node in the CFS corresponding to "big" is activated with the value 0.6, the value of almost. The distribution of activation in the lower layer activates the appropriate nodes representing the height of the eye for that context. The values of the activated nodes in the output layer represent the degree of connection between
the two connected concepts. The node with the maximum value is chosen to represent the meaning of "little big eyes", which here corresponds to 1.2 cm.
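The forward propagation and the choice of the maximally activated height node can be sketched with toy numbers. The weight matrix below is invented for illustration; the real system learns W by Hebbian correlation [21]:

```python
import numpy as np

# Toy sketch of a CFS query through a BAM-style weight matrix.
# Input nodes: [white, asian, man, big].  Output nodes: candidate eye
# heights in cm.  The weights are made up for illustration only.

HEIGHTS = [0.9, 1.08, 1.2, 1.3]                  # lower-layer nodes (cm)
W = np.array([                                    # input x output weights
    [0.2, 0.7, 0.6, 0.1],                         # white
    [0.8, 0.3, 0.1, 0.1],                         # asian
    [0.5, 0.5, 0.5, 0.5],                         # man
    [0.0, 0.1, 0.8, 0.9],                         # big
])

def cfs_height(x):
    """Propagate input activations x forward and pick the strongest height node."""
    y = x @ W                                     # y_j = sum_i x_i * w_ij
    return HEIGHTS[int(np.argmax(y))]

# "little big" eyes of a man, described by a White user:
# white = 1, man = 1, big activated with the adverb value 0.3 (little)
x = np.array([1.0, 0.0, 1.0, 0.3])
```

With these toy weights, the query above selects the 1.2 cm node, matching the paper's "little big eyes" example; with real learned weights the selected node depends on the training instances.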
Once the new height is obtained, appropriate geometric transformations are performed. Let h_average be the height of the eye in the average face and h_cfs the height provided by the CFS. The ratio h_cfs / h_average is used to scale the eye and obtain the desired one. If the position of the eye changes, a translation is performed.
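This scale-then-translate step can be sketched on plain 2D points; scaling about the feature's centroid is an assumption about the implementation detail, which the paper does not spell out:

```python
# Sketch of the geometric transformation step: scale a feature's mesh
# vertices by ratio = h_cfs / h_average, then optionally translate.
# Scaling about the centroid keeps the feature in place; this anchoring
# choice is an assumption, the paper does not specify it.

def transform_feature(vertices, h_cfs, h_average, translation=(0.0, 0.0)):
    """Scale 2D vertices about their centroid, then translate them."""
    ratio = h_cfs / h_average
    n = len(vertices)
    cx = sum(x for x, _ in vertices) / n          # centroid x
    cy = sum(y for _, y in vertices) / n          # centroid y
    return [
        (cx + ratio * (x - cx) + translation[0],
         cy + ratio * (y - cy) + translation[1])
        for x, y in vertices
    ]
```

For the paper's example values (h_cfs = 1.2 cm, h_average = 1.08 cm) the eye grows by a factor of about 1.11 around its center.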
VI. EXPERIMENTS AND EVALUATIONS
Fig. 7 and Fig. 8 show illustrations of drawn faces
belonging to the Black and Asian races with their original
faces.
A. Face drawing experiments
In order to evaluate the system, nine users from different races are asked to draw the faces of six models, three men and three women: one man and one woman model from each of the three races, White, Black and Asian. The faces of the models are unfamiliar to all the users. The users' races also differ: six of them are Asian and three are White. The users are
asked to describe each face using words before using the
system. This description is used to evaluate the verbal
descriptions in Section VI-C.
Fig. 9 shows one of the models, model 1, a White
young man, and the faces drawn by four users. Two other models, Model 4 and Model 5, each with one of the obtained faces, are shown in Fig. 7 and Fig. 8 respectively.
The obtained faces for the same model do not look
necessarily similar due to the subjectivity of the users
when describing the facial features. The users here use
different words to describe the face but have the same
model in their mind.
Table II shows the words used by each user to describe the eye height of the face shown in Fig. 9, with the corresponding inputs and outputs of the eye-height CFS.
Two of the White users, user1 and user3, describe the height as Normal, and the CFS output is 1.08 cm. The White user user5 describes the height as Little small, and the CFS output is 0.9 cm. The White users compare the height of the eyes with the height of normal eyes of their race, equal to 1.08 cm according to the anthropometric measurements.
On the other hand, the Asian users user8 and user9 describe the height of the eyes as Normal, and the CFS output is 0.94 cm. The Asian users compare the height
of the eyes with the height of the normal eyes of their
race (which is equal to 0.94 cm for men according to the
anthropometric measurements) and find that it is normal.
The other Asian users describe the height as being Little
big and Almost big.
The outputs of the CFS in Table II for user5 from the White race and for user8 from the Asian race are in accordance with the users' judgments. The words used to describe the eye height differ between these two users, but the CFS outputs show that their values are almost the same. On the other hand, users user1 and user8, who are from different races, use the same word Normal to describe the height, and the CFS outputs for the eye height are 1.08 cm and 0.94 cm respectively. The outputs of the CFS show that the word Normal has different meanings according to the race of the user.
B. Subjective evaluation of the drawn faces
Each user evaluates whether he/she is satisfied with his/her own drawn face for each model using the following point scale: +1: not satisfied at all, +2: not satisfied, +3: a little unsatisfied, +4: neutral, +5: a little satisfied, +6: satisfied, +7: very satisfied.
The average evaluation of each user for his/her own drawn faces is shown in Fig. 10. In this evaluation, one user among the nine is neutral, while all the others are satisfied to different degrees. Fig. 11 shows the frequency of each evaluation point on the scale over all users and all models. The most frequently chosen point is satisfied. These results are encouraging. Indeed, the results can be
TABLE II.
INPUT AND OUTPUT FOR EYE'S HEIGHT CFS

User    Race    Linguistic term   CFS input   CFS output
user1   white   normal            1           1.08 cm
user2   asian   little big        0.3         1.1 cm
user3   white   normal            1           1.08 cm
user4   asian   little big        0.3         1.1 cm
user5   white   little small      0.3         0.9 cm
user6   asian   little big        0.3         1.1 cm
user7   asian   almost big        0.6         1.3 cm
user8   asian   normal            1           0.94 cm
user9   asian   normal            1           0.94 cm

Figure 7. Model 4: A Black woman (from the left to the right: (a) the model, (b) the drawn face)

Figure 8. Model 5: An Asian man (from the left to the right: (a) the model, (b) the drawn face)
Figure 9. Model 1: Original face with the drawn faces (from the left to the right: (a) original face of a White young man, (b) face drawn by White user1, (c) face drawn by Asian user2, (d) face drawn by White user5, (e) face drawn by Asian user9)
interpreted as the users being satisfied with the majority of the drawn faces; that is, the meanings of the words used to describe the face are in accordance with the users' judgments.
C. Evaluation of verbal descriptions
The aim of this evaluation is to show that the words used to describe faces before and after using the system are almost the same. Fig. 12 shows the difference between the words in the initial description given by the user before drawing the models and the words used after interaction with the system. The difference is calculated after assigning a scale-point to each description. For example, +1: very small, +2: almost small, +3: a little small, +4: normal, +5: a little big, +6: almost big, +7: very big are the scale-points assigned to the descriptions of the eye height. If the user describes the eye height before using the system as almost big, corresponding to the scale-point +6, and after interaction with the system as normal, corresponding to the scale-point +4, the difference between the two words is +2. The values in Fig. 12 are obtained by averaging the differences over all models and all facial features.

Figure 12. Difference between the initial and the final description given to the system, averaged over all models per user

The results of the evaluation in Fig. 12 show that the average difference per user ranges between 0 and +1, which means that the descriptions given by a user before and after using the system differ, on average, by at most one word. This result is appreciable, since a person who considers, for example, the eye height normal would agree that the same eyes can also be described as a little big, due to the fuzziness of the description. In addition, the small value of the average difference means that the impressions of the users are well expressed by the system.
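The metric described above can be sketched as follows, using the eye-height scale given in the text; the function name is a placeholder for illustration:

```python
# Sketch of the Section VI-C metric: map each description to its
# scale-point, then average the absolute before/after differences.
# The scale below is the eye-height scale given in the text.

EYE_HEIGHT_SCALE = {
    "very small": 1, "almost small": 2, "a little small": 3, "normal": 4,
    "a little big": 5, "almost big": 6, "very big": 7,
}

def average_difference(pairs, scale=EYE_HEIGHT_SCALE):
    """Mean |before - after| over (before, after) description pairs."""
    diffs = [abs(scale[before] - scale[after]) for before, after in pairs]
    return sum(diffs) / len(diffs)
```

For the worked example in the text, ("almost big", "normal") contributes a difference of |6 − 4| = 2 to the average.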
Figure 10. The average evaluation per user

Figure 11. The frequency of the evaluation point-scales

VII. CONCLUSIONS AND FUTURE WORK
The paper describes a system for face generation from
verbal description. The system draws faces of people from
different races from a description given by the user using
words and adapts the meaning of these words according
to the user. The drawing process starts by generating an average face of the race of the target face from the anthropometric measurements of the face. The average face is displayed to the user, who can modify it through a description using words. Since the meaning of these words depends on the race of the user and the gender of the face to be drawn, Conceptual Fuzzy Sets are used to compute their meaning according to the context.
Appropriate geometric transformations are then used to
transform the average face according to the outputs of
the Conceptual Fuzzy Sets. Experiments show that the
proposed approach gives encouraging results to compute
the meaning of the words and draw the faces belonging
to different races.
As future work, we are planning to consider subjective impression words in drawing faces. Users' subjective impressions, such as young, old, looks pale, cheerful person and so on, are not considered in this work, though the proposed method is still useful for drawing many human faces. Another limitation is that the meaning of the linguistic words does not express the size of the facial features relative to the size of the face. For
example, normal eyes in a large face may be described
as small eyes. To solve this issue, proportions between
measurements of the face should be considered in the
computation of the meaning of the words. Finally, the
present system assumes that the meaning of a linguistic
word expressed by persons belonging to the same race is
almost similar. We are planing to adapt the meaning of
the words according to each user using adequate online
learning techniques. The system will learn the meaning
of the words from the retouches made by the user to the
drawn face.
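The proportion-based fix proposed above might look like the following sketch, in which a feature's width is normalised by the face width before being mapped to a word. The reference ratio, the tolerance band, and the word mapping are invented for illustration only.

```python
# Sketch of describing a feature relative to face size, as proposed above.
# The reference proportion and thresholds are illustrative assumptions.

def relative_size_word(feature_width, face_width, reference_ratio=0.22):
    """Map a feature width, normalised by face width, to a word."""
    ratio = feature_width / face_width
    if ratio < 0.9 * reference_ratio:
        return "small"
    if ratio > 1.1 * reference_ratio:
        return "big"
    return "normal"

# The same eye width reads differently on faces of different widths:
print(relative_size_word(30.0, 140.0))  # ratio ~0.214 -> prints "normal"
print(relative_size_word(30.0, 170.0))  # ratio ~0.176 -> prints "small"
```

With such a normalisation, eyes of a fixed absolute width would correctly be called "small" on a wide face, addressing the limitation noted above.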
REFERENCES
[1] S. E. Brennan, “Caricature generator: the dynamic exaggeration of faces by computer,” Leonardo, vol. 18, no. 3,
pp. 170–178, 1985.
[2] D. DeCarlo, D. Metaxas, and M. Stone, “An anthropometric face model using variational techniques,” in SIGGRAPH ’98: Proceedings of the 25th annual conference
on Computer graphics and interactive techniques. New
York, NY, USA: ACM, 1998, pp. 67–74.
[3] D. Borro, Z. Mo, P. Joshi, S. You, and U. Neumann,
“Mobiportrait: Automatic portraits with mobile computing,” in DFMA ’05: Proceedings of the First International
Conference on Distributed Frameworks for Multimedia
Applications. Washington, DC, USA: IEEE Computer
Society, 2005, pp. 340–345.
[4] M. Pantic and L. J. M. Rothkrantz, “Expert system for
automatic analysis of facial expression,” Image and Vision
Computing, vol. 18, pp. 881–905, 2000.
[5] H. Wang and K. Wang, “Facial feature extraction and
image-based face drawing,” in 6th International Conference on Signal Processing, vol. 1, 2002, pp. 699–702.
[6] J. Nishino, M. Yamamoto, H. Shirai, T. Odaka, and
H. Ogura, “Fuzzy linguistic facial caricature drawing system,” in Proceedings of the second International Conference on Knowledge-Based Intelligent Electronic Systems
(KES), vol. 2, 1998, pp. 15–20.
[7] T. Fujiwara, M. Tominaga, K. Murakami, and
H. Koshimizu, “Webpicasso: Internet implementation
of facial caricature system picasso,” in Proceedings
of the Third International Conference on Advances in
Multimodal Interfaces (ICMI’00). UK: Springer-Verlag,
2000, pp. 151–159.
[8] H. Chen, Y.-Q. Xu, H.-Y. Shum, S.-C. Zhu, and N.-N. Zheng, “Example-based facial sketch generation with
non-parametric sampling,” in International Conference on
Computer Vision 2001 (ICCV’2001), 2001, pp. 433–438.
[9] P. Chiang, W. Liao, and T. Li, “Automatic caricature
generation by analyzing facial features,” in Proceedings of
Asian Conference on Computer Vision (ACCV), Jeju Island,
Korea, 2004.
[10] H. Chen, Z. Liu, C. Rose, Y. Xu, H.-Y. Shum, and
D. Salesin, “Example-based composite sketching of human
portraits,” in NPAR 04: Proceedings of the 3rd international symposium on Non-photorealistic animation and
rendering. New York, NY, USA: ACM, 2004, pp. 95–153.
[11] T. Onisawa, “Soft computing in human centered systems
thinking,” Lecture notes in computer science, vol. 3558,
pp. 36–46, 2005.
[12] H. Benhidour and T. Onisawa, “Drawing human faces using words and conceptual fuzzy sets,” in Proceedings of
the IEEE International Conference on Systems, Man and
Cybernetics, ISIC, Montreal, QC, Canada, 2007, pp. 2068–
2073.
[13] Y. Hirasawa and T. Onisawa, “Facial caricature drawing
considering user’s image of facial features,” in Proceedings of International Conference on Soft Computing and
Intelligent Systems, Tsukuba, Japan, 2002.
[14] Y. Hirasawa and T. Onisawa, “Facial caricature drawing
using subjective image of a face obtained by words,” in
Proceedings of IEEE International Conference on Fuzzy
Systems, vol. 1, Budapest, Hungary, 2004, pp. 45–50.
[15] L. Farkas, Anthropometry of the Head and Face. Raven
Press, 1994.
[16] T. Takagi, T. Yamaguchi, and M. Sugeno, “Conceptual fuzzy
sets,” in International Fuzzy Engineering Symposium IFES
91, vol. 2, 1991, pp. 261–272.
[17] A. Imura, T. Takagi, and T. Yamaguchi, “Intention recognition using conceptual fuzzy sets,” in IEEE International
Conference on Fuzzy Systems FUZZ-IEEE’93, vol. 2,
1993, pp. 762–767.
[18] T. Terano, M. Sugeno, M. Mukaido, and K. Shigemasu,
Fuzzy Engineering Toward Human Friendly Systems. IOS
Press, 1992.
[19] Blender, http://www.blender.org/cms/Home.2.0.html.
[20] B. Kosko, “Bidirectional associative memories,” in IEEE
Transactions on Systems, Man and Cybernetics, vol. 18,
no. 1, 1988, pp. 49–60.
[21] G. Zheng, S. N. Givigi, and W. Zheng, “A new strategy
for designing bidirectional associative memories,” Lecture
notes in computer science, pp. 398–403, 2005.
Hafida Benhidour is currently a Ph.D. student at the University
of Tsukuba. She received her master's and engineer's degrees
in computer science from the Algerian National Institute of
Computer Science in 2005 and 2001, respectively. In 2004, she
was awarded the Monbusho Scholarship to continue her Ph.D. at
the University of Tsukuba, Japan.
Her research interests include application of soft computing
techniques to human centered systems.
Professor Takehisa Onisawa received the B.E. and the M.E.
in Control Engineering in 1975 and 1977, respectively, and
the Dr. Eng. in Systems Science in 1986, all from Tokyo
Institute of Technology. He served as a Research Associate
in the Department of Systems Science, Tokyo Institute of
Technology from 1977 to 1986. From 1986 to 1989, he was
a Lecturer and from 1989 to 1991 an Associate Professor in
the Department of Basic Engineering at Kumamoto University.
Then he joined the University of Tsukuba. Currently, he is a
Professor in the Graduate School of Systems and Information
Engineering. He received the Hashimoto Award from the Japan
Ergonomics Association in 1987 and the Outstanding Paper Award
at the International Symposium on Advanced Intelligent Systems
in 2005.
His research interests include applications of soft computing
techniques to human centered systems thinking and Kansei
information processing. He was the 9th President of Japan
Society for Fuzzy Theory and Intelligent Informatics from June
2005 to May 2007. He is a member of IEEE and IFSA.