Impact of various types of smiles on perceived
impression of virtual agents
Daniel Moscoviter
University of Twente
P.O. Box 217, 7500AE Enschede
The Netherlands
[email protected]
ABSTRACT
Smiling has a big influence on our everyday social interaction. In this article, the impact of smiling when applied to a virtual agent that serves as a health advisor
is studied. Using experimental evidence from literature,
three different types of smiles are compared in a fictional
situation after being modeled on a virtual agent. These
scenarios have been rated by test subjects in a practical
experiment, and the results of this experiment have been
analyzed. The results were inconclusive, owing
to multiple factors that include an unrealistic virtual agent
and a low number of test subjects.
Keywords
virtual agent, facial expression, smile, politeness, experiment, questionnaire
1. INTRODUCTION
Smiling is one of the most recognizable facial expressions,
involving the flexing of muscles near the mouth and eyes.
Research has shown that people can, both consciously and
unconsciously, distinguish different types of smiles, even
when applied to virtual agents [12]. These smiles often
have their own meaning, including showing amusement or
anxiety. The different types of smiles each have their own
morphological and dynamic characteristics as well [10].
Morphological characteristics refer to which facial muscles
are contracted or relaxed, while dynamic characteristics
refer to the duration of a smile and the velocity at which
the muscles are activated.
Virtual agents with emotional attributes are created with
different goals in mind. While it is known that people
can distinguish different types of smiles of virtual agents,
and that inappropriate facial expressions can negatively
influence interactions [9], it is not yet known whether these
smiles have different effects on perceived impression when
applied to virtual agents in specific scenarios.
Studies on different types of smiles have been done by
Ochs et al. [10], who list three types of smiles and their
characteristics, based on their study involving participants
creating one amusement, politeness and embarrassment
smile through a web interface.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee.
14th Twente Student Conference on IT, January 21st, 2011, Enschede,
The Netherlands.
Copyright 2010, University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science.
This study measures the impact of different types of smiles
in a specific scenario by means of a practical experiment.
Several research questions can be defined to aid in this
study. The main research question is:
What impact do different types of smiles have
on the perceived impression of virtual agents?
To help answer this question, some sub-questions are answered first:
• What are the most important types of smiles, and
what are their characteristics?
• How can these characteristics be modeled on a virtual agent?
• What is a preferable scenario to test the impact of
these different smiles using a practical experiment?
• What are some of the qualities that a virtual agent
has to possess in this environment?
• What conclusions can be drawn from the results of
the practical experiment?
Ochs et al.’s paper and its references are the main resources for answering the first sub-question. The types of
smiles they defined are implemented using modeling tools
provided by the University of Twente.
The exact scenario in which the impact of the different
types of smiles is tested is specified. It involves a virtual
agent trying to tell the test subject some good or bad
news; the virtual agent smiles during the conversation.
This scenario involves a total of three different types of
messages: a positive one, a negative one, and a neutral
one.
An example of such a scenario is a virtual agent delivering
health communication and even health behavior change
interventions [3]. This virtual agent tries to deliver good
news by complimenting the user on his physical fitness,
and bad news by stimulating the user to increase his physical activity.
The way the virtual agent delivers the messages to the
user requires special attention: politeness has been shown
to have a very big impact on the motivational state of
learners by Brown et al. [5], and a way to apply their
politeness theory to virtual agents has been described by
Johnson et al. [7]. Which type of politeness to implement
is a trade-off between the desire to communicate, the desire
to be efficient, and the desire to maintain the user’s face.
Table 1. Most common characteristics of different smile types.
                Amused   Embarrassed   Polite
Big mouth       yes      no            no
Open mouth      yes      no            no
Asymmetry       no       yes           no
Tense lips      no       yes           no
Raised cheeks   yes      no            no
2. SMILES AND POLITENESS
2.1 Smiles
Ochs et al. have created a web application, E-smiles-creator, to allow users to create different smiles on a virtual
agent [10]. Test subjects were tasked with creating three
different types of smiles: one amused, one embarrassed
and one polite smile. Users could manipulate the virtual
agent’s face by specifying characteristics using multiple-choice radio buttons, while a graphical virtual agent
was updated simultaneously to reflect these choices.
These characteristics included specifying whether the mouth
should be big or small, open or closed, symmetrical or
asymmetrical, whether the lips should be tense, and whether
the cheeks should be raised or not. Additionally, users
were asked to rate the smile they created on a Likert scale.
A total of 348 users participated in their study.
An analysis of the error rates showed that the amused
smile type was better classified than the embarrassed and
polite smile types. The morphological characteristics of
the smiles that were chosen most often are presented in Table 1. According to this information, a big, open mouth and raised cheeks are characteristics unique to the amused smile, while asymmetry and
tense lips are unique to the embarrassed smile. We will
use these characteristics in the experiment.
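The reading of Table 1 given above can be checked mechanically. The following is a minimal sketch, assuming the table is encoded as a dictionary of booleans; the names `SMILE_TRAITS` and `unique_traits` are illustrative and do not come from Ochs et al. or from the tools used in this study.

```python
# Table 1 encoded as data: for each smile type, whether a characteristic
# was among the most commonly chosen ones ("yes" -> True, "no" -> False).
SMILE_TRAITS = {
    "amused": {
        "big_mouth": True, "open_mouth": True, "asymmetry": False,
        "tense_lips": False, "raised_cheeks": True,
    },
    "embarrassed": {
        "big_mouth": False, "open_mouth": False, "asymmetry": True,
        "tense_lips": True, "raised_cheeks": False,
    },
    "polite": {
        "big_mouth": False, "open_mouth": False, "asymmetry": False,
        "tense_lips": False, "raised_cheeks": False,
    },
}


def unique_traits(smile):
    """Return the traits present in `smile` and absent from all other types."""
    others = [traits for name, traits in SMILE_TRAITS.items() if name != smile]
    return sorted(
        trait
        for trait, present in SMILE_TRAITS[smile].items()
        if present and not any(other[trait] for other in others)
    )


for smile in SMILE_TRAITS:
    print(smile, unique_traits(smile))
```

In this encoding the polite smile has no positive characteristic of its own; it is defined by the absence of the other five, which matches the small, closed mouth described for it.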
Smile characteristics can be implemented in FACS (Facial
Action Coding System [6]), a system to categorize facial
expressions. FACS can describe nearly every expression
that is possible to produce with the human anatomy. It
specifies a total of 32 AUs (Action Units) that each define
the contraction or relaxation of a small number of facial
muscles. For example, the raised cheeks in Table 1 can be
represented with FACS’ AU6.
2.2 Politeness
Brown et al. have formulated the politeness theory [5].
They defined two kinds of face: positive and negative.
Positive face is about being liked, respected and approved
of by others—self-esteem. Negative face is the want to
not be imposed on by others—freedom to act. In social
interaction, it is important to maintain both faces of all
the participants. Sometimes, FTAs (face-threatening acts)
cannot be avoided in communication. In these cases it is
important for the speaker to choose the right strategy to
deliver his message.
Brown et al. defined four main politeness strategy types:
Bald on-record: Does not try to minimize the threats to
the hearer’s face. Applying this strategy will probably
embarrass the other person or make them feel uncomfortable.
This strategy is commonly used among people who know each
other well, such as family and friends.
Positive politeness: Minimizes the face-threatening act
to the hearer’s face by expressing friendliness and
interest. This strategy is intended to make the hearer
feel good about himself. It is usually used in situations
where people know each other fairly well.
Negative politeness: Recognizes the want to be respected
by the hearer, but assumes that the speaker is imposing on him. This strategy is used when there is
awkwardness and social distance in a situation.
Off-record: An indirect strategy intended to take pressure off of the speaker. Tries to not impose directly.
Which politeness strategy to choose depends on the want
to communicate the face-threatening act, the want to be
efficient, and the want to maintain the face of the hearer.
3. EXPERIMENT
The effects of different types of smiles have been tested
in a user experiment. The types defined by Ochs et al.
have been implemented in a modeling framework using
the characteristics in Table 1, and combined with a fictional scenario. Videos that show this scenario for each
of the three selected smile types (amused, embarrassed
and polite) were created and presented to test subjects.
The test subjects were asked to rate the videos based on
a composed questionnaire.
3.1 Scenario
To create a believable and apt situation to present to the
test subjects, a dialogue was constructed. This dialogue is
based on the scenario of a health advisor telling a test subject about their current health status, and ways to improve
their healthiness. The dialogue is completely fictitious for
two reasons:
• We want the experience to be the same for all the
test subjects, to be able to draw more accurate conclusions.
• Due to time constraints, both on the side of the researcher and the test subjects, actually providing individual health status reports is not viable.
The dialogue was created such that the test subjects were
presented with three different types of messages:
Positive: Stating that the test subject has been in
good shape in previous health inspections.
Neutral: Stating that regular exercise is important to stay
in good shape.
Negative: Stating that the test subject’s cholesterol level
has risen recently and advising him to change his
eating habits.
After a brief introduction by the virtual agent, the positive
message is presented using positive politeness by complimenting the test subject on his condition. The negative
message, however, starts with negative politeness by stating that the latest results are unfortunate. It then switches
to a bald on-record strategy, telling the test subject that he
should change his eating habits. The dialogue ends in positive politeness, with the virtual agent taking an exaggerated interest in the test subject: she tells him that she looks
forward to following his progress in future encounters.
3.2 Questionnaire
A questionnaire was composed to evaluate the qualities of
the different types of smiles. Rather than directly asking
about the smiles, users were asked to rate various qualities
that a virtual agent should possess. Most of these
questions were taken almost verbatim from Bailenson et al.
[2], but translated to Dutch to make them easier to understand
for the test subjects.
The social presence and likeability parts of their questionnaire served as the main contributions to the selected questions. Their questions were designed to assess perception
of co-presence and likeability and have been successfully
used in their previous works [1].
The sensitivity of the data involved leads to a couple of
major qualities that a virtual agent in this situation should
possess. The user should feel at ease when discussing
personal information, so the likeability of the agent should
be high. At the same time, the user should take the virtual
agent and her advice seriously, so her ability to provide
high social presence, sincerity, and realism are aspects to
take into account [4]. This makes both likeability and social
presence relevant scales for this experiment. The status and
interest scales in Bailenson et al.’s questionnaire are of
much lesser importance in this scenario, so they were not
included.
Figure 1. Picture of the agent smiling while talking
in the video of the amused smile.
In addition to likeability and social presence, there are
some other qualities the virtual agent should possess to
be taken seriously. The following questions were therefore
made specifically to test the scenarios in this experiment,
bringing the total number of questions to 12:
• Q1: How well do you understand the conversation?
• Q2: To what degree do you feel addressed personally?
• Q3: Does the conversation flow naturally?
• Q4: Does the conversation feel sincere?
• Q5: Does the person make an effort to appear sincere?
Rather than Bailenson et al.’s 7-point scale ranging from -3
to 3, with 0 being the neutral option, a 5-point scale with 3
as the neutral option was applied for consistency with the
scale used for the other questions in this questionnaire.
Most questions were formulated in such a
way that lower ratings implied a negative assessment. The
ratings of the few questions that did not follow this format
were inverted before analysis.
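Inverting a rating on the 5-point scale amounts to mirroring it around the neutral option. The sketch below is one way to do this, assuming ratings are stored as integers from 1 to 5; the function names and the item label `QX` are hypothetical, since the paper does not state which questions were inverted.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # 5-point scale with 3 as the neutral option


def invert_rating(rating):
    """Mirror a rating around the scale midpoint (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    if not SCALE_MIN <= rating <= SCALE_MAX:
        raise ValueError(f"rating {rating} is outside the {SCALE_MIN}-{SCALE_MAX} scale")
    return SCALE_MIN + SCALE_MAX - rating


def normalize_answers(answers, reversed_items):
    """Invert only the items whose wording runs opposite to the other questions."""
    return {
        item: invert_rating(rating) if item in reversed_items else rating
        for item, rating in answers.items()
    }


# "QX" stands in for a negatively worded item; it is not a question from the paper.
print(normalize_answers({"Q1": 4, "QX": 2}, reversed_items={"QX"}))
```

After this step a low value means a negative assessment for every question, so ratings can be averaged within a scale.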
The test subjects were also asked to enter their age for
statistical purposes. No other personal information was
requested and/or processed.
3.3 Set-up
The scenario has been implemented using Elckerlyc, a behavior realizer for virtual humans developed by the Human
Media Interaction group of the University of Twente [13].
Elckerlyc’s HMI FaceEditor assisted in precise control over
every facial AU defined by FACS via its FACS to MPEG4
converter. Ochs et al.’s findings were used to accurately
model the different types of smiles, and they were subsequently saved as a FAP (Facial Animation Parameters
[11]) file.
The dialogue and smiles were implemented in the BML
(Behavior Markup Language [8]) format by converting the FAP file to face elements with au attributes,
which directly manipulate AUs on a virtual agent. The resulting BML file was then
opened in Elckerlyc’s EnvironmentDemo. For this experiment, only the morphological characteristics were considered. The sound of the dialogue was created by the
cmu-slt-hsmm text-to-speech engine.
Figure 2. Picture of the agent smiling while talking
in the video of the embarrassed smile.
By recording the playback of this BML file and converting it to an H.264/MPEG-4 Part 10 video file with AAC
audio, a media file was created that can be played on most
systems, unlike the more compact BML script format. Example frames of the resulting videos for the amused and
embarrassed smiles can be seen in Figures 1 and 2.
These videos were placed on a web page, accompanied by
a short explanation and the questionnaire. The web page
would randomly show the visitors a video of one type of
smile, ensuring a roughly equal distribution of test subjects per video type.
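The paper does not describe how the web page balanced the videos. One simple scheme that yields a roughly equal distribution is to pick at random among the videos shown least often so far. The sketch below illustrates that assumption only; `VIDEOS` and `pick_video` are illustrative names, not part of the actual web page.

```python
import random
from collections import Counter

VIDEOS = ("amused", "embarrassed", "polite")


def pick_video(counts, rng=random):
    """Pick at random among the videos that have been shown least often so far."""
    fewest = min(counts[video] for video in VIDEOS)
    candidates = [video for video in VIDEOS if counts[video] == fewest]
    return rng.choice(candidates)


# Simulate 22 visitors, the number of subjects in this experiment.
counts = Counter()
rng = random.Random(0)  # fixed seed so that the simulation is repeatable
for _ in range(22):
    counts[pick_video(counts, rng)] += 1
print(dict(counts))
```

With this scheme the group sizes never differ by more than one, so 22 visitors end up split into one group of 8 and two groups of 7. Which video gets the extra visitor depends on the random draws.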
3.4 Procedure
The test subjects were asked to watch one of the three
available videos on a computer of their choosing, based on
which they were asked to fill in the aforementioned questionnaire to assess their perception of the virtual agent.
The test subjects had the option to watch their designated
video multiple times, although very few subjects made use
of this opportunity. After submitting their answers to the
questionnaire, the subjects were thanked
for their cooperation and assured that their results would
be processed anonymously.
Many test subjects left comments about their experience
with the virtual agent after the experiment finished, despite this not being part of the intended procedure. Due to
their unscientific nature these comments will not be evaluated in the results, but they will be briefly mentioned in
the discussion.
3.5 Subjects
A total of 22 people were found willing to lend their time
to participate in the experiment. This group was selected
to be as homogeneous as possible. It consisted solely of
males aged between 17 and 30 years, with a mean of
21.5 and standard deviation of 2.76 years.
It was not possible to equally distribute the three different
videos among the total number of test subjects. The first
video, which contained the amused smile, was tested by 8
test subjects; one more than the 7 test subjects that were
assigned to each of the other two videos.
None of the test subjects were aware of the specifics of the
experiment, and while many expressed interest in knowing
what the experiment was testing exactly, details on this
were only supplied after they finished their participation.
Accidental multiple submissions by one test subject were
filtered out of the results prior to analysis.
4. RESULTS
The answers submitted by the test subjects have been evaluated on various points: first by testing the reliability of
the scales that were used, then by comparing the results
of the different videos.
4.1 Reliability
The social presence and likeability parts of Bailenson et
al.’s questionnaire were analyzed for reliability. The social
presence scale resulted in Cronbach’s α = 0.49. Removing
the question about perceiving the other person in a virtual
room improved this to α = 0.66, so this subset of questions
will be referred to as social presence’ from here on, and
the removed question as Q8. The likeability scale resulted
in Cronbach’s α = 0.62. The remaining questions are Q1
to Q5, equivalent to the numbering in section 3.2.
4.2 Comparison
An ANOVA with a 95% confidence level has
been performed to compare the question groups. The results are presented in Table 2. No significant differences
have been found. We can see that for some questions, the
means deviate up to a full point, but all of them fall within
each other’s standard deviation.
As an example of this, the results for the social presence’ scale have been graphed in Figure 3. The question
closest to a statistically significant difference is Q3 with
F(2, 19) = 2.85, p < 0.08. It is graphed in Figure 4. Even
here the embarrassed and polite smiles are still tied, however. No clear correlation between the answers and the
test subjects’ age could be found.
Figure 3. Graph of the results of social presence’ (rating from 1 to 5 per smile type).
Figure 4. Graph of the results of Q3 (rating from 1 to 5 per smile type).
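The two analyses in this section can be sketched in a few lines of Python. This is a minimal illustration and not the software that was used for the study. It assumes that the standard deviations in Table 2 are sample standard deviations (n - 1 in the denominator) and that the group sizes are those given in Section 3.5. The per-respondent ratings are not published, so the `demo` data for Cronbach’s α is invented purely to exercise the functions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of ratings per question,
    each list ordered by the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # scale total per respondent
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each question left out in turn (needs 3+ questions)."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


def anova_f(groups):
    """One-way ANOVA F statistic from (n, mean, sd) summaries, one per group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups) / df_between
    ms_within = sum((n - 1) * sd ** 2 for n, _, sd in groups) / df_within
    return ms_between / ms_within, df_between, df_within


# Invented ratings (3 questions x 4 respondents), not data from the experiment.
demo = [[2, 1, 3, 2], [2, 2, 3, 1], [4, 1, 2, 5]]
print([round(a, 2) for a in alpha_if_deleted(demo)])

# Q3 row of Table 2 as (n, M, SD) for the amused, embarrassed and polite videos.
q3 = [(8, 1.88, 1.13), (7, 1.14, 0.38), (7, 2.14, 0.69)]
f_value, df1, df2 = anova_f(q3)
print(f"F({df1}, {df2}) = {f_value:.2f}")
```

Applied to the Q3 row of Table 2, `anova_f` returns F(2, 19) ≈ 2.85, in line with the value reported above. A question whose removal raises α, as happened with Q8, shows up in `alpha_if_deleted` as an entry higher than the α of the full scale.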
5. DISCUSSION
No clear conclusions about the smiles can be extracted
from the results. The lack of variation between the ratings
of the different types of smiles is a clear problem, as is the
associated high standard deviation.
It is likely that the low variation between the ratings of
the different types of smiles is directly related to the overwhelmingly negative assessment of the virtual agent. The
main criticisms voiced by the test subjects, both in the
answers of the questionnaire and in comments that were
made after the completion of the experiment, can be attributed to the virtual agent’s lack of liveliness, its unnatural movements, and robotic voice. Despite the robotic
voice, test subjects generally had no problem following
the conversation. Just one person rated the intelligibility
as being below average.
The test subjects unambiguously indicated that the conversation was highly unnatural and insincere, though some
noticed that the virtual agent at least tried to be sincere.
Most test subjects never had the illusion that they were
talking to a real person, or that the virtual agent was
aware of their presence. The likeability scale ratings were
probably dragged down by most test subjects finding the
virtual agent highly unattractive.
Q8 was formulated as:
I perceive that I am in the presence of another
person in the virtual room with me.
It was excluded from the other questions about social presence because of issues with its reliability. Its high standard
deviation can be explained by the ambiguity of the question; it asks about a virtual room, which is not a clear
concept when looking straight at a computer screen. This
question would probably be more relevant in a situation
involving virtual reality.
Table 2. Means and standard deviations of results.
                   Amused smile   Embarrassed smile   Polite smile
                   M      SD      M      SD           M      SD
Social presence’   1.41   0.38    1.25   0.43         1.64   0.78
Likeability        1.94   0.78    2.07   0.93         2.14   0.63
Q1                 4.00   0.93    3.57   0.79         4.14   1.07
Q2                 1.75   0.71    2.00   1.16         2.29   1.11
Q3                 1.88   1.13    1.14   0.38         2.14   0.69
Q4                 2.50   1.20    1.57   0.79         2.43   0.98
Q5                 2.13   0.84    2.43   1.62         1.86   0.69
Q8                 2.38   1.51    2.00   1.53         1.57   1.13
The overall high standard deviation is partially due to
the relatively low number of test subjects. Increasing this
number would probably result in at least a few statistically
significant distinctions.
In the future, this experiment could be repeated with a
more realistic virtual agent and more test subjects for
more accurate results. Additionally, interaction between
the virtual agent and the user could be implemented to
increase the realism of the scenario. This could be done
by allowing the user to answer a virtual agent’s questions
and taking the conversation in a particular direction based
on these answers. A variant of this approach that is easier
to implement would be to offer the user a select number
of answers in the form of clickable buttons.
6. CONCLUSION
Several different types of smiles can be distinguished.
The most important ones are the amused, embarrassed
and polite smiles. The amused smile is characterized by
a big, open mouth with raised cheeks. The embarrassed
smile’s main characteristics are an asymmetrical smile and
tense lips. The polite smile typically has a small, closed
mouth.
The characteristics of these different types of smiles can
be modeled on a virtual agent. In this article, this was done
in the Elckerlyc software suite by using the FAP and BML
file formats to define the facial expressions, and supplying
an appropriate dialogue.
The different types of smiles can be evaluated in the scenario of a health advisor discussing a patient’s healthiness.
Since these conversations involve the exchange of sensitive
information, politeness as described by Brown et al.
is an important factor.
The data involved is sensitive, so the virtual agent should
possess several qualities. Likeability is important for the
user to feel at ease, but the virtual agent should be taken
seriously as well, making social presence another important factor.
The results of the practical experiment did not show a
statistically significant user preference of one type of smile
over another. The main reasons for this are an overall lack
of realism, causing an unequivocally negative assessment of
the virtual agent, and a low number of test subjects.
7. ACKNOWLEDGMENTS
The author would like to thank his supervisors, Betsy van
Dijk and Dennis Reidsma, and his fellow students in the
Intelligent Interaction track of the Bachelorreferaat course
for their valuable feedback. Thanks also go out to all the
test subjects who participated in the experiment.
8. REFERENCES
[1] J. Bailenson, J. Blascovich, A. Beall, and J. Loomis.
Interpersonal distance in immersive virtual
environments. Personality and Social Psychology
Bulletin, 29(7):819, 2003.
[2] J. Bailenson, R. Guadagno, E. Aharoni, A. Dimov,
A. Beall, and J. Blascovich. Comparing behavioral
and self-report measures of embodied agents: Social
presence in immersive virtual environments. In
Proceedings of the 7th Annual International
Workshop on PRESENCE, 2004.
[3] T. Bickmore, L. Caruso, and K. Clough-Gorr.
Acceptance and usability of a relational agent
interface by urban older adults. In CHI’05 extended
abstracts on Human factors in computing systems,
pages 1212–1215. ACM, 2005.
[4] T. Bickmore and J. Cassell. Relational agents: a
model and implementation of building user trust. In
Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 396–403. ACM,
2001.
[5] P. Brown and S. Levinson. Politeness: Some
universals in language usage. Cambridge Univ Pr,
1987.
[6] P. Ekman, W. Friesen, and J. Hager. Facial action
coding system, volume 160. Consulting Psychologists
Press Palo Alto, CA, 1978.
[7] L. Johnson, P. Rizzo, W. Bosma, M. Ghijsen, and
H. van Welbergen. Generating socially appropriate
tutorial dialog. In ISCA Workshop on Affective
Dialogue Systems, volume 3068 of LNAI, pages
254–264, Berlin Heidelberg New York, 2004.
Springer Verlag.
[8] S. Kopp, B. Krenn, S. Marsella, A. Marshall,
C. Pelachaud, H. Pirker, K. Thórisson, and
H. Vilhjálmsson. Towards a common framework for
multimodal generation: The behavior markup
language. In Intelligent Virtual Agents, volume 4133
of LNCS, pages 205–217. Springer, 2006.
[9] R. Niewiadomski and C. Pelachaud. Model of Facial
Expressions Management for an Embodied
Conversational Agent. In Proceedings of the 2nd
international conference on Affective Computing and
Intelligent Interaction, pages 12–23. Springer-Verlag,
2007.
[10] M. Ochs, R. Niewiadomski, and C. Pelachaud. How
a Virtual Agent Should Smile? In Intelligent Virtual
Agents: 10th International Conference, IVA 2010,
Philadelphia, PA, USA. Proceedings, page 427.
Springer, 2010.
[11] I. Pandzic and R. Forchheimer. MPEG-4 facial
animation: the standard, implementation and
applications. Wiley, 2002.
[12] M. Rehm and E. André. Catch me if you can:
exploring lying agents in social settings. In
Proceedings of the fourth international joint
conference on Autonomous agents and multiagent
systems, pages 937–944. ACM, 2005.
[13] H. van Welbergen, D. Reidsma, Z. Ruttkay, and
J. Zwiers. Elckerlyc - a bml realizer for continuous,
multimodal interaction with a virtual human.
Journal on Multimodal User Interfaces,
3(4):271–284, 2010.