CONVERSATIONS AND VIRTUAL REALITY
by
Swapna Reddy Gouravaram
A report submitted in partial fulfillment
of the requirements for the degree
of
MASTER OF SCIENCE
in
Computer Science
Approved:
Dr. Vicki Allan
Major Professor
Dr. Steve Allan
Committee Member
Dr. Xiaojun Qi
Committee Member
UTAH STATE UNIVERSITY
Logan, Utah
2004
ACKNOWLEDGMENTS
I would first like to thank my advisor, Professor Vicki Allan, for all her wisdom
and guidance throughout my years as a master’s student. I am extremely thankful to her
for having helped me in all aspects throughout my education in the Computer Science
Department. She has been a source of knowledge and inspiration. Professor Vicki has
been very patient in helping me understand the new trends and concepts in computer
science. I am greatly indebted to her for all the help she has rendered.
I am also grateful to the members of my committee, Dr. Steve Allan and Dr.
Xiaojun Qi, for taking the time to provide advice and direction on my report. Finally, I would
like to thank all my friends for their support.
Swapna Reddy Gouravaram
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS………………………………………………………………..ii
LIST OF FIGURES………………………………………………………………………iv
LIST OF TABLES………………………………………………………………………..v
ABSTRACT……………………………………………………………………………...vi
CHAPTER
1 INTRODUCTION...................................................................................................1
	1.1 Agent Conversations..............................................................................2
	1.2 Virtual Reality........................................................................................8
2 MODELING CONVERSATIONS USING STOCHASTIC CONTEXT-FREE
	GRAMMARS........................................................................................................10
	2.1 Introduction..........................................................................................11
	2.2 Related work........................................................................................12
	2.3 Stochastic context-free grammars........................................................14
	2.4 MATES (Marital Agent Trait-Based Emotion System)......................15
	2.5 Conversations......................................................................................16
		2.5.1 IPIP-NEO Personality Survey..............................................17
		2.5.2 Rejection language mapping................................................20
		2.5.3 Computing probabilities.......................................................24
			2.5.3.1 Determining the probability of rejection based on
			the "Anger" personality attribute.......................................25
			2.5.3.2 Determining the probability of counter-proposal
			based on the "Sympathy" personality attribute.................28
	2.6 An Example: Movie Plan....................................................................31
	2.7 Conclusions and Future Work.............................................................34
3 EVALUATION OF EXISTING VIRTUAL REALITY TOOLS.........................35
	3.1 Introduction.........................................................................................35
	3.2 Requirements.......................................................................................36
	3.3 Comparison of virtual reality systems.................................................41
		3.3.1 NetICE..................................................................................41
		3.3.2 Dive......................................................................................43
		3.3.3 Alpha Wolves.......................................................................45
		3.3.4 Jack......................................................................................47
		3.3.5 Active Worlds......................................................................48
	3.4 Discussion............................................................................................52
	3.5 Conclusions and Future Work.............................................................53
4 RECOMMENDATIONS AND CONCLUSIONS................................................54
REFERENCES..................................................................................................................58
LIST OF FIGURES
Figure                                                                                                      Page
1.1 Conversations using Finite State machines..............................................................3
1.2 Conversations using Dooley graphs.........................................................................4
1.3 Conversations using Petri nets before a token is fired.............................................5
1.4 Conversations using Petri nets after a token is fired................................................6
2.1 Mapping between input (personality and situation data) and output.....................17
2.2 Stochastic finite state machine for rejection language...........................................24
2.3 Determining rejection probability based on anger.................................................26
2.4 Determining counter-proposal probability based on sympathy.............................28
3.1 NetICE virtual environment...................................................................................43
3.2 Dive virtual environment.......................................................................................45
3.3 Wolves forming social relationships......................................................................47
3.4 Jack changing tools of a machine..........................................................................48
3.5 Active Worlds virtual environment.......................................................................50
3.6 Avatar in NetICE showing an anger face..............................................................56
3.7 Avatars in NetICE raising their hands...................................................................56
LIST OF TABLES
Table                                                                                                       Page
2.1 IPIP-NEO Personality Survey report.....................................................................19
2.2 Personality vector for Alice...................................................................................31
2.3 Personality vector for Alice...................................................................................32
3.1 Evaluation of existing virtual reality tools.............................................................51
ABSTRACT
Conversations and Virtual Reality
by
Swapna Reddy Gouravaram, Master of Science
Utah State University, 2004
Major Professor: Dr. Vicki Allan
Department: Computer Science
Many emotional and social agents exist that model various aspects of human
behavior and personality. This report describes a prototype designed to generate responses
for part of a conversation. The emphasis has been on developing a framework for the
rejection language mapping in the domain of MATES (Marital Agent Trait-Based
Emotion System) using stochastic context-free grammars. The MATES agent is an
intelligent agent with personality, emotions, goals, and plans. The agent's personality is
used to determine its responses during a conversation.
The report also analyzes the current literature in the field of virtual reality. Based
on opinions of key researchers and our own evaluations, we identify the key issues that
must be addressed for evaluating various virtual reality tools.
Virtual reality tools can be coupled with response generation to enhance
believability of agents.
(64 pages)
CHAPTER 1
INTRODUCTION
Agents and virtual environments are two powerful tools in the world of
computing. Each has aspects that could potentially complement the other’s strength,
particularly when used in animation. This report explores that possibility. To begin, we
will briefly define agents and virtual reality.
An agent is an autonomous entity that acts on behalf of the user [40] by
performing actions to meet its design objectives. Weiss [39] defines an intelligent agent
as “one that is capable of performing flexible actions,” where flexibility means reactivity,
pro-activeness, and social ability.
A virtual environment is a computer-generated environment that the user may
manipulate or move through in real time. An intelligent virtual environment consists of
intelligent agents that adapt to the user’s requirements [24]. Thus, agents in intelligent
virtual environments guide users, perform actions, and present information to the users.
Virtual reality establishes multi-modal interactions among agents and humans [30,
31]. In order to achieve these interactions, agent design has to include the ability to
communicate at a satisfactory level by including personality, emotions, and language
[24].
Agents have been used in various applications such as e-commerce applications
[23], and large-scale distributed applications like Dive [10]. Intelligent agents are used
extensively in applications that involve human interactions [29]. This report deals with
agent conversations and virtual reality.
1.1 Agent Conversations
Current research in believable agents deals with developing agents with personality and
emotion [26]. Software agents are expected to be believable. When interacting with an
agent, the user should feel that he/she is interacting with a life-like character rather than a
life-less character. These interactions should model believable, effective human
communication [22]. The requirements for believability are personality and emotion.
Personality distinguishes one character from another. It includes everything unique and
specific about the character. Emotion is the mental state of the agent and depends on the
situation [26]. Characters with different personalities show different emotions under
similar situations. Research in believable agents is important, as designers need
techniques for representing emotions and personality to depict user behavior.
Extensive research has been done in modeling agent conversations. In particular,
finite state machines, Dooley graphs, and Petri nets have been used for developing
conversations.
A finite state machine consists of states and transitions. States represent the
possible states of the agents in a given conversation. Transitions represent a change in the
conversation state based on communication to and/or from the agent [13].
Figure 1.1 shows a conversation, modeled as a finite state machine, between Agent A
and Agent B. Agent A proposes an activity to Agent B. Agent B can either reject the proposal
or accept the proposal. Finite state machines can handle sequential conversations or
conversations that occur in parallel. Finite state machines represent each agent’s
communication parameters in the form of transition rules.
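The transition-rule view described above can be sketched in code. This is a minimal illustration of the propose/accept/reject conversation of Figure 1.1, not the report's implementation; the state names and transition table are illustrative assumptions.

```python
# States: "start" (A may propose), "proposed" (B must answer), "done" (conversation over).
# Each entry maps (current state, message) to the next state.
TRANSITIONS = {
    ("start", "propose"): "proposed",   # A -> B: propose an activity
    ("proposed", "accept"): "done",     # B -> A: accept the proposal
    ("proposed", "reject"): "done",     # B -> A: reject the proposal
}

def run_conversation(messages):
    """Advance the machine one transition per message; reject illegal moves."""
    state = "start"
    for msg in messages:
        key = (state, msg)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal message {msg!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state

print(run_conversation(["propose", "reject"]))  # -> done
```

Any message sequence that is not licensed by a transition rule, such as an acceptance before a proposal, is rejected, which is exactly how the machine constrains the possible conversation states.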
Figure 1.1: Conversations using Finite State machines.
A Dooley graph is represented by a 4-tuple <E, P, M, A>. E is a set of counting
numbers indexing the chronologically ordered utterances in the conversation. P
represents the set of participants in the conversation. A is the set of ordered triples of the
form <p1, p2, k> defined over two ordered pairs S and R: the sender set S = {<p1, k> :
participant p1 sends utterance k} and the addressee set R = {<p2, k> : participant p2
receives utterance k}. Each triple in A becomes an arc in the graph. M is a relation
between S and R indicating the sender and receiver of an utterance [27]. Figure 1.2
represents the conversation of Figure 1.1 in terms of a Dooley graph.
Figure 1.2: Conversations using Dooley graphs
In the above figure, utterances connect the participants. The decision made by B
spawns a new component {B1, B2} in order to represent the complete conversation. A
finite state machine model clarifies various states through which a conversation may
move, but it obscures the identity of the participant. Dooley graphs appear similar to
finite state machines, but instead of representing possibilities, the Dooley graph
represents a complete conversation. Each node represents a participant and state
information. Thus, we can have more than one state for a participant. Each state
represents a part of the conversation. Thus, the collection of states associated with a
participant represent the role of the participant. Utterances link senders and receivers.
Dooley graphs are said to be useful in capturing complex agent interactions and
expressing them in an understandable manner, but they are not used in generating a
conversation or choosing a possible conversation path.
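The Dooley-graph bookkeeping just described can be made concrete with a short sketch. The two utterances and participant names below are illustrative assumptions; the point is only how E, P, and the arc set A are derived from an utterance list.

```python
# A completed two-utterance conversation: (sender, addressee, text).
utterances = [
    ("A", "B", "Shall we see a movie?"),  # utterance 1
    ("B", "A", "No, I am busy."),         # utterance 2
]

E = list(range(1, len(utterances) + 1))            # chronological utterance indices
P = {p for s, r, _ in utterances for p in (s, r)}  # participants in the conversation
A = [(s, r, k) for k, (s, r, _) in zip(E, utterances)]  # arcs <p1, p2, k>

print(P, A)
```

Each arc links a sender node to an addressee node for one utterance, so the graph records the complete conversation rather than the set of possible conversations.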
A Petri net is represented by a 5-tuple <P, T, I, O, M0>, where P is the set of
places, T is the set of transitions, and I and O are the input and output functions, which
map places to transitions and transitions to places. M0 is the marking vector that
characterizes the initial state of the system by indicating the number of tokens at each place. Every
transition has a predecessor place and a successor place. When there is at least one token
in all predecessor places connected to a transition, we say that the transition is enabled.
An enabled transition fires by removing one token from each predecessor place, and
depositing one token at each successor place (all the preconditions must be fulfilled).
Petri nets have been used for modeling conversations in complex, distributed and
concurrent systems [6]. The main reason for using Petri nets on a large scale is their
graphical representation and well-defined semantics.
Figures 1.3 and 1.4 represent the conversation in Figure 1.1 in terms of a Petri net.
Further, Figures 1.3 and 1.4 show transitions among places by removing and adding
tokens.
Figure 1.3: Conversations using Petri nets before a token is fired
Figure 1.4: Conversations using Petri nets after a token is fired
The number of tokens removed or added depends on the cardinality of the arc. In
the example shown in Figures 1.3 and 1.4, the cardinality is one. Petri nets provide support
for complex, concurrent, distributed, and/or stochastic conversations. The tokens in the
Petri nets represent the dynamic components of the system. The widespread use of Petri
nets is due to their useful graphical representation (used to model flow charts, block
diagrams, and networks) and mathematical formalism (needed for state equations and
algebraic equations) to represent the behavior of the system. With Petri nets, it is easier to
see the flow of a conversation than it is with Dooley graphs (based on the triggering of
the events).
There are several ways to incorporate the timing concept into a Petri net model.
One way is to associate a firing delay with each transition. This delay specifies the
relative time that the transition has to be enabled, before it can actually fire. If the delay is
a random distribution function, then it is a stochastic Petri net. While we can use Petri
nets to effect the stochastic generation of conversation, it seems more complicated than
stochastic grammars.
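The token-firing rule described above can be sketched directly. The place and transition names below are illustrative assumptions for the conversation of Figure 1.1; the enabling and firing logic follows the definition given earlier.

```python
# Initial marking M0: one token in A_ready, none elsewhere.
places = {"A_ready": 1, "B_deciding": 0, "answered": 0}

# Each transition lists its (predecessor places, successor places).
transitions = {
    "propose": (["A_ready"], ["B_deciding"]),
    "accept":  (["B_deciding"], ["answered"]),
    "reject":  (["B_deciding"], ["answered"]),
}

def enabled(t):
    """A transition is enabled when every predecessor place holds a token."""
    pre, _ = transitions[t]
    return all(places[p] >= 1 for p in pre)

def fire(t):
    """Remove one token from each predecessor; deposit one at each successor."""
    assert enabled(t), f"{t} is not enabled"
    pre, post = transitions[t]
    for p in pre:
        places[p] -= 1
    for p in post:
        places[p] += 1

fire("propose")
fire("reject")
print(places)  # the token has flowed from A_ready to answered
```

After firing, "accept" is no longer enabled because B_deciding is empty, which mirrors how the marking records the current state of the conversation.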
Stochastic context-free grammars are based on context-free grammars. Stochastic
context-free grammars extend context-free grammars by adding probabilities to
productions. Stochastic grammars can be used for generation and/or recognition of
sentences. They are easier to implement than Petri nets. We are using the grammars to
generate a sentence of the language. Others are using stochastic grammars to recognize
sentences in the language, by assigning a parse tree to each sentence.
Linking personality with emotions can affect decision-making. Gratch [12] gives
an example of a virtual environment to illustrate how emotions can be used to affect
behavior. The virtual environment consists of a platoon leader with several autonomous
agents, such as other platoon leaders, civilians, and platoon members. The emotional
traits of the agents are set based on situation data, and responses are generated based on
the agent’s beliefs, plans and emotional values.
Kshirsagar [19] combines personality and emotions to generate responses to
synthesize virtual humans. Kshirsagar defines interactions by means of finite state
machines. The PEN model determines the personality traits; it comprises three
personality dimensions, namely extraversion, neuroticism, and psychoticism.
Our MATES system (Marital Agent Trait-Based Emotion System) consists of two
agents, Bob and Alice, representing a couple considering marriage. These agents
converse with each other based on current plans, goals, history, personality, and
emotions. A conversational path is a collection of specifications that guide particular
choices during a conversation. Specifications are a collection of internal information
structures, such as with whom to communicate, when to communicate, and what to
communicate. A variety of conversational paths are possible based on personality,
emotions, plans, goals and history. For example, angry people tend to give more
rejections. The selection process is determined by using stochastic context-free
grammars.
We have attempted to generate multiple responses between two agents through
the use of stochastic context-free grammars. A MATES agent has a personality that is
established based on user response to a series of questionnaires. Personality values are
used to determine the responses generated by the agent. We have chosen the five-factor
personality model for our system, as it is compatible with existing psychological
theories.
1.2 Virtual Reality Systems
Virtual Reality (VR) is a computer-generated virtual environment through which the user
may move and manipulate the environment in real time. Virtual reality is accomplished
via a combination of computer techniques and interface devices that presents the user
with the illusion of being in a three-dimensional world [37]. The main goal of virtual
reality implementations is to provide the user with an interface that is easy to use,
interesting, and intuitive.
Virtual reality entails the use of advanced technologies, including various
multimedia peripherals, to produce a simulated (virtual) environment that users perceive
as comparable to real world objects and events. Virtual reality worlds find applications in
various fields. Virtual reality is used in engineering to build virtual aircrafts, automobiles,
and submarines, thus allowing objects (engineering components) to be manufactured,
inspected, assembled, tested, and otherwise subjected to all sorts of simulations. Virtual
reality is used in classrooms to facilitate distance learning and teaching. Virtual reality is
also used to develop e-commerce applications such as online shopping [2,15].
After a literature search in the field of virtual reality, we identified key issues that
must be addressed for evaluating virtual reality tools. The technique for response
generation (described in Chapter 2) can be integrated with virtual reality tools to generate
believable animations.
This report is organized in a paper format. Chapter 2 is a paper titled “Modeling
Conversations Using Stochastic Context-free Grammars”. It describes an innovative
method to model conversations using stochastic context-free grammars to generate
multiple responses. Chapter 3 is a paper titled “Evaluation of Existing Virtual Reality
Tools”. It analyzes the current research in the field of virtual reality. In Chapter 4, we
consider how the existing techniques can be integrated together to provide a plausible
animation and, finally, give our conclusions.
CHAPTER 2
MODELING CONVERSATIONS USING
STOCHASTIC CONTEXT-FREE GRAMMARS
ABSTRACT
One primary characteristic of agent-based systems is that agents pursue their
goals by means of conversations. Software agents are able to make offers and
counter-offers to a proposal by exchanging information. Communication becomes more effective
if agents convey their feelings through conversations. Using stochastic context-free
grammars, one can generate multiple responses.
An important component of this research is to generate part of a conversation. Our
system consists of two agents Alice and Bob, representing a couple considering marriage.
One agent is programmed with personality data from Alice, while the other agent is
programmed with personality data from Bob extracted using the IPIP-NEO survey [18].
These agents then converse with each other based on goals, plans, history, personality
data, and emotions.
Stochastic context-free grammars extend context-free grammars by adding
probabilities to the productions. The probability assigned to productions determines the
output responses. With probabilities, we can generate a fixed number of responses as well
as a variance in the number of responses generated. Hence, the occurrence of some
sentences is greater than others for a particular event. The probability of the productions
is determined by the input data (personality data) to generate output responses. Use of
stochastic context-free grammars allows us to combine various personality data to model
multiple responses.
2.1 Introduction
Software agents are autonomous, collaborative entities that are used in developing
various software applications and, can act effectively on behalf of the user [40]. Agents
are able to make offers and counter-offers to a proposal.
One of the primary characteristics of agent-based systems is that agents interact
with each other to meet their goals. Several people have attempted to define
conversations. Greaves [14] says that conversations are the meaningful exchange of
messages (information) between interacting agents. Robert [32] defines a conversation as
a shared understanding on the meaning of speech acts and the connectedness between
utterances.
Conversations and their associated protocols to characterize agent messages have
been discussed in the agent community for several years. Speech act theory [33] is the
foundation for modeling, analyzing, and designing agent communication in multi-agent
systems.
Agents that intend to have a conversation require internal information structures
about which communication acts to use, when to use them, with whom to communicate, a
list of possible responses, and possible outcomes upon receiving the expected responses.
In the MATES system, agents converse with each other based on current plans,
goals, history, personality, and emotions. A variety of conversational paths are possible.
The selection process is determined by using stochastic context-free grammars.
This paper explores the use of stochastic grammars to provide multiple responses
that connote the strength of the feeling of the agent’s response. A stochastic context-free
grammar determines the ranking of sentences by attaching probabilities, yielding a
variety of responses that reflect situation data, as well as incorporate some random
variability. Personality and situation data determine the probabilities for the productions.
Hence, the plausibility of the occurrence of some response is greater than other
responses. The personality data of each agent is determined by the five-factor model [18],
as it is compatible with existing psychological theories.
In Section 2.2, we briefly introduce formalisms currently being used for
conversational modeling. In Section 2.3, we outline the details associated with stochastic
context-free grammars and their advantages. In Section 2.4, we introduce the MATES
system. In Section 2.5, we describe the IPIP-NEO personality survey, describe the
rejection language mapping of the phrases, and determine the probabilities for the
personality attributes. We conclude with a discussion of our examples in the domain of
the MATES system (Section 2.6).
2.2 Related Work
Extensive research has been done in modeling agent conversations. Finite state
machines, Dooley graphs, and Petri nets have been used for developing conversations
structures.
Finite state machines represent the simplest modeling formalism used for agent
conversations. As such, finite state machines are not designed to distinguish between
actions of the participants. Dooley graphs represent conversations by linking the
utterances made by the participants. They provide state and participant information [27].
This is a considerable improvement over finite state machines. However, finite state
machines and Dooley graphs cannot represent concurrent conversations. On the other
hand, Petri nets have been used for modeling conversations in complex, distributed and
concurrent systems [6].
Linking personality and emotions with conversation structures can affect
decision-making. AvaTalk, an agent technology, enables users to carry out conversations
with the avatars/agents [16]. This technology is used for tutoring students where agents
play the role of mentors. Each agent has a set of state variables that keeps track of its
emotional and personality traits. An agent’s state variables are used for the generation of
responses during the course of a conversation.
In designing conversation systems, the most popularly used model is the reactive
model. The system computes its response based on what the user says or does. Such a
reactive system is required if the user interacts with the system such as the tutoring
system described above. However, in a scenario generation system such as the one we
propose, the individualization comes by way of setting up the parameters (personality and
couple information) before the scenario is generated. The user interacts with the system
by changing the parameters and generating a new scenario. This semi-dynamic
interaction allows control of the system at a high level and requires less context
information to be passed to the individual parts of the system.
In such a system, it is possible to generate the plan for both partners, rather than
just one, based on personality and couple interaction data. One of the high level plans is
to generate conversations. The decision of how each part of the conversation is generated
becomes independent. For example, based on parameter settings, we decide that the
conversation should have the following high level plan:
Bob: Propose an idea.
Alice: Ask for details.
Bob: Give details.
Alice: Ask for other choices.
Bob: State the original idea was best.
Alice: Reject the idea.
Bob: Respond to the rejection.
If this is the plan, how Alice will reject is a relatively independent piece. As a first
step to demonstrate the effectiveness of stochastic context-free grammars, we illustrate
their use in producing the rejection part of the conversation.
2.3 Stochastic Context-free Grammars
A context-free grammar can be represented by a 4-tuple <T, N, S, P>, where T is
the set of terminals, N is the set of nonterminals, S is the start nonterminal, and P is the
set of productions of the form X → Σ*, where X is a nonterminal and Σ* is a string of
terminals and nonterminals.
A stochastic context-free grammar (SCFG) extends a context-free grammar by
adding probabilities to productions [34]:
X → Σ*   [p]
where X is a nonterminal, Σ* is a string of terminals and nonterminals, and p
represents the probability that this particular production is used in sentence generation.
The rule probability can also be expressed as p(X → Σ*), which represents the
conditional likelihood of the production X → Σ*. The probabilities of all productions with
the same nonterminal on the left-hand side must, therefore, sum to
unity.
Stochastic context-free grammars are also called probabilistic context-free
grammars. Stochastic context-free grammars can be used for the generation of strings in
the language or for the verification of strings in a language. We use them for string
generation. The various rejection phrases used in our system, expressed as
stochastic context-free productions, are:
R → No                    p(R → No) = 0.3
R → I won't               p(R → I won't) = 0.2
R → I can't               p(R → I can't) = 0.2
R → Sorry                 p(R → Sorry) = 0.05
R → I would rather not    p(R → I would rather not) = 0.15
R → Absolutely not        p(R → Absolutely not) = 0.1
Stochastic context-free grammars are superior to context-free grammars for
conversation generation because the probability attached to each production gives some
idea of the plausibility of the sentence. New productions and their associated
probabilities can be added for each new event. Stochastic grammars
provide a quantitative basis for ranking different productions, thus exploiting the
dependencies between sentence structure and situation data [34].
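The rejection productions listed above can be sampled directly by their attached probabilities. This is a sketch rather than the report's implementation; the probabilities are taken verbatim from the grammar in the text, and the check that they sum to unity reflects the constraint stated earlier.

```python
import random

# Productions for nonterminal R with their attached probabilities.
REJECT_RULES = [
    ("No", 0.30), ("I won't", 0.20), ("I can't", 0.20),
    ("Sorry", 0.05), ("I would rather not", 0.15), ("Absolutely not", 0.10),
]

# All productions with R on the left-hand side must sum to unity.
assert abs(sum(p for _, p in REJECT_RULES) - 1.0) < 1e-9

def expand_R(rng=random):
    """Pick one production for R according to its probability."""
    phrases, weights = zip(*REJECT_RULES)
    return rng.choices(phrases, weights=weights, k=1)[0]

print(expand_R())
```

Over many expansions, "No" appears roughly three times as often as "Sorry" appears in twenty, which is how the grammar makes some responses more plausible than others.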
2.4 MATES (Marital Agent Trait-Based Emotion System)
The MATES agent system is a tool used by premarital couples to recognize their
interpersonal conflicts and, hence, facilitate a harmonious relationship. The tool can also
be used by marriage counselors and researchers to study human interactions in different
situations. A MATES agent incorporates emotions to mimic human behavior. The goals
of this system are to help individuals make appropriate marital decisions and to motivate
an individual to change destructive habits.
Our system represents a couple, Alice and Bob. One agent is programmed with the
personality data of Alice while another agent is programmed with the personality data of
Bob. The personality data is obtained by using IPIP-NEO surveys [25]. These agents
converse with each other about a specific scenario, and the conversation varies depending
on personality, goals, and interaction patterns.
2.5 Conversations
One can express conversations (interaction patterns) as regular expressions. These
regular expressions form the basis for stochastic context-free grammars. Stochastic
context-free grammars are used to generate multiple responses depending on the
probabilities assigned to the productions. Multiple responses like “No, No, No” (rather
than “No”) express the feeling of the speaker agent in negotiating the activity depending
on the level of personality values, such as anger, aggression, and so on. The emphasis in
this paper has been on developing a prototype for the part of the conversation that
generates various rejection messages in a simple and extensible way. Our motivations for
using stochastic context-free grammar are:
1) Stochastic context-free grammars can be used as a language generator.
2) Stochastic grammars can be dynamically evaluated.
3) The probability assigned to productions determines the output sentences.
The probabilities depend on multiple parameters like personality and the current
situation. Figure 2.1 represents the mapping between the input (personality) and output
(response).
Figure 2.1: Mapping between input (personality and situation data) and output (response)
2.5.1 IPIP-NEO Personality Survey
Different people often choose different behaviors depending on the situation; for
example, some people are pleasure seekers while other people are depressed. We have
chosen five-factor personality traits to represent different traits [18]. According to the
taxonomy, the personality of an individual can be parameterized into five dimensions,
namely extraversion, neuroticism, openness, agreeableness, and conscientiousness.
An extraverted person seeks the attention of others. Extraverts enjoy being in a
group and are full of energy. On the other hand, introverts lack energy and are less social.
The various extraversion facets are friendliness, gregariousness, assertiveness, activity
level, excitement-seeking, and cheerfulness [18].
An agreeable person is cooperative, convincing, and listens to other people’s
arguments. They are friendly, kind, and compromise their own interests for the sake of
others. On the other hand, disagreeable people are generally not concerned about others.
They are more selfish. The various agreeableness facets are trust, morality, altruism,
cooperation, modesty, and sympathy [18].
Conscientious people have control over their behaviors. On the positive side,
conscientious people are regarded as intelligent and reliable; on the negative side, they
are sometimes regarded as boring. The various
conscientiousness facets are self-efficacy, orderliness, dutifulness, achievement-striving,
self-discipline, and cautiousness [18].
Neuroticism refers to mental distress, depression, and suffering. People with high
values of neuroticism interpret normal situations as dangerous and threatening. The
various neuroticism facets are anxiety, anger, depression, self-consciousness,
immoderation, and vulnerability [18].
Open people are generally imaginative and creative. People with low scores on
openness tend to be plain by nature. The various facets in this domain are imagination,
artistic interests, emotionality, adventurousness, intellect, and liberalism [18].
The IPIP-NEO personality survey report estimates the individual’s level on each
of the five personality domains mentioned above. The survey consists of 120 questions,
and each facet receives a value in the range 1-100. Table 2.1 shows a sample report based
on the IPIP-NEO personality survey. Each domain is further described by its six facets,
and the report compares the personality of one person with others of the same age.
Table 2.1: IPIP-NEO Personality Survey report

Domain / Facet               Score
EXTRAVERSION                  94
  Friendliness                90
  Gregariousness              91
  Assertiveness               82
  Activity Level              97
  Excitement-Seeking          49
  Cheerfulness                80
AGREEABLENESS                 81
  Trust                       87
  Morality                    63
  Altruism                    88
  Cooperation                 73
  Modesty                     25
  Sympathy                    77
CONSCIENTIOUSNESS             84
  Self-Efficacy               96
  Orderliness                 36
  Dutifulness                 87
  Achievement-Striving        83
  Self-Discipline             87
  Cautiousness                73
NEUROTICISM                    6
  Anxiety                      0
  Anger                       16
  Depression                  16
  Self-Consciousness          39
  Immoderation                 8
  Vulnerability               29
OPENNESS TO EXPERIENCE        35
  Imagination                 65
  Artistic Interests          18
  Emotionality                83
  Adventurousness              9
  Intellect                   86
  Liberalism                   4
2.5.2 Rejection Language Mapping
In our system, a conversation can be viewed as a high-level plan composed of
low-level plans. One such low-level plan specifies how the listener agent rejects a
proposal made by the speaker agent. The rejection language can be expressed by the
following regular expression:
Rejection: reject [reject + explanation + judgment + counterproposal]*.
For simplicity, each rejection response in our system must have at least one reject
phrase, followed by any number of explanation, judgment, and counter-proposal phrases.
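This structure — one mandatory reject phrase followed by any mix of the optional phrase types — can be checked over category tokens with an ordinary regular expression. A minimal sketch in Java follows; the class name and token spellings are ours, not part of the prototype.

```java
import java.util.regex.Pattern;

// Validates the category-level shape of a rejection response: one
// mandatory "reject" token followed by any mix of the four phrase
// categories. Class and token names are illustrative only.
public class RejectionShape {
    private static final Pattern SHAPE = Pattern.compile(
        "reject( (reject|explanation|judgment|counterproposal))*");

    public static boolean isValid(String categorySequence) {
        return SHAPE.matcher(categorySequence).matches();
    }
}
```

For example, `isValid("reject explanation counterproposal")` accepts, while a sequence that does not begin with a reject token is refused.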
The various rejection phrases used in our system are:
No.
I can’t.
I won’t.
Sorry.
Absolutely not.
I would rather not.
The various explanation phrases used in our system are:
Because of something you have done, I don’t want to spend time with you.
Because of something you have done, I would rather be with my friends / family.
Because of some situation.
Because of prior history.
Because I have a conflict.
Because it is ridiculous and dumb.
I am too busy for you.
The various judgment phrases used in our system are as follows. For simplicity, we have
classified the judgment phrases into two categories, namely positive and negative
judgment phrases.
The various positive judgment phrases are:
You have such good ideas.
You are so thoughtful.
That does sound fun.
I am so sorry.
The various negative judgment phrases are:
You always want to do things I don’t like.
Why did you think I would want to do that?
You never consider my feelings.
The various counter proposal phrases used in our system are:
Maybe another time/day.
Maybe we should go golfing/bowling.
Maybe we should do some other thing I know you like.
Maybe we should do something we both like.
Maybe we should do something only I like.
The context-free grammar for the above regular expression is:
S  → R1 E1 J1 C1
R1 → R | R R1
E1 → E E1 | ε
J1 → J J1 | ε
C1 → C C1 | ε
where ‘S’ is the start nonterminal; ‘R1’, ‘E1’, ‘J1’, and ‘C1’ are intermediate
nonterminals; ‘R’ is a rejection phrase; ‘E’ an explanation phrase; ‘J’ a judgment
phrase; ‘C’ a counter-proposal phrase; and ‘ε’ the empty string.
R → No.
R → I won’t.
R → I can’t.
R → Sorry.
R → Absolutely not.
R → I would rather not.
E → Because of some situation, I don’t want to spend time with you.
E → Because of some situation, I would rather be with my friends/family.
E → Because of something that you have done.
E → Because I have a conflict.
E → Because of prior history.
E → Because it is ridiculous and dumb.
E → I am too busy for you.
J → You have such good ideas.
J → You are so thoughtful.
J → That does sound fun.
J → You always want to do things I don’t like.
J → You never consider my feelings.
J → Why did you think I would want to do that?
J → I am so sorry.
C → Maybe another time.
C → Maybe another day.
C → Maybe we should go golfing.
C → Maybe we should go bowling.
C → Maybe we should do some other thing I know you like.
C → Maybe we should do something we both like.
C → Maybe we should do something only I like.
Figure 2.2 represents a simplified stochastic finite state machine for the stochastic
context-free grammar mentioned above.
Figure 2.2: Stochastic finite state machine for the rejection language
As Figure 2.2 shows, we assume a rejection message must begin with a reject
phrase. The possible phrases could be permuted to achieve more options. Reject phrases
may be followed by explanation, judgment, or counter-proposal phrases. Depending on
the input data, we generate output messages that combine multiple rejections,
explanations, judgments, and counter-proposals.
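The generation process just described can be sketched in Java. This is a simplified illustration, not the prototype's actual code: the class and method names are ours, the phrase lists are abbreviated, and the probabilities are placeholders that would come from the personality mappings described below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of stochastic generation: one mandatory reject phrase, then
// each optional phrase type repeats while a biased coin keeps coming
// up heads, mirroring the recursive productions E1 -> E E1 | epsilon.
public class RejectionGenerator {
    private final Random rng;

    public RejectionGenerator(long seed) { rng = new Random(seed); }

    private String pick(String[] phrases) {
        return phrases[rng.nextInt(phrases.length)];
    }

    // Appends zero or more phrases, continuing with probability p.
    private void repeat(List<String> out, String[] phrases, double p) {
        while (rng.nextDouble() < p) out.add(pick(phrases));
    }

    public String generate(double pReject, double pExplain,
                           double pJudge, double pCounter) {
        String[] rejects = {"No.", "I can't.", "I won't."};
        String[] explains = {"Because I have a conflict."};
        String[] judges = {"You never consider my feelings."};
        String[] counters = {"Maybe another time."};

        List<String> out = new ArrayList<>();
        out.add(pick(rejects));            // at least one reject phrase
        repeat(out, rejects, pReject);     // possibly more rejects
        repeat(out, explains, pExplain);
        repeat(out, judges, pJudge);
        repeat(out, counters, pCounter);
        return String.join(": ", out);
    }
}
```

With all probabilities at zero, exactly one reject phrase is emitted; raising a probability lengthens the corresponding run of phrases, which is how the personality attributes shape the responses shown in Section 2.6.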
2.5.3 Computing Probabilities
We have developed a prototype for the whole rejection language generation in
Java. We have designed functions to determine the probabilities based on personality
attributes. For example, people with high values of anger tend to give more rejections;
people with high values of sympathy tend to give more counter-proposals. The purpose of
assigning probabilities to productions is that it determines a variance in the number of
responses generated. Our results are based on five of the personality attributes identified
by the IPIP survey, namely anger, sympathy, cooperation, depression, and assertiveness.
The various responses generated by combining these attributes are given in
Section 2.6.
Personality values range from 1-100. For simplicity, each attribute is identified as
very high, high, medium, low, and very low depending on the attribute value.
An attribute is rated very high if the personality value is in between 81 and 100.
An attribute is rated high if the personality value is in between 61 and 80.
An attribute is rated medium if the personality value is in between 41 and 60.
An attribute is rated low if the personality value is in between 21 and 40.
An attribute is rated very low if the personality value is in between 0 and 20.
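These bands translate directly into a small lookup function; a sketch (the class and method names are ours):

```java
// Maps a personality attribute value onto the five verbal ratings
// used in the text. Band boundaries follow the ranges given above.
public class AttributeRating {
    public static String rate(int value) {
        if (value >= 81) return "very high";
        if (value >= 61) return "high";
        if (value >= 41) return "medium";
        if (value >= 21) return "low";
        return "very low";
    }
}
```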
2.5.3.1 Determining the Probability of Rejection Based on the “Anger” Personality
Attribute
The reject phrase probability is calculated from the “anger” personality attribute
using the following piecewise function:
F(x) = 0               when 0 <= x <= 40
F(x) = (x - 40) / 60   when 40 < x <= 100
where ‘x’ is the anger personality attribute value.
This choice of mapping function is used to generate a variance in the number of reject
phrase responses. We wanted between one and four rejections based on the anger
personality attribute (assuming the decision for rejection has been made). The rejection
probability has an empty production at lower values of anger and increases as the value
of anger increases. Figure 2.3 illustrates this.
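This piecewise mapping can be written directly as a function; a brief Java sketch (class and method names are ours):

```java
// Piecewise mapping from the anger attribute (0-100) to the
// probability of producing an additional reject phrase.
public class AngerMapping {
    public static double rejectProbability(double anger) {
        if (anger <= 40.0) return 0.0;   // empty-production region
        return (anger - 40.0) / 60.0;    // rises linearly to 1 at 100
    }
}
```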
Figure 2.3: Determining rejection probability based on anger.
We ran five tests at each of the five anger levels to see the output responses. Assuming
the decision for reject has already been made, the sets of responses at various levels of
anger are as follows:
Case 1: Anger =30
No responses.
Case 2: Anger = 55
No responses.
No responses.
No responses.
I won’t.
No responses.
Case 3: Anger = 75
No: I can’t.
I can’t.
I won’t.
I would rather not.
I can’t: Absolutely not.
Case 4: Anger = 85
I can’t: Sorry: No.
I would rather not: No: I won’t.
No: I would rather not.
No: Absolutely not: I won’t.
I can’t: I won’t.
Case 5: Anger = 95
No: No: Sorry: I can’t
Absolutely not: I can’t: I won’t.
I can’t: I can’t: No: I won’t.
No: I would rather not: I can’t.
I can’t: No: I can’t: I would rather not.
2.5.3.2 Determining the Probability of a Counter-proposal Based on “Sympathy”
Personality Attribute
The counter-proposal phrase probability is calculated from the “sympathy”
personality attribute using the following step function:
F(x) = 0        when 0 <= x <= 40
F(x) = x / 20   when 41 <= x <= 100
where ‘x’ is the sympathy personality attribute value.
This mapping function determines a fixed number of counter-proposal phrase
responses. We wanted the number of counter-proposal responses to increase by one
whenever the sympathy attribute rises to the next level. The counter-proposal count is
zero (an empty production) at lower values of sympathy and increases by one step as the
sympathy value increases. Figure 2.4 illustrates this.
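The step function as given can be sketched as follows (class and method names are ours; note that, per the text, the value acts as a step level controlling the count of counter-proposals rather than a normalized probability):

```java
// Step mapping from the sympathy attribute (0-100). The returned
// value is the raw level from F(x) = x / 20 above 40, which the
// system uses to fix the number of counter-proposal phrases.
public class SympathyMapping {
    public static double counterProposalLevel(double sympathy) {
        if (sympathy <= 40.0) return 0.0;  // no counter-proposals
        return sympathy / 20.0;
    }
}
```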
Figure 2.4: Determining counter-proposal probability based on sympathy.
Assuming the decision for reject has already been made, the various sets of responses at
different levels of sympathy are as follows:
Case 1: Sympathy = 20
There would be no responses.
Case 2: Sympathy = 45
Maybe another time.
Maybe we should go bowling.
Maybe we should do something we both like.
Maybe another time.
Maybe we should go golfing.
Case 3: Sympathy = 65
Maybe another day: Maybe we should do something only I like.
Maybe we should go golfing: Maybe another day.
Maybe another time: Maybe we should do something we both like.
Maybe we should go golfing: Maybe we should do something I know you like.
Maybe we should do something we both like: Maybe another day.
Case 4: Sympathy = 80
Maybe another time: Maybe we should do something we both like: Maybe we should
go bowling.
Maybe another day: Maybe we should go golfing: Maybe another time.
Maybe we should do something only I like: Maybe we should go golfing: Maybe
another time.
Maybe another day: Maybe another time: Maybe we should go golfing.
Maybe we should do something we both like: Maybe we should do something I
know you like: Maybe we should go bowling.
Case 5: Sympathy = 100
Maybe we should do something we both like: Maybe we should go bowling:
Maybe we should go bowling: Maybe another day
Maybe we should go golfing: Maybe another time: Maybe another day: Maybe
we should go bowling.
Maybe we should go bowling: Maybe we should do something only I like: Maybe
another day: Maybe another time.
Maybe another day: Maybe we should do something I know you like: Maybe
another time: Maybe another day.
Maybe another time: Maybe we should go bowling: Maybe we should go golfing:
Maybe another time.
Along similar lines, we computed the probabilities for the cooperation,
depression, and assertiveness personality attributes. We used a bell-shaped function to
determine a variance in the number of explanation phrases generated from the depression
attribute; explanation phrases are generated at medium levels of depression. We used a
step function to determine a fixed number of judgment phrases from the cooperation and
assertiveness attributes: lower levels of cooperation produce negative judgment phrases,
higher levels produce positive judgment phrases, and judgment phrases are generated
only when the assertiveness value is at or above the medium level.
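The report does not give the bell-shaped depression function explicitly. One plausible realization, with the center (50) and spread (15) chosen by us purely for illustration, is a Gaussian over the 0-100 scale:

```java
// Illustrative bell-shaped mapping for the depression attribute.
// The center (50) and spread (15) are our assumptions; the report
// only states that explanation phrases peak at medium depression.
public class DepressionMapping {
    public static double explanationProbability(double depression) {
        double d = (depression - 50.0) / 15.0;
        return Math.exp(-0.5 * d * d);  // 1 at the center, near 0 at extremes
    }
}
```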
2.6 An Example: Movie Plan
Consider a sample scenario of a conversation between Alice and Bob. Bob asks
Alice if she would accompany him to a movie. This request could result in an emotional
response for Alice.
Case 1: Alice displays the attribute anger due to Bob’s behavior last night, hence, she
rejects the proposal made by Bob. This event triggers the attributes of Alice. Based on the
situation data, we set the personality attributes of Alice. Table 2.2 shows the personality
vector for Alice.
Table 2.2: Personality vector for Alice

Personality attribute    Value
Anger                    90
Depression               71
Sympathy                 52
Cooperation              20
Assertiveness            66
Using the mapping functions, the actual responses generated by our system are as
follows:
No: I can’t: I won’t: You always want to do things I don’t like: You never
consider my feelings: Maybe another time.
I can’t: I would rather not: No: You never consider my feelings: Why did you
think I would want to do that?: Maybe we should go golfing.
I can’t: Absolutely not: No: I would rather not: You always want to do things I
don’t like: Why did you think I would want to do that?: Maybe we should do
something we both like.
Notice that the output responses contain many reject phrases, as the anger value is very
high. Because of the low value of cooperation, the output responses contain negative
judgment phrases, and the sympathy value drives the counter-proposal phrases.
Case 2: Alice could also feel guilty due to her role in the previous night’s disagreement.
She would attribute some guilt to herself for her behavior. Based on this situation, we set
the personality attributes for Alice as follows:
Table 2.3: Personality vector for Alice

Personality attribute    Value
Anger                    50
Depression               55
Sympathy                 77
Cooperation              50
Assertiveness            63
Given this input, the actual sets of responses generated by our system are as follows:
No: Absolutely not: Because I have a conflict: Because of prior history: Because
of some situation: You have such good ideas: Maybe we should go golfing:
Maybe another time.
I can’t: I am too busy for you: Because of prior history: Because of some
situation: That does sound fun: Maybe another day: Maybe we should do some
other thing I know you like.
I won’t: I am too busy for you: Because it is ridiculous and dumb: Because I have
a conflict: You are so thoughtful: Maybe we should go bowling: Maybe we
should do something we both like.
Notice that the output responses contain very few reject phrases, as the anger value is
in the medium range compared with the previous result. Depression determines the
explanation phrases, and higher values of cooperation produce positive judgment phrases.
Since the sympathy value is greater than in the previous case, the number of
counter-proposals generated is also greater.
2.7 Conclusions and Future Work
An important aspect of conversation modeling is to make the conversations reflect
feelings. Feelings can be expressed by generating multiple responses. We have used
stochastic context-free grammars to generate responses by assigning probabilities to
productions. These probabilities are determined based on personality and situation data.
The emphasis in this paper is on developing the rejection part of the conversation. In
contrast to systems in which possible responses are hard-coded, our system allows greater
flexibility by extending the prototype with additional functionalities.
We are working to enrich our prototype by combining emotions with personality
attributes. In addition to the personality and situation of the agent, we are trying to
determine emotion-based probabilities in order to make the communication more
effective.
CHAPTER 3
EVALUATION OF EXISTING VIRTUAL REALITY TOOLS
In this paper, we review the literature in the field of virtual reality, and we identify the
basic requirements for a tool in this area. We identify key issues that must be addressed
for evaluating realistic virtual reality tools. Existing tools such as NetICE, Dive, Alpha
wolves, Jack and Active worlds are evaluated based on these key requirements.
3.1 Introduction
Computers have been in existence since the early 1940s. However, it is only during the
last few decades that the general public started using them in day-to-day activities. As interaction
with computers increases, the need for friendly user-interfaces increases. Several user
interface technologies (such as real-time applications, computer graphics, and graphic
displays) were widely used in the late 80s. These formed the basis for developing virtual
reality tools [28]. It is more engaging to interact with a responsive, easy-to-use virtual
world than to passively watch a picture on the monitor.
Virtual reality is a combination of computer techniques and interface devices that
present the user with the illusion of being in a three-dimensional world. The essence of
virtual reality is immersion, the ability to immerse the user as an active participant, as
opposed to a passive viewer [4].
The intent of virtual reality systems is to provide the user with tasks that serve as
a substitute for real-world experience [4]. The main goal of virtual reality
implementations is to provide the user with an interface that is easy to use, interesting,
and intuitive.
This paper is organized as follows: In Section 3.2, we discuss the requirements
suggested by researchers for building virtual environments. In Section 3.3, we compare
the existing virtual reality tools based on the key issues. In Section 3.4, we discuss the
virtual reality systems. Section 3.5 gives a conclusion and some improvements to the
current virtual reality systems.
3.2 Requirements
Virtual reality systems are used to generate real-time responses and enhanced
human-computer interaction. Interaction is made more plausible by using accurate acoustic and
video devices. Users can manipulate the objects present in the world and have them
respond in real time, making the user feel as if he/she is operating in a real world [2].
Researchers have found that the requirements depend entirely on the system being
developed. Developing a virtual reality system needs in-depth knowledge of computer
graphics, sensing and tracking technology, graphic displays, and real time applications. A
virtual system must provide the following basic features [5]:
1) It must provide effective collaboration.
2) It must integrate existing applications of the virtual world in a uniform and user-adapted manner.
The requirements for building virtual reality tools can be classified into three groups
[2].
1) Application requirements (hardware and software requirements).
2) User requirements.
3) Practical requirements.
Application requirements include support for sensory input and output devices,
device independence, and multi-user support. User requirements include conventions in
navigation, realism, speed, and so on. Practical requirements include integration and
extensibility, as the virtual reality system needs to incorporate new devices as the domain
changes. This paper mainly focuses on user and practical requirements.
The main goal of virtual reality implementations is immersion. This makes the user
feel as if he/she is navigating in a real world. While building a virtual reality tool, it is
important to know the user’s expectations and interaction methods with the world. The
user requirements can further be classified as follows [9].
1) Ease of use: The length of time that users invest to learn how to use the system
depends on the complexity (the technical and non-technical aspects of the
system). Hence, the system should be appropriate for both technical and non-technical users.
2) Conventions in navigation: Developers need to follow certain standard
conventions documented by virtual reality researchers in order to develop the user
interface. The techniques adopted for navigation should be intuitive so that users
know how to get around the world.
3) Realism: Users have growing expectations about the level of realism. Use of 3D
representations makes the world look more real. Developers need to consider
what level of realism satisfies the user depending on the model to be accessed.
4) Method of interaction: Users employ various interactive devices to
communicate with the virtual world. A mouse, joystick, and keyboard are some of
the interactive devices. Users with disabilities need specific requirements like
LCD glasses, and gloves for interacting with the virtual world. Therefore, the use
of an appropriate interactive device depends on the tool developed.
5) Speed: Virtual reality applications should update images at high speed. Ideally,
this should be no slower than conventional video frame refresh rates. The speed
of the system is directly related to the performance of the system and depends on
the hardware and software used to develop the system. In some cases, the speed of
the system depends on the network connection.
Virtual reality systems integrate several components. Because of the variety of
application domains, a virtual reality system cannot be closed: new devices emerge, and
the need for specialized devices increases. As user expectations of the system increase,
the need for faster and more accurate devices likewise increases. Thus, the virtual system
architecture needs to be open, extensible, and modular [2].
The design of multi-user applications that can support thousands of users is one of
the most challenging areas in computer science. SIGGRAPH 96 (Special Interest Group on
Computer Graphics) had several panels, exhibits and discussions on multi-user virtual
communities. These forums introduced the notion of “avatar cyberspace” to the virtual
community.
The users in a virtual environment are termed avatars. Avatars navigate in the virtual
worlds, communicate with other avatars, and engage in a variety of collaborative acts. An
avatar is the body an agent wears in a virtual community [7]. It is a pictorial
representation of the user in a virtual world.
Users sometimes wish their environment to reflect their mood [15]. The emotions
exhibited by an avatar depend on the personality of the avatar [26]. It is desirable for the
virtual humans (avatars) in a virtual environment to possess personality and depict
emotions.
Communication becomes more effective if avatars display various body animations,
gestures, and facial animations, such as lip synchronization. Body animation needs to
incorporate body movement strategies such as solid foot contact, grasp, and interactions
between the agent’s body and objects within the environment [11]. They enrich and
clarify speech [22]. Lip synchronization is an important component of facial animation.
Lip synchronization provides techniques for synchronizing jaw and lip movements [22].
In summary, according to virtual reality researchers, virtual reality systems must be
extensible, easy to use, flexible, and provide effective communication to make the world
more realistic. These can be explained as follows:
1) Extensible: The need for specialized, fast and accurate devices arises as the
domain changes. Virtual reality systems need to be extensible to add new devices.
2) Integration with existing applications: It is desirable for the tool to integrate
with existing applications as it facilitates third party developers to develop more
interactive worlds.
3) Open environment: Some virtual reality systems support the appearance of a
new user, and some may not. An open system (allowing users to come and go) is
desirable if the system anticipates new users.
4) Non-verbal communication: It is desirable for the avatars in the world to
support non-verbal communication such as body animation, gestures, and lip
synchronization. Such non-verbal communication helps the user to depict his/her
mood and personality.
5) User control: Users have a different level of control for each tool. The user’s
control is essential to depict emotions, personality, mood, and manipulation of
objects in the world. The user’s level of control determines how easily the tool
can be used.
Apart from these requirements, we have identified additional key requirements in
order to analyze the existing models of virtual reality. They are as follows:
1) Demonstration: There should be a demonstration of the tool in order to support
the claims made by the developers of the tool. This helps the users or potential
buyers to analyze the performance, speed, flexibility, interactivity, and realism of
the tool.
2) Programming skills: Any specific programming skills the user may need to use
the tool should be stated upfront.
3) Avatar model: The avatar model helps the user to portray the environment in a
realistic way. For example, if the avatar is a 3D model, users feel a stronger sense
of presence in the environment.
4) Dynamic environment: In a virtual world environment, if either a new user or
the current users are aware of changes made to the environment, then the world is
described as dynamic.
5) Real time audio: The tool should support real time audio to enhance realism.
6) Avatar identity: The avatars should be identifiable within the environment.
7) Communication among avatars: Communication is essential in any virtual
reality environment, as it facilitates exchange of information. Developers use
different communication techniques based on the tool.
8) Software cost: The license and cost agreements needed to use the tool should be
reasonable.
3.3 Comparison of Virtual Reality Systems
In this section, we evaluate the existing virtual reality tools. They are NetICE,
Dive, Alpha wolves, Jack and Active worlds. We do our evaluation based on the key
issues addressed in Section 3.2.
3.3.1 NetICE
The Networked Intelligent Virtual Environment (NetICE) is a project developed by the
Advanced Multimedia Processing Lab at Carnegie Mellon University. It aims at
providing a virtual videoconference so that people from remote places can still feel that
they are communicating in person [20].
The website [25] provides a client-side executable file. The file can be
downloaded, and connection to the server can be established by using the specified IP
and port address. This provides the client with a virtual environment containing his/her
avatar. The avatar is represented as a 3D human body model able to simulate human
behaviors. Hence, the user does not need specific programming skills to use the
environment.
NetICE is based on a client-server architecture. The NetICE server maintains
the state of the system and distributes information to the clients. Each client is rendered a
3D audiovisual environment. The client can add his/her avatar to the environment, see the
virtual environment, and see the avatars of other users in the environment. The avatar can
change his/her position, move around the room, and raise his/her hands. The user can
choose his/her face model. The face model can be a realistic face model or a synthetic
face model [21]. The system does not support the animation of realistic face models, but
can animate synthetic face models.
Each avatar is identified by means of a nametag, which specifies the name of the
avatar. This can be achieved by labeling the avatar. The label will show up on the front
and back of the avatar’s shirt [21].
In NetICE, whenever a new avatar enters the virtual environment, the server sends
the current virtual environment information [21]. We, therefore, label the tool as open
and dynamic.
In NetICE, clients can exchange their information by using a shared white board.
The users can draw/write simultaneously on the shared white board. Communication is
made more interactive by sharing and manipulating the 3D virtual objects [21].
The website provides a demonstration of the virtual environment. Support for
basic facial expressions such as joy, anger, surprise, sadness, fear, and disgust is
provided. We have observed from the demonstration that the facial expressions are
limited, and the user is offered no control over them. Also, the lip movement is not
synchronized with the speech. The utterances are always in the same tone. In other
words, there is no emotion shown in the speech.
The system provides real time audio support by enabling the user to transmit
his/her own voice over the network, but this is not sufficiently demonstrated.
Currently, this tool can be used in virtual business conferences. This tool is a
possible replacement for instant messaging and online systems. Figure 3.1 shows a
screen shot of the NetICE virtual environment.
Figure 3.1: NetICE virtual environment
3.3.2 Dive
The Distributed Interactive Virtual Environment (Dive) [8] is a research prototype
developed at the Swedish Institute of Computer Science. Dive is a tool kit for developing
multi-user, distributed virtual environments. It aims at providing interaction among
networked participants in a multi-user application over the internet.
Dive is based on a distributed architecture, in which participants communicate by
both reliable and non-reliable multicast protocols. The main purpose of using a
distributed architecture (when compared to client-server architecture) is to reduce the
interaction time, thus avoiding network latency [10]. The participant is provided
with a virtual environment called a visualizer (the default environment is called Vishnu)
containing his/her avatar. The user can only control the navigation of the avatar by using
arrow keys. The avatar is represented as a 3D body model.
Dive is a hierarchical, shared, distributed database containing entities such as
nodes (3D objects used in the environment) and actors, and is written in ANSI C [10]. In
order to develop Dive applications and plug-ins, one needs to understand the Dive
libraries (written in C) and link them to the applications. We, therefore, conclude that
developers need programming knowledge to develop Dive applications.
Each participant in the Dive world is called an actor. Actors exchange information
by means of a memory shared over the network. Each actor is represented as a body-icon, distinguishing it from other actors. The user can use a variety of body-icons
provided by Dive. In Dive, actors enter and leave worlds dynamically, but there is no
description or demonstration available that shows how this can be achieved. It is claimed
that the dynamic behavior of objects can be obtained using TCL scripts. Hence, we label
the tool as being open and dynamic.
The website provides the demonstration of applications developed using the Dive
architecture. The actors can walk, jump, run, sit and turn their heads. Dive supports very
few body animations with no support for lip synchronization and facial animation.
Dive can be integrated with World-Wide-Web applications. Dive is a research
prototype and has no provisions for future enhancements.
Several virtual reality systems have integrated Dive in their systems. CyberMath,
a shared virtual reality system, is built on top of Dive. It aims at providing an exciting
and suitable way for exploring and teaching mathematics [35].
Dive’s distribution needs a license but the binaries are available for free. Binaries
provide the default Dive application (visualizer). This helps the developers to get an idea
of the Dive application.
Figure 3.2 shows a live shot of a conference using the Dive architecture.
Figure 3.2: Dive conference
3.3.3 Alpha Wolves
The Alpha Wolf is a project by the Synthetic Characters Group at the Massachusetts Institute of
of Technology Media Lab. It aims at implementing simple, common sense abilities found
in wolves. This system differs from other natural models by focusing on social learning
[3].
The Alpha Wolf system represents a pack of virtual wolves. The wolves in the
pack are 3D animated wolves. The pack consists of three wolf puppies that represent the
participants, and three fully autonomous adult wolves. Each participant plays the role of
one of the three puppies, which are black, white or gray in color. Participants direct the
behavior of wolves by howling, growling or barking into the microphone [36]. We,
therefore, conclude that users do not need programming skills to use the system.
The user has a high degree of control over the actions of the wolves. By howling,
barking, or whining, participants tell the pups to howl, growl, or bark, and thus interact
with other pups in the pack [36]. The users can tell their
pups where to go and with whom to interact by making various sounds over the
microphone. The system supports real time audio by having a microphone through which
users direct their pups.
The Alpha Wolf environment is a pack of virtual wolves. Wolves are not allowed
to enter or leave the world. By howling, growling, whining or barking, wolves form
social relationships with other wolves. The wolves remember their social relationships by
forming an emotional memory. The next time they meet, their remembered relationships
direct their interaction [36]. We, therefore, conclude the system is closed and dynamic.
In Alpha Wolf, the emotional state of a wolf is computed by the system, although
the documentation does not describe the computation in detail. A wolf can be in one of
two states, dominant or submissive. A wolf becomes dominant by growling constantly at
the other wolves, and submissive by whining at them. From the animation pictures
available on the website, we conclude that the animated wolves support some emotions,
such as happiness (when they meet a wolf of the same pack), fear (when a wolf
approaches a dominant wolf) and anger (when a wolf approaches a submissive wolf)
[36]. The wolves support body animations such as walking and running.
There are a number of ways in which the tool described here could be extended.
For example, the system could incorporate a more elaborate emotional model, or some
other means of providing various behaviors for the characters.
The website provides some pictures of animation. It provides a video
demonstrating the installation made at SIGGRAPH 2001. A screen shot from the
installation that shows social relationships among wolves is shown in Figure 3.3.
We believe that real world applications could benefit from this research, as it is an
example of a computational model of social learning.
Figure 3.3: Wolves forming social relationships
3.3.4 Jack
Jack [17] is a software tool developed by Electronic Data Systems Corporation. Using
Jack, one can build virtual environments, create virtual humans, position the humans in
the environment, assign tasks to them, and thus analyze their performance. A female
embodiment called Jill is also provided. Each avatar is represented as a 3D human body
model and is identified by name. The user can select male or female avatars, set the
position of an avatar in the environment, or use an existing channel for playback in the
environment.
Jack is a simulation-software solution that helps various industries improve the
ergonomics of product designs and workplace tasks. These virtual humans can give
engineers important ergonomic information, such as what they can see and reach, when
and why they are getting hurt, when they are tired, and so on. There is a demonstration
available on the website. From the demonstration, we observed that Jack supports various
body animations; the tool supports 77 of them. The virtual human does not show any
facial expressions. The tool provides a motion-capture tool kit that is used to generate
gestures; both basic and complex gestures can be generated with it.
Jack can be integrated with virtual environments. The tool can be extended to
create user-defined virtual humans by using a template. Jack performs the tasks that are
assigned to it; the software is used in mail sorting, factory work, automotive door
assembly, and so on. It cannot handle dynamic changes in the environment. Hence, we
describe the tool as closed and static.
The cost of the product and the license details are not specified. No sample code is
available to test the software. The website does not provide sufficient information about
how the tool works, its system requirements, or the programming skills needed to use it.
Figure 3.4 shows a screen shot of Jack improving the ergonomics of a product design.
Figure 3.4: Jack changing the tools of a machine
3.3.5 Active Worlds
Active Worlds [1] aims at building 3D virtual reality environments in which thousands of
users can chat. It also provides e-business services, such as online shopping in a 3D
virtual shop. Using Active Worlds, one can build a 3D virtual home on the Internet, as
well as play virtual games.
The Active Worlds plug-in can be downloaded from the website for free. After
installing the plug-in, one can enter an active world by creating a user name. This
provides the user with a virtual environment containing his/her avatar. The avatar is
represented as a 3D human body model. Active Worlds technology allows even a novice
user to build a custom 3D homepage; thus, a user without any programming skills can
use the environment.
Each user can add his/her avatar to the environment, see the environment, and see
the avatars of other users. The user can choose either a male or a female avatar as his/her
identity. The avatar can walk, run, jump, fly, and dance in the environment. The user can
control the avatar's movement and the features of the world (environment). Active
Worlds supports a set of emotions, but this support is not sufficiently demonstrated. The
system supports various body animations and claims to support facial animations, but
there is no support for lip synchronization.
Each avatar is distinguished from other avatars by having a user name which
appears as a text bubble. In Active Worlds, whenever a new avatar enters the
environment, the avatar does not get the current environment information. We, therefore,
conclude the tool is static and open.
In Active Worlds, avatars exchange information by means of a shared chat
window. The text the user types appears in a text bubble.
Many developers have used Active World’s extensive software development kit
to implement various virtual games. Using the algorithms provided by Active Worlds, the
tool could be extended to make the world look more interactive.
Active Worlds Corporation opens a new vista in the field of online
communication, collaboration, and interactive entertainment. Support is provided in the
form of an online help section. Visiting and chatting in Active Worlds as a tourist is free.
Users who need extra privileges in the environment may register as a citizen of an active
world by paying a nominal citizenship fee. Figure 3.5 shows a live shot of the
environment.
Figure 3.5: Active Worlds virtual environment
Table 3.1 lists the evaluation criteria and the performance of each tool based on the
criteria discussed in Section 3.2.
Table 3.1: Evaluation of existing virtual reality tools
(Entries for each criterion are given in the order: NetICE / Dive / Alpha Wolves / Jack /
Active Worlds.)

Demonstration: Provided / Provided / Pictures of animation are available / Provided /
Provided
Programming skills: Not required / Required / Not required / Not specified / Not required
Avatar model: 3D human body model / 3D body model / 3D animated wolf / 3D human
body model / 3D human body model
Extensible: No / No / Yes / Yes / Yes
Environment type: Open and dynamic / Open and dynamic / Closed and dynamic /
Closed and static / Open and static
Non-verbal communication: Limited / Very limited / Decent / Limited / Limited
Real-time audio: Provided / Not provided / Provided / Not provided / Not provided
User control: Limited / Very limited / High / Limited / Limited
Integration with existing applications: No / Yes / No / Yes / No
Communication among avatars: Shared white board / Shared memory / Memory / Not
applicable / Shared chat window
Avatar identity: Name tag / Body icon / Wolf color / Agent name / User name
Software cost: Not public / Needs license / Not public / Not specified / Free
3.4 Discussion
A demonstration serves as a working model of the tool. With the exception of Alpha
Wolves, all of the systems provide a demonstration of the tool. It is observed that users or
vendors who use the Dive tool need specific programming skills as the libraries are
written in C.
The ability to extend the world without destroying older objects allows
incremental development of a system. It is observed that Alpha Wolves, Jack, and Active
Worlds are extensible. The distributed architectures of Dive and Jack allow them to be
integrated with existing applications.
All the systems support a 3D avatar model. This makes the world more realistic to
the user. Each system uses a different naming scheme to identify its avatars. To aid
cooperation, sharing of views or mappings is necessary in a multi-user environment. It is
observed that each of the systems uses a communication technique specific to the tool.
The ability of the system to add new users and be able to dynamically rebind to a
new virtual world improves the flexibility of the system. The majority of the systems
provide support for adding new users. It is observed that NetICE and Dive are open and
dynamic systems. These systems are more flexible when compared to other systems.
Non-verbal communication plays an important role in collaborative systems. All
of the systems provide minimal support for non-verbal communication. It is desirable that
virtual reality systems operate in real time. It is observed that only NetICE and Alpha
Wolves operate in real time.
All of the systems considered provide support for user control, and it is desirable
that this control be high. In Alpha Wolves, the user has high control over the behavior of
the wolves. Active Worlds software is available for free. Dive binaries are available for
free, but distribution requires a license.
From our analysis, it is clear that each tool has been designed for a different
purpose. Based on our research, we feel that Dive and NetICE are more flexible when
compared to other tools.
3.5 Conclusions and Future Work
We have identified the basic requirements for building a virtual reality tool. To build a
virtual world, it is important for the system to be extensible, flexible and collaborative.
Additionally, we feel that a demonstration, license agreement, skill set, non-verbal
communication, user control, and avatar details must be provided to the user. We have
analyzed the current theories in virtual reality. We found that Dive and NetICE are more
flexible when compared to other tools.
As discussed in the previous sections, virtual worlds can be used for a wide range of
purposes. Therefore, research on the development of virtual reality must focus on the
following issues:
- How to provide media support that can handle text, voice, and audio inputs and
outputs.
- How to implement protocols related to social and cohesion aspects in a
community.
- How to ensure hardware and software interoperability.
CHAPTER 4
RECOMMENDATIONS AND CONCLUSIONS
This report has described an innovative method for developing conversation
structures by using stochastic context-free grammars. Stochastic context-free grammars
are used to generate multiple responses to express the strength of the feeling of the agent
during a conversation as well as specific parts of a response (such as rejection and
counter-proposal).
The report also analyzes the existing research in the field of virtual reality. Based
on opinions of key researchers and our own evaluations, we identify the key issues that
must be addressed for evaluating various virtual reality tools. Existing tools NetICE,
Dive, Alpha Wolves, Jack, and Active Worlds have been evaluated based on these key
requirements.
From our analysis of the tools, it is clear that the various tools were designed for
different purposes. We believe that some of the tools can be pipelined together to form a
meta-tool. This meta-tool can be used to generate more plausible animations.
Dive is a tool kit for developing multi-user, distributed virtual environments that
provides interaction among networked participants over the Internet. In Dive, avatars
support few body animations and no facial animations. Hence, we feel that this approach
cannot be used to generate responses based on facial expressions.
Alpha Wolves serves as a computational model for social learning. In this
approach, an avatar is represented as a 3D animated wolf. The responses would be natural
actions made by a wolf, such as howling, growling, barking, and whining. Hence, this
approach is not suitable to generate responses in the English language. Moreover, the
avatars support very few facial animations, such as happiness and fear.
Jack is a software tool used to improve the ergonomics of product designs and
workplace tasks. This simulation-software is used only to perform the tasks that are
assigned to it. Hence, this tool cannot be used to generate responses as conversations do
not play any role in this software tool.
Active Worlds is 3D virtual reality chat software. In Active Worlds, avatars enter
and leave the world dynamically. The responses generated by the various avatars are
random; that is, the responses are not intended for a particular participant. Therefore, we
cannot use this tool to model responses, as we cannot choose a sender and a receiver;
moreover, the avatars do not support any facial expressions.
NetICE is used for virtual business conferencing. The avatars in NetICE support
basic facial expressions such as joy, anger, surprise, fear, disgust and sadness. Based on
our research, we feel that Networked Intelligent Virtual Environment (NetICE)
architecture can be integrated with our technique for response generation to provide
an effective user interface.
Possible Integration of Techniques
NetICE contains a centralized multi-point control unit (MCU) to which each
user/participant sends data streams. The participant’s emotional information can be
gathered from the MCU. The participant’s personality information can be collected by
taking an IPIP-NEO personality survey. Facial expressions can be pipelined with
emotions and personality to generate output responses.
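The pipeline just described can be sketched as follows. The expression labels, trait names, and the softening rule below are hypothetical illustrations chosen for this sketch; they are not part of NetICE or of the IPIP-NEO survey itself.

```python
# Illustrative sketch only: combine a recognized facial expression with
# survey-derived personality scores to pick candidate output responses.
RESPONSES = {
    "angry":   ["No.", "I can't.", "I won't."],
    "neutral": ["Maybe.", "Let me think about it."],
    "happy":   ["Sure!", "Sounds good."],
}

def generate_responses(expression, personality):
    """Pick candidate responses for an avatar.

    expression:  label reported for the participant's face (assumed format)
    personality: dict of trait -> score in [0, 1], e.g. from a survey
    """
    candidates = RESPONSES.get(expression, RESPONSES["neutral"])
    # Assumed rule: a highly cooperative participant gets fewer blunt refusals.
    if expression == "angry" and personality.get("cooperation", 0.5) > 0.7:
        candidates = candidates[1:]  # soften: drop the flat "No."
    return candidates

print(generate_responses("angry", {"cooperation": 0.9}))
# -> ["I can't.", "I won't."]
```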
Figure 4.1: Avatar in NetICE showing an angry face.
For example, if the input is an angry facial expression, then the set of output
responses would be "No," "I can't," and "I won't."
The library of emotions supported by the avatars can be used to generate various
hand gestures. The user can control the hand movements of the avatars by choosing from
menu options.
Figure 4.2: Avatars in NetICE raising their hands.
We feel that generating hand movements based on personality/emotion values
would make the avatars more believable. For example, a participant with a high value of
assertiveness can seek the attention of others by making sharp hand gestures, such as
raising his hand. When the avatars are at a normal level of assertiveness, they can move
their arms down to the resting position.
We feel that applications can use personality/emotion to generate responses (the
method described above) to enhance the believability of avatars.
This report has described an innovative method for generating agent responses
based on stochastic context-free grammars. The various personality traits used by the
agents, and the effects they can have on a conversation, have been described. The
technique employs five personality traits: anger, sympathy, depression, cooperation and
assertiveness. We describe the rejection-language mapping based on these five
personality traits.
An important aspect of conversation modeling is to make the conversation reflect
the mood of the agent. The strength of the feeling of the agent can be expressed by
generating several responses. We have used stochastic context-free grammars to generate
responses by assigning probabilities to productions. Probabilities are determined by
personality attributes. In contrast to systems in which possible responses are hard-coded,
our system allows greater flexibility. We have developed a framework for generating
responses that can be extended with additional functionalities.
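The idea of assigning personality-driven probabilities to productions can be sketched as follows. The grammar, the phrases, and the single anger attribute below are illustrative assumptions for this sketch, not the report's actual production rules.

```python
import random

# Minimal stochastic context-free grammar sketch: the probability of choosing
# each production for the "REJECT" nonterminal is derived from a (hypothetical)
# anger value in [0, 1]; higher anger favors strong rejections.
def rejection_grammar(anger):
    return {
        "REJECT": [
            (anger,       ["STRONG"]),
            (1.0 - anger, ["MILD"]),
        ],
        "STRONG": [(0.5, ["No, I can't."]), (0.5, ["I won't."])],
        "MILD":   [(0.5, ["I'd rather not."]), (0.5, ["Maybe another time."])],
    }

def expand(symbol, grammar, rng):
    """Recursively expand a symbol, sampling productions by their weights."""
    if symbol not in grammar:          # terminal: emit the phrase as-is
        return symbol
    weights, bodies = zip(*grammar[symbol])
    body = rng.choices(bodies, weights=weights)[0]
    return " ".join(expand(s, grammar, rng) for s in body)

rng = random.Random(0)
grammar = rejection_grammar(anger=0.9)
# An angry agent mostly produces strong rejections:
for _ in range(3):
    print(expand("REJECT", grammar, rng))
```

Changing the personality value changes only the production weights, not the grammar itself, which is what gives the approach its flexibility compared with hard-coded responses.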
This report also analyzes the current literature in the field of virtual reality. We
have identified the basic requirements for building a virtual reality tool. To build a virtual
reality tool, it is important for the system to be extensible, open, and dynamic, and to
support realism and integration. Additionally, we feel that a demonstration, license
agreement, skill set, non-verbal communication, user control, and avatar details must be
provided to the user.
We have shown how a tool can be coupled with response generation to enhance
believability of agents.
REFERENCES:
[1] Active worlds (http://www.activeworlds.com active as of May 17, 2004).
[2] R. Blach, J. Landauer, A. Rosch and Andreas Simon, “A Highly Flexible Virtual
Reality System”, Special issue on virtual reality and issue in industry and research,
Volume 14, Issue 3-4, (1998),167-178,.
[3] B.Blumberg and B.Tomlinson, “AlphaWolf: Social learning, Emotion and
Development in Autonomous Virtual Agents”, First GSFC/JPL Workshop on Radical
Agent Concepts. NASA Goddard Space Flight Center, MD, (2002), 35-45.
[4] S.Bryson, S.Feiner, F.Brooks, P.Hubbard, R.Pausch and A.Dam, “Research frontiers
in Virtual Reality”, Proceedings of the 21st annual conference on Computer graphics and
interactive techniques, (July 1994), 473-474.
[5] P.Christiansson, "Capture of user requirements and structuring of collaborative VR
environments". AVR II & CONVR 2001, Conference on Applied Virtual Reality in
Engineering & Construction Applications of Virtual Reality. (eds: O. Tullberg, N.
Dawood, M. Connell. 201 pp.) Gothenburg October 4-5, (2001), 1 – 17.
[6] S.Cost, Y.Chen, T. Finin and Y.Labrou, “Using Colored Petri Nets for Conversation
Modeling”, Issues in Agent Communication, Frank Dignum and Mark Greaves (editors),
Springer-Verlag, Lecture Notes in AI, (2000),178 – 192.
[7] B.Damer, S.DiPaola, J.Paniaras, K.Parsons, B.Roel and M.Ma, “Putting a human face
on cyberspace (panel): designing avatars and the virtual worlds they live in”, Proceedings
of the 24th annual conference on Computer graphics and interactive techniques, (August
1997), 462-464.
[8] Dive. Distributed Interactive Virtual Environment (http://www.sics.se/dive/ active as
of May 17, 2004).
[9] K.Fernie and J.Richards, “Creating and Using Virtual Reality: A Guide for the Arts
and Humanities”, (http://vads.ahds.ac.uk/guides/vr_guide/ active as of May 17 2004).
[10] E.Frecon and M.Stenius, “DIVE : A scaleable network architecture for distributed
virtual environments”. Distributed Systems Engineering Journal, Special Issues on
Distributed Virtual Environments, (1998), 91- 100.
[11] J.Gratch, J.Rickel, E.Andre, N.Badler, J.Cassell and E.Petajan, "Creating Interactive
Virtual Humans: Some Assembly Required”, in IEEE Intelligent Systems, (July 2002),
54-63.
[12] J.Gratch, and S.Marshella “Using emotion to change belief”, In proceedings of the
1st International Joint Conference on Autonomous Agents and Multi-Agent Systems,
Bologona, Italy, (2002), 334-341.
[13] A.Glan and A.Baker, “Multi-Agent Communication in JAFMAS”, Autonomous
Agents Workshop on Specifying and Implementing Conversational Policies, Seattle,
Washington, (1999), 67-70.
[14] M.Greaves, H.Holmback and J.Bradshaw, “What is a conversation policy”, Lecture
notes in Computer Science, Issues in Agent Communication, (2000), 118-131.
[15] C.Greenhalgh, “Approaches to Distributing Virtual Reality Systems”, Technical
Report NOTTCS-TR-96-5, Department of Computer Science, The University of
Nottingham, (August 1996).
[16] C.Guinn, R. Hubal and G.Frank, “AVATALK Virtual Humans for Training with
Computer Generated Forces”, Proceedings of the Ninth Conference on Computer
Generated Forces. With G.A.Institute for Simulation & Training: Orlando, FL, (2000),
133 – 138.
[17] Jack. (http://www.plmsolutions-eds.com/products/efactory/jack active as of May 17,
2004).
[18] O.P.John, “The "Big Five" factor taxonomy: Dimensions of personality in the
natural language and in questionnaires”, In LA Pervin (Eds.), Handbook of personality:
Theory and research, New York: Guilford, (1990), 66- 100.
[19] S.Kshirsagar and N.Thalamann, “Virtual Humans Personified”, Proceedings of
Autonomous Agents Conference, (July 2002), 356- 359.
[20] W.Leung, K.Goudeaux, S.Panichpapiboon, B.Wang and T.Chen, “Networked
Intelligent Collaborative Environment (NetICE)”, IEEE International Conference on
Multimedia and Expo, New York, (July 2000), 1645 – 1648.
[21] W.Leung and T.Chen, “A Multi-User 3-D Virtual Environment with Interactive
Collaboration and Shared Whiteboard Technologies”, Journal of Multimedia Tools and
Applications, Special Issue on Distributed Multimedia Systems for Virtual Society, Vol.
19, No. 3, (April 2003), 7-23.
[22] W.Leung and T.Chen, "Creating a multi-user 3D environment", IEEE Signal
Processing Magazine, Vol 12, (May 2001), 9-16.
[23] P.Maes, R.Guttman and A.Moukas, “Agents that Buy and Sell in Communications
of the ACM” (Vol.42, Iss.3), ACM Press, NY, USA, (1999), 81- 93.
[24] J.Milde, “Lokutor: A Communicative Agent for Intelligent Virtual Environments”,
In International European Simulation Multi-Conference (ESM 2000), Ghent, (2000),
737-777.
[25] NetICE. Networked Intelligent Collaborative Environment
(http://amp.ece.cmu.edu/projects/NetICE active as of May 17 2004).
[26] Oz group (http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/oz/web/papers/CMUCS-97-156.html active as of May 17 2004).
[27] V.Parunak, “Visualizing agent conversations: Using Enhanced Dooley graphs for
agent design and analysis”, In Proceedings of the Second International Conference on
Multi-Agent Systems, (1996), 275-282.
[28] C.Regan, “An investigation into Nausea and Other Side-effects of Head-coupled
Immersive Virtual Reality”, Virtual Reality Vol 1, No 1, (1995) 17-32.
[29] S.Reilly, “Believable Social and Emotional Agents”, Technical Report
CMU-CS-96-138, School of Computer Science, Carnegie Mellon University, (1996).
[30] J.Rickel and L.Johnson, “STEVE: A Pedagogical Agent for Virtual Reality (video)”
in Proceedings of the Second International Conference on Autonomous Agents, (May
1998), 332-338.
[31] J.Rickel and L.Johnson, “Task Oriented Dialogs with Animated Agents in Virtual
Reality”, Proceedings of the First Workshop on Embodied Conversational Characters,
(October 1998), 39 – 46.
[32] R.Sander, “Conversational Coherence: Form, Structure and Strategy”, Sage
Publications, (1983), 67-80.
[33] J.Searle, “Speech Acts: An Essay in the Philosophy of Language” Cambridge
University Press, Cambridge, England, (1969).
[34] A.Stolcke, “Bayesian Learning of Probabilistic Language Models”, Ph.D.
dissertation, University of California at Berkeley, (1994).
[35] G.Taxén and A.Naeve, “CyberMath: Exploring Open Issues in VR-Based Learning”,
SIGGRAPH 2001 Educators Program. In SIGGRAPH 2001 Conference Abstracts and
Applications, (2001), 49-51.
[36] B.Tomlinson, M.Downie, M.Berlin, J.Gray, D.Lyons, J.Cochran and B. Blumberg,
“Leashing the AlphaWolves: Mixing User Direction with Autonomous Emotion in a
Pack of Semi-Automous Virtual Characters”, Proceedings of the 2002 ACM SIGGRAPH
Symposium on Computer Animation, (2002), 7-14.
[37] VR: Virtual reality definition
(http://www.foresight.org/UTF/Unbound_LBW/Glossary.html active as on May 17,
2004).
[38] T.Wagner, B.Benyo, V.Lesser and P.Xuan, “Investigating interactions between
agent conversations and agent control components”, Agents 99 Workshop on
Conversation Policies, (1999), 314- 331.
[39] G.Weiss, “Multiagent Systems: A Modern approach to Distributed Artificial
Intelligence”, MIT Press, Cambridge, Massachusetts, (1999), 27-60.
[40] M.Wooldridge and N.Jennings, Book on “Agent Technology, Foundations,
Applications and Markets”, Springer-Verlag, (1998), 4 – 7.