Multimodal Communication
in the Staging Virtual farm
Patrizia Paggio and Bart Jongejan
Center for Sprogteknologi
MUMIN workshop
Helsinki 2002
The Staging project
(www.staging.dk)
Interdisciplinary Danish project investigating the nature and use of 3D
applications populated with autonomous agents.
CST’s work: multimodal communication components
of a 3D virtual farm.
Focus: multimodal integration, mixed-initiative
dialogue, interaction between dialogue and other
behaviours.
Paggio and Jongejan - Helsinki ‘02
2
The Staging VE
The VE
is in charge of simulating the world
provides the agents with sensory information
processes requests from the agents (move objects,
produce sounds, play animations)
Staging VE developed at CVMT (Aalborg University)
CST has developed a mock-up for testing purposes.
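The VE's duties toward the agents (sensory information, processing requests) can be pictured as a small dispatcher. This is a minimal sketch of what a testing mock-up might look like, assuming a hypothetical `MockVE` class, request names and a distance-based notion of "sensing". It is not the CVMT or CST interface, and world simulation itself is left out.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

Pos = Tuple[float, float]


@dataclass
class MockVE:
    """Hypothetical stand-in for the VE: object positions plus a log of effects."""
    positions: dict = field(default_factory=dict)   # object name -> (x, y)
    log: list = field(default_factory=list)         # sounds/animations produced

    def sense(self, agent_pos: Pos, radius: float) -> list:
        """Sensory information: names of objects within `radius` of the agent."""
        ax, ay = agent_pos
        return sorted(
            name for name, (x, y) in self.positions.items()
            if (x - ax) ** 2 + (y - ay) ** 2 <= radius ** 2
        )

    def handle(self, request: str, target: str, arg: Optional[Pos] = None) -> None:
        """Process an agent request: move an object, produce a sound or play an animation."""
        if request == "move":
            self.positions[target] = arg
        elif request in ("sound", "animation"):
            self.log.append(f"{request}:{target}")
        else:
            raise ValueError(f"unknown request {request!r}")
```

For example, after `ve.handle("move", "pitchfork", (1.0, 0.0))` an agent standing at the origin with a sensing radius of 2 would perceive the pitchfork as well as a cow placed there.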
Agents
Agents carry out behaviours
in reaction to external stimuli and according to their
inner state (hunger, tiredness…)
based on the strength of their activation level
Engaging in a dialogue with the user’s avatar is also a
behaviour.
Dialogue behaviour has a strong degree of activation for
the farmer agent.
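Behaviour selection by activation level can be sketched as follows. This assumes, for illustration only, that a behaviour's activation is a weighted sum of stimulus and inner-state values and that the single most active behaviour wins; the weights and the names `activation`, `select_behaviour` and `FARMER` are hypothetical, not the Staging agent model.

```python
def activation(weights: dict, state: dict) -> float:
    """Weighted sum of the stimuli / inner-state values a behaviour responds to."""
    return sum(w * state.get(k, 0.0) for k, w in weights.items())


def select_behaviour(behaviours: dict, state: dict) -> str:
    """Return the behaviour with the highest activation level."""
    return max(behaviours, key=lambda b: activation(behaviours[b], state))


# Hypothetical farmer agent: dialogue is weighted strongly, so it wins
# whenever the user's avatar addresses the farmer.
FARMER = {
    "eat": {"hunger": 1.0},
    "sleep": {"tiredness": 1.0},
    "dialogue": {"user_speaking": 3.0},
}
```

With `{"hunger": 0.6, "tiredness": 0.4}` the farmer eats; adding `"user_speaking": 1.0` makes dialogue dominate, which is one way to read the strong activation of dialogue behaviour.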
The Aalborg VE
The CST farm
[A picture of our VE is to be shown here]
Multimodal communication
User can interact with agents via various devices:
microphone, keyboard, touch screen, data glove.
Commercial speech technology, dedicated gesture
recogniser (Karin Husballe Munk at CVMT).
Speech can be combined with deictic, iconic and turn-taking gestures (Cassell and Prevost 1996). Gestures
and speech are merged by a multimodal parser.
Multimodal integration
Figure: multimodal integration pipeline.
Inputs: speech (speech recognition) and hand movements (gesture recognition: pointing, size, turn-taking).
Stages: chart initialisation, parsing, semantic mapping, communication management, action.
More integration
Gesture and word are paired:
Feed that cow$1|cow
Gesture adds information to lexicon entry.
Word and gesture must be (nearly) synchronous
Syntactic constraints:
deictic (pointing) requires noun or pronoun
iconic (size) requires noun
Semantic constraints:
semantic types must be compatible
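The three pairing constraints can be checked in sequence. This is a minimal sketch, assuming a hypothetical type hierarchy, a hypothetical 0.5 s tolerance for "(nearly) synchronous", and our own `Word`/`Gesture` records; the actual multimodal parser works on a chart, which is not modelled here.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical type hierarchy (child -> parent) and synchrony tolerance.
PARENT = {"cow": "animal", "sheep": "animal", "apple": "food", "carrot": "food"}
MAX_LAG = 0.5  # seconds; an assumed value for "(nearly) synchronous"

# Syntactic constraint: which word categories each gesture kind may attach to.
ALLOWED_CATS = {"pointing": {"noun", "pronoun"}, "size": {"noun"}}


def ancestors(t: str) -> list:
    chain = [t]
    while t in PARENT:
        t = PARENT[t]
        chain.append(t)
    return chain


def compatible(a: str, b: str) -> bool:
    """Two semantic types are compatible if one subsumes the other."""
    return a in ancestors(b) or b in ancestors(a)


@dataclass
class Word:
    form: str
    cat: str                 # e.g. "noun", "pronoun", "det"
    semtype: Optional[str]   # None for pronouns and function words
    time: float              # onset in seconds


@dataclass
class Gesture:
    kind: str                # "pointing" or "size"
    value: str               # object type for pointing, size label for size
    time: float


def can_pair(w: Word, g: Gesture) -> bool:
    """Check the timing, syntactic and semantic pairing constraints."""
    if abs(w.time - g.time) > MAX_LAG:
        return False                              # not (nearly) synchronous
    if w.cat not in ALLOWED_CATS.get(g.kind, set()):
        return False                              # syntactic constraint
    if g.kind == "pointing" and w.semtype is not None:
        return compatible(w.semtype, g.value)     # semantic constraint
    return True
```

In "Feed that cow$1|cow", a pointing gesture at an object of type `cow` pairs with the noun "cow"; the same gesture would not pair with the determiner "that".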
Example
Feed that cow$1|animal.
pointgesture := <object-type>$<internal-id>
{act=request,
predicate=feed,
arg3={reln=animal,
semtype=animal,
objectid=cow$1}}
The reln and the object type are unified, the semtypes are compatible,
and the objectid is added.
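The merge step on this slide can be sketched over plain dictionaries. This is a simplified stand-in for unification, assuming a hypothetical type hierarchy and a function name (`merge_pointing`) of our own: it keeps the word's reln, requires reln and semtype to be compatible with the gesture's object type, and adds the gesture's identifier as `objectid`, as in the example frame.

```python
from typing import Optional

PARENT = {"cow": "animal", "sheep": "animal", "apple": "food"}  # hypothetical hierarchy


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or one of its descendants."""
    while specific != general and specific in PARENT:
        specific = PARENT[specific]
    return specific == general


def compatible(a: str, b: str) -> bool:
    return subsumes(a, b) or subsumes(b, a)


def merge_pointing(arg: dict, gesture: str) -> Optional[dict]:
    """Merge a pointing gesture '<object-type>$<internal-id>' into an argument frame."""
    obj_type, sep, _ = gesture.partition("$")
    if not sep:
        raise ValueError(f"not a pointing gesture: {gesture!r}")
    if not (compatible(arg["reln"], obj_type) and compatible(arg["semtype"], obj_type)):
        return None                      # incompatible: no merged reading
    return {**arg, "objectid": gesture}  # reln kept, objectid added (arg not mutated)
```

Merging `{"reln": "animal", "semtype": "animal"}` with `cow$1` yields the arg3 frame shown above; a frame for "apple" yields `None`.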
Contradiction example
Feed that cow$1|apple.
{act=request,
predicate=feed,
arg3={reln=animal,
semtype=animal,
objectid=cow$1}}
The semantic types of the gesture and the noun are incompatible; only the
interpretation provided by the gesture is compatible with
the semantics of the predicate, so it alone survives.
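The conflict can be resolved by letting the predicate filter the competing readings. This is a minimal sketch, assuming a hypothetical selectional-restriction table `ARG3_SEMTYPE` and exact semtype matching (a real system would likely use subsumption); the two candidate readings are copied from the slide.

```python
# Hypothetical selectional restriction: the semtype each predicate requires of arg3.
ARG3_SEMTYPE = {"feed": "animal"}


def interpret(act: str, predicate: str, readings: list) -> list:
    """Build one frame per arg3 reading that the predicate accepts."""
    wanted = ARG3_SEMTYPE.get(predicate)
    return [
        {"act": act, "predicate": predicate, "arg3": r}
        for r in readings
        if r["semtype"] == wanted
    ]


# "Feed that cow$1|apple": the noun and the gesture give conflicting readings.
from_noun = {"reln": "apple", "semtype": "food"}
from_gesture = {"reln": "animal", "semtype": "animal", "objectid": "cow$1"}
```

Calling `interpret("request", "feed", [from_noun, from_gesture])` keeps only the gesture-based frame, matching the surviving interpretation above.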
Examples
Deictic gestures
U: Feed an animal, please. A: Which animal shall I feed?
U: Take that cow (+ pointing)
Iconic gestures
U: Feed the sheep, please. A: Which food shall I take?
U: The small apple (+ size)
Turn-giving and taking gestures
U: Hi (+give turn) A: Shall we...
The Communication Manager
Interprets the user’s dialogue moves
Builds dialogue trees
Interprets references not resolved by gestures
Decides the agent’s dialogue moves based on the preceding
dialogue and on changes in the VE
Dialogue goals arising from the scenario are combined with
dialogue obligations created by the preceding dialogue.
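One way the combination of goals and obligations might drive the agent's next move is sketched below. The priority order (discharge the most recent obligation first, otherwise take initiative on a scenario goal) and the move labels are assumptions for illustration, not the Communication Manager's documented policy; reference resolution and VE changes are not modelled.

```python
from collections import deque


def next_agent_move(obligations: list, goals: deque) -> str:
    """Choose the agent's next dialogue move.

    Assumed priority: first discharge the most recent obligation created by the
    preceding dialogue, then take the initiative on a pending scenario goal.
    """
    if obligations:
        return obligations.pop()             # most recent obligation first
    if goals:
        return "propose:" + goals.popleft()  # e.g. "Shall I feed ...?"
    return "wait"
```

With an open request to accept and a pending feeding goal, the agent first accepts and only then proposes the feeding, in the spirit of the agent-initiative example later in the talk.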
Dialogue goals
Dialogue goals are created from domain-specific
action templates (Badler et al. 1999).
A template specifies an action together with its semantic
arguments, the corresponding attribute names in the
semantic representation, and the necessary preconditions.
FeedAction(Topic=Feed, Animal=<arg3>,
Food=<arg2>, Tool=<instr>,
Precondition=Hungry(Animal))
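The FeedAction template can drive slot-filling questions and the precondition check. This is a minimal sketch, assuming a hypothetical `ActionTemplate` class, a world state that records hunger per object id, and that the precondition is checked only once all slots are filled; the question wording and slot order follow the feeding dialogue in the talk.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ActionTemplate:
    topic: str
    slots: dict                                    # slot name -> semantic attribute
    precondition: Callable[[dict], Optional[str]]  # complaint, or None if satisfied


def next_move(template: ActionTemplate, semantics: dict, world: dict) -> str:
    """Ask for the first unfilled slot, then check the precondition, else accept."""
    for slot, attr in template.slots.items():      # dicts keep insertion order
        if attr not in semantics:
            return f"Which {slot.lower()} shall I take?"
    problem = template.precondition({**world, **semantics})
    return problem or "Okay, I'll do it."


# Hypothetical encoding of FeedAction(Animal=<arg3>, Food=<arg2>, Tool=<instr>,
# Precondition=Hungry(Animal)); the object type is the part before "$".
FEED = ActionTemplate(
    topic="Feed",
    slots={"Animal": "arg3", "Food": "arg2", "Tool": "instr"},
    precondition=lambda s: None if s["hungry"].get(s["arg3"], False)
    else f"The {s['arg3'].split('$')[0]} is not hungry.",
)
```

Filling `arg3`, `arg2` and `instr` one at a time reproduces the sequence of agent questions in the feeding example; with a cow that is not hungry, the final move becomes the precondition complaint instead of the acceptance.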
Example: feed action
U: Hi, come here.
A: Okay, I’ll do it.
U: Feed an animal.
A: Which animal shall I take?
U: That cow$1|cow.
A: Which food shall I take?
U: (Take) a small$|small apple.
A: Which tool shall I take?
U: Take the pitchfork.
A: Okay, I’ll do it.
Example: precondition not met
U: Give that brown cow$2|cow an apple, please.
...
A: The cow is not hungry.
Example: agent initiative
A: Shall I feed the brown cows and the sheep?
U: Yes, give the animals a carrot.
A: Which tool shall I take?
U: The pitchfork.
A: Okay, I’ll do it.
Dialogue obligations
A set of condition/obligation pairs models valid speech-act
sequences.
E.g.: Request/Accept, Reject
Whque/Answer, Inform
Used to
produce a correct reaction to a user move
interpret a user move as either closing a dialogue
segment or opening a new one
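The two uses of the condition/obligation pairs can be sketched with a lookup table. The table contents come from the slide; the function names and the string labels for speech acts are hypothetical.

```python
from typing import Optional

# Condition/obligation pairs from the slide: a move of the key type obliges
# the other participant to respond with one of the listed types.
OBLIGATIONS = {
    "request": {"accept", "reject"},
    "whque": {"answer", "inform"},
}


def expected_reactions(move: str) -> set:
    """Moves that would be a correct reaction to `move`."""
    return OBLIGATIONS.get(move, set())


def classify(move: str, open_condition: Optional[str]) -> str:
    """Does `move` close the currently open segment or open a new one?"""
    if open_condition is not None and move in expected_reactions(open_condition):
        return "closes"
    return "opens"
```

An `inform` after an open `whque` closes that segment, whereas a counter-question (`whque` after `whque`) opens a new, embedded one.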
Dialogue trees
(request) U: Give the white cow an apple, please.
(whque) A: Which tool shall I use?
(whque) U: Where is the pitchfork?
(inform) A: The pitchfork is in front of the tree.
(request) U: Take the pitchfork then.
(accept) A: Okay, I’ll do it.
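A dialogue tree like this can be built with a stack of open segments. This is a minimal sketch, assuming that a move either closes the innermost open segment (when the condition/obligation pairs allow it) or opens a new segment nested under it; the `Node` class and `build_tree`/`render` helpers are hypothetical, not the Communication Manager's data structures.

```python
from dataclasses import dataclass, field

OBLIGATIONS = {"request": {"accept", "reject"}, "whque": {"answer", "inform"}}


@dataclass
class Node:
    act: str
    text: str
    children: list = field(default_factory=list)


def build_tree(moves: list) -> Node:
    root = Node("dialogue", "")
    stack = [root]                            # open segments, innermost last
    for act, text in moves:
        node = Node(act, text)
        top = stack[-1]
        top.children.append(node)
        if top is not root and act in OBLIGATIONS.get(top.act, set()):
            stack.pop()                       # move closes the innermost segment
        elif act in OBLIGATIONS:
            stack.append(node)                # move opens a nested segment
    return root


def render(node: Node, depth: int = 0) -> list:
    lines = []
    for child in node.children:
        lines.append("  " * depth + f"{child.act}: {child.text}")
        lines.extend(render(child, depth + 1))
    return lines


MOVES = [
    ("request", "U: Give the white cow an apple, please."),
    ("whque", "A: Which tool shall I use?"),
    ("whque", "U: Where is the pitchfork?"),
    ("inform", "A: The pitchfork is in front of the tree."),
    ("request", "U: Take the pitchfork then."),
    ("accept", "A: Okay, I'll do it."),
]
```

Under these strict pairs, "Take the pitchfork then" opens a new request nested under the agent's question instead of answering it; treating it as an answer is the kind of coercion discussed on the "Relaxing the rules" slide.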
Relaxing the rules
Condition/obligation pairs do not always fit.
Speech acts can be implied:
A: Hi
U: Feed the animals please
They can be coerced:
U: Feed an animal.
A: Which animal shall I take?
U: Feed the brown cow then.
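Both relaxations can be sketched as small reinterpretation rules applied before the strict condition/obligation check. The rules, the `greet` label and the function names are hypothetical illustrations: a request that supplies the slot the agent just asked about is coerced into an answer, and an unanswered greeting is treated as implicitly closed by the user's next move.

```python
from typing import Optional


def coerce(open_whque_slot: Optional[str], act: str, semantics: dict) -> str:
    """Reinterpret a request as an answer when it fills the slot asked about."""
    if open_whque_slot is not None and act == "request" and open_whque_slot in semantics:
        return "answer"
    return act


def discharge_implied(open_act: Optional[str], act: str) -> Optional[str]:
    """An unanswered greeting is implicitly closed by the user's next move."""
    if open_act == "greet" and act != "greet":
        return None        # greeting obligation considered implicitly met
    return open_act
```

"Feed the brown cow then" supplies arg3 while the agent is asking "Which animal shall I take?", so it is coerced into an answer; after "A: Hi", the user's request implicitly closes the greeting.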
Conclusions
Staging has made an initial attempt at giving an agent
multimodal dialogue abilities to allow for mixed-initiative dialogues.
Future research:
more advanced gesture recognition
better understanding of how gestures and speech
can complement each other
repairs and self-repairs
interaction between dialogue and other behaviours
FILM