Interaction within Multimodal Environments in a Collaborative Setting
Glenn A. Martin, Jason Daly and Casey Thurston
Institute for Simulation & Training, University of Central Florida
3280 Progress Dr., Orlando, FL 32826
{martin, jdaly, cthursto}@ist.ucf.edu
Abstract
Much attention has been given to the topic of multimodal environments of late. Research in haptics is thriving
and even some olfactory research is appearing. However, while there has been research in collaborative
environments in the general sense, minimal focus has been given to the interaction present in multimodal
environments within a collaborative setting. This paper presents research into a network architecture for
collaborative, multimodal environments with a focus on interaction.
Within a dynamic world three types of human interaction are possible: user-to-user, world-to-user, and user-to-world. The first two form what are fundamentally multimodal environments; the user receives feedback using
various modalities both from other users and the world. The last one forms the area of dynamic environments where
the user can affect the world itself. The architecture presented here provides a flexible mechanism for studying
multimodal environments in this distributed sense. Within such an architecture, latency and synchronization are key
concepts when bringing multimodal environments to a collaborative setting.
1 Introduction
Virtual environments have existed for many years. However, despite numerous distributed simulations, including
training uses in many domains, there still is relatively little interaction within these virtual environments. Most
allow only navigation and perhaps some minimal interaction. For example, military simulations allow basic
combat-oriented interaction (shooting, throwing a grenade, etc.), but only a few of them incorporate advanced
modalities (such as haptics or olfactory) or dynamic changes to the environment itself.
2 Types of Interaction
Within a distributed, interactive virtual environment there are many components to address: the user, the
environment and other users. In categorizing the types of interaction we have split them into four categories: user-to-user, world-to-user, user-to-world and world-to-world. The first two types form what are fundamentally
multimodal environments. Whether from another user or from the world, the user gets feedback from the various
modalities. In the case of user-to-user it may be grabbing a shoulder or shooting an opponent. For world-to-user it
may be bumping into a piece of furniture in the virtual environment.
The latter two types of interaction form what we call dynamic environments. For example, the user can act upon the
world by moving an object or changing the characteristics of an object (e.g. watering a virtual plant, making the soil
damper). Similarly, within a world-to-world interaction a rain model might change the soil moisture. In this paper
we will focus on the user-to-world interactions, however.
3 Multimodal Environments
In terms of virtual environments, a multimodal environment is one that the user experiences and interacts with using
more than one sense or interaction technique. Strictly speaking, an environment that provides visual and auditory
feedback is a multimodal environment; however, the more interesting environments are those that employ haptic
and/or olfactory feedback in addition to visual and auditory. Gustatory, or taste feedback, is also possible, but there
is almost no evidence of this in the literature, so we will not discuss it here. On the interaction side, systems that
take input using more than one technique can also be considered multimodal. For example, a system that provides a
joystick for locomotion as well as a speech recognition system for accepting commands to alter the environment can
be called a multimodal environment. In this section, we focus on the senses and how the user experiences a
multimodal environment, as well as a software architecture that supports the modeling and design of multimodal
environments in a seamless way. Though it is difficult to develop a clear metric to determine whether adding more senses contributes to a greater sense of presence or immersion, studies suggest that it does (Meehan, Insko, Whitton & Brooks, 2002). Also, Stanney et al. (2004) have shown that adding sensory channels allows the user to process more data than the same amount of information presented over a single sensory channel.
3.1 Visual
Apart from a few specialized applications, a virtual environment always provides visual feedback. The visual sense
is essential for navigation and detailed exploration of the environment. While other senses may introduce an event,
entity, or object to investigate, it is ultimately the visual sense that will be used to analyze and deal with the object of
interest. For this reason, an environment designed to be multimodal cannot ignore or diminish the importance of
vision, and an appropriate amount of resources should be spent on creating high-fidelity visuals. Visual feedback is
essential to all types of collaborative interaction. Signals, gestures, and other visual interaction allow user-user
collaboration. The primary means for the user to experience the world is through visual exploration. Finally, the
ability for the user to visually perceive the world is a prerequisite for user-world interaction.
3.2 Auditory
Most modern environments also provide some level of auditory feedback. It may be as simple as a beep or click
when a certain event happens, or it may involve realistic, spatialized sounds coordinated with virtual objects and
enhanced with environmental effects. The auditory sense is a very useful and natural channel for informing the user
of events that occur outside his or her field of view. Simple speech is probably the most natural user-to-user
interaction in a collaborative environment. With a high-fidelity acoustic rendering, the user may also be able to
identify and keep track of one or more sound sources without relying on vision (some examples are a bird in a tree, a
splashing stream, or gunshots from a sniper positioned in a second-story window). Also, environmental effects such
as reflection and reverberation can give clues to the size and composition of the immediate environment or room.
These are two examples of world-user interactions using the auditory channel.
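As a rough illustration of the source/listener model described above, the following Python sketch computes a distance-based gain for one sound source relative to the listener, similar to the inverse-distance attenuation model used by common spatial audio libraries. The function and parameter names are our own for illustration, not any particular library's API.

```python
import math

def distance_gain(source_pos, listener_pos,
                  ref_distance=1.0, rolloff=1.0, max_distance=100.0):
    """Inverse-distance attenuation, clamped to [ref_distance, max_distance]."""
    dx, dy, dz = (s - l for s, l in zip(source_pos, listener_pos))
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    dist = min(max(dist, ref_distance), max_distance)   # clamp to the valid range
    return ref_distance / (ref_distance + rolloff * (dist - ref_distance))

# A source 10 m away is rendered at roughly one tenth of its original gain.
print(distance_gain((10.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```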
3.3 Haptic
Haptics, or the sense of touch, has more recently been introduced into virtual environments. The term “haptic
feedback” covers many different types of feedback. One is tactile feedback, which consists of physical stimuli to
alert the user’s sense of touch to an event or environmental feature. Tactile feedback can take the form of vibrators
for general tactile events, thermal heating pads for providing thermal feedback, or electromechanical tactile displays
for allowing the user to feel the physical characteristics of a surface. The other form of haptics is force feedback.
This type of feedback physically exerts force on the user’s body or resists his movements. Force feedback allows
the user to grasp virtual objects and feel their structure or to explore a deformable virtual surface in high detail with
a stylus. Haptics open up a whole new range of user-user interactions. Using a force-feedback device one user can
grasp or tap another user’s shoulder, which the other user may perceive via vibrotactile feedback. This kind of nonverbal communication may be useful for soldier training simulations. Similarly, a wide range of world-user
interactions open up with haptics. Haptics may allow a user to feel low-frequency vibrations from explosions, detect
heat coming from a fire behind a closed door, or feel the bones underneath the skin in a medical simulation. Highly
realistic user-world interactions can also be facilitated by haptic devices.
3.4 Olfactory
Olfaction, or the sense of smell, is just beginning to appear in multimodal environments. The psychological effects
of certain scents are well-documented in the literature, so one could infer that these effects, as well as others, may be
useful for creating more realistic environments, or for creating certain desired effects. Technologically, olfaction is
somewhat restricted in that any scents to be presented must be packaged and delivered using some form of
technology. While these technologies have progressed somewhat in recent years, they still can only deliver a few
distinct scents each. It would be desirable to be able to use a small set of basic scents that could combine to create a
wide variety of compound scents. However, no conclusive work has been done to accomplish this in a general
sense. Another issue is that scents presented to the user must be able to dissipate quickly or otherwise be removed
from the environment. If scents are allowed to linger and combine, they could lead to simulator sickness issues,
especially if the scents used are relatively strong. In terms of interaction, olfaction can be a particularly powerful
world-user interaction method if the scent used has a natural association with the virtual scent source.
3.5 Software Architecture
Each modality requires a certain software technique to efficiently model the world and present it to the user. The
visual sense is well-studied and several techniques are available to handle visual modeling and rendering. The scene
graph is one such technique and works well in a general-purpose software system. The auditory sense has also been
dealt with in detail, and several different audio libraries have emerged as a result. All of them tend to model the
world using a single listener and many sound sources, providing essential features such as spatialization and
attenuation by distance. Depending on the library, other features such as reflections, reverberation, and occlusion
may be available as well. Haptics is not nearly as well-developed as the previous two modalities, and the task of
characterizing haptic interactions and providing a generic software platform to support these interactions is quite
challenging. Fundamental to haptic interactions is collision detection, so a large part of a haptic system would
require surface modeling (similar to visual modeling) and intersection tests between surfaces. Furthermore, because
haptic interactions require very high frame rates (around 1000 Hz), these intersection tests and collision responses
must be handled very efficiently. Finally, olfaction is very new to virtual environments; however the techniques
required are very similar to those used by the auditory sense. The same source/detector paradigm works well for
sources of scent in an unobstructed environment. When the environment is divided into different rooms, occlusion
of scent sources may be needed (depending on the “ventilation” properties of the virtual environment).
To deal with multiple modalities at once, it helps to have a software architecture that models the world as a seamless
whole, but contains all the necessary features to produce the appropriate stimuli over the various modalities. For
example, a virtual object is typically visible, but it may also produce sound or a scent, and it may be possible to
grasp it or throw it at a different user. In a system which models the world separately for each modality, this would
require two distinct models for the visual and haptic senses, plus coordination of the motion of the visual and haptic
models, as well as the positions of the object’s sound source and scent source. With a unified architecture, the
virtual object can be represented as a single object, containing attributes that enable it to be rendered over the
various modalities. A geometric representation can be used for both the visual and haptic senses (although the
haptic model may need to be simplified for fast haptic rendering). All the parameters of a sound source can be
encapsulated into another attribute, while the scent parameters for the scent source can be carried by a third attribute.
On the user’s side, the structure is similar. Attributes encapsulating the user’s viewpoint, sound listener, and scent
detector are attached to the virtual object which represents the user (the user’s avatar). See Figure 1.
[Figure 1 diagram: a scene graph containing an Avatar component (with Viewpoint, Sound Listener, and Scent Detector attributes plus Geometry) and an Object component (with Geometry, a Sound Source carrying Sound Data, and a Scent Source carrying a Scent).]
Figure 1: Attaching attributes to scene components
This structure is essentially an augmented scene graph. Besides the normal scene graph hierarchy of scene, groups,
and geometry, there are various attributes that can be attached to the components of the scene graph. Certain
components in the scene form the root of subgraphs that represent virtual objects of interest. To give the virtual
object multimodal features, we need only to create and attach the multimodal attributes that we need. After the
scene graph is manipulated in preparation for drawing a new visual frame, the various multimodal attributes are
updated based on the latest position, orientation, and other relevant state of the component to which they are
attached. Thus, the visual rendering is handled just as it would be in a visual-only scene graph system. Audio
rendering is updated based on the new global positions and orientations of the virtual object and listener, and
environmental effects can be set based on attributes located elsewhere in the scene. Scent generation is done in a
similar manner, so the strength of scent deployment is scaled based on the distance between the user and the various
scent sources.
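The following Python sketch illustrates the attribute-augmented scene graph idea of Figure 1. The class and method names are hypothetical, chosen only to show the shape of the design (they are not the actual VESS API): multimodal attributes attach to scene components and are refreshed from the component's world position after each visual update.

```python
class Attribute:
    """Base class for modality-specific attributes attached to scene components."""
    def update(self, world_position):
        pass

class SoundSourceAttribute(Attribute):
    def __init__(self, sound_data):
        self.sound_data = sound_data
        self.position = (0.0, 0.0, 0.0)
    def update(self, world_position):
        # a real system would pass the new position on to the audio library here
        self.position = world_position

class ScentSourceAttribute(Attribute):
    def __init__(self, scent, strength):
        self.scent, self.strength = scent, strength
        self.position = (0.0, 0.0, 0.0)
    def update(self, world_position):
        # scent deployment is later scaled by distance to the user's scent detector
        self.position = world_position

class Component:
    """A scene-graph node: children plus any attached multimodal attributes."""
    def __init__(self, name):
        self.name = name
        self.children = []
        self.attributes = []
        self.world_position = (0.0, 0.0, 0.0)
    def attach(self, attribute):
        self.attributes.append(attribute)
    def update(self):
        # called once per visual frame, after the transforms have been recomputed
        for attr in self.attributes:
            attr.update(self.world_position)
        for child in self.children:
            child.update()

# Build the object side of Figure 1: geometry plus sound and scent sources.
scene, obj = Component("scene"), Component("object")
obj.attach(SoundSourceAttribute(sound_data="engine_hum.wav"))
obj.attach(ScentSourceAttribute(scent="diesel", strength=0.5))
scene.children.append(obj)
scene.update()
```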
Haptics is the only modality that doesn’t directly benefit from this structure. As mentioned previously, haptic
interactions are rather difficult to generalize into a supporting software architecture. We are currently progressing
toward this goal, however. Haptic interactions can be split into three phases: the event that causes a haptic
interaction, the mapping from event to hardware interface, and the activation of the specific hardware to render the
event. The event comes from a taxonomy of possible haptic events, and the rendering depends on the various
technologies available. Any architecture supporting haptic interfaces must be extensible enough to allow additions
to both the kinds of events that can occur (for new interactions not previously considered), and the software interface
to the technologies available (for new devices that are created later). The most challenging aspect of haptic
rendering is the mapping from event to hardware. Different mappings must be used for different types of events.
For example, a positional tactile event occurs when a virtual object touches the user’s body. This can happen from a
wide variety of events, such as a simple collision (walking into a stationary virtual object), non-verbal
communication between users (tapping your friend on the shoulder), or checking a downed soldier’s pulse and body
temperature.
In contrast, there are directional tactile events, such as feeling the blast of a nearby grenade
explosion, or feeling the heat emanating from a campfire. Both of these event types could be rendered by some kind
of tactile device (a vibrator or pneumatic actuator) or a thermal device in the case of heat effects. An entirely
different kind of event occurs when you consider kinesthetic interactions, such as grasping a virtual object with a
force-feedback glove or simulating the inertia of manipulating a heavy virtual object with a force-feedback joystick.
However, the system must still map these interaction events to hardware feedback.
While the mappings from event to device will need to vary by the type of event and the specific effect that is
desired, it is possible to envision different classes of hardware that may be able to use the same mapping and deliver
very similar effects. The differences in these classes come only from the capabilities and features of the hardware.
For example, there are several examples of vibrotactile devices. Some of them can vary frequency, some can vary
amplitude, and some can only be turned on and off. The same mapping for a shoulder tap event, for example, can be
used for all of these devices. The desired effect for a vibratory device would probably be a low-gain pulse at a
frequency that makes the effect as mild as possible. Some devices could reproduce this with high fidelity, while
others could only manage to turn the device on and off a few times. Nevertheless, the mapping from event to device
would be the same. This is one area that could be handled by a general-purpose haptic software infrastructure, and
this is part of the work we are carrying out at this time.
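A minimal sketch of the event-to-device mapping described above is shown below. The device classes and capability flags are assumptions made for illustration, not an existing haptics API: the same shoulder-tap event maps either to a mild, low-gain pulse or to a simple on/off approximation, depending on what the hardware can do.

```python
class VibrotactileDevice:
    """Illustrative device wrapper; real hardware capabilities vary widely."""
    def __init__(self, can_vary_amplitude, can_vary_frequency):
        self.can_vary_amplitude = can_vary_amplitude
        self.can_vary_frequency = can_vary_frequency
    def pulse(self, amplitude, frequency, duration):
        print(f"pulse amp={amplitude} freq={frequency} dur={duration}")
    def on_off(self, duration):
        print(f"buzz for {duration}s")

def render_shoulder_tap(device, duration=0.15):
    """One mapping, many devices: degrade gracefully to the device's capability."""
    if device.can_vary_amplitude and device.can_vary_frequency:
        device.pulse(amplitude=0.3, frequency=80.0, duration=duration)  # mild pulse
    else:
        device.on_off(duration)  # simple devices just switch on briefly

render_shoulder_tap(VibrotactileDevice(True, True))
render_shoulder_tap(VibrotactileDevice(False, False))
```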
This vision of a haptic software architecture fits in with the multimodal architecture detailed above in a loosely-coupled manner. The simulation would need to detect haptic interactions between users or the world and the user,
characterize these events, and send them to a separate haptic mapping and rendering module. This is similar to the
manner in which sound listeners and sources are updated (albeit more complex). The listener and/or the source of sound
move and the new locations must be sent to the sound hardware to keep the sound feedback consistent with the
system’s model of the world. Similarly, if the user interacts physically with the world or another user, the
interactions must be rendered haptically for the feedback to be consistent with the model of the world.
4 Dynamic Environments
In contrast to simple static environments (including the multimodal environments described in the previous section),
dynamic environments allow the user to interact with them and alter their composition and characteristics. This is
the essence of user-world interactions. A system that allows the user to interact with the environment is much more
engaging than one that never changes, despite what the user does. In this section, we will present the various types
of dynamic user-world interactions and describe some of the available software that supports them.
4.1 The Basics
The simplest world-user interactions are those that only affect the user. That is to say, these interactions are not
strictly part of dynamic environments, but they are still essential for a dynamic environment to be created. The most
basic of these is collision detection. There are dozens of algorithms for collision detection. Most virtual
environment systems use simple ray-triangle intersection tests, which are good for simple tasks such as keeping the user from walking through walls and keeping the user's avatar on the ground (known as terrain following). More
sophisticated tasks require more advanced collision detection. Some of the techniques used in dynamic simulations
are the GJK algorithm (Gilbert, Johnson & Keerthi, 1988), the Lin-Canny closest-features algorithm (Lin & Canny,
1991), and a host of hierarchical bounding volume techniques (Quinlan, 1994), (Klosowski, Held, Mitchell,
Sowizral & Zikan, 1996), (Gottschalk, Lin & Manocha, 1996).
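As an example of the simple ray-triangle approach mentioned above, the sketch below implements the Möller-Trumbore intersection test and uses it for terrain following: cast a ray straight down from the avatar and clamp its height to the closest hit. The helper names are our own; this is a minimal sketch, not a production collision system.

```python
def ray_triangle(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore ray/triangle test: returns the distance t along the
    ray to the intersection point, or None if there is no hit."""
    sub = lambda a, b: tuple(x - y for x, y in zip(a, b))
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    cross = lambda a, b: (a[1]*b[2] - a[2]*b[1],
                          a[2]*b[0] - a[0]*b[2],
                          a[0]*b[1] - a[1]*b[0])
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(direction, e2)
    a = dot(e1, h)
    if abs(a) < eps:
        return None                      # ray is parallel to the triangle
    f = 1.0 / a
    s = sub(origin, v0)
    u = f * dot(s, h)
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = f * dot(direction, q)
    if v < 0.0 or u + v > 1.0:
        return None
    t = f * dot(e2, q)
    return t if t > eps else None

def terrain_height(x, y, triangles, top=1000.0):
    """Terrain following: drop a ray straight down and take the closest hit."""
    origin, direction = (x, y, top), (0.0, 0.0, -1.0)
    hits = [t for tri in triangles
            if (t := ray_triangle(origin, direction, *tri)) is not None]
    return top - min(hits) if hits else None
```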
4.2 Rigid and Deformable Body Dynamics
Rigid body dynamics is probably the simplest type of user-world interaction. The concept is that the user can
interact with various objects in the world in a physically realistic manner. Examples would be kicking a door open,
kicking a virtual ball, or throwing a rock at a stack of plastic cups and knocking some of them down. All of these
interactions involve relatively precise collision detection, and most require 4th-order integration of the dynamic
forces involved to remain stable. Still, within reasonable limits, it is possible to handle these computations at
interactive frame rates. Fundamentally, rigid body dynamics involves two basic tasks, collision detection and
collision response. Collision detection must be efficient if there are a large number of objects in motion. Collision
response is the result of a collision between bodies or between a body and a static surface. The colliding body may
bounce, roll along the surface, or come to rest. Collision response takes the form of linear forces applied to push the
body away from the point of collision, as well as torsion forces to cause it to change its angular velocity. There have
been many refinements to rigid body simulation over the years, including (Brach, 1991), (Chatterjee & Ruina,
1998), and (Chenney, Ichnowski & Forsyth, 1999). Incorporating rigid body dynamics can add a new level of
realism to an interactive virtual environment.
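The "4th-order integration" mentioned above usually means a Runge-Kutta scheme. The sketch below applies classical RK4 to the linear part of a rigid body's state (position and velocity under a force function); a full solver would also integrate orientation and angular momentum, and all names here are illustrative.

```python
def rk4_step(pos, vel, mass, force, dt):
    """One classical Runge-Kutta step for dx/dt = v, dv/dt = F(x, v) / m."""
    def deriv(p, v):
        a = tuple(f / mass for f in force(p, v))
        return v, a
    def add(base, d, h):
        return tuple(b + h * di for b, di in zip(base, d))

    k1v, k1a = deriv(pos, vel)
    k2v, k2a = deriv(add(pos, k1v, dt / 2), add(vel, k1a, dt / 2))
    k3v, k3a = deriv(add(pos, k2v, dt / 2), add(vel, k2a, dt / 2))
    k4v, k4a = deriv(add(pos, k3v, dt),     add(vel, k3a, dt))

    new_pos = tuple(p + dt / 6 * (a + 2 * b + 2 * c + d)
                    for p, a, b, c, d in zip(pos, k1v, k2v, k3v, k4v))
    new_vel = tuple(v + dt / 6 * (a + 2 * b + 2 * c + d)
                    for v, a, b, c, d in zip(vel, k1a, k2a, k3a, k4a))
    return new_pos, new_vel

# Example: a 1 kg ball in free fall, stepped at an interactive 60 Hz for one second.
gravity = lambda p, v: (0.0, 0.0, -9.81)
p, v = (0.0, 0.0, 10.0), (0.0, 0.0, 0.0)
for _ in range(60):
    p, v = rk4_step(p, v, 1.0, gravity, 1.0 / 60.0)
```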
Deformable body dynamics is very similar to rigid body dynamics. Instead of rigid objects bouncing and rolling on
surfaces, the objects can flex and bend as they collide with surfaces in the environment. The task of collision
detection is nearly identical. The added complexity comes from the collision response, which now must consider
the effects on the various parts of the body as well as the body as a whole. This leads to the “rag doll physics”
systems that have become popular in the latest first-person computer games. The addition of dynamic simulation
helps make the character animations more realistic. Other examples of deformable body simulation are cloth
simulation systems and surgery simulators which involve procedures on soft tissue. Deformable body dynamics
adds another layer of complexity to the mathematics of rigid body dynamics. However, with certain constraints, it is
also possible to handle deformable bodies at interactive frame rates. As processing power continues to increase,
these limitations will diminish as well.
4.3 Fluid Dynamics
Fluids are more mathematically-intensive than rigid and deformable body dynamics. Fluid dynamics is governed by
the Navier-Stokes equations, which are non-linear partial differential equations, as opposed to the equations of
Newtonian dynamics, which are ordinary differential equations. Until recently, it was very difficult to obtain a
stable fluid system at anything approaching real-time. In 1999, Stam demonstrated a stable volumetric fluid model
that ran at near interactive rates. Although not accurate enough to use in engineering applications, it is quite useful
for computer graphics and virtual environments to visually recreate fluid flows. As a result, realistic fluid dynamics
have begun to appear in games and simulations. The benefits of realistic fluids in dynamic environments are
numerous. First, physically-based fluid flow looks very good and can help with the all-important sense of
immersion. Second, objects can interact with fluids in a realistic manner, so the user can jump into a flowing river
and be carried downstream just as expected. Also, a realistic fluid flow model could generate accurate propagation
of smoke and scents through the air, complete with wind effects.
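A key ingredient of Stam's stable fluids method is semi-Lagrangian advection: trace each grid cell backwards along the velocity field and interpolate the quantity being carried. The Python sketch below shows one such step on a small 2D grid; the grid layout and names are our own simplification for illustration, not Stam's actual formulation or code.

```python
def advect(field, u, v, dt):
    """Semi-Lagrangian advection on an n x n grid: each cell looks backwards
    along the velocity (u, v), in cells per unit time, and bilinearly interpolates."""
    n = len(field)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # trace backwards and clamp to the grid so interpolation stays in bounds
            x = min(max(i - dt * u[i][j], 0.0), n - 1.001)
            y = min(max(j - dt * v[i][j], 0.0), n - 1.001)
            i0, j0 = int(x), int(y)
            s, t = x - i0, y - j0
            out[i][j] = ((1 - s) * ((1 - t) * field[i0][j0] + t * field[i0][j0 + 1])
                         + s * ((1 - t) * field[i0 + 1][j0] + t * field[i0 + 1][j0 + 1]))
    return out
```

The same back-tracing idea, applied to a density field of smoke or scent, is what would carry those quantities along a wind field in the manner described above.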
4.4 Fracture
Another class of dynamic interaction is fracture. This is any interaction that changes the geometric description of an
otherwise rigid object. As with fluid dynamics, this kind of simulation has only been possible in offline (non-interactive) situations until recently. In 2001, Müller et al. presented a new deformation and fracture simulation
technique that made it possible to achieve interactive rates. Like Stam's method for fluids, it simplifies the process
somewhat so that it behaves like an intensive physical simulation visually, but it is not quite physically accurate
enough to be suitable for engineering applications. Fracture can add another dimension of realism to a dynamic
environment. If a glass bowl is knocked off of a table, instead of bouncing off of the ground as would be the case in
a simple rigid body simulation, a fracture simulation can be used to shatter the bowl on impact.
There are also non-physics based fracture models, where an application-specific technique is used to achieve a
desired goal in the simulation. While not general-purpose or fully realistic, these methods are often entirely adequate for
the task at hand, and are much less taxing on the processing power of the simulator. One example of a non-physics
based technique was a “Dynamic Terrain” engine designed by the Army Research Laboratory. This engine created
elliptical holes in virtual surfaces in response to explosions in the simulation. This capability allowed Soldiers in a
virtual training simulation to enter buildings using breaching charges instead of using the existing doors, which is
often considered tactically dangerous.
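A non-physics crater effect of the kind just described can be sketched very simply on a heightfield: lower every vertex inside an ellipse around the blast point. The function below is purely illustrative (the names and the crater profile are ours, not ARL's actual algorithm).

```python
def blast_crater(heights, cx, cy, rx, ry, depth):
    """Carve an elliptical depression into a 2D heightfield, in place."""
    for i, row in enumerate(heights):
        for j in range(len(row)):
            # normalized elliptical distance from the blast center
            d = ((i - cx) / rx) ** 2 + ((j - cy) / ry) ** 2
            if d < 1.0:
                row[j] -= depth * (1.0 - d)   # deepest at the center, zero at the rim

terrain = [[0.0] * 64 for _ in range(64)]
blast_crater(terrain, cx=32, cy=20, rx=6.0, ry=4.0, depth=2.5)
```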
4.5 Benefits
In short, dynamics can add a new level of realism to a virtual environment, especially a multimodal virtual
environment, where all user-world interactions can be depicted with stimuli covering four senses. Depending on the
task at hand, user-world interactions may be essential, as in the case of a surgical simulation, for example. In a
collaborative setting, dynamic environments open up a whole range of interactions, with two or more users
interacting with each other, and affecting the same dynamic environment.
5 GEMINI
We now turn to linking multimodal and dynamic environments in a collaborative, interactive setting. Specifically,
we present an architecture for a distributed system and then address the interaction issue.
5.1 Previous Work
We have a fairly long history of work in distributed environments, which we briefly review here. While some of it is older, it shows the progression of our thinking in this area. Others have built distributed architectures, but a complete survey is beyond the scope of this paper. However, the MASSIVE system added many capabilities to this
area including scoping and communication in large environments (Greenhalgh, 1998). The Distributed Interactive
Virtual Environment (DIVE) was one of the first Internet-based multi-user systems to allow collaboration; however,
the environments were not rich in modalities although they did provide some simple dynamic behavior of objects
(described by scripts evaluated on each network node) (DIVE, 2005). Many others are similar. Our focus is on a
distributed architecture for both multimodal and dynamic environments.
5.1.1 The Services
Back in 1994 we developed the initial concept of the “Shared Environment” that would contain all data regarding
the dynamic environment in a fully distributed fashion. Clients would connect to the “Shared Environment” and query the data each one needed, making updates as necessary. While the “Shared Environment” was thought of as a
single component, it really was implemented as a number of distributed components known as “Services.” To avoid
bottlenecks, a “Service” would run on each host in the distributed environment and was responsible for propagating
all updates and dealing with any data coherency issues (although the latter was ignored in this initial work). Figure
2 contains a representative diagram of the “Shared Environment” and how it was implemented in terms of the
“Services.” The focus of our work at this time was on the problem of “dynamic terrain” where terrain was
dynamically modified during the simulation. A number of clients related to that work are shown in the Figure.
These “Services” provided a good start to our research in distributed, dynamic environments and provided much
functionality that was needed. However, there were some drawbacks. Based on the military funding of the project,
the “Services” were all based on the military simulation standard at the time, Distributed Interactive Simulation
(DIS) (“Distributed Interactive Simulation,” 1993). While this worked at the time, as time went on we found that
DIS was restrictive for researching new capabilities in distributed virtual environments. In addition to DIS being
restrictive, architecturally the “Services” approach was not as scalable as we had hoped. Within a single “Service” it worked well: each host contained the “Service” it needed, and the design avoided the bottlenecks of other architectures (e.g. central servers). However, every new database required a new “Service” to contain it (for example, adding atmospheric effects would have meant creating an “Atmosphere Database” and, with it, an “Atmosphere Service”).
There were also some implementation issues with the “Services.” In particular, the “Entity Service” had too much
of a dependence on Euler angles; as a result, the singularity and ambiguity problems of Euler angles were clearly evident. In addition, all client communication was handled through shared memory. While this was perfectly acceptable for the design at the time, we found that we wanted to explore other capabilities for research purposes.
[Figure 2 diagram: entity simulators (e.g., an M1A1 tank or an AAAV) and image generator hosts, each running the Entity, Terrain, and Ocean Services it needs, with clients such as a Crater Client, Soil Client, Track Client, and Wave Client, all communicating over a DIS network.]
Figure 2: The Services
5.1.2 The Communication Service
As part of a conference exhibit of our virtual environment research (at SIGGRAPH 1998) we developed a component based on Greek mythology known as “Man & Minotaur,” where users would enter an immersive virtual environment either as Theseus, trying to escape a maze, or as the Minotaur, trying to catch Theseus. In addition, an element known as the “Greek Gods” (Pan, Athena and Hephaestus) allowed other users to participate on Virtual Reality Modeling Language (VRML) workstations in a manner to either help or hinder Theseus (their choice).
In order to support the “Greek Gods” we needed to connect these stations to the rest of the distributed environment
or “Shared Environment.”
Given the use of VRML, the obvious choice for implementing the connection to the “Shared Environment” was Java.
However, we wanted to take the opportunity to investigate the next step in our architectural research. To address the
issue of multiple “Services” (including the maintenance headache caused by it) we developed a Java-based solution
based around a single “Communication Service.” The conceptual architecture was similar to that shown in Figure 2
with the “Communication Service” and potentially multiple clients. In this case, however, client communication
was initially achieved through Remote Method Invocation (RMI). The use of RMI had the advantage of allowing either a central server or a fully distributed architecture. However, we found that RMI was both buggy and slow and ended up switching the connection to be TCP/IP based (note, though, that this was back in 1998 when Java and RMI were both fairly new; in addition, it is possible that threading off some of the processing would have helped). In any event, the experience allowed us to explore some of our thoughts and ideas.
5.2 GEMINI
When it came to the new work described in this section, we wanted to address many of the issues previously
identified. In addition, we wanted to provide a testbed for research throughout the field of distributed interactive
(multimodal) environments. This included a flexible number of databases, a flexible architecture design (to provide
for the extremes of a central server and fully distributed system, and everything in between), and flexible network
protocols. The latter comes from the need to continue supporting DIS in our applied (military) research while still pursuing advanced research in interactive environments (with techniques never envisioned when DIS was conceived).
To fulfill these needs, we built the General-purpose, Environmental, Modular and Interactive Network Interface
(GEMINI). Conceptually, it appears similar to the “Shared Environment” but there are a number of differences.
5.2.1 Flexible Communication
Our first step in developing GEMINI was to generalize all forms of interprocess communication. This included
TCP/IP, UDP/IP, shared memory, message queues, and even serial ports. This allows the connection between the
instances of the GEMINI servers as well as the connections between clients and GEMINI servers to be of any form.
Therefore, any architecture can be formed for research investigations. For example, if multiple GEMINI servers are
used (one per host) with a shared memory connection from clients, then a fully distributed system is in use (much
like the “Services” discussed earlier). However, if a single GEMINI server is used and clients connect using
TCP/IP, then a central server architecture is being used (Figure 3 illustrates the comparison). Using this flexibility, one can even form a hierarchical arrangement within the overall distributed system.
Figure 3: GEMINI in fully distributed and central server configurations
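The generalization of interprocess communication described above can be pictured as a common channel interface with interchangeable transports. The Python sketch below is a hypothetical reading of that design, not GEMINI's actual implementation; only a TCP transport is shown, but a UDP, shared-memory, or serial-port transport would implement the same two methods.

```python
import socket

class Channel:
    """Common interface: any transport that can send and receive byte messages."""
    def send(self, data: bytes) -> None: ...
    def receive(self, max_bytes: int = 4096) -> bytes: ...

class TcpChannel(Channel):
    def __init__(self, host, port):
        self.sock = socket.create_connection((host, port))
    def send(self, data):
        self.sock.sendall(data)
    def receive(self, max_bytes=4096):
        return self.sock.recv(max_bytes)

# A client does not care which transport links it to its GEMINI server, e.g.:
# channel = TcpChannel("localhost", 5000)     # central-server configuration
# channel = SharedMemoryChannel("/gemini")    # fully distributed configuration (hypothetical)
```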
5.2.2 Flexible Protocols
The second step was to design a system for flexible protocols. As mentioned earlier, there is still a need to interoperate with military simulations, primarily using DIS although the High Level Architecture is a possibility (“High Level Architecture,” 2005), yet we wish to explore advanced multimodal interactions, which will require new protocols. In other words, DIS is too limited for our research but we still have a need to support it. In order to
make a system that could handle both needs, we incorporated a “translator” concept. As data comes into GEMINI
from other external components (whether other GEMINI servers or other simulators) it is translated from the
protocol on that communication link into an internal “GEMINI” protocol. Similarly, as data leaves GEMINI to be
transmitted to other external components, it is converted from the internal protocol to the protocol defined for that
link. Flexibility is achieved through the use of a “translator” that is dynamically loaded at run-time. Support for a
new protocol is added by simply implementing the “translator” for that protocol into and out of the internal
“GEMINI” protocol (see Figure 4). As an aside, we also included the translator feature on the client-communication side. While rarely used, it allows a client whose protocol cannot be changed (such as a commercial game engine) to connect to the overall distributed environment.
When developing the internal protocol we weighed the option of using binary record structures or an XML-based
document language. Binary records are faster, but are harder to maintain and suffer from endian issues (in the case
of mixed clients). An XML-based approach is slower but provides great flexibility including re-ordering, or even
elimination, of fields. We chose the XML-based approach; however, the client interface is designed to hide
knowledge of the internal protocol from the clients themselves. In order to translate from protocol to protocol a
mapping definition is required that establishes how fields are converted back-and-forth.
Figure 4: Translator in use
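A translator, as described above, only has to convert between an external protocol and the internal XML-based one. The sketch below shows the shape of such a translator for a DIS-style entity-state update; the binary record layout, field names, and XML elements are invented for illustration and are not the actual GEMINI or DIS formats.

```python
import struct
import xml.etree.ElementTree as ET

class EntityStateTranslator:
    """Converts a simplified binary entity-state record to and from an
    internal XML document (layout here is illustrative only)."""
    RECORD = struct.Struct("!I3d")   # entity id + x, y, z doubles, network byte order

    def to_internal(self, data: bytes) -> str:
        entity_id, x, y, z = self.RECORD.unpack(data)
        root = ET.Element("entityState", id=str(entity_id))
        ET.SubElement(root, "position", x=str(x), y=str(y), z=str(z))
        return ET.tostring(root, encoding="unicode")

    def to_external(self, xml_text: str) -> bytes:
        root = ET.fromstring(xml_text)
        pos = root.find("position")
        return self.RECORD.pack(int(root.get("id")),
                                float(pos.get("x")), float(pos.get("y")), float(pos.get("z")))

# Round trip: an incoming binary record becomes an internal XML message and back.
t = EntityStateTranslator()
xml_msg = t.to_internal(t.RECORD.pack(42, 1.0, 2.0, 3.0))
```

Because clients see only the internal representation through the client interface, adding a new external protocol means writing one new translator class like this, without touching the clients.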
5.3 Collaborative Interaction
Now we turn to the question of supporting interaction in a collaborative environment setting. Recall that in the
multimodal collaboration sense we are concerned with user-to-user and world-to-user interactions. The user acts
upon other users, and both other users and the world act upon the user. These interactions are essentially events and
are transmitted accordingly. GEMINI’s support for other protocols allows the study of these interactions and can
include specific variables of the interaction (e.g. frequency and pulse rate for haptic interactions). Within this area
there are also some interesting questions in the area of synchronization and data coherency.
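For example, a user-to-user haptic interaction such as a shoulder tap might travel across GEMINI as a small event message carrying the variables mentioned above. The snippet below builds a hypothetical example of such a message; the element and attribute names are our own illustration, not the actual internal protocol.

```python
import xml.etree.ElementTree as ET

def haptic_event_message(source_user, target_user, frequency_hz, pulse_rate_hz):
    """Build a hypothetical user-to-user haptic event (names are illustrative)."""
    event = ET.Element("hapticEvent", type="shoulderTap",
                       source=source_user, target=target_user)
    ET.SubElement(event, "stimulus",
                  frequency=str(frequency_hz), pulseRate=str(pulse_rate_hz))
    return ET.tostring(event, encoding="unicode")

# e.g. channel.send(haptic_event_message("user1", "user2", 80.0, 2.0).encode())
```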
Interaction within a dynamically updating environment is more complex. Again recall that we are concerned with
user-to-world and world-to-world interactions. This requires updates to all aspects of the world including objects
and parameters of objects. For example, an object on a desk might be moved, a hole might be blown in a wall, or
dirt turned into mud. This requires a network protocol and a set of databases to contain all the information (objects
and parameters). GEMINI supports this by supporting multiple databases and flexible protocols.
6 Conclusions and Future Work
We have built both a virtual environment library for multimodal environments and a new distributed architecture for
multimodal and dynamic environments. The distributed architecture supports multiple databases and flexible
protocols through the use of translators, which allow easy configurability for detailed protocols. These features
provide the mechanism for supporting rich multimodal exchanges and interactive, updatable virtual environments.
Our initial work with this testbed will be to study fundamental questions such as latency issues. On the applied side
we will be incorporating a translator to support the High Level Architecture for our military simulation use. We will
also be extending the supported databases and protocols to support a dynamic visualization of scenes. In terms of
multimodal environments, we will be extending VESS (our multimodal environments software architecture) with
the generalized haptics library discussed in Section 3.5, as well as support for dynamic environments, incorporating
many of the features discussed in Section 4.
7 Acknowledgements
The authors wish to acknowledge funding support from the Army Research Institute, the Naval Research Laboratory
and the Office of Naval Research. In addition, we thank Mr. Duvan Cope who helped integrate and test early
versions of the GEMINI system.
8 References
Brach, R. M. (1991). Mechanical impact dynamics: rigid body collisions. New York: John Wiley & Sons, Inc.
Chatterjee, A. & Ruina, A. (1998). A new algebraic rigid body collision law based on impulse space considerations.
Journal of Applied Mechanics, 65, 939-951.
Chenney, S., Ichnowski, J. & Forsyth, D. (1999). Dynamics modeling and culling. IEEE Computer Graphics and
Applications, 19 (2), 79-87.
Defense Modeling and Simulation Office (2005). High Level Architecture. Retrieved February 26, 2005, from
https://www.dmso.mil/public/transition/hla/
DIVE: Distributed Interactive Virtual Environment (2005). Retrieved February 27, 2005, from http://www.sics.se/dce/dive/dive.html
Gilbert, E. G., Johnson, D. W. & Keerthi, S. S. (1988). A fast procedure for computing the distance between
complex objects in three-dimensional space. IEEE Journal of Robotics and Automation, 4 (2), 193-203.
Gottschalk, S., Lin, M. & Manocha, D. (1996). OBB-Tree: A hierarchical structure for rapid interference detection,
Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques, 171-180.
Greenhalgh, C. M. (1998). Awareness management in the MASSIVE systems, Distributed Systems Engineering, 5
(3), 129-137.
Institute of Electrical and Electronics Engineers (1993). ANSI/IEEE standard 1278: protocols for distributed
interactive simulation. New York: Institute of Electrical and Electronics Engineers.
Klosowski, J., Held, M., Mitchell, J. S. B., Sowizral, H. & Zikan, K. (1996). Efficient collision detection using
bounding volume hierarchies of k-DOPs. IEEE Transactions on Visualization and Computer Graphics, 4 (1), 21-36.
Lin, M. C. & Canny, J. F. (1991). A fast algorithm for incremental distance computation. IEEE International
Conference on Robotics and Automation, 1008-1014.
Meehan, M., Insko, B., Whitton, M. & Brooks, F. P. (2002). Physiological measures of presence in stressful virtual
environments. ACM Transactions on Graphics (Proceedings of the 29th Annual Conference on Computer Graphics
and Interactive Techniques), 21 (3), 645-652.
Müller, M., McMillan, L., Dorsey, J. & Jagnow, R. (2001). Real-time simulation of deformation and fracture of stiff
materials. Proceedings of the Eurographics Workshop on Computer Animation and Simulation, 113-124.
Quinlan, S. (1994). Efficient distance computation between non-convex objects. Proceedings of the IEEE International Conference on Robotics and Automation, 3324-3329.
Stam, J. (1999). Stable fluids. Proceedings of the 26th Annual Conference on Computer Graphics and Interactive
Techniques, 121-128.
Stanney, K.M., Samman, S., Reeves, L.M., Hale, K., Buff, W., Bowers, C., Goldiez, B., Lyons-Nicholson, D., &
Lackey, S. (2004). A paradigm shift in interactive computing: Deriving multimodal design principles from
behavioral and neurological foundations. International Journal of Human-Computer Interaction, 17 (2), 229-257.