Design of a Body-Driven Multiplayer Game System

SAMI LAAKSO AND MIKKO LAAKSO
Telecommunications Software and Multimedia Laboratory
Helsinki University of Technology
_________________________________________________________________________________________
We have designed and evaluated a novel multiplayer game system that uses just one top-view camera. With the
proposed system, player avatar movement can be mapped directly to the physical movement of the player,
accompanied by additional hand gestures that trigger more complex actions. This article presents a study of the
concepts of body-driven multiplayer games using the proposed system. We created four different test
games using human-centered design (HCD). We describe both the computer-vision-based implementation and
the lessons we learned about designing effective content for interactive body-driven multiplayer games.
Categories and Subject Descriptors: H.5.2 [Information Interfaces and Presentation]: User Interfaces – Input
devices and strategies, Interaction styles; I.4.8 [Image Processing and Computer Vision]: Scene Analysis –
Motion, Tracking
General Terms: Performance, Design, Experimentation, Human Factors
Additional Key Words and Phrases: Computer vision, object tracking, gesture recognition, multiplayer games
_________________________________________________________________________________________
1. INTRODUCTION
In today's game and entertainment systems, a common goal in creating user interfaces
is to make more diverse use of human senses and movements than before. Bodily
and spatial user interfaces are more immersive and compelling than the
keyboard, mouse, or joystick alone. Conventional virtual reality systems try to make the user
interface transparent or invisible by using either surround screens or head-mounted
displays [Sherman and Craig 2003]. However, this approach is costly and the
user experience is often cumbersome, as the user has to wear special virtual reality
devices.
A more robust and low-cost solution is a camera-based user interface, where
computer vision technology is used to map the player's body movements and gestures to
control the application. The first to take this approach was the technological artist Myron
Krueger, who pioneered the development of unencumbered, full-body participation in
computer-created virtual experiences [Krueger 1983]. Later academic research, such as
the ALIVE project [Maes et al. 1994], improved Krueger’s system by replacing special
blue-screening hardware with computer vision algorithms; most body-driven applications
today employ a similar scheme. The camera is placed in front of the user and the video
screen is in close proximity to the camera, enabling the user to see either himself or
his avatar on the game screen and to control the application action with his body
movements.
_________________________________________________________________________________________
Authors’ addresses: Telecommunications Software and Multimedia Laboratory, Helsinki University of Technology,
Espoo, Finland, PO Box 5200, FIN-02015 HUT; emails: {slaakso | mlaakso}@tml.hut.fi
Permission to make digital/hard copy of part of this work for personal or classroom use is granted without fee
provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the
title of the publication, and its date of appearance, and notice is given that copying is by permission of the ACM,
Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific
permission and/or a fee. Permission may be requested from the Publications Dept., ACM, Inc., 1515 Broadway,
New York, NY 10036, USA, fax: +1 (212) 869-0481.
© 2006 ACM 1544-3574/06/1000-ART4C $5.00
ACM Computers in Entertainment, Vol. 4, No. 4, October 2006. Article 4C.
So far, camera-based user interfaces have been used in various entertainment and
education applications, ranging from an interactive dance space [Sparacino 2001] to martial
arts training [Hämäläinen et al. 2005]. Today there are also several commercial camera-based
body-driven game systems available, both complete game systems such as the
Mandala GX System by Vivid Group [http://www.vividgroup.com/] and home-sized
devices such as EyeToy for Sony PlayStation 2.
A major advantage of using a camera-based interface is the possibility of replacing
the computer-generated player avatar with an image of the real user, which can add
immersion and intuitiveness. However, almost all existing body-driven game systems
have one thing in common: they are intended for single players only. True body-driven
multiplayer games have received little or no attention. Moreover, due to camera placement
requirements, the playing space has usually been very limited.
In this article we investigate the use of camera-based user interfaces in different
multiplayer games. Our goal is to combine tracking of multiple users with a camera-based
game engine to create a true body-driven multiplayer game system. In our games,
moving the player character requires the player to actually move around instead of just
moving his limbs. The solution we propose is a low-cost game framework
capable of running various multiplayer games using just a single camera.
In our system, we place the camera on top of the players, pointing downwards.
Placing the camera directly above the playing area, with its optical axis perpendicular to the
playing area plane, is optimal for tracking players. With our system, the movements of the
player avatar can be mapped directly to the physical movements of the player,
accompanied by additional hand gestures that trigger more complex actions. Results from
the pilot study, in which we implemented the first version of the proposed system, were
promising [Laakso 2004]. However, the technical concept is nothing without content.
Designing body-driven games is not an easy task, and multiplayer issues make the task
even harder. Our goal was to evaluate the concepts of various body-driven multiplayer
games in order to find some common rules for creating interactive and entertaining
applications using the proposed system. We therefore created several different games using the
human-centered design (HCD) approach [ISO 1999], with iterative user evaluation as part
of the design and development process, and then evaluated and analyzed the games.
This article makes three contributions: (1) proposes techniques for a body-driven
multiplayer game framework that expand on earlier methods; (2) provides a
demonstration that vision-based techniques can be implemented robustly and
inexpensively using a single camera and PC; and (3) offers the lessons we learned
regarding interaction principles for designing effective, entertaining applications using
our techniques.
2. RELATED WORK
Multiplayer games based on technologies other than cameras have also been proposed. In
Lumetila [Väätänen 2001], the floor consists of pressure sensors that track the global
position of a group of players. Other, more commercial, examples are several Disney interactive
theme park rides, where a group of players can play together, each assigned a different
role by the game controller [Mine 2003].
There are previous examples of using top-view cameras to track people, e.g.,
analyzing players and their performance in team sports [Perš and Kovacic 2000] and
monitoring pedestrians in various surveillance applications [Rossi and Bozzoli 1994].
Academic research on multiplayer issues has mostly focused on solutions for
entertainment games or children’s education; a good example is KidsRoom, an
interactive playspace for children [Bobick et al. 1999]. A comprehensive survey of human
motion capture using computer vision is presented by Moeslund and Granum [2001].
Top-view cameras combined with floor or wall projection have also inspired
several artistic installations. Camille Utterback [http://www.camilleutterback.com/]
used a camera to monitor physical movement in an exhibit space and to paint a changing
wall projection in response to the activity. BodyMover, developed by ART+COM
[http://www.artcom.de/], is an interactive multiuser installation where users can control
visual and acoustic output with their gestures. In Body-Driven City of News, a user
walking on a floor-projected image can navigate a virtual web browser [Sparacino 2001].
MetaField Maze is a virtual, room-sized recreation of the traditional marble maze game,
where the player controls the play by walking around the projected 3D model of the
game on the floor [Keays 1998].
Another approach to multiplayer games is networked games, which enable players to
participate in the same activity regardless of their physical location. In Sports over a
Distance [Mueller 2005], two players can play a ball game while being miles apart. Both
sites contain a wall projection of the remote site, enabling the participants to interact with
each other through a life-sized video and audio connection. However, such
systems are more complex and expensive, and are not easily portable. Hence our current system
concentrates on multiplayer games played in a shared physical space.
3. THE GAME SYSTEM DESIGN
In order to robustly identify and track multiple players with a single camera, the only
possible solution is to place the camera above the playing area. Placing the camera on
top of the players clearly reduces the amount of information and the range of possible gestures that
can be obtained from the players. But the advantages are also considerable. The playing
area becomes much larger, as the players can use the full camera view. Most
importantly, players cannot easily occlude one another, which allows us to map a
player’s location in the camera image directly to game control.
3.1 Player Approximation
We use background subtraction as the basis for locating players from the camera image
[Sonka et al. 1999]. When the game system is started, a sample image of the background
is stored in memory. During the game, a pixelwise absolute difference to the background
sample is computed for each frame. Then, the difference image is thresholded to obtain a
binary image where each nonzero pixel is considered part of the user.
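The background-subtraction step can be sketched as follows. This is a minimal pure-Python illustration, not the authors' EyesWeb/IPP implementation. Frames are assumed to be grayscale images stored as lists of rows, and the default threshold of 30 is a hypothetical value, since the article does not state the one used.

```python
from typing import List

Image = List[List[int]]  # grayscale frame: rows of 0..255 intensities


def capture_background(frame: Image) -> Image:
    """Store a copy of the empty playing area, taken when the game system starts."""
    return [row[:] for row in frame]


def foreground_mask(frame: Image, background: Image, threshold: int = 30) -> Image:
    """Pixelwise absolute difference to the background sample, then thresholding.

    Returns a binary image in which 1 marks a pixel considered part of a player.
    The default threshold is an illustrative assumption.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


background = capture_background([[10, 10, 10], [10, 10, 10]])
print(foreground_mask([[10, 200, 10], [10, 12, 90]], background))  # [[0, 1, 0], [0, 0, 1]]
```

A single stored sample is simple and fast, but it assumes the lighting and the empty floor stay unchanged after start-up.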
After thresholding, each nonbackground pixel is labeled as belonging to a specific
object. It is then determined whether each object is a main body or a part of a body; each part
of a body is merged into the closest main body; and finally the mass center of each player
object is calculated. The coordinates of the corresponding player character can then be mapped
to the player’s physical position on the playing area. Individual players are tracked
through time: after the mass centers of all players have been determined, the
distances between all player positions in the current frame and the previous frame are
calculated, and player positions in the current frame are matched to player positions in
the previous frame so that the distances are minimized.
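The frame-to-frame matching can be sketched as an assignment problem. The article does not say whether the distances are minimized greedily or jointly. This sketch assumes the one-to-one pairing with the smallest summed distance, found by exhaustive search. Giving new mass centers fresh IDs and dropping vanished players are also assumptions.

```python
import math
from itertools import permutations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def match_players(previous: Dict[int, Point], current: List[Point]) -> Dict[int, Point]:
    """Carry player IDs from the previous frame over to this frame's mass centers.

    Picks the one-to-one pairing with the smallest summed Euclidean distance by
    exhaustive search. Unmatched new mass centers get fresh IDs; players that
    disappeared are dropped (both policies are assumptions of this sketch).
    """
    ids = list(previous)
    n, m = len(ids), len(current)
    if n <= m:
        # Every previous player gets a distinct current detection.
        candidates = (list(zip(ids, perm)) for perm in permutations(range(m), n))
    else:
        # Every current detection gets a distinct previous player.
        candidates = (list(zip(perm, range(m))) for perm in permutations(ids, m))

    best_pairs: List[Tuple[int, int]] = []
    best_cost = math.inf
    for pairs in candidates:
        cost = sum(math.dist(previous[pid], current[c]) for pid, c in pairs)
        if cost < best_cost:
            best_cost, best_pairs = cost, pairs

    result = {pid: current[c] for pid, c in best_pairs}
    matched = {c for _, c in best_pairs}
    next_id = max(ids, default=-1) + 1
    for c, pos in enumerate(current):
        if c not in matched:
            result[next_id] = pos
            next_id += 1
    return result
```

Exhaustive search is cheap for the 2 to 4 simultaneous players typical of the evaluation. Its cost grows factorially, however: 10 players give about 3.6 million pairings. A larger system would more likely use the Hungarian algorithm, for example SciPy's `scipy.optimize.linear_sum_assignment`.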
After all objects have been determined, we obtain the smallest rectangles surrounding
the player objects, i.e., the bounding boxes. The bounding boxes, or simply the mass centers,
can then be used to approximate the player characters in a 2D or 3D world. A bounding box,
naturally, gives us only two dimensions, but the third dimension can be set to a fixed
value if a 3D representation is required.
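Labeling, merging body parts into main bodies, and extracting mass centers and bounding boxes can be sketched in pure Python. The article does not say how a main body is told apart from a body part. This sketch assumes a simple area test using `min_body_area`, a hypothetical parameter, and merges each small blob into the main body whose mass center is nearest. Coordinates are (x, y) image pixels, and components use 4-connectivity.

```python
import math
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]          # (x, y) in image coordinates
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def label_components(mask: List[List[int]]) -> List[List[Pixel]]:
    """4-connected component labeling of a binary mask (breadth-first flood fill)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            components.append(pixels)
    return components


def mass_center(pixels: List[Pixel]) -> Tuple[float, float]:
    return (sum(x for x, _ in pixels) / len(pixels), sum(y for _, y in pixels) / len(pixels))


def players_from_mask(mask: List[List[int]], min_body_area: int = 4):
    """Return one (mass_center, bounding_box) pair per player."""
    components = label_components(mask)
    bodies = [c[:] for c in components if len(c) >= min_body_area]
    parts = [c for c in components if len(c) < min_body_area]
    body_centers = [mass_center(b) for b in bodies]
    for part in parts:
        if not bodies:
            break  # nothing to attach stray pixels to
        pc = mass_center(part)
        nearest = min(range(len(bodies)), key=lambda i: math.dist(pc, body_centers[i]))
        bodies[nearest].extend(part)  # e.g. a hand cut off from its body by thresholding
    players = []
    for pixels in bodies:
        xs = [x for x, _ in pixels]
        ys = [y for _, y in pixels]
        box: Box = (min(xs), min(ys), max(xs), max(ys))
        players.append((mass_center(pixels), box))
    return players
```

In a full camera frame a player covers hundreds of pixels or more, so a realistic `min_body_area` would be far larger. The value 4 only suits toy masks.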
Since the camera must be placed high above the playing area and a full camera view
must be used, the effect of radial distortion [Slama 1980] becomes noticeable. There are
several algorithms available for correcting radial distortion, but we decided to use a raw,
uncorrected image because our game system in general uses only the mass center
location of the player object and we wanted to save more computing power for the game
engine itself. As a result, our gesture detection scheme had to be simple and robust.
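For reference, radial distortion is commonly modeled as a polynomial in the distance from the image center. The form below is the radial part of the model used, for example, by OpenCV's camera calibration, truncated to two coefficients. Here (x_u, y_u) are ideal (undistorted) normalized image coordinates measured from the principal point, (x_d, y_d) are the observed coordinates, and k_1, k_2 are lens-specific coefficients. The authors did not apply any such correction. The model is shown only to illustrate why the error grows toward the edges of a wide-angle image.

```latex
x_d = x_u \left(1 + k_1 r^2 + k_2 r^4\right), \qquad
y_d = y_u \left(1 + k_1 r^2 + k_2 r^4\right), \qquad
r^2 = x_u^2 + y_u^2
```

The displacement grows with r^3 and higher powers. Points near the image center therefore barely move, while a player near the border of the full camera view is shifted noticeably. Wide-angle lenses typically show barrel distortion (k_1 < 0).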
3.2 Gesture Recognition
It is possible to define a game that requires nothing more than the ability to
move the player character. However, most basic games need one or
several different kinds of action metaphors, such as jumping, firing, or activating.
When defining gestures for our system, technical restrictions must be taken into account.
As the camera is fixed to the ceiling pointing downwards, there are some severe
limitations on gestures. All gestures must be very simple and easy to distinguish from
one another. Usability is another issue that must be dealt with. As games are usually fast-paced
and require constant attention, players should be able to see the game screen at all
times, so it must be possible to produce all the gestures while facing the
game screen.
A natural way for humans to make recognizable gestures is to use their arms. As the
player is seen from the top, the possible gestures are limited to extending one or both
arms. Individual finger tracking is not possible due to limited resolution. Technically, the
system could be further upgraded by adding the possibility of distinguishing legs as well.
However, this would make general game playing very difficult, as it is not possible to
move and gesture with your feet at the same time. Also, due to our camera placement, it
is technically very challenging to distinguish between arms and legs, and our system does
not do so.
There are almost no limits to the directions in which a human arm can point.
But for a gesture-recognition system, it is not feasible to try to distinguish among an
unlimited number of directions. Therefore, in our system, we concentrated on vertical
and horizontal directions only. We also decided not to use gestures where players point
their arms backward. This was done purely for usability reasons, as extending an arm
backward is quite unnatural and not at all comfortable.
Using the bounding box as our approximation of the player character makes it
straightforward to also use it to detect players’ gestures. When the width of the bounding
box grows rapidly beyond a certain limit during successive frames, an arm extended
sideways is detected. Similarly, box height that grows rapidly enough is
recognized as an arm extended vertically in the image, i.e., forward toward the screen. By combining this data with the location of the
mass center relative to the bounding box, it is possible to distinguish which arm is
extended. This simple scheme was found to work effectively in almost all situations.
Figure 1 shows all the possible gestures the system can detect (note that the player
is constantly looking at the game screen, located at the top of the image).
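The bounding-box test can be sketched as below. This is an illustration under stated assumptions, not the authors' code. It assumes the game screen is at the top of the camera image (as in Figure 1) and that image x increases toward the player's right. The growth limit and side margin are hypothetical values. The sketch distinguishes only four of the eight gestures: left arm, right arm, both arms sideways, and forward.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in image pixels


def classify_gesture(prev_box: Box, box: Box, center: Tuple[float, float],
                     growth_limit: float = 15.0, side_margin: float = 0.15) -> Optional[str]:
    """Guess a gesture from one frame of bounding-box growth.

    Returns 'left', 'right', 'both_sideways', 'forward', or None.
    """
    width_growth = (box[2] - box[0]) - (prev_box[2] - prev_box[0])
    height_growth = (box[3] - box[1]) - (prev_box[3] - prev_box[1])
    if width_growth > growth_limit:
        # The torso (and so the mass center) stays put while the box grows on the
        # side of the extended arm, so the mass center ends up off-center.
        rel_x = (center[0] - box[0]) / (box[2] - box[0])
        if rel_x > 0.5 + side_margin:
            return "left"           # box grew toward small x: the player's left
        if rel_x < 0.5 - side_margin:
            return "right"
        return "both_sideways"      # box grew on both sides
    if height_growth > growth_limit:
        return "forward"            # backward arm gestures are not used, so assume forward
    return None
```

Width growth is checked first, so a diagonal extension that widens and lengthens the box at once is reported as a sideways gesture. That tie-breaking rule is part of the sketch, not something the article specifies.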
When designing body-driven games, the number of gestures required should be kept as
small as possible, and each gesture should be clearly linked to its corresponding action. The system should naturally
allow the users to produce gestures while moving. But at the same time, when a user
merely moves around the playing area, his changing bounding box should not be interpreted as
random gestures. We solved this problem by checking that the player’s mass
center does not move more than a certain threshold between adjacent frames and by
carefully choosing suitable values for the growth limits of the bounding box.
Fig. 1. Eight gestures recognized by the system.
Another thing to note is that as our defined metaphors require rapid growth in the
dimensions of the bounding box, they are trigger-type events that do not stay active. To
produce the same metaphor several times, players must return their arms to their sides and then
extend them again. Therefore, continuous metaphors like “autofire” are currently not
possible.
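The two rules in this section can be combined into a small per-player filter. The first rule ignores box growth while the mass center is moving quickly. The second lets each gesture fire only once until the arms are lowered. In this sketch, `candidate` is whatever per-frame gesture guess the bounding-box test produces. `max_step` and `shrink_limit` are hypothetical pixel values, and treating a sudden shrink of the box as "arms back at the sides" is an assumption.

```python
import math
from typing import Optional, Tuple


class GestureTrigger:
    """Turn per-frame gesture guesses into one-shot events for one player."""

    def __init__(self, max_step: float = 8.0, shrink_limit: float = 10.0):
        self.max_step = max_step          # max mass-center shift (px/frame) for a valid gesture
        self.shrink_limit = shrink_limit  # box shrink (px/frame) read as "arms lowered"
        self.prev_center: Optional[Tuple[float, float]] = None
        self.prev_size: Optional[Tuple[float, float]] = None
        self.armed = True

    def update(self, center: Tuple[float, float], size: Tuple[float, float],
               candidate: Optional[str]) -> Optional[str]:
        """center: mass center; size: (width, height) of the box; candidate: guess or None."""
        moved = (self.prev_center is not None
                 and math.dist(center, self.prev_center) > self.max_step)
        shrank = (self.prev_size is not None
                  and (self.prev_size[0] - size[0] > self.shrink_limit
                       or self.prev_size[1] - size[1] > self.shrink_limit))
        self.prev_center, self.prev_size = center, size
        if shrank:
            self.armed = True   # arms back at the sides: the next extension may fire
            return None
        if candidate is not None and self.armed and not moved:
            self.armed = False  # fire once; holding the arm out never repeats it
            return candidate
        return None
```

The latch also prevents a single arm extension that spans several frames from firing more than once. A walking-induced false candidate leaves the trigger armed.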
3.3 Approximating a Contour with a Line
We also tested an alternative method for a more detailed player character approximation.
First, we thresholded the image and located all the separate contours. A bounding box
was then determined for each contour. Finally, a line was fit inside the player contour
using the least-squares method. The length of the line was determined to be the minimum
width and height of the contour bounding box.
Line position and length can then be used to approximate the player character. This has a
natural advantage over the simple mass center or bounding box approach, as it also gives
the player a direction. This is useful, for example, if players are required to extend their
arms sideways and we need to know the direction in which the player faces. Figure 2
shows the proposed method in action. The image on the left is the actual camera image;
the image on the right shows the thresholded image with the best-fitting line drawn inside
the player’s contour.
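A line fit of this kind can be sketched as follows. The article says only that a least-squares method was used. This sketch assumes an orthogonal (total) least-squares fit, which places the line through the contour's mass center along the principal axis of its pixels. Unlike fitting y as a function of x, this approach also handles near-vertical lines. The line length is passed in by the caller, since the article derives it from the contour's bounding box.

```python
import math
from typing import List, Tuple


def fit_line(pixels: List[Tuple[int, int]], length: float):
    """Fit a line to a player's pixels; return (end_a, end_b, angle_in_radians).

    The line passes through the mass center along the principal axis, whose
    orientation follows from the second central moments of the pixel cloud.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    sxx = sum((x - cx) ** 2 for x, _ in pixels)
    syy = sum((y - cy) ** 2 for _, y in pixels)
    sxy = sum((x - cx) * (y - cy) for x, y in pixels)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # principal-axis orientation
    dx = math.cos(angle) * length / 2
    dy = math.sin(angle) * length / 2
    return (cx - dx, cy - dy), (cx + dx, cy + dy), angle
```

The fitted axis is defined only up to 180 degrees. Deciding which of the two opposite directions the player faces therefore needs an additional cue.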
This scheme also allows people to co-operate by connecting their body parts so that
the camera sees them as one contour. People moving as a big group create a "bigger"
player; naturally, this slows down their reaction times. However, when the players
combine their body parts, tracking gestures becomes very hard. Also, as the number
of detected players can then change during the game, tracking the players through time
becomes nearly impossible. Contour approximation with a line is therefore better suited to
games where gestures are not used and tracking individual players is not required.
Fig. 2. Fitting a line inside a player’s contour.
Fig. 3. Global contour matching against the shape of a circle.
3.4 Group Dynamics
In our previous work we concentrated on tracking and analyzing individual players.
However, for co-operative games or games where there are specific teams competing
against each other, there is a need to investigate global group dynamics.
In our approach, we concentrated on analyzing the shapes formed by the players. We
first determined the global mass center of all the players, divided the circle into 32 slices,
and, for each slice, determined the longest possible line that can be drawn outwards from
the mass center. Each line ends at the last pixel of the player contour found in its
direction. Finally, by analyzing the differences in line length, we can
determine the global shape of the players. For example, if players are required to form a
circle, all the line lengths should be nearly equal. Figure 3 shows an example of the
global contour matching. The image on the left shows the camera input, and the image on
the right shows the extracted global mass center and the line lengths calculated in
all 32 directions.
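The 32-direction measurement can be sketched as a ray cast from the global mass center. The sketch assumes the input is the binary player mask and that rays are sampled every half pixel. It judges "nearly equal" by the coefficient of variation (standard deviation divided by mean) of the line lengths, compared against a hypothetical tolerance of 0.15.

```python
import math
from typing import List


def radial_profile(mask: List[List[int]], slices: int = 32, step: float = 0.5) -> List[float]:
    """For each of `slices` directions, cast a ray from the global mass center of all
    player pixels and return the distance to the last player pixel met on that ray."""
    h, w = len(mask), len(mask[0])
    pts = [(x, y) for y in range(h) for x in range(w) if mask[y][x]]
    if not pts:
        return [0.0] * slices
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    lengths = []
    for k in range(slices):
        theta = 2 * math.pi * k / slices
        ux, uy = math.cos(theta), math.sin(theta)
        r, last = 0.0, 0.0
        while True:
            x = math.floor(cx + r * ux + 0.5)  # nearest pixel on the ray
            y = math.floor(cy + r * uy + 0.5)
            if not (0 <= x < w and 0 <= y < h):
                break                          # the ray has left the camera image
            if mask[y][x]:
                last = r                       # keep the farthest hit so far
            r += step
        lengths.append(last)
    return lengths


def is_circle_like(lengths: List[float], tolerance: float = 0.15) -> bool:
    """Players form a circle when all ray lengths are nearly equal (small std / mean)."""
    mean = sum(lengths) / len(lengths)
    if mean == 0:
        return False
    std = math.sqrt(sum((v - mean) ** 2 for v in lengths) / len(lengths))
    return std / mean < tolerance
```

Other target shapes could be matched the same way by comparing the 32 lengths against a stored template profile. The article demonstrates only the circle case.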
4. THE TEST SYSTEM
Our test system was built into an experimental classroom at the Telecommunications
Software and Multimedia Laboratory (TML). An analog CCD surveillance camera was
mounted pointing downwards at a height of 5 meters in the middle of the room. Using a wide-angle
lens, the camera image covered an 8.3 x 6.8 meter playing area at 25 frames per
second. The game was shown on a white screen on one wall of the room, in front
of the playing area. The dimensions of the white screen were 8.5 x 3.5 meters.
The image of the game was projected on the screen by two adjacent video projectors.
The display of the gaming computer was set to a 1024x768-pixel dual view, giving
a game view resolution of 2048x768 pixels on the screen. As the
camera’s aspect ratio was a normal 4:3, some scaling was required. The camera view was
scaled to the screen so that a player standing on the playing area in front of the screen
would see his player character on screen at a horizontally matching position. The test
setup is shown in Figure 4, where four players are playing a body-controlled game. The
camera is located above the playing area and is not visible in the image.
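The non-uniform scaling from camera to screen amounts to two independent scale factors. The screen size and floor dimensions below come from the text. The camera resolution is not given, so the 768x576 value (a common PAL capture size with a 4:3 aspect ratio) is an assumption. Radial distortion is ignored, as it was in the system itself.

```python
from typing import Tuple

CAMERA_SIZE = (768, 576)   # assumed PAL capture resolution (4:3); not given in the article
SCREEN_SIZE = (2048, 768)  # two 1024x768 projector images side by side
FLOOR_SIZE_M = (8.3, 6.8)  # playing area covered by the camera, in meters


def camera_to_screen(x: float, y: float) -> Tuple[float, float]:
    """Stretch the 4:3 camera image onto the 8:3 screen with separate x and y factors,
    so a player's horizontal floor position lines up with the character's screen x."""
    return x * SCREEN_SIZE[0] / CAMERA_SIZE[0], y * SCREEN_SIZE[1] / CAMERA_SIZE[1]


def camera_to_floor(x: float, y: float) -> Tuple[float, float]:
    """Approximate floor position in meters (radial distortion ignored)."""
    return x * FLOOR_SIZE_M[0] / CAMERA_SIZE[0], y * FLOOR_SIZE_M[1] / CAMERA_SIZE[1]


print(camera_to_screen(384, 288))  # (1024.0, 384.0): image center maps to screen center
```

With the assumed 768-pixel image width, each camera pixel covers roughly 8.3 m / 768, or about 1.1 cm of floor horizontally.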
The software was coded in C++ on a Windows platform. The computer vision
component was developed on top of the video-processing platform EyesWeb
[http://www.eyesweb.org/], using Intel Integrated Performance Primitives (Intel IPP)
[http://www.intel.com/software/products/ipp/] and the Open Source Computer Vision
Library [http://www.intel.com/technology/computing/opencv/] for additional image
processing. The 3D graphics were rendered with OpenGL, and audio effects were
produced with the FMOD audio engine [http://www.fmod.org/] and the Simple DirectMedia
Layer [http://www.libsdl.org/].
The computer system was a 3.2 GHz Intel Pentium 4. Both the computer vision
component and the multiplayer game engine were developed as modular components
to allow easier system maintenance and upgrades, as well as future research.
Fig. 4. Four players play a body-controlled game.
5. EVALUATION STUDY
Designing body-driven games is not an easy task, and multiplayer issues make it even
harder. Therefore, we wanted to evaluate the concepts of body-driven multiplayer games
in order to find some common rules for creating interactive and entertaining applications.
With that goal in mind, we designed and implemented several games and conducted a
study to evaluate general guidelines and design issues. All the games were created
through an iterative process where user evaluation was part of the design and
development process.
5.1 Human-Centered Design
All our body-driven games were created using the human-centered design (HCD)
approach [ISO 1999]. HCD was chosen because it makes the most efficient use of the
opinions of the end users of the system during the development process. The users do not
merely participate in the usability evaluation, but are an essential part of the design
process from its early stages.
5.1.1 Context. Our game system was designed to be easily portable and generally
independent of location. Our system is clearly unsuitable for home use, unless there is a
large empty space with a very high ceiling. But any kind of public space containing
a 5 x 5 x 5 meter cube of free space could be suitable. Otherwise idle empty spaces could be
turned into interactive play spaces, where players could walk in, play for a while,
and then leave. Hence we did not fix the length of a playing session; a game could be played
for 5 minutes or an hour (which has to be taken into account when designing the games).
We also did not fix the number of players, but the games had to be playable by anywhere from 2 to 6
players simultaneously. We implemented our games so that 10 was the maximum number
of players.
5.1.2 The Target Group. Since the playing place was not fixed to any specific location,
it was decided that the target group should not be limited either. Age and sex should not
significantly affect successful participation in the game. All the games
were designed for a wide age range, from 10-year-olds to adults. The players’ experience
with computer games could also vary considerably. Our aim was to create a system that
does not require any previous experience of body-driven computer games.
In general, we assume that the players know each other, at least to some extent. Games
that depend on co-operation benefit when players are familiar with each other. However,
complete strangers should also be able to play the game together, and designers must take
this into account.
5.2 Expert Evaluation
The aim of the expert evaluation was to test the functionality of the games and the
technical concept as an iterative process. A group of TML staff members worked as
expert evaluators and tried to locate system faults and areas requiring further
development. It was very important to find the most crucial faults before evaluation by
the end users, since malfunctions could greatly influence the end users’ gaming
experience. Every malfunction and proposed new or changed feature was carefully noted.
We had three classifications of faults and their priorities:
(1) cosmetic problems, low priority;
(2) usability problems that prevent players from reaching their goals, medium
priority; and
(3) functional problems, to be fixed immediately.
As part of the HCD process, the results from the expert evaluation were used iteratively
during game design and implementation. The gestures used in the games were the main
topic of discussion (in several games, many different gesture variations were tested).
Also, game speed and some basic game logic were fixed according to feedback from the
expert evaluation group.
5.3 End User Evaluation
We carried out end user evaluations during two periods: the first in
autumn 2004 and the second in spring 2005. In total, we had 32 test subjects, both male
and female. User age ranged from 10 to 45. All games were played with varying
numbers of players, typically 2 to 4 players at the same time. During game play,
evaluators observed the game and the players’ actions. Afterwards, all players were given
semi-structured interviews. The major topics for evaluation were
• playability and usability,
• collaboration and shared experience,
• aerobic goals, and
• entertainment value.
5.4 Test Games
The goal of our test games was to find out what kind of games would be best to play
with our proposed system. All the test games that were implemented were quite simple.
In this phase, we were not interested in making graphically and technically complex
games, but wanted to focus on the essence of the game logic. For this purpose, we decided
to divide the games into four general genres:
• all players play individually towards their own goals (“all vs. all”);
• all players play individually towards a common goal (“all vs. computer”);
• two teams play against each other (“team vs. team”); and
• all players play as one team with a common goal (“team vs. computer”).
One test game was created for each genre. Our research issue was to discover the
major differences and similarities among the different genres. Such a limited
set of games cannot yield statistically significant results, but some basic conclusions can be
drawn.
The location of the virtual camera, that is, the direction and angle from which the
game is shown, is another important issue. The location of the camera is important for
creating games in both 2D (where games are shown from the top and side views) and 3D.
In addition, we also wanted to test whether it is beneficial to use real video images from
the top-view camera as part of the game.
5.4.1 The Body-Driven Bomberman. The first game to be implemented was a body-controlled
version of the classic Bomberman from Hudson Soft
[http://www.hudson.co.jp/]. The game idea is very simple: player characters move in a
labyrinth and drop bombs which explode after a short period. The object of the game is
to bomb the other players and avoid being bombed yourself. The game also includes a
few computer-controlled enemies and some breakable walls that reveal power-ups such
as a more effective blast radius.
The original Bomberman game allowed a maximum of 4 simultaneous players, while
our game allows 10 players. Player characters move on the game screen according to
player positions on the playing area. The target position of the player character, i.e., the
player’s physical position, is marked with a small dot to ease control in case the player
character gets caught behind a wall. The “extend arm(s) forward” gesture is used to drop
a bomb. Each player has only one life per round. When a player is killed, his player
character is removed from the screen. Eliminated players can, however, remain standing
in the playing area during the rest of the round. After all but one player has been
eliminated, the game restarts.
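The article does not specify how a player character chases its target position through the labyrinth. One simple possibility, sketched here purely as an assumption, moves the character one grid cell per update toward the cell under the player. It tries the axis with the larger gap first and stays put when both useful moves are blocked. That stuck case is exactly when the on-screen target dot helps the player. The level layout is hypothetical and is assumed to be enclosed by walls.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (column, row)


def step_towards(grid: List[str], cell: Cell, target: Cell) -> Cell:
    """Move the character one cell toward the target cell (the player's position).

    '#' marks a wall. Walls are never crossed; if both useful moves are blocked,
    the character waits where it is.
    """
    (c, r), (tc, tr) = cell, target
    dc = (tc > c) - (tc < c)  # -1, 0 or +1
    dr = (tr > r) - (tr < r)
    moves = [(dc, 0), (0, dr)]
    if abs(tr - r) > abs(tc - c):
        moves.reverse()       # try the axis with the larger gap first
    for mc, mr in moves:
        if (mc or mr) and grid[r + mr][c + mc] != "#":
            return (c + mc, r + mr)
    return cell


level = ["#######",
         "#.....#",
         "#.#.#.#",
         "#.....#",
         "#######"]
print(step_towards(level, (2, 1), (2, 3)))  # (2, 1): a wall blocks the way down
```

A greedy rule like this can get stuck behind walls, which matches the situation the target dot is meant to address. A path-finding search would avoid it but would move the character in ways the player did not ask for.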
Figure 5 shows a typical game in progress. The lower picture is the actual camera
view from the playing area; the corresponding action on the game screen is shown on the
top portion of the figure. For better visibility, the camera view was stretched to match the
actual game screen (from aspect ratio 4:3 to 8:3).
5.4.2 The Body-Driven Bubble Bobble. Our second test game was a body-driven remake
of the famous arcade classic "Bubble Bobble," from Taito Corporation
[http://www.taito.co.jp/]. In the original game, the players control two bubble-breathing
baby-dragons who must get past 100 different levels in order to save their girlfriends. All
the levels are filled with an assortment of computer-controlled enemies. The dragons can
blow bubbles to capture the enemies inside them. To get rid of an enemy, the dragons
must first capture the enemy inside a bubble and then jump on the bubble to burst it.
Eliminated enemies turn into bonus items, giving players a higher score or various
special abilities like temporary invulnerability or fireball bubbles. The dragons can also
jump to reach higher platforms.
Our version of Bubble Bobble was designed to support and encourage co-operation.
The players could not kill each other, and on some levels the only way to clear the level
is for players to work together. After initial tests, we made one more change
to further encourage co-operation between the players: we added the possibility for a
player to capture not just enemies but also other players inside a bubble. The player
still cannot kill other players, and a player inside a bubble can easily break free by
making a jumping or firing gesture. This bubble travel technique can be used to reach
higher platforms that cannot be reached by jumping, but it can also be used
to slow down other players when racing towards a bonus item.
Fig. 5. Four players play Body-Driven Bomberman; the lower image is the actual camera view from the playing
area (stretched to match the actual game screen); the corresponding action on the game screen is shown
on top.
Fig. 6. Four players play Body-Driven Bubble Bobble; the lower image is the actual camera view from the
playing area (stretched to match the actual game screen); the corresponding action on the game screen is
shown on top.
Fig. 7. Four players play Body-Driven Pong; the lower image is the actual camera view from the playing area
(stretched to match the actual game screen); the corresponding action on the game screen is shown on top.
Fig. 8. Four players play Body-Driven Firemen; the game image shows the actual camera view with 3D
objects embedded.
We reimplemented all 100 levels of the original game. As the game view
aspect ratio in our implementation is very different from that of the original game, the levels were
stretched heavily in the horizontal direction. Some levels also had to be slightly modified to
allow more flexible multiplayer play with up to 10 players.
As the game is shown in side view, a player’s control over his character’s
movements through his own physical movement is limited, and play relies heavily on gestures.
A player’s vertical movements on the playing area (toward or away from the screen) have no effect on the
character’s movement. The player character on the game screen tries to reach the player’s
horizontal position on the playing area, but naturally cannot pass through walls. The
player character’s target position, that is, the player’s horizontal position, is again marked
with a small dot to ease control.
Defining suitable gestures for firing bubbles and jumping proved difficult. At first we
used the “extend arm(s) forward” gesture for firing bubbles and the “extend arm(s)
horizontally” gesture for jumping, but this combination turned out to be too hard to use.
When a player extended his arms vertically, his mass center often shifted slightly to the
horizontal, which caused random character movement during the firing gesture and
randomized the firing direction as well. After trying out several variations, we finally
settled on “extend arm(s) forward” for jumping, “extend left arm horizontally” for firing
bubbles to the left, and “extend right arm horizontally” for firing bubbles to the right.
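The final gesture mapping can be sketched as a small classifier. This is a hypothetical illustration: it assumes that the gesture detector already reports the offset of an extended hand from the player's mass center, with +x to the player's right and +y toward the white screen, and the 45° sector boundaries are likewise assumed rather than taken from the system.

```python
import math
from enum import Enum
from typing import Optional


class Action(Enum):
    JUMP = "jump"
    FIRE_LEFT = "fire_left"
    FIRE_RIGHT = "fire_right"


def classify_gesture(dx: float, dy: float) -> Optional[Action]:
    """Map an extended arm to an action using the final gesture set.

    (dx, dy) is the hypothetical offset of the extended hand from the
    mass center: +x is the player's right, +y points toward the screen.
    """
    angle = math.degrees(math.atan2(dy, dx))  # 0 = right, 90 = forward
    if 45 <= angle <= 135:
        return Action.JUMP        # arm(s) extended forward
    if -45 <= angle < 45:
        return Action.FIRE_RIGHT  # right arm extended horizontally
    if angle > 135 or angle <= -135:
        return Action.FIRE_LEFT   # left arm extended horizontally
    return None                   # arm pointing backward: no action
```

Separating the firing direction into two distinct sideways gestures means the direction no longer depends on small, noisy shifts of the mass center.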
We also changed the original game logic concerning a player’s death. To keep the
multiplayer game fun and interesting, the players were made immortal; to keep it
challenging, a character’s death was instead punished with a decrease in score. In the
original game, the player was also returned to the starting point after dying, but in our
version this turned out to be frustrating: the playing area was so big that the player
easily lost track of his character when it was suddenly moved to a new location. We
therefore adopted a scheme in which a player’s death simply immobilized the character for
a second, after which it continued from the same spot and stayed invincible for a moment.
Figure 6 shows an example view of the game screen along with the corresponding camera
frame. The camera view has been stretched to match the actual game screen.
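The death-handling scheme above can be sketched as follows. Only the one-second freeze comes from the text; the length of the invincibility period and the size of the score penalty are hypothetical values, and time is assumed to be given in seconds.

```python
from dataclasses import dataclass

FREEZE_SECONDS = 1.0         # from the text: immobilized for a second
INVULNERABLE_SECONDS = 2.0   # assumption: "a moment" is not quantified
DEATH_PENALTY = 100          # assumption: size of the score decrease


@dataclass
class Character:
    score: int = 0
    frozen_until: float = 0.0
    invulnerable_until: float = 0.0

    def can_move(self, now: float) -> bool:
        return now >= self.frozen_until

    def hit(self, now: float) -> bool:
        """Handle a lethal hit at time `now`; return True if it counted."""
        if now < self.invulnerable_until:
            return False  # still protected after the previous death
        self.score -= DEATH_PENALTY
        # No respawn: the character stays where it died, frozen briefly,
        # and remains invincible for a while after it can move again.
        self.frozen_until = now + FREEZE_SECONDS
        self.invulnerable_until = self.frozen_until + INVULNERABLE_SECONDS
        return True
```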
5.4.1 Body-Driven Pong. As the third test game, we decided to create a multiplayer
version of the classic Pong game. Pong is essentially a highly simplified version of
tennis: the players control bats located on opposite sides of the game screen and try to
prevent the ball from leaving the field on their side. Every time the ball collides with a
bat, its speed increases.
The original Pong was a two-player game, but in our body-driven version we decided
to allow multiple players (up to 10) to participate. Each player controls a bat whose
location is determined by the player’s contour mass center; players extend their hands to
control the angle of their bats, as described in Section 3.3. Every time the ball hits a
player’s bat, it bounces back in the opposite direction. The players are allowed to move
freely on the game field but are automatically divided into two teams: players located on
one half of the playing area make up the first team, and players on the other half form
the second team. The game can therefore also be played with an odd number of players.
The game ends when one side has scored 20 points.
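The team rule can be expressed in a few lines. In this sketch the coordinate separating the two halves, the area width, and the function names are hypothetical; because teams are recomputed from positions in every frame, a player who crosses the midline simply changes team.

```python
WINNING_SCORE = 20  # from the text


def assign_teams(xs: list[float], area_width: float) -> dict[int, list[int]]:
    """Split players into two teams by which half of the area they stand in.

    xs[i] is player i's mass-center coordinate along the axis that
    separates the two sides (hypothetical units, 0..area_width).
    """
    teams: dict[int, list[int]] = {0: [], 1: []}
    for i, x in enumerate(xs):
        teams[0 if x < area_width / 2 else 1].append(i)
    return teams


def game_over(scores: tuple[int, int]) -> bool:
    """The game ends when either side reaches the winning score."""
    return max(scores) >= WINNING_SCORE
```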
The game is rendered with 3D graphics. Figure 7 shows a typical game in progress; the
lower image is the actual camera view from the playing area, and the corresponding action
on the game screen is shown at the top of the figure. The camera view has been stretched
to match the actual game screen.
Virtual camera location and orientation. As the aspect ratio of our game screen on
the white screen is not the usual 4:3 but 8:3, it would seem natural for the main action
of the games to take place along the horizontal axis. However, as the games still
require the players to constantly watch the screen in front of them, moving sideways
most of the time can be confusing. The direction of the action can have a huge impact on
the usability of a game.
To explore the advantages and disadvantages of each direction, we created two versions
of Pong: one in which the game was played along the horizontal axis and one in which it
was played along the vertical axis. During testing, each group was asked to play both the
vertical and the horizontal version, and the players were interviewed about the pros and
cons of both versions after they finished playing.
Because the players are required to constantly watch the white screen while playing
and moving, the angle of the virtual camera is an especially important factor in the
game’s usability. If the 3D game world is shown from the wrong angle, the players easily
become confused, which significantly degrades the usability of the game.
We therefore set up another test. We modified the game so that the virtual camera angle
could be easily adjusted during the game. The possible angles were 0°, 15°, 30°, 45°, 60°,
75°, and 90°, where 0° meant that the game was viewed straight from above and 90° that it
was viewed entirely from the side. During testing, each group was asked to play a session
with each of these angles. The test was repeated for both the horizontal and the vertical
version of Pong. After all the angles had been tested, each player was asked which angle
was the most suitable.
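The tested camera angles can be sketched as positions on a quarter circle around the center of the game field. This assumes a hypothetical coordinate system with z pointing up, a camera that orbits in the y-z plane at a fixed distance, and a separate look-at step (not shown) that keeps the camera aimed at the center.

```python
import math


def camera_position(angle_deg: float, distance: float,
                    center=(0.0, 0.0, 0.0)) -> tuple[float, float, float]:
    """Place a virtual camera that views `center` from `angle_deg`.

    0 degrees = straight down from above, 90 degrees = level with the
    field, seen from the side (here: from the -y direction).
    """
    a = math.radians(angle_deg)
    cx, cy, cz = center
    return (cx, cy - distance * math.sin(a), cz + distance * math.cos(a))


TEST_ANGLES = [0, 15, 30, 45, 60, 75, 90]  # angles used in the test sessions

for angle in TEST_ANGLES:
    x, y, z = camera_position(angle, 10.0)
    print(f"{angle:2d} deg -> eye at ({x:.1f}, {y:.1f}, {z:.1f})")
```

Keeping the distance fixed means only the viewing angle changes between sessions, so players compare angles rather than zoom levels.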
5.4.2 Body-Driven Firemen. In our fourth test game, we wanted to study group
dynamics, as discussed in Section 3.4. We therefore created a co-operative game in which
all players were required to act as a group and form a specific shape. To help the players
work together as a group, they could see their own real video images, with 3D objects
drawn on top of the images.
The players act as a team of firemen whose task is to catch people jumping from a
burning building. The game is played from the viewpoint of our top-view camera. The
falling people appear as animated 3D characters accelerating toward the ground. To save
them, the players must move under the characters and form a circle. As the circle forms,
a computer-generated circular net is drawn on top of the players. The size of the net is
determined by the size of the players’ circle and its integrity. If the players manage to
catch a 3D character before it hits the ground, the character is saved; otherwise it hits
the ground with a painful sound effect. If the players keep saving characters, the game
gradually gets harder: after a while the characters start to fall with a twist, and
finally several characters fall at the same time. The 3D characters keep jumping from the
building until the players have missed 5 of them.
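The text does not specify how the net's size is derived from the circle and its integrity. One plausible rule, given here purely as a hypothetical sketch, centers the net on the players' centroid, takes the mean player distance from it as the base radius, and shrinks that radius by the relative spread of the distances; a falling character is saved if it lands inside the net.

```python
import math


def net_from_players(positions: list[tuple[float, float]]):
    """Estimate a circular net from the players' floor positions.

    Hypothetical rule: integrity = 1 - (std of radii / mean radius),
    and the net radius is the mean radius scaled by that integrity.
    """
    if len(positions) < 3:
        return None  # too few players to form a circle
    cx = sum(x for x, _ in positions) / len(positions)
    cy = sum(y for _, y in positions) / len(positions)
    radii = [math.hypot(x - cx, y - cy) for x, y in positions]
    mean_r = sum(radii) / len(radii)
    if mean_r == 0:
        return None
    spread = math.sqrt(sum((r - mean_r) ** 2 for r in radii) / len(radii))
    integrity = max(0.0, 1.0 - spread / mean_r)
    return (cx, cy), mean_r * integrity


def is_caught(landing: tuple[float, float], net) -> bool:
    """A character is saved if its landing point lies inside the net."""
    if net is None:
        return False
    (cx, cy), radius = net
    return math.hypot(landing[0] - cx, landing[1] - cy) <= radius
```

Under this rule a perfectly round circle keeps its full radius, while a lopsided one yields a smaller net, which rewards the group for forming a clean shape.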
Because this game required a real camera view, we could not use our normal scheme of
stretching the camera image to match the width of the white screen. Hence, player
locations were extracted as usual, but the actual game screen was shown unscaled and
centered on the white screen. Figure 8 shows an example view of the actual game screen
with 3D objects drawn on top of the video image.
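The two display mappings can be contrasted in a short sketch. The camera and screen resolutions in the usage note (640 × 480 and 1600 × 600, an 8:3 screen) are assumptions, and the optional uniform `scale` parameter is an addition for illustration; with `scale=1.0` the image is shown unscaled as described.

```python
def stretched(px, py, cam_w, cam_h, scr_w, scr_h):
    """Usual mapping: the camera image is stretched to fill the screen."""
    return px * scr_w / cam_w, py * scr_h / cam_h


def centered(px, py, cam_w, cam_h, scr_w, scr_h, scale=1.0):
    """Firemen mapping: the image keeps its aspect ratio and is centered.

    scale=1.0 shows the camera image unscaled; other values scale it
    uniformly (hypothetical option), so the video is never distorted.
    """
    off_x = (scr_w - cam_w * scale) / 2
    off_y = (scr_h - cam_h * scale) / 2
    return off_x + px * scale, off_y + py * scale
```

For example, the camera center (320, 240) maps to the screen center (800, 300) under both mappings, whereas the camera's top-left corner maps to (0, 0) when stretched but to (480, 60) when centered. 3D objects must be drawn through the same mapping as the video so that they stay aligned with the players.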
During our initial user tests, our 3D model had static lighting. It soon became
apparent, however, that the falling 3D character had to cast a shadow onto the floor,
since otherwise the players had great difficulty determining where the character was
about to land. The shadow also added realism to the scene.
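A drop shadow of this kind amounts to projecting the character onto the floor plane along the light direction. The sketch below assumes a single directional light and a floor at z = 0; the lighting actually used in the game is not specified.

```python
def project_shadow(p, light_dir):
    """Project point p = (x, y, z) onto the floor plane z = 0.

    light_dir = (lx, ly, lz) is the direction the light travels
    (lz < 0 for a light shining downward). The shadow is where the
    ray p + t * light_dir meets the floor, i.e. t = -z / lz.
    """
    x, y, z = p
    lx, ly, lz = light_dir
    if lz >= 0:
        raise ValueError("light must point downward to cast a floor shadow")
    t = -z / lz
    return (x + t * lx, y + t * ly, 0.0)
```

With a light shining straight down, the shadow lies directly beneath a vertically falling character and therefore marks its landing spot; an oblique light would offset the shadow and could mislead the players.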
6. RESULTS AND DISCUSSION
6.1 Observations
6.1.1 Body-Driven Bomberman. Bomberman was designed as an example of an “all vs.
all” game, that is, a game with no co-operation among the players. However, our version
of Bomberman was found to be somewhat confusing and lost its appeal quite quickly. The
players just ran around the playing area, trying to avoid other players and hoping that
the others would get hit by a bomb blast. The addictive quality of the original arcade
Bomberman did not fully translate to our body-driven remake.
We also encountered a serious problem with random false gestures. As the players ran
around the playing area, the gesture-detection system sometimes detected gestures that
had not been made. This was particularly annoying in Bomberman, where a player could
easily blow himself up by unknowingly dropping a bomb. Another unwanted behavior,
caused by nonoptimal conditions and the primitive tracking system, was that player
characters sometimes switched when the players collided or just moved very close to one
another.
6.1.2 Body-Driven Bubble Bobble. In this game all the players work toward a common
goal. The gameplay itself is straightforward, but finding suitable gestures proved
difficult. After several user tests, however, we found effective variants: extending the
arms forward can easily be linked to jumping, and extending an arm sideways is quite a
natural metaphor for shooting.
Once the gestures were mastered, the gameplay was fun and addictive. As the game is
divided into levels and supports co-operative play, the sense of competition and
progress comes naturally. The game was found to work best with 2 to 4 players, which
leaves enough room to maneuver around the playing area.
This game also suffered from occasional detection of random gestures and switching
of player characters. However, random false gestures were less of a problem here, as the
bubbles are not lethal to the players. The switching of player characters was a more
serious issue, but the players usually agreed to try to switch the characters back to
their original controllers.
6.1.3 Body-Driven Pong. Using extended hands to control the bats was considered
natural, and almost all players understood the logic without instructions. However, if a
player was just standing still without extending his arm, the line-fitting algorithm found
a different angle in each frame. The resulting rapidly changing bat angle was sometimes
confusing. Radial distortion in the corners of the playing area also caused slight
problems for line fitting.
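The jitter has a plausible geometric explanation. The sketch below estimates a contour's orientation from its second-order central moments, a standard way to fit a line through a point set (equivalent to a total least-squares fit through the centroid), and also reports how elongated the set is. Whether this matches the line fitting of Section 3.3 is an assumption. For a player standing with arms at his sides, the top-view contour is nearly round, so the elongation is close to zero and the fitted angle is determined mostly by pixel noise.

```python
import math


def blob_orientation(points):
    """Return (angle in radians, elongation in [0, 1]) of a 2D point set.

    The angle is the principal axis from second-order central moments;
    elongation near 0 means the blob is almost round and the angle is
    poorly defined.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    mu20 = sum((x - mx) ** 2 for x, _ in points) / n
    mu02 = sum((y - my) ** 2 for _, y in points) / n
    mu11 = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
    # Eigenvalues of the covariance matrix: spread along and across the axis.
    common = math.hypot(mu20 - mu02, 2 * mu11) / 2
    lam1 = (mu20 + mu02) / 2 + common
    lam2 = (mu20 + mu02) / 2 - common
    elongation = 0.0 if lam1 == 0 else 1.0 - lam2 / lam1
    return theta, elongation
```

A game could, for instance, keep the previous bat angle whenever the elongation falls below some threshold, though that remedy was not tested here.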
In this game the players could form bigger bats by touching one another. This
sometimes happened unintentionally and annoyed some players, especially if the ball was
missed because of it. Also, some players did not respect the “gentlemen’s rule” of
staying on their own side of the playing area, but went over to the opponents’ side to
distract their play. This often led to physical contact as well, when the players started
pushing each other. The system also allowed the players to switch sides during an ongoing
game, which was beneficial if the teams turned out to be unbalanced.
When playing sideways Pong, the players had to position their bodies so that they
more or less faced either left or right. At the same time, they needed to constantly
watch the game screen, so after a while their necks got sore. Switching sides at frequent
intervals brought some relief, but this remained a serious drawback.
Vertical Pong did not have this problem, but collisions between players became more
frequent. As the players on the team closer to the white screen moved vertically, they had
no idea where the players on the other team were. They could of course try to infer the
other team’s physical location from its bats, but this becomes difficult as the game
speeds up. One solution would be to use two white screens, one in front of the playing
area and one behind it. The two screens would show the game from opposite angles,
allowing both teams to face a white screen and also see the other team head-on.
Results from the inquiry about the optimal virtual camera angle were not statistically
significant. As expected, when given the chance to play a 3D game, not a single person
voted for the 0° or 90° angle. Preferences varied considerably, although most people
preferred values near 45°. All in all, our tests show that our game system is also
suitable for 3D games.
6.1.4 Body-Driven Firemen. The use of a real video image in Firemen was greeted with
enthusiasm by the players, as they could see themselves moving on the playing area.
However, the viewing angle was a bit confusing at first, as few people have actually
seen themselves from above! But as soon as they got used to the camera angle, they were
able to concentrate fully on the game.
At the beginning of the game, the players were told to form a circle. No other
instructions were given. Then the 3D figures started falling from the sky. The initial
reaction of almost all groups was to stand still and watch them fall. Only after one or
several figures had hit the ground did someone (usually one or two of the leaders in the
group) figure out the object of the game and try to get the other players to move
together to catch the falling figures. (In several groups, one or two people emerged as
group leaders and shouted instructions to the others.)
In some groups, especially those made up of complete strangers, there was some
discomfort. It was not easy for complete strangers to suddenly hold hands and move
around together. In these cases the players usually preferred to just hold their hands close
to those of the others but not to actually touch; this is probably a cultural issue.
The computer-generated circle drawn on top of the players was seen as a great help.
Some players complained, however, that when the circle was smaller than it could have
been, the game gave no indication of why. Additional information about the shape, such
as a red line pointing in the direction in which the circle deviates most, could perhaps
ease this difficulty.
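The suggested hint could be computed along the following lines. This is a hypothetical sketch that treats the players' centroid as the circle's center and points the red line toward the player whose distance from the center differs most from the mean radius; the sign of the deviation tells whether that player should move inward (positive) or outward (negative).

```python
import math


def worst_direction(positions):
    """Return (unit direction, signed deviation) of the most out-of-place player.

    Deviation = player's distance from the centroid minus the mean
    distance; positive means the player stands too far out.
    """
    n = len(positions)
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    radii = [math.hypot(x - cx, y - cy) for x, y in positions]
    mean_r = sum(radii) / n
    i = max(range(n), key=lambda k: abs(radii[k] - mean_r))
    x, y = positions[i]
    if radii[i] == 0:
        return (0.0, 0.0), -mean_r  # player at the center: no direction
    return ((x - cx) / radii[i], (y - cy) / radii[i]), radii[i] - mean_r
```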
One of the most frequent comments was also related to visibility. As the players are
required to form a circle, some of them inevitably face not the white screen but the
opposite wall. These players must either keep looking over their shoulders or follow the
instructions of players who can see the white screen; most did both. Some groups rotated
the circle so that everyone had a chance to see. All in all, group behavior when
targeting a common goal in body-driven games is an interesting issue and needs further
research.
6.2 Usability and Playability
Due to the non-optimal lighting conditions and the primitive tracking system, the following
three kinds of usability errors were expected, and were indeed encountered:
• correct gestures were not recognized;
• false gestures were recognized; and
• player characters were switched during a game.
Gestures that went unrecognized were regarded as a minor usability problem, as the
players usually performed the gestures again with larger movements or at a faster speed.
Random false gestures were a problem in several games, as our simple tracking system
cannot reliably tell whether a player is moving or performing a gesture. Games should
therefore be designed so that random gestures cause no significant problems for
gameplay. Multiple cameras could be used to locate the players and their gestures more
accurately, but we wanted to keep the hardware setup as simple as possible so that the
system would stay robust and portable.
Player character switches sometimes occurred when the players collided or just
moved very close to one another. Such switches are probably inevitable, as our current
real-time system cannot identify each person correctly at all times. In Pong, we avoided
the issue by automatically dividing the players into two groups according to their
location on the playing field. Similar location-based solutions could be used in other
games as well.
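The tracker itself is not described in this section. The sketch below assumes a simple greedy nearest-neighbor association of detected blobs to player identities, which may differ from the system's actual tracker, to illustrate how a switch can occur when two players pass close to each other between frames.

```python
import math


def associate(tracks: dict[str, tuple[float, float]],
              blobs: list[tuple[float, float]]) -> dict[str, tuple[float, float]]:
    """Greedy nearest-neighbor association of detected blobs to player tracks.

    Pairs are assigned in order of increasing distance; each track and
    each blob is used at most once. Tracks without a blob keep their
    last known position.
    """
    pairs = sorted(
        (math.dist(pos, blob), tid, j)
        for tid, pos in tracks.items()
        for j, blob in enumerate(blobs)
    )
    updated = dict(tracks)
    used_tracks, used_blobs = set(), set()
    for _, tid, j in pairs:
        if tid in used_tracks or j in used_blobs:
            continue
        updated[tid] = blobs[j]
        used_tracks.add(tid)
        used_blobs.add(j)
    return updated
```

For instance, with track A at (0, 0), track B at (1, 0), and blobs detected at (0.6, 0) and (0.4, 0) after the two players have crossed paths, the tracker assigns A to (0.4, 0) and B to (0.6, 0): each identity follows the nearest blob, so the characters are swapped. Position-only association cannot avoid this, which is why location-based team rules like the one in Pong are attractive.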
Camera drivers, image processing, and display double-buffering cause a slight delay
between the player’s gesture and the actual response in the game. Further delay is caused
by our gesture-recognition scheme, which detects a gesture only after it has already been
partially performed. With the current games and hardware, the maximum delay was
measured at approximately 0.5 to 0.9 seconds, which was still regarded as acceptable.
Naturally, this may become an issue if the games become more computationally
expensive.
Based on our observations, we believe that games should not require complex
movements and gesture combinations. Nor should they require players to position
themselves with pixel-level accuracy or perform actions under very tight timing.
6.3 Collaboration and Shared Experience
In the proposed game system the visual and audio displays are shared. The players hear
the same sounds and look at the game screen from almost the same point of view. This
allows them to communicate freely during a game. In addition, the individual playing
space is not limited, so it provides for natural social interactions among players.
As players familiarized themselves with the games and the pace grew more intense,
physical and emotional interactions among the players inevitably began sooner or later,
including pushing and blocking other players, as well as shouting insults or cheers.
Games of a collaborative nature seem to be more entertaining than games with only
individual goals. Global collaborative goals combined with individual competition seem
to be the most addictive to players. Our experiments with Pong show that team games
can have usability problems when the teams are “face to face” and the playing area is
limited. Adding more game screens would ease this issue.
The Firemen game was an interesting experiment, as it required all players to co-operate, which often led to spontaneous reactions among the players. All in all, the group
behavior when targeting a common goal in body-driven games is an interesting issue and
needs further research.
6.4 Aerobic Goals
Body-driven games also seem to satisfy aerobic goals. After playing a body-driven game
for half an hour, almost all the players reported that their arms and legs were sore; after
an hour, they were exhausted. Fortunately, in some games the players could also use their
legs to perform gestures, as the system does not distinguish between limbs. This allowed
the players to rest their arms without interrupting the game. The players’ exhaustion must
also be taken into account: a player should be able to drop out of the game or switch
places with a spectator without interrupting the general gameplay.
6.5 Entertainment Value
All the body-driven games can make gaming a spectator sport, as long as the spectators
stay out of camera view. Our experiments show that spectators shouting at and cheering
the players increase the pace of the game and the level of entertainment. The transition
from spectator to active player should also be as smooth as possible: a group of players
should be able to switch places with another group without interrupting the gameplay.
For example, in a few test sessions of Bubble Bobble, players managed to complete all 100
levels by switching the currently active group of players. When one group got tired,
another took its place and continued the game from the same situation. In this way the
collaborative experience extended beyond a single group of players to a much larger
audience.
7. CONCLUSIONS AND FUTURE WORK
We have presented the design and evaluation of a body-driven multiplayer game system,
together with the results and observations from testing several games. The system we
introduce is low-cost and portable, requiring only one top-view camera, a video
projector, and a computer. Our game system is generally independent of location. The
only requirement is a relatively clear and properly lit 5 x 5 x 5 meter area with the
possibility of projecting the game image on some flat surface nearby. The system is
clearly not suitable for home use, but any kind of empty public place could be turned into
an interactive entertainment space.
Our system involves not only human-computer interactions but human-human
interactions among the players and the audience as well. The players can collaborate or
compete with each other in the same virtual and physical space. Our game system can
also make body-driven games a great spectator sport. As long as the bystanders stay out
of camera view, they can watch the players play the game and switch places if the current
players get tired, providing a continuous collaborative experience for a large and
dynamic audience.
The system has been tested with four different body-driven games, each approaching
the multiplayer game concept from a different perspective. Our design guidelines for
developing multiuser body-driven games are as follows:
• Co-operation. Games should encourage players to co-operate in some form,
either “team vs. team” or all working towards a common goal. Pure “all vs. all”
games tend not to be as addictive in the long run.
• Simplicity. Games should not require the players to position themselves with
precise accuracy or perform actions that require very tight timing.
• Intuitiveness. The control metaphors should have reference points to natural
human actions.
• Neutrality. Player switches should not be an issue; players should be allowed to
enter and leave during an ongoing game.
• Scalability. Games should be playable with varying numbers of players; if
possible, those numbers should be allowed to change during a game.
• Robustness. Both player approximation and gesture recognition should be simple
and robust in order to work under varying conditions.
Our observations show that the games can be played from varying virtual camera
angles. Our system is suitable for pure top-view and side-view games as well as 3D
games. Another finding is that the use of a real video image is not mandatory; a top-view
image of a player does not add more value than the side-view image. Naturally, real video
can still be used if the game benefits from it in other ways.
As our gesture-recognition algorithm does not use pixel color information, the ceiling
camera could also be an infrared camera. The playing area could then be kept darker and
lit with dynamic lights. However, this would require more hardware and could
compromise the portability of the system. In addition, real video images could no longer
be used as part of a game.
In the future, we intend to focus more on the group dynamics side of our multiplayer
body-driven games. Spontaneous group actions in several games seem to show that it
would be valuable to investigate group dynamics and interaction issues further. More
research is needed if body-driven multiplayer games are to attract the players’ attention
for a longer period than just a quick test. We believe that emphasizing aerobic goals,
combined with collaborative gaming, might be a promising solution; but this requires
further research.
ACKNOWLEDGMENTS
First of all, we would like to thank all the test participants. We would also like to
acknowledge the contributions and advice given by the staff of TML.
REFERENCES
Web links verified on October 15, 2005.
ART+COM AG. Homepage: http://www.artcom.de/.
BOBICK, A., INTILLE, S., DAVIS, J., BAIRD, F., PINHANEZ, C., CAMPBELL, Y., IVANOV, Y., SHUTTE, A., AND
WILSON, A. 1999. The KidsRoom: A perceptually-based interactive immersive story environment.
PRESENCE: Teleoperators Virtual Environ. 8, 4 (Aug.), 367-391.
EYESWEB. http://www.eyesweb.org/.
EYETOY. http://www.eyetoy.com/.
FMOD. Cross-platform audio engine. http://www.fmod.org/.
HÄMÄLÄINEN, P., ILMONEN, T., HÖYSNIEMI, J., LINDHOLM, M., AND NYKÄNEN, A. 2005. Martial arts in artificial
reality. In Proceedings of the Conference on Human Factors in Computing Systems (CHI 2005, Portland,
OR, April 2-7). ACM, New York. 781-790.
HUDSON SOFT CO. Homepage: http://www.hudson.co.jp/.
INTEL.® Integrated performance primitives. http://www.intel.com/software/products/ipp/.
ISO 1999. Human-centred design processes for interactive systems. ISO 13407. International Standards
Organization.
KEAYS, B. 1998. MetaField maze. MIT Media Lab. http://www.billkeays.com/metaField.htm.
KRUEGER, M. 1983. Artificial Reality. Addison-Wesley, Reading, MA.
LAAKSO, S. 2004. Top-view body-driven multiplayer game framework – A pilot study. In Proceedings of the
Conference on Virtual Systems and Multimedia (VSMM 2004, Ogaki, Japan, Nov. 17-19). IOS Press. 970–979.
MAES, P., PENTLAND, A., BLUMBERG, B., DARRELL, T., BROWN, J., AND YOON, J. 1994. ALIVE: Artificial life
interactive video environment. Intercommunication 7 (1994), 48–49.
MINE, M. 2003. Towards virtual reality for the masses: 10 years of research at Disney's VR studio. In
Proceedings of the Workshop on Virtual Environments 2003 (Zurich, May 22-23). ACM, New York. 11–17.
MOESLUND, T., AND GRANUM, E. 2001. A survey of computer vision-based human motion capture.
Comput. Vision Image Understanding 81, 3, 231–268.
MUELLER, F. 2005. Sports over a distance. In Proceedings of the 3rd International Conference on Pervasive
Computing (Munich, May 8-13).
OPEN SOURCE COMPUTER VISION LIBRARY. http://www.intel.com/technology/computing/opencv/.
PERŠ, J. AND KOVACIC, S. 2000. A system for tracking players in sports games by computer vision. Electrotech.
Rev. J. Electrical Eng. Comput. Sci. 67, 5, 281–288.
ROSSI, M. AND BOZZOLI, A. 1994. Tracking and counting moving people. Tech. Rep. 9404-03, IRST, Italy,
April.
SHERMAN, W. AND CRAIG, A. 2003. Understanding Virtual Reality: Interface, Application and Design. Morgan
Kaufmann, San Francisco.
SIMPLE DIRECTMEDIA LAYER. http://www.libsdl.org/.
SLAMA, C. (ED.). 1980. Manual of Photogrammetry. 4th ed. American Society of Photogrammetry.
SONKA, M., HLAVAC, V., AND BOYLE, R. 1999. Image Processing, Analysis and Machine Vision. 2nd ed.
Brooks/Cole Publishing.
SPARACINO, F. 2001. (Some) computer vision based interfaces for interactive art and entertainment installations.
INTER_FACE Body Boundaries, Anomalie, n.2, Paris.
TAITO CORP. Homepage: http://www.taito.co.jp/.
UTTERBACK, C. Homepage: http://www.camilleutterback.com/.
VÄÄTÄNEN, A., STRÖMBERG, H., AND RÄTY, V. 2001. Nautilus: A game played in interactive virtual space. In
Proceedings of the Conference on the Graphics Interface (Ottawa, June 7-9).
VIVID GROUP. Homepage: http://www.vividgroup.com/.
Received November 2005; accepted February 2006.