Interactive Dancing Game with Real-time Recognition of
Continuous Dance Moves from 3D Human Motion Capture

Jeff K.T. Tang, Jacky C.P. Chan, Howard Leung
Department of Computer Science
City University of Hong Kong
83 Tat Chee Avenue, Kowloon, Hong Kong
+852 2194 2837 / +852 2194 2547 / +852 2788 7234
[email protected], [email protected], [email protected]
ABSTRACT
We have implemented an interactive dancing game using optical 3D motion capture technology. We propose a Progressive Block Matching algorithm to recognize the dance moves performed by the player in real time. This enables a virtual partner to recognize and respond to the player's movement without a noticeable delay. The completion progress of a move is tracked progressively, and the virtual partner's move is rendered in synchronization with the player's current action. Our interactive dancing game contains moves of various difficulty levels that suit both novices and skillful players. Through animating the virtual partner in response to the player's movements, the player becomes immersed in the virtual environment. A user test was performed to obtain a subjective evaluation of our game, and the feedback from the subjects was positive.
Categories and Subject Descriptors
D.5.1 [Multimedia Information Systems]: Animations, Artificial, augmented and virtual realities.
I.2.0 [Artificial Intelligence]: General, Cognitive simulation.
I.2.1 [Applications and Expert Systems]: Games.
I.3.7 [Three-Dimensional Graphics and Realism]: Animation, Virtual reality.

General Terms
Algorithms, Measurement, Performance, Design, Experimentation, Human Factors.

Keywords
Human-Computer Interaction, Interactive Dancing Game, 3D Human Motion Capture, Continuous Motion Recognition.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICUIMC'11, February 21–23, 2011, Seoul, Korea
Copyright 2011 ACM 978-1-4503-0571-6…$10.00.

1. INTRODUCTION
In a real situation, two partnered dancers communicate through their body movements. Hence, how the human player interacts with the computer avatar becomes a very important issue. The virtual partner must be able to respond to the player's dance move promptly, so the system must be able to recognize the player's movement without significant delay. In this paper, we propose a Progressive Block Matching algorithm to perform real-time recognition of continuous dance moves.
There is an increasing demand from game players for games with enhanced functionalities in different dimensions such as realism, interaction, complexity, multi-player support, etc. Recent advances in technologies such as higher computational speed, greater 3D processing power, and more sophisticated motion sensing devices allow game developers to produce more advanced games to satisfy players. Video games have become ubiquitous in many people's lives, and players spend a lot of time on them. In particular, many people are attracted to Virtual Reality games because they enjoy getting immersed in the virtual environment. In this paper, we propose an interactive dancing game based on 3D motion capture technology. The virtual avatar is able to dance collaboratively by recognizing what a human player is dancing in real time.

In our proposed interactive dancing game, a real-time motion data acquisition method is needed. There are many available real-time sensor-based input technologies. In particular, we use 3D optical motion capture technology because it allows us to capture human motion precisely.
We have implemented the interactive dancing game to conform to a design principle: a good game should be easy to play but hard to master [1]. There are two modes in the game: training mode and freestyle mode. The training mode aims to help players get familiar with the dance moves, while the freestyle mode allows players to dance freely. Dance moves come in various difficulty levels suitable for both novices and skillful players. While the interaction makes the game more fun, marks are given when the player collaborates with the virtual partner to complete a move successfully. A bonus mark is awarded to the player when he/she performs several complete collaborations in a row. We conducted a user study of our game and received positive responses from the players.
The paper is organized as follows. Section 2 presents related work. Section 3 provides an overview of our proposed system. Section 4 covers the details of our real-time recognition algorithm for continuous dance moves. Section 5 explains the tuning of system parameters. Section 6 describes the gameplay of our interactive dancing game. The user studies are presented in Section 7. Conclusions and future work are provided in Section 8.
2. RELATED WORK
In order to make computers more helpful to humans, researchers have spent much effort enhancing the interaction between humans and computers. Babu et al. [2] proposed a multimodal social agent framework called "Marve", which recognizes human faces and speech as multimodal input and in which a virtual human responds through a rule-based system. Jaksic et al. [3] proposed a virtual salesperson for an online shop that gives appropriate feedback to customers by monitoring their emotions from facial expressions. Apart from facial expression, body language is also an important way to express emotion. Iwadate et al. [4] proposed an interactive dance system which identifies the emotion expressed in a dance video sequence and controls the multimedia according to that emotion. The identification is based on three features — motion speed, openness of the body, and acceleration of motion; for example, a speedy motion combined with a light openness of the body represents happiness.
With the advances in sensor-based input technology, capturing human motion data has become more common. Marker-based optical motion capture systems are gaining popularity in the movie industry, where the captured human motion data is used to render virtual characters with realistic motions. In addition to animation rendering, other researchers are interested in analyzing motion data to understand more about it. Li et al. [5] proposed a motion recognition algorithm which segments motion at equal intervals and selects motion features by Singular Value Decomposition (SVD) to be classified using a Support Vector Machine (SVM). With a state space approach, Darby et al. [6] used Hidden Markov Models (HMMs) to recognize human actions. An HMM can predict the move performed by the player using past frames and is a robust method for modeling time series. However, an enormous amount of training data is needed to ensure the robustness of the recognition; otherwise, recognition may fail due to over-fitting. In this paper, we also perform motion recognition on 3D motion data captured by an optical motion capture system. Nevertheless, the difference between our approach and the approaches in [5][6] is that we focus more on the real-time issue and target continuous recognition without noticeable delay.
Some researchers focus on building dance-related applications. Calvert et al. [7] worked on the notation of dance motions for recording and editing. Animations can be generated by the computer according to the dance notation, so that a choreographer can rehearse an idea with the computer before an actual meeting with a real dancer. Ebenreuter et al. [8] further compared dance notation with motion capture technology and 3D animation on their ability to record and edit dance motion. The result showed that the three technologies each have their own advantages in different aspects such as ease of use, cost, etc. Dance education is another topic that attracts the interest of various researchers. Magnenat-Thalmann et al. [9] proposed a web 3D platform for dance learning. They use motion capture technology to capture expert dance movements and play them back through a web interface. Leung et al. [10] proposed a performance training tool that focuses on dance learning. It can evaluate how well a person performs and gives useful feedback. Our application also contains some elements of dance learning, but with more emphasis on the entertainment aspects.
Human-computer interaction through dance is an important topic studied by various parties. Dance Dance Revolution is a famous video game played using a dance mat. The player has to step on the correct zone in time in order to win the game. Although the player cannot really learn dancing through this game, it is fun to play. Tsuruta et al. [11] proposed a virtual dance collaboration system that can identify some simple moves like jumping and waving a hand; the virtual avatar then performs the same moves. Nevertheless, their method is not applicable to longer moves, while dance motions are rarely as simple and short as jumping. Reidsma et al. [12] proposed a rap dance system in which virtual dancers are driven by the beat detected from sound, music or dance video clips. It does not really matter how the player moves, beyond the beat indicated. In our application, we would like to provide an interesting application in which the human-computer interaction is based on the movement of the whole body.
3. SYSTEM OVERVIEW
In our proposed game, a human dancer (the player) interacts with a computer dancer acting as the virtual partner. Figure 1 shows the architecture of our proposed system. An optical 3D motion capture system is used to capture the real-time movement of the human dancer. The motion data is digitized and recognized on a PC. The data server delivers the motion templates that are necessary for the motion recognition. The player's move is analyzed continuously, and the system generates the most appropriate move for the virtual partner, which is animated and shown to the player on the screen.
Figure 1. The system diagram.
The data server contains collaborative dance templates that need to be captured in advance. In our current system, we use A-go-go dance, which contains funny interactive moves between the male and female dancers. Each dance template consists of dance moves captured from a male dancer and a female dancer simultaneously. Two examples of A-go-go moves are shown in Figure 2. A-go-go dance moves are chosen because they are highly collaborative. In some moves, the movements of both dancers are symmetric to each other, as shown in Figure 2(a), while in other moves the gestures of the two dancers are totally different, as shown in Figure 2(b).
The pre-stored data in the data server thus contains synchronized movements between the two dancers. This is important because during gameplay, the system needs to perform continuous recognition of the player's move in order to render the corresponding move of the virtual dance partner and facilitate interaction. A detailed description of our continuous recognition algorithm for real-time dance moves is provided in the next section.
(a) Symmetric move    (b) Collaborative move
Figure 2. Example A-go-go dance moves.

4. PROPOSED INTERACTIVE DANCE FRAMEWORK
During the game, the player performs dance moves continuously. As a result, our system needs to recognize the moves performed by the player in a continuous manner. More importantly, the system needs to generate the virtual partner's dance move corresponding to the player's dance move in real time. This means that the system cannot wait until the player finishes his/her move before carrying out the recognition module. Rather, the recognition needs to be performed in a real-time manner in which the delay is low enough for the response to be generated immediately. The continuous recognition of real-time dance moves can be represented by a finite state machine (FSM). We propose a progressive block matching approach that is able to recognize the player's move in real time. In the next subsection, we first explain the states in our finite state machine representation and the interactions between the states. In the subsequent subsection, we explain the progressive matching algorithm in detail.

4.1 FINITE STATE MACHINE REPRESENTATION
There are eight template dance moves stored in our system, as shown in Figure 3. Given the player's input move, the system needs to determine whether the input move belongs to one of the template dance moves or is an unrecognized move. We propose a finite state machine to represent the different states in the recognition process. There are four states in our finite state machine: (1) Idle state, (2) Start state, (3) Response state, and (4) Completion state. Figure 4 shows the state diagram illustrating the flow between these states. The change of state is triggered by the block matching cost. Here we describe the FSM at a high level. At the beginning, the system is in the Idle state since the input move is unrecognized. In general, there are Nm chains of states corresponding to a total of Nm template moves.

Figure 3. The eight Template A-go-go Moves.
In the Start state of each template move, the input motion is divided into blocks (shorter segments) to be matched. The input block is compared with the beginning block of each template move and a block matching cost is computed. If the block matching cost is larger than a threshold THcost, then the input move does not belong to that particular template move and the system returns to the Idle state. On the other hand, if the block matching cost between the input block and the beginning block of the best matched template move is lower than the threshold THcost, then the system enters the Response state for that template move.

During the Response state, the system continues to check whether the player keeps performing the same motion as the template move. The system keeps track of the percentage completion of the player's move. If the accumulated error from the template move grows beyond a certain threshold value, the system goes back to the Idle state. Otherwise, if the player completes 100% of the move, the Completion state is triggered. In this state, a score is awarded to the player. The system then goes back to the Idle state, ready to recognize the player's next move.
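The state transitions described above can be sketched as a small class per template move. This is an illustrative sketch, not the paper's implementation; the class and method names are ours, and the percentage-completion input is assumed to come from the frame correspondence described later.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    START = auto()
    RESPONSE = auto()
    COMPLETION = auto()

class MoveRecognizerFSM:
    """Illustrative FSM for one template move (our naming, not the paper's)."""
    def __init__(self, th_cost):
        self.th_cost = th_cost      # block matching cost threshold THcost
        self.state = State.IDLE

    def on_block(self, block_cost, percent_complete):
        """Advance the FSM given the latest block matching cost and progress."""
        if self.state == State.IDLE:
            # A sufficiently good match with the template's beginning block
            # moves us into the Start state.
            if block_cost < self.th_cost:
                self.state = State.START
        elif self.state in (State.START, State.RESPONSE):
            if block_cost > self.th_cost:
                self.state = State.IDLE          # too much deviation: reset
            elif percent_complete >= 100.0:
                self.state = State.COMPLETION    # move finished: award score
            else:
                self.state = State.RESPONSE      # keep tracking progress
        elif self.state == State.COMPLETION:
            self.state = State.IDLE              # ready for the next move
        return self.state
```

In the full system there would be one such chain per template move (Nm chains), all fed from the shared Idle state.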
Figure 4. State diagram of our interactive model.

4.2 REAL-TIME MOTION RECOGNITION ALGORITHM FOR CONTINUOUS DANCE MOVES
Our recognition algorithm is developed based on the comparison between two postures, for which a frame matching cost needs to be defined. To account for temporal variations such as speed differences or inconsistency, a frame correspondence is determined by searching for the best match for each input frame.

In a collaborative dancing game, the virtual avatar needs to understand what its partner (the human player) is doing. Hence, motion recognition needs to be performed in real time. In our proposed method, the input stream is processed in blocks and a move is recognized as one of the templates based on the Block Matching Cost. The percentage completion (i.e., the progress of collaboration) between the player and the virtual partner is monitored progressively.

In the following subsections, we first introduce the Frame Matching Cost, which forms part of the formulation of the Block Matching Cost.

4.2.1 FRAME MATCHING COST
Each frame of the motion corresponds to a posture. The frame matching cost (i.e., the cost of matching a frame of the player's move against a frame of a template move) is thus equivalent to the problem of finding the matching cost between the postures at those frames.

Figure 5 shows the 20 joints and 5 end-sites (marked as *) in the human model we use. Among the joints, there are 6 end-effectors (marked as +). Joint angle differences are used in our similarity metric because they are comparable across motions performed by people of different body sizes. The end-sites are the termini of the human model hierarchy, and hence they are not considered in our matching cost, which is derived from the joint angle differences. The end-effectors are also ignored because their angles are often inconsistent even when the same motion is performed.

Figure 5. The joints and end-sites in our human model.

The frame matching cost Sim(P1, P2) between a pair of postures P1 and P2 is given by the weighted sum of joint angle differences. The weight wi of each joint angle is given by the distance from that joint to its hierarchical end-site. For example, the weight of the left-shoulder joint is the distance measured from the left-shoulder joint to the left-hand joint, which is equal to the sum of the lengths of the upper and lower arms. Assume that NJ joint angles are extracted for each posture. Denote the joint angles of posture P1 by θ1(i) and those of posture P2 by θ2(i), where i = 1, 2, ..., NJ. The frame matching cost Sim(P1, P2) is given by equation (1):

Sim(P1, P2) = Σ_{i=1}^{NJ} wi |θ1(i) − θ2(i)|    (1)
4.2.2 FRAME CORRESPONDENCE
There exist temporal variations when two people perform the same movement. Even when a move is performed by the same person several times, the speed may differ. To account for this deviation, multiple reference frames from the template move should be searched in order to determine the best match for each frame of the input move.

Dynamic Time Warping (DTW) is often used to match sequential time data. However, it is not suitable to use DTW in our application, because we do not know when the player finishes the current move, and the virtual partner needs to make a decision within a short time in order to deliver a prompt collaborative response with little delay. Otherwise, the interaction will appear asynchronous and unnatural. Hence, we propose a block-based matching method, which imitates DTW by locally matching blocks of frames of varying sizes in ascending time order.
The frame correspondence process is illustrated in Figure 6. In Figure 6(a), the best match f(1) for frame 1 of the input block is searched within the matching range w in the template move. The size w depends on the size of the input block NB; in particular, we set w = 1/4 of NB frames. For each frame i of the input block, the best matched frame f(i) is defined as the frame of the template move that yields the minimum frame matching cost locally within the matching range. Next, as illustrated in Figure 6(b), the best match f(1) is set as the starting frame of the next matching range in order to search for the best match f(2) for frame 2 of the input block. This process is repeated until every frame in the input block corresponds to a frame in the template move, as illustrated in Figure 6(c). The average frame matching cost is used to determine whether the player is performing the same move. The frame correspondence is used in the block matching, which is discussed in Section 4.2.3.
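The greedy, windowed search just described can be sketched as follows. This is our reading of the procedure (window of size w starting at the previous best match), with illustrative names; `cost_fn` stands in for the frame matching cost of equation (1).

```python
def frame_correspondence(input_block, template, cost_fn, w):
    """For each input frame, search the next w template frames starting
    from the previous best match and keep the cheapest one (a local,
    causal imitation of DTW in ascending time order)."""
    matches = []
    start = 0
    for frame in input_block:
        # Search window [start, start + w), clipped to the template length.
        window = range(start, min(start + w, len(template)))
        best = min(window, key=lambda j: cost_fn(frame, template[j]))
        matches.append(best)
        start = best  # the next matching range begins at this best match
    return matches
```

Because each window starts at the previous match, the correspondence is monotone in time and never needs to look at the whole template, which is what keeps the per-block cost low enough for real-time use.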
4.2.3 BLOCK MATCHING COST
The block matching cost is used to (1) determine whether the input move is similar to one of the template moves in the Start state; and (2) determine whether the input move is still similar to that particular template move in the Response state.

In the Idle state, the input stream first needs to be aligned to the starting block of a candidate move. The system has to decide which template move is the most similar to the input; this decision relies on a trained threshold, which will be described in Section 5. Once the input move is recognized as one of the templates, the Start state is triggered.

In the Start state, the block matching cost between the input block and the beginning block of each template move is computed. The block matching cost is defined as the average local matching cost over all corresponding frames, as described in the previous section. The template move that yields the minimum block matching cost is identified. If this minimum block matching cost is below a threshold THcost, then the system triggers the Response state for that particular move.
In the Response state, the block matching cost is computed as the cumulative mean of the matching costs of the current block and all previous blocks since the Start state. In other words, it is the average of the frame matching costs between all corresponding frames up to the current frame. The advantage of using the cumulative mean is that its value changes more stably, so it is easier to set a threshold for making the decision, especially in marginal cases. If the block matching cost rises above the threshold THcost, then the input move deviates too much from the template move and the system goes back to the Idle state. Otherwise, the system stays in the Response state and continues checking the next block until the current move has been completed.
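The cumulative mean can be maintained incrementally, one matched block at a time, rather than re-averaging all frames on every block. A minimal sketch (our naming):

```python
class CumulativeBlockCost:
    """Running mean of frame matching costs over all corresponded frames
    since the Start state; the returned value is compared against THcost."""
    def __init__(self):
        self.total = 0.0   # sum of frame matching costs so far
        self.count = 0     # number of corresponded frames so far

    def add_block(self, frame_costs):
        """Fold in the frame matching costs of the latest matched block
        and return the updated cumulative mean."""
        self.total += sum(frame_costs)
        self.count += len(frame_costs)
        return self.total / self.count
```

Keeping a running sum and count is what makes the cumulative mean cheap to track in real time, and its smoothness is the property the paper exploits when setting a single threshold per template.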
4.2.4 PERCENTAGE COMPLETION
To let the virtual partner dance collaboratively, it is too late to wait for the player to finish a move before recognizing it. In the proposed real-time block matching method, we need to identify the current status of the player's move and generate the appropriate response from the virtual dance partner.

During the Response state, the block matching repeats, and hence a sequence of matched block pairs is formed progressively until the player's move is completely recognized from the motion stream. The player's move is matched continuously so that the system can identify the status of the player's move and generate the appropriate response for the virtual dance partner.

Figure 6. Frame correspondence: (a) searching for the best matched frame for the 1st frame; (b) searching for the best matched frame for the 2nd frame; (c) frame correspondence of the input block.

Hence, we keep track of the percentage completion of the player's move during the Response state. Figure 7 illustrates how the percentage completion PC% of a particular template move can be estimated based on the frame correspondence. It is calculated as the time index of the template frame matched with the current frame of the input stream (the m-th frame) over the total number of frames M of the recognized template move, i.e., PC% = m/M × 100%.

Figure 7. Percent completion.

5. PARAMETER TUNING
In our real-time recognition algorithm, two parameters, NB and THcost(n), are required. The parameter NB is the block size, i.e., the number of frames required in the Start state for the system to decide that the player's input move belongs to a template move. NB also determines the matching range in the block matching of the Response state, as stated in Section 4.2.2. The parameter THcost(n) is the threshold on the block matching cost that determines whether the player's move is similar to the n-th template move, where 1 ≤ n ≤ N and N is the number of template classes. A total of 48 motion samples (eight template moves performed by two dancers in three trials) have been captured to tune these two parameters.

THcost(n) is trained using the cumulative mean CCM(i), the cumulative matching cost at the i-th frame of the matching, where 1 ≤ i ≤ M and M is the minimum length of the two template moves being compared. The calculation of CCM(M) is given in equation (2), where PH(i) and PK(i) are the i-th frames of any two motion template samples. If these two template samples are very similar, they are likely from the same template and the cumulative cost should be small.

CCM(M) = (1/M) Σ_{i=1}^{M} Sim(PH(i), PK(i))    (2)

Figures 8 and 9 show the cumulative frame matching costs CCM(i), where 1 ≤ i ≤ M, for the matched cases (circles) and the unmatched cases (crosses) for Template Move 2 and Template Move 6 respectively.

The unmatched cases (crosses) are denoted by CCM_unmatch(i), the cumulative mean of the matching cost at the i-th frame when the template samples PH(i) and PK(i) are from different classes. The matched cases (circles) are denoted by CCM_match(i), the cumulative mean of the matching cost at the i-th frame when PH(i) and PK(i) are from the same class.

Ideally, CCM_unmatch(i) should be higher than CCM_match(i) for all frames. For each template n, 1 ≤ n ≤ N, a set of per-frame thresholds THcost(n) = {TH(1), TH(2), …, TH(i), …, TH(60)} is given by the mid-value between the minimum of the unmatched group, min({CCM_unmatch}), and the maximum of the matched group, max({CCM_match}), shown by the dashed lines in Figures 8 and 9. This is regarded as the boundary between similar and dissimilar template move samples. The threshold set differs between templates because the distinguishing power between matched and unmatched samples differs from one template move to another.

Figure 8. Threshold tuning for Template Move 2.

Figure 9. Threshold tuning for Template Move 6.

To find a suitable block size NB for each template, we again consider the distribution of the matched and unmatched groups. Basically, NB is determined by the minimal number of frames Z such that TH(i) > max(CCM_match(i)) and TH(i) < min(CCM_unmatch(i)) hold in each template move. Note that Z differs between template classes. In the training result, Z is equal to 1 for Template Move 2, as illustrated in Figure 8, while it is equal to 26 for Template Move 6, as illustrated in Figure 9. The parameter NB is set to the maximum of the Z values over all template moves. This means that the input stream is divided into blocks of 26 frames to be processed. Hence, the recognition time is roughly 26/60, or about 0.43 second.

Figure 10 illustrates the process of checking the completion progress of an input move using our tuned thresholds. The postures below the graph are the key frames of the input move, and the postures above the curve are the corresponding postures of the template move. This shows that the frame correspondence can be determined even if the input move has some temporal variations from the template move, and that the percent completion of an input move can be monitored progressively.

Figure 10. The mapping between an input move and a template move.
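The mid-value threshold rule of Section 5 can be sketched as follows, assuming the matched and unmatched cumulative-mean curves are supplied as lists of per-frame values (this layout, and the function name, are our assumptions for illustration):

```python
def tune_thresholds(matched_costs, unmatched_costs):
    """Per-frame threshold TH(i): the mid-value between the maximum
    cumulative-mean cost over matched template pairs and the minimum
    over unmatched pairs. matched_costs[k][i] is the CCM_match curve
    of pair k evaluated at frame i (likewise for unmatched_costs)."""
    n_frames = len(matched_costs[0])
    thresholds = []
    for i in range(n_frames):
        hi = max(curve[i] for curve in matched_costs)    # max({CCM_match})
        lo = min(curve[i] for curve in unmatched_costs)  # min({CCM_unmatch})
        thresholds.append((hi + lo) / 2.0)               # mid-value boundary
    return thresholds
```

A frame index i at which the matched maximum stays below TH(i) and the unmatched minimum stays above it is a frame where the two groups separate; the smallest such index plays the role of Z in the block-size tuning.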
6. GAMEPLAY
Currently in our interactive dancing game, the template dance moves belong to A-go-go, a dance that was popular in the 1960s. The purpose of the game is to let the player learn A-go-go and have fun at the same time. In the game, a virtual partner dances with the player. Those who are new to A-go-go can use the "training mode" to practice with their virtual partners first. Once they become experienced, the "freestyle mode" is designed for them to show off their dancing skills. In the following sections, the set-up of the game, the training mode and the freestyle mode are discussed.
6.1 SETUP OF THE GAME
Figure 11 shows an example of how a player dances with his virtual partner. The player wears a tight suit with optical markers attached as the sensor-based input. Watching the screen in front of him/her, the player can see his/her virtual partner and other information displayed in the game, such as the score, the recognized template moves, etc. The rendering is done with OpenGL and the game interface is developed with MFC. The player can select a game mode between the training mode and the freestyle mode.

Figure 11. (a) A player dancing with his virtual partner; (b) a screenshot showing a sample image like the one shown on the screen in (a).
6.2 TRAINING MODE
The training mode is for players to get familiar with the different A-go-go template moves. There are eight template moves, each labeled as either easy or difficult. Players can first learn the easy template moves and then the difficult ones. Each template move is trained in three steps. In the first step, the player watches a demonstration of the template player's move and the corresponding virtual partner's move. This lets the player know how he/she should move and gives an impression of how the virtual partner will respond. In the second step, the player performs the move as many times as he/she can in 20 seconds and learns whether he/she and the virtual partner are able to dance together as a team. The virtual partner cannot react with the desired move if the player dances poorly. One mark is awarded for each completed move, to motivate the player. In the third step, the player can watch the playback of the captured move. If the player was successful in making the moves, he/she can see his/her avatar dancing with the virtual partner. If the player failed to dance correctly, he/she can find out what was wrong with his/her moves. By going through the training mode, the player will soon be familiar with all eight template moves and ready for the freestyle mode.
6.3 FREESTYLE MODE
In the freestyle mode, the player can dance freely with any of the eight template moves with the virtual partner. The player gets scores by completing template moves: one mark for each completed easy template move and three marks for each completed difficult template move. To introduce more challenge, the player may perform "combo" moves, which correspond to a sequence of certain completed moves. Once the player successfully performs a combo, a message is shown on the screen to notify the player, together with a bonus score of 10 marks. After the performance, the player can also watch the playback to review his/her performance.
7. USER STUDIES
Seven subjects (one female and six males) were invited to try out our interactive dancing game. Two of them had prior experience of A-go-go dancing. After being introduced to the game, each subject played the game for twenty minutes. Afterwards, each subject was asked to fill in a questionnaire with the questions listed in Table 1. Subjects put down marks ranging from 1 (totally disagree) to 5 (totally agree) for the given statements.

Table 1. The six questions in the questionnaire
No.  Question                                                         Avg. Mark
1.   Do you agree that the game is fun?                               4.7
2.   Do you agree that you know more about A-go-go?                   4.7
3.   Do you agree that the motion of the virtual dancer is smooth?    4.0
4.   Do you agree that the virtual dancer can follow your move?       4.1
5.   Do you agree that the virtual dancer can perform the correct
     move as you desire?                                              4.1

The average marks given by the subjects are also shown in Table 1. The average mark of Question 1 is 4.7, which suggests that our game is fun. Question 2 obtained an average mark of 4.7, which shows that our game can help players learn A-go-go. The average mark of Question 3 is 4.0, which shows that most players thought the virtual partner was rendered smoothly without much delay or jitter, and that our interactive dancing game could recognize their actions well in a real-time situation. In order to give the player the impression that the virtual partner follows his/her move and reacts with the correct move, the recognition of the player's moves and the identification of the percent completion must be accurate. The average marks of Questions 4 and 5 are both 4.1, which suggests that our recognition method is acceptably accurate.

Apart from the quantitative results of the questionnaires, some valuable opinions were collected from the subjects. Several comments concerned the usability of the system. One subject said that the demonstration in the training mode was quite hard to follow, as the display speed was too fast for a novice and the viewpoint was not suitable. Music was a concern of another subject, who thought that music could help him move at the right pace. One subject suggested that it would be easier for players to learn if comments on their performance could be given. It is also interesting how important the scoring is in the game: some subjects reported that they became much more eager to play after they learned that another subject's score was higher than theirs. For the recognition of motion, one subject suggested that we could perform some calibration to adapt the system to user variations. This is a good idea, as it could make our system more robust. We will consider using the motion data captured in the training mode to tune the system's thresholds in the near future. Other comments about the implementation of the system will also be addressed in the future.
8. CONCLUSION AND FUTURE WORK
In this paper, we have described an interactive dancing game that
provides an environment for a human player to dance with a
virtual partner. Optical 3D motion capture technology is used as
the sensor-based input. We highlight the proposed real-time
recognition algorithm for a continuous motion stream: a block
matching approach that performs local frame matching block by
block in forward time sequence. In our proposed method, the
threshold for each template is trained with samples performed by
different players and is hence adaptive to style variation. In this
work, A-go-go dance motions have been considered, and a
prototype dancing game has been built on top of this algorithm.
The game is designed to suit players with different skill levels:
novices can practice in the training mode, while skillful players
are awarded bonus points when they interact with the virtual
partner through combos. The scoring scheme aims to motivate the
player to keep playing the game. By interacting with the virtual
partner, the player becomes immersed in the virtual dance room.
Some subjects were invited to try the system, and their feedback
was mostly positive.
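As a rough illustration of the two ideas summarized above, block-by-block forward matching and per-template trained thresholds, the following sketch shows one way they could fit together. This is not the authors' implementation: the pose representation, the distance measure, the block size, and the threshold margin are all assumptions made for the example.

```python
import math

def block_distance(query_block, template_block):
    """Mean per-frame Euclidean distance between two pose blocks,
    each a list of equal-length feature vectors (one per frame)."""
    n = min(len(query_block), len(template_block))
    if n == 0:
        return 0.0
    total = 0.0
    for q, t in zip(query_block[:n], template_block[:n]):
        total += math.sqrt(sum((a - b) ** 2 for a, b in zip(q, t)))
    return total / n

def train_threshold(template, samples, block_size, margin=1.5):
    """Per-template threshold learned from sample performances by
    different players: the worst observed block distance, scaled by
    a safety margin (the margin value is an assumption)."""
    worst = 0.0
    for sample in samples:
        for start in range(0, len(template), block_size):
            d = block_distance(sample[start:start + block_size],
                               template[start:start + block_size])
            worst = max(worst, d)
    return worst * margin

def progressive_match(stream, template, block_size, threshold):
    """Match the incoming stream against the template block by block
    in forward time order. Returns the percent completion reached so
    far, or None if any block exceeds the template's threshold
    (a local mismatch rejects this candidate template)."""
    matched = 0
    for start in range(0, len(template), block_size):
        t_block = template[start:start + block_size]
        s_block = stream[start:start + block_size]
        if not s_block:
            break  # the stream has not reached this block yet
        if block_distance(s_block, t_block) > threshold:
            return None
        matched += min(len(s_block), len(t_block))
    return 100.0 * matched / len(template)
```

Because matching proceeds block by block in forward time order, a partially performed move yields a partial completion percentage, which is what allows the virtual partner's animation to be synchronized with the player's progress.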
Our proposed work can be further applied to applications or
games using optical motion capture devices or video-based
systems such as the Xbox Kinect [13]. Games can be designed to
allow a player to collaborate with virtual characters to complete
a task, which is especially useful for performing arts that require
a high level of collaboration (e.g. street dance).

As future work, to enhance the recognition performance, we will
provide a training mode that lets the system learn to accommodate
style variations among different players, and more template moves
will be included in the system. Another direction is to consider
interactive moves of other types that require closer collaboration
between the player and the virtual partner, such as holding hands
as in the Waltz. Collision detection between the avatars
representing the player and the virtual dance partner is an
interesting problem to be solved, and can be extended to a more
sophisticated interaction model between multiple real and virtual
players.

9. ACKNOWLEDGMENTS
The work described in this paper was substantially supported by a
grant from the Research Grants Council of the Hong Kong Special
Administrative Region, China [Project No. CityU 1165/09E].

10. REFERENCES
[1] Moore, M.E., and Sward, J. 2007. Introduction to the Game
Industry. Prentice Hall. (ISBN: 0-13-168743-3), Chapter 8,
pp. 249-276.
[2] Babu, S., Schmugge, S., Inugala, R., Rao, S., Barnes, T., and
Hodges, L.F. 2005. Marve: a prototype virtual human
interface framework for studying human-virtual human
interaction. Lecture Notes in Computer Science, vol. 3661,
pp. 120-133.
[3] Jaksic, N., Branco, P., Stephenson, P., and Encarnação, L.M.
2006. The effectiveness of social agents in reducing user
frustration. In Proceedings of the CHI '06 Extended
Abstracts on Human Factors in Computing Systems
(Montréal, Québec, Canada, April 22-27, 2006). CHI '06.
ACM, New York, NY, pp. 917-922.
[4] Iwadate, Y., Inoue, M., Suzuki, R., Hikawa, N., Makino, M.,
and Kanemoto, Y. 2000. MIC Interactive Dance System: an
emotional interaction system. In Proceedings of the Fourth
International Conference on Knowledge-Based Intelligent
Engineering Systems and Allied Technologies, vol. 1,
pp. 95-98, 2000.
[5] Li, C., Zheng, S.Q., and Prabhakaran, B. 2007.
Segmentation and Recognition of Motion Streams by
Similarity Search. ACM Transactions on Multimedia
Computing, Communications and Applications (ACM
TOMCCAP), vol. 3(3), Article 16, August 2007.
[6] Darby, J., Li, B., and Costen, N. 2008. Activity
Classification for Interactive Game Interfaces. International
Journal of Computer Games Technology, vol. 2008, Article
ID 751268, 7 pages, 2008.
[7] Calvert, T., Wilke, L., Ryman, R., and Fox, I. 2005.
Applications of Computers to Dance. IEEE Computer
Graphics and Applications, vol. 25, no. 2, pp. 6-12,
Mar/Apr 2005.
[8] Ebenreuter, N. 2005. Dance Movement: A Focus on the
Technology. IEEE Computer Graphics and Applications,
vol. 25, no. 6, pp. 80-83, Nov/Dec 2005.
[9] Magnenat-Thalmann, N., Protopsaltou, D., and Kavakli, E.
2008. Learning How to Dance Using a Web 3D Platform.
Lecture Notes in Computer Science, vol. 4823, pp. 1-12, 2008.
[10] Leung, H., Chan, J., Tang, K.T., and Komura, T. 2007.
Ubiquitous Performance Training Tool Using Motion
Capture Technology. In Proceedings of the First
International Conference on Ubiquitous Information
Management and Communication (ICUIMC 2007),
pp. 185-194, Suwon, Korea, 8-9 February 2007.
[11] Tsuruta, S., Kawauchi, Y., Woong, C., and Hachimura, K.
2007. Real-Time Recognition of Body Motion for Virtual
Dance Collaboration System. In Proceedings of the 17th
International Conference on Artificial Reality and
Telexistence, pp. 23-30, 28-30 Nov. 2007.
[12] Reidsma, D., Nijholt, A., Poppe, R.W., Rienks, R.J., and
Hondorp, G.H.W. 2006. Virtual Rap Dancer: Invitation to
Dance. In Proceedings of the CHI '06 Extended Abstracts on
Human Factors in Computing Systems, pp. 263-266, 2006.
[13] Microsoft Corporation. 2010. Xbox Kinect. Available:
http://www.xbox.com/en-US/kinect