Digital Archive of Human Dance Motions

Atsushi Nakazawa, Shinchiro Nakaoka, Shunsuke Kudoh and Katsushi Ikeuchi
Institute of Industrial Science, University of Tokyo
Japan Science and Technology Corporation
{nakazawa,nakaoka,kudoh,ki}@cvl.iis.u-tokyo.ac.jp
Abstract. This paper presents our project, which aims to develop technologies to digitize, analyze, and present human motions. The human motion data are acquired with a motion capture system consisting of 8 cameras and 8 PCs. The whole motion sequence is then divided into motion elements, which are classified into groups according to the correlation of the end-effectors' trajectories. We call these segments 'motion primitives'. Concatenating motion primitives generates new dance motions. We are also making a humanoid robot perform these original or generated motions using inverse kinematics and a dynamic balancing technique.
Keywords: human motion, humanoid robot, motion primitive, motion capture data
1. Introduction
Today, many important intangible cultural properties are being lost because of a lack of successors. Developing archiving methods for these cultural properties is one of the important goals of our project [1]. In this paper, we introduce our motion archiving project, which aims to develop a complete technology for motion acquisition, computational analysis, and for searching and comparing motions by using AI technologies.
Figure 1 shows an overview of our project. Our study consists of the following issues:
1. Acquiring original human motions by using motion capture systems.
2. Analyzing the motions, consisting of motion segmentation and motion classification.
3. Editing motions based on motion primitives.
4. Presenting original/synthesized motions by using humanoid robots.
Human motions are acquired using our original system as well as conventional motion capture systems. Our system consists of 8 cameras and 8 PCs, connected one to one, which acquire motion data with a conventional marker-based stereo measurement method.
The basic idea of our motion analysis is to detect 'dance motion primitives' and describe the whole dance motion as a sequence of motion primitives. This idea is essential for motion archiving tasks such as searching, understanding, comparing, and synthesizing motions. Furthermore, concatenating motion primitives can generate new dance motions.
How to present the archived motions is a crucial issue for an effective motion digital archive, because we aim to develop methods by which successors can easily recognize archived motions and re-perform them. We are developing a new VR display method in which humanoid robots perform archived or synthesized dance motions for the users. Compared with other presentation methods such as videos or computer graphics animations, users can understand human motions more easily and accurately through a humanoid robot's performance, because they observe actual human motions as the movements of a real-time, real-size object. This display method can also be used for other VR applications that aim to present human motions more realistically than existing VR presentation methods.
2. Acquisition of human motions
Human dance motions are acquired by a motion capture system that consists of 8 cameras (SONY DXC-9000) and a PC cluster (dual Pentium III 800 MHz machines). The cameras are arranged to surround a person, and the PCs acquire image frames of 720x480 pixels at approximately 30 Hz. The internal clocks of all PCs are synchronized in advance via the NTP protocol, and the acquisition time of each image frame is recorded with millisecond accuracy.
We also acquire blur-free images of moving objects by using the frame-shutter functionality with which the cameras are equipped. The calibration parameters of all cameras are acquired, so that the geometric relations between the cameras and the internal parameters of each camera can be determined. Human motion acquisition is carried out by attaching lighting markers at the desired positions on the body. While the actor performs the dance, the PCs only acquire the multi-viewpoint images; stereo measurement is done afterwards by matching the markers between the images [2].
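The paper does not detail the stereo measurement step. As a rough sketch, once a marker has been matched between two calibrated views, its 3D position can be recovered by linear (DLT) triangulation; the camera matrices and 3D point below are invented toy values, not the paper's calibration data:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one marker from two views.
    P1, P2: 3x4 camera projection matrices; x1, x2: 2D pixel coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null vector of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]           # homogeneous -> Euclidean

# Two toy cameras: one at the origin, one translated along X
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.2, -0.1, 4.0])
x1 = P1 @ np.append(X_true, 1.0); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(X_true, 1.0); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))
```

With more than two cameras, one row pair per view is appended to A and the same least-squares solution applies.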
Figure 1 : Our project overview.
3. Analysis: detecting 'motion primitives'
The aim of our motion analysis is to detect similar motion elements ('motion primitives') and describe the whole dance motion as a sequence of them. From the analysis result, we can recognize the structure of the dance motion, such as identical motion segments, iterative motion sequences, and other kinds of regularities. Furthermore, it becomes possible to detect the mutual relations between different dances by comparing their motion primitives.
To detect the motion primitives, we pay attention to the frames at which the velocity of the end effectors (hands and feet) reaches a local minimum, because these frames mark the boundaries of motion segments: the start and end points of the motion primitives. Many researchers use this value to find motion segmentation points [3][4]. To evaluate the similarity of motion segments, we use the DP distance between the target points' trajectories in 3D space. According to this value, the motion segments are clustered into groups, so that identical motions of a target point receive the same label. These are registered as 'minimum motion primitives'.
In structured motions such as dances, much longer regularities (sequences of the minimum motion primitives) can be observed. Our algorithm detects these primitive motion patterns and determines the final motion primitives by equalizing the same motion sequences.
3.1 The motion analysis algorithm
We use 15 measurement points for the analysis: hands (L, R), elbows (L, R), shoulders (L, R), head, hip, body center, waist (L, R), thighs (L, R), and feet (L, R). The analysis pipeline consists of the following steps (Fig. 2).
Figure 2 The motion analysis algorithm.
Figure 3 The preliminary analysis result of “Soran-Bushi”.
Figure 4 The analysis result of “Soran-Bushi”.
The colored segments indicate the frequently appearing motion segments.
(1) Define the body center coordinate system.
We define the body center coordinate system, whose X-axis points in the direction of the waist and whose Z-axis points in the perpendicular direction. In order to detect the symmetry of left and right arm/leg movements, we use a symmetric coordinate system for the right and left halves of the body.
(2) Coordinate conversion of target points.
The target points (both hands and feet) are converted into the body center coordinate system.
(3) Preliminary segmentation.
Calculate the velocities of the target points and detect the local minima. A Gaussian filter is applied in advance to prevent segmentation errors.
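Step (3) can be sketched as follows, assuming the positions of one target point are stored as an (N, 3) NumPy array sampled at 30 Hz; the smoothing width and the toy trajectory are illustrative choices, not the paper's parameters:

```python
import numpy as np

def segment_boundaries(traj, fps=30.0, sigma=2.0):
    """Preliminary segmentation: boundaries at the local minima of the
    target point's speed, after Gaussian smoothing.
    traj: (N, 3) array of 3D positions of one target point."""
    vel = np.linalg.norm(np.diff(traj, axis=0), axis=1) * fps   # speed per frame
    # truncated Gaussian kernel to suppress noisy minima
    r = int(3 * sigma)
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    smooth = np.convolve(vel, k, mode="same")
    # local minima: strictly lower than both neighbours
    idx = np.where((smooth[1:-1] < smooth[:-2]) &
                   (smooth[1:-1] < smooth[2:]))[0] + 1
    return idx

# a hand that swings out, pauses, and swings again
t = np.linspace(0, 2 * np.pi, 120)
traj = np.stack([np.abs(np.sin(t)), np.zeros_like(t), np.zeros_like(t)], axis=1)
print(segment_boundaries(traj))   # boundaries near the two velocity minima
```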
(4) Evaluate the correlation between the segments.
The correlation between the target points' trajectories is evaluated according to the DP matching distance. Assume that segments m and n are described as V_m = {v_{m,1}, v_{m,2}, ..., v_{m,i_m} | v_{m,i} ∈ R^3} and V_n = {v_{n,1}, v_{n,2}, ..., v_{n,i_n} | v_{n,i} ∈ R^3}. Then the distance D(m, n) between these segments is calculated as follows:

D(m, n) = S(i_m, i_n)
S(k, l) = d_{k,l} + min(S(k, l-1), S(k-1, l-1), S(k-1, l))
d_{i,j} = || v_{m,i} - v_{n,j} ||
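A direct (if naive, O(i_m · i_n)) implementation of this DP distance is sketched below; trajectories are NumPy arrays of shape (length, 3):

```python
import numpy as np

def dp_distance(Vm, Vn):
    """DP matching (dynamic time warping) distance between two 3D
    trajectories: D(m, n) = S(i_m, i_n) with the recursion
    S(k, l) = d(k, l) + min(S(k, l-1), S(k-1, l-1), S(k-1, l))."""
    Im, In = len(Vm), len(Vn)
    S = np.full((Im + 1, In + 1), np.inf)
    S[0, 0] = 0.0
    for k in range(1, Im + 1):
        for l in range(1, In + 1):
            d = np.linalg.norm(Vm[k - 1] - Vn[l - 1])
            S[k, l] = d + min(S[k, l - 1], S[k - 1, l - 1], S[k - 1, l])
    return S[Im, In]

A = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
B = A[[0, 0, 1, 2, 3, 3]]          # the same path, time-warped
C = A + np.array([0.0, 5.0, 0.0])  # a parallel but distant path
print(dp_distance(A, B), dp_distance(A, C))
```

Because the warping path may repeat samples, a time-stretched copy of the same trajectory has (near-)zero distance, which is exactly why this measure suits segments of different durations.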
(5) Cluster and label the segments.
The detected segments are clustered with a nearest-neighbour algorithm, so that segments in which the target point traces a similar locus receive the same label. Symmetrical motions are also detectable because we use the symmetric coordinate system for the right and left sides of the body.
Using these preliminary analysis procedures, the whole motion sequence is segmented and clustered into segments in which a target point draws the same trajectory. We call these segments the 'minimum motion segments'. Figure 3 shows the preliminary analysis result for the Japanese folk dance 'Soran-Bushi'. The minimum motion segments represent very simple motions such as "swing down the left arm" or "step forward with the right leg". In addition, much longer motion units exist in dance motions. To find them, the following steps extract frequently appearing sequences of the minimum motion segments.
(6) Find frequently appearing sequences of minimum motion segments.
From the classified segment sequences of the same body part, frequently appearing ones are detected by using the apriori algorithm [5]. The results are registered as 'higher-level motion segments'. This processing is performed on segment sequences of all possible lengths within a part. Consequently, we acquire multi-length, multi-hierarchical motion segments.
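Since only contiguous runs of labels matter here, the apriori idea reduces to a candidate-growing loop: a sequence of length k+1 is counted only if its length-k prefix was already frequent. A sketch (the label data and support threshold are invented):

```python
from collections import Counter

def frequent_sequences(labels, min_support=2, max_len=4):
    """Apriori-style search for frequently appearing contiguous label
    sequences: grow candidates one label at a time, keeping only those
    whose support stays at or above the threshold."""
    n = len(labels)
    frequent = {}
    prev = {(lab,) for lab in set(labels)}   # length-1 candidates
    length = 1
    while prev and length <= max_len:
        counts = Counter(
            tuple(labels[i:i + length])
            for i in range(n - length + 1)
            if tuple(labels[i:i + length]) in prev
        )
        cur = {seq: c for seq, c in counts.items() if c >= min_support}
        frequent.update(cur)
        # apriori pruning: extend only sequences that are themselves frequent
        prev = {seq + (lab,) for seq in cur for lab in set(labels)}
        length += 1
    return frequent

labels = ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'D']
print(frequent_sequences(labels))
```

Here ('A', 'B', 'C') is found twice and registered, while rare continuations such as ('B', 'D') are pruned away.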
(7) Find the motion primitives across different target points.
In order to find motion primitives of the whole body, the correlations between different target points are evaluated. For motion primitives of any level, the coincidence probabilities between primitives of different target points are calculated. If this value is higher than a threshold, the primitives are related and are combined into a longer motion primitive.
(8) Equalization.
Finally, the 3D trajectories of the motion segment sequences classified as the same motion are equalized. In this process, the DP matching result is used to find the temporal correspondences. The equalized results are preserved as the final motion primitives of the motion sequence.
Figure 4 shows the final analysis result for 'Soran-Bushi'. We can notice the following features: (a) the whole dance motion consists of iterative motion primitives and unique motion segments; (b) the iterative motion primitives are connected by unique motion primitives; and (c) longer unique primitive motion sequences exist in parts of the dance. From these results, we understand the structure of this dance: it consists of several variations of iterative motion primitives and unique motion sequences.
4. Generate new motions from motion primitives
As described in the last section, dance motions consist of iterative motion primitives and unique motions that connect them. We also noticed that this structure exists not only in this example but in most Japanese folk dances. Based on this observation, concatenating motion primitives can generate new dance motions.
4.1 The motion generation algorithm
Our motion concatenation algorithm is shown in Figure 5. Assume that two motion primitives are selected to be concatenated. The transition between these motions is generated with the following pipeline:
(1) Set a support leg during the transition. The latter primitive is translated so that this leg comes to the same position.
(2) Calculate the positions of the unsupported foot, waist, body, and neck during the transition. We employ 2nd-order polynomials to keep the position and velocity continuous:

v(t) = a(t - T/2)^2 + b, where v(0) = v_0, v(T) = v_1
x(t) = x(0) + ∫_0^t v(τ) dτ
(3) Linear interpolation is applied to the waist and neck coordinate systems with the following procedure:
(a) Let the coordinate systems at the start and end times be ^0R_1 and ^0R_2. The rotation matrix between these coordinates is calculated as ^1R_2 = (^0R_1)^{-1} ^0R_2.
(b) Convert ^1R_2 into the quaternion description Q_{12} = {x, y, z, w}.
(c) The interpolating rotation is calculated as Q_{12}(t) = {x, y, z, w(t)}, where w(t) = w · t/T.
(d) Convert Q_{12}(t) into the rotation matrix R_{12}(t).
(e) The interpolated coordinate system is acquired as ^0R(t) = ^0R_1 R_{12}(t).
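An equivalent way to realize steps (a)-(e) is to scale the relative rotation ^1R_2 in axis-angle form, a standard alternative to the quaternion scalar-part interpolation the paper describes (not the paper's exact formulation):

```python
import numpy as np

def interp_rotation(R1, R2, t, T):
    """Interpolate between two orientation frames by scaling the
    relative rotation R12 = R1^T R2 in axis-angle form."""
    R12 = R1.T @ R2                                   # relative rotation 1R2
    angle = np.arccos(np.clip((np.trace(R12) - 1) / 2, -1.0, 1.0))
    if angle < 1e-9:
        return R1.copy()                              # frames already aligned
    axis = np.array([R12[2, 1] - R12[1, 2],
                     R12[0, 2] - R12[2, 0],
                     R12[1, 0] - R12[0, 1]]) / (2 * np.sin(angle))
    a = angle * t / T                                 # scaled rotation angle
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    R12_t = np.eye(3) + np.sin(a) * K + (1 - np.cos(a)) * (K @ K)  # Rodrigues
    return R1 @ R12_t                                 # 0R(t) = 0R1 * R12(t)
```

At t = 0 this returns ^0R_1 and at t = T it returns ^0R_2, with a constant angular velocity in between.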
Figure 5 The motion concatenation algorithm.
Figure 6 The motion concatenation results.
(4) For the movements of the arms and legs, the minimum joint-angle-jerk model is employed for interpolation. We assume each limb has 4 DOF (2 DOF at the shoulder or thigh, 2 DOF at the elbow or knee), and denote the joint angle parameter vector during the transition by Θ(t) ∈ R^4. Generating the transition motion amounts to determining these functions. We define each of them as a 5th-order polynomial and set boundary conditions on the angles and their velocities at t = 0, T. Finally, the parameters are determined so as to minimize the joint angles' jerk during the transition:

∫_0^T Σ_n ( d^3 θ_n(τ) / dτ^3 )^2 dτ → min

The duration T is determined as the average of the former and latter segments' durations.
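For the common special case where the boundary velocities (and accelerations) are zero, the jerk-minimizing polynomial has a closed form, the classic 10-15-6 quintic; a sketch of that simplified case:

```python
import numpy as np

def min_jerk(theta0, theta1, T, t):
    """Minimum-jerk transition between two joint angles with zero
    boundary velocity and acceleration (the 10-15-6 quintic)."""
    s = np.asarray(t) / T                     # normalized time in [0, 1]
    return theta0 + (theta1 - theta0) * (10 * s**3 - 15 * s**4 + 6 * s**5)

t = np.linspace(0.0, 1.0, 101)
theta = min_jerk(0.0, 1.0, 1.0, t)            # smooth ramp from 0 to 1
```

With nonzero boundary velocities, as in the paper, the same quintic form still applies but the six coefficients are instead solved from the boundary conditions plus the jerk minimization.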
Figure 6 shows the interpolation of motion primitives from the Japanese folk dances 'Soran-Bushi' and 'Harukoma'.
5. Presenting dance motions by humanoid robots
In this section, we show a method for importing dance motions into a humanoid robot. A similar study has been done by Pollard et al. [6], who used robot arms having the same DOF as human arms. In our study, we employ the 28-DOF whole-body humanoid robot HRP-1S [7] and try to imitate whole-body motions. As a first trial, we assume that the feet are fixed and imitate only the upper-body motions. Even when imitating only the hand movements, we noticed that whole-body balance control is necessary to keep the robot standing. For this issue, we propose a new method that enables the robot to imitate human dance motions as faithfully as possible while keeping its body standing, using the dance motion structure analysis presented in the previous sections.
For these trials, we used the OpenHRP simulator and the HRP-1S virtual model [8]. These tools enable us to test whether our data works on the real humanoid robot.
5.1 Limitation of joint angles and velocities
The joint angles of a dance motion can be solved from the original motion capture data by a simple inverse kinematics algorithm and the humanoid robot's link model. However, these angle values cannot be imported directly because of the following restrictions: singularities, and the limits on the joint angles and their velocities.
Pollard et al. proposed a method to solve these problems [6]. In their method, the joint angle values are deformed so that they stay within the limits, by applying a filter similar to a PD control model:
θ'_i = θ_i − θ_{i−1}  (1)
θ''_{F,i+1} = 2K_s(θ'_i − θ'_{F,i} + K_s(θ_i − θ_{F,i}))  (2)
θ'_{F,i+1} = max(θ'_L, min(θ'_U, θ'_{F,i} + θ''_{F,i+1}))  (3)
θ_{F,i+1} = θ_{F,i} + θ'_{F,i+1}  (4)
θ'_i = θ_i − θ_{i+1}  (5)
θ''_{B,i−1} = 2K_s(θ'_i − θ'_{B,i} + K_s(θ_i − θ_{B,i}))  (6)
θ'_{B,i−1} = max(θ'_L, min(θ'_U, θ'_{B,i} + θ''_{B,i−1}))  (7)
θ_{B,i−1} = θ_{B,i} + θ'_{B,i−1}  (8)
θ_V = 0.5(θ_{F,i} + θ_{B,i})  (9)
where θ_i is the original joint angle and θ'_L, θ'_U are the lower and upper joint velocity limits. Equations (1)-(4) are solved from the start frame to the end, and equations (5)-(8) are solved backwards. The final joint angle θ_V is the average of the forward and backward passes (9). Using this method, the joint angle velocities are kept within the given limits. Singularities are resolved at the same time, because the part of a sequence near a singular point can be regarded as a region where the velocity is very high. As a result, a gimbal-locked part is decomposed into properly interpolated angles.
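Equations (1)-(9) can be implemented directly; the sketch below applies the forward and backward passes to one joint trajectory (the stiffness and velocity limits are illustrative values, not the paper's):

```python
import numpy as np

def pollard_filter(theta, Ks, vmin, vmax):
    """Forward/backward velocity-limiting filter, eqs. (1)-(9).
    theta: original joint-angle sequence; Ks: stiffness;
    vmin, vmax: per-frame joint velocity limits."""
    n = len(theta)
    # forward pass, eqs. (1)-(4)
    F, Fv = np.empty(n), np.empty(n)
    F[0], Fv[0] = theta[0], 0.0
    for i in range(n - 1):
        v = theta[i + 1] - theta[i]                          # (1)
        acc = 2 * Ks * (v - Fv[i] + Ks * (theta[i] - F[i]))  # (2)
        Fv[i + 1] = np.clip(Fv[i] + acc, vmin, vmax)         # (3)
        F[i + 1] = F[i] + Fv[i + 1]                          # (4)
    # backward pass, eqs. (5)-(8)
    B, Bv = np.empty(n), np.empty(n)
    B[-1], Bv[-1] = theta[-1], 0.0
    for i in range(n - 1, 0, -1):
        v = theta[i - 1] - theta[i]                          # (5)
        acc = 2 * Ks * (v - Bv[i] + Ks * (theta[i] - B[i]))  # (6)
        Bv[i - 1] = np.clip(Bv[i] + acc, vmin, vmax)         # (7)
        B[i - 1] = B[i] + Bv[i - 1]                          # (8)
    return 0.5 * (F + B)                                     # (9)

# a sudden 5-rad step gets smeared into a velocity-limited ramp
theta = np.concatenate([np.zeros(10), np.full(10, 5.0)])
smoothed = pollard_filter(theta, Ks=0.5, vmin=-1.0, vmax=1.0)
```

Because both passes clamp the per-frame velocity, their average is also within the limits, which is the property the robot's joint controllers require.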
5.2 Keeping balance
Although we assume that the robot's feet are fixed while dancing, the balancing problem still exists. When the robot swings its arms in a wide arc (Figure 7-a), it cannot keep its balance and falls down (Figure 7-c). To keep the robot standing throughout a dance sequence, its ZMP (Zero Moment Point), which indicates the point where the reaction forces between the robot and the ground balance, must stay within the support area enclosed by the soles [9]. Although all body motions influence the ZMP, the arm movements are the main factor that moves the ZMP outside the support area, because the arms are far from the foot contact points (the 'fulcrum' of the body). As a result, the robot falls down as in Figure 8-c.
In order to keep the ZMP within the support area, the motion must be modified to compensate the ZMP trajectory. In Pollard's filter, the stiffness parameter K_s controls the motion dynamics. When K_s is reduced, the joint angle accelerations are limited and the whole motion becomes loose and compact; the ZMP position is then kept such that the robot can remain standing. However, a K_s that is too small makes the motion differ greatly from the original (Figure 7-b). Therefore, finding the most suitable K_s is essential for good motion imitation while keeping the robot balanced. Our idea is to find the maximum K_s over the whole motion sequence with the following steps:
(1) Detect the frames where the ZMP is outside the support area.
(2) Reduce K_s on the primitive segments that contain the ZMP deviation.
(3) Iterate the above process until the ZMP is inside the support area over the whole motion sequence.
In this process, the K_s values are optimised and we achieve a proper trade-off between similarity and balance. We simulated the motions generated by this method on the simulator and confirmed that they keep balance. We then experimented with the real robot (Figure 7-d); it completed the whole motion while standing by itself.
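Steps (1)-(3) above can be sketched as the loop below. Here `zmp_ok` is a hypothetical stand-in for the dynamics check (its real counterpart would run the OpenHRP simulation), and the per-iteration decay factor is an invented choice:

```python
def tune_stiffness(motion, primitives, zmp_ok, Ks_init=1.0, decay=0.8, max_iter=50):
    """Per-primitive Ks search: reduce the stiffness only on the primitive
    segments whose ZMP leaves the support area, and iterate until the
    whole motion passes (or the iteration budget runs out)."""
    Ks = {seg: Ks_init for seg in primitives}
    for _ in range(max_iter):
        violating = [seg for seg in primitives
                     if not zmp_ok(motion, Ks[seg], seg)]
        if not violating:
            return Ks              # ZMP inside the support area everywhere
        for seg in violating:      # loosen only the offending primitives
            Ks[seg] *= decay
    return Ks
```

Keeping a separate K_s per primitive segment is what preserves similarity: segments that never endanger the balance are filtered at full stiffness.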
Figure 8 Motion sequences: (a) the original motion; (b) the motion generated with a small K_s; (c) the same motion as (a), which falls down in the dynamics simulation; (d) the real humanoid (HRP-1S). Our method enables it to keep standing.
Figure 9 Example of a primitive-distinguished graph. In the new graph, the gradient is flattened and the velocity is zero at the segment boundary.
5.3 Clarifying the dance motions' poses
The problem with the previous algorithm is that it makes the motion 'ambiguous', because this kind of filter loses the 'stop motions' of the dance. As described in Section 3, the frames at which the target points' velocities reach local minima mark the borders of the motion primitives, and the dancer takes particular, important body poses at these frames.
Based on this idea, we propose a method for clarifying the dance motions. The following steps are applied to each joint sequence:
(1) When the joint angle velocity is not nearly zero at a primitive boundary, force the velocity and the acceleration to be zero there.
(2) In the joint angle graph, find the nearest extreme points both backward and forward from the boundary point. If the distance between them is shorter than a threshold, the value at the boundary point is replaced with that of the extreme point, and the next extreme point is taken.
(3) The boundary point is smoothly connected to the extreme points by a suitable polynomial, with the constraint that the velocity and the acceleration are zero at the end points. These connecting curves override the former values.
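Steps (1) and (3) can be sketched for a single joint as below; the window half-width and the use of a minimum-jerk quintic as the 'suitable polynomial' are our assumptions for illustration:

```python
import numpy as np

def quintic_blend(theta_a, theta_b, n):
    """Connect two joint angles over n samples with zero velocity and
    acceleration at both ends (10-15-6 quintic)."""
    s = np.linspace(0.0, 1.0, n)
    return theta_a + (theta_b - theta_a) * (10 * s**3 - 15 * s**4 + 6 * s**5)

def clarify_boundary(theta, b, half=10):
    """Force a 'stop motion' at frame b: regenerate the approach to and
    departure from the boundary so that velocity and acceleration
    vanish exactly at the boundary pose."""
    out = theta.copy()
    lo, hi = max(0, b - half), min(len(theta) - 1, b + half)
    out[lo:b + 1] = quintic_blend(theta[lo], theta[b], b - lo + 1)
    out[b:hi + 1] = quintic_blend(theta[b], theta[hi], hi - b + 1)
    return out
```

The boundary pose itself is preserved; only the motion into and out of it is reshaped so the joint visibly comes to rest, which is what restores the 'stop motions'.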
Figure 9 shows the results. We have confirmed that this method is useful for imitating more human-like dance motions.
6. Conclusion
In this paper, we proposed a dance motion imitation method for humanoid robots based on visual observation. Human motions are acquired by a motion capture system that consists of 8 cameras and 8 PCs. Using our motion analysis method, we can recognize the structure of a human dance motion and its motion primitives. We also developed a motion concatenation method and showed that the original motion can be recovered by concatenating the motion primitives. For importing human motions into the robot, there are two problems: the limits on the joint angles and their velocities, and balancing. For the latter problem, we presented a new method that controls the hand movements by using ZMP trajectories and the motion analysis results. The final simulation results proved that our method can keep both the robot's balance and the shape of the original motion.
References
[1] D. Miyazaki, T. Ooishi, T. Nishikawa, R. Sagawa, K. Nishino, T. Tomomatsu, Y. Takase, K. Ikeuchi: The Great Buddha Project: Modelling Cultural Heritage through Observation, VSMM 2000 (6th International Conference on Virtual Systems and Multimedia), pp. 138-145, 2000.
[2] R. F. Rashid: Towards a System for the Interpretation of Moving Light Displays, IEEE Trans. PAMI, vol. 2, no. 6, pp. 574-581, 1980.
[3] T. Flash, N. Hogan: The Coordination of Arm Movements, J. Neuroscience, pp. 1688-1703, 1985.
[4] Y. Uno, M. Kawato, R. Suzuki: Formation and Control of Optimal Trajectory in Human Multi-Joint Arm Movement - Minimum Torque-Change Model, Biological Cybernetics 61, pp. 89-101, 1989.
[5] R. Agrawal et al.: Fast Discovery of Association Rules, Advances in Knowledge Discovery and Data Mining, MIT Press, 1996.
[6] N. S. Pollard et al.: Adapting Human Motion for the Control of a Humanoid Robot, ICRA 2002, 2002.
[7] F. Kanehiro et al.: Virtual Humanoid Robot Platform to Develop Controllers of Real Humanoid Robots without Porting, Proc. IEEE/RSJ IROS 2001, 2001.
[8] K. Yokoi et al.: A Honda Humanoid Robot Controlled by AIST Software, Humanoids 2001, pp. 259-264, 2001.
[9] S. Tak, O. Song, H.-S. Ko: Motion Balance Filtering, EUROGRAPHICS 2000.