Imitating human reaching motions using physically inspired optimization principles

Imitating human reaching motions using physically
inspired optimization principles
S. Albrecht, K. Ramı́rez-Amaro, F. Ruiz-Ugalde, D. Weikersdorfer, M. Leibold, M. Ulbrich and M. Beetz
[email protected], {ramirezk, ruizf, weikersd}@in.tum.de, [email protected], [email protected], [email protected]
Intelligent Autonomous Systems Group, Chair of Mathematical Optimization and Institute of Automatic Control Engineering
Technische Universität München, Germany
Abstract— We present an end-to-end framework which equips
robots with the capability to perform reaching motions in a
natural human-like fashion. A markerless, high-accuracy, modelbased human motion tracker is used to observe how humans
perform everyday activities in real-world scenarios. The obtained
trajectories are clustered to represent different types of manipulation and reaching motions occurring in a kitchen environment.
Using bilevel optimization methods a combination of physically
inspired optimization principles is determined that describes
the human motions best. For humanoid robots like the iCub
these principles are used to compute reaching motion trajectories
which are similar to human behavior and respect the individual
requirements of the robotic hardware.
The presented framework is the combination of several
procedures. The starting point are video sequences of humans
performing everyday manipulation tasks. A markerless human
motion tracker is used to extract the posture of the human in
each frame and then this data set is sorted into the characteristic human sub-movements by a clustering approach based
on the algorithms of Dynamic Time Warping and k-mean.
Given a characteristic movement, a novel bilevel optimization
approach yields the one out of a given family of cost functions
which reproduces the data best. For the specific dynamics of
the humanoid robot’s arm an optimal trajectory for this cost
function is computed (see Figure I).
I. I NTRODUCTION
In daily life a huge potential exists to assist and disencumber
humans in fulfilling their tasks, by providing service or care
robots. A difficult task in building adaptive and autonomous
robots is generating task-specific robot motions which fit
naturally in a human everyday environment. The desired
requirements of such a system would be to generate motions
based on the current task, the object type and state, while
additionally considering information given through context
and environment.
Our contribution is a framework which is able to represent
and generate motions based on an optimal combination of
physically inspired principles obtained by analyzing human
motions recorded in an unconstrained observation setup. In the
experiments discussed here, subjects are observed while performing the everyday activity of setting the table without prior
instructions. The underlying principles of the human motions
are determined by a bilevel optimization approach, i.e. we
identify parameters in an optimal control problem describing
the motion planning, and a human-like control is generated for
a humanoid robot based on these identified parameters. This
approach has the important property to generate motions that
fit different kinematic/dynamic chains while also considering
variations in the task space goal. The generated motions are
similar to human behavior as they are based on the same
optimization principles. This is an important requirement in
human-robot interaction scenarios where humans are expecting
a certain behavior from the robot.
As an example of everyday movements we focus on reaching motions, which consist in moving one hand from an initial
position in the workspace to the desired goal position.
Fig. 1.
Framework overview
The field of describing human characteristics has various
facets and is growing fast. Disciplines dealing with analyzing
and describing these movements range from psychology and
biology over computer and engineering sciences to mathematics. Consequently, the relations and differences between all
these approaches cannot be discussed within the limits of this
paper; e.g. refer to reviews [12, 25, 29].
The advantages of learning from demonstration were shown
for example in [3], where pole-balancing was accomplished
after one trial. Two types of learning strategies in the context
of robot control have to be distinguished: Motion imitation,
where the observed human state is directly mapped onto the
imitating robot, as in [4, 10, 11, 14, 19], and imitation learning,
where characteristics found in the demonstrated motions are
used to control the robot. Two central techniques in imitation
learning are motor primitives, e.g. [21, 24], and Markov
decision problems, e.g. [8, 20, 26]. The basic assumption of
such Markov decision problems is that the set of states is finite
and that state transition probabilities are known. Under these
assumptions the techniques of inverse reinforcement learning
can solve the inverse problem of finding the cost function
applied in the demonstrations [1, 37].
Another approach based on the dynamics of the human body
and using optimization of a new cost function was presented
in [18]. Their goal is to describe the human motion based on
muscle dynamics rather than to generate a robot control.
If the focus is shifted from finding the underlying optimality
principle of demonstrated human motions to mapping a given
demonstrated posture to a system with a different kinematic
and dynamic structure, a task is obtained which is similar to
animating characters, e.g. [36]. The task of mapping human
postures to a robot considering the dynamics and kinematic
bounds was presented in [28].
The paper is organized as follows: In section II the tracking
procedure is discussed and the proposed clustering strategy is
presented in section III. The bilevel approach is summarized
in section IV, followed by the description of the transfer to
the iCub robot in section V. Note that the results of the
discussed examples are distributed to the respective sections
of this paper.
II. H UMAN M OTION T RACKING
model which can be adapted to the specific subject performing
the kitchen activities [5].
To be able to accurately track in the high-dimensional joint
state space of the human model, the tracker uses a mixture
of annealed and partitioned particle filtering [6]. The human
motion data used in this paper was recorded and published as
part of the TUM Kitchen Data Set [30]. The database contains
data for several different subjects which executed everyday
kitchen tasks like setting the table without specific instructions
(see Figure 3).
Fig. 3.
Scenes from the TUM kitchen data set
III. C LUSTERING H UMAN M OVEMENTS
The reaching trajectories available in the TUM Kitchen Data
Set show a high variance due to different object types and
relative object positions.
In order to correctly identify motion principles we first
cluster all available reaching motions into subtypes using a
clustering technique based on Dynamic Time Warping (DTW)
and k-means (compare Figure 4).
In order to build detailed and reliable models of human
behavior in everyday scenarios a high-accuracy and versatile
human motion tracker is required. Industry-leading trackers
based on markers attached to the human limbs or requiring
special suits are not an option when recording in an real-world
environment like the assistive kitchen [7].
For our recordings we choose a markerless high-accuracy
and model-based tracker which uses the industry-leading
RAMSIS model. It provides 63 degrees of freedom in the joint
space (see Figure 2) and has a highly parameterizable skin
Fig. 4.
Fig. 2.
The full joint space of the RAMSIS human model
Overview of the steps to cluster the trajectories
The DTW [17] algorithm has earned its popularity by being
extremely efficient as a similarity measure of time series which
minimizes the effects of shifting and distortion in time by
allowing ‘elastic’ transformation of time series in order to
detect similar shapes with different phases.
To align two time series X = (xi )1≤i≤N and Y =
(yj )1≤j≤M , xi , yj ∈ R3 , N, M ∈ N, the algorithm starts
by building the distance matrix C = (ci,j ) ∈ RN ×M , where
ci,j := d(xi , yj ) = kxi − yj k is the Euclidean distance.
Once the local cost matrix C is built, DTW finds the
alignment path that runs through the low-cost areas of the cost
matrix. This path can be found using dynamic programming
by evaluating the following recurrence for γ(i, j):
γ(i, j) = ci,j+min {γ(i−1, j −1), γ(i−1, j), γ(i, j −1)} (1)
Using the metric on the trajectory space obtained through
DTW, the k-means algorithm can now be used to cluster the
reaching trajectories. By varying the cluster count, we can
identify different types of reaching motions.
The results obtained from the above procedure are shown
in Figure 5. The trajectories where obtained by taking the
Cartesian positions of the hand of the right arm (see HAR
in Figure 2). We found several main clusters which are
differentiated by object type and relative object position. It
can be observed that these sequences show stereotypical and
pre-planned motion patterns [35].
Fig. 5.
Trajectory clusters for wrist motions in the TUM Kitchen Data Set
For the optimization step we used arm movements of three
clusters of totally different motion types: Reaching for a cup
in an upper kitchen unit (cluster 1), for a placemat on top of
the oven (cluster 2) and for a spoon inside a drawer (cluster
7). This choice was made to show that our approach can be
used for different kinds of human arm motions.
IV. O PTIMIZATION - BASED A NALYSIS
A common assumption of many approaches analyzing human motion is that humans try to minimize an unknown
cost function while doing everyday manipulation tasks. This
assumption also forms the basis for the strategy discussed here.
It has to be noticed that the main purpose of this approach is to
describe human behavior based on physical models rather than
to explain it from a biological or psychological perspective
[12].
Various cost functions have been proposed in literature and
most of them are used within an open-loop approach. Closedloop control [15, 32] allows for correction of deviations due
to external disturbance or noise in the human muscles, but
the computation of such a control law is complicated for reallife robotic systems [31]. No learning or adaption processes
are analyzed here, consequently the human motion planning
is described as an open-loop approach.
In [13] the minimum jerk hypothesis is introduced. Being
based purely on kinematics, this criterion postulates that the
trajectory of the human hand can be determined by minimizing
jerk, the third time-derivative of the hand position. Some
characteristics of planar arm motions can be explained by this
criterion [12], while other observations cannot [33]. Consequently, several other cost functions were proposed since then,
e.g. the minimum torque change criterion [33] accounting for
the dynamics of the human arm. The step from planar motions
to full three-dimensional arm movements further complicates
the task of describing human motions [16]. To the best of our
knowledge all basic cost function presented in literature so far
can only describe a limited range of movements.
Thus, we will consider a convex combination of cost
functions and adapt the weighting factors of the considered
costs for diverse tasks. The question arises which choice of
weights can describe an observed human motion, leading to
the problem of finding the cost function out of a given family
which approximates the given data best.
From a mathematical point of view this is a bilevel optimization problem, i.e. a combination of two optimization problems
[9]. In the lower level problem (llp) a specified cost function
f is minimized with respect to the given arm dynamics and
the boundary conditions describing the analyzed motion task,
i.e. minimization of f (x) subject to h(x) = 0. The vector
x is a concatenation of the values of the joint angles and
their derivatives and the values of the muscle forces and their
derivatives. Denote the considered basic cost functions by fi ,
then the llp cost function being the weighted combination of
these is given by
X
f (x|w) :=
wi fi (x).
(2)
i
Hence, the llp is a standard problem of optimal control with
the optimal solution x∗ (w) for a fixed weight vector w:
min f (x|w) subject to h(x) = 0.
x
(3)
The dynamics of the human arm are modeled by connecting
rigid body dynamics of the links with a simple dynamical
model of the muscle characteristics. The arm model possesses
two joints (shoulder and elbow) having five degrees of freedom and seven lumped muscle pairs altogether. The resulting
ordinary differential equation is discretized by using a singlestep method to obtain the constraints of the type h(x) = 0.
The goal of our approach is now to find the weight vector
w minimizing the deviations between recorded human data
and the quantities corresponding to the solution of the llp.
We choose to compare the two by their hand positions in
Cartesian space. Consequently, a function pcomp is introduced
which returns the hand positions belonging to the llp solution
x∗ (w) and pdata are the recorded human hand positions. The
-0.4
-0.3
-0.2
-0.1
0
0.1
-0.4
-0.3
-0.2
-0.1
0
0.1
-0.4
0.2
0.2
0.2
0.1
0.1
0.1
0
0
0
-0.1
-0.1
-0.1
-0.2
-0.2
-0.2
-0.3
-0.3
-0.3
-0.4
-0.4
-0.4
(a) cup motion
Fig. 6.
(b) placemat motion
-0.3
-0.2
-0.1
0
0.1
(c) spoon motion
Comparison of tracked wrist and elbow positions (black/grey) and trajectories computed from the optimal combination of cost functions (blue)
(units in meter) [projection to the x-z-plane]
applied distance measure dist uses points of equal paths length.
Summing up, the following optimization problem is obtained,
the so-called upper level problem (ulp):
min dist(pdata , pcomp (x∗ (w)))2
w
(4)
subject to
X
wi = 1 and x∗ (w) solves llp for w.
(5)
i
The combination of the lower and the upper level problem
is a bilevel optimal control problem. A solution strategy
handling the llp and the ulp separately was presented in
[23] for macroscopic lower level dynamics. The llp dynamics
of the problem solved here are more complex due to the
nonlinearities in arm and muscle dynamics.
The strategy presented here is based on a reformulation of
the bilevel problem using the first order necessary conditions
(KKT-conditions) for an optimum of the llp:
0 = ∇f (x|w) + ∇h(x)λ,
(6)
where λ are the adjoint variables. This equation together with
h(x) = 0 can be used to transform the bilevel problem into a
standard nonlinear optimization problem:
min dist(pdata , pcomp (x))2
w,x,λ
(7)
subject to
0
=
∇f (x, w) + ∇h(x)λ,
(8)
0
=
(9)
1
=
h(x),
X
wi ,
(10)
i
0
≤
wi .
(11)
Note that each solution of the bilevel problem solves this
reformulation, but the reverse implication does not always
hold, because the KKT-conditions are only necessary.
The reformulated problem is solved by using the optimization software IPOPT [34], an interior-point algorithm for
large-scale nonlinear programming. The mathematical details
are out of the scope of this paper, but can be found in [2].
Therein we discuss among other things the details of the
problem structure and the suitability of our approach for the
given task. Furthermore, we show that a wide range of values
can be used to initialize the weights wi .
In the bilevel optimization of the three stereotypic arm
movements recorded in the kitchen (compare section III) we
use the generalizations to three dimensions of some common
cost functions: Minimization of hand jerk [13], joint jerk [27]
and torque change [33] and additionally, we consider variants
like planar hand jerk based only on the jerk of the x- and
y-component.
Here the optimization results for three motions (reaching
for a cup in an upper kitchen unit, reaching for a placemat on
top of the oven and picking up a spoon inside a drawer) are
presented. The recorded trajectories of human wrist and elbow
positions and the corresponding optimal results of the bilevel
optimization are displayed in Figure 6. It can be observed that
it is possible to find weight vectors w such that the computed
hand paths get close to the recorded ones.
The optimization results show that no single criterion alone
can correctly reproduce the trajectories. Figure 7 shows the
root of the values of the upper level cost function, which is
the root mean squared deviation (RMSD) between the recorded
data and the computed trajectory, for some of the used basic
cost functions and the optimal combination.
RMSD @mmD
1.0
0.8
0.6
0.4
0.2
RMSD @mmD
1.0
0.8
0.6
0.4
0.2
(a) cup motion
Fig. 7.
(b) placemat motion
Joint Jerk
Hand Jerk
Optimal Combination
(c) spoon motion
RMSD between observed and optimal trajectories for some basic llp cost functions and the optimal combination
V. T RANSFER TO THE ROBOT
We use the iCub [22], a 53 degrees of freedom humanoid
robot, to transfer and execute the human motions. The strong
humanoid design of the iCub gives us an appropriate testing
platform to show similarities between the original human
motions and the human-based optimized robot motions. Strong
anatomical human-robot similarities can be appreciated on the
shoulder and elbow joints, which are precisely the joints we
are concentrating on.
Using the cost function found by the bilevel optimization for a characteristic movement in the kitchen scenario,
the kinematics, dynamics, and joint limits of the iCub, and
selectable initial and final Cartesian positions, an optimal
control problem similar to the llp mentioned above is solved
to generate the controls for the iCub. The obtained motion
is therefore human-like with respect to the optimization principles used by the human for that specific action (human to
robot optimization transfer). Additionally it is also adjustable
to any desired initial state and final goal within the workspace
of the robot. Therefore the robot is able to execute the same
type of intentional action but with the flexibility of choosing
an initial state and final goal to fit the current situation.
z @mD
0.1
Torque Change
RMSD @mmD
1.0
0.8
0.6
0.4
0.2
For evaluation purposes in our experiments we chose final
and initial states for the robot similar to the ones from the
human but scaled down in order to fit inside the workspace of
the robot. Execution time was also scaled to fulfill the velocity
restrictions of the robot.
It is important to note that our approach does not copy
human movements directly but generates trajectories for the
robot’s arm based on a combination of dynamic principles used
by the human. Additionally, in our case the more restrictive
hardware constraints in the kinematic chain of the robot are
met (Figure 8). The generated optimal motions are angle
trajectories that are fed to the robot at regular intervals and
executed using an independent P controller q̇ = kp (qref − q)
for each joint. This controller introduces a delay and the robot
itself has a motor velocity controller that introduces other types
of transformations to the signals.
Figure 9 shows the human and the iCub robot executing the
reaching motion for the placemat from a similar viewpoint.
The elbow of the robot has to guarantee a certain distance
to the torso in order to avoid self-collision. This limits the
maximal similarity between robot and human motion, but on
the other hand it shows that the difficulties arising in the
context of transferring control strategies can be handled by
our approach.
0
-0.2
y @mD
-0.1
0.2
0.1 x @mD
0
0
Fig. 8. Comparison of robot wrist trajectory (red) and human wrist trajectory
(black) in robot working coordinates relative to the starting position for three
different motions (left: spoon, middle: cup, right: placemat)
Fig. 9.
Human and iCub executing a reaching motion for a placemat
VI. C ONCLUSION AND F UTURE W ORK
This paper discusses a framework to record and analyze
human arm motions in an environment of everyday-life, in
our case a kitchen scenario, and then to use this knowledge
to control a humanoid in a human-like way. Various steps
are needed within this framework: Starting with tracking
the human motions and then clustering into the relevant
sub-movements, data is assembled which clearly shows the
stereotypical human motions. A bilevel optimization approach
is then used to analyze this data and obtain a cost function
describing the human arm movement best, considering the
nonlinear dynamics of the human arm. In the last step, this
cost function is used to generate optimal trajectories for a
humanoid robot doing the same task as the human did before.
The main contribution of our approach is to use physically
inspired cost functions, which optimally describe observed
human arm motions, to control a robot not by copying the
observed trajectories of the human, but by minimizing the
same cost function for the robot while maintaining the ability
to change the initial position and the final goal.
Because the robot control is based on optimizing a cost
function that is likely to be meaningful and important for the
execution of the current activity from the intention perspective,
this system has the potential to be connected in a convenient
way to high level reasoning and planning. This high level
system could select the appropriate cost function parameters
to fit the current desired intention of the action.
VII. ACKNOWLEDGMENTS
This work is in part funded by the CoTeSys cluster of excellence “Cognition for Technical Systems” within the Excellence
Initiative of the DFG. Part of the numerical computations
were performed on a Linux cluster supported by DFG grant
INST 95/919-1 FUGG. Additionally, K. Ramı́rez-Amaro is
supported by a CONACYT-DAAD scholarship.
R EFERENCES
[1] P. Abbeel and A.Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proc. Int. Conf. on Machine Learning, 2004.
[2] S. Albrecht, M. Leibold, and M. Ulbrich. A bilevel optimization
approach to obtain optimal cost functions for human arm-movements.
Numerical Algebra, Control and Optimization (NACO), accepted, 2011.
[3] C.G. Atkeson and S. Schaal. Learning tasks from a single demonstration.
In IEEE ICRA, pages 1706–1712, 1997.
[4] C.G. Atkeson and S. Schaal. Robot learning from demonstration. In
Proc. Int. Conf. on Machine Learning, pages 12–20, 1997.
[5] J. Bandouch, F. Engstler, and M. Beetz. Accurate human motion capture
using an ergonomics-based anthropometric human model. AMDO 2008,
LNCS, 5098:248–258, 2008.
[6] J. Bandouch, F. Engstler, and M. Beetz. Evaluation of hierarchical
sampling strategies in 3d human pose estimation. In 19th British
Machine Vision Conf., 2008.
[7] M. Beetz et al. The assistive kitchen - a demonstration scenario for
cognitive technical systems. In IEEE ROMAN, 2008.
[8] S. Calinon and A. Billard. Incremental learning of gestures by imitation
in a humanoid robot. In ACM/IEEE HRI, pages 255–262, 2007.
[9] S. Dempe. Foundations of bilevel programming. Kluwer Academic
Publishers, 2002.
[10] M. Do, P. Azad, T. Asfour, and R. Dillmann. Imitation of human
motion on a humanoid robot using non-linear optimization. In IEEERAS Humanoids, pages 545–552, 2008.
[11] C. Dube and J. Tapson. Kinematics design and human motion transfer
for a humanoid service robot arm. In Rob. Mech. Symp., 2009.
[12] S.E. Engelbrecht. Minimum principles in motor control. J. Mathematical
Psychology, 45(3):497–542, 2001.
[13] T. Flash and N. Hogan. The coordination of arm movements: an
experimentally confirmed mathematical model. J. Neurosci., 5(7), 1985.
[14] E. Gribovskaya, S.M. Khansari-Zadeh, and A. Billard. Learning nonlinear multivariate dynamics of motion in robotic manipulators. J.
Robotics Res., 30(1):80–117, 2011.
[15] C.M. Harris and D.M. Wolpert. Signal-dependent noise determines
motor planning. Nature, 394:780–784, 1998.
[16] F. Hermens and S. Gielen. Posture-based or trajectory-based movement
planning: a comparison of direct and indirect pointing movements. Exp.
Brain Res., 159(3):340–348, 2004.
[17] E. Keogh and C.A. Ratanamahatana. Exact indexing of dynamic time
warping. Knowledge and information systems, 7(3):358–386, 2005.
[18] O. Khatib, E. Demircan, V. De Sapio, L. Sentis, T. Besier, and S. Delp.
Robotics-based synthesis of human motion. J. Physiology Paris, 103(35):211–219, 2009.
[19] C.H. Kim, D. Kim, and Y. Oh. Adaptation of human motion capture data
to humanoid robots for motion imitation using optimization. Integrated
Computer-Aided Engineering, 13(4):377–389, 2006.
[20] D. Lee, H. Kunori, and Y. Nakamura. Association of whole body motion
from tool knowledge for humanoid robots. In IEEE/RSJ IROS, pages
2867–2874, 2008.
[21] M.J. Mataric. Getting humanoids to move and imitate. Intelligent
Systems and their Applications, 15(4):18–24, 2002.
[22] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori. The icub
humanoid robot: an open platform for research in embodied cognition.
In Workshop on Performance Metrics for Intelligent Systems, pages 50–
56, 2008.
[23] K. Mombaur, A. Truong, and J.-P. Laumond. From human to humanoid locomotionan inverse optimal control approach. Auton. Robot.,
28(3):369–383, 2010.
[24] S. Nakaoka, A. Nakazawa, F. Kanehiro, K. Kaneko, M. Morisawa, and
K. Ikeuchi. Task model of lower body motion for a biped humanoid
robot to imitate human dances. In IEEE IROS, pages 3157–3162, 2005.
[25] D.J. Ostry and A.G. Feldman. A critical evaluation of the force control
hypothesis in motor control. Exp. Brain Res., 153(3):275–288, 2003.
[26] N.D. Ratliff, J.A. Bagnell, and M.A. Zinkevich. Maximum margin
planning. In Proc. Int. Conf. on Machine learning, pages 729–736,
2006.
[27] D.A. Rosenbaum, L.D. Loukopoulos, R.G.J. Meulenbroek, J. Vaughan,
and S.E. Engelbrecht. Planning reaches by evaluating stored postures.
Psychological Review, 102(1):28–67, 1995.
[28] A. Safonova, N.S. Pollard, and J.K. Hodgins. Optimizing human motion
for the control of a humanoid robot. Proc. Applied Mathematics and
Applications of Mathematics, 2003.
[29] S. Schaal, A. Ijspeert, and A. Billard. Computational approaches to
motor learning by imitation. Phil. Trans. R. Soc. Lond. B: Biol. Sci.,
358(1431):537–547, 2003.
[30] M. Tenorth, J. Bandouch, and M. Beetz. The TUM kitchen data
set of everyday manipulation activities for motion tracking and action
recognition. In IEEE Int. Workshop on THEMIS (ICCV2009), 2009.
[31] E. Todorov. Optimality principles in sensorimotor control. Nat.
Neurosci., 7(9):907–915, 2004.
[32] E. Todorov and M.I. Jordan. Optimal feedback control as a theory of
motor coordination. Nat. Neurosci., 5(11), 2002.
[33] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal
trajectory in human multijoint arm movement. Biol. Cybern., 61(2):89–
101, 1989.
[34] A. Wächter and L.T. Biegler. On the implementation of an interiorpoint filter line-search algorithm for large-scale nonlinear programming.
Math. Prog., 106(1):25–57, 2006.
[35] D.M. Wolpert and Z. Ghahramani. Computational principles of movement neuroscience. Nature Neurosci., 3:1212–1217, 2000.
[36] K. Yamane, Y. Ariki, and J. Hodgins. Animating non-humanoid
characters with human motion data. In Proc. SIGGRAPH Symposium
on Computer Animation, pages 169–178, 2010.
[37] B.D. Ziebart, A. Maas, J.A. Bagnell, and A.K. Dey. Maximum
entropy inverse reinforcement learning. In Proc. AAAI Conf. Artificial
Intelligence, volume 1433–1438, 2008.