Imitating human reaching motions using physically inspired optimization principles S. Albrecht, K. Ramı́rez-Amaro, F. Ruiz-Ugalde, D. Weikersdorfer, M. Leibold, M. Ulbrich and M. Beetz [email protected], {ramirezk, ruizf, weikersd}@in.tum.de, [email protected], [email protected], [email protected] Intelligent Autonomous Systems Group, Chair of Mathematical Optimization and Institute of Automatic Control Engineering Technische Universität München, Germany Abstract— We present an end-to-end framework which equips robots with the capability to perform reaching motions in a natural human-like fashion. A markerless, high-accuracy, modelbased human motion tracker is used to observe how humans perform everyday activities in real-world scenarios. The obtained trajectories are clustered to represent different types of manipulation and reaching motions occurring in a kitchen environment. Using bilevel optimization methods a combination of physically inspired optimization principles is determined that describes the human motions best. For humanoid robots like the iCub these principles are used to compute reaching motion trajectories which are similar to human behavior and respect the individual requirements of the robotic hardware. The presented framework is the combination of several procedures. The starting point are video sequences of humans performing everyday manipulation tasks. A markerless human motion tracker is used to extract the posture of the human in each frame and then this data set is sorted into the characteristic human sub-movements by a clustering approach based on the algorithms of Dynamic Time Warping and k-mean. Given a characteristic movement, a novel bilevel optimization approach yields the one out of a given family of cost functions which reproduces the data best. For the specific dynamics of the humanoid robot’s arm an optimal trajectory for this cost function is computed (see Figure I). I. I NTRODUCTION In daily life a huge potential exists to assist and disencumber humans in fulfilling their tasks, by providing service or care robots. A difficult task in building adaptive and autonomous robots is generating task-specific robot motions which fit naturally in a human everyday environment. The desired requirements of such a system would be to generate motions based on the current task, the object type and state, while additionally considering information given through context and environment. Our contribution is a framework which is able to represent and generate motions based on an optimal combination of physically inspired principles obtained by analyzing human motions recorded in an unconstrained observation setup. In the experiments discussed here, subjects are observed while performing the everyday activity of setting the table without prior instructions. The underlying principles of the human motions are determined by a bilevel optimization approach, i.e. we identify parameters in an optimal control problem describing the motion planning, and a human-like control is generated for a humanoid robot based on these identified parameters. This approach has the important property to generate motions that fit different kinematic/dynamic chains while also considering variations in the task space goal. The generated motions are similar to human behavior as they are based on the same optimization principles. This is an important requirement in human-robot interaction scenarios where humans are expecting a certain behavior from the robot. As an example of everyday movements we focus on reaching motions, which consist in moving one hand from an initial position in the workspace to the desired goal position. Fig. 1. Framework overview The field of describing human characteristics has various facets and is growing fast. Disciplines dealing with analyzing and describing these movements range from psychology and biology over computer and engineering sciences to mathematics. Consequently, the relations and differences between all these approaches cannot be discussed within the limits of this paper; e.g. refer to reviews [12, 25, 29]. The advantages of learning from demonstration were shown for example in [3], where pole-balancing was accomplished after one trial. Two types of learning strategies in the context of robot control have to be distinguished: Motion imitation, where the observed human state is directly mapped onto the imitating robot, as in [4, 10, 11, 14, 19], and imitation learning, where characteristics found in the demonstrated motions are used to control the robot. Two central techniques in imitation learning are motor primitives, e.g. [21, 24], and Markov decision problems, e.g. [8, 20, 26]. The basic assumption of such Markov decision problems is that the set of states is finite and that state transition probabilities are known. Under these assumptions the techniques of inverse reinforcement learning can solve the inverse problem of finding the cost function applied in the demonstrations [1, 37]. Another approach based on the dynamics of the human body and using optimization of a new cost function was presented in [18]. Their goal is to describe the human motion based on muscle dynamics rather than to generate a robot control. If the focus is shifted from finding the underlying optimality principle of demonstrated human motions to mapping a given demonstrated posture to a system with a different kinematic and dynamic structure, a task is obtained which is similar to animating characters, e.g. [36]. The task of mapping human postures to a robot considering the dynamics and kinematic bounds was presented in [28]. The paper is organized as follows: In section II the tracking procedure is discussed and the proposed clustering strategy is presented in section III. The bilevel approach is summarized in section IV, followed by the description of the transfer to the iCub robot in section V. Note that the results of the discussed examples are distributed to the respective sections of this paper. II. H UMAN M OTION T RACKING model which can be adapted to the specific subject performing the kitchen activities [5]. To be able to accurately track in the high-dimensional joint state space of the human model, the tracker uses a mixture of annealed and partitioned particle filtering [6]. The human motion data used in this paper was recorded and published as part of the TUM Kitchen Data Set [30]. The database contains data for several different subjects which executed everyday kitchen tasks like setting the table without specific instructions (see Figure 3). Fig. 3. Scenes from the TUM kitchen data set III. C LUSTERING H UMAN M OVEMENTS The reaching trajectories available in the TUM Kitchen Data Set show a high variance due to different object types and relative object positions. In order to correctly identify motion principles we first cluster all available reaching motions into subtypes using a clustering technique based on Dynamic Time Warping (DTW) and k-means (compare Figure 4). In order to build detailed and reliable models of human behavior in everyday scenarios a high-accuracy and versatile human motion tracker is required. Industry-leading trackers based on markers attached to the human limbs or requiring special suits are not an option when recording in an real-world environment like the assistive kitchen [7]. For our recordings we choose a markerless high-accuracy and model-based tracker which uses the industry-leading RAMSIS model. It provides 63 degrees of freedom in the joint space (see Figure 2) and has a highly parameterizable skin Fig. 4. Fig. 2. The full joint space of the RAMSIS human model Overview of the steps to cluster the trajectories The DTW [17] algorithm has earned its popularity by being extremely efficient as a similarity measure of time series which minimizes the effects of shifting and distortion in time by allowing ‘elastic’ transformation of time series in order to detect similar shapes with different phases. To align two time series X = (xi )1≤i≤N and Y = (yj )1≤j≤M , xi , yj ∈ R3 , N, M ∈ N, the algorithm starts by building the distance matrix C = (ci,j ) ∈ RN ×M , where ci,j := d(xi , yj ) = kxi − yj k is the Euclidean distance. Once the local cost matrix C is built, DTW finds the alignment path that runs through the low-cost areas of the cost matrix. This path can be found using dynamic programming by evaluating the following recurrence for γ(i, j): γ(i, j) = ci,j+min {γ(i−1, j −1), γ(i−1, j), γ(i, j −1)} (1) Using the metric on the trajectory space obtained through DTW, the k-means algorithm can now be used to cluster the reaching trajectories. By varying the cluster count, we can identify different types of reaching motions. The results obtained from the above procedure are shown in Figure 5. The trajectories where obtained by taking the Cartesian positions of the hand of the right arm (see HAR in Figure 2). We found several main clusters which are differentiated by object type and relative object position. It can be observed that these sequences show stereotypical and pre-planned motion patterns [35]. Fig. 5. Trajectory clusters for wrist motions in the TUM Kitchen Data Set For the optimization step we used arm movements of three clusters of totally different motion types: Reaching for a cup in an upper kitchen unit (cluster 1), for a placemat on top of the oven (cluster 2) and for a spoon inside a drawer (cluster 7). This choice was made to show that our approach can be used for different kinds of human arm motions. IV. O PTIMIZATION - BASED A NALYSIS A common assumption of many approaches analyzing human motion is that humans try to minimize an unknown cost function while doing everyday manipulation tasks. This assumption also forms the basis for the strategy discussed here. It has to be noticed that the main purpose of this approach is to describe human behavior based on physical models rather than to explain it from a biological or psychological perspective [12]. Various cost functions have been proposed in literature and most of them are used within an open-loop approach. Closedloop control [15, 32] allows for correction of deviations due to external disturbance or noise in the human muscles, but the computation of such a control law is complicated for reallife robotic systems [31]. No learning or adaption processes are analyzed here, consequently the human motion planning is described as an open-loop approach. In [13] the minimum jerk hypothesis is introduced. Being based purely on kinematics, this criterion postulates that the trajectory of the human hand can be determined by minimizing jerk, the third time-derivative of the hand position. Some characteristics of planar arm motions can be explained by this criterion [12], while other observations cannot [33]. Consequently, several other cost functions were proposed since then, e.g. the minimum torque change criterion [33] accounting for the dynamics of the human arm. The step from planar motions to full three-dimensional arm movements further complicates the task of describing human motions [16]. To the best of our knowledge all basic cost function presented in literature so far can only describe a limited range of movements. Thus, we will consider a convex combination of cost functions and adapt the weighting factors of the considered costs for diverse tasks. The question arises which choice of weights can describe an observed human motion, leading to the problem of finding the cost function out of a given family which approximates the given data best. From a mathematical point of view this is a bilevel optimization problem, i.e. a combination of two optimization problems [9]. In the lower level problem (llp) a specified cost function f is minimized with respect to the given arm dynamics and the boundary conditions describing the analyzed motion task, i.e. minimization of f (x) subject to h(x) = 0. The vector x is a concatenation of the values of the joint angles and their derivatives and the values of the muscle forces and their derivatives. Denote the considered basic cost functions by fi , then the llp cost function being the weighted combination of these is given by X f (x|w) := wi fi (x). (2) i Hence, the llp is a standard problem of optimal control with the optimal solution x∗ (w) for a fixed weight vector w: min f (x|w) subject to h(x) = 0. x (3) The dynamics of the human arm are modeled by connecting rigid body dynamics of the links with a simple dynamical model of the muscle characteristics. The arm model possesses two joints (shoulder and elbow) having five degrees of freedom and seven lumped muscle pairs altogether. The resulting ordinary differential equation is discretized by using a singlestep method to obtain the constraints of the type h(x) = 0. The goal of our approach is now to find the weight vector w minimizing the deviations between recorded human data and the quantities corresponding to the solution of the llp. We choose to compare the two by their hand positions in Cartesian space. Consequently, a function pcomp is introduced which returns the hand positions belonging to the llp solution x∗ (w) and pdata are the recorded human hand positions. The -0.4 -0.3 -0.2 -0.1 0 0.1 -0.4 -0.3 -0.2 -0.1 0 0.1 -0.4 0.2 0.2 0.2 0.1 0.1 0.1 0 0 0 -0.1 -0.1 -0.1 -0.2 -0.2 -0.2 -0.3 -0.3 -0.3 -0.4 -0.4 -0.4 (a) cup motion Fig. 6. (b) placemat motion -0.3 -0.2 -0.1 0 0.1 (c) spoon motion Comparison of tracked wrist and elbow positions (black/grey) and trajectories computed from the optimal combination of cost functions (blue) (units in meter) [projection to the x-z-plane] applied distance measure dist uses points of equal paths length. Summing up, the following optimization problem is obtained, the so-called upper level problem (ulp): min dist(pdata , pcomp (x∗ (w)))2 w (4) subject to X wi = 1 and x∗ (w) solves llp for w. (5) i The combination of the lower and the upper level problem is a bilevel optimal control problem. A solution strategy handling the llp and the ulp separately was presented in [23] for macroscopic lower level dynamics. The llp dynamics of the problem solved here are more complex due to the nonlinearities in arm and muscle dynamics. The strategy presented here is based on a reformulation of the bilevel problem using the first order necessary conditions (KKT-conditions) for an optimum of the llp: 0 = ∇f (x|w) + ∇h(x)λ, (6) where λ are the adjoint variables. This equation together with h(x) = 0 can be used to transform the bilevel problem into a standard nonlinear optimization problem: min dist(pdata , pcomp (x))2 w,x,λ (7) subject to 0 = ∇f (x, w) + ∇h(x)λ, (8) 0 = (9) 1 = h(x), X wi , (10) i 0 ≤ wi . (11) Note that each solution of the bilevel problem solves this reformulation, but the reverse implication does not always hold, because the KKT-conditions are only necessary. The reformulated problem is solved by using the optimization software IPOPT [34], an interior-point algorithm for large-scale nonlinear programming. The mathematical details are out of the scope of this paper, but can be found in [2]. Therein we discuss among other things the details of the problem structure and the suitability of our approach for the given task. Furthermore, we show that a wide range of values can be used to initialize the weights wi . In the bilevel optimization of the three stereotypic arm movements recorded in the kitchen (compare section III) we use the generalizations to three dimensions of some common cost functions: Minimization of hand jerk [13], joint jerk [27] and torque change [33] and additionally, we consider variants like planar hand jerk based only on the jerk of the x- and y-component. Here the optimization results for three motions (reaching for a cup in an upper kitchen unit, reaching for a placemat on top of the oven and picking up a spoon inside a drawer) are presented. The recorded trajectories of human wrist and elbow positions and the corresponding optimal results of the bilevel optimization are displayed in Figure 6. It can be observed that it is possible to find weight vectors w such that the computed hand paths get close to the recorded ones. The optimization results show that no single criterion alone can correctly reproduce the trajectories. Figure 7 shows the root of the values of the upper level cost function, which is the root mean squared deviation (RMSD) between the recorded data and the computed trajectory, for some of the used basic cost functions and the optimal combination. RMSD @mmD 1.0 0.8 0.6 0.4 0.2 RMSD @mmD 1.0 0.8 0.6 0.4 0.2 (a) cup motion Fig. 7. (b) placemat motion Joint Jerk Hand Jerk Optimal Combination (c) spoon motion RMSD between observed and optimal trajectories for some basic llp cost functions and the optimal combination V. T RANSFER TO THE ROBOT We use the iCub [22], a 53 degrees of freedom humanoid robot, to transfer and execute the human motions. The strong humanoid design of the iCub gives us an appropriate testing platform to show similarities between the original human motions and the human-based optimized robot motions. Strong anatomical human-robot similarities can be appreciated on the shoulder and elbow joints, which are precisely the joints we are concentrating on. Using the cost function found by the bilevel optimization for a characteristic movement in the kitchen scenario, the kinematics, dynamics, and joint limits of the iCub, and selectable initial and final Cartesian positions, an optimal control problem similar to the llp mentioned above is solved to generate the controls for the iCub. The obtained motion is therefore human-like with respect to the optimization principles used by the human for that specific action (human to robot optimization transfer). Additionally it is also adjustable to any desired initial state and final goal within the workspace of the robot. Therefore the robot is able to execute the same type of intentional action but with the flexibility of choosing an initial state and final goal to fit the current situation. z @mD 0.1 Torque Change RMSD @mmD 1.0 0.8 0.6 0.4 0.2 For evaluation purposes in our experiments we chose final and initial states for the robot similar to the ones from the human but scaled down in order to fit inside the workspace of the robot. Execution time was also scaled to fulfill the velocity restrictions of the robot. It is important to note that our approach does not copy human movements directly but generates trajectories for the robot’s arm based on a combination of dynamic principles used by the human. Additionally, in our case the more restrictive hardware constraints in the kinematic chain of the robot are met (Figure 8). The generated optimal motions are angle trajectories that are fed to the robot at regular intervals and executed using an independent P controller q̇ = kp (qref − q) for each joint. This controller introduces a delay and the robot itself has a motor velocity controller that introduces other types of transformations to the signals. Figure 9 shows the human and the iCub robot executing the reaching motion for the placemat from a similar viewpoint. The elbow of the robot has to guarantee a certain distance to the torso in order to avoid self-collision. This limits the maximal similarity between robot and human motion, but on the other hand it shows that the difficulties arising in the context of transferring control strategies can be handled by our approach. 0 -0.2 y @mD -0.1 0.2 0.1 x @mD 0 0 Fig. 8. Comparison of robot wrist trajectory (red) and human wrist trajectory (black) in robot working coordinates relative to the starting position for three different motions (left: spoon, middle: cup, right: placemat) Fig. 9. Human and iCub executing a reaching motion for a placemat VI. C ONCLUSION AND F UTURE W ORK This paper discusses a framework to record and analyze human arm motions in an environment of everyday-life, in our case a kitchen scenario, and then to use this knowledge to control a humanoid in a human-like way. Various steps are needed within this framework: Starting with tracking the human motions and then clustering into the relevant sub-movements, data is assembled which clearly shows the stereotypical human motions. A bilevel optimization approach is then used to analyze this data and obtain a cost function describing the human arm movement best, considering the nonlinear dynamics of the human arm. In the last step, this cost function is used to generate optimal trajectories for a humanoid robot doing the same task as the human did before. The main contribution of our approach is to use physically inspired cost functions, which optimally describe observed human arm motions, to control a robot not by copying the observed trajectories of the human, but by minimizing the same cost function for the robot while maintaining the ability to change the initial position and the final goal. Because the robot control is based on optimizing a cost function that is likely to be meaningful and important for the execution of the current activity from the intention perspective, this system has the potential to be connected in a convenient way to high level reasoning and planning. This high level system could select the appropriate cost function parameters to fit the current desired intention of the action. VII. ACKNOWLEDGMENTS This work is in part funded by the CoTeSys cluster of excellence “Cognition for Technical Systems” within the Excellence Initiative of the DFG. Part of the numerical computations were performed on a Linux cluster supported by DFG grant INST 95/919-1 FUGG. Additionally, K. Ramı́rez-Amaro is supported by a CONACYT-DAAD scholarship. R EFERENCES [1] P. Abbeel and A.Y. Ng. Apprenticeship learning via inverse reinforcement learning. In Proc. Int. Conf. on Machine Learning, 2004. [2] S. Albrecht, M. Leibold, and M. Ulbrich. A bilevel optimization approach to obtain optimal cost functions for human arm-movements. Numerical Algebra, Control and Optimization (NACO), accepted, 2011. [3] C.G. Atkeson and S. Schaal. Learning tasks from a single demonstration. In IEEE ICRA, pages 1706–1712, 1997. [4] C.G. Atkeson and S. Schaal. Robot learning from demonstration. In Proc. Int. Conf. on Machine Learning, pages 12–20, 1997. [5] J. Bandouch, F. Engstler, and M. Beetz. Accurate human motion capture using an ergonomics-based anthropometric human model. AMDO 2008, LNCS, 5098:248–258, 2008. [6] J. Bandouch, F. Engstler, and M. Beetz. Evaluation of hierarchical sampling strategies in 3d human pose estimation. In 19th British Machine Vision Conf., 2008. [7] M. Beetz et al. The assistive kitchen - a demonstration scenario for cognitive technical systems. In IEEE ROMAN, 2008. [8] S. Calinon and A. Billard. Incremental learning of gestures by imitation in a humanoid robot. In ACM/IEEE HRI, pages 255–262, 2007. [9] S. Dempe. Foundations of bilevel programming. Kluwer Academic Publishers, 2002. [10] M. Do, P. Azad, T. Asfour, and R. Dillmann. Imitation of human motion on a humanoid robot using non-linear optimization. In IEEERAS Humanoids, pages 545–552, 2008. [11] C. Dube and J. Tapson. Kinematics design and human motion transfer for a humanoid service robot arm. In Rob. Mech. Symp., 2009. [12] S.E. Engelbrecht. Minimum principles in motor control. J. Mathematical Psychology, 45(3):497–542, 2001. [13] T. Flash and N. Hogan. The coordination of arm movements: an experimentally confirmed mathematical model. J. Neurosci., 5(7), 1985. [14] E. Gribovskaya, S.M. Khansari-Zadeh, and A. Billard. Learning nonlinear multivariate dynamics of motion in robotic manipulators. J. Robotics Res., 30(1):80–117, 2011. [15] C.M. Harris and D.M. Wolpert. Signal-dependent noise determines motor planning. Nature, 394:780–784, 1998. [16] F. Hermens and S. Gielen. Posture-based or trajectory-based movement planning: a comparison of direct and indirect pointing movements. Exp. Brain Res., 159(3):340–348, 2004. [17] E. Keogh and C.A. Ratanamahatana. Exact indexing of dynamic time warping. Knowledge and information systems, 7(3):358–386, 2005. [18] O. Khatib, E. Demircan, V. De Sapio, L. Sentis, T. Besier, and S. Delp. Robotics-based synthesis of human motion. J. Physiology Paris, 103(35):211–219, 2009. [19] C.H. Kim, D. Kim, and Y. Oh. Adaptation of human motion capture data to humanoid robots for motion imitation using optimization. Integrated Computer-Aided Engineering, 13(4):377–389, 2006. [20] D. Lee, H. Kunori, and Y. Nakamura. Association of whole body motion from tool knowledge for humanoid robots. In IEEE/RSJ IROS, pages 2867–2874, 2008. [21] M.J. Mataric. Getting humanoids to move and imitate. Intelligent Systems and their Applications, 15(4):18–24, 2002. [22] G. Metta, G. Sandini, D. Vernon, L. Natale, and F. Nori. The icub humanoid robot: an open platform for research in embodied cognition. In Workshop on Performance Metrics for Intelligent Systems, pages 50– 56, 2008. [23] K. Mombaur, A. Truong, and J.-P. Laumond. From human to humanoid locomotionan inverse optimal control approach. Auton. Robot., 28(3):369–383, 2010. [24] S. Nakaoka, A. Nakazawa, F. Kanehiro, K. Kaneko, M. Morisawa, and K. Ikeuchi. Task model of lower body motion for a biped humanoid robot to imitate human dances. In IEEE IROS, pages 3157–3162, 2005. [25] D.J. Ostry and A.G. Feldman. A critical evaluation of the force control hypothesis in motor control. Exp. Brain Res., 153(3):275–288, 2003. [26] N.D. Ratliff, J.A. Bagnell, and M.A. Zinkevich. Maximum margin planning. In Proc. Int. Conf. on Machine learning, pages 729–736, 2006. [27] D.A. Rosenbaum, L.D. Loukopoulos, R.G.J. Meulenbroek, J. Vaughan, and S.E. Engelbrecht. Planning reaches by evaluating stored postures. Psychological Review, 102(1):28–67, 1995. [28] A. Safonova, N.S. Pollard, and J.K. Hodgins. Optimizing human motion for the control of a humanoid robot. Proc. Applied Mathematics and Applications of Mathematics, 2003. [29] S. Schaal, A. Ijspeert, and A. Billard. Computational approaches to motor learning by imitation. Phil. Trans. R. Soc. Lond. B: Biol. Sci., 358(1431):537–547, 2003. [30] M. Tenorth, J. Bandouch, and M. Beetz. The TUM kitchen data set of everyday manipulation activities for motion tracking and action recognition. In IEEE Int. Workshop on THEMIS (ICCV2009), 2009. [31] E. Todorov. Optimality principles in sensorimotor control. Nat. Neurosci., 7(9):907–915, 2004. [32] E. Todorov and M.I. Jordan. Optimal feedback control as a theory of motor coordination. Nat. Neurosci., 5(11), 2002. [33] Y. Uno, M. Kawato, and R. Suzuki. Formation and control of optimal trajectory in human multijoint arm movement. Biol. Cybern., 61(2):89– 101, 1989. [34] A. Wächter and L.T. Biegler. On the implementation of an interiorpoint filter line-search algorithm for large-scale nonlinear programming. Math. Prog., 106(1):25–57, 2006. [35] D.M. Wolpert and Z. Ghahramani. Computational principles of movement neuroscience. Nature Neurosci., 3:1212–1217, 2000. [36] K. Yamane, Y. Ariki, and J. Hodgins. Animating non-humanoid characters with human motion data. In Proc. SIGGRAPH Symposium on Computer Animation, pages 169–178, 2010. [37] B.D. Ziebart, A. Maas, J.A. Bagnell, and A.K. Dey. Maximum entropy inverse reinforcement learning. In Proc. AAAI Conf. Artificial Intelligence, volume 1433–1438, 2008.
© Copyright 2026 Paperzz