
Dynamical System Modulation for Robot Learning
via Kinesthetic Demonstrations
Micha Hersch, Florent Guenter, Sylvain Calinon, and Aude Billard
Learning Algorithms and Systems Laboratory - LASA
School of Engineering - EPFL
Station 9 - 1015 Lausanne - Switzerland
Abstract
We present a system for robust robot skill acquisition from kinesthetic demonstrations. This system allows a robot to learn a simple goal-directed gesture and correctly reproduce it despite changes in the initial conditions and perturbations in the environment. It combines a dynamical system control approach with tools of statistical learning theory and provides a solution to the inverse kinematics problem when dealing with a redundant manipulator. The system is validated in two experiments involving a humanoid robot: putting an object into a box, and reaching for and grasping an object.
Index Terms
Robot Programming by Demonstration, Dynamical System Control, Gaussian Mixture Regression
Corresponding author: Micha Hersch, [email protected]
Conditionally accepted short paper submitted to the IEEE Transactions on Robotics
Paper # A07-471/A06-417
This work was supported by the European Commission through the EU Integrated Projects ROBOT-CUB (FP6-IST-004370), Cogniron (FP6-IST-FET002020) and ROBOT@CWE (FP6-034002).
I. INTRODUCTION

As robots are progressively coming out of the controlled environment of assembly lines to pervade the much less predictable domestic environments, there is a need to develop new kinds of controllers that can cope with changing environments and that can be taught by unskilled human users.
In order to address this last issue, Programming by Demonstration (PbD) has emerged as a promising approach [1]. PbD has mostly been used in two settings. For tasks involving no or very loose interaction with the environment (such as writing, martial arts or communicative gestures), human demonstrations are used to train a movement model, which can then reproduce the task. Such movement models (also used in computer animation and visual gesture recognition) usually imply some averaging process (LWR [2], HSTMM [3]), possibly in a latent space (GPLVM [4], ST-Isomap [5]), or some probabilistic model such as Bayesian networks [6]. For more complex tasks, involving precise interactions with the environment, the robot instead learns from examples how to sequence a set of hard-coded controllers for a given task. This has been done using HMMs [7] or knowledge-based systems [8].
Our work is positioned between those two approaches: we learn a task-dependent modulation of a built-in controller. We thus start with a basic built-in controller (or motion primitive) consisting of a dynamical system with a single stable attractor. We modulate the trajectories generated by this controller to make them task-specific, by learning a probabilistic model of the task-based trajectory as shown by a human user. This results in a general framework for learning and reproducing goal-directed gestures, despite different initial conditions and changes occurring during task execution. In this respect, we improve in several ways on classical control approaches for goal-directed motions, such as [9].
The closest work to ours is [2], which uses a dynamical system for goal-directed reaching. We depart from this work in two ways. First, we propose a hybrid controller composed of two of our basic dynamical systems working concurrently in end-effector and joint angle space, which yields a controller free of singularities. Second, the dynamical system approach gives us a controller that is robust to perturbations, recomputing the trajectory online to adapt to sudden displacements of the target or unexpected motion of the arm; we provide experimental results on robustness to both static and dynamic changes in the environment. While our controller is less precise than ad-hoc controllers (e.g., [10]), it is more general, in that it can easily be modulated to achieve arbitrary goal-directed reaching tasks.
In the experiments presented here, the motions are demonstrated to the robot by a human user passively moving the robot's limbs (kinesthetic training). In Section IV, we validate the approach on two different tasks, namely placing an object into a box and reaching to grasp a chess piece; see Fig. 2 for illustrations of these two tasks.
II. OVERVIEW
The system is designed to enable a robot to learn to modulate its generic controller to produce an arbitrary goal-directed motion. The model must be generic, so as to reproduce the motion given different initial conditions and under perturbations during execution. Moreover, the architecture of the system must permit the use of different control variables for encoding the motion. Here, we compare control in either velocity or acceleration; we refer to these in the following as the velocity model (see Section II-B) and the acceleration model (see Section II-C).
A. System Architecture
The structure of the system is the same for both models
and is schematized in Fig. 1. During training, the relevant
variables (end-effector velocity profiles for the velocity model,
or end-effector positions, velocities and accelerations for the
acceleration model) are extracted from the set of demonstrated
trajectories and used to train a Gaussian Mixture Model
(GMM) (see Table I). During reproduction, the trajectory is
specified by a spring-and-damper dynamical system modulated
by the GMM (see Section III). The target is tracked by a
stereo-vision system and is set to be the attractor point of
the dynamical system. At each time step, the desired velocity
computed by the model is then fed to a PID controller for
execution. This does not hinder the online adaptation of the
movement.
B. Velocity Model
The first way to encode a motion in a GMM is to consider the velocity profile of the end-effector as a function of time. Thus, the input variable is the time $t$ and the output variable is the velocity, as in the following velocity model:

$\dot{x}^m(t) = \tilde{f}(t).$ (2)

In other words, the movement is modeled as a velocity profile, given by a function of time, which is learned as described in Table I. Here and henceforth, $\dot{x}^m$ denotes the end-effector velocity specified by the task model, and $\tilde{f}$ is obtained by applying (1) with the appropriate variables.
TABLE I
SUMMARY OF GAUSSIAN MIXTURE REGRESSION (GMR).

GMR is a method suggested by [11] for statistically estimating a function $f$ given by a "training set" of $N$ examples $\{x_i, y_i\}_{i=1}^{N}$, where $y_i$ is a noisy measurement of $f(x_i)$:

$y_i = f(x_i) + \epsilon_i$

($\epsilon_i$ is Gaussian noise). The idea is to model the joint distribution of the "input" variable $x$ and the "output" variable $y$ as a Gaussian Mixture Model. If we join those variables in a vector $z = [x^{\top}\, y^{\top}]^{\top}$, it is possible to model its probability density function as a mixture of $K$ Gaussian functions:

$p(z) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(z; \mu_k, \Sigma_k), \quad \text{such that } \sum_{k=1}^{K} \pi_k = 1,$

where the $\pi_k \in [0,1]$ are the priors and $\mathcal{N}(z; \mu_k, \Sigma_k)$ is a Gaussian function with mean $\mu_k$ and covariance matrix $\Sigma_k$:

$\mathcal{N}(z; \mu_k, \Sigma_k) = (2\pi)^{-\frac{d}{2}}\, |\Sigma_k|^{-\frac{1}{2}} \exp\big(-\tfrac{1}{2}(z-\mu_k)^{\top}\Sigma_k^{-1}(z-\mu_k)\big),$

where $d$ is the dimensionality of the vector $z$. The mean vectors $\mu_k$ and covariance matrices $\Sigma_k$ can be separated into their respective input and output components:

$\mu_k = \begin{bmatrix} \mu_{k,x} \\ \mu_{k,y} \end{bmatrix}, \qquad \Sigma_k = \begin{bmatrix} \Sigma_{k,x} & \Sigma_{k,xy} \\ \Sigma_{k,yx} & \Sigma_{k,y} \end{bmatrix}.$

The GMM is trained on the demonstrations using a standard E-M algorithm. Since the GMM encodes the joint probability density of the input and the output, the distribution of the output conditioned on the input is again a mixture of Gaussians. Hence, after training, it is possible to recover the expected output variable $\tilde{y}$ given an observed input variable $x$:

$\tilde{y} = \tilde{f}(x) = \sum_{k=1}^{K} h_k(x)\, \big(\mu_{k,y} + \Sigma_{k,yx}\, \Sigma_{k,x}^{-1}\, (x - \mu_{k,x})\big),$ (1)

where the $h_k(x)$ are given by:

$h_k(x) = \frac{\pi_k\, \mathcal{N}(x; \mu_{k,x}, \Sigma_{k,x})}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x; \mu_{j,x}, \Sigma_{j,x})}.$

The tilde ($\tilde{\cdot}$) sign indicates that we are dealing with expectation values.
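For concreteness, the following is a minimal sketch of GMR as summarized in Table I, built on scikit-learn and SciPy; the helper names (fit_gmm, gmr) and the choice of libraries are ours, not the authors'. For the velocity model (2), X would hold the demonstration time stamps and Y the end-effector velocities; for the acceleration model (3), X would hold positions and velocities and Y accelerations.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_gmm(X, Y, n_components=5, seed=0):
    """Fit a GMM on the joint vectors z = [x, y] via E-M (Table I)."""
    Z = np.hstack([X, Y])                          # (N, d_in + d_out)
    return GaussianMixture(n_components, covariance_type='full',
                           random_state=seed).fit(Z)

def gmr(gmm, x, d_in):
    """Conditional expectation E[y | x], eq. (1)."""
    pis, mus, Sigmas = gmm.weights_, gmm.means_, gmm.covariances_
    # Responsibilities h_k(x), computed from the input marginals.
    h = np.array([pi * multivariate_normal.pdf(x, mu[:d_in], S[:d_in, :d_in])
                  for pi, mu, S in zip(pis, mus, Sigmas)])
    h /= h.sum()
    # Weighted sum of the per-component conditional means.
    y = np.zeros(mus.shape[1] - d_in)
    for hk, mu, S in zip(h, mus, Sigmas):
        y += hk * (mu[d_in:]
                   + S[d_in:, :d_in] @ np.linalg.solve(S[:d_in, :d_in],
                                                       x - mu[:d_in]))
    return y
```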
[Fig. 1 block diagram: kinesthetic demonstrations provide trajectory features for GMM training; the trained task model, queried through GMR (2, 4), modulates a spring-and-damper system (11, 12) whose output is executed by the robot; the target is provided by stereo-vision tracking, and execution feeds back into the task model only in the case of the acceleration model.]

Fig. 1. The architecture of the system. During training, the relevant variables (the end-effector's position, velocity and acceleration) are extracted from the demonstrations and used to train a GMM. During task execution, this model is used to modulate a spring-and-damper system. $\dot{x}^m$ is the end-effector velocity specified by the task model; the target location is given by stereo-vision tracking; $x$, $\dot{x}$ and $\dot{\theta}$ are respectively the actual current end-effector position and velocity and the joint angle velocities. The numbers in parentheses refer to the corresponding equations in the text.
C. Acceleration Model

A second way of encoding a trajectory is to take as input the position $x$ and velocity $\dot{x}$, and as output the acceleration $\ddot{x}$. The rationale is to consider a trajectory not as a function of time, but as the realization of a second-order dynamical system of the form:

$\ddot{x}^m = \tilde{f}(x, \dot{x}).$ (3)

Again, $\tilde{f}$ is obtained by applying (1) with the appropriate variables. The velocity specified by the acceleration model is then given by

$\dot{x}^m(t) = \dot{x}(t) + \delta t\, \ddot{x}^m(t),$ (4)

where $\delta t$ is the time integration constant (set to 1 in this paper). Since the position and velocity depend on the acceleration at previous times, this representation introduces a feedback loop, which is not present in the representation given by (2).
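As a minimal illustration, one step of the acceleration model can be written as below, continuing the GMR sketch given after Table I (with the same caveat that gmr is our own illustrative helper, not the authors' code):

```python
import numpy as np

# One step of the acceleration model, eqs. (3)-(4): the GMM here is
# assumed trained with inputs [x, xd] (position, velocity) and outputs
# xdd (acceleration).
def task_model_velocity(gmm, x, xd, dt=1.0):
    xdd_m = gmr(gmm, np.hstack([x, xd]), d_in=x.size + xd.size)  # eq. (3)
    return xd + dt * xdd_m                                       # eq. (4)
```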
III. MODULATED SPRING-AND-DAMPER SYSTEM

We now show how the task model described above is used to modulate a spring-and-damper dynamical system, in order to enable a (possibly redundant) robotic arm with $N$ joints to reproduce the task with sufficient flexibility. Although the modulation is performed in end-effector space, it is advantageous (for avoiding singularity problems related to the inverse kinematics of redundant manipulators) to formulate the spring-and-damper dynamical system in joint angle variables:

$\ddot{\theta} = \alpha\big(-\dot{\theta} + \beta(\theta_g - \theta)\big),$ (5)

where $\theta$ is the vector of joint angles (or arm configuration vector). This dynamical system produces straight paths (in joint space) to the target $\theta_g$, which acts as an attractor of the system. This guarantees that the robot reaches the target smoothly, despite possible perturbations.
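To make (5) concrete, here is a minimal Euler-integration sketch (our own illustrative code; the gain values are those reported in Section IV, as reconstructed there):

```python
import numpy as np

# One Euler step of the joint-space spring-and-damper system, eq. (5).
# theta converges smoothly to the attractor theta_g from any initial state.
def spring_damper_step(theta, theta_d, theta_g, alpha=0.15, beta=0.05, dt=1.0):
    theta_dd = alpha * (-theta_d + beta * (theta_g - theta))  # eq. (5)
    theta_d = theta_d + dt * theta_dd
    theta = theta + dt * theta_d
    return theta, theta_d
```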
The above dynamical system is modulated by the variable $\dot{x}^m$ given by the task model (2) or (4). In order to weigh the modulation, we introduce a modulation factor $\gamma \in [0,1]$, which weighs the importance of the task model relative to the spring-and-damper system. If $\gamma = 0$, only the spring-and-damper system is considered; when $\gamma = 1$, only the task model is considered. In order to guarantee the convergence of the system to $\theta_g$, $\gamma$ has to tend to zero at the end of the movement. In the experiments described here, $\gamma$ decays over time according to

$\gamma(t) = (1 - \alpha_\gamma)\,\gamma(t-1), \qquad \gamma(0) = \gamma_0,$ (6)

where $\gamma_0$ is the initial value of $\gamma$ and $\alpha_\gamma \in [0,1]$ is a scalar.
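Under the decaying-$\gamma$ reconstruction of (6) given above, the schedule is a one-liner; this sketch simply makes the geometric decay explicit ($\alpha_\gamma$ as reported in Section IV):

```python
# Modulation factor schedule, eq. (6): gamma decays geometrically from
# gamma0 toward zero, so the spring-and-damper term eventually dominates
# and convergence to theta_g is guaranteed.
def gamma_schedule(gamma0=1.0, alpha_gamma=0.05, T=300):
    gamma = gamma0
    for _ in range(T):
        yield gamma
        gamma *= (1.0 - alpha_gamma)
```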
Since $\dot{x}^m$ lives in the end-effector space (and not in the joint space), the modulation is performed by solving the following constrained optimization problem:

$\dot{\theta} = \arg\min_{\dot{\theta}}\, \big[\, (1-\gamma)\,(\dot{\theta}-\dot{\theta}^{sd})^{\top} K^{\theta}\, (\dot{\theta}-\dot{\theta}^{sd}) + \gamma\,(\dot{x}-\dot{x}^{m})^{\top} K^{x}\, (\dot{x}-\dot{x}^{m}) \,\big]$ (7)

$\text{s.t.} \quad \dot{x} = J\dot{\theta},$ (8)

where $J$ is the Jacobian of the robot arm kinematic function, $\dot{\theta}^{sd}$ is the joint velocity given by the spring-and-damper system (5), and $K^{\theta} \in \mathbb{R}^{N \times N}$ and $K^{x} \in \mathbb{R}^{3 \times 3}$ are diagonal matrices necessary to compensate for the different scales of the $\theta$ and $x$ variables. As a rough approximation, the diagonal elements of $K^{x}$ are set to one and those of $K^{\theta}$ are set to the average distance between the robot base and its end-effector.
The solution to this minimization problem is given by [12]:

$\dot{\theta} = \big(W^{\theta} + J^{\top} W^{x} J\big)^{-1}\big(W^{\theta}\,\dot{\theta}^{sd} + J^{\top} W^{x}\,\dot{x}^{m}\big),$ (9)

with the weighting matrices

$W^{\theta} = (1-\gamma)\,K^{\theta}, \qquad W^{x} = \gamma\, K^{x}.$ (10)

To summarize, the task is performed by integrating the following dynamical system:

$\ddot{\theta}^{sd} = \alpha\big(-\dot{\theta} + \beta(\theta_g - \theta)\big),$ (11)

$\dot{\theta} = \big(W^{\theta} + J^{\top} W^{x} J\big)^{-1}\big(W^{\theta}\,\dot{\theta}^{sd} + J^{\top} W^{x}\,\dot{x}^{m}\big),$ (12)

where $\gamma$ and the weights $W^{\theta}$, $W^{x}$ are given by (6) and (10), and $\dot{x}^m$ is given either by (2) (velocity model) or by (4) (acceleration model). Integration is performed using a first-order Newton approximation ($\theta(t+1) = \theta(t) + \delta t\, \dot{\theta}(t)$).

Using an attractor system in joint angle space has the practical advantage of reducing the usual problems related to end-effector control, such as joint limit and singularity avoidance. Equation (9), which is a generalized version of the damped least squares inverse [13], [14], is a way to simultaneously control the joint angles and the end-effector, imposing soft constraints on both of them. It thus differs from optimizing the joint angles in the null space of the kinematic function.

Since the target location is given in Cartesian coordinates, inverse kinematics must be performed in order to obtain the corresponding target joint angle configuration $\theta_g$, which serves as input to the spring-and-damper dynamical system. In the case of a redundant manipulator (such as the robot arm used in the following experiments), the desired redundant parameters of the target joint angle configuration can be extracted from the demonstrations. This is done by using the GMR technique described in Table I to build a model of the final arm configuration as a function of the target location.
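Putting (9)-(12) together, one control cycle can be sketched as follows. This is our own illustrative code, not the authors' implementation: jacobian is a placeholder for the robot-specific kinematics, and the gain values are those reconstructed in Section IV.

```python
import numpy as np

# One control cycle of the modulated spring-and-damper system, eqs. (9)-(12).
def control_step(theta, theta_d, xd_m, theta_g, gamma, jacobian,
                 K_theta, K_x, alpha=0.15, beta=0.05, dt=1.0):
    J = jacobian(theta)                    # (3, N) arm Jacobian at theta
    # Weighting matrices, eq. (10): gamma trades the task model off
    # against the joint-space attractor.
    W_theta = (1.0 - gamma) * K_theta
    W_x = gamma * K_x
    # Desired joint velocity from the spring-and-damper system, eq. (11).
    theta_dd = alpha * (-theta_d + beta * (theta_g - theta))
    theta_d_sd = theta_d + dt * theta_dd
    # Generalized damped-least-squares blend of both references, eqs. (9)/(12).
    A = W_theta + J.T @ W_x @ J
    b = W_theta @ theta_d_sd + J.T @ W_x @ xd_m
    theta_d = np.linalg.solve(A, b)
    theta = theta + dt * theta_d           # first-order Newton integration
    return theta, theta_d
```

Note how, as $\gamma \to 0$, (12) reduces to $\dot{\theta} = \dot{\theta}^{sd}$ and the attractor dynamics alone drive the arm to $\theta_g$.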
Fig. 2. The setup of the experiments. The top pictures show the first task and the bottom pictures show the second task. Left: a human operator demonstrates a task to the robot by guiding its limbs. Right: the robot performs the task, starting from different initial positions.
IV. EXPERIMENTS

A. Setup

We validate and compare the systems described in this paper on two experiments. The first experiment involves the robot putting an object into a box; the second consists of reaching for and grasping an object. These experiments were chosen because (1) they can be considered simple goal-directed tasks (for which the system is intended), (2) they are tasks commonly performed in human environments, and (3) they present a clear success or failure criterion.

All the experiments presented below are performed with a HOAP-3 humanoid robot from Fujitsu. This robot has four back-drivable degrees of freedom (DOF) in each arm. The robot arms are thus redundant, as we do not consider end-effector orientation. The robot is endowed with a stereo-vision system enabling it to track color blobs. A small color patch is fixed on the box and on the object to be grasped, enabling their 3-D localization. Pictures of the setup are shown in Fig. 2.
ª
0
âRã [mm]
100
200
0
300
200
100
100
Pä 150
æ [mm]
ä æ [mm]
50
äRå [mm]
250
150
50
Fig. 3. The demonstrated trajectories for the box task (left) and the grasping
task (right). Circles indicate starting positions.
1) Preprocessing: During the demonstrations, the robot joint angles were recorded, and the end-effector positions were computed using the arm kinematic function. All recorded trajectories were linearly normalized in time (to $T = 300$ time steps) and Gaussian-filtered to remove noise. The number of Gaussian components for the task models was found using the Bayesian Information Criterion (BIC) [15], and the parameter values used were $\alpha_\gamma = 0.05$, $\alpha = 0.15$ and $\beta = 0.05$.
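This preprocessing stage is easy to reproduce; the sketch below is our own illustrative pipeline, using SciPy and scikit-learn (the filter width sigma is an arbitrary choice, as the paper does not report it):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d
from sklearn.mixture import GaussianMixture

def preprocess(trajectory, T=300, sigma=2.0):
    """Linearly resample a (length, d) trajectory to T steps, then smooth."""
    t_old = np.linspace(0.0, 1.0, len(trajectory))
    t_new = np.linspace(0.0, 1.0, T)
    resampled = np.column_stack([np.interp(t_new, t_old, trajectory[:, j])
                                 for j in range(trajectory.shape[1])])
    return gaussian_filter1d(resampled, sigma=sigma, axis=0)

def select_by_bic(Z, max_components=10):
    """Pick the GMM size minimizing the Bayesian Information Criterion [15]."""
    models = [GaussianMixture(k, covariance_type='full', random_state=0).fit(Z)
              for k in range(1, max_components + 1)]
    return min(models, key=lambda m: m.bic(Z))
```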
B. Putting an Object into a Box

1) Description: For this task, the robot is taught to put an object into the box (see Fig. 2). In order to accomplish the task, the robot has to avoid hitting the box while performing the movement; it must thus first reach up above the box and then down into it. A straight-line reach will in general cause the robot to hit the box and thus fail.

2) Training: A set of 26 kinesthetic demonstrations was performed, with different initial positions and box locations. The box was placed on a small table, so its location varies only in the horizontal plane. Similarly, the initial position of the object (and thus of the end-effector) lay on the table. The set of demonstrated trajectories is depicted in Fig. 3, left. The velocity models trained on these data are shown in Fig. 4, left.
Fig. 4. The velocity models for both tasks. The dots represent the training data, the ellipses the Gaussian components, and the thick lines the trajectory obtained by GMR alone. The thick lines show that, for the first task, the horizontal velocity components are averaged out by the model, while the vertical component shows a marked upward movement. For the second task, all components are almost averaged out.

Fig. 5. In the center, the acceleration model for the second task. The ellipsoids show the Gaussian components at twice their standard deviation. Only three projections (out of nine) are shown. The vertical acceleration strongly depends on the position in the horizontal plane. On the lower right, two trajectories encoded by this model but starting from different positions A and B (indicated by the crosses) are shown. The corresponding vertical velocity profiles appear on the upper right. They differ significantly, as the model is not homogeneous across the horizontal plane.

C. Reach and Grasp
1) Description: In order to accomplish this task, the robot has to reach for and correctly place its hand to grasp a chess piece. In other words, it has to place its hand so that the chess piece stands between its thumb and its remaining fingers, as shown in Fig. 7, left. This figure illustrates that approaching the object can only be done in one of two directions: downward or forward. This task is more difficult than the previous one, as the movement is more constrained. Moreover, higher precision is required on the final position, since the hand is relatively small.

2) Training: A set of 24 demonstrations was performed, starting from different initial positions located on the horizontal plane of the table. The chess piece remained in a fixed location. Depending on the initial position, the chess piece was approached either downward or forward (as illustrated in Fig. 7). The set of demonstrations is represented in Fig. 3, right. The resulting velocity model is shown in Fig. 4, right. One can notice that no velocity feature is common to all demonstrated trajectories. The acceleration model is shown in Fig. 5. This model captures well the fact that the vertical acceleration component depends on the position in the horizontal plane.

Fig. 6. Left: end-effector trajectories of the robot putting the object into the box. The thin line corresponds to the velocity model and the thick line to the acceleration model. Right: online trajectory adaptation to a target displacement using the velocity model. The circles indicate the location of the box, as tracked by the stereo-vision system. The thick line shows the produced trajectory and the thin line the original trajectory that would have been followed had the box remained unmoved. Similar results were obtained with the acceleration model.
D. Results
Endowed with the system described above, the robot is
able to successfully perform both tasks. For the first task,
both the velocity and the acceleration models can produce
adequate trajectories (see Fig. 6, left for examples). The system
can adapt its trajectory online if the box is moved during
movement execution (see Fig. 6, right). For the second task,
examples of resulting trajectories are displayed in Fig. 7, right.
In order to evaluate the generalization abilities of the systems, both tasks were executed from various initial positions, arbitrarily chosen on the horizontal plane of the table and covering the space reachable by the robot. Fig. 8 shows the
results and starting positions for both experiments. For the box
experiment (left), the velocity model was successful for 22 out
of the 24 starting locations (91%). The two unsuccessful trials,
indicated by empty circles, correspond to initial positions close
to the work space boundaries. The acceleration model was
successful for all trials (100%).
For the chess piece experiment (Fig. 8, right), the velocity
model was successful for 5 out of 21 (24%) trials whereas the
acceleration model was successful for 18 trials (86%). This performance gap arises because the task cannot be achieved by a fixed velocity modulation: the adequate modulation depends on the position. This position-dependent modulation
can be captured by the acceleration model, but not by the
velocity model. As illustrated in Fig. 5, the acceleration model
is able to produce different velocity profiles, depending on the
starting position and is thus more versatile than the velocity
model.
Fig. 7. Left: the chess piece to be grasped. For a successful grasp, the robot
has to approach it as indicated by the arrows. Right: resulting trajectories for
the grasping task, starting from two different initial positions. The acceleration
model (thick lines) adapts the modulation to the initial position, while the
velocity model (thin lines) starts upward in both cases. The trajectory produced
by the velocity model and starting left of the target is not successful.
Fig. 8. Robustness to the initial end-effector position for both tasks. The plots represent top views of the first (left) and second (right) experiments. The filled markers (circles or squares) indicate all initial positions for which the velocity model was successful. The circles (filled and non-filled) indicate all initial positions for which the acceleration model was successful. The crosses indicate initial end-effector positions for which both models failed. The dots indicate the starting positions of the training set.
V. DISCUSSION

Our results show that the framework suggested in this paper can enable a robot to learn constrained reaching tasks from kinesthetic demonstrations and generalize them to different initial conditions. The dynamical system approach makes it possible to deal with perturbations occurring during task execution. This framework can be used with various task models and has been tested with two of them, the velocity model and the acceleration model. The results indicate that the velocity model is too simplistic if the task requires different velocity profiles when starting from different positions in the workspace. The acceleration model is more sophisticated and can model more constrained movements, but it may fail to provide an adequate trajectory when brought away from the demonstrations in the phase space $(x, \dot{x})$. Other regression techniques, such as LWR, could also be used; but if there are inconsistencies across demonstrations, simple averaging may fail to provide adequate solutions.

In its present form, the modulation factor between the dynamical system and the task model ($\gamma$) is not learned. Learning it from the demonstrations is likely to further improve the performance of the system, especially for tasks requiring a modulation at the end of the movement. It would also be desirable to have a system that extracts the relevant variables and automatically selects the adequate model. A first step in this direction has been taken in [16], where a balance between different sets of variables is achieved.

Of course, the adequacy of this framework is restricted to relatively simple tasks, such as those described in the experiments. More complicated tasks, such as obstacle avoidance in complex environments or stable grasping of particular objects, require a detailed model of the environment and more elaborate planning techniques. The tasks considered for this framework are those that cannot be accomplished by simple point-to-point reaching, but are still simple enough not to require complete knowledge of the environment. This framework could nevertheless be extended to learn more complicated tasks; as a first step in this direction, [17] investigates in simulation, within a simplified framework, how reinforcement learning can deal with obstacle avoidance.

REFERENCES
[1] A. Billard and R. Siegwart, Eds., Robotics and Autonomous Systems,
Special Issue: Robot Learning From Demonstration. Elsevier, 2004,
vol. 47, no. 2,3.
[2] A. Ijspeert, J. Nakanishi, and S. Schaal, “Movement imitation with
nonlinear dynamical systems in humanoid robots,” in Proceedings of
the IEEE International Conference on Robotics and Automation, 2002,
pp. 1398–1403.
[3] W. Ilg, G. Bakir, J. Mezger, and M. Giese, “On the representation,
learning and transfer of spatio-temporal movement characteristics,”
International Journal of Humanoid Robotics, pp. 613–636, 2004.
[4] A. Shon, J. Storz, and R. Rao, “Towards a real-time Bayesian imitation
system for a humanoid robot,” in Proceedings of the IEEE International
Conference on Robotics and Automation, 2007, pp. 2847–2852.
[5] O. Jenkins, G. González, and M. Loper, “Tracking human motion and
actions for interactive robots,” in Proceedings of the Conference on
Human-Robot Interaction (HRI07), 2007, pp. 365–372.
[6] D. Grimes, D. Rashid, and R. Rao, “Learning nonparametric models for
probabilistic imitation,” in Advances in Neural Information Processing
Systems (NIPS 06), 2006.
[7] K. Ogawara, J. Takamatsu, H. Kimura, and K. Ikeuchi, “Extraction of
essential interactions through multiple observations of human demonstrations,” IEEE Trans. Ind. Electron., vol. 50, pp. 667–675, 2003.
[8] R. Dillmann, “Teaching and learning of robot tasks via observation of
human performance,” Robotics and Autonomous Systems, no. 2,3, pp.
109–116, 2004.
[9] C. Campbell, R. Peters, R. Bodenheimer, W. Bluethmann, E. Huber, and
R. Ambrose, “Superpositioning of behaviors learned through teleoperation,” IEEE Transactions on Robotics, 2006.
[10] R. Burridge, A. Rizzi, and D. Koditschek, “Sequential composition
of dynamically dexterous robot behaviors,” International Journal of
Robotics Research, 1999.
[11] Z. Ghahramani and M. Jordan, “Supervised learning from incomplete data via an EM approach,” in Advances in Neural Information Processing Systems 6, J. Cowan, G. Tesauro, and J. Alspector, Eds. Morgan Kaufmann Publishers, 1994.
[12] A. Billard, S. Calinon, and F. Guenter, “Discriminative and adaptive
imitation in uni-manual and bi-manual tasks,” Robotics and Autonomous
Systems, vol. 54, no. 5, pp. 370–384, 2006.
[13] C. Wampler, “Manipulator inverse kinematic solutions based on vector
formulations and damped least-squares methods,” IEEE Transactions on
Systems, Man and Cybernetics, Part C, vol. 16, no. 1, pp. 93–101, 1986.
[14] Y. Nakamura and H. Hanafusa, “Inverse kinematics solutions with
singularity robustness for robot manipulator control,” ASME Journal of
Dynamic Systems, Measurement, and Control, vol. 108, pp. 163–171,
1986.
[15] G. Schwarz, “Estimating the dimension of a model,” Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
[16] S. Calinon, F. Guenter, and A. Billard, “On learning, representing and
generalizing a task in a humanoid robot,” IEEE Trans. Syst., Man,
Cybern. B, vol. 37, no. 2, pp. 286–298, 2007.
[17] F. Guenter, M. Hersch, S. Calinon, and A. Billard, “Reinforcement
learning for imitating constrained reaching movements,” RSJ Advanced
Robotics, vol. 21, no. 13, pp. 1521–1544, 2007.