Incremental Motion Learning with Gaussian Process
Modulated Dynamical Systems
Klas Kronander, Mohammad Khansari and Aude Billard
Learning Algorithms and Systems Laboratory
École Polytechnique Fédérale de Lausanne, Switzerland
{klas.kronander,mohammad.khansari,aude.billard}@epfl.ch
Abstract—Dynamical Systems (DS) for robot motion modeling
are well-suited for efficient robot learning and control. Our focus
in this extended abstract is on autonomous dynamical systems,
which represent a motion plan completely, without any dependency
on time. We develop a method that makes it possible to locally reshape
an existing, stable autonomous DS without risking the introduction
of additional equilibrium points or unstable behavior. This is
achieved by locally applying rotations and scalings to the original
dynamics. Gaussian Processes are then used to incrementally
learn reshaped dynamical systems. We briefly report preliminary results from applying the proposed methodology to
learning 2d handwriting motions.
I. INTRODUCTION
A set of preprogrammed behaviors is insufficient for a truly
versatile robot. Alternative solutions should hence be sought to
endow robots with the capability to learn tasks, both with the support
of a teacher (Learning from Demonstration) and on their own
(Reinforcement Learning). In both cases, Dynamical Systems
(DS) have emerged as one of the most general and flexible
ways of representing motion plans for robots.
In this work, we explore incremental learning in autonomous dynamical systems. Most currently existing DS
representations are not ideally suited for this purpose, as
they either ensure stability through a phase variable1 [1] or
impose stability constraints which can be difficult to satisfy
in an incremental learning setting [2]. We propose a new DS
representation based on locally applying rotations and scalings
to a dynamical system with known stability properties. This
approach allows representation of very complex trajectories
without risking the introduction of spurious attractor points
or unstable behavior. In order to learn from incremental
demonstrations, we use Gaussian Processes to encode the
variations of the parameter vector determining how the original
dynamics should be rotated and scaled in different parts of
the state space. We will refer to our framework as Gaussian
Process Modulated Dynamical Systems (GP-MDS). Like any
GP-based method, GP-MDS suffers computationally as the
training set grows. To deal with this, we propose a novel
trajectory-based heuristic for managing sparsity of the training
data in GP-MDS.
This extended abstract consists of a description of the GP-MDS architecture, and briefly presents preliminary results
from applying GP-MDS to learning 2d hand-writing motions.
1 The use of an external phase variable for driving the DS forward in practice
means that the system is not autonomous.
II. APPROACH
Let x ∈ R^N represent an N-dimensional kinematic variable,
e.g. a Cartesian position vector. Let a continuous function f :
R^N → R^N represent a dynamical system:

ẋ = f(x)    (1)
In the remainder of this document, it will be assumed that f
has a single attractor, which without loss of generality can be
placed at the origin. We will refer to Eq. (1) as the original
dynamics.
A. Locally Modulated Dynamical Systems
The goal of this work is to locally modify the original
dynamics. This is achieved by introducing local modulation,
resulting in a system termed reshaped dynamics:
ẋ = g(x) = M(x)f(x)    (2)

where M(x) ∈ R^{N×N} is a continuous matrix-valued function
that modulates the original dynamics f(x) by rotation and
speed scaling:

M(x) = (1 + κ(x))R(x)    (3)

Here, κ(x) is a continuous state-dependent scalar function
strictly larger than −1 and R(x) ∈ R^{N×N} is a state-dependent
rotation matrix. Note that since M(x) has full rank, the
reshaped dynamics will have the same equilibrium points as
the original dynamics. Moreover, if M is chosen such that it
is locally active², then the reshaped dynamics are bounded.
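As an illustration, the construction of Eqs. (2)–(3) can be sketched in 2d as follows. The particular original dynamics, weight function, and parameter values below are assumptions chosen only so that M(x) is continuous and locally active; they are not from the paper.

```python
import numpy as np

def original_dynamics(x):
    # Stable linear system with a single attractor at the origin.
    return -x

def rotation_2d(phi):
    # 2d rotation matrix R(phi).
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s], [s, c]])

def modulation(x, center=np.array([1.0, 1.0]), radius=0.5,
               phi_max=np.pi / 4, kappa_max=0.5):
    # Locally active modulation M(x) = (1 + kappa(x)) R(phi(x)):
    # outside a ball of the given radius around `center`, the weight w
    # is zero, so kappa = 0, phi = 0 and M(x) = I (dynamics untouched).
    d = np.linalg.norm(x - center)
    w = max(0.0, 1.0 - d / radius)   # continuous, compactly supported
    return (1.0 + kappa_max * w) * rotation_2d(phi_max * w)

def reshaped_dynamics(x):
    # Eq. (2): x_dot = M(x) f(x).
    return modulation(x) @ original_dynamics(x)
```

Since M(x) reduces to the identity outside the modulated ball, trajectories far from it follow the original dynamics exactly, while inside the ball they are rotated and sped up without any new equilibria being introduced.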
In order to learn reshaped dynamics, we parameterize the
modulation function M(x) = M(θ(x)) with θ(x) ∈ R^P
being a vector of P = 2 parameters (rotation angle and speed
scaling) in the 2d case, and P = 4 parameters in the 3d case (rotation represented as axis-angle
in 3 parameters and one parameter for the speed scaling).
For a trajectory data set {x_m, ẋ_m}_{m=1}^M, a corresponding set
{x_m, θ_m}_{m=1}^M can easily be computed by comparing f(x_m)
and ẋ_m for each m = 1 . . . M. It is possible to similarly
parameterize rotations in higher dimension, e.g. for reshaping
dynamics expressed in joint space.
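In 2d, recovering θ_m = (φ_m, κ_m) from a demonstration pair amounts to comparing the direction and magnitude of ẋ_m with those of f(x_m). The helper below is an illustrative sketch, not the authors' implementation:

```python
import numpy as np

def extract_parameters(x_m, xdot_m, f):
    # Recover (phi_m, kappa_m) such that, in 2d,
    # xdot_m = (1 + kappa_m) R(phi_m) f(x_m).
    v = f(x_m)
    # Rotation angle from f(x_m) to the demonstrated velocity.
    phi = np.arctan2(xdot_m[1], xdot_m[0]) - np.arctan2(v[1], v[0])
    # Wrap to [-pi, pi) so the minimal rotation is used.
    phi = (phi + np.pi) % (2 * np.pi) - np.pi
    # Speed scaling: ratio of speeds minus one (kappa > -1 whenever
    # the demonstrated velocity is nonzero).
    kappa = np.linalg.norm(xdot_m) / np.linalg.norm(v) - 1.0
    return phi, kappa
```

For example, with f(x) = −x, a demonstrated velocity twice as fast as f(x_m) and rotated a quarter turn yields κ_m = 1 and φ_m = −π/2.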
Note that as the norm of θ goes to zero, the modulation
function Eq. (3) goes to the identity matrix. Hence, if a local
regression technique is applied to θ, the reshaped dynamics
2 M(x) is said to be locally active if there exists some closed subset χ of
R^N such that M(x) = I_N for all x ∈ R^N \ χ.
The behavior of GPR is determined by the choice of observation noise variance σ_n² and covariance function k(·, ·). In
this work, we use the squared exponential covariance function,
defined by:

k(x, x') = σ_f² exp(−(x − x')ᵀ(x − x') / (2l))
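This covariance can be sketched directly from its definition. The hyper-parameter values below are illustrative assumptions, not the ones used in the paper; a small lengthscale keeps the modulation localized.

```python
import numpy as np

def se_kernel(x, xp, sigma_f=1.0, l=0.05):
    # Squared exponential covariance, as in the text:
    # k(x, x') = sigma_f^2 * exp(-(x - x')^T (x - x') / (2 l)).
    d2 = np.dot(x - xp, x - xp)
    return sigma_f**2 * np.exp(-d2 / (2.0 * l))
```

The covariance equals σ_f² when x = x' and decays rapidly with distance, which is what drives the predicted θ, and hence the modulation, back to zero away from the training data.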
Fig. 1. Left: Example of reshaped dynamics using GP-MDS in a 3d system.
The colored streamtapes represent example trajectories of the reshaped dynamics. The streamtapes colored in black represent trajectories that do not pass
through the reshaped region of the state space, and hence retain the straight-line characteristics of the linear system that is used as original dynamics here.
The green streamtube shows artificially generated data representing an expanding
spiral. Points in magenta represent the subset of this data that was selected as
training set. The gray surface illustrates the region in which the dynamics are
significantly altered (corresponding to a level set of the predictive variance
of the GP). Right: Same as left, but zoomed in and with the influence surface
sliced to improve visibility of the training points and the trajectories.
Fig. 2. Examples of GP-MDS in 2d systems. (a): Streamlines represent
the direction of motion at each point. As seen, all trajectories converge to the
single attractor of the linear original dynamics. Green circles indicate collected
data points and magenta-colored circles indicate points selected for use in the
GP training set. The copper-colored colormap illustrates the reshaped region,
i.e. the region where the GP outputs a parameter vector different from zero.
(b): Similar to (a) but with different collected data. (c): A 2d GP-MDS
example with nonlinear original dynamics. (d): A 2d GP-MDS example with
nonlinear original dynamics and several reshaped parts of the state space.
are guaranteed to be bounded. In the next Section, we apply
Gaussian Process Regression for encoding θ(x) using a data
set {x_m, θ_m}_{m=1}^M.
B. Learning Modulation with Gaussian Processes
In this section, we present how the reshaped dynamics
can be learned by encoding the parameter vector θ using
Gaussian Process Regression. Due to space restrictions, we
omit a review of GPR here and refer to [3]. The predictive
mean of the p-th entry of the parameter vector at a test point
x* is:

θ̂_p(x*) = K_{x*X} [K_XX + σ_n² I]^{−1} Θ_p    (4)

where Θ_p = [θ_1^p, . . . , θ_M^p]ᵀ and:

K_{Xx*} = [k(x_1, x*), . . . , k(x_M, x*)],    K_{x*X} = K_{Xx*}ᵀ

The element at row i, column j of the M × M matrix K_XX
is given by [K_XX]_ij = k(x_i, x_j).
Here l, σ_f > 0 are the scalar hyper-parameters of the covariance function. In this work,
these parameters and the observation noise variance were set
to predetermined values. Alternatively, they could be optimized
to maximize the likelihood of the training data [3].
If an identical GP prior is used for each dimension 1 . . . P
of θ, computing the multidimensional θ comes at little additional
cost compared to predicting a single output, since the input-dependent vector a(x*) = K_{x*X}[K_XX + σ_n² I]^{−1} can be
precomputed, and the prediction of each dimension is then simply
a dot product θ̂_p(x*) = a(x*)ᵀ Θ_p.
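This shared-weight prediction can be sketched as follows. The kernel hyper-parameter values and the direct linear solve (rather than a cached Cholesky factorization, which would be preferred when many test points are queried) are illustrative assumptions:

```python
import numpy as np

def se_kernel_matrix(A, B, sigma_f=1.0, l=0.05):
    # Pairwise squared exponential covariances between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2.0 * l))

def gp_predict_all(X, Theta, x_star, sigma_n=0.1):
    # X: (M, N) training inputs; Theta: (M, P) training parameter vectors.
    # The weight vector a(x*) = K_{x*X} [K_XX + sigma_n^2 I]^{-1} is
    # computed once; each output dimension then costs one dot product
    # theta_hat_p(x*) = a(x*)^T Theta_p, as in Eq. (4).
    M = X.shape[0]
    K = se_kernel_matrix(X, X) + sigma_n**2 * np.eye(M)
    k_star = se_kernel_matrix(x_star[None, :], X)[0]
    a = np.linalg.solve(K, k_star)   # shared weights (K is symmetric)
    return Theta.T @ a               # (P,) predicted parameter vector
```

Far from all training inputs the kernel vector vanishes, so the predicted θ goes to zero and, by the argument above, the modulation reverts to the identity.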
With fixed hyper-parameters, incremental learning can be
achieved simply by incrementally expanding the training set
with new data. However, since GPR involves the inversion of
a matrix that grows with the number of training data, it is
important to represent the incoming data sparsely. This can be
done by defining a selection criterion that determines whether new
data should be included in the training set. While most
previously proposed criteria are related to the predictive variance
or information-theoretic measures, which depend on the input
patterns of the data [4], we propose a custom selection criterion
for GP-MDS which depends on the outputs and hence relates
directly to the resulting trajectory. To determine whether a
new pair x_{M+1}, θ_{M+1} should be added to the training set, we
compare it with the predicted parameter vector computed from the current
training set, θ̂(x_{M+1}). The speed scalings κ_{M+1}, κ̂(x_{M+1})
and the rotation angles φ_{M+1}, φ̂(x_{M+1}) are extracted from
θ_{M+1} and θ̂(x_{M+1}). Then, let J¹_{M+1} and J²_{M+1} denote two
positive scalar functions, defined as:

J¹_{M+1} = |κ_{M+1} − κ̂(x_{M+1})| / (1 + κ_{M+1})    (5a)

J²_{M+1} = min_{k∈Z} |φ_{M+1} − φ̂(x_{M+1}) + 2kπ|    (5b)
The first function, J¹_{M+1}, is a relative measure of the speed
error, and the second function, J²_{M+1}, is an absolute measure
of the error in rotation angle. The decision whether to add
a point is then made by comparing the values of these
functions with predefined thresholds J̄¹, J̄². If the value of
either function exceeds its threshold, the data point is added
to the training set, otherwise it is not. The thresholds should
be set so as to achieve a good trade-off between sparsity
and accurate trajectory representation. In this work, we used
J̄¹ = 0.3, corresponding to a speed error smaller than 30%
being considered tolerable. For the rotation angle, a threshold
of J̄² = 10° was used.
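A minimal sketch of this selection test, using the thresholds reported above (J̄¹ = 0.3, J̄² = 10°); the angle wrapping implements the minimization over k in Eq. (5b):

```python
import numpy as np

def should_add(kappa_new, phi_new, kappa_pred, phi_pred,
               J1_thresh=0.3, J2_thresh=np.deg2rad(10)):
    # Trajectory-based sparsity test: keep a new point only if the
    # current GP mispredicts its speed scaling or rotation angle.
    # Relative speed error, Eq. (5a).
    J1 = abs(kappa_new - kappa_pred) / (1.0 + kappa_new)
    # Absolute angle error wrapped to [0, pi], Eq. (5b).
    dphi = phi_new - phi_pred
    J2 = abs((dphi + np.pi) % (2 * np.pi) - np.pi)
    return J1 > J1_thresh or J2 > J2_thresh
```

A point whose parameters are already well predicted by the current training set passes neither test and is discarded, which is what yields the sparse magenta subsets visible in Figs. 1 and 2.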
Fig. 1 shows an example of reshaping a linear 3d system
with GP-MDS to locally incorporate an expanding spiral
pattern. Fig. 2 shows a set of examples of reshaped dynamics
in 2d.
III. LEARNING HANDWRITING MOTIONS
To illustrate an application of the proposed approach in
a Learning from Demonstration setting, we use GP-MDS to
learn a set of handwriting motions from the LASA handwriting
data set [2]. The form of the original dynamics is not constrained: any first-order system can be used as f in Eq. (2).
For example, an SEDS model (Gaussian Mixture Regression
with stability constraints) can be used.
The first column of Fig. 3 shows training data for three
letters from the LASA handwriting set, along with streamlines
from SEDS models trained on this data. Note that these
models already do a good job at producing smooth generalized
dynamics from the data. The middle column of Fig. 3 shows
GP-MDS being applied for refining the SEDS dynamics. For
letter N, starting trajectories to the left of the demonstrated starting
location are problematic, as illustrated by the black example
trajectory in Fig. 3a. In Fig. 3b, this is remedied with a very
simple corrective demonstration. For letters W and Z, one
additional demonstration (different from the demonstrations
used for the SEDS models) was given. The goal here is to
sharpen the corners, which are overly smooth both in the original demonstrations and the resulting SEDS model (Figures 3d
and 3g). In order to favor detail over generalization, a fine
lengthscale was selected, resulting in the sharpened letters in
Figures 3e and 3h.
The right column of Fig. 3 shows streamlines from GP-MDS
applied to a linear system in place of an SEDS model. In these
cases, the original training data (the same that was used for
training the SEDS models) was used for GP-MDS. A medium
lengthscale was chosen to trade off generalization and
detail. This exemplifies that GP-MDS can be used even
without any task knowledge in the original dynamics, although
a good original model can provide advantages such as better
generalization, and allows GP-MDS to focus on refining details
rather than shaping the general behavior.
Note the sparse selection of training data in Fig. 3, middle
column. In areas of the state space where the original dynamics
have the same direction as the corrective demonstration, it is
not necessary to add training data3 . The sparse data selection
is also clearly visible near the end of the letters in the right
column of Fig. 3, since the demonstrations there are roughly
aligned with the trajectories of the linear system which is used
as original dynamics in these cases.
IV. CONCLUSION
This extended abstract presented a novel representation for
autonomous dynamical systems, based on locally rotating and
scaling existing dynamics. An advantage of this representation
is that by construction it is impossible to introduce spurious attractors or unstable behavior. Gaussian Processes were
employed for learning reshaped dynamics, resulting in the
GP-MDS framework, which uses a heuristic sparsity criterion
3 In these experiments, J̄¹ = 0.3 was set to a very tolerant value, allowing
speed errors of up to 30%, since speed was not considered important for this
task. In practice, the selection criterion is hence based on the angle error only.
Fig. 3. Left column: Demonstrated trajectories (red dots) and resulting
SEDS models for the letters N, Z and W. Example trajectories are highlighted
in black. Middle column: GP-MDS is used to improve various aspects
of the SEDS models. The copper colormap illustrates the reshaped region,
highlighting that GP-MDS locally modifies the dynamics. Right column: The
original training data is provided to GP-MDS, with a simple linear system
replacing SEDS as original dynamics.
that is tailored for the DS application. Preliminary results
from applying GP-MDS for refining handwriting motions were
presented.
We plan to improve the data selection procedure by incorporating pruning of old training points, so that the dynamics can
be continually reshaped over time with a maximum allowed
number of points in the GP training set. The
current sparsity criterion is inherently sensitive to outliers, a
point which is usually not crucial when data comes from
a continuous process such as trajectory demonstrations. The
algorithm would however benefit from a method of discarding
outliers, and this will be explored in future work.
ACKNOWLEDGMENT
This research was supported by the Swiss National Science
Foundation through the National Center of Competence in
Research Robotics.
REFERENCES
[1] A. Ijspeert, J. Nakanishi, and S. Schaal, “Movement imitation with
nonlinear dynamical systems in humanoid robots,” IEEE Intl. Conf. on
Robotics and Automation, pp. 1398–1403, 2002.
[2] S. Khansari-Zadeh and A. Billard, “Learning stable non-linear dynamical
systems with Gaussian Mixture Models,” IEEE Transactions on Robotics,
vol. 27, pp. 1–15, 2011.
[3] C. Rasmussen and C. Williams, Gaussian processes for machine learning.
MIT Press, 2006.
[4] J. Quiñonero Candela and C. Rasmussen, “A unifying view of sparse
approximate Gaussian process regression,” The Journal of Machine
Learning Research, vol. 6, pp. 1939–1959, 2005.