
Reconstruction of chaotic dynamics
by on-line EM algorithm
Shin Ishii†
Masa-aki Sato‡
† Nara Institute of Science and Technology
8916-5 Takayama-cho, Ikoma-shi, Nara 630-0101, Japan
(TEL) (+81)-743-72-5980
(FAX) (+81)-743-72-5989
(E-mail) [email protected]
‡ Advanced Telecommunication Research Institute International
2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
(TEL) (+81)-774-95-1039
(FAX) (+81)-774-95-1259
(E-mail) [email protected]
to appear in Neural Networks
Abstract
In this paper, we discuss the reconstruction of chaotic dynamics by using a normalized Gaussian network (NGnet), which is a network of local linear regression units. The NGnet is trained by an on-line EM algorithm in order to learn the vector field of the chaotic dynamics.
We investigate the robustness of our approach to two kinds of noise processes: system noise and observation noise. System noise disturbs the system dynamics itself, while observation noise is added when the state variables are observed. It is shown that the trained NGnet is able to reproduce a chaotic attractor that approximates the complexity and instability of the original chaotic attractor well, even under various noise conditions. The trained NGnet also shows good prediction performance.
When only some of the dynamical variables are observed, the delay embedding method is used. The NGnet is then trained to learn the vector field in the delay coordinate space. Experiments show that the chaotic dynamics embedded in the delay coordinate space can be learned under the two types of noise.
1 Introduction
One of the main topics in the field of nonlinear dynamical systems is estimating dynamical characteristics and future behaviors of an unknown dynamical system from its observation. Especially when the system is chaotic, such inverse problems are difficult. Long-term characteristics of dynamical systems can be estimated by the invariant measure [4, 10], Lyapunov exponents [11], the correlation dimension [12], unstable periodic orbits [3], and so on. These estimations have been done by mimicking the dynamical system using some approximation method. Such methods include neural networks [17, 4], Markov chain approximation based on triangulations [10, 11], and piecewise-linear approximators based on triangulations [2, 3]. On the other hand, the estimation of future behaviors is called short-term prediction [8, 28, 5, 13, 1], because prediction over a long future period is impossible due to the sensitive dependence of chaotic dynamics on the initial condition [14]. This paper deals with both the long-term dynamical characteristics and the short-term prediction, by using a neural network model and a stochastic learning algorithm.
We [21, 22] formerly showed that a recurrent neural network (RNN) could learn continuous-time chaotic dynamics from observed data. In these studies, we emphasized that long-time trajectory generation has its own validity. That is, once the learning system acquires the dynamical characteristics of the target chaotic dynamics from the observed data, a long-time trajectory can be generated that goes around the chaotic attractor. Although the time sequence of this trajectory may be different from the observed data, the dynamical characteristics of the synthesized trajectory can well approximate those of the observed data. As an application, we showed the possibility of natural sound synthesis by an RNN trained on observed natural sound waveforms. The RNN was able to synthesize a waveform with naturally correlated fluctuations, which made the sound natural [20].
In this paper, a normalized Gaussian network (NGnet) [18] is employed in order to approximate the discretized vector field of a chaotic dynamical system. We will show that this new approach is able to learn chaotic dynamics faster and better than an RNN trained by the "back-propagation through time" algorithm [19]. The NGnet is a network of local linear regression units. The model softly partitions the input space by using normalized Gaussian functions, and each local unit linearly approximates the output within its partition.
The NGnet is trained by the on-line EM algorithm, which we previously proposed [23]. It has been shown that this on-line EM algorithm is faster than the gradient descent algorithm. Moreover, the on-line EM algorithm is faster and more accurate than the batch EM algorithm [31], especially for a large amount of data [23, 25]. Therefore, it is advantageous to use the on-line EM algorithm for learning chaotic dynamics, because a large amount of data is necessary for approximating the vector field accurately. In the on-line EM algorithm, the positions of the local units can be adjusted according to the input and output data distribution. Unit creation and unit deletion are also performed according to the data distribution. Since a chaotic attractor is usually distributed on a lower-dimensional manifold than the input space, the NGnet is suitable for approximating the vector field in the neighborhood of the attractor.
For the short-term prediction, a local polynomial approximation is often used [8]. For a given data point, this method first searches for a number of nearest neighbor points in the observed data. The target function in the local region is then approximated by using a polynomial function whose parameters are determined using the nearest neighbor points. The estimated
polynomial function has different parameters for each observed data point, and the number of parameters in this method is proportional to the amount of observed data. A large amount of data requires much memory and computation time in the prediction process. The NGnet, on the other hand, requires only a moderate number of parameters even for a large amount of data, because each unit softly partitions the input space according to the data distribution. Therefore, the prediction can be performed quickly in our method. In addition, the learning and the prediction can be done concurrently under real-time operation by using the on-line EM algorithm.
In real world problems, observed variables often include noise. In our approach, the discretized vector field is estimated from discretized time derivatives of the observed data. It is well known that the estimation of time derivatives from discretized data is very sensitive to noise, since the derivative operation emphasizes noise components. Hence, we investigate the robustness of our approach to two kinds of noise processes: system noise and observation noise. System noise disturbs the system dynamics itself. Observation noise, in contrast, is added when the state variables are observed.
In order to measure the performance of our method, the dynamical characteristics of chaotic attractors generated by the NGnet trained under various noise conditions are compared with those of the original chaotic attractor. The results show that our method is able to mimic the chaotic dynamics even under these noise conditions. The trained NGnet also shows good performance for the short-term prediction.
In many situations, only some of the dynamical variables can be observed. This situation was also investigated in our former study [22], where the RNN was trained to learn the chaotic dynamics with hidden variables. Although the trajectory of the observed variables was reproduced fairly well, it remained unclear whether the dynamics of the trained RNN was equivalent to the original dynamics, because of coordinate transformation ambiguity [22]. In order to avoid such problems, we employ a delay coordinate embedding [29, 26] in this study. The NGnet is trained to learn the discretized vector field in the delay coordinate space. It is shown that the chaotic dynamics can be learned with this method in the delay coordinate space under noise conditions.
An alternative learning method, in which the NGnet is trained to learn the mapping function in discretized time, is also investigated in this study. The performances of this method are then compared with those of the discretized vector field method.
2 NGnet and on-line EM algorithm
In this section, we review the on-line EM algorithm for the NGnet, which was proposed in our
previous paper [23].
The NGnet [18], which transforms an N-dimensional input vector x to a D-dimensional output vector y, is defined by
y = \sum_{i=1}^{M} \frac{G_i(x)}{\sum_{j=1}^{M} G_j(x)} \, \tilde{W}_i \tilde{x},    (1a)

G_i(x) \equiv (2\pi)^{-N/2} |\Sigma_i|^{-1/2} \exp\left[ -\frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) \right],    (1b)
where M denotes the number of units and the prime (') denotes a transpose. G_i(x) is an N-dimensional Gaussian function, which has an N-dimensional center vector μ_i and an N-by-N covariance matrix Σ_i. |Σ_i| is the determinant of Σ_i. W̃_i ≡ (W_i, b_i) is a D-by-(N+1) linear regression matrix, where x̃' ≡ (x', 1) and W̃_i x̃ ≡ W_i x + b_i.
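For readers who want a concrete picture of equation (1), the following is a minimal NumPy sketch of the NGnet forward computation. It is not the authors' implementation; the function name, array layout, and the toy parameter values are our own illustrative choices.

```python
import numpy as np

def ngnet_forward(x, mu, Sigma, W_tilde):
    """Compute y = sum_i [G_i(x) / sum_j G_j(x)] (W_i x + b_i), eq. (1).

    x       : (N,) input vector
    mu      : (M, N) unit centers
    Sigma   : (M, N, N) unit covariance matrices
    W_tilde : (M, D, N+1) linear regression matrices; the last column is the bias b_i
    """
    M, N = mu.shape
    x_tilde = np.append(x, 1.0)                      # x~ = (x', 1)'
    log_G = np.empty(M)
    for i in range(M):
        diff = x - mu[i]
        inv_S = np.linalg.inv(Sigma[i])
        log_G[i] = (-0.5 * N * np.log(2 * np.pi)
                    - 0.5 * np.linalg.slogdet(Sigma[i])[1]
                    - 0.5 * diff @ inv_S @ diff)
    w = np.exp(log_G - log_G.max())
    w /= w.sum()                                     # normalized Gaussian weights
    # output: weighted sum of the local linear predictions
    return sum(w[i] * (W_tilde[i] @ x_tilde) for i in range(M))

# toy usage with random (hypothetical) parameters
rng = np.random.default_rng(0)
M, N, D = 5, 3, 3
mu = rng.normal(size=(M, N))
Sigma = np.stack([np.eye(N) for _ in range(M)])
W_tilde = rng.normal(size=(M, D, N + 1))
print(ngnet_forward(rng.normal(size=N), mu, Sigma, W_tilde))
```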
The NGnet can be interpreted as a stochastic model, in which a pair of an input and an output, (x, y), is a stochastic event. For each event, a unit index i ∈ {1, ..., M} is assumed to be selected, which is regarded as a hidden variable. The stochastic model is defined by the probability distribution for a triplet (x, y, i), which is called a complete event:

P(x, y, i | \theta) = (2\pi)^{-(D+N)/2} |\Sigma_i|^{-1/2} \sigma_i^{-D} M^{-1} \exp\left[ -\frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2\sigma_i^2} |y - \tilde{W}_i \tilde{x}|^2 \right].    (2)

θ ≡ {μ_i, Σ_i, σ_i², W̃_i | i = 1, ..., M} is the set of model parameters. A pair (x, y) is called an incomplete event. Since the expectation value of output y for a given input x, i.e., E[y | x] ≡ ∫ y P(y | x, θ) dy, is identical to equation (1), the probability distribution (2) provides a stochastic model for the NGnet.
From a set of T incomplete events (data) (X, Y) ≡ {(x(t), y(t)) | t = 1, ..., T}, the model parameter θ of the stochastic model (2) can be determined by the maximum likelihood estimation method. In particular, the EM algorithm [6] can be applied to a stochastic model with hidden variables. The EM algorithm repeats the following E- and M-steps.
E (Expectation) step: Let θ̄ be the present estimator. By using θ̄, the posterior probability that the i-th unit is selected for each datum (x(t), y(t)) is calculated by

P(i | x(t), y(t), \bar{\theta}) = \frac{P(x(t), y(t), i | \bar{\theta})}{\sum_{j=1}^{M} P(x(t), y(t), j | \bar{\theta})}.    (3)

Using this posterior probability, the expected log-likelihood Q(θ | θ̄, X, Y) for the complete events is defined by

Q(\theta | \bar{\theta}, X, Y) = \sum_{t=1}^{T} \sum_{i=1}^{M} P(i | x(t), y(t), \bar{\theta}) \log P(x(t), y(t), i | \theta).    (4)

M (Maximization) step: Since an increase of the expected log-likelihood (4) implies an increase of the log-likelihood for the data set (X, Y) [6], the expected log-likelihood is maximized with respect to θ. A solution of the necessary condition ∂Q/∂θ = 0 is given [31] by
\mu_i = \langle x \rangle_i(T) / \langle 1 \rangle_i(T)    (5a)

\Sigma_i^{-1} = \left[ \langle x x' \rangle_i(T) / \langle 1 \rangle_i(T) - \mu_i(T) \mu_i'(T) \right]^{-1}    (5b)

\tilde{W}_i = \langle y \tilde{x}' \rangle_i(T) \left[ \langle \tilde{x} \tilde{x}' \rangle_i(T) \right]^{-1}    (5c)

\sigma_i^2 = \frac{1}{D} \left[ \langle |y|^2 \rangle_i(T) - \mathrm{Tr}\left( \tilde{W}_i \langle \tilde{x} y' \rangle_i(T) \right) \right] / \langle 1 \rangle_i(T),    (5d)
where Tr(·) is the matrix trace. ⟨·⟩_i denotes a weighted mean with respect to the posterior probability (3), and it is defined by

\langle f(x, y) \rangle_i(T) \equiv \frac{1}{T} \sum_{t=1}^{T} f(x(t), y(t)) P(i | x(t), y(t), \bar{\theta}).    (6)
The EM algorithm introduced above is based on batch learning [31], namely, the parameters are updated after seeing all of the data. We introduce here an on-line version [23] of the EM algorithm. Let θ(t) be the estimator after the t-th datum (x(t), y(t)). In this on-line EM algorithm, the weighted mean (6) is replaced by

\langle\langle f(x, y) \rangle\rangle_i(T) \equiv \eta(T) \sum_{t=1}^{T} \left( \prod_{s=t+1}^{T} \lambda(s) \right) f(x(t), y(t)) \, P(i | x(t), y(t), \theta(t-1))    (7a)

\eta(T) \equiv \left( \sum_{t=1}^{T} \prod_{s=t+1}^{T} \lambda(s) \right)^{-1},    (7b)

where \prod_{s=T+1}^{T} \lambda(s) \equiv 1 is assumed. Parameter λ(t) ∈ [0, 1] is a discount factor, which is introduced in order to forget the effect of an earlier inaccurate estimator. η is a normalization coefficient, and it works as an effective learning coefficient. For the present datum (x(t), y(t)), the E-step calculates the posterior probability P_i(t) ≡ P(i | x(t), y(t), θ(t−1)) by using equation (3). The M-step calculates the weighted means ⟨⟨1⟩⟩_i(t), ⟨⟨x⟩⟩_i(t), ⟨⟨|y|²⟩⟩_i(t), and ⟨⟨y x̃'⟩⟩_i(t), by the following step-wise equations:

\langle\langle f(x, y) \rangle\rangle_i(t) = \langle\langle f(x, y) \rangle\rangle_i(t-1) + \eta(t) \left[ f(x(t), y(t)) P_i(t) - \langle\langle f(x, y) \rangle\rangle_i(t-1) \right]    (8a)

\eta(t) = \left( 1 + \lambda(t) / \eta(t-1) \right)^{-1}.    (8b)
Using the weighted means, the new parameters are obtained by

\mu_i(t) = \langle\langle x \rangle\rangle_i(t) / \langle\langle 1 \rangle\rangle_i(t)    (9a)

\tilde{\Lambda}_i(t) = \frac{1}{1 - \eta(t)} \left[ \tilde{\Lambda}_i(t-1) - \frac{P_i(t) \, \tilde{\Lambda}_i(t-1) \tilde{x}(t) \tilde{x}'(t) \tilde{\Lambda}_i(t-1)}{1/\eta(t) - 1 + P_i(t) \, \tilde{x}'(t) \tilde{\Lambda}_i(t-1) \tilde{x}(t)} \right]    (9b)

\tilde{W}_i(t) = \tilde{W}_i(t-1) + \eta(t) P_i(t) \left( y(t) - \tilde{W}_i(t-1) \tilde{x}(t) \right) \tilde{x}'(t) \tilde{\Lambda}_i(t)    (9c)

\sigma_i^2(t) = \frac{1}{D} \left[ \langle\langle |y|^2 \rangle\rangle_i(t) - \mathrm{Tr}\left( \tilde{W}_i(t) \langle\langle \tilde{x} y' \rangle\rangle_i(t) \right) \right] / \langle\langle 1 \rangle\rangle_i(t),    (9d)

where Λ̃_i(t) ≡ [⟨⟨x̃ x̃'⟩⟩_i]^{-1}. Σ_i^{-1}(t) can be obtained from the following relation with Λ̃_i(t):

\tilde{\Lambda}_i(t) \langle\langle 1 \rangle\rangle_i(t) = \begin{pmatrix} \Sigma_i^{-1}(t) & -\Sigma_i^{-1}(t) \mu_i(t) \\ -\mu_i'(t) \Sigma_i^{-1}(t) & 1 + \mu_i'(t) \Sigma_i^{-1}(t) \mu_i(t) \end{pmatrix}.    (10)

Due to the introduction of the auxiliary variable Λ̃_i(t), it is not necessary to calculate the inverse covariance matrix, which is needed in equations (5b) and (5c) in the batch EM algorithm.
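As an informal illustration of the step-wise M-step, the sketch below applies (8a) and (9a)-(9d) to a single unit for one datum. The dictionary-of-statistics representation and field names are our own; the posterior P_i and the coefficient eta are assumed to come from equations (3) and (8b).

```python
import numpy as np

def online_em_update(unit, x, y, P_i, eta):
    """One on-line EM step, eqs. (8a) and (9a)-(9d), for a single NGnet unit.

    unit : dict holding the running statistics <<1>>, <<x>>, <<|y|^2>>, <<x~ y'>>
           and the parameters mu, Lambda (= [<<x~ x~'>>]^{-1}), W (D x (N+1)), var
    P_i  : posterior P(i | x, y, theta(t-1)) from the E-step (3)
    eta  : effective learning coefficient eta(t) from (8b)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_t = np.append(x, 1.0)                  # x~ = (x', 1)'
    D = len(y)

    # (8a) step-wise weighted means
    unit['one'] += eta * (P_i - unit['one'])
    unit['x']   += eta * (P_i * x - unit['x'])
    unit['y2']  += eta * (P_i * (y @ y) - unit['y2'])
    unit['xy']  += eta * (P_i * np.outer(x_t, y) - unit['xy'])

    # (9a) unit center
    unit['mu'] = unit['x'] / unit['one']

    # (9b) auxiliary matrix Lambda_i(t), updated without any matrix inversion
    Lam = unit['Lambda']
    Lx = Lam @ x_t
    denom = 1.0 / eta - 1.0 + P_i * (x_t @ Lx)
    unit['Lambda'] = (Lam - P_i * np.outer(Lx, Lx) / denom) / (1.0 - eta)

    # (9c) linear regression matrix W~_i(t)
    err = y - unit['W'] @ x_t
    unit['W'] = unit['W'] + eta * P_i * np.outer(err, unit['Lambda'] @ x_t)

    # (9d) output variance sigma_i^2(t)
    unit['var'] = (unit['y2'] - np.trace(unit['W'] @ unit['xy'])) / (D * unit['one'])
    return unit
```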
It can be proved [23] that this on-line EM algorithm is equivalent to the stochastic approximation method for finding the maximum likelihood estimator, if the effective learning coefficient η(t) satisfies the following conditions:

\sum_{t=1}^{\infty} \eta(t) = \infty, \qquad \sum_{t=1}^{\infty} \eta^2(t) < \infty.    (11)
For example, when the discount factor λ(t) is scheduled to approach 1, i.e., λ(t) → 1 as t → ∞, condition (11) is satisfied and the on-line EM algorithm locally converges to a maximum likelihood estimator. An example of such scheduling is given by

\lambda(t) = 1 - \frac{1 - a \lambda(t-1)}{t},    (12)
where 0 ≤ a ≤ 1. In this paper, we use scheduling (12) with a = 0.9993.
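A small sketch of how λ(t) and η(t) evolve under (12) and (8b) is given below; the initial values λ(0) and η(0) are not stated in the text and are assumptions here.

```python
def discount_schedule(T, a=0.9993, lam=0.0, eta=1.0):
    """Generate (lambda(t), eta(t)) for t = 1..T using eqs. (12) and (8b)."""
    values = []
    for t in range(1, T + 1):
        lam = 1.0 - (1.0 - a * lam) / t      # eq. (12): lambda(t) approaches 1
        eta = 1.0 / (1.0 + lam / eta)        # eq. (8b): effective learning coefficient
        values.append((lam, eta))
    return values
```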
We also employ dynamic unit manipulation mechanisms in order to efficiently allocate the units [23]. The probability P(x(t), y(t) | θ(t−1)) indicates how likely the present model, with the present parameter θ(t−1), is to produce the datum (x(t), y(t)). If this probability is less than a certain threshold value, a new unit is produced to account for the datum (x(t), y(t)). In the production, μ_new = x(t) and W̃_new = (0, y(t)). The covariance matrix Σ_new is determined based on the distance from the neighboring units. On the other hand, the weighted mean ⟨⟨1⟩⟩_i(t) indicates how much the i-th unit has been used to account for the data until time t. If this mean becomes less than a certain threshold value, the unit is deleted. The unit manipulation mechanisms above are based on a heuristic model selection framework¹. Although the threshold value for the unit production is determined by hand, the value does not significantly depend on the function to be approximated. The threshold value for the unit deletion is set at a small value in comparison to the reciprocal of the unit number².
¹ Bayesian methods will provide a more theoretical model selection framework [24].
² We set the threshold at 10^-4 in the experiments in this paper. The typical unit number is 100-1000.
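The following sketch illustrates the production/deletion heuristic described above. The list-of-dictionaries data structure follows the on-line EM sketch in this section, but the way Σ_new is set from the nearest-neighbor distance and the initial values of the new unit are our own assumptions; only the deletion threshold 10^-4 comes from the text.

```python
import numpy as np

def manipulate_units(units, x, y, joint_prob, create_thresh, delete_thresh=1e-4):
    """Heuristic unit production and deletion.

    units      : list of unit dicts with at least 'mu', 'Sigma', 'W', 'one', 'var'
    joint_prob : P(x(t), y(t) | theta(t-1)) under the current model
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N, D = len(x), len(y)
    if joint_prob < create_thresh:
        # production: center a new unit on the poorly explained datum,
        # mu_new = x(t), W_new = (0, y(t))
        if units:
            nearest = min(np.linalg.norm(u['mu'] - x) for u in units)
        else:
            nearest = 1.0                                   # arbitrary scale for the first unit
        units.append({'mu': x.copy(),
                      'W': np.hstack([np.zeros((D, N)), y.reshape(D, 1)]),
                      'Sigma': (nearest ** 2) * np.eye(N),  # width set from neighboring units
                      'one': 1.0,                           # assumed initial responsibility
                      'var': 1.0})
    # deletion: drop units whose accumulated responsibility has become negligible
    units[:] = [u for u in units if u['one'] >= delete_thresh]
    return units
```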
In order to deal with a singular input distribution, a regularization for Σ_i^{-1}(t) is introduced as follows:

\Sigma_i^{-1}(t) = \left[ \left( \langle\langle x x' \rangle\rangle_i(t) - \mu_i(t) \mu_i'(t) \langle\langle 1 \rangle\rangle_i(t) + \alpha \langle\langle \Delta^2 \rangle\rangle_i(t) \, I_N \right) / \langle\langle 1 \rangle\rangle_i(t) \right]^{-1}    (13a)

\langle\langle \Delta^2 \rangle\rangle_i(t) = \left[ \langle\langle |x|^2 \rangle\rangle_i(t) - |\mu_i(t)|^2 \langle\langle 1 \rangle\rangle_i(t) \right] / N,    (13b)

where I_N is the N-by-N identity matrix and α is a small regularization constant. The corresponding Λ̃_i(t) can be calculated in an on-line manner using an equation similar to (9b). It should be noted that the regularized covariance (13) does not introduce any bias into the estimation of the linear regression matrix, (9c) [23].
We have shown that the on-line EM algorithm with an appropriate scheduling for the discount factor λ(t) is faster and more accurate than the batch EM algorithm, especially for a large amount of data [23, 25]. Therefore, it is advantageous to use the on-line EM algorithm for approximating the target function from a large amount of data. In addition, the on-line EM algorithm is efficient for dynamic environments, i.e., when the input and output distribution changes over time [23].
What our on-line EM algorithm is actually doing is the maximization of the discounted log-likelihood for the input-output joint probability [23, 25]. It differs from the usual function approximation in two points: (a) many function approximation methods do not care about the input distribution, while our method does; and (b) our method estimates the output variance parameter σ_i², which is not used in the function approximation (1).
The estimation of the input distribution is useful especially when the data appear in a localized region of the input space. This often occurs in approximation problems of dynamical systems, which are considered in this study. In addition, the estimation of the input distribution enables us to allocate the normalized Gaussian units efficiently and dynamically.
The output variance parameter σ_i² corresponds to the residual error of the linear fitting in the i-th local region. By incorporating this parameter in the estimation, our learning algorithm with the NGnet balances the approximation of the input distribution and the linear regression by the local units. Then, our learning method can obtain the optimal parameters for the function approximation, considering the noise level involved in the training data, the roughness
of the target function, and so on. In addition, since the output variance corresponds to the error of the local regression with respect to the training data, the magnitude of the noise in the training data can be estimated from it without any prior knowledge. Later we will show an experimental result on the noise level estimation. Furthermore, when σ_i² of the i-th unit is too large in comparison to the other units, the i-th linear regression is considered to be poor. The i-th unit can be split into two units in such a case [23], which provides another unit manipulation mechanism.
3 Reconstruction from full observation
3.1 Lorenz system
As a chaotic dynamical system, we consider the Lorenz system [14], which is defined by the following differential equations:

\dot{x} = a (y - x)    (14a)
\dot{y} = -y + (b - z) x    (14b)
\dot{z} = -c z + x y,    (14c)
where ẋ denotes the time derivative of the state variable x. The parameters are set at their usual values: a = 10.0, b = 28.0, and c = 8/3. Let Ẋ = F(X) be the vector notation of the Lorenz equation (14), where X ≡ (x, y, z)'. An orbit of the Lorenz equation is obtained by the fourth-order Runge-Kutta method with the integration time step Δt_R = 0.001. Figure 1 shows the phase portrait of the Lorenz attractor, when the sampling time step is Δt_O = 10 · Δt_R = 0.01. In the following experiments, the sampling time step is fixed at Δt_O.
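To make this data-generation step concrete, a small NumPy sketch of the Runge-Kutta integration and sampling described above is given below; the initial condition and the number of samples are arbitrary choices, not values from the paper.

```python
import numpy as np

def lorenz(X, a=10.0, b=28.0, c=8.0 / 3.0):
    """Vector field F(X) of the Lorenz system (14)."""
    x, y, z = X
    return np.array([a * (y - x), -y + (b - z) * x, -c * z + x * y])

def rk4_step(f, X, dt):
    """One fourth-order Runge-Kutta step."""
    k1 = f(X)
    k2 = f(X + 0.5 * dt * k1)
    k3 = f(X + 0.5 * dt * k2)
    k4 = f(X + dt * k3)
    return X + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

dt_R, dt_O = 0.001, 0.01                 # integration and sampling time steps
steps_per_sample = int(round(dt_O / dt_R))
X = np.array([1.0, 0.0, 0.0])            # an arbitrary initial condition
orbit = []
for _ in range(10000):                   # 10,000 samples on the attractor
    for _ in range(steps_per_sample):
        X = rk4_step(lorenz, X, dt_R)
    orbit.append(X.copy())
orbit = np.asarray(orbit)
```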
3.2 Discretized vector field
We [21, 22] formerly showed that an RNN with sigmoidal units was able to generate a chaotic attractor, which had a topological structure similar to that of the Lorenz attractor, after learning an orbit of the Lorenz system. In the present study, we employ the NGnet to approximate the discretized vector field of the dynamical system and show that this new approach is able to learn the chaotic dynamics faster and better than the former method.
We assume that the system variable X(t) is observed (sampled) at discrete time steps, i.e., t = n · Δt_O (n = 0, 1, ...). The discretized vector field F̃(X) for the sampling time step Δt_O is then defined by

\tilde{F}(X) \equiv \left( \Phi_X(\Delta t_O) - X \right) / \Delta t_O,    (15)

where Φ_X(t) is the solution of the dynamical equation \dot{\Phi}_X(t) = F(\Phi_X(t)) under the initial condition Φ_X(0) = X. The discretized vector field depends on the sampling time step Δt_O and differs from the original vector field of the continuous-time system. If the sampling time is fixed, however, the discretized vector field contains all the information of the discretized dynamics, whose evolution is described by

X((n+1) \cdot \Delta t_O) = X(n \cdot \Delta t_O) + \Delta t_O \, \tilde{F}(X(n \cdot \Delta t_O)).    (16)
Alternatively, the discretized dynamics can be described by

X((n+1) \cdot \Delta t_O) = K(X(n \cdot \Delta t_O)) \equiv \Phi_{X(n \cdot \Delta t_O)}(\Delta t_O),    (17)

where K is called the discrete mapping. In Section 5, we will compare these two methods.
3.3 Learning vector field
In order to train the NGnet to approximate the discretized vector field (15), the input and the target output are the state vector and the discretized vector field at that state vector, respectively. Namely, the input is X(n · Δt_O) and the target output is (X((n+1) · Δt_O) − X(n · Δt_O)) / Δt_O, for n = 0, 1, .... When the trained system is running, the discretized vector field at the current state X(n · Δt_O) is approximated by using the trained NGnet. The next time-step state X((n+1) · Δt_O) is then calculated from equation (16) by using the approximated discretized vector field F̃. By continuing this process, a trajectory is reproduced.
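A minimal sketch of how the training pairs and the reproduced trajectory described above can be formed is given below. Here `orbit` holds the sampled states X(n·Δt_O) (as in the integration sketch of Section 3.1), and `F_hat` stands for any trained approximation of the discretized vector field, for instance the NGnet forward function of Section 2 bound to learned parameters; all names are illustrative.

```python
import numpy as np

def make_vf_training_pairs(orbit, dt_O):
    """Inputs X(n dt_O) and targets (X((n+1) dt_O) - X(n dt_O)) / dt_O for the NGnet."""
    orbit = np.asarray(orbit)
    return orbit[:-1], (orbit[1:] - orbit[:-1]) / dt_O

def reproduce_trajectory(F_hat, X0, dt_O, n_steps):
    """Iterate equation (16) with the approximated discretized vector field F_hat."""
    X, traj = np.asarray(X0, dtype=float), []
    for _ in range(n_steps):
        X = X + dt_O * F_hat(X)
        traj.append(X.copy())
    return np.asarray(traj)
```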
The above approach is different from our former approach using an RNN [21, 22]. The RNN approach tried to approximate the original vector field of the continuous-time system even though the system was discretized. This could be done using the "back-propagation through time" algorithm [19] for the discretized dynamical system. In that approach, the quality of the estimate of the original vector field degrades as the sampling time step becomes larger. On the other hand, the new approach tries to approximate the discretized vector field of the discretized system. Therefore, the estimation quality is less dependent on the sampling time step Δt_O. Accordingly, our current objective is to approximate the discretized dynamics of the unknown dynamical system rather than to estimate the original continuous-time dynamical system from the discrete sampling.
The solid lines in Figures 4(a), 4(b) and 4(c) show the learning process. The abscissa denotes the amount of learned data, which are supplied one after another from an orbit of the Lorenz system. The ordinate in Figure 4(a) denotes the normalized mean squared error (nMSE) of the discretized vector field. The nMSE is evaluated using 10,000 test data that are randomly sampled on the Lorenz attractor, and it is normalized by the variance of the true vector field on the Lorenz attractor. The ordinates in Figures 4(b) and 4(c) denote the log-likelihood discounted with the discount coefficient λ(t) as in equation (7) and the averaged log-likelihood for the 10,000 test data above, respectively. Our on-line EM algorithm maximizes the discounted log-likelihood (Figure 4(b)) for the input-output joint probability [23, 25] and converges to a local maximum of the expected log-likelihood on the attractor. Figures 4(b) and 4(c) show that both the discounted log-likelihood for the training data and the averaged log-likelihood for the test data increase steadily as the amount of training data increases. The averaged log-likelihood approximates the expected log-likelihood on the attractor. On the other hand, the on-line EM algorithm does not directly minimize the nMSE. This may be the reason for the occasional ups and downs in the nMSE learning curves (Fig. 4(a)).
After 100,000 data are learned, the number of units becomes 135 and the nMSE on the attractor becomes 0.015%. Figure 2 shows the phase portrait of the reproduced dynamics, whose discretized vector field is given by the trained NGnet, after the 100,000 data have been learned.
In our former study [22], 3,000,000 data points were needed to train the RNN with sigmoidal units in order to approximate the Lorenz dynamics well. Namely, the weight parameters of the RNN
were updated 30,000 times, and 100 orbit points were presented to the RNN for each weight update.
The characteristics of the Lorenz dynamics are better approximated in our new approach than in the RNN method, as will be shown below. Therefore, our new approach is faster and more accurate than the former one. This is partly because the NGnet is a local model, and partly because the on-line EM algorithm is faster than the gradient descent algorithm used in the RNN. On the other hand, since the NGnet is a local model, it has weak extrapolation ability in comparison with global models such as the RNN with sigmoidal units. In the region outside the attractor, the NGnet does not approximate the vector field well.
The Lorenz attractor has a two-dimensional sheet-like structure consisting of a large number of closely packed sheets [14]. Its fractal dimension is slightly larger than two [12]. Therefore, the local covariance matrix Σ_i becomes nearly singular and the learning process becomes unstable, if the regularization method introduced in Section 2 is not employed. When the regularization method with α = 0.01 is used, the largest, the second largest, and the smallest eigenvalues of the covariance matrices become 5.44 (mean) ± 4.67 (standard deviation), 0.91 ± 0.92, and 0.06 ± 0.26, respectively, for the 135 units after the learning of the 100,000 data. Our regularization method prevents the smallest eigenvalue from becoming zero. Figure 3 shows the receptive fields of the 135 units. In the figure, we can see that the 135 units cover the whole attractor. Here, the receptive field of the i-th unit is depicted by an ellipse whose axes correspond to the two principal components of the local covariance matrix Σ_i. The ellipse center is set at the unit center μ_i.
3.4 Robustness to noise
In this subsection, the robustness of our approach to noise is examined. We consider two kinds
of noise processes: system noise and observation noise.
System noise disturbs the system dynamics; namely, the system dynamics Ẋ = F(X) is disturbed by the white Gaussian noise ξ(t):

\dot{X} = F(X(t)) + \xi(t).    (18)

The white Gaussian noise ξ(t) satisfies E[ξ(t) ξ'(s)] = σ_S² I · δ(t − s), where E[·] denotes the expectation value, I is the three-dimensional identity matrix, δ(·) is the Dirac delta function, and σ_S² is the system noise variance. The disturbed state variable X(t) becomes a stochastic process. By applying the Euler method, the stochastic differential equation (18) can be discretized [15] with the integration time step Δt_R as follows:

X((n+1) \cdot \Delta t_R) = X(n \cdot \Delta t_R) + \Delta t_R \cdot F(X(n \cdot \Delta t_R)) + \Delta t_R \, \xi(n \cdot \Delta t_R).    (19)

The noise whiteness in the continuous system, E[ξ(t) ξ'(s)] = σ_S² I · δ(t − s), becomes E[ξ(n · Δt_R) ξ'(m · Δt_R)] = σ_S² I · δ_{nm} / Δt_R in the discretized system. Here, δ_{nm} is the Kronecker delta, and the coefficient 1/Δt_R guarantees the normalization of the delta function: Σ_m Δt_R (δ_{nm} / Δt_R) = 1 = ∫_0^∞ ds δ(t − s). Since the Euler method is not very precise for the integration of a chaotic dynamical system, we use the following method. The integration of the noiseless equation (14) is done by the Runge-Kutta method with the integration time step Δt_R. At each integration
time step, the discretized system noise ε_S(n · Δt_R) ≡ Δt_R ξ(n · Δt_R) is independently added to the system variable:

X((n+1) \cdot \Delta t_R) = X(n \cdot \Delta t_R) + \Delta t_R \cdot F_{RK}(X(n \cdot \Delta t_R)) + \epsilon_S(n \cdot \Delta t_R),    (20)

where F_RK(·) is the effective discretized vector field obtained by the Runge-Kutta method for the noiseless equation (14) with the integration time step Δt_R. From the definition ε_S(n · Δt_R) ≡ Δt_R ξ(n · Δt_R), the discretized system noise satisfies E[ε_S(n · Δt_R) ε_S'(m · Δt_R)] = σ_S² I · Δt_R δ_{nm}; namely, the variance of the discretized system noise ε_S(n · Δt_R) is given by σ_S² Δt_R. This method (20) obtains an approximated solution of the stochastic differential equation (18).
On the other hand, when the state variable is observed, observation noise ξ_O(t) is added:

Y(t) = X(t) + \xi_O(t).    (21)

In experiments, Y(n · Δt_O) (n = 0, 1, ...) are supplied to the learning system. At each observation time step, the observation noise is assumed to be independently generated according to a Gaussian distribution whose variance is given by σ_O² (Δt_O / 2). Then, the effective noise variances in the discretized vector field for the sampling time step Δt_O, i.e., (Y((n+1) · Δt_O) − Y(n · Δt_O)) / Δt_O, become (σ_S² / Δt_O) for the system noise and (σ_O² / Δt_O) for the observation noise. The magnitudes of these two types of noise are given in proportion to the standard deviation of the state variable on the attractor, σ ≡ √(Var(X)/3) = 8.52, which is experimentally obtained from a large number of Lorenz orbits. For example, the variance of the system noise is determined as σ_S² Δt_R = Δt_R (γσ)². In the following experiments, we represent the noise level using the percentage of the parameter γ.
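The two noise processes can be simulated as in the sketch below, which reuses `lorenz` and `rk4_step` from the integration sketch in Section 3.1; the 30% levels, the random seed, and the helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_attr = 8.52                      # std of the state variable on the attractor
dt_R, dt_O = 0.001, 0.01
gamma_sys, gamma_obs = 0.30, 0.30      # 30% system and 30% observation noise levels

sigma_S = gamma_sys * sigma_attr                          # system noise level
obs_std = gamma_obs * sigma_attr * np.sqrt(dt_O / 2.0)    # per-sample observation noise std

def noisy_observations(X0, n_samples):
    """Generate Y(n dt_O) following equations (20) and (21)."""
    X, samples = np.asarray(X0, dtype=float), []
    steps = int(round(dt_O / dt_R))
    for _ in range(n_samples):
        for _ in range(steps):
            X = rk4_step(lorenz, X, dt_R)                              # noiseless Runge-Kutta step
            X = X + rng.normal(scale=sigma_S * np.sqrt(dt_R), size=3)  # eps_S: variance sigma_S^2 dt_R
        samples.append(X + rng.normal(scale=obs_std, size=3))          # observation noise, eq. (21)
    return np.asarray(samples)
```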
Figures 4(a), 4(b) and 4(c) show the learning processes, i.e., the nMSE in 4(a), the discounted log-likelihood for the training data in 4(b), and the averaged log-likelihood for the test data in 4(c), for various system noise levels: 0% (solid), 10% (dashed), 20% (dash-dotted), and 30% (dotted). No observation noise is added in these experiments. As the noise level becomes large, the nMSE on the attractor increases, the log-likelihood (LL) for both the training data and the test data becomes small, and the amount of data needed to approximate the dynamics becomes large. If a sufficiently large amount of data is supplied, our method is able to reproduce a chaotic attractor that is similar to the original Lorenz attractor, even when the system noise increases up to 40% or the observation noise increases up to 70%.
In order to measure the similarity between the Lorenz attractor and the reconstructed chaotic attractor, the dynamical characteristics of the attractor are examined. Table I summarizes the characteristics of the attractors reconstructed by the NGnet after the learning of 2,000,000 data under various noise conditions. Twelve conditions are prepared: no noise; 10%, 20%, 30% and 40% system noise; and 10%, 20%, 30%, 40%, 50%, 60% and 70% observation noise.
"NU" denotes the number of NGnet units after the learning. As the system/observation noise level increases, the NU becomes large, due to the variation of the input Y(n · Δt_O) and the target output (Y((n+1) · Δt_O) − Y(n · Δt_O)) / Δt_O. Although the NU depends on the threshold values for the unit production and deletion mechanisms, the comparison in Table I is meaningful, because the threshold values are the same for all of the noise conditions. In general, the NU tends to gradually increase as the number of training data increases.
In each condition, the nMSE and the LL on the attractor are evaluated on a fixed test data set consisting of 100,000 data points that are randomly sampled from the Lorenz attractor.
The nMSE and the LL around the attractor are evaluated on a fixed test data set consisting of 100,000 data points that are randomly sampled from Lorenz trajectories with 20% system noise. The purpose of the latter condition is to see how well the NGnet approximates the vector field in the neighborhood of the attractor. In general, the nMSE and the LL on the attractor become worse as the noise level becomes large. When system noise is added, on the other hand, the nMSE around the attractor becomes smaller than that in the noiseless case. When moderate system/observation noise is added, the LL around the attractor is also improved.
Although our on-line EM algorithm maximizes the discounted log-likelihood, the accuracy of the dynamics reproduced by the NGnet depends on the approximated discretized vector field. In the following experiments, therefore, we often use the nMSE to evaluate the approximation ability for the discretized vector field. It should be noted that the absolute values of the nMSE and the LL cannot be compared.
From the correlation dimension (CD) [12] and the maximum Lyapunov exponent (MLE) [7, 27] in Table I, we can see that the chaotic attractor reproduced by the NGnet well approximates the complexity and the instability of the original Lorenz attractor for noise levels below 40% system noise or 30% observation noise. In our former study [21, 22], the MLE value in the noiseless case was 0.85, which was worse than the present results. When the observation noise is 40% or larger, the CD becomes smaller than the true value and the MLE becomes larger than the true value. Namely, the detailed fractal structure of the chaotic attractor is smeared by the large noise, so that the complexity of the attractor measured by the CD decreases. The increase of the MLE implies that the large noise increases the instability of the obtained dynamics. When the observation noise is larger than 75% or the system noise is larger than 45%, the orbits produced by the NGnet often stop or go into limit cycles after long chaotic transients.
The above results can be interpreted as follows. System noise disturbs the system trajectory and thus helps the learning system explore the states around the attractor. Therefore, the data with system noise contain information about the vector field around the attractor, although this information includes the noise. When noiseless data are provided, there is no information on the vector field outside the attractor, but accurate information on the vector field is available on the attractor. Observation noise also disturbs the observed trajectory. However, the dynamics is not disturbed by the observation noise, so the data with observation noise contain no information on the vector field outside the attractor. These facts may be part of the reason why system noise improves the nMSE around the attractor while observation noise does not.
There may be other reasons for the good performance of our method under noise conditions. In the NGnet, the parameters of each unit are determined by the weighted mean (7) in the neighborhood of the unit center. Because of the ergodicity of the chaotic attractor, much data can be found in the neighborhood of any point on the attractor if one observes the system long enough. Therefore, the noise effects can be smoothed by the averaging process.
We can also consider a situation where both of the two noise processes occur. Figure 5 shows the reproduced chaotic attractor after learning of 3,000,000 data with both 30% system noise and 30% observation noise. In this case, the nMSE on the attractor becomes 0.178%, which is 12 times as large as that of the noiseless case in Figure 2. Nevertheless, the reproduced chaotic attractor looks similar to the Lorenz attractor and has similar characteristics as well. Table I contains the characteristics of the reproduced attractor.
In the experiments above, the regularization parameter α in equation (13a) is fixed at 0.01 for every noise condition. In general, the regularization becomes less important in the presence
of large noise, because the noise enlarges the region over which the input data are distributed.
One of the advantages of our method is that the noise level included in the observed data can be estimated without prior knowledge. The output variance parameter σ_i² determined by (9d) estimates the mean squared error of the linear fitting in the i-th local region. Therefore, the global noise level can be estimated from the average of σ_i². The estimated noise levels, normalized by the vector field variance, for various noise conditions are given in Table II. The results show that the noise levels are successfully estimated by our method.
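A sketch of this estimate (the weighted mean of σ_i² normalized by the variance of the discretized vector-field targets, reported as a standard deviation in percent) is given below; normalizing the weighted mean by the sum of the weights ⟨⟨1⟩⟩_i is our own assumption about how the average is formed.

```python
import numpy as np

def estimated_noise_level(sigma2_i, one_i, vf_targets):
    """Noise level (%) estimated from a trained NGnet.

    sigma2_i   : array of output variances sigma_i^2 of the units
    one_i      : array of accumulated responsibilities <<1>>_i
    vf_targets : array of discretized vector-field training targets
    """
    weighted_var = np.sum(sigma2_i * one_i) / np.sum(one_i)  # weighted mean of sigma_i^2
    vf_var = np.var(np.asarray(vf_targets))                  # vector-field variance
    return 100.0 * np.sqrt(weighted_var / vf_var)
```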
The prediction ability of the trained NGnet in the above experiments is also examined. If one needed future predictions for only a small number of time steps, a much smaller amount of data would be needed. We here examine the predictive performance of the NGnet trained by a large amount of data in the presence of the two types of noise.
Figure 6(a) shows a typical example of a time series for x(t) predicted by the NGnet, which is trained with no noise, 10% system noise, and 10% observation noise. The prediction is accurate until about t = 5.2 in the noiseless and the 10% system noise cases. In the 10% observation noise case, the prediction starts to deviate at t = 3.5 but continues to follow the true orbit until about t = 5.
Figure 6(b) shows the average prediction error. Each line denotes the squared error between the three-dimensional state variable predicted by the NGnet and the true Lorenz orbit at each time. The error is averaged over trials starting from 100 different initial states and is normalized by the variance of the state variable, σ². The prediction is accurate until about t = 4.2 on average in the noiseless and the 10% system noise cases. In the 10% observation noise case, the prediction starts to deviate at t = 3.5 but continues to follow the true orbit until about t = 4.2 on average.
Even with a perfect model, the prediction error would grow as exp(MLE · t) on average due to the instability of the chaotic dynamics. The above prediction times are fairly long compared to the characteristic time, 1/MLE = 1.04, which is determined by the MLE of the Lorenz attractor. One should note that t = 1 corresponds to 100 time steps in these experiments.
These experiments show that our approach based on the on-line EM algorithm is robust to system and observation noise. The chaotic attractor reproduced by our method well approximates the dynamical characteristics of the original Lorenz attractor even in the presence of the two types of noise. Prediction by the trained NGnet is also good under the two types of noise. The experiments also show that system noise does not degrade the prediction ability, while observation noise degrades it slightly. In addition, we have found that system noise can help the learning of the vector field around the attractor, but can be harmful to the learning when it is too large.
4 Reconstruction from partial observation
In real problems, often only some of the dynamical variables can be observed. In this section, it is assumed that one of the dynamical variables, e.g., x(t) of the Lorenz system (14), is observed. This situation was also investigated in our former study [22], where the RNN was trained in order to learn the Lorenz dynamics with hidden variables. Although the trajectory of the observed variables was reproduced fairly well, it remained unclear whether the dynamics of the trained RNN was equivalent to the original dynamics, because of coordinate transformation ambiguity
[22]. In order to avoid this problem, we employ a delay coordinate embedding in the present
study.
4.1 Time delay embedding
The delay coordinate for variable x(t) is defined by

Z(t) \equiv (x(t), x(t - \tau), x(t - 2\tau), ..., x(t - (d-1)\tau))',    (22)

where d and τ are called the reconstruction dimension and the lag time, respectively [29, 26]. In experiments, we set d = 3 and τ = n_τ · Δt_O = 0.15 (i.e., n_τ = 15), and x(n · Δt_O) (n = 0, 1, ...) are provided to the learning system. From this trajectory, the delay coordinate trajectory (22) is obtained. Figure 7 shows the phase portrait of Z(t), where a chaotic attractor can be seen.
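A short sketch of building the delay coordinate vectors (22) from the sampled scalar series x(n·Δt_O) is given below; the function name and the array layout (one row per time point) are our own conventions.

```python
import numpy as np

def delay_embed(x_series, d=3, n_tau=15):
    """Rows are Z(t) = (x(t), x(t - tau), ..., x(t - (d-1) tau)), with tau = n_tau * dt_O."""
    x = np.asarray(x_series, dtype=float)
    start = (d - 1) * n_tau          # earliest t for which all delayed samples exist
    n = len(x) - start
    # column k holds x(t - k * tau) for t = start, start + 1, ...
    return np.column_stack([x[start - k * n_tau : start - k * n_tau + n] for k in range(d)])
```

With the settings above (d = 3, n_τ = 15, Δt_O = 0.01), applying delay_embed to the x-component of a sampled Lorenz orbit should yield a trajectory of the kind shown in Figure 7.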
The lag time τ has been determined as the first minimum of the mutual information between x(t) and x(t − τ) [9]. This prescription guarantees that x(t) and x(t − τ) are independent enough of each other to be useful as coordinates in a time delay vector, but not so independent as to have no connection with each other at all.
The embedding theorem [29, 26] states that, if the reconstruction dimension d is larger than 2s, where s is the attractor's dimension (possibly fractal), the attractor can be embedded in the delay coordinate; namely, there is a smooth vector field for the delay coordinate:

\dot{Z} = H(Z),    (23)

and the vector field H preserves the topological structure of the original vector field F [26]. Since the fractal dimension of the Lorenz attractor is slightly larger than two [12], one can safely embed the Lorenz attractor with d ≥ 5. However, this is not a necessary dimension for the embedding. With the lag time τ = 0.15, the Lorenz attractor can be embedded in the delay coordinate for d ≥ 3. This can be confirmed by checking for false neighbors in the reconstruction space [16]. When distant points in the original dynamical space are projected to neighboring points in the reconstruction space, these points are called false neighbors. The absence of false neighbors implies that the reconstructed attractor is completely unfolded and is safely embedded in the reconstruction space. If neighboring points in the d-dimensional reconstruction space become far apart in the (d+1)-dimensional reconstruction space, they are regarded as false neighbors. It has been experimentally confirmed that there are no false neighbors for d ≥ 3 with the lag time τ = 0.15. Therefore, d = 3 is sufficient for the embedding of the Lorenz attractor.
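For illustration, a brute-force version of this false-neighbors check might look as follows (using delay_embed from the previous sketch); the ratio threshold of 15 is a commonly used value and not taken from the paper.

```python
import numpy as np

def false_neighbor_fraction(x_series, d, n_tau, thresh=15.0):
    """Fraction of points whose nearest neighbor in d dimensions becomes 'false'
    when the (d+1)-th delay coordinate is added."""
    Zd  = delay_embed(x_series, d, n_tau)[n_tau:]   # d-dim points, time-aligned with Zd1
    Zd1 = delay_embed(x_series, d + 1, n_tau)       # same times, one extra delay coordinate
    n_false = 0
    for t in range(len(Zd1)):
        dist = np.linalg.norm(Zd - Zd[t], axis=1)
        dist[t] = np.inf                             # exclude the point itself
        nn = int(np.argmin(dist))                    # nearest neighbor in d dimensions
        growth = abs(Zd1[t, -1] - Zd1[nn, -1])       # separation along the added coordinate
        if growth / max(dist[nn], 1e-12) > thresh:
            n_false += 1
    return n_false / len(Zd1)
```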
There are several approaches to making future predictions using the delay coordinate, and each method has its own advantages and disadvantages. Conventionally, the delay coordinate is used to predict x(t + τ) [8, 28]; namely, some function approximator is trained to learn the mapping from the delay coordinate Z(t) to the future value of the observed variable, x(t + τ). In this approach, the prediction is made once every lag time τ and only a stroboscopic trajectory {x(t + τ), x(t + 2τ), ...} is produced. In order to predict a full trajectory {x(t + Δt_O), x(t + 2Δt_O), ...}, one can use x(t + Δt_O) as the target value for the predictor [30]. In this approach, the initial state should be specified by the d_R (= n_τ (d − 1) + 1)-dimensional vector {x(0), x(Δt_O), ..., x((d − 1)τ)}. This implies that the effective embedding dimension becomes d_R (≥ d). If the observation involves noise, the initial state is disturbed by the d_R-dimensional noise, which may introduce an excessive amount of noise for the prediction.
In our approach, the NGnet is trained to learn the discretized vector field corresponding to the original vector field H(Z) in the delay coordinate space. Then, the full trajectory {x(0), x(Δt_O), x(2Δt_O), ...} can be obtained from the d-dimensional initial state Z(0). The embedded vector field H(Z) has a global constraint:

H_{k+1}(Z(t + \tau)) = H_k(Z(t)) \qquad (k = 1, ..., d-1)    (24)

along the system trajectory. Here, H_k(·) denotes the k-th element of the vector field H. In order to impose this constraint explicitly on the vector field in the learning process, the back-propagation through time algorithm [19] is required, and the on-line EM algorithm cannot be used. Therefore, we discard this constraint in the learning process. Since the noiseless training data satisfy this constraint, one can expect that the trained NGnet approximately satisfies this constraint if the NGnet well approximates the observed data.
By employing the on-line EM algorithm, the nMSE of the discretized vector field on the attractor rapidly decreases as the amount of learned data increases. After learning of 100,000 data, the nMSE becomes 0.016%. Figure 8 shows the phase portrait produced by the trained NGnet. One can see the similarity between Figures 7 and 8.
When the trained system is running, the discretized vector field at the current state Z(n · Δt_O) is approximated by using the trained NGnet. The next time-step value Z((n+1) · Δt_O) is then calculated by using the approximated discretized vector field. Although constraint (24) is ignored in this process, we expect that the trained NGnet approximately satisfies the constraint. Actually, the correlation coefficients of Z_2(t + τ) vs. Z_1(t), and of Z_1(t + τ) vs. Z_0(t), on the attractor reproduced by the NGnet after the learning of the 100,000 data, are found to be almost equal to 1.
4.2 Robustness to noise
In order to consider noise processes in the partial observation case, we assume that system noise disturbs all of the dynamical variables X(t), and observation noise disturbs the observed variable x(t). Further details of the noise processes are the same as in the full observation case discussed in Section 3.3.
In the partial observation case, the two types of noise introduce several difficulties for our learning scheme. In the full observation case, system noise helps the learning system explore states around the attractor. In addition, the effect of system noise and observation noise on the discretized vector field at one observation time is independent of that at another time. Then, the learning of a lot of data from the attractor is expected to work as temporal smoothing of the discretized vector field, so as to remove the two types of noise. In the partial observation case, on the other hand, the noise effect on the embedded vector field at one time is dependent on that at another time. Namely, it introduces temporally correlated noise. Therefore, the temporal smoothing does not work well, so the learning scheme becomes more sensitive to noise than in the full observation case. Moreover, a delay coordinate trajectory with the two types of noise does not satisfy the global constraint (24).
Even with these difficulties, the NGnet is able to learn the chaotic dynamics in the delay embedding space. If the system noise is less than 20% or the observation noise is less than 25%, the chaotic attractor can be reproduced by the NGnet after learning of 2,000,000 data. If the noise level becomes larger than these values, the reproduced trajectory converges to a limit cycle after
a long chaotic transient whose topological structure is similar to that of the Lorenz attractor.
Table III summarizes the characteristics of the reproduced chaotic attractors under eight noise
conditions. As the noise level increases, the CD becomes smaller than the true value and the
MLE becomes larger than the true value. Figure 9 shows the phase portrait of the chaotic
attractor produced by the NGnet after learning of 2,000,000 data which include 15% system
noise and 15% observation noise. Its complexity seems to be slightly smaller than that of Figure
7.
The prediction ability of the trained NGnet using the delay embedding is also examined.
Figure 10(a) shows a typical example of a time-series for x(t) predicted by the NGnet, which is
trained with no noise, 10% system noise, and 10% observation noise. The prediction is accurate
until about t = 5:5 in the noiseless and the 10% system noise cases. In the 10% observation
noise case, the prediction starts to deviate at t = 1:5 but continues to follow the true value
until about t = 5:5.
Figure 10(b) shows the average prediction error. Each line denotes the squared error between
the state variables predicted by the NGnet and the true state variables in the delay coordinate.
The error is averaged over 100 different initial conditions and is normalized by the variance of
the state variables in the delay coordinate. The prediction starts to deviate at about t = 1 but
closely follows the true value until about t = 3 on average, in all cases. The prediction still
follows the true value until about t = 4:5 in the noiseless case, t = 4 in the 10% system noise
case, and t = 3:8 in the 10% observation noise case. These prediction times are fairly long
compared to the characteristic time, 1=MLE = 1:04, but they are shorter than those in the full
observation case.
5 Learning discrete mapping
In this section, we discuss an alternative learning method. The NGnet is trained to approximate the discrete mapping for the sampling time step Δt_O, i.e., X((n+1) · Δt_O) = K(X(n · Δt_O)) (n = 0, 1, 2, ...) [1]. Mathematically, this method is equivalent to the previous method that approximates the discretized vector field. However, the two methods treat noise effects differently in numerical calculations. As explained in Section 3.3, the noise variance for the discretized vector field is (σ_O² / Δt_O) for the observation noise. On the other hand, the noise variance for the discrete mapping K is (Δt_O σ_O² / 2), which is much smaller than that of the discretized vector field for small Δt_O. The difficulty of the discrete map method is that the map K is very close to the identity map for small Δt_O. Thus, the task of the NGnet is to estimate the small deviation from the identity map, which is of order Δt_O.
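Under the same array conventions as the earlier sketches (rows of `orbit` are X(n·Δt_O)), the two training targets compared in this section can be written as follows; the function names are illustrative.

```python
import numpy as np

def make_map_training_pairs(orbit):
    """Map method: input X(n dt_O), target X((n+1) dt_O).
    For small dt_O the target stays within O(dt_O) of the input, so K is close to the identity."""
    orbit = np.asarray(orbit)
    return orbit[:-1], orbit[1:]

def make_vf_training_pairs(orbit, dt_O):
    """Vector-field method (Section 3.3): the same information rescaled as a divided difference."""
    orbit = np.asarray(orbit)
    return orbit[:-1], (orbit[1:] - orbit[:-1]) / dt_O
```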
Table IV summarizes the characteristics of the attractors reproduced by the NGnet after learning of 2,000,000 data under various noise conditions. Twelve conditions are prepared: no noise; 10%, 20%, 30%, 40% and 50% system noise; and 10%, 20%, 30%, 40%, 50% and 60% observation noise. Although the NGnet approximates the state point at the next observation time, "nMSE" in the table denotes the nMSE of the discretized vector field, as in Table I. Thus, Table IV can be compared easily with the results in Table I.
The performances for small noise, up to the 30% level, are comparable with those of the discretized vector field (VF) method. The nMSE for the system noise case in the map method is slightly better than that of the VF method. However, the nMSE for the observation noise
case in the map method is slightly worse than that of the VF method. The NGnet trained with noiseless data in the map method poorly approximates the true mapping function around the attractor, as can be seen in Table IV, and trajectories generated by this NGnet often stop after long chaotic transients. This result may come from the fact that the NGnet in the map method tends to approximate an identity map. The CD and the MLE in Table IV for the noiseless case are calculated from data in the chaotic transient period.
There are remarkable differences for the large noise cases. When the observation noise increases beyond 40%, the MLE in the map method decreases, while that in the VF method increases. In addition, as the noise level increases, the number of units decreases in the map method but increases in the VF method.
This can be interpreted as follows. In the map method, the function approximated by each local unit is close to the identity function, as stated above. Consequently, the approximated function becomes simpler than in the VF method, so that the partition in charge of each local unit tends to be larger than that in the VF method. This tendency becomes more prominent when the noise becomes large, because the data are distributed in an enlarged region when there is more noise. This enlargement of the receptive fields acts as a spatial smoothing of the mapping function, and it leads to a decrease in the complexity and instability of the chaotic attractor. On the other hand, the discretized vector field looks random under large noise. As a result, the vector field estimated by the NGnet becomes complex and the number of units increases.
Table V shows the average volume of the receptive fields and the average output variance for various noise conditions. In order to obtain the average volume of the receptive fields, we calculate the weighted mean of the determinant of the input covariance Σ_i. The value is normalized by σ³. Each "RF" value in Table V denotes the percentage of the cube root of the normalized value. Therefore, the table entry indicates the relative length of the receptive fields in comparison to the input standard deviation, σ. The definition of the average output deviation is the same as in Table II. From Table V, we can see that the receptive fields are larger in the map method than in the VF method, especially when the noise level is large. On the other hand, the average output variance is much smaller in the map method than in the VF method. This implies that a larger region can be approximated by a linear map in the map method than in the VF method, so that the receptive fields become larger in the map method than in the VF method.
It should be noted that the comparison of the unit number between the VF method and the map method is meaningful. Although the unit number depends on the threshold values for the unit production and deletion mechanisms, those values are the same for the two methods. Each threshold value corresponds to a probability, which does not directly depend on the local linearity of the input-output relationship nor on the effective noise involved in the training data.
The map method is also applied to the partial observation case. The map K(Z) satisfies the same constraint as (24):

K_{k+1}(Z(t + \tau)) = K_k(Z(t)) \qquad (k = 1, ..., d-1).    (25)

Table VI summarizes the characteristics of the attractors reproduced by the NGnet after learning of 2,000,000 data under various noise conditions. Comparing Tables III and VI, the performances of the map method seem to be slightly worse than those of the VF method. As in the
full observation case, the number of units decreases in the map method and increases in the
VF method when the noise level increases.
6 Conclusion
In this paper, we have discussed the reconstruction of chaotic dynamics by using the NGnet. The NGnet is trained by the on-line EM algorithm. Although the chaotic dynamics is unstable on its attractor, the discretized vector field of the chaotic dynamics can be well approximated by the NGnet. By investigating the characteristics of the reproduced attractors, we have found that our method well approximates the complexity and instability of the original chaotic attractor. Furthermore, our scheme has been shown to be fairly robust to two types of noise processes, i.e., system and observation noise. The trained NGnet also shows good prediction performance.
When only some of the dynamical variables are observed, the delay embedding method is used. The NGnet is trained to learn the discretized vector field in the delay coordinate space. It has been shown that the chaotic dynamics can be learned with this method in the delay embedding space under the two types of noise, although the performances are lower than those in the full observation case.
We have recently proposed a new embedding method [32] that is more robust to noise than the delay embedding method.
Table I
Characteristics of chaotic attractors produced by the NGnet after learning of 2,000,000 data under various noise conditions. "NU" denotes the number of units after the learning, whose upper limit is set at 1000. "nMSE" and "LL" denote the normalized mean squared error of the discretized vector field (in percent) and the averaged log-likelihood, respectively. "On" and "around" denote the nMSE or the LL evaluated on the attractor and around the attractor, respectively. The test data sets are the same for all of the noise conditions. The "on" test data set consists of 100,000 data points that are randomly sampled from noiseless Lorenz trajectories, and the "around" test data set consists of 100,000 data points that are randomly sampled from Lorenz trajectories involving 20% system noise. "CD" denotes the correlation dimension averaged over four sets of 50,000 points taken from the reproduced attractor. Its standard deviation is calculated from the deviation among the four sets and the linear fitting error for each data set. "MLE" denotes the maximum Lyapunov exponent, which is calculated along a single trajectory consisting of 10,000 points. The Jacobian matrix at a trajectory point is estimated by a second-order polynomial approximation using 32 data points in the neighborhood of the current point. The Gram-Schmidt orthogonalization is used for the MLE calculation. Its standard deviation is calculated for the period between t = 80 and t = 100.
                        NU     nMSE              LL                 CD              MLE
                               on     around     on      around
True Lorenz             -      -      -          -       -          2.054 ± 0.006   0.958 ± 0.014
Noiseless               183    0.004  0.400      4.04    -9.87      2.053 ± 0.008   0.940 ± 0.028
Sys. noise   10%        80     0.007  0.016     -5.17    -5.81      2.049 ± 0.007   0.998 ± 0.005
             20%        148    0.010  0.018     -6.90    -7.23      2.045 ± 0.007   0.994 ± 0.013
             30%        322    0.046  0.048     -7.63    -7.80      2.025 ± 0.011   1.172 ± 0.036
             40%        569    0.059  0.078     -8.14    -8.25      1.862 ± 0.034   0.966 ± 0.029
Obs. noise   10%        114    0.024  1.258     -5.48    -7.60      2.040 ± 0.013   1.004 ± 0.002
             20%        224    0.054  2.539     -6.90    -8.43      2.027 ± 0.015   1.073 ± 0.015
             30%        391    0.154  3.161     -7.69    -8.74      1.998 ± 0.010   1.659 ± 0.043
             40%        606    0.291  3.543     -8.28    -9.01      1.857 ± 0.041   4.181 ± 0.187
             50%        885    0.421  3.871     -8.71    -9.26      1.586 ± 0.084   4.688 ± 0.312
             60%        1000   0.385  3.866     -9.24    -9.66      1.892 ± 0.029   3.629 ± 0.244
             70%        1000   0.530  4.123     -9.83   -10.18      1.666 ± 0.041   2.589 ± 0.189
Sys. and Obs. 30%       917    0.178  0.365     -8.23    -8.41      1.967 ± 0.033   1.202 ± 0.038
Table II
Actual noise levels for the vector field and the noise levels estimated by the trained NGnet. In order to calculate the latter value, we obtain the weighted mean of σ_i², i.e., Σ_{i=1}^{M} σ_i² ⟨⟨1⟩⟩_i. The weighted mean is normalized by the vector field variance. The table entry denotes the percentage of the square root of the normalized weighted mean. The noise conditions are the same as those in Table I.
                        Actual (%)   Estimated (%)
Noiseless                  0.0           0.83
Sys. noise   10%          13.4           8.9
             20%          26.8          17.9
             30%          40.2          27.1
             40%          53.6          36.5
Obs. noise   10%          13.4          11.8
             20%          26.8          22.4
             30%          40.2          33.2
             40%          53.6          44.7
             50%          67.0          56.6
             60%          80.4          64.9
             70%          93.7          74.1
Table III
Characteristics of chaotic attractors in the delay coordinate space reproduced
by the NGnet after learning of 2,000,000 data under eight noise conditions.
Definitions of the table entries are the same as those in Table I. The number
of units is limited to 1000.
Condition          NU   nMSE (%)          LL                CD              MLE
                        on      around    on      around
True embedded       -      -        -       -         -   2.021 ± 0.007   0.976 ± 0.012
Noiseless         387  0.003    2.117    7.67    -16.95   1.923 ± 0.063   0.882 ± 0.001
Sys. noise  5%    292  0.006    0.118   -0.28     -1.45   1.972 ± 0.020   0.968 ± 0.004
Sys. noise 10%    450  0.020    0.115   -2.38     -2.80   1.940 ± 0.024   1.032 ± 0.004
Sys. noise 15%    813  0.075    0.173   -3.39     -3.60   1.865 ± 0.034   1.480 ± 0.066
Obs. noise  5%    290  0.025    0.733   -0.89      3.43   1.857 ± 0.047   2.821 ± 0.199
Obs. noise 10%    504  0.060    1.022   -2.53      4.08   1.729 ± 0.050   4.208 ± 0.132
Obs. noise 15%    943  0.146    1.458   -3.36     -4.33   1.732 ± 0.040   4.250 ± 0.272
Obs. noise 20%   1000  0.120    1.584   -4.28     -4.94   1.603 ± 0.044   3.997 ± 0.200
Table IV
Characteristics of chaotic attractors reproduced by the NGnet after learning of
2,000,000 data under various noise conditions. The target output for the NGnet
is the state value at the next observation time. Definitions of the table
entries are the same as those in Table I.
‡ The number of learned data is 3,000,000, because the trajectory reproduced
after learning 2,000,000 data often falls into a limit cycle after a long
chaotic transient.
Condition          NU   nMSE (%)          LL                CD              MLE
                        on      around    on      around
Noiseless         181  0.055    450.4   16.70     -6.20   2.039 ± 0.020   1.039 ± 0.002
Sys. noise 10%    138  0.005    0.012    9.27      8.53   2.048 ± 0.010   0.968 ± 0.006
Sys. noise 20%    118  0.007    0.010    6.71      6.39   2.052 ± 0.006   0.895 ± 0.011
Sys. noise 30%     88  0.014    0.016    4.86      4.70   2.047 ± 0.010   0.914 ± 0.004
Sys. noise 40% ‡   78  0.016    0.019    3.59      3.53   2.045 ± 0.007   0.909 ± 0.002
Sys. noise 50%     65  0.036    0.039    2.48      2.46   2.042 ± 0.011   0.890 ± 0.018
Obs. noise 10%    139  0.036    2.111    7.45      5.55   2.034 ± 0.011   1.177 ± 0.010
Obs. noise 20%     96  0.117    2.766    4.70      3.83   2.017 ± 0.011   1.293 ± 0.048
Obs. noise 30%     84  0.232    3.172    3.18      2.65   1.984 ± 0.016   1.094 ± 0.022
Obs. noise 40%     82  0.412    3.525    2.10      1.73   1.942 ± 0.013   0.765 ± 0.020
Obs. noise 50%     86  0.657    3.874    1.33      1.05   1.878 ± 0.038   0.716 ± 0.020
Obs. noise 60%     76  1.059    4.439    0.52      0.29   1.564 ± 0.052   0.855 ± 0.011
Table V
Average volume of the receptive fields ("RF") and average output deviation
("Output dev.") for the vector field method (VF) and the map method (Map). In
order to obtain the average volume of the receptive fields, we calculate the
weighted mean of the determinant of the input covariance $\Sigma_i$. The value
is normalized by $\sigma^3$. Each table entry of RF denotes the percentage of
the cube root of the normalized value. The definition of the average output
deviation is the same as in Table II.
Condition          VF RF (%)   VF Output dev. (%)   Map RF (%)   Map Output dev. (%)
Noiseless               3.9          0.83                5.1           0.076
Sys. noise 10%         11.4          8.9                 9.2           0.67
Sys. noise 20%         13.9         17.9                15.3           1.34
Sys. noise 30%         15.8         27.1                23.5           2.01
Sys. noise 40%         16.3         36.5                31.8           2.67
Sys. noise 50%           -            -                 40.7           3.35
Obs. noise 10%          8.1         11.8                 9.8           1.20
Obs. noise 20%          9.0         22.4                15.6           2.37
Obs. noise 30%          9.5         33.2                19.9           3.49
Obs. noise 40%         10.0         44.7                25.2           4.61
Obs. noise 50%          9.5         56.6                29.3           5.71
Obs. noise 60%          9.8         64.9                35.1           6.79
Obs. noise 70%         10.3         74.1                  -              -
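A rough sketch of how the RF entries could be computed is given below. Because the normalization constant is only partially recoverable from the source, the normalizer argument (sigma^3 in our reading of the caption) is an explicit assumption, as are the function and variable names.

    import numpy as np

    def receptive_field_percent(covariances, counts, normalizer):
        """Average receptive-field size as a percentage.

        covariances : array of shape (M, d, d); input covariance Sigma_i of each unit
        counts      : per-unit weighted data counts <<1>>_i
        normalizer  : normalization constant (sigma^3 in our reading of the caption)
        """
        dets = np.linalg.det(covariances)                        # det(Sigma_i) for every unit
        weighted_mean = np.sum(counts * dets) / np.sum(counts)   # weighted mean of the determinants
        return 100.0 * np.cbrt(weighted_mean / normalizer)       # cube root of the normalized value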
Table VI
Characteristics of chaotic attractors reproduced by the NGnet after learning of
2,000,000 data under eight noise conditions. The target output for the NGnet is
the next point in the delay coordinate space. Definitions of the table entries
are the same as those in Table I.
Condition          NU   nMSE (%)          LL                CD              MLE
                        on      around    on      around
Noiseless         183  0.131    18.79   18.07      8.19   1.838 ± 0.025   2.445 ± 0.159
Sys. noise  5%    203  0.009    0.102   12.73     12.00   1.927 ± 0.040   0.960 ± 0.002
Sys. noise 10%    142  0.017    0.107    9.98      9.69   1.685 ± 0.049   0.834 ± 0.016
Sys. noise 15%    126  0.030    0.122    8.42      8.24   1.268 ± 0.066   0.863 ± 0.007
Sys. noise 20%    114  0.048    0.138    7.23      7.12   1.605 ± 0.038   0.599 ± 0.029
Obs. noise  5%    139  0.038    0.331   11.34     10.36   1.750 ± 0.067   2.068 ± 0.095
Obs. noise 10%    129  0.093    0.660    9.19      8.53   1.791 ± 0.027   1.874 ± 0.049
Obs. noise 15%    120  0.260    1.158    7.82      7.33   1.657 ± 0.067   2.712 ± 0.124
References
[1] Abarbanel, H. D. I., Brown, R., Sidorowich, J. J., & Tsimring, L. Sh. (1993). The analysis
of observed chaotic data in physical systems. Reviews of Modern Physics, 65, 1331-1392.
[2] Allie, S., Mees, A., Judd, K., & Watson, D. (1997). Reconstructing noisy dynamical systems by triangulations. Physical Review E, 55, 87-93.
[3] Allie, S., & Mees, A. (1997). Finding periodic points from short time series. Physical Review
E, 56, 346-350.
[4] Casdagli, M. (1989). Nonlinear prediction of chaotic time series. Physica D, 35, 335-356.
[5] Casdagli, M., Eubank, S., Farmer, J. D., & Gibson, J. (1991). State space reconstruction
in the presence of noise. Physica D, 51, 52-98.
[6] Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-22.
[7] Eckmann, J.-P., Kamphorst, S. O., Ruelle, D., & Ciliberto, S. (1986). Liapunov exponents
from time series. Physical Review A, 34, 4971-4979.
[8] Farmer, J. D., & Sidorowich, J. J. (1987). Predicting chaotic time series. Physical Review
Letters, 59, 845-848.
[9] Fraser, A. M., & Swinney, H. L. (1986). Independent coordinates for strange attractors
from mutual information. Physical Review A, 33, 1134-1140.
[10] Froyland, G., Judd, K., Mees, A. I., & Watson, D. (1995). Constructing invariant measures
from data. International Journal of Bifurcation and Chaos, 5, 1181-1192.
[11] Froyland, G., Judd, K., & Mees, A. I. (1995). Estimation of Lyapunov exponents of dynamical systems using a spatial average. Physical Review E, 51, 2844-2855.
[12] Grassberger, P., & Procaccia, I. (1983). Characterization of strange attractors. Physical
Review Letters, 50, 346-349.
[13] Grassberger, P., Schreiber, T., & Schaffrath, C. (1991). Nonlinear time sequence analysis.
International Journal of Bifurcation and Chaos, 1, 521-547.
[14] Guckenheimer, J., & Holmes, P. (1983). Nonlinear Oscillations, Dynamical Systems, and
Bifurcations of Vector Fields, New York: Springer-Verlag.
[15] Ito, K. (1975). Stochastic differentials. Applied Mathematics and Optimization, 1, 347-381.
[16] Kennel, M. B., Brown, R., & Abarbanel, H. D. I. (1992). Determining minimum embedding
dimension using a geometrical construction. Physical Review A, 45, 3403-3411.
[17] Lapedes, A., & Farber, R. (1987). Nonlinear signal processing using neural networks:
prediction and system modelling. Technical Report LA-UR-87, 2662, Los Alamos: Los
Alamos National Laboratory.
[18] Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing
units. Neural Computation, 1, 281-294.
[19] Sato, M. (1990). A learning algorithm to teach spatiotemporal patterns to recurrent neural
networks. Biological Cybernetics, 62, 259-263.
[20] Sato, M., Joe, K., & Hirahara, T. (1990). APOLONN brings us to the real world. In
Proceedings of 1990 International Joint Conference on Neural Networks (pp. 581-587).
[21] Sato, M., Murakami, Y., & Joe, K. (1990). Learning chaotic dynamics by recurrent neural
networks. In Proceedings of International Conference on Fuzzy Logic & Neural Networks
(pp. 601-605).
[22] Sato, M., & Murakami, Y. (1991). Learning nonlinear dynamics by recurrent neural networks. In Proceedings of Symposium on Some Problems on the Theory of Dynamical Systems in Applied Science (pp. 49-63), Singapore: World Scientific.
[23] Sato, M., & Ishii, S. (2000). On-line EM algorithm for the normalized Gaussian network.
Neural Computation, 12, 407-432.
[24] Sato, M. (2000). On-line model selection based on the variational Bayes. Neural Computation, 13(7).
[25] Sato, M. (2000). Convergence of on-line EM algorithm. In Proceedings of 7th International
Conference on Neural Information Processing (pp. 476-481).
[26] Sauer, T., Yorke, J. A., & Casdagli, M. (1991). Embedology. Journal of Statistical Physics,
65, 579-616.
[27] Shimada, I. & Nagashima, T. (1979). A numerical approach to ergodic problem of dissipative dynamical systems. Progress of Theoretical Physics, 61, 1605-1616.
[28] Sugihara, G. & May, R. (1990). Nonlinear forecasting as a way of distinguishing chaos
from measurement error in time series. Nature, 344, 734-741.
[29] Takens, F. (1981). Detecting strange attractors in fluid turbulence. In Dynamical Systems
and Turbulence, Berlin: Springer-Verlag.
[30] Tokuda, I. (1993). Deterministic prediction and speech signals of the Japanese vowel /a/.
Master's thesis, University of Tsukuba.
[31] Xu, L., Jordan, M. I., & Hinton, G. E. (1995). An alternative model for mixtures of experts.
In Advances in Neural Information Processing Systems 7 (pp. 633-640), Cambridge: MIT
Press.
[32] Yoshida, W., Ishii, S., & Sato, M. (2000). Reconstruction of chaotic dynamics using a noise-robust embedding method. In Proceedings of IEEE International Conference on Acoustics,
Speech, and Signal Processing, I, 181-184.
Figures
[3-D plot: Lorenz attractor; axes x(t), y(t), z(t)]
Figure 1
The Lorenz attractor. 10,000 points from a single trajectory are plotted.
[3-D plot: #data=100,000, noiseless, nMSE=0.015%; axes x(t), y(t), z(t)]
Figure 2
The attractor reproduced by NGnet after learning of 100,000 data. The nMSE
of the discretized vector field is 0.015% and the number of units is 135.
[3-D plot: Receptive fields (No=135); axes x(t), y(t), z(t)]
Figure 3
Receptive fields of the 135 units after the learning of 100,000 data. The i-th
receptive field is depicted by an ellipse whose axes correspond to the two
principal components of the local covariance matrix $\Sigma_i$. The ellipse
center is set at the unit center $\mu_i$.
[Plot: normalized mean squared error vs. number of data (x 10^5)]
Figure 4(a)
[Plot: weighted log-likelihood vs. number of data (x 10^5)]
Figure 4(b)
[Plot: log-likelihood for test data vs. number of data (x 10^5)]
Figure 4(c)
Learning curves of the on-line EM algorithm. In training, no noise (solid),
10% system noise (dashed), 20% system noise (dash-dotted), and 30% system noise
(dotted) are added to the dynamical variables. (a) The nMSE of the discretized
vector field, which is evaluated using 10,000 test data randomly taken from the
Lorenz attractor. (b) The discounted log-likelihood for the training data. Our
on-line EM algorithm maximizes the discounted log-likelihood. (c) The averaged
log-likelihood, which is evaluated using the same test data set as the nMSE.
Figure 4
[3-D plot: #data=3,000,000, SN=30%, ON=30%, nMSE=0.178%; axes x(t), y(t), z(t)]
Figure 5
The attractor reproduced by NGnet after learning of 3,000,000 data with both
30% system noise and 30% observation noise. The nMSE of the discretized
vector field is 0.178% and the number of units is 917.
[3-D plot: True delay attractor; axes x(t), x(t+TAU), x(t+2*TAU)]
Figure 7
The Lorenz attractor embedded into the delay coordinate space for $\tau = 0.15$.
[Plots: (a) x(t) vs. Time(t); (b) Prediction Error vs. Time(t)]
Figure 6
(a) The time-series predicted by NGnet, which is trained with no noise
(dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted). The solid line denotes the true Lorenz time-series. (b) The squared error
between the true Lorenz time-series and the time-series predicted by NGnet.
The error is averaged over 100 trials and is normalized by the variance of the
state variables. The lines correspond to no noise (dashed), 10% system noise
(dash-dotted), and 10% observation noise (dotted).
[3-D plot: #data=100,000, noiseless, nMSE=0.016%; axes x(t), x(t+TAU), x(t+2*TAU)]
Figure 8
The attractor in the delay coordinate space reproduced by NGnet after learning
of 100,000 data. The nMSE of the discretized vector field is 0.016% and the
number of units is 285.
[3-D plot: #data=2,000,000, SN=15%, ON=15%, nMSE=0.037%; axes x(t), x(t+TAU), x(t+2*TAU)]
Figure 9
The attractor in the delay coordinate space reproduced by NGnet after learning
of 2,000,000 noisy data. 15% system noise is added to all dynamical variables,
and 15% observation noise is added to the observed variable x(t). The nMSE
of the discretized vector field in the delay coordinate is 0.037% and the number
of units is 1000.
[Plots: (a) x(t) vs. Time(t); (b) Prediction Error vs. Time(t)]
Figure 10
(a) The time-series predicted by NGnet, which is trained with no noise
(dashed), 10% system noise (dash-dotted), and 10% observation noise (dotted). The solid line denotes the time-series of the observed variable x(t) for
the true Lorenz system. (b) The squared error between the true Z(t) trajectories of the Lorenz system and the trajectories predicted by the NGnet. The
error is averaged over 100 trials and is normalized by the variance of the state
variable in the delay coordinate. The lines correspond to no noise (dashed),
10% system noise (dash-dotted), and 10% observation noise (dotted).