SOME RECENT ADVANCES IN AVERAGING
YURI KIFER
To my friend Tolya Katok for his sixtieth birthday
ABSTRACT. In the study of systems which combine slow and fast motions the averaging principle
suggests that a good approximation of the slow motion on long time intervals can be obtained by
averaging its parameters in fast variables. A better diffusion approximation of the slow motion can
be justified provided the fast motion is sufficiently fast mixing. We extend this approximation to the
setup of suspension flows over sufficiently fast mixing transformations which is important from the
dynamical systems point of view. When slow and fast motions depend on each other (fully coupled), as
is usually the case, for instance, in perturbations of Hamiltonian systems, the averaging prescription
cannot always be applied, and when it works it usually does so only in a sense averaged with respect to
initial conditions. We discuss problems arising in this situation and provide some general necessary
and sufficient conditions for the averaging principle to hold true (in the above sense) which can be
verified both in previously known cases and (via some large deviations argument) in the case when fast
motions are hyperbolic (Axiom A) flows for each frozen slow variable.
1. INTRODUCTION
Evolution of many real systems can be viewed as a combination of slow and fast motions, which
leads to complicated double-scale equations that usually cannot be analysed directly. Already in
the 19th century, in applications to celestial mechanics, it was well understood (though without
rigorous justification) that a good approximation of the slow motion can be obtained by averaging
its parameters in the fast variables. Later, averaging methods were applied in signal processing and,
rather recently, to model climate–weather interactions (see [13] and [20]). The classical setup of
averaging justified rigorously in [7] presumes that the fast motion does not depend on the slow one
and most of the work on averaging treats this case only. On the other hand, in real systems both slow
and fast motions depend on each other. This leads to the more difficult fully coupled case, which
emerges, in particular, in typical perturbations of Hamiltonian systems, where motions on constant
energy manifolds are fast and across them are slow.
2000 Mathematics Subject Classification. Primary: 37D20 Secondary: 34C29, 60F10.
Key words and phrases. averaging principle, diffusion approximation, hyperbolic attractors.
The author was partially supported by US–Israel BSF.

The averaging setup emerges naturally in the following way. Suppose that an idealized physical
system described by an (n + m)-dimensional system of ordinary differential equations
\[ \frac{dW(t)}{dt} = V(W(t)), \quad W(0) = w \tag{1.1} \]
possesses n integrals (constants) of motion $H(w) = (H_1(w), ..., H_n(w))$, i.e. $H_k(W(t)) \equiv H_k(w)$
for all t and k. If the Jacobi matrix $(\partial H_k/\partial w_i)$ has the maximal rank n in an n-dimensional domain G then
setting $X(t) = H(W(t))$ we can rewrite (1.1) in the form
\[ \frac{dX(t)}{dt} = 0, \quad X(0) = x, \qquad \frac{dY_x(t)}{dt} = b(x, Y_x(t)), \quad Y_x(0) = y. \tag{1.2} \]
A real physical system can be viewed as a small perturbation of the above idealized one, and so it
should be described by solutions $X^\varepsilon = X^\varepsilon_{x,y}$ and $Y^\varepsilon = Y^\varepsilon_{x,y}$ of a perturbed system of equations
having the form
\[ \frac{dX^\varepsilon(t)}{dt} = \varepsilon B(X^\varepsilon(t), Y^\varepsilon(t), \varepsilon), \qquad \frac{dY^\varepsilon(t)}{dt} = b(X^\varepsilon(t), Y^\varepsilon(t), \varepsilon), \tag{1.3} \]
with initial conditions $X^\varepsilon(0) = x$ and $Y^\varepsilon(0) = y$. The solutions of (1.3) determine the flow of
diffeomorphisms $\Phi^t_\varepsilon$ acting by $\Phi^t_\varepsilon(x, y) = (X^\varepsilon_{x,y}(t), Y^\varepsilon_{x,y}(t))$, which is a perturbation of the flow
$\Phi^t = \Phi^t_0$ acting by $\Phi^t_0(x, y) = (x, F^t_x y)$, where $F^t_x$ is another family of flows given by $F^t_x y = Y_{x,y}(t)$
with $Y = Y_{x,y} = Y^0_{x,y}$. Observe that a discrete time averaging setup makes perfect sense as
well, and it was treated in [20]. In this case a physical system is described by difference equations,
which replace (1.1)–(1.3), and we have to deal with transformations rather than with flows as above.
In general, the system (1.3) should be considered on a (locally trivial) fiber bundle M = {(x, y) :
x ∈ U, y ∈ Mx } with a base U being an open subset in a Riemannian manifold N and Fxt acting
on the fiber Mx , all of them being pairwise diffeomorphic compact Riemannian manifolds. On the
other hand, M has a local product structure and if $\|B\|$ is bounded then the slow motion $X^\varepsilon$ stays
in one chart during time intervals of order ∆/ε with ∆ small enough. Hence, studying the behavior
of $X^\varepsilon$ on each such time interval separately, it suffices to deal only with the product space $\mathbb{R}^n \times M$
where M is a compact Riemannian manifold, and we will only have to piece the results together to see the
picture on a larger time interval of length T/ε. Observe that if from the beginning we consider a
perturbation of the original equation (1.1) then generically it can be represented in the form (1.3) but
only locally, and different domains where (1.1) reduces to (1.3) are separated by certain surfaces.
The perturbed system can pass through these surfaces, which creates new interesting effects (see, for
instance, [5]) but this direction will not be discussed here.
Set B(x, y) = B(x, y, 0) and assume that the limit
\[ \bar B(x) = \bar B_y(x) = \lim_{T\to\infty} T^{-1} \int_0^T B(x, F^t_x y)\,dt \tag{1.4} \]
exists and it is the same for “many” y's. Namely, let $\mu_x$ be an ergodic invariant measure of the flow
$F^t_x$. Then the limit (1.4) exists for $\mu_x$-almost all y and it is equal to
\[ \bar B(x) = \bar B_{\mu_x}(x) = \int B(x, y)\,d\mu_x(y). \tag{1.5} \]
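As a rough numerical illustration of (1.4) and (1.5), the following sketch compares a finite-time average along a flow with the corresponding space average. The flow (unit-speed rotation of a circle of length 2π, which is ergodic with respect to the normalized Lebesgue measure), the function B and all names in the code are assumptions chosen for illustration; they are not taken from the text.

```python
import math

def time_average(B, x, y, T, steps=100000):
    """Midpoint-rule approximation of (1/T) * int_0^T B(x, F^t y) dt
    for the illustrative flow F^t y = y + t (mod 2*pi) on the circle."""
    dt = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        total += B(x, (y + t) % (2.0 * math.pi))
    return total * dt / T

def B(x, y):
    # Hypothetical right-hand side; its average over the circle is x + 1/2.
    return x + math.cos(y) ** 2

def space_average(x):
    # Integral of B(x, .) with respect to the normalized Lebesgue measure.
    return x + 0.5

x, y = 2.0, 0.7
for T in (10.0, 100.0, 1000.0):
    # The gap between time and space averages shrinks roughly like 1/T here.
    print(T, abs(time_average(B, x, y, T) - space_average(x)))
```

For this particular choice the gap is bounded by 1/(2T) up to quadrature error, so the convergence in (1.4) is uniform in y; in general only convergence for almost all y can be expected.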
If b(x, y) does not, in fact, depend on x then Fxt = F t and µx = µ are also independent of x and we
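arrive at the classical setup. In this case Lipschitz continuity of B already implies that B̄(x) is also
Lipschitz continuous in x, and so there exists a unique solution $\bar X^\varepsilon = \bar X^\varepsilon_x$ of the averaged equation
\[ \frac{d\bar X^\varepsilon(t)}{dt} = \varepsilon \bar B(\bar X^\varepsilon(t)), \quad \bar X^\varepsilon(0) = x. \tag{1.6} \]
Then the classical averaging principle says (see [27]) that
\[ \lim_{\varepsilon\to 0}\ \sup_{0\le t\le T/\varepsilon} |X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)| = 0 \tag{1.7} \]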
for all x and for those y for which (1.4) holds true. In the general case (1.3) (which we call fully
coupled for natural reasons) the averaged vector field B̄(x) in (1.4)–(1.5) may not even be continuous
in x, let alone Lipschitz, and so (1.6) may have many solutions or none at all. Moreover, there may
exist no natural family of invariant measures µx that depend well on x ∈ Rn since dynamical
systems Fxt may have rather different properties for different x’s. Even when all measures µx are
the same the averaging principle often does not hold true in the form (1.7), for instance, in the
presence of resonances (see [24]). Thus even basic results on approximation of the slow motion by
the averaged one in the fully coupled case should usually be formulated in a different way and they
require stronger and more specific assumptions.
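The classical (uncoupled) averaging principle (1.7) can be observed numerically in a toy case. The sketch below is only an illustration under assumed data: the fast motion is the rotation Y(t) = y + t, the slow field is B(x, y) = −x + cos y (so that B̄(x) = −x and the averaged solution is x·exp(−εt)), and the function name and step size are hypothetical choices.

```python
import math

def slow_fast_sup_error(eps, x0=1.0, y0=0.3, T=1.0, dt=0.01):
    """Sup over [0, T/eps] of |X(t) - Xbar(t)| for dX/dt = eps * B(X, y0 + t),
    where Xbar(t) = x0 * exp(-eps * t) solves the averaged equation."""
    def B(x, y):
        return -x + math.cos(y)          # assumed slow vector field

    steps = int(round(T / (eps * dt)))
    x, t, worst = x0, 0.0, 0.0
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step for the slow variable.
        k1 = eps * B(x, y0 + t)
        k2 = eps * B(x + 0.5 * dt * k1, y0 + t + 0.5 * dt)
        k3 = eps * B(x + 0.5 * dt * k2, y0 + t + 0.5 * dt)
        k4 = eps * B(x + dt * k3, y0 + t + dt)
        x += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
        worst = max(worst, abs(x - x0 * math.exp(-eps * t)))
    return worst

for eps in (0.1, 0.01, 0.001):
    print(eps, slow_fast_sup_error(eps))
```

In this example the error is of order ε on the whole interval [0, T/ε]; nothing of the kind is claimed for the fully coupled case discussed above.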
Assume, first, that b(x, y) = b(y), Fx = F , and µx = µ do not depend on x where µ is an
F −invariant ergodic measure on M. Then (1.6) holds true and the next natural step is to describe
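the error of the averaging approximation. Namely, set $Z^\varepsilon(t) = Z^\varepsilon_{x,y}(t) = X^\varepsilon_{x,y}(t/\varepsilon)$ and
$\bar Z(t) = \bar Z_x(t) = \bar X^\varepsilon(t/\varepsilon)$ which satisfy
\[ \frac{dZ^\varepsilon(t)}{dt} = B(Z^\varepsilon(t), F^{t/\varepsilon} y) \quad\text{and}\quad \frac{d\bar Z(t)}{dt} = \bar B(\bar Z(t)), \quad Z^\varepsilon(0) = \bar Z(0) = x. \tag{1.8} \]
It turns out that for “sufficiently chaotic” flows $F^t$ the normalized error
\[ V^{\varepsilon,\kappa}_{x,y}(t) = \varepsilon^{\kappa-1}\big(Z^\varepsilon_{x,y}(t) - \bar Z_x(t)\big) \tag{1.9} \]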
exhibits stochastic properties considered as a random variable in y on the probability space (M, µ).
Namely, if $F^t$ is an Axiom A flow in a neighborhood of a hyperbolic attractor and $\mu = \mu^{SRB}$ is the
corresponding Sinai-Ruelle-Bowen measure then $\mu^{SRB}\{y : V^{\varepsilon,1}_{x,y}(\cdot) \in \cdot\}$ satisfies large deviations
bounds (see [17]) which say, in particular, that most likely the slow motion $X^\varepsilon_{x,y}(t)$ will go close to
the motion averaged with respect to $\mu^{SRB}$, and other possibilities have exponentially small likelihood.
Moreover, [17] also describes transitions of the slow motion between attractors, which happen on
time intervals that are exponentially large in 1/ε. When 1/2 < κ < 1 the asymptotic study as ε → 0
of $\mu^{SRB}\{y : V^{\varepsilon,\kappa}_{x,y}(\cdot) \in \cdot\}$ yields moderate deviations bounds (see [18]). These results remain true if
$\mu^{SRB}$ is replaced by the normalized Riemannian volume, which is important from a physical point of
view. Under even more general assumptions it is possible to show (see [18]) that $V^{\varepsilon,1/2}_{x,y}(t)$ considered as a stochastic process on the probability space (M, µ) tends weakly to a Gaussian process as
ε → 0. We will show in the next section that for each ε there exists a (nonlinear) diffusion process $S^\varepsilon$
solving the following (nonlinear) stochastic differential equation (suggested by Hasselmann in [13])
\[ dS^\varepsilon(t) = \bar B(S^\varepsilon(t))\,dt + \sqrt{\varepsilon}\,\sigma(S^\varepsilon(t))\,dW(t), \tag{1.10} \]
which provides better approximation of the slow motion than the averaged motion and has
advantages over the above Gaussian approximation, as well. Moreover, a slightly corrected version
of (1.10) still makes sense when we replace Rn by a Riemannian manifold, i.e. when B and B̄
are vector fields on a manifold N and, correspondingly, Z ε and Z̄ live on N. In this case it would
be natural to consider the diffusion S ε as a global process on N and not as a collection of processes
each defined only in one coordinate chart, which is the only possibility for Gaussian approximations.
The equation (1.10) does not make sense in general on a manifold but we will show in Section 2
that its order ε correction can be defined globally on a manifold.
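To make the role of the noise term in (1.10) concrete, here is a minimal Euler-Maruyama sketch for a scalar equation of that form. The coefficients b_bar and sigma below, the function name and the step count are assumptions made for illustration; the text does not prescribe any discretization scheme.

```python
import math
import random

def euler_maruyama(b_bar, sigma, eps, x0, T, n_steps, rng):
    """Euler-Maruyama path for dS = b_bar(S) dt + sqrt(eps) * sigma(S) dW on [0, T]
    (scalar case); returns the list of n_steps + 1 values."""
    dt = T / n_steps
    path = [x0]
    s = x0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over one step
        s = s + b_bar(s) * dt + math.sqrt(eps) * sigma(s) * dw
        path.append(s)
    return path

def b_bar(x):
    return -x        # hypothetical averaged drift

def sigma(x):
    return 1.0       # hypothetical constant diffusion coefficient

rng = random.Random(12345)
noisy = euler_maruyama(b_bar, sigma, eps=0.01, x0=1.0, T=1.0, n_steps=1000, rng=rng)
averaged = euler_maruyama(b_bar, sigma, eps=0.0, x0=1.0, T=1.0, n_steps=1000, rng=rng)
print(noisy[-1], averaged[-1], math.exp(-1.0))
```

With eps = 0 the scheme reduces to the explicit Euler method for the averaged equation, while for small eps > 0 the path fluctuates around it on the scale of the square root of eps.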
Next, let us discuss further the general fully coupled situation. If convergence in (1.4) is
uniform in x and y then (see, for instance, [20]) any limit point $\bar Z(t) = \bar Z_x(t)$ as ε → 0 of
$Z^\varepsilon_{x,y}(t) = X^\varepsilon_{x,y}(t/\varepsilon)$ is a solution of the averaged equation in (1.8). It is known that the limit
in (1.4) is uniform in y if and only if the flow $F^t_x$ on M is uniquely ergodic, i.e. it possesses a
unique invariant measure, which occurs rather rarely. If, in addition, uniform convergence in x
is required as well, we arrive at very few specific models of uniquely ergodic families, which is too
restrictive and excludes many interesting cases. Probably, the first relatively general result on fully
coupled averaging is due to Anosov [1] (see also [24] and [19]) which says that if each flow Fxt
preserves a probability measure µx on M having a density with respect to the Riemannian volume
m on M that is C 1 dependent on x then for any δ > 0,
\[ \mathrm{mes}\Big\{(x, y) : \sup_{0\le t\le T} |Z^\varepsilon_{x,y}(t) - \bar Z_x(t)| > \delta\Big\} \to 0 \quad\text{as } \varepsilon \to 0, \tag{1.11} \]
where mes is the product of m and the Lebesgue measure in a relatively compact domain $\mathcal{X} \subset \mathbb{R}^n$.
This says that $Z^\varepsilon$ converges to $\bar Z$ in measure. Not only is an improvement of (1.11) to (1.7)
impossible in a general fully coupled situation but, as we will see from an example in Section 3,
$Z^\varepsilon_{x,y}(t)$ may not converge to $\bar Z_x(t)$ for any fixed x, y.
Observe that Anosov’s theorem does not cover other situations where the averaging principle is
known to work, for instance, when the fast motion Fxt = F t does not depend on the slow variable
x where no assumptions on invariant measures except ergodicity are required. So it is useful to
have necessary and sufficient conditions, which we exhibit in Section 3, ensuring us that for given
probability measures ν on Rn and µx on M,
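\[ \int_{\mathbb{R}^n}\int_M \sup_{0\le t\le T/\varepsilon} |X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)|\,d\mu_x(y)\,d\nu(x) \to 0 \quad\text{as } \varepsilon \to 0. \tag{1.12} \]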
In addition to known cases described above it turns out that our conditions can be verified when fast
motions are hyperbolic flows. Namely, we consider the case when $F^t_x$ are Axiom A flows depending
$C^2$ on x acting in a neighborhood of an attractor $\Lambda_x$. Let $\mu^{SRB}_x$ be the Sinai-Ruelle-Bowen (SRB)
invariant measure of $F^t_x$ on $\Lambda_x$ and set $\bar B(x) = \int B(x, y)\,d\mu^{SRB}_x(y)$. It is known (see [9]) that the vector
field B̄(x) is Lipschitz continuous in x, and so the averaged equation (1.6) is uniquely solvable.
Still, in general, the measures $\mu^{SRB}_x$ are singular with respect to the Riemannian volume on M, and so
the method of [1] cannot be applied here, though, as it turns out, our general conditions are satisfied
in this case, and so (1.12) holds true both with $\mu_x = \mu^{SRB}_x$ and with $\mu_x$ being the Riemannian volume
on M. Observe that the large and moderate deviations results, as well as the Gaussian and diffusion
approximations discussed above in the uncoupled case are not yet proved and seem to be difficult
in the fully coupled situation. Nonresonant and resonant unperturbed fast motions which are toral
rotations differ from each other by having a unique and many invariant measures, respectively, with
resonant directions occurring “very rarely”. When unperturbed motions are Axiom A flows they all
have abundance of invariant measures but, still, the averaging principle in the form (1.12) takes place
in this case. Actually, it is more reasonable to explain problems in averaging due to resonances by the
fact that the reference (Lebesgue) measure becomes nonergodic for some parameters and the time
averaging there has nothing to do with the space averaging with respect to this measure. The situation in the Axiom A case is different in this respect since $\mu_x = \mu^{SRB}_x$ is ergodic for each x in question
and, probably, resonance effects cannot be observed in this case. Note that when the unperturbed fast
motions $F^t_x$ are toral translations Neishtadt obtained the optimal speed of convergence to 0 in (1.11)
and (1.12), which is of order $\sqrt{\varepsilon}$ (see [2] and [24]). On the other hand, when $F^t_x$ are hyperbolic
flows the speed of convergence to 0 in (1.11) is exponential in −1/ε (see Theorem 3.4 below).
The study of deviations from the averaged motion in the fully coupled case seems to be quite
important for applications, especially from the phenomenological point of view. In addition to
perturbations of Hamiltonian systems mentioned above there are many non-Hamiltonian systems
which it is natural to consider from the beginning as a combination of fast and slow motions.
For instance, Hasselmann [13] based his model of weather–climate interaction on the assumption
that weather is a fast chaotic motion depending on climate as a slow motion which differs from
the corresponding averaged motion mainly by a diffusion term which has been considered as a
justification of stochastic models of climate.
Traditionally, averaging methods were employed in the investigation of two-scale ordinary differential equations as above but it is well known that the study of discrete time dynamical systems
enables us to deal with a wider class of models and examples and to reveal new effects. The averaging setup in the discrete time case comes into the picture when we start with a transformation
F0 (x, y) = (x, fx y) where x ∈ Rn and fx : M → M is a family of smooth maps of a compact Riemannian manifold M (though the case of transformations of a locally trivial fiber bundle
M = {(x, y) : x ∈ U, y ∈ Mx }, with fx mapping Mx to itself, can be considered as well). If
F0 describes a discrete time evolution of an idealized physical system then a real one should be
described by a perturbed transformation
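\[ F_\varepsilon(x, y) = (x + \varepsilon\Phi(x, y, \varepsilon),\ f(x, y, \varepsilon)) \tag{1.13} \]
where $\Phi(\cdot, \cdot, \varepsilon) : \mathbb{R}^n \times M \to \mathbb{R}^n$ and $f(x, \cdot, \varepsilon) : M \to M$. In place of (1.3) we arrive now at the
difference equation for $X^\varepsilon = X^\varepsilon_{x,y}$ and $Y^\varepsilon = Y^\varepsilon_{x,y}$ having the form
\[ \begin{aligned} X^\varepsilon(k+1) - X^\varepsilon(k) &= \varepsilon\Phi(X^\varepsilon(k), Y^\varepsilon(k), \varepsilon), & X^\varepsilon(0) &= x, \\ Y^\varepsilon(k+1) &= f(X^\varepsilon(k), Y^\varepsilon(k), \varepsilon), & Y^\varepsilon(0) &= y, \end{aligned} \tag{1.14} \]
so that $F^k_\varepsilon(x, y) = (X^\varepsilon_{x,y}(k), Y^\varepsilon_{x,y}(k))$.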
Set Φ(x, y) = Φ(x, y, 0) and assume that the limit
\[ \bar\Phi(x) = \bar\Phi_y(x) = \lim_{N\to\infty} \frac{1}{N} \sum_{k=0}^{N-1} \Phi(x, f^k_x y) \tag{1.15} \]
exists and it is the same for “many” y's. Namely, if $\mu_x$ is an ergodic $f_x$-invariant measure then the
limit (1.15) exists for $\mu_x$-almost all y's and
\[ \bar\Phi(x) = \bar\Phi_{\mu_x}(x) = \int \Phi(x, y)\,d\mu_x(y). \tag{1.16} \]
Assume that Φ̄(x) is Lipschitz continuous in x so that the ordinary differential equation
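\[ \frac{d\bar X^\varepsilon(t)}{dt} = \varepsilon\bar\Phi(\bar X^\varepsilon(t)), \quad \bar X^\varepsilon(0) = x \tag{1.17} \]
for $\bar X^\varepsilon(t) = \bar X^\varepsilon_x(t)$ has a unique solution. In [20] we gave conditions similar to the ones in Theorem
3.1 of the present paper which ensure that
\[ \int_{\mathbb{R}^n}\int_M \max_{0\le k\le T/\varepsilon} |X^\varepsilon_{x,y}(k) - \bar X^\varepsilon_x(k)|\,d\mu_x(y)\,d\nu(x) \to 0 \quad\text{as } \varepsilon \to 0. \tag{1.18} \]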
In particular, by [20] these conditions are satisfied when $f_x$, $x \in \mathbb{R}^n$, is a family of $C^2$ expanding
transformations of M, or of Axiom A diffeomorphisms considered in a neighborhood of a hyperbolic
attractor, depending $C^2$ on x, and $\mu_x$ is the corresponding Sinai-Ruelle-Bowen measure
of $f_x$. When the maps $f_x$ are toral translations or skew translations the speed of convergence to 0
in (1.18) can be estimated by some (small) power of ε (see [20]). Modifying slightly the proof of
Corollary 2.3 in [20] in the spirit of the proof of Theorem 3.4 below given in [21] it is possible to
show that in the case when $f_x$ are hyperbolic or expanding one obtains the convergence to zero in
measure of $\max_{0\le k\le T/\varepsilon} |X^\varepsilon_{x,y}(k) - \bar X^\varepsilon_x(k)|$ that is exponentially fast in −1/ε similarly to (3.11)
and (3.12) below.
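The discrete time setup (1.14)–(1.17) can be tried out on a simple uncoupled example. In the sketch below the fast map is a circle rotation by an assumed irrational angle, Φ(x, y) = −x + cos(2πy) is a hypothetical slow increment with average −x, and the function name is illustrative; the example is not taken from [20].

```python
import math

def discrete_averaging_error(eps, x0=1.0, y0=0.2, T=1.0):
    """Max over 0 <= k <= T/eps of |X(k) - Xbar(k)| for the recursion
    X(k+1) = X(k) + eps * Phi(X(k), Y(k)),  Y(k+1) = Y(k) + alpha (mod 1)."""
    alpha = (math.sqrt(5.0) - 1.0) / 2.0            # assumed irrational rotation number

    def phi(x, y):
        return -x + math.cos(2.0 * math.pi * y)     # its average over y is -x

    x, y, worst = x0, y0, 0.0
    for k in range(int(round(T / eps)) + 1):
        x_bar = x0 * math.exp(-eps * k)             # solves dXbar/dt = -eps * Xbar
        worst = max(worst, abs(x - x_bar))
        x, y = x + eps * phi(x, y), (y + alpha) % 1.0
    return worst

for eps in (0.1, 0.01, 0.001):
    print(eps, discrete_averaging_error(eps))
```

Here the fast map does not depend on x, so this only illustrates the classical case; in the fully coupled case the rotation number would depend on x and resonances would have to be taken into account.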
In the case when the fast motion $Y^\varepsilon(k)$ does not depend on the slow motion, i.e. $f(x, y, \varepsilon) = f y$
and $Y^\varepsilon_{x,y}(k) = f^k y$, it is not difficult to modify (actually, to simplify) the proofs from [17], [18],
and [22] in order to obtain similar results for the discrete time setup concerning large and moderate
deviations, as well as the Gaussian and diffusion approximations for the averaging error $X^\varepsilon_{x,y} - \bar X^\varepsilon_x$.
In particular, Theorems 2.1 and 2.2 below can be easily reformulated for the discrete time setup with
the diffusion S ε (t) from (1.10) taken at times εk, 0 ≤ k ≤ T /ε and the diffusion matrix σ obtained
by (2.5) or (2.19) below with ξu = ξk or F u y = F k y, respectively, when k ≤ u < k + 1.
2. DIFFUSION APPROXIMATION
First, we discuss the following probabilistic setup considered in [22]. Let ξt = ξt (ω) be a stationary random process on a complete probability space (Ω, F , P ) taking values in some measurable
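space Y and in place of (1.8) consider
\[ \frac{dZ^\varepsilon(t)}{dt} = B(Z^\varepsilon(t), \xi_{t/\varepsilon}) \quad\text{and}\quad \frac{d\bar Z(t)}{dt} = \bar B(\bar Z(t)), \quad Z^\varepsilon(0) = \bar Z(0) = x \tag{2.1} \]
where B is measurable on $\mathbb{R}^n \times Y$, $\bar B(x) = EB(x, \xi_0)$ and E denotes the expectation.
Let $\mathcal{F}^t_s$, −∞ ≤ s ≤ t ≤ ∞, be the (complete) σ-algebra generated by $\xi_u$ when u varies between
s and t. Recall (see [10]) that the process $\xi_u$ is called strongly mixing if
\[ \alpha(\tau) = \sup_{t,U,V} \big\{ |P(U \cap V) - P(U)P(V)| : U \in \mathcal{F}^t_{-\infty},\ V \in \mathcal{F}^\infty_{t+\tau} \big\} \downarrow 0 \quad\text{as } \tau \uparrow \infty. \tag{2.2} \]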
The following result was proved in [22].
2.1. Theorem. Suppose that ξ is a measurable strongly mixing stationary process with values in a
Polish space Y (i.e. a complete separable metric space) and there exists a constant K > 0 such that
\[ \|B_i(\cdot, y)\|_{C^2} \le K \quad \forall y \in Y,\ i = 1, ..., n \quad\text{and}\quad \int_0^\infty \tau\alpha(\tau)\,d\tau < \infty, \tag{2.3} \]
where $\|f\|_{C^2} = \sup_x \max_{i,j} \max\big(|f(x)|, |\partial f(x)/\partial x_i|, |\partial^2 f(x)/\partial x_i\partial x_j|\big)$ is the uniform $C^2$-norm in $x \in \mathbb{R}^n$.
Then, without changing its distribution, for each ε > 0 and any initial condition $x = Z^\varepsilon(0)$ we can
redefine the process $\xi_u$ on a richer probability space where there exists an n-dimensional Brownian
motion W(t) such that the solutions $Z^\varepsilon$ of (2.1) and $S^\varepsilon$ of (1.10) with $S^\varepsilon(0) = x$ and these ξ and
W satisfy
\[ E \sup_{0\le t\le T} |Z^\varepsilon(t) - S^\varepsilon(t)|^2 \le C_{\delta,T}\,\varepsilon^{1+\delta} \tag{2.4} \]
for any $\delta < 2(177 + 90n)^{-1}$ where $C_{\delta,T} > 0$ does not depend on ε and a diffusion matrix function σ(x) in (1.10) is chosen to be bounded, Lipschitz continuous and satisfying
$\sigma(x)\sigma^*(x) = a(x)$, $a(x) = (a_{ij}(x))$ with
\[ a_{ij}(x) = \lim_{t\to\infty} \frac{1}{t}\, E\left[ \int_0^t (B_i(x, \xi_u) - \bar B_i(x))\,du \int_0^t (B_j(x, \xi_u) - \bar B_j(x))\,du \right]. \tag{2.5} \]
The latter limits exist, have bounded $C^2$ norms and for each x the matrix a(x) is symmetric and
nonnegative definite.
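The limit (2.5) can be estimated by simulation in simple cases. The following Monte Carlo sketch treats an assumed scalar example that is not from the text: ξ is a stationary two-state (telegraph) process with values ±1 and flip rate `rate`, and B(x, ξ) = ξ, so that B̄ = 0 and the limit in (2.5) equals 1/rate. All names and parameter values are hypothetical.

```python
import random

def telegraph_integral(t, rate, rng):
    """Exact value of int_0^t xi_u du along one path of a stationary telegraph
    process xi with values +1/-1 and exponential holding times of the given rate."""
    state = rng.choice((-1.0, 1.0))       # stationary initial distribution
    elapsed, integral = 0.0, 0.0
    while True:
        hold = rng.expovariate(rate)      # time until the next flip
        if elapsed + hold >= t:
            return integral + state * (t - elapsed)
        integral += state * hold
        elapsed += hold
        state = -state

def estimate_a(t=50.0, rate=1.0, samples=2000, seed=0):
    """Monte Carlo estimate of (1/t) * E[(int_0^t (B(xi_u) - Bbar) du)^2] with B(xi) = xi."""
    rng = random.Random(seed)
    total = sum(telegraph_integral(t, rate, rng) ** 2 for _ in range(samples))
    return total / (samples * t)

print(estimate_a())   # should be close to 1/rate = 1, up to sampling error
```

For finite t the exact value is 1/rate − (1 − exp(−2·rate·t))/(2·rate²·t), so the bias at t = 50 is about 0.01, well below the sampling error of this sketch.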
According to [12] and [25] it suffices to take the richer probability space of Theorem 2.1 to be
the product (Ω̃, F̃ , P̃ ) = ([0, 1], B, Leb) × (Ω, F , P ), where B is the Borel σ-algebra and Leb
is the Lebesgue measure, and to redefine ξ on Ω̃ so that ξt ((u, ω)) = ξt (ω). The space (Ω̃, F̃ , P̃ )
is already rich enough to possess a Brownian motion W (t) so that S ε defined by (1.10) with such
W (t) will satisfy (2.3).
Suppose that B(·, ξ) in (2.1) is a family of C 2 −bounded vector fields on a d−dimensional Riemannian manifold N continuously (even just measurably) depending on a parameter ξ ∈ Y. Then
(2.1) and its solution Z ε can be considered globally on the manifold N which we assume to have a
positive injectivity radius. In this case B̄ will also be a vector field on N, so the averaged motion
Z̄ defines a dynamical system on N. The stochastic differential equation (1.10) is not written in
the form which makes sense on a manifold. The matrix function a defined by (2.5) is C 2 but, in
general, it is only nonnegative definite so we can only be sure that σ is Lipschitz (see, for instance,
[14], Section 1.3) which does not enable us to write (1.10) in the Stratonovich form which is used
usually when dealing with diffusions on manifolds. Still, it is easy to see from the formula (2.5)
for a(x) = (aij (x)) that if it is written in local coordinates (x1 , ..., xn ) and ã(x) = (ãij (x)) is its
expression with respect to another set of local coordinates (x̃1 , ..., x̃n ) at x then
\[ \tilde a_{kl}(x) = \sum_{i,j=1}^n a_{ij}(x)\,\frac{\partial \tilde x_k}{\partial x_i}\,\frac{\partial \tilde x_l}{\partial x_j}, \tag{2.6} \]
i.e. a(x) is a (0, 2) tensor field. Next, we observe that there exists a (not unique) second order elliptic
differential operator L on the manifold N with a prescribed symbol, i.e. coefficients in second
derivatives $a(x) = a_{ij}(x)$, provided (2.6) holds true. Define, for instance, L in local coordinates
$(x_1, ..., x_n)$ by the formula
\[ L = \frac{1}{2}\,\frac{1}{\sqrt{g(x)}} \sum_{i,j=1}^n \frac{\partial}{\partial x_i}\left( a_{ij}(x)\sqrt{g(x)}\,\frac{\partial}{\partial x_j} \right) \tag{2.7} \]
where $\sqrt{g(x)}$ is the element of the Riemannian volume, i.e. $dv = \sqrt{g(x)}\,dx_1 \cdots dx_n$. This is
a (weakly) elliptic operator which is self-adjoint with respect to the Riemannian volume and its
coefficients obey (as it is easy to check) the correct change of coordinates transformation rules, i.e.
L is indeed a differential operator on the manifold. Since
\[ \bar B = \sum_{i=1}^n \bar B_i(x)\,\frac{\partial}{\partial x_i} \]
is a vector field then $L_\varepsilon = \varepsilon L + \bar B$ is again an elliptic 2nd order differential operator on the manifold
N. Next, we can proceed as in Section 1.3 of [14] in order to construct a diffusion Ŝ ε which has
generator Lε and solves
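\[ d\hat S^\varepsilon(t) = \big( \bar B(\hat S^\varepsilon(t)) + \varepsilon Q(\hat S^\varepsilon(t)) \big)\,dt + \sqrt{\varepsilon}\,\sigma(\hat S^\varepsilon(t))\,dW(t) \tag{2.8} \]
where $Q = (Q_i,\ i = 1, ..., d)$ and
\[ Q_j(x) = \frac{1}{2} \sum_{i=1}^d \left( \frac{\partial a_{ij}(x)}{\partial x_i} + \frac{1}{2}\,a_{ij}(x)\,\frac{\partial}{\partial x_i}\big(\ln g(x)\big) \right). \tag{2.9} \]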
Namely, relying on the Whitney embedding theorem embed N smoothly as a closed submanifold
into a Euclidean space $\mathbb{R}^N$ of sufficiently high dimension N and as in [14] extend the operator $L_\varepsilon$
into a 2nd order elliptic operator $\tilde L_\varepsilon$ with $C^2$ coefficients on the whole $\mathbb{R}^N$. This operator serves
as a generator of a diffusion on $\mathbb{R}^N$ solving a stochastic differential equation similar to (2.9) with
Lipschitz coefficients and its restriction to X yields the required diffusion $r_\varepsilon$ which solves, in fact,
the martingale problem for $f(r_\varepsilon(t)) - f(r_\varepsilon(0)) - \int_0^t L_\varepsilon f(r_\varepsilon(s))\,ds$.
The setup of Theorem 2.1 is usually not convenient for applications to dynamical systems. In this
case we have a measurable (semi) flow F t on a probability space (Y, F , P ) where Y is a Polish
space, F is its Borel σ−algebra and P is an F t −invariant Borel probability measure. The “random
variables” {F t y} do not generate mixing σ-algebras and in order to fit the framework of Theorem
2.1 we have to introduce an auxiliary stationary process ξ which can be defined, for instance, via a
finite partition $\eta = \{A_1, ..., A_m\}$ of Y so that $\xi_t(y) = j$ if $F^t y \in A_j$. If $\mathcal{F}^0_0$ is the finite σ-algebra
generated by η we can set $\mathcal{F}^t_s = \bigvee_{s\le u\le t} F^{-u}\mathcal{F}^0_0$ and this family may satisfy mixing conditions of
Theorem 2.1. In reality, this construction means that we replace the original right hand side B(x, y)
in (1.8) by B̃(x, y) = B(x, ξ0 (y)), which is piecewise constant in y. On the other hand, if for
instance, Y = M is a smooth manifold it is much more natural from the differential equations point
of view to consider B in (1.8) smoothly dependent on both x and y which we cannot achieve in the
framework of Theorem 2.1.
In order to formulate a result more appropriate from the dynamical systems point of view than
Theorem 2.1 we start with a family Fst , s ≤ t of complete σ-subalgebras of the Borel σ-algebra F
of Y such that
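\[ F^{-u}\mathcal{F}^t_s = \mathcal{F}^{t+u}_{s+u} \ \text{ for any } u \ge 0 \quad\text{and}\quad \mathcal{F}^t_s \subset \mathcal{F}^{t'}_{s'} \ \text{ whenever } s' \le s \le t \le t'. \tag{2.10} \]
If $\mathcal{F}^t_s = \bigvee_{s\le u\le t} F^{-u}\mathcal{F}^0_0$ then, clearly, (2.10) holds true and given $\mathcal{F}^0_0$ this yields the family of minimal
possible σ-algebras satisfying (2.10). Set
\[ \beta(t) = E \sup_{x,s} \big\| B(x, \cdot) \circ F^s - E\big( B(x, \cdot) \circ F^s \,\big|\, \mathcal{F}^{s+t}_{s-t} \big) \big\| \tag{2.11} \]
where $B(x, \cdot) \circ F^s(y) = B(x, F^s y)$, E and E(·|·) are the expectation and the conditional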
expectation, respectively, on the probability space (Y, F , P ) where, recall, P is F t -invariant.
2.2. Theorem. Suppose that (2.1) and (2.2) hold true and, in addition,
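\[ \int_0^\infty \tau\beta(\tau)\,d\tau < \infty. \tag{2.12} \]
Then for each ε > 0 and x there exists a Brownian motion W(t) (depending on ε and x) defined on
the product probability space $(\tilde Y, \tilde{\mathcal{F}}, \tilde P) = ([0, 1], \mathcal{B}, \mathrm{Leb}) \times (Y, \mathcal{F}, P)$ such that
\[ \tilde E \sup_{0\le t\le T} |Z^\varepsilon(t) - S^\varepsilon(t)|^2 \le C_{\delta,T}\,\varepsilon^{1+\delta}, \tag{2.13} \]
where $\tilde E$ is the expectation on $(\tilde Y, \tilde{\mathcal{F}}, \tilde P)$, $Z^\varepsilon(t) = Z^\varepsilon_y(t) = Z^\varepsilon_{x,y}(t)$ does not depend on the first
factor in $\tilde Y$ and solves
\[ \frac{dZ^\varepsilon(t)}{dt} = B(Z^\varepsilon(t), F^{t/\varepsilon} y), \quad Z^\varepsilon(0) = x, \tag{2.14} \]
$S^\varepsilon$ is given by (1.10) with $S^\varepsilon(0) = x$, $\sigma(x)\sigma^*(x) = a(x)$, and the latter is obtained by
\[ a_{ij}(x) = \lim_{t\to\infty} \frac{1}{t}\, E\left[ \int_0^t (B_i(x, F^u y) - \bar B_i(x))\,du \int_0^t (B_j(x, F^u y) - \bar B_j(x))\,du \right], \tag{2.15} \]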
and δ is the same as in Theorem 2.1. In fact, the matrix a(x) is C 2 in x and nonnegative definite and
there exists a Lipschitz continuous symmetric matrix σ(x) such that σ 2 (x) = a(x) (see [22]).
Probabilistic properties of important classes of dynamical systems, such as Axiom A (in particular, Anosov) flows are usually studied via the well known construction called suspension. Flows
obtained via this construction even over very well mixing transformation may still not satisfy the
conditions of Theorem 2.2, and another result is needed for this important situation. The corresponding setup consists of a probability space (Γ, G, P r) together with a P r−preserving invertible
transformation θ and a measurable positive (ceiling) function $\ell$ on Γ such that the flow $F^t$ is defined
on the space $Y = \{(\gamma, t) : \gamma \in \Gamma,\ 0 \le t \le \ell(\gamma)\}$ by the formulas $F^s(\gamma, t) = (\gamma, s + t)$
if $0 \le s + t < \ell(\gamma)$, $F^s(\gamma, t) = (\theta^k\gamma, u)$ if $0 \le u = s + t - \sum_{0\le i\le k-1} \ell(\theta^i\gamma)$ for
$k = \max\{j \ge 1 : s + t - \sum_{0\le i\le j-1} \ell(\theta^i\gamma) \ge 0\}$, and $F^s(\gamma, t) = (\theta^{-k}\gamma, u)$ if $s + t \le 0$
and $0 \le u = s + t + \sum_{1\le i\le k} \ell(\theta^{-i}\gamma)$ for $k = \min\{j : s + t + \sum_{1\le i\le j} \ell(\theta^{-i}\gamma) \ge 0\}$. In
other words, $F^s$ increases the t-coordinate of (γ, t) with speed 1 until it reaches the value $\ell(\gamma)$
and then it jumps to (θγ, 0). We identify (glue together) the points $(\gamma, \ell(\gamma))$ with (θγ, 0) in Γ, and
so the above construction yields a continuous flow. Let $E_\Gamma$ denote the expectation on the space
$(\Gamma, \mathcal{G}, Pr)$ and set $\bar\ell = E_\Gamma \ell$. Then the flow $F^s$ preserves the probability measure P on Y defined
by $dP(\gamma, t) = \bar\ell^{-1}\,dPr(\gamma)\,dt$. Next, let $\mathcal{G}^m_n \subset \mathcal{G}$, −∞ ≤ n ≤ m ≤ ∞, be a sequence of σ-algebras
such that $\mathcal{G}^m_n \subset \mathcal{G}^{m_1}_{n_1}$ whenever $m_1 \ge m$ and $n_1 \le n$. Set
\[ \alpha(l) = \sup\big\{ |Pr(U \cap V) - Pr(U)Pr(V)| : U \in \mathcal{G}^k_{-\infty},\ V \in \mathcal{G}^\infty_{k+l},\ k \ge 0 \big\} \]
and
\[ \beta(l) = E_\Gamma \sup_k \big| \ell \circ \theta^k - E_\Gamma\big( \ell \circ \theta^k \,\big|\, \mathcal{G}^{k+l}_{k-l} \big) \big|. \]
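The forward-time part of the suspension construction described above can be sketched in a few lines. The function below is an illustration only: it handles s ≥ 0, assumes the ceiling function is bounded away from zero (so the loop terminates), and the base map and ceiling in the usage example are hypothetical choices made so that the floating-point arithmetic is exact.

```python
def suspension_flow(gamma, t, s, theta, ceiling):
    """Forward-time map F^s(gamma, t), s >= 0, of the suspension flow over the
    base map `theta` under the positive ceiling function `ceiling`."""
    if s < 0:
        raise ValueError("this sketch only handles s >= 0")
    u = t + s
    # Each time the height reaches the ceiling, jump to the base point theta(gamma).
    while u >= ceiling(gamma):
        u -= ceiling(gamma)
        gamma = theta(gamma)
    return gamma, u

def theta(g):
    return (g + 0.25) % 1.0     # hypothetical base transformation (circle rotation)

def ceiling(g):
    return 1.0                  # hypothetical constant ceiling function

print(suspension_flow(0.0, 0.0, 2.5, theta, ceiling))   # (0.5, 0.5)
```

Flowing for time 1.0 and then for time 1.5 gives the same point as flowing for time 2.5, which is the semigroup property one expects from the construction.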
Most often this situation arises with θ being a left shift on a space Γ of sequences taken from
a finite (or countable) alphabet with Gnm being σ−algebras generated by coordinates between n
and m. This construction is the main tool in the ergodic theory of Axiom A flows, where coding is achieved via Markov partitions and the role of the probability P r can be played by any
Gibbs measure constructed by a Hölder continuous function (see [8]). Now, let B(x, (γ, s)) =
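$(B_1(x, (\gamma, s)), ..., B_n(x, (\gamma, s)))$, $x \in \mathbb{R}^n$ and $Z^\varepsilon$ be the solution of (1.8) with y ∈ Y. Set
$\tilde B(x, \gamma) = \bar\ell^{-1} \int_0^{\ell(\gamma)} B(x, (\gamma, s))\,ds$ and
\[ \zeta(l) = E_\Gamma \sup_{x,k} \big\| \tilde B(x, \cdot) \circ \theta^k - E_\Gamma\big( \tilde B(x, \cdot) \circ \theta^k \,\big|\, \mathcal{F}^{k+l}_{k-l} \big) \big\|. \]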
2.3. Theorem. Let κ(l) = max(α(l), β(l), ζ(l)) and suppose that for some constant K > 0,
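\[ K^{-1} < \ell < K, \quad \|B_i(\cdot, \omega)\|_{C^2} \le K \quad \forall\omega \in \Omega,\ i = 1, ..., n \quad\text{and}\quad \sum_{l=1}^\infty l\,\kappa(l) < \infty \tag{2.16} \]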
(which imposes mixing conditions only on the base transformation θ and not on the flow F s itself).
Redefine F s on Ỹ = [0, 1] × Y so that F s (u, ω) = (u, F s ω). Then for each ε > 0 and x there
exists a Brownian motion W (t) on the probability space (Ỹ, F̃ , P̃ ) = ([0, 1], B, Leb) × (Y, F , P )
such that (2.13) holds true provided Z ε (0) = S ε (0) = x with S ε solving (1.10) where σ(x)σ ∗ (x) =
a(x) = (aij (x)) and
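\[ a_{ij}(x) = \lim_{t\to\infty} \frac{1}{t}\, E\left[ \int_0^t \big( \tilde B_i(x, \theta^{[s]}\gamma) - \bar\ell^{-1}\ell(\theta^{[s]}\gamma)\bar B_i(x) \big)\,ds \times \int_0^t \big( \tilde B_j(x, \theta^{[s]}\gamma) - \bar\ell^{-1}\ell(\theta^{[s]}\gamma)\bar B_j(x) \big)\,ds \right]. \tag{2.17} \]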
The proof of Theorems 2.1 and 2.2 is based on the technique developed in [12] and [25] which
yields random variables close in probability or in average as soon as the corresponding (conditional)
characteristic functions are sufficiently close in a certain sense. It is important to observe that since we
are interested in uniform (in time) estimates it is necessary to consider in this paper continuous (in
time) modifications of processes and stochastic integrals in question though the technique employed
in the above mentioned papers does not provide them directly. In the same way as in [22] this can
be overcome by first considering everywhere expectations of supremums only over rational times
(which do not depend on a version of a process in question) and then choosing continuous versions
and taking into account that the process Z ε is always continuous in t we arrive at the supremum
estimates over the whole interval [0, T ]. The same argument enables us to take Hölder continuous
modification of processes and stochastic integrals we are interested in, which will be important in
Section 4.
It is shown in [22] via simple moment estimates of stochastic integrals that
\[ \tilde E \sup_{0\le t\le T} \| S^\varepsilon(t) - \bar Z(t) - \sqrt{\varepsilon}\,G(t) \|^2 \le C\varepsilon^2 \tag{2.18} \]
where G(t) is the Gaussian process solving the stochastic differential equation
\[ dG(t) = \nabla\bar B(\bar Z(t))G(t)\,dt + \sqrt{\varepsilon}\,\sigma(\bar Z(t))\,dW(t). \tag{2.19} \]
This together with Theorem 2.3 yields, in particular, Theorem 2.1 from [18] which says that
$\varepsilon^{-1/2}(Z^\varepsilon(\cdot) - \bar Z(\cdot))$ converges in the weak sense to the Gaussian process G. Observe that though
both $S^\varepsilon(t)$ and $\bar Z(t) + \sqrt{\varepsilon}\,G(t)$ give the same quality of approximation of the slow motion $Z^\varepsilon$, the
process $S^\varepsilon(t)$ has several advantages (see [22]); in particular (as we described above after the
statement of Theorem 2.1), its correction of order ε (which does not influence the asymptotic estimate (2.13)) makes sense globally on a smooth manifold if we consider (1.8) there and not on
Rn . On the other hand, (2.19) defines a family of Gaussian processes G(t), different processes for
different Z̄(0), and they can only be defined in each coordinate chart separately.
3. AVERAGING FOR FULLY COUPLED SYSTEMS
Consider the system of differential equations (1.3) on the product $\bar{\mathcal{X}} \times M$ where $\mathcal{X} \subset \mathbb{R}^n$ is an
open set, $\bar{\mathcal{X}}$ is its closure and M is a compact $C^2$ Riemannian manifold, and assume that there exists
L > 0 such that for all ε ≥ 0, $x, z \in \bar{\mathcal{X}}$ and y, v ∈ M,
\[ \|B(x, y, \varepsilon) - B(z, v)\| + \|b(x, y, \varepsilon) - b(z, v)\| \le L\big(\varepsilon + |x - z| + d_M(y, v)\big) \tag{3.1} \]
and
\[ \|B(x, y)\| + \|b(x, y)\| \le L \tag{3.2} \]
where dM is the distance on M. Together with (1.3) we consider also the equation (1.6) on X̄ with
coefficients B̄ such that there exists L̄ > 0 such that for all x, z ∈ X̄ ,
\[ \|\bar B(x) - \bar B(z)\| \le \bar L|x - z| \quad\text{and}\quad \|\bar B(x)\| \le \bar L. \tag{3.3} \]
The Lipschitz continuity conditions (3.1) and (3.3) ensure existence and uniqueness of solutions of
(1.3) and (1.6), respectively. If B̄ is defined by (1.5) then (3.3) is equivalent to the existence of L̃ > 0
such that for all x, z ∈ X ,
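\[ \Big| \int_M B(x, y)\,d(\mu_x - \mu_z)(y) \Big| \le \tilde L|x - z|, \tag{3.4} \]
which is a condition of regular dependence of $\mu_x$ on x. Set $\mathcal{X}_t = \{x \in \mathcal{X} : X^\varepsilon_{x,y}(s) \in \mathcal{X},\ \bar X^\varepsilon_x(s) \in \mathcal{X}$
for all y ∈ M and s ∈ [0, t/ε]$\}$. It is clear that $\mathcal{X}_t$ is an open set and by (3.1) and (3.3) it follows
that $\mathcal{X}_t \supset \{x \in \mathcal{X} : \inf_{z \notin \mathcal{X}} |z - x| > 2t\max(L, \bar L)\}$. Introduce
\[ E_\varepsilon(t, \delta) = \Big\{ (x, y) \in \mathcal{X}_t \times M : \Big| \frac{1}{t}\int_0^t B(x, Y^\varepsilon_{x,y}(u))\,du - \bar B(x) \Big| > \delta \Big\}. \]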
t 0
The following result is proved in [21].
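3.1. Theorem. Suppose (3.1) and (3.3) hold, and let $\mu$ be a probability measure on $X \times M$. Then
\[ \lim_{\varepsilon\to 0}\int_{X_T}\int_M \sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)|\,d\mu(x,y) = 0 \tag{3.5} \]
if and only if there is an integer valued function $n = n(\varepsilon) \to \infty$ as $\varepsilon \to 0$ such that for any $\delta > 0$,
\[ \lim_{\varepsilon\to 0}\max_{0\le j<n(\varepsilon)} \mu\{(X_T \times M) \cap \Phi_\varepsilon^{-jt(\varepsilon)}E_\varepsilon(t(\varepsilon),\delta)\} = 0, \tag{3.6} \]
where $t(\varepsilon) = \frac{T}{\varepsilon n(\varepsilon)}$ and, recall, $\Phi_\varepsilon^t(x,y) = (X^\varepsilon_{x,y}(t), Y^\varepsilon_{x,y}(t))$.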
Taking into account that $Y^\varepsilon_{x,y}(t)$ and $Y^0_{x,y}(t)$ stay close during the time $t \le t(\varepsilon)$ with $t(\varepsilon)$ much smaller than $\log(1/\varepsilon)$, we obtain a sufficient condition for (3.5) in the form of (3.6) with $E_0(\cdot,\cdot)$ in place of $E_\varepsilon(\cdot,\cdot)$. It is not difficult (see [21]) to check (3.6) in two situations where (3.5) was known before, namely, when the fast motion $Y^\varepsilon_{x,y}$ does not depend on the slow motion $X^\varepsilon_{x,y}$ and in the situation of the Anosov theorem. The latter requires $\nu$ to have a bounded $C^1$ density with respect to the Lebesgue measure on $\mathbb{R}^n$ and $\mu_x$, $x \in X$ to be invariant measures of the corresponding unperturbed flows $F_x^t$ so that $\mu_x$ is ergodic for $\nu$-almost all (a.a.) $x$ and for each $x \in X$ the measure $\mu_x$ has a density $q_x = q_x(y) > 0$ with respect to the Riemannian volume on $M$ that is $C^1$ in both $x$ and $y$.
The following simple example shows that a system may satisfy (3.1)–(3.4) but the convergence (3.5) does not hold true there, i.e. an additional condition of type (3.6) is, indeed, needed. Let $M$ be the circle $\mathbb{T}^1$ of length 1, all $F_x^t$ be identity maps, $\mu_x = \delta_x$ be the unit mass at $x \in \mathbb{R}^1$, and $B(x,y) = B(y)$ be a $C^1$ function on $\mathbb{T}^1$. The measures $\mu_x$ are trivially ergodic and $\bar B(x) = \int B(y)\,d\mu_x(y) = B(x)$ extended as a periodic function to $\mathbb{R}^1$ satisfies (3.3). Then $X^\varepsilon_{x,y}(t) = x + \varepsilon tB(y)$, $Y^\varepsilon_{x,y}(t) \equiv y$, and $\bar X^\varepsilon_x(t) = x + \varepsilon\int_0^t B(\bar X^\varepsilon_x(s))\,ds$. Hence
\[ \int_0^1\int_{\mathbb{T}^1} \sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)|\,d\mu_x(y)\,dx = \int_0^1 \sup_{0\le t\le T/\varepsilon}|\varepsilon tB(x) + x - \bar X^\varepsilon_x(t)|\,dx. \]
Take, for instance, $B(x) = \cos^2(2\pi x)$. Then $\bar X^\varepsilon_x(t) = (2\pi)^{-1}\arctan(2\pi\varepsilon t + \tan(2\pi x))$, and making a change of variables we obtain that the right hand side here is equal to
\[ \int_0^1 \sup_{0\le t\le T}|t\cos^2(2\pi x) + x - (2\pi)^{-1}\arctan(2\pi t + \tan(2\pi x))|\,dx, \]
and the latter does not depend on $\varepsilon$ and is not zero, as is easy to see, i.e. (3.5) does not hold true.
It is easy to check directly that (3.6) is not satisfied either. Similar examples can be constructed
with all Fxt coinciding with a rational rotation of, say, a two dimensional torus with measures µx
being the ergodic invariant (normalized one dimensional Lebesgue) measures supported by the
corresponding invariant cycles.
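The counterexample with B(x) = cos^2(2πx) can be checked numerically. The following is only a rough sketch under stated assumptions: the function name `averaging_gap`, the grid sizes and the midpoint rule in x are illustrative choices that do not come from the text, and the branch of the arctangent is fixed by hand so that the averaged solution starts at x and stays continuous in t.

```python
import numpy as np

def averaging_gap(eps, T=1.0, nx=400, nt=2001):
    """Approximate the integral over x in [0,1] of
    sup_{0<=t<=T/eps} |x + eps*t*B(x) - Xbar_x(t)| for B(x) = cos^2(2*pi*x).

    Hypothetical helper: grid sizes and the midpoint rule are assumptions.
    """
    # Midpoint grid in x avoids the points where cos(2*pi*x) = 0 exactly.
    x = (np.arange(nx) + 0.5) / nx
    t = np.linspace(0.0, T / eps, nt)
    # Slow motion with y = x (mu_x is the unit mass at x).
    slow = x[:, None] + eps * t[None, :] * np.cos(2 * np.pi * x[:, None]) ** 2
    # Averaged motion: tan(2*pi*Xbar) grows linearly in t; add pi*k to pick
    # the branch of arctan for which Xbar(0) = x.
    branch = np.pi * np.round(2.0 * x)
    averaged = (np.arctan(2 * np.pi * eps * t[None, :]
                          + np.tan(2 * np.pi * x[:, None]))
                + branch[:, None]) / (2 * np.pi)
    # Supremum over t, then midpoint-rule average over x.
    return float(np.mean(np.max(np.abs(slow - averaged), axis=1)))
```

Calling `averaging_gap(0.1)` and `averaging_gap(0.001)` should return essentially the same strictly positive number, in line with the claim that the integral does not depend on ε and does not vanish.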
Theorem 3.1 gives conditions for convergence in average in the averaging principle. In view of
resonances (see, for instance, [24]) it is impossible for many interesting examples to ensure (1.7)
for all x ∈ X and µx -almost all (a.a.) y where µx is a reasonable family of probability measures on
M. One still could hope that the convergence in average (3.5) could be improved to convergence
almost everywhere, but this question does not seem to have been addressed in the literature until now.
Recently, A. Neishtadt explained to me that in Example 7 on pp. 157–159 of [2] the convergence
(1.7) does not hold true for any initial condition (from a large domain), though this point has not
been discussed in that book, and for this reason we consider this example again in the Appendix. Thus
the type of convergence to the averaged motion described in Theorem 3.1 cannot be improved, in
general, in the fully coupled averaging setup.
There is a very restricted class of systems where (1.7) holds true for all x ∈ X and y ∈ M.
This happens, for instance, when Arnold’s conditions for two-frequency systems are satisfied (see
Section 3.5 in [24] and Section 5.1 in [2]). If the convergence in (1.4) is uniform in x ∈ X and
y ∈ M then (1.7) takes place, as well. In fact, it suffices to assume a bit less, namely, that for
any δ > 0 there exists εδ such that for any positive ε ≤ εδ one can find an integer valued function
n(ε) → ∞ as ε → 0 so that Eε (t(ε), δ) = ∅ where, again, t(ε) = T (n(ε)ε)^{−1}. Such conditions
can only be satisfied for some families of uniquely ergodic dynamical systems such as flows on a
circle and horocycle flows nicely depending on a parameter (slow variable).
Another situation where we are able to verify (3.6) is the case of fast motions being slowly
changing Axiom A flows where the averaging principle in the form (3.5) was not known before.
3.2. Assumption. The family $b(x,\cdot)$ in (1.2) consists of $C^2$ vector fields on an $n$-dimensional Riemannian manifold $M$ with uniform $C^2$ dependence on the parameter $x$ belonging to a relatively compact connected open set $X$ and depending continuously on $x$ in its closure $\bar X$. Each flow $F_x^t$, $x \in \bar X$ on $M$ given by
\[ \frac{dF_x^t y}{dt} = b(x, F_x^t y), \qquad F_x^0 y = y \tag{3.7} \]
possesses a basic hyperbolic attractor $\Lambda_x$ (see [15]) with a hyperbolic splitting $T_{\Lambda_x}M = \Gamma^s_x \oplus \Gamma^0_x \oplus \Gamma^u_x$, where $\Gamma^s_x$, $\Gamma^u_x$, and $\Gamma^0_x$ are the stable, unstable, and flow directions, respectively, and there exists an open set $W \subset M$ and $t_0 > 0$ such that
\[ \Lambda_x \subset W, \quad F_x^t\bar W \subset W \ \ \forall t \ge t_0, \quad\text{and}\quad \bigcap_{t>0} F_x^t W = \Lambda_x \ \ \forall x \in \bar X. \tag{3.8} \]
Let $J^u_x(t,y)$ be the Jacobian of the linear map $DF_x^t(y) : \Gamma^u_x(y) \to \Gamma^u_x(F_x^t y)$ with respect to the Riemannian inner products and set
\[ \varphi^u_x(y) = -\frac{dJ^u_x(t,y)}{dt}\Big|_{t=0}. \tag{3.9} \]
The function $\varphi^u_x(y)$ is known to be Hölder continuous in $y$, since the subbundles $\Gamma^u_x$ are Hölder continuous (see [15]), and $\varphi^u_x(y)$ is $C^1$ in $x$ (see [9]). The Sinai-Ruelle-Bowen measure $\mu^{SRB}_x$ of $F_x^t$ is the unique equilibrium state of $F_x^t$ for the function $\varphi^u_x$, i.e. it is the only $F_x^t$-invariant probability measure on $\Lambda_x$ whose topological pressure is zero (since $\Lambda_x$ is an attractor). We replace now the condition (3.1) by the following stronger one:
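3.3. Assumption. There exist $L, \varepsilon_0 > 0$ such that for all $x \in \bar X$, $y \in M$, and $\varepsilon \in [0,\varepsilon_0)$,
\[ \|B(x,y,\varepsilon)\|_{C^1([0,\varepsilon_0)\times\bar X\times M)} + \|b(x,y,\varepsilon)\|_{C^2([0,\varepsilon_0)\times\bar X\times M)} \le L \tag{3.10} \]
where $\|\cdot\|_{C^i([0,\varepsilon_0)\times\bar X\times M)}$ is the $C^i$ norm of the corresponding vector fields on $[0,\varepsilon_0)\times\bar X\times M$ (taking only the right derivative in $\varepsilon$ at 0).
Set
\[ \bar B(x) = \int B(x,y)\,d\mu^{SRB}_x(y); \tag{3.11} \]
then under Assumption 3.3 $\bar B$ is $C^1$ in $x$ (see [9]), and so (3.3) is automatically satisfied.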
3.4. Theorem. Suppose that Assumptions 3.2 and 3.3 hold true. Define $\bar B$ by (3.11) and let $\mu$ be the product of a probability measure $\nu$ with support in $X_T$ and the normalized Riemannian volume $m_W$ on $W$. Then (3.6) is satisfied with $t(\varepsilon) = T/(\varepsilon n(\varepsilon))$ whenever both $t(\varepsilon) \to \infty$ and $n(\varepsilon) \to \infty$ as $\varepsilon \to 0$, and so (3.5) holds true. Moreover, for any $a > 0$ there exist $c > 0$ and $\varepsilon_a$ such that
\[ \mu\Big\{(x,y) \in X_T \times W : \sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)| > a\Big\} \le e^{-c/\varepsilon}, \tag{3.12} \]
provided $\varepsilon \le \varepsilon_a$. The result remains true if in place of the above we take $\mu$ defined by $d\mu(x,y) = d\nu(x)\,d\mu^{SRB}_x(y)$.
Observe that we can take, in particular, $\nu$ to be the Dirac measure (unit mass) at a point $x \in X$, i.e. (3.5) follows here without integration in $x$ and (3.12) can be replaced by
\[ \mu_x\Big\{y \in W : \sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)| > a\Big\} \le e^{-c/\varepsilon} \tag{3.13} \]
for either $\mu_x = m_W$ or $\mu_x = \mu^{SRB}_x$. Note that Neishtadt's example discussed in the Appendix is constructed in the standard resonance framework where for some $x$ the measures $\mu_x$ (which all coincide with the Lebesgue measure there) become nonergodic. In the setup of Theorem 3.4 all measures $\mu_x$ are ergodic, and so it is still not clear whether it is possible in these circumstances to derive the convergence (1.7) for all (or for Lebesgue almost all) $x \in X_T$ and for $m_W$-almost all $y \in W$ and not just convergence in average (3.5) or in measure (3.12) and (3.13).
4. PROOF OF DIFFUSION APPROXIMATION FOR SUSPENSIONS
The strategy of the proof of Theorem 2.2 is the same as in the proof of Theorem 2.1 given in [22] and the only difference is the need to employ mixing assumptions for functions which depend on the whole path of the system but whose dependence on faraway tails is weak. Namely, the proof in [22] was based on mixing estimates from [16] and [10] which, in turn, are derived from the inequality
\[ |E\Xi_1\Xi_2 - E\Xi_1 E\Xi_2| \le 4\alpha(u_1 - u_2), \tag{4.1} \]
whenever the random variables $\Xi_1$ and $\Xi_2$ are measurable with respect to $\mathcal F_{-\infty}^{u_1}$ and $\mathcal F_{u_2}^{\infty}$, $u_2 \ge u_1$, respectively, $|\Xi_i| \le 1$, $i = 1, 2$ and $\alpha$ is defined by (2.2). In the setup of Theorem 2.2 all we have to do is to replace (4.1) by a similar estimate for random variables $\Psi_i$ with $|\Psi_i| \le 1$, $i = 1, 2$ on $(Y, \mathcal F, P)$ such that for all $t \ge 0$,
\[ E\sup_s|\Psi_i\circ F^s - (\Psi_i\circ F^s)^{(s,t)}| \le \beta(t), \ i = 1, 2 \quad\text{where}\quad \Psi^{(s,t)} = E(\Psi\,|\,\mathcal F_{s-t}^{s+t}). \tag{4.2} \]
In fact, using (4.1) and (4.2) we obtain
\[ \begin{aligned} |E\Psi_1(\Psi_2\circ F^s) - E\Psi_1 E\Psi_2| &\le |E\Psi_1^{(0,s/3)}(\Psi_2\circ F^s)^{(s,s/3)} - E\Psi_1 E\Psi_2| \\ &\quad + E|\Psi_1 - \Psi_1^{(0,s/3)}| + E|\Psi_2\circ F^s - (\Psi_2\circ F^s)^{(s,s/3)}| \le 4\alpha(s/3) + 2\beta(s/3). \end{aligned} \tag{4.3} \]
This inequality, applied in place of (4.1) whenever appropriate, extends the proof of Theorem 2.1 in [22] to the setup of Theorem 2.2 (see details of similar arguments in Chapter 7 of [26]).
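Next, we derive Theorem 2.3 from Theorem 2.2 (cf. [11]). Let $\tilde Z^\varepsilon(t) = \tilde Z^\varepsilon_{x,\gamma}(t)$ and $\hat Z^\varepsilon(t) = \hat Z^\varepsilon_{x,\gamma}(t)$ solve the equations
\[ \frac{d\tilde Z^\varepsilon(t)}{dt} = \tilde B(\tilde Z^\varepsilon(t), \theta^{[t/(\varepsilon\bar\ell)]}\gamma), \qquad \tilde Z^\varepsilon(0) = x \tag{4.4} \]
and
\[ \frac{d\hat Z^\varepsilon(t)}{dt} = \bar\ell^{-1}\ell(\theta^{[t/(\varepsilon\bar\ell)]}\gamma)\bar B(\hat Z^\varepsilon(t)), \qquad \hat Z^\varepsilon(0) = x, \tag{4.5} \]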
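respectively, where $[a]$ denotes the integral part of $a$. Observe that $E_\Gamma\tilde B(x,\gamma) = E_\Gamma\bar\ell^{-1}\ell(\gamma)\bar B(x) = \bar B(x)$. Employing Lemma 3.1 from [18] we derive that there exists $C > 0$ such that for all $T > 0$ and $t \in [0,T]$,
\[ |Z^\varepsilon_{x,(\gamma,s)}(t) - \tilde Z^\varepsilon_{x,\gamma}(\bar\ell\varepsilon n(t/\varepsilon,\gamma))| \le CT\varepsilon, \quad s \in [0,\ell(\gamma)) \tag{4.6} \]
and
\[ |\bar Z_x(t) - \hat Z^\varepsilon_{x,\gamma}(\bar\ell\varepsilon n(t/\varepsilon,\gamma))| \le CT\varepsilon \tag{4.7} \]
where $n(t,\gamma) = \min\{n \ge 0 : \sum_{i=0}^n \ell(\theta^i\gamma) > t\}$.
Let $S^\varepsilon$ be the diffusion process solving the stochastic differential equation (1.10) on the probability space $(\tilde Y, \tilde{\mathcal F}, \tilde P)$. Then by (4.6) and (4.7) for all $t \in [0,T]$,
\[ |Z^\varepsilon_{x,y}(t) - S^\varepsilon_x(t,\tilde y)| \le 2CT\varepsilon + |\tilde Z^\varepsilon_{x,\gamma}(t) - \hat Z^\varepsilon_{x,\gamma}(t) + \bar Z_x(t) - S^\varepsilon_x(t,\tilde y)| + R_0^\varepsilon(t), \tag{4.8} \]
where
\[ R_0^\varepsilon(t) = R_0^\varepsilon(t,\gamma) = \tilde Z^\varepsilon_{x,\gamma}(\bar\ell\varepsilon n(t/\varepsilon,\gamma)) - \hat Z^\varepsilon_{x,\gamma}(\bar\ell\varepsilon n(t/\varepsilon,\gamma)) - \tilde Z^\varepsilon_{x,\gamma}(t) + \hat Z^\varepsilon_{x,\gamma}(t), \]
and $\tilde y = (u,y) = (u,\gamma,s) \in \tilde Y$. By the Taylor formula (cf. [22], Section 3),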
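\[ \tilde Z^\varepsilon_{x,\gamma}(t) = x + \int_0^t \bar B(\bar Z_x(s))\,ds + \int_0^t \nabla\bar B(\bar Z_x(s))(\tilde Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\,ds + Q_1^\varepsilon(t) + R_1^\varepsilon(t) + R_2^\varepsilon(t) \tag{4.9} \]
and
\[ \hat Z^\varepsilon_{x,\gamma}(t) = x + \int_0^t \bar B(\bar Z_x(s))\,ds + \int_0^t \nabla\bar B(\bar Z_x(s))(\hat Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\,ds + Q_2^\varepsilon(t) + R_3^\varepsilon(t) + R_4^\varepsilon(t) \tag{4.10} \]
where
\[ Q_1^\varepsilon(t) = \int_0^t \big(\tilde B(\bar Z_x(s), \theta^{[s/(\varepsilon\bar\ell)]}\gamma) - \bar B(\bar Z_x(s))\big)\,ds, \]
\[ Q_2^\varepsilon(t) = \int_0^t \big(\bar\ell^{-1}\ell(\theta^{[s/(\varepsilon\bar\ell)]}\gamma)\bar B(\bar Z_x(s)) - \bar B(\bar Z_x(s))\big)\,ds, \]
\[ R_1^\varepsilon(t) = \int_0^t \big(\nabla\tilde B(\bar Z_x(s), \theta^{[s/(\varepsilon\bar\ell)]}\gamma) - \nabla\bar B(\bar Z_x(s))\big)(\tilde Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\,ds, \]
\[ R_3^\varepsilon(t) = \int_0^t \big(\nabla\tilde B(\bar Z_x(s), \theta^{[s/(\varepsilon\bar\ell)]}\gamma) - \nabla\bar B(\bar Z_x(s))\big)(\hat Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\,ds, \]
\[ R_2^\varepsilon(t) = \int_0^t \Big(\nabla^2\tilde B\big(\bar Z_x(s) + \kappa_1(\tilde Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s)), \theta^{[s/(\varepsilon\bar\ell)]}\gamma\big)(\tilde Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s)),\ (\tilde Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\Big)\,ds, \]
and
\[ R_4^\varepsilon(t) = \int_0^t \bar\ell^{-1}\ell(\theta^{[s/(\varepsilon\bar\ell)]}\gamma)\Big(\nabla^2\bar B\big(\bar Z_x(s) + \kappa_2(\hat Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\big)(\hat Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s)),\ (\hat Z^\varepsilon_{x,\gamma}(s) - \bar Z_x(s))\Big)\,ds. \]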
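Again, by the Taylor formula applied to $S^\varepsilon_x(t) = S^\varepsilon_x(t,\tilde y)$ we obtain from (1.10) that
\[ S^\varepsilon_x(t) = x + \int_0^t \bar B(\bar Z_x(s))\,ds + \int_0^t \nabla\bar B(\bar Z_x(s))(S^\varepsilon_x(s) - \bar Z_x(s))\,ds + \sqrt{\varepsilon}\int_0^t \sigma(\bar Z_x(s))\,dW(s) + R_5^\varepsilon(t) + R_6^\varepsilon(t), \tag{4.11} \]
where
\[ R_5^\varepsilon(t) = \sqrt{\varepsilon}\int_0^t \big(\sigma(S^\varepsilon_x(s)) - \sigma(\bar Z_x(s))\big)\,dW(s) \]
and
\[ R_6^\varepsilon(t) = \int_0^t \Big(\nabla^2\bar B\big(\bar Z_x(s) + \kappa_3(S^\varepsilon_x(s) - \bar Z_x(s))\big)(S^\varepsilon_x(s) - \bar Z_x(s)),\ (S^\varepsilon_x(s) - \bar Z_x(s))\Big)\,ds. \]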
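It follows from (2.2) and (4.9)–(4.11) that
\[ \begin{aligned} \sup_{0\le s\le t}|\tilde Z^\varepsilon_{x,\gamma}(s) - \hat Z^\varepsilon_{x,\gamma}(s) + \bar Z_x(s) - S^\varepsilon_x(s,\tilde y)| &\le Kn\int_0^t \sup_{0\le u\le s}|\tilde Z^\varepsilon_{x,\gamma}(u) - \hat Z^\varepsilon_{x,\gamma}(u) + \bar Z_x(u) - S^\varepsilon_x(u,\tilde y)|\,ds \\ &\quad + \sup_{0\le s\le t}\Big|Q^\varepsilon(s) - \sqrt{\varepsilon}\int_0^s \sigma(\bar Z_x(u))\,dW(u)\Big| + \sum_{1\le i\le 6}\sup_{0\le s\le t}|R_i^\varepsilon(s)|, \end{aligned} \]
where
\[ Q^\varepsilon(t) = Q_1^\varepsilon(t) - Q_2^\varepsilon(t) = \int_0^t \big(\tilde B(\bar Z_x(s), \theta^{[s/(\varepsilon\bar\ell)]}\gamma) - \bar\ell^{-1}\ell(\theta^{[s/(\varepsilon\bar\ell)]}\gamma)\bar B(\bar Z_x(s))\big)\,ds. \]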
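Employing here the Gronwall inequality, then taking the square and, finally, applying the expectation on the space $(\tilde Y, \tilde{\mathcal F}, \tilde P)$ we obtain
\[ \tilde E\sup_{0\le s\le t}|\tilde Z^\varepsilon_{x,\cdot}(s) - \hat Z^\varepsilon_{x,\cdot}(s) + \bar Z_x(s) - S^\varepsilon_x(s)|^2 \le 7e^{2Knt}\Big(\tilde E\sup_{0\le s\le t}\Big|Q^\varepsilon(s) - \sqrt{\varepsilon}\int_0^s \sigma(\bar Z_x(u))\,dW(u)\Big|^2 + \sum_{1\le i\le 6}\tilde E\sup_{0\le s\le t}|R_i^\varepsilon(s)|^2\Big) \tag{4.12} \]
for any $t \in [0,T]$.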
Similarly to Lemma 3.2 from [22] we derive that for $i = 2, 4, 5, 6$,
\[ \tilde E\sup_{0\le s\le t}|R_i^\varepsilon(s)|^2 \le C_t\varepsilon^2 \tag{4.13} \]
for some $C_t > 0$ depending only on $t$ and on $n$. Furthermore, for each $t > 0$ and any integer $m > 0$ there exists $C_{t,m} > 0$ such that for all $\varepsilon > 0$,
\[ \tilde E\sup_{0\le s\le t}|R_i^\varepsilon(s)|^2 \le C_{t,m}\varepsilon^{2-\frac{1}{m}}, \quad i = 1, 3. \tag{4.14} \]
In the same way as in Lemma 3.2 from [22] the proof of (4.14) relies on some inequalities from [16] which should be generalized to our setup using (4.2) in place of (4.1).
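Set $D(x,\gamma) = \tilde B(x,\gamma) - \bar\ell^{-1}\ell(\gamma)\bar B(x)$ and let $V^\varepsilon = V^\varepsilon_{x,\gamma}$ solve the equation
\[ \frac{dV^\varepsilon(t)}{dt} = D(V^\varepsilon(t) + \bar Z_x(t), \theta^{[t/(\varepsilon\bar\ell)]}\gamma), \qquad V^\varepsilon(0) = x. \tag{4.15} \]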
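This equation can be treated in the framework of Theorem 2.2 if we write it in the equivalent form for the pair $(V^\varepsilon(t), t) \in \mathbb{R}^{n+1}$,
\[ \frac{d(V^\varepsilon(t), t)}{dt} = \big(\tilde D(V^\varepsilon(t), t, \theta^{[t/(\varepsilon\bar\ell)]}\gamma),\ 1\big), \tag{4.16} \]
where $\tilde D(z,t,\gamma) = D(z + \bar Z_x(t), \gamma)$. By the Taylor formula
\[ V^\varepsilon_{0,\gamma}(t) = Q^\varepsilon(t) + R_7^\varepsilon(t), \tag{4.17} \]
where
\[ R_7^\varepsilon = \int_0^t \Big(\nabla\tilde B\big(\bar Z_x(s) + \kappa_4 V^\varepsilon_{0,\gamma}(s), \theta^{[s/(\varepsilon\bar\ell)]}\gamma\big) - \bar\ell^{-1}\ell(\theta^{[s/(\varepsilon\bar\ell)]}\gamma)\nabla^2\bar B\big(\bar Z_x(s) + \kappa_4 V^\varepsilon_{0,\gamma}(s)\big)\Big)V^\varepsilon_{0,\gamma}(s)\,ds. \]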
Similarly to the proof of (4.14) we obtain
\[ \tilde E\sup_{0\le s\le t}|R_7^\varepsilon(s)|^2 \le C_{t,m}\varepsilon^{2-\frac{1}{m}} \tag{4.18} \]
where $C_{t,m} > 0$ can be chosen for any $t > 0$ and an integer $m > 0$ independently of $\varepsilon$. According to Theorem 2.2 we can pick a Brownian motion $W$ defined on $(\tilde Y, \tilde{\mathcal F}, \tilde P)$ so that $\tilde S^\varepsilon(t)$ solving the equation
\[ \tilde S^\varepsilon(t) = \sqrt{\varepsilon}\int_0^t \sigma(\tilde S^\varepsilon(s) + \bar Z_x(s))\,dW(s), \tag{4.19} \]
with $\sigma(x)\sigma^*(x) = a(x)$ given by (2.17), satisfies
\[ \tilde E\sup_{0\le s\le t}|V^\varepsilon_{0,\cdot}(s) - \tilde S^\varepsilon(s)|^2 \le \tilde C_{\delta,t}\varepsilon^{1+\delta}, \tag{4.20} \]
taking into account that $\bar D(x) \equiv 0$. Applying the Taylor formula to (4.19) we see that
\[ \tilde S^\varepsilon(t) = \sqrt{\varepsilon}\int_0^t \sigma(\bar Z_x(s))\,dW(s) + R_8^\varepsilon(t), \tag{4.21} \]
where similarly to the estimate of $R_5^\varepsilon$ in (4.13) we have by standard moment inequalities for stochastic integrals that
\[ \tilde E\sup_{0\le s\le t}|R_8^\varepsilon(s)|^2 \le \tilde C_t\varepsilon^2 \tag{4.22} \]
for some $\tilde C_t > 0$ independent of $\varepsilon$. This together with (4.17), (4.18), (4.20) and (4.21) yields that for this choice of $W$,
\[ \tilde E\sup_{0\le s\le t}\Big|Q^\varepsilon(s) - \sqrt{\varepsilon}\int_0^s \sigma(\bar Z_x(u))\,dW(u)\Big|^2 \le \hat C_{\delta,t}\varepsilon^{1+\delta}, \tag{4.23} \]
with $\delta$ given by Theorem 2.2. Actually, an estimate of the type (4.23) is the main step in the proof of both Theorem 2.1 and Theorem 2.2 and it can be derived directly proceeding similarly to Section 4 in [22].
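Taking $S^\varepsilon$ solving (1.10) with the above choice of the Brownian motion $W$ we obtain from (4.8), (4.12)–(4.14) and (4.23) that for any $\delta < 2(177 + 90n)^{-1}$ and $t > 0$ there exists $C_{\delta,t} > 0$ such that
\[ \tilde E\sup_{0\le s\le t}|Z^\varepsilon_{x,\cdot}(s) - S^\varepsilon_x(s)|^2 \le C_{\delta,t}\varepsilon^{1+\delta} + \tilde E\sup_{0\le s\le t}|R_0^\varepsilon(s)|^2. \tag{4.24} \]
By (4.11)–(4.14), (4.23) and the definition of $R_0$ we obtain that
\[ \tilde E\sup_{0\le s\le t}|R_0^\varepsilon(s)|^2 \le \tilde C_{\delta,t}\varepsilon^{1+\delta} + 4I_1(t) + 4\varepsilon I_2(t) \tag{4.25} \]
where
\[ I_1(t) = \tilde E\sup_{0\le s\le t}\Big|\int_s^{\bar\ell\varepsilon n(s/\varepsilon,\gamma)} \nabla\bar B(\bar Z_x(u))(S^\varepsilon_x(u) - \bar Z_x(u))\,du\Big|^2 \]
and
\[ I_2(t) = \tilde E\sup_{0\le s\le t}\Big|\int_s^{\bar\ell\varepsilon n(s/\varepsilon,\gamma)} \sigma(\bar Z_x(u))\,dW(u)\Big|^2. \]
Since $K \ge \ell \ge K^{-1}$ we get $n(t/\varepsilon,\gamma) \le Kt/\varepsilon$ and $\bar\ell\varepsilon n(t/\varepsilon,\gamma) \le K^2t$. This, together with (2.3) and the Cauchy–Schwarz inequality, yields
\[ \begin{aligned} I_1(t) &\le K^2\tilde E\Big(\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^2 \sup_{0\le s\le K^2t}|S^\varepsilon_x(s) - \bar Z_x(s)|^2\Big) \\ &\le K^2\Big(\tilde E\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^4\Big)^{1/2}\Big(\tilde E\sup_{0\le s\le K^2t}|S^\varepsilon_x(s) - \bar Z_x(s)|^4\Big)^{1/2}. \end{aligned} \]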
By (1.10) and (2.3),
\[ |S^\varepsilon_x(s) - \bar Z_x(s)| \le K\int_0^s |S^\varepsilon_x(u) - \bar Z_x(u)|\,du + \sqrt{\varepsilon}\,\Big|\int_0^s \sigma(S^\varepsilon_x(u))\,dW(u)\Big|. \]
Taking here the 4th power, then the supremum over $s \in [0,t]$ and employing the Gronwall inequality together with standard moment estimates of stochastic integrals we obtain
\[ \tilde E\sup_{0\le s\le K^2t}|S^\varepsilon_x(s) - \bar Z_x(s)|^4 \le C_t\varepsilon^2 \tag{4.26} \]
for some $C_t > 0$. According to Theorem 1.4.1 from [23] we can choose a Hölder continuous modification of the process $L(s) = \int_0^s \sigma(\bar Z_x(u))\,dW(u)$, $0 \le s \le t$ whose Hölder constant $C_t(\tilde\gamma)$ is integrable together with any of its powers. We recall the remark after the statement of Theorem 2.3 saying that we can choose any continuous modification of a process we work with that is convenient for our purposes, since we can always take supremums only over rational times, which give supremums over the whole time interval whenever we deal with a continuous process. Thus for any $\rho \in (0, \frac12)$ and $s > 0$ there exists $\tilde C_{s,\rho} > 0$ such that
\[ I_2(t) \le \tilde E\Big(C^2_{K^2t}(\tilde\gamma)\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^{2\rho}\Big) \le \tilde C_{K^2t,\rho}\Big(\tilde E\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^2\Big)^{\rho}. \tag{4.27} \]
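Since $\sup_{0\le s\le t}|a(s)| \le \sup_{0\le s\le\tau}|a(s)| + \sup_{\tau\le s\le t}|a(s)|$ provided $0 \le \tau \le t$, we can write
\[ \tilde E\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^k \le (K^2 + 1)^k\tau^k + \tilde E\sup_{\tau\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^k \tag{4.28} \]
for any integer $k \ge 0$. By the definition of $n(s,\gamma)$ and estimates on $\bar\ell$ we derive easily that for any $\rho \in (0,1)$,
\[ \begin{aligned} \Big\{\gamma : \sup_{2\varepsilon\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s| > \varepsilon^\rho\Big\} &\subset \Big\{\gamma : \sup_{2\varepsilon\le s\le t}\Big|\sum_{0\le i\le(s+\varepsilon^\rho)\varepsilon^{-1}\bar\ell^{-1}}(\ell(\theta^i\gamma) - \bar\ell)\Big| \ge \varepsilon^{\rho-1} - 4K\Big\} \\ &\quad \cup \Big\{\gamma : \sup_{2\varepsilon\le s\le t}\Big|\sum_{0\le i\le(s-\varepsilon^\rho)\varepsilon^{-1}\bar\ell^{-1}}(\ell(\theta^i\gamma) - \bar\ell)\Big| \ge \varepsilon^{\rho-1} - 4K\Big\} \\ &\subset \Big\{\gamma : \max_{K^{-1}\varepsilon^{\rho-1}\le k\le(t+1)K\varepsilon^{-1}}\Big|\sum_{i=0}^k(\ell(\theta^i\gamma) - \bar\ell)\Big| \ge \varepsilon^{\rho-1} - 4K\Big\}. \end{aligned} \tag{4.29} \]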
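By Chebyshev's inequality
\[ \Pr\Big\{\sup_{K^{-1}\varepsilon^{\rho-1}\le k\le(t+1)K\varepsilon^{-1}}\Big|\sum_{i=0}^k(\ell(\theta^i\gamma) - \bar\ell)\Big| \ge \varepsilon^{\rho-1} - 4K\Big\} \le \varepsilon^{2(1-\rho)}(1 - 4\varepsilon^{1-\rho}K)^{-2}E_\Gamma\max_{0\le k\le(t+1)K\varepsilon^{-1}}\Big|\sum_{i=0}^k(\ell\circ\theta^i - \bar\ell)\Big|^2. \]
From (2.16) together with moment and maximal inequalities from Sections 1.4.1 and 1.4.3 in [10] we derive that
\[ E_\Gamma\max_{0\le k\le(t+1)K\varepsilon^{-1}}\Big|\sum_{i=0}^k(\ell\circ\theta^i - \bar\ell)\Big|^2 \le C_t\varepsilon^{-1}\log\frac{1}{\varepsilon} \tag{4.30} \]
for some $C_t > 0$ independent of $\varepsilon$. It follows by (4.28)–(4.30) that
\[ \tilde E\sup_{0\le s\le t}|\bar\ell\varepsilon n(s/\varepsilon,\gamma) - s|^k \le \tilde C_t\Big(\varepsilon^{k\rho} + \varepsilon^{1-2\rho}\log\frac{1}{\varepsilon}\Big) \tag{4.31} \]
for some $\tilde C_t > 0$ independent of $\varepsilon$. Finally, Theorem 2.3 (i.e. the corresponding estimate (2.13)) follows from (4.24)–(4.27) and (4.31) choosing, for instance, $\rho = \frac14$ both in (4.27) and in (4.31).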
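The smallness of the time-change error controlled in (4.31) can be illustrated numerically. The sketch below rests on a strong simplifying assumption that is not made in the text: the roof values ℓ(θ^i γ) are replaced by independent uniform random variables on [1/2, 3/2], so that ℓ̄ = 1. The name `renewal_time_gap` and all parameters are hypothetical.

```python
import numpy as np

def renewal_time_gap(eps, T=1.0, seed=0, n_grid=1000):
    """Return sup over a grid of s in [0, T] of |l_bar*eps*n(s/eps) - s|,
    where n(t) = min{n >= 0 : l_0 + ... + l_n > t}.

    Assumption (not from the text): the l_i are i.i.d. uniform on [0.5, 1.5].
    """
    rng = np.random.default_rng(seed)
    l_min, l_max = 0.5, 1.5
    l_bar = 0.5 * (l_min + l_max)
    # Enough terms so that the last partial sum exceeds T/eps.
    n_samples = int(np.ceil(T / (eps * l_min))) + 1
    partial = np.cumsum(rng.uniform(l_min, l_max, size=n_samples))
    s = np.linspace(0.0, T, n_grid)
    # First index n whose partial sum is strictly larger than s/eps.
    n = np.searchsorted(partial, s / eps, side="right")
    return float(np.max(np.abs(l_bar * eps * n - s)))
```

In this independent setting the deviation is expected to be of order √ε, so `renewal_time_gap(1e-4)` should be far smaller than `renewal_time_gap(1e-2)`; the mixing case treated in the text needs the maximal inequalities cited before (4.30) instead.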
5. APPENDIX: NEISHTADT'S EXAMPLE
We exhibit in this section an example mentioned in Section 3 together with the corresponding arguments; both have been kindly provided to us by A. Neishtadt. Consider the system of equations
\[ \dot I = \varepsilon(4 + 8\sin\gamma - I), \qquad \dot\gamma = I \tag{5.1} \]
with the corresponding averaged equation
\[ \dot J = \varepsilon(4 - J). \tag{5.2} \]
Here $\gamma$ belongs to the circle $\mathbb{T}^1$ parametrized by the interval $[-2\pi, 0]$ with the end points glued together. Denote by $(I^\varepsilon_{I_0,\gamma_0}(t), \gamma^\varepsilon_{I_0,\gamma_0}(t))$ and by $J^\varepsilon_{I_0}(t)$ the solution of (5.1) and of (5.2), respectively, with the initial conditions $I^\varepsilon_{I_0,\gamma_0}(0) = I_0$, $\gamma^\varepsilon_{I_0,\gamma_0}(0) = \gamma_0$ and $J^\varepsilon_{I_0}(0) = I_0$.
5.1. Proposition. For any $\varepsilon_1 > 0$ and any point $(I_0,\gamma_0)$ with $-2 < I_0 < -1$ there is $\varepsilon \in (0,\varepsilon_1)$ such that
\[ I^\varepsilon_{I_0,\gamma_0}(1/\varepsilon) < J^\varepsilon_{I_0}(1/\varepsilon) - 3/2. \tag{5.3} \]
Proof. Differentiating the second equation in (5.1) in $t$, substituting $\dot I$ from the first equation there and writing the resulting equation with derivatives $\gamma'$, $\gamma''$ with respect to the slow time $\tau = \sqrt{\varepsilon}\,t$ we arrive at the equation
\[ \gamma'' = 4 + 8\sin\gamma - \sqrt{\varepsilon}\,\gamma'. \tag{5.4} \]
This equation describes the motion of a pendulum with a constant torque and a small friction linear with respect to velocity. The phase space of this motion is the cylinder $\{(\gamma',\gamma) : \gamma' \in \mathbb{R},\ \gamma \in \mathbb{T}^1\}$ but we restrict ourselves to its part $-2/\sqrt{\varepsilon} < \gamma' < 2/\sqrt{\varepsilon}$. Observe that $I = \sqrt{\varepsilon}\,\gamma'$ so we consider the domain $-2 < I < 2$. The dynamical system describing (5.4) in the phase space has two fixed points $O_1 = (0, -\pi/6)$ and $O_2 = (0, -5\pi/6)$. By simple computations we see that $O_1$ is a saddle point and $O_2$ is an attracting focus. The stable manifold of $O_1$ consists of two separatrices $S^{(1)}_\varepsilon$ and $S^{(2)}_\varepsilon$ joined at $O_1$ such that $S^{(1)}_\varepsilon \setminus O_1$ is contained in the lower half $\{\gamma' < 0\}$ of the cylinder while $S^{(2)}_\varepsilon$ has a piece in the upper half $\{\gamma' > 0\}$ of it which intersects $\{\gamma' = 0\}$ at some point $\gamma^{(0)} < 0$ (lying, in fact, between $-5\pi/4$ and $-7\pi/6$ if $\varepsilon$ is small enough) and the remaining part of $S^{(2)}_\varepsilon$ is contained in $\{\gamma' < 0\}$. We can rewrite (5.4) in the form
\[ \gamma'\frac{d\gamma'}{d\gamma} = 4 + 8\sin\gamma - \sqrt{\varepsilon}\,\gamma' \tag{5.5} \]
which amounts to the functional equation
\[ (\gamma'(\gamma))^2 = C + 8\gamma - 16\cos\gamma - 2\sqrt{\varepsilon}\int_{-\pi/6}^{\gamma}\gamma'(u)\,du \tag{5.6} \]
on the covering plane. In particular, the liftings of $S^{(1)}_\varepsilon$ and $S^{(2)}_\varepsilon$ to the covering plane are described by (5.6), as well, with $C = 8\sqrt{3} + 4\pi/3$ and their form can be derived by geometric and perturbation (from the case $\varepsilon = 0$) considerations.
Next, fix a point $(I_0,\gamma_0)$ on the cylinder with $-2 < I_0 < -1$. In the domain $-2/\sqrt{\varepsilon} < \gamma' < -1/\sqrt{\varepsilon}$ the $\gamma$-coordinate of $S^{(1)}_\varepsilon$ (the latter considered as a solution of (5.4)) decreases with the velocity of order $1/\sqrt{\varepsilon}$, and so $S^{(1)}_\varepsilon$ goes around the cylinder for the time of order $\sqrt{\varepsilon}$. But by (5.4) $\gamma''$ remains bounded in this domain which means that for one full round around the cylinder $\gamma'$ changes for an amount of order not exceeding $\sqrt{\varepsilon}$ which means that in the coordinates $I, \gamma$ the distance between neighboring coils of $S^{(1)}_\varepsilon$ has the order not exceeding $\varepsilon$. Moreover, by (5.6),
\[ (\gamma'(\gamma + 2\pi))^2 - (\gamma'(\gamma))^2 = 16\pi - 2\sqrt{\varepsilon}\int_\gamma^{\gamma+2\pi}\gamma'(u)\,du, \]
and so
\[ |\gamma'(\gamma + 2\pi) - \gamma'(\gamma)| = |\gamma'(\gamma + 2\pi) + \gamma'(\gamma)|^{-1}\Big|16\pi - 2\sqrt{\varepsilon}\int_\gamma^{\gamma+2\pi}\gamma'(u)\,du\Big|. \tag{5.7} \]
Therefore, in the region $-2/\sqrt{\varepsilon} < \gamma' < -1/\sqrt{\varepsilon}$, i.e. when $-2 < I_0 < -1$, the distance between subsequent coils of $S^{(1)}_\varepsilon$ is exactly of order $\sqrt{\varepsilon}$ in the coordinates $(\gamma',\gamma)$, i.e. this distance is of order $\varepsilon$ in the coordinates $(I,\gamma)$. Let $N(\varepsilon)$ be the number of intersections of the interval $[I_0, 0) \times \{0\}$ on the surface of the cylinder parallel to its axis with $S^{(1)}_\varepsilon$ which is approximately the number of coils of $S^{(1)}_\varepsilon$ intersecting the strip $[I_0, 0)$ on the surface of the cylinder. It is clear by the above description of $S^{(1)}_\varepsilon$ that $N(\varepsilon) \to \infty$ as $\varepsilon \to 0$ and since $S^{(1)}_\varepsilon$ changes continuously with $\varepsilon$ we conclude that as $\varepsilon \to 0$ coils of $S^{(1)}_\varepsilon$ pass the point $(I_0,\gamma_0)$ infinitely many times or, in other words, there exists a sequence $\varepsilon_n \to 0$ such that $(I_0,\gamma_0) \in S^{(1)}_{\varepsilon_n}$ for all $n = 1, 2, \ldots$. Since $S^{(1)}_{\varepsilon_n}$ is invariant under the time evolution we obtain that $(I^{\varepsilon_n}_{I_0,\gamma_0}(t), \gamma^{\varepsilon_n}_{I_0,\gamma_0}(t)) \in S^{(1)}_{\varepsilon_n}$ for all $t \ge 0$, and so $I^{\varepsilon_n}_{I_0,\gamma_0}(t) < 0$. On the other hand, solving the linear averaged equation (5.2) we obtain that $J^{\varepsilon_n}_{I_0}(t) = 4 - 4e^{-\varepsilon_n t} + I_0e^{-\varepsilon_n t}$, and so $J^{\varepsilon_n}_{I_0}(1/\varepsilon_n) \ge 4 - 6e^{-1} > 3/2$ which yields (5.3).
Observe for completeness that by perturbation arguments after crossing $\{\gamma' = 0\}$ the separatrix $S^{(2)}_\varepsilon$ comes close to the saddle $O_1$. By geometric considerations and arguments based on the uniqueness of solutions of (5.1) on the cylinder it follows that after passing close to the saddle, $S^{(2)}_\varepsilon$ starts winding around the cylinder close to $S^{(1)}_\varepsilon$ so that $S^{(1)}_\varepsilon \cup S^{(2)}_\varepsilon$ forms the boundary of the domain of attraction $G_\varepsilon$ of $O_2$ which looks like a strip spiraling around the cylinder without self-intersections. It follows that for all initial conditions $(I_0,\gamma_0) \in G_\varepsilon$ and not only for those which belong to $S^{(1)}_\varepsilon$ the distance for small $\varepsilon$ between the slow motion $I^\varepsilon_{I_0,\gamma_0}$ and the averaged one $J^\varepsilon_{I_0}$ at time $1/\varepsilon$ is of order 1. Since by the proof above the distance between subsequent coils of $S^{(1)}_\varepsilon$ is of order $\varepsilon$ in the coordinates $(I,\gamma)$ when $I$ is of order 1, the order of the width of this strip cannot be larger than $\varepsilon$ (with respect to the same coordinates) but, in fact, more advanced arguments show that this width is of order $\varepsilon^{3/2}$. By geometric and perturbation considerations it is easy to see also that one of two separatrices of the unstable manifold of $O_1$ attracts to $O_2$ while the other one goes away from $G_\varepsilon$. The corresponding pictures showing $G_\varepsilon$ can be found on pp. 158–159 of [2].
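The systems (5.1) and (5.2) are easy to integrate numerically, which gives a way to experiment with the example. The following is a minimal sketch, assuming a classical fourth-order Runge–Kutta scheme with a fixed step; the names `rk4_step` and `slow_and_averaged` and the step size are illustrative and not part of the original argument. Since the strip of captured initial conditions is very thin, a generic choice of ε and (I0, γ0) will typically not exhibit the gap in (5.3); one would have to scan ε finely to land in it.

```python
import math

def rk4_step(f, state, dt):
    """One classical Runge-Kutta step for state' = f(state)."""
    k1 = f(state)
    k2 = f([s + 0.5 * dt * k for s, k in zip(state, k1)])
    k3 = f([s + 0.5 * dt * k for s, k in zip(state, k2)])
    k4 = f([s + dt * k for s, k in zip(state, k3)])
    return [s + dt * (a + 2 * b + 2 * c + d) / 6.0
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def slow_and_averaged(eps, I0, gamma0, dt=0.01):
    """Integrate (5.1) and (5.2) up to time 1/eps; return (I(1/eps), J(1/eps))."""
    def full(u):
        # (5.1): I' = eps*(4 + 8 sin(gamma) - I), gamma' = I
        return [eps * (4.0 + 8.0 * math.sin(u[1]) - u[0]), u[0]]

    def averaged(v):
        # (5.2): J' = eps*(4 - J)
        return [eps * (4.0 - v[0])]

    steps = int(round(1.0 / (eps * dt)))
    u, v = [I0, gamma0], [I0]
    for _ in range(steps):
        u = rk4_step(full, u, dt)
        v = rk4_step(averaged, v, dt)
    return u[0], v[0]
```

For instance, `slow_and_averaged(0.05, -1.5, -1.0)` returns the pair of values at time 1/ε = 20; the second component can be compared with the closed form 4 − (4 − I0)e^{−1} obtained from (5.2).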
REFERENCES
[1] D.B. Anosov, Averaging in systems of ordinary differential equations with fast oscillating solutions, Izv. Acad.
Nauk SSSR Ser. Mat., 24 (1960), 731–742 (in Russian).
[2] V.I. Arnold, V.V. Kozlov, A.I. Neishtadt, Mathematical Aspects of Classical and Celestial Mechanics (Dynamical
Systems III, V.I. Arnold ed., Encyclopedia Math. Sci., 3), (1988) Springer-Verlag, Berlin.
[3] V.I. Bakhtin, Cramér asymptotics in a system with slow and fast Markovian motions, Theory Probab. Appl., 44
(1999), 1–17.
[4] V.I. Bakhtin, On the averaging method in a system with fast hyperbolic motions, Proc. Math. Inst. of Belarus Acad.
Sci., 6 (2000), 23–26 (in Russian).
[5] M. Brin and M. Freidlin, On stochastic behavior of perturbed Hamiltonian systems, Ergod. Th.& Dynam. Sys., 20
(2000), 55–76.
[6] V.I. Bakhtin and Yu. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Preprint.
[7] N.N. Bogolyubov and Yu.A. Mitropol’skii, Asymptotic Methods in the Theory of Nonlinear Oscillations, (1961),
Hindustan Publ. Co., Delhi.
[8] R. Bowen and D. Ruelle, The ergodic theory of Axiom A flows, Invent. Math. 29 (1975), 181–202.
[9] G. Contreras, Regularity of topological and metric entropy of hyperbolic flows, Math. Z., 210 (1992), 97–111.
[10] P. Doukhan, Mixing, (1994), Springer-Verlag, New York.
[11] M. Denker and W. Philipp, Approximation by Brownian motion for Gibbs measures and flows under a function,
Ergod. Th. & Dynam. Sys. 4 (1984), 541–552.
[12] R.M. Dudley and W. Philipp, Invariance principles for sums of Banach space valued random elements and empirical processes, Z. Wahrscheinlichkeitstheor. Verw. Geb. 62 (1983), 509–552.
[13] K. Hasselmann, Stochastic climate models, Part I. Theory, Tellus 28 (1976), 473–485.
[14] E.P. Hsu, Stochastic analysis on manifolds, American Math. Soc., Providence, RI, 2002.
[15] A. Katok and B. Hasselblatt, Introduction to the Modern Theory of Dynamical Systems, (1995), Cambridge Univ.
Press, Cambridge.
[16] R.Z. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Th. Probab.
Appl., 11 (1966), 211–228.
[17] Yu. Kifer, Averaging in dynamical systems and large deviations, Invent. Math., 110 (1992), 337–370.
[18] Yu. Kifer, Limit theorems in averaging for dynamical systems, Ergod. Th.& Dynam. Sys., 15 (1995), 1143–1172.
[19] Yu. Kifer, Averaging and climate models, Stochastic Climate Models, Progress in Probability 49 (2001) Birkhäuser.
[20] Yu. Kifer, Averaging in difference equations driven by dynamical systems, in: Geometric Methods in Dynamics,
Asterisque 2003.
[21] Yu. Kifer, Averaging principle for fully coupled dynamical systems and large deviations, Ergod. Th.& Dynam. Sys.,
to appear.
[22] Yu. Kifer, L2 Diffusion approximation for slow motion in averaging, Stochastics and Dynamics, to appear.
[23] H. Kunita, Stochastic Flows and Stochastic Differential Equations, (1990), Cambridge Univ. Press, Cambridge.
[24] P. Lochak and C. Meunier, Multiple Averaging for Classical Systems, (1988), Springer-Verlag, New York.
[25] D. Monrad and W. Philipp, Nearby variables with nearby conditional laws and a strong approximation theorem for
Hilbert space valued martingales, Probab. Th. Rel. Fields 88 (1991), 381–404.
[26] W. Philipp and W. Stout, Almost Sure Invariance Principles for Partial Sums of Weakly Dependent Random Variables, Mem. Amer. Math. Soc., 161, Providence, R.I., 1975.
[27] J.A. Sanders and F. Verhulst, Averaging Methods in Nonlinear Dynamical Systems, (1985), Springer-Verlag, Berlin.
INSTITUTE OF MATHEMATICS, THE HEBREW UNIVERSITY, JERUSALEM 91904, ISRAEL
E-mail address: [email protected]