
Degenerate U- and V-statistics under weak dependence:
Asymptotic theory and bootstrap consistency
Anne Leucht*
Friedrich-Schiller-Universität Jena
Institut für Stochastik
Ernst-Abbe-Platz 2
D-07743 Jena
Germany
E-mail: [email protected]
Abstract
We devise a general result on the consistency of model-based bootstrap methods for U- and V-statistics under easily verifiable conditions. For that purpose we first derive the limit distributions of degree-2 degenerate U- and V-statistics of weakly dependent $\mathbb{R}^d$-valued random variables. To this end, only some moment conditions and smoothness assumptions concerning the kernel are required. Based on this result, we verify that the bootstrap counterparts of these statistics converge to the same limit distributions. Finally, some applications to hypothesis testing are presented.
2000 Mathematics Subject Classification. 62E20, 62G09.
Keywords and Phrases. Bootstrap, consistency, U-statistics, V-statistics, weak dependence.
Short title. Bootstrap for U-statistics under weak dependence.
1. Introduction
Numerous test statistics can be formulated or approximated in terms of degenerate U- or V-type statistics multiplied by the sample size. Examples include the Cramér–von Mises statistic, the Anderson–Darling statistic and the χ²-statistic. For
i.i.d. random variables the limit distributions of U- and V-statistics can be derived via a spectral decomposition of the kernel if the latter is square integrable. To use the
same method for dependent data, often restrictive assumptions are required whose
validity is quite complicated or even impossible to check in many cases. The first
of our two main results is the derivation of the asymptotic distributions of U- and V-statistics under assumptions that are fairly easy to check. This approach is based
on a wavelet decomposition instead of a spectral decomposition of the kernel.
These limit distributions for both independent and dependent observations depend on certain parameters which in turn depend on the underlying situation in a
complicated way. Therefore, problems arise as soon as critical values for test statistics of U- and V-type have to be determined. The bootstrap offers a convenient way
to circumvent these problems, see Arcones and Giné [1], Dehling and Mikosch [8] or
Leucht and Neumann [23] for the i.i.d. case. To our knowledge, there are no results
concerning bootstrapping general degenerate U-statistics of non-independent observations. As a second main result of the paper, we establish consistency of model-based
bootstrap methods for statistics of weakly dependent data.
In order to describe the dependence structure of the sample we do not use the concept of mixing. Mixing is inappropriate here in a certain sense, since this paper focuses not only on the asymptotic behaviour of U- and V-type statistics but also on bootstrap consistency. Unfortunately, model-based bootstrap methods can yield samples that are no longer mixing even though the original sample satisfies some mixing condition.
For further discussion of this issue, see Doukhan and Neumann [14]. Instead of mixing
we suppose the sample to be τ -dependent in the sense of Dedecker and Prieur [7].
Our paper is organized as follows. We start with an overview of asymptotic results
on degenerate U- and V-statistics of dependent random variables. In Subsection 2.2 we introduce the underlying concept of weak dependence and derive the asymptotic distributions of U- and V-statistics. On the basis of these results, we deduce consistency of general bootstrap methods in Section 3. Some applications of the theory to hypothesis testing are presented in Section 4. All proofs are deferred to the final Section 5.
2. Asymptotic distributions of U- and V-statistics
2.1. Survey of literature. Let (Xn )n∈N be a sequence of Rd -valued random variables on some probability space. In the case of i.i.d. random variables, the limit
distribution of degenerate U- and V-type statistics, i.e.
\[ n\,U_n = \frac{1}{n-1}\sum_{j=1}^{n}\sum_{k\neq j} h(X_j, X_k) \quad\text{and}\quad n\,V_n = \frac{1}{n}\sum_{j,k=1}^{n} h(X_j, X_k), \]
with $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ symmetric and $\int_{\mathbb{R}^d} h(x,y)\,dF(x) = 0$, $\forall\, y\in\mathbb{R}^d$, can be derived by using a spectral decomposition $h(x,y) = \sum_{k=1}^{\infty}\lambda_k\,\Phi_k(x)\Phi_k(y)$, which holds true in the $L_2$-sense. Here, $(\Phi_k)_k$ denote orthonormal eigenfunctions and $(\lambda_k)_k$
the corresponding eigenvalues of the integral equation
\[ \int_{\mathbb{R}^d} h(x,y)\,g(y)\,dF(y) = \lambda\,g(x), \quad (2.1) \]
where $F$ denotes the distribution function of $X_1$. Approximate $n\,U_n$ by
\[ n\,U_n^{(K)} := \sum_{k=1}^{K}\lambda_k\Bigg[\Bigg(\frac{1}{\sqrt n}\sum_{i=1}^{n}\Phi_k(X_i)\Bigg)^2 - \frac{1}{n}\sum_{i=1}^{n}\Phi_k^2(X_i)\Bigg]. \]
Then the sum under the round brackets is asymptotically normal while the latter sum converges in probability to 1. Finally, one obtains
\[ n\,U_n \xrightarrow{d} \sum_{k=1}^{\infty}\lambda_k\,(Z_k^2 - 1), \quad (2.2) \]
where $(Z_k)_k$ is a sequence of i.i.d. standard normal random variables, cf. Serfling [25]. If additionally $E|h(X_1,X_1)| < \infty$, Slutsky's theorem implies $n\,V_n \xrightarrow{d} \sum_{k=1}^{\infty}\lambda_k\,(Z_k^2 - 1) + Eh(X_1,X_1)$. (Here, $\xrightarrow{d}$ denotes convergence in distribution.)
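As a quick numerical illustration of this limit (our sketch, not part of the paper), consider the rank-one degenerate kernel $h(x,y) = 2\cos(\pi x)\cos(\pi y)$ for i.i.d. $X_i \sim U(0,1)$: its only non-zero eigenvalue is $\lambda_1 = 1$ with eigenfunction $\Phi_1(x) = \sqrt 2\cos(\pi x)$, so $n\,V_n$ converges to $Z_1^2$ and $n\,U_n$ to $Z_1^2 - 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x, y):
    # rank-one degenerate kernel for X ~ U(0,1):
    # h(x, y) = Phi_1(x) Phi_1(y), Phi_1(x) = sqrt(2) cos(pi x),
    # so lambda_1 = 1 and all other eigenvalues vanish
    return 2.0 * np.cos(np.pi * x) * np.cos(np.pi * y)

def nV(x):
    # n V_n = n^{-1} sum_{j,k} h(X_j, X_k)
    return h(x[:, None], x[None, :]).sum() / len(x)

n, reps = 200, 1000
stats = np.array([nV(rng.random(n)) for _ in range(reps)])
print(stats.mean())  # close to E[Z^2] = 1, the mean of the chi-squared(1) limit
```

Since $n\,V_n = \big(n^{-1/2}\sum_i \Phi_1(X_i)\big)^2$ here, the simulated values are non-negative and their mean is close to $1$.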
So far, most previous attempts to derive the limit distribution of degenerate
U - and V -statistics of dependent random variables are based on the adoption of
this method of proof. Eagleson [15] developed the asymptotic theory in the case
of a strictly stationary sequence of φ-mixing, real-valued random variables under
the assumption of absolutely summable eigenvalues. This condition is satisfied if the kernel function is of the form $h(x,y) = \int_{\mathbb{R}} h_1(x,z)\,h_2(z,y)\,dF(z)$. Using general heavy-tailed weight functions instead of $F$, the eigenvalues are not necessarily absolutely summable; for an example see de Wet [11]. Carlstein [4] analysed U-statistics
of α-mixing, real-valued random variables in the case of finitely many eigenfunctions. He derived a limit distribution of the form (2.2), where (Zk )k∈N is a sequence
of centered normal random variables. Denker [9] considered stationary functionals
(Xn = f (Zn , Zn+1 , . . . ))n of β-mixing random variables (Zn )n . He assumed f and
the cumulative distribution function of $X_0$ to be Hölder continuous. Imposing some smoothness condition on $h$, the limit distribution of $n\,U_n$ is derived under the additional assumption $\|\Phi_k\|_\infty < \infty$, $\forall\, k\in\mathbb{N}$. The condition on $(\Phi_k)_k$ is difficult or even impossible to check in a multitude of cases since it requires solving the associated integral equation (2.1). Similar difficulties occur if one wants to apply the results of
Dewan and Prakasa Rao [10] or Huang and Zhang [21]. They studied U -statistics
of associated, real-valued random variables. Besides the absolute summability of the
eigenvalues some regularity conditions have to be satisfied uniformly by the eigenfunctions, in order to obtain the asymptotic distribution of n Un .
A different approach was used by Babbel [2] to determine the limit distribution of
U -statistics of φ- and β-mixing random variables. She deduced the limit distribution
via a Haar-wavelet decomposition of the kernel and empirical process theory without imposing the critical conditions mentioned above. However, this approach is not
suitable when dealing with U -statistics of τ -dependent random variables since Lipschitz continuity will be the crucial property of the (approximating) kernel in order
to exploit the underlying dependence structure.
2.2. Main results. Let (Xn )n∈N be a sequence of Rd -valued random variables on
some probability space (Ω, A, P ). In this subsection we derive the limit distributions
of
\[ n\,U_n = \frac{1}{n-1}\sum_{j=1}^{n}\sum_{k\neq j} h(X_j, X_k) \quad\text{and}\quad n\,V_n = \frac{1}{n}\sum_{j,k=1}^{n} h(X_j, X_k), \]
where $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ is a symmetric function with $\int_{\mathbb{R}^d} h(x,y)\,dF(x) = 0$, $\forall\, y\in\mathbb{R}^d$.
In order to describe the dependence structure of $(X_n)_{n\in\mathbb{N}}$, we recall the definition of the τ-dependence coefficient for $\mathbb{R}^d$-valued random variables of Dedecker and Prieur [7]:
Definition 2.1. Let $(\Omega,\mathcal{A},P)$ be a probability space, $\mathcal{M}$ a sub-σ-algebra of $\mathcal{A}$ and $X$ an $\mathbb{R}^d$-valued random variable ($d\in\mathbb{N}$). Assume that $E\|X\|_{l^1} < \infty$, where $\|x\|_{l^1} = \sum_{i=1}^{d}|x_i|$, and define
\[ \tau(\mathcal{M}, X) = E\Bigg(\sup_{f\in\Lambda_1(\mathbb{R}^d)}\bigg|\int_{\mathbb{R}^d} f(x)\,dF^{X|\mathcal{M}}(x) - \int_{\mathbb{R}^d} f(x)\,dF(x)\bigg|\Bigg). \]
Here, $F^{X|\mathcal{M}}$ denotes the conditional distribution function of $X$ given $\mathcal{M}$, and $\Lambda_1(\mathbb{R}^d)$ denotes the set of 1-Lipschitz functions from $\mathbb{R}^d$ to $\mathbb{R}$.
We assume
(A1) (i) $(X_n)_{n\in\mathbb{N}}$ is a (strictly) stationary sequence of $\mathbb{R}^d$-valued random variables ($d\in\mathbb{N}$) on some probability space $(\Omega,\mathcal{A},P)$ with $E\|X_0\|_{l^1} < \infty$.
(ii) The sequence $(\tau_r)_{r\in\mathbb{N}}$ defined by
\[ \tau_r := \sup\big\{\tau\big(\sigma(X_{s_1},\dots,X_{s_u}),\,(X_{t_1}', X_{t_2}', X_{t_3}')'\big) \,\big|\, u\in\mathbb{N},\ s_1\le\dots\le s_u < s_u + r \le t_1 \le t_2 \le t_3 \in \mathbb{N}\big\} \]
satisfies $\sum_{r=1}^{\infty} r\,\tau_r^{\delta} < \infty$ for some $\delta\in(0,1)$. (Here, the prime denotes transposition.)
Remark 1. If $\Omega$ is rich enough then, due to Dedecker and Prieur [6], the validity of (A1) allows for the construction of a random vector $(\tilde X_{t_1}', \tilde X_{t_2}', \tilde X_{t_3}')' \stackrel{d}{=} (X_{t_1}', X_{t_2}', X_{t_3}')'$, independent of $X_{s_1},\dots,X_{s_u}$, with
\[ \tau_r = \sum_{i=1}^{3} E\|\tilde X_{t_i} - X_{t_i}\|_{l^1}. \quad (2.3) \]
We obtain an upper bound for the dependence coefficient, $\tau_r \le 6\int_0^{\beta(r)} Q_{|X_0|}(u)\,du$, where $Q_{|X_0|}(u) = \inf\{t\in\mathbb{R} \mid P(\|X_0\|_{l^1} > t) \le u\}$, $u\in[0,1]$, and $\beta(r)$ denotes the ordinary β-mixing coefficient $\beta(r) := E\big(\sup_{B\in\sigma(X_s,\,s>t+r)} |P(B\,|\,\sigma(X_s,\,s\le t)) - P(B)|\big)$, $t\in\mathbb{Z}$. This is a consequence of Remark 2 of Dedecker and Prieur [6]. Moreover, (A1) is related to the concept of weak dependence which was introduced by Doukhan and Louhichi [13]. Remark 1 immediately implies
\[ |\operatorname{cov}(h(X_{s_1},\dots,X_{s_u}),\, k(X_{t_1},\dots,X_{t_v}))| \le 2\,\|h\|_\infty \operatorname{Lip}(k)\,\Big\lceil\frac{v}{3}\Big\rceil\,\tau_r \quad (2.4) \]
for $s_1\le\dots\le s_u < s_u + r \le t_1\le\dots\le t_v \in \mathbb{N}$ and for all functions $h:\mathbb{R}^u\to\mathbb{R}$ and $k:\mathbb{R}^v\to\mathbb{R}$ in $\mathcal{L} := \{f:\mathbb{R}^p\to\mathbb{R} \text{ for some } p\in\mathbb{N} \mid f \text{ Lipschitz continuous and bounded}\}$. Therefore, a sequence of random variables that satisfies (A1) is $((\tau_r)_r, \mathcal{L}, \psi)$-weakly dependent with $\psi(h,k,u,v) = 2\,\|h\|_\infty \operatorname{Lip}(k)\,\lceil v/3\rceil$. (Here and in the sequel, $\operatorname{Lip}(g)$ denotes the Lipschitz constant of a generic function $g$.)
A list of examples for τ -dependent processes including causal linear and functional
autoregressive processes is provided by Dedecker and Prieur [7].
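A concrete instance (our illustration, not taken from the paper): for a causal AR(1) process $X_t = aX_{t-1} + \varepsilon_t$ with $|a| < 1$, two copies driven by the same innovations but with different pasts contract at the geometric rate $a^r$, which is the mechanism behind a geometrically decaying $\tau_r$ via the coupling representation (2.3).

```python
import numpy as np

rng = np.random.default_rng(1)
a, T = 0.6, 30

eps = rng.normal(size=T)
x = np.empty(T + 1)
y = np.empty(T + 1)
x[0], y[0] = 5.0, -5.0          # different pasts, identical innovations
for t in range(T):
    x[t + 1] = a * x[t] + eps[t]
    y[t + 1] = a * y[t] + eps[t]

gap = np.abs(x - y)             # contracts exactly like a^t |x0 - y0|
print(gap[10], (a ** 10) * 10.0)  # both equal 10 * 0.6^10
```

The coupling gap is deterministic here: the innovations cancel, so only the geometric contraction of the autoregression remains.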
Besides the conditions on the dependence structure of (Xn )n∈N we make the following
assumptions concerning the kernel:
(A2) (i) The kernel $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ is a symmetric, measurable function and degenerate under $F$, i.e. $\int_{\mathbb{R}^d} h(x,y)\,dF(x) = 0$, $\forall\, y\in\mathbb{R}^d$.
(ii) For a $\delta$ satisfying (A1)(ii) the following moment constraints hold true with some $\nu > (2-\delta)/(1-\delta)$:
\[ \sup_{k\in\mathbb{N}} E|h(X_1, X_k)|^{\nu} < \infty \quad\text{and}\quad E|h(X_1, \tilde X_1)|^{\nu} < \infty, \]
where $\tilde X_1$ is an independent copy of $X_1$.
(A3) The kernel h is Lipschitz continuous.
Using an appropriate kernel truncation it is possible to reduce the problem of deriving
the asymptotic distribution of n Vn to statistics with bounded kernel functions.
Lemma 2.1. Suppose that (A1), (A2), and (A3) are fulfilled. Then there exists a family of bounded functions $(h_c)_{c\in\mathbb{R}^+}$ satisfying (A2) and (A3) uniformly and such that
\[ \lim_{c\to\infty}\limsup_{n\to\infty} n^2\,E(V_n - V_{n,c})^2 = 0, \quad (2.5) \]
where $V_{n,c} = \frac{1}{n^2}\sum_{j,k=1}^{n} h_c(X_j, X_k)$.
After this simplification of the problem, we intend to develop a decomposition of
the kernel that allows for the application of a central limit theorem (CLT) for weakly
dependent random variables. One could try to imitate the proof of the i.i.d. case.
According to the discussion in the previous subsection this leads to prerequisites that
can hardly be checked in numerous cases. Therefore, we do not use a spectral decomposition of the kernel in order to apply the CLT but a wavelet decomposition
instead. It turns out that Lipschitz continuity is the central property the kernel function should satisfy in order to exploit (2.3). For this reason the choice of Haar wavelets, as employed by Babbel [2], is inappropriate. Instead, the application of Lipschitz continuous scale and wavelet functions is more suitable here.
In the sequel, let φ and ψ denote scale and wavelet functions associated with a one-dimensional multiresolution analysis. As illustrated by Daubechies [5], Sect. 8, these functions can be selected in such a manner that they possess the following properties:
(1) φ and ψ are Lipschitz continuous,
(2) φ and ψ have compact support,
(3) $\int_{-\infty}^{\infty}\phi(x)\,dx = 1$ and $\int_{-\infty}^{\infty}\psi(x)\,dx = 0$.
It is well known that an orthonormal basis of $L_2(\mathbb{R}^d)$ can be constructed on the basis of φ and ψ. For this purpose define $E := \{0,1\}^d\setminus\{0_d\}$, where $0_d$ denotes the $d$-dimensional null vector. In addition, set
\[ \varphi^{(i)} := \begin{cases} \phi & \text{for } i = 0, \\ \psi & \text{for } i = 1 \end{cases} \]
and define the functions $\Psi^{(e)}_{j,k} : \mathbb{R}^d \to \mathbb{R}$ for $j\in\mathbb{Z}$, $k = (k_1,\dots,k_d)'\in\mathbb{Z}^d$ by
\[ \Psi^{(e)}_{j,k}(x) := 2^{jd/2}\prod_{i=1}^{d}\varphi^{(e_i)}(2^j x_i - k_i), \quad \forall\, e = (e_1,\dots,e_d)'\in E,\ \forall\, x = (x_1,\dots,x_d)'\in\mathbb{R}^d. \]
The system $\big(\Psi^{(e)}_{j,k}\big)_{e\in E,\, j\in\mathbb{Z},\, k\in\mathbb{Z}^d}$ is an orthonormal basis of $L_2(\mathbb{R}^d)$, see Wojtaszczyk [26], Sect. 5. The same holds true for
\[ (\Phi_{j_0,k})_{k\in\mathbb{Z}^d} \cup \big(\Psi^{(e)}_{j,k}\big)_{e\in E,\, j\ge j_0,\, k\in\mathbb{Z}^d}, \quad j_0\in\mathbb{Z}, \]
where $\Phi_{j,k} : \mathbb{R}^d\to\mathbb{R}$ is given by $\Phi_{j,k}(x) := 2^{jd/2}\prod_{i=1}^{d}\phi(2^j x_i - k_i)$ for $j\in\mathbb{Z}$, $k\in\mathbb{Z}^d$.
Thus, an $L_2$-approximation of $V_{n,c}$ by a statistic based on a wavelet approximation of $h_c$ can be established. To this end we introduce $\tilde h_c^{(K,L)}$ with
\[ \tilde h_c^{(K,L)}(x,y) := \sum_{k_1,k_2\in\{-L,\dots,L\}^d} \alpha^{(c)}_{j_0;k_1,k_2}\,\Phi_{j_0,k_1}(x)\,\Phi_{j_0,k_2}(y) + \sum_{j=j_0}^{J(K)-1}\ \sum_{k_1,k_2\in\{-L,\dots,L\}^d}\ \sum_{e\in\bar E} \beta^{(c,e)}_{j;k_1,k_2}\,\Psi^{(e)}_{j;k_1,k_2}(x,y), \]
where $\bar E := (E\times E)\cup(E\times\{0_d\})\cup(\{0_d\}\times E)$,
\[ \Psi^{(e)}_{j;k_1,k_2} := \begin{cases} \Psi^{(e_1)}_{j,k_1}\Psi^{(e_2)}_{j,k_2} & \text{for } (e_1',e_2')' \in E\times E, \\ \Psi^{(e_1)}_{j,k_1}\Phi_{j,k_2} & \text{for } (e_1',e_2')' \in E\times\{0_d\}, \\ \Phi_{j,k_1}\Psi^{(e_2)}_{j,k_2} & \text{for } (e_1',e_2')' \in \{0_d\}\times E, \end{cases} \]
$\alpha^{(c)}_{j_0;k_1,k_2} = \iint_{\mathbb{R}^d\times\mathbb{R}^d} h_c(x,y)\,\Phi_{j_0,k_1}(x)\,\Phi_{j_0,k_2}(y)\,dx\,dy$ and $\beta^{(c,e)}_{j;k_1,k_2} = \iint_{\mathbb{R}^d\times\mathbb{R}^d} h_c(x,y)\,\Psi^{(e)}_{j;k_1,k_2}(x,y)\,dx\,dy$. We refer to the degenerate version of $\tilde h_c^{(K,L)}$ as $h_c^{(K,L)}$, given by
\[ h_c^{(K,L)}(x,y) = \tilde h_c^{(K,L)}(x,y) - \int_{\mathbb{R}^d} \tilde h_c^{(K,L)}(x,y)\,dF(x) - \int_{\mathbb{R}^d} \tilde h_c^{(K,L)}(x,y)\,dF(y) + \iint_{\mathbb{R}^d\times\mathbb{R}^d} \tilde h_c^{(K,L)}(x,y)\,dF(x)\,dF(y). \]
The associated V-type statistic will be denoted by $V_{n,c}^{(K,L)}$.
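The tensor-product construction can be written down directly; the sketch below (ours, illustrative) evaluates $\Psi^{(e)}_{j,k}$ for $d = 2$ from user-supplied one-dimensional factors. The toy φ and ψ here are Lipschitz and compactly supported with $\int\phi = 1$ and $\int\psi = 0$, but they are mere placeholders, not an orthonormal scale/wavelet pair as required above.

```python
import numpy as np

def tensor_wavelet(e, j, k, phi, psi):
    """Return Psi^{(e)}_{j,k}(x) = 2^{jd/2} prod_i varphi^{(e_i)}(2^j x_i - k_i)."""
    e, k = np.asarray(e), np.asarray(k)
    d = len(e)
    def Psi(x):
        x = np.asarray(x, dtype=float)
        val = 2.0 ** (j * d / 2)
        for i in range(d):
            f = psi if e[i] == 1 else phi   # e_i selects the wavelet or scale factor
            val *= f(2.0 ** j * x[i] - k[i])
        return val
    return Psi

# toy 1-D factors (placeholders, NOT an orthonormal MRA pair):
phi = lambda t: np.maximum(0.0, 1.0 - abs(t))                       # hat, integral 1
psi = lambda t: phi(2 * t) - 0.5 * phi(2 * t - 1) - 0.5 * phi(2 * t + 1)  # integral 0

P = tensor_wavelet(e=(1, 0), j=1, k=(0, 0), phi=phi, psi=psi)
print(P((0.1, 0.2)))   # 2 * psi(0.2) * phi(0.4) = 2 * 0.4 * 0.6 = 0.48
```

Swapping in genuine Daubechies-type factors would make the resulting system orthonormal while keeping the same tensor-product indexing.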
Lemma 2.2. Assume that (A1), (A2), and (A3) are fulfilled. Then $(J(K))_{K\in\mathbb{N}} \subseteq \mathbb{N}$ with $J(K) \xrightarrow[K\to\infty]{} \infty$ can be chosen such that
\[ \lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2\,E\big(V_{n,c} - V_{n,c}^{(K,L)}\big)^2 = 0. \]
Employing the CLT of Neumann and Paparoditis [24] and the continuous mapping theorem, we obtain the limit distribution of $n\,V_{n,c}^{(K,L)}$. Finally, based on this result, the limit distribution of the V-type statistic $n\,V_n$ can be derived. Moreover, the law of large numbers allows for deducing the asymptotic distribution of $n\,U_n$ since
\[ n\,U_n = \frac{n}{n-1}\Bigg(n\,V_n - \frac{1}{n}\sum_{k=1}^{n} h(X_k, X_k)\Bigg). \]
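The algebraic relation between the U- and V-statistics is easy to confirm numerically (our sketch, with an arbitrary symmetric kernel):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
x = rng.normal(size=n)
h = np.cos(x[:, None] - x[None, :])      # any symmetric kernel will do here

nV = h.sum() / n                         # n V_n = n^{-1} sum_{j,k} h(X_j, X_k)
nU = (h.sum() - np.trace(h)) / (n - 1)   # n U_n = (n-1)^{-1} sum_{j != k} h(X_j, X_k)
diag = np.trace(h) / n                   # n^{-1} sum_k h(X_k, X_k)

# identity: n U_n = n/(n-1) * (n V_n - n^{-1} sum_k h(X_k, X_k))
assert abs(nU - n / (n - 1) * (nV - diag)) < 1e-10
```

Removing the diagonal terms and renormalizing is all that separates the two statistics, which is why a law of large numbers for the diagonal suffices.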
Theorem 2.1. Suppose that the assumptions (A1), (A2), and (A3) are fulfilled. Then, as $n\to\infty$,
\[ n\,V_n \xrightarrow{d} Z \quad\text{and}\quad n\,U_n \xrightarrow{d} Z - Eh(X_1, X_1) \]
with
\[ Z := \lim_{c\to\infty}\Bigg(\sum_{k_1,k_2\in\mathbb{Z}^d} \alpha^{(c)}_{j_0;k_1,k_2}\, Z_{k_1} Z_{k_2} + \sum_{j=j_0}^{\infty}\ \sum_{k_1,k_2\in\mathbb{Z}^d}\ \sum_{e=(e_1',e_2')'\in\bar E} \beta^{(c,e)}_{j;k_1,k_2}\, Z^{(e_1)}_{j;k_1} Z^{(e_2)}_{j;k_2}\Bigg). \]
Here, $(Z_k)_{k\in\mathbb{Z}^d}$ and $(Z^{(e)}_{j;k})_{j\ge j_0,\, k\in\mathbb{Z}^d,\, e\in\{0,1\}^d}$ are centered normally distributed random variables and the r.h.s. converges in the $L_2$-sense.
As in the case of i.i.d. random variables, the limit distribution of n Vn is a weighted
sum of products of centered normal random variables. In contrast to many other results in the literature, the prerequisites of this theorem, namely moment constraints
and Lipschitz continuity of the kernel, can be checked fairly easily in many cases.
Unfortunately, the asymptotic distribution still has a complicated structure. Hence, quantiles can hardly be determined on the basis of the previous result. However, this problem plays a minor role since we show in the following section that the conditional distributions of the bootstrap counterparts of $n\,U_n$ and $n\,V_n$, given $X_1,\dots,X_n$, converge to the same limits in probability.
Of course, the assumption of Lipschitz continuous kernels is rather restrictive. Thus,
we intend to extend our theory to a more general class of kernel functions. The price for enlarging the class of feasible kernels is a set of additional moment constraints.
Besides (A1) and (A2) we assume
(A4) (i) The kernel function satisfies
\[ |h(x,y) - h(\bar x,\bar y)| \le f(x,\bar x,y,\bar y)\,\big[\|x-\bar x\|_{l^1} + \|y-\bar y\|_{l^1}\big], \quad \forall\, x,\bar x,y,\bar y \in \mathbb{R}^d, \]
where $f:\mathbb{R}^{4d}\to\mathbb{R}$ is continuous. Moreover, for $\eta := 1/(1-\delta)$ and some $a>0$,
\[ \sup_{k_1,\dots,k_5\in\mathbb{N}} E\Big[\max_{c_1,c_2\in[-a,a]^d} f(Y_{k_1}, Y_{k_2}+c_1, Y_{k_3}, Y_{k_4}+c_2)^{\eta}\,\big(1+\|Y_{k_5}\|_{l^1}\big)\Big] < \infty, \]
for any sequence $(Y_k)_{k\in\mathbb{N}}$ with $Y_k = X_k$ or $Y_k = \tilde X_k$, where $\tilde X_k$ denotes an independent copy of $X_k$.
(ii) $\sum_{r=1}^{\infty} r\,(\tau_r)^{\delta^2} < \infty$.
Even though the assumption (A4)(i) has a rather technical structure, it is satisfied e.g.
by polynomial kernel functions as long as the sample variables have sufficiently many
finite moments. Analogously to Lemma 2.1 and Lemma 2.2 the following assertion
holds.
Lemma 2.3. Suppose that (A1), (A2), and (A4) are fulfilled. Then $(J(K))_{K\in\mathbb{N}} \subseteq \mathbb{N}$ with $J(K) \xrightarrow[K\to\infty]{} \infty$ can be chosen such that
\[ \lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2\,E\big(V_n - V_{n,c}^{(K,L)}\big)^2 = 0. \]
This auxiliary result implies the analogue to Theorem 2.1 for non-Lipschitz kernels.
Theorem 2.2. Assume that (A1), (A2), and (A4) are satisfied. Then, as $n\to\infty$,
\[ n\,V_n \xrightarrow{d} Z \quad\text{and}\quad n\,U_n \xrightarrow{d} Z - Eh(X_1, X_1), \]
where $Z$ is defined as in Theorem 2.1.
3. Consistency of general bootstrap methods
As we have seen in the previous section, the limit distributions of degenerate U- and V-statistics have a rather complicated structure. Therefore, in the majority of
cases it is quite difficult to determine quantiles which are required in order to derive
asymptotic critical values of U - and V -type test statistics. The bootstrap offers a
suitable way of determining these quantities.
Given X1 , . . . , Xn , let X ∗ and Y ∗ denote vectors of bootstrap random variables
with values in Rd1 and Rd2 . In order to describe the dependence structure of the
bootstrap sample, we introduce $\mathcal{X}_n := (X_1',\dots,X_n')'$,
\[ \tau^*(Y^*, X^*, \omega) := E\Bigg(\sup_{f\in\Lambda_1(\mathbb{R}^{d_1})}\bigg|\int_{\mathbb{R}^{d_1}} f(x)\,dF^{X^*|Y^*}(x) - \int_{\mathbb{R}^{d_1}} f(x)\,dF^{X^*}(x)\bigg|\ \bigg|\ \mathcal{X}_n = \omega\Bigg) \]
and make the following assumptions:
(A1∗) (i) The sequence of bootstrap variables is stationary with probability tending to one. Additionally, $(X_{t_1}^{*\prime}, X_{t_2}^{*\prime})' \xrightarrow{d} (X_{t_1}', X_{t_2}')'$, $\forall\, t_1, t_2 \in \mathbb{N}$, holds true in probability.
(ii) Conditionally on $X_1,\dots,X_n$, the random variables $(X_k^*)_{k\in\mathbb{Z}}$ are τ-weakly dependent, i.e. there exist a sequence of coefficients $(\bar\tau_r)_{r\in\mathbb{N}}$ and a sequence of sets $(\Omega_n^{(1)})_{n\in\mathbb{N}}$ with $P(\mathcal{X}_n \in \Omega_n^{(1)}) \xrightarrow[n\to\infty]{} 1$ and the following property: for any sequence $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n \in \Omega_n^{(1)}$,
\[ \tau_r^*(\omega_n) := \sup\big\{\tau^*\big((X_{s_1}^{*\prime},\dots,X_{s_u}^{*\prime})',\,(X_{t_1}^{*\prime}, X_{t_2}^{*\prime}, X_{t_3}^{*\prime})',\,\omega_n\big) \,\big|\, u\in\mathbb{N},\ s_1\le\dots\le s_u < s_u + r \le t_1 \le t_2 \le t_3 \in \mathbb{N}\big\} \]
can be bounded by some $\bar\tau_r$ such that the sequence $(\bar\tau_r)_{r\in\mathbb{N}}$ satisfies $\sum_{r=1}^{\infty} r\,(\bar\tau_r)^{\delta} < \infty$.
Remark 2. Neumann and Paparoditis [24] proved that, in the case of stationary Markov chains of finite order, the key to convergence of the finite-dimensional distributions
is convergence of the conditional distributions, cf. their Lemma 4.2. In particular,
they showed that AR(p)-bootstrap and ARCH(p)-bootstrap yield samples that satisfy
(A1∗ )(i).
Lemma 3.1. Suppose that (A1) and (A1∗) hold true. Further let $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ be a bounded, symmetric, Lipschitz continuous function such that $Eh(X_1, y) = E(h(X_1^*, y)\,|\,X_1,\dots,X_n) = 0$, $\forall\, y\in\mathbb{R}^d$. Then,
\[ \frac{1}{n}\sum_{j,k=1}^{n} h(X_j^*, X_k^*) \xrightarrow{d} Z \quad\text{and}\quad \frac{1}{n-1}\sum_{j=1}^{n}\sum_{k\neq j} h(X_j^*, X_k^*) \xrightarrow{d} Z - Eh(X_1, X_1) \]
hold in probability. Here, $Z$ is defined as in Theorem 2.1.
In order to deduce bootstrap consistency, convergence in a certain metric $\rho$ is additionally required, i.e.
\[ \rho\Bigg(P\Bigg(\frac{1}{n}\sum_{j,k=1}^{n} h(X_j^*, X_k^*) \le x\ \Bigg|\ X_1,\dots,X_n\Bigg),\ P(n\,V_n \le x)\Bigg) \xrightarrow{P} 0. \]
(Here, $\xrightarrow{P}$ denotes convergence in probability.) Convergence in the uniform metric follows from Lemma 3.1 if the limit distribution has a continuous cumulative distribution function. The next assertion gives a necessary and sufficient condition for this.
Lemma 3.2. The limit variable Z, derived in Theorem 2.1 / Theorem 2.2 under
(A1), (A2), and (A3)/(A4), has a continuous cumulative distribution function if
var(Z) > 0.
Kernels of statistics emerging from goodness-of-fit tests for composite hypotheses
often depend on an unknown parameter. We intend to employ bootstrap consistency
for this setting, i.e. when parameters have to be estimated. Moreover, we enlarge the
class of feasible kernels. For this purpose we additionally assume
(A2∗) (i) $\hat\theta_n \xrightarrow{P} \theta \in \mathbb{R}^p$.
(ii) $E\big(h(X_1^*, y, \hat\theta_n)\,\big|\,X_1,\dots,X_n\big) = 0$, $\forall\, y\in\mathbb{R}^d$.
(iii) For some $\delta$ satisfying (A1∗)(ii), $\nu > (2-\delta)/(1-\delta)$ and a constant $C_1 < \infty$ there exists a sequence of sets $(\Omega_n^{(2)})_{n\in\mathbb{N}}$ such that $P(\mathcal{X}_n \in \Omega_n^{(2)}) \xrightarrow[n\to\infty]{} 1$ and for all $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n \in \Omega_n^{(2)}$ the following moment constraint holds true:
\[ \sup_{1\le k\le n} E\big(|h(X_1^*, X_k^*, \hat\theta_n)|^{\nu} + |h(X_1^*, \tilde X_1^*, \hat\theta_n)|^{\nu}\ \big|\ \mathcal{X}_n = \omega_n\big) \le C_1, \]
where (conditionally on $X_1,\dots,X_n$) $\tilde X_1^*$ denotes an independent copy of $X_1^*$.
and
(A3∗) (i) The kernel satisfies
\[ |h(x,y,\hat\theta_n) - h(\bar x,\bar y,\hat\theta_n)| \le f(x,\bar x,y,\bar y,\hat\theta_n)\,\big[\|x-\bar x\|_{l^1} + \|y-\bar y\|_{l^1}\big], \quad \forall\, x,\bar x,y,\bar y \in \mathbb{R}^d, \]
where $f:\mathbb{R}^{4d}\times\mathbb{R}^p\to\mathbb{R}$ is continuous on $\mathbb{R}^{4d}\times U(\theta)$ for some compact neighborhood $U(\theta)$ of $\theta$. Moreover, for $\eta := 1/(1-\delta)$, some $a > 0$ and some $C_2 < \infty$ there exists a sequence of sets $(\Omega_n^{(3)})_{n\in\mathbb{N}}$ such that $P(\mathcal{X}_n \in \Omega_n^{(3)}) \xrightarrow[n\to\infty]{} 1$ and for all $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n \in \Omega_n^{(3)}$ the following moment constraint holds true:
\[ \sup_{1\le k_1,\dots,k_5\le n} E\Big[\max_{c_1,c_2\in[-a,a]^d} f(Y_{k_1}^*, Y_{k_2}^*+c_1, Y_{k_3}^*, Y_{k_4}^*+c_2, \hat\theta_n)^{\eta}\,\big(1+\|Y_{k_5}^*\|_{l^1}\big)\ \Big|\ \mathcal{X}_n = \omega_n\Big] \le C_2, \]
for any sequence $(Y_k^*)_{k\in\mathbb{Z}}$ with $Y_k^* = X_k^*$ or $Y_k^* = \tilde X_k^*$, where $\tilde X_k^*$ denotes an independent copy of $X_k^*$, conditionally on $X_1,\dots,X_n$.
(ii) $\sum_{r=1}^{\infty} r\,(\bar\tau_r)^{\delta^2} < \infty$.
Under these assumptions we derive a result concerning the asymptotic distributions of $n\,V_n^* = n^{-1}\sum_{j,k=1}^{n} h(X_j^*, X_k^*, \hat\theta_n)$ and $n\,U_n^* = (n-1)^{-1}\sum_{j=1}^{n}\sum_{k\neq j} h(X_j^*, X_k^*, \hat\theta_n)$. To this end we denote the U- and V-statistics with kernel $h(\cdot,\cdot,\theta)$ by $U_n$ and $V_n$, respectively.
Theorem 3.1. Suppose that (A1), (A2), and (A4), as well as (A1∗), (A2∗), and (A3∗) are fulfilled. Then, as $n\to\infty$,
\[ n\,V_n^* \xrightarrow{d} Z \quad\text{and}\quad n\,U_n^* \xrightarrow{d} Z - Eh(X_1, X_1, \theta), \quad\text{in probability}, \]
where $Z$ is defined as in Theorem 2.1. Moreover, if $\operatorname{var}(Z) > 0$,
\[ \sup_{-\infty<x<\infty} |P(n\,V_n^* \le x\,|\,X_1,\dots,X_n) - P(n\,V_n \le x)| \xrightarrow{P} 0 \]
and
\[ \sup_{-\infty<x<\infty} |P(n\,U_n^* \le x\,|\,X_1,\dots,X_n) - P(n\,U_n \le x)| \xrightarrow{P} 0. \]
This implies that bootstrap-based tests of U- or V-type asymptotically attain a prescribed size $\alpha$, i.e. $P(n\,U_n > t_{u,\alpha}^*) \xrightarrow[n\to\infty]{} \alpha$ and $P(n\,V_n > t_{v,\alpha}^*) \xrightarrow[n\to\infty]{} \alpha$, where $t_{u,\alpha}^*$ and $t_{v,\alpha}^*$ denote the $(1-\alpha)$-quantiles of $n\,U_n^*$ and $n\,V_n^*$, respectively, given $X_1,\dots,X_n$.
4. L2 -tests for weakly dependent observations
This section is dedicated to two applications in the field of hypothesis testing.
For the sake of simplicity we restrict ourselves to real-valued random variables and consider only a simple null hypothesis. The test for symmetry as well as the model-specification test can be extended to problems with composite hypotheses, cf. Leucht
[22].
4.1. Test for symmetry. Answering the question whether a distribution is symmetric or not is interesting for several reasons. There is a purely statistical interest, since symmetry characterizes the relation between location parameters. Moreover, symmetry plays a central role in analyzing and modelling real-life phenomena. For instance,
it is often presumed that an observed process can be described by an AR(p)-process
with Gaussian innovations, which in turn implies a Gaussian marginal distribution.
Rejecting the hypothesis of symmetry contradicts this type of marginal distribution.
Furthermore, such a test outcome excludes any kind of symmetric innovations in this context.
Suppose that $(X_n)_{n\in\mathbb{N}}$ is a sequence of real-valued random variables with marginal distribution $P^X$, satisfying (A1). For some $\mu\in\mathbb{R}$ we are given the testing problem
\[ H_0:\ P^{X-\mu} = P^{\mu-X} \quad\text{vs.}\quad H_1:\ P^{X-\mu} \neq P^{\mu-X}. \]
Similar to Feuerverger and Mureika [17], who studied the problem for i.i.d. random variables, we propose the following test statistic
\[ T_n = n\int_{\mathbb{R}} \big[\Im\big(c_n(t)e^{-i\mu t}\big)\big]^2\, w(t)\,dt = \frac{1}{n}\sum_{j,k=1}^{n}\int_{\mathbb{R}} \sin(t(X_j-\mu))\sin(t(X_k-\mu))\,w(t)\,dt, \]
which makes use of the fact that symmetry of a distribution is equivalent to a vanishing imaginary part of the associated characteristic function. Here, $\Im(z)$ denotes the imaginary part of $z\in\mathbb{C}$, $c_n$ denotes the empirical characteristic function, and $w$ is some positive, measurable weight function with $\int_{\mathbb{R}}(1+|t|)\,w(t)\,dt < \infty$. Obviously, $T_n$
is a V -type statistic whose kernel satisfies (A2) and (A3). Thus, its limit distribution
can be determined by Theorem 2.1. Assuming that the observations come from a
stationary AR(p)- or ARCH(p)-process, the validity of (A1∗ ) is assured by using the
AR(p)- or ARCH(p)-bootstrap methods, given by Neumann and Paparoditis [24], in
order to generate the bootstrap counterpart of the sample. Hence, in these cases the prerequisites of Lemma 3.1 are satisfied except for degeneracy. Inspired by Dehling and Mikosch [8], who discussed this problem for Efron's bootstrap in the i.i.d. case, we propose a bootstrap statistic with the following kernel:
\[ h_n^*(x,y) = h(x,y) - \int_{\mathbb{R}} h(x,y)\,dF_n^*(x) - \int_{\mathbb{R}} h(x,y)\,dF_n^*(y) + \iint_{\mathbb{R}^2} h(x,y)\,dF_n^*(x)\,dF_n^*(y). \]
Here, $h$ denotes the kernel function of $T_n$ and $F_n^*$ the distribution function of $X_1^*$ conditionally on $X_1,\dots,X_n$. Similar to the proof of Theorem 3.1, the desired convergence property of $T_n^*$ can be verified.
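For the Gaussian weight $w(t) = (2\pi)^{-1/2}e^{-t^2/2}$ the integral in $T_n$ is available in closed form, since $\int \sin(ta)\sin(tb)\,w(t)\,dt = \tfrac12\big[e^{-(a-b)^2/2} - e^{-(a+b)^2/2}\big]$. The sketch below (ours, purely illustrative; not code from the paper) computes $T_n$ for this choice of weight.

```python
import numpy as np

def symmetry_stat(x, mu=0.0):
    """T_n with Gaussian weight: (1/n) sum_{j,k} g(X_j - mu, X_k - mu),
    where g(a, b) = 0.5*(exp(-(a-b)^2/2) - exp(-(a+b)^2/2))
                  = int sin(ta) sin(tb) w(t) dt for the N(0,1)-density w."""
    a = x - mu
    d = a[:, None] - a[None, :]
    s = a[:, None] + a[None, :]
    g = 0.5 * (np.exp(-d ** 2 / 2) - np.exp(-s ** 2 / 2))
    return g.sum() / len(x)

rng = np.random.default_rng(3)
sym = symmetry_stat(rng.normal(size=400))                  # H0: symmetric about 0
asym = symmetry_stat(rng.exponential(size=400) - 1.0)      # skewed, mean 0
print(sym, asym)   # the skewed sample yields a much larger value
```

Note that $T_n \ge 0$ by construction, since it equals $n$ times the weighted integral of a squared imaginary part.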
4.2. Model-specification test. Let (Xk )k∈Z be a stationary real-valued nonlinear
autoregressive process with centered i.i.d. innovations (εk )k∈Z , i.e. Xk = g(Xk−1 )+εk .
Suppose that E|ε0 |4+δ < ∞ for some δ > 0 and that
g ∈ G := {f : R → R | f Lipschitz-continuous with Lip(f ) < 1}. Thus, the process
(Xk )k∈Z is τ -dependent with exponential rate, see Dedecker and Prieur [7], Example
4.2.
We will present a test for the problem
\[ H_0:\ P\big(E(X_1\,|\,X_0) = g_0(X_0)\big) = 1 \quad\text{vs.}\quad H_1:\ P\big(E(X_1\,|\,X_0) = g_0(X_0)\big) < 1 \]
with $g_0 \in \mathcal{G}$. For the sake of simplicity we stick to these small classes of functions $\mathcal{G}$ and processes $(X_k)_{k\in\mathbb{Z}}$. An extension to a more comprehensive variety of model-specification tests is investigated in a forthcoming paper, cf. Leucht [22].
Similar to Fan and Li [16] we propose the following test statistic
\[ T_n = \frac{1}{n\sqrt{h}}\sum_{j=1}^{n}\sum_{k\neq j} (X_j - g_0(X_{j-1}))(X_k - g_0(X_{k-1}))\,K\Big(\frac{X_{j-1}-X_{k-1}}{h}\Big) =: \frac{1}{n}\sum_{j=1}^{n}\sum_{k\neq j} H(Z_j, Z_k), \]
i.e. a kernel estimator of $E\big([X_1-g(X_0)]\,E(X_1-g(X_0)\,|\,X_0)\,f(X_0)\big) = 0$ multiplied by $n\sqrt{h}$. Here, $Z_k := (X_k, X_{k-1})'$, $k\in\mathbb{Z}$, and $f$ denotes the density of the distribution of $X_0$. Fan and Li [16], who considered β-mixing processes, used a similar test statistic with a vanishing bandwidth. In contrast, we consider the case of a fixed bandwidth. Such tests are more powerful against Pitman alternatives $g_{1,n}(x) = g_0(x) + n^{-\beta}w(x) + o(n^{-\beta})$, $\beta > 0$, $w\in\mathcal{G}$. For a detailed discussion of this topic see Ghosh and Huang [19].
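With a Gaussian smoothing kernel $K$ and a fixed bandwidth, $T_n$ can be computed directly; the following sketch (ours, with the illustrative choices $g_0(x) = 0.5x$ and $h = 0.5$) evaluates it for an AR(1) sample satisfying $H_0$.

```python
import numpy as np

def spec_stat(x, g0, h=0.5):
    """T_n = (n sqrt(h))^{-1} sum_{j != k} e_j e_k K((X_{j-1} - X_{k-1}) / h)
    with residuals e_t = X_t - g0(X_{t-1}) and a Gaussian smoothing kernel K."""
    lag, cur = x[:-1], x[1:]
    e = cur - g0(lag)
    u = (lag[:, None] - lag[None, :]) / h
    K = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    M = (e[:, None] * e[None, :]) * K
    np.fill_diagonal(M, 0.0)               # U-type statistic: j != k only
    return M.sum() / (len(e) * np.sqrt(h))

rng = np.random.default_rng(4)
x = np.empty(500)
x[0] = 0.0
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.normal()   # AR(1) satisfying H0 with g0(v) = 0.5 v
tn = spec_stat(x, g0=lambda v: 0.5 * v)
print(tn)   # O_p(1) under H0
```

Under an alternative regression function the residual products no longer center, and $T_n$ grows with $n$.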
Obviously, $T_n$ is degenerate. If we assume $K$ to be a bounded, even, and Lipschitz continuous function, then there exists a function $f:\mathbb{R}^8\to\mathbb{R}$ with $|H(z_1,z_2) - H(\bar z_1,\bar z_2)| \le f(z_1,\bar z_1,z_2,\bar z_2)\big(\|z_1-\bar z_1\|_{l^1} + \|z_2-\bar z_2\|_{l^1}\big)$ and such that (A4) is valid.
Moreover, under these conditions H satisfies (A2). Hence, the assertion of Theorem 2.2 holds true. In order to determine critical values of the test we propose the
bootstrap procedure given by Franke and Wendel [18] (without estimating the regression function). The bootstrap innovations $(\varepsilon_t^*)_t$ are drawn with replacement from the set $\{\tilde\varepsilon_t = \varepsilon_t - n^{-1}\sum_{k=1}^{n}\varepsilon_k\}_{t=1}^{n}$, where $\varepsilon_t = X_t - g_0(X_{t-1})$, $t = 1,\dots,n$. After choosing a starting value independently of $(\varepsilon_t^*)_{t\ge1}$, the bootstrap sample $X_t^* = g_0(X_{t-1}^*) + \varepsilon_t^*$ as well as the bootstrap counterpart of the test statistic $T_n^* = \frac{1}{n}\sum_{j=1}^{n}\sum_{k\neq j} H(Z_j^*, Z_k^*)$ with $Z_k^* = (X_k^*, X_{k-1}^*)'$, $k = 1,\dots,n$, can be computed. In contrast to the previous subsection the proposed bootstrap method leads to a degenerate kernel function. Obviously, the bootstrap sample is τ-dependent in the sense of (A1∗) and
satisfies E(|Xk∗ | | Z1 , . . . , Zn ) < C for some C < ∞ with probability tending to
one. By Theorem 1 of Diaconis and Freedman [12] we obtain stationarity of the
bootstrap sample. In order to verify convergence of the finite-dimensional distributions, we apply Lemma 4.2 of Neumann and Paparoditis [24]. The application of this result requires the convergence of the conditional distributions, i.e.
\[ \sup_{x\in K} d\big(P^{X_t^*\,|\,X_{t-1}^*=x},\, P^{X_t\,|\,X_{t-1}=x}\big) \xrightarrow{P} 0 \]
for every compact $K\subset\mathbb{R}$ and $d(P,Q) = \inf_{X\sim P,\,Y\sim Q} E(|X-Y|\wedge 1)$. In the present context this can be confirmed similarly to the proof of Lemma 4.1 of Neumann and Paparoditis [24]. Summing up, all prerequisites of Theorem 3.1 are satisfied. Hence, the asymptotic critical value of the above
test can be determined using the proposed model-based bootstrap procedure.
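The bootstrap procedure just described can be sketched as follows (our illustration; the statistic passed as `stat` is a placeholder — in the application above it would be the computation of $T_n^*$ — and drawing the starting value from the observed sample is merely one convenient choice):

```python
import numpy as np

def residual_bootstrap(x, g0, stat, B=200, rng=None):
    """Model-based bootstrap for X_t = g0(X_{t-1}) + eps_t:
    resample centered residuals, regenerate the chain, recompute the statistic."""
    if rng is None:
        rng = np.random.default_rng()
    eps = x[1:] - g0(x[:-1])
    eps = eps - eps.mean()                        # centered residuals eps~_t
    n = len(x)
    out = np.empty(B)
    for b in range(B):
        e = rng.choice(eps, size=n, replace=True)  # innovations drawn with replacement
        xs = np.empty(n)
        xs[0] = x[rng.integers(n)]                # start independently of (eps*_t)
        for t in range(1, n):
            xs[t] = g0(xs[t - 1]) + e[t]
        out[b] = stat(xs)
    return out

rng = np.random.default_rng(5)
x = np.empty(300)
x[0] = 0.0
for t in range(1, 300):
    x[t] = 0.5 * x[t - 1] + rng.normal()
boot = residual_bootstrap(x, g0=lambda v: 0.5 * v, stat=np.mean, B=100, rng=rng)
t_star = np.quantile(boot, 0.95)   # bootstrap critical value at level alpha = 0.05
```

The test then rejects when the observed statistic exceeds the $(1-\alpha)$-quantile of the bootstrap replicates.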
5. Proofs
5.1. Proofs of the main theorems. Throughout this section C denotes a positive,
finite generic constant.
Proof of Theorem 2.1. First the limit distribution of $n\,V_{n,c}^{(K,L)}$, defined before Lemma 2.2, is derived. Afterwards the asymptotic distributions of $n\,V_n$ and $n\,U_n$ are deduced by means of Lemma 2.1, Lemma 2.2, and the weak law of large numbers.
The following modified representation of $\tilde h_c^{(K,L)}$ will be useful in the sequel:
\[ \tilde h_c^{(K,L)}(x,y) = \sum_{k,l=1}^{M(K,L)} \gamma^{(c)}_{k,l}\,\tilde q_k(x)\,\tilde q_l(y), \]
where $(\tilde q_k \tilde q_l)_{k,l=1}^{M(K,L)}$ is an ordering of
\[ \bigcup_{k_1,k_2\in\{-L,\dots,L\}^d} \Big\{(\Phi_{j_0,k_1}\Phi_{j_0,k_2}) \cup \big(\Psi^{(e)}_{j;k_1,k_2}\big)_{e\in\bar E,\ j\in\{j_0,\dots,J(K)-1\}}\Big\} \]
and $\gamma^{(c)}_{k,l} = \gamma^{(c)}_{l,k}$, $k,l\in\{1,\dots,M(K,L)\}$, are the associated coefficients. Moreover, the introduction of $q_k(X_i) := \tilde q_k(X_i) - E\tilde q_k(X_i)$, $k\in\{1,\dots,M(K,L)\}$, $i\in\{1,\dots,n\}$, allows for the compact notation of $n\,V_{n,c}^{(K,L)}$,
\[ n\,V_{n,c}^{(K,L)} = \sum_{k,l=1}^{M(K,L)} \gamma^{(c)}_{k,l}\Bigg(\frac{1}{\sqrt n}\sum_{i=1}^{n} q_k(X_i)\Bigg)\Bigg(\frac{1}{\sqrt n}\sum_{j=1}^{n} q_l(X_j)\Bigg). \]
In order to derive its limit distribution, we first consider $\frac{1}{\sqrt n}\sum_{i=1}^{n}\big(q_1(X_i),\dots,q_{M(K,L)}(X_i)\big)'$. Due to the Cramér–Wold device it suffices to investigate $\sum_{k=1}^{M(K,L)} t_k\,\frac{1}{\sqrt n}\sum_{i=1}^{n} q_k(X_i)$ for all $(t_1,\dots,t_{M(K,L)})'\in\mathbb{R}^{M(K,L)}$. Asymptotic normality can be established by applying the CLT of Neumann and Paparoditis [24] to $Q_i := \sum_{k=1}^{M(K,L)} t_k\,q_k(X_i)$, $i = 1,\dots,n$. To this end the prerequisites of this tool have to be checked. Obviously, we are given a strictly stationary sequence of centered and bounded random variables. In conjunction with the dominated convergence theorem this implies that the Lindeberg condition is fulfilled. In order to show $\frac{1}{n}\operatorname{var}(Q_1+\dots+Q_n) \xrightarrow[n\to\infty]{} \sigma^2 := \operatorname{var}(Q_1) + 2\sum_{k=2}^{\infty}\operatorname{cov}(Q_1,Q_k)$, the validity of (A1) can be employed, which moreover assures the existence of $\sigma^2$. Then,
\begin{align*}
\Big|\frac{1}{n}\operatorname{var}(Q_1+\dots+Q_n) - \sigma^2\Big| &= \Big|\frac{2}{n}\sum_{r=2}^{n}(n-[r-1])\operatorname{cov}(Q_1,Q_r) - 2\sum_{k=2}^{\infty}\operatorname{cov}(Q_1,Q_k)\Big| \\
&\le 2\sum_{r=2}^{\infty}\min\Big(\frac{r-1}{n},\,1\Big)\,|\operatorname{cov}(Q_1,Q_r)| \\
&\le 4\,\|Q_1\|_\infty\operatorname{Lip}(Q_1)\sum_{r=2}^{\infty}\min\Big(\frac{r-1}{n},\,1\Big)\,\tau_{r-1},
\end{align*}
where the latter inequality follows from (2.4). The summability condition on the dependence coefficients in connection with Lebesgue's dominated convergence theorem yields the desired result. Since $Q_{t_1}Q_{t_2}$ forms a Lipschitz continuous function, inequality (6.4) of Neumann and Paparoditis [24] holds true with $\theta_r = \operatorname{Lip}(Q_{t_1}Q_{t_2})\,\tau_r$. It is easy to convince oneself that their condition (6.3) need not be checked if the involved random variables are uniformly bounded. Finally, we obtain
\[ n^{-1/2}(Q_1+\dots+Q_n) \xrightarrow{d} N(0,\sigma^2) \]
and hence
\[ n\,V_{n,c}^{(K,L)} \xrightarrow{d} Z_c^{(K,L)} := \sum_{k_1,k_2\in\{-L,\dots,L\}^d} \alpha^{(c)}_{j_0;k_1,k_2}\, Z_{k_1} Z_{k_2} + \sum_{j=j_0}^{J(K)-1}\ \sum_{k_1,k_2\in\{-L,\dots,L\}^d}\ \sum_{e=(e_1',e_2')'\in\bar E} \beta^{(c,e)}_{j;k_1,k_2}\, Z^{(e_1)}_{j;k_1} Z^{(e_2)}_{j;k_2}. \]
Here, $(Z_k)_{k\in\{-L,\dots,L\}^d}$ and $(Z^{(e)}_{j;k})_{j\in\{j_0,\dots,J(K)\},\, e\in\{0,1\}^d,\, k\in\{-L,\dots,L\}^d}$, respectively, are centered normally distributed random variables. Note that also $n\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big) \xrightarrow{d} Z_c^{(K,L_1)} - Z_c^{(K,L_2)}$. By Lemma 2.1 and Lemma 2.2 we have
\[ \lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2\,E\big(V_{n,c}^{(K,L)} - V_n\big)^2 = 0. \]
Hence, it remains to show that $\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty} E\big(Z_c^{(K,L)} - Z\big)^2 = 0$. To this end we first show that $(Z_c^{(K,L)})_L$ is a Cauchy sequence in $L_2(\Omega,\mathcal{A},P)$ which, due to completeness of $L_2$, implies $n\,V_{n,c}^{(K,L)} \xrightarrow{d} Z_c^{(K)} := L_2\text{-}\lim_{L\to\infty} Z_c^{(K,L)}$. According to Theorem 5.3 of Billingsley [3] we obtain $E\big(Z_c^{(K,L_1)} - Z_c^{(K,L_2)}\big)^2 \le \liminf_{n\to\infty} n^2\,E\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big)^2$. The second part of the proof of Lemma 2.2 implies $\liminf_{n\to\infty} n^2\,E\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big)^2 \xrightarrow[L_2\to\infty]{} 0$ for $L_1 > L_2$. Iterating this method in conjunction with Lemma 2.1 yields $\lim_{c\to\infty}\lim_{K\to\infty} E\big(Z - Z_c^{(K)}\big)^2 = 0$ and hence $\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty} E\big(Z_c^{(K,L)} - Z\big)^2 = 0$, which in turn leads to the desired limit distribution of $n\,V_n$.
Based on the result concerning $V$-type statistics, the limit distribution of $nU_n$ can be established. Since $U_n=\frac{n}{n-1}V_n-\frac{1}{n(n-1)}\sum_{i=1}^{n}h(X_i,X_i)$, it remains to verify that $\frac{1}{n-1}\sum_{i=1}^{n}h(X_i,X_i)\xrightarrow{P}Eh(X_1,X_1)$. According to Markov's inequality, we prove that $\operatorname{var}\big(\frac{1}{n-1}\sum_{i=1}^{n}h(X_i,X_i)\big)\xrightarrow[n\to\infty]{}0$. For this purpose we investigate $\operatorname{cov}(h(X_j,X_j),h(X_k,X_k))$ for $j<k$. Let $\tilde X_k\stackrel{d}{=}X_k$ be independent of $X_j$ with $E\|X_k-\tilde X_k\|_{l_1}=\tau_{k-j}$. The condition (A2) implies
$$\operatorname{cov}(h(X_j,X_j),h(X_k,X_k)) = E\big(h(X_j,X_j)\big[h(X_k,X_k)-h(\tilde X_k,\tilde X_k)\big]\big) \le C\tau^{\delta}_{k-j}\Big(E|h(X_j,X_j)|^{1/(1-\delta)}\big|h(X_k,X_k)-h(\tilde X_k,\tilde X_k)\big|\Big)^{1-\delta} \le C\tau^{\delta}_{k-j}.$$
Therefore, by (A1) we get with $\tau_0=[E(h(X_1,X_1))^2]^{1/\delta}$
$$\operatorname{var}\Big(\frac{1}{n-1}\sum_{i=1}^{n}h(X_i,X_i)\Big) \le \frac{C}{(n-1)^2}\sum_{r=0}^{n-1}(n-r)\tau^{\delta}_r \xrightarrow[n\to\infty]{} 0.$$
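The algebraic decomposition of $U_n$ in terms of $V_n$ and the diagonal terms, used at the start of this step, can be checked directly on simulated data. The kernel $h(x,y)=xy$ and the sample below are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
h = np.outer(x, x)  # illustrative degenerate kernel h(x, y) = x*y

# V-statistic: average over all index pairs; U-statistic: average over pairs i != j.
V = h.sum() / n**2
U = (h.sum() - np.trace(h)) / (n * (n - 1))

# Decomposition used in the proof: U_n = n/(n-1) V_n - (n(n-1))^{-1} sum_i h(X_i, X_i)
rhs = n / (n - 1) * V - np.trace(h) / (n * (n - 1))
assert np.isclose(U, rhs)
```

The identity is exact for every sample, which is why the proof only has to control the diagonal average $\frac{1}{n-1}\sum_i h(X_i,X_i)$.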
Proof of Theorem 2.2. On the basis of Lemma 2.3, similar arguments as in the proof of Theorem 2.1 yield $nV_n\xrightarrow{d}Z$. In order to obtain $nU_n\xrightarrow{d}Z-Eh(X_1,X_1)$, $\operatorname{var}\big(\frac{1}{n-1}\sum_{k=1}^{n}h(X_k,X_k)\big)\xrightarrow[n\to\infty]{}0$ has to be verified. To this end the approximation of $\operatorname{cov}(h(X_j,X_j),h(X_k,X_k))$ has to be modified:
$$\operatorname{cov}(h(X_j,X_j),h(X_k,X_k)) \le \tau^{\delta^2}_{k-j}\Big(2\,Ef(X_k,\tilde X_k,X_k,\tilde X_k)^{1-\delta}\big[\|X_k\|_{l_1}+\|\tilde X_k\|_{l_1}\big]\Big)^{\delta(1-\delta)} \times \Big(E|h(X_j,X_j)|^{1/(1-\delta)}\big|h(X_k,X_k)-h(\tilde X_k,\tilde X_k)\big|\Big)^{1-\delta} = O\big(\tau^{\delta^2}_{k-j}\big).$$
The summability assumption concerning the dependence coefficients in (A4) implies the desired convergence property.
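Both this proof and that of Theorem 2.1 rest on the expansion of the partial-sum variance of a stationary sequence in terms of its autocovariances. A numeric sketch with an AR(1)-type autocovariance $\gamma_r=\rho^r$ (an illustrative choice, not from the paper):

```python
import numpy as np

rho, n = 0.6, 50
gamma = rho ** np.arange(n)  # gamma[r] = cov(Q_1, Q_{1+r}), illustrative geometric decay

# Exact variance of Q_1 + ... + Q_n from the full covariance matrix.
idx = np.arange(n)
var_sum = gamma[np.abs(idx[:, None] - idx[None, :])].sum()

# Expansion used in the proofs: n*gamma_0 + 2*sum_{r=1}^{n-1} (n - r)*gamma_r.
expansion = n * gamma[0] + 2 * np.sum((n - np.arange(1, n)) * gamma[1:])
assert np.isclose(var_sum, expansion)

# var_sum / n approaches sigma^2 = gamma_0 + 2*sum_{r>=1} gamma_r as n grows.
sigma2 = 1 + 2 * rho / (1 - rho)
assert abs(var_sum / n - sigma2) < 0.2
```

The summability of the autocovariances is exactly what makes $\operatorname{var}(Q_1+\cdots+Q_n)/n$ converge to a finite $\sigma^2$.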
Proof of Theorem 3.1. Due to Lemma 3.2 it suffices to verify distributional convergence. To this end we introduce
$$\Omega^{\theta}_n \subseteq \Omega^{(1)}_n\cap\Omega^{(2)}_n\cap\Omega^{(3)}_n\cap\big\{\mathcal X_n \,\big|\, \|\hat\theta_n-\theta\|_{l_1}<\delta_n\big\}$$
such that for any sequence $(\omega_n)_{n\in\mathbb N}$ with $\omega_n\in\Omega^{\theta}_n$
$$\mathcal L\big((X^{*\prime}_{t_1},X^{*\prime}_{t_2})'\,\big|\,\mathcal X_n=\omega_n\big) = \mathcal L\big((X^{*\prime}_{t_1+k},X^{*\prime}_{t_2+k})'\,\big|\,\mathcal X_n=\omega_n\big),\quad k\in\mathbb N,$$
$$\mathcal L\big((X^{*\prime}_{t_1},X^{*\prime}_{t_2})'\,\big|\,\mathcal X_n=\omega_n\big) \Longrightarrow \mathcal L\big((X'_{t_1},X'_{t_2})'\big).$$
Moreover, the null sequence $(\delta_n)_{n\in\mathbb N}$ can be chosen such that on $\Omega^{\theta}_n$, $\hat\theta_n\in U(\theta)$ and $P(\mathcal X_n\in\Omega^{\theta}_n)\xrightarrow[n\to\infty]{}1$ holds. Hence, to prove $nV^{*}_n\xrightarrow{d}Z$, in probability, it suffices to verify $nV^{*}_n\xrightarrow{d}Z$ conditionally on $\mathcal X_n=\omega_n$ for any sequence $(\omega_n)_n$ with $\omega_n\in\Omega^{\theta}_n$. Now, we take an arbitrary sequence $(\omega_n)_n$ with $\omega_n\in\Omega^{\theta}_n$.
In order to show that it suffices to investigate statistics with bounded kernels, we consider the degenerate version $h^{*(c)}$ of
$$\tilde h^{*(c)}(x,y,\hat\theta_n) := \begin{cases} h(x,y,\hat\theta_n) & \text{for } x,y\in[-c,c]^d \text{ or } x,y\notin[-c,c]^d \text{ with } |h(x,y,\hat\theta_n)|\le c_h,\\ -c_h & \text{for } x,y\notin[-c,c]^d \text{ with } h(x,y,\hat\theta_n)<-c_h,\\ c_h & \text{for } x,y\notin[-c,c]^d \text{ with } h(x,y,\hat\theta_n)>c_h, \end{cases}$$
with $c_h:=\max\{c,\max_{x,y\in[-c,c]^d,\,\theta\in U(\theta)}|h(x,y,\theta)|\}$. The associated $V$-statistic is denoted by $V^{*}_{n,c}$. Now, imitating the proof of Lemma 2.1 results in
$$\limsup_{n\to\infty}n^2E\big[(V^{*}_n-V^{*}_{n,c})^2\,\big|\,\mathcal X_n=\omega_n\big]\le\varepsilon_c$$
with $\varepsilon_c\xrightarrow[c\to\infty]{}0$. Next we approximate the bounded kernel by the degenerate version of
$$\tilde h^{*(K,L)}_c := \sum_{k_1,k_2\in\{-L,\ldots,L\}^d}\hat\alpha^{(c)}_{j_0;k_1,k_2}\Phi_{j_0,k_1}\Phi_{j_0,k_2} + \sum_{j=j_0}^{J(K)-1}\;\sum_{k_1,k_2\in\{-L,\ldots,L\}^d}\;\sum_{e\in\bar E}\hat\beta^{(c,e)}_{j;k_1,k_2}\Psi^{(e)}_{j;k_1,k_2},$$
where $\hat\alpha^{(c)}_{j_0;k_1,k_2}=\iint_{\mathbb R^d\times\mathbb R^d}h^{*(c)}(x,y,\hat\theta_n)\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y)\,dx\,dy$ and $\hat\beta^{(c,e)}_{j;k_1,k_2}=\iint_{\mathbb R^d\times\mathbb R^d}h^{*(c)}(x,y,\hat\theta_n)\Psi^{(e)}_{j;k_1,k_2}(x,y)\,dx\,dy$. Denoting the associated $V$-statistic by $\hat V^{*(K,L)}_{n,c}$ leads to
$$\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty}n^2E\big[(V^{*}_{n,c}-\hat V^{*(K,L)}_{n,c})^2\,\big|\,\mathcal X_n=\omega_n\big]=0.$$
This conjecture can be proven by following the lines of the proof of Lemma 2.3. Here, $J(K)$ is chosen as follows: Since the Portmanteau theorem implies $\limsup_{n\to\infty}P(X^{*}_1\notin[-b,b]^d\mid\mathcal X_n=\omega_n)\le P(X_1\notin(-b,b)^d)$, we first select some $b=b(K)<\infty$ such that $P(X_1\notin(-b,b)^d)\le1/K$. Afterwards we choose $J(K)$ such that $\max_{x,y\in[-b,b]^d}|h^{*(c)}(x,y,\hat\theta_n)-\tilde h^{*(K)}_c(x,y,\hat\theta_n)|\le1/K$ and $S_\phi/2^{J(K)}<a$, where $S_\phi$ denotes the length of the support of the scale function $\phi$. Because of the continuity assumptions on $f$ the index $J(K)$ can be determined independently of $n$ on $(\Omega^{\theta}_n)_n$,
cf. proof of Lemma 5.2. Obviously,
$$\hat\alpha^{(c)}_{j_0;k_1,k_2}\xrightarrow[n\to\infty]{}\alpha^{(c)}_{j_0;k_1,k_2}:=\iint_{\mathbb R^d\times\mathbb R^d}h^{*(c)}(x,y,\theta)\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y)\,dx\,dy,$$
$$\hat\beta^{(c,e)}_{j;k_1,k_2}\xrightarrow[n\to\infty]{}\beta^{(c,e)}_{j;k_1,k_2}:=\iint_{\mathbb R^d\times\mathbb R^d}h^{*(c)}(x,y,\theta)\Psi^{(e)}_{j;k_1,k_2}(x,y)\,dx\,dy$$
on $(\Omega^{\theta}_n)_n$. Hence,
$$n^2E\big[(\hat V^{*(K,L)}_{n,c}-V^{*(K,L)}_{n,c})^2\,\big|\,\mathcal X_n=\omega_n\big]\xrightarrow[n\to\infty]{}0,$$
where the kernel of $V^{*(K,L)}_{n,c}$ is obtained by substituting $\hat\alpha^{(c)}_{j_0;k_1,k_2}$ and $\hat\beta^{(c,e)}_{j;k_1,k_2}$ in the kernel of $\hat V^{*(K,L)}_{n,c}$ through $\alpha^{(c)}_{j_0;k_1,k_2}$ and $\beta^{(c,e)}_{j;k_1,k_2}$.

Thus, the next step is the application of the CLT of Neumann and Paparoditis [24] to $nV^{*(K,L)}_{n,c}$. For this purpose we introduce $Q^{*}_i:=\sum_{k=1}^{M(K,L)}t_kq_k(X^{*}_i)$, $t_1,\ldots,t_{M(K,L)}\in\mathbb R$, where $q_k$ is the centered version (w.r.t. $P^{X^{*}_1\mid\mathcal X_n=\omega_n}$) of $\tilde q_k$, and $(\tilde q_k)_k$ is defined as in the proof of Theorem 2.1. Obviously, given $X_1,\ldots,X_n$, the sequence $(Q^{*}_i)_i$ is centered and has uniformly bounded second moments. Due to (A1$^{*}$)(i) the Lindeberg condition is satisfied. In order to show that for arbitrary $\varepsilon>0$ the inequality $\big|\frac{1}{n}\operatorname{var}(Q^{*}_1+\cdots+Q^{*}_n\mid\mathcal X_n=\omega_n)-\sigma^2\big|<\varepsilon$, $\forall\,n\ge n_0(\varepsilon)$, holds true with $\sigma^2=\operatorname{var}(Q_1)+2\sum_{r=2}^{\infty}\operatorname{cov}(Q_1,Q_r)$ and $(Q_k)_k$ as in the proof of Theorem 2.1, the abbreviations $\operatorname{var}^{*}(\cdot)=\operatorname{var}(\cdot\mid\mathcal X_n=\omega_n)$ and $\operatorname{cov}^{*}(\cdot)=\operatorname{cov}(\cdot\mid\mathcal X_n=\omega_n)$ are used.
Hence,
$$\Big|\frac{1}{n}\operatorname{var}^{*}[Q^{*}_1+\cdots+Q^{*}_n]-\sigma^2\Big| \le 2\sum_{r=2}^{\infty}\min\Big\{\frac{r-1}{n},\,1\Big\}\,|\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)| + \Big|\operatorname{var}^{*}(Q^{*}_1)+2\sum_{r=2}^{\infty}\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)-\sigma^2\Big|$$
$$\le 2\sum_{r=2}^{\infty}\min\Big\{\frac{r-1}{n},\,1\Big\}\,|\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)| + 2\,\Big|\sum_{r=1}^{R-1}\big[\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)-\operatorname{cov}(Q_1,Q_r)\big]\Big| + 2\,\Big|\sum_{r\ge R}\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)\Big| + 2\,\Big|\sum_{r\ge R}\operatorname{cov}(Q_1,Q_r)\Big|.$$
By (A1) and (A1$^{*}$), $R$ can be chosen such that $\sum_{r\ge R}\operatorname{cov}(Q_1,Q_r)+\sum_{r\ge R}\operatorname{cov}^{*}(Q^{*}_1,Q^{*}_r)\le\frac{\varepsilon}{4}$. Moreover, (A1$^{*}$) implies that the first summand can be bounded from above by $\frac{\varepsilon}{4}$ as well, if $n\ge n_0(\varepsilon)$ for some $n_0(\varepsilon)\in\mathbb N$. According to the convergence of the two-dimensional distributions and the uniform boundedness of $(Q^{*}_k)_{k\in\mathbb Z}$ it is possible to pick $n_0(\varepsilon)$ such that additionally the two remaining summands are bounded by $\frac{\varepsilon}{8}$. For the validity of the CLT of Neumann and Paparoditis [24] in probability, it remains to verify their inequality (6.4). By Lipschitz continuity of $Q^{*}_{t_1}Q^{*}_{t_2}$ this holds with $\theta_r=\operatorname{Lip}(Q^{*}_{t_1}Q^{*}_{t_2})\bar\tau_r=\operatorname{Lip}(Q_{t_1}Q_{t_2})\bar\tau_r$. The application of the continuous mapping theorem results in $nV^{*(K,L)}_{n,c}\xrightarrow{d}Z^{(K,L)}_c$, in probability, which in turn implies $nV^{*}_n\xrightarrow{d}Z$.
In order to obtain the analogous result of convergence for $nU^{*}_n$, additionally,
$$P\Big(\Big|\frac{1}{n-1}\sum_{i=1}^{n}h(X^{*}_i,X^{*}_i,\hat\theta_n)-Eh(X_1,X_1,\theta)\Big|>\varepsilon\;\Big|\;\mathcal X_n=\omega_n\Big)\xrightarrow[n\to\infty]{}0$$
has to be proven for arbitrary $\varepsilon>0$. Due to the continuity of $h$ and according to (A1$^{*}$)(i) and (A2$^{*}$)(i) it suffices to verify
$$P\Big(\Big|\frac{1}{n-1}\sum_{k=1}^{n}\big[h(X^{*}_k,X^{*}_k,\hat\theta_n)-E\big(h(X^{*}_1,X^{*}_1,\hat\theta_n)\,\big|\,\mathcal X_n=\omega_n\big)\big]\Big|>\frac{\varepsilon}{2}\;\Big|\;\mathcal X_n=\omega_n\Big)\xrightarrow[n\to\infty]{}0.$$
The l.h.s. tends to zero if
$$E\Big(\Big[\frac{1}{n-1}\sum_{k=1}^{n}\big(h(X^{*}_k,X^{*}_k,\hat\theta_n)-E\big(h(X^{*}_1,X^{*}_1,\hat\theta_n)\,\big|\,\mathcal X_n=\omega_n\big)\big)\Big]^2\;\Big|\;\mathcal X_n=\omega_n\Big)$$
vanishes asymptotically. This can be proven using the same arguments as in the proof of Theorem 2.2 for verifying the convergence of $U$-statistics. Bootstrap consistency follows from Lemma 3.2.
5.2. Proofs of auxiliary results. In order to prove Lemma 2.1, Lemma 2.2, and Lemma 2.3 an approximation of terms of the structure
$$Z_n := \frac{1}{n^2}\sum_{i,j,k,l=1}^{n}EH(X_i,X_j)H(X_k,X_l)$$
will be required. Here, $H$ denotes a symmetric, degenerate kernel function. Assuming that $(X_n)_{n\in\mathbb N}$ satisfies (A1) we obtain
$$Z_n \le \frac{8}{n^2}\sum_{i\le j;\,k\le l;\,i\le k}^{n}\big|EH(X_i,X_j)H(X_k,X_l)\big| \le 8\sup_{1\le k\le n}E|H(X_1,X_k)|^2 + \frac{8}{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^{4}Z^{(t)}_{n,r}$$
with
$$Z^{(1)}_{n,r} = \sum_{\substack{i\le j,\;k\le l\\ r:=\min\{j,k\}-i\ge l-\max\{j,k\}}}^{n}\big|EH(X_i,X_j)H(X_k,X_l)-EH(X_i,\tilde X_j)H(\tilde X_k,\tilde X_l)\big|,$$
$$Z^{(2)}_{n,r} = \sum_{\substack{i\le j,\;k\le l\\ r:=l-\max\{j,k\}>\min\{j,k\}-i}}^{n}\big|EH(X_i,X_j)H(X_k,X_l)-EH(X_i,X_j)H(X_k,\tilde X_l)\big|,$$
$$Z^{(3)}_{n,r} = \sum_{\substack{i\le k\le l<j\\ r:=i-k\ge j-l}}^{n}\big|EH(X_i,X_j)H(X_k,X_l)-EH(X_i,\tilde X_j)H(\tilde X_k,\tilde X_l)\big|,$$
$$Z^{(4)}_{n,r} = \sum_{\substack{i\le k\le l<j\\ r:=j-l>i-k}}^{n}\big|EH(X_i,X_j)H(X_k,X_l)-EH(X_i,\tilde X_j)H(X_k,X_l)\big|.$$
Here, in each summand of $Z^{(1)}_{n,r}$ and $Z^{(3)}_{n,r}$ the vector $(\tilde X'_j,\tilde X'_k,\tilde X'_l)'$ is chosen such that it is independent of $X_i$, $(\tilde X'_j,\tilde X'_k,\tilde X'_l)'\stackrel{d}{=}(X'_j,X'_k,X'_l)'$ and (2.3) holds. Within $Z^{(2)}_{n,r}$ (respectively $Z^{(4)}_{n,r}$) the random variable $\tilde X^{(r)}_l$ (respectively $\tilde X^{(r)}_j$) is chosen to be independent of $(X'_i,X'_j,X'_k)'$ (respectively $(X'_i,X'_k,X'_l)'$) such that $\tilde X^{(r)}_l\stackrel{d}{=}X_l$ (respectively $\tilde X^{(r)}_j\stackrel{d}{=}X_j$) and (2.3) holds. (This may possibly require an enlargement of the underlying probability space.) Thus, by degeneracy the subtrahends of these expressions vanish. Moreover, note that the number of summands of $Z^{(t)}_{n,r}$, $t=1,\ldots,4$, is bounded by $2(r+1)n^2$.
Proof of Lemma 2.1. For $c>0$ we define $c_h:=\max\{c,\max_{x,y\in[-c,c]^d}|h(x,y)|\}$,
$$\tilde h^{(c)}(x,y) := \begin{cases} h(x,y) & \text{for } (x',y')'\in[-c,c]^{2d} \text{ or } (x',y')'\notin[-c,c]^{2d} \text{ with } |h(x,y)|\le c_h,\\ -c_h & \text{for } (x',y')'\notin[-c,c]^{2d} \text{ with } h(x,y)<-c_h,\\ c_h & \text{for } (x',y')'\notin[-c,c]^{2d} \text{ with } h(x,y)>c_h \end{cases}$$
and its degenerate version
$$h_c(x,y) := \tilde h^{(c)}(x,y) - \int_{\mathbb R^d}\tilde h^{(c)}(x,y)\,dF(x) - \int_{\mathbb R^d}\tilde h^{(c)}(x,y)\,dF(y) + \iint_{\mathbb R^d\times\mathbb R^d}\tilde h^{(c)}(x,y)\,dF(x)\,dF(y).$$
The approximation error $n^2E(V_n-V_{n,c})^2$ can be reformulated in terms of $Z_n$ with kernel $H=H^{(c)}:=h-h_c$. Hence, it remains to verify that $\sup_{k\in\mathbb N}E|H^{(c)}(X_1,X_k)|^2$ and $\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^{4}Z^{(t)}_{n,r}$ tend to zero as $c\to\infty$. First, we investigate $\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$; the remaining summands can be treated similarly. The summands of $Z^{(1)}_{n,r}$ can be bounded as follows
$$\big|EH^{(c)}(X_i,X_j)H^{(c)}(X_k,X_l)-EH^{(c)}(X_i,\tilde X_j)H^{(c)}(\tilde X_k,\tilde X_l)\big|$$
$$\le E\Big[\big|H^{(c)}(X_k,X_l)\big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbf 1_{(X'_k,X'_l)'\in[-c,c]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(c)}(X_k,X_l)\big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbf 1_{(X'_k,X'_l)'\notin[-c,c]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(c)}(X_i,\tilde X_j)\big|\,\big|H^{(c)}(X_k,X_l)-H^{(c)}(\tilde X_k,\tilde X_l)\big|\,\mathbf 1_{(X'_i,\tilde X'_j)'\in[-c,c]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(c)}(X_i,\tilde X_j)\big|\,\big|H^{(c)}(X_k,X_l)-H^{(c)}(\tilde X_k,\tilde X_l)\big|\,\mathbf 1_{(X'_i,\tilde X'_j)'\notin[-c,c]^{2d}}\Big]$$
$$= E_1+E_2+E_3+E_4. \tag{5.1}$$
The Lipschitz constant of $H^{(c)}$ is obviously independent of $c$. Therefore, the iterative application of Hölder's inequality to $E_2$ yields
$$E_2 \le \Big(E\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\Big)^{\delta}\Big(E|H^{(c)}(X_k,X_l)|^{1/(1-\delta)}\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbf 1_{(X'_k,X'_l)'\notin[-c,c]^{2d}}\Big)^{1-\delta}$$
$$\le C\tau^{\delta}_r\Big(E|H^{(c)}(X_k,X_l)|^{(2-\delta)/(1-\delta)}\mathbf 1_{(X'_k,X'_l)'\notin[-c,c]^{2d}}\Big)^{1/(2-\delta)}\Big\{\big(E|H^{(c)}(X_i,X_j)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)}+\big(E|H^{(c)}(X_i,\tilde X_j)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)}\Big\}^{1-\delta}$$
$$\le C\tau^{\delta}_r\Big(E|H^{(c)}(X_k,X_l)|^{(2-\delta)/(1-\delta)}\mathbf 1_{(X'_k,X'_l)'\notin[-c,c]^{2d}}\Big)^{(1-\delta)/(2-\delta)}. \tag{5.2}$$
As $\sup_{k\in\mathbb N}E|h(X_1,X_k)|^{\nu}<\infty$ for $\nu>(2-\delta)/(1-\delta)$, we obtain $E_2\le\tau^{\delta}_r\varepsilon_1(c)$ with $\varepsilon_1(c)\xrightarrow[c\to\infty]{}0$ after employing Hölder's inequality once again. Analogous calculations yield $E_4\le\tau^{\delta}_r\varepsilon_2(c)$ with $\varepsilon_2(c)\xrightarrow[c\to\infty]{}0$. Likewise, the approximation methods for $E_1$ and $E_3$ are equal. Therefore, only $E_1$ is investigated:
$$E_1 \le E\Big[\Big|\int_{\mathbb R^d}\tilde h^{(c)}(X_k,y)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbf 1_{X_k\in[-c,c]^d}\Big]$$
$$+ E\Big[\Big|\int_{\mathbb R^d}\tilde h^{(c)}(y,X_l)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbf 1_{X_l\in[-c,c]^d}\Big]$$
$$+ E\Big[\Big|\iint_{\mathbb R^d\times\mathbb R^d}\tilde h^{(c)}(x,y)\,dF(x)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\Big]$$
$$= E_{1,1}+E_{1,2}+E_{1,3}.$$
Analogously to (5.2) we obtain
$$E_{1,1} \le C\tau^{\delta}_r\Big\{E\Big(\Big|\int_{\mathbb R^d}\big[h(X_k,y)-\tilde h^{(c)}(X_k,y)\big]\,dF(y)\Big|^{(2-\delta)/(1-\delta)}\mathbf 1_{X_k\in[-c,c]^d}\Big)\Big\}^{1/(2-\delta)}$$
$$\times\Big\{2\Big(\sup_{k\in\mathbb N}E|H^{(c)}(X_0,X_k)|^{(2-\delta)/(1-\delta)}+E|H^{(c)}(X_i,\tilde X_j)|^{(2-\delta)/(1-\delta)}\Big)^{(1-\delta)/(2-\delta)}\Big\}^{1-\delta}$$
$$\le C\tau^{\delta}_r\Big(\iint_{\mathbb R^d\times\mathbb R^d}\big|h(x,y)-\tilde h^{(c)}(x,y)\big|^{(2-\delta)/(1-\delta)}\,dF(y)\,\mathbf 1_{x\in[-c,c]^d}\,dF(x)\Big)^{(1-\delta)/(2-\delta)} \le \tau^{\delta}_r\,\varepsilon_3(c)$$
with $\varepsilon_3(c)\xrightarrow[c\to\infty]{}0$. The investigation of $E_{1,2}$ coincides with the previous one. The expression $E_{1,3}$ can be approximated as follows
$$E_{1,3} \le C\tau_r\iint_{\mathbb R^d\times\mathbb R^d}\big|h(x,y)-\tilde h^{(c)}(x,y)\big|\,dF(x)\,dF(y) \le C\tau_r\iint_{\mathbb R^d\times\mathbb R^d}|h(x,y)|\,\mathbf 1_{(x',y')'\notin[-c,c]^{2d}}\,dF(x)\,dF(y) \le \tau_r\,\varepsilon_4(c)\xrightarrow[c\to\infty]{}0.$$
To sum up, we have $E_1+E_2+E_3+E_4\le\varepsilon_5(c)\tau^{\delta}_r$, where $\varepsilon_5(c)\xrightarrow[c\to\infty]{}0$ uniformly in $n$. This leads to
$$\lim_{c\to\infty}\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r} \le \lim_{c\to\infty}\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}2(r+1)n^2\tau^{\delta}_r\,\varepsilon_5(c) = 0.$$
It remains to examine
$$\sup_{k\in\mathbb N}E\big[H^{(c)}(X_1,X_k)\big]^2 \le C\Big(\sup_{k\in\mathbb N}E\big[h(X_1,X_k)-\tilde h^{(c)}(X_1,X_k)\big]^2 + E\big[h(X_1,\tilde X_1)-\tilde h^{(c)}(X_1,\tilde X_1)\big]^2\Big).$$
Here, $\tilde X_1$ denotes an independent copy of $X_1$. Similar arguments as before yield $\lim_{c\to\infty}\sup_{k\in\mathbb N}E\big[H^{(c)}(X_1,X_k)\big]^2=0$.
The characteristics, stated in the following two lemmas, will be essential for a
wavelet approximation of the kernel function h.
Lemma 5.1. Given a Lipschitz continuous function $g:\mathbb R^d\to\mathbb R$, define a function $g_j$ by $g_j(x):=\sum_{k\in\mathbb Z^d}\alpha_{j,k}\Phi_{j,k}(x)$, $j\in\mathbb Z$, where $\alpha_{j,k}=\int_{\mathbb R^d}g(x)\Phi_{j,k}(x)\,dx$. Then $g_j$ is Lipschitz continuous with a constant that is independent of $j$.
Proof of Lemma 5.1. In order to establish Lipschitz continuity, $g_j$ is decomposed into two parts
$$g_j(x) = \sum_{k\in\mathbb Z^d}\Big(\int_{\mathbb R^d}\Phi_{j,k}(u)g(x)\,du\Big)\Phi_{j,k}(x) + \sum_{k\in\mathbb Z^d}\Big(\int_{\mathbb R^d}\Phi_{j,k}(u)\big[g(u)-g(x)\big]\,du\Big)\Phi_{j,k}(x) = H_1(x)+H_2(x).$$
According to the above choice of the scale function (with characteristics (1)-(3) of Subsection 2.2) the prerequisites of Theorem 8.2 of Härdle et al. [20] are fulfilled for $N=1$. This implies that $\sum_{l\in\mathbb Z}\int_{-\infty}^{\infty}\phi(y-l)\phi(z-l)\,dz=1$, $\forall\,y\in\mathbb R$. Based on this result we obtain
$$H_1(x) = g(x)\,2^{jd}\prod_{i=1}^{d}\int_{-\infty}^{\infty}\sum_{l\in\mathbb Z}\phi(2^ju_i-l)\,\phi(2^jx_i-l)\,du_i = g(x)$$
by applying an appropriate variable substitution. This in turn immediately implies
the desired continuity property. Note that for every fixed x the sum has finitely many
non-vanishing summands because of the finite support of φ and since the number of
summands is independent of j. Therefore, the order of summation and integration is
interchangeable. In order to investigate $H_2$, we define a sequence of functions $(\kappa_k)_{k\in\mathbb Z^d}$ by
$$\kappa_k(x) = \int_{\mathbb R^d}\Phi_{j,k}(u)\big[g(u)-g(x)\big]\,du.$$
These functions are Lipschitz continuous with a constant decreasing in $j$:
$$|\kappa_k(x)-\kappa_k(\bar x)| \le \operatorname{Lip}(g)\,O\big(2^{-jd/2}\big)\,\|x-\bar x\|_{l_1}. \tag{5.3}$$
Moreover, boundedness and Lipschitz continuity of $\phi$ yield
$$\|\Phi_{j,k}\|_{\infty} = O\big(2^{jd/2}\big) \quad\text{and}\quad |\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| = O\big(2^{j(d/2+1)}\big)\,\|x-\bar x\|_{l_1}. \tag{5.4}$$
Thus,
$$|H_2(x)-H_2(\bar x)| \le \sum_{k\in\mathbb Z^d}|\Phi_{j,k}(x)|\,|\kappa_k(x)-\kappa_k(\bar x)| + \sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)|$$
$$\le C\|x-\bar x\|_{l_1} + \sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)|.$$
It has to be distinguished whether or not $\bar x\in\operatorname{supp}\Phi_{j,k}$ in order to approximate the second summand. (Here, $\operatorname{supp}$ denotes the support of a function.) In the first case it is helpful to investigate $|\kappa_k(\bar x)|=\big|\int_{\mathbb R^d}\Phi_{j,k}(u)[g(u)-g(\bar x)]\,du\big|$. The integrand is non-trivial only if $u\in\operatorname{supp}\Phi_{j,k}$. In these situations, $|g(u)-g(\bar x)|=O(2^{-j})$ by Lipschitz continuity. Consequently, we achieve
$$|\kappa_k(\bar x)| \le O(2^{-j})\int_{\mathbb R^d}|\Phi_{j,k}(u)|\,du = O\big(2^{-j(d/2+1)}\big),$$
which leads to
$$\sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| = O\big(\|x-\bar x\|_{l_1}\big)$$
since the number of non-vanishing summands is finite, independently of the values of
x and x̄. Therefore, Lipschitz continuity of H2 is obtained as long as x̄ ∈ supp Φj,k .
In the opposite case, we only have to consider the situation of $x\in\operatorname{supp}\Phi_{j,k}$ since the setting $\bar x,x\notin\operatorname{supp}\Phi_{j,k}$ is trivial. With the aid of (5.3) and (5.4), the first term of the r.h.s. of
$$\big|\kappa_k(\bar x)\,[\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)]\big| \le |\kappa_k(\bar x)-\kappa_k(x)|\,|\Phi_{j,k}(x)| + |\kappa_k(x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| \tag{5.5}$$
can be approximated from above by $C\|x-\bar x\|_{l_1}$. The investigation of the second summand is identical to the analysis of the case $\bar x\in\operatorname{supp}\Phi_{j,k}$. Finally, we obtain $|H_2(x)-H_2(\bar x)|\le C\|x-\bar x\|_{l_1}$, where $C<\infty$ is a constant that is independent of $j$. This yields the assertion of the lemma.
Lemma 5.2. Let $g:\mathbb R^d\to\mathbb R$ be a continuous function that is Lipschitz continuous on some interval $(-c,c)^d$, $c>S_\phi$, where $S_\phi$ denotes the length of the support of the scale function $\phi$. For arbitrary $0<b<c-S_\phi$ and $K\in\mathbb N$ there exists $J=J(K,b)\in\mathbb N$ such that for $g$ and its approximation $g^{(K,b)}$ given by $g^{(K,b)}(x)=\sum_{k\in\mathbb Z^d}\alpha_{J(K,b),k}\Phi_{J(K,b),k}(x)$ it holds
$$\max_{x\in[-b,b]^d}\big|g(x)-g^{(K,b)}(x)\big| \le 1/K.$$
Proof of Lemma 5.2. Given $0<b<\infty$, we define $\bar g^{(b)}(x):=g(x)w_b(x)$, where $w_b$ is a Lipschitz continuous weight function with compact support. Moreover, $w_b$ is assumed to be bounded from above by 1 and $w_b(x):=1$ for $x\in[-b-S_\phi,b+S_\phi]^d$. Additionally, we set $\alpha^{(b)}_{J(K,b),k}:=\int_{\mathbb R^d}\bar g^{(b)}(u)\Phi_{J(K,b),k}(u)\,du$. Hence,
$$\max_{x\in[-b,b]^d}\big|g(x)-g^{(K,b)}(x)\big| \le \max_{x\in[-b,b]^d}\Big|\bar g^{(b)}(x)-\sum_{k\in\mathbb Z^d}\alpha^{(b)}_{J(K,b),k}\Phi_{J(K,b),k}(x)\Big| + \max_{x\in[-b,b]^d}\Big|\sum_{k\in\mathbb Z^d}\alpha^{(b)}_{J(K,b),k}\Phi_{J(K,b),k}(x)-g^{(K,b)}(x)\Big|$$
$$= \max_{x\in[-b,b]^d}A^{(J)}(x) + \max_{x\in[-b,b]^d}B^{(J)}(x).$$
Since $\bar g^{(b)}\in L_2(\mathbb R^d)$,
$$A^{(J)} = \Big|\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}\Big| \qquad\text{with}\qquad \beta^{(b,e)}_{j,k} = \int_{\mathbb R^d}\bar g^{(b)}(u)\Psi^{(e)}_{j,k}(u)\,du$$
holds true in the $L_2$-sense. Lemma 5.1 implies equicontinuity of $(A^{(J)})_{J\in\mathbb N}$ on $[-b,b]^d$. Therefore, the above equality is valid pointwise on $[-b,b]^d$ if additionally $\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$ is continuous on $[-b,b]^d$. This can be proven by verifying uniform convergence of $\sum_{j=J(K,b)}^{N}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$ to $\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$. For all $x\in[-b,b]^d$, the number of $k\in\mathbb Z^d$ with $\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x)\neq0$ is bounded
uniformly in $j$. Furthermore, $\|\Psi^{(e)}_{j,k}\|_{\infty}=O(2^{jd/2})$, $\forall\,j\in\mathbb Z$, $k\in\mathbb Z^d$, $e\in E$. As $\bar g^{(b)}$ is Lipschitz continuous and $\int_{-\infty}^{\infty}\psi(x)\,dx=0$, we obtain
$$\big|\beta^{(b,e)}_{j,k}\big| \le \int_{\mathbb R^d}\big|\bar g^{(b)}(u)-\bar g^{(b)}(x)\big|\,\big|\Psi^{(e)}_{j,k}(u)\big|\,du = O\big(2^{-j(d/2+1)}\big), \quad\forall\,j\ge j_0(b)$$
for some $j_0(b)\in\mathbb Z$ if $\Psi^{(e)}_{j,k}(x)\neq0$ for some $x\in[-b,b]^d$. Finally, we end up with
$$\max_{x\in[-b,b]^d}\Big|\sum_{j=N+1}^{\infty}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x)\Big| \le C\sum_{j=N+1}^{\infty}2^{-j}\xrightarrow[N\to\infty]{}0 \tag{5.6}$$
and consequently,
$$\max_{x\in[-b,b]^d}A^{(J)}(x) = \max_{x\in[-b,b]^d}\Big|\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x)\Big|.$$
Moreover, the inequality $\max_{x\in[-b,b]^d}A^{(J)}(x)\le\frac{1}{K}$, $\forall\,J(K,b)\ge J_0(K,b)$, for some $J_0(K,b)\in\mathbb N$ results from (5.6). The introduction of the finite set of indices
$$\bar Z(J) := \big\{k\in\mathbb Z^d \;\big|\; \Phi_{J(K,b),k}(x)\neq0 \text{ for some } x\in[-b,b]^d\big\}$$
leads to
$$\max_{x\in[-b,b]^d}B^{(J)}(x) = \max_{x\in[-b,b]^d}\Big|\sum_{k\in\bar Z(J)}\big(\alpha_{J(K,b),k}-\alpha^{(b)}_{J(K,b),k}\big)\Phi_{J(K,b),k}(x)\Big|.$$
This term is equal to 0 since the definition of $\bar g^{(b)}$ implies $\alpha_{J(K,b),k}-\alpha^{(b)}_{J(K,b),k}=0$, $\forall\,k\in\bar Z(J)$.
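The approximation rate behind Lemma 5.2 can be illustrated numerically. The sketch below uses the Haar scale function, so that the level-$J$ projection is a simple local average; the lemma's assumptions require a Lipschitz $\phi$, so this is only an analogue of the statement, with $g(x)=|x|$ as an illustrative Lipschitz function:

```python
import numpy as np

def haar_projection(g, x, J, m=64):
    """Level-J Haar projection: average of g over the dyadic cell containing x."""
    k = np.floor(x * 2.0**J)
    # sample points inside each cell [k 2^{-J}, (k+1) 2^{-J}) for the cell average
    u = (k[:, None] + (np.arange(m)[None, :] + 0.5) / m) / 2.0**J
    return g(u).mean(axis=1)

g = np.abs                      # Lipschitz with constant 1
J = 8                           # resolution level, sup-error is O(2^-J)
xs = np.linspace(-1.0, 1.0, 2001)
err = np.max(np.abs(g(xs) - haar_projection(g, xs, J)))

# Lipschitz continuity gives a sup-error of at most Lip(g) * 2^{-J} per cell,
# so any accuracy 1/K is reachable by choosing J = J(K) large enough.
assert err <= 2.0**-J
```

This mirrors how $J(K)$ is selected in the proofs: the accuracy $1/K$ on a compact set dictates the resolution level.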
Proof of Lemma 2.2. The assertion of the lemma is verified in two steps. First, the bounded kernel $h_c$, constructed in the proof of Lemma 2.1, is approximated by $\tilde h^{(K)}_c$ which is defined by $\tilde h^{(K)}_c(x,y)=\sum_{k_1,k_2\in\mathbb Z^d}\alpha^{(c)}_{J(K);k_1,k_2}\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)$ with $\alpha^{(c)}_{J(K);k_1,k_2}=\iint_{\mathbb R^d\times\mathbb R^d}h_c(x,y)\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)\,dx\,dy$. Here, the indices $(J(K))_{K\in\mathbb N}$ with $J(K)\xrightarrow[K\to\infty]{}\infty$ are chosen such that the assertion of Lemma 5.2 holds true for $b\in\mathbb R$ with $P(X_1\notin[-b,b]^d)\le\frac{1}{K}$. Since the function $\tilde h^{(K)}_c$ is not degenerate in general, we introduce its degenerate counterpart
$$h^{(K)}_c(x,y) = \tilde h^{(K)}_c(x,y) - \int_{\mathbb R^d}\tilde h^{(K)}_c(x,y)\,dF(x) - \int_{\mathbb R^d}\tilde h^{(K)}_c(x,y)\,dF(y) + \iint_{\mathbb R^d\times\mathbb R^d}\tilde h^{(K)}_c(x,y)\,dF(x)\,dF(y)$$
and denote the corresponding $V$-statistic by $V^{(K)}_{n,c}$.

Now, the structure of the proof is as follows. First, we prove
$$\limsup_{n\to\infty}n^2E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2 \xrightarrow[K\to\infty]{} 0. \tag{5.7}$$
In a second step, it remains to show that for every fixed $K$
$$\limsup_{n\to\infty}n^2E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2 \xrightarrow[L\to\infty]{} 0. \tag{5.8}$$
In order to verify (5.7), we rewrite $n^2E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2$ in terms of $Z_n$ with kernel function $H:=H^{(K)}=h_c-h^{(K)}_c$. Hence it remains to verify that $\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^{4}Z^{(t)}_{n,r}$ and $\sup_{k\in\mathbb N}E|H^{(K)}(X_1,X_k)|^2$ tend to zero as $K\to\infty$. Exemplarily, we investigate $\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$. The summands of $Z^{(1)}_{n,r}$ can be bounded as follows
$$\big|EH^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)-EH^{(K)}(X_i,\tilde X_j)H^{(K)}(\tilde X_k,\tilde X_l)\big|$$
$$\le E\Big[\big|H^{(K)}(X_k,X_l)\big|\,\big|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)\big|\Big] + E\Big[\big|H^{(K)}(X_i,\tilde X_j)\big|\,\big|H^{(K)}(X_k,X_l)-H^{(K)}(\tilde X_k,\tilde X_l)\big|\Big]$$
$$\le 2\,\|H^{(K)}\|_{\infty}\operatorname{Lip}\big(H^{(K)}\big)\,\tau_r.$$
According to Lemma 5.1, $\operatorname{Lip}(H^{(K)})$ does not depend on $K$. The degeneracy of $h_c$ implies $\|H^{(K)}\|_{\infty}\le4\|h_c-\tilde h^{(K)}_c\|_{\infty}$. As already indicated in the proof of Lemma 5.1, $\sum_{l\in\mathbb Z}\int_{-\infty}^{\infty}\phi(y_1-l)\phi(y_2-l)\,dy_2=1$ holds true. This leads to
$$\|h_c-\tilde h^{(K)}_c\|_{\infty} = \Big\|\sum_{k_1,k_2\in\mathbb Z^d}\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)\iint_{\mathbb R^d\times\mathbb R^d}\big[h_c(x,y)-h_c(u,v)\big]\Phi_{J(K),k_1}(u)\Phi_{J(K),k_2}(v)\,du\,dv\Big\|_{\infty}.$$
If $(x',y')'$ does not lie in the support of $\Phi_{J(K),k_1}\Phi_{J(K),k_2}$, the corresponding summand vanishes. Otherwise, Lipschitz continuity of $h$ is employed for further approximations. Similarly to the proof of Lemma 5.1 we obtain
$$\Big|\iint_{\mathbb R^d\times\mathbb R^d}\big[h_c(x,y)-h_c(u,v)\big]\Phi_{J(K),k_1}(u)\Phi_{J(K),k_2}(v)\,du\,dv\Big| = O\big(2^{-J(K)(d+1)}\big).$$
This implies $\|h_c-\tilde h^{(K)}_c\|_{\infty}=O(2^{-J(K)})$ since for every fixed tuple $(x',y')'$ only a finite number of summands is non-vanishing. This number depends neither on $J(K)$ nor on the value of $(x,y)$. It follows that $\big|EH^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)\big|\le C2^{-J(K)}\tau_r$. Due to $\|H^{(K)}\|_{\infty}\le4\|h_c-\tilde h^{(K)}_c\|_{\infty}=O(2^{-J(K)})$, $\sup_{k\in\mathbb N}E|H^{(K)}(X_1,X_k)|^2\to0$ as $K\to\infty$. Set $\tau_0=1$. Summing up the achievements results in
$$\limsup_{n\to\infty}n^2E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2 \le C\limsup_{n\to\infty}\Big[\frac{1}{n^2}\sum_{r=0}^{n-1}2^{-J(K)}(r+1)n^2\tau_r + 2^{-2J(K)}\Big] \xrightarrow[K\to\infty]{}0,$$
which finishes the proof of (5.7).
The main goal of the previous step was the multiplicative separation of the random variables which are cumulated in $h_c$. The aim of the second step is the approximation of $h^{(K)}_c$, whose representation is given by an infinite sum, by a function consisting of only finitely many summands. Similarly to the foregoing part of the proof, the approximation error $n^2E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2$ is reformulated in terms of $Z_n$ with kernel $H:=H^{(L)}=h^{(K)}_c-h^{(K,L)}_c$. As before, we exemplarily take $n^{-2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$ and $\sup_{k\in\mathbb N}E|H^{(L)}(X_1,X_k)|^2$ into further consideration. Concerning the summands of $Z^{(1)}_{n,r}$ we obtain
$$\big|EH^{(L)}(X_i,X_j)H^{(L)}(X_k,X_l)-EH^{(L)}(X_i,\tilde X_j)H^{(L)}(\tilde X_k,\tilde X_l)\big|$$
$$\le E\Big[\big|H^{(L)}(X_k,X_l)\big|\,\big|H^{(L)}(X_i,X_j)-H^{(L)}(X_i,\tilde X_j)\big|\,\mathbf 1_{(X'_k,X'_l)'\in[-B,B]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(L)}(X_k,X_l)\big|\,\big|H^{(L)}(X_i,X_j)-H^{(L)}(X_i,\tilde X_j)\big|\,\mathbf 1_{(X'_k,X'_l)'\notin[-B,B]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(L)}(X_i,\tilde X_j)\big|\,\big|H^{(L)}(X_k,X_l)-H^{(L)}(\tilde X_k,\tilde X_l)\big|\,\mathbf 1_{(X'_i,\tilde X'_j)'\in[-B,B]^{2d}}\Big]$$
$$+ E\Big[\big|H^{(L)}(X_i,\tilde X_j)\big|\,\big|H^{(L)}(X_k,X_l)-H^{(L)}(\tilde X_k,\tilde X_l)\big|\,\mathbf 1_{(X'_i,\tilde X'_j)'\notin[-B,B]^{2d}}\Big]$$
$$= E_1+E_2+E_3+E_4$$
for arbitrary $B>0$. Obviously, it suffices to take the first two summands into further consideration; both remaining terms can be treated similarly. Since $\phi$ and $\psi$ have compact support, the number of overlapping functions within $(\Phi_{j_0,k})_{k\in\{-L,\ldots,L\}^d}$ and $(\Psi_{j,k})_{k\in\{-L,\ldots,L\}^d,\,j_0\le j<J(K)}$ can be bounded by a constant that is independent of $L$. By Lipschitz continuity of $\phi$ and $\psi$ this leads to uniform Lipschitz continuity of $(h^{(K,L)}_c)_{L\in\mathbb N}$. Moreover, note that $(H^{(L)})_L$ is uniformly bounded. Due to the reformulation
$$\tilde h^{(K)}_c(x,y) = \sum_{k_1,k_2\in\mathbb Z^d}\alpha^{(c)}_{j_0;k_1,k_2}\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y) + \sum_{j=j_0}^{J(K)-1}\;\sum_{k_1,k_2\in\mathbb Z^d}\;\sum_{e\in\bar E}\beta^{(c,e)}_{j;k_1,k_2}\Psi^{(e)}_{j;k_1,k_2}(x,y)$$
one can choose $(B=B(K,L))_{L\in\mathbb N}$ such that $\sup_{x,y\in[-B,B]^d}\big|\tilde h^{(K)}_c(x,y)-\tilde h^{(K,L)}_c(x,y)\big|=0$ and $B\xrightarrow[L\to\infty]{}\infty$. This setting allows for the following approximations
$$E_1 \le C\tau^{\delta}_r\Big[E|H^{(L)}(X_k,X_l)|^{1/(1-\delta)}\mathbf 1_{(X'_k,X'_l)'\in[-B,B]^{2d}}\Big]^{1-\delta} \le C\tau^{\delta}_r\big(P(X_1\notin[-B,B]^d)\big)^{1-\delta},$$
$$E_2 \le C\tau^{\delta}_r\big(P(X_1\notin[-B,B]^d)\big)^{1-\delta}.$$
Analogously, it can be shown that $\sup_{k\in\mathbb N_0}E\big[H^{(L)}(X_0,X_k)\big]^2\le CP(X_1\notin[-B,B]^d)$. Finally, we obtain
$$\limsup_{n\to\infty}n^2E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2 \le C\big(P(X_1\notin[-B,B]^d)\big)^{1-\delta}\limsup_{n\to\infty}\frac{1}{n^2}\sum_{r=1}^{n-1}(r+1)n^2\tau^{\delta}_r \xrightarrow[L\to\infty]{}0.$$
Hence, (5.7) and (5.8) hold.
Proof of Lemma 2.3. In order to prove this result we follow the lines of the proofs of Lemma 2.1, Lemma 2.2 and Lemma 5.1 and carry out some modifications. In a first step we reduce the problem to statistics with bounded kernels. To this end we use the modified approximation
$$\big|H^{(c)}(x,y)-H^{(c)}(\bar x,\bar y)\big| \le \big[2f(x,\bar x,y,\bar y)+g(x,\bar x)+g(y,\bar y)\big]\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big] =: f_1(x,\bar x,y,\bar y)\,\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big],$$
where $g$ is given by $g(x,\bar x):=\int_{\mathbb R^d}f(x,\bar x,z,z)\,dF(z)$. Under (A4)(i) Hölder's inequality yields
$$E\big|H^{(c)}(Y_{k_1},Y_{k_2})-H^{(c)}(Y_{k_3},Y_{k_4})\big| \le \Big(E\Big[f_1(Y_{k_1},Y_{k_2},Y_{k_3},Y_{k_4})^{1/(1-\delta)}\sum_{i=1}^{4}\|Y_{k_i}\|_{l_1}\Big]\Big)^{1-\delta}\big(E\|Y_{k_1}-Y_{k_3}\|_{l_1}+E\|Y_{k_2}-Y_{k_4}\|_{l_1}\big)^{\delta}$$
for $Y_{k_i}$ ($k_i\in\mathbb N$, $i=1,\ldots,4$), as defined in (A4). Plugging this inequality into the calculations of the proof of Lemma 2.1 yields $\limsup_{n\to\infty}n^2E\big(V_n-V_{n,c}\big)^2\xrightarrow[c\to\infty]{}0$.
The next step will be the wavelet approximation of the bounded kernel $h_c$. Defining $h^{(K)}_c$ and $V^{(K)}_{n,c}$ as in the proof of Lemma 2.2, analogously to the proof of Lemma 5.1 there exists a $C>0$ such that
$$\big|\tilde h^{(K)}_c(\bar x,\bar y)-\tilde h^{(K)}_c(x,y)\big| \le f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big] + \big|H_2(\bar x,\bar y)-H_2(x,y)\big|$$
$$\le Cf_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big] + \sum_{k_1,k_2\in\mathbb Z^d}\big|\kappa_{k_1,k_2}(\bar x,\bar y)\big|\,\big|\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)-\Phi_{J(K),k_1}(\bar x)\Phi_{J(K),k_2}(\bar y)\big|, \tag{5.9}$$
where $\kappa_{k_1,k_2}$ is given by $\kappa_{k_1,k_2}(x,y):=\iint_{\mathbb R^d\times\mathbb R^d}\Phi_{J(K);k_1,k_2}(u,v)\big[h_c(u,v)-h_c(x,y)\big]\,du\,dv$ and $H_2$ is defined as in the proof of Lemma 5.1. In order to approximate the last summand of (5.9), we distinguish again between the cases whether or not $(\bar x',\bar y')'\in\operatorname{supp}\Phi_{J(K),k_1}\Phi_{J(K),k_2}$. In the first case, an upper bound of order $O\big(\max_{c_1,c_2\in[-S_\phi/2^{J(K)},S_\phi/2^{J(K)}]^d}f_1(\bar x+c_1,\bar x,\bar y+c_2,\bar y)\big)\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big)$ can be achieved. Here, $S_\phi$ denotes the length of the support of $\phi$. In the second case a decomposition, similar to (5.5), can be employed which leads to the upper bound $O\big(f_1(x,\bar x,y,\bar y)+\max_{c_1,c_2\in[-S_\phi/2^{J(K)},S_\phi/2^{J(K)}]^d}f_1(x+c_1,x,y+c_2,y)\big)\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big)$.
Consequently, we obtain
$$\big|\tilde h^{(K)}_c(\bar x,\bar y)-\tilde h^{(K)}_c(x,y)\big| \le O\Big(f_1(x,\bar x,y,\bar y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},S_\phi/2^{J(K)}]^d}f_1(x+c_1,x,y+c_2,y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},S_\phi/2^{J(K)}]^d}f_1(\bar x+c_1,\bar x,\bar y+c_2,\bar y)\Big)\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big)$$
$$=: f_2(x,\bar x,y,\bar y)\,\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big).$$
This leads to $\big|H^{(K)}(x,y)-H^{(K)}(\bar x,\bar y)\big|\le f_3(x,\bar x,y,\bar y)\big(\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big)$ with $f_3(x,\bar x,y,\bar y)=2f_2(x,\bar x,y,\bar y)+\int_{\mathbb R^d}f_2(x,\bar x,z,z)\,dF(z)+\int_{\mathbb R^d}f_2(z,z,\bar y,y)\,dF(z)$. Note that under (A4), $Ef_3(Y_i,Y_j,Y_k,Y_l)^{\eta}\big(\|Y_i\|_{l_1}+\|Y_j\|_{l_1}+\|Y_k\|_{l_1}+\|Y_l\|_{l_1}\big)<\infty$ if $J(K)$ is sufficiently large. Making some modifications in the proof of Lemma 2.2, $Z_n=n^2E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2$ can be approximated. As before we restrict ourselves to investigating $\sup_{k\in\mathbb N}E\big[H^{(K)}(X_1,X_k)\big]^2$ and $\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$. The summands of $Z^{(1)}_{n,r}$ can be bounded as follows
$$\big|EH^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)-EH^{(K)}(X_i,\tilde X_j)H^{(K)}(\tilde X_k,\tilde X_l)\big|$$
$$\le E\Big[\big|H^{(K)}(X_k,X_l)\big|\,\big|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)\big|\Big] + E\Big[\big|H^{(K)}(X_i,\tilde X_j)\big|\,\big|H^{(K)}(X_k,X_l)-H^{(K)}(\tilde X_k,\tilde X_l)\big|\Big].$$
Since further approximations are similar for both summands, we concentrate on the first one. Iterative application of Hölder's inequality leads to
$$E\Big[\big|H^{(K)}(X_k,X_l)\big|\,\big|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)\big|\Big] \le O\big(\tau^{\delta^2}_r\big)\Big[E\|X_j-\tilde X_j\|_{l_1}f_3(X_i,X_i,X_j,\tilde X_j)^{1/(1-\delta)}\Big]^{\delta(1-\delta)}\Big(E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\Big)^{1-\delta},$$
as boundedness of $h_c$ implies uniform boundedness of $(H^{(K)})_K$. The middle factor is bounded because of (A4)(i). Therefore, it remains to examine $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$.
Let $b(K)$ be chosen such that $P(X_1\notin[-b(K),b(K)]^d)\le1/K$; then
$$E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)} = E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\mathbf 1_{X_k,X_l\in[-b(K),b(K)]^d} + O\big(P(X_1\notin[-b(K),b(K)]^d)\big)$$
$$\le 4\sup_{x,y\in[-b(K),b(K)]^d}\big|h_c(x,y)-\tilde h^{(K)}_c(x,y)\big|^{1/(1-\delta)} + O\big(P(X_1\notin[-b(K),b(K)]^d)\big).$$
Due to Lemma 5.2 and the continuity of $f$ we obtain $1/K$ as an upper bound for the first term if $J(K)$ is chosen sufficiently large. Consequently, $\big|EH^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)\big|\le CK^{\delta-1}\tau^{\delta^2}_r$, which implies that $Z^{(1)}_{n,r}$ tends to zero as $K$ increases. Furthermore, one obtains $\sup_{k\in\mathbb N_0}E\big[H^{(K)}(X_0,X_k)\big]^2\xrightarrow[K\to\infty]{}0$ similarly to the investigation of $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$ above. Thus, we get $\limsup_{n\to\infty}n^2E\big(V_{n,c}-V^{(K)}_{n,c}\big)^2\xrightarrow[K\to\infty]{}0$.
Step three of the proof contains the verification of $\limsup_{n\to\infty}n^2E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2\xrightarrow[L\to\infty]{}0$. For this purpose it suffices to plug a modified approximation of $H^{(L)}(x,y)-H^{(L)}(\bar x,\bar y)$ into the second part of the proof of Lemma 2.2. Lipschitz continuity of $h^{(K,L)}_c$ implies
$$\big|H^{(L)}(x,y)-H^{(L)}(\bar x,\bar y)\big| \le f_4(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big]$$
with $f_4(x,\bar x,y,\bar y)=C+f_3(x,\bar x,y,\bar y)$. Since, for $J(K)$ sufficiently large, $f_4$ satisfies the moment assumption of (A4)(i) with $a=0$, we obtain
$$E\big|H^{(L)}(Y_{k_1},Y_{k_2})-H^{(L)}(Y_{k_3},Y_{k_4})\big| \le C\big[E\big(\|Y_{k_1}-Y_{k_3}\|_{l_1}+\|Y_{k_2}-Y_{k_4}\|_{l_1}\big)\big]^{\delta}.$$
Hence, $\limsup_{n\to\infty}n^2E\big(V^{(K)}_{n,c}-V^{(K,L)}_{n,c}\big)^2\xrightarrow[L\to\infty]{}0$ under (A4). Summing up the three steps leads to
$$\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty}n^2E\big(V_n-V^{(K,L)}_{n,c}\big)^2=0.$$
Proof of Lemma 3.2. A positive variance of $Z$ implies the existence of constants $V>0$ and $c_0>0$ such that for every $c\ge c_0$ we can find $K_0\in\mathbb N$ such that for every $K\ge K_0$ there is an $L_0$ with $\operatorname{var}\big(Z^{(K,L)}_c\big)\ge V$, $\forall\,L\ge L_0$. Therefore, uniform equicontinuity of the distribution functions of $\big(\big((Z^{(K,L)}_c)_L\big)_K\big)_c$ yields the desired property of $Z$. By matrix-based notation of $Z^{(K,L)}_c$ we obtain
$$Z^{(K,L)}_c = \sum_{k_1,k_2=1}^{M(K,L)}\gamma^{(c,K,L)}_{k_1,k_2}Z^{(K,L)}_{k_1}Z^{(K,L)}_{k_2} = \bar Z^{(K,L)\prime}\,\Gamma^{(K,L)}_c\,\bar Z^{(K,L)}$$
with a symmetric matrix of coefficients $\Gamma^{(K,L)}_c$ and a normal vector $\bar Z^{(K,L)}=(Z^{(K,L)}_1,\ldots,Z^{(K,L)}_{M(K,L)})'$. Hence, $Z^{(K,L)}_c$ can be rewritten in the following way
$$Z^{(K,L)}_c \stackrel{d}{=} \bar Y'\,U^{(K,L)\prime}\,\Lambda^{(K,L)}_c\,U^{(K,L)}\,\bar Y = Y'\,\Lambda^{(K,L)}_c\,Y = \sum_{k=1}^{M(K,L)}\lambda^{(c,K,L)}_kY^2_k.$$
Here $U^{(K,L)}$ is a certain orthogonal matrix, $\Lambda^{(K,L)}_c:=\operatorname{diag}\big(\lambda^{(c,K,L)}_1,\ldots,\lambda^{(c,K,L)}_{M(K,L)}\big)$ with $|\lambda^{(c,K,L)}_1|\ge\cdots\ge|\lambda^{(c,K,L)}_{M(K,L)}|$ and $\bar Y$ as well as $Y$ are multivariate standard normally
distributed random vectors. For notational simplicity we suppress the upper index $(c,K,L)$ in the sequel. Due to the above choice of the triple $(c,K,L)$, either $\sum_{k=1}^{4}\lambda^2_k$ or $\sum_{k=5}^{M(K,L)}\lambda^2_k$ is bounded from below by $V/4$. In the first case, $\lambda^2_1\ge V/16$ holds true, which implies for arbitrary $\varepsilon>0$
$$P\big(Z^{(K,L)}_c\in[x-\varepsilon,x+\varepsilon]\big) \le \int_{0}^{2\varepsilon}f_{\lambda_1Y^2_1}(t)\,dt \le \frac{C(\varepsilon)}{\sqrt{V/16}} \qquad\forall\,x\in\mathbb R,$$
where $C(\varepsilon)\xrightarrow[\varepsilon\to0]{}0$. Here, the first inequality results from the fact that convolution preserves the continuity properties of the smoother function.
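The reduction of $Z^{(K,L)}_c$ to a weighted sum of squares of independent standard normals is a pointwise algebraic identity once $\Gamma$ is diagonalized. A small numeric sketch with an arbitrary symmetric coefficient matrix (an illustrative choice, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
M = 6
A = rng.normal(size=(M, M))
Gamma = (A + A.T) / 2                  # symmetric coefficient matrix

lam, U = np.linalg.eigh(Gamma)         # Gamma = U diag(lam) U'

zbar = rng.normal(size=M)              # one realization of the normal vector
y = U.T @ zbar                         # orthogonal transform keeps standard normality

# z' Gamma z = sum_k lam_k y_k^2, the representation used in the proof
assert np.isclose(zbar @ Gamma @ zbar, np.sum(lam * y**2))
```

Since $U$ is orthogonal, $Y=U'\bar Y$ is again standard normal, which is exactly what the distributional identity above uses.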
In the opposite case, i.e. $\sum_{k=5}^{M(K,L)}\lambda^2_k\ge V/4$, it is possible to bound the uniform norm of the density function of $Z^{(K,L)}_c$ by means of its variance. To this end, we first consider the characteristic function $\varphi_{Z^{(K,L)}_c}$ of $Z^{(K,L)}_c$ and assume w.l.o.g. that $M(K,L)$ is divisible by 4. Defining a sequence $(\mu_k)_{k=1}^{M(K,L)/4}$ by $\mu_k=\lambda_{4k}$ for $k\in\{1,\ldots,M(K,L)/4\}$ allows for the following approximation
$$\big|\varphi_{Z^{(K,L)}_c}(t)\big| = \prod_{j=1}^{M(K,L)}\big[1+(2\lambda_jt)^2\big]^{-1/4} \le \Big\{\prod_{j=1}^{M(K,L)/4}\big[1+(2\mu_jt)^2\big]\Big\}^{-1} \le \frac{1}{1+4\big(\mu^2_1+\cdots+\mu^2_{M(K,L)/4}\big)t^2}.$$
By inverse Fourier transform we obtain the following result concerning the density function of $Z^{(K,L)}_c$:
$$\big\|f_{Z^{(K,L)}_c}\big\|_{\infty} \le \frac{1}{2\pi}\big\|\varphi_{Z^{(K,L)}_c}\big\|_1 \le \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{1}{1+\big(2\sqrt{\mu^2_1+\cdots+\mu^2_{M(K,L)/4}}\;t\big)^2}\,dt$$
$$= \frac{1}{\sqrt{\mu^2_1+\cdots+\mu^2_{M(K,L)/4}}}\;\frac{1}{2\pi}\int_{0}^{\infty}\frac{1}{1+u^2}\,du \le \frac{1}{2\sqrt{4\big(\mu^2_1+\cdots+\mu^2_{M(K,L)/4-1}\big)}} \le \frac{1}{2\sqrt{\lambda^2_5+\cdots+\lambda^2_{M(K,L)}}} \le \frac{1}{\sqrt V}.$$
Thus, $P\big(Z^{(K,L)}_c\in[x-\varepsilon,x+\varepsilon]\big)\le\frac{2}{\sqrt V}\,\varepsilon$, which completes the studies of the case $\sum_{k=5}^{M(K,L)}\lambda^2_k>V/4$ and finally yields the assertion.
Proof of Corollary 3.1. This result is an immediate consequence of Theorem 3.1.
Acknowledgment. The author thanks Michael H. Neumann for his constructive advice and fruitful discussions. This research was funded by the German Research Foundation DFG (project: NE 606/2-1).
References
[1] Arcones, M. A. and Giné, E. (1992). On the bootstrap of U and V statistics.
Ann. Statist. 20, 655–674.
[2] Babbel, B. (1989). Invariance principles for U -statistics and von Mises functionals. J. Statist. Plann. Inference 22, 337–354.
[3] Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
[4] Carlstein, E. (1989). Degenerate U -statistics based on non-independent observations. Calcutta Statist. Assoc. Bull. 37, 55–65.
[5] Daubechies, I. (2002). Ten Lectures on Wavelets. Philadelphia, PA: Society for
Industrial and Applied Mathematics.
[6] Dedecker, J. and Prieur, C. (2004). Couplage pour la distance minimale. C. R.
Acad. Sci. Paris, Ser I 338, 805–808.
[7] Dedecker, J. and Prieur, C. (2005). New dependence coefficients. Examples and
applications to statistics. Probab. Theory Relat. Fields 132, 203–236.
[8] Dehling, H. and Mikosch, T. (1994). Random Quadratic Forms and the Bootstrap
for U -Statistics. J. Multivariate Anal. 51, 392–413.
[9] Denker, M. (1982). Statistical decision procedures and ergodic theory. In Ergodic
theory and related topics (Vitte, 1981) (ed. Michel, H.). Math. Res. 12, 35–47.
Berlin: Akademieverlag.
[10] Dewan, I. and Prakasa Rao, B. L. S. (2001). Asymptotic normality for U statistics of associated random variables. J. Statist. Plann. Inference 97, 201–225.
[11] de Wet, T. (1987). Degenerate U - and V -statistics. South African Statist. J. 21,
99–129.
[12] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev.
41, 45–76.
[13] Doukhan, P. and Louhichi, S. (1999). A weak dependence condition and applications to moment inequalities. Stochastic Process. Appl. 84, 313–342.
[14] Doukhan, P. and Neumann, M. H. (2008). The notion of ψ-weak dependence and
its applications to bootstrapping time series. Probab. Surv. 5, 146–168.
[15] Eagleson, G. K. (1979). Orthogonal expansions and U -statistics. Austral. J. Statist. 21, 221–237.
[16] Fan, Y. and Li, Q. (1999). Central limit theorem for degenerate U -statistics of
absolutely regular processes with applications to model specification testing. J.
Nonparametr. Stat. 10, 245–271.
[17] Feuerverger, A. and Mureika, R. A. (1977). The empirical characteristic function and its applications. Ann. Statist. 5, 88–97.
[18] Franke, J. and Wendel, M. (1992). A Bootstrap Approach for Nonlinear Autoregressions – Some Preliminary Results. In Bootstrapping and Related Techniques (eds. Jöckel, K. H., Rothe, G. and Sendler, W.). Lecture Notes in Economics and Mathematical Systems 376, Berlin: Springer.
[19] Ghosh, B. K. and Huang, W.-M. (1991). The Power and Optimal Kernel of the
Bickel-Rosenblatt Test for Goodness of Fit. Ann. Statist. 19, 999–1009.
[20] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets,
Approximation, and Statistical Applications. New York: Springer.
[21] Huang, W. and Zhang, L.-X. (2006). Asymptotic normality for U -statistics of negatively associated random variables. Statist. Probab. Lett. 76, 1125–1131.
[22] Leucht, A. (2010). Some tests of L2 -type for weakly dependent observations. In
preparation.
[23] Leucht, A. and Neumann, M. H. (2009). Consistency of general bootstrap methods for degenerate U - and V -type statistics. J. Multivariate Anal. 100, 1622–1633.
[24] Neumann, M. H. and Paparoditis, E. (2008). Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations. Bernoulli 14, 14–46.
[25] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
[26] Wojtaszczyk, P. (1997). A Mathematical Introduction to Wavelets. London Mathematical Society Student Texts 37. Cambridge: Cambridge Univ. Press.