Degenerate U- and V-statistics under weak dependence: Asymptotic theory and bootstrap consistency

Anne Leucht*
Friedrich-Schiller-Universität Jena, Institut für Stochastik, Ernst-Abbe-Platz 2, D-07743 Jena, Germany
E-mail: [email protected]

Abstract. We devise a general result on the consistency of model-based bootstrap methods for U- and V-statistics under easily verifiable conditions. For that purpose, we first derive the limit distributions of degree-2 degenerate U- and V-statistics of weakly dependent $\mathbb{R}^d$-valued random variables. To this end, only some moment conditions and smoothness assumptions concerning the kernel are required. Based on this result, we verify that the bootstrap counterparts of these statistics converge to the same limit distributions. Finally, some applications to hypothesis testing are presented.

2000 Mathematics Subject Classification. 62E20, 62G09.

Keywords and Phrases. Bootstrap, consistency, U-statistics, V-statistics, weak dependence.

Short title. Bootstrap for U-statistics under weak dependence.

1. Introduction

Numerous test statistics can be formulated or approximated in terms of degenerate U- or V-type statistics multiplied by the sample size. Examples include the Cramér-von Mises statistic, the Anderson-Darling statistic and the $\chi^2$-statistic. For i.i.d. random variables the limit distributions of U- and V-statistics can be derived via a spectral decomposition of the kernel if the latter is square integrable. To use the same method for dependent data, often restrictive assumptions are required whose validity is quite complicated or even impossible to check in many cases. The first of our two main results is the derivation of the asymptotic distributions of U- and V-statistics under assumptions that are fairly easy to check. This approach is based on a wavelet decomposition instead of a spectral decomposition of the kernel.
These limit distributions for both independent and dependent observations depend on certain parameters which in turn depend on the underlying situation in a complicated way. Therefore, problems arise as soon as critical values for test statistics of U- and V-type have to be determined. The bootstrap offers a convenient way to circumvent these problems; see Arcones and Giné [1], Dehling and Mikosch [8] or Leucht and Neumann [23] for the i.i.d. case. To our knowledge, there are no results concerning bootstrapping general degenerate U-statistics of non-independent observations. As a second main result of the paper, we establish consistency of model-based bootstrap methods for statistics of weakly dependent data.

In order to describe the dependence structure of the sample we do not use the concept of mixing. It is inappropriate in some sense, since not only the asymptotic behaviour of U- and V-type statistics but also bootstrap consistency is the focus of this paper. Unfortunately, model-based bootstrap methods can yield samples that are no longer mixing even though the original sample satisfies some mixing condition. For further discussion of this issue, see Doukhan and Neumann [14]. Instead of mixing, we suppose the sample to be τ-dependent in the sense of Dedecker and Prieur [7].

Our paper is organized as follows. We start with an overview of asymptotic results on degenerate U- and V-statistics of dependent random variables. In Subsection 2.2 we introduce the underlying concept of weak dependence and derive the asymptotic distributions of U- and V-statistics. On the basis of these results, we deduce consistency of general bootstrap methods in Section 3. Some applications of the theory to hypothesis testing are presented in Section 4. All proofs are deferred to a final Section 5.

2. Asymptotic distributions of U- and V-statistics

2.1. Survey of literature. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of $\mathbb{R}^d$-valued random variables on some probability space.
In the case of i.i.d. random variables, the limit distributions of the degenerate U- and V-type statistics
\[
nU_n = \frac{1}{n-1}\sum_{j=1}^n\sum_{k\neq j} h(X_j,X_k) \quad\text{and}\quad nV_n = \frac{1}{n}\sum_{j,k=1}^n h(X_j,X_k),
\]
with $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ symmetric and $\int_{\mathbb{R}^d} h(x,y)\,dF(x)=0$ for all $y\in\mathbb{R}^d$, can be derived by using a spectral decomposition $h(x,y)=\sum_{k=1}^\infty \lambda_k\Phi_k(x)\Phi_k(y)$, which holds true in the $L_2$-sense. Here, $(\Phi_k)_k$ denote orthonormal eigenfunctions and $(\lambda_k)_k$ the corresponding eigenvalues of the integral equation
\[
\int_{\mathbb{R}^d} h(x,y)g(y)\,dF(y) = \lambda g(x), \tag{2.1}
\]
where $F$ denotes the distribution function of $X_1$. Approximate $nU_n$ by
\[
nU_n^{(K)} = \sum_{k=1}^K \lambda_k\Big[\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n\Phi_k(X_i)\Big)^2 - \frac{1}{n}\sum_{i=1}^n\Phi_k^2(X_i)\Big].
\]
Then the sum in round brackets is asymptotically normal, while the latter sum converges in probability to 1. Finally, one obtains
\[
nU_n \xrightarrow{d} \sum_{k=1}^\infty \lambda_k(Z_k^2-1), \tag{2.2}
\]
where $(Z_k)_k$ is a sequence of i.i.d. standard normal random variables, cf. Serfling [25]. If additionally $E|h(X_1,X_1)|<\infty$, Slutsky's theorem implies $nV_n \xrightarrow{d} \sum_{k=1}^\infty\lambda_k(Z_k^2-1) + Eh(X_1,X_1)$. (Here, $\xrightarrow{d}$ denotes convergence in distribution.)

So far, most previous attempts to derive the limit distribution of degenerate U- and V-statistics of dependent random variables are based on the adoption of this method of proof. Eagleson [15] developed the asymptotic theory in the case of a strictly stationary sequence of φ-mixing, real-valued random variables under the assumption of absolutely summable eigenvalues. This condition is satisfied if the kernel function is of the form $h(x,y)=\int_{\mathbb{R}} h_1(x,z)h_2(z,y)\,dF(z)$. Using general heavy-tailed weight functions instead of $F$, the eigenvalues are not necessarily absolutely summable; for an example see de Wet [11]. Carlstein [4] analysed U-statistics of α-mixing, real-valued random variables in the case of finitely many eigenfunctions. He derived a limit distribution of the form (2.2), where $(Z_k)_{k\in\mathbb{N}}$ is a sequence of centered normal random variables.
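The classical limit (2.2) can be illustrated numerically in the i.i.d. case. The following sketch is not part of the paper: it uses the Cramér-von Mises kernel $h(x,y) = 1/3 + (x^2+y^2)/2 - \max(x,y)$, which is degenerate under $F = U(0,1)$ with eigenvalues $\lambda_k = 1/(k^2\pi^2)$, so the limit of $nV_n$ has mean $\sum_k \lambda_k = 1/6$; the sample size and replication count are arbitrary choices.

```python
import numpy as np

def cvm_kernel(x, y):
    # Degenerate Cramer-von Mises kernel under F = U(0,1):
    # h(x, y) = 1/3 + (x^2 + y^2)/2 - max(x, y); int h(x, y) dF(x) = 0 for all y
    return 1.0 / 3.0 + (x**2 + y**2) / 2.0 - np.maximum(x, y)

def n_vn(sample):
    # n V_n = (1/n) sum_{j,k} h(X_j, X_k)
    h = cvm_kernel(sample[:, None], sample[None, :])
    return h.sum() / len(sample)

rng = np.random.default_rng(0)
reps, n = 2000, 200
stats = np.array([n_vn(rng.uniform(size=n)) for _ in range(reps)])

# The limit sum_k Z_k^2 / (k^2 pi^2) has mean sum_k 1/(k^2 pi^2) = 1/6
print(stats.mean())  # close to 1/6
```

The Monte Carlo mean should be close to $1/6$, and every replicate is nonnegative since $nV_n$ equals the Cramér-von Mises distance $n\int (F_n - F)^2\,dF$ here.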
Denker [9] considered stationary functionals $(X_n = f(Z_n,Z_{n+1},\dots))_n$ of β-mixing random variables $(Z_n)_n$. He assumed $f$ and the cumulative distribution function of $X_0$ to be Hölder continuous. Imposing some smoothness condition on $h$, the limit distribution of $nU_n$ is derived under the additional assumption $\|\Phi_k\|_\infty<\infty$ for all $k\in\mathbb{N}$. The condition on $(\Phi_k)_k$ is difficult or even impossible to check in a multitude of cases, since this requires solving the associated integral equation (2.1). Similar difficulties occur if one wants to apply the results of Dewan and Prakasa Rao [10] or Huang and Zhang [21]. They studied U-statistics of associated, real-valued random variables. Besides the absolute summability of the eigenvalues, some regularity conditions have to be satisfied uniformly by the eigenfunctions in order to obtain the asymptotic distribution of $nU_n$. A different approach was used by Babbel [2] to determine the limit distribution of U-statistics of φ- and β-mixing random variables. She deduced the limit distribution via a Haar-wavelet decomposition of the kernel and empirical process theory without imposing the critical conditions mentioned above. However, this approach is not suitable when dealing with U-statistics of τ-dependent random variables, since Lipschitz continuity will be the crucial property of the (approximating) kernel in order to exploit the underlying dependence structure.

2.2. Main results. Let $(X_n)_{n\in\mathbb{N}}$ be a sequence of $\mathbb{R}^d$-valued random variables on some probability space $(\Omega,\mathcal{A},P)$. In this subsection we derive the limit distributions of
\[
nU_n = \frac{1}{n-1}\sum_{j=1}^n\sum_{k\neq j} h(X_j,X_k) \quad\text{and}\quad nV_n = \frac{1}{n}\sum_{j,k=1}^n h(X_j,X_k),
\]
where $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ is a symmetric function with $\int_{\mathbb{R}^d} h(x,y)\,dF(x)=0$ for all $y\in\mathbb{R}^d$. In order to describe the dependence structure of $(X_n)_{n\in\mathbb{N}}$, we recall the definition of the τ-dependence coefficient for $\mathbb{R}^d$-valued random variables of Dedecker and Prieur [7]:

Definition 2.1.
Let $(\Omega,\mathcal{A},P)$ be a probability space, $\mathcal{M}$ a σ-algebra of $\mathcal{A}$ and $X$ an $\mathbb{R}^d$-valued random variable ($d\in\mathbb{N}$). Assume that $E\|X\|_{l_1}<\infty$, where $\|x\|_{l_1}=\sum_{i=1}^d |x_i|$, and define
\[
\tau(\mathcal{M},X) = E\Big(\sup_{f\in\Lambda_1(\mathbb{R}^d)}\Big|\int_{\mathbb{R}^d} f(x)\,dF^{X|\mathcal{M}}(x) - \int_{\mathbb{R}^d} f(x)\,dF(x)\Big|\Big).
\]
Here, $F^{X|\mathcal{M}}$ denotes the conditional distribution function of the distribution of $X$ given $\mathcal{M}$, and $\Lambda_1(\mathbb{R}^d)$ denotes the set of 1-Lipschitz functions from $\mathbb{R}^d$ to $\mathbb{R}$.

We assume:

(A1) (i) $(X_n)_{n\in\mathbb{N}}$ is a (strictly) stationary sequence of $\mathbb{R}^d$-valued random variables ($d\in\mathbb{N}$) on some probability space $(\Omega,\mathcal{A},P)$ with $E\|X_0\|_{l_1}<\infty$.
(ii) The sequence $(\tau_r)_{r\in\mathbb{N}}$ defined by
\[
\tau_r := \sup\{\tau(\sigma(X_{s_1},\dots,X_{s_u}), (X_{t_1}',X_{t_2}',X_{t_3}')') \mid u\in\mathbb{N},\ s_1\le\cdots\le s_u < s_u+r\le t_1\le t_2\le t_3\in\mathbb{N}\}
\]
satisfies $\sum_{r=1}^\infty r\,\tau_r^\delta < \infty$ for some $\delta\in(0,1)$. (Here, the prime denotes transposition.)

Remark 1. If $\Omega$ is rich enough, then due to Dedecker and Prieur [6] the validity of (A1) allows for the construction of a random vector $(\tilde X_{t_1}',\tilde X_{t_2}',\tilde X_{t_3}')' \overset{d}{=} (X_{t_1}',X_{t_2}',X_{t_3}')'$, independent of $X_{s_1},\dots,X_{s_u}$, with
\[
\tau_r = \sum_{i=1}^3 E\|\tilde X_{t_i} - X_{t_i}\|_{l_1}. \tag{2.3}
\]
We obtain an upper bound for the dependence coefficient, $\tau_r \le 6\int_0^{\beta(r)} Q_{|X_0|}(u)\,du$, where $Q_{|X_0|}(u) = \inf\{t\in\mathbb{R} \mid P(\|X_0\|_{l_1}>t)\le u\}$, $u\in[0,1]$, and $\beta(r)$ denotes the ordinary β-mixing coefficient $\beta(r) := E\sup_{B\in\sigma(X_s,\,s>t+r)} |P(B\mid\sigma(X_s,\,s\le t)) - P(B)|$, $t\in\mathbb{Z}$. This is a consequence of Remark 2 of Dedecker and Prieur [6]. Moreover, (A1) is related to the concept of weak dependence introduced by Doukhan and Louhichi [13]. Remark 1 immediately implies
\[
|\mathrm{cov}(h(X_{s_1},\dots,X_{s_u}),\, k(X_{t_1},\dots,X_{t_v}))| \le 2\|h\|_\infty\,\mathrm{Lip}(k)\,\Big\lceil \frac{v}{3}\Big\rceil\,\tau_r \tag{2.4}
\]
for $s_1\le\cdots\le s_u < s_u+r\le t_1\le\cdots\le t_v\in\mathbb{N}$ and for all functions $h:\mathbb{R}^u\to\mathbb{R}$ and $k:\mathbb{R}^v\to\mathbb{R}$ in $\mathcal{L} := \{f:\mathbb{R}^p\to\mathbb{R} \text{ for some } p\in\mathbb{N} \mid f \text{ Lipschitz continuous and bounded}\}$. Therefore, a sequence of random variables that satisfies (A1) is $((\tau_r)_r,\mathcal{L},\psi)$-weakly dependent with $\psi(h,k,u,v) = 2\|h\|_\infty\,\mathrm{Lip}(k)\,\lceil v/3\rceil$.
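The coupling representation (2.3) can be made tangible for a causal AR(1) process, a standard example of a τ-dependent sequence. The sketch below is illustrative and not taken from the paper: two chains share all innovations after the coupling time, so their gap contracts by the factor $a$ in every step, and the Monte Carlo estimate of $E|X_r - \tilde X_r|$ decays geometrically, mirroring the exponential decay of $\tau_r$ for such processes.

```python
import numpy as np

rng = np.random.default_rng(1)
a, r_max, n_rep = 0.7, 30, 5000

# X~ uses an independent copy of the past (here: of the starting value) and the
# same innovations afterwards; for a linear AR(1) the innovations cancel exactly.
sd0 = 1.0 / np.sqrt(1.0 - a**2)              # stationary marginal std of AR(1)
x = rng.normal(scale=sd0, size=n_rep)        # X_0
x_tilde = rng.normal(scale=sd0, size=n_rep)  # independent copy of X_0
gap = np.abs(x - x_tilde)

tau_est = np.empty(r_max)
for r in range(r_max):
    gap = a * gap                            # |X_r - X~_r| = a^r |X_0 - X~_0|
    tau_est[r] = gap.mean()                  # estimates E|X_r - X~_r|

print(tau_est[:3])  # geometric decay with ratio a
```

Because the shared innovations cancel, the ratio `tau_est[r+1] / tau_est[r]` equals $a$ exactly here; for nonlinear contractions with $\mathrm{Lip}(g) < 1$ one only gets the corresponding upper bound.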
(Here and in the sequel, $\mathrm{Lip}(g)$ denotes the Lipschitz constant of a generic function $g$.) A list of examples of τ-dependent processes, including causal linear and functional autoregressive processes, is provided by Dedecker and Prieur [7].

Besides the conditions on the dependence structure of $(X_n)_{n\in\mathbb{N}}$, we make the following assumptions concerning the kernel:

(A2) (i) The kernel $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ is a symmetric, measurable function and degenerate under $F$, i.e. $\int_{\mathbb{R}^d} h(x,y)\,dF(x)=0$ for all $y\in\mathbb{R}^d$.
(ii) For a δ satisfying (A1)(ii), the following moment constraints hold true with some $\nu > (2-\delta)/(1-\delta)$:
\[
\sup_{k\in\mathbb{N}} E|h(X_1,X_k)|^\nu < \infty \quad\text{and}\quad E|h(X_1,\tilde X_1)|^\nu < \infty,
\]
where $\tilde X_1$ is an independent copy of $X_1$.

(A3) The kernel $h$ is Lipschitz continuous.

Using an appropriate kernel truncation it is possible to reduce the problem of deriving the asymptotic distribution of $nV_n$ to statistics with bounded kernel functions.

Lemma 2.1. Suppose that (A1), (A2), and (A3) are fulfilled. Then there exists a family of bounded functions $(h_c)_{c\in\mathbb{R}^+}$ satisfying (A2) and (A3) uniformly and such that
\[
\lim_{c\to\infty}\limsup_{n\to\infty} n^2 E(V_n - V_{n,c})^2 = 0, \tag{2.5}
\]
where $V_{n,c} = \frac{1}{n^2}\sum_{j,k=1}^n h_c(X_j,X_k)$.

After this simplification of the problem, we intend to develop a decomposition of the kernel that allows for the application of a central limit theorem (CLT) for weakly dependent random variables. One could try to imitate the proof of the i.i.d. case; according to the discussion in the previous subsection, this leads to prerequisites that can hardly be checked in numerous cases. Therefore, in order to apply the CLT we do not use a spectral decomposition of the kernel but a wavelet decomposition instead. It turns out that Lipschitz continuity is the central property the kernel function should satisfy in order to exploit (2.3). For this reason the choice of Haar wavelets, as employed by Babbel [2], is inappropriate.
Instead, the application of Lipschitz continuous scale and wavelet functions is more suitable here. In the sequel, let φ and ψ denote scale and wavelet functions associated with a one-dimensional multiresolution analysis. As illustrated by Daubechies [5], Sect. 8, these functions can be selected in such a manner that they possess the following properties:
(1) φ and ψ are Lipschitz continuous,
(2) φ and ψ have compact support,
(3) $\int_{-\infty}^{\infty}\phi(x)\,dx = 1$ and $\int_{-\infty}^{\infty}\psi(x)\,dx = 0$.

It is well-known that an orthonormal basis of $L_2(\mathbb{R}^d)$ can be constructed on the basis of φ and ψ. For this purpose define $E := \{0,1\}^d\setminus\{0_d\}$, where $0_d$ denotes the $d$-dimensional null vector. In addition, set
\[
\varphi^{(i)} := \begin{cases}\phi & \text{for } i=0,\\ \psi & \text{for } i=1\end{cases}
\]
and define the functions $\Psi_{j,k}^{(e)}:\mathbb{R}^d\to\mathbb{R}$ for $j\in\mathbb{Z}$, $k=(k_1,\dots,k_d)'\in\mathbb{Z}^d$ by
\[
\Psi_{j,k}^{(e)}(x) := 2^{jd/2}\prod_{i=1}^d \varphi^{(e_i)}(2^j x_i - k_i), \quad \forall\, e=(e_1,\dots,e_d)'\in E,\ \forall\, x=(x_1,\dots,x_d)'\in\mathbb{R}^d.
\]
The system $(\Psi_{j,k}^{(e)})_{e\in E,\,j\in\mathbb{Z},\,k\in\mathbb{Z}^d}$ is an orthonormal basis of $L_2(\mathbb{R}^d)$, see Wojtaszczyk [26], Sect. 5. The same holds true for
\[
(\Phi_{j_0,k})_{k\in\mathbb{Z}^d} \cup \big(\Psi_{j,k}^{(e)}\big)_{e\in E,\,j\ge j_0,\,k\in\mathbb{Z}^d}, \quad j_0\in\mathbb{Z},
\]
where $\Phi_{j,k}:\mathbb{R}^d\to\mathbb{R}$ is given by $\Phi_{j,k}(x) := 2^{jd/2}\prod_{i=1}^d \phi(2^j x_i - k_i)$ for $j\in\mathbb{Z}$, $k\in\mathbb{Z}^d$. Thus, an $L_2$-approximation of $V_{n,c}$ by a statistic based on a wavelet approximation of $h_c$ can be established. To this end we introduce $\tilde h_c^{(K,L)}$ with
\[
\tilde h_c^{(K,L)}(x,y) := \sum_{k_1,k_2\in\{-L,\dots,L\}^d} \alpha^{(c)}_{j_0;k_1,k_2}\,\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y) + \sum_{j=j_0}^{J(K)-1}\ \sum_{k_1,k_2\in\{-L,\dots,L\}^d}\ \sum_{e\in\bar E} \beta^{(c,e)}_{j;k_1,k_2}\,\Psi^{(e)}_{j;k_1,k_2}(x,y),
\]
where $\bar E := (E\times E)\cup(E\times\{0_d\})\cup(\{0_d\}\times E)$,
\[
\Psi^{(e)}_{j;k_1,k_2} := \begin{cases}\Psi^{(e_1)}_{j,k_1}\Psi^{(e_2)}_{j,k_2} & \text{for } (e_1',e_2')'\in E\times E,\\ \Psi^{(e_1)}_{j,k_1}\Phi_{j,k_2} & \text{for } (e_1',e_2')'\in E\times\{0_d\},\\ \Phi_{j,k_1}\Psi^{(e_2)}_{j,k_2} & \text{for } (e_1',e_2')'\in\{0_d\}\times E,\end{cases}
\]
$\alpha^{(c)}_{j_0;k_1,k_2} = \iint_{\mathbb{R}^d\times\mathbb{R}^d} h_c(x,y)\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y)\,dx\,dy$ and $\beta^{(c,e)}_{j;k_1,k_2} = \iint_{\mathbb{R}^d\times\mathbb{R}^d} h_c(x,y)\Psi^{(e)}_{j;k_1,k_2}(x,y)\,dx\,dy$.
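The tensor-product construction of $\Psi^{(e)}_{j,k}$ can be sketched in code. The snippet below is only an illustration and does not use the Daubechies pair from the paper: the hat function serves as a stand-in scale function that is Lipschitz continuous and compactly supported with $\int\phi = 1$ and $\int\psi = 0$, but the resulting system is not orthonormal; a genuine multiresolution analysis is needed for that.

```python
import numpy as np

def phi(x):
    # stand-in scale function: Lipschitz, compact support, integral 1
    return np.maximum(0.0, 1.0 - np.abs(x))

def psi(x):
    # stand-in wavelet: compact support, integral 1/2 - 1/2 = 0
    return phi(2.0 * x) - 0.5 * phi(x)

def tensor_psi(x, j, k, e):
    # Psi^{(e)}_{j,k}(x) = 2^{j d / 2} prod_i varphi^{(e_i)}(2^j x_i - k_i),
    # with varphi^{(0)} = phi and varphi^{(1)} = psi
    d = len(x)
    val = 2.0 ** (j * d / 2.0)
    for i in range(d):
        arg = 2.0**j * x[i] - k[i]
        val *= psi(arg) if e[i] == 1 else phi(arg)
    return val

# numerical check of the integral properties of the stand-ins
t = np.linspace(-4.0, 4.0, 160001)
dt = t[1] - t[0]
int_phi = phi(t).sum() * dt
int_psi = psi(t).sum() * dt
print(int_phi, int_psi)  # approx 1 and 0
```

Replacing `phi`/`psi` by a compactly supported orthonormal pair (e.g. a Daubechies family) would yield the basis used in the text without changing the tensor-product code.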
We refer to the degenerate version of $\tilde h_c^{(K,L)}$ as $h_c^{(K,L)}$, given by
\[
h_c^{(K,L)}(x,y) = \tilde h_c^{(K,L)}(x,y) - \int_{\mathbb{R}^d}\tilde h_c^{(K,L)}(x,y)\,dF(x) - \int_{\mathbb{R}^d}\tilde h_c^{(K,L)}(x,y)\,dF(y) + \iint_{\mathbb{R}^d\times\mathbb{R}^d}\tilde h_c^{(K,L)}(x,y)\,dF(x)\,dF(y).
\]
The associated V-type statistic will be denoted by $V_{n,c}^{(K,L)}$.

Lemma 2.2. Assume that (A1), (A2), and (A3) are fulfilled. Then $(J(K))_{K\in\mathbb{N}}\subseteq\mathbb{N}$ with $J(K)\to\infty$ as $K\to\infty$ can be chosen such that
\[
\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2 E\big(V_{n,c} - V_{n,c}^{(K,L)}\big)^2 = 0.
\]

Employing the CLT of Neumann and Paparoditis [24] and the continuous mapping theorem, we obtain the limit distribution of $nV_{n,c}^{(K,L)}$. Finally, based on this result, the limit distribution of the V-type statistic $nV_n$ can be derived. Moreover, the law of large numbers allows for deducing the asymptotic distribution of $nU_n$ since $nU_n = \frac{n}{n-1}\big(nV_n - \frac{1}{n}\sum_{k=1}^n h(X_k,X_k)\big)$.

Theorem 2.1. Suppose that the assumptions (A1), (A2), and (A3) are fulfilled. Then, as $n\to\infty$,
\[
nV_n \xrightarrow{d} Z \quad\text{and}\quad nU_n \xrightarrow{d} Z - Eh(X_1,X_1)
\]
with
\[
Z := \lim_{c\to\infty}\Big[\sum_{k_1,k_2\in\mathbb{Z}^d} \alpha^{(c)}_{j_0;k_1,k_2}\, Z_{k_1}Z_{k_2} + \sum_{j=j_0}^\infty\ \sum_{k_1,k_2\in\mathbb{Z}^d}\ \sum_{e=(e_1',e_2')'\in\bar E} \beta^{(c,e)}_{j;k_1,k_2}\, Z^{(e_1)}_{j;k_1} Z^{(e_2)}_{j;k_2}\Big].
\]
Here, $(Z_k)_{k\in\mathbb{Z}^d}$ and $(Z^{(e)}_{j;k})_{j\ge j_0,\,k\in\mathbb{Z}^d,\,e\in\{0,1\}^d}$ are centered normally distributed random variables and the right-hand side converges in the $L_2$-sense.

As in the case of i.i.d. random variables, the limit distribution of $nV_n$ is a weighted sum of products of centered normal random variables. In contrast to many other results in the literature, the prerequisites of this theorem, namely moment constraints and Lipschitz continuity of the kernel, can be checked fairly easily in many cases. Unfortunately, the asymptotic distribution still has a complicated structure. Hence, quantiles can hardly be determined on the basis of the previous result. However, this problem plays a minor role since we show in the following section that the conditional distributions of the bootstrap counterparts of $nU_n$ and $nV_n$, given $X_1,\dots,X_n$, converge to the same limits in probability.
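The algebraic identity $nU_n = \frac{n}{n-1}\big(nV_n - \frac{1}{n}\sum_k h(X_k,X_k)\big)$ linking the two statistics can be verified directly on simulated data (a quick numerical sanity check, not part of the paper; the kernel below is an arbitrary symmetric choice):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
x = rng.normal(size=n)

h = lambda a, b: a * b * np.exp(-(a - b) ** 2)  # arbitrary symmetric kernel

H = h(x[:, None], x[None, :])
n_vn = H.sum() / n                               # n V_n
n_un = (H.sum() - np.trace(H)) / (n - 1)         # n U_n (off-diagonal terms only)
identity = n / (n - 1) * (n_vn - np.trace(H) / n)

print(np.isclose(n_un, identity))  # True
```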
Of course, the assumption of Lipschitz continuous kernels is rather restrictive. Thus, we intend to extend our theory to a more general class of kernel functions. The cost of enlarging the class of feasible kernels is additional moment constraints. Besides (A1) and (A2) we assume:

(A4) (i) The kernel function satisfies
\[
|h(x,y) - h(\bar x,\bar y)| \le f(x,\bar x,y,\bar y)\,\big[\|x-\bar x\|_{l_1} + \|y-\bar y\|_{l_1}\big], \quad \forall\, x,\bar x,y,\bar y\in\mathbb{R}^d,
\]
where $f:\mathbb{R}^{4d}\to\mathbb{R}$ is continuous. Moreover, for $\eta := 1/(1-\delta)$ and some $a>0$,
\[
\sup_{k_1,\dots,k_5\in\mathbb{N}} E\Big[\max_{c_1,c_2\in[-a,a]^d} f(Y_{k_1}, Y_{k_2}+c_1, Y_{k_3}, Y_{k_4}+c_2)^\eta\,\big(1+\|Y_{k_5}\|_{l_1}\big)\Big] < \infty
\]
for any sequence $(Y_k)_{k\in\mathbb{N}}$ with $Y_k = X_k$ or $Y_k = \tilde X_k$, where $\tilde X_k$ denotes an independent copy of $X_k$.
(ii) $\sum_{r=1}^\infty r\,(\tau_r)^{\delta^2} < \infty$.

Even though assumption (A4)(i) has a rather technical structure, it is satisfied e.g. by polynomial kernel functions as long as the sample variables have sufficiently many finite moments. Analogously to Lemma 2.1 and Lemma 2.2, the following assertion holds.

Lemma 2.3. Suppose that (A1), (A2), and (A4) are fulfilled. Then $(J(K))_{K\in\mathbb{N}}\subset\mathbb{N}$ with $J(K)\to\infty$ as $K\to\infty$ can be chosen such that
\[
\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2 E\big(V_n - V_{n,c}^{(K,L)}\big)^2 = 0.
\]

This auxiliary result implies the analogue to Theorem 2.1 for non-Lipschitz kernels.

Theorem 2.2. Assume that (A1), (A2), and (A4) are satisfied. Then, as $n\to\infty$,
\[
nV_n \xrightarrow{d} Z \quad\text{and}\quad nU_n \xrightarrow{d} Z - Eh(X_1,X_1),
\]
where $Z$ is defined as in Theorem 2.1.

3. Consistency of general bootstrap methods

As we have seen in the previous section, the limit distributions of degenerate U- and V-statistics have a rather complicated structure. Therefore, in the majority of cases it is quite difficult to determine the quantiles which are required in order to derive asymptotic critical values of U- and V-type test statistics. The bootstrap offers a suitable way of determining these quantities. Given $X_1,\dots,X_n$, let $X^*$ and $Y^*$ denote vectors of bootstrap random variables with values in $\mathbb{R}^{d_1}$ and $\mathbb{R}^{d_2}$.
In order to describe the dependence structure of the bootstrap sample, we introduce $\mathbb{X}_n := (X_1',\dots,X_n')'$,
\[
\tau^*(Y^*,X^*,\omega) := E\Big[\sup_{f\in\Lambda_1(\mathbb{R}^{d_1})}\Big|\int_{\mathbb{R}^{d_1}} f(x)\,dF^{X^*|Y^*}(x) - \int_{\mathbb{R}^{d_1}} f(x)\,dF^{X^*}(x)\Big|\ \Big|\ \mathbb{X}_n=\omega\Big]
\]
and make the following assumptions:

(A1*) (i) The sequence of bootstrap variables is stationary with probability tending to one. Additionally, $(X^{*\prime}_{t_1},X^{*\prime}_{t_2})' \xrightarrow{d} (X_{t_1}',X_{t_2}')'$ for all $t_1,t_2\in\mathbb{N}$ holds true in probability.
(ii) Conditionally on $X_1,\dots,X_n$, the random variables $(X_k^*)_{k\in\mathbb{Z}}$ are τ-weakly dependent, i.e. there exist a sequence of coefficients $(\bar\tau_r)_{r\in\mathbb{N}}$ and a sequence of sets $(\Omega_n^{(1)})_{n\in\mathbb{N}}$ with $P(\mathbb{X}_n\in\Omega_n^{(1)})\to 1$ as $n\to\infty$ and the following property: for any sequence $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n\in\Omega_n^{(1)}$,
\[
\tau_r^*(\omega_n) := \sup\{\tau^*((X^{*\prime}_{s_1},\dots,X^{*\prime}_{s_u})', (X^{*\prime}_{t_1},X^{*\prime}_{t_2},X^{*\prime}_{t_3})', \omega_n) \mid u\in\mathbb{N},\ s_1\le\cdots\le s_u < s_u+r\le t_1\le t_2\le t_3\in\mathbb{N}\}
\]
can be bounded by some $\bar\tau_r$ such that the sequence $(\bar\tau_r)_{r\in\mathbb{N}}$ satisfies $\sum_{r=1}^\infty r(\bar\tau_r)^\delta < \infty$.

Remark 2. Neumann and Paparoditis [24] proved that in the case of stationary Markov chains of finite order, the key to convergence of the finite-dimensional distributions is convergence of the conditional distributions, cf. their Lemma 4.2. In particular, they showed that the AR(p)-bootstrap and the ARCH(p)-bootstrap yield samples that satisfy (A1*)(i).

Lemma 3.1. Suppose that (A1) and (A1*) hold true. Further let $h:\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ be a bounded, symmetric, Lipschitz continuous function such that $Eh(X_1,y) = E(h(X_1^*,y)\mid X_1,\dots,X_n) = 0$ for all $y\in\mathbb{R}^d$. Then,
\[
\frac{1}{n}\sum_{j,k=1}^n h(X_j^*,X_k^*) \xrightarrow{d} Z \quad\text{and}\quad \frac{1}{n-1}\sum_{j=1}^n\sum_{k\neq j} h(X_j^*,X_k^*) \xrightarrow{d} Z - Eh(X_1,X_1)
\]
hold in probability. Here, $Z$ is defined as in Theorem 2.1.

In order to deduce bootstrap consistency, additionally convergence in a certain metric ρ is required, i.e. $\rho\big(P(\frac{1}{n}\sum_{j,k=1}^n h(X_j^*,X_k^*)\le x \mid X_1,\dots,X_n),\ P(nV_n\le x)\big) \xrightarrow{P} 0$. (Here, $\xrightarrow{P}$ denotes convergence in probability.)
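Operationally, these convergence statements justify the following Monte Carlo recipe for critical values (a generic sketch, not from the paper: `resample` stands for whatever model-based scheme satisfies (A1*), and the i.i.d. toy resampler and kernel below are purely illustrative):

```python
import numpy as np

def v_statistic(x, kernel):
    # n V_n = (1/n) sum_{j,k} h(X_j, X_k)
    return kernel(x[:, None], x[None, :]).sum() / len(x)

def bootstrap_critical_value(x, kernel, resample, alpha=0.05, B=500, seed=0):
    # empirical (1 - alpha)-quantile of B bootstrap replicates of n V_n^*
    rng = np.random.default_rng(seed)
    boot = np.array([v_statistic(resample(x, rng), kernel) for _ in range(B)])
    return np.quantile(boot, 1.0 - alpha)

# toy illustration: h(x, y) = x * y is degenerate when E X = 0, and
# n V_n = (sum_i X_i)^2 / n has a chi-square(1)-type limit
kernel = lambda a, b: a * b
resample = lambda x, rng: rng.choice(x - x.mean(), size=len(x), replace=True)

rng = np.random.default_rng(42)
x = rng.normal(size=300)
t_star = bootstrap_critical_value(x, kernel, resample, alpha=0.05)
reject = v_statistic(x, kernel) > t_star  # level-alpha test decision
```

For dependent data, only `resample` changes (e.g. to an AR(p)- or ARCH(p)-bootstrap); the quantile step stays the same.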
Convergence in the uniform metric follows from Lemma 3.1 if the limit distribution has a continuous cumulative distribution function. The next assertion gives a necessary and sufficient condition for this.

Lemma 3.2. The limit variable $Z$, derived in Theorem 2.1 / Theorem 2.2 under (A1), (A2), and (A3)/(A4), has a continuous cumulative distribution function if and only if $\mathrm{var}(Z) > 0$.

Kernels of statistics emerging from goodness-of-fit tests for composite hypotheses often depend on an unknown parameter. We intend to employ bootstrap consistency for this setting, i.e. when parameters have to be estimated. Moreover, we enlarge the class of feasible kernels. For this purpose we additionally assume:

(A2*) (i) $\hat\theta_n \xrightarrow{P} \theta\in\mathbb{R}^p$.
(ii) $E\big(h(X_1^*,y,\hat\theta_n)\mid X_1,\dots,X_n\big) = 0$ for all $y\in\mathbb{R}^d$.
(iii) For some δ satisfying (A1*)(ii), $\nu > (2-\delta)/(1-\delta)$ and a constant $C_1<\infty$, there exists a sequence of sets $(\Omega_n^{(2)})_{n\in\mathbb{N}}$ such that $P(\mathbb{X}_n\in\Omega_n^{(2)})\to 1$ as $n\to\infty$ and for all $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n\in\Omega_n^{(2)}$ the following moment constraint holds true:
\[
\sup_{1\le k\le n} E\big[|h(X_1^*,X_k^*,\hat\theta_n)|^\nu + |h(X_1^*,\tilde X_1^*,\hat\theta_n)|^\nu \,\big|\, \mathbb{X}_n=\omega_n\big] \le C_1,
\]
where (conditionally on $X_1,\dots,X_n$) $\tilde X_1^*$ denotes an independent copy of $X_1^*$.

(A3*) (i) The kernel satisfies
\[
|h(x,y,\hat\theta_n) - h(\bar x,\bar y,\hat\theta_n)| \le f(x,\bar x,y,\bar y,\hat\theta_n)\,\big[\|x-\bar x\|_{l_1} + \|y-\bar y\|_{l_1}\big], \quad \forall\, x,\bar x,y,\bar y\in\mathbb{R}^d,
\]
where $f:\mathbb{R}^{4d}\times\mathbb{R}^p\to\mathbb{R}$ is continuous on $\mathbb{R}^{4d}\times U(\theta)$ for some compact neighborhood $U(\theta)$ of θ. Moreover, for $\eta := 1/(1-\delta)$, some $a>0$ and some $C_2<\infty$, there exists a sequence of sets $(\Omega_n^{(3)})_{n\in\mathbb{N}}$ such that $P(\mathbb{X}_n\in\Omega_n^{(3)})\to 1$ as $n\to\infty$ and for all $(\omega_n)_{n\in\mathbb{N}}$ with $\omega_n\in\Omega_n^{(3)}$ the following moment constraint holds true:
\[
E\Big[\max_{c_1,c_2\in[-a,a]^d} f(Y^*_{k_1}, Y^*_{k_2}+c_1, Y^*_{k_3}, Y^*_{k_4}+c_2, \hat\theta_n)^\eta\,\big(1+\|Y^*_{k_5}\|_{l_1}\big)\,\Big|\, \mathbb{X}_n=\omega_n\Big] \le C_2, \quad 1\le k_1,\dots,k_5\le n,
\]
for any sequence $(Y_k^*)_{k\in\mathbb{Z}}$ with $Y_k^* = X_k^*$ or $Y_k^* = \tilde X_k^*$, where $\tilde X_k^*$ denotes an independent copy of $X_k^*$, conditionally on $X_1,\dots,X_n$.
(ii) $\sum_{r=1}^\infty r(\bar\tau_r)^{\delta^2} < \infty$.

Under these assumptions we derive a result concerning the asymptotic distributions of $nV_n^* = n^{-1}\sum_{j,k=1}^n h(X_j^*,X_k^*,\hat\theta_n)$ and $nU_n^* = (n-1)^{-1}\sum_{j=1}^n\sum_{k\neq j} h(X_j^*,X_k^*,\hat\theta_n)$. To this end we denote the U- and V-statistics with kernel $h(\cdot,\cdot,\theta)$ by $U_n$ and $V_n$, respectively.

Theorem 3.1. Suppose that (A1), (A2), and (A4), as well as (A1*), (A2*), and (A3*) are fulfilled. Then, as $n\to\infty$,
\[
nV_n^* \xrightarrow{d} Z \quad\text{and}\quad nU_n^* \xrightarrow{d} Z - Eh(X_1,X_1,\theta),
\]
in probability, where $Z$ is defined as in Theorem 2.1. Moreover, if $\mathrm{var}(Z)>0$,
\[
\sup_{-\infty<x<\infty} |P(nV_n^*\le x\mid X_1,\dots,X_n) - P(nV_n\le x)| \xrightarrow{P} 0
\]
and
\[
\sup_{-\infty<x<\infty} |P(nU_n^*\le x\mid X_1,\dots,X_n) - P(nU_n\le x)| \xrightarrow{P} 0.
\]

This implies that bootstrap-based tests of U- or V-type asymptotically have a prescribed size α, i.e. $P(nU_n > t^*_{u,\alpha})\to\alpha$ and $P(nV_n > t^*_{v,\alpha})\to\alpha$ as $n\to\infty$, where $t^*_{u,\alpha}$ and $t^*_{v,\alpha}$ denote the $(1-\alpha)$-quantiles of $nU_n^*$ and $nV_n^*$, respectively, given $X_1,\dots,X_n$.

4. L2-tests for weakly dependent observations

This section is dedicated to two applications in the field of hypothesis testing. For the sake of simplicity we restrict ourselves to real-valued random variables and consider only simple null hypotheses. The test for symmetry as well as the model-specification test can be extended to problems with composite hypotheses, cf. Leucht [22].

4.1. Test for symmetry. Answering the question whether a distribution is symmetric or not is interesting for several reasons. There is a purely statistical interest, since symmetry characterizes the relation of location parameters. Moreover, symmetry plays a central role in analyzing and modelling real-life phenomena. For instance, it is often presumed that an observed process can be described by an AR(p)-process with Gaussian innovations, which in turn implies a Gaussian marginal distribution. Rejecting the hypothesis of symmetry contradicts this type of marginal distribution.
Furthermore, this result of the test excludes any kind of symmetric innovations in this context. Suppose that $(X_n)_{n\in\mathbb{N}}$ is a sequence of real-valued random variables with distribution $P^X$ satisfying (A1). For some $\mu\in\mathbb{R}$ we are given the problem
\[
H_0: P^{X-\mu} = P^{\mu-X} \quad\text{vs.}\quad H_1: P^{X-\mu}\neq P^{\mu-X}.
\]
Similar to Feuerverger and Mureika [17], who studied the problem for i.i.d. random variables, we propose the test statistic
\[
T_n = n\int_{\mathbb{R}} \big[\Im\big(c_n(t)e^{-i\mu t}\big)\big]^2\, w(t)\,dt = \frac{1}{n}\sum_{j,k=1}^n \int_{\mathbb{R}} \sin(t(X_j-\mu))\sin(t(X_k-\mu))\,w(t)\,dt,
\]
which makes use of the fact that symmetry of a distribution is equivalent to a vanishing imaginary part of the associated characteristic function. Here, $\Im(z)$ denotes the imaginary part of $z\in\mathbb{C}$, $c_n$ denotes the empirical characteristic function and $w$ is some positive, measurable weight function with $\int_{\mathbb{R}}(1+|t|)\,w(t)\,dt < \infty$. Obviously, $T_n$ is a V-type statistic whose kernel satisfies (A2) and (A3). Thus, its limit distribution can be determined by Theorem 2.1. Assuming that the observations come from a stationary AR(p)- or ARCH(p)-process, the validity of (A1*) is assured by using the AR(p)- or ARCH(p)-bootstrap methods given by Neumann and Paparoditis [24] in order to generate the bootstrap counterpart of the sample. Hence, in these cases the prerequisites of Lemma 3.1, except degeneracy, are satisfied. Inspired by Dehling and Mikosch [8], who discussed this problem for Efron's bootstrap in the i.i.d. case, we propose a bootstrap statistic with the kernel
\[
h_n^*(x,y) = h(x,y) - \int_{\mathbb{R}} h(x,y)\,dF_n^*(x) - \int_{\mathbb{R}} h(x,y)\,dF_n^*(y) + \iint_{\mathbb{R}^2} h(x,y)\,dF_n^*(x)\,dF_n^*(y).
\]
Here, $h$ denotes the kernel function of $T_n$ and $F_n^*$ the distribution function of $X_1^*$ conditionally on $X_1,\dots,X_n$. Similar to the proof of Theorem 3.1, the desired convergence property of $T_n^*$ can be verified.

4.2. Model-specification test. Let $(X_k)_{k\in\mathbb{Z}}$ be a stationary, real-valued nonlinear autoregressive process with centered i.i.d. innovations $(\varepsilon_k)_{k\in\mathbb{Z}}$, i.e. $X_k = g(X_{k-1}) + \varepsilon_k$.
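For the symmetry statistic of Subsection 4.1, choosing the standard normal density as weight $w$ makes the kernel explicit: since $\sin(at)\sin(bt) = \frac12[\cos((a-b)t) - \cos((a+b)t)]$ and $\int\cos(ct)\,w(t)\,dt = e^{-c^2/2}$ for this $w$, the kernel becomes $h(x,y) = \frac12\big[e^{-(x-y)^2/2} - e^{-(x+y-2\mu)^2/2}\big]$. The sketch below is illustrative (the weight choice and sample sizes are ours, not from the paper); under symmetry $T_n$ stays of order one, while under an asymmetric alternative it grows linearly in $n$.

```python
import numpy as np

def t_n(x, mu=0.0):
    # T_n = (1/n) sum_{j,k} h(X_j, X_k) with standard normal weight w, where
    # h(x, y) = 0.5 * (exp(-(x - y)^2 / 2) - exp(-(x + y - 2 mu)^2 / 2))
    a = x[:, None] - mu
    b = x[None, :] - mu
    h = 0.5 * (np.exp(-((a - b) ** 2) / 2) - np.exp(-((a + b) ** 2) / 2))
    return h.sum() / len(x)

rng = np.random.default_rng(7)
n = 500
sym = rng.normal(size=n)                 # symmetric about mu = 0
asym = rng.exponential(size=n) - 1.0     # centered but asymmetric

t_sym, t_asym = t_n(sym), t_n(asym)
# T_n = n * int [Im(c_n(t) e^{-i mu t})]^2 w(t) dt >= 0 by construction
print(t_sym, t_asym)
```

The statistic is nonnegative for any sample, and the asymmetric sample should produce a markedly larger value than the symmetric one.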
Suppose that $E|\varepsilon_0|^{4+\delta}<\infty$ for some $\delta>0$ and that $g\in\mathcal{G} := \{f:\mathbb{R}\to\mathbb{R}\mid f \text{ Lipschitz continuous with } \mathrm{Lip}(f)<1\}$. Thus, the process $(X_k)_{k\in\mathbb{Z}}$ is τ-dependent with exponential rate, see Dedecker and Prieur [7], Example 4.2. We will present a test for the problem
\[
H_0: P\big(E(X_1|X_0) = g_0(X_0)\big) = 1 \quad\text{vs.}\quad H_1: P\big(E(X_1|X_0) = g_0(X_0)\big) < 1
\]
with $g_0\in\mathcal{G}$. For the sake of simplicity we stick to these small classes of functions $\mathcal{G}$ and processes $(X_k)_{k\in\mathbb{Z}}$. An extension to a more comprehensive variety of model-specification tests is investigated in a forthcoming paper, cf. Leucht [22]. Similar to Fan and Li [16] we propose the test statistic
\[
T_n = \frac{1}{n\sqrt{h}}\sum_{j=1}^n\sum_{k\neq j}\big(X_j - g_0(X_{j-1})\big)\big(X_k - g_0(X_{k-1})\big)\,K\Big(\frac{X_{j-1}-X_{k-1}}{h}\Big) =: \frac{1}{n}\sum_{j=1}^n\sum_{k\neq j} H(Z_j,Z_k),
\]
i.e. a kernel estimator of $E\big([X_1-g(X_0)]\,E(X_1-g(X_0)|X_0)\,f(X_0)\big) = 0$ multiplied by $n\sqrt{h}$. Here, $Z_k := (X_k,X_{k-1})'$, $k\in\mathbb{Z}$, and $f$ denotes the density of the distribution of $X_0$. Fan and Li [16], who considered β-mixing processes, used a similar test statistic with a vanishing bandwidth. In contrast, we consider the case of a fixed bandwidth. These tests are more powerful against Pitman alternatives $g_{1,n}(x) = g_0(x) + n^{-\beta}w(x) + o(n^{-\beta})$, $\beta>0$, $w\in\mathcal{G}$. For a detailed discussion of this topic see Ghosh and Huang [19]. Obviously, $T_n$ is degenerate. If we assume $K$ to be a bounded, even, and Lipschitz continuous function, then there exists a function $f:\mathbb{R}^8\to\mathbb{R}$ with
\[
|H(z_1,z_2) - H(\bar z_1,\bar z_2)| \le f(z_1,\bar z_1,z_2,\bar z_2)\,\big(\|z_1-\bar z_1\|_{l_1} + \|z_2-\bar z_2\|_{l_1}\big)
\]
and such that (A4) is valid. Moreover, under these conditions $H$ satisfies (A2). Hence, the assertion of Theorem 2.2 holds true. In order to determine critical values of the test, we propose the bootstrap procedure given by Franke and Wendel [18] (without estimating the regression function). The bootstrap innovations $(\varepsilon_t^*)_t$ are drawn with replacement from the set $\{\tilde\varepsilon_t = \varepsilon_t - n^{-1}\sum_{k=1}^n \varepsilon_k\}_{t=1}^n$, where $\varepsilon_t = X_t - g_0(X_{t-1})$, $t = 1,\dots,n$.
After choosing a starting value independently of $(\varepsilon_t^*)_{t\ge 1}$, the bootstrap sample $X_t^* = g_0(X_{t-1}^*) + \varepsilon_t^*$ as well as the bootstrap counterpart of the test statistic, $T_n^* = \frac{1}{n}\sum_{j=1}^n\sum_{k\neq j} H(Z_j^*,Z_k^*)$ with $Z_k^* = (X_k^*,X_{k-1}^*)'$, $k=1,\dots,n$, can be computed. In contrast to the previous subsection, the proposed bootstrap method leads to a degenerate kernel function. Obviously, the bootstrap sample is τ-dependent in the sense of (A1*) and satisfies $E(|X_k^*|\mid Z_1,\dots,Z_n) < C$ for some $C<\infty$ with probability tending to one. By Theorem 1 of Diaconis and Freedman [12] we obtain stationarity of the bootstrap sample. In order to verify convergence of the finite-dimensional distributions, we apply Lemma 4.2 of Neumann and Paparoditis [24]. The application of this result requires the convergence of the conditional distributions, i.e.
\[
\sup_{x\in K} d\big(P^{X_t^*|X_{t-1}^*=x},\, P^{X_t|X_{t-1}=x}\big) \xrightarrow{P} 0
\]
for every compact $K\subset\mathbb{R}$ and $d(P,Q) = \inf_{X\sim P,\,Y\sim Q} E(|X-Y|\wedge 1)$. In the present context this can be confirmed similarly to the proof of Lemma 4.1 of Neumann and Paparoditis [24]. Summing up, all prerequisites of Theorem 3.1 are satisfied. Hence, the asymptotic critical value of the above test can be determined using the proposed model-based bootstrap procedure.

5. Proofs

5.1. Proofs of the main theorems. Throughout this section $C$ denotes a positive, finite generic constant.

Proof of Theorem 2.1. First the limit distribution of $nV_{n,c}^{(K,L)}$, defined before Lemma 2.2, is derived. Afterwards the asymptotic distributions of $nV_n$ and $nU_n$ are deduced by means of Lemma 2.1, Lemma 2.2, and the weak law of large numbers. The following modified representation of $\tilde h_c^{(K,L)}$ will be useful in the sequel:
\[
\tilde h_c^{(K,L)}(x,y) = \sum_{k,l=1}^{M(K,L)} \gamma_{k,l}^{(c)}\,\tilde q_k(x)\tilde q_l(y),
\]
where $(\tilde q_k\tilde q_l)_{k,l=1}^{M(K,L)}$ is an ordering of
\[
\big(\Phi_{j_0,k_1}\Phi_{j_0,k_2}\big)_{k_1,k_2\in\{-L,\dots,L\}^d} \cup \big(\Psi^{(e)}_{j;k_1,k_2}\big)_{e\in\bar E,\ j\in\{j_0,\dots,J(K)-1\},\ k_1,k_2\in\{-L,\dots,L\}^d}
\]
and $\gamma_{k,l}^{(c)} = \gamma_{l,k}^{(c)}$, $k,l\in\{1,\dots,M(K,L)\}$, are the associated coefficients.
Moreover, the introduction of $q_k(X_i) := \tilde q_k(X_i) - E\tilde q_k(X_i)$, $k\in\{1,\dots,M(K,L)\}$, $i\in\{1,\dots,n\}$, allows for the compact notation of $nV_{n,c}^{(K,L)}$:
\[
nV_{n,c}^{(K,L)} = \sum_{k,l=1}^{M(K,L)} \gamma_{k,l}^{(c)}\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n q_k(X_i)\Big)\Big(\frac{1}{\sqrt{n}}\sum_{j=1}^n q_l(X_j)\Big).
\]
In order to derive its limit distribution, we first consider $\frac{1}{\sqrt{n}}\sum_{i=1}^n (q_1(X_i),\dots,q_{M(K,L)}(X_i))'$. Due to the Cramér-Wold device it suffices to investigate $\sum_{k=1}^{M(K,L)} t_k \frac{1}{\sqrt{n}}\sum_{i=1}^n q_k(X_i)$ for all $(t_1,\dots,t_{M(K,L)})'\in\mathbb{R}^{M(K,L)}$. Asymptotic normality can be established by applying the CLT of Neumann and Paparoditis [24] to $Q_i := \sum_{k=1}^{M(K,L)} t_k q_k(X_i)$, $i=1,\dots,n$. To this end the prerequisites of this tool have to be checked. Obviously, we are given a strictly stationary sequence of centered and bounded random variables. In conjunction with the dominated convergence theorem, this implies that the Lindeberg condition is fulfilled. In order to show $\frac{1}{n}\mathrm{var}(Q_1+\cdots+Q_n)\to\sigma^2 := \mathrm{var}(Q_1) + 2\sum_{k=2}^\infty\mathrm{cov}(Q_1,Q_k)$ as $n\to\infty$, the validity of (A1) can be employed, which moreover assures the existence of $\sigma^2$. Then,
\[
\Big|\frac{1}{n}\mathrm{var}(Q_1+\cdots+Q_n) - \sigma^2\Big| = \Big|\frac{2}{n}\sum_{r=2}^n \big(n-[r-1]\big)\,\mathrm{cov}(Q_1,Q_r) - 2\sum_{k=2}^\infty \mathrm{cov}(Q_1,Q_k)\Big| \le 2\sum_{r=2}^\infty \min\Big\{\frac{r-1}{n},\,1\Big\}\,|\mathrm{cov}(Q_1,Q_r)| \le 4\|Q_1\|_\infty\,\mathrm{Lip}(Q_1)\sum_{r=2}^\infty \min\Big\{\frac{r-1}{n},\,1\Big\}\,\tau_{r-1},
\]
where the latter inequality follows from (2.4). The summability condition on the dependence coefficients, in connection with Lebesgue's dominated convergence theorem, yields the desired result. Since $Q_{t_1}Q_{t_2}$ forms a Lipschitz continuous function, inequality (6.4) of Neumann and Paparoditis [24] holds true with $\theta_r = \mathrm{Lip}(Q_{t_1}Q_{t_2})\,\tau_r$. It is easy to convince oneself that their condition (6.3) need not be checked if the involved random variables are uniformly bounded. Finally, we obtain $n^{-1/2}(Q_1+\cdots+Q_n)\xrightarrow{d} N(0,\sigma^2)$ and hence
\[
nV_{n,c}^{(K,L)} \xrightarrow{d} Z_c^{(K,L)} := \sum_{k_1,k_2\in\{-L,\dots,L\}^d}\alpha^{(c)}_{j_0;k_1,k_2}\, Z_{k_1}Z_{k_2} + \sum_{j=j_0}^{J(K)-1}\ \sum_{k_1,k_2\in\{-L,\dots,L\}^d}\ \sum_{e=(e_1',e_2')'\in\bar E}\beta^{(c,e)}_{j;k_1,k_2}\, Z^{(e_1)}_{j;k_1} Z^{(e_2)}_{j;k_2}.
\]
Here, $(Z_k)_{k\in\{-L,\dots,L\}^d}$ and $(Z^{(e)}_{j;k})_{j\in\{j_0,\dots,J(K)-1\},\,e\in\{0,1\}^d,\,k\in\{-L,\dots,L\}^d}$, respectively, are centered normally distributed random variables. Note that also $n\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big) \xrightarrow{d} Z_c^{(K,L_1)} - Z_c^{(K,L_2)}$. By Lemma 2.1 and Lemma 2.2 we have
\[
\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2 E\big(V_{n,c}^{(K,L)} - V_n\big)^2 = 0.
\]
Hence, it remains to show that $\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty} E(Z_c^{(K,L)} - Z)^2 = 0$. To this end we first show that $(Z_c^{(K,L)})_L$ is a Cauchy sequence in $L_2(\Omega,\mathcal{A},P)$, which, due to completeness of $L_2$, implies $\lim_{L\to\infty} nV_{n,c}^{(K,L)} \xrightarrow{d} Z_c^{(K)} := L_2\text{-}\lim_{L\to\infty} Z_c^{(K,L)}$. According to Theorem 5.3 of Billingsley [3] we obtain $E\big(Z_c^{(K,L_1)} - Z_c^{(K,L_2)}\big)^2 \le \liminf_{n\to\infty} n^2 E\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big)^2$. The second part of the proof of Lemma 2.2 implies
\[
\liminf_{n\to\infty} n^2 E\big(V_{n,c}^{(K,L_1)} - V_{n,c}^{(K,L_2)}\big)^2 \xrightarrow[L_2\to\infty]{} 0 \quad\text{for } L_1 > L_2.
\]
Iterating this method in conjunction with Lemma 2.1 yields $\lim_{c\to\infty}\lim_{K\to\infty} E(Z - Z_c^{(K)})^2 = 0$ and hence $\lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty} E(Z_c^{(K,L)} - Z)^2 = 0$, which in turn leads to the desired limit distribution of $nV_n$.

Based on the result concerning V-type statistics, the limit distribution of $nU_n$ can be established. Since $U_n = \frac{n}{n-1}V_n - \frac{1}{n(n-1)}\sum_{i=1}^n h(X_i,X_i)$, it remains to verify that $\frac{1}{n-1}\sum_{i=1}^n h(X_i,X_i) \xrightarrow{P} Eh(X_1,X_1)$. According to Markov's inequality, we prove that $\mathrm{var}\big(\frac{1}{n-1}\sum_{i=1}^n h(X_i,X_i)\big)\to 0$ as $n\to\infty$. For this purpose we investigate $\mathrm{cov}(h(X_j,X_j), h(X_k,X_k))$ for $j<k$. Let $\tilde X_k \overset{d}{=} X_k$ be independent of $X_j$ with $E\|X_k - \tilde X_k\|_{l_1} = \tau_{k-j}$. Condition (A2) implies
\[
\mathrm{cov}(h(X_j,X_j), h(X_k,X_k)) = E\big(h(X_j,X_j)\big[h(X_k,X_k) - h(\tilde X_k,\tilde X_k)\big]\big) \le \tau_{k-j}^{\delta}\,\big(E|h(X_j,X_j)|^{1/(1-\delta)}\,|h(X_k,X_k) - h(\tilde X_k,\tilde X_k)|\big)^{1-\delta} \le C\,\tau_{k-j}^{\delta}.
\]
Therefore, by (A1) we get, with $\tau_0 = \big[E\big(h(X_1,X_1)\big)^2\big]^{1/\delta}$,
\[
\mathrm{var}\Big(\frac{1}{n-1}\sum_{i=1}^n h(X_i,X_i)\Big) \le \frac{C}{(n-1)^2}\sum_{r=0}^{n-1}(n-r)\,\tau_r^{\delta} \xrightarrow[n\to\infty]{} 0.
\]

Proof of Theorem 2.2. On the basis of Lemma 2.3, similar arguments as in the proof of Theorem 2.1 yield $nV_n \xrightarrow{d} Z$.
In order to obtain $n U_n \xrightarrow{d} Z - Eh(X_1,X_1)$, it has to be verified that $\operatorname{var}\big(\frac{1}{n-1}\sum_{k=1}^n h(X_k,X_k)\big) \to 0$ as $n\to\infty$. To this end the approximation of $\operatorname{cov}(h(X_j,X_j), h(X_k,X_k))$ has to be modified:
$$ \operatorname{cov}(h(X_j,X_j),h(X_k,X_k)) \le \tau_{k-j}^{\delta^2}\,\big(2\,E f(X_k,\tilde X_k,X_k,\tilde X_k)^{1-\delta}[\|X_k\|_{l_1} + \|\tilde X_k\|_{l_1}]\big)^{\delta(1-\delta)} \times \big(E|h(X_j,X_j)|^{1/(1-\delta)}|h(X_k,X_k)-h(\tilde X_k,\tilde X_k)|\big)^{1-\delta} = O\big(\tau_{k-j}^{\delta^2}\big). $$
The summability assumption concerning the dependence coefficients in (A4) implies the desired convergence property.

Proof of Theorem 3.1. Due to Lemma 3.2 it suffices to verify distributional convergence. To this end we introduce
$$ \Omega_n^{\theta} \subseteq \Omega_n^{(1)} \cap \Omega_n^{(2)} \cap \Omega_n^{(3)} \cap \big\{\mathcal X_n \mid \|\hat\theta_n - \theta\|_{l_1} < \delta_n\big\} $$
such that for any sequence $(\omega_n)_{n\in\mathbb N}$ with $\omega_n \in \Omega_n^{\theta}$
$$ \mathcal L\big((X_{t_1}^{*\prime}, X_{t_2}^{*\prime})' \mid \mathcal X_n = \omega_n\big) = \mathcal L\big((X_{t_1+k}^{*\prime}, X_{t_2+k}^{*\prime})' \mid \mathcal X_n = \omega_n\big), \quad k \in \mathbb N, $$
$$ \mathcal L\big((X_{t_1}^{*\prime}, X_{t_2}^{*\prime})' \mid \mathcal X_n = \omega_n\big) \Longrightarrow \mathcal L\big((X_{t_1}', X_{t_2}')'\big). $$
Moreover, the null sequence $(\delta_n)_{n\in\mathbb N}$ can be chosen such that on $\Omega_n^{\theta}$, $\hat\theta_n \in U(\theta)$ and $P(\mathcal X_n \in \Omega_n^{\theta}) \to 1$ as $n\to\infty$. Hence, to prove $n V_n^* \xrightarrow{d} Z$, in probability, it suffices to verify $n V_n^* \xrightarrow{d} Z$ conditionally on $\mathcal X_n = \omega_n$ for any sequence $(\omega_n)_n$ with $\omega_n \in \Omega_n^{\theta}$. Now, we take an arbitrary sequence $(\omega_n)_n$ with $\omega_n \in \Omega_n^{\theta}$. In order to show that it suffices to investigate statistics with bounded kernels, we consider the degenerate version $h^{*(c)}$ of
$$ \tilde h^{*(c)}(x,y,\hat\theta_n) := \begin{cases} h(x,y,\hat\theta_n) & \text{for } x,y \in [-c,c]^d \text{ or } x,y \notin [-c,c]^d \text{ with } |h(x,y,\hat\theta_n)| \le c_h,\\ -c_h & \text{for } x,y \notin [-c,c]^d \text{ with } h(x,y,\hat\theta_n) < -c_h,\\ c_h & \text{for } x,y \notin [-c,c]^d \text{ with } h(x,y,\hat\theta_n) > c_h, \end{cases} $$
with $c_h := \max\{c,\ \max_{x,y\in[-c,c]^d,\,\theta\in U(\theta)} |h(x,y,\theta)|\}$. The associated V-statistic is denoted by $V_{n,c}^*$. Now, imitating the proof of Lemma 2.1 results in
$$ \limsup_{n\to\infty} n^2 E\big[(V_n^* - V_{n,c}^*)^2 \mid \mathcal X_n = \omega_n\big] \le \varepsilon_c $$
with $\varepsilon_c \to 0$ as $c\to\infty$.
Next we approximate the bounded kernel by the degenerate version of
$$ \tilde h_c^{*(K,L)} := \sum_{k_1,k_2\in\{-L,\dots,L\}^d} \hat\alpha_{j_0;k_1,k_2}^{(c)} \Phi_{j_0,k_1}\Phi_{j_0,k_2} + \sum_{j=j_0}^{J(K)-1} \sum_{k_1,k_2\in\{-L,\dots,L\}^d} \sum_{e\in\bar E} \hat\beta_{j;k_1,k_2}^{(c,e)} \Psi_{j;k_1,k_2}^{(e)}, $$
where $\hat\alpha_{j_0;k_1,k_2}^{(c)} = \iint_{\mathbb R^d\times\mathbb R^d} h^{*(c)}(x,y,\hat\theta_n)\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y)\,dx\,dy$ and $\hat\beta_{j;k_1,k_2}^{(c,e)} = \iint_{\mathbb R^d\times\mathbb R^d} h^{*(c)}(x,y,\hat\theta_n)\Psi_{j;k_1,k_2}^{(e)}(x,y)\,dx\,dy$. Denoting the associated V-statistic by $\hat V_{n,c}^{*(K,L)}$ leads to
$$ \lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2 E\big[(V_{n,c}^* - \hat V_{n,c}^{*(K,L)})^2 \mid \mathcal X_n = \omega_n\big] = 0. $$
This conjecture can be proven by following the lines of the proof of Lemma 2.3. Here, $J(K)$ is chosen as follows: Since the Portmanteau theorem implies $\limsup_{n\to\infty} P(X_1^* \notin [-b,b]^d \mid X_1,\dots,X_n = \omega_n) \le P(X_1 \notin (-b,b)^d)$, we first select some $b = b(K) < \infty$ such that $P(X_1 \notin (-b,b)^d) \le 1/K$. Afterwards we choose $J(K)$ such that $\max_{x,y\in[-b,b]^d} |h^{*(c)}(x,y,\hat\theta_n) - \tilde h_c^{*(K)}(x,y,\hat\theta_n)| \le 1/K$ and $S_\phi/2^{J(K)} < a$, where $S_\phi$ denotes the length of the support of the scale function $\phi$. Because of the continuity assumptions on $f$, the index $J(K)$ can be determined independently of $n$ on $(\Omega_n^\theta)_n$, cf. the proof of Lemma 5.2. Obviously,
$$ \hat\alpha_{j_0;k_1,k_2}^{(c)} \xrightarrow[n\to\infty]{} \alpha_{j_0;k_1,k_2}^{(c)} := \iint_{\mathbb R^d\times\mathbb R^d} h^{*(c)}(x,y,\theta)\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y)\,dx\,dy, $$
$$ \hat\beta_{j;k_1,k_2}^{(c,e)} \xrightarrow[n\to\infty]{} \beta_{j;k_1,k_2}^{(c,e)} := \iint_{\mathbb R^d\times\mathbb R^d} h^{*(c)}(x,y,\theta)\Psi_{j;k_1,k_2}^{(e)}(x,y)\,dx\,dy $$
on $(\Omega_n^\theta)_n$. Hence,
$$ n^2 E\big[(\hat V_{n,c}^{*(K,L)} - V_{n,c}^{*(K,L)})^2 \mid \mathcal X_n = \omega_n\big] \xrightarrow[n\to\infty]{} 0, $$
where the kernel of $V_{n,c}^{*(K,L)}$ is obtained by substituting $\hat\alpha_{j_0;k_1,k_2}^{(c)}$ and $\hat\beta_{j;k_1,k_2}^{(c,e)}$ in the kernel of $\hat V_{n,c}^{*(K,L)}$ by $\alpha_{j_0;k_1,k_2}^{(c)}$ and $\beta_{j;k_1,k_2}^{(c,e)}$. Thus, the next step is the application of the CLT of Neumann and Paparoditis [24] to $n V_{n,c}^{*(K,L)}$. For this purpose we introduce $Q_i^* := \sum_{k=1}^{M(K,L)} t_k q_k(X_i^*)$, $t_1,\dots,t_{M(K,L)} \in \mathbb R$, where $q_k$ is the centered version (w.r.t. $P^{X_1^* \mid \mathcal X_n = \omega_n}$) of $\tilde q_k$; $(\tilde q_k)_k$ is defined as in the proof of Theorem 2.1. Obviously, given $X_1, \dots$
, $X_n$, the sequence $(Q_i^*)_i$ is centered and has uniformly bounded second moments. Due to (A1*)(i) the Lindeberg condition is satisfied. In order to show that for arbitrary $\varepsilon > 0$ the inequality $\big|\frac1n\operatorname{var}(Q_1^*+\dots+Q_n^* \mid \mathcal X_n = \omega_n) - \sigma^2\big| < \varepsilon$, $\forall\, n \ge n_0(\varepsilon)$, holds true with $\sigma^2 = \operatorname{var}(Q_1) + 2\sum_{r=2}^\infty \operatorname{cov}(Q_1,Q_r)$ and $(Q_k)_k$ as in the proof of Theorem 2.1, the abbreviations $\operatorname{var}^*(\cdot) = \operatorname{var}(\cdot \mid \mathcal X_n = \omega_n)$ and $\operatorname{cov}^*(\cdot) = \operatorname{cov}(\cdot \mid \mathcal X_n = \omega_n)$ are used. Hence,
$$ \Big|\frac1n\operatorname{var}^*[Q_1^*+\dots+Q_n^*] - \sigma^2\Big| \le 2\sum_{r=2}^{\infty}\min\Big\{\frac{r-1}{n},1\Big\}|\operatorname{cov}^*(Q_1^*,Q_r^*)| + \Big|\operatorname{var}^*(Q_1^*) + 2\sum_{r=2}^{\infty}\operatorname{cov}^*(Q_1^*,Q_r^*) - \sigma^2\Big| $$
$$ \le 2\sum_{r=2}^{\infty}\min\Big\{\frac{r-1}{n},1\Big\}|\operatorname{cov}^*(Q_1^*,Q_r^*)| + 2\Big|\sum_{r=1}^{R-1}\big[\operatorname{cov}^*(Q_1^*,Q_r^*) - \operatorname{cov}(Q_1,Q_r)\big]\Big| + 2\Big|\sum_{r\ge R}\operatorname{cov}^*(Q_1^*,Q_r^*)\Big| + 2\Big|\sum_{r\ge R}\operatorname{cov}(Q_1,Q_r)\Big|. $$
By (A1) and (A1*), $R$ can be chosen such that $\big|\sum_{r\ge R}\operatorname{cov}(Q_1,Q_r)\big| + \big|\sum_{r\ge R}\operatorname{cov}^*(Q_1^*,Q_r^*)\big| \le \frac{\varepsilon}{4}$. Moreover, (A1*) implies that the first summand can be bounded from above by $\frac{\varepsilon}{4}$ as well, if $n \ge n_0(\varepsilon)$ for some $n_0(\varepsilon)\in\mathbb N$. According to the convergence of the two-dimensional distributions and the uniform boundedness of $(Q_k^*)_{k\in\mathbb Z}$ it is possible to pick $n_0(\varepsilon)$ such that additionally the two remaining summands are bounded by $\frac{\varepsilon}{8}$. For the validity of the CLT of Neumann and Paparoditis [24] in probability, it remains to verify their inequality (6.4). By Lipschitz continuity of $Q_{t_1}^* Q_{t_2}^*$ this holds with $\theta_r = \operatorname{Lip}(Q_{t_1}^* Q_{t_2}^*)\bar\tau_r = \operatorname{Lip}(Q_{t_1} Q_{t_2})\bar\tau_r$. The application of the continuous mapping theorem results in $n V_{n,c}^{*(K,L)} \xrightarrow{d} Z_c^{(K,L)}$, in probability, which in turn implies $n V_n^* \xrightarrow{d} Z$. In order to obtain the analogous convergence result for $n U_n^*$, additionally,
$$ P\Big(\Big|\frac{1}{n-1}\sum_{i=1}^n h(X_i^*,X_i^*,\hat\theta_n) - Eh(X_1,X_1,\theta)\Big| > \varepsilon \,\Big|\, \mathcal X_n = \omega_n\Big) \xrightarrow[n\to\infty]{} 0 $$
has to be proven for arbitrary $\varepsilon > 0$. Due to the continuity of $h$ and according to (A1*)(i) and (A2*)(i) it suffices to verify
$$ P\Big(\Big|\frac{1}{n-1}\sum_{k=1}^n \big[h(X_k^*,X_k^*,\hat\theta_n) - E(h(X_1^*,X_1^*,\hat\theta_n)\mid \mathcal X_n = \omega_n)\big]\Big| > \frac{\varepsilon}{2} \,\Big|\, \mathcal X_n = \omega_n\Big) \xrightarrow[n\to\infty]{} 0. $$
The l.h.s.
tends to zero if
$$ E\Big[\Big(\frac{1}{n-1}\sum_{k=1}^n \big[h(X_k^*,X_k^*,\hat\theta_n) - E(h(X_1^*,X_1^*,\hat\theta_n)\mid\mathcal X_n=\omega_n)\big]\Big)^2 \,\Big|\, \mathcal X_n=\omega_n\Big] $$
vanishes asymptotically. This can be proven using the same arguments as in the proof of Theorem 2.2 for verifying the convergence of U-statistics. Bootstrap consistency follows from Lemma 3.2.

5.2. Proofs of auxiliary results. In order to prove Lemma 2.1, Lemma 2.2, and Lemma 2.3 an approximation of terms of the structure
$$ Z_n := \frac{1}{n^2}\sum_{i,j,k,l=1}^n E H(X_i,X_j)H(X_k,X_l) $$
will be required. Here, $H$ denotes a symmetric, degenerate kernel function. Assuming that $(X_n)_{n\in\mathbb N}$ satisfies (A1) we obtain
$$ Z_n \le \frac{8}{n^2}\sum_{i\le j;\,k\le l;\,i\le k} |E H(X_i,X_j)H(X_k,X_l)| \le 8\sup_{1\le k\le n} E|H(X_1,X_k)|^2 + \frac{8}{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^4 Z_{n,r}^{(t)} $$
with
$$ Z_{n,r}^{(1)} = \sum_{\substack{i\le j,\,k\le l\\ r:=\min\{j,k\}-i \ge l-\max\{j,k\}}} \big|E H(X_i,X_j)H(X_k,X_l) - E H(X_i,\tilde X_j)H(\tilde X_k,\tilde X_l)\big|, $$
$$ Z_{n,r}^{(2)} = \sum_{\substack{i\le j,\,k\le l\\ r:=l-\max\{j,k\} > \min\{j,k\}-i}} \big|E H(X_i,X_j)H(X_k,X_l) - E H(X_i,X_j)H(X_k,\tilde X_l)\big|, $$
$$ Z_{n,r}^{(3)} = \sum_{\substack{i\le k\le l<j\\ r:=i-k \ge j-l}} \big|E H(X_i,X_j)H(X_k,X_l) - E H(X_i,\tilde X_j)H(\tilde X_k,\tilde X_l)\big|, $$
$$ Z_{n,r}^{(4)} = \sum_{\substack{i\le k\le l<j\\ r:=j-l > i-k}} \big|E H(X_i,X_j)H(X_k,X_l) - E H(X_i,\tilde X_j)H(X_k,X_l)\big|. $$
Here, in each summand of $Z_{n,r}^{(1)}$ and $Z_{n,r}^{(3)}$ the vector $(\tilde X_j', \tilde X_k', \tilde X_l')'$ is chosen such that it is independent of $X_i$, $(\tilde X_j', \tilde X_k', \tilde X_l')' \stackrel{d}{=} (X_j', X_k', X_l')'$ and (2.3) holds. Within $Z_{n,r}^{(2)}$ (respectively $Z_{n,r}^{(4)}$) the random variable $\tilde X_l^{(r)}$ (respectively $\tilde X_j^{(r)}$) is chosen to be independent of $(X_i', X_j', X_k')'$ (respectively $(X_i', X_k', X_l')'$) such that $\tilde X_l^{(r)} \stackrel{d}{=} X_l$ (respectively $\tilde X_j^{(r)} \stackrel{d}{=} X_j$) and (2.3) holds. (This may possibly require an enlargement of the underlying probability space.) Thus, by degeneracy the subtrahends of these expressions vanish. Moreover, note that the number of summands of $Z_{n,r}^{(t)}$, $t = 1,\dots,4$, is bounded by $2(r+1)n^2$.

Proof of Lemma 2.1.
For $c > 0$ we define $c_h := \max\{c, \max_{x,y\in[-c,c]^d} |h(x,y)|\}$,
$$ \tilde h^{(c)}(x,y) := \begin{cases} h(x,y) & \text{for } (x',y')' \in [-c,c]^{2d} \text{ or } (x',y')' \notin [-c,c]^{2d} \text{ with } |h(x,y)| \le c_h,\\ -c_h & \text{for } (x',y')' \notin [-c,c]^{2d} \text{ with } h(x,y) < -c_h,\\ c_h & \text{for } (x',y')' \notin [-c,c]^{2d} \text{ with } h(x,y) > c_h, \end{cases} $$
and its degenerate version
$$ h_c(x,y) := \tilde h^{(c)}(x,y) - \int_{\mathbb R^d} \tilde h^{(c)}(x,y)\,dF(x) - \int_{\mathbb R^d} \tilde h^{(c)}(x,y)\,dF(y) + \iint_{\mathbb R^d\times\mathbb R^d} \tilde h^{(c)}(x,y)\,dF(x)\,dF(y). $$
The approximation error $n^2 E(V_n - V_{n,c})^2$ can be reformulated in terms of $Z_n$ with kernel $H = H^{(c)} := h - h_c$. Hence, it remains to verify that $\sup_{k\in\mathbb N} E|H^{(c)}(X_1,X_k)|^2$ and $\limsup_{n\to\infty} \frac{1}{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^4 Z_{n,r}^{(t)}$ tend to zero as $c\to\infty$. First, we investigate $\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1} Z_{n,r}^{(1)}$; the remaining summands can be treated similarly. The summands of $Z_{n,r}^{(1)}$ can be bounded as follows:
$$ \big|E H^{(c)}(X_i,X_j)H^{(c)}(X_k,X_l) - E H^{(c)}(X_i,\tilde X_j)H^{(c)}(\tilde X_k,\tilde X_l)\big| $$
$$ \le E\big[|H^{(c)}(X_k,X_l)|\,|H^{(c)}(X_i,X_j) - H^{(c)}(X_i,\tilde X_j)|\,\mathbb 1_{(X_k',X_l')'\in[-c,c]^{2d}}\big] + E\big[|H^{(c)}(X_k,X_l)|\,|H^{(c)}(X_i,X_j) - H^{(c)}(X_i,\tilde X_j)|\,\mathbb 1_{(X_k',X_l')'\notin[-c,c]^{2d}}\big] $$
$$ \quad + E\big[|H^{(c)}(X_i,\tilde X_j)|\,|H^{(c)}(X_k,X_l) - H^{(c)}(\tilde X_k,\tilde X_l)|\,\mathbb 1_{(X_i',\tilde X_j')'\in[-c,c]^{2d}}\big] + E\big[|H^{(c)}(X_i,\tilde X_j)|\,|H^{(c)}(X_k,X_l) - H^{(c)}(\tilde X_k,\tilde X_l)|\,\mathbb 1_{(X_i',\tilde X_j')'\notin[-c,c]^{2d}}\big] $$
$$ = E_1 + E_2 + E_3 + E_4. \tag{5.1} $$
The Lipschitz constant of $H^{(c)}$ is obviously independent of $c$. Therefore, the iterative application of Hölder's inequality to $E_2$ yields
$$ E_2 \le \big(E|H^{(c)}(X_i,X_j) - H^{(c)}(X_i,\tilde X_j)|\big)^{\delta} \big(E|H^{(c)}(X_k,X_l)|^{1/(1-\delta)} |H^{(c)}(X_i,X_j) - H^{(c)}(X_i,\tilde X_j)|\,\mathbb 1_{(X_k',X_l')'\notin[-c,c]^{2d}}\big)^{1-\delta} $$
$$ \le C\tau_r^{\delta}\Big\{\big(E|H^{(c)}(X_k,X_l)|^{(2-\delta)/(1-\delta)}\mathbb 1_{(X_k',X_l')'\notin[-c,c]^{2d}}\big)^{1/(2-\delta)} \Big[\big(E|H^{(c)}(X_i,X_j)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)} + \big(E|H^{(c)}(X_i,\tilde X_j)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)}\Big]\Big\}^{1-\delta} $$
$$ \le C\tau_r^{\delta}\big(E|H^{(c)}(X_k,X_l)|^{(2-\delta)/(1-\delta)}\mathbb 1_{(X_k',X_l')'\notin[-c,c]^{2d}}\big)^{(1-\delta)/(2-\delta)}.
(5.2)

As $\sup_{k\in\mathbb N} E|h(X_1,X_k)|^{\nu} < \infty$ for some $\nu > (2-\delta)/(1-\delta)$, we obtain $E_2 \le \tau_r^{\delta}\varepsilon_1(c)$ with $\varepsilon_1(c) \to 0$ as $c\to\infty$, after employing Hölder's inequality once again. Analogous calculations yield $E_4 \le \tau_r^{\delta}\varepsilon_2(c)$ with $\varepsilon_2(c)\to0$ as $c\to\infty$. Likewise, the approximation methods for $E_1$ and $E_3$ are equal. Therefore, only $E_1$ is investigated:
$$ E_1 \le E\Big[\Big|\int_{\mathbb R^d}\tilde h^{(c)}(X_k,y)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbb 1_{X_k\in[-c,c]^d}\Big] + E\Big[\Big|\int_{\mathbb R^d}\tilde h^{(c)}(y,X_l)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\,\mathbb 1_{X_l\in[-c,c]^d}\Big] $$
$$ \quad + E\Big[\Big|\iint_{\mathbb R^d\times\mathbb R^d}\tilde h^{(c)}(x,y)\,dF(x)\,dF(y)\Big|\,\big|H^{(c)}(X_i,X_j)-H^{(c)}(X_i,\tilde X_j)\big|\Big] = E_{1,1}+E_{1,2}+E_{1,3}. $$
Analogously to (5.2) we obtain
$$ E_{1,1} \le C\tau_r^{\delta}\Big\{\Big(E\Big|\int_{\mathbb R^d}\big[h(X_k,y)-\tilde h^{(c)}(X_k,y)\big]\,dF(y)\Big|^{(2-\delta)/(1-\delta)}\mathbb 1_{X_k\in[-c,c]^d}\Big)^{1/(2-\delta)} \times 2\Big(\sup_{k\in\mathbb N}\big(E|H^{(c)}(X_0,X_k)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)} + \big(E|H^{(c)}(X_i,\tilde X_j)|^{(2-\delta)/(1-\delta)}\big)^{(1-\delta)/(2-\delta)}\Big)\Big\}^{1-\delta} $$
$$ \le C\tau_r^{\delta}\Big(\iint_{\mathbb R^d\times\mathbb R^d}\big|h(x,y)-\tilde h^{(c)}(x,y)\big|^{(2-\delta)/(1-\delta)}\,dF(y)\,\mathbb 1_{x\in[-c,c]^d}\,dF(x)\Big)^{(1-\delta)/(2-\delta)} \le \tau_r^{\delta}\varepsilon_3(c) $$
with $\varepsilon_3(c)\to0$ as $c\to\infty$. The investigation of $E_{1,2}$ coincides with the previous one. The expression $E_{1,3}$ can be approximated as follows:
$$ E_{1,3} \le C\tau_r\iint_{\mathbb R^d\times\mathbb R^d}\big|h(x,y)-\tilde h^{(c)}(x,y)\big|\,dF(x)\,dF(y) \le C\tau_r\iint_{\mathbb R^d\times\mathbb R^d}|h(x,y)|\,\mathbb 1_{(x',y')'\notin[-c,c]^{2d}}\,dF(x)\,dF(y) \le \tau_r\varepsilon_4(c) \xrightarrow[c\to\infty]{} 0. $$
To sum up, we have $E_1+E_2+E_3+E_4 \le \varepsilon_5(c)\tau_r^{\delta}$, where $\varepsilon_5(c)\to0$ as $c\to\infty$, uniformly in $n$. This leads to
$$ \lim_{c\to\infty}\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1}Z_{n,r}^{(1)} \le \lim_{c\to\infty}\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1}2(r+1)n^2\tau_r^{\delta}\varepsilon_5(c) = 0. $$
It remains to examine
$$ \sup_{k\in\mathbb N}E[H^{(c)}(X_1,X_k)]^2 \le C\Big(\sup_{k\in\mathbb N}E\big[h(X_1,X_k)-\tilde h^{(c)}(X_1,X_k)\big]^2 + E\big[h(X_1,\tilde X_1)-\tilde h^{(c)}(X_1,\tilde X_1)\big]^2\Big). $$
Here, $\tilde X_1$ denotes an independent copy of $X_1$. Similar arguments as before yield $\lim_{c\to\infty}\sup_{k\in\mathbb N}E[H^{(c)}(X_1,X_k)]^2 = 0$.

The characteristics stated in the following two lemmas will be essential for a wavelet approximation of the kernel function $h$.

Lemma 5.1. Given a Lipschitz continuous function $g:\mathbb R^d\to\mathbb R$, define a function $g_j$ by $g_j(x) := \sum_{k\in\mathbb Z^d}\alpha_{j,k}\Phi_{j,k}(x)$, $j\in\mathbb Z$, where $\alpha_{j,k} = \int_{\mathbb R^d}g(x)\Phi_{j,k}(x)\,dx$.
Then $g_j$ is Lipschitz continuous with a constant that is independent of $j$.

Proof of Lemma 5.1. In order to establish Lipschitz continuity, $g_j$ is decomposed into two parts:
$$ g_j(x) = \sum_{k\in\mathbb Z^d}\Big(\int_{\mathbb R^d}\Phi_{j,k}(u)g(x)\,du\Big)\Phi_{j,k}(x) + \sum_{k\in\mathbb Z^d}\Big(\int_{\mathbb R^d}\Phi_{j,k}(u)[g(u)-g(x)]\,du\Big)\Phi_{j,k}(x) = H_1(x) + H_2(x). $$
According to the above choice of the scale function (with characteristics (1)–(3) of Subsection 2.2) the prerequisites of Theorem 8.2 of Härdle et al. [20] are fulfilled for $N=1$. This implies $\sum_{l\in\mathbb Z}\int_{-\infty}^{\infty}\phi(y-l)\phi(z-l)\,dz = 1$ for all $y\in\mathbb R$. Based on this result we obtain
$$ H_1(x) = g(x)\,2^{jd}\prod_{i=1}^d \int_{-\infty}^{\infty}\sum_{l\in\mathbb Z}\phi(2^j u_i - l)\phi(2^j x_i - l)\,du_i = g(x) $$
by applying an appropriate variable substitution. This in turn immediately implies the desired continuity property. Note that for every fixed $x$ the sum has finitely many non-vanishing summands because of the finite support of $\phi$, and the number of summands is independent of $j$. Therefore, the order of summation and integration is interchangeable. In order to investigate $H_2$, we define a sequence of functions $(\kappa_k)_{k\in\mathbb Z^d}$ by
$$ \kappa_k(x) = \int_{\mathbb R^d}\Phi_{j,k}(u)[g(u)-g(x)]\,du. $$
These functions are Lipschitz continuous with a constant decreasing in $j$:
$$ |\kappa_k(x) - \kappa_k(\bar x)| \le \operatorname{Lip}(g)\,O(2^{-jd/2})\,\|x-\bar x\|_{l_1}. \tag{5.3} $$
Moreover, boundedness and Lipschitz continuity of $\phi$ yield
$$ \|\Phi_{j,k}\|_\infty = O(2^{jd/2}) \quad\text{and}\quad |\Phi_{j,k}(x) - \Phi_{j,k}(\bar x)| = O(2^{j(d/2+1)})\|x-\bar x\|_{l_1}. \tag{5.4} $$
Thus,
$$ |H_2(x) - H_2(\bar x)| \le \sum_{k\in\mathbb Z^d}|\Phi_{j,k}(x)|\,|\kappa_k(x)-\kappa_k(\bar x)| + \sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| \le C\|x-\bar x\|_{l_1} + \sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)|. $$
It has to be distinguished whether or not $\bar x \in \operatorname{supp}\Phi_{j,k}$ in order to approximate the second summand. (Here, $\operatorname{supp}$ denotes the support of a function.) In the first case it is helpful to investigate $|\kappa_k(\bar x)| = \big|\int_{\mathbb R^d}\Phi_{j,k}(u)[g(u)-g(\bar x)]\,du\big|$. The integrand is non-trivial only if $u \in \operatorname{supp}\Phi_{j,k}$. In these situations, $|g(u)-g(\bar x)| = O(2^{-j})$ by Lipschitz continuity.
Consequently, we achieve
$$ |\kappa_k(\bar x)| \le O(2^{-j})\int_{\mathbb R^d}|\Phi_{j,k}(u)|\,du = O(2^{-j(d/2+1)}), $$
which leads to
$$ \sum_{k\in\mathbb Z^d}|\kappa_k(\bar x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| = O(\|x-\bar x\|_{l_1}) $$
since the number of non-vanishing summands is finite, independently of the values of $x$ and $\bar x$. Therefore, Lipschitz continuity of $H_2$ is obtained as long as $\bar x \in \operatorname{supp}\Phi_{j,k}$. In the opposite case, we only have to consider the situation of $x \in \operatorname{supp}\Phi_{j,k}$, since the setting $x,\bar x \notin \operatorname{supp}\Phi_{j,k}$ is trivial. With the aid of (5.3) and (5.4), the first term of the r.h.s. of
$$ |\kappa_k(\bar x)[\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)]| \le |\kappa_k(\bar x)-\kappa_k(x)|\,|\Phi_{j,k}(x)| + |\kappa_k(x)|\,|\Phi_{j,k}(x)-\Phi_{j,k}(\bar x)| \tag{5.5} $$
can be approximated from above by $C\|x-\bar x\|_{l_1}$. The investigation of the second summand is identical to the analysis of the case $\bar x \in \operatorname{supp}\Phi_{j,k}$. Finally, we obtain $|H_2(x)-H_2(\bar x)| \le C\|x-\bar x\|_{l_1}$, where $C<\infty$ is a constant that is independent of $j$. This yields the assertion of the lemma.

Lemma 5.2. Let $g:\mathbb R^d\to\mathbb R$ be a continuous function that is Lipschitz continuous on some interval $(-c,c)^d$, $c > S_\phi$, where $S_\phi$ denotes the length of the support of the scale function $\phi$. For arbitrary $0 < b < c - S_\phi$ and $K\in\mathbb N$ there exists $J = J(K,b)\in\mathbb N$ such that for $g$ and its approximation $g^{(K,b)}$ given by $g^{(K,b)}(x) = \sum_{k\in\mathbb Z^d}\alpha_{J(K,b),k}\Phi_{J(K,b),k}(x)$ it holds
$$ \max_{x\in[-b,b]^d}|g(x)-g^{(K,b)}(x)| \le 1/K. $$

Proof of Lemma 5.2. Given $0<b<\infty$, we define $\bar g^{(b)}(x) := g(x)w_b(x)$, where $w_b$ is a Lipschitz continuous weight function with compact support. Moreover, $w_b$ is assumed to be bounded from above by 1, and $w_b(x) := 1$ for $x\in[-b-S_\phi, b+S_\phi]^d$. Additionally, we set $\alpha^{(b)}_{J(K,b),k} := \int_{\mathbb R^d}\bar g^{(b)}(u)\Phi_{J(K,b),k}(u)\,du$. Hence,
$$ \max_{x\in[-b,b]^d}\big|g(x)-g^{(K,b)}(x)\big| \le \max_{x\in[-b,b]^d}\Big|\bar g^{(b)}(x) - \sum_{k\in\mathbb Z^d}\alpha^{(b)}_{J(K,b),k}\Phi_{J(K,b),k}(x)\Big| + \max_{x\in[-b,b]^d}\Big|\sum_{k\in\mathbb Z^d}\alpha^{(b)}_{J(K,b),k}\Phi_{J(K,b),k}(x) - g^{(K,b)}(x)\Big| = \max_{x\in[-b,b]^d} A^{(J)}(x) + \max_{x\in[-b,b]^d} B^{(J)}(x). $$
Since $\bar g^{(b)} \in L_2(\mathbb R^d)$,
$$ A^{(J)} = \Big|\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}\Big| \quad\text{with}\quad \beta^{(b,e)}_{j,k} = \int_{\mathbb R^d}\bar g^{(b)}(u)\Psi^{(e)}_{j,k}(u)\,du $$
holds true in the $L_2$-sense. Lemma 5.1 implies equicontinuity of $(A^{(J)})_{J\in\mathbb N}$ on $[-b,b]^d$. Therefore, the above equality is valid pointwise on $[-b,b]^d$ if additionally $\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$ is continuous on $[-b,b]^d$. This can be proven by verifying uniform convergence of $\sum_{j=J(K,b)}^{N}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$ to $\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}$ as $N\to\infty$. For all $x\in[-b,b]^d$, the number of $k\in\mathbb Z^d$ with $\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x) \ne 0$ is bounded uniformly in $j$. Furthermore, $\|\Psi^{(e)}_{j,k}\|_\infty = O(2^{jd/2})$ for all $j\in\mathbb Z$, $k\in\mathbb Z^d$, $e\in E$. As $\bar g^{(b)}$ is Lipschitz continuous and $\int_{-\infty}^{\infty}\psi(x)\,dx = 0$, we obtain
$$ |\beta^{(b,e)}_{j,k}| \le \int_{\mathbb R^d}\big|\bar g^{(b)}(u)-\bar g^{(b)}(x)\big|\,\big|\Psi^{(e)}_{j,k}(u)\big|\,du = O(2^{-j(d/2+1)}), \quad \forall\, j\ge j_0(b), $$
for some $j_0(b)\in\mathbb Z$, if $\Psi^{(e)}_{j,k}(x)\ne 0$ for some $x\in[-b,b]^d$. Finally, we end up with
$$ \max_{x\in[-b,b]^d}\Big|\sum_{j=N+1}^{\infty}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x)\Big| \le C\sum_{j=N+1}^{\infty}2^{-j} \xrightarrow[N\to\infty]{} 0 \tag{5.6} $$
and consequently,
$$ \max_{x\in[-b,b]^d} A^{(J)}(x) = \max_{x\in[-b,b]^d}\Big|\sum_{j\ge J(K,b)}\sum_{k\in\mathbb Z^d}\sum_{e\in E}\beta^{(b,e)}_{j,k}\Psi^{(e)}_{j,k}(x)\Big|. $$
Moreover, the inequality $\max_{x\in[-b,b]^d} A^{(J)}(x) \le \frac1K$ for all $J(K,b)\ge J_0(K,b)$, for some $J_0(K,b)\in\mathbb N$, results from (5.6). The introduction of the finite set of indices $\bar Z(J) := \{k\in\mathbb Z^d \mid \Phi_{J(K,b),k}(x)\ne 0 \text{ for some } x\in[-b,b]^d\}$ leads to
$$ \max_{x\in[-b,b]^d} B^{(J)}(x) = \max_{x\in[-b,b]^d}\Big|\sum_{k\in\bar Z(J)}\big(\alpha_{J(K,b),k} - \alpha^{(b)}_{J(K,b),k}\big)\Phi_{J(K,b),k}(x)\Big|. $$
This term is equal to 0 since the definition of $\bar g^{(b)}$ implies $\alpha_{J(K,b),k} - \alpha^{(b)}_{J(K,b),k} = 0$ for all $k\in\bar Z(J)$.

Proof of Lemma 2.2. The assertion of the lemma is verified in two steps. First, the bounded kernel $h_c$, constructed in the proof of Lemma 2.1, is approximated by $\tilde h_c^{(K)}$, which is defined by $\tilde h_c^{(K)}(x,y) = \sum_{k_1,k_2\in\mathbb Z^d}\alpha^{(c)}_{J(K);k_1,k_2}\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)$ with $\alpha^{(c)}_{J(K);k_1,k_2} = \iint_{\mathbb R^d\times\mathbb R^d} h_c(x,y)\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)\,dx\,dy$. Here, the indices $(J(K))_{K\in\mathbb N}$ with $J(K)\to\infty$ as $K\to\infty$ are chosen such that the assertion of Lemma 5.2 holds true for $b\in\mathbb R$ with $P(X_1\notin[-b,b]^d)\le 1/K$.
Since the function $\tilde h_c^{(K)}$ is not degenerate in general, we introduce its degenerate counterpart
$$ h_c^{(K)}(x,y) = \tilde h_c^{(K)}(x,y) - \int_{\mathbb R^d}\tilde h_c^{(K)}(x,y)\,dF(x) - \int_{\mathbb R^d}\tilde h_c^{(K)}(x,y)\,dF(y) + \iint_{\mathbb R^d\times\mathbb R^d}\tilde h_c^{(K)}(x,y)\,dF(x)\,dF(y) $$
and denote the corresponding V-statistic by $V_{n,c}^{(K)}$. Now, the structure of the proof is as follows. First, we prove
$$ \limsup_{n\to\infty} n^2 E\big(V_{n,c} - V_{n,c}^{(K)}\big)^2 \xrightarrow[K\to\infty]{} 0. \tag{5.7} $$
In a second step, it remains to show that for every fixed $K$
$$ \limsup_{n\to\infty} n^2 E\big(V_{n,c}^{(K)} - V_{n,c}^{(K,L)}\big)^2 \xrightarrow[L\to\infty]{} 0. \tag{5.8} $$
In order to verify (5.7), we rewrite $n^2 E(V_{n,c}-V_{n,c}^{(K)})^2$ in terms of $Z_n$ with kernel function $H := H^{(K)} = h_c - h_c^{(K)}$. Hence it remains to verify that $\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1}\sum_{t=1}^4 Z^{(t)}_{n,r}$ and $\sup_{k\in\mathbb N} E|H^{(K)}(X_1,X_k)|^2$ tend to zero as $K\to\infty$. Exemplarily, we investigate $\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$. The summands of $Z^{(1)}_{n,r}$ can be bounded as follows:
$$ \big|E H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l) - E H^{(K)}(X_i,\tilde X_j)H^{(K)}(\tilde X_k,\tilde X_l)\big| \le E\big[|H^{(K)}(X_k,X_l)|\,|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)|\big] + E\big[|H^{(K)}(X_i,\tilde X_j)|\,|H^{(K)}(X_k,X_l)-H^{(K)}(\tilde X_k,\tilde X_l)|\big] \le 2\|H^{(K)}\|_\infty \operatorname{Lip}(H^{(K)})\,\tau_r. $$
According to Lemma 5.1, $\operatorname{Lip}(H^{(K)})$ does not depend on $K$. The degeneracy of $h_c$ implies $\|H^{(K)}\|_\infty \le 4\|h_c - \tilde h_c^{(K)}\|_\infty$. As already indicated in the proof of Lemma 5.1, $\sum_{l\in\mathbb Z}\int_{-\infty}^{\infty}\phi(y_1-l)\phi(y_2-l)\,dy_2 = 1$ holds true. This leads to
$$ \|h_c - \tilde h_c^{(K)}\|_\infty = \Big\|\sum_{k_1,k_2\in\mathbb Z^d}\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y)\iint_{\mathbb R^d\times\mathbb R^d}[h_c(x,y)-h_c(u,v)]\Phi_{J(K),k_1}(u)\Phi_{J(K),k_2}(v)\,du\,dv\Big\|_\infty. $$
If $(x',y')'$ does not lie in the support of $\Phi_{J(K),k_1}\Phi_{J(K),k_2}$, the corresponding summand vanishes. Otherwise, Lipschitz continuity of $h$ is employed for further approximations. Similarly to the proof of Lemma 5.1 we obtain
$$ \int_{\mathbb R^d}\int_{\mathbb R^d}[h_c(x,y)-h_c(u,v)]\Phi_{J(K),k_1}(u)\Phi_{J(K),k_2}(v)\,du\,dv = O\big(2^{-J(K)(d+1)}\big). $$
This implies $\|h_c - \tilde h_c^{(K)}\|_\infty = O(2^{-J(K)})$ since for every fixed tuple $(x',y')'$ only a finite number of summands is non-vanishing. This number depends neither on $J(K)$ nor on the value of $(x,y)$.
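The orders $O(2^{-J(K)(d+1)})$ for the coefficient integrals and $O(2^{-J(K)})$ for the sup-norm error can be made explicit. The following sketch uses only the Lipschitz continuity of $h_c$ and the standard tensor-product normalization $\Phi_{J,k}(x) = 2^{Jd/2}\prod_{i=1}^d\phi(2^J x_i - k_i)$ (an assumption consistent with the norms in (5.4), writing $J = J(K)$):

```latex
% On supp(\Phi_{J,k_1}\Phi_{J,k_2}) the arguments vary by at most O(2^{-J}), so by
% Lipschitz continuity |h_c(x,y)-h_c(u,v)| = O(2^{-J}) there.  Moreover
% \|\Phi_{J,k}\|_{L_1} = 2^{-Jd/2}\,\|\phi\|_{L_1}^{d} = O(2^{-Jd/2}).  Hence
\Big|\iint [h_c(x,y)-h_c(u,v)]\,\Phi_{J,k_1}(u)\,\Phi_{J,k_2}(v)\,du\,dv\Big|
  \le O(2^{-J})\,\|\Phi_{J,k_1}\|_{L_1}\,\|\Phi_{J,k_2}\|_{L_1}
  = O\big(2^{-J}\cdot 2^{-Jd/2}\cdot 2^{-Jd/2}\big)
  = O\big(2^{-J(d+1)}\big).
% Multiplying by \|\Phi_{J,k_1}\|_\infty\|\Phi_{J,k_2}\|_\infty = O(2^{Jd/2})^2 = O(2^{Jd})
% and summing over the finitely many non-vanishing pairs (k_1,k_2) yields
\|h_c - \tilde h_c^{(K)}\|_\infty = O\big(2^{Jd}\cdot 2^{-J(d+1)}\big) = O\big(2^{-J}\big).
```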
It follows that $|E H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)| \le C\,2^{-J(K)}\tau_r$. Due to $\|H^{(K)}\|_\infty \le 4\|h_c-\tilde h_c^{(K)}\|_\infty = O(2^{-J(K)})$, $\sup_{k\in\mathbb N} E|H^{(K)}(X_1,X_k)|^2 \to 0$ as $K\to\infty$. Set $\tau_0 = 1$. Summing up the achievements results in
$$ \limsup_{n\to\infty} n^2 E\big(V_{n,c}-V_{n,c}^{(K)}\big)^2 \le C\limsup_{n\to\infty}\Big[\frac1{n^2}\sum_{r=0}^{n-1}2^{-J(K)}(r+1)n^2\tau_r + 2^{-2J(K)}\Big] \xrightarrow[K\to\infty]{} 0, $$
which finishes the proof of (5.7).

The main goal of the previous step was the multiplicative separation of the random variables which are cumulated in $h_c$. The aim of the second step is the approximation of $h_c^{(K)}$, whose representation is given by an infinite sum, by a function consisting of only finitely many summands. Similar to the foregoing part of the proof, the approximation error $n^2 E\big(V_{n,c}^{(K)} - V_{n,c}^{(K,L)}\big)^2$ is reformulated in terms of $Z_n$ with kernel $H := H^{(L)} = h_c^{(K)} - h_c^{(K,L)}$. As before, we exemplarily take $n^{-2}\sum_{r=1}^{n-1} Z^{(1)}_{n,r}$ and $\sup_{k\in\mathbb N} E|H^{(L)}(X_1,X_k)|^2$ into further consideration. Concerning the summands of $Z^{(1)}_{n,r}$ we obtain
$$ \big|E H^{(L)}(X_i,X_j)H^{(L)}(X_k,X_l) - E H^{(L)}(X_i,\tilde X_j)H^{(L)}(\tilde X_k,\tilde X_l)\big| $$
$$ \le E\big[|H^{(L)}(X_k,X_l)|\,|H^{(L)}(X_i,X_j)-H^{(L)}(X_i,\tilde X_j)|\,\mathbb 1_{(X_k',X_l')'\in[-B,B]^{2d}}\big] + E\big[|H^{(L)}(X_k,X_l)|\,|H^{(L)}(X_i,X_j)-H^{(L)}(X_i,\tilde X_j)|\,\mathbb 1_{(X_k',X_l')'\notin[-B,B]^{2d}}\big] $$
$$ \quad + E\big[|H^{(L)}(X_i,\tilde X_j)|\,|H^{(L)}(X_k,X_l)-H^{(L)}(\tilde X_k,\tilde X_l)|\,\mathbb 1_{(X_i',\tilde X_j')'\in[-B,B]^{2d}}\big] + E\big[|H^{(L)}(X_i,\tilde X_j)|\,|H^{(L)}(X_k,X_l)-H^{(L)}(\tilde X_k,\tilde X_l)|\,\mathbb 1_{(X_i',\tilde X_j')'\notin[-B,B]^{2d}}\big] $$
$$ = E_1 + E_2 + E_3 + E_4 $$
for arbitrary $B > 0$. Obviously, it suffices to take the first two summands into further consideration; both remaining terms can be treated similarly. Since $\phi$ and $\psi$ have compact support, the number of overlapping functions within $(\Phi_{j_0,k})_{k\in\{-L,\dots,L\}^d}$ and $(\Psi_{j,k})_{k\in\{-L,\dots,L\}^d,\,j_0\le j<J(K)}$ can be bounded by a constant that is independent of $L$. By Lipschitz continuity of $\phi$ and $\psi$ this leads to uniform Lipschitz continuity of $(h_c^{(K,L)})_{L\in\mathbb N}$. Moreover, note that $(H^{(L)})_L$ is uniformly bounded.
Due to the reformulation
$$ \tilde h_c^{(K)}(x,y) = \sum_{k_1,k_2\in\mathbb Z^d}\alpha^{(c)}_{j_0;k_1,k_2}\Phi_{j_0,k_1}(x)\Phi_{j_0,k_2}(y) + \sum_{j=j_0}^{J(K)-1}\sum_{k_1,k_2\in\mathbb Z^d}\sum_{e\in\bar E}\beta^{(c,e)}_{j;k_1,k_2}\Psi^{(e)}_{j;k_1,k_2}(x,y), $$
one can choose $(B = B(K,L))_{L\in\mathbb N}$ such that $\sup_{x,y\in[-B,B]^d}|\tilde h_c^{(K)}(x,y) - \tilde h_c^{(K,L)}(x,y)| = 0$ and $B\to\infty$ as $L\to\infty$. This setting allows for the following approximations:
$$ E_1 \le C\tau_r^{\delta}\big[E|H^{(L)}(X_k,X_l)|^{1/(1-\delta)}\mathbb 1_{(X_k',X_l')'\in[-B,B]^{2d}}\big]^{1-\delta} \le C\tau_r^{\delta}\big[P(X_1\notin[-B,B]^d)\big]^{1-\delta}, \qquad E_2 \le C\tau_r^{\delta}\big[P\big((X_k',X_l')'\notin[-B,B]^{2d}\big)\big]^{1-\delta}. $$
Analogously, it can be shown that $\sup_{k\in\mathbb N_0} E[H^{(L)}(X_0,X_k)]^2 \le C\,P(X_1\notin[-B,B]^d)$. Finally, we obtain
$$ \limsup_{n\to\infty} n^2 E\big(V_{n,c}^{(K)} - V_{n,c}^{(K,L)}\big)^2 \le C\big(P(X_1\notin[-B,B]^d)\big)^{1-\delta}\limsup_{n\to\infty}\frac1{n^2}\sum_{r=1}^{n-1}(r+1)n^2\tau_r^{\delta} \xrightarrow[L\to\infty]{} 0. $$
Hence, (5.7) and (5.8) hold.

Proof of Lemma 2.3. In order to prove this result we follow the lines of the proofs of Lemma 2.1, Lemma 2.2 and Lemma 5.1 and carry out some modifications. In a first step we reduce the problem to statistics with bounded kernels. To this end we use the modified approximation
$$ |H^{(c)}(x,y) - H^{(c)}(\bar x,\bar y)| \le \big[2 f(x,\bar x,y,\bar y) + g(x,\bar x) + g(y,\bar y)\big]\,\big[\|x-\bar x\|_{l_1} + \|y-\bar y\|_{l_1}\big] =: f_1(x,\bar x,y,\bar y)\,\big[\|x-\bar x\|_{l_1} + \|y-\bar y\|_{l_1}\big], $$
where $g$ is given by $g(x,\bar x) := \int_{\mathbb R^d} f(x,\bar x,z,z)\,dF(z)$. Under (A4)(i), Hölder's inequality yields
$$ E\big|H^{(c)}(Y_{k_1},Y_{k_2}) - H^{(c)}(Y_{k_3},Y_{k_4})\big| \le \Big(E\Big[f_1(Y_{k_1},Y_{k_2},Y_{k_3},Y_{k_4})^{1/(1-\delta)}\Big(\sum_{i=1}^4\|Y_{k_i}\|_{l_1}\Big)\Big]\Big)^{1-\delta}\big(E\|Y_{k_1}-Y_{k_3}\|_{l_1} + E\|Y_{k_2}-Y_{k_4}\|_{l_1}\big)^{\delta} $$
for $Y_{k_i}$ ($k_i\in\mathbb N$, $i=1,\dots,4$) as defined in (A4). Plugging this inequality into the calculations of the proof of Lemma 2.1 yields $\limsup_{n\to\infty} n^2 E(V_n - V_{n,c})^2 \to 0$ as $c\to\infty$. The next step will be the wavelet approximation of the bounded kernel $h_c$.
Defining $h_c^{(K)}$ and $V_{n,c}^{(K)}$ as in the proof of Lemma 2.2, analogously to the proof of Lemma 5.1 there exists a $C > 0$ such that
$$ |\tilde h_c^{(K)}(\bar x,\bar y) - \tilde h_c^{(K)}(x,y)| \le f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big] + |H_2(\bar x,\bar y) - H_2(x,y)| $$
$$ \le C f_1(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big] + \sum_{k_1,k_2\in\mathbb Z^d}|\kappa_{k_1,k_2}(\bar x,\bar y)|\,\big|\Phi_{J(K),k_1}(x)\Phi_{J(K),k_2}(y) - \Phi_{J(K),k_1}(\bar x)\Phi_{J(K),k_2}(\bar y)\big|, \tag{5.9} $$
where $\kappa_{k_1,k_2}$ is given by $\kappa_{k_1,k_2}(x,y) := \int_{\mathbb R^d}\int_{\mathbb R^d}\Phi_{J(K);k_1,k_2}(u,v)[h_c(u,v)-h_c(x,y)]\,du\,dv$ and $H_2$ is defined as in the proof of Lemma 5.1. In order to approximate the last summand of (5.9), we distinguish again between the cases whether or not $(\bar x',\bar y')' \in \operatorname{supp}\Phi_{J(K),k_1}\Phi_{J(K),k_2}$. In the first case, an upper bound of order
$$ O\Big(\max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(\bar x + c_1,\bar x,\bar y + c_2,\bar y)\Big)\big(\|\bar x - x\|_{l_1} + \|\bar y - y\|_{l_1}\big) $$
can be achieved. Here, $S_\phi$ denotes the length of the support of $\phi$. In the second case a decomposition, similar to (5.5), can be employed which leads to the upper bound
$$ O\Big(f_1(x,\bar x,y,\bar y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(x+c_1,x,y+c_2,y)\Big)\big(\|\bar x - x\|_{l_1} + \|\bar y - y\|_{l_1}\big). $$
Consequently, we obtain
$$ |\tilde h_c^{(K)}(\bar x,\bar y) - \tilde h_c^{(K)}(x,y)| \le O\Big(f_1(x,\bar x,y,\bar y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(x+c_1,x,y+c_2,y) + \max_{c_1,c_2\in[-S_\phi/2^{J(K)},\,S_\phi/2^{J(K)}]^d} f_1(\bar x+c_1,\bar x,\bar y+c_2,\bar y)\Big)\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big) =: f_2(x,\bar x,y,\bar y)\,\big(\|\bar x-x\|_{l_1}+\|\bar y-y\|_{l_1}\big). $$
This leads to $|H^{(K)}(x,y)-H^{(K)}(\bar x,\bar y)| \le f_3(x,\bar x,y,\bar y)\big(\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big)$ with $f_3(x,\bar x,y,\bar y) = 2f_2(x,\bar x,y,\bar y) + \int_{\mathbb R^d}f_2(x,\bar x,z,z)\,dF(z) + \int_{\mathbb R^d}f_2(z,z,\bar y,y)\,dF(z)$. Note that under (A4), $E\big[f_3(Y_i,Y_j,Y_k,Y_l)^{\eta}(\|Y_i\|_{l_1} + \|Y_j\|_{l_1} + \|Y_k\|_{l_1} + \|Y_l\|_{l_1})\big] < \infty$ if $J(K)$ is sufficiently large. Carrying out the corresponding modifications, the error $n^2 E(V_{n,c}-V_{n,c}^{(K)})^2$ can be approximated in terms of $Z_n$. As before we restrict ourselves to investigating $\sup_{k\in\mathbb N}E[H^{(K)}(X_1,X_k)]^2$ and $\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$.
The summands of $Z^{(1)}_{n,r}$ can be bounded as follows:
$$ \big|E H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l) - E H^{(K)}(X_i,\tilde X_j)H^{(K)}(\tilde X_k,\tilde X_l)\big| \le E\big[|H^{(K)}(X_k,X_l)|\,|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)|\big] + E\big[|H^{(K)}(X_i,\tilde X_j)|\,|H^{(K)}(X_k,X_l)-H^{(K)}(\tilde X_k,\tilde X_l)|\big]. $$
Since further approximations are similar for both summands, we concentrate on the first one. Iterative application of Hölder's inequality leads to
$$ E\big[|H^{(K)}(X_k,X_l)|\,|H^{(K)}(X_i,X_j)-H^{(K)}(X_i,\tilde X_j)|\big] \le O(\tau_r^{\delta})\,\big[E\|X_j-\tilde X_j\|_{l_1}\, f_3(X_i,X_i,X_j,\tilde X_j)^{1/(1-\delta)}\big]^{\delta(1-\delta)}\,\big[E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\big]^{1-\delta}, $$
as boundedness of $h_c$ implies uniform boundedness of $(H^{(K)})_K$. The middle factor is bounded because of (A4)(i). Therefore, it remains to examine $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$. Let $b(K)$ be chosen such that $P(X_1\notin[-b(K),b(K)]^d) \le 1/K$; then
$$ E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)} = E\big[|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}\mathbb 1_{X_k,X_l\in[-b(K),b(K)]^d}\big] + O\big(P(X_1\notin[-b(K),b(K)]^d)\big) $$
$$ \le 4\sup_{x,y\in[-b(K),b(K)]^d}\big|h_c(x,y)-\tilde h_c^{(K)}(x,y)\big|^{1/(1-\delta)} + O\big(P(X_1\notin[-b(K),b(K)]^d)\big). $$
Due to Lemma 5.2 and the continuity of $f$ we obtain $1/K$ as an upper bound for the first term if $J(K)$ is chosen sufficiently big. Consequently, $|E H^{(K)}(X_i,X_j)H^{(K)}(X_k,X_l)| \le C K^{\delta-1}\tau_r^{\delta}$, which implies that $n^{-2}\sum_{r=1}^{n-1}Z^{(1)}_{n,r}$ tends to zero as $K$ increases. Furthermore, one obtains $\sup_{k\in\mathbb N_0}E[H^{(K)}(X_0,X_k)]^2 \to 0$ as $K\to\infty$, similarly to the investigation of $E|H^{(K)}(X_k,X_l)|^{1/(1-\delta)}$ above. Thus, we get $\limsup_{n\to\infty} n^2 E\big(V_{n,c}-V_{n,c}^{(K)}\big)^2 \to 0$ as $K\to\infty$.

Step three of the proof contains the verification of $\limsup_{n\to\infty} n^2 E\big(V_{n,c}^{(K)}-V_{n,c}^{(K,L)}\big)^2 \to 0$ as $L\to\infty$. For this purpose it suffices to plug a modified approximation of $H^{(L)}(x,y) - H^{(L)}(\bar x,\bar y)$ into the second part of the proof of Lemma 2.2. Lipschitz continuity of $h_c^{(K,L)}$ implies $|H^{(L)}(x,y)-H^{(L)}(\bar x,\bar y)| \le f_4(x,\bar x,y,\bar y)\big[\|x-\bar x\|_{l_1}+\|y-\bar y\|_{l_1}\big]$ with $f_4(x,\bar x,y,\bar y) = C + f_3(x,\bar x,y,\bar y)$.
Since, for $J(K)$ sufficiently large, $f_4$ satisfies the moment assumption of (A4)(i) with $a=0$, we obtain
$$ E\big|H^{(L)}(Y_{k_1},Y_{k_2}) - H^{(L)}(Y_{k_3},Y_{k_4})\big| \le C\big[E(\|Y_{k_1}-Y_{k_3}\|_{l_1} + \|Y_{k_2}-Y_{k_4}\|_{l_1})\big]^{\delta}. $$
Hence, $\limsup_{n\to\infty} n^2 E\big(V_{n,c}^{(K)} - V_{n,c}^{(K,L)}\big)^2 \to 0$ as $L\to\infty$ under (A4). Summing up the three steps leads to
$$ \lim_{c\to\infty}\lim_{K\to\infty}\lim_{L\to\infty}\limsup_{n\to\infty} n^2 E\big(V_n - V_{n,c}^{(K,L)}\big)^2 = 0. $$

Proof of Lemma 3.2. A positive variance of $Z$ implies the existence of constants $V > 0$ and $c_0 > 0$ such that for every $c \ge c_0$ we can find $K_0\in\mathbb N$ such that for every $K \ge K_0$ there is an $L_0$ with $\operatorname{var}(Z_c^{(K,L)}) \ge V$ for all $L \ge L_0$. Therefore, uniform equicontinuity of the distribution functions of $(((Z_c^{(K,L)})_L)_K)_c$ yields the desired property of $Z$. By the matrix-based notation of $Z_c^{(K,L)}$ we obtain
$$ Z_c^{(K,L)} = \sum_{k_1,k_2=1}^{M(K,L)}\gamma^{(c,K,L)}_{k_1,k_2} Z^{(K,L)}_{k_1} Z^{(K,L)}_{k_2} = \bar Z^{(K,L)\prime}\,\Gamma_c^{(K,L)}\,\bar Z^{(K,L)}, $$
with a symmetric matrix of coefficients $\Gamma_c^{(K,L)}$ and a normal vector $\bar Z^{(K,L)} = (Z_1^{(K,L)},\dots,Z_{M(K,L)}^{(K,L)})'$. Hence, $Z_c^{(K,L)}$ can be rewritten in the following way:
$$ Z_c^{(K,L)} \stackrel{d}{=} \bar Y'\, U^{(K,L)\prime}\Lambda_c^{(K,L)} U^{(K,L)}\,\bar Y = Y'\Lambda_c^{(K,L)} Y = \sum_{k=1}^{M(K,L)}\lambda_k^{(c,K,L)} Y_k^2. $$
Here $U^{(K,L)}$ is a certain orthogonal matrix, $\Lambda_c^{(K,L)} := \operatorname{diag}(\lambda_1^{(c,K,L)},\dots,\lambda_{M(K,L)}^{(c,K,L)})$ with $|\lambda_1^{(c,K,L)}| \ge \dots \ge |\lambda_{M(K,L)}^{(c,K,L)}|$, and $\bar Y$ as well as $Y$ are multivariate standard normally distributed random vectors. For notational simplicity we suppress the upper index $(c,K,L)$ in the sequel. Due to the above choice of the triple $(c,K,L)$, either $\sum_{k=1}^4 \lambda_k^2$ or $\sum_{k=5}^{M(K,L)}\lambda_k^2$ is bounded from below by $V/4$. In the first case, $|\lambda_1| \ge \sqrt{V/16}$ holds true, which implies for arbitrary $\varepsilon > 0$
$$ P\big(Z_c^{(K,L)}\in[x-\varepsilon,x+\varepsilon]\big) \le \int_0^{2\varepsilon} f_{\lambda_1 Y_1^2}(t)\,dt \le \frac{C(\varepsilon)}{\sqrt{V/16}} \quad \forall\,x\in\mathbb R, $$
where $C(\varepsilon)\to0$ as $\varepsilon\to0$. Here, the first inequality results from the fact that convolution preserves the continuity properties of the smoother function.

In the opposite case, i.e. $\sum_{k=5}^{M(K,L)}\lambda_k^2 \ge V/4$, it is possible to bound the uniform norm of the density function of $Z_c^{(K,L)}$ by means of its variance. To this end, we first consider the characteristic function $\varphi_{Z_c^{(K,L)}}$ of $Z_c^{(K,L)}$ and assume w.l.o.g. that $M(K,L)$ is divisible by 4. Defining a sequence $(\mu_k)_{k=1}^{M(K,L)/4}$ by $\mu_k = \lambda_{4k}$ for $k\in\{1,\dots,M(K,L)/4\}$ allows for the following approximation (note that $|\varphi_{\lambda Y^2}(t)| = (1+(2\lambda t)^2)^{-1/4}$ for $Y\sim N(0,1)$):
$$ \big|\varphi_{Z_c^{(K,L)}}(t)\big| = \prod_{j=1}^{M(K,L)}\big(1+(2\lambda_j t)^2\big)^{-1/4} \le \prod_{j=1}^{M(K,L)/4}\big(1+(2\mu_j t)^2\big)^{-1} \le \frac{1}{1+4\big(\mu_1^2+\dots+\mu_{M(K,L)/4}^2\big)t^2}. $$
By inverse Fourier transform we obtain the following result concerning the density function of $Z_c^{(K,L)}$:
$$ \big\|f_{Z_c^{(K,L)}}\big\|_\infty \le \frac{1}{2\pi}\big\|\varphi_{Z_c^{(K,L)}}\big\|_1 \le \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{dt}{1+4\big(\mu_1^2+\dots+\mu_{M(K,L)/4}^2\big)t^2} = \frac{1}{\sqrt{\mu_1^2+\dots+\mu_{M(K,L)/4}^2}}\,\frac{1}{2\pi}\int_0^{\infty}\frac{du}{1+u^2} $$
$$ \le \frac{1}{2\sqrt{4\big(\mu_1^2+\dots+\mu_{M(K,L)/4-1}^2\big)}} \le \frac{1}{\sqrt{\lambda_5^2+\dots+\lambda_{M(K,L)}^2}} \le \frac{1}{\sqrt V}. $$
Thus, $P\big(Z_c^{(K,L)}\in[x-\varepsilon,x+\varepsilon]\big) \le \frac{2\varepsilon}{\sqrt V}$, which completes the study of the case $\sum_{k=5}^{M(K,L)}\lambda_k^2 > V/4$ and finally yields the assertion.

Proof of Lemma 3.2. This result is an immediate consequence of Theorem 3.1.

Acknowledgment. The author thanks Michael H. Neumann for his constructive advice and fruitful discussions. This research was funded by the German Research Foundation DFG (project: NE 606/2-1).

References

[1] Arcones, M. A. and Giné, E. (1992). On the bootstrap of U and V statistics. Ann. Statist. 20, 655–674.
[2] Babbel, B. (1989). Invariance principles for U-statistics and von Mises functionals. J. Statist. Plann. Inference 22, 337–354.
[3] Billingsley, P. (1968). Convergence of Probability Measures. New York: Wiley.
[4] Carlstein, E. (1989). Degenerate U-statistics based on non-independent observations. Calcutta Statist. Assoc. Bull. 37, 55–65.
[5] Daubechies, I. (1992). Ten Lectures on Wavelets. Philadelphia, PA: Society for Industrial and Applied Mathematics.
[6] Dedecker, J. and Prieur, C. (2004). Couplage pour la distance minimale. C. R. Acad. Sci. Paris, Ser. I 338, 805–808.
[7] Dedecker, J. and Prieur, C. (2005).
New dependence coefficients. Examples and applications to statistics. Probab. Theory Relat. Fields 132, 203–236.
[8] Dehling, H. and Mikosch, T. (1994). Random quadratic forms and the bootstrap for U-statistics. J. Multivariate Anal. 51, 392–413.
[9] Denker, M. (1982). Statistical decision procedures and ergodic theory. In Ergodic Theory and Related Topics (Vitte, 1981) (ed. Michel, H.). Math. Res. 12, 35–47. Berlin: Akademie-Verlag.
[10] Dewan, I. and Prakasa Rao, B. L. S. (2001). Asymptotic normality for U-statistics of associated random variables. J. Statist. Plann. Inference 97, 201–225.
[11] de Wet, T. (1987). Degenerate U- and V-statistics. South African Statist. J. 21, 99–129.
[12] Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev. 41, 45–76.
[13] Doukhan, P. and Louhichi, S. (1999). A new weak dependence condition and applications to moment inequalities. Stochastic Process. Appl. 84, 313–342.
[14] Doukhan, P. and Neumann, M. H. (2008). The notion of ψ-weak dependence and its applications to bootstrapping time series. Probab. Surv. 5, 146–168.
[15] Eagleson, G. K. (1979). Orthogonal expansions and U-statistics. Austral. J. Statist. 21, 221–237.
[16] Fan, Y. and Li, Q. (1999). Central limit theorem for degenerate U-statistics of absolutely regular processes with applications to model specification testing. J. Nonparametr. Stat. 10, 245–271.
[17] Feuerverger, A. and Mureika, R. A. (1977). The empirical characteristic function and its applications. Ann. Statist. 5, 88–97.
[18] Franke, J. and Wendel, M. (1992). A bootstrap approach for nonlinear autoregressions – some preliminary results. In Bootstrapping and Related Techniques (eds. Jöckel, K.-H., Rothe, G. and Sendler, W.). Lecture Notes in Economics and Mathematical Systems 376. Berlin: Springer.
[19] Ghosh, B. K. and Huang, W.-M. (1991). The power and optimal kernel of the Bickel–Rosenblatt test for goodness of fit. Ann. Statist. 19, 999–1009.
[20] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. New York: Springer.
[21] Huang, W. and Zhang, L.-X. (2006). Asymptotic normality for U-statistics of negatively associated random variables. Statist. Probab. Lett. 76, 1125–1131.
[22] Leucht, A. (2010). Some tests of $L_2$-type for weakly dependent observations. In preparation.
[23] Leucht, A. and Neumann, M. H. (2009). Consistency of general bootstrap methods for degenerate U- and V-type statistics. J. Multivariate Anal. 100, 1622–1633.
[24] Neumann, M. H. and Paparoditis, E. (2008). Goodness-of-fit tests for Markovian time series models: Central limit theory and bootstrap approximations. Bernoulli 14, 14–46.
[25] Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. New York: Wiley.
[26] Wojtaszczyk, P. (1997). A Mathematical Introduction to Wavelets. London Mathematical Society Student Texts 37. Cambridge: Cambridge Univ. Press.