
A New Moment Bound and Weak Laws for Mixingale
Arrays without Memory or Heterogeneity Restrictions,
with Applications to Tail Trimmed Arrays
Jonathan B. Hill
Dept. of Economics
University of North Carolina - Chapel Hill
Dec. 2010
Abstract
We present a new weak law of large numbers for dependent heterogeneous triangular arrays $\{y_{n,t}\}$ with applications to tail trimming. The law follows from a new partial sum moment bound $E|\sum_{t=1}^n y_{n,t}|^p \le K\sum_{t=1}^n E|y_{n,t}|^p$ for zero mean $L_p$-mixingale arrays $\{y_{n,t},\mathcal{F}_t\}$, $p \in [1,2]$, where $\mathcal{F}_t$ is a $\sigma$-field. We do not require uniform integrability, nor restrict mixingale dependence or heterogeneity. The weak law is applied to tail trimmed heavy tailed data, and we characterize self-scaled laws $\sum_{t=1}^n |y_{n,t}|^p / \sum_{t=1}^n E|y_{n,t}|^p \stackrel{p}{\to} 1$ for potentially very heavy tailed data ($E|y_{n,t}| \to \infty$ is possible). Finally, we characterize a minimal rate of convergence for the tail trimmed self-scaled law when distribution tails are regularly varying and possibly non-stationary, including trending tails.
1. Introduction

We present a new partial sum moment bound and weak laws of large numbers [WLLN] for zero mean mixingale triangular arrays $\{y_{n,t}\} = \{y_{n,t} : 1 \le t \le n\}_{n \ge 1}$. The array $\{y_{n,t}\}$ may exhibit any rate of memory decay and any degree of heterogeneity. The primary application is to tail trimmed arrays in Sections 3 and 4.
Let $\{\mathcal{F}_t\}$ be a sequence of non-decreasing $\sigma$-fields on a probability space $(\Omega, \mathcal{F}, P)$, denote the $L_p$-norm $\|x\|_p := (E|x|^p)^{1/p}$, and let $K > 0$ be a finite constant that may change from place to place. We rule out degenerate cases by assuming

$$\liminf_{n\to\infty}\sum_{t=1}^n E|y_{n,t}|^p \ge K \ \text{ for some } K > 0. \qquad (1)$$
We say $\{y_{n,t}, \mathcal{F}_t\}$ forms a zero mean $L_p$-mixingale triangular array, $p > 1$, with size $\lambda > 0$ if for some base field $\mathcal{F}_t$

$$\|E[y_{n,t}|\mathcal{F}_{t-q_n}]\|_p \le e_{n,t}\psi_{q_n} \ \text{ and } \ \|y_{n,t} - E[y_{n,t}|\mathcal{F}_{t+q_n}]\|_p \le e_{n,t}\psi_{q_n+1}, \qquad (2)$$

where $e_{n,t} > 0$ and $\psi_{q_n} = O(q_n^{-\lambda-\iota})$ for infinitesimal $\iota > 0$. We call $\{e_{n,t}\}$ the constants; $\{\psi_{q_n}\} = \{\psi_{q_n}\}_{n\ge 1}$ are the coefficients where $\psi_0 = 1$; and $\{q_n\} = \{q_n\}_{n\ge 1}$ is any sequence
Dept. of Economics, University of North Carolina - Chapel Hill; [email protected]; www.unc.edu/ jbhill.
AMS classifications: primary 60F15; secondary 60F25.
Keywords and phrases: moment bound, law of large numbers, mixingale array, tail trimming, heavy tails.
of positive finite integer displacements. Notice we allow the displacement $q_n \to \infty$ as $n \to \infty$, a central tool in this paper detailed extensively below.
McLeish [53] first proposed the $L_2$-mixingale property for sequences $\{y_t\}$, assumed here to have a zero mean,

$$\|E[y_t|\mathcal{F}_{t-q}]\|_2 \le e_t\vartheta_q \ \text{ and } \ \|y_t - E[y_t|\mathcal{F}_{t+q}]\|_2 \le e_t\vartheta_{q+1}, \qquad (2')$$

where $q$ is a fixed integer displacement, $e_t > 0$ and $\vartheta_q = O(q^{-\lambda-\iota})$, and used $(2')$ to deliver a
maximal inequality and strong law. Hansen [37], [38] extended that result to $L_p$-mixingale sequences, $p \in (1,2)$, with generalizations to weak and strong laws for arrays in Andrews [1], Davidson [14], de Jong [19], [20], Hill [39], [40], and Yanjiao and Zhengyan [71], and further generalizations to so-called $s$-weak dependence in Dedecker and Doukhan [23]. In all cases a fixed displacement $q$ is used to deliver a maximal inequality. The combined mixingale cases imply

$$E\left[\max_{1\le j\le n}\Big|\sum_{t=1}^j y_{n,t}\Big|^p\right] \le K\sum_{t=1}^n e_{n,t}^p$$

for $p \in (1,2)$ with size $\lambda = 1$ ([37], [38], [71]) or $p = 2$ with size $\lambda = 1/2$ ([53]). Similar bounds are achieved in Davydov [18], Rio [61], [62], Dedecker and Doukhan [23], and are widely used for central limit theory ([21], [39], [40]) and HAC matrix estimation ([17], [40]).

The large discontinuity in size requirements $\lambda = 1$ if $p \in (1,2)$ and $\lambda = 1/2$ if $p = 2$ arises from the architecture of $L_p$-spaces. See arguments in McLeish [53] and Hansen [37], [38].
An adapted martingale difference is trivially a mixingale, and von Bahr and Esséen [68] prove $L_p$-bounded martingale difference sequences $\{y_t, \mathcal{F}_t\}$ satisfy

$$E\Big|\sum_{t=1}^n y_t\Big|^p \le K\sum_{t=1}^n E|y_t|^p \ \text{ for } p \in [1,2]. \qquad (3)$$
Hitczenko [43], [44] generalizes Rosenthal [64] and Burkholder [9] inequalities for iid and martingale difference processes. First, for any arbitrary $\mathcal{F}_t$-adapted nonnegative random variables $y_t$ that are $L_p$-bounded, and some finite $K_p > 0$:

$$E\Big(\sum_{t=1}^n y_t\Big)^p \le K_p\max\left\{\sum_{t=1}^n E[y_t^p],\ E\Big(\sum_{t=1}^n E[y_t|\mathcal{F}_{t-1}]\Big)^p\right\} \ \text{ for } p > 1. \qquad (4)$$
Similarly, for a martingale difference sequence $\{y_t, \mathcal{F}_t\}$ of $L_p$-bounded $y_t$:

$$E\Big|\sum_{t=1}^n y_t\Big|^p \le K_p\max\left\{\sum_{t=1}^n E|y_t|^p,\ E\Big(\sum_{t=1}^n E[y_t^2|\mathcal{F}_{t-1}]\Big)^{p/2}\right\} \ \text{ for } p > 2. \qquad (5)$$

See de la Peña et al [22] for a sharp characterization of the scale $K_p$, cf. Ibragimov and Sharakhmetov [48], [49].
Although (2) is nearly identical to $(2')$, and mixingale arrays appear as early as Andrews [1], (2) is decidedly different due to the allowed dependence of $q_n$ on $n$. Notice for any sequence of finite displacements $\{q_n\}$ that satisfy $q_n = \exp\{\max_{1\le t\le n}\{e_{n,t}\}\} \to \infty$, and any tiny $0 < \iota < \lambda$,

$$\|E[y_{n,t}|\mathcal{F}_{t-q_n}]\|_p \le e_{n,t}\psi_{q_n} \le \ln(q_n)\psi_{q_n} \le Kq_n^{-\lambda+\iota}$$
$$\|y_{n,t} - E[y_{n,t}|\mathcal{F}_{t+q_n}]\|_p \le e_{n,t}\psi_{q_n+1} \le \ln(q_n)\psi_{q_n+1} \le Kq_n^{-\lambda+\iota}. \qquad (6)$$
By displacing fast enough as sample size $n \to \infty$ the measure of heterogeneity $e_{n,t}$ is irrelevant: only memory captured by $q_n^{-\lambda+\iota}$ matters. Ultimately this suggests an exploitable defect in the mixingale concept itself: separating heterogeneity $e_{n,t}$ from memory $\psi_{q_n}$ is artificial and meaningless from the perspective of (6). Hill [41] apparently first used displacement sequences to deduce a property like (6) for tail trimmed Near Epoch Dependent arrays. We exploit the idea here to prove moment bound (3) under (2), and deliver a new WLLN.

The sequence $\{q_n\}$ is arbitrary and controlled by the analyst, hence we can always displace to the deep past or distant future faster than the accumulation of heterogeneous characteristics of $y_{n,t}$ measured by $e_{n,t}$. This key quality of mixingale (2), exhibited in (6), leads to bound (3) and therefore an improvement over Burkholder-Rosenthal type bounds and their recent advances (4) for a massive array of dependent processes. See de Jong [21], Hill [39], [40], [41], Leadbetter et al [51], and Rootzén [63] to name a few for usage of displacement sequences in various settings.
In this paper we exploit $q_n \to \infty$ as $n \to \infty$ to prove (3) in Theorem 2.1 for $L_p$-mixingale arrays $\{y_{n,t}, \mathcal{F}_t\}$, $p \in [1,2]$, with any size $\lambda > 0$ and any degree of heterogeneity $e_{n,t}$. Bound (3) then promotes our main WLLN implied by Theorem 2.2. The bound is potentially far sharper than bounds implied by McLeish [53] and Hansen [37] since $E|y_{n,t}|^p$ may be much smaller than mixingale constants $e_{n,t}$. See Sections 2 and 5 for examples.

Further, at least for asymptotic arguments (3) renders bounds like (4) irrelevant since, up to a multiplicative scale, $\sum_{t=1}^n E|y_{n,t}|^p$ is trivially better. Whether our scale $K$ is sharp is not considered here since this is irrelevant for a WLLN. See [22], [48], [49] and their references.
Our proof of (3) re-vitalizes a well known decomposition of $\sum_{t=1}^n y_{n,t}$ into an infinite series of martingale differences. We use $q_n \to \infty$ as $n \to \infty$ to support a finite series approximation that eliminates mixingale size and heterogeneity restrictions. Consult Section 2 for an intuitive explanation behind the proof of (3) and why the use of fixed displacements $q$ fails to promote (3).
Andrews [1] and Davidson [14] deliver laws of large numbers for uniformly integrable $L_1$-mixingale arrays $\{y_{n,t}, \mathcal{F}_t\}$ of any size, provided heterogeneity is bounded: $\sum_{t=1}^n e_{n,t} = O(n)$. Yanjiao and Zhengyan [71] tackle non-uniformly integrable $L_p$-mixingales $\{y_t, \mathcal{F}_t\}$ with size 1 and bounded heterogeneity: $\sum_{t=1}^{\infty} e_t^p/b_t^p < \infty$ for monotone $\{b_t\}$, $b_t \to \infty$ as $t \to \infty$. We allow any size $\lambda$, any degree of heterogeneity since we never bound $e_{n,t}$, and non-uniform integrability, and we do not require uniform $L_p$-boundedness for any $p > 0$. For example, $E|y_{n,t}|^p \to \infty$ is possible for small $0 < p \le 1$ when $y_{n,t}$ is a tail trimmed version of a heavy tailed random variable $y_t$.
Related inequalities for mixing sequences, martingales and sums of martingale differences date to Burkholder [9], Davydov [18], Doob [25], Ibragimov [46], Marcinkiewicz and Zygmund [52], Rosenthal [64], von Bahr and Esséen [68], and Yokoyama [72], with recent contributions for associated, mixing or weakly dependent sequences by Birkel [5], Boucheron et al [6], Dedecker and Doukhan [23], Doukhan and Louhichi [27], Doukhan and Neumann [28] and Rio [61], [62]. Subsequent improvements for martingales predominately concern the constant $K_p$ in (4): see [22], [23], and [58].
Our method of proof evidently precludes a maximal inequality $E[\max_{1\le j\le n}|\sum_{t=1}^j\{y_{n,t} - E[y_{n,t}]\}|^p] \le K\sum_{t=1}^n E|y_{n,t}|^p$. We therefore do not provide a strong law of large numbers (e.g. [37], [53]). Nevertheless, we gain substantial generality for weak laws since $E|y_{n,t}|^p$ may be much smaller than $e_{n,t}^p$, while all mixingale inequalities require at least size $\lambda \ge 1/2$ and/or summable powers of $e_{n,t}$, effectively restricting hyperbolic memory and/or non-stationarity.
Our WLLN covers martingale differences, $\alpha$-, $\beta$-, $\rho$- and $\phi$-mixing, and geometrically ergodic arrays, and Near Epoch Dependent and mixingale arrays, allowing stochastic trend, nonlinear distributed lags, AR-GARCH, and nonlinear GARCH. See Sections 5 and 6 for verification of the major assumptions and examples.
Laws of large numbers for dependent heterogeneous data are now well established. A
small subset includes Andrews [1], Arcones [2], Chandra and Ghosal [10], Davidson [14],
de Jong [19], [20], Doukhan and Wintenberger [29], Hansen [37], McLeish [53], Shao [65],
and Teicher [67]. See de Jong [19] for what appears to be the lightest available restrictions
on mixingale heterogeneity, to date. Marcinkiewicz-Zygmund moment bounds result in maximal inequalities for stationary uniformly integrable $\alpha$-mixing sequences, from which strong laws are derived ([2], [24], [62]). Bounds are imposed that replicate mixingale coefficient summability and uniform integrability.
In Sections 3 and 4 we apply the main results to tail trimmed arrays $\{y_{n,t}\}$ of a possibly non-integrable sequence $\{y_t\}$. In the case of tail trimming $E|y_{n,t}|^s \to \infty$ as $n \to \infty$ is possible for some $s > 0$. We therefore characterize so-called self-scaled laws of the form

$$\frac{\sum_{t=1}^n|y_{n,t}|^s}{\sum_{t=1}^n E|y_{n,t}|^s} = 1 + o_p(1) \ \text{ for any } s > 0, \qquad (7)$$

and explicitly characterize the rate of convergence in (7) when the untrimmed sequence $\{y_t\}$ has stationary or non-stationary regularly varying distribution tails. Scaling by $\sum_{t=1}^n E|y_{n,t}|^s$ implies the rate of convergence of $\sum_{t=1}^n|y_{n,t}|^s$ itself need not be known.
Limit theory for trimmed sums of iid data has a deep history, dating at least to Newcomb [56], cf. Stigler [66]. Very few extensions to non-iid data exist ([35], [69]), and apparently none allow general dependence, heterogeneity and heavy tails together (e.g. Csörgő et al [13], Griffin and Qazi [33], Hahn et al [36], Pruitt [59]).
Throughout $K$ denotes a positive finite constant whose value may change from line to line. $\iota > 0$ is a tiny number that may change with the context. $\{a_n\} = \{a_n\}_{n\ge 1}$. $a_n \sim b_n$ signifies $a_n/b_n \to 1$. $\stackrel{p}{\to}$ denotes convergence in probability. $L(x)$ is a slowly varying [s.v.] function, the value or rate of which may change with the context$^1$. $\epsilon_t \stackrel{iid}{\sim} (0,1)$ implies $\epsilon_t$ is iid with zero mean and unit variance. Summations $\sum_a^b = 0$ if $b < a$.
2. Weak Laws of Large Numbers

In this section we develop the moment bound and WLLN for a generic $L_p$-bounded mixingale array $\{y_{n,t}, \mathcal{F}_t\}$.

Assumption 1. The random variables $y_{n,t}$ are zero mean $L_p$-bounded, $p \in [1,2]$, for each $1 \le t \le n$ and $n \ge 1$, such that non-degeneracy (1) holds. Further, $\{y_{n,t}, \mathcal{F}_t\}$ forms an $L_p$-mixingale array (2) with monotonic coefficients $\psi_{q_n} \searrow 0$ as $q_n \to \infty$ of any size $\lambda > 0$.
Remark 1: Monotonic decay does not reduce generality and merely sharpens notation in the proof of Theorem 2.1, below.

Remark 2: The mixingale base $\mathcal{F}_t$ does not need to be explicitly defined here. It suffices merely to impose (2), where $L_p$-boundedness for $p \in [1,2]$ suffices for a version $E[y_{n,t}|\mathcal{F}_s]$ to exist for any $s \in \mathbb{Z}$ by the Radon-Nikodym theorem.
Theorem 2.1 (Partial Sum Moment Bound). Under Assumption 1, $E|\sum_{t=1}^n y_{n,t}|^p \le K\sum_{t=1}^n E|y_{n,t}|^p$.
Apply Markov's inequality and Theorem 2.1 to deduce $P(|\sum_{t=1}^n y_{n,t}| > \varepsilon) = O(\sum_{t=1}^n E|y_{n,t}|^p)$ for any $\varepsilon > 0$. This leads to the second main result of this paper.
$^1$Recall $L(\lambda x)/L(x) \to 1$ as $x \to \infty$, $\forall\lambda > 0$, where products and powers of $L(x)$ are slowly varying; cf. Resnick [59].
Theorem 2.2 (Partial Sum Probability Bound). Under Assumption 1

$$\frac{1}{(\sum_{t=1}^n E|y_{n,t}|^p)^{1/p}}\sum_{t=1}^n y_{n,t} = O_p(1).$$
Remark 1: Although Assumption 1 presumes $L_p$-boundedness for each $t$ and $n$, Theorem 2.2 does not require uniform integrability. In particular $E|y_{n,t}|^p \to \infty$ as $n \to \infty$ is allowed.

Remark 2: Theorem 2.2 implies a WLLN: for any $L(n) \to \infty$ as $n \to \infty$

$$\frac{1}{(\sum_{t=1}^n E|y_{n,t}|^p)^{1/p}L(n)}\sum_{t=1}^n y_{n,t} \stackrel{p}{\to} 0.$$

2.1 Intuition Behind Theorem 2.1
In order to understand how displacements $q_n \to \infty$ as $n \to \infty$ can alleviate size and heterogeneity restrictions, recall a standard martingale difference decomposition for $L_p$-mixingales, $p \in (1,2]$ (e.g. [37], [38], [53], [54]):

$$\sum_{t=1}^n y_{n,t} = \sum_{q=-\infty}^{\infty}\sum_{t=1}^n\{E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}]\} =: \sum_{q=-\infty}^{\infty}Y_{n,q} \ a.s. \qquad (8)$$
Since we are only interested in a partial sum moment bound, for simplicity assume here $\{y_{n,t}, \mathcal{F}_t\}$ is an $L_2$-mixingale with size $\lambda = 1$. In the spirit of McLeish [53] and Hansen [37] observe

$$\Big\|\sum_{t=1}^n y_{n,t}\Big\|_2 \le \sum_{q=-\infty}^{\infty}\|Y_{n,q}\|_2 \le \sum_{q=-\infty}^{\infty}\Big(\sum_{t=1}^n E(E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}])^2\Big)^{1/2}$$
$$\le \sum_{q=-\infty}^{\infty}\Big(\sum_{t=1}^n e_{n,t}^2\psi_{|q|}^2\Big)^{1/2} = \sum_{q=-\infty}^{\infty}\psi_{|q|}\Big(\sum_{t=1}^n e_{n,t}^2\Big)^{1/2}. \qquad (9)$$

The first inequality follows from (8) and Minkowski's inequality; the second from the martingale difference property of $E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}]$; and the third from the mixingale property (2). See Hansen [37], [38] for the $L_p$-mixingale case for $p \in (1,2)$, and the proof of Theorem 2.1, below, for greater detail and generality. Clearly $E(\sum_{t=1}^n y_{n,t})^2 \le K\sum_{t=1}^n e_{n,t}^2$ since size $\lambda = 1$ ensures coefficient summability $\sum_{q=-\infty}^{\infty}\psi_{|q|} = K < \infty$.
We necessarily arrive at $E(\sum_{t=1}^n y_{n,t})^2 \le K\sum_{t=1}^n e_{n,t}^2$ because the mixingale property is quantified over all integer displacements $q$. This is precisely what occurs at the third inequality: we are literally stuck with $\sum_{t=1}^n e_{n,t}^2$ because $E(\{E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}]\}^2) \le e_{n,t}^2\psi_{|q|}^2$ is evaluated at every $t$ and $q$. Since decomposition (8) involves every integer lag $q \in \mathbb{Z}$ we are faced with the infinite series $\sum_{q=-\infty}^{\infty}\psi_{|q|}$ in (9). The right-hand side of the equality in (9) is finite only if the coefficients $\psi_q$ have a size $\lambda \ge 1$.

Although far more sophisticated arguments exist, in all cases a conclusion like (9) is reached: by decomposing $\sum_{t=1}^n y_{n,t}$ with $E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}]$ for every $q \in \mathbb{Z}$ the final bound involves every $\{e_{n,t}\}_{t=1}^n$ and $\{\psi_q\}_{q\in\mathbb{Z}}$. This classic approach implies a very strong cost to pay: memory must be restricted and the bound is in terms of unknown $e_{n,t}$.
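The finite-contribution structure behind this decomposition can be seen concretely in a toy case. The following Python sketch (our own illustration, not part of the paper) verifies decomposition (8) for an MA(1) sequence $y_t = \epsilon_t + \theta\epsilon_{t-1}$ with $\mathcal{F}_t = \sigma(\epsilon_s : s \le t)$: the increments $E[y_t|\mathcal{F}_{t+q}] - E[y_t|\mathcal{F}_{t+q-1}]$ vanish for all $q \notin \{-1, 0\}$, so the infinite series over $q$ collapses to a finite one.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 1000, 0.5
eps = rng.normal(size=n + 1)               # eps[0] plays the role of eps_0
y = eps[1:] + theta * eps[:-1]             # y_t = eps_t + theta*eps_{t-1}, t = 1..n

def cond_exp(t, q):
    """E[y_t | F_{t+q}] for the MA(1), with F_m = sigma(eps_s : s <= m)."""
    if q >= 0:
        return y[t - 1]                    # y_t is F_t-measurable
    if q == -1:
        return theta * eps[t - 1]          # eps_t integrates out (zero mean)
    return 0.0                             # both shocks integrate out

S_n = y.sum()
# Summing the martingale-difference increments over q reconstructs S_n;
# only q in {-1, 0} contribute, so the series is effectively finite.
recon = sum(cond_exp(t, q) - cond_exp(t, q - 1)
            for t in range(1, n + 1) for q in (-1, 0))
print(S_n - recon)
```

For a genuine mixingale the increments decay at rate $\psi_{|q|}$ rather than vanishing exactly, which is where the size restriction enters the classic argument.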
We solve the size restriction necessity associated with the infinite summation $\sum_{q=-\infty}^{\infty}$ and the heterogeneity restriction associated with the constants $e_{n,t}$ by spreading out the information used in $Y_{n,q}$. Rather than summing over every $E[y_{n,t}|\mathcal{F}_{t+q}] - E[y_{n,t}|\mathcal{F}_{t+q-1}]$, we show for some sequences of finite positive integers $\{g_n, r_n\}$

$$\lim_{n\to\infty}\Big\|\sum_{t=1}^n y_{n,t} - \sum_{t=1}^n\sum_{q=-r_n}^{r_n}\{E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]\}\Big\|_2 = 0.$$

This follows since we are free to choose $g_n \to \infty$ as $n \to \infty$ so fast that the terms $E[y_{n,t}|\mathcal{F}_{g_n(t+r_n)}] \to y_{n,t}$ and $E[y_{n,t}|\mathcal{F}_{g_n(t-r_n-1)}] \to 0$ in $L_2$-norm as fast as we choose by appealing to mixingale property (6). Precisely how fast $g_n \to \infty$ will be made clear in the proof, below.

We then show $E(\sum_{t=1}^n\sum_{q=-r_n}^{r_n}\{E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]\})^2 \le K\sum_{t=1}^n E[y_{n,t}^2]$ by exploiting three facts. First, property (6) implies $\lim_{n\to\infty}\|E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]\|_2 = 0$ for all $q \notin \{-t, -t+1\}$, where convergence occurs as fast as we choose by setting $g_n \to \infty$ sufficiently fast. Second, for the remaining $q \in \{-t, -t+1\}$ the terms $\|E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]\|_2 \le 2\|y_{n,t}\|_2$ by Jensen's inequality. Third, $\sum_{q=-r_n}^{r_n}$ always sums over finitely many $E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]$, freeing us from mixingale size restrictions.
2.2 Proof of Theorem 2.1

The case $p = 1$ is trivial, so we prove the claim for any $p \in (1,2]$ by modifying arguments in [37], [38], and [53].
Since the displacements $q_n$ are arbitrary put

$$q_n = q \cdot g_n \ \text{ where } q \in \mathbb{Z},$$

for some sequence $\{g_n\}$ of positive integers $g_n \in \mathbb{N}$, $\liminf_{n\to\infty}g_n \ge 1$, $g_n < \infty$ if $n < \infty$. Write $E_m[\cdot] := E[\cdot|\mathcal{F}_m]$ and define

$$S_n := \sum_{t=1}^n y_{n,t} \ \text{ and } \ Y_{n,q} := \sum_{t=1}^n\{E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\}.$$
Let $\{r_n\}$ be a sequence of positive integers that satisfies $1 + 3n \le r_n < \infty$. We first derive the decomposition

$$S_n = \sum_{q=-r_n}^{r_n}Y_{n,q} + R_n = \sum_{q=-r_n}^{r_n}Y_{n,q} + o_p\Big((g_nr_n)^{-\lambda}\sum_{t=1}^n e_{n,t}\Big) \qquad (10)$$

where $\{R_n\}$ is a zero mean stochastic sequence, for each $n$ bounded and measurable. We then use $r_n < \infty$ to escape any mixingale size requirements, and $g_n \to \infty$ to promote the desired bound (3) itself.
Step 1 (finite series decomposition): Define

$$\mathcal{E}_{1,n} := \sum_{t=1}^n\{y_{n,t} - E_{g_n(t+r_n)}[y_{n,t}]\} \ \text{ and } \ \mathcal{E}_{2,n} := \sum_{t=1}^n E_{g_n(t-r_n-1)}[y_{n,t}].$$

Clearly for any $n$

$$\sum_{q=-r_n}^{r_n}\{E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\} = E_{g_n(t+r_n)}[y_{n,t}] - E_{g_n(t-r_n-1)}[y_{n,t}]$$

hence

$$S_n = \sum_{q=-r_n}^{r_n}Y_{n,q} + \mathcal{E}_{1,n} + \mathcal{E}_{2,n}.$$
Note the identity

$$E_{g_n(t+r_n)}[y_{n,t}] = E\big[y_{n,t}\big|\mathcal{F}_{t+(g_n-1)t+g_nr_n}\big],$$

where $(g_n-1)t + g_nr_n \ge g_nr_n$ holds for each $n \ge 1$, $1 \le t \le n$, and $g_n \ge 1$. Similarly

$$E_{g_n(t-r_n-1)}[y_{n,t}] = E\big[y_{n,t}\big|\mathcal{F}_{t+(g_n-1)t-g_n(r_n+1)}\big],$$

where $(g_n-1)t - g_n(r_n+1) < -[g_nr_n/2]$ for all $n \ge 1$ and $1 \le t \le n$ since we assume $r_n \ge 3n + 1$. Now apply Assumption 1 with monotonic mixingale coefficients, $E[y_{n,t}] = 0$ and Minkowski's inequality to deduce
$$\|\mathcal{E}_{1,n}\|_p \le \sum_{t=1}^n\big\|y_{n,t} - E_{g_n(t+r_n)}[y_{n,t}]\big\|_p \le K\sum_{t=1}^n\big\|y_{n,t} - E_{t+g_nr_n}[y_{n,t}]\big\|_p = o\Big((g_nr_n)^{-\lambda}\sum_{t=1}^n e_{n,t}\Big)$$
$$\|\mathcal{E}_{2,n}\|_p \le \sum_{t=1}^n\big\|E_{g_n(t-r_n-1)}[y_{n,t}]\big\|_p \le K\sum_{t=1}^n\big\|E_{t-[g_nr_n/2]}[y_{n,t}]\big\|_p = o\Big((g_nr_n)^{-\lambda}\sum_{t=1}^n e_{n,t}\Big).$$

But this implies by Minkowski's inequality

$$\Big\|S_n - \sum_{q=-r_n}^{r_n}Y_{n,q}\Big\|_p \le \|\mathcal{E}_{1,n}\|_p + \|\mathcal{E}_{2,n}\|_p = o\Big((g_nr_n)^{-\lambda}\sum_{t=1}^n e_{n,t}\Big).$$
An appeal to Markov's inequality leads to (10) with $R_n = \mathcal{E}_{1,n} + \mathcal{E}_{2,n}$, hence

$$\|R_n\|_p = o\Big((g_nr_n)^{-\lambda}\sum_{t=1}^n e_{n,t}\Big). \qquad (11)$$
Step 2 (moment bound): Use the finite series decomposition (10) to deduce by Minkowski's inequality

$$\|S_n\|_p \le \Big\|\sum_{q=-r_n}^{r_n}Y_{n,q}\Big\|_p + \|R_n\|_p. \qquad (12)$$

If $\limsup_{n\to\infty}\sum_{t=1}^n e_{n,t} \le K$ then (11) and $r_n > n$ imply $\|R_n\|_p \to 0$ for any $g_n \to \infty$. Otherwise choose any sequence of positive finite integers $\{g_n\}$ that satisfies $g_n(\sum_{t=1}^n e_{n,t})^{-1/\lambda} \to \infty$ to force

$$\|R_n\|_p \to 0 \ \text{ hence } \ \|S_n\|_p \le \Big\|\sum_{q=-r_n}^{r_n}Y_{n,q}\Big\|_p + o(1). \qquad (13)$$

In order to bound $\|\sum_{q=-r_n}^{r_n}Y_{n,q}\|_p$, define

$$X_{n,t,q} := E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]$$
such that

$$Y_{n,q} = \sum_{t=1}^n X_{n,t,q}.$$

Note $\{X_{n,t,q}, \mathcal{F}_{g_n(t+q)} : 1 \le t \le n\}_{n\ge 1}$ forms for every $q$ a martingale difference array:

$$E\big[X_{n,t,q}\big|\mathcal{F}_{g_n(t-1+q)}\big] = E\Big[\{E[y_{n,t}|\mathcal{F}_{g_n(t+q)}] - E[y_{n,t}|\mathcal{F}_{g_n(t+q-1)}]\}\Big|\mathcal{F}_{g_n(t-1+q)}\Big] = 0.$$
Define $s = p/(p-1)$ for $p \in (1,2]$. Any sequence of non-zero real numbers $\{a_q\}_{q=-\infty}^{\infty}$ with $\sum_{q=-\infty}^{\infty}a_q^{s/p} < \infty$ satisfies, by Hölder's inequality (cf. [54]),

$$E\Big|\sum_{q=-r_n}^{r_n}Y_{n,q}\Big|^p = E\Big|\sum_{q=-r_n}^{r_n}a_q^{1/p}a_q^{-1/p}Y_{n,q}\Big|^p \le \Big(\sum_{q=-r_n}^{r_n}a_q^{s/p}\Big)^{p/s}\sum_{q=-r_n}^{r_n}a_q^{-1}E|Y_{n,q}|^p \le K\sum_{q=-r_n}^{r_n}a_q^{-1}E|Y_{n,q}|^p. \qquad (14)$$

Now set $a_0 = 1$ and $a_{-q} = a_q = q^{-p/s-\iota}$ for $q \ge 1$, so that $\sum_{q=-\infty}^{\infty}a_q^{s/p} = 1 + 2\sum_{q=1}^{\infty}q^{-1-\iota(s/p)} < \infty$, and define

$$Z_{n,l} := y_{n,t} - E_l[y_{n,t}] \ \text{ for } l \in \mathbb{Z}. \qquad (15)$$

We require the following bounds by Minkowski's inequality:

$$\big(E\big|E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\big|^p\big)^{1/p} \le \big\|E_{g_n(t+q)}[y_{n,t}]\big\|_p + \big\|E_{g_n(t+q-1)}[y_{n,t}]\big\|_p$$
$$\big(E\big|E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\big|^p\big)^{1/p} \le \big\|Z_{n,g_n(t+q)}\big\|_p + \big\|Z_{n,g_n(t+q-1)}\big\|_p.$$

Then $\sum_{q=-r_n}^{r_n}a_q^{-1}E|Y_{n,q}|^p$
is bounded from above:

$$\sum_{q=-r_n}^{r_n}a_q^{-1}E\Big|\sum_{t=1}^n X_{n,t,q}\Big|^p \le K\sum_{q=-r_n}^{r_n}a_q^{-1}\sum_{t=1}^n E|X_{n,t,q}|^p = K\sum_{t=1}^n\sum_{q=-r_n}^{r_n}a_q^{-1}E\big|E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\big|^p \qquad (16)$$
$$\le K\sum_{t=1}^n\sum_{q=-r_n}^{-t-1}a_q^{-1}\big(\|E_{g_n(t+q)}[y_{n,t}]\|_p + \|E_{g_n(t+q-1)}[y_{n,t}]\|_p\big)^p$$
$$+ K\sum_{t=1}^n\sum_{q=-t+2}^{r_n}a_q^{-1}\big(\|Z_{n,g_n(t+q)}\|_p + \|Z_{n,g_n(t+q-1)}\|_p\big)^p$$
$$+ K\sum_{t=1}^n\sum_{q=-t}^{-t+1}a_q^{-1}E\big|E_{g_n(t+q)}[y_{n,t}] - E_{g_n(t+q-1)}[y_{n,t}]\big|^p$$
$$= A_{1,n} + A_{2,n} + A_{3,n}.$$
The first inequality follows from a generalization of the von Bahr-Esséen inequality to $L_p$-bounded martingale difference arrays $\{X_{n,t,q}, \mathcal{F}_{g_n(t+q)} : 1 \le t \le n\}_{n\ge 1}$ for $p \in (1,2]$ (see Theorem 2 of [68]). The second inequality follows from Minkowski's inequality and (15), and an obvious decomposition of the summation $\sum_{q=-r_n}^{r_n}$. Re-write the summations over $q$ in $A_{1,n}$ and $A_{2,n}$ using $a_{-q} = a_q$:

$$A_{1,n} = K\sum_{t=1}^n\sum_{q=1}^{r_n-t}a_{q+t}^{-1}\big(\|E_{-g_nq}[y_{n,t}]\|_p + \|E_{-g_n(q+1)}[y_{n,t}]\|_p\big)^p$$
$$A_{2,n} = K\sum_{t=1}^n\sum_{q=2}^{r_n+t}a_{q-t}^{-1}\big(\|Z_{n,g_nq}\|_p + \|Z_{n,g_n(q-1)}\|_p\big)^p.$$
Apply the Assumption 1 mixingale property with monotonicity of coefficients, and $a_q > 0$, to obtain

$$A_{1,n} \le K\sum_{t=1}^n\sum_{q=1}^{r_n}a_{q+t}^{-1}\big(\|E_{-g_nq}[y_{n,t}]\|_p + \|E_{-g_n(q+1)}[y_{n,t}]\|_p\big)^p \le K\sum_{t=1}^n\sum_{q=1}^{r_n}a_{q+t}^{-1}e_{n,t}^p\psi_{g_nq}^p \le Kg_n^{-\lambda p}\sum_{t=1}^n e_{n,t}^p\sum_{q=1}^{r_n}a_{q+t}^{-1}q^{-\lambda p}.$$
Notice $0 < \sum_{q=1}^{r_n}a_{q+t}^{-1}q^{-\lambda p} < \infty$ by construction for each $r_n < \infty$ hence for each $n < \infty$. Since $\{g_n\}$ are arbitrary finite integers we can always choose them to satisfy (13) and

$$A_{1,n} \le Kg_n^{-\lambda p}\sum_{t=1}^n e_{n,t}^p\sum_{q=1}^{2r_n}a_{q+t}^{-1}q^{-\lambda p} \to 0. \qquad (17)$$

By an identical argument and the fact that $t \le n < r_n$, we can choose $\{g_n\}$ to satisfy (13), (17) and

$$A_{2,n} \le K\sum_{t=1}^n\sum_{q=2}^{2r_n}a_{q-t}^{-1}\big(\|Z_{n,g_nq}\|_p + \|Z_{n,g_n(q-1)}\|_p\big)^p \le Kg_n^{-\lambda p}\sum_{t=1}^n e_{n,t}^p\sum_{q=2}^{2r_n}a_{q-t}^{-1}q^{-\lambda p} \to 0. \qquad (18)$$
Finally, apply Minkowski and Jensen inequalities, and $a_0 = 1$ and $a_{-q} = a_q = q^{-p/s-\iota}$, to deduce

$$A_{3,n} \le K\sum_{t=1}^n\sum_{q=-t}^{-t+1}a_q^{-1}E|y_{n,t}|^p \le K\sum_{t=1}^n E|y_{n,t}|^p. \qquad (19)$$
Combine (12)-(14) with (16)-(19) and non-degeneracy (1) to obtain

$$\|S_n\|_p \le \Big\|\sum_{q=-r_n}^{r_n}Y_{n,q}\Big\|_p + \|R_n\|_p \le K\Big(\sum_{t=1}^n E|y_{n,t}|^p\Big)^{1/p} + o(1) \le K\Big(\sum_{t=1}^n E|y_{n,t}|^p\Big)^{1/p}.$$

This completes the proof. QED.
3. Tail Trimmed Arrays and Self-Scaled Laws

An interesting application of Theorem 2.2 is to tail trimmed arrays $\{y_{n,t}\}$ of a heavy tailed stochastic process $\{y_t\}_{t\in\mathbb{Z}}$. Assume $y_t$ is measurable on the probability space $(\Omega, \mathcal{G}, P)$, and uniformly $L_p$-bounded for some $p \in (0,2)$ with infinite variance

$$\inf_{t\in\mathbb{Z}}E[y_t^2] = \infty.$$

Let $\{y_t\}_{t=1}^n$ be the sample path with size $n \ge 1$, and let $\{c_{n,t}\} = \{c_{n,t} : 1 \le t \le n\}_{n\ge 1}$ be a triangular array of positive constants, $c_{n,t} \to \infty$ as $n \to \infty$ for all $t$. The tail trimmed version is

$$y_{n,t} := y_tI(|y_t| \le c_{n,t}), \qquad (20)$$

where $I(A) = 1$ if $A$ is true, and 0 otherwise. The thresholds $\{c_{n,t}\}$ are determined by an intermediate order sequence $\{m_n\}$:

$$\frac{n}{m_n}P(|y_t| > c_{n,t}) \to 1 \ \text{ where } m_n \to \infty \text{ and } m_n = o(n). \qquad (21)$$

Thus $c_{n,t}$ approximates the $m_n/n \to 0$ two-tailed quantile of $y_t$. Asymmetric trimming is a simple extension with only added notation.
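To make construction (20)-(21) concrete, the following Python sketch (our own illustration under an assumed standard Cauchy marginal, not from the paper) computes the exact threshold $c_n$ solving $(n/m_n)P(|y_t| > c_n) = 1$ and checks that roughly $m_n$ observations are trimmed; the choices of $n$ and $m_n$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
m_n = int(np.sqrt(n))                  # intermediate order sequence: m_n -> inf, m_n/n -> 0

# Standard Cauchy marginal: P(|y_t| > c) = (2/pi) * arctan(1/c),
# so the intermediate order condition has a closed-form solution.
c_n = 1.0 / np.tan(np.pi * m_n / (2.0 * n))

y = rng.standard_cauchy(size=n)
y_trim = y * (np.abs(y) <= c_n)        # tail trimmed array

# By construction the expected number of trimmed observations is ~ m_n
n_trimmed = int(np.count_nonzero(np.abs(y) > c_n))
print(n_trimmed, m_n)
```

Because $m_n/n \to 0$, the trimmed fraction vanishes asymptotically while the threshold $c_n$ still diverges.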
Central order trimming $m_n/n \to (0,1)$ implies $y_{n,t}$ is uniformly bounded. This case is implicitly covered in the mixingale literature (e.g. [1], [14], [20]). Conversely, if $y_t$ is non-integrable then extreme order trimming $m_n \to m$, a finite integer, results in too little trimming for a WLLN to be deduced from Theorem 2.2. See Remarks 1 and 3 of Theorem 3.3, below, and see Leadbetter et al [51] for seminal order statistic theory.

Infinite variance precludes such simple non-stationary cases as uniformly distributed $y_t$ taking values in $\{-\delta^t, \delta^t\}$ for some $\delta > 0$. See, e.g., Davidson [14]. Examples covering (20)-(21) are detailed below.
Trivially $y_{n,t}$ is $L_2$-bounded for each $t$ and $n$, while $\inf_{t\in\mathbb{Z}}E[y_t^2] = \infty$ ensures non-degeneracy (1) since under tail trimming

$$\liminf_{c\to\infty}\inf_{t\in\mathbb{Z}}E[y_t^2I(|y_t| \le c)] > 0.$$

Theorem 2.2 applies as long as $\{y_{n,t}, \mathcal{F}_t\}$ is a mixingale array.
Assumption 2. Let $\{y_{n,t}\}$ be the tail trimmed array defined by (20)-(21), and assume $\{y_{n,t}, \mathcal{F}_t\}$ forms an $L_2$-mixingale array with monotonic coefficients $\psi_{q_n} \searrow 0$ as $q_n \to \infty$ of any size $\lambda > 0$.
Theorem 3.1 (Tail Trimmed Partial Sum Probability Bound). Under Assumption 2

$$\sum_{t=1}^n\{y_{n,t} - E[y_{n,t}]\} = O_p\bigg(\Big(\sum_{t=1}^n E[y_{n,t}^2]\Big)^{1/2}\bigg).$$
Since $L_2$-mixingales form $L_p$-mixingales for any $p \in (1,2]$ by Lyapunov's inequality, apply Theorems 2.1 and 3.1 to deduce for tiny $\iota > 0$

$$\sum_{t=1}^n\{y_{n,t} - E[y_{n,t}]\} = O_p\bigg(\Big(\sum_{t=1}^n E|y_{n,t}|^{1+\iota}\Big)^{1/(1+\iota)}\bigg). \qquad (22)$$

Thus if $y_t$ is uniformly $L_{1+\iota}$-bounded then $(1/n)\sum_{t=1}^n\{y_{n,t} - E[y_{n,t}]\} = O_p(n^{-\iota/(1+\iota)})$, hence a WLLN applies. Since this case is implicitly covered elsewhere, in the sequel we focus on non-integrable cases.
Corollary 3.2 (Tail Trimmed Mean WLLN). Assume $\{y_t\}$ is uniformly $L_{1+\iota}$-bounded. Under Assumption 2, $(1/n)\sum_{t=1}^n\{y_{n,t} - E[y_{n,t}]\} \stackrel{p}{\to} 0$.
Far greater refinement is available under a regular variation property. Assume for each $t$ and slowly varying [s.v.] $L_t(y)$

$$P(|y_t| > y) = y^{-\kappa_t}L_t(y), \ \text{ where } \kappa_t \in (0,2],\ L_t(y) > 0 \text{ for each } t \in \mathbb{Z} \text{ and } y \in \mathbb{R}. \qquad (23)$$

In the case of stationary tails

$$L_t(y) = L(y) \ \text{ and } \ \kappa_t = \kappa, \ \text{ hence the thresholds are } c_{n,t} = c_n. \qquad (24)$$

Class (23)-(24) coincides with the maximum domain of attraction of a Type II extreme value distribution, and the domain of attraction of a stable law if $\kappa < 2$; it naturally characterizes stochastic recurrence equations like GARCH data, and has been used to model asset returns, urban growth, network data, insurance claims, and meteorological events. See Basrak et al [3], Bingham et al [4], Hill [40], [41], Kesten [50], Resnick [60], and their citations.
Define a signed power transform

$$y_{n,t}^{\langle s\rangle} := \mathrm{sign}(y_t)|y_{n,t}|^s \ \text{ for } s > 0.$$

A direct appeal to Karamata's Theorem under stationary regular variation (23)-(24) gives the well known trimmed moment formula (e.g. Theorem 0.6 of Resnick [60]):

$$E|y_{n,t}|^s \le Kc_n^sP(|y_t| > c_n) \le Kc_n^{s-\kappa}L(c_n) \ \text{ for s.v. } L(\cdot), \qquad (25)$$
$$E|y_{n,t}|^s \sim Kc_n^{s-\kappa}L(c_n) \ \text{ as } c_n \to \infty,\ \forall s > \kappa. \qquad (25')$$

We can assume $L(\cdot)$ in (25) and $(25')$ are the same up to a multiplicative constant. Simply extend Theorem 3.1 to $\{|y_{n,t}|^s, y_{n,t}^{\langle s\rangle}\}$ to deduce the following self-scaled laws. See the appendix for a proof.
Theorem 3.3 (Self-Scaled Tail Trimmed WLLN). Let trimming properties (20) and (21) hold, assume tail property (23) under stationarity (24), choose any $s \ge \kappa$ and assume $\liminf_{n\to\infty}\{E|y_{n,t}|^s\} > 0$.

a. If $\{|y_{n,t}|^s\}$ satisfies Assumption 2 then $(1/n)\sum_{t=1}^n|y_{n,t}|^s/E|y_{n,t}|^s = 1 + O_p(1/m_n^{1/2})$;

b. If $\{y_{n,t}^{\langle s\rangle}\}$ satisfies Assumption 2 then $(1/n)\sum_{t=1}^n y_{n,t}^{\langle s\rangle} = E[y_{n,t}^{\langle s\rangle}] + O_p\big((n/m_n)^{s/\kappa-1+\iota}L(n)/n^{\iota}\big)$ for some s.v. $L(n)$ and tiny $\iota > 0$.
Remark 1: The proof only exploits the property (21) implication $c_n \to \infty$, and technically does not require $m_n \to \infty$. But extreme order trimming $m_n \to m$ implies $(1/n)\sum_{t=1}^n|y_{n,t}|^s/E|y_{n,t}|^s = 1 + O_p(1)$ so a WLLN cannot be deduced for higher moments $s \ge \kappa$.

Remark 2: Under intermediate order trimming the minimum rate of convergence $m_n^{1/2} \to \infty$ is optimized by maximal trimming, hence a minimal threshold rate $c_n \to \infty$. Consider $m_n = n/L(n)$ for s.v. $L(n) \to \infty$:

$$\frac{n^{1/2}}{L(n)}\left(\frac{1}{n}\sum_{t=1}^n\frac{|y_{n,t}|^s}{E|y_{n,t}|^s} - 1\right) = O_p(1).$$

Remark 3: Self-scaling under tail stationarity (23)-(24) provides an elegant avenue for characterizing a minimal rate since by Karamata's Theorem both $E|y_{n,t}|^s$ and $(\sum_{t=1}^n E|y_{n,t}|^{2s})^{1/2}$ are proportional to the same functions of $n$ for any $s \ge \kappa$.
The non-stationary tail case similarly follows from Theorem 3.1. Consider for brevity a Paretian tail

$$P(|y_t| > y) = d_ty^{-\kappa_t}(1 + o(1)),\ \forall t:\ 0 < d_t < \infty \ \text{ and } \ \kappa_t \in (0,2]. \qquad (26)$$

Use threshold construction (21) and Karamata's Theorem to deduce for any $s \ge \max_{t\in\mathbb{N}}\{\kappa_t\}$

$$c_{n,t} = d_t^{1/\kappa_t}(n/m_n)^{1/\kappa_t} \ \text{ and } \ E[|y_t|^sI(|y_t| \le c_{n,t})] \le Kd_t^{s/\kappa_t}(n/m_n)^{s/\kappa_t-1}. \qquad (27)$$

Non-degeneracy $\liminf_{n\to\infty}\inf_{t\in\mathbb{Z}}E|y_{n,t}|^{2s} > 0$ implies we can always write

$$\frac{1}{(\sum_{t=1}^n E|y_{n,t}|^{2s})^{1/2}}\sum_{t=1}^n\{|y_{n,t}|^s - E|y_{n,t}|^s\} = \frac{\sum_{t=1}^n E|y_{n,t}|^s}{(\sum_{t=1}^n E|y_{n,t}|^{2s})^{1/2}}\left(\frac{\sum_{t=1}^n|y_{n,t}|^s}{\sum_{t=1}^n E|y_{n,t}|^s} - 1\right)$$
where threshold and moment formulae (27) lead to

$$\frac{\sum_{t=1}^n E|y_{n,t}|^s}{(\sum_{t=1}^n E|y_{n,t}|^{2s})^{1/2}} \ge K\frac{\sum_{t=1}^n d_t^{s/\kappa_t}(n/m_n)^{s/\kappa_t-1}}{(\sum_{t=1}^n d_t^{2s/\kappa_t}(n/m_n)^{2s/\kappa_t-1})^{1/2}} = K\frac{\sum_{t=1}^n d_t^{s/\kappa_t}(n/m_n)^{s/\kappa_t-1/2}}{(\sum_{t=1}^n d_t^{2s/\kappa_t}(n/m_n)^{2s/\kappa_t-1})^{1/2}}\left(\frac{m_n}{n}\right)^{1/2} = D_n\left(\frac{m_n}{n}\right)^{1/2},$$

say. It is easy to confirm under tail-stationarity $d_t = d$ and $\kappa_t = \kappa$ that $D_n = Kn^{1/2}$, hence $D_n(m_n/n)^{1/2} = Km_n^{1/2}$ as in Theorem 3.3. Theorem 3.1 and the above relations prove the next result.
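The Pareto threshold formula in (27) can be checked numerically. The following Python sketch (our own illustration with assumed parameter values, not from the paper) samples from an exact Pareto law by inverse-CDF and compares the theoretical threshold $c_n = d^{1/\kappa}(n/m_n)^{1/\kappa}$ to the corresponding intermediate order statistic of the sample.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m_n = 500_000, 5_000
d, kappa = 2.0, 1.5                    # exact Pareto tail: P(|y_t| > y) = d * y^{-kappa}

# Threshold formula: c_n = d^{1/kappa} * (n/m_n)^{1/kappa}
c_n = d ** (1.0 / kappa) * (n / m_n) ** (1.0 / kappa)

# Inverse-CDF sampling: y = (d/u)^{1/kappa} has P(y > x) = d * x^{-kappa}
u = rng.uniform(size=n)
y = (d / u) ** (1.0 / kappa)

# c_n should approximate the (m_n+1)-th largest order statistic of the sample
emp = np.sort(y)[::-1][m_n]
print(emp / c_n)
```

The agreement reflects that $c_n$ is exactly the $m_n/n$ upper tail quantile under a pure power law, with no slowly varying correction.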
Theorem 3.4. Let trimming properties (20) and (21), and power-law tail (26) hold, and let $\{|y_{n,t}|^s\}$ satisfy Assumption 2 for $s \ge \max_{t\in\mathbb{N}}\{\kappa_t\}$. Then:

$$D_n\left(\frac{m_n}{n}\right)^{1/2}\left(\frac{\sum_{t=1}^n|y_{n,t}|^s}{\sum_{t=1}^n E|y_{n,t}|^s} - 1\right) = O_p(1). \qquad (28)$$
Notice (27) does not guarantee a WLLN since $D_n(m_n/n)^{1/2} \to 0$ is evidently possible in some non-stationary cases. If the probability tails have a constant index $\kappa_t = \kappa$ and are trending in scale then a WLLN does exist.
EXAMPLE 1 (Trending Scale): Assume $y_t$ has tail (26) with indices $\kappa_t = \kappa$ and scales $d_t = dt^{\theta}$ for some $\theta \ge 0$ and finite $d > 0$. Then $c_{n,t} = Kt^{\theta/\kappa}(n/m_n)^{1/\kappa}$. If $\theta = 0$ then $D_n = n^{1/2}$, and otherwise

$$D_n = \frac{\sum_{t=1}^n t^{\theta s/\kappa}}{(\sum_{t=1}^n t^{2\theta s/\kappa})^{1/2}} = K\frac{n^{\theta s/\kappa+1}}{n^{(2\theta s/\kappa+1)/2}} = Kn^{1/2}.$$

If $|y_{n,t}|^s$ for some $s > 0$ satisfies mixingale Assumption 2, then by Theorem 3.4

$$\frac{\sum_{t=1}^n|y_t|^sI\big(|y_t| \le t^{\theta/\kappa}(n/m_n)^{1/\kappa}\big)}{\sum_{t=1}^n E\big[|y_t|^sI\big(|y_t| \le t^{\theta/\kappa}(n/m_n)^{1/\kappa}\big)\big]} = 1 + O_p\big(1/m_n^{1/2}\big).$$

The scale properties of power-law tails (26) imply trend in tail scale has no impact on the minimum rate of convergence.
EXAMPLE 2 (Stochastic Trend): Let $\{\epsilon_t\}$ be an iid process with stationary power-law tail

$$P(|\epsilon_t| > \epsilon) = d\epsilon^{-\kappa}(1 + o(1)),\ d > 0,\ \kappa > 0, \qquad (29)$$

and define a stochastic trend $y_t := \sum_{i=0}^t\epsilon_{t-i}$. Then $y_t$ has tail (26) with constant index $\kappa_t = \kappa$ and trending scale $d_t = dt$ (see, e.g., [8]), hence $c_{n,t} = Kt^{1/\kappa}(n/m_n)^{1/\kappa}$. Note for any $s > 0$ and every $n \ge 1$ there exists a sufficiently large $q \in \mathbb{N}$ such that

$$\max_{1\le t\le n}E\big(E|y_{n,t}|^s - E[|y_{n,t}|^s|\mathcal{F}_{t-q}]\big)^2 = 0 \ \text{ and } \ \max_{1\le t\le n}E\big(|y_{n,t}|^s - E[|y_{n,t}|^s|\mathcal{F}_{t+q}]\big)^2 = 0,$$

so mixingale Assumption 2 holds. Therefore $\{y_t\}$ satisfies the conditions of Example 1.
4. Weak Laws under Stochastic Trimming

Let $\{y_{n,t}\}$ be the tail-trimmed array in (20), and assume tail-stationarity for brevity: $c_{n,t} = c_n$. In practice a stochastic plug-in will be used for the threshold $c_n$, so define $y_t^{(a)} := |y_t|$, and construct two-tailed order statistics $y_{(1)}^{(a)} \ge y_{(2)}^{(a)} \ge \cdots \ge y_{(n)}^{(a)}$ and a stochastically trimmed array $\{\hat y_{n,t}\}$,

$$\hat y_{n,t} = y_tI\big(|y_t| \le y_{(m_n+1)}^{(a)}\big).$$

Clearly $y_{(m_n+1)}^{(a)}$ estimates $c_n$. The literature on the sharpness of the approximation for iid and mixing data is substantial (cf. [45], [51]), with few results under general dependence and heterogeneity. See [39], [40], [41], [45], [63] and their references.

In order to characterize $\sum_{t=1}^n\{\hat y_{n,t} - y_{n,t}\}$ we must bound the threshold approximation rate (21), and memory in the tail-event $I_{n,t}(u) = I(|y_t| > c_ne^u)$. The following assumptions are verified for a variety of processes in Sections 5 and 6.
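The plug-in construction above can be sketched in Python (our own illustration; the standard Cauchy marginal and the parameter values are assumptions used only so the deterministic threshold has a closed form).

```python
import numpy as np

rng = np.random.default_rng(4)
n, m_n = 100_000, 1_000
y = rng.standard_cauchy(size=n)        # heavy tailed, tail-stationary sample

# Two-tailed order statistics y^(a)_(1) >= ... >= y^(a)_(n) of y^(a)_t := |y_t|
a_desc = np.sort(np.abs(y))[::-1]
c_hat = a_desc[m_n]                    # y^(a)_(m_n+1): stochastic plug-in for c_n

y_hat = y * (np.abs(y) <= c_hat)       # stochastically trimmed array

# Exactly m_n observations exceed the plug-in threshold (absent ties),
# while the deterministic Cauchy threshold is c_n = 1/tan(pi*m_n/(2n))
c_n = 1.0 / np.tan(np.pi * m_n / (2.0 * n))
print(np.count_nonzero(np.abs(y) > c_hat), c_hat / c_n)
```

The ratio of the plug-in order statistic to the deterministic threshold concentrates around one, which is the $y_{(m_n+1)}^{(a)}/c_n = 1 + O_p(1/m_n^{1/2})$ property exploited below.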
Assumption 3 (Extremal-NED). $\{I_{n,t}(u)\}$ is $L_2$-Near Epoch Dependent on $\{\mathcal{F}_t\}$ with size 1/2: $\|I_{n,t}(u) - E[I_{n,t}(u)|\mathcal{F}_{t-q_n}^{t+q_n}]\|_2 \le e_{n,t}(u)\psi_{q_n}$ where $e_{n,t}(u)$ is Lebesgue integrable on $\mathbb{R}_+$, $\max_{1\le t\le n}\sup_{u\ge 0}\{e_{n,t}(u)\} = K(m_n/n)^{1/2}$, $\psi_{q_n} = o(q_n^{-1/2})$ and $\liminf_{n\to\infty}q_n \ge 1$, $q_n \in \mathbb{N}$.

Assumption 4 (Tail Order). $(n/m_n)P(|y_t| > c_n) = 1 + o(1/m_n^{1/2})$.
Remark: The Extremal-NED property Assumption 3 was first explored in Hill [39], [40], [41] as a way to characterize tail memory for asymptotic theory for tail estimators. Together Assumptions 3 and 4 with a base $\sigma$-field $\mathcal{F}_t$ induced by an $\alpha$-mixing process $\{\epsilon_t\}$ ensure consistency and asymptotic normality of an intermediate order statistic $y_{(m_n+1)}^{(a)}$, and tail index and tail dependence estimators under general dependence conditions. We use the assumptions here to exploit $y_{(m_n+1)}^{(a)}/c_n = 1 + O_p(1/m_n^{1/2})$ for a large class of dependent sequences $\{y_t\}$.

Lemma 4.1. Under (20), tail-stationarity and Assumptions 2-4, $\sum_{t=1}^n\{\hat y_{n,t} - y_{n,t}\} = O_p(c_n)$.
Lemma 4.1 and partial sum probability bound Theorem 3.1 together suggest we need only bound the thresholds $c_n$ to characterize a minimum rate of convergence.

Assumption 5 (Threshold Bound). $c_n = O(n^{1/2}\|y_{n,t}\|_2)$.

Remark: The property characterizes a sequence of fixed point bounds

$$c_n \le Kn^{1/2}\big(E[y_t^2I(|y_t| \le c_n)]\big)^{1/2}.$$

Such a sequence $\{c_n\}$ trivially exists under non-degeneracy $\liminf_{c\to\infty}\|y_tI(|y_t| \le c)\|_2 > 0$ since any $c_n \to \infty$ with $c_n \le Kn^{1/2}$ satisfies the bound. Further, $c_n \le Kn^{1/2}\|y_{n,t}\|_2$ automatically holds when $y_t$ has Paretian tail (29). See Section 6.
The next result follows instantly from Theorem 3.1 and Lemma 4.1.

Theorem 4.2. Under tail-stationarity and Assumptions 2-5

$$\frac{1}{n^{1/2}\|y_{n,t}\|_2}\sum_{t=1}^n\{\hat y_{n,t} - E[y_{n,t}]\} = O_p(1).$$

Similarly, Corollary 3.2 and Lemma 4.1 deliver a WLLN for stochastically tail-trimmed arrays $\{\hat y_{n,t}\}$.

Corollary 4.3. Let $\{y_t\}$ be uniformly $L_{1+\iota}$-bounded and tail-stationary. Under Assumptions 2-5, $(1/n)\sum_{t=1}^n\{\hat y_{n,t} - E[y_{n,t}]\} \stackrel{p}{\to} 0$.
5. Mixingale Arrays: Assumptions 2-3
We verify all assumptions imposed for tail trimming in Sections 3 and 4. We begin with dependence assumptions here, and tackle tail order and threshold assumptions in Section 6.
In general, mixingale sequences $\{y_t, \mathcal{F}_t\}$ satisfy the tail trimmed mixingale Assumption 2, and both mixingale Assumption 2 and NED tail array Assumption 3 apply to $L_p$-Near Epoch Dependent and $\alpha$-mixing sequences $\{y_t\}$. Consult Hill [39], [40], [41] for related theory for tail and tail trimmed arrays. Define the tail trimmed array $\{y_{n,t}\}$ as in (20)-(21).
5.1 Mixingale
If $\{y_t, \mathcal{F}_t\}$ is an $L_p$-mixingale, $p > 0$, then an argument identical to Corollary 3.5 in Hill [41] shows the tail trimmed $\{y_{n,t}, \mathcal{F}_t\}$ forms an $L_2$-mixingale array.
Lemma 5.1. If $\{y_t, \mathcal{F}_t\}$ forms an $L_p$-mixingale, $p > 0$, then Assumption 2 holds.

A trivial example of an $L_p$-mixingale is an adapted martingale difference sequence $\{y_t, \mathcal{F}_t\}$ (McLeish [53]).
EXAMPLE 3 (GARCH): Define a strong-GARCH($p,q$) process $y_t = h_t\epsilon_t$, where the error $\epsilon_t$ is iid with zero mean and unit variance, and $h_t^2 = \omega + \sum_{i=1}^{p}\alpha_i y_{t-i}^2 + \sum_{i=1}^{q}\beta_i h_{t-i}^2$, $\omega > 0$, $\alpha_i, \beta_i \ge 0$. Each $y_t$, and $y_t y_{t-h}$ for $h \ge 1$, satisfies Lemma 5.1 with $\mathcal{F}_t := \sigma(y_\tau : \tau \le t)$. If $\sum_{i=1}^{p}\alpha_i + \sum_{i=1}^{q}\beta_i < 1$ then $y_t$ is stationary, and if $\epsilon_t$ has a positive density on $\mathbb{R}$ and at least one $\alpha_i$ or $\beta_i$ is strictly positive then power-law tail (29) holds with index $\kappa > 2$. See, e.g., Theorem 3.1 in Basrak et al. [3]. The same methods based on results in Kesten [50], coupled with convolution tail theory in Cline [11], suffice to show $y_t y_{t-h}$ for $h \ge 1$ has tail (29) with index $\kappa/2$. See Cline [12] for similar arguments.

Tail trimmed versions of $y_t$ and $y_t y_{t-h}$ for any $h \ge 1$ therefore satisfy a WLLN by Theorem 3.1 and Lemma 5.1. Define $Y_{t,h} := y_t y_{t-h}$, and a trimmed version $Y_{n,t,h} := Y_{t,h}I(|Y_{t,h}| \le c_{n,h})$ where $\{c_{n,h}, m_n\}$ satisfy $(n/m_n)P(|Y_{t,h}| > c_{n,h}) \to 1$. Then
$$\frac{1}{n^{1/2}\|Y_{n,t,h}\|_2}\sum_{t=1}^{n}\left\{Y_{n,t,h} - E[Y_{n,t,h}]\right\} = O_p(1).$$
If $y_t$ is uniformly $L_{2+\iota}$-bounded then the sample tail trimmed covariance is consistent by Corollary 3.2 and Lemma 5.1: $(1/n)\sum_{t=1}^{n}\{Y_{n,t,h} - E[Y_{n,t,h}]\} \xrightarrow{p} 0$.
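A hedged simulation sketch of this example (the GARCH parameter values, the empirical threshold choice, and the fractile $m_n = n^{1/2}$ are our illustrative assumptions, not the paper's): simulate a strong GARCH(1,1), form the products $Y_{t,h} = y_t y_{t-h}$, and trim at the empirical $(1 - m_n/n)$ quantile of $|Y_{t,h}|$.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_garch(n, omega=0.05, alpha=0.15, beta=0.80, burn=1000):
    # Strong GARCH(1,1): y_t = h_t*eps_t, h_t^2 = omega + alpha*y_{t-1}^2 + beta*h_{t-1}^2
    eps = rng.standard_normal(n + burn)
    y = np.empty(n + burn)
    h2 = omega / (1.0 - alpha - beta)          # start at the unconditional variance
    for t in range(n + burn):
        y[t] = np.sqrt(h2) * eps[t]
        h2 = omega + alpha * y[t] ** 2 + beta * h2
    return y[burn:]

n, h = 50_000, 1
y = simulate_garch(n)
Y = y[h:] * y[:-h]                             # Y_{t,h} = y_t * y_{t-h}
m = int(len(Y) ** 0.5)                         # intermediate order sequence
c = np.sort(np.abs(Y))[-(m + 1)]               # empirical threshold: (n/m)P(|Y| > c) ~ 1
Y_trim = Y * (np.abs(Y) <= c)                  # Y_{n,t,h} = Y_{t,h} I(|Y_{t,h}| <= c_{n,h})
print(float(np.mean(Y_trim)))                  # trimmed autocovariance estimate
```

The trimmed mean is bounded by the threshold by construction, whereas the raw products may have a very heavy tail.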
5.2 Near Epoch Dependence
Recall $\{y_t\}$ is $L_p$-Near Epoch Dependent on $\{\mathcal{F}_t\}$ or on $\{\epsilon_t\}$ with size $\lambda > 0$ if there exist positive real sequences $\{d_t\}$ and $\{\psi_q\}$ such that $d_t < \infty$ for each $t$, $\psi_q = O(q^{-\lambda-\iota})$ for tiny $\iota > 0$, and
$$\left\|y_t - E\left[y_t \mid \mathcal{F}_{t-q}^{t+q}\right]\right\|_p \le d_t\,\psi_q.$$
The NED property dates in some form to Ibragimov and Linnik [47] and McLeish [53], [54], and was coined in Gallant and White [31]. See Davidson [15], Nze and Doukhan [57] and Hill [41] for historical notes. Both NED and mixingale properties are relaxed versions of $L_p$-Weak Dependence ([70]) and related to s-weak dependence ([23]). The idea is that $y_t$ may be perfectly predicted in $L_p$-norm by the near epoch $\{\epsilon_\tau\}_{t-q}^{t+q}$ as $q \to \infty$.
Similarly, $\{\epsilon_t\}$ is $\alpha$-mixing or strong mixing with coefficients $\alpha_q$ of size $\lambda > 0$ if
$$\alpha_q := \sup_{t\in\mathbb{Z}}\ \sup_{A \in \mathcal{F}_{-\infty}^{t},\, B \in \mathcal{F}_{t+q}^{+\infty}} |P(A \cap B) - P(A)P(B)| = O(q^{-\lambda-\iota}).$$
The property applies to $\beta$-mixing, $\phi$-mixing, $\rho$-mixing, and geometrically ergodic sequences, as well as a variety of weak dependence properties ([23], [26], [27], [47]).
Lemma 5.2. Let $\{y_t\}$ be $L_r$-NED, $r > 0$, on $\{\mathcal{F}_t\}$ with any size $\lambda > 0$, and $\alpha$-mixing base $\{\epsilon_t\}$ of any size $\mu > 0$. Then $y_{n,t}^{<s>}$ and $|y_{n,t}|^s$ for any $s > 0$ satisfy Assumption 2 with mixingale constants $e_{n,t} = c_{n,t}^s$, and Assumption 3.
Remark 1: The mixingale constants $c_{n,t}^s$ for $y_{n,t}^{<s>}$ and $|y_{n,t}|^s$ may be much larger than $(E|y_{n,t}|^{2s})^{1/2}$. Consider power-law tail (26): if $s > \kappa_t/2$ then $E|y_{n,t}|^{2s} \le Kc_{n,t}^{2s}(m_n/n)$ is dominated by $c_{n,t}^s$ under tail trimming $m_n/n \to 0$. Theorem 2.1 is important for deducing a "minimal" scale $\sum_{t=1}^{n}E|y_{n,t}|^{2s} < \sum_{t=1}^{n}e_{n,t}^2$ for convergence. Clearly, if it is possible to characterize constants $e_{n,t}$ that are smaller than $c_{n,t}^s$ then this conclusion may no longer be valid. The problem is precisely that computing mixingale constants under mixing or NED assumptions typically requires higher moments and some form of bounding argument. In our setting, then, $e_{n,t} > (E|y_{n,t}|^{2s})^{1/2}$ is hardly surprising. See especially Theorem 17.5 of Davidson [15] and Corollary 3.5 of Hill [41].
Remark 2: Product convolutions $y_t^2$ or $y_t y_{t-h}$ are $L_{r/2}$-NED if $y_t$ is $L_r$-NED, by a straightforward generalization of Theorem 17.9 of Davidson [15].$^2$ Thus Lemma 5.2 extends to tail trimmed higher moments.
Since any $y_t$ is NED on itself, the next claim follows from Lemma 5.2 by setting $\epsilon_t = y_t$.

Corollary 5.3. If $\{y_t\}$ is $L_p$-bounded for any $p > 0$ and $\alpha$-mixing with any size $\lambda > 0$, then Assumptions 2 and 3 hold.

EXAMPLE 4 (Non-Stationary Distributed Lag): Let $\{\epsilon_t\}$ be a uniformly $L_p$-bounded $\alpha$-mixing process, $p > 0$, and define the distributed lag
$$y_t := \sum_{i=0}^{\infty}\theta_{t,i}\,\epsilon_{t-i}.$$
Assume $\theta_{t,0} = 1$, and assume the $\theta_{t,i}$ are $L_p$-bounded, $\|\theta_{t,i}\|_p = \varpi(t)\,i^{-\phi}$, for some $p > 0$, $\phi > 1/\min\{1,p\}$ and some mapping $\varpi : \mathbb{N} \to \mathbb{R}_+$, $0 \le \varpi(t) < \infty$ for each $t$. Further, $\{\theta_{t,i}, \epsilon_t\}$ are $\alpha$-mixing with $\sigma$-field
$$\mathcal{F}_t = \sigma\left(\theta_{\tau,i},\,\epsilon_\tau : 0 \le i \le \tau,\ -\infty < \tau \le t\right).$$
$^2$ Davidson ([15]: p. 268) uses triangle and Cauchy-Schwarz inequalities to prove the claim for $r = 2$. Replace the triangle inequality with Loève's inequality to deduce the claim for any $r > 0$.
If $p \ge 2$, use Minkowski and Cauchy-Schwarz inequalities to deduce for any $q \in \mathbb{N}$
$$\left\|y_t - E\left[y_t \mid \mathcal{F}_{t-q}^{t+q}\right]\right\|_p \le \sum_{i=q+1}^{\infty}\|\theta_{t,i}\|_p\,\sup_{t\in\mathbb{Z}}\|\epsilon_t\|_p \le \sup_{t\in\mathbb{Z}}\|\epsilon_t\|_p\,\varpi(t)\sum_{i=q+1}^{\infty}i^{-\phi} \le K\varpi(t)\,q^{1-\phi}.$$
If $p < 2$ then by $\phi > 1/p$ and Loève's and Cauchy-Schwarz inequalities
$$\left\|y_t - E\left[y_t \mid \mathcal{F}_{t-q}^{t+q}\right]\right\|_{p/2} \le \left(\sum_{i=q+1}^{\infty}\left(E|\theta_{t,i}|^{p}\right)^{1/2}\sup_{t\in\mathbb{Z}}\left(E|\epsilon_t|^{p}\right)^{1/2}\right)^{2/p} \le K\varpi(t)\,q^{2/p-\phi}.$$
Hence $\{y_t\}$ is $L_{p/2}$-NED on $\{\mathcal{F}_t\}$ with size $\lambda > 0$ and constants $d_t = K\varpi(t)$.
EXAMPLE 5 (Distributed Lag): Assume $y_t = \sum_{i=0}^{\infty}\theta_{t,i}\,\epsilon_{t-i}$ where $\epsilon_t$ is iid with stationary regularly varying tail (23)-(24) and index $\kappa > 0$. Assume $\theta_{t,i}$ is independent of $\epsilon_t$ and $\sum_{i=0}^{\infty}E|\theta_{t,i}|^{\kappa+\iota} < \infty$. Then $y_t$ is a special case of Example 4, and since $E|\theta_{t,i}|^{\kappa+\iota} < \infty$ it follows $\theta_{t,i}\,\epsilon_t$ has tail (23)-(24) with index $\kappa > 0$, and by independence $y_t$ has tail (23) with index $\kappa$ (see [7] and [8]).
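The tail-preservation claim in Example 5 can be probed numerically. In the sketch below (our illustrative choices: geometric weights, a symmetrized Pareto innovation with $\kappa = 1.5$, and the classical Hill-type tail index estimator, which is unrelated to the author) the distributed lag should inherit the innovation's tail index:

```python
import numpy as np

rng = np.random.default_rng(2)

def hill_tail_index(x, m):
    # Classical Hill estimator of the tail index from the m largest |x|.
    a = np.sort(np.abs(x))[::-1]
    return 1.0 / np.mean(np.log(a[:m] / a[m]))

n, kappa, lags = 200_000, 1.5, 100
eps = rng.pareto(kappa, n + lags) * rng.choice([-1.0, 1.0], size=n + lags)
theta = 0.7 ** np.arange(lags)                 # absolutely summable geometric weights
y = np.convolve(eps, theta, mode="valid")[:n]  # y_t = sum_i theta_i * eps_{t-i}

est = hill_tail_index(y, m=int(n ** 0.6))
print(round(float(est), 2))                    # expected to sit near kappa = 1.5
```

The estimate carries the usual intermediate-order bias, so only rough agreement with $\kappa$ should be expected.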
EXAMPLE 6 (ARFIMA): Let $y_t$ be ARFIMA($p,d,q$) with difference $d = 1 - \phi$ for some $\phi > 1/\min\{1,p\}$. If the innovations $\epsilon_t$ are uniformly $L_p$-bounded $\alpha$-mixing then Example 4 applies.
EXAMPLE 7 (Threshold AR(1)-GARCH(1,1)): Assume $y_t$ is a stationary first-order threshold autoregression
$$y_t = \phi y_{t-1}I(y_{t-1} \le a) + \epsilon_t,\quad |\phi| < 1,$$
$$\epsilon_t = h_t u_t,\quad u_t \overset{iid}{\sim} (0,1),\quad h_t^2 = \omega + \alpha\epsilon_{t-1}^2 + \beta h_{t-1}^2,\quad \omega > 0,\ \alpha, \beta \ge 0.$$
Stationarity is assured by $E[\ln(\alpha u_t^2 + \beta)] < 0$, hence $\{y_t, \epsilon_t\}$ are geometrically $\beta$-mixing. The same applies if the binary switching indicator $I(\cdot)$ is replaced with any uniformly bounded $g : \mathbb{R} \to \mathbb{R}$: $y_t = \phi y_{t-1}g(y_{t-1}) + \epsilon_t$, $|\phi| < 1$. See Meitz and Saikkonen [55]. Smooth Transition AR models, for example, are based on either $g(y_t) = \exp\{-\gamma(y_t - c)^2\}$ or $g(y_t) = [1 + \exp\{-\gamma(y_t - c)\}]^{-1}$ with threshold $c \in \mathbb{R}$ and rate of transition $\gamma \ge 0$.
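The uniform boundedness of the smooth transition functions is the key property behind replacing the indicator: both take values in $[0,1]$, so $|\phi\,y\,g(y)| \le |\phi|\,|y|$. A quick numerical confirmation (the grid and the parameter values $\gamma$, $c$ are illustrative):

```python
import numpy as np

# Both transition functions from the text take values in [0, 1]; gamma and c
# below are arbitrary illustrative values.
gamma, c = 2.0, 0.0
y = np.linspace(-10.0, 10.0, 10_001)
g_exp = np.exp(-gamma * (y - c) ** 2)           # exponential STAR transition
g_log = 1.0 / (1.0 + np.exp(-gamma * (y - c)))  # logistic STAR transition
print(float(g_exp.min()), float(g_exp.max()))
print(float(g_log.min()), float(g_log.max()))
```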
EXAMPLE 8 (AR(1)-QARCH(1,1)): Let $y_t = \phi y_{t-1} + \epsilon_t$ and $\epsilon_t = h_t u_t$, where $u_t \overset{iid}{\sim} (0,1)$ and $|\phi| \in (0,1)$, with Quadratic-ARCH volatility $h_t = \alpha|\delta + y_{t-1}|$, $\alpha > 0$ and $\delta \in (0,1]$. Then $\epsilon_t$ is geometrically ergodic with stationary tail (29). See, e.g., Cline [12].
EXAMPLE 9 (Hyperbolic-GARCH): Davidson [16] proposed a hyperbolic memory GARCH($p,q$) model that escapes discontinuities and memory perversions of the FIGARCH class. Let $y_t = h_t\epsilon_t$, $\epsilon_t \overset{iid}{\sim} (0,1)$, and $\phi(L)h_t^2 = \omega + \beta(L)y_t^2$, $\omega > 0$, where the lag polynomials are $\phi(L) = \sum_{i=1}^{p}\phi_i L^i$ and $\beta(L) = \sum_{i=1}^{q}\beta_i L^i$ with $\phi_i, \beta_i \ge 0$. Define $\theta(L) = \phi(L) + \beta(L)$ and assume the roots of $\theta(L)$ lie outside the unit circle. Since $h_t^2$ has the form $\omega + \delta(L)y_t^2$, Davidson [16] suggests the Hyperbolic-GARCH model
$$\delta(L) = 1 - \frac{\phi(L)}{\theta(L)}\left[1 - \alpha + \frac{\alpha}{\zeta(1+d)}\sum_{i=1}^{\infty}i^{-1-d}L^i\right],\quad d > 0,\ \alpha \ge 0,$$
where $\zeta(\cdot)$ is the Riemann zeta function. The index $d > 0$ governs the degree of hyperbolic memory. As long as $\alpha \in [0,1)$ then $y_t$ is $L_2$-bounded $L_1$-NED on $\{\epsilon_t\}$ with size $\lambda = d$ (Theorem 3.1 of Davidson [16]).
EXAMPLE 10 (Stochastic Volatility): Consider a univariate Log-Autoregressive Stochastic Volatility process $y_t = h_t\epsilon_t$ where $\ln h_t = \omega + \phi\ln h_{t-1} + u_t$, and $|\phi| < 1$. Assume $\epsilon_t$ is $\alpha$-mixing with regularly varying tail (15)-(16), and $u_t \overset{iid}{\sim} N(0,1)$. Then $\{y_t\}$ satisfies (23)-(24) and is geometrically $L_p$-NED on $\{\epsilon_t\}$. See Hill [42].
6. Tail Order and Threshold Bounds: Assumptions 4-5
Finally, we verify tail order Assumption 4 and threshold bound Assumption 5. Assume tail stationarity throughout.
Assumption 4 (tail order):
Assumption 4 holds for smooth distributions, and under
second order regular variation properties like slow variation with remainder, including
Pareto-type tails (21). See Goldie and Smith [32] and Haeusler and Teugels [34] for
numerous examples.
Absolute Continuity: If $y_t$ has a distribution $F(y) := P(y_t \le y)$ absolutely continuous with respect to Lebesgue measure a.e. on $\mathbb{R}$, then there exists a probability density $f(y) := (\partial/\partial y)F(y)$ a.e. on $\mathbb{R}$. Continuity ensures arbitrarily many threshold sequences $\{c_n\}$ can be found to satisfy $\int_{-c_n}^{c_n}f(y)\,dy = 1 - m_n/n$ for any chosen intermediate order $\{m_n\}$, hence Assumption 4 holds: $(n/m_n)P(|y_t| > c_n) = 1$.
Second Order Power-Law: If $P(|y_t| > y) = dy^{-\kappa}(1 + O(y^{-\varsigma}))$, $\kappa, \varsigma > 0$, then it is easy to show $(n/m_n)P(|y_t| > c_n) = 1 + o(1/m_n^{1/2})$ holds for $c_n = d^{1/\kappa}(n/m_n)^{1/\kappa}$ and any intermediate order sequence $\{m_n\}$ with $m_n = o(n^{2\varsigma/(2\varsigma+\kappa)})$.
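For an exact power-law the remainder vanishes and the displayed threshold is available in closed form; a small sanity check (the values of $d$, $\kappa$ and the fractile $m_n = n^{1/2}$ are illustrative):

```python
# Exact Pareto tail: P(|y_t| > y) = d * y**(-kappa), so the remainder term is
# zero and c_n = d**(1/kappa) * (n/m_n)**(1/kappa) satisfies the tail-order
# condition (n/m_n) * P(|y_t| > c_n) = 1 identically.
d, kappa = 1.0, 1.5
checks = []
for n in (10**4, 10**6):
    m_n = int(n ** 0.5)                        # intermediate order sequence
    c_n = d ** (1.0 / kappa) * (n / m_n) ** (1.0 / kappa)
    checks.append((n / m_n) * d * c_n ** (-kappa))
print(checks)                                  # each entry equals 1 up to rounding
```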
Assumption 5 (threshold bound): We demonstrate a restricted version of Assumption 5 in general, and exactly Assumption 5 for power-law tails.

Sharp Bound: If $\liminf_{c\to\infty}\|y_t I(|y_t| \le c)\|_2 > 0$ then $c_n \le Kn^{1/2}\|y_{n,t}\|_2$ automatically holds for any $c_n \to \infty$ with $c_n \le Kn^{1/2}$. There is a potential cost to pay for such a tight bound. Consider: if $y_t$ has stationary tail (29) then $c_n = K(n/m_n)^{1/\kappa} \le Kn^{1/2}$ always holds if $\kappa \ge 2$, and otherwise only for fractiles satisfying $\liminf_{n\to\infty}\{m_n/n^{1-\kappa/2}\} \ge K$. This threshold bound is achieved only if sufficiently many observations are trimmed.
Power-Law: In order to exploit $\|y_{n,t}\|_2 \to \infty$ for non-square integrable $y_t$ and any threshold $c_n \to \infty$, it helps to assume $y_t$ has tail (29). If variance is finite, $\kappa > 2$, then $(n/m_n)^{1/\kappa} \le (n/m_n)^{1/2} \le n^{1/2}$, hence $c_n \le Kn^{1/2} = O(n^{1/2}\|y_{n,t}\|_2)$, so Assumption 5 holds.

Conversely, if variance is infinite, $\kappa \in (0,2)$, then by Karamata's Theorem, cf. (25),
$$E\left[y_t^2 I(|y_t| \le c_n)\right] \le Kc_n^2(m_n/n) = K(n/m_n)^{2/\kappa - 1}(1 + o(1)),$$
and if $\kappa = 2$ then $E[y_t^2 I(|y_t| \le c_n)] \sim L(c_n)$ for s.v. $L(c_n)$. Now, since $n/m_n \le n$ under tail trimming, it follows $c_n \le K(n/m_n)^{1/\kappa} \le Kn^{1/2}(n/m_n)^{1/\kappa - 1/2}$ for any $\kappa \in (0,2)$, and $c_n \le K(n/m_n)^{1/2} \le n^{1/2}L(c_n)$ if $\kappa = 2$ since by tail trimming $m_n = o(n)$. Therefore $c_n = O(n^{1/2}\|y_{n,t}\|_2)$ for any intermediate order sequence $\{m_n\}$, so again Assumption 5 holds.
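The infinite variance case $\kappa \in (0,2)$ can be checked numerically for an exact Pareto law, where the truncated second moment has a closed form; the ratio $c_n/(n^{1/2}\|y_{n,t}\|_2)$ should then stay bounded (the value of $\kappa$ and the fractile $m_n = n^{1/2}$ are illustrative):

```python
import math

# Exact Pareto with infinite variance: P(y_t > x) = x**(-kappa), x >= 1,
# kappa in (0, 2). Karamata gives E[y_t^2 I(y_t <= c)] =
# kappa/(2-kappa) * (c**(2-kappa) - 1), and c_n = (n/m_n)**(1/kappa).
kappa = 1.3
ratios = []
for n in (10**4, 10**6, 10**8):
    m_n = int(n ** 0.5)
    c_n = (n / m_n) ** (1.0 / kappa)
    trunc_second_moment = kappa / (2.0 - kappa) * (c_n ** (2.0 - kappa) - 1.0)
    ratios.append(c_n / (math.sqrt(n) * math.sqrt(trunc_second_moment)))
print([round(r, 4) for r in ratios])   # bounded, in fact shrinking like 1/m_n^(1/2)
```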
Appendix A: Proofs

Proof of Theorem 3.3.
Claim (a): Write $\sigma_n^{(s)} := \sum_{t=1}^{n}E|y_{n,t}|^s$ and $\hat{\sigma}_n^{(s)} := \sum_{t=1}^{n}|y_{n,t}|^s$. By trimmed moment formula (25) it follows that for any $s > \kappa$, with s.v. $L(u) > 0$ $\forall u \in \mathbb{R}$,
$$\frac{n\,E|y_{n,t}|^{s}}{n^{1/2}\left(E|y_{n,t}|^{2s}\right)^{1/2}} \sim \frac{n\,c_n^{s-\kappa}L(c_n)}{n^{1/2}c_n^{s-\kappa/2}L(c_n)^{1/2}} = \frac{n^{1/2}L(c_n)^{1/2}}{c_n^{\kappa/2}}. \qquad (30)$$
Further, regular variation and tail stationarity (23)-(24) imply by construction $c_n$ satisfies the fixed point identity $c_n = (n/m_n)^{1/\kappa}L(c_n)^{1/\kappa}$, where $L(c_n)$ is the same s.v. function in (30). Hence
$$\frac{n^{1/2}L(c_n)^{1/2}}{c_n^{\kappa/2}} = \frac{n^{1/2}L(c_n)^{1/2}}{(n/m_n)^{1/2}L(c_n)^{1/2}} = m_n^{1/2}. \qquad (31)$$
Combine (30) and (31) and invoke tail-stationarity and Theorem 3.1 to deduce, as claimed,
$$m_n^{1/2}\left(\frac{\hat{\sigma}_n^{(s)}}{\sigma_n^{(s)}} - 1\right) = \frac{m_n^{1/2}}{\sigma_n^{(s)}}\sum_{t=1}^{n}\left\{|y_{n,t}|^{s} - E|y_{n,t}|^{s}\right\} = O(1)\times\frac{1}{n^{1/2}\left(E|y_{n,t}|^{2s}\right)^{1/2}}\sum_{t=1}^{n}\left\{|y_{n,t}|^{s} - E|y_{n,t}|^{s}\right\} = O_p(1).$$
The case $s = \kappa$ is identical since $E|y_{n,t}|^{\kappa} \sim L(c_n)$ and $E|y_{n,t}|^{2\kappa} \sim c_n^{\kappa}L(c_n)$, where the $L(\cdot)$ are the same s.v. functions up to a multiplicative constant. This implies (30) with $Kn^{1/2}L(c_n)^{1/2}/c_n^{\kappa/2}$ for some s.v. $L(\cdot)$.
Claim (b): We require $L(c_n) \le L(n)$, where s.v. $L(\cdot)$ may be different in different places. This follows from the construction of $c_n$ and properties of regular variation (15)-(16). Observe for any $a > 0$
$$1 \sim \frac{an}{m_n}P(|y_t| > c_{an}) \sim \frac{n}{m_n}P\left(|y_t| > a^{-1/\kappa}c_{an}\right),$$
hence we can assume $c_{an} = a^{1/\kappa}c_n$. Since this implies by slow variation $L(c_{an})/L(c_n) = L(a^{1/\kappa}c_n)/L(c_n) \to 1$, we may write $L(c_n) \le L(n)$ for some s.v. $L(n)$.

By the same argument as claim (a), and $L(c_n) \le L(n)$, for any $s > \kappa$ (the case $s = \kappa$ is similar) and tiny $\iota > 0$
$$\left(\sum_{t=1}^{n}E|y_{n,t}|^{s(1+\iota)}\right)^{1/(1+\iota)} \le n^{1/(1+\iota)}c_n^{s-\kappa/(1+\iota)}L(c_n)^{1/(1+\iota)} \le Kn^{1-\epsilon}(n/m_n)^{s/\kappa}\frac{1}{L(n)},$$
where $\epsilon := \iota/(1+\iota) > 0$ is tiny. Now invoke consequence (22) of Theorem 3.1 to deduce the claim. QED.
Proof of Lemma 4.1. Define $I_{n,t} := I(|y_t| \le c_n)$, $\bar{I}_{n,t} := 1 - I_{n,t}$ and $\hat{I}_{n,t} := I(|y_t| \le y^{(a)}_{(m_n+1)})$. By construction and the triangle inequality
$$\left|\sum_{t=1}^{n}\left\{\hat{y}_{n,t} - y_{n,t}\right\}\right| \le K\max_{1\le t\le n}\left\{\left|y_t\hat{I}_{n,t} - y^{(a)}_{(m_n+1)}I_{n,t}\right|\right\}\sum_{t=1}^{n}\left|\hat{I}_{n,t} - I_{n,t}\right|$$
$$\le K\max_{1\le t\le n}\left\{\left|y_t\hat{I}_{n,t} - y^{(a)}_{(m_n+1)}I_{n,t}\right|\right\}\left\{\left|\sum_{t=1}^{n}\left\{\bar{I}_{n,t} - E\bar{I}_{n,t}\right\}\right| + \left|\sum_{t=1}^{n}E\bar{I}_{n,t} - m_n\right|\right\} = A_n\left(B_n + C_n\right).$$
Notice $C_n = O(m_n^{1/2})$ is implied by threshold Assumption 4. Use Lemma 3.3.1 of Hill [39], cf. Lemma 3 of Hill [40], to deduce $A_n \le 2c_n|y^{(a)}_{(m_n+1)}/c_n - 1| = O_p(c_n/m_n^{1/2})$. Further, Extremal-NED Assumption 3 and variance bound Theorem 2.1 imply $E[B_n/m_n^{1/2}]^2 \le K(1/m_n)\sum_{t=1}^{n}E[\bar{I}_{n,t}] \le K$, hence $B_n = O_p(m_n^{1/2})$ by Chebyshev's inequality. Therefore $A_n(B_n + O(m_n^{1/2})) = O_p(c_n m_n^{-1/2}m_n^{1/2}) = O_p(c_n)$ as required. QED.
Proof of Lemma 5.2.
Assumption 2: Under the $L_r$-NED supposition, $\{|y_{n,t}|^s/c_{n,t}^s\}$ is, for any $s > 0$, $L_2$-NED on $\{\mathcal{F}_t\}$ by Corollary 3.5 of Hill [41]. Since the base is $\alpha$-mixing and $|y_{n,t}|^s/c_{n,t}^s \le 1$, apply Theorem 17.5 of Davidson [15] to deduce $\{|y_{n,t}|^s/c_{n,t}^s, \mathcal{F}_t\}$ is an $L_2$-mixingale array with bounded constants $K$, hence $\{|y_{n,t}|^s, \mathcal{F}_t\}$ is an $L_2$-mixingale array with constants $Kc_{n,t}^s$. An identical argument applies to $y_{n,t}^{<s>}$.
Assumption 3: The $L_r$-NED supposition implies $\{I(|y_t| > c_n e^u)\}$ is $L_2$-NED on $\{\mathcal{F}_t\}$ with constants $e_{n,t}(u) = K(m_n/n)^{1/2}e^{-u/2}$ and coefficients $\vartheta_{q_n} = o(q_n^{-1/2})$ by Theorem 2.1 of Hill [41]. QED.
ACKNOWLEDGEMENTS
We thank Paul Doukhan for comments on an earlier version of the manuscript.
REFERENCES
[1] Andrews, D.W.K. (1988). Laws of Large Numbers for Dependent Non-Identically Distributed
Random Variables, Econometric Theory 4, 458-467.
[2] Arcones, M.A. (1998). The Law of Large Numbers for U-Statistics under Absolute Regularity,
Elect. Comm. Prob. 3, 13-19.
[3] Basrak, B., R.A. Davis, and T. Mikosch (2002). Regular Variation of GARCH Processes,
Stoch. Proc. Appl. 12, 908-920.
[4] Bingham, N.H., C.M. Goldie and J. L. Teugels (1987). Regular Variation. Cambridge University Press: Great Britain.
[5] Birkel, T. (1988). Moment Bounds for Associated Sequences, Ann. Prob. 16, 1184-1193.
[6] Boucheron, S., O. Bousquet, G. Lugosi, and P. Massart (2005). Moment Inequalities for Functions of Independent Random Variables, Ann. Prob. 33, 514-560.
[7] Breiman, L. (1965). On Some Limit Theorems Similar to the Arc-Sine Law, Theory Prob.
Appl. 10, 351–360.
[8] Brockwell, P.J. and D.B.H. Cline (1985). Linear Prediction of ARMA Processes with Infinite Variance, Stoch. Proc. Appl. 19, 281-296.
[9] Burkholder, D.L. (1973). Distribution Function Inequalities for Martingales, Ann. Prob. 1,
19-42.
[10] Chandra, T.K., and S. Ghosal (1996). The Strong Law of Large Numbers for Weighted
Averages Under Dependence Assumptions, J. Theor. Prob. 9, 797-809.
[11] Cline, D.B.H. (1986). Convolution Tails, Product Tails and Domains of Attraction, Prob.
Theory Rel. Fields 72, 529–557.
[12] Cline, D.B.H. (2007). Regular Variation of Order 1 Nonlinear AR-ARCH models, Stoch.
Proc. Appl. 117, 840-861.
[13] Csörgő, S., E. Haeusler, and D.M. Mason (1988). The Asymptotic Distribution of Trimmed Sums, Ann. Prob. 16, 672-699.
[14] Davidson, J. (1993). An L1-Convergence Theorem for Heterogeneous Mixingale Arrays with
Trending Moments, Stat. Prob. Let. 16, 301-304.
[15] Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press: Oxford.
[16] Davidson, J. (2004). Moment and Memory Properties of Linear Conditional Heteroscedasticity Models, and a New Model, J. Bus. Econ. Statistics 22, 16-29.
[17] Davidson, J. and R. de Jong (2000). Consistency of Kernel Estimators of Heteroscedastic
and Autocorrelated Covariance Matrices, Econometrica 68, 407-423.
[18] Davydov, Y.A. (1968). Convergence of Distributions Generated by Stationary Stochastic
Processes, Theory Prob. App. 13, 691-696.
[19] de Jong, R.M. (1995). Laws of Large Numbers for Dependent Heterogeneous Processes,
Econometric Theory 11, 347-358.
[20] de Jong, R.M. (1996). A Strong Law of Large Numbers for Triangular Mixingale Arrays,
Stat. Prob. Let. 27, 1-9.
[21] de Jong, R. (1997). Central Limit Theorems for Dependent Heterogeneous Random Variables, Econometric Theory 13, 353-367.
[22] de la Peña, V.H., R. Ibragimov, and S. Sharakhmetov (2003). On Extremal Distributions
and Sharp Lp-Bounds for Sums of Multilinear Forms, Ann. Prob. 31, 630-675.
[23] Dedecker, J. and P. Doukhan (2003). A New Covariance Inequality and Applications, Stoch.
Proc. Appl. 106, 63-80.
[24] Dehling, H.G. and O.Sh. Sharipov (2009). Marcinkiewicz-Zygmund Strong Laws for U-Statistics of Weakly Dependent Observations, Stat. Prob. Let. 79, 2028-2036.
[25] Doob, J.L. (1953). Stochastic Processes, John Wiley: New York.
[26] Doukhan, P. (1994). Mixing: Properties and Examples, Lecture Notes in Statistics 85.
Springer: New York.
[27] Doukhan, P. and S. Louhichi (1999). A New Weak Dependence Condition and Applications
to Moment Inequalities, Stoch. Proc. Appl. 84, 313-342.
[28] Doukhan, P. and M. Neumann (2007). Probability and Moment Inequalities for Sums of
Weakly Dependent Random Variables, with Applications, Stoch. Proc. Appl. 117, 878-903.
[29] Doukhan, P. and O. Wintenberger (2008). Weakly Dependent Chains with Infinite Memory, Stoch. Proc. Appl. 118, 1997-2013.
[30] Feller, W. (1971). An Introduction to Probability Theory and its Applications, 2nd ed., Vol.
2. Wiley: New York.
[31] Gallant, A. R. and H. White (1988). A Unified Theory of Estimation and Inference for Nonlinear Dynamic Models, Basil Blackwell: Oxford.
[32] Goldie, C.M., and R.L. Smith (1987). Slow Variation with Remainder: Theory and Applications, Q. J. Math. 38, 45-71.
[33] Griffin, P.S. and F.S. Qazi (2002). Limit Laws of Modulus Trimmed Sums, Ann. Prob. 30, 1466-1485.
[34] Haeusler, E., and J. L. Teugels (1985). On Asymptotic Normality of Hill’s Estimator for the
Exponent of Regular Variation, Ann. Stat. 13, 743-756.
[35] Hahn, M.G., J. Kuelbs and J. Samur (1987). Asymptotic Normality of Trimmed Sums of
Mixing Random Variables, Ann. Prob. 15, 1395-1418.
[36] Hahn, M.G., J. Kuelbs and D.C. Weiner (1990). The Asymptotic Joint Distribution of Self-Normalized Censored Sums and Sums of Squares, Ann. Prob. 18, 1284-1341.
[37] Hansen, B.E. (1991). Strong Laws for Dependent Heterogeneous Processes, Econometric
Theory 7, 213-221.
[38] Hansen, B.E. (1992). Erratum: Strong Laws for Dependent Heterogeneous Processes, Econometric Theory 8, 421-422.
[39] Hill, J.B. (2009). On Functional Central Limit Theorems for Dependent, Heterogeneous
Arrays with Applications to Tail Index and Tail Dependence Estimation, Journal of Statistical
Planning and Inference: 139, 2091-2110.
[40] Hill, J.B. (2010). On Tail Index Estimation for Dependent, Heterogeneous Data, Econometric Theory 26, 1398-1436.
[41] Hill, J.B. (2011a). Tail and Non-Tail Memory with Applications to Extreme Value and Robust Statistics, Econometric Theory: forthcoming.
[42] Hill, J.B. (2011b). Extremal Memory of Stochastic Volatility with an Application to Tail
Shape Inference, J. Stat. Plan. Infer. 141, 663-676
[43] Hitczenko, P. (1990). Best Constants in Martingale Version of Rosenthal’s Inequality, Ann.
Prob. 18, 1656-1668.
[44] Hitczenko, P. (1994). On a Domination of Sums of Random Variables by Sums of Conditionally Independent Ones, Ann. Prob. 22, 453-468.
[45] Hsing, T. (1991). On Tail Index Estimation Using Dependent Data, Ann. Statist. 19, 1547-1569.
[46] Ibragimov, I.A. (1962). Some Limit Theorems for Stationary Processes, Theory Prob. Appl.
7, 349-382.
[47] Ibragimov, I.A. and Yu. V. Linnik (1971). Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff: Groningen.
[48] Ibragimov, R. and Sh. Sharakhmetov (1997). On An Exact Constant for the Rosenthal Inequality, Theory Prob. Appl. 42, 294-302.
[49] Ibragimov, R. and Sh. Sharakhmetov (1998). Exact Bounds on the Moments of Symmetric
Statistics. In Seventh Vilnius Conference on Probability Theory and Mathematical Statistics,
22nd European Meeting of Statisticians, Abstracts of Communications 243-244.
[50] Kesten, H. (1973). Random Difference Equations and Renewal Theory for Products of Random Matrices, Acta Mathematica 131, 207-248.
[51] Leadbetter, M.R., G. Lindgren and H. Rootzén (1983). Extremes and Related Properties of
Random Sequences and Processes. Springer-Verlag: New York.
[52] Marcinkiewicz, J. and A. Zygmund (1937). Sur les Fonctions Indépendantes, Fundamenta Mathematicae 28, 60-90.
[53] McLeish, D.L. (1975a). Maximal Inequality and Dependent Strong Laws, Ann. Prob. 3,
829-839.
[54] McLeish, D.L. (1975b). Invariance Principles for Dependent Variables, Prob. Theory Rel.
Fields 32, 165-178.
[55] Meitz M. and P. Saikkonen (2008). Stability of Nonlinear AR-GARCH Models, J. Time Ser.
Anal. 29, 453-475.
[56] Newcomb, S. (1886). A Generalized Theory of the Combination of Observations so as to
Obtain the Best Result, Amer. J. Math. 8, 343-366.
[57] Nze, P.A., and P. Doukhan (2004). Weak Dependence: Models and Applications to Econometrics, Econometric Theory 20, 995-1045.
[58] Osękowski, A. (2011). Sharp Maximal Inequality for the Moments of Martingales and Nonnegative Submartingales, Bernoulli: in press.
[59] Pruitt W. (1985). Sums of Independent Random Variables with the Extreme Terms Excluded, in J.N. Srivastava (eds.), Probability and Statistics: Essays in Honor of Franklin A.
Graybill. Elsevier: North Holland.
[60] Resnick, S.I. (1987). Extreme Values, Regular Variation and Point Processes. Springer-Verlag: New York.
[61] Rio, E. (1993). Covariance Inequalities for Strong Mixing Processes, Ann. de l’Inst. Henri
Poincaré B 29, 587-597.
[62] Rio, E. (1995). A Maximal Inequality and Dependent Marcinkiewicz-Zygmund Strong Laws,
Ann. Prob. 23, 918-937.
[63] Rootzén, H. (2009). Weak Convergence of the Tail Empirical Function for Dependent Sequences, Stoch. Proc. Appl. 119, 468-490.
[64] Rosenthal, H.P. (1970). On the Subspace Lp (p > 2) Spanned by Sequences of Independent Random Variables, Israel J. Math. 8, 273-303.
[65] Shao, Q-M. (1995). Maximal Inequalities for Partial Sums of ρ-mixing Sequences, Ann. Stat. 23, 948-965.
[66] Stigler, S.M. (1973). The Asymptotic Distribution of the Trimmed Mean, Ann. Stat. 1,
472-477.
[67] Teicher, H. (1985). Almost Certain Convergence in Double Arrays, Prob. Theory Rel. Fields
69, 331-345.
[68] von Bahr, B. and C-G. Esséen (1965). Inequalities for the rth Absolute Moment of a Sum of Random Variables, 1 ≤ r ≤ 2, Ann. Math. Stat. 36, 299-303.
[69] Weiner, D.C. (1991). Center, Scale and Asymptotic Normality for Censored Sums of Independent, Nonidentically Distributed Random Variables, in M.G. Hahn, J. Kuelbs and D.C.
Weiner (ed.’s), Sums, Trimmed Sums and Extremes. Birkhäuser: Berlin.
[70] Wu, W.B. and W. Min (2005). On Linear Processes with Dependent Innovations, Stoch. Proc. Appl. 115, 939-958.
[71] Yanjiao, M. and L. Zhengyan (2009). Maximal Inequalities and Laws of Large Numbers for Lq-Mixingale Arrays, Stat. Prob. Let. 79, 1539-1547.
[72] Yokoyama, R. (1980). Moment Bounds for Stationary Mixing Sequences, Prob. Theory Rel.
Fields 52, 45-57.
Jonathan B. Hill
Dept. of Economics
University of North Carolina - Chapel Hill
[email protected]
www.unc.edu/~jbhill