Heavy Tail Robust Frequency Domain Estimation
Jonathan B. Hill∗ and Adam McCloskey†
Dept. of Economics, University of North Carolina
Dept. of Economics, Brown University
September 12, 2014
Abstract
We develop heavy tail robust frequency domain estimators for covariance stationary time series
with a parametric spectrum, including ARMA, GARCH and stochastic volatility. We use robust
techniques to reduce the moment requirement down to only a finite variance. In particular, we
negligibly trim the data, permitting both identification of the parameter for the candidate model,
and asymptotically normal frequency domain estimators, while leading to a classic limit theory
when the data have a finite fourth moment. The transform itself can lead to asymptotic bias in the
limit distribution of our estimators when the fourth moment does not exist, hence we correct the
bias using extreme value theory that applies whether tails decay according to a power law or not.
In the case of symmetrically distributed data, we compute the mean-squared-error of our biased
estimator and characterize the mean-squared-error minimizing number of sample extremes. A
simulation experiment shows our QML estimator works well and in general has lower bias than
the standard estimator, even when the process is Gaussian, suggesting robust methods have merit
even for thin tailed processes.
Key words and phrases: Frequency domain QML, Whittle estimation, heavy tails, robust estimation.
JEL classifications : C13, C22, C49.
AMS subject classifications : 62M15, 62F35.
1 Introduction
Let {yt }t∈Z be a stationary ergodic process with E[yt2 ] < ∞. We consider those processes with
a parametric spectrum f (λ, θ) at frequency λ, where θ is a k × 1 vector of unknown parameters.
We assume there exists a unique point θ0 in the interior of a compact subset Θ ⊂ Rk such that
f (λ, θ0 ) is the spectrum of yt . Specifically, f (λ, θ) > 0 for all points (λ, θ) ∈ [−π, π] × Θ, and
∗ First, and corresponding, author. Dept. of Economics, University of North Carolina, Chapel Hill; www.unc.edu/∼jbhill; [email protected].
† Dept. of Economics, Brown University, adam [email protected].
f (·, θ) is twice continuously differentiable. This applies to ARMA processes, squares of GARCH
processes, and log-squares of stochastic volatility processes, each stationary with a finite second
moment. We present frequency domain [FD] estimators of θ0 that are robust to heavy tails: our
estimators are asymptotically normal and unbiased under mild regularity conditions, as long as E[yt2 ]
< ∞. Frequency domain methods are useful for estimation at a given set of time series periodicities,
especially at economic business cycle frequencies (see Granger, 1966; Qu and Tkachenko, 2012), and
recently for estimation robust to trends and level shifts (McCloskey, 2013; McCloskey and Perron,
2013; McCloskey and Hill, 2014).
The observed sample is {yt }Tt=1 with sample size T ≥ 1. The discrete Fourier transform and
periodogram of {yt } at frequency λ ∈ [−π, π] are respectively defined as follows:
w_T(λ) ≡ (2πT)^{−1/2} Σ_{t=1}^T yt e^{−iλt} and I_T(λ) ≡ |w_T(λ)|².
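As a quick illustration (ours, not the paper's code), the transform and periodogram above can be computed directly and checked against an FFT on the Fourier grid; the function name `dft_periodogram` is our own:

```python
import numpy as np

def dft_periodogram(y, lam):
    """I_T(lam) = |w_T(lam)|^2 with w_T(lam) = (2*pi*T)^(-1/2) sum_{t=1}^T y_t e^{-i lam t}."""
    T = len(y)
    t = np.arange(1, T + 1)
    w = (2 * np.pi * T) ** -0.5 * np.sum(y * np.exp(-1j * lam * t))
    return np.abs(w) ** 2

rng = np.random.default_rng(0)
T = 256
y = rng.standard_normal(T)
lams = 2 * np.pi * np.arange(1, T // 2) / T   # Fourier frequencies 2*pi*j/T
I_T = np.array([dft_periodogram(y, lam) for lam in lams])
```

On the Fourier grid this agrees with |FFT|²/(2πT), which is the practical way to compute it for long series.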
Define Fourier frequencies λj ≡ 2πj/T for j = −[T/2] + 1, ..., [T/2]. If E[yt⁴] = ∞ then Whittle (1953)'s estimator, and the FD-QML estimator,

arg min_{θ∈Θ} Σ_{j∈F} { ln f(λj, θ) + I_T(λj)/f(λj, θ) }, where F ≡ (−T/2, T/2] ∩ Z \ {0},
are not known to be asymptotically normal. See Hannan (1973a,b), Dunsmuir and Hannan (1976),
Dunsmuir (1979), and Hosoya and Taniguchi (1982).
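To fix ideas, here is a minimal numerical sketch (our illustration, not the paper's implementation) of the standard FD-QML/Whittle objective for an AR(1), whose spectrum is σ²(2π)^{−1}/(1 + φ² − 2φ cos λ); for simplicity σ² is held at its true value 1 and φ is found by grid search:

```python
import numpy as np

def ar1_spectrum(lam, phi, sigma2=1.0):
    # f(lam, theta) for an AR(1): sigma^2/(2*pi) / (1 + phi^2 - 2*phi*cos(lam))
    return sigma2 / (2 * np.pi) / (1.0 + phi ** 2 - 2.0 * phi * np.cos(lam))

def whittle_objective(phi, lams, I):
    f = ar1_spectrum(lams, phi)
    return np.sum(np.log(f) + I / f)

rng = np.random.default_rng(1)
T, phi0 = 4096, 0.5
eps = rng.standard_normal(T + 100)
y = np.zeros(T + 100)
for t in range(1, T + 100):                 # simulate AR(1), then discard burn-in
    y[t] = phi0 * y[t - 1] + eps[t]
y = y[100:]
lams = 2 * np.pi * np.arange(1, T // 2) / T
I = np.abs(np.fft.fft(y)[1:T // 2]) ** 2 / (2 * np.pi * T)
grid = np.arange(-0.9, 0.91, 0.01)
phi_hat = grid[np.argmin([whittle_objective(p, lams, I) for p in grid])]
```

With thin-tailed Gaussian errors this recovers φ0 well; the paper's point is that its sampling behavior is not known to be Gaussian when E[yt⁴] = ∞.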
We exploit recent developments in the heavy tail robust estimation literature to create new robust
FD-QML and Whittle estimators. Tail-trimming in the time domain has been used to develop,
amongst others, robust estimators for autoregressions and GARCH models with possibly heavy tailed
errors (Hill, 2012b, 2014a); robust moment condition tests and tests of volatility spillover (Hill, 2012a;
Hill and Aguilar, 2013; Aguilar and Hill, 2014); and robust moment estimators (see, e.g., Khan and
Tamer, 2010; Hill, 2013; Chaudhuri and Hill, 2014).
We use transformed data yt yt−h I(|yt yt−h − γ| < c) for estimating E[yt yt−h ], where γ is a value used
for centering. Such transforms have a long history for robustness against so called outliers and sample
extremes. See, for example, Andrews, Bickel, Hampel, Huber, Rogers, and Tukey (1972), Huber
(1964), Hampel (1974), and Hampel, Ronchetti, Rousseeuw, and Stahel (1986). An intermediate
order statistic of |yt yt−h − γ| is used for c such that c → ∞ as T → ∞, which ensures we identify
θ0 , and allows us to control the amount of trimming (see, e.g., Hill, 2012a,b, 2013, 2014a; Hill and
Aguilar, 2013; Chaudhuri and Hill, 2014). Centering with γ = E[yt yt−h ] allows us to identify E[yt yt−h ]
when yt yt−h has a symmetric distribution, diminishing small sample bias of our estimator. Thus, we
center for trimming when yt yt−h can possibly have a symmetric distribution. See Section 2.
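The transform is easy to state in code. Below is a small self-contained sketch (our own illustration; `tail_trimmed_mean` is a hypothetical helper): the threshold c is the k-th largest |x − γ|, so the k most extreme observations are trimmed, and c → ∞ as the sample grows since k/T → 0:

```python
import numpy as np

def tail_trimmed_mean(x, k, center=True):
    """Negligibly trimmed mean: drop the k observations farthest from gamma,
    where the threshold c is an intermediate order statistic of |x - gamma|."""
    gamma = x.mean() if center else 0.0
    c = np.sort(np.abs(x - gamma))[-k]       # k-th largest deviation from gamma
    return x[np.abs(x - gamma) < c].mean()

rng = np.random.default_rng(2)
x = rng.standard_t(df=2.5, size=100_000)     # heavy tails: E|x|^a < inf iff a < 2.5
k = int(np.log(len(x)) ** 2)                 # slowly growing k: negligible trimming
m = tail_trimmed_mean(x, k)
```

For this symmetric t(2.5) draw, centered trimming leaves the estimate of E[x] = 0 essentially unbiased while taming the influence of the extremes.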
Other transforms can be used, including smoothed versions of yt yt−h I(|yt yt−h − γ| < c) like
Tukey’s bisquare (cf Andrews, Bickel, Hampel, Huber, Rogers, and Tukey, 1972; Hampel, Ronchetti,
Rousseeuw, and Stahel, 1986). However, our bias correction exploits Karamata theory for tail2
trimmed moments, and this places severe restrictions on the available transforms, and excludes
conventional transforms like Tukey’s. Thus, we focus on simple trimming.
We do not employ truncation sign(x) × min{|x|, c} because in the present context either it results in estimator bias that cannot be reduced by our bias correction methods when c is fixed, or it does not lead to an asymptotically normal estimator when c → ∞. In general, truncation or trimming with a bounded threshold c ∈ (0, ∞) will promote a bounded influence function, and therefore lead to infinitesimal robustness (Hampel, 1974). The cost, however, is asymptotic bias that is not corrected by our methods, although such bias in principle can be corrected by using indirect inference and an assumed error distribution (e.g., Genton and Ronchetti, 2003; Mancini, Ronchetti, and Trojani, 2005).
See, e.g., Martin and Thomson (1982) for data contamination robust spectral density estimation. This literature focuses on practice at the expense of theory details, and it does not appear to have been extended to parameter estimation with asymptotic theory.
Even with negligibility due to c → ∞, the transform yt yt−h I(|yt yt−h − γ| < c) can lead to asymptotic bias in the limit distribution of our estimators due to potential asymmetry in the distribution of yt yt−h, and necessarily due to asymmetry in the distribution of yt². This bias may not vanish fast enough asymptotically when yt has an infinite fourth moment, leading to bias in the limit distribution of our estimators of θ0. We therefore estimate and remove the bias using extreme value theory developed in Hill (2013), cf. Peng (2001), and show that power laws are not actually required for our bias estimators to work in practice.
A necessary trade-off arises from robustifying sample correlations against heavy tails. We implicitly assume yt has a finite variance, and when E[yt⁴] = ∞ then our estimator has a sub-T^{1/2} convergence rate. This applies to estimators of ARMA and stochastic volatility models, which are often estimated in the frequency domain. Further, in the GARCH case our methods cover the square yt = xt² of a GARCH process xt, hence xt must have a finite fourth moment. Nevertheless, we show by simulation experiment that our tail-trimmed FD-QML estimator performs better than the conventional
FD-QML estimator in thin and thick tail cases: trimming just a few large values and using a bias
correction strongly improves small sample bias, efficiency, and test performance, whether tails are
thin or thick.
It is also interesting to note that, in principle, we do not need covariance stationarity for our robust Whittle estimator to be valid. For example, an AR(1) yt = φ0 yt−1 + εt with |φ0| < 1, iid εt and σ0² = E[εt²] < ∞, has spectrum σ0²(2π)^{−1} ω(λ, φ0) where ω(λ, φ0) = 1/(1 + φ0² − 2φ0 cos(λ)). Even when E[εt²] = ∞, we may evidently use ω(λ, φ) to identify φ0 based on our tail-trimmed Whittle estimator, similar to Mikosch, Gadrich, Kluppelberg, and Adler (1995). However, we conjecture that our robust Whittle estimator is still asymptotically normal, contrary to Mikosch, Gadrich, Kluppelberg, and Adler (1995). We do not tackle the theory here in order to focus ideas, and therefore always maintain covariance stationarity.
There are various efforts to study frequency domain estimators for heavy tailed data. Mikosch,
Gadrich, Kluppelberg, and Adler (1995) characterize the Whittle estimator for infinite variance
ARMA models, and derive its non-standard limit distribution. Li (2008, 2010) develops Laplace and Lp-moment frequency domain estimators by replacing the usual periodogram with alternative ones for linear models yt = θ0 xt + εt, where εt is finitely dependent or a linear function of iid random
variables. Our methods, by comparison, focus on the Fourier based spectrum by exploiting a robust
sample version of E[yt yt−h ], and our spectrum class and weak dependence assumptions discussed in
Section 2 cover a far larger class of processes than allowed in Li (2008, 2010). See also Shao and
Wu (2007) for recent work on spectral estimation theory for nonlinear processes with more than a
fourth moment under a geometric moment contraction assumption (as opposed to a mixing condition). We require a finite second moment and a positive continuous spectrum, allowing for geometric
or hyperbolic memory decay in the form of a mixing condition.
Spectral analysis has been extended to indicator transforms, quantiles and copulas, all of which
are inherently robust to extreme values (Dette, Hallin, Kley, and Volgushev, 2011; Lee and Rao, 2011;
Hagemann, 2012), or are explicitly constructed for extreme values (Mikosch and Zhao, 2014). In these
cases, spectral density estimators are not proposed to estimate model parameters, but to describe
underlying dependence in a time series. Indeed, little is known about how these various spectra can
be used to identify model parameters.
The remaining sections of this paper are as follows. In Section 2 we present the robust estimator
of E[yt yt−h ] and FD-QML estimator, and tackle bias correction in Section 3. We discuss choosing
the amount of trimming in Section 4, a simulation study follows in Section 5, and parting comments
are left for Section 6. Since theory for our Whittle estimator is similar to our FD-QML estimator,
we present those details in Section C of the supplemental material Hill and McCloskey (2014).
We use the following notation conventions. Drop θ0 and write f(λ) = f(λ, θ0). All random variables lie on a common probability measure space (Ω, F, P). ||A|| is the spectral norm of matrix A; [z] rounds a scalar z to the nearest integer. L(c) denotes a slowly varying function: lim_{c→∞} L(λc)/L(c) = 1 ∀λ > 0 (e.g. a(ln(c))^b for finite a > 0 and b ≥ 0). K > 0 is a finite constant that may change from place to place. I(·) is the indicator function: I(A) = 1 if A is true, else I(A) = 0. Unless otherwise specified, all limits are as T → ∞.
2 Robust Frequency Domain-QML

2.1 Robust FD-QML
Let x be an arbitrary F-measurable random variable. Centering with the mean γ = E[x] ensures E[xI(|x − γ| < c)] identifies E[x] when x has a symmetric distribution since, in this case, E[xI(|x − γ| < c)] = E[x] × E[I(|x − E[x]| < c)]. If x = yt yt−h ≥ 0 a.s. then mean-centering is irrelevant, hence γ = 0.¹ Examples include h = 0, or yt is squared GARCH, or log-squared stochastic volatility with returns in [−1, 1].

¹ Our weak dependence assumptions rule out P(yt yt−h < 0) = 1, hence P(yt yt−h ≥ 0) > 0.
2.1.1 Sequences for centering and trimming
We require sequences for centering {γ̃T,h }, for determining the amount of trimming {kT,h }, and for
controlling the number of lags {bT } of cross-product moment estimators used in constructing the
criterion function. First, {γ̃T,h}. It is preferable to ensure the transformed sample moments are unbiased whenever possible, and it is possible to do so when yt yt−h has a symmetric distribution. We may not know whether yt yt−h has a symmetric distribution, but it cannot when the distribution is non-degenerate with support [0, ∞). Thus, the quantity we recommend for centering in practice is:
γ̃T,h ≡ (T − h)^{−1} Σ_{t=h+1}^T yt yt−h if P(yt yt−h > 0) < 1, and γ̃T,h ≡ 0 if P(yt yt−h > 0) = 1, for each exploited lag h ≥ 0.

Since mean-centering is only helpful when yt yt−h has a symmetric distribution, if we know yt yt−h has an asymmetric distribution then implicitly γ̃T,h = 0. Thus, γ̃T,0 = 0 since yt² > 0 a.s., and when yt is a squared GARCH process then γ̃T,h = 0 for each h.
It is important to understand that centering does not completely remove bias in our estimator since
the transformed yt yt−h at h = 0 will be biased, which adds to the small sample (and possibly limit
distribution) bias of our FD estimators. This is irrelevant because we remove all bias by characterizing
and estimating it in Section 3. Mean centering, however, does lead to an improved FD estimator in
controlled experiments when yt yt−h has a symmetric distribution.²
Second, {kT,h}. We use the T − h observations {yt yt−h − γ̃T,h} to determine which yt yt−h are trimmed, so define:

Ŷ(0)_{h,t} ≡ |yt yt−h − γ̃T,h| with order statistics Ŷ(0)_{h,(1)} ≥ Ŷ(0)_{h,(2)} ≥ ··· ≥ Ŷ(0)_{h,(T−h)}.

The chosen threshold is the order statistic Ŷ(0)_{h,(kT,h)}, where {kT,h} is an intermediate order sequence:

kT,h ∈ {1, ..., T − h}, kT,h → ∞ and kT,h/(T − h) → 0 for each exploited lag h ≥ 0. (1)

The transformation we use is then

yt yt−h I(|yt yt−h − γ̃T,h| < Ŷ(0)_{h,(kT,h)}),

while kT,h/(T − h) → 0 ensures negligibility: yt yt−h I(|yt yt−h − γ̃T,h| < Ŷ(0)_{h,(kT,h)}) →p yt yt−h.
Finally, {bT}. Trimming negligibility (1) naturally places a restriction on the number of usable lags h. Consider that h = T − a for constant a ≥ 1 implies kT,T−a ∈ {1, ..., a} ∀T, which contradicts the need for kT,h → ∞ to ensure robustness to heavy tails. Usable lags are therefore

h ∈ {0, 1, ..., bT}

for a sequence of bandwidths {bT} that satisfies

bT ≤ T − 1, bT → ∞, and T − bT → ∞.

Examples of {kT,h, bT} include bT = [δb(T − 1)^{ab}] and kT,h = [δk,h(ln(T))^{ak,h}] for any h, and some ab ∈ (0, 1], δb ∈ (0, 1), and δk,h, ak,h > 0.³

² In the Section 5 simulation study we use mean-centering for AR model estimation and h ≥ 1. In experiments not reported here we find that our FD-QML estimator has slightly higher small sample bias when mean-centering is not used.
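For concreteness, one admissible choice of these sequences can be computed as follows (our sketch; the constants `a_b`, `delta_k`, `a_k` are illustrative, echoing the paper's bT = T^.95 and a slowly growing kT,h):

```python
import numpy as np

def tuning_sequences(T, a_b=0.95, delta_k=1.0, a_k=2.0):
    """Example bandwidth and trimming fractile: b_T = [T^a_b] lags and
    k_T = [delta_k * ln(T)^a_k] trimmed extremes (illustrative constants)."""
    b_T = int(T ** a_b)                    # b_T -> inf and T - b_T -> inf
    k_T = max(1, int(delta_k * np.log(T) ** a_k))
    return b_T, k_T

b_T, k_T = tuning_sequences(100)           # b_T is about 80 when T = 100
```

Both sequences diverge with T, but k_T grows only at a logarithmic rate, so trimming stays negligible.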
2.1.2 Robust Periodogram and FD-QML
The periodogram has the well known expansion

I_T(λ) ≡ |(2πT)^{−1/2} Σ_{t=1}^T yt e^{−iλt}|² = (2π)^{−1} { T^{−1} Σ_{t=1}^T yt² + 2 Σ_{h=1}^{T−1} T^{−1} Σ_{t=h+1}^T yt yt−h cos(λh) }, (2)

hence it is a function of unbiased estimators T^{−1} Σ_{t=h+1}^T yt yt−h of (T − h)T^{−1} E[yt yt−h]. Our robust estimator of (T − h)T^{−1} E[yt yt−h] is then:
γ̂*_{T,h}(c) ≡ (T − h)/(T − h − kT,h) × T^{−1} Σ_{t=h+1}^T yt yt−h I(|yt yt−h − (T − h)^{−1} Σ_{s=h+1}^T ys ys−h| < c) if P(yt yt−h > 0) < 1, and

γ̂*_{T,h}(c) ≡ T^{−1} Σ_{t=h+1}^T yt yt−h I(|yt yt−h| < c) if P(yt yt−h > 0) = 1. (3)
In view of (2), we simply use γ̂*_{T,h}(Ŷ(0)_{h,(kT,h)}) in place of T^{−1} Σ_{t=h+1}^T yt yt−h:

Î*_T(λ) ≡ (2π)^{−1} { γ̂*_{T,0}(Ŷ(0)_{0,(kT,0)}) + 2 Σ_{h=1}^{bT} γ̂*_{T,h}(Ŷ(0)_{h,(kT,h)}) × cos(λh) }, (4)
hence our negligibly transformed FD-QML estimator of θ0 is

θ̂*_T = arg min_{θ∈Θ} Σ_{j∈F} { ln f(λj, θ) + Î*_T(λj)/f(λj, θ) }, where F ≡ (−T/2, T/2] ∩ Z \ {0}.
The construction of γ̂*_{T,h}(c) ensures identification of (T − h)T^{−1} E[yt yt−h] when yt yt−h has a symmetric distribution. Simply note that for all asymptotic arguments we can replace the centering (T − h)^{−1} Σ_{t=h+1}^T yt yt−h with E[yt yt−h], and the threshold Ŷ(0)_{h,(kT,h)} with the non-random sequence {cT,h} that satisfies P(|yt yt−h − E[yt yt−h]| > cT,h) = kT,h/(T − h). See Section 2.3, and see Lemma A.7 in Appendix A.2. Then, as per the discussion on mean-centering at the top of this section:

³ In practice we obviously want bT to be close to T − 1, but our bias correction is based on tail exponent estimators, and for large h there are very few extreme values to work with from the sample {yt yt−h − γ̃T,h}. We find in our simulation experiments that bT = T^{.95} works well for any T ≥ 100. Notice bT ≈ 80 when T = 100, while larger values do not necessarily permit viable bias estimates for h near bT.
(T − h)/(T − h − kT,h) × (T − h)/T × E[yt yt−h I(|yt yt−h − E[yt yt−h]| < cT,h)]
= (T − h)/(T − h − kT,h) × P(|yt yt−h − E[yt yt−h]| < cT,h) × (T − h)/T × E[yt yt−h]
= (T − h)/(T − h − kT,h) × (1 − kT,h/(T − h)) × (T − h)/T × E[yt yt−h] = (T − h)/T × E[yt yt−h].
Notice Î*_T(λ) can in principle be negative for a finite sample because the sequence {γ̂*_{T,h}(Ŷ(0)_{h,(kT,h)})}_{h=0}^{bT} need not be non-negative definite, simply due to trimming (kT,h ≥ 1) and lag truncation (bT ≤ T − 1). Hence, strictly speaking Î*_T(λ) is not a periodogram. This is irrelevant here since we only need to estimate θ0, and Î*_T(λ) can be used for a consistent and asymptotically normal estimator.
2.2 Assumptions
We present first the assumptions and then the main results. Asymptotic theory requires a sequence {cT,h} of non-random numbers cT,h ≥ 0 that the order statistic Ŷ(0)_{h,(kT,h)} of |yt yt−h − γ̃T,h| approximates. Since γ̃T,h exists for lags h = 0, ..., bT, and T − bT → ∞, it follows that under mean-centering γ̃T,h = (T − h)^{−1} Σ_{t=h+1}^T yt yt−h is consistent for E[yt yt−h] under regularity assumptions detailed below. Define the population centering parameter:

γ̃h ≡ E[yt yt−h] if P(yt yt−h > 0) < 1, and γ̃h ≡ 0 if P(yt yt−h > 0) = 1, for h ≥ 0. (5)

Now let {cT,h} be any (possibly non-unique) sequence that satisfies:

P(|yt yt−h − γ̃h| ≥ cT,h) = kT,h/(T − h) for h ≥ 0. (6)
We need to impose bounds on dependence coefficients. Define F_s^t ≡ σ(yτ : s ≤ τ ≤ t), a measure space (Ω, σ(∪_{t∈Z} F_{−∞}^t), P), and α-mixing coefficients αh ≡ sup_{A⊂F_{−∞}^{t−h}, B⊂F_t^∞} |P(A ∩ B) − P(A)P(B)|. Next, let L2(A) ≡ L2(Ω, A, P) be the space of A-measurable L2-bounded random variables for A ⊆ σ(∪_{t∈Z} F_{−∞}^t). Define ρ(A, B) ≡ sup_{f∈L2(A), g∈L2(B)} |corr(f, g)| and let Sh and Th be non-empty subsets of N with inf_{s∈Sh, t∈Th} |s − t| ≥ h. Then the interlaced maximal correlation coefficient is ρ*_h ≡ sup_{Sh,Th} ρ(σ(yt : t ∈ Sh), σ(ys : s ∈ Th)), where the supremum is taken over all such Sh and Th (see Bradley, 1993).
Assumption A (data generating process).

1. {yt yt−h} is a stationary, Lp-bounded process, p > 1, with an absolutely continuous non-degenerate distribution with unbounded support. Further, yt has absolutely summable covariances, and is α-mixing with αh = O(h^{−p/(p−2)}) and ρ*_1 < 1.
2. yt has spectrum f(λ, θ0) for unique θ0 in the interior of compact Θ ⊂ R^k with properties:

(i) if θ0 ≠ θ then f(λ; θ) ≠ f(λ; θ0);
(ii) 0 < f(λ, θ) ≤ K < ∞ for each λ ∈ [−π, π] and θ ∈ Θ;
(iii) f(λ, θ) is twice continuously differentiable in θ, with derivatives (∂/∂θ)^i f(λ, θ) for i = 1, 2 uniformly bounded on [−π, π] × Θ;
(iv) h(λ, θ) ∈ {f(λ, θ), (∂/∂θ)f(λ, θ)} are uniformly Hölder continuous of degree α ∈ (1/2, 1] in λ: sup_{θ∈Θ} ||h(λ, θ) − h(ω, θ)|| ≤ K|λ − ω|^α for all λ, ω ∈ [−π, π] and some K > 0.
3. inf θ∈Θ ||H(θ)|| > 0.
Remark 1 Distribution continuity A.1 simplifies order statistic asymptotics. The assumption can
always be assured by adding a small iid noise with a continuous distribution to yt . Geometric α-mixing
arises in stationary ARMA, squared GARCH, and log-squared autoregressive stochastic volatility, as
well as a variety of nonlinear processes. See, e.g., Doukhan (1994), Carrasco and Chen (2002) and
Meitz and Saikkonen (2008).
Remark 2 Assumption A.2 is essentially C2.1, C2.2 and C2.4 in Dunsmuir (1979) since θ0 in the
interior of compact Θ is presumed by Dunsmuir (1979). We relax Dunsmuir (1979)’s C2.3 since that
assumes {yt } has a finite fourth moment. Hölder continuity Assumption A.2.iv with any degree α ∈
(1/2, 1] applies to ARMA, squared GARCH, and log-squared autoregressive stochastic volatility on
some compact Θ containing θ0 .
Remark 3 The mixing rate αh = O(h^{−p/(p−2)}) ensures k^{1/2}_{T,h}-convergence for the thresholds Ŷ(0)_{h,(kT,h)}, cf. Hill (2010, 2014b). Standardized partial sums of the transformed variables, however, may only have a second moment asymptotically, hence we must exploit a dependence property other than α-mixing for a central limit theory to apply. Indeed, it will not help to further restrict the rate αh → 0 since most central limit theorems for dependent arrays require more than a second moment (see, e.g., Ibragimov, 1975; Bradley, 1992). Further, in order to reduce the number of cases in proofs, we desire partial sum variances to be strictly positive, a well known challenge (see Ibragimov, 1962, 1975; Dehling, Denker, and Phillip, 1986; Peligrad, 1996). A simple way to ensure a Gaussian limit theory, and partial sum variance positivity, is to impose αh → 0 and ρ*_1 < 1. Of course, αh → 0 and the existence of a continuous positive spectral density suffice as well (cf. Ibragimov, 1962) since that implies ρ*_1 < 1 (Bradley, 1992), but it is unknown whether any of the triangular arrays in this paper that arise from the sample-size specific data transformation have positive spectra at frequency zero.
Remark 4 We use A.3 to show that θ̂*_T is the same asymptotically if we replace the stochastic threshold Ŷ(0)_{h,(kT,h)} with the deterministic one cT,h. See the proof of consistency Theorem 2.1.
We assume yt yt−h has regularly varying probability tails in order to allow for heavy tails. Recall γ̃h = E[yt yt−h] if P(yt yt−h > 0) < 1, else γ̃h ≡ 0.

Assumption B (regularly varying tails). P(yt yt−h − γ̃h ≤ −c) = Lh,1(c)c^{−κh,1} and P(yt yt−h − γ̃h ≥ c) = Lh,2(c)c^{−κh,2}, where the Lh,i(c) are slowly varying and κh,i > 1.
Remark 5 The tail index κh,0 ≡ min{κh,1, κh,2} is identically the moment supremum κh,0 = arg sup{α > 0 : E|yt yt−h|^α < ∞} > 1, hence κ0,0 = κ/2 and

P(|yt yt−h − γ̃h| ≥ c) = Lh,0(c)c^{−κh,0}(1 + o(1)) where Lh,0(c) = Lh,1(c) + Lh,2(c). (7)

See Resnick (1987). Since yt² ≥ 0 only has a right tail, notice κ0,0 = κ0,2 = κ/2 > 1 and L0,0(c) = L0,2(c). In general the moment supremum κh,0 ∈ [κ/2, κ], hence κh,0 ≥ κ0,0. Power law tails in linear and random volatility processes are extensively treated (e.g. Brockwell and Cline, 1985; Basrak, Davis, and Mikosch, 2002; Davis and Mikosch, 2009), while their extension to products is tackled, e.g., in Cline (1986) and Mikosch and Starica (2000).
Finally, under mean-centering we require that (T − h)^{−1} Σ_{t=h+1}^T yt yt−h does not asymptotically impact the order statistic Ŷ(0)_{h,(kT,h)}. By exploiting a maximal inequality in Hansen (1991), we use the non-sharp bound (T − h)^{−1} Σ_{t=h+1}^T yt yt−h = E[yt yt−h] + Op(1/T^ι) for tiny ι > 0 in our proof of Ŷ(0)_{h,(kT,h)}/cT,h = 1 + Op(1/k^{1/2}_{T,h}), and Op(1/T^ι) = op(1/k^{1/2}_{T,h}) when kT,h → ∞ at most at a slowly varying rate. A sharper bound can be obtained if we know the rate of convergence of (T − h)^{−1} Σ_{t=h+1}^T yt yt−h, but this requires knowledge of the rate of tail decay of yt, and in general requires additional information about probability tails since yt yt−h is weakly dependent (e.g. Bartkiewicz, Jakubowski, Mikosch, and Wintenberger, 2010).
Assumption C (trimming and bandwidth rates).

1. In general kT,h → ∞, kT,h/(T − h) → 0, and kT,h ∼ Kh,h̃ kT,h̃ for some Kh,h̃ > 0 and each h̃, h ∈ {0, ..., bT}. If mean-centering at lag h is used then kT,h → ∞ at most at a slowly varying rate.

2. Let bT ≤ T − 1, bT → ∞, and T − bT → ∞. Further bT/T^{1/(2α)} → ∞ where α ∈ (1/2, 1] is the Hölder continuity degree in Assumption A.2.iv.
Remark 6 kT,h+1 ∼ KkT,h allows us to give a simple characterization of the rate of convergence of
θ̂T∗ . See the proofs of Theorem 2.2, and a key central limit theorem Lemma A.9 in Appendix A.2.
The restriction, though, implies when mean-centering is used for some h that kT,h → ∞ at most at
a slowly varying rate for each h. We discuss in Section 4 that a slowly increasing kT,h → ∞ for all h
both reduces small sample bias, and improves the accuracy of our bias estimator.
Remark 7 bT/T^{1/(2α)} → ∞ is required since we use Hölder continuity to show feasible and infeasible FD-QML estimators are asymptotically equivalent, and for a central limit theorem, similar to Dunsmuir (1979, proof of Theorem 2.1, Corollary 2.2). In our case, however, there are T − 1 Fourier frequencies {λj}_{j∈F} but only bT = o(T) lags. If Assumption A.2.iv holds for any α ∈ (1/2, 1] then take α = 1 and assume bT/T^{1/2} → ∞. This applies to ARMA, squared GARCH and log-squared autoregressive stochastic volatility.
2.3 Main Results
Let κ be the moment supremum of yt,

κ ≡ arg sup{α > 0 : E|yt|^α < ∞},

and define the scaled gradient

ϖ(λ, θ) = [ϖi(λ, θ)]_{i=1}^k ≡ −(1/f(λ, θ)) (∂/∂θ) ln f(λ, θ) and ϖ(λ) = ϖ(λ, θ0),

and Hessian:

H(θ) = −(2π)^{−1} ∫_{−π}^π ϖ(λ, θ) (∂/∂θ′) f(λ, θ) dλ − (2π)^{−1} ∫_{−π}^π {f(λ, θ) − f(λ)} (∂/∂θ) ϖ(λ, θ) dλ.

Calling H(θ) a Hessian is justified since asymptotically with probability approaching one H(θ) = (∂/∂θ)² Σ_{j∈F} {ln f(λj, θ) + Î*_T(λj)/f(λj, θ)}. See the proof of Theorem 2.1 in Appendix A.3.
The first main result follows. Proofs are given in Appendix A.

Theorem 2.1 Under Assumptions A-C we have θ̂*_T →p θ0.

Remark 8 We cannot prove consistency using less structure on the spectrum. In particular, to handle the order statistic Ŷ(0)_{h,(kT,h)} in the trimming indicator, we exploit spectrum differentiability and Hölder continuity to allow a Taylor expansion argument.
The asymptotic distribution of θ̂*_T closely follows classic arguments. Define the robust moment estimator with population mean centering:

γ*_{T,h}(c) ≡ (T − h)/(T − h − kT,h) × T^{−1} Σ_{t=h+1}^T yt yt−h I(|yt yt−h − E[yt yt−h]| < c) if P(yt yt−h > 0) < 1, and

γ*_{T,h}(c) ≡ T^{−1} Σ_{t=h+1}^T yt yt−h I(|yt yt−h| < c) if P(yt yt−h > 0) = 1, (8)

and define a spectral density estimator

I*_T(λ) ≡ (2π)^{−1} { γ*_{T,0}(cT,0) + 2 Σ_{h=1}^{bT} γ*_{T,h}(cT,h) cos(λh) },
and construct Hessian, covariance, and scale matrices:

Ω ≡ (2π)^{−1} ∫_{−π}^π (∂/∂θ) ln f(λ; θ0) (∂/∂θ′) ln f(λ; θ0) dλ (9)

S_T ≡ T × E[ ((2π)^{−1} ∫_{−π}^π (I*_T(λ) − E[I*_T(λ)]) ϖ(λ) dλ) ((2π)^{−1} ∫_{−π}^π (I*_T(λ) − E[I*_T(λ)]) ϖ(λ) dλ)′ ]

V_T = Ω^{−1} S_T Ω^{−1}.

Throughout, ϖh denotes the hth Fourier coefficient of ϖ(λ) ≡ −(f(λ))^{−1} (∂/∂θ) ln f(λ).
Theorem 2.2 Under Assumptions A-C, T^{1/2} V_T^{−1/2} (θ̂*_T − θ0 + B_T) →d N(0, Ik), where

B_T ≡ Ω^{−1} (2π)^{−2} Σ_{h=−bT}^{bT} ϖh { E[γ*_{T,|h|}(cT,|h|)] − E[yt yt−h] }.

If yt yt−h is symmetrically distributed for each h > 0 then B_T = Ω^{−1}(2π)^{−2} ϖ0 {E[yt² I(yt² < cT,0)] − E[yt²]}. In general ||V_T|| = K||S_T|| ∼ K E[yt⁴ I(yt² < cT,0)], lim inf_{T→∞} ||V_T|| > 0 and T^{1/2}/||V_T||^{1/2} → ∞. If κ > 4 then ||V_T|| ∼ K; if κ = 4 then ||V_T|| → ∞ is slowly varying; and if κ ∈ (2, 4) then ||V_T|| ∼ K(T/kT,0)^{4/κ−1}.
Remark 9 ||S_T|| is proportional to E[yt⁴ I(yt² < cT,0)] because the trimming fractiles are proportional: kT,h ∼ Kh,h̃ kT,h̃ for Kh,h̃ > 0 under Assumption C.1. If fractile proportionality is relaxed, then in general we cannot show ||S_T|| is proportional to a particular trimmed moment, and this complicates deducing the rate of convergence of θ̂*_T.
Remark 10 The rate of convergence T^{1/2}/||V_T||^{1/2} depends on tail decay, and if κ < 4 then the rate is T^{1/2}/(T/kT,0)^{2/κ−1/2}. By Assumption C.1, if mean-centering is not used then we are free to choose all kT,h, hence kT,0 → ∞ provided kT,0/T = o(1). By setting kT,0 = T/L(T) for slowly increasing L(T) → ∞ we achieve a rate arbitrarily close to T^{1/2} when κ < 4. This is similar to heavy tail robust estimators in Hill (2012b, 2013, 2014a). If mean-centering is used then kT,h → ∞ no faster than a slowly varying function, while the kT,h are all proportional, hence T^{1/2}/(T/kT,0)^{2/κ−1/2} cannot be made arbitrarily close to T^{1/2}. See Section 2.2 for a discussion of the assumptions.
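The rate trade-off in Remark 10 is easy to quantify numerically (our illustration, with κ = 3 taken for concreteness): compare T^{1/2}/(T/kT,0)^{2/κ−1/2} under a slowly varying fractile versus kT,0 = T/L(T):

```python
import numpy as np

def conv_rate(T, k, kappa):
    # T^{1/2} / (T/k)^{2/kappa - 1/2}: the kappa in (2,4) rate from Theorem 2.2
    return np.sqrt(T) / (T / k) ** (2.0 / kappa - 0.5)

T, kappa = 1_000_000, 3.0
slow = conv_rate(T, np.log(T), kappa)        # k_T slowly varying (mean-centering case)
fast = conv_rate(T, T / np.log(T), kappa)    # k_T = T/L(T): rate nearly sqrt(T)
```

The aggressive fractile brings the rate within a slowly varying factor of √T, while the logarithmic fractile loses a polynomial factor.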
Remark 11 ST can be estimated by standard methods (e.g. Chiu, 1988).
The next result summarizes when bias vanishes such that θ̂T∗ is asymptotically unbiased in its
limit distribution. The proof is presented in Hill and McCloskey (2014) since it is based on well
known moment properties by Karamata theory given regular variation Assumption B.
Theorem 2.3 Under Assumptions A-C:

i. T^{1/2} V_T^{−1/2} (θ̂*_T − θ0) →d N(0, Ik) if either κ > 4; or κ = 4, kT,h = o(ln(T)) and P(|yt yt−h − γ̃h| ≥ c) = dh,0 c^{−κh,0}(1 + o(1)) where κ0,0 = 2 ≤ κh,0;

ii. T^{1/2} V_T^{−1/2} (θ̂*_T − θ0 + B_T) →d N(0, Ik) if κ ∈ (2, 4), where T^{1/2} ||V_T^{−1/2} B_T|| ∼ K k^{1/2}_{T,h} → ∞ if yt yt−h has a symmetric distribution, else lim_{T→∞} T^{1/2} ||V_T^{−1/2} B_T||/k^{1/2}_{T,h} ∈ [0, ∞).
Remark 12 Case (i) can be expanded to include other "thin tail" cases, for example when yt has exponential tails. Although we do not provide details here, if lim_{c→∞} c^κ P(|yt| > c) = 0 for any finite κ > 0, then (i) holds provided kT,h/ln(T) → 0. In general, in the hairline infinite fourth moment case κ = 4, as long as kT,h → ∞ sufficiently slowly, then bias is small and vanishes rapidly enough.
Remark 13 Case (ii) is similar to the far simpler case of a tail-trimmed mean of an iid process, noted in Csörgő, Horváth, and Mason (1986).
If E[yt⁴] < ∞ then negligibility ensures the data transformation does not affect the asymptotic variance, and θ̂*_T is asymptotically unbiased in its limit distribution by Theorem 2.3.i. The asymptotic variance has a classic structure if yt is linear,

yt = Σ_{i=0}^∞ ξi(θ0) εt−i where Σ_{i=0}^∞ ξi²(θ0) < ∞, ξ0(θ0) = 1, (10)
σ²(θ0) ≡ E[εt²] < ∞ and E[εs εt] = 0 ∀s ≠ t,

where εt is a homoscedastic martingale difference. The following is a consequence of Theorems 2.2 and 2.3, negligibility, dominated convergence, and arguments in Dunsmuir (1979, proof of Theorem 2.1). Notice E[yt⁴] < ∞ ensures under mean-centering and mixing Assumption A.1 that (T − h)^{−1} Σ_{t=h+1}^T yt yt−h = E[yt yt−h] + Op(1/T^{1/2}), hence Assumption C.1 is superfluous by the discussion preceding it in Section 2.2. See also Remark 17 following Lemma A.4 in Appendix A.1. Define the σ-field Ft ≡ σ(yτ : τ ≤ t).
Corollary 2.4 In addition to Assumptions A and C.2, let yt satisfy (10), and assume E[εt|Ft−1] = 0 a.s., E[εt²|Ft−1] = σ² a.s., E[εt³|Ft−1] = s a.s., and E[εt⁴] = K < ∞. Let kT,h → ∞ and kT,h/(T − h) = o(1). Then T^{1/2}(θ̂*_T − θ0) →d N(0, V) where V = lim_{T→∞} V_T = Ω^{−1}(2Ω + Π)Ω^{−1} and

Ω ≡ (2π)^{−1} ∫_{−π}^π (∂ ln f(λ; θ0)/∂θ)(∂ ln f(λ; θ0)/∂θ′) dλ and Π ≡ ((K − 3σ⁴)/σ⁸(θ0)) (∂σ²(θ0)/∂θ)(∂σ²(θ0)/∂θ′). (11)

3 Bias Corrected FD-QML
We need a bias-corrected version of γ̂*_{T,h}(Ŷ(0)_{h,(kT,h)}) to ensure asymptotically unbiased FD-QML estimation. If yt yt−h is known to have a symmetric distribution for h ≥ 1, then in theory we need only treat γ̂*_{T,0}(Ŷ(0)_{0,(kT,0)}). Our simulation experiments, however, show that for heavy tailed processes bias estimation and removal is helpful even when true bias is zero. This occurs since T − h may be very small, and a random draw {yt yt−h}_{t=h+1}^T may appear asymmetrically distributed. Symmetric trimming can, therefore, add bias in small samples where none in theory exists.
3.1 Bias Correction: Power Law Tails
We must first specify the slowly varying components Lh,i(c) in Assumption B in order to parametrically characterize, and therefore estimate, bias in γ̂*_{T,h}(Ŷ(0)_{h,(kT,h)}). We assume for convenience:

P(yt yt−h − γ̃h ≤ −c) = dh,1 c^{−κh,1}(1 + o(1)) and P(yt yt−h − γ̃h ≥ c) = dh,2 c^{−κh,2}(1 + o(1)), (12)

where γ̃h = E[yt yt−h] if P(yt yt−h > 0) < 1, else γ̃h ≡ 0. We exploit a second order version of (12) under Assumption B′, below, so that arguments for bias approximation sharpness in Peng (2001) and Hill (2013) apply. Other tails can in principle be considered, with corrections to arguments in Peng (2001, p. 259-263) and the following bias formulas.
We can replace γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) based on (3) with γ̂*_{T,h}(c_{T,h}) in (8) for all asymptotic arguments: see Lemma A.7 in Appendix A.2. Thus, we need an expression for E[γ̂*_{T,h}(c_{T,h})]. If P(y_t y_{t−h} > 0) = 1, then by (12) with κ_{h,2} = κ_{h,0}, and Karamata's Theorem (see, e.g., (A.3) in Appendix A.2, and Resnick, 1987, Theorem 0.6), for h ≥ 0:
$$\begin{aligned}
E[\hat\gamma^{*}_{T,h}(c_{T,h})] &= \frac{T-h}{T}\left\{E[y_t y_{t-h}] - E\left[y_t y_{t-h} I(|y_t y_{t-h}| \ge c_{T,h})\right]\right\}\\
&\sim \frac{T-h}{T}E[y_t y_{t-h}] - \frac{1}{\kappa_{h,0}-1}\frac{k_{T,h}}{T-h}\frac{T-h}{T}c_{T,h}\\
&= \frac{T-h}{T}E[y_t y_{t-h}] - \frac{1}{\kappa_{h,0}-1}\frac{k_{T,h}}{T}c_{T,h}.
\end{aligned} \tag{13}$$
Now suppose P(y_t y_{t−h} > 0) < 1. Since by (12) each c_{T,h} ∼ d_{h,0}^{1/κ_{h,0}}(T/k_{T,h})^{1/κ_{h,0}}, we have for h ≥ 0:
$$\begin{aligned}
&E\left[(y_t y_{t-h} - E[y_t y_{t-h}])\, I(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h})\right]\\
&\qquad= \int_{c_{T,h}}^{\infty}\left\{P(y_t y_{t-h} - E[y_t y_{t-h}] \ge u) - P(y_t y_{t-h} - E[y_t y_{t-h}] \le -u)\right\}du\\
&\qquad= \int_{c_{T,h}}^{\infty}\left(d_{h,2}u^{-\kappa_{h,2}} - d_{h,1}u^{-\kappa_{h,1}}\right)du \sim \frac{d_{h,2}c_{T,h}^{1-\kappa_{h,2}}}{\kappa_{h,2}-1} - \frac{d_{h,1}c_{T,h}^{1-\kappa_{h,1}}}{\kappa_{h,1}-1}.
\end{aligned}$$
Therefore:

$$\begin{aligned}
E[\hat\gamma^{*}_{T,h}(c_{T,h})] &= \frac{T-h}{T}\frac{T-h}{T-h-k_{T,h}}E\left[y_t y_{t-h} I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c_{T,h})\right]\\
&= \frac{T-h}{T}E[y_t y_{t-h}] - \frac{T-h}{T}\frac{T-h}{T-h-k_{T,h}}E\left[(y_t y_{t-h} - E[y_t y_{t-h}])\, I(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h})\right]\\
&\sim \frac{T-h}{T}E[y_t y_{t-h}] - \frac{T-h}{T}\frac{T-h}{T-h-k_{T,h}}\left(\frac{d_{h,2}c_{T,h}^{1-\kappa_{h,2}}}{\kappa_{h,2}-1} - \frac{d_{h,1}c_{T,h}^{1-\kappa_{h,1}}}{\kappa_{h,1}-1}\right).
\end{aligned} \tag{14}$$
Asymptotic approximations (13) and (14) lead to asymptotically negligible approximation errors under second order power law Assumption B′, below, cf. Peng (2001) and Hill (2013). Thus, (13) and (14) implicitly define asymptotic approximations of the biases −(E[γ̂*_{T,h}(c_{T,h})] − (T − h)T^{−1}E[y_t y_{t−h}]):
$$R_{T,h} \equiv \begin{cases}
\dfrac{1}{\kappa_{h,0}-1}\dfrac{k_{T,h}}{T}c_{T,h} & \text{if } P(y_t y_{t-h} > 0) = 1 \text{ (e.g. } h = 0\text{)}\\[2ex]
\dfrac{T-h}{T}\dfrac{T-h}{T-h-k_{T,h}}\left(\dfrac{d_{h,2}c_{T,h}^{1-\kappa_{h,2}}}{\kappa_{h,2}-1} - \dfrac{d_{h,1}c_{T,h}^{1-\kappa_{h,1}}}{\kappa_{h,1}-1}\right) & \text{else.}
\end{cases}$$
The bias terms R_{T,h} are easily estimated. First, c_{T,h} is estimated with Ŷ^{(0)}_{h,(k_{T,h})}. Second, for the tail exponents d_{h,i} and κ_{h,i}, define left, right and two-tailed observations

$$\hat Y^{(0)}_{h,t} \equiv |y_t y_{t-h} - \tilde\gamma_{T,h}|,\quad \hat Y^{(1)}_{h,t} \equiv -(y_t y_{t-h} - \tilde\gamma_{T,h})\, I(y_t y_{t-h} - \tilde\gamma_{T,h} < 0) \quad\text{and}\quad \hat Y^{(2)}_{h,t} \equiv (y_t y_{t-h} - \tilde\gamma_{T,h})\, I(y_t y_{t-h} - \tilde\gamma_{T,h} \ge 0),$$
and let {m_{T,h}} be an intermediate order sequence: m_{T,h} ∈ {1, ..., T − h}, m_{T,h} → ∞ and m_{T,h}/(T − h) → 0.
Many estimators are possible, but we use Hill (1975)'s estimator of κ_{h,i} and Hall (1982)'s estimator of d_{h,i} due to their popularity and available limit theory (see Hill, 2009, 2010, 2014b):

$$\hat\kappa^{-1}_{h,i,m_{T,h}} \equiv \frac{1}{m_{T,h}}\sum_{j=1}^{m_{T,h}} \ln\frac{\hat Y^{(i)}_{h,(j)}}{\hat Y^{(i)}_{h,(m_{T,h}+1)}} \quad\text{and}\quad \hat d_{h,i,m_{T,h}} \equiv \frac{m_{T,h}}{T-h}\left(\hat Y^{(i)}_{h,(m_{T,h})}\right)^{\hat\kappa_{h,i,m_{T,h}}} \quad\text{for } i = 0, 1, 2.$$
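Both estimators are simple functions of the descending order statistics of the tail observations. A minimal Python sketch (the function name `hill_hall` is ours, not from the paper):

```python
import math

def hill_hall(obs, m):
    """Hill (1975) tail index and Hall (1982) scale estimators.

    obs : nonnegative tail observations (e.g. |y_t*y_{t-h} - gamma_tilde|),
    m   : intermediate order (number of upper order statistics used).
    Returns (kappa_hat, d_hat) for the approximation P(Y > c) ~ d * c**(-kappa).
    """
    n = len(obs)
    y = sorted(obs, reverse=True)          # y[0] = Y_(1) >= y[1] = Y_(2) >= ...
    # Hill: 1/kappa_hat = (1/m) * sum_{j=1}^{m} ln(Y_(j) / Y_(m+1))
    inv_kappa = sum(math.log(y[j] / y[m]) for j in range(m)) / m
    kappa = 1.0 / inv_kappa
    # Hall: d_hat = (m/n) * Y_(m)**kappa_hat
    d = (m / n) * y[m - 1] ** kappa
    return kappa, d
```

Feeding in exact Pareto quantiles recovers the true index and scale closely, which is a handy sanity check before using the estimators on data.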
The bias-corrected tail-trimmed estimators are therefore

$$\hat\gamma^{(bc)}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) \equiv \hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) + \hat R_{T,h} \quad\text{for } h \ge 0, \tag{15}$$

where:

$$\hat R_{T,h} = \frac{1}{\hat\kappa_{h,0,m_{T,h}}-1}\frac{k_{T,h}}{T}\hat Y^{(0)}_{h,(k_{T,h})} \quad\text{if } P(y_t y_{t-h} > 0) = 1 \tag{16}$$

$$\hat R_{T,h} = \frac{T-h}{T}\frac{T-h}{T-h-k_{T,h}}\left(\frac{\hat d_{h,2,m_{T,h}}\,(\hat Y^{(0)}_{h,(k_{T,h})})^{1-\hat\kappa_{h,2,m_{T,h}}}}{\hat\kappa_{h,2,m_{T,h}}-1} - \frac{\hat d_{h,1,m_{T,h}}\,(\hat Y^{(0)}_{h,(k_{T,h})})^{1-\hat\kappa_{h,1,m_{T,h}}}}{\hat\kappa_{h,1,m_{T,h}}-1}\right) \quad\text{else.} \tag{17}$$
Finally, if the tail indices are known to be equivalent, κ_{h,0} = κ_{h,1} = κ_{h,2}, then we can replace the tail-specific κ̂_{h,i,m_{T,h}} with the two-tailed κ̂_{h,0,m_{T,h}}, since the latter is computed from a larger sample and therefore will be sharper with higher probability. We then have:

$$\hat R_{T,h} = \frac{T-h}{T}\frac{T-h}{T-h-k_{T,h}}\left(\frac{\hat d_{h,2,m_{T,h}} - \hat d_{h,1,m_{T,h}}}{\hat\kappa_{h,0,m_{T,h}}-1}\right)\left(\hat Y^{(0)}_{h,(k_{T,h})}\right)^{1-\hat\kappa_{h,0,m_{T,h}}} \quad\text{if } P(y_t y_{t-h} > 0) < 1. \tag{18}$$
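Equations (16) and (18) combine the order statistic Ŷ^{(0)}_{h,(k_{T,h})} with Hill/Hall-type tail estimates. The following self-contained Python sketch (function names are ours) implements the common-tail-index version: eq. (16) for almost surely positive products and eq. (18) otherwise. It assumes enough observations fall in each tail (more than m on each side in the two-sided case).

```python
import math

def hill_kappa(descending, m):
    """Hill (1975) estimator from descending order statistics."""
    return m / sum(math.log(descending[j] / descending[m]) for j in range(m))

def hall_d(descending, m, kappa, n):
    """Hall (1982) scale estimator for P(Y > c) ~ d * c**(-kappa)."""
    return (m / n) * descending[m - 1] ** kappa

def r_hat(prod, T, h, k, m, positive=False):
    """Estimated bias R_hat of eq. (16) (positive products) or eq. (18)
    (two-sided products, common tail index kappa_{h,0}).

    prod : centered products y_t*y_{t-h} - gamma_tilde, length T - h
    k    : trimming fractile k_{T,h};  m : tail fractile m_{T,h} (m >> k)
    """
    n = T - h
    absd = sorted((abs(x) for x in prod), reverse=True)
    y_k = absd[k - 1]                         # order statistic estimating c_{T,h}
    kappa0 = hill_kappa(absd, m)              # two-tailed Hill estimate
    if positive:                              # eq. (16)
        return y_k * k / ((kappa0 - 1.0) * T)
    right = sorted((x for x in prod if x > 0), reverse=True)
    left = sorted((-x for x in prod if x < 0), reverse=True)
    d2 = hall_d(right, m, kappa0, n)          # right-tail scale d_{h,2}
    d1 = hall_d(left, m, kappa0, n)           # left-tail scale d_{h,1}
    pre = (n / T) * (n / (n - k))
    return pre * (d2 - d1) / (kappa0 - 1.0) * y_k ** (1.0 - kappa0)
```

As the text predicts, an exactly symmetric sample yields d̂₂ = d̂₁ and hence zero estimated bias in the two-sided case.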
14
3.2
Optimal Bias Correction
As discussed in Hill (2013), a shortcoming of a bias-corrected estimator like γ̂^{(bc)}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) is its reliance on estimates κ̂_{h,i,m_{T,h}} and d̂_{h,i,m_{T,h}} based on one fractile m_{T,h}, while a variety of m_{T,h}'s can be used. Further, R̂_{T,h} is well defined only when κ̂_{h,i,m_{T,h}} > 1, and it seems desirable to choose m_{T,h} such that γ̂^{(bc)}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) is close to the untrimmed estimator, e.g. (1/T)Σ_{t=h+1}^{T} y_t y_{t−h}.

A simple technique for generating a window of fractiles is to pick some 0 < ξ̲ < ξ̄ < ∞ and generate a fractile function:

$$m_{T,h}(\xi) = [\xi m_{T,h}] \quad\text{where } \xi \in \Upsilon = [\underline\xi, \bar\xi].$$
Let R̂_{T,h}(ξ) be the bias estimator computed with m_{T,h}(ξ). The new bias corrected estimator is

$$\hat\gamma^{**}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) \equiv \hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) + \hat R_{T,h}(\hat\xi_{T,h}) \tag{19}$$

where ξ̂_{T,h} optimally places γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) near (1/T)Σ_{t=h+1}^{T} y_t y_{t−h} such that κ̂_{h,i,m_{T,h}} > 1:

$$\hat\xi_{T,h} = \arg\min_{\xi\in\Upsilon^{*}_{h}}\left|\hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) + \hat R_{T,h}(\xi) - \frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\right| \tag{20}$$

$$\text{where } \Upsilon^{*}_{0} \equiv \left\{\xi \in \Upsilon : \hat\kappa_{0,0,m_{T,0}(\xi)} > 1\right\} \text{ and } \Upsilon^{*}_{h} \equiv \left\{\xi \in \Upsilon : \hat\kappa_{h,i,m_{T,h}(\xi)} > 1 \text{ for } i = 1, 2\right\}. \tag{21}$$
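In practice (20)-(21) amount to a one-dimensional grid search. A minimal Python sketch of that search (the callables `r_hat_of_m` and `kappa_of_m` are our abstractions mapping a fractile m to R̂ and κ̂; they stand in for whatever bias and Hill estimators are used):

```python
def xi_hat(gamma_star, gamma_full, r_hat_of_m, kappa_of_m, m_base, xi_grid):
    """Grid-search version of eq. (20)-(21): pick xi so the bias-corrected
    moment gamma_star + R_hat(m(xi)) lands closest to the untrimmed moment
    gamma_full, subject to the Hill estimate at m(xi) exceeding one
    (otherwise R_hat is ill defined)."""
    best_xi, best_gap = None, float("inf")
    for xi in xi_grid:
        m = int(xi * m_base)                  # m_{T,h}(xi) = [xi * m_{T,h}]
        if m < 1 or kappa_of_m(m) <= 1.0:     # restriction (21)
            continue
        gap = abs(gamma_star + r_hat_of_m(m) - gamma_full)
        if gap < best_gap:
            best_gap, best_xi = gap, xi
    return best_xi                            # None if kappa_hat <= 1 on the whole grid
```

Returning `None` when no ξ is feasible mirrors the nonexistence case discussed next.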
The set of minimizing ξ̂_{T,h} may contain more than one value, in which case ξ̂_{T,h} is taken to be one such element. This is irrelevant since κ̂_{h,i,m_{T,h}(ξ)} and d̂_{h,i,m_{T,h}(ξ)} will not affect our bias estimator uniformly on Υ, asymptotically with probability one, as long as m_{T,h}/k_{T,h} → ∞. Conversely, if κ̂_{h,i,m_{T,h}(ξ)} ≤ 1 on Υ then a solution does not exist. This may occur when the assumption E[y_t²] < ∞ fails, or E[y_t²] < ∞ holds but h is large enough that there are too few tail observations of {y_t y_{t−h}}_{t=h+1}^{T} from which to obtain a good estimate of κ_{h,i}.⁴ We find in simulation experiments that if T = 100 then h > 80 can lead to highly volatile estimates κ̂_{h,i,m_{T,h}(ξ)}, in particular κ̂_{h,i,m_{T,h}(ξ)} ≤ 1 over a large window of ξ, when κ_{h,i} is close to, but larger than, 1.
In practice, sampling error can render γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) farther from (1/T)Σ_{t=h+1}^{T} y_t y_{t−h} than the uncorrected estimator γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}), especially when γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) already has negligible small sample bias (e.g. y_t y_{t−h} has a symmetric distribution). We suggest using whichever estimator, γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) or γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}), is closest to (1/T)Σ_{t=h+1}^{T} y_t y_{t−h}:
$$\begin{aligned}
\hat\gamma^{(obc)}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) &= \hat\gamma^{**}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) \times I\left(\left|\hat\gamma^{**}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) - \frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\right| < \left|\hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) - \frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\right|\right) \\
&\quad+ \hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) \times I\left(\left|\hat\gamma^{**}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) - \frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\right| \ge \left|\hat\gamma^{*}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) - \frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\right|\right).
\end{aligned} \tag{22}$$

⁴ The problem of tail inference based on point estimates like κ̂_{h,i,m_{T,h}} is well documented. This accounts for the use of so-called window plots, or Hill-plots over a window of m_{T,h}'s, e.g. Drees, de Haan, and Resnick (2000).
This is merely a small sample correction since, e.g., γ̂^{(obc)}_{T,0}(Ŷ^{(0)}_{0,(k_{T,0})}) = γ̂^{**}_{T,0}(Ŷ^{(0)}_{0,(k_{T,0})}) as T → ∞ with probability approaching one due to negative bias. The optimal bias-corrected FD-QML estimator is

$$\hat\theta^{(obc)}_{T} = \arg\min_{\theta\in\Theta}\sum_{j\in F}\left\{\ln f(\lambda_j, \theta) + \frac{\hat I^{(obc)}_{T}(\lambda_j)}{f(\lambda_j, \theta)}\right\}$$

where

$$\hat I^{(obc)}_{T}(\lambda) = \frac{1}{2\pi}\left(\hat\gamma^{(obc)}_{T,0}(\hat Y^{(0)}_{0,(k_{T,0})}) + 2\sum_{h=1}^{b_T}\hat\gamma^{(obc)}_{T,h}(\hat Y^{(0)}_{h,(k_{T,h})}) \times \cos(\lambda h)\right). \tag{23}$$
As long as κ̂_{h,i,m_{T,h}} and d̂_{h,i,m_{T,h}} in R̂_{T,h} are m_{T,h}^{1/2}-convergent, and we use more tail observations for tail index estimation in the sense m_{T,h}/k_{T,h} → ∞, then κ̂_{h,i,m_{T,h}} and d̂_{h,i,m_{T,h}} do not affect the limiting distribution of γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}). Indeed, since m_{T,h}(ξ) = [ξm_{T,h}] also satisfies m_{T,h}(ξ)/k_{T,h} → ∞ ∀ξ ∈ Υ, we can always choose ξ from any subset Υ* ⊆ Υ and still the tail index estimators will not impact asymptotics. See Hill (2013, Theorem 2.2) for discussion.
The following summarizes the required second order power law property, and bounds for m_{T,h}.

Assumption B′ (second order power law and fractile rates). P(y_t y_{t−h} − γ̃_h ≤ −c) = d_{h,1}c^{−κ_{h,1}}(1 + O(r_1(c))) and P(y_t y_{t−h} − γ̃_h ≥ c) = d_{h,2}c^{−κ_{h,2}}(1 + O(r_2(c))), where d_{h,i} > 0, κ_{h,i} > 1, and r_i are measurable functions. Let e_{h,i} > 0, e_{h,0} ≡ min{e_{h,1}, e_{h,2}} and κ_{h,0} ≡ min{κ_{h,1}, κ_{h,2}}. Then m_{T,h} ∈ {1, ..., T − h} and m_{T,h} → ∞, and either r_i(c) = c^{−e_{h,i}} and m_{T,h} = o((T − h)^{2e_{h,0}/(2e_{h,0}+κ_{h,0})}), or r_i(c) = ln(c)^{−e_{h,i}} and m_{T,h} = o(ln(T − h)^{2e_{h,0}}). Finally, m_{T,h}/k_{T,h} → ∞.
Remark 14 The two cases r_i(c) = c^{−e_{h,i}} and r_i(c) = ln(c)^{−e_{h,i}} fall under second order properties SR1 and SR2 in Goldie and Smith (1987), while other SR1 and SR2 cases are possible here. See also Haeusler and Teugels (1985), Hsing (1991) and Hill (2010). It is well known that as tails deviate from a Pareto law, m_{T,h}^{1/2}-convergence of Hill (1975)'s estimator requires observations to come from farther out in the tails, hence m_{T,h} must grow slower. See especially Haeusler and Teugels (1985, Section 5). Fractiles that satisfy Assumptions B′ and C for either case, any exponents {e_h, κ_{h,0}} and any lag h = 0, ..., b_T, include m_{T,h} = [δ_{m,h} ln(ln(T))] and k_{T,h} = [δ_{k,h}(ln(ln(T)))^{1−ι}], where ι > 0 is tiny and δ_{m,h}, δ_{k,h} > 0.
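These rates are extremely slow. A quick numerical check (the constants δ_{m,h} = 8, δ_{k,h} = 1 and ι = .1 are arbitrary illustrative choices of ours) shows how glacially the example fractiles grow while m_{T,h}/k_{T,h} still diverges:

```python
import math

def fractiles(T, delta_m=8.0, delta_k=1.0, iota=0.1):
    """Example fractiles from Remark 14: m = [dm*ln(ln T)], k = [dk*(ln(ln T))^(1-iota)].
    The constants dm, dk, iota are arbitrary illustrative choices."""
    ll = math.log(math.log(T))
    m = max(1, int(delta_m * ll))
    k = max(1, int(delta_k * ll ** (1.0 - iota)))
    return k, m

# k and m creep upward while m/k ~ (dm/dk)*(ln ln T)**iota still diverges (very slowly):
table = [(T, *fractiles(T)) for T in (10**2, 10**4, 10**8, 10**16, 10**32)]
```

Even at T = 10³², only k = 3 observations are trimmed per lag under these constants.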
Define

$$\mathbb{I}^{*}_{T,h,t} \equiv I(|y_t y_{t-h} - \tilde\gamma_h| \ge c_{T,h}) - P(|y_t y_{t-h} - \tilde\gamma_h| \ge c_{T,h}) \quad\text{and}\quad \mathbb{I}^{*}_{T,h} \equiv \frac{1}{\kappa_{h,0}}\frac{1}{k_{T,h}^{1/2}}\sum_{t=1}^{T}\mathbb{I}^{*}_{T,h,t}$$

$$D_{T,0} \equiv \frac{1}{\kappa_{0,0}-1}\frac{k_{T,0}^{1/2}}{T-k_{T,0}}c_{T,0} \quad\text{and}\quad D_{T,h} \equiv \frac{1}{k_{T,h}^{1/2}}\left(d_{h,1}c_{T,h}^{1-\kappa_{h,1}} - d_{h,2}c_{T,h}^{1-\kappa_{h,2}}\right) \text{ for } h \ne 0$$

and

$$\tilde Z_T \equiv T^{1/2}\frac{1}{2\pi}\int_{-\pi}^{\pi}\left(I^{*}_{T}(\lambda) - E[I^{*}_{T}(\lambda)]\right)\varpi(\lambda)\,d\lambda + \frac{1}{2\pi}\sum_{h=-b_T}^{b_T}\varpi_h D_{T,h}\mathbb{I}^{*}_{T,h}$$

$$\tilde S_T \equiv E\left[\tilde Z_T \tilde Z_T'\right] \quad\text{and}\quad \tilde V_T = \Omega^{-1}\tilde S_T\Omega^{-1}.$$
Theorem 3.1 Under Assumptions A, B′ and C, T^{1/2}Ṽ_T^{−1/2}(θ̂_T^{(obc)} − θ^0) →_d N(0, I_k) where Ṽ_T = V_T(1 + o(1)) when κ ≥ 4 and Ṽ_T = V_T(1 + O(1)) when κ ∈ (2, 4).
Remark 15 In order to estimate S̃_T, a sample counterpart to Σ_{h=−b_T}^{b_T} ϖ_h D_{T,h}𝕀*_{T,h} is available by truncating the sum and using estimators of c_{T,h} and κ_{h,i}. Alternatively, a variety of subsampling techniques may be valid, although we do not present a formal theory here. See, for example, Kirch and Politis (2011) and Kreiss and Paparoditis (2012) for theory and references.
Remark 16 The order statistic Ŷ^{(0)}_{h,(k_{T,h})} is used both for trimming in γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) and for the bias estimator R̂_{T,h}, hence the FD-QML scale Ṽ_T is no longer V_T in the heavy tail case κ < 4. We show in Appendix A that, for asymptotic analysis, we can replace γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) with γ̂*_{T,h}(c_{T,h}) and write R̂_{T,h} = R_{T,h} + D_{T,h} × k_{T,h}^{1/2}(Ŷ^{(0)}_{h,(k_{T,h})}/c_{T,h} − 1) × (1 + o_p(1)), while also k_{T,h}^{1/2}(Ŷ^{(0)}_{h,(k_{T,h})}/c_{T,h} − 1) = 𝕀*_{T,h}(1 + o_p(1)). Hence γ̂*_{T,h}(c_{T,h}) = K_T Σ_{t=h+1}^{T} y_t y_{t−h}I(|y_t y_{t−h} − γ̃_h| < c_{T,h}) for some function K_T of T, and R̂_{T,h} − R_{T,h} = D_{T,h} × 𝕀*_{T,h} × (1 + o_p(1)), are both partial sum functions of the tail indicators I(|y_t y_{t−h} − γ̃_h| < c_{T,h}) and I(|y_t y_{t−h} − γ̃_h| ≥ c_{T,h}). Since I(|y_t y_{t−h} − γ̃_h| ≥ c_{T,h}) is degenerate as T → ∞ with a relatively fast rate when κ ≥ 4, and y_t y_{t−h}I(|y_t y_{t−h} − γ̃_h| < c_{T,h}) is nondegenerate and relatively thin tailed when κ ≥ 4, it can be shown that Σ_{h=−b_T}^{b_T} ϖ_h D_{T,h}𝕀*_{T,h} does not contribute to the variance of Z̃_T when κ ≥ 4, thus S̃_T ∼ S_T (cf. Hill, 2013, proof of Theorem 2.2). Finally, Ṽ_T = V_T(1 + O(1)) implies θ̂_T^{(obc)} and θ̂*_T have the same convergence rates.
The next result mirrors Corollary 2.4 in the linear thin tail case. Here we impose second order power law Assumption B′ to safely handle tail exponent estimator asymptotics. See Section 3.3 for removal of the power law assumption.

Corollary 3.2 In addition to Assumptions A, B′, and C.2, let y_t satisfy (10), let k_{T,h} → ∞ and k_{T,h}/(T − h) = o(1), and assume E[ε_t|ℑ_{t−1}] = 0 a.s., E[ε_t²|ℑ_{t−1}] = σ² a.s., E[ε_t³|ℑ_{t−1}] = s a.s., and E[ε_t⁴] = K < ∞. Then T^{1/2}(θ̂_T^{(obc)} − θ^0) →_d N(0, V) where V = Ω^{−1}(2Ω + Π)Ω^{−1} and Ω and Π are defined in (11).
3.3 Bias Correction: Thin Tails

Bias R_{T,h} in (13) and (14) follows from Karamata theory under power law Assumption B. If tails decay faster than a power law function, for example exponentially fast, then Karamata's Theorem does not hold, and (13) and (14) are not valid. We now show how R̂_{T,h} behaves as T increases for exponential tails:

$$P(|y_t y_{t-h} - \tilde\gamma_h| \ge c) = \vartheta_h \exp\{-\zeta_h c^{\delta_h}\} \quad\text{where } \vartheta_h, \zeta_h, \delta_h > 0. \tag{24}$$
The following extends to other thin tailed distributions with obvious changes to the derivations. Dominated convergence and (24) imply we do not have an asymptotic bias problem if T^{1/2}R̂_{T,h} →_p 0. Logically, the intermediate order statistic Ŷ^{(0)}_{h,(k_{T,h})} in R̂_{T,h} is well behaved, in particular Ŷ^{(0)}_{h,(k_{T,h})}/c_{T,h} = 1 + O_p(1/k_{T,h}^{1/2}) provided in the exponential case k_{T,h}/ln(T) → 0, while Hill (1975)'s estimator κ̂_{h,i,m_{T,h}} can be easily shown to diverge in probability. Finally, we only need m_{T,h} → ∞ and m_{T,h}/(T − h) → 0 because we no longer impose power law Assumption B′, and we no longer need m_{T,h}/k_{T,h} → ∞ because tails decay so rapidly that the tail estimators in the bias terms do not impact asymptotics. We prove the next result in Hill and McCloskey (2014) since it is similar to Hill (2013, Theorem 2.3).
Theorem 3.3 Let Assumption A hold, assume (24) applies for each h, and let k_{T,h} → ∞, k_{T,h}/ln(T) → 0, m_{T,h} → ∞, and m_{T,h}/(T − h) → 0. Then T^{1/2}R̂_{T,h} →_p 0, hence T^{1/2}V_T^{−1/2}(θ̂_T^{(obc)} − θ^0) →_d N(0, I_k). Further, if the conditions of Corollary 3.2 hold then T^{1/2}(θ̂_T^{(obc)} − θ^0) →_d N(0, V) where V = Ω^{−1}(2Ω + Π)Ω^{−1} and Ω and Π are defined in (11).
4 Fractile Selection
We now discuss fractile selection from both theoretical and practical perspectives.
4.1 Mean-Squared-Error Minimization
In Section B of the supplemental material, Hill and McCloskey (2014), we characterize the minimum mean-squared-error [mse] trimming fractile k_{T,h} for the special case where y_t y_{t−h} has a symmetric distribution for h ≠ 0. In this case the FD-QML bias reduces to just B_T = Ω^{−1}(2π)^{−1}ϖ_0{E[y_t²I(y_t² < c_{T,0})] − E[y_t²]}, which is easily analyzed. The general case for B_T is far more complicated, and possibly intractable, without a specific model and known distribution for y_t y_{t−h}.
By Theorem B.1 in Hill and McCloskey (2014), k_{T,h} = a for some constant a ≥ 0 minimizes the mse as T → ∞, where a = 0 if κ = 4. Bounded k_{T,h} follows from the fact that bias dominates the variance in their contribution to the mse, and the bias E[y_t²I(y_t² < c_{T,0})] − E[y_t²] is reduced with a smaller amount of trimming. Obviously a constant k_{T,h} is not allowed here, precisely to ensure asymptotic normality when tails are heavy. But this does clearly suggest k_{T,h} → ∞ very slowly will keep the bias in θ̂*_{i,T} very small; and as a practical matter that we discuss below, it is easier to correct for small bias.
The conclusions of Theorem B.1 in Hill and McCloskey (2014) are qualitatively similar to those in Chaudhuri and Hill (2014) for higher order bias minimization of a bias corrected tail-trimmed mean. Chaudhuri and Hill (2014)'s bias representation exploits independence, and higher order bias is smaller when trimming is reduced and more tail observations are used for bias estimation. Bias in the tail-trimmed FD-QML estimator is merely a weighted average of biases from tail-trimmed cross-moments, and in the iid case higher order bias for each bias-corrected tail-trimmed cross-moment has the same structure as in Chaudhuri and Hill (2014). Bias derivations for dependent data will be similar under the Assumption A dependence properties.
4.2 Practical Concerns for Fractile Selection
In small samples k_{T,h} plays three roles in estimator bias and bias correction. First, the trimmed second moment γ̂*_{T,0}(Ŷ^{(0)}_{0,(k_{T,0})}) and possibly the cross-moments γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) for h ≥ 1 are biased, where the bias in general is augmented for larger k_{T,h} (cf. Chaudhuri and Hill, 2014). Second, Î^{(obc)}_T(λ) uses sample moments γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) for only h = 0, ..., b_T, where T − b_T → ∞ since robustness requires k_{T,h} ∈ {1, ..., T − h} and k_{T,h} → ∞ for all h. The restriction k_{T,b_T} ∈ {1, ..., T − b_T} implies in general b_T must be small to accommodate a large k_{T,h} for each h, while a small b_T < T − 1 can distort FD-QML estimation. In both cases, a large k_{T,h} can generate substantial bias in θ̂*_T. Third, the trimmed portion |E[y_t y_{t−h}I(|y_t y_{t−h} − γ̃_h| < c_{T,h})] − E[y_t y_{t−h}]| is larger when k_{T,h} is large, provided bias exists, and a large gap is more difficult to approximate and therefore estimate using Karamata theory. Thus, when k_{T,h} is large the non-bias corrected γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) and bias corrected γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) can be poor approximations to E[y_t y_{t−h}], rendering θ̂_T^{(obc)} less sharp. In general, then, a small and slowly increasing k_{T,h} will improve small sample performance, and be valid based on Theorems 3.1 and 3.3.
Moreover, recall that kT,h → ∞ at a slowly varying rate is required to ensure mean-centering does
not impact asymptotics when yt is heavy tailed.
The conclusion of slow k_{T,h} → ∞ is contrary to the finding of Theorem 2.2, that if κ < 4 then the rate of convergence is higher when k_{T,h} → ∞ faster. That result, however, does not bear on small and large sample bias. In terms of inference on θ^0, clearly small bias is preferred, hence slow k_{T,h} → ∞.
5 Simulation Study

We now investigate the small sample properties of the standard and robust FD-QML estimators. We draw samples {y_t}_{t=1}^{T} of size T ∈ {100, 250, 500, 1000} from AR(1) and GARCH(1,1) models. In each case 10,000 samples of size 2T are drawn, and we retain the last T observations for analysis. The AR(1) models are y_t = φ_0y_{t−1} + ε_t with φ_0 ∈ {.00, .75, .90}, where ε_t is iid standard normal, or symmetric Pareto with distribution P(ε_t < −c) = P(ε_t > c) = .5(1 + c)^{−κ} and tail index κ ∈ {2.25, 4.5}, standardized such that σ_0² ≡ E[ε_t²] = 1. In this setup, it is well known that y_t belongs to the same distribution class as ε_t (e.g. Brockwell and Cline, 1985). We initialize the draw with y_1 = ε_1. The spectrum is f(λ, θ^0) = (2π)^{−1}σ_0²|1 − φ_0e^{−iλ}|^{−2} where θ^0 = (φ_0, σ_0²)′, and we optimize the FD-QML criteria on Θ ≡ [−.999, .999] × [0, 100].⁵
⁵ We use Matlab R2014a, where optimization is performed by the fmincon routine.
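Drawing the standardized symmetric Pareto innovations is straightforward by inverse transform: since P(|ε| > c) = (1 + c)^{−κ}, the magnitude is |ε| = U^{−1/κ} − 1 for uniform U, and a short Pareto-moment calculation gives E[ε²] = κ/(κ−2) − 2κ/(κ−1) + 1 = 2/((κ − 1)(κ − 2)) for κ > 2, which supplies the standardization. A Python sketch:

```python
import random

def sym_pareto(kappa, n, seed=0):
    """n iid draws with P(e < -c) = P(e > c) = .5*(1+c)**(-kappa), rescaled
    to unit variance (requires kappa > 2; rescaling preserves the tail index)."""
    rng = random.Random(seed)
    var = 2.0 / ((kappa - 1.0) * (kappa - 2.0))      # E[e^2] before rescaling
    scale = var ** -0.5
    draws = []
    for _ in range(n):
        mag = rng.random() ** (-1.0 / kappa) - 1.0   # inverse of P(|e| > c) = (1+c)^-kappa
        sign = 1.0 if rng.random() < 0.5 else -1.0
        draws.append(sign * mag * scale)
    return draws
```

These draws can then feed the AR(1) recursion y_t = φ_0y_{t−1} + ε_t directly.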
The GARCH model is x_t = σ_tε_t where σ_t² = ω_0 + α_0x_{t−1}² + β_0σ_{t−1}², ω_0 = 1, and [α_0, β_0] is [.3, .6], [.2, .7] or [.05, .9], and ε_t is iid standard normal. We initialize the draw with σ_1² = ω_0. We use y_t ≡ x_t², which has an ARMA(1,1) representation y_t = ω_0 + (α_0 + β_0)y_{t−1} − β_0u_{t−1} + u_t, where u_t = σ_t²(ε_t² − 1) is a martingale difference, and σ_0² ≡ E[u_t²]. Thus, the spectrum for y_t is f(λ, θ^0) = (2π)^{−1}σ_0²|1 − β_0e^{−iλ}|² × |1 − (α_0 + β_0)e^{−iλ}|^{−2} where θ^0 = (α_0, β_0, σ_0²)′. Note that P(|y_t| > c) ∼ dc^{−κ_y} with tail index κ_y ≈ 2.05 when [α_0, β_0] = [.3, .6], κ_y ≈ 3.02 when [α_0, β_0] = [.2, .7], and κ_y ≈ 4.05 when [α_0, β_0] = [.05, .9].⁶ Define Θ̃ ≡ [10^{−10}, 100] × [0, .999] × [0, .999]. We optimize the FD-QML criteria on the subset Θ ≡ {θ ∈ Θ̃ : α + β ≤ .999}.
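The tail indices above come from solving the moment condition E|α_0ε_t² + β_0|^{κ_x/2} = 1 by simulation (see footnote 6). A minimal Python version for the designs where the required moments are modest, using bisection rather than the footnote's grid (valid because the moment is below one to the left of the root and above it to the right):

```python
import random

def garch_tail_index(alpha, beta, n=100_000, seed=0, lo=0.01, hi=20.0):
    """Monte Carlo solution kappa_x of E|alpha*e^2 + beta|**(kappa/2) = 1
    (the Basrak-Davis-Mikosch condition for the tail index of x_t)."""
    rng = random.Random(seed)
    z = [alpha * rng.gauss(0.0, 1.0) ** 2 + beta for _ in range(n)]

    def moment_minus_one(kappa):
        return sum(v ** (kappa / 2.0) for v in z) / n - 1.0

    for _ in range(40):          # bisect on the sign of the centered moment
        mid = 0.5 * (lo + hi)
        if moment_minus_one(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For [α_0, β_0] = [.3, .6] the solution is near 4.1 and for [.2, .7] near 6, matching the values reported in footnote 6.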
We use optimal bias correction since, by simulation experiments not reported here, we otherwise find that tail-trimmed FD-QML estimates are strongly negatively biased. In the AR case we use centering with γ̃_{T,0} = 0 and γ̃_{T,h} = (T − h)^{−1}Σ_{t=h+1}^{T} y_t y_{t−h} at lags h ≥ 1, and in the GARCH case γ̃_{T,h} = 0 ∀h ≥ 0.
Let ι = 10^{−10}. We use a trimming fractile k_{T,h} = [.2(ln(T))^{1−ι}] for each lag h = 0, ..., b_T with bandwidth b_T = [T^{.95}] ∈ {79, 190, 366, 708}, and bias estimation fractile m_{T,h} = [8 ln(T)] and fractile function m_{T,h}(ξ) = [ξm_{T,h}] over ξ ∈ [.1, 2]. Using lags b_T = [T^δ] for δ > .95 can lead to a breakdown in bias estimation for T ≤ 250 and h ≈ b_T, because the sample {y_t y_{t−h} − γ̃_{T,h}}_{t=h+1}^{T} may not have a tail index estimate κ̂_{h,0,m_{T,h}} > 1 for any m_{T,h}(ξ) due to the sample's small size T − h. As long as second order power law Assumption B′ holds, then {k_{T,h}, m_{T,h}} is valid for asymptotically unbiased estimation in all cases by Theorems 3.1 and 3.3.
We compute the optimally bias-corrected γ̂^{(obc)}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) in (22) based on the potentially biased γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}), and the bias-corrected γ̂^{**}_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) = γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_{T,h})}) + R̂_{T,h}(ξ̂_{T,h}) in (19). In this study, y_t y_{t−h} from the AR model have symmetric tails for h > 0, hence R̂_{T,0}(ξ̂_{T,0}) is computed from (16) and all other R̂_{T,h}(ξ̂_{T,h}) from (18). We compute all R̂_{T,h}(ξ̂_{T,h}) from (16) for the GARCH model. The optimal tuning parameter ξ̂_{T,h} is computed as in (20) with the set restriction (21).
Trimming very few y_t y_{t−h} promotes small trimmed moment bias that is easily corrected if many tail observations are used for bias estimation, and suffices to generate a robust FD-QML estimator that is approximately normally distributed. In this study, for example, for T ∈ {100, 250} the computed fractiles are k_{T,h} = {1, 1} and m_{T,h} = {37, 44}, with ranges m_{T,h}(ξ) ∈ {[4, 74], [4, 88]}. Trimming just one observation in samples of size T ≤ 250 corrects for heavy tails, while having available substantially more observations for tail exponent estimation (e.g. m_{T,h}(ξ) ∈ [4, 74]) leads to sharp bias correction.
Let θ̂^{(r)}_{T,i} denote the rth sample estimate of θ^0_i for any estimator. We report the simulation median, the bias (1/R)Σ_{r=1}^{R}(θ̂^{(r)}_{T,i} − θ^0_i), and the root-mean-squared-error {(1/R)Σ_{r=1}^{R}(θ̂^{(r)}_{T,i} − θ^0_i)²}^{1/2}, where R = 10,000. We then compute the standardized variable z^{(r)}_{T,i} ≡ (θ̂^{(r)}_{T,i} − θ^0_i)/s_{T,i} with simulation variance s²_{T,i} ≡ (1/R)Σ_{r=1}^{R}(θ̂^{(r)}_{T,i} − (1/R)Σ_{j=1}^{R}θ̂^{(j)}_{T,i})², and perform a Kolmogorov-Smirnov test of standard normality on the simulation sample {z^{(r)}_{T,i}}_{r=1}^{R}. We report the KS statistic divided by its 5% critical value: values greater than unity are evidence against normality.

⁶ The GARCH process {x_t} satisfies P(|x_t| > c) = dc^{−κ_x}(1 + o(1)) where E|α_0ε_t² + β_0|^{κ_x/2} = 1 (cf. Basrak, Davis, and Mikosch, 2002), hence P(|y_t| > c) ∼ dc^{−κ_x/2}. We draw R = 100,000 iid standard normal ε_t and compute κ̂_x ≡ arg min_{κ∈K}|1/RΣ_{t=1}^{R}|α_0ε_t² + β_0|^{κ/2} − 1|, where K = {.001, .002, ..., 10}. We repeat this 10,000 times and find the median and mean κ̂_x are roughly 4.1 when [α_0, β_0] = [.3, .6], roughly 6.04 when [α_0, β_0] = [.2, .7], and roughly 8.1 when [α_0, β_0] = [.05, .9].
First, we demonstrate the accuracy of the optimally bias corrected tail-trimmed covariances for an AR(1) with φ_0 = .9, and for the GARCH models, in both cases with T = 100. In Figures 1 and 2 we plot the trimmed and untrimmed ratios γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0} and γ̂_{T,h}/γ̂_{T,0}, where γ̂_{T,h} ≡ (1/T)Σ_{t=h+1}^{T} y_t y_{t−h}, over lags h = 1, ..., 80. The plots are the 2.5%, 50% and 97.5% quantiles of γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0}, and the average difference γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0} − γ̂_{T,h}/γ̂_{T,0}, over the 10,000 samples. Clearly γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0} matches γ̂_{T,h}/γ̂_{T,0} exceptionally well, which demonstrates the sharpness of the optimally fitted bias estimator. Indeed, we plot γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0} and the difference separately because γ̂^{(obc)}_{T,h}/γ̂^{(obc)}_{T,0} and γ̂_{T,h}/γ̂_{T,0} cannot be visually distinguished for most h. The difference, however, is larger when tails are heavier.
Second, FD-QML estimates for the AR case [φ_0, σ_0²] = [.90, 1.00] are presented in Table 1. Since estimation results for φ_0 ∈ {.00, .75} are qualitatively similar, they are presented in the supplemental material (Hill and McCloskey, 2014, Tables E.1-E.2). The untrimmed estimator works well, but in most cases it is more biased and "less normal" than the trimmed estimator. In particular, the untrimmed estimator is non-normal in heavy tailed cases, while the trimmed estimator of φ_0 is closer to normal even for small T, and roughly normal when T ≥ 250. The robust and non-robust estimators of σ_0² tend to be farther from normal than the estimators of φ_0, especially in the heavy tailed case κ = 2.25, but the trimmed estimator is generally closer to normal, and has smaller bias in most cases.
Third, estimates for the GARCH model are presented in Table 2. It is known that time domain non-robust M-estimators for GARCH parameters, like QML, appear biased in small samples, while tail-trimming reduces the negative impact of large errors on bias, and improves approximate normality. See Hill (2014a) for evidence and references. The same essential patterns arise in the frequency domain: tail-trimmed FD-QML has smaller bias and is closer to normal in most cases when T ≥ 250. In general, sample sizes T ≥ 2000 are required to achieve the bias and KS statistic values that we obtain in AR model estimation. See Hill and McCloskey (2014, Table E.3) for the case T = 2000.
6 Conclusion
We extend recent advances in the literature on heavy tail estimation to the frequency domain. We
deliver asymptotically normal and unbiased estimators by negligibly transforming the data, and
using optimally fitted bias correction for transformed sample cross-moments. A simulation study
reveals that the tail-trimmed FD-QML estimator works well compared to the non-robust conventional
estimator: our estimator is closer to normal and in most cases exhibits smaller bias and dispersion;
while our robust moment estimators are quite sharp, showing that the optimally fitted bias estimator
developed in Hill (2013) works very well.
In principle our robust Whittle estimator is asymptotically normal even when the stationary sequence is not covariance stationary, and therefore does not have a spectral density. This possibility will allow for an extension of our methods to arbitrarily heavy tailed data as in Mikosch, Gadrich, Klüppelberg, and Adler (1995) for ARMA processes, and suggests a useful future line of research.
Finally, a higher order bias representation for the tail-trimmed moments computed in this paper will be useful for determining a trimming policy. Such expansions, however, are tricky under data dependence, and FD-QML bias involves many tail-trimmed cross-moments with a complicated structure. This, too, is beyond the scope of this paper.
A Appendix: Proofs of Main Results
The proofs of Theorems 2.1, 2.2 and 3.1 exploit several preliminary results. We first present asymptotic theory for tail trimmed and tail event processes. These are then used to prove lemmas that directly support the proofs of the main results. Finally, we prove the main theorems.

In order to avoid confusion, it is understood that h ≥ 0 unless otherwise noted. In preliminary lemmata we reduce the number of required cases by mean-centering I(|y_t y_{t−h} − (T − h)^{−1}Σ_{t=h+1}^{T} y_t y_{t−h}| < c) for all lags h. Thus:
$$\hat Y^{(0)}_{h,t} \equiv \left|y_t y_{t-h} - \frac{1}{T-h}\sum_{t=h+1}^{T} y_t y_{t-h}\right| \text{ for } h = 0, ..., b_T, \quad\text{and}\quad \hat Y^{(0)}_{h,(1)} \ge \hat Y^{(0)}_{h,(2)} \ge \cdots \ge \hat Y^{(0)}_{h,(T-h)}$$

$$\widehat{\gamma}^{\,*}_{T,h}(c) \equiv \frac{T}{T-k_T}\frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\, I\left(\left|y_t y_{t-h} - \frac{1}{T-h}\sum_{t=h+1}^{T} y_t y_{t-h}\right| < c\right)$$

$$\hat\gamma^{*}_{T,h}(c) \equiv \frac{T}{T-k_T}\frac{1}{T}\sum_{t=h+1}^{T} y_t y_{t-h}\, I\left(|y_t y_{t-h} - E[y_t y_{t-h}]| < c\right).$$

This does not reduce generality since γ̂*_{T,h}(c_{T,h}) is still biased at lag h = 0, and unbiased at lag h > 0 when y_t y_{t−h} has a symmetric distribution.

Further, since most proof details do not rely on k_{T,h} being a function of h, unless otherwise stated: k_{T,h} = k_T. Since k_{T,h} ∈ {1, ..., T − h}, the restriction k_{T,h} = k_T is synonymous with imposing k_{T,h} = k_T ∈ {1, ..., T − b_T}.
A.1 Asymptotic Theory for Tail and Tail-Trimmed Processes
We repeatedly use the following central limit theorem for an arbitrary triangular array {g_{T,t} : 1 ≤ t ≤ T}_{T≥1}. Write F^{t}_{T,s} ≡ σ(g_{T,τ} : s ≤ τ ≤ t), and define mixing coefficients α̃_h ≡ sup_{T∈N} sup_{1≤t≤T} sup_{A⊂F^{t−h}_{T,−∞}, B⊂F^{∞}_{T,t}}|P(A ∩ B) − P(A)P(B)| and interlaced maximal correlation coefficients ρ̃*_h ≡ sup_{T∈N} sup_{S_{T,h},T_{T,h}} ρ(σ(g_{T,t} : t ∈ S_{T,h}), σ(g_{T,s} : s ∈ T_{T,h})), where S_{T,h} and T_{T,h} are non-empty subsets of {1, ..., T} with inf_{s∈S_{T,h}, t∈T_{T,h}}{|s − t|} ≥ h. By convention sup_{S_{T,h},T_{T,h}} ρ(σ(g_{T,t} : t ∈ S_{T,h}), σ(g_{T,s} : s ∈ T_{T,h})) = 0 if h ≥ T.
Lemma A.1 Let Assumption A hold. Let {g_{T,t} : 1 ≤ t ≤ T}_{T≥1} be a σ(y_t, y_{t−1}, ..., y_{t−i})-measurable triangular array for some finite i, where E[g_{T,t}] = 0 and E[g²_{T,t}] < ∞ for each 1 ≤ t ≤ T and T ≥ 1. Then (a) υ²_T ≡ E(Σ_{t=1}^{T} g_{T,t})² ∼ KΣ_{t=1}^{T} E[g²_{T,t}]; and (b) (1/υ_T)Σ_{t=1}^{T} g_{T,t} →_d N(0, 1) provided the Lindeberg condition holds:

$$\frac{1}{\upsilon_T^2}\sum_{t=1}^{T} E\left[g_{T,t}^2\, I(|g_{T,t}| > \upsilon_T \times \epsilon)\right] \to 0 \quad\text{for every } \epsilon > 0. \tag{A.1}$$
Proof. For (a), by Assumption A.1 and measurability we have α̃_h → 0 and ρ̃*_1 < 1, hence υ²_T ∼ KΣ_{t=1}^{T} E[g²_{T,t}] by an application of Lemma 1 in Bradley (1992). For (b) note α̃_h → 0, υ²_T ∼ KΣ_{t=1}^{T} E[g²_{T,t}] and (A.1) imply (1/υ_T)Σ_{t=1}^{T} g_{T,t} →_d N(0, 1) by Theorem 2.2 in Peligrad (1996). QED.
The next result allows us to use tail-trimmed covariance estimators with non-random thresholds.

Lemma A.2 Let Assumptions A and C hold, and let h ∈ {0, ..., b_T}. Then (a) |γ̂*_{T,h}(c_{T,h}) − E[γ̂*_{T,h}(c_{T,h})]| →_p 0; and (b) if additionally Assumption B holds then (T/||S_T||)^{1/2} × |γ̂*_{T,h}(Ŷ^{(0)}_{h,(k_T)}) − γ̂*_{T,h}(c_{T,h})| →_p 0.
Proof. Define ℑ_t ≡ σ(y_τ : τ ≤ t) and ψ_{h,t}(c) ≡ y_t y_{t−h}I(|y_t y_{t−h} − E[y_t y_{t−h}]| < c), hence γ̂*_{T,h}(c) ≡ (T − h)(T − h − k_{T,h})^{−1}T^{−1}Σ_{t=h+1}^{T} ψ_{h,t}(c).

Claim (a): Under Assumptions A and C.1, ψ_{h,t}(c_{T,h}) is uniformly L_{1+ι}-bounded for tiny ι > 0, and α-mixing, hence {ψ_{h,t}(c_{T,h}), ℑ_t} forms an L_1-mixingale array (McLeish, 1975, Lemma 2.1). Since h ≤ b_T and T − b_T → ∞ by Assumption C.2, it follows that (1/T)Σ_{t=h+1}^{T}{ψ_{h,t}(c_{T,h}) − E[ψ_{h,t}(c_{T,h})]} →_p 0 by Theorem 2 of Andrews (1988). The claim follows instantly.
Claim (b): Write S²_T ≡ E[ψ²_{0,t}(c_{T,0})], γ̃_{T,h} ≡ (T − h)^{−1}Σ_{t=h+1}^{T} y_t y_{t−h} and γ̃_h ≡ E[y_t y_{t−h}]. By Lemma A.9.b ||S_T|| ∼ KS²_T, hence we need only prove

$$A_T \equiv \frac{1}{T^{1/2}S_T}\sum_{t=h+1}^{T} y_t y_{t-h}\left\{I\left(|y_t y_{t-h} - \tilde\gamma_{T,h}| < \hat Y^{(0)}_{h,(k_T)}\right) - I\left(|y_t y_{t-h} - \tilde\gamma_h| < c_{T,h}\right)\right\} = o_p(1).$$
We have

$$\begin{aligned}
A_T &= \frac{1}{T^{1/2}S_T}\sum_{t=h+1}^{T} y_t y_{t-h}\left\{I\left(|y_t y_{t-h} - \tilde\gamma_{T,h}| - c_{T,h} < 0\right) - I\left(|y_t y_{t-h} - \tilde\gamma_h| - c_{T,h} < 0\right)\right\}\\
&\quad+ \frac{1}{T^{1/2}S_T}\sum_{t=h+1}^{T} y_t y_{t-h}\left\{I\left(|y_t y_{t-h} - \tilde\gamma_{T,h}| - \hat Y^{(0)}_{h,(k_T)} < 0\right) - I\left(|y_t y_{t-h} - \tilde\gamma_{T,h}| - c_{T,h} < 0\right)\right\}\\
&= B_T + C_T.
\end{aligned}$$
Consider B_T. The trimming function I(u) ≡ I(u < 0) can be approximated by a continuous, differentiable, uniformly bounded positive function J_N(u) that has a uniformly bounded derivative D_N(u) ≡ (∂/∂u)J_N(u), where N ∈ N guides the approximation: lim_{N→∞} sup_{u∈U}|J_N(u) − I(u)| = 0 and lim_{N→∞} sup_{u∈U₀}|D_N(u)| = 0 for any compact subsets U ⊂ R and U₀ ⊂ R\{0}. See Hill (2012b, proof of Lemma A.1) and Hill (2013, proof of Lemma A.2), cf. Lighthill (1958, p. 22). In particular, we can always set N = N_T → ∞ as T → ∞ fast enough to ensure

$$B_T = \frac{1}{T^{1/2}S_T}\sum_{t=h+1}^{T} y_t y_{t-h}\left\{J_{N_T}\left((y_t y_{t-h} - \tilde\gamma_{T,h})^2 - c_{T,h}^2\right) - J_{N_T}\left((y_t y_{t-h} - \tilde\gamma_h)^2 - c_{T,h}^2\right)\right\} + o_p(1).$$
Hence, by the mean-value-theorem, for some γ̊_{T,h} with |γ̊_{T,h} − γ̃_h| ≤ |γ̃_{T,h} − γ̃_h|:

$$B_T = -2\frac{1}{T^{1/2}S_T}\sum_{t=h+1}^{T} y_t y_{t-h} \times D_{N_T}\left((y_t y_{t-h} - \mathring\gamma_{T,h})^2 - c_{T,h}^2\right) \times (y_t y_{t-h} - \tilde\gamma_{T,h})(\tilde\gamma_{T,h} - \tilde\gamma_h) + o_p(1).$$

Mixing Assumption A.1 implies mixing in the ergodic sense, and therefore ergodicity (see, e.g., Petersen, 1983). Stationarity, ergodicity, L_p-boundedness for p > 2, and h ≤ b_T with T − b_T → ∞ imply γ̃_{T,h} →_p γ̃_h = E[y_t y_{t−h}], hence γ̊_{T,h} →_p E[y_t y_{t−h}]. Further, max_{1≤t≤T}|y_t| = O_p(T^{1/2}) by stationarity and square integrability, and |y_t y_{t−h} − γ̊_{T,h}| ≠ c_{T,h} a.s. by distribution continuity. Thus, since max_{1≤t≤T}|D_{N_T}((y_t y_{t−h} − γ̊_{T,h})² − c²_{T,h})| →_p 0 as fast as we choose by setting N_T → ∞ as T → ∞ fast enough, it follows that B_T →_p 0.

The same type of argument applies to C_T, where we use γ̃_{T,h} →_p E[y_t y_{t−h}], and Ŷ^{(0)}_{h,(k_T)}/c_{T,h} = 1 + O_p(1/k_T^{1/2}) by Lemma A.3. QED.
Finally, we characterize order statistics and tail exponent estimators.

Lemma A.3 (order statistic) Define 𝕀*_{T,h,t} ≡ I(|y_t y_{t−h} − γ̃_h| ≥ c_{T,h}) − P(|y_t y_{t−h} − γ̃_h| ≥ c_{T,h}). Under Assumptions A.1, B and C,

$$k_{T,h}^{1/2}\left(\hat Y^{(0)}_{h,(k_T)}/c_{T,h} - 1\right) = \kappa_{h,0}^{-1}k_{T,h}^{-1/2}\sum_{t=1}^{T}\mathbb{I}^{*}_{T,h,t} + O_p(1/k_T^{1/2}) \quad \forall h \in \{0, ..., b_T\}.$$

Proof. Define Y^{(0)}_{h,t} ≡ |y_t y_{t−h} − γ̃_h|. We show that k^{1/2}_{T,h}(Ŷ^{(0)}_{h,(k_T)}/c_{T,h} − 1) has the same limit distribution as k^{1/2}_{T,h}(Y^{(0)}_{h,(k_T)}/c_{T,h} − 1). The claim then follows by applying Lemma A.3 in Hill (2013) to k^{1/2}_{T,h}(Y^{(0)}_{h,(k_T)}/c_{T,h} − 1).

Observe that E|y_t y_{t−h}|^{1+ι} < ∞ for some ι > 0 by Assumption A, and the Assumption A.1 α-mixing property implies y_t y_{t−h} is an adapted L_{1+ι}-mixingale. Further, h ≤ b_T with T − b_T → ∞ under Assumption C.2. Hence (T − h)^{−1}Σ_{t=h+1}^{T} y_t y_{t−h} = E[y_t y_{t−h}] + O_p(1/T^ι) since E|Σ_{t=h+1}^{T} y_t y_{t−h}|^{1+ι} = O(T − h) by stationarity, L_{1+ι}-boundedness, and an application of Lemma 2 in Hansen (1991, 1992). Since k_T is slowly varying by mean-centering and Assumption C.1, it follows that k_T = o(T^a) ∀a > 0, hence (T − h)^{−1}Σ_{t=h+1}^{T} y_t y_{t−h} = E[y_t y_{t−h}] + o_p(1/k_T^{1/2}). Therefore k^{1/2}_{T,h}(Ŷ^{(0)}_{h,(k_T)}/c_{T,h} − 1) and k^{1/2}_{T,h}(Y^{(0)}_{h,(k_T)}/c_{T,h} − 1) have the same limit distribution in view of mixing Assumption A.1, tail decay Assumption B, and arguments presented in Hill (2010, Lemma 3) and Hill (2014b, proof of Theorem 2.1, Lemma 2.2).⁷ QED.
Lemma A.4 (tail exponents) Define
$$\hat\kappa^{-1}_{h,m_{T,h}} \equiv \frac{1}{m_{T,h}}\sum_{j=1}^{m_{T,h}}\ln\bigl(\widehat Y^{(0)}_{h,(j)}/\widehat Y^{(0)}_{h,(m_{T,h}+1)}\bigr) \quad\text{and}\quad \hat d_{h,m_{T,h}} \equiv \frac{m_{T,h}}{T-h}\bigl(\widehat Y^{(0)}_{h,(m_{T,h})}\bigr)^{\hat\kappa_{h,m_{T,h}}}.$$
Let Assumptions A.1, B′ and C hold. Then $\hat\kappa_{h,m_{T,h}} = \kappa_{h,0} + O_p(1/m_{T,h}^{1/2})$ and $\hat d_{h,m_{T,h}} = d_{h,0} + O_p(1/m_{T,h}^{1/2})$.
Proof. Note $(T-h)^{-1}\sum_{t=h+1}^{T}y_t y_{t-h} = E[y_t y_{t-h}] + O_p(1/T^{\iota})$ and $k_T = o(T^a)$ $\forall a > 0$. Then $\hat\kappa_{h,m_{T,h}} = \kappa_{h,0} + O_p(1/m_{T,h}^{1/2})$ by Theorem 2.1 and Lemma 2.2 in Hill (2014b). Simply note that Assumption B′ falls under the second order regular variation categories SR1 and SR2 in Goldie and Smith (1987). Hill (2010, 2014b) uses SR1, and exploits arguments developed in Hsing (1991), to prove $\hat\kappa_{h,m_{T,h}} = \kappa_{h,0} + O_p(1/m_{T,h}^{1/2})$ for a filtered dependent sequence. It is easily verified that the asymptotic theory of Hill (2010, 2014b) covers SR2 with the appropriate restriction on $m_{T,h}$, in particular that Assumption B′ suffices. $\hat d_{h,m_{T,h}} = d_{h,0} + O_p(1/m_{T,h}^{1/2})$ follows from Lemma A.3, $\hat\kappa_{h,m_{T,h}} = \kappa_{h,0} + O_p(1/m_{T,h}^{1/2})$, and the mapping theorem. QED.
Remark 17 If mean-centering is not used, e.g. if $y_t y_{t-h} \ge 0$, then Lemmas A.3 and A.4 respectively follow from Lemma 3 and Theorem 2 in Hill (2010). We only need the Assumption C.1 properties $k_T \to \infty$ and $k_T/T = o(1)$ since the limit properties of $(T-h)^{-1}\sum_{t=h+1}^{T}y_t y_{t-h}$ are then irrelevant. Similarly, if $E[y_t^4] < \infty$ then by Assumption A.1 and, e.g., Lemma 2.1 in McLeish (1975), $(T-h)^{-1}\sum_{t=h+1}^{T}y_t y_{t-h} = E[y_t y_{t-h}] + O_p(1/T^{1/2}) = E[y_t y_{t-h}] + o_p(1/k_T^{1/2})$, hence Lemmas A.3 and A.4 follow from arguments in Hill (2010, Lemma 3) and Hill (2014b, proof of Theorem 2.1, Lemma 2.2).
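The tail exponent estimators in Lemma A.4 are straightforward to compute. The following is an illustrative numerical sketch, not the paper's code: it applies the Hill (1975)-type estimators $\hat\kappa_m$ and $\hat d_m$ to iid Pareto draws (a hypothetical data choice) for which the true index and scale are known.

```python
import numpy as np

def hill_estimators(y, m):
    """Hill (1975)-type tail index estimator kappa-hat, and the implied scale
    estimator d-hat = (m/n) Y_(m)^kappa-hat, from the m largest order
    statistics of |y| (a sketch of the estimators in Lemma A.4)."""
    z = np.sort(np.abs(y))[::-1]                  # Y_(1) >= Y_(2) >= ... descending
    kappa_inv = np.mean(np.log(z[:m] / z[m]))     # (1/m) sum_{j<=m} ln(Y_(j)/Y_(m+1))
    kappa_hat = 1.0 / kappa_inv
    d_hat = (m / len(y)) * z[m - 1] ** kappa_hat  # (m/n) Y_(m)^kappa-hat
    return kappa_hat, d_hat

# Pareto draws with P(y > c) = c^{-2.5}, so the true index is 2.5 and scale is 1
rng = np.random.default_rng(0)
y = rng.pareto(2.5, 100_000) + 1.0
kappa_hat, d_hat = hill_estimators(y, m=2_000)
```

With $m = 2{,}000$ extremes the estimator is $m^{1/2}$-consistent, so $\hat\kappa$ should land within a few percent of $2.5$.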
A.2 Supporting Lemmas for Robust FD-QML
Throughout, $\{r_{M,T}\}_{M,T\in\mathbb N}$ is a double array of finite non-random numbers with $\sup_{T\in\mathbb N}|r_{M,T}| \to 0$ as $M \to \infty$, and may be different in different places. Recall
$$\varpi(\lambda,\theta) \equiv -\frac{1}{f(\lambda,\theta)}\frac{\partial}{\partial\theta}\ln f(\lambda,\theta),$$
and recall the moment estimators, for $h \in \{0,\dots,b_T\}$:
$$\hat{\hat\gamma}^*_{T,h}(c) \equiv \frac{1}{T-h-k_T\frac{T-h}{T}}\sum_{t=h+1}^{T}y_t y_{t-h}\,I\Biggl(\Biggl|y_t y_{t-h} - \frac{1}{T-h}\sum_{s=h+1}^{T}y_s y_{s-h}\Biggr| < c\Biggr)$$
$$\hat\gamma^*_{T,h}(c) \equiv \frac{1}{T-h-k_T\frac{T-h}{T}}\sum_{t=h+1}^{T}y_t y_{t-h}\,I\bigl(|y_t y_{t-h} - E[y_t y_{t-h}]| < c\bigr).$$

⁷ Hill (2014b) proves that Hill (1975)'s tail index estimator is asymptotically normal for filtered weakly dependent sequences, e.g. $\widehat Y^{(0)}_{h,t}$, by extending arguments in Hill (2010, Theorem 2) for distribution tails that satisfy a second order regular variation property. In particular, under the conditions presented here, the plug-in does not affect the limit distribution of Hill (1975)'s estimator. The same set of arguments extends to intermediate order sequences $\widehat Y^{(0)}_{h,(k_T)}$, while asymptotics for $\widehat Y^{(0)}_{h,(k_T)}$ do not require a second order regular variation property (see Hill, 2010, Lemma 3).
$$\widehat I^*_T(\lambda) \equiv \frac{1}{2\pi}\Biggl(\hat{\hat\gamma}^*_{T,0}\bigl(\widehat Y^{(0)}_{0,(k_T)}\bigr) + 2\sum_{h=1}^{b_T}\hat{\hat\gamma}^*_{T,h}\bigl(\widehat Y^{(0)}_{h,(k_T)}\bigr)\cos(\lambda h)\Biggr)$$
$$I^*_T(\lambda) \equiv \frac{1}{2\pi}\Biggl(\hat\gamma^*_{T,0}(c_{T,0}) + 2\sum_{h=1}^{b_T}\hat\gamma^*_{T,h}(c_{T,h})\cos(\lambda h)\Biggr)$$
$$f^*_T(\lambda) \equiv (2\pi)^{-1}\sum_{h=-b_T}^{b_T}E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr]e^{-i\lambda h}.$$
Notice $f^*_T(\lambda)$ is defined here for the first time. Define the criteria and estimators:
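For intuition only, here is a minimal numerical sketch of these estimators (not the paper's code, and with simplified centering): the trimming threshold for each lag is taken as the $k_T$-th largest centered product, mimicking $\widehat Y^{(0)}_{h,(k_T)}$, and the trimmed autocovariances are assembled into the cosine-series spectral density estimator.

```python
import numpy as np

def trimmed_autocov(y, h, k):
    """Tail-trimmed sample autocovariance: products y_t*y_{t-h} whose centered
    absolute value ranks among the k largest are discarded (sketch of the
    double-hat estimator, with the order-statistic threshold computed inline)."""
    T = len(y)
    prod = y[h:] * y[:T - h]                  # y_t y_{t-h}, t = h+1,...,T
    centered = np.abs(prod - prod.mean())
    c = np.sort(centered)[::-1][k - 1]        # k-th largest: trimming threshold
    keep = centered < c
    # denominator uses the small-sample correction T - h - k_T (T-h)/T
    return prod[keep].sum() / (T - h - k * (T - h) / T)

def trimmed_spectrum(y, lam, b_T, k):
    """Trimmed cosine-series estimator (2pi)^{-1}(g_0 + 2 sum_h g_h cos(lam h))."""
    g = [trimmed_autocov(y, h, k) for h in range(b_T + 1)]
    return (g[0] + 2.0 * sum(g[h] * np.cos(lam * h) for h in range(1, b_T + 1))) / (2 * np.pi)

rng = np.random.default_rng(1)
y = rng.standard_t(df=3, size=20_000)         # heavy tailed, finite variance
f0 = trimmed_spectrum(y, lam=0.0, b_T=5, k=50)
```

For iid $t_3$ data the spectrum is flat at $\sigma^2/(2\pi) \approx 0.48$; trimming a negligible fraction shaves a little off the variance but stabilizes the estimate despite the infinite fourth moment.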
$$\tilde\theta^*_T = \arg\min_{\theta\in\Theta}\Biggl\{\frac{1}{2\pi}\int_{-\pi}^{\pi}\Biggl(\ln f(\lambda,\theta) + \frac{\widehat I^*_T(\lambda)}{f(\lambda,\theta)}\Biggr)d\lambda\Biggr\} = \arg\min_{\theta\in\Theta}\bigl\{\tilde Q^*_T(\theta)\bigr\} \tag{A.2}$$
$$\hat\theta^*_T = \arg\min_{\theta\in\Theta}\frac{1}{T}\sum_{j\in\mathcal F}\Biggl(\ln f(\lambda_j,\theta) + \frac{\widehat I^*_T(\lambda_j)}{f(\lambda_j,\theta)}\Biggr) = \arg\min_{\theta\in\Theta}\bigl\{\hat Q^*_T(\theta)\bigr\},$$
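The discrete-frequency criterion defining $\hat\theta^*_T$ is a Whittle objective. As an illustrative sketch (not from the paper), the following minimizes the analogous untrimmed criterion for an assumed AR(1) spectrum $f(\lambda,\theta) = (2\pi)^{-1}|1-\theta e^{-i\lambda}|^{-2}$ by grid search; the trimmed periodogram $\widehat I^*_T$ would simply replace the standard periodogram here.

```python
import numpy as np

def whittle_ar1(y):
    """Whittle estimate of an AR(1) coefficient by grid search over the
    criterion (1/T) sum_j [ln f(lam_j, theta) + I(lam_j)/f(lam_j, theta)]."""
    T = len(y)
    I = np.abs(np.fft.fft(y)) ** 2 / (2 * np.pi * T)   # periodogram
    lam = 2 * np.pi * np.arange(T) / T
    I, lam = I[1:], lam[1:]                            # drop the zero frequency
    best, best_q = None, np.inf
    for theta in np.linspace(-0.95, 0.95, 381):
        f = 1.0 / (2 * np.pi * np.abs(1.0 - theta * np.exp(-1j * lam)) ** 2)
        q = np.mean(np.log(f) + I / f)
        if q < best_q:
            best, best_q = theta, q
    return best

# simulate an AR(1) with coefficient 0.6 and unit Gaussian innovations
rng = np.random.default_rng(2)
T, theta0 = 5_000, 0.6
x = np.zeros(T + 200)
e = rng.standard_normal(T + 200)
for t in range(1, T + 200):
    x[t] = theta0 * x[t - 1] + e[t]
theta_hat = whittle_ar1(x[200:])
```

The innovation variance is held at its true value of one here purely to keep the sketch short; in practice it would be profiled out or estimated jointly.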
where $\mathcal F \equiv (-T/2, T/2] \cap \mathbb Z \setminus \{0\}$, and recall $\Omega$, $S_T$ and $V_T$ defined in (9). Recall also:
$$\kappa = \arg\sup\bigl\{\alpha > 0 : E|y_t|^{\alpha} < \infty\bigr\}.$$
We exploit the following implications of Karamata's Theorem under Assumption B⁸:
$$\kappa \in (2,4):\ E\bigl[y_t^4 I(y_t^2 < c_{T,0})\bigr] \sim \frac{4}{4-\kappa}\frac{k_T}{T}c_{T,0}^2 \tag{A.3}$$
$$\kappa = 4:\ E\bigl[y_t^4 I(y_t^2 < c_{T,0})\bigr] \sim \tilde L_4(T)\ \text{for some slowly varying }\tilde L_4(T).$$
In the Paretian case of Assumption B′, it is easily verified by integration that $\tilde L_4(T) = d_{0,0}\ln(T)$.
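The $\kappa \in (2,4)$ rate in (A.3) can be checked by simulation. The sketch below (illustrative, not from the paper, and with the centering of $y_t^2$ omitted for simplicity) verifies that the sample trimmed fourth moment grows at the polynomial rate $(T/k_T)^{4/\kappa-1}$: doubling $T/k_T$ should scale it by roughly $2^{4/\kappa-1}$.

```python
import numpy as np

def trimmed_fourth_moment(y, k):
    """Sample analogue of E[y^4 I(y^2 < c)], with c the k-th largest y^2."""
    y2 = np.sort(y ** 2)[::-1]
    c = y2[k - 1]
    return np.mean(y ** 4 * (y ** 2 < c))

rng = np.random.default_rng(3)
kappa = 3.0                              # tail index in (2,4): infinite 4th moment
y = rng.pareto(kappa, 2_000_000) + 1.0   # P(y > c) = c^{-3}
m1 = trimmed_fourth_moment(y, k=4_000)   # T/k_T = 500
m2 = trimmed_fourth_moment(y, k=2_000)   # T/k_T = 1000
ratio = m2 / m1                          # theory: roughly 2^{4/kappa - 1} = 2^{1/3}
```

With $\kappa = 3$ the predicted scaling is $2^{1/3} \approx 1.26$, so the ratio should sit well above one but far below two.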
First, we can always work with θ̃T∗ .
Lemma A.5 (equivalent estimators) Under Assumptions A-C, $T^{1/2}V_T^{-1/2}(\hat\theta^*_T - \tilde\theta^*_T) \overset{p}{\to} 0$, where $T/\|V_T\| \to \infty$.
Proof. The property $T/\|V_T\| \to \infty$ follows from $V_T = \Omega^{-1}S_T\Omega^{-1}$ and Lemma A.9. We repeatedly use $\liminf_{T\to\infty}\|S_T\| > 0$, which holds by Lemma A.9. Lemma A.6 can be easily modified to apply to $\hat Q^*_T(\mathring\theta_T)$ to show $(\partial^2/\partial\theta\partial\theta')\hat Q^*_T(\mathring\theta_T) \overset{p}{\to} \Omega$ for any $\mathring\theta_T \overset{p}{\to} \theta_0$. Similarly, Lemma A.7 can be modified to prove $T^{-1/2}S_T^{-1/2}\sum_{j\in\mathcal F}(\widehat I^*_T(\lambda_j) - I^*_T(\lambda_j))\varpi(\lambda_j) \overset{p}{\to} 0$. Further, $V_T^{-1/2} = S_T^{-1/2}\Omega$ by the definition of a square root matrix.

⁸ The case $\kappa = 4$ is shown as follows. Let $\tilde L(c)$ be a slowly varying function that may be different in different places. By Assumption B, a change of variables, and Karamata's Theorem: $E_T \equiv E[(y_t^2 - E[y_t^2])^2 I(|y_t^2 - E[y_t^2]| < c_{T,0})] = K\int_0^{c_{T,0}^2}L(u)u^{-1}du = \tilde L(c_{T,0}^2)$. By Assumption B, $\kappa_{0,0} = \kappa/2$, and properties of regularly varying functions: $c_{T,0} \sim (T/k_T)^{2/\kappa}\tilde L(T/k_T)$. Further, $\tilde L(T/k_T) \sim \tilde L(T)$ because $\tilde L$ is slowly varying and $k_T > 0$. Hence $c_{T,0} \sim (T/k_T)^{2/\kappa}\tilde L(T)$. Therefore $E_T = \tilde L(c_{T,0}^2) = \tilde L((T/k_T)^{2/\kappa}\tilde L(T))$, where $\tilde L((\xi T/k_T)^{2/\kappa}\tilde L(\xi T))/\tilde L((T/k_T)^{2/\kappa}\tilde L(T)) \to 1$ by slow variation, hence $E_T$ is a slowly varying function of $T$.

Hence, by the optimization problems leading to $\{\hat\theta^*_T, \tilde\theta^*_T\}$ and first order expansions around $\theta_0$, we have:
$$T^{1/2}V_T^{-1/2}\bigl(\hat\theta^*_T - \tilde\theta^*_T\bigr) = -T^{1/2}S_T^{-1/2}\Biggl(\frac{1}{T}\sum_{j\in\mathcal F}\bigl(I^*_T(\lambda_j) - f(\lambda_j)\bigr)\varpi(\lambda_j) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\Biggr) + o_p(1)$$
$$= -S_T^{-1/2}T^{1/2}\frac{1}{T}\sum_{j\in\mathcal F}\bigl(f_{(b_T)}(\lambda_j) - f(\lambda_j)\bigr)\varpi(\lambda_j)$$
$$\quad - T^{1/2}S_T^{-1/2}\Biggl(\frac{1}{T}\sum_{j\in\mathcal F}\bigl(I^*_T(\lambda_j) - f_{(b_T)}(\lambda_j)\bigr)\varpi(\lambda_j) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\Biggr) + o_p(1)$$
$$= -A_{T,1} - A_{T,2} + o_p(1),$$
where $f_{(M)}(\lambda)$ is the $M$th Cesàro sum of the Fourier series of $f(\lambda)$. Hölder continuity Assumption A.2.iv, bandwidth Assumption C.2, and boundedness of $\varpi(\lambda)$ imply $A_{T,1} = O(T^{1/2}/b_T^{\alpha}) = o(1)$. See Zygmund (2002, Vol. I, Chapt. 2).
Next, let $f^*_{T,(M)}(\lambda)$ be the $M$th Cesàro sum of the Fourier series of $f^*_T(\lambda) \equiv (2\pi)^{-1}\sum_{h=-b_T}^{b_T}E[\hat\gamma^*_{T,|h|}(c_{T,|h|})]e^{-i\lambda h}$. Noting $E[I^*_T(\lambda)] = f^*_{T,(b_T)}(\lambda)$, we may write:
$$A_{T,2} = T^{1/2}S_T^{-1/2}\Biggl(\frac{1}{T}\sum_{j\in\mathcal F}\bigl(I^*_T(\lambda_j) - E[I^*_T(\lambda_j)]\bigr)\varpi(\lambda_j) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - E[I^*_T(\lambda)]\bigr)\varpi(\lambda)\,d\lambda\Biggr)$$
$$\quad + T^{1/2}S_T^{-1/2}\Biggl(\frac{1}{2\pi}\int_{-\pi}^{\pi}f(\lambda)\varpi(\lambda)\,d\lambda - \frac{1}{T}\sum_{j\in\mathcal F}f(\lambda_j)\varpi(\lambda_j)\Biggr)$$
$$\quad - T^{1/2}S_T^{-1/2}\Biggl(\frac{1}{2\pi}\int_{-\pi}^{\pi}f^*_T(\lambda)\varpi(\lambda)\,d\lambda - \frac{1}{T}\sum_{j\in\mathcal F}f^*_T(\lambda_j)\varpi(\lambda_j)\Biggr) + S_T^{-1/2}\frac{T^{1/2}}{T}\sum_{j\in\mathcal F}\bigl(f(\lambda_j) - f_{(b_T)}(\lambda_j)\bigr)\varpi(\lambda_j)$$
$$\quad - S_T^{-1/2}\frac{T^{1/2}}{T}\sum_{j\in\mathcal F}\bigl(f^*_{T,(b_T)}(\lambda_j) - f^*_T(\lambda_j)\bigr)\varpi(\lambda_j) + T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(f^*_{T,(b_T)}(\lambda) - f^*_T(\lambda)\bigr)\varpi(\lambda)\,d\lambda$$
$$= C_{1,T} + C_{2,T} - C_{3,T} + C_{4,T} - C_{5,T} + C_{6,T}.$$
By boundedness of $\varpi(\lambda)$, Hölder continuity Assumption A.2.iv, and bandwidth Assumption C.2, $C_{i,T} = O(T^{1/2}/b_T^{\alpha}) = o(1)$ for $i = 4, 5, 6$.

Further, by Hölder continuity Assumption A.2.iv and arguments in Dunsmuir (1979, p. 497):
$$\|C_{2,T}\| \le K\Bigl(\frac{T}{\|S_T\|}\Bigr)^{1/2}\sup_{j\in\mathcal F}\ \sup_{|\lambda-\lambda_j|\le 2\pi/T}\bigl\|f(\lambda)\varpi(\lambda) - f(\lambda_j)\varpi(\lambda_j)\bigr\| = O(T^{1/2-\alpha}) = o(1),$$
and likewise $C_{3,T} = o(1)$. The remaining term $C_{1,T}$ is $o_p(1)$ by replicating the argument of Dunsmuir (1979, p. 498). QED.
Next, standard asymptotic results apply under a negligible data transform.
Lemma A.6 (Hessian) Let $\{\theta^*_T\}$ be any stochastic sequence that satisfies $\theta^*_T \overset{p}{\to} \theta_0$. Under Assumptions A-C, $(\partial^2/\partial\theta\partial\theta')\tilde Q^*_T(\theta^*_T) \overset{p}{\to} \Omega$.
Proof. By Minkowski's inequality and approximation Lemma A.7:
$$\Biggl\|\frac{\partial^2}{\partial\theta\partial\theta'}\tilde Q^*_T(\theta^*_T) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\partial}{\partial\theta}\ln f(\lambda)\,\frac{\partial}{\partial\theta}\ln f(\lambda)'\,d\lambda\Biggr\|$$
$$\le \Biggl\|\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\partial}{\partial\theta}\ln f(\lambda)\,\frac{\partial}{\partial\theta}\ln f(\lambda)'\,d\lambda - \frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{\partial}{\partial\theta}\ln f(\lambda,\theta^*_T)\,\frac{\partial}{\partial\theta}\ln f(\lambda,\theta^*_T)'\,d\lambda\Biggr\|$$
$$\quad + \Biggl\|\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\frac{\partial}{\partial\theta}\varpi(\lambda)\,d\lambda\Biggr\| + \Biggl\|\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\Bigl(\frac{\partial}{\partial\theta}\varpi(\lambda,\theta^*_T) - \frac{\partial}{\partial\theta}\varpi(\lambda)\Bigr)d\lambda\Biggr\|$$
$$\quad + \Biggl\|\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(f(\lambda,\theta^*_T) - f(\lambda)\bigr)\frac{\partial}{\partial\theta}\varpi(\lambda,\theta^*_T)\,d\lambda\Biggr\| + o_p(1)$$
$$= A_1(\theta^*_T) + A_{2,T} + A_{3,T}(\theta^*_T) + A_{4,T}(\theta^*_T) + o_p(1).$$
The $o_p(1)$ term follows from Lemma A.7. The terms $A_{2,T}$ and $A_{3,T}(\theta^*_T)$ are $o_p(1)$ by Lemma A.8.

Next, consider $A_1(\theta^*_T)$ and let $a_{i,j}(\theta)$ be the $(i,j)$th element of the matrix $(2\pi)^{-1}\int_{-\pi}^{\pi}(\partial/\partial\theta)\ln f(\lambda,\theta) \times (\partial/\partial\theta)\ln f(\lambda,\theta)'\,d\lambda$. Assumption A.2 spectrum boundedness and smoothness properties imply $\sup_{\theta\in\Theta}\|(\partial/\partial\theta)a_{i,j}(\theta)\| \le K$. By a first order expansion, $|a_{i,j}(\theta^*_T) - a_{i,j}(\theta_0)| \le \sup_{\theta\in\Theta}\|(\partial/\partial\theta)a_{i,j}(\theta)\| \times \|\theta^*_T - \theta_0\| \le K\|\theta^*_T - \theta_0\| \overset{p}{\to} 0$, hence $A_1(\theta^*_T) \overset{p}{\to} 0$.

Lastly, consider $A_{4,T}(\theta^*_T)$. Assumption A, a first order expansion and consistency imply: $|f(\lambda,\theta^*_T) - f(\lambda)| \times \|(\partial/\partial\theta)\varpi(\lambda,\theta^*_T)\| \le K|f(\lambda,\theta^*_T) - f(\lambda)| \le K\|\theta^*_T - \theta_0\| \overset{p}{\to} 0$. Hence $A_{4,T}(\theta^*_T) \le K\int_{-\pi}^{\pi}|f(\lambda,\theta^*_T) - f(\lambda)|\,d\lambda \le K\|\theta^*_T - \theta_0\| \overset{p}{\to} 0$. QED.
Lemma A.7 (periodogram approximation) Let $\omega(\lambda,\theta)$ be an $\mathbb R$-valued uniformly bounded mapping on $[-\pi,\pi]\times\Theta$. Under Assumptions A-C, $\sup_{\theta\in\Theta}|(T/\|S_T\|)^{1/2}\int_{-\pi}^{\pi}(\widehat I^*_T(\lambda) - I^*_T(\lambda))\omega(\lambda,\theta)\,d\lambda| \overset{p}{\to} 0$, where $T/\|S_T\| \to \infty$.
Proof. The property $T/\|S_T\| \to \infty$ follows from Lemma A.9. Let $\omega_{(M)}(\lambda,\theta)$ be the $M$th Cesàro sum of the Fourier series of the uniformly bounded $\omega(\lambda,\theta)$, and let $\omega_h(\theta)$ be the $h$th Fourier coefficient of $\omega(\lambda,\theta)$. Then by Fejér's theorem and the construction of $\omega_{(M)}(\lambda,\theta)$, for $M \in \mathbb N$:
$$\Bigl(\frac{T}{\|S_T\|}\Bigr)^{1/2}\int_{-\pi}^{\pi}\bigl(\widehat I^*_T(\lambda) - I^*_T(\lambda)\bigr)\omega(\lambda,\theta)\,d\lambda = \Bigl(\frac{T}{\|S_T\|}\Bigr)^{1/2}\int_{-\pi}^{\pi}\bigl(\widehat I^*_T(\lambda) - I^*_T(\lambda)\bigr)\omega_{(M)}(\lambda,\theta)\,d\lambda + o_p(1)$$
$$= \frac{1}{2\pi}\Bigl(\frac{T}{\|S_T\|}\Bigr)^{1/2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\omega_h(\theta)\Bigl\{\hat{\hat\gamma}^*_{T,h}\bigl(\widehat Y^{(0)}_{h,(k_T)}\bigr) - \hat\gamma^*_{T,h}(c_{T,h})\Bigr\} + o_p(r_{M,T})$$
$$= \mathcal A_{M,T}(\theta) + o_p(r_{M,T}).$$
Observe that $\lim_{M\to\infty}\sup_{\theta\in\Theta}|\sum_{h=-M}^{M}(1-|h|/M)\omega_h(\theta)| = 2\pi\sup_{\theta\in\Theta}|\int_{-\pi}^{\pi}\omega(\lambda,\theta)\,d\lambda| < \infty$ by construction and uniform boundedness, and $(T/\|S_T\|)^{1/2}\times|\hat{\hat\gamma}^*_{T,h}(\widehat Y^{(0)}_{h,(k_T)}) - \hat\gamma^*_{T,h}(c_{T,h})| \overset{p}{\to} 0$ by Lemma A.2.b. Hence $\lim_{M\to\infty}\sup_{\theta\in\Theta}|\mathcal A_{M,T}(\theta)| \overset{p}{\to} 0$ as $T \to \infty$. The claim now follows from $r_{M,T} \to 0$ as $M \to \infty$ for any $T$. QED.
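The Cesàro (Fejér) sum used here and in Lemmas A.5, A.8 and A.9 has a simple finite form. The following sketch (illustrative, not from the paper) computes $\omega_{(M)}(\lambda) = \sum_{|h|<M}(1-|h|/M)\omega_h e^{-ih\lambda}$ from numerically computed Fourier coefficients and checks Fejér's theorem at a point for a smooth test function.

```python
import numpy as np

def cesaro_sum(omega, lam_eval, M, n_grid=4096):
    """M-th Cesaro (Fejer) sum of the Fourier series of omega at lam_eval,
    with Fourier coefficients omega_h = (2pi)^{-1} int omega(lam) e^{i h lam} dlam
    computed by a Riemann sum on a uniform grid over [-pi, pi)."""
    lam = np.linspace(-np.pi, np.pi, n_grid, endpoint=False)
    w = omega(lam)
    out = 0.0 + 0.0j
    for h in range(-M + 1, M):
        omega_h = np.mean(w * np.exp(1j * h * lam))   # (2pi)^{-1} integral
        out += (1 - abs(h) / M) * omega_h * np.exp(-1j * h * lam_eval)
    return out.real

# Fejer's theorem: the Cesaro sum converges to omega at continuity points
target = cesaro_sum(lambda l: np.cos(l) ** 2, lam_eval=0.3, M=60)
err = abs(target - np.cos(0.3) ** 2)
```

For $\omega(\lambda) = \cos^2\lambda$ only the $h \in \{0, \pm 2\}$ coefficients are nonzero, so the error at $M = 60$ is of order $1/M$.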
Lemma A.8 (LLN) Let $\omega(\lambda,\theta)$ be an $\mathbb R$-valued uniformly bounded mapping on $[-\pi,\pi]\times\Theta$ with integrable envelope $\sup_{\theta\in\Theta}|\omega(\lambda,\theta)|$. Under Assumptions A and C, $\sup_{\theta\in\Theta}|\int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\omega(\lambda,\theta)\,d\lambda| \overset{p}{\to} 0$.

Proof. Let $\omega_{(M)}(\lambda,\theta)$ and $\omega_h(\theta)$ be as in the proof of Lemma A.7. By Lemma A.2.a, negligibility $k_T/T \to 0$, and dominated convergence, $\int_{-\pi}^{\pi}I^*_T(\lambda)\,d\lambda = \frac{1}{T}\sum_{t=1}^{T}y_t^2 I(|y_t^2 - E[y_t^2]| + o_p(1) < c_{T,0}) \overset{p}{\to} E[y_t^2] < \infty$, and by supposition $\int_{-\pi}^{\pi}\sup_{\theta\in\Theta}|\omega(\lambda,\theta)|\,d\lambda < \infty$. Hence, by Fejér's theorem:
$$\int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\omega(\lambda,\theta)\,d\lambda = \int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\,\omega_{(M)}(\lambda,\theta)\,d\lambda + \int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\bigl(\omega(\lambda,\theta) - \omega_{(M)}(\lambda,\theta)\bigr)d\lambda$$
$$= \int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\,\omega_{(M)}(\lambda,\theta)\,d\lambda + o_p(r_{M,T}).$$
By construction $(2\pi)^{-1}\int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\omega_{(M)}(\lambda,\theta)\,d\lambda = A_{1,T}(M,\theta) + A_{2,T}(M,\theta)$ where
$$A_{1,T}(M,\theta) = \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\omega_h(\theta)\Bigl\{E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr] - E[y_t y_{t-h}]\Bigr\}$$
$$A_{2,T}(M,\theta) = \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\omega_h(\theta)\Bigl\{\hat\gamma^*_{T,|h|}(c_{T,|h|}) - E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr]\Bigr\}.$$
Observe $\lim_{M\to\infty}\sup_{\theta\in\Theta}|\sum_{h=-M}^{M}(1-|h|/M)\omega_h(\theta)| = 2\pi\sup_{\theta\in\Theta}|\int_{-\pi}^{\pi}\omega(\lambda,\theta)\,d\lambda| < \infty$ by uniform boundedness and integrability, $E[\hat\gamma^*_{T,|h|}(c_{T,|h|})] \to E[y_t y_{t-h}]$ by negligibility and dominated convergence, and $|\hat\gamma^*_{T,|h|}(c_{T,|h|}) - E[\hat\gamma^*_{T,|h|}(c_{T,|h|})]| \overset{p}{\to} 0$ by Lemma A.2.a. Therefore $\lim_{M\to\infty}\sup_{\theta\in\Theta}|A_{1,T}(M,\theta)| \to 0$ and $\lim_{M\to\infty}\sup_{\theta\in\Theta}|A_{2,T}(M,\theta)| \overset{p}{\to} 0$, hence $\sup_{\theta\in\Theta}|\int_{-\pi}^{\pi}\{I^*_T(\lambda) - f(\lambda)\}\omega(\lambda,\theta)\,d\lambda| \overset{p}{\to} 0$. QED.
We require central limit theorems for both the tail-trimmed spectral density estimator, and for a combination of tail-trimmed and tail event random variables, used respectively for the non-bias-corrected and bias-corrected estimators. Let $\omega_h$ be the $h$th Fourier coefficient of $\omega(\lambda)$, and define
$$\tilde{\mathcal B}_T \equiv \frac{1}{2\pi}\sum_{h=-b_T}^{b_T}\varpi_h\Bigl\{E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr] - E[y_t y_{t-h}]\Bigr\}.$$
Lemma A.9 (CLT for the tail-trimmed periodogram) Let Assumptions A-C hold (in particular, let the trimming fractiles satisfy $k_{T,h} \sim K_{h,\tilde h}k_{T,\tilde h}$ for some finite $K_{h,\tilde h} > 0$ and each $\tilde h, h \in \{0,\dots,b_T\}$).

a. $(2\pi)^{-1}T^{1/2}S_T^{-1/2}\{\int_{-\pi}^{\pi}(I^*_T(\lambda) - f(\lambda))\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\} \overset{d}{\to} N(0, I_k)$. If $y_t y_{t-h}$ has a symmetric distribution then $\tilde{\mathcal B}_T = (2\pi)^{-1}\varpi_0\{E[\hat\gamma^*_{T,0}(c_{T,0})] - E[y_t^2]\}$.

b. $\|S_T\| \sim KE[y_t^4 I(y_t^2 \le c_{T,0})]$, hence $\liminf_{T\to\infty}\|S_T\| > 0$.

c. If $\kappa > 4$ then $\|S_T\| \sim K$; if $\kappa = 4$ then $\|S_T\| \sim K\tilde L_4(T)$ for the slowly varying $\tilde L_4(T)$ in (A.3); and if $\kappa \in (2,4)$ then $\|S_T\| \sim K(T/k_T)^{4/\kappa-1}$; hence $T/\|S_T\| \to \infty$ $\forall \kappa > 2$.
Proof. Write $\psi_{h,t}(c) \equiv y_t y_{t-h}I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c)$, where $\psi_{h,t}(c) = 0$ for $t \notin \{1+h,\dots,T\}$ and $h > b_T$.

Claim (a). Let $\mathcal S_{M,T} \equiv E[\mathcal G_{T,M}\mathcal G_{T,M}']$ where
$$\mathcal G_{T,M} \equiv T^{1/2}\frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\Bigl\{\hat\gamma^*_{T,|h|}(c_{T,|h|}) - E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr]\Bigr\}.$$
Recall $f^*_T(\lambda) \equiv (2\pi)^{-1}\sum_{h=-b_T}^{b_T}E[\hat\gamma^*_{T,|h|}(c_{T,|h|})]e^{-i\lambda h}$, let $\omega_{(M)}(\lambda)$ and $\omega_h$ be as in the proof of Lemma A.7, and let $f_{(M)}(\lambda)$ and $f^*_{T,(M)}(\lambda)$ respectively be the $M$th Cesàro sums of the Fourier series of $f(\lambda)$ and $f^*_T(\lambda)$. By construction:
$$f^*_{T,(b_T)}(\lambda) = E[I^*_T(\lambda)] = \frac{1}{2\pi}\sum_{h=-b_T}^{b_T}E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr]e^{-ih\lambda} \quad\text{and}\quad f_{(b_T)}(\lambda) = \frac{1}{2\pi}\sum_{h=-b_T}^{b_T}E[y_t y_{t-h}]e^{-ih\lambda}.$$
Now use $\liminf_{T\to\infty}\|S_T\| > 0$ by Claim (c), Hölder continuity Assumption A.2.iv, bandwidth Assumption C.2 $b_T^{\alpha}/T^{1/2} \to \infty$, and the construction of the Fourier coefficients $\varpi_h$ to deduce
$$T^{1/2}S_T^{-1/2}\int_{-\pi}^{\pi}\varpi(\lambda)\{f^*_T(\lambda) - f(\lambda)\}\,d\lambda = T^{1/2}S_T^{-1/2}\int_{-\pi}^{\pi}\varpi(\lambda)\{f^*_{T,(b_T)}(\lambda) - f_{(b_T)}(\lambda)\}\,d\lambda + O(T^{1/2}/b_T^{\alpha})$$
$$= T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\sum_{h=-b_T}^{b_T}\Bigl(1-\frac{|h|}{b_T}\Bigr)\varpi_h\Bigl\{E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr] - E[y_t y_{t-h}]\Bigr\} + o(1)$$
$$= T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\sum_{h=-b_T}^{b_T}\varpi_h\Bigl\{E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr] - E[y_t y_{t-h}]\Bigr\} + o(1).$$
This implies the bias $\tilde{\mathcal B}_T$ can be expressed as:
$$\tilde{\mathcal B}_T = \int_{-\pi}^{\pi}\varpi(\lambda)\{f^*_T(\lambda) - f(\lambda)\}\,d\lambda + o\bigl(\|S_T\|^{1/2}/T^{1/2}\bigr).$$
Hence, by the same Hölder continuity argument:
$$(2\pi)^{-1}T^{1/2}S_T^{-1/2}\Bigl\{\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Bigr\} = (2\pi)^{-1}T^{1/2}S_T^{-1/2}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f^*_T(\lambda)\bigr)\varpi(\lambda)\,d\lambda + o(1) \tag{A.4}$$
$$= (2\pi)^{-1}T^{1/2}S_T^{-1/2}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - E[I^*_T(\lambda)]\bigr)\varpi(\lambda)\,d\lambda + o(1).$$
Now use the same Cesàro sum argument exploited in the previous proofs, and (A.4), to obtain:
$$(2\pi)^{-1}T^{1/2}S_T^{-1/2}\Bigl\{\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Bigr\} = (2\pi)^{-1}T^{1/2}S_T^{-1/2}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - E[I^*_T(\lambda)]\bigr)\varpi(\lambda)\,d\lambda + o(1) \tag{A.5}$$
$$= S_T^{-1/2}\mathcal S_{M,T}^{1/2}\,\mathcal S_{M,T}^{-1/2}\,T^{1/2}\frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\Bigl\{\hat\gamma^*_{T,|h|}(c_{T,|h|}) - E\bigl[\hat\gamma^*_{T,|h|}(c_{T,|h|})\bigr]\Bigr\} + o_p(r_{M,T}) + o(1)$$
$$= S_T^{-1/2}\mathcal S_{M,T}^{1/2} \times \mathcal Z_{M,T} + o_p(r_{M,T}) + o(1),$$
say. By construction of the Cesàro sums and $S_T$ we have $\lim_{M\to\infty}S_T^{-1}\mathcal S_{M,T} = I_k$. The claim therefore follows if we show $\mathcal Z_{M,T} \overset{d}{\to} N(0,I_k)$ as $T \to \infty$ for any $M$, since we may then take $M$ to be arbitrarily large to complete the proof. By the Cramér-Wold Theorem we need only prove $\xi'\mathcal Z_{M,T} \overset{d}{\to} N(0,1)$ as $T \to \infty$ for any $\xi \in \mathbb R^k$, $\xi'\xi = 1$.
Write $\tilde\varpi_h \equiv \varpi_{-h} + \varpi_h$ for $h \ne 0$ and $\tilde\varpi_0 \equiv \varpi_0$. Define
$$\zeta_{M,T,h}(\xi) \equiv \begin{cases} \dfrac{1}{(2\pi)^2}\Bigl(1-\dfrac{h}{M}\Bigr)\,\xi'\mathcal S_{M,T}^{-1/2}\tilde\varpi_h \times \bigl(\xi'\mathcal S_{M,T}\xi\bigr)^{1/2} & \text{for } h \in \{0,\dots,M\} \\[6pt] 0 & \text{for } h > M \end{cases} \tag{A.6}$$
$$\mathcal E_{T,h,t}(\xi) \equiv \frac{1}{(\xi'\mathcal S_{M,T}\xi)^{1/2}}\,\frac{T-h}{T-h-k_T}\,\bigl\{\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})]\bigr\}$$
$$\mathcal X_{M,T,t}(\xi) \equiv \sum_{h=0}^{\infty}\zeta_{h,M,T}(\xi)\,\mathcal E_{T,h,t}(\xi) = \sum_{h=0}^{M}\zeta_{h,M,T}(\xi)\,\mathcal E_{T,h,t}(\xi).$$
Notice by construction $\mathcal E_{T,h,t}(\xi) = 0$ $\forall t \notin \{h+1,\dots,T\}$ and $h > b_T$. Then
$$\xi'\mathcal Z_{M,T} = \frac{1}{T^{1/2}}\sum_{t=1}^{T}\mathcal X_{M,T,t}(\xi) \times (1 + o_p(1)). \tag{A.7}$$
It remains to prove $T^{-1/2}\sum_{t=1}^{T}\mathcal X_{M,T,t}(\xi) \overset{d}{\to} N(0,1)$. We now drop $\xi$ to reduce notation.
Central limit theorem Lemma A.1 applies to $\mathcal X_{M,T,t}$ by measurability and the stated assumptions on $y_t$. First, by construction and Lemma A.1.a, $1 = E(T^{-1/2}\sum_{t=1}^{T}\mathcal X_{M,T,t})^2 \sim KE[\mathcal X^2_{M,T,t}]$, hence $E[\mathcal X^2_{M,T,t}] \sim K$. Notice $K$ depends on $M$: we do not show this since $K$ is positive and finite for each $M$.

Next, by Lemma A.1.b, $T^{-1/2}\sum_{t=1}^{T}\mathcal X_{M,T,t} \overset{d}{\to} N(0,1)$ provided Lindeberg condition (A.1) holds for $\mathcal X_{M,T,t}$. By stationarity and $E(T^{-1/2}\sum_{t=1}^{T}\mathcal X_{M,T,t})^2 \sim K$ it suffices to show $E[\mathcal X^2_{M,T,t}I(\mathcal X^2_{M,T,t} > T\epsilon^2)] \to 0$ $\forall \epsilon > 0$. Use the triangle inequality, $\lim_{M\to\infty}S_T^{-1}\mathcal S_{M,T} = I_k$, $\|S_T\| \sim KE[\psi^2_{0,t}(c_{T,0})]$ by Claim (b), $\liminf_{T\to\infty}E[\psi^2_{0,t}(c_{T,0})] > 0$ by negligibility and distribution non-degenerateness, and $|\psi_{h,t}(c_{T,h})| \le c_{T,h}$ by construction, to deduce for some $\vartheta \in \mathbb R^k$, $\vartheta'\vartheta = 1$:
$$|\mathcal X_{M,T,t}| \le \sum_{h=0}^{M}\bigl|\zeta_{h,M,T}\mathcal E_{T,h,t}\bigr| \le K\sum_{h=0}^{M}\Bigl|\Bigl(1-\frac{h}{M}\Bigr)\vartheta'\tilde\varpi_h\Bigr| \times \frac{c_{T,h}}{\bigl(E[\psi^2_{0,t}(c_{T,0})]\bigr)^{1/2}}.$$
Under power law Assumption B, $c_{T,h} \sim L_{h,0}(T)^{1/\kappa_{h,0}}(T/k_T)^{1/\kappa_{h,0}}$, and by construction $\kappa_{h,0} \ge \kappa_{0,0}$. Hence $c_{T,h} \le L_{h,0}(T)^{1/\kappa_{0,0}}(T/k_T)^{1/\kappa_{0,0}}$. Since under Assumption A $\limsup_{M\to\infty}K\sum_{h=0}^{M}|(1-h/M)(\vartheta'\tilde\varpi_h)| \le K$, we therefore have $|\mathcal X_{M,T,t}| \le KL_{h,0}(T)^{1/\kappa_{0,0}}\times(T/k_T)^{1/\kappa_{0,0}}/\bigl(E[\psi^2_{0,t}(c_{T,0})]\bigr)^{1/2} \equiv \mathcal A_T$, say.

If we prove $\mathcal A_T = o(T^{1/2})$, then $\mathcal X^2_{M,T,t} < T\epsilon^2$ a.s. for all $T \ge N$ and some $N \ge 1$, hence $E[\mathcal X^2_{M,T,t}I(\mathcal X^2_{M,T,t} > T\epsilon^2)] = 0$ $\forall T \ge N$, which completes the proof. If $\kappa \ge 4$ then $\kappa_{0,0} \ge 2$, hence $\mathcal A_T \le KL_{h,0}(T)^{1/\kappa_{0,0}}(T/k_T)^{1/2} = o(T^{1/2})$. Conversely, if $\kappa \in (2,4)$ then $\kappa_{0,0} \in (1,2)$, hence by Karamata theory (A.3) $E[\psi^2_{0,t}(c_{T,0})] \sim K(T/k_T)^{2/\kappa_{0,0}-1}$. Hence $\mathcal A_T \sim KL_{h,0}(T)^{1/\kappa_{0,0}}\times(T/k_T)^{1/\kappa_{0,0}}/(T/k_T)^{1/\kappa_{0,0}-1/2} = L_{h,0}(T)^{1/\kappa_{0,0}}(T/k_T)^{1/2} = o(T^{1/2})$.
Claim (b). Define $\Psi_{h,t} \equiv \psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})]$, $\eta_{M,h} \equiv 1 - h/M$ and $\vartheta_T(\xi) \equiv S_T^{-1/2}\xi/(\xi'S_T^{-1}\xi)^{1/2}$. By the arguments above $\mathcal S_{M,T}S_T^{-1} \sim I_k$ and $1 = E(T^{-1/2}\sum_{t=1}^{T}\mathcal X_{M,T,t}(\xi))^2 \sim KE[\mathcal X^2_{M,T,t}(\xi)]$, and by another application of Lemma A.1.a: $E[\mathcal X^2_{M,T,t}(\xi)] \sim K\sum_{h=0}^{M}\zeta^2_{h,M,T}(\xi)E[\mathcal E^2_{T,h,t}(\xi)]$. Therefore
$$E\bigl[\mathcal X^2_{M,T,t}(\xi)\bigr] \sim K\sum_{h=0}^{M}\zeta^2_{h,M,T}(\xi)E\bigl[\mathcal E^2_{T,h,t}(\xi)\bigr] \sim K\sum_{h=0}^{M}\eta^2_{M,h}\,\xi'S_T^{-1/2}\tilde\varpi_h\tilde\varpi_h'S_T^{-1/2}\xi\,E\bigl[\Psi^2_{h,t}\bigr]$$
hence
$$\bigl(\xi'S_T^{-1}\xi\bigr)^{-1} \sim K\sum_{h=0}^{M}\eta^2_{M,h}\times\vartheta_T(\xi)'\tilde\varpi_h\tilde\varpi_h'\vartheta_T(\xi)\times E\bigl[\Psi^2_{h,t}\bigr] = KE\bigl[\Psi^2_{0,t}\bigr]\Biggl(\vartheta_T(\xi)'\tilde\varpi_0\tilde\varpi_0'\vartheta_T(\xi) + \sum_{h=1}^{M}\eta^2_{M,h}\,\vartheta_T(\xi)'\tilde\varpi_h\tilde\varpi_h'\vartheta_T(\xi)\,\frac{E[\Psi^2_{h,t}]}{E[\Psi^2_{0,t}]}\Biggr).$$
Notice $\vartheta_T(\xi)'\vartheta_T(\xi) = 1$ for each $T$ and $\xi$. Further, $\kappa_{h,0} \ge \kappa_{0,0}$ and $k_{T,h} \sim K_{h,\tilde h}k_{T,\tilde h}$ for $K_{h,\tilde h} > 0$ imply $\max_{0\le h\le M}E[\Psi^2_{h,t}] \sim KE[\Psi^2_{0,t}]$ for any $M$. Moreover, $\limsup_{M\to\infty}\sup_{\vartheta'\vartheta=1}\sum_{h=0}^{M}\eta^2_{M,h}(\vartheta'\tilde\varpi_h\tilde\varpi_h'\vartheta) < \infty$ by boundedness properties Assumption A. Therefore $\sup_{\xi'\xi=1}(\xi'S_T^{-1}\xi)^{-1} \sim KE[\Psi^2_{0,t}]$ and $\inf_{\xi'\xi=1}(\xi'S_T^{-1}\xi)^{-1} \sim KE[\Psi^2_{0,t}]$. This proves, by the definition of the spectral norm, $\|S_T^{-1}\|^{-1} \sim KE[\Psi^2_{0,t}]$, hence $\|S_T\| \sim KE[\Psi^2_{0,t}] \sim KE[\psi^2_{0,t}(c_{T,0})]$. Finally, by negligibility $E[\psi^2_{0,t}(c_{T,0})] \sim E[y_t^4 I(y_t^2 < c_{T,0})]$, and negligibility and a non-degenerate distribution imply $\liminf_{T\to\infty}E[y_t^4 I(y_t^2 < c_{T,0})] > 0$.

Claim (c). If $\kappa > 4$ then by dominated convergence $E[(y_t y_{t-h})^2 I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c_{T,h})] \sim E[(y_t y_{t-h})^2] \in (0,\infty)$, and if $\kappa \le 4$ then by Karamata theory $E[(y_t y_{t-h} - E[y_t y_{t-h}])^2 I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c_{T,h})]$ is given in (A.3). Coupled with Claim (b) this proves the stated orders for $\|S_T\|$. QED.
Define $\kappa_{h,0} \equiv \min\{\kappa_{h,1},\kappa_{h,2}\}$ and:
$$\mathcal I^*_{T,h,t} \equiv I\bigl(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h}\bigr) - P\bigl(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h}\bigr), \qquad \mathcal I^*_{T,h} \equiv \frac{1}{\kappa_{h,0}k_T^{1/2}}\sum_{t=h+1}^{T}\mathcal I^*_{T,h,t} \tag{A.8}$$
$$\mathcal D_{T,0} = \frac{1}{\kappa_{0,0}-1}\frac{k_T^{1/2}}{T-k_T}c_{T,0} \quad\text{and}\quad \mathcal D_{T,h} = \frac{1}{k_T^{1/2}}\Bigl(d_{h,1}c_{T,h}^{1-\kappa_{h,1}} - d_{h,2}c_{T,h}^{1-\kappa_{h,2}}\Bigr)\ \text{for } h \ne 0.$$
Lemma A.10 (CLT for tail events and tail-trimmed periodogram) Let Assumptions A, B′ and C hold, let $M \in \mathbb N$ be arbitrary, and define
$$\tilde{\mathcal Z}_{M,T} \equiv T^{1/2}\Biggl(\frac{1}{2\pi}\Bigl\{\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Bigr\} + \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\mathcal I^*_{T,h}\Biggr)$$
and $\tilde{\mathcal S}_{M,T} \equiv E[\tilde{\mathcal Z}_{M,T}\tilde{\mathcal Z}_{M,T}']$. Then $\tilde{\mathcal S}_{M,T}^{-1/2}\tilde{\mathcal Z}_{M,T} \overset{d}{\to} N(0,I_k)$. If $\kappa > 4$ then $\tilde S_T = \lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1+o(1))$, and if $\kappa \in (2,4]$ then $\tilde S_T = \lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1+O(1))$.
Proof. Write $\psi_{h,t}(c) \equiv y_t y_{t-h}I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c)$. Let $\omega_{(M)}(\lambda)$ and $\omega_h$ be as in the proof of Lemma A.7, and define $\tilde\varpi_0 \equiv \varpi_0$ and $\tilde\varpi_h \equiv \varpi_{-h} + \varpi_h$ for $h \ne 0$. Define
$$\mathcal A_{T,h,t} \equiv \kappa_{h,0}^{-1}\mathcal D_{T,h}\,T k_T^{-1/2}\,\mathcal I^*_{T,h,t}.$$
By the same arguments leading to (A.4)-(A.7), it can be shown that $\tilde{\mathcal S}_{M,T}^{-1/2}\tilde{\mathcal Z}_{M,T}$ equals:
$$T^{1/2}\tilde{\mathcal S}_{M,T}^{-1/2}\Biggl(\frac{1}{2\pi}\Bigl\{\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Bigr\} + \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\mathcal I^*_{T,h}\Biggr)$$
$$= T^{1/2}\tilde{\mathcal S}_{M,T}^{-1/2}\Biggl(\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - E[I^*_T(\lambda)]\bigr)\varpi(\lambda)\,d\lambda + \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\mathcal I^*_{T,h}\Biggr) + o(1) \tag{A.9}$$
$$= \frac{1}{T^{1/2}}\sum_{t=1}^{T}\frac{1}{(2\pi)^2}\sum_{h=0}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\mathring{\mathcal S}_{M,T}^{-1/2}\tilde\varpi_h\bigl\{\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})] + \mathcal A_{T,h,t}\bigr\}\,(1 + o_p(1)) + o_p(r_{M,T}) + o(1)$$
$$= \frac{1}{T^{1/2}}\sum_{t=1}^{T}\tilde{\mathcal X}_{M,T,t} + o_p(r_{M,T}) + o(1),$$
say, where $\mathring{\mathcal S}_{M,T}$ satisfies $E(T^{-1/2}\sum_{t=1}^{T}\tilde{\mathcal X}_{M,T,t})^2 = 1$. By construction and (A.9), $\lim_{M\to\infty}\mathring{\mathcal S}_{M,T}\tilde{\mathcal S}_{M,T}^{-1} = I_k$ and $\tilde S_T = \lim_{M\to\infty}\tilde{\mathcal S}_{M,T}$ for each $T$.

In view of measurability, as in the proof of Lemma A.9 we need only use Lemma A.1 to prove the claim. We simplify notation by assuming $\theta_0$ is a scalar; the following easily extends to the general case by use of the Cramér-Wold theorem.

First, consider how $\lim_{M\to\infty}\tilde{\mathcal S}_{M,T}$ relates to $S_T$. Since $E[(\tilde{\mathcal S}_{M,T}^{-1/2}\tilde{\mathcal Z}_{M,T})^2] = 1$ by construction and $\lim_{M\to\infty}\mathring{\mathcal S}_{M,T}\tilde{\mathcal S}_{M,T}^{-1} = 1$, by two applications of Lemma A.1.a we have $1 = E(T^{-1/2}\sum_{t=1}^{T}\tilde{\mathcal X}_{M,T,t})^2 \sim KE[\tilde{\mathcal X}^2_{M,T,t}]$, and
$$\tilde{\mathcal S}_{M,T} \sim K\frac{1}{T}\sum_{t=1}^{T}\sum_{h=0}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)^2\tilde\varpi_h^2\,E\bigl(\{\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})] + \mathcal A_{T,h,t}\}^2\bigr).$$
By degenerateness $I(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h}) \to 0$ a.s., the construction $I(|y_t y_{t-h} - E[y_t y_{t-h}]| < c_{T,h}) = 1 - I(|y_t y_{t-h} - E[y_t y_{t-h}]| \ge c_{T,h})$, and negligibility, the proof that $E(\{\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})] + \mathcal A_{T,h,t}\}^2) \sim KE(\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})])^2$, where $K = 1$ if $\kappa \ge 4$, is identical to the corresponding proof for Theorem 2.1 in Hill (2013). The claims $\lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1+o(1))$ if $\kappa > 4$ and $\lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1+O(1))$ if $\kappa \le 4$ now follow from $S_T \sim \lim_{M\to\infty}KT^{-1}\sum_{t=1}^{T}\sum_{h=0}^{M}(1-|h|/M)^2\tilde\varpi_h^2E(\psi_{h,t}(c_{T,h}) - E[\psi_{h,t}(c_{T,h})])^2$ by the proof of Lemma A.9.
Finally, by Lemma A.1.b we need only demonstrate the Lindeberg condition. Since $1 = E(T^{-1/2}\sum_{t=1}^{T}\tilde{\mathcal X}_{M,T,t})^2 \sim KE[\tilde{\mathcal X}^2_{M,T,t}]$, it suffices to check $E[\tilde{\mathcal X}^2_{M,T,t}I(\tilde{\mathcal X}^2_{M,T,t} > T\epsilon^2)] \to 0$ $\forall \epsilon > 0$. Observe that $1 < \kappa_{0,0} \le \kappa_{h,0} = \min\{\kappa_{h,1},\kappa_{h,2}\}$, and under power law Assumption B′, $c_{T,h} \sim K(T/k_T)^{1/\kappa_{h,0}}$, hence $\mathcal D_{T,0} \le K(k_T^{1/2}/T)(T/k_T)^{1/\kappa_{0,0}}$ and $|\mathcal D_{T,h}| \le Kc_{T,h}^{1-\kappa_{h,0}}/k_T^{1/2} = K(k_T^{1/2}/T)(T/k_T)^{1/\kappa_{h,0}} \le K(k_T^{1/2}/T)(T/k_T)^{1/\kappa_{0,0}}$. Therefore
$$|\mathcal A_{T,h,t}| \le K\Bigl(\frac{T}{k_T}\Bigr)^{1/\kappa_{0,0}}\bigl|I(|y_t y_{t-h}| \ge c_{T,h}) - P(|y_t y_{t-h}| \ge c_{T,h})\bigr| \le K\Bigl(\frac{T}{k_T}\Bigr)^{1/\kappa_{0,0}}.$$
Similarly, $\psi_{h,t}(c_{T,h}) \le Kc_{T,h} \sim K(T/k_T)^{1/\kappa_{h,0}} \le K(T/k_T)^{1/\kappa_{0,0}}$. By the construction of $\tilde{\mathcal X}_{M,T,t}$, the remaining steps for showing $E[\tilde{\mathcal X}^2_{M,T,t}I(\tilde{\mathcal X}^2_{M,T,t} > T\epsilon^2)] \to 0$ follow the line of proof of Lemma A.9.a, given $\lim_{M\to\infty}\mathring{\mathcal S}_{M,T}\tilde{\mathcal S}_{M,T}^{-1} = I_k$, $\lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1+o(1))$, and $\|S_T\| \sim KE[\psi^2_{0,t}(c_{T,0})]$ by Lemma A.9.b. QED.
A.3 Proofs of Theorems

We now prove Theorems 2.1, 2.2 and 3.1. Let $\varpi_h$ be the $h$th Fourier coefficient of $\varpi(\lambda) \equiv -(f(\lambda))^{-1}(\partial/\partial\theta)\ln f(\lambda)$.
Proof of Theorem 2.1. Define the criteria
$$\hat Q^*_T(\theta) \equiv \sum_{j\in\mathcal F}\Biggl\{\ln f(\lambda_j,\theta) + \frac{\widehat I^*_T(\lambda_j)}{f(\lambda_j,\theta)}\Biggr\} \quad\text{and}\quad Q^*_T(\theta) \equiv \sum_{j\in\mathcal F}\Biggl\{\ln f(\lambda_j,\theta) + \frac{I^*_T(\lambda_j)}{f(\lambda_j,\theta)}\Biggr\},$$
and write the sample gradients and Hessians as
$$\hat{\mathcal G}_T(\theta) \equiv \frac{\partial}{\partial\theta}\hat Q^*_T(\theta),\quad \mathcal G_T(\theta) \equiv \frac{\partial}{\partial\theta}Q^*_T(\theta),\quad \hat{\mathcal H}_T(\theta) \equiv \frac{\partial}{\partial\theta'}\hat{\mathcal G}_T(\theta) \quad\text{and}\quad \mathcal H_T(\theta) \equiv \frac{\partial}{\partial\theta'}\mathcal G_T(\theta).$$
Recall the population Hessian
$$\mathcal H(\theta) \equiv -\frac{1}{2\pi}\int_{-\pi}^{\pi}\varpi(\lambda,\theta)\frac{\partial}{\partial\theta'}f(\lambda,\theta)\,d\lambda - \frac{1}{2\pi}\int_{-\pi}^{\pi}\{f(\lambda,\theta) - f(\lambda)\}\frac{\partial}{\partial\theta}\varpi(\lambda,\theta)\,d\lambda.$$
Define $\theta^*_T \equiv \arg\min_{\theta\in\Theta}\{Q^*_T(\theta)\}$. We prove $\|\hat\theta^*_T - \theta^*_T\| \overset{p}{\to} 0$ and $\theta^*_T \overset{p}{\to} \theta_0$.

Step 1 ($\|\hat\theta^*_T - \theta^*_T\| \overset{p}{\to} 0$). Approximation Lemma A.7 and LLN Lemma A.8, combined with spectrum boundedness and Hölder continuity Assumption A.2.iv, imply:
$$\sup_{\theta\in\Theta}\bigl\|\hat{\mathcal G}_T(\theta) - \mathcal G_T(\theta)\bigr\| \overset{p}{\to} 0 \quad\text{and}\quad \sup_{\theta\in\Theta}\bigl\|\hat{\mathcal H}_T(\theta) - \mathcal H_T(\theta)\bigr\| \overset{p}{\to} 0. \tag{A.10}$$
See the proofs of Lemmas A.5 and A.7 for relating the integrated robust spectral density estimator in $\{\mathcal G_T(\theta), \mathcal H_T(\theta)\}$ to the weighted partial sums of the robust estimator embodied in $\{\hat{\mathcal G}_T(\theta), \hat{\mathcal H}_T(\theta)\}$, cf. Dunsmuir (1979) and Hannan (1973a). We omit the added steps here to conserve space.

Taylor expansions show, by the FD-QML first order conditions, that $\hat{\mathcal G}_T = \hat{\mathcal H}_T(\hat{\tilde\theta}_T)\times(\hat\theta^*_T - \theta_0)$ and $\mathcal G_T = \mathcal H_T(\tilde\theta_T)\times(\theta^*_T - \theta_0)$ for sequences $\{\hat{\tilde\theta}_T, \tilde\theta_T\}$ satisfying $\|\hat{\tilde\theta}_T - \theta_0\| \le \|\hat\theta^*_T - \theta_0\|$ and $\|\tilde\theta_T - \theta_0\| \le \|\theta^*_T - \theta_0\|$. Now use (A.10), spectrum boundedness, and multiple applications of Minkowski's inequality to deduce
$$\bigl\|\hat\theta^*_T - \theta^*_T\bigr\| = \bigl\|\hat\theta^*_T - \theta_0 - (\theta^*_T - \theta_0)\bigr\|$$
$$\le \bigl\|\hat{\mathcal H}_T(\hat{\tilde\theta}_T)^{-1}\bigr\|\times\bigl\|\hat{\mathcal G}_T - \mathcal G_T\bigr\| + \bigl\|\hat{\mathcal H}_T(\hat{\tilde\theta}_T)^{-1}\bigr\|\times\bigl\|\hat{\mathcal H}_T(\hat{\tilde\theta}_T) - \mathcal H_T(\tilde\theta_T)\bigr\|\times\bigl\|\mathcal H_T(\tilde\theta_T)^{-1}\bigr\|\times\|\mathcal G_T\|$$
$$= o_p\Bigl(\bigl\|\hat{\mathcal H}_T(\hat{\tilde\theta}_T)^{-1}\bigr\|\Bigl\{1 + \bigl\|\mathcal H_T(\tilde\theta_T)^{-1}\bigr\|\times\|\mathcal G_T\|\Bigr\}\Bigr).$$
If $\sup_{\theta\in\Theta}\{\|\mathcal H_T(\theta)\|^{-1}\} = O_p(1)$ then by (A.10) we have $\sup_{\theta\in\Theta}\|\hat{\mathcal H}_T(\theta)\mathcal H_T^{-1}(\theta) - I_k\| \overset{p}{\to} 0$, hence
$$\sup_{\theta\in\Theta}\bigl\|\hat{\mathcal H}_T(\theta)^{-1}\bigr\| \le O_p(1)\times\sup_{\theta\in\Theta}\bigl\|\hat{\mathcal H}_T(\theta)\mathcal H_T^{-1}(\theta) - I_k\bigr\| + O_p(1) = O_p(1).$$
Therefore, if we show $\|\mathcal G_T\| = O_p(1)$ and $\sup_{\theta\in\Theta}\|\mathcal H_T(\theta)\|^{-1} = O_p(1)$ then the proof that $\|\hat\theta^*_T - \theta^*_T\| \overset{p}{\to} 0$ is complete.
By Hölder continuity Assumption A.2.iv (see Hannan, 1973a):
$$\mathcal G_T = \sum_{j\in\mathcal F}\bigl\{I^*_T(\lambda_j) - f(\lambda_j)\bigr\}\varpi(\lambda_j) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl\{I^*_T(\lambda) - f(\lambda)\bigr\}\varpi(\lambda)\,d\lambda + o_p(1).$$
Spectrum boundedness and LLN Lemma A.8 therefore imply $\|\mathcal G_T\| = o_p(1)$.

Next, recall $I_T(\lambda)$ is the periodogram of $y_t$. By construction $\mathcal H_T(\theta) = \mathcal H(\theta) + \mathfrak r_T(\theta)$ where
$$\mathfrak r_T(\theta) = -\Biggl(\sum_{j\in\mathcal F}\varpi(\lambda_j,\theta)\frac{\partial}{\partial\theta'}f(\lambda_j,\theta) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\varpi(\lambda,\theta)\frac{\partial}{\partial\theta'}f(\lambda,\theta)\,d\lambda\Biggr)$$
$$\quad - \Biggl(\sum_{j\in\mathcal F}\bigl\{f(\lambda_j,\theta) - f(\lambda_j)\bigr\}\frac{\partial}{\partial\theta}\varpi(\lambda_j,\theta) - \frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl\{f(\lambda,\theta) - f(\lambda)\bigr\}\frac{\partial}{\partial\theta}\varpi(\lambda,\theta)\,d\lambda\Biggr)$$
$$\quad + \sum_{j\in\mathcal F}\bigl\{I_T(\lambda_j) - f(\lambda_j)\bigr\}\frac{\partial}{\partial\theta}\varpi(\lambda_j,\theta) + \sum_{j\in\mathcal F}\bigl\{I^*_T(\lambda_j) - I_T(\lambda_j)\bigr\}\frac{\partial}{\partial\theta}\varpi(\lambda_j,\theta)$$
$$= A_{1,T}(\theta) + A_{2,T}(\theta) + A_{3,T}(\theta) + A_{4,T}(\theta).$$
Spectrum boundedness and Hölder continuity Assumption A.2.iv imply $\|A_{1,T}(\theta)\|$, $\|A_{2,T}(\theta)\|$ and $\|A_{3,T}(\theta)\|$ are $o_p(1)$ uniformly on $\Theta$. See the proof of Lemma A.5 for related arguments, and see Dunsmuir (1979, p. 497-498) for classic arguments.

Consider $A_{4,T}(\theta)$ and let $\eta : [-\pi,\pi]\times\Theta \to \mathbb R$ be any mapping that satisfies $|\eta(\lambda;\theta)| \le K$ for all $(\lambda,\theta) \in [-\pi,\pi]\times\Theta$. Write $\hat\gamma_{T,h} \equiv \frac{1}{T}\sum_{t=h+1}^{T}y_t y_{t-h}$. Square integrability and distribution continuity ensure $E|y_t y_{t-h}|^{1+\iota} < \infty$ for some infinitesimal $\iota > 0$. Hence, by triangle and Hölder inequalities, stationarity and negligibility:
$$E\bigl|\hat\gamma^*_{T,h}(c_{T,h}) - \hat\gamma_{T,h}\bigr| \le \frac{1}{T}\sum_{t=h+1}^{T}\bigl(E|y_t y_{t-h}|^{1+\iota}\bigr)^{1/(1+\iota)}P\bigl(|y_t y_{t-h} - E[y_t y_{t-h}]| > c_{T,h}\bigr)^{\iota/(1+\iota)} = K\Bigl(\frac{k_T}{T}\Bigr)^{\iota/(1+\iota)} \to 0.$$
Therefore $|\hat\gamma^*_{T,h}(c_{T,h}) - \hat\gamma_{T,h}| \overset{p}{\to} 0$ by Markov's inequality. Now use boundedness of $\eta(\lambda;\theta)$ to deduce
$$\sup_{\theta\in\Theta}\Biggl|\int_{-\pi}^{\pi}\bigl\{I^*_T(\lambda) - I_T(\lambda)\bigr\}\eta(\lambda;\theta)\,d\lambda\Biggr| \overset{p}{\to} 0. \tag{A.11}$$
Combine boundedness Assumption A.2 and (A.11) to deduce $\sup_{\theta\in\Theta}\|A_{4,T}(\theta)\| \overset{p}{\to} 0$. Therefore $\sup_{\theta\in\Theta}\|\mathcal H_T(\theta) - \mathcal H(\theta)\| \overset{p}{\to} 0$. Finally, Assumption A.3 states $\inf_{\theta\in\Theta}\|\mathcal H(\theta)\| > 0$, hence $\sup_{\theta\in\Theta}\|\mathcal H_T(\theta)\|^{-1} = O_p(1)$.
Step 2 ($\theta^*_T \overset{p}{\to} \theta_0$). In view of (A.11) we can use $I_T(\lambda)$ in place of $I^*_T(\lambda)$. The proof of $\theta^*_T \overset{p}{\to} \theta_0$ therefore follows by arguments in Dunsmuir and Hannan (1976), or by the proof of Theorem 1 in McCloskey and Hill (2014). QED.
Proof of Theorem 2.2. Recall the solution $\tilde\theta^*_T$ defined by (A.2). We prove $T^{1/2}V_T^{-1/2}(\tilde\theta^*_T - \theta_0 + \mathcal B_T) \overset{d}{\to} N(0,I_k)$, hence $T^{1/2}V_T^{-1/2}(\hat\theta^*_T - \theta_0 + \mathcal B_T) \overset{d}{\to} N(0,I_k)$ by Lemma A.5.

By optimization problem (A.2), and a first order expansion around $\theta_0$, for some $\mathring\theta_T$ with $\|\mathring\theta_T - \theta_0\| \le \|\tilde\theta^*_T - \theta_0\|$:
$$T^{1/2}V_T^{-1/2}\bigl(\tilde\theta^*_T - \theta_0\bigr) = -T^{1/2}V_T^{-1/2}\Biggl(\frac{\partial^2}{\partial\theta\partial\theta'}\tilde Q^*_T(\mathring\theta_T)\Biggr)^{-1}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(\widehat I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda.$$
Hessian consistency Lemma A.6 applies with $\mathring\theta_T$ since $\|\mathring\theta_T - \theta_0\| \le \|\tilde\theta^*_T - \theta_0\| \overset{p}{\to} 0$ by equivalency Lemma A.5 and consistency Theorem 2.1. Therefore, by the definition $V_T = \Omega^{-1}S_T\Omega^{-1}$ and approximation Lemma A.7:
$$T^{1/2}V_T^{-1/2}\bigl(\tilde\theta^*_T - \theta_0\bigr) = -T^{1/2}V_T^{-1/2}\Omega^{-1}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(\widehat I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\times(1 + o_p(1))$$
$$= -T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(\widehat I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\times(1 + o_p(1))$$
$$= -T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\times(1 + o_p(1)). \tag{A.12}$$
Now define the bias terms
$$\tilde{\mathcal B}_T \equiv \frac{1}{2\pi}\sum_{h=-b_T}^{b_T}\varpi_h\bigl\{E\bigl[\hat\gamma^*_{T,h}(c_{T,h})\bigr] - E[y_t y_{t-h}]\bigr\} \in \mathbb R^k \quad\text{and}\quad \mathcal B_T \equiv \frac{1}{2\pi}\Omega^{-1}\tilde{\mathcal B}_T. \tag{A.13}$$
Use $V_T^{1/2}S_T^{-1/2} = \Omega^{-1}$ and apply Lemma A.9.a to deduce
$$T^{1/2}V_T^{-1/2}\bigl(\tilde\theta^*_T - \theta_0 + \mathcal B_T\bigr) = T^{1/2}V_T^{-1/2}\bigl(\tilde\theta^*_T - \theta_0\bigr) + T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\tilde{\mathcal B}_T$$
$$= -\frac{T^{1/2}}{2\pi}S_T^{-1/2}\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\times(1 + o_p(1)) + T^{1/2}S_T^{-1/2}\frac{1}{2\pi}\tilde{\mathcal B}_T$$
$$= -\frac{T^{1/2}}{2\pi}S_T^{-1/2}\Biggl(\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Biggr)(1 + o_p(1)) + o_p(1) \overset{d}{\to} N(0, I_k). \tag{A.14}$$
Finally, the claimed properties of $V_T = \Omega^{-1}S_T\Omega^{-1}$ hold by Lemma A.9.b,c. QED.
Proof of Theorem 3.1. Define the bias corrected spectral density and FD-QML estimators
$$\widehat I^{(bc)}_T(\lambda) = \frac{1}{2\pi}\Biggl(\hat{\hat\gamma}^{(bc)}_{T,0}\bigl(\widehat Y^{(0)}_{0,(k_T)}\bigr) + 2\sum_{h=1}^{b_T}\hat{\hat\gamma}^{(bc)}_{T,h}\bigl(\widehat Y^{(0)}_{h,(k_T)}\bigr)\cos(\lambda h)\Biggr)$$
$$\tilde\theta^{(bc)}_T \equiv \arg\min_{\theta\in\Theta}\frac{1}{2\pi}\int_{-\pi}^{\pi}\Bigl\{\ln f(\lambda,\theta) + \widehat I^{(bc)}_T(\lambda)/f(\lambda,\theta)\Bigr\}d\lambda$$
$$\hat\theta^{(bc)}_T \equiv \arg\min_{\theta\in\Theta}\sum_{j\in\mathcal F}\Bigl\{\ln f(\lambda_j,\theta) + \widehat I^{(bc)}_T(\lambda_j)/f(\lambda_j,\theta)\Bigr\}.$$
We will prove $T^{1/2}\tilde V_T^{-1/2}(\tilde\theta^{(bc)}_T - \theta_0) \overset{d}{\to} N(0,I_k)$. Then $T^{1/2}\tilde V_T^{-1/2}(\hat\theta^{(bc)}_T - \theta_0) \overset{d}{\to} N(0,I_k)$ follows by the same argument used to prove Lemma A.5. The proof of $T^{1/2}\tilde V_T^{-1/2}(\hat\theta^{(obc)}_T - \theta_0) \overset{d}{\to} N(0,I_k)$ is identical since $m_{T,h}(\xi) = [\xi m_{T,h}]$ for $0 < \underline\xi \le \xi \le \bar\xi < \infty$ and $m_{T,h}/k_T \to \infty$ imply $m_{T,h}(\xi)/k_T \sim \xi m_{T,h}/k_T \to \infty$, while also $\hat\kappa_{h,i,m_{T,h}(\xi)} = \kappa_{h,i} + O_p(1/m_{T,h}^{1/2})$ and $\hat d_{h,i,m_{T,h}(\xi)} = d_{h,i} + O_p(1/m_{T,h}^{1/2})$ by Lemma A.4, hence $\{\hat\kappa_{h,i,m_{T,h}(\xi)}, \hat d_{h,i,m_{T,h}(\xi)}\}$ have the same $m_{T,h}^{1/2}$-rate of convergence as $\{\hat\kappa_{h,i,m_{T,h}}, \hat d_{h,i,m_{T,h}}\}$. In particular, Hill (2013, proof of Theorem 2.2) shows $m_{T,h}/k_T \to \infty$ ensures $\{\hat\kappa_{h,i,m_{T,h}(\xi)}, \hat d_{h,i,m_{T,h}(\xi)}\}$ do not affect $T^{1/2}\tilde V_T^{-1/2}(\hat\theta^{(obc)}_T - \theta_0)$ asymptotically, uniformly in $\xi \in [\underline\xi,\bar\xi]$, hence $\{\hat\kappa_{h,i,m_{T,h}(\hat\xi_{T,h})}, \hat d_{h,i,m_{T,h}(\hat\xi_{T,h})}\}$ do not affect the limit distribution of $T^{1/2}\tilde V_T^{-1/2}(\hat\theta^{(obc)}_T - \theta_0)$. The claims $\tilde V_T = V_T(1 + o(1))$ if $\kappa > 4$ and $\tilde V_T = V_T(1 + O(1))$ if $\kappa \in (2,4]$ follow from $\tilde V_T = \Omega^{-1}\tilde S_T\Omega^{-1}$ and the Lemma A.10 results that if $\kappa > 4$ then $\tilde S_T = \lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1 + o(1))$ and if $\kappa \in (2,4]$ then $\tilde S_T = \lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = S_T(1 + O(1))$, where $\tilde{\mathcal S}_{M,T}$ is defined in Lemma A.10.
Step 1 ($\tilde\theta^{(bc)}_T$). Let $\{\hat R_{T,h}\}_{h=1}^{b_T}$ be defined as in (16)-(17), and define a bias estimator:
$$\hat{\tilde{\mathcal B}}_T \equiv -\frac{1}{2\pi}\int_{-\pi}^{\pi}\Biggl(\hat R_{T,0} + 2\sum_{h=1}^{b_T}\hat R_{T,h}\cos(\lambda h)\Biggr)\varpi(\lambda)\,d\lambda. \tag{A.15}$$
Let $\{r_{M,T}\}_{M,T\in\mathbb N}$ be a non-random double array of finite constants that satisfies $\sup_{T\in\mathbb N}|r_{M,T}| \to 0$ as $M \to \infty$, and may be different in different places.

We need the process that governs $\tilde\theta^{(bc)}_T$. Take $\mathcal I^*_{T,h}$, $\kappa_{h,0}$ and $\mathcal D_{T,h}$ defined in (A.8), and define for arbitrary $M \in \mathbb N$:
$$\tilde{\mathcal Z}_{M,T} \equiv T^{1/2}\Biggl(\frac{1}{2\pi}\Bigl\{\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Bigr\} + \frac{1}{(2\pi)^2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\mathcal I^*_{T,h}\Biggr)$$
$$\tilde{\mathcal S}_{M,T} \equiv E\bigl[\tilde{\mathcal Z}_{M,T}\tilde{\mathcal Z}_{M,T}'\bigr] \quad\text{and}\quad \tilde V_T = \Omega^{-1}\tilde S_T\Omega^{-1}.$$
By construction $\lim_{M\to\infty}\tilde{\mathcal Z}_{M,T} = \tilde{\mathcal Z}_T$ and $\lim_{M\to\infty}\tilde{\mathcal S}_{M,T} = \tilde S_T$.
In view of bias definitions (A.13) and (A.15), and the arguments leading to (A.12) and (A.14), use $\tilde V_T = \Omega^{-1}\tilde S_T\Omega^{-1}$ and approximation Lemma A.7 to deduce:
$$T^{1/2}\tilde V_T^{-1/2}\bigl(\tilde\theta^{(bc)}_T - \theta_0\bigr) = -T^{1/2}\tilde S_T^{-1/2}\frac{1}{2\pi}\int_{-\pi}^{\pi}\bigl(\widehat I^{(bc)}_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda\times(1 + o_p(1)) \tag{A.16}$$
$$= -T^{1/2}\tilde S_T^{-1/2}\frac{1}{2\pi}\Biggl(\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Biggr)\times(1 + o_p(1)) + T^{1/2}\tilde S_T^{-1/2}\frac{1}{2\pi}\bigl(\hat{\tilde{\mathcal B}}_T - \tilde{\mathcal B}_T\bigr)\times(1 + o_p(1)).$$
In Step 2 we show $\hat{\tilde{\mathcal B}}_T$ is, asymptotically, a linear function of the indicator partial sum $\mathcal I^*_{T,h}$:
$$T^{1/2}\tilde S_T^{-1/2}\bigl(\hat{\tilde{\mathcal B}}_T - \tilde{\mathcal B}_T\bigr) = -\frac{1}{2\pi}T^{1/2}\tilde S_T^{-1/2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\times\mathcal I^*_{T,h}\times(1 + o_p(1)) + o_p(r_{M,T}). \tag{A.17}$$
Combine (A.16) and (A.17) to obtain, by the construction of $\tilde{\mathcal Z}_{M,T}$:
$$T^{1/2}\tilde V_T^{-1/2}\bigl(\tilde\theta^{(bc)}_T - \theta_0\bigr) = -T^{1/2}\tilde S_T^{-1/2}\frac{1}{2\pi}\Biggl(\int_{-\pi}^{\pi}\bigl(I^*_T(\lambda) - f(\lambda)\bigr)\varpi(\lambda)\,d\lambda - \tilde{\mathcal B}_T\Biggr)\times(1 + o_p(1))$$
$$\quad - \frac{1}{(2\pi)^2}T^{1/2}\tilde S_T^{-1/2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\mathcal D_{T,h}\times\mathcal I^*_{T,h}\times(1 + o_p(1)) + o_p(r_{M,T})$$
$$= -\tilde S_T^{-1/2}\tilde{\mathcal S}_{M,T}^{1/2}\times\tilde{\mathcal S}_{M,T}^{-1/2}\tilde{\mathcal Z}_{M,T}\times(1 + o_p(1)) + o_p(r_{M,T}).$$
By Lemma A.10, $\tilde{\mathcal S}_{M,T}^{-1/2}\tilde{\mathcal Z}_{M,T} \overset{d}{\to} N(0,I_k)$. Since $M$ can be made arbitrarily large we can always impose $M \to \infty$ as $T \to \infty$ such that $\lim_{T\to\infty}\tilde S_T^{-1/2}\tilde{\mathcal S}_{M,T}^{1/2} = I_k$ and $\lim_{T\to\infty}r_{M,T} = 0$.
Step 2 ($\hat{\tilde{\mathcal B}}_T$). We now verify (A.17). Define $\tilde R_{T,h} \equiv -(E[\hat\gamma^*_{T,h}(c_{T,h})] - E[y_t y_{t-h}])$, and recall by constructions (13) and (14) that $R_{T,h} \sim \tilde R_{T,h}$. Then, by Fejér's theorem, we can write:
$$\tilde{\mathcal B}_T = \frac{1}{2\pi}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\bigl\{E\bigl[\hat\gamma^*_{T,h}(c_{T,h})\bigr] - E[y_t y_{t-h}]\bigr\} + o_p(r_{M,T}) = -\frac{1}{2\pi}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\tilde R_{T,h} + o_p(r_{M,T})$$
$$\hat{\tilde{\mathcal B}}_T = -\frac{1}{2\pi}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\hat R_{T,h} + o_p(r_{M,T}).$$
By arguments in Peng (2001, p. 259-263) for second order Paretian tails, and Assumption B′, we have $T^{1/2}\sum_{h=-M}^{M}\varpi_h\{\tilde R_{T,h} - R_{T,h}\} = o(1)$,⁹ hence with $\tilde{\mathcal B}_T$ and $\hat{\tilde{\mathcal B}}_T$ above it follows that:
$$T^{1/2}\tilde S_T^{-1/2}\bigl(\hat{\tilde{\mathcal B}}_T - \tilde{\mathcal B}_T\bigr) = -\frac{1}{2\pi}T^{1/2}\tilde S_T^{-1/2}\sum_{h=-M}^{M}\Bigl(1-\frac{|h|}{M}\Bigr)\varpi_h\bigl\{\hat R_{T,h} - R_{T,h}\bigr\} + o_p(r_{M,T}). \tag{A.18}$$
Next, with $\mathcal D_{T,h}$ defined in (A.8), we show:
$$\hat R_{T,h} - R_{T,h} = \mathcal D_{T,h}\times k_T^{1/2}\Bigl(\widehat Y^{(0)}_{h,(k_T)}/c_{T,h} - 1\Bigr)\times(1 + o_p(1)). \tag{A.19}$$
By imitating arguments in Hill (2013, proof of Theorem 2.1), $m_{T,h}/k_T \to \infty$ can be shown to imply that the tail index estimators $\hat\kappa_{h,i,m_{T,h}}$ and $\hat d_{h,i,m_{T,h}}$ in $\hat R_{T,h}$ do not affect the asymptotics, since they are $m_{T,h}^{1/2}$-consistent under Assumptions A and B′ by Lemma A.4, while by Lemma A.3 $\widehat Y^{(0)}_{h,(k_T)}$ is $k_T^{1/2}$-consistent under Assumptions A.1, B and C, and Assumption B′ implies Assumption B. We therefore simply write, without loss of generality,
$$\hat R_{T,0} = \frac{1}{\kappa_{0,0}-1}\frac{k_T}{T}\widehat Y^{(0)}_{0,(k_T)} = R_{T,0} + \frac{1}{\kappa_{0,0}-1}\frac{k_T^{1/2}}{T}c_{T,0}\times k_T^{1/2}\Biggl(\frac{\widehat Y^{(0)}_{0,(k_T)}}{c_{T,0}} - 1\Biggr)$$
$$\hat R_{T,h} = \frac{(T-h)^2}{T(T-h-k_T)}\Biggl(\frac{d_{h,2}\bigl(\widehat Y^{(0)}_{h,(k_T)}\bigr)^{1-\kappa_{h,2}}}{\kappa_{h,2}-1} - \frac{d_{h,1}\bigl(\widehat Y^{(0)}_{h,(k_T)}\bigr)^{1-\kappa_{h,1}}}{\kappa_{h,1}-1}\Biggr) \quad\text{for } h > 0.$$
A first order expansions around cT,h implies for some sequence of positive random numbers {c∗T,h }
9
Peng (2001) assumes P (|yt yt−h − γ̃h | ≥ c) = dh,0 c−κh,0 (1 + O(c−eh,0 )) with mT,h = o(T 2eh,0 /(2eh,0 +κh,0 ) ), but
Peng (2001, p. 259-263)’s arguments extend to the other Assumption B0 class P (|yt yt−h − γ̃h | ≥ c) = dh,0 c−κh,0 (1 +
O(ln(c)−eh,0 ) with mT,h = o(ln(T )2eh,0 ), cf. Haeusler and Teugels (1985, Section 5).
39
(0)
(0)
where |Ybh,(kT ) − c∗T,h | ≤ |Ybh,(kT ) − cT,h |:
$$
\hat R_{T,h} \;=\; R_{T,h}+\frac{(T-h)^2/k_T^{1/2}}{T(T-h-k_T)}\left[d_{h,1}c_{T,h}^{1-\kappa_{h,1}}\!\left(\frac{c_{T,h}}{c^*_{T,h}}\right)^{\kappa_{h,1}}-d_{h,2}c_{T,h}^{1-\kappa_{h,2}}\!\left(\frac{c_{T,h}}{c^*_{T,h}}\right)^{\kappa_{h,2}}\right]\times k_T^{1/2}\!\left(\frac{\hat Y^{(0)}_{h,(k_T)}}{c_{T,h}}-1\right)
$$
$$
=\; R_{T,h}+\frac{1}{k_T^{1/2}}\left[d_{h,1}c_{T,h}^{1-\kappa_{h,1}}-d_{h,2}c_{T,h}^{1-\kappa_{h,2}}\right]\times k_T^{1/2}\!\left(\frac{\hat Y^{(0)}_{h,(k_T)}}{c_{T,h}}-1\right)\times\big(1+o_p(1)\big).
$$
The last equality exploits $c^*_{T,h}/c_{T,h}\overset{p}{\to}1$, given $|\hat Y^{(0)}_{h,(k_T)}-c^*_{T,h}|\le|\hat Y^{(0)}_{h,(k_T)}-c_{T,h}|$ and $\hat Y^{(0)}_{h,(k_T)}/c_{T,h}\overset{p}{\to}1$ by Lemma 1 in Hill (2010). Hence we have shown (A.19).
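For completeness, the elementary identities used in the mean value step above can be recorded explicitly. For constants $d>0$ and $\kappa>1$ and points $c,c^*>0$:
$$
\frac{d}{dy}\,\frac{d\,y^{1-\kappa}}{\kappa-1} \;=\; -\,d\,y^{-\kappa},
\qquad
c\,(c^{*})^{-\kappa} \;=\; c^{1-\kappa}\Big(\frac{c}{c^{*}}\Big)^{\kappa}.
$$
Applying these at $y=c^*_{T,h}$ for each pair $(d_{h,i},\kappa_{h,i})$, and writing $\hat Y^{(0)}_{h,(k_T)}-c_{T,h}=c_{T,h}(\hat Y^{(0)}_{h,(k_T)}/c_{T,h}-1)$, delivers the bracketed terms in the expansion.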
Finally, combine (A.18) and (A.19), with $k_T^{1/2}\big(\hat Y^{(0)}_{h,(k_T)}/c_{T,h}-1\big)=I^*_{T,h}(1+o_p(1))$ by Lemma A.3, to deduce (A.17). QED.
References
Aguilar, M., and J. B. Hill (2014): “Robust Score and Portmanteau Tests of Volatility Spillover,”
Journal of Econometrics, forthcoming.
Andrews, D. (1988): “Laws of Large Numbers for Dependent Non-Identically Distributed Random
Variables,” Econometric Theory, 4, 458–467.
Andrews, D., P. Bickel, F. Hampel, P. Huber, W. Rogers, and J. Tukey (1972): Robust
Estimates of Location. Princeton University Press.
Bartkiewicz, K., A. Jakubowski, T. Mikosch, and O. Wintenberger (2010): “Stable limits
for sums of dependent infinite variance random variables,” Probability Theory and Related Fields,
150, 337–372.
Basrak, B., R. Davis, and T. Mikosch (2002): “Regular Variation of GARCH Processes,”
Stochastic Processes and their Applications, 99, 95–115.
Bradley, R. (1992): “On the Spectral Density and Asymptotic Normality of Weakly Dependent
Random Fields,” Journal of Theoretical Probability, 5, 355–373.
(1993): “Equivalent Mixing Conditions for Random Fields,” Annals of Probability, 21,
1921–1926.
Brockwell, P., and D. Cline (1985): “Linear Prediction of ARMA Processes with Infinite Variance,” Stochastic Processes and their Applications, 19, 281–296.
Carrasco, M., and X. Chen (2002): “Mixing and Moment Properties of Various GARCH and
Stochastic Volatility Models,” Econometric Theory, 18, 17–39.
Chaudhuri, S., and J. B. Hill (2014): “Heavy Tail Robust Estimation and Inference for Average
Treatment Effects,” Working Paper, University of North Carolina, Dept. of Economics.
Chiu, S. T. (1988): “Weighted least squares estimators on the frequency domain for the parameters
of a time series,” Annals of Statistics, 16, 1315–1326.
Cline, D. B. H. (1986): “Convolution Tails, Product Tails and Domains of Attraction,” Probability
Theory and Related Fields, 72, 529–557.
Csörgő, S., L. Horváth, and D. Mason (1986): “What Portion of the Sample Makes a Partial
Sum Asymptotically Stable or Normal?,” Probability Theory and Related Fields, 72, 1–16.
Davis, R. A., and T. Mikosch (2009): “Probabilistic Properties of Stochastic Volatility Models,”
in Handbook of Financial Time Series, ed. by T. Mikosch, J.-P. Kreiss, R. A. Davis, and T. G.
Andersen. Springer.
Dehling, H., M. Denker, and W. Phillip (1986): “Central Limit Theorems for Mixing Sequences
of Random Variables Under Minimal Conditions,” Annals of Probability, 14, 1359–1370.
Dette, H., M. Hallin, T. Kley, and S. Volgushev (2011): “Of Copulas, Quantiles, Ranks and
Spectra: an L1-Approach to Spectral Analysis,” Technical Report: Ruhr-University at Bochum.
Doukhan, P. (1994): Mixing: Properties and Examples. Springer-Verlag.
Drees, H., L. de Haan, and S. Resnick (2000): “How to Make a Hill Plot,” Annals of Statistics,
28, 254–274.
Dunsmuir, W. (1979): “A Central Limit Theorem for Parameter Estimation in Stationary Vector
Time Series and its Application to Models for a Signal Observed with Noise,” Annals of Statistics,
7, 490–506.
Dunsmuir, W., and E. J. Hannan (1976): “Vector Linear Time Series Models,” Advances in
Applied Probability, 8, 339–364.
Genton, M. G., and E. Ronchetti (2003): “Robust Indirect Inference,” Journal of the American
Statistical Association, 98, 67–76.
Goldie, C. M., and R. L. Smith (1987): “Slow Variation with Remainder: Theory and Applications,” Quarterly Journal of Mathematics, 38, 45–71.
Granger, C. W. J. (1966): “The Typical Spectral Shape of an Economic Variable,” Econometrica,
34, 150–161.
Haeusler, E., and J. Teugels (1985): “On Asymptotic Normality of Hill’s Estimator for the
Exponent of Regular Variation,” Annals of Statistics, 13, 743–756.
Hagemann, A. (2012): “Robust Spectral Analysis,” Working Paper, Notre Dame University, Dept.
of Economics.
Hall, P. (1982): “On Some Simple Estimates of an Exponent of Regular Variation,” Journal of the
Royal Statistical Society, Series B, 44, 37–42.
Hampel, F. (1974): “The Influence Curve and Its Role in Robust Estimation,” Journal of the
American Statistical Association, 69, 383–393.
Hampel, F., E. Ronchetti, P. Rousseeuw, and W. Stahel (1986): Robust Statistics: The
Approach Based on Influence Functions. Wiley, New York.
Hannan, E. (1973a): “The Asymptotic Theory of Linear Time-Series Models,” Journal of Applied
Probability, 10, 130–145.
(1973b): “The Estimation of Frequency,” Journal of Applied Probability, 10, 510–519.
Hansen, B., E. (1991): “Strong Laws for Dependent Heterogeneous Processes,” Econometric Theory,
7, 213–221.
(1992): “Erratum: Strong Laws for Dependent Heterogeneous Processes,” Econometric
Theory, 8, 421–422.
Hill, B. M. (1975): “A Simple General Approach to Inference about the Tail of a Distribution,”
Annals of Statistics, 3(5), 1163–1174.
Hill, J. B. (2009): “On Functional Central Limit Theorems for Dependent, Heterogeneous Arrays
with Applications to Tail Index and Tail Dependence Estimation,” Journal of Statistical Planning
and Inference, 139, 2091–2110.
(2010): “On Tail Index Estimation for Dependent, Heterogeneous Data,” Econometric
Theory, 26, 1398–1436.
(2012a): “Heavy-Tail and Plug-In Robust Consistent Conditional Moment Tests of Functional Form,” in Festschrift in Honor of Hal White, ed. by X. Chen, and N. Swanson, pp. 241–274.
Springer: New York.
(2012b): “Least Tail-Trimmed Squares for Infinite Variance Autoregressions,” Journal of
Time Series Analysis, 34, 168–186.
(2013): “Expected Shortfall Estimation and Gaussian Inference for Infinite Variance Time
Series,” Journal of Financial Econometrics, forthcoming.
(2014a): “Robust Estimation and Inference for Heavy-Tailed GARCH,” Bernoulli, forthcoming.
(2014b): “Tail Index Estimation for a Filtered Dependent Time Series,” Statistica Sinica,
forthcoming.
Hill, J. B., and M. Aguilar (2013): “Moment Condition Tests for Heavy Tailed Time Series,”
Journal of Econometrics, 172, 255–274.
Hill, J. B., and A. McCloskey (2014): “Supplemental Material for ‘Heavy Tail Robust Frequency
Domain Estimation’,” Dept. of Economics, University of North Carolina, Chapel Hill.
Hosoya, Y., and M. Taniguchi (1982): “A Central Limit Theorem for Stationary Processes and
the Parameter Estimation of Linear Processes,” Annals of Statistics, 10, 132–153.
Hsing, T. (1991): “On Tail Index Estimation Using Dependent Data,” Annals of Statistics, 19,
1547–1569.
Huber, P. (1964): “Robust Estimation of a Location Parameter,” Annals of Mathematical Statistics,
35, 73–101.
Ibragimov, I. A. (1962): “Some Limit Theorems for Stationary Processes,” Theory of Probability
and its Applications, 7, 349–382.
(1975): “A Note on the Central Limit Theorem for Dependent Random Variables,” Theory
of Probability and its Applications, 20, 135–141.
Khan, S., and E. Tamer (2010): “Irregular Identification, Support Conditions, and Inverse Weight
Estimation,” Econometrica, 78, 2021–2042.
Kirch, C., and D. N. Politis (2011): “TFT-Bootstrap: Resampling Time Series in the Frequency
Domain to Obtain Replicates in the Time Domain,” Annals of Statistics, 31, 47–59.
Kreiss, J.-P., and E. Paparoditis (2012): “The Hybrid Wild Bootstrap for Time Series,” Journal
of the American Statistical Association, 107, 1073–1084.
Lee, J., and S. S. Rao (2011): “The Quantile Spectral Density and Comparison Based Tests for
Nonlinear Time Series,” Technical Report : Dept. of Statistics, Texas A&M Univ.
Li, T.-H. (2008): “Laplace Periodogram for Time Series Analysis,” Journal of the American Statistical Association, 103, 757–768.
(2010): “A Nonlinear Method for Robust Spectral Analysis,” IEEE Transactions on Signal
Processing, 58, 2466–2474.
Lighthill, M. (1958): Introduction to Fourier Analysis and Generalized Functions. Cambridge Univ.
Press, Cambridge.
Mancini, L., E. Ronchetti, and F. Trojani (2005): “Optimal Conditionally Unbiased Bounded-Influence Inference in Dynamic Location and Scale Models,” Journal of the American Statistical
Association, 100, 628–641.
Martin, R. D., and D. J. Thomson (1982): “Robust-Resistant Spectrum Estimation,” Proceedings
of the IEEE, 70, 1097–1115.
McCloskey, A. (2013): “Estimation of the Long-Memory Stochastic Volatility Model that is Robust
to Level Shifts and Deterministic Trends,” Journal of Time Series Analysis, 34, 285–301.
McCloskey, A., and J. B. Hill (2014): “Parameter Estimation Robust to Low-Frequency Contamination,” Working Paper, Brown University & University of North Carolina.
McCloskey, A., and P. Perron (2013): “Memory Parameter Estimation in the Presence of Level
Shifts and Deterministic Trends,” Econometric Theory, forthcoming.
McLeish, D. (1975): “A Maximal Inequality and Dependent Strong Laws,” Annals of Probability,
3, 829–839.
Meitz, M., and P. Saikkonen (2008): “Stability of Nonlinear AR-GARCH Models,” Journal of
Time Series Analysis, 29, 453–475.
Mikosch, T., T. Gadrich, C. Kluppelberg, and R. Adler (1995): “Parameter Estimation for
ARMA Models with Infinite Variance Innovations,” Annals of Statistics, 23, 305–326.
Mikosch, T., and C. Starica (2000): “Limit Theory for the Sample Autocorrelations and Extremes
of a GARCH(1,1) Process,” Annals of Statistics, 28, 1427–1451.
Mikosch, T., and Y. Zhao (2014): “A Fourier Analysis of Extreme Events,” Bernoulli, 20, 803–845.
Peligrad, M. (1996): “On the Asymptotic Normality of Sequences of Weak Dependent Random
Variables,” Journal of Theoretical Probability, 9, 703–715.
Peng, L. (2001): “Estimating the Mean of a Heavy Tailed Distribution,” Statistics and Probability
Letters, 52, 255–264.
Petersen, K. (1983): Ergodic Theory. Cambridge Univ. Press.
Qu, Z., and D. Tkachenko (2012): “Identification and Frequency Domain Quasi-Maximum Likelihood Estimation of Linearized Dynamic Stochastic General Equilibrium Models,” Quantitative
Economics, 3, 95–132.
Resnick, S. (1987): Extreme Values, Regular Variation and Point Processes. Springer-Verlag: New
York.
Shao, X., and W. Wu (2007): “Asymptotic Spectral Theory for Nonlinear Time Series,” Annals of
Statistics, 35, 1773–1801.
Whittle, P. (1953): “Estimation and Information in Stationary Time Series,” Arkiv for Matematik,
2, 423–434.
Zygmund, A. (2002): Trigonometric Series, Vols. I and II. Cambridge Univ. Press, Cambridge.
[Figure 1 here: panels (a) and (b)]
Figure 1: Tail-trimmed correlations for AR(1). The process is yt = .9yt−1 + εt, where εt is iid Pareto with tail index κ = 2.5, the sample size is T = 100, and the number of samples is 10,000. Panel (a) contains optimally bias-corrected tail-trimmed correlations: simulation 2.5%, 50% and 97.5% quantiles (bottom, middle, top lines). Panel (b) contains the simulation average difference between optimally bias-corrected tail-trimmed and untrimmed correlations.
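The data generating process behind Figure 1 (an AR(1) with mean-zero, heavy tailed Pareto errors) can be sketched in a few lines of Python. This is our own illustrative sketch, not the authors' code; the function name, seed handling and burn-in length are choices made here for illustration.

```python
import numpy as np

def simulate_ar1_pareto(T, phi=0.9, kappa=2.5, seed=0, burn=500):
    """Simulate y_t = phi*y_{t-1} + e_t with iid centered Pareto(kappa) errors.
    For kappa = 2.5 the errors have a finite variance but an infinite fourth
    moment, matching the heavy tailed design described in the Figure 1 caption."""
    rng = np.random.default_rng(seed)
    # np.random.Generator.pareto draws from the Lomax law with mean 1/(kappa-1)
    # for kappa > 1; subtracting that mean centers the errors at zero.
    e = rng.pareto(kappa, T + burn) - 1.0 / (kappa - 1.0)
    y = np.empty(T + burn)
    y[0] = e[0]
    for t in range(1, T + burn):
        y[t] = phi * y[t - 1] + e[t]
    return y[burn:]  # discard burn-in so the start-up value is negligible
```

With phi = .9 the simulated path is strongly positively autocorrelated, which is what the tail-trimmed correlations in Figure 1 are estimating.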
Table 1: FD-QML for AR(1)

σ0² = 1 (where φ0 = .90)

                                    Bias     Med      RMSE     KS.05
T = 100    Pareto κ = 2.25  no trim   .0164   1.012    .2682    1.817
                            trim-obc  .0137   .9967    .2954    1.532
           Pareto κ = 4.50  no trim   .0332   1.022    .1098    1.638
                            trim-obc  .0278   1.019    .1408    1.335
           Normal           no trim   .0449   1.037    .1725    1.543
                            trim-obc  .0399   1.018    .2134    1.346
T = 250    Pareto κ = 2.25  no trim   .0195   1.016    .2132    1.296
                            trim-obc  .0129   1.013    .2469    1.174
           Pareto κ = 4.50  no trim   .0111   1.009    .0541    1.423
                            trim-obc  .0121   1.010    .0601    1.232
           Normal           no trim   .0140   1.011    .0960    1.901
                            trim-obc  .0137   1.012    .1019    1.055
T = 500    Pareto κ = 2.25  no trim   .0022   1.007    .1932    .9492
                            trim-obc -.0087   1.005    .2063    .6755
           Pareto κ = 4.50  no trim   .0067   1.005    .0364    1.085
                            trim-obc  .0066   1.004    .0382    1.002
           Normal           no trim   .0077   1.006    .0646    1.892
                            trim-obc  .0057   1.005    .0452    1.267
T = 1000   Pareto κ = 2.25  no trim   .0024   1.002    .1906    .6356
                            trim-obc  .0034   .9959    .2152    .7653
           Pareto κ = 4.50  no trim   .0032   1.002    .0277    .8934
                            trim-obc  .0035   1.003    .0311    .7862
           Normal           no trim   .0052   1.005    .0466    2.143
                            trim-obc  .0017   1.002    .0432    1.214

φ0 = .90 (where σ0² = 1)

                                    Bias     Med      RMSE     KS.05
T = 100    Pareto κ = 2.25  no trim  -.0286   .8801    .0690    .4722
                            trim-obc -.0212   .8850    .0697    .5627
           Pareto κ = 4.50  no trim  -.0224   .8862    .0653    1.106
                            trim-obc -.0193   .8913    .0657    1.216
           Normal           no trim  -.0289   .8756    .0674    1.991
                            trim-obc -.0272   .8890    .0672    1.352
T = 250    Pareto κ = 2.25  no trim  -.0100   .8956    .0337    .6356
                            trim-obc -.0106   .8918    .0358    .5647
           Pareto κ = 4.50  no trim  -.0109   .8925    .0333    .9266
                            trim-obc -.0103   .8938    .0344    .5915
           Normal           no trim  -.0093   .8942    .0331    1.835
                            trim-obc -.0094   .8952    .0328    .7857
T = 500    Pareto κ = 2.25  no trim  -.0053   .8960    .0211    .9492
                            trim-obc -.0051   .8964    .0240    .6528
           Pareto κ = 4.50  no trim  -.0052   .8970    .0219    1.256
                            trim-obc -.0049   .8967    .0210    .9391
           Normal           no trim  -.0063   .8954    .0216    1.892
                            trim-obc -.0058   .8954    .0214    .7139
T = 1000   Pareto κ = 2.25  no trim  -.0021   .8982    .0141    1.075
                            trim-obc -.0024   .8981    .0153    1.117
           Pareto κ = 4.50  no trim  -.0027   .8982    .0144    1.638
                            trim-obc -.0025   .8976    .0142    1.213
           Normal           no trim  -.0023   .8982    .0143    1.593
                            trim-obc -.0023   .8987    .0147    1.108

a. The model is yt = φ0yt−1 + εt where εt is iid Pareto or Normal, E[εt] = 0 and σ0² ≡ E[εt²].
b. “Med” is the median, and “RMSE” is the root-mean-squared-error.
c. “KS.05” is the Kolmogorov-Smirnov statistic divided by its 5% critical value. Values greater than one suggest non-normality at the 5% level.
d. “no trim” is standard FD-QML. “trim-obc” is the optimal bias-corrected tail-trimmed FD-QML.
Table 2: FD-QML for GARCH(1,1)

[α0, β0] = [.05, .9]: κ ≈ 4.05

α0 = .05              Bias     Med      RMSE     KS.05
T = 100    no trim    .1032    .1002    .1889    4.361
           trim-obc   .1175    .1256    .2242    2.790
T = 250    no trim    .0624    .0677    .1449    2.727
           trim-obc   .0663    .1027    .1533    1.572
T = 500    no trim    .0501    .0674    .1212    3.960
           trim-obc   .0478    .0890    .1064    1.732
T = 1000   no trim    .0408    .0632    .1022    2.395
           trim-obc   .0266    .0758    .0688    1.673
β0 = .9
T = 100    no trim   -.2062    .8170    .4031    5.598
           trim-obc  -.2076    .7893    .3884    1.795
T = 250    no trim   -.1663    .8587    .3741    2.883
           trim-obc  -.1123    .8626    .2856    1.346
T = 500    no trim   -.0951    .8858    .2835    4.221
           trim-obc  -.0348    .8990    .1812    2.560
T = 1000   no trim   -.0371    .8955    .1827    1.832
           trim-obc   .0110    .9188    .0903    1.452

[α0, β0] = [.2, .7]: κ ≈ 3.02

α0 = .2               Bias     Med      RMSE     KS.05
T = 100    no trim    .0644    .2162    .2182    2.568
           trim-obc   .0622    .2201    .2432    1.946
T = 250    no trim    .0597    .2158    .1983    3.137
           trim-obc   .0602    .2150    .2208    2.442
T = 500    no trim    .0620    .2159    .1877    2.505
           trim-obc   .0502    .2094    .1969    2.142
T = 1000   no trim    .0619    .2038    .1857    2.331
           trim-obc   .0499    .2078    .1867    1.978
β0 = .7
T = 100    no trim    .0174    .7082    .3210    2.706
           trim-obc   .0267    .6924    .3165    2.111
T = 250    no trim   -.0005    .7234    .2219    3.077
           trim-obc   .0010    .7306    .2366    2.206
T = 500    no trim    .0036    .7104    .1936    3.321
           trim-obc   .0014    .7220    .1969    2.856
T = 1000   no trim    .0123    .7183    .1573    2.708
           trim-obc   .0159    .7149    .1670    2.154

[α0, β0] = [.3, .6]: κ ≈ 2.05

α0 = .3               Bias     Med      RMSE     KS.05
T = 100    no trim   -.0542    .2116    .1386    5.552
           trim-obc   .0112    .2461    .2022    1.104
T = 250    no trim   -.0561    .2202    .1202    2.883
           trim-obc  -.0260    .2232    .1728    1.241
T = 500    no trim   -.0528    .2233    .1087    5.032
           trim-obc   .0004    .2449    .1688    2.530
T = 1000   no trim   -.0378    .2402    .0977    2.546
           trim-obc  -.0047    .2445    .1561    1.752
β0 = .6
T = 100    no trim   -.0646    .6532    .2321    4.475
           trim-obc  -.0770    .6538    .2152    1.378
T = 250    no trim   -.0142    .6304    .1880    2.160
           trim-obc   .0131    .6655    .1879    1.172
T = 500    no trim   -.0058    .6391    .1629    4.377
           trim-obc   .0095    .6482    .1725    2.167
T = 1000   no trim   -.0028    .6277    .1465    1.867
           trim-obc   .0015    .6445    .1435    1.411

a. The model is yt = x²t = σ²t ε²t where σ²t = 1 + α0x²t−1 + β0σ²t−1 and εt is iid standard normal.
b. “Med” is the median, and “RMSE” is the root-mean-squared-error.
c. “KS.05” is the Kolmogorov-Smirnov statistic divided by its 5% critical value. Values greater than one suggest non-normality at the 5% level.
d. “no trim” is standard FD-QML. “trim-obc” is the optimal bias corrected tail-trimmed FD-QML.
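The GARCH(1,1) data generating process of Table 2, footnote a, can be sketched as follows. This is our own illustrative sketch, not the authors' code; the function name, seed handling, burn-in length and variance-targeting initialization are choices made here.

```python
import numpy as np

def simulate_garch_squares(T, alpha0, beta0, seed=0, burn=500):
    """Simulate y_t = x_t^2 = sigma_t^2 * eps_t^2 with
    sigma_t^2 = 1 + alpha0*x_{t-1}^2 + beta0*sigma_{t-1}^2 and eps_t iid N(0,1).
    Requires alpha0 + beta0 < 1 for covariance stationarity of x_t."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(T + burn)
    sig2 = np.empty(T + burn)
    y = np.empty(T + burn)                     # y_t = x_t^2
    sig2[0] = 1.0 / (1.0 - alpha0 - beta0)     # unconditional variance of x_t
    y[0] = sig2[0] * eps[0] ** 2
    for t in range(1, T + burn):
        sig2[t] = 1.0 + alpha0 * y[t - 1] + beta0 * sig2[t - 1]
        y[t] = sig2[t] * eps[t] ** 2
    return y[burn:]  # discard burn-in
```

For [α0, β0] = [.3, .6] the tail index of y is roughly κ ≈ 2.05, so the simulated squares have a finite mean but effectively infinite variance, the hardest case in Table 2.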
[Figure 2 here: panels (a) and (b)]
Figure 2: Tail-trimmed correlations for GARCH(1,1). The process is yt = x²t where xt = σtεt, with σ²t = 1 + α0x²t−1 + β0σ²t−1 and εt iid N(0, 1); the sample size is T = 100, and the number of samples is 10,000. Panel (a) contains optimally bias-corrected tail-trimmed correlations: simulation 2.5%, 50% and 97.5% quantiles (bottom, middle, top lines). Panel (b) contains the simulation average difference between optimally bias-corrected tail-trimmed and untrimmed correlations.