Introduction
Consider the simple AR(1) model for t = 1, . . . , T:

y_t = φ y_{t−1} + ε_t,  ε_t ∼ WN(0, σ²)

If |φ| < 1, then y_t ∼ I(0) and

y_t = ψ(L)ε_t,  ψ(L) = Σ_{k=0}^∞ ψ_k L^k,  ψ_k = φ^k

such that Σ_{k=0}^∞ k|ψ_k| < ∞, and the long-run variance is finite:

LRV = σ²ψ(1)² = σ²(1 − φ)^{−2} < ∞
Furthermore, by the LLN and the CLT,

T^{−1} Σ_{t=1}^T y_t →_p E[y_t] = 0

T^{−1/2} Σ_{t=1}^T y_t →_d N(0, σ²ψ(1)²) = N(0, σ²(1 − φ)^{−2})

T^{1/2}(φ̂ − φ) →_d N(0, 1 − φ²)

where φ̂ = (Σ_{t=1}^T y_{t−1}²)^{−1} Σ_{t=1}^T y_{t−1}y_t is the least squares estimate of φ.
If φ = 1, then y_t ∼ I(1) and

ψ_k = 1 for all k,  Σ_{k=0}^∞ k|ψ_k| = ∞,  σ²ψ(1)² = ∞

Furthermore,

T^{−1} Σ_{t=1}^T y_t → ∞ as T → ∞

T^{−1/2} Σ_{t=1}^T y_t → ∞ as T → ∞

T^{1/2}(φ̂ − 1) →_p 0
Clearly, the asymptotic results for I(0) processes are
not applicable.
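The breakdown of the LLN at φ = 1 is easy to see in a small simulation. The following is an illustrative sketch, not part of the notes; the choices φ = 0.5, T = 50,000, Gaussian errors, and the seed are arbitrary.

```python
import numpy as np

# Illustrative sketch (not from the notes): the sample mean of an I(0)
# AR(1) obeys the LLN, while the sample mean of a random walk does not.
rng = np.random.default_rng(0)

def ar1_path(phi, T, rng):
    """Simulate y_t = phi * y_{t-1} + eps_t with y_0 = 0, eps_t ~ N(0, 1)."""
    eps = rng.standard_normal(T)
    y = np.empty(T)
    prev = 0.0
    for t in range(T):
        prev = phi * prev + eps[t]
        y[t] = prev
    return y

T = 50_000
ybar_i0 = ar1_path(0.5, T, rng).mean()  # LLN: close to E[y_t] = 0
ybar_i1 = ar1_path(1.0, T, rng).mean()  # no LLN: ybar is Op(T^{1/2})
print(ybar_i0, ybar_i1)
```

Rerunning with different seeds, ybar_i0 stays pinned near zero while ybar_i1 wanders over a range that grows with T.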
Sample Moments of I(1) Processes
When φ = 1,

y_t = y_{t−1} + ε_t = y_0 + Σ_{j=1}^t ε_j = Σ_{j=1}^t ε_j if y_0 = 0
Now, consider the sample mean of y_t when y_0 = 0:

ȳ = T^{−1} Σ_{t=1}^T y_t = T^{−1} Σ_{t=1}^T ( Σ_{j=1}^t ε_j )
Notice that the sample mean is a normalized sum of partial sums of the white noise error term ε_t. As such, it exhibits very different probabilistic behavior from a sum of stationary and ergodic errors. It turns out that the limit behavior of ȳ when φ = 1 is described by simple functionals of Brownian motion.
Brownian Motion
Standard Brownian motion (Wiener process) is a continuous-time process W(·) associating with each date r ∈ [0, 1] the scalar random variable W(r) such that

1. W(0) = 0

2. For any dates 0 ≤ r_1 < r_2 < · · · < r_k ≤ 1, the random increments

W(r_2) − W(r_1), W(r_3) − W(r_2), . . . , W(r_k) − W(r_{k−1})

are independent Gaussian random variables with W(t) − W(s) ∼ N(0, t − s) for s < t
3. For any given realization, W (r) is continuous at
r with probability 1. That is, W (r) ∈ C[0, 1] =
space of continuous real valued functions on [0, 1].
The standard Brownian motion, or Wiener process, may be intuitively thought of as the continuous-time limit of a random walk process in which the integer time index t = 1, 2, . . . has been rescaled to the continuous time index r ∈ [0, 1]. The Wiener process may be shown to have the following properties:
1. W(r) ∼ N(0, r)

2. σW(r) = B(r) ∼ N(0, σ²r)

3. W(r)² ∼ r · χ²(1)

4. W(r) is not differentiable and exhibits unbounded variation.
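Property 1 can be checked numerically by approximating W(·) with a scaled random walk, following the random-walk intuition above. A minimal sketch; the grid size T, the replication count, r = 0.5, and the seed are arbitrary choices.

```python
import numpy as np

# Approximate W(r) by T^{-1/2} times the partial sum of [T*r] iid N(0,1)
# shocks, and check W(r) ~ N(0, r) at r = 0.5 by Monte Carlo.
rng = np.random.default_rng(1)
T, reps, r = 1_000, 10_000, 0.5

eps = rng.standard_normal((reps, T))
W_r = eps[:, : int(T * r)].sum(axis=1) / np.sqrt(T)

print(W_r.mean(), W_r.var())  # should be near 0 and near r = 0.5
```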
Partial Sum Processes and the Functional Central
Limit Theorem
Let ε_t ∼ WN(0, σ²). For r ∈ [0, 1], define the partial sum process

X_T(r) = T^{−1} Σ_{t=1}^{[Tr]} ε_t,  [Tr] = integer part of T · r
For example, let T = 10 and consider X_T(r) for r = 0, 0.01, 0.1, 0.2:

r = 0,  [10 · 0] = 0 :  X_10(0) = (1/10) Σ_{t=1}^0 ε_t = 0

r = 0.01,  [10 · 0.01] = 0 :  X_10(0.01) = (1/10) Σ_{t=1}^0 ε_t = 0

r = 0.1,  [10 · 0.1] = 1 :  X_10(0.1) = (1/10) Σ_{t=1}^1 ε_t = ε_1/10

r = 0.2,  [10 · 0.2] = 2 :  X_10(0.2) = (1/10) Σ_{t=1}^2 ε_t = (ε_1 + ε_2)/10

In general,

X_10(r) = (ε_1 + · · · + ε_j)/10,  j/10 ≤ r < (j + 1)/10
For a sequence of errors ε1, . . . , εT :
1. the function X_T(r) is a random step function defined on [0, 1].

2. As T gets bigger, the spaces between the steps get smaller and the random step function begins to look more and more like a Wiener process.
The Functional Central Limit Theorem
For any fixed r ∈ [0, 1], consider

√T X_T(r) = √T ( T^{−1} Σ_{t=1}^{[Tr]} ε_t ) = (1/√T) Σ_{t=1}^{[Tr]} ε_t

= ( √([Tr]) / √T ) · ( (1/√([Tr])) Σ_{t=1}^{[Tr]} ε_t )

Now, as T → ∞,

√([Tr]) / √T → √r

(1/√([Tr])) Σ_{t=1}^{[Tr]} ε_t →_d N(0, σ²)

It follows from Slutsky's theorem that

√T X_T(r) →_d N(0, r · σ²) ≡ σ · W(r)

or

√T X_T(r)/σ →_d N(0, r) ≡ W(r)

Notice that when r = 1, we have the usual result

√T X_T(1)/σ = (1/(σ√T)) Σ_{t=1}^T ε_t →_d N(0, 1) ≡ W(1)
Since the above result holds for any r ∈ [0, 1], one might expect that the result holds uniformly for r ∈ [0, 1]. In fact, the probability distribution of the sequence of stochastic step functions

{√T X_T(·)/σ}_{T=1}^∞

defined on [0, 1] converges asymptotically to that of standard Brownian motion W(·).

This convergence result, known as Donsker's Theorem for Partial Sums or the Functional Central Limit Theorem (FCLT), is often represented as

√T X_T(·)/σ ⇒ W(·)
The symbol “⇒” denotes convergence in distribution
for random functions.
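A quick Monte Carlo sketch of this result at r = 1: the FCLT requires only WN(0, σ²) errors, so (non-Gaussian) uniform shocks are used deliberately here. T, the replication count, and the seed are arbitrary.

```python
import numpy as np

# sqrt(T) * X_T(1) / sigma should be approximately N(0, 1) even for
# non-Gaussian white noise; here eps_t ~ Uniform(-0.5, 0.5).
rng = np.random.default_rng(2)
T, reps = 1_000, 10_000
sigma = np.sqrt(1.0 / 12.0)  # sd of Uniform(-0.5, 0.5)

eps = rng.uniform(-0.5, 0.5, size=(reps, T))
Z = eps.sum(axis=1) / (sigma * np.sqrt(T))  # sqrt(T) * X_T(1) / sigma

print(Z.mean(), Z.std())  # should be near 0 and near 1
```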
The Continuous Mapping Theorem
Recall, if X_T is a sequence of random variables such that X_T →_d X and g(·) is a continuous function, then g(X_T) →_d g(X). A similar result holds for random functions and is called the Continuous Mapping Theorem (CMT).

Let {S_T(·)}_{T=1}^∞ be a sequence of random functions such that

S_T(·) ⇒ S(·)

and let g(·) be a continuous functional. Then the CMT states that

g(S_T(·)) ⇒ g(S(·))
Example 1

Suppose S_T(·) = √T X_T(·)/σ so that S(·) = W(·) by the FCLT. Let g(S_T(·)) = σ · S_T(·). Then

g(S_T(·)) ⇒ g(W(·)) = σW(·)

Example 2

Let g(S_T(·)) = ∫_0^1 S_T(r)dr. Then

g(S_T(·)) ⇒ g(W(·)) = ∫_0^1 W(r)dr
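Example 2 can be illustrated numerically: the integral of the approximating step function is a Riemann sum, and ∫_0^1 W(r)dr is known to be N(0, 1/3). A sketch with arbitrary grid size, replication count, and seed.

```python
import numpy as np

# The functional g(S) = int_0^1 S(r) dr applied to scaled partial-sum
# paths should converge in distribution to int_0^1 W(r) dr ~ N(0, 1/3).
rng = np.random.default_rng(3)
T, reps = 1_000, 10_000

eps = rng.standard_normal((reps, T))
W_paths = eps.cumsum(axis=1) / np.sqrt(T)  # sqrt(T) * X_T(t/T) on the grid
integrals = W_paths.mean(axis=1)           # Riemann sum for int_0^1 W(r) dr

print(integrals.mean(), integrals.var())   # near 0 and near 1/3
```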
Convergence of Sample Moments of I(1) Processes
Let y_t be the I(1) process

y_t = y_{t−1} + ε_t,  ε_t ∼ WN(0, σ²)

For r ∈ [0, 1], define the partial sum process

X_T(r) = T^{−1} Σ_{t=1}^{[Tr]} ε_t

such that √T X_T(·) ⇒ σW(·). The FCLT and the CMT may be used to deduce the following results:

T^{−3/2} Σ_{t=1}^T y_{t−1} ⇒ σ ∫_0^1 W(r)dr

T^{−2} Σ_{t=1}^T y_{t−1}² ⇒ σ² ∫_0^1 W(r)²dr

T^{−1} Σ_{t=1}^T y_{t−1}ε_t ⇒ σ² ∫_0^1 W(r)dW(r) = σ²(W(1)² − 1)/2
For example, it can be shown that

T^{−3/2} Σ_{t=1}^T y_{t−1} = ∫_0^1 √T X_T(r)dr ⇒ σ ∫_0^1 W(r)dr
using the FCLT and the CMT. The details are given
in chapter 17 of Hamilton.
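The first convergence result can be checked by simulation: with σ = 2, T^{−3/2} Σ y_{t−1} should behave like σ ∫_0^1 W(r)dr, which is N(0, σ²/3). A sketch; T, the replication count, σ, and the seed are arbitrary choices.

```python
import numpy as np

# Monte Carlo check: T^{-3/2} * sum_{t=1}^T y_{t-1} for a driftless random
# walk should be approximately N(0, sigma^2 / 3).
rng = np.random.default_rng(4)
T, reps, sigma = 1_000, 10_000, 2.0

eps = sigma * rng.standard_normal((reps, T))
y = eps.cumsum(axis=1)                 # y_1, ..., y_T with y_0 = 0
stat = y[:, :-1].sum(axis=1) / T**1.5  # T^{-3/2} * sum of y_0, ..., y_{T-1}

print(stat.mean(), stat.var())         # near 0 and near sigma^2 / 3 = 4/3
```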
Application: Unit Root Tests
To illustrate the convergence of sample moments of
I(1) processes, consider the AR(1) regression
y_t = φ y_{t−1} + ε_t,  ε_t ∼ WN(0, σ²)

If φ = 1 then y_t ∼ I(1); if |φ| < 1 then y_t ∼ I(0). A test of y_t ∼ I(1) against the alternative that y_t ∼ I(0) may therefore be formulated as

H0 : φ = 1 vs. H1 : |φ| < 1
A natural test statistic is the t-statistic
t_{φ=1} = (φ̂ − 1)/SE(φ̂)
where
φ̂ = (Σ_{t=1}^T y_{t−1}²)^{−1} Σ_{t=1}^T y_{t−1}y_t

SE(φ̂) = ( σ̂² (Σ_{t=1}^T y_{t−1}²)^{−1} )^{1/2}

σ̂² = T^{−1} Σ_{t=1}^T (y_t − φ̂y_{t−1})²
Consistency of φ̂ under H0 : φ = 1
Under H0 : φ = 1,

φ̂ − 1 = (Σ_{t=1}^T y_{t−1}²)^{−1} Σ_{t=1}^T y_{t−1}ε_t

= (T^{−2} Σ_{t=1}^T y_{t−1}²)^{−1} (T^{−2} Σ_{t=1}^T y_{t−1}ε_t)

Using the results

T^{−2} Σ_{t=1}^T y_{t−1}² ⇒ σ² ∫_0^1 W(r)²dr

T^{−1} Σ_{t=1}^T y_{t−1}ε_t ⇒ σ² ∫_0^1 W(r)dW(r), so that T^{−2} Σ_{t=1}^T y_{t−1}ε_t →_p 0,

and the CMT, it follows that

φ̂ − 1 →_p ( σ² ∫_0^1 W(r)²dr )^{−1} × 0 = 0

so that φ̂ →_p 1.
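Superconsistency can be seen directly: simulate the no-intercept regression under φ = 1 and watch φ̂ collapse to 1 as T grows. A sketch; the sample sizes, 500 replications, and the seed are arbitrary.

```python
import numpy as np

# phi_hat from OLS of y_t on y_{t-1} (no intercept) under the unit root.
rng = np.random.default_rng(5)

def phi_hat(T, rng):
    """OLS estimate in y_t = phi * y_{t-1} + eps_t when the true phi = 1."""
    y = rng.standard_normal(T).cumsum()
    ylag, ycur = y[:-1], y[1:]
    return (ylag @ ycur) / (ylag @ ylag)

est = {T: np.mean([phi_hat(T, rng) for _ in range(500)])
       for T in (50, 500, 5_000)}
print(est)  # averages approach 1 as T grows
```

Note that φ̂ is biased below 1 in finite samples, but the bias shrinks at rate T rather than the usual T^{1/2}.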
DF Test with Intercept
y_t = c + φ y_{t−1} + ε_t = x_t′β + ε_t

x_t = (1, y_{t−1})′,  β = (c, φ)′

OLS gives

β̂ = (Σ_{t=1}^T x_t x_t′)^{−1} Σ_{t=1}^T x_t y_t

Σ_{t=1}^T x_t x_t′ = [ T , Σ_{t=1}^T y_{t−1} ; Σ_{t=1}^T y_{t−1} , Σ_{t=1}^T y_{t−1}² ]

Σ_{t=1}^T x_t y_t = [ Σ_{t=1}^T y_t ; Σ_{t=1}^T y_{t−1}y_t ]
Now, under H0 : φ = 1 and c = 0,

β̂ − β = [ ĉ − 0 ; φ̂ − 1 ] = (Σ_{t=1}^T x_t x_t′)^{−1} Σ_{t=1}^T x_t ε_t

= [ T , Σ_{t=1}^T y_{t−1} ; Σ_{t=1}^T y_{t−1} , Σ_{t=1}^T y_{t−1}² ]^{−1} [ Σ_{t=1}^T ε_t ; Σ_{t=1}^T y_{t−1}ε_t ]

Problem: elements of Σ_{t=1}^T x_t x_t′ and Σ_{t=1}^T x_t ε_t converge at different rates!
Σ_{t=1}^T x_t x_t′ = [ T , Σ_{t=1}^T y_{t−1} ; Σ_{t=1}^T y_{t−1} , Σ_{t=1}^T y_{t−1}² ] = [ O(T) , O_p(T^{3/2}) ; O_p(T^{3/2}) , O_p(T²) ]

Σ_{t=1}^T x_t ε_t = [ Σ_{t=1}^T ε_t ; Σ_{t=1}^T y_{t−1}ε_t ] = [ O_p(T^{1/2}) ; O_p(T) ]

Implication: cannot get sensible convergence results using traditional scaling:
T(β̂ − β) = (T^{−2} Σ_{t=1}^T x_t x_t′)^{−1} T^{−1} Σ_{t=1}^T x_t ε_t

= [ T^{−1} , T^{−2} Σ_{t=1}^T y_{t−1} ; T^{−2} Σ_{t=1}^T y_{t−1} , T^{−2} Σ_{t=1}^T y_{t−1}² ]^{−1} [ T^{−1} Σ_{t=1}^T ε_t ; T^{−1} Σ_{t=1}^T y_{t−1}ε_t ]

⇒ [ 0 , 0 ; 0 , σ² ∫_0^1 W(r)²dr ]^{−1} [ 0 ; σ² ∫_0^1 W(r)dW(r) ]

which is not well defined.
Sims-Stock-Watson Trick
Define the diagonal and invertible scaling matrix

D_T = [ T^{1/2} , 0 ; 0 , T ]
Then write

D_T(β̂ − β) = D_T (Σ_{t=1}^T x_t x_t′)^{−1} D_T D_T^{−1} Σ_{t=1}^T x_t ε_t

= ( D_T^{−1} Σ_{t=1}^T x_t x_t′ D_T^{−1} )^{−1} D_T^{−1} Σ_{t=1}^T x_t ε_t
where

D_T(β̂ − β) = [ T^{1/2} ĉ ; T(φ̂ − 1) ]

D_T^{−1} (Σ_{t=1}^T x_t x_t′) D_T^{−1} = [ 1 , T^{−3/2} Σ_{t=1}^T y_{t−1} ; T^{−3/2} Σ_{t=1}^T y_{t−1} , T^{−2} Σ_{t=1}^T y_{t−1}² ]

D_T^{−1} Σ_{t=1}^T x_t ε_t = [ T^{−1/2} Σ_{t=1}^T ε_t ; T^{−1} Σ_{t=1}^T y_{t−1}ε_t ]
Therefore,

D_T(β̂ − β) ⇒ [ 1 , σ ∫_0^1 W(r)dr ; σ ∫_0^1 W(r)dr , σ² ∫_0^1 W(r)²dr ]^{−1} [ N(0, σ²) ; σ² ∫_0^1 W(r)dW(r) ]

Straightforward algebra shows that T^{1/2}ĉ does not converge to N(0, σ²), and

T(φ̂ − 1) ⇒ ( ∫_0^1 W^μ(r)²dr )^{−1} ∫_0^1 W^μ(r)dW(r)

where W^μ(r) = W(r) − ∫_0^1 W(s)ds is demeaned Brownian motion.
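The non-standard limit can be visualized by simulating T(φ̂ − 1) from the regression with an intercept under H0: φ = 1, c = 0. Its limit, the "demeaned" Dickey-Fuller coefficient distribution, is skewed to the left with a markedly negative median. A sketch; T, the replication count, and the seed are arbitrary.

```python
import numpy as np

# Simulate T * (phi_hat - 1) from OLS of y_t on (1, y_{t-1}) when the
# true process is a driftless random walk.
rng = np.random.default_rng(6)
T, reps = 1_000, 5_000

stats = np.empty(reps)
for i in range(reps):
    y = rng.standard_normal(T).cumsum()
    ylag, ycur = y[:-1], y[1:]
    X = np.column_stack([np.ones(T - 1), ylag])  # intercept + lag
    c_hat, phi = np.linalg.lstsq(X, ycur, rcond=None)[0]
    stats[i] = T * (phi - 1.0)

print(np.median(stats))  # markedly negative under H0
```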
Convergence of Sample Moments with General Serial
Correlation
y_t = y_{t−1} + ψ*(L)ε_t = y_{t−1} + u_t,  ε_t ∼ WN(0, σ²)

ψ*(L) is 1-summable

LRV = σ²ψ*(1)² = γ_0 + 2 Σ_{j=1}^∞ γ_j,  γ_j = cov(u_t, u_{t−j})

FCLT:

√T X_T(·) = (1/√T) Σ_{t=1}^{[T·]} u_t ⇒ √LRV × W(·)

1. T^{−3/2} Σ_{t=1}^T y_{t−1} ⇒ √LRV ∫_0^1 W(r)dr

2. T^{−2} Σ_{t=1}^T y_{t−1}² ⇒ LRV ∫_0^1 W(r)²dr

3. T^{−1} Σ_{t=1}^T y_{t−1}u_t ⇒ LRV ∫_0^1 W(r)dW(r) + ω,  ω = (LRV − γ_0)/2

4. T^{−1} Σ_{t=1}^T y_{t−1}ε_t ⇒ √(σ²LRV) ∫_0^1 W(r)dW(r)

5. T^{−1} Σ_{t=1}^T y_{t−1}u_{t−1} ⇒ LRV ∫_0^1 W(r)dW(r) + ω + γ_0
Application: Asymptotic Distribution of ADF test
Assume y_t is I(1) and that ∆y_t ∼ AR(1):

∆y_t = ξ∆y_{t−1} + ε_t,  ε_t ∼ WN(0, σ²),  |ξ| < 1

Therefore, ∆y_t has Wold representation

∆y_t = ψ*(L)ε_t = u_t

ψ*(L) = (1 − ξL)^{−1} = Σ_{j=0}^∞ ψ*_j L^j,  ψ*_j = ξ^j

LRV = σ²ψ*(1)² = σ²(1 − ξ)^{−2}

The ADF test regression is

y_t = φ y_{t−1} + ξ∆y_{t−1} + ε_t = x_t′β + ε_t

x_t = (y_{t−1}, ∆y_{t−1})′,  β = (φ, ξ)′

Notice that the first element of x_t, y_{t−1}, is I(1) while the second element, ∆y_{t−1}, is I(0).
OLS on the ADF test regression gives

β̂ − β = (Σ_{t=1}^T x_t x_t′)^{−1} Σ_{t=1}^T x_t ε_t

where

Σ_{t=1}^T x_t x_t′ = [ Σ_{t=1}^T y_{t−1}² , Σ_{t=1}^T y_{t−1}∆y_{t−1} ; Σ_{t=1}^T ∆y_{t−1}y_{t−1} , Σ_{t=1}^T ∆y_{t−1}² ] = [ O_p(T²) , O_p(T) ; O_p(T) , O_p(T) ]

Σ_{t=1}^T x_t ε_t = [ Σ_{t=1}^T y_{t−1}ε_t ; Σ_{t=1}^T ∆y_{t−1}ε_t ] = [ O_p(T) ; O_p(T^{1/2}) ]
Use the Sims-Stock-Watson trick and define the scaling matrix

D_T = [ T , 0 ; 0 , T^{1/2} ]
Then write

D_T(β̂ − β) = ( D_T^{−1} Σ_{t=1}^T x_t x_t′ D_T^{−1} )^{−1} D_T^{−1} Σ_{t=1}^T x_t ε_t

where

D_T(β̂ − β) = [ T(φ̂ − 1) ; T^{1/2}(ξ̂ − ξ) ]

D_T^{−1} Σ_{t=1}^T x_t x_t′ D_T^{−1} = [ T^{−2} Σ_{t=1}^T y_{t−1}² , T^{−3/2} Σ_{t=1}^T y_{t−1}∆y_{t−1} ; T^{−3/2} Σ_{t=1}^T ∆y_{t−1}y_{t−1} , T^{−1} Σ_{t=1}^T ∆y_{t−1}² ]
and

D_T^{−1} Σ_{t=1}^T x_t ε_t = [ T^{−1} Σ_{t=1}^T y_{t−1}ε_t ; T^{−1/2} Σ_{t=1}^T ∆y_{t−1}ε_t ]
Note: ∆y_{t−1}ε_t = u_{t−1}ε_t is a stationary and ergodic MDS with

E[(u_{t−1}ε_t)²] = E[ E[(u_{t−1}ε_t)² | I_{t−1}] ] = E[u²_{t−1} E[ε²_t]] = σ²γ_0

Therefore, by the appropriate CLT,

T^{−1/2} Σ_{t=1}^T ∆y_{t−1}ε_t →_d N(0, σ²γ_0)
Using the convergence results for the sample moments of serially correlated I(1) processes, the above result, and the CMT gives

T(φ̂ − 1) ⇒ (1 − ξ) ( ∫_0^1 W(r)²dr )^{−1} ∫_0^1 W(r)dW(r)

T^{1/2}(ξ̂ − ξ) →_d N(0, σ²/γ_0) = N(0, 1 − ξ²)

Furthermore, φ̂ and ξ̂ are asymptotically independent.
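A Monte Carlo sketch of the ADF regression under the null. With u_t = ∆y_t an AR(1) with parameter ξ, γ_0 = σ²/(1 − ξ²), so a limit variance of σ²/γ_0 for √T(ξ̂ − ξ) works out to 1 − ξ² = 0.75 for ξ = 0.5 and σ = 1; this is checked loosely below. T, the replication count, ξ, and the seed are arbitrary choices.

```python
import numpy as np

# Simulate y_t with Delta y_t = xi * Delta y_{t-1} + eps_t (unit root in
# levels), run the ADF regression of y_t on (y_{t-1}, Delta y_{t-1}),
# and inspect sqrt(T) * (xi_hat - xi).
rng = np.random.default_rng(7)
T, reps, xi = 1_000, 2_000, 0.5

xi_stats = np.empty(reps)
for i in range(reps):
    eps = rng.standard_normal(T + 1)
    du = np.empty(T + 1)                    # du[t] = Delta y_t
    du[0] = eps[0] / np.sqrt(1.0 - xi**2)   # start near stationarity
    for t in range(1, T + 1):
        du[t] = xi * du[t - 1] + eps[t]
    y = du.cumsum()                         # unit-root level series
    X = np.column_stack([y[1:-1], du[1:-1]])  # (y_{t-1}, Delta y_{t-1})
    phi_hat, xi_hat = np.linalg.lstsq(X, y[2:], rcond=None)[0]
    xi_stats[i] = np.sqrt(T) * (xi_hat - xi)

print(xi_stats.mean(), xi_stats.var())  # near 0 and near 1 - xi^2 = 0.75
```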