Modelling Non-linear and Non-stationary Time Series
From Smoothing to Conditional Parametric Models
Henrik Madsen
Advanced Time Series Analysis
September 2016
Previously:
  Local constant estimate
  Alternative/extension: local polynomial approximation

In the following:
  Regressogram (later, in Chapter 3)
  Locally-weighted polynomial regression
  Conditional parametric models
  Conditional parametric ARX-models
  Non-linear FIR and ARX models
  Smoothing splines
Introduction
Goals of smoothing
Besides providing a nice summary of a set of data points, scatterplot smoothers can be viewed as estimates of the regression function

$$f(x) = \mathrm{E}[Y \mid X = x]$$

Sometimes the following model is useful in addition:

$$Y = f(X) + \varepsilon$$

where ε is a random variable with mean zero.

Data: N pairs (x_i, y_i), i = 1, 2, ..., N.
Goal: estimate f .
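As a minimal illustration (hypothetical data, not from the lecture), the sketch below simulates Y = f(X) + ε for an assumed f and applies a crude running-mean smooth as an estimate of f:

import numpy as np

rng = np.random.default_rng(1)
N = 100
x = np.sort(rng.uniform(-2, 2, N))
f = lambda x: np.sin(2 * x)                  # assumed "true" regression function
y = f(x) + 0.2 * rng.standard_normal(N)      # Y = f(X) + eps, E[eps] = 0

k = 11                                       # running-mean window (k nearest in x-order)
f_hat = np.convolve(y, np.ones(k) / k, mode="same")  # a crude smooth, estimating f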
Introduction
One-dimensional Smoothers
Four broad categories:
  Series or regression smoothers (polynomials, Fourier regression, regression splines, filtering)
  Kernel smoothers (Nadaraya-Watson locally weighted averages, local regression, loess)
  Smoothing splines (roughness penalties)
  Near-neighbour smoothers (running means, medians, Tukey smoothers)
Important smoothers
Locally-weighted polynomial regression
Previously (in Section 2.3.3), the local constant model Y_t = θ + ε_t was estimated as

$$\hat{\theta}(x) = \arg\min_{\theta} \frac{1}{N} \sum_{s=1}^{N} w_s(x) (Y_s - \theta)^2$$

Extension: LWLS - locally weighted least squares.

Model:

$$Y_t = \mu(X_t) + \varepsilon_t$$

Choose a polynomial P(X_t, x) such that, locally around x, μ(X_t) ≈ P(X_t, x), and estimate

$$\hat{\theta}(x) = \arg\min_{\theta} \frac{1}{N} \sum_{s=1}^{N} w_s(x) \left( Y_s - P(X_s, x) \right)^2$$

The local estimate of μ(x) is μ̂(x) = P̂(x, x).
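A minimal LWLS sketch in Python. The tricube weights and the nearest-neighbour bandwidth are illustrative choices (the slides leave w_s(x) unspecified); P(X_s, x) is written as a polynomial in (X_s - x), so μ̂(x) = P̂(x, x) is simply the first coefficient:

import numpy as np

def lwls(x_query, x, y, span=0.3, degree=2):
    """Estimate theta_hat(x_query); mu_hat(x_query) is theta[0]."""
    d = np.abs(x - x_query)
    h = np.sort(d)[int(span * len(x)) - 1]           # nearest-neighbour bandwidth
    w = np.clip(1.0 - (d / h) ** 3, 0.0, None) ** 3  # tricube weights w_s(x)
    P = np.vander(x - x_query, degree + 1, increasing=True)  # [1, (X_s - x), ...]
    sw = np.sqrt(w)
    theta, *_ = np.linalg.lstsq(sw[:, None] * P, sw * y, rcond=None)
    return theta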
Important smoothers
Locally-weighted polynomial regression - Example
Take the local quadratic

$$P(X_t, x) = \theta_0 + \theta_1 (X_t - x) + \theta_2 (X_t - x)^2$$

Then θ = (θ_0, θ_1, θ_2)^T is estimated by

$$\hat{\theta}(x) = \arg\min_{\theta} \frac{1}{N} \sum_{s=1}^{N} w_s(x) \left( Y_s - P(X_s, x) \right)^2$$

and, since P(x, x) = θ_0, the local estimate of μ is

$$\hat{\mu}(x) = \hat{\theta}_0(x)$$
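Using the lwls() sketch above with degree=2 gives exactly this local quadratic fit, with μ̂(x) read off as θ̂_0(x) (x, y are the simulated data from the first sketch):

x_grid = np.linspace(-2, 2, 50)
mu_hat = [lwls(x0, x, y, span=0.3, degree=2)[0] for x0 in x_grid]  # theta0_hat(x)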
Important smoothers
Locally-weighted polynomial regression
LWLS estimate using nearest neighbours.
First a local constant model is applied.
[Figure: "Local constant nearest neighbour 10%, 20%, ..., 60%" - scatterplot of y (0.0 to 0.8) against x (-2 to 2) with local constant nearest-neighbour fits for spans 10% to 60%.]
Important smoothers
Locally-weighted polynomial regression
Then using local quadratic approximation.
[Figure: "Local quadratic nearest neighbour 10%, 20%, ..., 60%" - the same scatterplot with local quadratic nearest-neighbour fits for spans 10% to 60%.]
The bias is much larger for the constant mean model.
Important smoothers
Conditional parametric models
Also called varying-coefficient models: the coefficients depend on a number of explanatory variables.

Conditional parametric model:

$$Y_t = Z_t^T \theta(X_t) + \varepsilon_t, \qquad t = 1, \ldots, N$$

Estimated locally by

$$\hat{\theta}(x) = \arg\min_{\theta(x)} \frac{1}{N} \sum_{s=1}^{N} w_s(x) \left( Y_s - Z_s^T \theta(x) \right)^2$$

where, locally around x, each element of θ(·) is approximated by a polynomial.
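A hypothetical data-generating sketch of such a model with Z_t^T = (1, Z_1t); the particular coefficient functions are illustrative choices, not from the lecture:

import numpy as np

rng = np.random.default_rng(2)
N = 500
X = rng.uniform(-2, 2, N)                 # conditioning variable X_t
Z1 = rng.standard_normal(N)               # regressor Z_1t
theta0 = lambda x: np.sin(x)              # varying intercept (illustrative)
theta1 = lambda x: 0.5 * np.cos(x)        # varying slope on Z_1t (illustrative)
Y = theta0(X) + theta1(X) * Z1 + 0.1 * rng.standard_normal(N)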
Important smoothers
Conditional parametric models - Example
Locally weighted linear model: assume Z_t^T = (1, Z_1t) and dim(X_t) = 1, and set θ(x)^T = (θ_0(x), θ_1(x)), each element approximated locally by a quadratic. Then

$$Z_t^T \theta(X_t) = (1, Z_{1t}) \begin{pmatrix} \theta_0(X_t) \\ \theta_1(X_t) \end{pmatrix} = (1, Z_{1t}) \begin{pmatrix} \theta_{00} + \theta_{01} X_t + \theta_{02} X_t^2 \\ \theta_{10} + \theta_{11} X_t + \theta_{12} X_t^2 \end{pmatrix}$$
Important smoothers
Conditional parametric models - Example (cont.)
This can be re-written as

$$Z_t^{*T} \theta^{*} = \left( 1,\; X_t,\; X_t^2,\; Z_{1t},\; Z_{1t} X_t,\; Z_{1t} X_t^2 \right) \begin{pmatrix} \theta_{00} \\ \theta_{01} \\ \theta_{02} \\ \theta_{10} \\ \theta_{11} \\ \theta_{12} \end{pmatrix}$$

After estimation:

$$\hat{\theta}_0(x) = \hat{\theta}_{00}(x) + \hat{\theta}_{01}(x)\, x + \hat{\theta}_{02}(x)\, x^2$$
$$\hat{\theta}_1(x) = \hat{\theta}_{10}(x) + \hat{\theta}_{11}(x)\, x + \hat{\theta}_{12}(x)\, x^2$$
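A sketch of the corresponding local estimator: for a fixed x, weight the observations (here by a Gaussian kernel, an illustrative choice) and solve a single weighted least-squares problem in the expanded regressor Z*_t. It is applied to the simulated data from the previous sketch:

def cpm_fit(x0, X, Z1, Y, h=0.5):
    """Local fit at x0: returns (theta0_hat(x0), theta1_hat(x0))."""
    w = np.exp(-0.5 * ((X - x0) / h) ** 2)            # Gaussian kernel weights (a choice)
    Zstar = np.column_stack([np.ones_like(X), X, X**2,
                             Z1, Z1 * X, Z1 * X**2])  # expanded regressor Z*_t
    sw = np.sqrt(w)
    c, *_ = np.linalg.lstsq(sw[:, None] * Zstar, sw * Y, rcond=None)
    return (c[0] + c[1] * x0 + c[2] * x0**2,          # theta0_hat(x0)
            c[3] + c[4] * x0 + c[5] * x0**2)          # theta1_hat(x0)

t0, t1 = cpm_fit(0.7, X, Z1, Y)   # compare with sin(0.7) and 0.5*cos(0.7)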
Important smoothers
Two important conditional parametric models
Conditional parametric ARX-models:

$$Y_t = \sum_{i=1}^{p} a_i(X_{t-m}) Y_{t-i} + \sum_{i=1}^{r} b_i(X_{t-m}) U_{t-i} + \varepsilon_t$$

Functional-coefficient AR-models:

$$Y_t = \sum_{i=1}^{q} a_i(Y_{t-i}) Y_{t-i} + \varepsilon_t$$

In both cases {ε_t} is a white-noise process.
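A minimal simulation sketch of a functional-coefficient AR(1); the exponential-AR-style coefficient function a_1 is an illustrative choice:

import numpy as np

rng = np.random.default_rng(3)
T = 1000
a1 = lambda y: 0.5 + 0.4 * np.exp(-y**2)   # coefficient depending on the lagged value
Yf = np.zeros(T)
for t in range(1, T):
    Yf[t] = a1(Yf[t - 1]) * Yf[t - 1] + 0.1 * rng.standard_normal()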
Smoothing Splines
Consider the objective function

$$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \int \{ f''(t) \}^2 \, dt$$

where λ is a fixed smoothing parameter.

Notice the extremes: λ = 0 admits any interpolating function, while λ = ∞ forces f'' = 0, i.e. the least-squares straight line.

It can be shown that the unique minimizer is a natural cubic spline with knots at the unique values of x_i, i = 1, ..., N.

This appears over-parameterized, since we have N knots and hence N degrees of freedom, but the roughness penalty shrinks the effective degrees of freedom (see below).
Smoothing Splines
Smoothing Splines (cont.)
Let us write the natural spline as

$$f(x) = \sum_{j=1}^{N} N_j(x) \theta_j$$

Now we write

$$\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda \theta^T \Omega_N \theta$$

where $\{N\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{jk} = \int N_j''(t) N_k''(t) \, dt$.

The solution is easily seen to be

$$\hat{\theta} = (N^T N + \lambda \Omega_N)^{-1} N^T y \qquad (1)$$

The fitted smoothing spline is

$$\hat{f}(x) = \sum_{j=1}^{N} N_j(x) \hat{\theta}_j \qquad (2)$$
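A sketch of the penalized solve in (1)-(2). As a simplification, a cubic B-spline basis stands in for the natural-spline basis, and a squared second-difference penalty on θ stands in for the exact curvature matrix Ω_N (the P-spline device of Eilers & Marx); the structure of the solve is unchanged. It reuses x, y from the first sketch:

import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, lam, n_knots=20, k=3):
    """theta_hat = (B^T B + lam * Omega)^{-1} B^T y, cf. equation (1)."""
    t = np.r_[[x.min()] * k, np.linspace(x.min(), x.max(), n_knots), [x.max()] * k]
    B = BSpline.design_matrix(x, t, k).toarray()   # basis evaluated at the x_i
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)   # second-difference operator
    Omega = D.T @ D                                # stand-in for the curvature penalty
    theta = np.linalg.solve(B.T @ B + lam * Omega, B.T @ y)
    return theta, B, Omega

theta, B, Omega = pspline_fit(x, y, lam=1.0)       # x, y from the first sketch
f_hat = B @ theta                                  # fitted values, cf. equation (2)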
Smoothing Splines
Smoothing Splines (cont.)
Let f̂ denote the N-vector of fitted values at the training predictors x_i. Then

$$\hat{f} = N (N^T N + \lambda \Omega_N)^{-1} N^T y \qquad (3)$$
$$\;\;\, = S_\lambda y \qquad (4)$$

The matrix S_λ is known as the smoother matrix. Important: S_λ is not idempotent!

The effective degrees of freedom are df_λ = tr(S_λ).
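Continuing the sketch above, S_λ and its trace can be computed directly, confirming that S_λ is not idempotent:

S = B @ np.linalg.solve(B.T @ B + 1.0 * Omega, B.T)  # S_lambda (lam = 1.0, as above)
print(np.trace(S))                                   # df_lambda = tr(S_lambda)
print(np.allclose(S @ S, S))                         # False: S_lambda is not idempotent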
Smoothing Splines
Smoothing Splines (cont.)
By contrast, consider B_ξ, an N × M matrix (i.e. M cubic-spline basis functions evaluated at the N training points).

Then the vector of fitted values is given by

$$\hat{f} = B_\xi (B_\xi^T B_\xi)^{-1} B_\xi^T y \qquad (5)$$
$$\;\;\, = H_\xi y \qquad (6)$$

where H_ξ is a projection operator (hat matrix) and hence idempotent.

The degrees of freedom are df = tr(H_ξ) = M.
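Continuing the same sketch with a fixed M-column cubic B-spline basis (M = 8, an illustrative choice) in place of B_ξ:

M, k = 8, 3                                          # M basis functions (a choice)
t = np.r_[[x.min()] * k, np.linspace(x.min(), x.max(), M - k + 1), [x.max()] * k]
Bxi = BSpline.design_matrix(x, t, k).toarray()       # N x M basis matrix B_xi
H = Bxi @ np.linalg.solve(Bxi.T @ Bxi, Bxi.T)        # hat matrix H_xi
print(np.allclose(H @ H, H), np.trace(H))            # idempotent; trace = M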