A New Difference-based Weighted Mixed Liu Estimator
in Partially Linear Models

Fikri Akdeniz
[email protected]
Çağ University, Mersin-TURKEY
Department of Mathematics and Computer Science
11-06-2014
Outline

1. Shrinkage Estimators
2. Semiparametric Partially Linear Models: Motivation
3. Difference-based Estimation
4. Restricted Estimation
5. Generalized Difference-based Liu Estimator
6. Efficiency Properties
7. Application
8. Simulation Study
Shrinkage Estimators

Shrinkage Estimators: Ridge and Liu

The classical linear regression model is defined as

    y = Xβ + ε,   E(ε) = 0,   Cov(ε) = σ² I                       (1)

The OLS estimator of the coefficients is β̂ = (X^T X)^{-1} X^T y.

For nearly collinear data (X^T X is ill-conditioned) the OLS estimator is
very unreliable, with very high variance. Small perturbations in the data
change everything!

Multicollinearity between the predictors motivates shrinkage estimators.
The most popular shrinkage estimator is the ridge regression estimator,
defined by (Hoerl and Kennard, 1970)

    β̂(k) = (X^T X + kI)^{-1} X^T y

where k > 0 is the tuning (biasing) parameter.
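The ill-conditioning problem and the ridge fix can be sketched numerically. A minimal illustration with synthetic data (not from the talk): the second predictor is nearly a copy of the first, so X^T X is close to singular, while adding kI restores a well-conditioned matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Nearly collinear design: the second column is almost a copy of the first.
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 1e-4 * rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + 0.1 * rng.normal(size=n)

def ols(X, y):
    # β̂ = (X^T X)^{-1} X^T y
    return np.linalg.solve(X.T @ X, X.T @ y)

def ridge(X, y, k):
    # β̂(k) = (X^T X + kI)^{-1} X^T y  (Hoerl and Kennard, 1970)
    return np.linalg.solve(X.T @ X + k * np.eye(X.shape[1]), X.T @ y)

# A tiny perturbation of y typically moves OLS far more than ridge.
y2 = y + 1e-3 * rng.normal(size=n)
d_ols = np.linalg.norm(ols(X, y) - ols(X, y2))
d_ridge = np.linalg.norm(ridge(X, y, 1.0) - ridge(X, y2, 1.0))
```

With k = 0 the ridge formula reduces exactly to OLS, and the condition number of X^T X + I is orders of magnitude smaller than that of X^T X.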
Ridge Regression

Ridge regression is like least squares but shrinks the estimated coefficients
towards zero. Given a response vector y ∈ R^n and a predictor matrix
X ∈ R^{n×p}, the ridge regression coefficients are defined as

    β̂(k) = argmin_{β ∈ R^p}  Σ_{i=1}^n (y_i − x_i^T β)²  +  k Σ_{j=1}^p β_j²
          = argmin_{β ∈ R^p}  ||y − Xβ||₂²  +  k ||β||₂²
                               (Loss)           (Penalty)

Here k ≥ 0 is a tuning parameter which controls the strength of the
penalty term. Note that

  - For k = 0, we get the OLS estimate.
  - For k = ∞, we get β̂(k) = 0.
  - For k in between 0 and ∞, model fit and amount of shrinkage are balanced.
More of Ridge

The canonical form of model (1): let Λ and T be the matrices of eigenvalues
and eigenvectors of X^T X,

    y = X T T^T β + ε = Zγ + ε

where Z = XT, γ = T^T β and Z^T Z = Λ,
T^T X^T X T = Λ = diag(λ₁, λ₂, . . . , λ_p), with λ_i the i-th eigenvalue
of X^T X.

The ridge regression estimator for the canonical model is

    γ̂(k) = (Λ + kI)^{-1} Λ γ̂

    γ̂(k)_i = (λ_i / (λ_i + k)) γ̂_i,   i = 1, . . . , p

This illustrates the essential feature of ridge regression: shrinkage.
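The componentwise shrinkage factor λ_i/(λ_i + k) can be verified directly in the canonical coordinates (a small numpy sketch; the data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 40, 3, 2.0
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Spectral decomposition X^T X = T Λ T^T
lam, T = np.linalg.eigh(X.T @ X)

# OLS and ridge estimates mapped to canonical coordinates γ = T^T β
gamma_ols = T.T @ np.linalg.solve(X.T @ X, X.T @ y)
gamma_ridge = T.T @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Ridge in canonical coordinates: γ̂(k)_i = λ_i/(λ_i + k) · γ̂_i
shrunk = lam / (lam + k) * gamma_ols
```

The two vectors `gamma_ridge` and `shrunk` agree to machine precision.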
Bias and Variance of Ridge Regression

Theorem
For any design matrix X, the quantity X^T X + kI (k > 0) is always
invertible; thus there is always a unique solution β̂(k).

Theorem
The variance of the ridge regression estimate is:

    Var(γ̂(k)_i) = σ² λ_i / (λ_i + k)²

Theorem
The bias of the ridge regression estimate is:

    Bias(γ̂(k)_i) = −k γ_i / (λ_i + k)

The total squared bias Σ_i Bias²(γ̂(k)_i) is monotone increasing with
respect to k (amount of shrinkage), while the total variance
Σ_i Var(γ̂(k)_i) is monotone decreasing with respect to k.
Theorem (Existence theorem)
There always exists a k > 0 such that the MSE of β̂(k) is less than the
MSE of β̂.

Ridge regression is a linear estimator (ŷ = H_ridge y), with

    H_ridge = X (X^T X + kI)^{-1} X^T

One may define its degrees of freedom to be tr(H_ridge).
Furthermore, one can show that

    df_ridge = Σ_i λ_i / (λ_i + k)

where the λ_i are the eigenvalues of X^T X.
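The trace and eigenvalue forms of the effective degrees of freedom coincide, which a few lines of numpy confirm (arbitrary simulated design):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 30, 4, 1.5
X = rng.normal(size=(n, p))

# Ridge hat matrix H_ridge = X (X^T X + kI)^{-1} X^T
H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
df_trace = np.trace(H)

# Equivalent eigenvalue form: df_ridge = Σ_i λ_i / (λ_i + k)
lam = np.linalg.eigvalsh(X.T @ X)
df_eigen = (lam / (lam + k)).sum()
```

For k > 0 each term λ_i/(λ_i + k) is strictly below 1, so df_ridge < p, reflecting the shrinkage.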
Effect of Multicollinearity

Let us consider a two-parameter simple linear regression such that

    y = β₁ x₁ + β₂ x₂

Correlation transformation:

    x*_tj = (x_tj − x̄_j) / (s_j √(n−1)),   x̄_j = (1/n) Σ_t x_tj,
    s_j = √( Σ_{t=1}^n (x_tj − x̄_j)² / (n−1) ),
    y*_t = (y_t − ȳ) / (s_y √(n−1)).

Then

    X*^T X* = [ x₁*^T x₁*   x₁*^T x₂* ] = [ 1  r ]
              [ x₂*^T x₁*   x₂*^T x₂* ]   [ r  1 ]

    Var(β̂) = σ² (X*^T X*)^{-1} = (σ²/(1 − r²)) [  1  −r ]
                                                [ −r   1 ],

so Var(β̂_j) = σ²/(1 − r²):

    r               0     0.3    0.5    0.95    0.99
    Var(β̂_j)/σ²    1     1.1    1.33   10.26   50.25
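The variance-inflation table above can be reproduced directly from the inverse of the 2×2 correlation matrix (σ² = 1 here):

```python
import numpy as np

# For two standardized predictors with correlation r, the variance of each
# coefficient is the diagonal entry of (X*^T X*)^{-1}, i.e. 1/(1 - r²).
variances = {}
for r in [0.0, 0.3, 0.5, 0.95, 0.99]:
    XtX = np.array([[1.0, r], [r, 1.0]])      # X*^T X* after the transform
    variances[r] = np.linalg.inv(XtX)[0, 0]   # = 1/(1 - r²)
```

At r = 0.99 the variance is inflated fifty-fold, which is exactly the instability that shrinkage estimators target.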
Another Shrinkage Estimator: Liu

Given a response vector y ∈ R^n and a predictor matrix X ∈ R^{n×p}, the Liu
regression coefficients are defined as

    β̂(η) = argmin_{β ∈ R^p}  Σ_{i=1}^n (y_i − x_i^T β)²  +  Σ_{j=1}^p (η β̂_j − β_j)²
          = argmin_{β ∈ R^p}  ||y − Xβ||₂²  +  ||η β̂ − β||₂²
                               (Loss)           (Penalty)

where β̂ is the OLS estimator. Here 0 ≤ η ≤ 1 is a tuning parameter which
controls the strength of the penalty term. Note that

  - For η = 1, we get the OLS estimate.
  - For η = 0, we get the ridge estimate with k = 1.
  - For η in between 0 and 1, model fit and amount of shrinkage are balanced.
Another Shrinkage Estimator: Liu (Linearly Unified)

The Liu (Linearly Unified) estimator is a linear combination of the ridge and
Stein-rule estimators:

    β̂(η) = (X^T X + I)^{-1} (X^T y + η β̂),   0 < η < 1,
          = (X^T X + I)^{-1} X^T y + η (X^T X + I)^{-1} β̂

where the first term is the ridge estimator β̂(k = 1) and the second is a
multiple c β̂ of the OLS estimator.

The Liu estimator in canonical form is defined as (Liu, 1993):

    γ̂(η) = (Λ + I)^{-1} (Λ + ηI) γ̂

Theorem
The variance of the Liu regression estimate is:

    Var(γ̂(η)_i) = σ² (η + λ_i)² / ((1 + λ_i)² λ_i)

Theorem
The bias of the Liu regression estimate is:

    Bias(γ̂(η)_i) = −(1 − η) γ_i / (λ_i + 1)
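The two special cases noted two slides back (η = 1 gives OLS, η = 0 gives ridge with k = 1) follow directly from the closed form and can be checked numerically (arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
I = np.eye(p)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

def liu(eta):
    # β̂(η) = (X^T X + I)^{-1} (X^T y + η β̂)
    return np.linalg.solve(X.T @ X + I, X.T @ y + eta * beta_ols)

def ridge(k):
    # β̂(k) = (X^T X + kI)^{-1} X^T y
    return np.linalg.solve(X.T @ X + k * I, X.T @ y)
```

At η = 1, (X^T X + I)^{-1}(X^T X β̂ + β̂) collapses to β̂; at η = 0 the penalty term vanishes and the ridge formula with k = 1 remains.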
Theorem (Existence theorem)
There always exists a 0 < η < 1 such that the MSE of β̂(η) is less than
the MSE of β̂.

The Liu estimator is a linear estimator (ŷ = H_liu y), with

    H_liu = X (X^T X + I)^{-1} (X^T X + ηI) (X^T X)^{-1} X^T

One may define its degrees of freedom to be tr(H_liu).
Furthermore, one can show that

    df_liu = Σ_i (λ_i + η) / (λ_i + 1)

where the λ_i are the eigenvalues of X^T X.
Semiparametric Partially Linear Models
Motivation

Determinants of electricity demand

Monthly aggregated (20 cities) electricity consumption (EC) in Germany:
01.1996-08.2010.

Unique physical attributes of electricity are
  - non-storability
  - uncertain and inelastic demand
  - steep supply function

Electricity providers are interested in understanding and hedging demand
fluctuations!
To determine the relationship between electricity consumption (EC) and
temperature we have to account for:
  - price: electricity price, gas price, etc.
  - seasonal effects: monthly effects, etc.
  - economic activity: income, etc.

Preassumptions:
  - EC and price: −
  - EC and income: +
  - EC and temperature: unknown
[Figure: Plot of income vs. electricity consumption; linear fit (green),
local polynomial fit (red), 95% confidence bands (blue).]

[Figure: Plot of relative price vs. electricity consumption; linear fit
(green), local polynomial fit (red), 95% confidence bands (blue).]
[Figure: Plot of temperature vs. electricity consumption; linear fit (green),
local polynomial fit (red), 95% confidence bands (black).]
Semiparametric Models

The ordinary regression model can be written as

    Y = X^T β + ε.                                                (2)

Assumption: the predictors are linearly related to the response.

The semiparametric partially linear model is defined as

    Y = X^T β + f(U) + ε.                                         (3)

  - Introduced by Engle, Granger, Rice and Weiss (1986), who analyzed the
    relationship between temperature and electricity usage.
  - Became popular in the statistical literature due to the seminal works
    of Robinson (1988) and Speckman (1988).
  - Advantages: increased flexibility and reduced modeling bias.
  - Partially linear models alleviate the curse of dimensionality.
Nonparametric Models versus Semiparametric Models

Nonparametric models:
  - Relax the restrictive assumptions of parametric models.
  - Are too flexible to permit concise conclusions.

Semiparametric models:
  - Are natural extensions of ordinary linear regression models with
    multiple predictors and of nonparametric models with several covariates.
  - Retain the explanatory power of parametric models and the flexibility
    of nonparametric models.
Semiparametric Partial Linear Model

  - To avoid identifiability problems we assume that a variable contained
    in X is not also contained in U, and in general, that no component
    of X can be mapped to any component of U.
  - Stone (1985) states that the three fundamental aspects of statistical
    models are flexibility, dimensionality and interpretability.
  - Flexibility is the ability of the model to provide accurate fits in a
    wide variety of situations, inaccuracy here leading to bias in estimation.
  - Dimensionality can be thought of in terms of the variance in estimation,
    the curse of dimensionality being that the amount of data required to
    avoid an unacceptably large variance increases rapidly with increasing
    dimensionality.
  - Interpretability lies in the potential for shedding light on the
    underlying structure.
Difference-based Estimation

Model and Differencing

Semiparametric partial linear model:

    y_i = x_i^T β + f(u_i) + ε_i,   i = 1, . . . , n              (4)

in vector/matrix notation

    y = Xβ + f(U) + ε                                             (5)

where the y_i are observations at u_i, 0 ≤ u₁ ≤ u₂ ≤ . . . ≤ u_n ≤ 1,
x_i^T = (x_{i1}, x_{i2}, . . . , x_{ip}), y = (y₁, . . . , y_n)^T,
X = (x₁, . . . , x_n)^T, f = (f(u₁), . . . , f(u_n))^T,
ε = (ε₁, . . . , ε_n)^T.

E(ε|x, u) = 0, Var(ε|x, u) = σ² I, and f has a bounded first derivative
(|f′| < L).

Differencing removes the nonparametric component (well... almost).
How does the approximation work?

mth-order differencing equation (Yatchew (1997)):

    Σ_{j=0}^m d_j y_{i−j} = (Σ_{j=0}^m d_j x_{i−j})^T β + Σ_{j=0}^m d_j f(u_{i−j}) + Σ_{j=0}^m d_j ε_{i−j},

where d₀, d₁, . . . , d_m are the differencing weights.

Suppose the u_i are equally spaced on the unit interval and |f′| ≤ L. By the
mean value theorem, for some u_i* ∈ [u_{i−1}, u_i],

    f(u_i) − f(u_{i−1}) = f′(u_i*)(u_i − u_{i−1}) ≤ L/n.

For m = 1, from the differencing equation,

    y_i − y_{i−1} = (x_i − x_{i−1})^T β + f(u_i) − f(u_{i−1}) + ε_i − ε_{i−1}
                  = (x_i − x_{i−1})^T β + O(1/n) + ε_i − ε_{i−1}
                  ≈ (x_i − x_{i−1})^T β + ε_i − ε_{i−1}.
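The m = 1 approximation above can be demonstrated with simulated data: regressing first differences of y on first differences of x recovers β without ever estimating f (a sketch with an arbitrary smooth f, not the talk's data):

```python
import numpy as np

rng = np.random.default_rng(4)
n, beta = 500, 2.0
u = np.sort(rng.uniform(size=n))
x = rng.normal(size=n)
f = np.sin(2 * np.pi * u)                  # smooth nonparametric component
y = x * beta + f + 0.1 * rng.normal(size=n)

# First differences (m = 1): f(u_i) - f(u_{i-1}) = O(1/n) nearly vanishes,
# so a no-intercept regression of Δy on Δx recovers β.
dy, dx = np.diff(y), np.diff(x)
beta_d = (dx @ dy) / (dx @ dx)
```

The estimate lands close to the true β = 2 even though f was never modeled.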
The (n − m) × n differencing matrix is

    D = [ d₀  d₁  d₂  ···  d_m  0   ···  ···  0   ]
        [ 0   d₀  d₁  d₂   ···  d_m  0   ···  0   ]
        [ ⋮                                ⋱      ]
        [ 0   ···  ···  d₀  d₁  d₂  ···  d_m  0   ]
        [ 0   ···  ···  0   d₀  d₁  d₂  ···  d_m  ]

d₀, d₁, . . . , d_m are differencing weights that minimise

    Σ_{k=1}^m ( Σ_{j=0}^{m−k} d_j d_{k+j} )²

such that

    Σ_{j=0}^m d_j = 0   and   Σ_{j=0}^m d_j² = 1                  (6)

are satisfied. The constraints (6) ensure that the nonparametric effect is
removed as n → ∞ and that Var(ε̃) = Var(ε) = σ², respectively.
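The two constraints in (6) can be checked for the first-order optimal weights d = (1/√2, −1/√2): the zero-sum constraint annihilates constants (and hence, locally, smooth trends), while the unit sum of squares preserves the error variance (a sketch with simulated noise):

```python
import numpy as np

d = np.array([1.0, -1.0]) / np.sqrt(2)     # optimal weights for m = 1

# Σ_j d_j = 0: applying the weights to a constant sequence gives zero.
y_const = np.full(10, 3.7)
diffed = d[0] * y_const[1:] + d[1] * y_const[:-1]

# Σ_j d_j² = 1: differenced noise keeps variance σ² (Monte Carlo check).
rng = np.random.default_rng(10)
eps = rng.normal(size=200_000)             # σ² = 1
deps = d[0] * eps[1:] + d[1] * eps[:-1]
```

Without the normalisation Σd_j² = 1, plain differences ε_i − ε_{i−1} would double the variance; the 1/√2 scaling undoes that.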
Differencing the model

    Dy = DXβ + Df + Dε ≈ DXβ + Dε,
    ỹ ≈ X̃β + ε̃                                                  (7)

where ỹ = Dy, X̃ = DX and ε̃ = Dε.

    β̂_D = ((DX)^T (DX))^{-1} (DX)^T Dy = (X̃^T X̃)^{-1} X̃^T ỹ    (8)

(Yatchew, 2003).

    σ̂² = ỹ^T (I − P⊥) ỹ / tr{D^T (I − P⊥) D}                    (9)

with P⊥ = X̃ (X̃^T X̃)^{-1} X̃^T, I the identity matrix, and tr(·) the
trace of a matrix (Eubank et al., 1998).
Difference-based Liu-type estimator

Akdeniz-Duran et al.¹ proposed the difference-based Liu-type estimator,
minimizing (w.r.t. β)

    L = (ỹ − X̃β)^T (ỹ − X̃β) + (η β̂_D − β)^T (η β̂_D − β),

as

    β̂_D(η) = (X̃^T X̃ + I)^{-1} (X̃^T ỹ + η β̂_D)                 (10)

where η, 0 ≤ η ≤ 1, is a biasing parameter; when η = 1, β̂_D(η) = β̂_D.

    Bias{β̂_D(η)} = −(1 − η)(X̃^T X̃ + I)^{-1} β.                  (11)

¹ Akdeniz-Duran, E., Haerdle, W., Osipenko, M. (2012). Difference based ridge
and Liu type estimators in semiparametric regression models. Journal of
Multivariate Analysis.
Generalization of the Model

Semiparametric partial linear model for observations {y_i, x_i, u_i}_{i=1}^n:

    y_i = x_i^T β + f(u_i) + ε_i,   i = 1, . . . , n              (12)

  - Previously we assumed E(εε^T) = σ² I.
  - Real data often reveal heteroscedasticity, and in a time-series context
    the error term exhibits autocorrelation.
  - Now we assume E(εε^T) = σ² V, with V not necessarily diagonal. Thus
    E(ε̃ ε̃^T) = σ² V_D, where V_D = D V D^T.

The generalized difference-based estimator is

    β̂_GD = (X̃^T V_D^{-1} X̃)^{-1} X̃^T V_D^{-1} ỹ.
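A sanity check on β̂_GD (a minimal sketch: D uses the m = 1 optimal weights and V is an arbitrary AR(1)-type covariance, both illustrative choices): when ỹ = X̃β exactly, the estimator reproduces β for any positive definite V.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 2
X = rng.normal(size=(n, p))
beta = np.array([1.5, -2.0])

# First-order differencing matrix with weights (1/√2, -1/√2)
D = np.zeros((n - 1, n))
D[np.arange(n - 1), np.arange(n - 1)] = 1 / np.sqrt(2)
D[np.arange(n - 1), np.arange(1, n)] = -1 / np.sqrt(2)

# AR(1)-type covariance for the undifferenced errors (illustrative)
V = 0.9 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

def beta_gd(y):
    # β̂_GD = (X̃^T V_D^{-1} X̃)^{-1} X̃^T V_D^{-1} ỹ, V_D = D V D^T
    Xt, yt = D @ X, D @ y
    VD_inv = np.linalg.inv(D @ V @ D.T)
    G = Xt.T @ VD_inv @ Xt
    return np.linalg.solve(G, Xt.T @ VD_inv @ yt)
```

Feeding in a noiseless response X @ beta returns beta exactly, confirming the algebra of the GLS-type weighting.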
Generalization of the Model, cont'd

  - Properties of the generalized difference-based estimator of β depend on
    the characteristics of the information matrix G = X̃^T V_D^{-1} X̃.
  - The estimate of σ² is

        σ̂² = (1/n) (ỹ − X̃ β̂_GD)^T V_D^{-1} (ỹ − X̃ β̂_GD)      (13)

  - It is easy to show that

        s² = (1/(n − p)) (ỹ − X̃ β̂_GD)^T V_D^{-1} (ỹ − X̃ β̂_GD)  (14)

    is an unbiased estimator of σ².
  - If the (p × p) matrix G, p << n − m, is ill-conditioned with a large
    condition number, then β̂_GD produces large sampling variances.
    Use restricted estimation or shrinkage estimation!
Restricted Estimation

Generalized Difference-based Restricted Estimator

  - The available prior information can sometimes be expressed in the form
    of exact, stochastic or inequality restrictions.
  - We assume an exact linear restriction on the parameters²

        Rβ = r

    with R a (q × p) matrix and r a (q × 1) vector.
  - Thus the generalized difference-based restricted parameter estimate is
    given by

        β̂_GRD = β̂_GD + G^{-1} R^T (R G^{-1} R^T)^{-1} (r − R β̂_GD)

    where G = X̃^T V_D^{-1} X̃.

² Akdeniz, F., Akdeniz-Duran, E., Roozbeh, M., Arashi, M. (2013). Efficiency of
the generalized difference-based Liu estimators in semiparametric regression
models with correlated errors. Journal of Statistical Computation and Simulation.
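The defining property of the restricted estimator is that it satisfies the restriction exactly: R β̂_GRD = r. This follows from the correction term and is easy to verify (G, R, r and the unrestricted estimate below are arbitrary stand-ins):

```python
import numpy as np

rng = np.random.default_rng(6)
p, q = 4, 2
A = rng.normal(size=(p + 5, p))
G = A.T @ A                        # p.d., stands in for X̃^T V_D^{-1} X̃
beta_gd = rng.normal(size=p)       # an unrestricted estimate
R = rng.normal(size=(q, p))        # restriction matrix, Rβ = r
r = rng.normal(size=q)

Gi = np.linalg.inv(G)
# β̂_GRD = β̂_GD + G^{-1} R^T (R G^{-1} R^T)^{-1} (r - R β̂_GD)
beta_grd = beta_gd + Gi @ R.T @ np.linalg.solve(R @ Gi @ R.T, r - R @ beta_gd)
```

Multiplying the correction by R gives R G^{-1} R^T (R G^{-1} R^T)^{-1} (r − R β̂_GD) = r − R β̂_GD, so the residual restriction error is cancelled exactly.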
Generalized Difference-based Stochastic Restricted Estimator

  - We do not always have exact prior information such as Rβ = r, due to
    economic relations, industrial structures, production planning, etc.
  - Thus we assume a stochastic linear constraint r = Rβ + φ,
    φ ∼ (0, σ² W), where W is assumed known and positive definite.
  - Prior information need not be assigned the same weight (ω) as the
    sample information.
  - In order to incorporate the restrictions in the estimation of the
    parameters, we minimize (w.r.t. β)

        (ỹ − X̃β)^T V_D^{-1} (ỹ − X̃β) + ω (r − Rβ)^T W^{-1} (r − Rβ)

  - 0 < ω < 1: prior information receives less weight than the sample
    information.
  - ω > 1: higher weight to the prior information (of little practical
    interest).
This leads to the following solution for β:

    β̂_GDWM(ω) = (G + ω R^T W^{-1} R)^{-1} (X̃^T V_D^{-1} ỹ + ω R^T W^{-1} r).

Since

    (G + ω R^T W^{-1} R)^{-1} = G^{-1} − ω G^{-1} R^T (W + ω R G^{-1} R^T)^{-1} R G^{-1},

we have the generalized difference-based weighted mixed estimator

    β̂_GDWM(ω) = β̂_GD + ω G^{-1} R^T (W + ω R G^{-1} R^T)^{-1} (r − R β̂_GD).   (15)

For ω = 1 we obtain the generalized difference-based mixed estimator:

    β̂_GDM = (G + R^T W^{-1} R)^{-1} (X̃^T V_D^{-1} ỹ + R^T W^{-1} r).          (16)

The estimator in (16) gives equal weight to sample and prior information.
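The matrix identity that links the two forms of β̂_GDWM(ω) is an instance of the Sherman-Morrison-Woodbury formula and can be verified numerically with arbitrary positive definite G and W:

```python
import numpy as np

rng = np.random.default_rng(7)
p, q, omega = 4, 2, 0.5
A = rng.normal(size=(p + 5, p)); G = A.T @ A    # p.d., stands in for X̃^T V_D^{-1} X̃
C = rng.normal(size=(q + 3, q)); W = C.T @ C    # p.d. weight matrix
R = rng.normal(size=(q, p))

Gi = np.linalg.inv(G)
# (G + ωR^T W^{-1} R)^{-1} = G^{-1} - ω G^{-1} R^T (W + ω R G^{-1} R^T)^{-1} R G^{-1}
lhs = np.linalg.inv(G + omega * R.T @ np.linalg.inv(W) @ R)
rhs = Gi - omega * Gi @ R.T @ np.linalg.inv(W + omega * R @ Gi @ R.T) @ R @ Gi
```

The identity is what turns the penalized normal-equations form into the "unrestricted estimate plus correction" form (15).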
Generalized Difference-based Liu Estimator

The (generalized) Liu estimator, proposed by Liu (1993) and Akdeniz and
Kaciranlar (1995), is defined as

    β̂_GDL(η) = (G + I)^{-1} (X̃^T V_D^{-1} ỹ + η β̂_GD),   0 ≤ η ≤ 1,
              = (G + I)^{-1} (G + ηI) β̂_GD = F_η β̂_GD = F_η G^{-1} X̃^T V_D^{-1} ỹ   (17)

where β̂_GD = (X̃^T V_D^{-1} X̃)^{-1} X̃^T V_D^{-1} ỹ is the generalized
difference-based estimator of β and F_η = (G + I)^{-1} (G + ηI).

Observing that F_η and G^{-1} commute, we have

    β̂_GDL(η) = G^{-1} F_η X̃^T V_D^{-1} ỹ.                        (18)
Generalized Difference-based Weighted Mixed Liu Estimator

We have obtained

    β̂_GDWM(ω) = β̂_GD + ω G^{-1} R^T (W + ω R G^{-1} R^T)^{-1} (r − R β̂_GD).

Substituting β̂_GD with β̂_GDL(η) in β̂_GDWM(ω), we obtain a generalized
difference-based weighted mixed Liu estimator (GDWMLE) as follows:

    β̂_GDWML(ω, η) = β̂(ω, η)
        = β̂_GDL(η) + ω G^{-1} R^T (W + ω R G^{-1} R^T)^{-1} (r − R β̂_GDL(η)).

Rearranging terms, we finally have

    β̂_GDWML(ω, η) = (G + ω R^T W^{-1} R)^{-1} (F_η X̃^T V_D^{-1} ỹ + ω R^T W^{-1} r)   (19)

(Hubert and Wijekoon (2006) and Yang et al. (2009)).
In fact, from the definition of β̂(ω, η) we can see that it is a general
estimator which includes β̂_GDWM, β̂_GDL and the mixed regression estimator
β̂_GDM as special cases. Namely,

  - if η = 1, then β̂(ω, η = 1) = β̂_GDWM(ω),
  - if ω = 0, then β̂(ω = 0, η) = β̂_GDL(η),
  - if η = 1 and ω = 1, then β̂(ω = 1, η = 1) = β̂_GDM.
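The special cases above, and the equivalence between the substitution form and the closed form (19), can all be checked numerically. In this sketch G, W, R, r and h are arbitrary stand-ins for X̃^T V_D^{-1} X̃, the weight matrix, the restrictions and X̃^T V_D^{-1} ỹ respectively:

```python
import numpy as np

rng = np.random.default_rng(8)
p, q = 4, 2
A = rng.normal(size=(p + 5, p)); G = A.T @ A   # stands in for X̃^T V_D^{-1} X̃
C = rng.normal(size=(q + 3, q)); W = C.T @ C
R = rng.normal(size=(q, p))
r = rng.normal(size=q)
h = rng.normal(size=p)                         # stands in for X̃^T V_D^{-1} ỹ
I = np.eye(p)
Gi = np.linalg.inv(G)

F = lambda eta: np.linalg.solve(G + I, G + eta * I)   # F_η = (G+I)^{-1}(G+ηI)
beta_gd = Gi @ h
gdl = lambda eta: F(eta) @ beta_gd                     # β̂_GDL(η)

def gdwm(omega):                                       # β̂_GDWM(ω), eq. (15)
    S = W + omega * R @ Gi @ R.T
    return beta_gd + omega * Gi @ R.T @ np.linalg.solve(S, r - R @ beta_gd)

def gdwml(omega, eta):                                 # β̂(ω, η), eq. (19)
    M = G + omega * R.T @ np.linalg.solve(W, R)
    return np.linalg.solve(M, F(eta) @ h + omega * R.T @ np.linalg.solve(W, r))
```

Setting η = 1 makes F_η the identity and recovers β̂_GDWM(ω); setting ω = 0 drops the prior term and, since F_η commutes with G^{-1}, recovers β̂_GDL(η).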
Efficiency Properties

Lemmas

Lemma (Farebrother, 1976)
Let A be a positive definite matrix, namely A > 0, and let α be some vector.
Then A − αα^T ≥ 0 if and only if α^T A^{-1} α ≤ 1.

Lemma (Rao, 2008)
Let M and N be n × n matrices with M > 0 and N ≥ 0. Then M > N if and
only if λ_max(N M^{-1}) < 1.

Lemma (Trenkler and Toutenburg, 1990)
Let β̂_j = A_j y, j = 1, 2, be two competing estimators of β. Suppose that
∆ = Cov(β̂₁) − Cov(β̂₂) > 0. Then MSEM(β̂₁) − MSEM(β̂₂) ≥ 0 if and only
if b₂^T (∆ + b₁ b₁^T)^{-1} b₂ ≤ 1, where b_j denotes the bias vector of β̂_j.
MSEM Superiority of β̂_GDWM(ω) and β̂(ω, η) over β̂_GD

The expectation and dispersion matrices of β̂(ω, η) are

    E(β̂(ω, η)) = B A β   and   Var(β̂(ω, η)) = σ² B Ã B^T        (20)

The bias of β̂(ω, η) is

    Bias(β̂(ω, η)) = E(β̂(ω, η)) − β = B (F_η − I) G β            (21)

The mean squared error matrix of β̂(ω, η) is

    MSEM(β̂(ω, η)) = Var(β̂(ω, η)) + Bias(β̂(ω, η)) Bias(β̂(ω, η))^T
                   = σ² B Ã B^T + b₁ b₁^T                         (22)

where B := (G + ω R^T W^{-1} R)^{-1}, A := F_η G + ω R^T W^{-1} R,
Ã := F_η G F_η^T + ω² R^T W^{-1} R, and b₁ = B (F_η − I) G β.
MSEMs of the Estimators

    MSEM(β̂_GD) = Var(β̂_GD) = σ² G^{-1}                           (23)

    MSEM(β̂_GDM) = Var(β̂_GDM) = σ² (G + R^T W^{-1} R)^{-1}        (24)

    MSEM(β̂_GDL(η)) = σ² F_η G^{-1} F_η^T + b₂ b₂^T                (25)

with b₂ = Bias(β̂_GDL(η)) = (F_η − I) β.

    MSEM(β̂_GDWM(ω)) = Var(β̂(ω)) = σ² B (G + ω² R^T W^{-1} R) B^T   (26)
MSEM Comparison between β̂(ω) and β̂(ω, η)

Theorem
The generalized difference-based weighted mixed Liu estimator β̂(ω, η) is
superior to the generalized difference-based weighted mixed estimator β̂(ω)
in the MSEM sense, namely MSEM(β̂(ω)) − MSEM(β̂(ω, η)) ≥ 0, if and only if
σ^{-2} b₁^T (B*)^{-1} b₁ ≤ 1.

Proof.
MSEM(β̂(ω)) − MSEM(β̂(ω, η)) = σ² B ∆₁ B^T − b₁ b₁^T,

where ∆₁ = G − F_η G F_η^T has eigenvalues

    δ_i = λ_i(G) − (λ_i(G) + η)² λ_i(G) / (λ_i(G) + 1)²
        = (1 − η)(2λ_i(G) + 1 + η) λ_i(G) / (λ_i(G) + 1)².

Since 0 < η < 1 and λ_i(G) > 0, we have δ_i > 0; note that δ_i is
monotonically decreasing in η. Observing that B := (G + ω R^T W^{-1} R)^{-1} > 0,
we get ∆₁ > 0 and B* := B ∆₁ B^T > 0. By Lemma 1,
MSEM(β̂(ω)) − MSEM(β̂(ω, η)) ≥ 0 if and only if σ^{-2} b₁^T (B*)^{-1} b₁ ≤ 1.
MSEM Comparison between β̂_GDL(η) and β̂(ω, η)

Theorem
When λ_max(N M^{-1}) < 1, the generalized difference-based weighted mixed
Liu estimator β̂(ω, η) is superior to the generalized difference-based Liu
estimator β̂_GDL(η) in the MSEM sense, namely
MSEM(β̂_GDL(η)) − MSEM(β̂(ω, η)) ≥ 0, if and only if
b₁^T (σ² ∆₂ + b₂ b₂^T)^{-1} b₁ ≤ 1.

Proof.
MSEM(β̂_GDL(η)) − MSEM(β̂(ω, η)) = σ² ∆₂ + b₂ b₂^T − b₁ b₁^T,

where M = F_η G^{-1} F_η^T, N = B (F_η G F_η^T + ω² R^T W^{-1} R) B^T and
∆₂ = M − N. It is obvious that M = F_η G^{-1} F_η^T > 0 and
N = B (F_η G F_η^T + ω² R^T W^{-1} R) B^T > 0. Therefore, when
λ_max(N M^{-1}) < 1, we get ∆₂ > 0 by applying Lemma 2. By Lemma 3,
MSEM(β̂_GDL(η)) − MSEM(β̂(ω, η)) ≥ 0 if and only if
b₁^T (σ² ∆₂ + b₂ b₂^T)^{-1} b₁ ≤ 1.
Variance Comparison between β̂_GD and β̂_GDWM

Theorem
The generalized difference-based weighted mixed estimator β̂_GDWM is
superior to the generalized difference-based estimator β̂_GD, in the sense
that Var(β̂_GD) − Var(β̂_GDWM) ≥ 0, if and only if
(2/ω − 1) W + R G^{-1} R^T > 0.

Proof.
∆₃ = Var(β̂_GD) − Var(β̂_GDWM)
   = σ² ω² B R^T W^{-1} [ (2/ω − 1) W + R G^{-1} R^T ] W^{-1} R B^T.

The difference is positive definite when B is positive definite and
(2/ω − 1) W + R G^{-1} R^T > 0, which holds as long as ω < 2.
When q < p, R has full row rank and therefore R^T has full column rank;
it follows that in this case we can only conclude that ∆₃ ≥ 0.
Application

Application 1

The data set is from a household demand for gasoline in Canada
(Yatchew (2003)). We allow price and age to appear nonparametrically.
The basic specification is given by the model:

    dist = f(price, age) + β₁ income + β₂ drivers + β₃ hhsize
         + β₄ youngsingle + β₅ urban + monthly dummies + ε

dist: log of distance traveled per month by household; price: log of the
price of 1 lt of gasoline; age: log of age; hhsize: size of the household; etc.

  - specification test result of joint significance of the nonparametric
    variables: 5.96
  - order of differencing: m = 10 with d = (0.9494, −0.1437, −0.1314,
    −0.1197, −0.1085, −0.0978, −0.0877, −0.0782, −0.0691, −0.0606, −0.0527)
Results 1

[Figure: Ordering of data with respect to price and age]
                 Diff. Liu               Difference
                 Est.     Std. Error     Est.     Std. Error
    income       0.287    0.021          0.291    0.021
    drivers      0.532    0.035          0.571    0.033
    hhsize       0.122    0.029          0.093    0.027
    youngsingle  0.198    0.063          0.191    0.061
    urban       -0.333    0.020         -0.332    0.020
    R²           .270                    .263
    s²_diff      .496                    .501
Results 2

[Figure: Raw data graph (left); smooth graph after estimating the parametric
terms with the Liu estimator (right).]

Having estimated the parametric effects using the GDWML estimator, the
constructed data (y − z^T β̂, age, price) are then smoothed to estimate the
nonparametric effect.
Application 2: Hedonic Pricing of Housing Attributes

  - Housing prices are very much affected by location (Yatchew (2003)).
  - The price surface may be unimodal, multimodal or have ridges
    (prices along the subway are often higher).
  - Thus, we include a two-dimensional nonparametric effect.
  - Dataset: 92 detached homes in Ottawa.
  - y (dependent variable): saleprice
  - lotarea, square footage of housing, average neighbourhood income,
    etc. are independent variables.
Results 1

                 Estimate   Std. Error   t value   Pr(>|t|)
    (Intercept)   73.9775    17.9961      4.11     0.0001
    fireplac      11.7950     6.1523      1.92     0.0587
    garage        11.8383     5.0793      2.33     0.0223
    luxbath       60.7362    10.5148      5.78     0.0000
    avginc         0.4776     0.2244      2.13     0.0364
    disthwy      -15.2774     6.6974     -2.28     0.0252
    lotarea        3.2432     2.2766      1.42     0.1581
    nrbed          6.5860     4.8962      1.35     0.1823
    usespace      21.1285    10.9864      1.92     0.0580
    south          7.5268     2.1505      3.50     0.0008
    west          -3.2062     2.5296     -1.27     0.2086
    R²             .62
    s²_res        424.3
Results 2

[Figure: (a) Graph of location (west, south) versus saleprice; (b) local
polynomial regression.]
Results 3

                 Estimate   Std. Error   t value   Pr(>|t|)
    fireplac      12.6489     5.7875      2.19     0.0316
    garage        12.9252     4.9146      2.63     0.0102
    luxbath       57.6187    10.5741      5.45     0.0000
    avginc         0.5968     0.2342      2.55     0.0126
    disthwy        1.5538    21.3877      0.07     0.9423
    lotarea        3.0986     2.2387      1.38     0.1700
    nrbed          6.4323     4.7490      1.35     0.1792
    usespace      24.7213    10.5893      2.33     0.0220
    R²             .66
    s²_diff       375.5

Table: Parametric part is estimated by Liu-type regression
Simulation Study

Simulation Design

  - Various biasing parameters are considered: η = (0, 0.1, 0.2, . . . , 1).
  - Various weights: ω = (0.1, 0.3, 0.5, 0.7, 0.9).
  - To achieve different degrees of collinearity, the predictors are
    generated by

        x_ij = (1 − γ²)^{1/2} z_ij + γ z_ip,   i = 1, . . . , n,  j = 1, . . . , p,

    where the z_ij are i.i.d. standard normal numbers.
  - γ = (0.80, 0.90, 0.99), where the correlation between any two
    explanatory variables is γ².
  - Dependent observations are generated by

        y_i = Σ_{j=1}^5 x_ij β_j + f(t_i) + ε_i

    where β = (1.5, 2, 3, −5, 4) and ε ∼ N(0, σ² V), the elements of V
    being v_ij = γ₁^{|i−j|}, with σ² = 4.
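The predictor-generation scheme can be checked empirically: since any two columns share the common factor z_ip with loading γ, their correlation is γ² (a Monte Carlo sketch with a large n to pin the estimate down):

```python
import numpy as np

rng = np.random.default_rng(9)
n, p, gamma = 100_000, 5, 0.9
Z = rng.normal(size=(n, p + 1))

# x_ij = (1 - γ²)^{1/2} z_ij + γ z_ip: unit variance, pairwise correlation γ²
X = np.sqrt(1 - gamma**2) * Z[:, :p] + gamma * Z[:, [p]]
corr = np.corrcoef(X, rowvar=False)
```

The sample correlation between any pair of columns lands near γ² = 0.81, matching the design described above.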
Simulation Design

The nonparametric function is generated by

    f(t_i) = √(t_i (1 − t_i)) sin(2.1π / (t_i + 0.05))

where t_i = (i − 0.5)/n; this is called the Doppler function.

[Figure: Nonparametric part of the model]

It is a difficult-to-estimate, spatially inhomogeneous function: its
smoothness varies over t.
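The Doppler function is straightforward to generate on the design points t_i = (i − 0.5)/n:

```python
import numpy as np

def doppler(t):
    # f(t) = sqrt(t(1 - t)) * sin(2.1π / (t + 0.05))
    return np.sqrt(t * (1 - t)) * np.sin(2.1 * np.pi / (t + 0.05))

n = 100
t = (np.arange(1, n + 1) - 0.5) / n   # t_i = (i - 0.5)/n
f = doppler(t)
```

The envelope √(t(1 − t)) caps the function at 0.5 in absolute value, while the 1/(t + 0.05) argument makes the oscillation frequency explode near t = 0, which is what makes the function spatially inhomogeneous.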
Simulation Design

  - The restriction on the parameters is taken to be stochastic,

        Rβ + φ = r,   φ ∼ N(0, 1),

    with R = (0, 1, −1, 1, 0) and r = 0.
  - The fourth-order optimal differencing weights for m = 4 are
    d₀ = 0.8873, d₁ = −0.3099, d₂ = −0.2464, d₃ = −0.1901, d₄ = −0.1409
    (Yatchew (2003), p. 61).
  - The (100 − 4) × 100 differencing matrix is defined as follows:

        D = [ d₀  d₁  d₂  d₃  d₄  0   ···  ···  0  ]
            [ 0   d₀  d₁  d₂  d₃  d₄  0   ···  0  ]
            [ ⋮                              ⋱    ]
            [ 0   ···  ···  d₀  d₁  d₂  d₃  d₄  0 ]
            [ 0   0   ···  ···  d₀  d₁  d₂  d₃  d₄ ]
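Building this banded matrix is a few lines of numpy, and the weights can be checked against the constraints (6): they sum to zero and their squares sum to one (up to the rounding in the published values):

```python
import numpy as np

# Fourth-order optimal differencing weights (Yatchew (2003), p. 61)
d = np.array([0.8873, -0.3099, -0.2464, -0.1901, -0.1409])
n, m = 100, 4

# Each row of the (n - m) × n matrix D carries the weights, shifted by one.
D = np.zeros((n - m, n))
for i in range(n - m):
    D[i, i:i + m + 1] = d
```

The differenced design is then X̃ = D @ X and the differenced response ỹ = D @ y, exactly as in equation (7).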
Simulation Design

  - X and y values are ordered with respect to the nonparametric variable.
  - The matrix G = X̃^T V_D^{-1} X̃ has condition numbers 20.92, 33.23 and
    352.63 for γ = 0.8, γ = 0.9 and γ = 0.99, respectively, which implies
    the existence of multicollinearity in the data set.
  - After differencing we can perform inference on β as if there were no
    nonparametric component f in the model to begin with.
Performance Criteria

The performances of the estimators are measured by

    SMSE(β̂) = (1/1000) Σ_{l=1}^{1000} || β̂_(l) − β ||²

    ABIAS(β̂) = (1/1000) Σ_{l=1}^{1000} Σ_{i=1}^p | β̂_{i(l)} − β_i |

where β̂_(l) denotes the estimated parameter vector in the l-th iteration, and

    MSE(f̂(u), f(u)) = (1/1000) Σ_{i=1}^{1000} [ f̂(u_i) − f(u_i) ]².
Simulation Results

Table: Evaluation of the risk functions for γ = 0.99 and different values of ω

                      GDE      GDWME    GDME     GDLE     GDWMLE
    mse (ω = 0.1)   157.66    157.66   157.17   155.80   155.80
    bias (ω = 0.1)   21.62     21.62    21.54    21.46    21.46
    mse (ω = 0.3)   156.81    156.60   156.32   154.96   154.74
    bias (ω = 0.3)   21.65     21.62    21.56    21.48    21.45
    mse (ω = 0.5)   156.35    156.00   155.81   154.51   154.16
    bias (ω = 0.5)   21.53     21.48    21.44    21.36    21.31

The optimum value of η for the Liu estimator is calculated using generalized
cross-validation.
Final Considerations

  - We have proposed estimation methods for the partially linear regression
    framework based on the difference-based method.
  - In particular, the estimation of the parametric part is considered.
  - Stochastic restrictions were put on the parameter space with specified
    weights.
  - The results are generalized to heteroscedastic cases.
  - The proposed estimators GDLE and GDWMLE had smaller SMSEs in all cases.
References

Engle, R.F., Granger, C.W.J., Rice, J. and Weiss, A.
Semiparametric estimates of the relation between weather and electricity sales.
Journal of the American Statistical Association, 81:310-320, 1986.

Härdle, W.K., Müller, M., Sperlich, S. and Werwatz, A.
Nonparametric and Semiparametric Models.
Springer Verlag, Heidelberg, 2004.

Hoerl, A.E. and Kennard, R.W.
Ridge regression: Biased estimation for nonorthogonal problems.
Technometrics, 12:55-67, 1970.

Liu, K.
A new class of biased estimate in linear regression.
Communications in Statistics: Theory and Methods, 22:393-402, 1993.

Green, P.J. and Silverman, B.W.
Nonparametric Regression and Generalized Linear Models.
Chapman and Hall, 1994.

Akdeniz-Duran, E., Haerdle, W. and Osipenko, M.
Difference based ridge and Liu type estimators in semiparametric regression models.
Journal of Multivariate Analysis, 105:164-175, 2012.

Akdeniz, F., Akdeniz-Duran, E., Roozbeh, M. and Arashi, M.
Efficiency of the generalized difference-based Liu estimators in semiparametric
regression models with correlated errors.
Journal of Statistical Computation and Simulation, 2013.
DOI: 10.1080/00949655.2013.806924.

Akdeniz-Duran, E., Akdeniz, F. and Hu, H.
Efficiency of a Liu-type estimator in semiparametric regression models.
Journal of Computational and Applied Mathematics, 235:1418-1428, 2011.

Akdeniz-Duran, E., Akdeniz, F. and Hu, H.
Efficiency of the generalized difference-based Liu estimators in semiparametric
regression models with correlated errors.
Journal of Statistical Computation and Simulation.
DOI: 10.1080/00949655.2013.806924.

Akdeniz, F., Akdeniz-Duran, E., Roozbeh, M. and Arashi, M.
New difference-based estimator of parameters in semiparametric regression models.
Journal of Statistical Computation and Simulation, 83(5), 2013.

www.r-project.org
Thank you.
Questions?