October 29

ST 762
Nonlinear Statistical Models for Univariate and Multivariate Response
Quadratic versus Linear Estimating Equations
GLS estimating equations
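$$
\sum_{j=1}^{n}
\begin{pmatrix} f_{\beta j} & 0 \\ 0 & 2\sigma^2 g_j^2 \begin{pmatrix} 1/\sigma \\ \nu_{\theta j} \end{pmatrix} \end{pmatrix}
\begin{pmatrix} \sigma^2 g_j^2 & 0 \\ 0 & 2\sigma^4 g_j^4 \end{pmatrix}^{-1}
\begin{pmatrix} Y_j - f_j \\ (Y_j - f_j)^2 - \sigma^2 g_j^2 \end{pmatrix} = 0.
$$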
Estimating equations for β are linear in $Y_j$.
Estimating equations for β only require specification of the first two moments.
GLS is optimal among all linear estimating equations.
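For fixed weights the GLS equations for β are linear in $Y_j$, which suggests one common way of solving them: hold $1/g_j^2$ at the current value of β, solve the resulting weighted least squares equations, and repeat. The Python sketch below assumes, purely for illustration, a straight-line mean f(x, b) = b[0] + b[1]·x and a power variance function g = f^θ with θ known; the function names are hypothetical. Because σ² cancels from the β equations, it is not needed.

```python
import numpy as np


def gls_fit(x, y, theta, n_iter=25):
    """Iteratively reweighted GLS for beta.

    Assumptions (illustration only): mean f(x, b) = b[0] + b[1] * x and
    variance sigma^2 * f^(2 * theta) with theta known, i.e. g = f^theta.
    """
    X = np.column_stack([np.ones_like(x), x])   # row j is f_beta,j^T
    b = np.linalg.lstsq(X, y, rcond=None)[0]    # OLS starting value
    for _ in range(n_iter):
        w = (X @ b) ** (-2.0 * theta)           # 1 / g_j^2 at the current b
        XtW = X.T * w                           # X^T W with W = diag(w)
        b = np.linalg.solve(XtW @ X, XtW @ y)   # solve the linear equations
    return b


def gls_score(b, x, y, theta):
    """Left-hand side of the GLS equations for beta (sigma^2 cancels)."""
    X = np.column_stack([np.ones_like(x), x])
    f = X @ b
    return X.T @ ((y - f) * f ** (-2.0 * theta))


rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=500)
f_true = 2.0 + 1.0 * x
y = f_true + 0.1 * f_true**0.5 * rng.standard_normal(500)   # theta = 0.5
b_hat = gls_fit(x, y, theta=0.5)
print(b_hat, gls_score(b_hat, x, y, theta=0.5))
```

At convergence `gls_score` is numerically zero, so the estimating equations hold with the weights evaluated at the final estimate.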
Gaussian ML estimating equations
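$$
\sum_{j=1}^{n}
\begin{pmatrix} f_{\beta j} & 2\sigma^2 g_j^2 \nu_{\beta j} \\ 0 & 2\sigma^2 g_j^2 \begin{pmatrix} 1/\sigma \\ \nu_{\theta j} \end{pmatrix} \end{pmatrix}
\begin{pmatrix} \sigma^2 g_j^2 & 0 \\ 0 & 2\sigma^4 g_j^4 \end{pmatrix}^{-1}
\begin{pmatrix} Y_j - f_j \\ (Y_j - f_j)^2 - \sigma^2 g_j^2 \end{pmatrix} = 0.
$$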
Estimating equations for β are quadratic in $Y_j$.
Estimating equations for β require specification of the third and fourth moments as well.
Specifically, if we let
$$\epsilon_j = \frac{Y_j - f(x_j, \beta)}{\sigma g(\beta, \theta, x_j)},$$
then we need to know
$$E(\epsilon_j^3) = \zeta_j^* \quad\text{and}\quad \mathrm{var}(\epsilon_j^2) = 2 + \kappa_j^*.$$
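Since $\zeta_j^*$ and $\kappa_j^*$ are moments of the standardized errors, they can be estimated by sample moments once $\epsilon_j$ is available. A minimal sketch, assuming common values $\zeta^*$ and $\kappa^*$ for all j and that f, σ, and g are known (in practice they would be replaced by estimates); the function name is hypothetical. For normal errors both values are 0; for Laplace errors $\zeta^* = 0$ and $\kappa^* = 3$.

```python
import numpy as np


def moment_params(y, f, sigma, g):
    """Sample analogues of zeta* = E(eps^3) and kappa* = var(eps^2) - 2,
    where eps = (y - f) / (sigma * g); assumes common values across j."""
    eps = (y - f) / (sigma * g)
    zeta = np.mean(eps**3)
    kappa = np.var(eps**2) - 2.0
    return zeta, kappa


rng = np.random.default_rng(1)
n, sigma = 200_000, 0.2
f = np.full(n, 10.0)
g = f                                    # assumed variance function g = f
y_normal = f + sigma * g * rng.standard_normal(n)
# Laplace errors with unit variance (scale 1/sqrt(2)): skewness 0, excess kurtosis 3
y_laplace = f + sigma * g * rng.laplace(0.0, 1.0 / np.sqrt(2.0), size=n)
print(moment_params(y_normal, f, sigma, g))    # both approximately 0
print(moment_params(y_laplace, f, sigma, g))   # approximately (0, 3)
```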
Questions
If we know the true values $\zeta_j^*$ and $\kappa_j^*$, how much is β̂ improved by using the quadratic estimating equations instead of the linear estimating equations?
If we use working values (for example $\zeta_j = \kappa_j = 0$, corresponding to normality) that are not the true values $\zeta_j^*$ and $\kappa_j^*$, is there any improvement in using the quadratic estimating equations?
If we use working variance functions that are not the true variance
functions, is there any improvement in using the quadratic estimating
equations?
General form of quadratic estimating equations:
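$$
\sum_{j=1}^{n}
\begin{pmatrix} f_{\beta,j} & 2\sigma^2 g_j^2 \nu_{\beta,j} \\ 0 & 2\sigma^2 g_j^2 \begin{pmatrix} 1/\sigma \\ \nu_{\theta,j} \end{pmatrix} \end{pmatrix}
\begin{pmatrix} \sigma^2 g_j^2 & \zeta_j \sigma^3 g_j^3 \\ \zeta_j \sigma^3 g_j^3 & (2+\kappa_j)\sigma^4 g_j^4 \end{pmatrix}^{-1}
\begin{pmatrix} Y_j - f_j \\ (Y_j - f_j)^2 - \sigma^2 g_j^2 \end{pmatrix} = 0.
$$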
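To make the structure of one summand concrete, the sketch below evaluates $D_j^T V_j^{-1} s_j$ for a single observation, stacked in the order (β, σ, θ). The derivative vectors $f_{\beta,j}$, $\nu_{\beta,j}$ (length p) and $\nu_{\theta,j}$ (length q) are passed in directly, and the function name and argument order are assumptions for illustration. With $\zeta_j = \kappa_j = 0$ it returns the Gaussian ML summand; additionally setting $\nu_{\beta,j} = 0$ gives the GLS summand.

```python
import numpy as np


def quad_ee_term(y, f, sigma, g, f_beta, nu_beta, nu_theta, zeta=0.0, kappa=0.0):
    """One summand D_j^T V_j^{-1} s_j of the quadratic estimating equations,
    stacked in the order (beta, sigma, theta)."""
    f_beta, nu_beta, nu_theta = map(np.atleast_1d, (f_beta, nu_beta, nu_theta))
    s2g2 = sigma**2 * g**2
    tau = np.concatenate(([1.0 / sigma], nu_theta))      # (1/sigma, nu_theta)
    # D_j^T is (p + 1 + q) x 2; its columns match the two elements of s_j
    Dt = np.block([
        [f_beta[:, None], 2 * s2g2 * nu_beta[:, None]],
        [np.zeros((tau.size, 1)), 2 * s2g2 * tau[:, None]],
    ])
    V = np.array([[s2g2, zeta * sigma**3 * g**3],
                  [zeta * sigma**3 * g**3, (2 + kappa) * sigma**4 * g**4]])
    s = np.array([y - f, (y - f) ** 2 - s2g2])
    return Dt @ np.linalg.solve(V, s)


term = quad_ee_term(y=3.0, f=2.5, sigma=0.3, g=2.5,
                    f_beta=[1.0, 0.5], nu_beta=[0.4, 0.2], nu_theta=[0.1])
print(term)   # p + 1 + q = 4 components: two for beta, one for sigma, one for theta
```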
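Large sample distribution for all parameters jointly:
$$
\sqrt{n}\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\sigma - \sigma_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\xrightarrow{\mathcal{L}} N\left(0,\ A^{-1} B A^{-1}\right).
$$
Here
$$
A = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} D_{0,j}^T V_{0,j}^{-1} D_{0,j},
\qquad
B = \lim_{n\to\infty} \frac{1}{n}\sum_{j=1}^{n} D_{0,j}^T V_{0,j}^{-1}\, \mathrm{var}(s_{0,j} \mid x_j)\, V_{0,j}^{-1} D_{0,j}.
$$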
Also
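$$
V_j = \begin{pmatrix} \sigma^2 g_j^2 & \zeta_j \sigma^3 g_j^3 \\ \zeta_j \sigma^3 g_j^3 & (2+\kappa_j)\sigma^4 g_j^4 \end{pmatrix},
\qquad
D_j = \begin{pmatrix} f_{\beta,j} & 2\sigma^2 g_j^2 \nu_{\beta,j} \\ 0 & 2\sigma^2 g_j^2 \begin{pmatrix} 1/\sigma \\ \nu_{\theta,j} \end{pmatrix} \end{pmatrix}^T,
$$
and $V_{0,j}$ and $D_{0,j}$ are $V_j$ and $D_j$ evaluated at the true values $\beta_0$, $\sigma_0$, and $\theta_0$.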
Also
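$$
\mathrm{var}(s_{0,j} \mid x_j) = \begin{pmatrix} \sigma^2 g_j^2 & \zeta_j^* \sigma^3 g_j^3 \\ \zeta_j^* \sigma^3 g_j^3 & (2+\kappa_j^*)\sigma^4 g_j^4 \end{pmatrix},
$$
the true variance matrix.
Note that if the working values for $\zeta_j$ and $\kappa_j$ are the same as the true values, then $\mathrm{var}(s_{0,j} \mid x_j) = V_{0,j}$, so $B = A$, and the large sample distribution simplifies to
$$
\sqrt{n}\begin{pmatrix} \hat\beta - \beta_0 \\ \hat\sigma - \sigma_0 \\ \hat\theta - \theta_0 \end{pmatrix}
\xrightarrow{\mathcal{L}} N\left(0,\ A^{-1}\right).
$$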
To deduce the limiting distribution of $n^{1/2}(\hat\beta - \beta_0)$, it would be necessary to carry out the indicated matrix inversion and multiplications and extract the upper-left $p \times p$ submatrix of the result.
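The recipe just described (form A and B, invert, multiply, keep the upper-left p × p block) can be sketched numerically, with finite-n averages standing in for the limits. The arrays below are made-up inputs rather than derivatives of a real model, and the function name is hypothetical; the last line checks the simplification $A^{-1}BA^{-1} = A^{-1}$ when the working and true variance matrices coincide.

```python
import numpy as np


def sandwich(D, V, Vtrue, p):
    """Finite-n analogue of A^{-1} B A^{-1} and its upper-left p x p block.

    D:     (n, 2, k) array whose j-th slice is D_{0,j}
    V:     (n, 2, 2) working variance matrices V_{0,j}
    Vtrue: (n, 2, 2) true var(s_{0,j} | x_j)
    """
    n = D.shape[0]
    Vinv = np.linalg.inv(V)                                # batched 2 x 2 inverses
    A = np.einsum("jai,jab,jbk->ik", D, Vinv, D) / n       # mean of D^T V^-1 D
    M = Vinv @ Vtrue @ Vinv                                # V^-1 var(s) V^-1
    B = np.einsum("jai,jab,jbk->ik", D, M, D) / n
    A_inv = np.linalg.inv(A)
    full = A_inv @ B @ A_inv
    return full[:p, :p], full, A_inv


# Made-up inputs: n = 50 observations, k = p + 1 + q = 4 parameters, p = 2
rng = np.random.default_rng(2)
n, k, p = 50, 4, 2
D = rng.standard_normal((n, 2, k))
L = rng.standard_normal((n, 2, 2))
V = L @ np.transpose(L, (0, 2, 1)) + np.eye(2)             # positive definite
beta_block, full, A_inv = sandwich(D, V, V, p)             # working = true
print(np.allclose(full, A_inv))                            # B = A, so True
```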
It is possible to show that, just as GLS is optimal among linear estimating equations, β̂ (as well as σ̂ and θ̂) is optimal among quadratic estimating equations, provided the working values for $\zeta_j$ and $\kappa_j$ are the true values.
Next we consider a special case that allows a clearer comparison between linear and quadratic estimating equations.
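If we take
$$E(\epsilon_j^3) = 0 \quad\text{and}\quad \mathrm{var}(\epsilon_j^2) = 2 + \kappa \quad\text{for all } j,$$
while in truth
$$E(\epsilon_j^3) = \zeta^* \quad\text{and}\quad \mathrm{var}(\epsilon_j^2) = 2 + \kappa^* \quad\text{for all } j,$$
then we have
$$n^{1/2}(\hat\beta - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\, \Gamma^{-1}\Delta\Gamma^{-1}\right).$$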
Here
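$$
\Gamma = \lim_{n\to\infty} \Gamma_n
= \lim_{n\to\infty} n^{-1}\left\{ X^T W X + \frac{4\sigma_0^2}{2+\kappa} R^T P R \right\}
= \Sigma_{WLS}^{-1} + \frac{4\sigma_0^2}{2+\kappa}\,\Sigma_\beta
$$
and
$$
\begin{aligned}
\Delta = \lim_{n\to\infty} \Delta_n
&= \lim_{n\to\infty} n^{-1}\left\{ X^T W X + \frac{4\sigma_0^2(2+\kappa^*)}{(2+\kappa)^2} R^T P R
 + \frac{2\sigma_0\zeta^*}{2+\kappa}\left( X^T W^{1/2} P R + R^T P W^{1/2} X \right) \right\} \\
&= \Sigma_{WLS}^{-1} + \frac{4\sigma_0^2(2+\kappa^*)}{(2+\kappa)^2}\,\Sigma_\beta + \frac{2\sigma_0\zeta^*}{2+\kappa}\left(T_\beta + T_\beta^T\right).
\end{aligned}
$$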
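Given the limiting matrices, these formulas are straightforward to evaluate. In the sketch below, $\Sigma_{WLS}^{-1}$, $\Sigma_\beta$, and $T_\beta$ are made-up 2 × 2 arrays used only for illustration, and the function name is hypothetical; it returns $\Gamma^{-1}\Delta\Gamma^{-1}$, which is multiplied by $\sigma_0^2$ to give the asymptotic covariance of $n^{1/2}(\hat\beta - \beta_0)$.

```python
import numpy as np


def sigma_q(Sw_inv, S_beta, T_beta, sigma0, kappa, kappa_star, zeta_star):
    """Gamma^{-1} Delta Gamma^{-1} for working moments (zeta = 0, kappa)
    and true moments (zeta_star, kappa_star)."""
    Gamma = Sw_inv + 4 * sigma0**2 / (2 + kappa) * S_beta
    Delta = (Sw_inv
             + 4 * sigma0**2 * (2 + kappa_star) / (2 + kappa) ** 2 * S_beta
             + 2 * sigma0 * zeta_star / (2 + kappa) * (T_beta + T_beta.T))
    G_inv = np.linalg.inv(Gamma)
    return G_inv @ Delta @ G_inv


# Made-up limiting matrices for p = 2, purely for illustration
Sw_inv = np.array([[2.0, 0.3], [0.3, 1.0]])    # Sigma_WLS^{-1}
S_beta = np.array([[0.5, 0.1], [0.1, 0.2]])    # Sigma_beta
T_beta = np.array([[0.1, 0.0], [0.05, 0.1]])   # T_beta
sigma0 = 0.5

normal = sigma_q(Sw_inv, S_beta, T_beta, sigma0, 0.0, 0.0, 0.0)
heavy = sigma_q(Sw_inv, S_beta, T_beta, sigma0, 0.0, 6.0, 0.0)
print(np.diag(normal), np.diag(heavy), np.diag(np.linalg.inv(Sw_inv)))
```

The last printed vector is the diagonal of $\Sigma_{WLS}$, so the three outputs compare ML under normal data, ML under heavy-tailed data, and GLS.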
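and
$$
R = \begin{pmatrix} \nu_{\beta_0,1}^T \\ \vdots \\ \nu_{\beta_0,n}^T \end{pmatrix}_{n\times p},
\qquad
Q = \begin{pmatrix} \tau_{\theta_0,1}^T \\ \vdots \\ \tau_{\theta_0,n}^T \end{pmatrix}_{n\times(q+1)},
$$
and $P = I - Q(Q^TQ)^{-1}Q^T$.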
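P is the orthogonal projection onto the orthogonal complement of the column space of Q, so $R^T P R$ uses only the part of the columns of R that is orthogonal to the columns of Q. A short numerical sketch with made-up R and Q (random matrices, not derived from any model; the function name is hypothetical) checks the projection properties and forms the finite-n analogue of $\Sigma_\beta = \lim n^{-1}R^T P R$.

```python
import numpy as np


def residual_projection(Q):
    """P = I - Q (Q^T Q)^{-1} Q^T, using a linear solve instead of an explicit inverse."""
    n = Q.shape[0]
    return np.eye(n) - Q @ np.linalg.solve(Q.T @ Q, Q.T)


rng = np.random.default_rng(3)
n, p, q = 40, 2, 1
R = rng.standard_normal((n, p))        # stand-in for rows nu_{beta_0,j}^T
Q = rng.standard_normal((n, q + 1))    # stand-in for rows tau_{theta_0,j}^T
P = residual_projection(Q)
S_beta_n = R.T @ P @ R / n             # finite-n analogue of Sigma_beta

print(np.allclose(P @ P, P), np.allclose(P @ Q, 0.0))   # idempotent, annihilates Q
R_in = Q @ rng.standard_normal((q + 1, p))              # columns inside span(Q)
print(np.abs(R_in.T @ P @ R_in).max())                  # numerically zero
```

The last line shows that if the columns of R lie in the column space of Q, the corresponding $\Sigma_\beta$ vanishes.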
Recall that, if the first two moments are correctly specified, then
$$\sqrt{n}\,(\hat\beta_{GLS} - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\Sigma_{WLS}\right).$$
First we note that the properties of $\hat\beta_{GLS}$ do not depend on those of σ̂ and θ̂, whereas the properties of $\hat\beta_{ML}$ do.
Next we compare β̂ from the linear and quadratic equations in
various scenarios under this special case.
When the data are really normal
That is, we choose $\zeta = 0$ and $\kappa = 0$, while $\zeta^* = 0$ and $\kappa^* = 0$.
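Then
$$\Gamma = \Sigma_{WLS}^{-1} + 2\sigma_0^2\Sigma_\beta = \Sigma_{ML}^{-1} \quad\text{and}\quad \Delta = \Sigma_{ML}^{-1}.$$
So
$$\sqrt{n}\,(\hat\beta_{ML} - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\Sigma_{ML}\right).$$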
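We can show that $\Sigma_{GLS} - \Sigma_{ML}$ is nonnegative definite, where $\Sigma_{GLS} = \Sigma_{WLS}$, as
$$
\Sigma_{ML} = \left(\Sigma_{WLS}^{-1} + 2\sigma_0^2\Sigma_\beta\right)^{-1}
= \Sigma_{WLS} - \Sigma_{WLS}\left(\Sigma_{WLS} + \tfrac{1}{2}\sigma_0^{-2}\Sigma_\beta^{-1}\right)^{-1}\Sigma_{WLS},
$$
and the subtracted term is nonnegative definite.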
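The second equality is an instance of the Woodbury matrix identity. A quick numerical check with arbitrary positive definite stand-ins for $\Sigma_{WLS}$ and $\Sigma_\beta$ ($\Sigma_\beta$ must be invertible for the right-hand form to exist; the helper name is hypothetical) confirms the identity and that $\Sigma_{WLS} - \Sigma_{ML}$ has no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
p, sigma0 = 3, 0.4


def random_spd(p):
    """Arbitrary symmetric positive definite matrix (illustration only)."""
    L = rng.standard_normal((p, p))
    return L @ L.T + p * np.eye(p)


S_wls = random_spd(p)       # stand-in for Sigma_WLS
S_beta = random_spd(p)      # stand-in for Sigma_beta

S_ml = np.linalg.inv(np.linalg.inv(S_wls) + 2 * sigma0**2 * S_beta)
woodbury = S_wls - S_wls @ np.linalg.inv(
    S_wls + np.linalg.inv(S_beta) / (2 * sigma0**2)) @ S_wls
print(np.allclose(S_ml, woodbury))                       # True
print(np.linalg.eigvalsh(S_wls - S_ml).min() >= -1e-12)  # True
```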
That is, $\hat\beta_{ML}$ is more efficient than $\hat\beta_{GLS}$ when the data truly are normal and we use the normal-theory ML estimating equations for β.
The improvement comes from $\Sigma_\beta$, which reflects the additional information about β available in the variance function $g(\cdot)$.
When the data are only symmetrically distributed
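That is, we choose $\zeta = 0$ and $\kappa = 0$, while $\zeta^* = 0$ and $\kappa^* > 0$.
Then
$$\Gamma = \Sigma_{WLS}^{-1} + 2\sigma_0^2\Sigma_\beta \quad\text{and}\quad \Delta = \Sigma_{WLS}^{-1} + (2+\kappa^*)\sigma_0^2\Sigma_\beta,$$
so
$$\sqrt{n}\,(\hat\beta_{ML} - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\Sigma_Q\right), \quad\text{with } \Sigma_Q = \Gamma^{-1}\Delta\Gamma^{-1}.$$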
We can show that $\Sigma_{GLS} - \Sigma_Q$ is nonnegative definite if $\kappa^* \le 2$.
That is, the advantage of $\hat\beta_{ML}$ no longer holds uniformly when the data are only symmetric and we use the normal-theory ML estimating equations for β.
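A short argument shows why $\kappa^* \le 2$ suffices, assuming $\Sigma_\beta$ is nonnegative definite (it is a limit of $n^{-1}R^T P R$ with P a projection). Since Γ is symmetric positive definite, $\Sigma_{WLS} - \Gamma^{-1}\Delta\Gamma^{-1}$ is nonnegative definite exactly when $\Gamma\Sigma_{WLS}\Gamma - \Delta$ is, and with $\Gamma = \Sigma_{WLS}^{-1} + 2\sigma_0^2\Sigma_\beta$ and $\Delta = \Sigma_{WLS}^{-1} + (2+\kappa^*)\sigma_0^2\Sigma_\beta$,

```latex
\Gamma\Sigma_{WLS}\Gamma - \Delta
  = \left(\Sigma_{WLS}^{-1} + 4\sigma_0^2\Sigma_\beta + 4\sigma_0^4\Sigma_\beta\Sigma_{WLS}\Sigma_\beta\right)
  - \left(\Sigma_{WLS}^{-1} + (2+\kappa^*)\sigma_0^2\Sigma_\beta\right)
  = (2-\kappa^*)\,\sigma_0^2\Sigma_\beta + 4\sigma_0^4\,\Sigma_\beta\Sigma_{WLS}\Sigma_\beta .
```

The last term is always nonnegative definite, and the first is as well when $\kappa^* \le 2$; for larger $\kappa^*$ the sign of the difference depends on the matrices, so GLS can be the more efficient estimator.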
When the data are symmetrically distributed
...and we correctly specify both ζ and κ. That is, we choose $\zeta = \zeta^* = 0$ and $\kappa = \kappa^*$.
Then
$$\Gamma = \Sigma_{WLS}^{-1} + \frac{4\sigma_0^2}{2+\kappa^*}\Sigma_\beta \quad\text{and}\quad \Delta = \Sigma_{WLS}^{-1} + \frac{4\sigma_0^2}{2+\kappa^*}\Sigma_\beta,$$
so
$$\sqrt{n}\,(\hat\beta_{ML} - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\Sigma_C\right), \quad\text{with } \Sigma_C = \left(\Sigma_{WLS}^{-1} + \frac{4\sigma_0^2}{2+\kappa^*}\Sigma_\beta\right)^{-1}.$$
We can show that $\Sigma_{GLS} - \Sigma_C$ is nonnegative definite.
That is, $\hat\beta_{ML}$ is more efficient than $\hat\beta_{GLS}$ when we know the data are symmetric and can correctly specify the excess kurtosis.
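The ordering follows in one line, assuming as before that $\Sigma_\beta$ is nonnegative definite and recalling that $\Sigma_{GLS} = \Sigma_{WLS}$ and $2 + \kappa^* = \mathrm{var}(\epsilon_j^2) > 0$: the inverses differ by a nonnegative definite matrix, and inversion reverses the ordering of positive definite matrices.

```latex
\Sigma_C^{-1} - \Sigma_{WLS}^{-1} = \frac{4\sigma_0^2}{2+\kappa^*}\,\Sigma_\beta \succeq 0
\quad\Longrightarrow\quad
\Sigma_{WLS} - \Sigma_C \succeq 0 .
```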
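When the variance function $g(\cdot)$ does not depend on β
In this case, $\Sigma_\beta = 0$ and $T_\beta = 0$.
Then
$$\Gamma = \Delta = \Sigma_{WLS}^{-1}$$
and
$$\sqrt{n}\,(\hat\beta_{ML} - \beta_0) \xrightarrow{\mathcal{L}} N\left(0,\ \sigma_0^2\Sigma_{WLS}\right).$$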
That is, there is nothing to be gained by using a quadratic estimating equation instead of a linear one, because $g(\cdot)$ carries no additional information about β.
In general
The large sample properties of the quadratic estimator depend on the
assumed and true third and fourth moments of the data. Those of
the GLS estimator do not, and are unchanged regardless of the
nature of the true third and fourth moments.
If the third and fourth moments are correctly specified, the linear
estimator is inefficient relative to the quadratic estimator for β. If
these are not correctly specified, it is no longer clear that one
estimator dominates the other in terms of efficiency.
Intuitively, because the performance of the quadratic estimator depends on third- and fourth-moment properties, it would seem to be sensitive to incorrect assumptions about them, whereas the performance of the GLS estimator does not depend on these moments at all. We study this next.
Sensitivity analysis of linear and quadratic equations to
misspecification of third and fourth moments
Consider an example and numerical analysis:
True model: $E(Y_j) = \beta_0$, $\mathrm{var}(Y_j) = \sigma_0^2\beta_0^2$.
Working model: $E(Y_j) = \beta$, $\mathrm{var}(Y_j) = \sigma^2\beta^2$.
We can obtain:
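$$\hat\beta_{GLS} = \bar Y \xrightarrow{p} \beta_0,$$
$$\hat\beta_{ML} = \frac{\left(\bar Y^2 + 4\sigma_0^2\sum_{j=1}^n Y_j^2/n\right)^{1/2} - \bar Y}{2\sigma_0^2} \xrightarrow{p} \beta_0,$$
as well as the explicit forms of $\Sigma_{WLS}$ and $\Sigma_{ML}$.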
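Both estimators have closed forms here: with σ fixed at $\sigma_0$, the normal-theory equation for β reduces to $\sigma_0^2\beta^2 + \bar Y\beta - n^{-1}\sum_j Y_j^2 = 0$, and $\hat\beta_{ML}$ is its positive root. A minimal sketch, simulating normal data with hypothetical values $\beta_0 = 5$ and $\sigma_0 = 0.3$ (the function names are illustrative), computes both and shows they are close to $\beta_0$ for large n.

```python
import numpy as np


def beta_gls(y):
    """GLS estimate in this model: the sample mean."""
    return y.mean()


def beta_ml(y, sigma0):
    """Normal-theory ML estimate with sigma0 known: the positive root of
    sigma0^2 b^2 + ybar * b - mean(y^2) = 0."""
    ybar, m2 = y.mean(), np.mean(y**2)
    return (np.sqrt(ybar**2 + 4 * sigma0**2 * m2) - ybar) / (2 * sigma0**2)


rng = np.random.default_rng(5)
beta0, sigma0, n = 5.0, 0.3, 100_000
y = beta0 + sigma0 * beta0 * rng.standard_normal(n)    # var(Y) = sigma0^2 beta0^2
print(beta_gls(y), beta_ml(y, sigma0))                 # both close to beta0 = 5
```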
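Table: ARE of ML to GLS for the simple model

True distribution               κ0      ζ0      σ0      ARE
Normal                          0       0       0.20    1.08
Normal                          0       0       0.30    1.18
Normal                          0       0       1.00    3.00
Symmetric (ζ0 = 0)              2       0       0.20    1.01
Symmetric (ζ0 = 0)              2       0       0.30    1.02
Symmetric (ζ0 = 0)              2       0       1.00    1.80
Symmetric (ζ0 = 0)              4       0       0.20    0.94
Symmetric (ζ0 = 0)              4       0       0.30    0.90
Symmetric (ζ0 = 0)              4       0       1.00    1.29
Symmetric (ζ0 = 0)              6       0       0.20    0.88
Symmetric (ζ0 = 0)              6       0       0.30    0.81
Symmetric (ζ0 = 0)              6       0       1.00    1.00
Symmetric (ζ0 = 0)              8       0       0.20    0.83
Symmetric (ζ0 = 0)              8       0       0.30    0.73
Symmetric (ζ0 = 0)              8       0       1.00    0.82
Gamma (ζ0 = 2σ0, κ0 = 6σ0²)     0.24    0.40    0.20    0.93
Gamma (ζ0 = 2σ0, κ0 = 6σ0²)     0.54    0.60    0.30    0.88
Gamma (ζ0 = 2σ0, κ0 = 6σ0²)     0.96    0.80    0.40    0.82
Gamma (ζ0 = 2σ0, κ0 = 6σ0²)     6.00    2.00    1.00    0.69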
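The entries can be reproduced with a delta-method calculation. Treating $\sigma_0$ as known (as in the expression for $\hat\beta_{ML}$), writing $\epsilon = (Y - \beta_0)/(\sigma_0\beta_0)$ with $E(\epsilon^3) = \zeta_0$ and $\mathrm{var}(\epsilon^2) = 2 + \kappa_0$, and expanding $\hat\beta_{ML}$ around $(\beta_0, E Y^2)$ gives $\mathrm{ARE} = (1 + 2\sigma_0^2)^2 / \{1 + 2\sigma_0\zeta_0 + (2 + \kappa_0)\sigma_0^2\}$. This closed form is a sketch derived under those assumptions; it agrees with every row of the table to two decimals. The function name is hypothetical.

```python
def are_ml_vs_gls(sigma0, zeta0, kappa0):
    """Delta-method ARE = avar(GLS) / avar(ML) in the simple model,
    assuming sigma0 is known in the normal-theory ML equation."""
    s = sigma0**2
    return (1 + 2 * s) ** 2 / (1 + 2 * sigma0 * zeta0 + (2 + kappa0) * s)


# (distribution, kappa0, zeta0, sigma0, tabled ARE) for a few rows of the table
rows = [
    ("Normal", 0, 0, 0.20, 1.08), ("Normal", 0, 0, 1.00, 3.00),
    ("Symmetric", 4, 0, 0.30, 0.90), ("Symmetric", 8, 0, 1.00, 0.82),
    ("Gamma", 0.24, 0.40, 0.20, 0.93), ("Gamma", 6.00, 2.00, 1.00, 0.69),
]
for name, k0, z0, s0, tab in rows:
    print(f"{name:9s} sigma0={s0:.2f}  formula={are_ml_vs_gls(s0, z0, k0):.2f}  table={tab:.2f}")
```

For symmetric errors the formula also shows when ML wins: ARE > 1 exactly when $4\sigma_0^2 > \kappa_0 - 2$, which is why the $\kappa_0 = 6$, $\sigma_0 = 1$ entry is exactly 1.00.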
If the data are truly normally distributed
The quadratic estimator is uniformly more precise than the GLS
estimator, as expected.
Also note that for relatively small values of $\sigma_0$ (≤ 0.30), the gain in efficiency for ML is not substantial and decreases as $\sigma_0$ decreases.
So, for “high quality” data where the “signal” dominates the “noise”
(small σ0 ), ML and GLS appear to exhibit similar performance; for
“low quality” data, where the noise dominates the signal, we see that
ML performs substantially better.
This makes intuitive sense: because the ML estimator exploits the information about β in the variance, we would expect to gain more information about β when the variance is large than when it is of much smaller magnitude than the mean.
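The size of the gain under normality can be made explicit for this model. A delta-method sketch, assuming $\sigma_0$ known, writing $m_2 = n^{-1}\sum_j Y_j^2$, and using the normal moments $\mathrm{var}(Y) = \sigma_0^2\beta_0^2$, $\mathrm{cov}(Y, Y^2) = 2\sigma_0^2\beta_0^3$, $\mathrm{var}(Y^2) = (4\sigma_0^2 + 2\sigma_0^4)\beta_0^4$:

```latex
\begin{aligned}
&\hat\beta_{ML} = h(\bar Y, m_2), \qquad
  h(a, b) = \frac{(a^2 + 4\sigma_0^2 b)^{1/2} - a}{2\sigma_0^2}, \qquad
  \nabla h\big|_{(\beta_0,\ \beta_0^2(1+\sigma_0^2))} = \frac{1}{1+2\sigma_0^2}\left(-1,\ \beta_0^{-1}\right)^T, \\
&\mathrm{avar}\{n^{1/2}(\hat\beta_{ML} - \beta_0)\}
  = \frac{\sigma_0^2\beta_0^2 - 4\sigma_0^2\beta_0^2 + (4\sigma_0^2 + 2\sigma_0^4)\beta_0^2}{(1+2\sigma_0^2)^2}
  = \frac{\sigma_0^2\beta_0^2}{1+2\sigma_0^2}, \\
&\mathrm{ARE} = \frac{\sigma_0^2\beta_0^2}{\sigma_0^2\beta_0^2/(1+2\sigma_0^2)} = 1 + 2\sigma_0^2 .
\end{aligned}
```

At $\sigma_0 = 0.20$, $0.30$, and $1.00$ this gives 1.08, 1.18, and 3.00, the normal rows of the table, and it tends to 1 as $\sigma_0 \to 0$.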
If the data come from a symmetric but “heavy-tailed” distribution
The quadratic estimator, which assumes the excess kurtosis is zero, is inefficient relative to GLS once $\kappa_0$ exceeds 2, except when $\sigma_0$ is large enough.
The inefficiency becomes worse as κ0 increases.
This shows that there is no general ordering of the relative precision
of GLS and normal theory ML in this case.
If the data come from a Gamma distribution
Recall that the linear estimator $\hat\beta_{GLS}$ is the maximum likelihood estimator for β under the gamma distribution; hence, we would expect GLS to be uniformly more efficient, as seen in the table.
In practice, it may be difficult to distinguish between normal and gamma distributions if $\sigma_0$ is "small". Thus, if we mistakenly assume normality when the data really arise from a gamma distribution, and use $\hat\beta_{ML}$ instead of $\hat\beta_{GLS}$, we stand to lose efficiency.
Sensitivity analysis of linear and quadratic equations to
misspecification of variance function
Consider an example and numerical analysis:
True model: $E(Y_j) = \beta_0$, $\mathrm{var}(Y_j) = \sigma_0^2\beta_0^{2+2\theta_0}$.
Working model: $E(Y_j) = \beta$, $\mathrm{var}(Y_j) = \sigma^2\beta^2$.
We can obtain:
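$$\hat\beta_{GLS} = \bar Y \xrightarrow{p} \beta_0,$$
$$\hat\beta_{ML} = \frac{\left(\bar Y^2 + 4\sigma_0^2\sum_{j=1}^n Y_j^2/n\right)^{1/2} - \bar Y}{2\sigma_0^2} \xrightarrow{p} \beta_0\,\frac{\left(1 + 4\sigma_0^2 + 4\sigma_0^4\beta_0^{2\theta_0}\right)^{1/2} - 1}{2\sigma_0^2}.$$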
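A minimal simulation sketch makes the inconsistency concrete. It assumes normal data generated from the true model with hypothetical values $\beta_0 = 2$, $\sigma_0 = 0.5$, $\theta_0 = 0.5$, and treats $\sigma_0$ as known in $\hat\beta_{ML}$, as in the expression above; the function names are illustrative.

```python
import numpy as np


def beta_ml(y, sigma0):
    """Normal-theory ML estimate under the working model var(Y) = sigma0^2 beta^2."""
    ybar, m2 = y.mean(), np.mean(y**2)
    return (np.sqrt(ybar**2 + 4 * sigma0**2 * m2) - ybar) / (2 * sigma0**2)


def beta_ml_limit(beta0, sigma0, theta0):
    """Probability limit of beta_ML when in truth var(Y) = sigma0^2 beta0^(2 + 2 theta0)."""
    s = sigma0**2
    return beta0 * (np.sqrt(1 + 4 * s + 4 * s**2 * beta0 ** (2 * theta0)) - 1) / (2 * s)


rng = np.random.default_rng(6)
beta0, sigma0, theta0, n = 2.0, 0.5, 0.5, 200_000
sd = sigma0 * beta0 ** (1 + theta0)            # true standard deviation of Y
y = beta0 + sd * rng.normal(size=n)
print(y.mean(), beta_ml(y, sigma0), beta_ml_limit(beta0, sigma0, theta0))
# GLS (the sample mean) stays near 2; ML settles near the limit, about 2.32
```

With $\theta_0 = 0$ the limit reduces to $\beta_0$, since then $(1 + 4\sigma_0^2 + 4\sigma_0^4)^{1/2} = 1 + 2\sigma_0^2$.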
Misspecification of the variance function $g(\cdot)$ can make $\hat\beta_{ML}$ inconsistent: the limit above equals $\beta_0$ only when $\theta_0 = 0$ or $\beta_0 = 1$.
By contrast, misspecification of $g(\cdot)$ can make $\hat\beta_{GLS}$ inefficient, but it remains consistent.
Bottom line
Unless we have extensive information about third and fourth
moments, or about the full conditional distribution of Y (say,
normal), using GLS seems safer.