October 22

ST 762
Nonlinear Statistical Models for Univariate and Multivariate Response
The “Folklore” Theorem
Return to the general mean-variance specification
$$E(Y \mid x) = f(x, \beta), \qquad \operatorname{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2.$$
Estimation of β via solution of linear estimating equations.
Asymptotic properties of β̂ under fixed weights vs estimated weights.
Throughout we assume the model for the mean function f is correct.
Then we consider two scenarios for the variance function g (·):
The working model for g (·) is correctly specified;
The working model for g (·) is incorrectly specified.
Iterative GLS scheme
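Obtain preliminary estimators $\hat\beta^*$ and $\hat\theta$.
Update $\hat\sigma$ and $\hat\theta$ by solving
$$\sum_{j=1}^{n} \left[ \frac{\{Y_j - f(x_j, \hat\beta^*)\}^2}{\hat\sigma^2 g(\hat\beta^*, \hat\theta, x_j)^2} - 1 \right] \tau_\theta(\hat\beta^*, \hat\theta, x_j) = 0,$$
where
$$\tau_\theta(\beta, \theta, x) = \begin{pmatrix} 1 \\ \nu_\theta(\beta, \theta, x) \end{pmatrix}, \qquad \nu_\theta(\beta, \theta, x) = \frac{\partial}{\partial \theta} \log g(\beta, \theta, x).$$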
Then update β̂ by solving
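$$\sum_{j=1}^{n} \frac{Y_j - f(x_j, \hat\beta)}{g(\hat\beta^*, \hat\theta, x_j)^2}\, f_\beta(x_j, \hat\beta) = 0.$$
Note that the β argument of g(·) is held at $\hat\beta^*$ in this update.
Iterate the last two steps C times, possibly to convergence (C = ∞), replacing $\hat\beta^*$ by the current $\hat\beta$ at the start of each iteration.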
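The iterative scheme can be sketched in Python with NumPy. This is a minimal illustration, not the method as implemented in any course software, and it rests on assumptions that are not in the notes: a hypothetical exponential mean f(x, β) = β₁ exp(β₂ x), a power-of-the-mean variance function g(β, θ, x) = f(x, β)^θ (so that ν_θ = log f(x, β)), and helper names chosen only for illustration. The σ̂/θ̂ step profiles σ̂² out of the first component of the estimating equation and solves the second component by bisection; the β̂ step is weighted Gauss-Newton with the weights held fixed at β̂*.

```python
import numpy as np

# Hypothetical model, assumed for illustration only (not from the notes):
#   mean      f(x, beta)        = beta[0] * exp(beta[1] * x)
#   variance  g(beta, theta, x) = f(x, beta) ** theta   (power of the mean)


def mean_fn(x, beta):
    return beta[0] * np.exp(beta[1] * x)


def grad_fn(x, beta):
    """n x p matrix whose j-th row is f_beta(x_j, beta)^T."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])


def weighted_gauss_newton(x, y, beta, w, max_iter=100, tol=1e-10):
    """Solve sum_j w_j {y_j - f(x_j, beta)} f_beta(x_j, beta) = 0 with w held fixed."""
    beta = np.asarray(beta, dtype=float)
    for _ in range(max_iter):
        X = grad_fn(x, beta)
        r = y - mean_fn(x, beta)
        XtW = X.T * w                                  # X^T W without forming diag(w)
        step = np.linalg.solve(XtW @ X, XtW @ r)       # Gauss-Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def update_sigma_theta(x, y, beta_star, lo=-3.0, hi=5.0, n_bisect=80):
    """Solve sum_j [r_j^2 / (sigma^2 g_j^2) - 1] (1, nu_theta_j) = 0 for (sigma, theta).

    The first component gives sigma^2(theta) = mean(r_j^2 / g_j^2). Substituting it
    into the second component leaves a decreasing function of theta, solved here by
    bisection on [lo, hi]. Requires f(x_j, beta_star) > 0 so that log f is defined.
    """
    f_star = mean_fn(x, beta_star)
    log_f = np.log(f_star)                 # nu_theta = d log g / d theta = log f
    r2 = (y - f_star) ** 2

    def sigma2_of(theta):
        return np.mean(r2 * np.exp(-2.0 * theta * log_f))

    def score(theta):
        a = r2 * np.exp(-2.0 * theta * log_f)
        return np.sum((a / sigma2_of(theta) - 1.0) * log_f)

    if not score(lo) > 0 > score(hi):
        raise ValueError("the root in theta is not bracketed by [lo, hi]")
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    return np.sqrt(sigma2_of(theta)), theta


def iterative_gls(x, y, beta_start, C=3):
    """C >= 1 cycles of (variance step, beta step); C = 1 is the one-step estimator."""
    if C < 1:
        raise ValueError("C must be at least 1")
    beta_star = weighted_gauss_newton(x, y, beta_start, np.ones_like(x))  # OLS fit
    for _ in range(C):
        sigma, theta = update_sigma_theta(x, y, beta_star)
        w = mean_fn(x, beta_star) ** (-2.0 * theta)     # 1 / g(beta_star, theta, x)^2
        beta_star = weighted_gauss_newton(x, y, beta_star, w)  # becomes next beta_star
    return beta_star, sigma, theta


# Simulated data with hypothetical values beta = (2, 0.8), sigma = 0.1, theta = 1.
rng = np.random.default_rng(762)
x = rng.uniform(0.0, 2.0, size=500)
f_true = mean_fn(x, np.array([2.0, 0.8]))
y = f_true + 0.1 * f_true * rng.standard_normal(500)

slope, intercept = np.polyfit(x, np.log(y), 1)      # crude log-linear starting values
beta_hat, sigma_hat, theta_hat = iterative_gls(x, y, [np.exp(intercept), slope])
print(beta_hat, sigma_hat, theta_hat)
```

Setting C = 1 gives the one-step estimator; larger C iterates toward convergence. The bracket [lo, hi] for θ is an assumption that suits this simulated example and may need widening for other data.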
The theorem
Suppose that
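$$\sqrt{n}\,(\hat\beta^* - \beta_0) = O_p(1) \quad\text{and}\quad \sqrt{n}\,(\hat\theta - \theta_0) = O_p(1)$$
(i.e., $\hat\beta^*$ and $\hat\theta$ are $\sqrt{n}$-consistent).
Then, under suitable regularity conditions,
$$\sqrt{n}\,(\hat\beta - \beta_0) \xrightarrow{\mathcal{L}} N\left(0, \sigma_0^2 \Sigma_{WLS}\right).$$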
Here
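$$\Sigma_{WLS} = \left( \lim_{n \to \infty} n^{-1} X^T W X \right)^{-1},$$
where
$$X = X(\beta_0) = \begin{pmatrix} f_\beta(x_1, \beta_0)^T \\ f_\beta(x_2, \beta_0)^T \\ \vdots \\ f_\beta(x_n, \beta_0)^T \end{pmatrix}, \qquad W = \operatorname{diag}(w_1, w_2, \ldots, w_n), \qquad w_j = \frac{1}{g(\beta_0, \theta_0, x_j)^2}.$$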
Derivation
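A first-order Taylor expansion of the estimating equation in $\hat\beta$, $\hat\beta^*$ and $\hat\theta$ about $\beta_0$, $\beta_0$ and $\theta_0$ gives
$$\begin{aligned}
0 &= n^{-1/2} \sum_{j=1}^{n} g(\hat\beta^*, \hat\theta, x_j)^{-2} \{Y_j - f(x_j, \hat\beta)\} f_\beta(x_j, \hat\beta) \\
&\approx n^{-1/2} \sum_{j=1}^{n} g(\beta_0, \theta_0, x_j)^{-2} \{Y_j - f(x_j, \beta_0)\} f_\beta(x_j, \beta_0) \\
&\quad + \left[ n^{-1} \sum_{j=1}^{n} g(\beta_0, \theta_0, x_j)^{-2} \{Y_j - f(x_j, \beta_0)\} f_{\beta\beta}(x_j, \beta_0) - n^{-1} \sum_{j=1}^{n} g(\beta_0, \theta_0, x_j)^{-2} f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T \right] n^{1/2} (\hat\beta - \beta_0) \\
&\quad + \left[ -2 n^{-1} \sum_{j=1}^{n} g(\beta_0, \theta_0, x_j)^{-3} \{Y_j - f(x_j, \beta_0)\} f_\beta(x_j, \beta_0)\, g_\beta(\beta_0, \theta_0, x_j)^T \right] n^{1/2} (\hat\beta^* - \beta_0) \\
&\quad + \left[ -2 n^{-1} \sum_{j=1}^{n} g(\beta_0, \theta_0, x_j)^{-3} \{Y_j - f(x_j, \beta_0)\} f_\beta(x_j, \beta_0)\, g_\theta(\beta_0, \theta_0, x_j)^T \right] n^{1/2} (\hat\theta - \theta_0).
\end{aligned}$$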
Rewrite as:
$$0 \approx C_n + (A_{n1} + A_{n2})\, n^{1/2}(\hat\beta - \beta_0) + D_n\, n^{1/2}(\hat\beta^* - \beta_0) + E_n\, n^{1/2}(\hat\theta - \theta_0),$$
where
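$$\begin{aligned}
C_n &= \sigma_0 n^{-1/2} \sum_{j=1}^{n} w_j^{1/2} f_\beta(x_j, \beta_0)\, \epsilon_j, \\
A_{n1} &= \sigma_0 n^{-1} \sum_{j=1}^{n} w_j^{1/2} f_{\beta\beta}(x_j, \beta_0)\, \epsilon_j, \\
A_{n2} &= -n^{-1} \sum_{j=1}^{n} w_j f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T, \\
D_n &= -2 \sigma_0 n^{-1} \sum_{j=1}^{n} w_j^{1/2} f_\beta(x_j, \beta_0)\, \nu_\beta(\beta_0, \theta_0, x_j)^T \epsilon_j, \\
E_n &= -2 \sigma_0 n^{-1} \sum_{j=1}^{n} w_j^{1/2} f_\beta(x_j, \beta_0)\, \nu_\theta(\beta_0, \theta_0, x_j)^T \epsilon_j,
\end{aligned}$$
with $\nu_\beta = \partial \log g / \partial \beta = g_\beta / g$ and $\nu_\theta = \partial \log g / \partial \theta = g_\theta / g$.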
Also
$$\epsilon_j = \frac{Y_j - f(x_j, \beta_0)}{\sigma_0 g(\beta_0, \theta_0, x_j)},$$
so that $E(\epsilon_j \mid x_j) = 0$ and $\operatorname{var}(\epsilon_j \mid x_j) = 1$.
We have that
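$$A_{n1} \xrightarrow{p} 0, \qquad A_{n2} \xrightarrow{p} -\Sigma_{WLS}^{-1}, \qquad C_n \xrightarrow{\mathcal{L}} N\left(0, \sigma_0^2 \Sigma_{WLS}^{-1}\right), \qquad D_n \xrightarrow{p} 0, \qquad E_n \xrightarrow{p} 0$$
($A_{n1}$, $D_n$ and $E_n$ are averages of terms with conditional mean zero, since $E(\epsilon_j \mid x_j) = 0$, so the WLLN applies; the limit for $C_n$ follows from the CLT), so
$$A_n = A_{n1} + A_{n2} \xrightarrow{p} -\Sigma_{WLS}^{-1}.$$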
Then
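$$\sqrt{n}\,(\hat\beta - \beta_0) \approx -A_n^{-1} C_n \xrightarrow{\mathcal{L}} N\left(0, \sigma_0^2 \Sigma_{WLS}\right),$$
as claimed (by Slutsky's theorem, the limiting covariance is $\Sigma_{WLS}\,(\sigma_0^2 \Sigma_{WLS}^{-1})\,\Sigma_{WLS} = \sigma_0^2 \Sigma_{WLS}$).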
Remarks
This is the same large-sample distribution as for the WLS estimator
with known weights.
That is, to this order of approximation, using estimated weights gives
the same sampling distribution as if the weights were known.
The terms $D_n$ and $E_n$, which carry the "effects" of $\hat\beta^*$ and $\hat\theta$, respectively, are $o_p(1)$; since they multiply the $O_p(1)$ quantities $n^{1/2}(\hat\beta^* - \beta_0)$ and $n^{1/2}(\hat\theta - \theta_0)$, their contributions are $o_p(1)$ as well. This implies that the estimators we substitute for β and θ in the weights play no role in determining the large-sample properties of the resulting estimator $\hat\beta$.
In particular, because $E_n = o_p(1)$, how one estimates θ (provided $\hat\theta$ is $\sqrt{n}$-consistent) does not matter in determining the properties of the resulting GLS estimator.
The main “folklore” message
The large-sample precision is unaffected not only by the need to estimate the parameters in the weights, but also by how these parameters are estimated, as long as they are estimated "sensibly" (i.e., $\sqrt{n}$-consistently).
The result is true for any C : C = 1 for a one-step estimator, C = ∞
for a converged iterated estimator.
Here we have assumed that the variance function g (·) is correctly
specified.
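A small Monte Carlo sketch can illustrate this message (it does not prove it). Everything below is an assumption for illustration: a straight-line mean f(x, β) = β₁ + β₂x (a special case of the nonlinear model), g = f^θ with θ = 1, normal errors, and θ̂ taken as the slope of a regression of log|r_j| on log f(x_j, β̂*) using OLS residuals. That θ̂ is a simple √n-consistent choice that deliberately differs from the estimating equation used in the iterative GLS scheme. The sampling standard deviation of the one-step (C = 1) estimator of β₂ is compared with that of WLS using the true weights and with the theoretical value from Σ_WLS.

```python
import numpy as np

# Assumed setup (illustration only): straight-line mean, g = f^theta, normal errors.
rng = np.random.default_rng(2024)
n, reps = 200, 400
beta0, sigma0, theta0 = np.array([1.0, 2.0]), 0.1, 1.0
x = np.linspace(1.0, 10.0, n)
X = np.column_stack([np.ones(n), x])     # rows f_beta(x_j, beta)^T = (1, x_j)
f0 = X @ beta0
w0 = f0 ** (-2.0 * theta0)               # true weights 1 / g(beta0, theta0, x_j)^2


def wls(X, y, w):
    """Solve X^T W (y - X beta) = 0 for fixed weights w."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


known, one_step = [], []
for _ in range(reps):
    y = f0 + sigma0 * f0 ** theta0 * rng.standard_normal(n)
    known.append(wls(X, y, w0)[1])                           # WLS with the true weights
    f_star = X @ wls(X, y, np.ones(n))                       # OLS preliminary fit
    log_abs_r = np.log(np.abs(y - f_star))
    theta_hat = np.polyfit(np.log(f_star), log_abs_r, 1)[0]  # slope estimates theta
    one_step.append(wls(X, y, f_star ** (-2.0 * theta_hat))[1])  # C = 1

theory_sd = sigma0 * np.sqrt(np.linalg.inv((X.T * w0) @ X)[1, 1])
sd_known, sd_one_step = np.std(known), np.std(one_step)
print(theory_sd, sd_known, sd_one_step)
```

The two empirical standard deviations should be close to each other and to the theoretical value, even though θ̂ here is fairly noisy; the efficiency of WLS is insensitive to small errors in the weights.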
The folklore result implies that
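$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma_0^2 \left\{ X(\beta_0)^T W(\beta_0, \theta_0) X(\beta_0) \right\}^{-1} \right).$$
We use this by plugging in $\hat\beta$, $\hat\theta$, and
$$\hat\sigma^2 = \frac{1}{n - p} \sum_{j=1}^{n} \frac{\{Y_j - f(x_j, \hat\beta)\}^2}{g(\hat\beta, \hat\theta, x_j)^2}.$$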
This is the approximation used by SAS’s proc nlin and the R
function nls.
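As a small sketch (Python/NumPy, with a hypothetical helper name), the plug-in standard errors come from σ̂² and (XᵀŴX)⁻¹, with X and Ŵ evaluated at β̂ and θ̂. The toy numbers below use a linear mean with unit weights only so that the result can be checked by hand; the code is not meant to reproduce the internals of proc nlin or nls.

```python
import numpy as np


def model_based_cov(X, resid, w):
    """Plug-in estimate sigma2_hat * (X^T W X)^{-1} implied by the folklore theorem.

    X     : n x p matrix with rows f_beta(x_j, beta_hat)^T
    resid : unweighted residuals Y_j - f(x_j, beta_hat)
    w     : estimated weights 1 / g(beta_hat, theta_hat, x_j)^2
    Returns (sigma2_hat, covariance matrix).
    """
    X = np.asarray(X, dtype=float)
    resid = np.asarray(resid, dtype=float)
    w = np.asarray(w, dtype=float)
    n, p = X.shape
    sigma2_hat = np.sum(w * resid ** 2) / (n - p)   # divisor n - p, as above
    cov = sigma2_hat * np.linalg.inv((X.T * w) @ X)
    return sigma2_hat, cov


# Toy check with a linear "mean" and unit weights (hypothetical numbers).
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
resid = np.array([0.1, -0.2, 0.2, -0.1])
sigma2_hat, cov = model_based_cov(X, resid, np.ones(4))
print(sigma2_hat)
print(np.sqrt(np.diag(cov)))   # estimated standard errors
```

Here σ̂² = 0.1/2 = 0.05 and (XᵀX)⁻¹ has diagonal (0.7, 0.2), so the estimated variances are 0.035 and 0.01.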
Working Variances
Recall the general mean-variance specification
$$E(Y \mid x) = f(x, \beta), \qquad \operatorname{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2.$$
Suppose we use the GLS scheme with the correct mean specification,
but a working variance specification
$$\operatorname{var}(Y \mid x) = \tau^2 h(\beta, \gamma, x)^2.$$
What do $\hat\tau^2$ and $\hat\gamma$ estimate, and what is the consequence for $\hat\beta$?
We solve
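$$\sum_{j=1}^{n} \frac{\{Y_j - f(x_j, \hat\beta^*)\}^2 - \hat\tau^2 h(\hat\beta^*, \hat\gamma, x_j)^2}{h(\hat\beta^*, \hat\gamma, x_j)^2}\, \xi_\gamma(\hat\beta^*, \hat\gamma, x_j) = 0,$$
where
$$\xi_\gamma(\beta, \gamma, x) = \begin{pmatrix} 1 \\ \partial \log h(\beta, \gamma, x) / \partial \gamma \end{pmatrix}.$$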
Suppose that there exist $\gamma_n$ and $\tau_n^2$ such that
$$E\left[ \sum_{j=1}^{n} \frac{\{Y_j - f(x_j, \beta)\}^2 - \tau_n^2 h(\beta, \gamma_n, x_j)^2}{h(\beta, \gamma_n, x_j)^2}\, \xi_\gamma(\beta, \gamma_n, x_j) \right] = 0.$$
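Then we can view $\hat\gamma$ and $\hat\tau^2$ as estimators of $\gamma_n$ and $\tau_n^2$.
Also, if $\gamma_n \to \gamma^*$ and $\tau_n^2 \to \tau^{*2}$, then (consistency):
$$\hat\gamma \xrightarrow{p} \gamma^* \quad\text{and}\quad \hat\tau^2 \xrightarrow{p} \tau^{*2}.$$
More strongly ($\sqrt{n}$-consistency):
$$\sqrt{n}\,(\hat\gamma - \gamma^*) = O_p(1) \quad\text{and}\quad \sqrt{n}\,(\hat\tau^2 - \tau^{*2}) = O_p(1).$$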
Note
In general, if we fit a parametric model $f(x, \theta)$ to data generated from a different density $f_0(x)$, the MLE estimates the parameter value $\theta^*$ that minimizes the Kullback-Leibler distance from $f_0(\cdot)$ to $f(\cdot, \theta)$.
Asymptotic distribution of β̂
Using similar methods to those used earlier, we show that
$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma_0^2 \left(X^T U X\right)^{-1} X^T U W^{-1} U X \left(X^T U X\right)^{-1}\right).$$
Here, as before,
$$X = X(\beta_0) = \begin{pmatrix} f_\beta(x_1, \beta_0)^T \\ f_\beta(x_2, \beta_0)^T \\ \vdots \\ f_\beta(x_n, \beta_0)^T \end{pmatrix}, \qquad W = \operatorname{diag}(w_1, w_2, \ldots, w_n), \qquad w_j = \frac{1}{g(\beta_0, \theta_0, x_j)^2},$$
$$U = \operatorname{diag}(u_1, u_2, \ldots, u_n), \qquad u_j = \frac{1}{h(\beta_0, \gamma^*, x_j)^2}.$$
Notes
This is the same asymptotic distribution as in the case of fixed working weights.
The original “folklore” theorem is the special case where the working
variance function h(·) is the same as the true variance function g (·).
The efficiency discussion carries over immediately from the fixed
weights case.
Corrected Standard Errors
If we know, or suspect, that the working variance specification is not the truth, we need to estimate the asymptotic variance matrix
$$\sigma_0^2 \left(X^T U X\right)^{-1} X^T U W^{-1} U X \left(X^T U X\right)^{-1}$$
rather than
$$\sigma_0^2 \left(X^T U X\right)^{-1}.$$
X and U can be estimated by plugging in sample estimates β̂ and γ̂.
But how about W?
Note that
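$$\frac{\sigma_0^2}{w_j} = E\left[ \{Y_j - f(x_j, \beta_0)\}^2 \mid x_j \right],$$
so we can estimate $\sigma_0^2 W^{-1}$ by
$$R = \operatorname{diag}\left(r_1^2, r_2^2, \ldots, r_n^2\right),$$
where $r_j$ is the unweighted residual
$$r_j = Y_j - f(x_j, \hat\beta).$$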
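$X^T U R U X$ is a good large-sample estimator of $\sigma_0^2 X^T U W^{-1} U X$, since
$$X^T U R U X = \sigma_0^2 X^T U W^{-1} U X + o_p(n).$$
Each $r_j^2$ on its own is a noisy estimate of $\sigma_0^2 / w_j$, but the errors average out in the sum over $j$, so the two matrices agree after division by $n$.
We are led to the sandwich variance estimator
$$\left(X^T U X\right)^{-1} X^T U R U X \left(X^T U X\right)^{-1}.$$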
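A minimal NumPy sketch of the sandwich estimator follows, assuming the gradient matrix X, the working weights u_j and the unweighted residuals r_j have already been evaluated at β̂ and γ̂ (the function name is hypothetical). With U = I and a linear mean it reduces to White's heteroscedasticity-consistent (HC0) estimator for ordinary least squares.

```python
import numpy as np


def sandwich_cov(X, u, resid):
    """Sandwich estimator (X^T U X)^{-1} X^T U R U X (X^T U X)^{-1}.

    X     : n x p matrix with rows f_beta(x_j, beta_hat)^T
    u     : working weights 1 / h(beta_hat, gamma_hat, x_j)^2
    resid : unweighted residuals r_j = Y_j - f(x_j, beta_hat); R = diag(r_j^2)
    """
    X = np.asarray(X, dtype=float)
    u = np.asarray(u, dtype=float)
    resid = np.asarray(resid, dtype=float)
    XtU = X.T * u                          # X^T U
    bread = np.linalg.inv(XtU @ X)         # (X^T U X)^{-1}
    meat = (XtU * resid ** 2) @ XtU.T      # X^T U R U X
    return bread @ meat @ bread


# Toy check (hypothetical numbers): unit working weights and equal residuals,
# so the sandwich collapses to 0.25 * (X^T X)^{-1}.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
V = sandwich_cov(X, np.ones(4), np.full(4, 0.5))
print(V)
```

Forming U and R as diagonal matrices is avoided by scaling the columns of Xᵀ directly, which keeps the cost linear in n.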
Wald Inference
The asymptotic distribution
$$\hat\beta \overset{\cdot}{\sim} N(\beta_0, \Sigma)$$
may be used to construct:
confidence intervals for individual parameters or linear
combinations of parameters;
hypothesis tests about individual parameters or groups of
parameters.
Inference is based on the asymptotic normal distribution of an
individual parameter estimator, or, to test a hypothesis such as
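$L\beta = L\beta_0$, the corresponding asymptotic $\chi^2$ distribution of a quadratic form such as
$$\left(\hat\beta - \beta_0\right)^T L^T \left(L \hat\Sigma L^T\right)^{-1} L \left(\hat\beta - \beta_0\right),$$
with $q$ degrees of freedom when $L$ has $q$ linearly independent rows.
To allow for estimation of Σ, the normal distribution is often replaced by the t-distribution, and the $\chi^2$ distribution by the (scaled) F-distribution.
This replacement also gives the usual statistics in the (very!) special case of a linear model with known weights (variances known up to the scale $\sigma^2$).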
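The quadratic form can be computed as in the following sketch (Python/NumPy; the function name and numbers are hypothetical). It solves a linear system rather than forming (LΣ̂Lᵀ)⁻¹ explicitly, and it assumes L has full row rank q.

```python
import numpy as np


def wald_statistic(beta_hat, beta_0, Sigma_hat, L):
    """Return (W, q) with W = (b - b0)^T L^T (L Sigma L^T)^{-1} L (b - b0).

    L is assumed to have full row rank q; W is referred to chi^2_q, and W / q to an
    F distribution with q numerator degrees of freedom.
    """
    L = np.atleast_2d(np.asarray(L, dtype=float))
    d = L @ (np.asarray(beta_hat, dtype=float) - np.asarray(beta_0, dtype=float))
    middle = L @ np.asarray(Sigma_hat, dtype=float) @ L.T
    W = float(d @ np.linalg.solve(middle, d))   # avoids forming the inverse explicitly
    return W, L.shape[0]


# Hypothetical numbers: test beta_1 = beta_2 = 0 in a three-parameter model.
beta_hat = np.array([1.0, 2.0, 3.0])
Sigma_hat = np.eye(3)
L = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
W, q = wald_statistic(beta_hat, np.zeros(3), Sigma_hat, L)
print(W, q)
```

Here W = 5 on q = 2 degrees of freedom. Because a χ²₂ variable is exponential with mean 2, its tail probability is exp(−W/2) ≈ 0.082, so this hypothetical hypothesis is not rejected at the 5% level (critical value ≈ 5.99). For general q, a χ² or F tail function from a statistics library would be used instead.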
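If the inference is about a nonlinear function of β, say, $a(\beta)$, then, as
$$a(\hat\beta) \approx a(\beta_0) + a_\beta(\beta_0)^T \left(\hat\beta - \beta_0\right),$$
we have, by the delta method,
$$a(\hat\beta) \overset{\cdot}{\sim} N\left(a(\beta_0),\ a_\beta(\hat\beta)^T\, \Sigma\, a_\beta(\hat\beta)\right).$$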
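A short sketch of the delta-method calculation follows (Python/NumPy). The function names and the ratio a(β) = β₁/β₂ are hypothetical examples; a_β is evaluated at β̂ as in the display above.

```python
import numpy as np


def delta_method(a, grad_a, beta_hat, Sigma_hat):
    """Return a(beta_hat) and the delta-method variance a_beta^T Sigma a_beta at beta_hat."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    g = np.asarray(grad_a(beta_hat), dtype=float)
    return a(beta_hat), float(g @ np.asarray(Sigma_hat, dtype=float) @ g)


def ratio(b):
    """Hypothetical function of interest: a(beta) = beta_1 / beta_2."""
    return b[0] / b[1]


def ratio_grad(b):
    """Its gradient a_beta(beta) = (1 / beta_2, -beta_1 / beta_2^2)."""
    return np.array([1.0 / b[1], -b[0] / b[1] ** 2])


est, var = delta_method(ratio, ratio_grad, [2.0, 4.0], np.diag([0.01, 0.04]))
print(est, np.sqrt(var))
```

Here a(β̂) = 0.5 with variance (1/4)²(0.01) + (1/8)²(0.04) = 0.00125, so the standard error is about 0.035.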
Advantages of Wald inference
Σ may be estimated either by assuming that the working
variances are correct, or by using the sandwich estimator.
Extension of familiar methods.
Disadvantages
Large-sample distribution may give a poor approximation in
small samples.
Not invariant to reparametrization of the mean model (may
reach different conclusions).
Likelihood Inference
Likelihood methods circumvent some of the disadvantages of Wald inference, at the expense of increased complexity.
Start by assuming a normal distribution for $Y_j$, then use likelihood ratio methods for tests and profile likelihood methods for confidence intervals.
The large sample properties of the estimators may be sensitive to the
normality assumption for Yj .
Alternative approach (less sensitive to assumptions?):
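$$L(\beta) = -\frac{1}{2} \sum_{j=1}^{n} \hat w_j \{Y_j - f(x_j, \beta)\}^2$$
is, up to constant terms, the log-likelihood in the linear model.
For instance, to test the hypothesis $\beta_2 = 0$, where $\beta = (\beta_1^T, \beta_2^T)^T$, $\beta_1 \in \mathbb{R}^r$, $\beta_2 \in \mathbb{R}^{p-r}$, we can show that
$$\frac{-2\{L(\hat\beta_0) - L(\hat\beta)\}}{\sigma_0^2} \overset{\cdot}{\sim} \chi^2_{p-r},$$
where $\hat\beta_0$ denotes the estimator computed under the hypothesis $\beta_2 = 0$.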
Further improvement in small samples:
$$\frac{\{L(\hat\beta_0) - L(\hat\beta)\}/(p - r)}{L(\hat\beta)/(n - p)} \overset{\cdot}{\sim} F_{p-r,\, n-p}.$$
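The F-ratio can be computed directly from the criterion L(·) evaluated at the full and reduced fits, as in this sketch (Python/NumPy; function names and numbers are hypothetical). Both fits are assumed to use the same estimated weights ŵ_j, and the residuals are assumed to come from those fits. Since L(·) = −½ × (weighted residual sum of squares), the ratio equals the familiar extra-sum-of-squares F statistic.

```python
import numpy as np


def weighted_criterion(resid, w_hat):
    """L(beta) = -(1/2) sum_j w_hat_j {Y_j - f(x_j, beta)}^2, from residuals at beta."""
    resid = np.asarray(resid, dtype=float)
    return -0.5 * float(np.sum(np.asarray(w_hat, dtype=float) * resid ** 2))


def f_ratio(L_reduced, L_full, n, p, r):
    """[{L(beta_hat_0) - L(beta_hat)} / (p - r)] / [L(beta_hat) / (n - p)], vs F_{p-r, n-p}."""
    return ((L_reduced - L_full) / (p - r)) / (L_full / (n - p))


# Hypothetical values: n = 20 observations, p = 4 parameters, r = 2 under H0.
L_full, L_reduced = -5.0, -6.5
print(f_ratio(L_reduced, L_full, n=20, p=4, r=2))
print(weighted_criterion([1.0, 2.0], [1.0, 0.5]))
```

Both numerator and denominator are nonpositive (the reduced fit cannot increase L), so the ratio is nonnegative, as an F statistic should be.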
Optimality of GLS
Goal: to show that the GLS estimator is asymptotically optimal in
the class of linear estimating equations.
The GLS estimator β̂ satisfies
$$X(\beta)^T W(\beta, \theta_0) \{Y - f(\beta)\} = 0.$$
Here, we hold θ at its true value θ 0 ; replacing it by its estimator does
not change asymptotic distributions, due to the folklore theorem.
We assume a correctly specified variance function, so that
$$\hat\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma_0^2 \left(X^T W X\right)^{-1}\right).$$
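Consider $\tilde\beta$, satisfying the more general linear estimating equation
$$A(\beta)^T \{Y - f(\beta)\} = 0,$$
where $A(\beta)$ is an $n \times p$ matrix.
Similar arguments (first-order Taylor approximation, WLLN, CLT, Slutsky) show that
$$\tilde\beta \overset{\cdot}{\sim} N\left(\beta_0,\ \sigma_0^2 \left(A^T X\right)^{-1} A^T W^{-1} A \left(X^T A\right)^{-1}\right).$$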
Then we can show that
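$$\left(A^T X\right)^{-1} A^T W^{-1} A \left(X^T A\right)^{-1} - \left(X^T W X\right)^{-1}$$
is nonnegative definite, with equality when $A = WX$, i.e., for the GLS estimating equation itself.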
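The inequality can be checked numerically, as a sanity check rather than a proof. In this sketch (Python/NumPy, hypothetical function names) X, the weights w_j and the matrices A are randomly generated stand-ins for X(β₀), W and A(β₀); the common factor σ₀² cancels and is omitted.

```python
import numpy as np


def cov_linear_ee(X, A, w):
    """(A^T X)^{-1} A^T W^{-1} A (X^T A)^{-1}: covariance of beta_tilde divided by sigma0^2."""
    M_inv = np.linalg.inv(A.T @ X)
    return M_inv @ ((A.T / w) @ A) @ M_inv.T


def cov_gls(X, w):
    """(X^T W X)^{-1}: covariance of the GLS estimator divided by sigma0^2."""
    return np.linalg.inv((X.T * w) @ X)


rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))        # stands in for the gradient matrix X(beta_0)
w = rng.uniform(0.2, 5.0, size=n)      # stands in for the true weights 1 / g_j^2

min_eigs = []
for _ in range(20):
    A = rng.standard_normal((n, p))    # an arbitrary choice of A
    diff = cov_linear_ee(X, A, w) - cov_gls(X, w)
    min_eigs.append(np.linalg.eigvalsh((diff + diff.T) / 2).min())
print(min(min_eigs))                   # nonnegative up to rounding error

# The bound is attained by GLS itself, A = W X.
diff_gls = cov_linear_ee(X, X * w[:, None], w) - cov_gls(X, w)
print(np.abs(diff_gls).max())          # zero up to rounding error
```

The difference is symmetrized before computing eigenvalues only to remove rounding asymmetry; in exact arithmetic it is already symmetric.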
Conclusion
The GLS estimator is asymptotically optimal in the class of linear
estimating equations.