September 17

ST 762
Nonlinear Statistical Models for Univariate and Multivariate Response
Likelihood Inference
Recall the form of the scaled exponential family density:
\[
f(y_j; \xi_j, \sigma) = \exp\left\{ \frac{y_j \xi_j - b(\xi_j)}{\sigma^2} + c(y_j, \sigma) \right\}.
\]
In the generalized nonlinear model, we assume
E (Yj |xj ) = f (xj , β) .
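This determines ξj through
\[
b_\xi(\xi_j) = f(x_j, \beta).
\]
The variance is then determined by
\[
\operatorname{var}(Y_j) = \sigma^2\, b_{\xi\xi}\left\{ b_\xi^{-1}[f(x_j, \beta)] \right\}
= \sigma^2\, g[f(x_j, \beta)]^2,
\]
where
\[
g(\cdot) = \sqrt{ b_{\xi\xi}\left\{ b_\xi^{-1}(\cdot) \right\} }.
\]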
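As a rough numerical illustration of how b(·) fixes the variance function, the sketch below inverts b_ξ by bisection and evaluates b_ξξ at the result. The function name `variance_function`, the bracketing intervals, and the choice of the Poisson and gamma cumulant functions are assumptions made for the example, not part of the notes.

```python
import math

def variance_function(b_xi, b_xixi, mu, lo, hi, tol=1e-12):
    """Return g(mu)^2 = b_xixi(b_xi^{-1}(mu)), inverting b_xi by bisection.

    Assumes b_xi is increasing on [lo, hi] (true when b_xixi > 0) and that
    the interval brackets the solution of b_xi(xi) = mu.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if b_xi(mid) < mu:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    xi = 0.5 * (lo + hi)
    return b_xixi(xi)

# Poisson: b(xi) = exp(xi), so b_xi = b_xixi = exp(xi) and g(mu)^2 = mu.
poisson_g2 = variance_function(math.exp, math.exp, mu=3.0, lo=-20.0, hi=20.0)

# Gamma: b(xi) = -log(-xi) for xi < 0, so b_xi = -1/xi and b_xixi = 1/xi^2,
# which gives g(mu)^2 = mu^2.
gamma_g2 = variance_function(lambda xi: -1.0 / xi,
                             lambda xi: 1.0 / xi ** 2,
                             mu=3.0, lo=-50.0, hi=-1e-6)

print(poisson_g2, gamma_g2)
```

In practice the closed forms g(μ)² = μ and g(μ)² = μ² would be used directly; the numerical inversion is only there to mirror the definition.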
Gradient of the log-likelihood
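\[
\frac{\partial \ell(\beta, \sigma; y)}{\partial \beta}
= \frac{y - b_\xi(\xi)}{\sigma^2}\, \frac{\partial \xi}{\partial \mu}\, \frac{\partial \mu}{\partial \beta}
= \frac{1}{\sigma^2}\, \frac{y - b_\xi(\xi)}{g[f(x, \beta)]^2}\, f_\beta(x, \beta).
\]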
Estimating equations
\[
\sum_{j=1}^{n} \frac{1}{g[f(x_j, \beta)]^2}\, \{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta) = 0.
\]
This is in the GLS form, with g (·) a function of β only through the
mean f (xj , β), and no additional variance parameters θ.
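The left-hand side of the estimating equation can be evaluated directly for any candidate β. The sketch below does this for an assumed log-linear mean f(x, β) = exp(β0 + β1 x) with Poisson-type variance function g(μ) = √μ; the model, the helper names, and the data are hypothetical choices for illustration.

```python
import math

def mean(x, beta):
    # Assumed mean model: f(x, beta) = exp(beta0 + beta1 * x).
    return math.exp(beta[0] + beta[1] * x)

def mean_gradient(x, beta):
    # f_beta(x, beta) for the model above.
    mu = mean(x, beta)
    return [mu, mu * x]

def g(mu):
    # Assumed variance function: g(mu)^2 = mu (Poisson-type).
    return math.sqrt(mu)

def estimating_function(xs, ys, beta):
    """Sum over j of {Y_j - f(x_j, beta)} f_beta(x_j, beta) / g[f(x_j, beta)]^2."""
    total = [0.0, 0.0]
    for x, y in zip(xs, ys):
        mu = mean(x, beta)
        weight = 1.0 / g(mu) ** 2
        grad = mean_gradient(x, beta)
        for k in range(2):
            total[k] += weight * (y - mu) * grad[k]
    return total

xs = [0.0, 1.0, 2.0, 3.0]
beta_true = [0.5, 0.3]

# Noise-free responses: the estimating function vanishes at beta_true.
ys_exact = [mean(x, beta_true) for x in xs]
u_exact = estimating_function(xs, ys_exact, beta_true)

# Shifting every response up by 1 leaves sum_j (1, x_j), since f_beta / g^2 = (1, x_j).
ys_shifted = [y + 1.0 for y in ys_exact]
u_shifted = estimating_function(xs, ys_shifted, beta_true)
```

Note that σ² does not appear: it is a common factor and drops out of the equation for β.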
Iterative Solutions
Newton-Raphson
\[
\beta^{(a+1)} = \beta^{(a)} - \left\{ \sum_{j=1}^{n} \ell_{\beta\beta j}(\beta^{(a)}, \sigma; Y_j) \right\}^{-1}
\sum_{j=1}^{n} \ell_{\beta j}(\beta^{(a)}, \sigma; Y_j).
\]
Fisher scoring
\[
\beta^{(a+1)} = \beta^{(a)} - \left[ E\left\{ \sum_{j=1}^{n} \ell_{\beta\beta j}(\beta^{(a)}, \sigma; Y_j) \right\} \right]^{-1}
\sum_{j=1}^{n} \ell_{\beta j}(\beta^{(a)}, \sigma; Y_j).
\]
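The difference between the two updates shows up already in a one-parameter example. The sketch below assumes an identity-type mean f(x, β) = βx with Poisson-type variance g(μ)² = μ and σ² = 1, for which the score is Σ Yj/β − Σ xj; the data and function names are hypothetical.

```python
def score(beta, xs, ys):
    # Sum of {Y_j - f} f_beta / g(f)^2 with f = beta * x, f_beta = x, g(f)^2 = f.
    return sum((y - beta * x) * x / (beta * x) for x, y in zip(xs, ys))

def observed_curvature(beta, xs, ys):
    # Sum of second derivatives: d/dbeta (sum y / beta - sum x) = -sum y / beta^2.
    return -sum(ys) / beta ** 2

def expected_curvature(beta, xs, ys):
    # Expectation of the above using E(Y_j) = beta * x_j: -sum x / beta.
    return -sum(xs) / beta

def iterate(beta, xs, ys, curvature, steps):
    # Generic update: beta <- beta - curvature^{-1} * score.
    for _ in range(steps):
        beta = beta - score(beta, xs, ys) / curvature(beta, xs, ys)
    return beta

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 3.0, 7.0, 8.0]

newton = iterate(1.0, xs, ys, observed_curvature, steps=8)   # Newton-Raphson
fisher = iterate(1.0, xs, ys, expected_curvature, steps=1)   # Fisher scoring
print(newton, fisher)
```

In this particular model the solution is Σ Yj / Σ xj, and Fisher scoring reaches it in a single step from any positive start, while Newton-Raphson needs several iterations. That is a feature of the example, not a general property.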
The derivatives
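\[
\ell_{\beta j}(\beta^{(a)}, \sigma; Y_j) = \frac{\{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta)}{\sigma^2\, g\{f(x_j, \beta)\}^2}
\]
and
\[
\ell_{\beta\beta j}(\beta^{(a)}, \sigma; Y_j) = - \frac{f_\beta(x_j, \beta)\, f_\beta^T(x_j, \beta)}{\sigma^2\, g\{f(x_j, \beta)\}^2}
+ \{Y_j - f(x_j, \beta)\}\, \frac{\partial}{\partial \beta} \left[ \frac{f_\beta(x_j, \beta)}{\sigma^2\, g\{f(x_j, \beta)\}^2} \right].
\]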
Fisher Scoring
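Note that the second term in \(\ell_{\beta\beta j}\) has zero expected value, since E{Yj − f(xj, β)} = 0, so
\[
E\left\{ \sum_{j=1}^{n} \ell_{\beta\beta j}(\beta^{(a)}, \sigma; Y_j) \right\}
= - \sum_{j=1}^{n} \frac{f_\beta(x_j, \beta)\, f_\beta^T(x_j, \beta)}{\sigma^2\, g\{f(x_j, \beta)\}^2}.
\]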
So the Fisher Scoring method can be written in the IRWLS form.
Notation

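\[
\underset{(n \times p)}{\mathbf{X}(\beta)} =
\begin{pmatrix} f_\beta(x_1, \beta)^T \\ \vdots \\ f_\beta(x_n, \beta)^T \end{pmatrix},
\qquad
\underset{(n \times 1)}{\mathbf{f}(\beta)} =
\begin{pmatrix} f(x_1, \beta) \\ \vdots \\ f(x_n, \beta) \end{pmatrix},
\]
\[
\underset{(n \times n)}{\mathbf{W}(\beta)} = \operatorname{diag}\left[ g^{-2}\{f(x_j, \beta)\} \right].
\]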
Iteration
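\[
\beta^{(a+1)} = \beta^{(a)} +
\left\{ \mathbf{X}(\beta^{(a)})^T\, \mathbf{W}^{(a)}\, \mathbf{X}(\beta^{(a)}) \right\}^{-1}
\mathbf{X}(\beta^{(a)})^T\, \mathbf{W}^{(a)} \left\{ \mathbf{Y} - \mathbf{f}(\beta^{(a)}) \right\}.
\]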
Recall
Previously, IRWLS was obtained by directly solving the GLS estimating
equation derived from a loss function with plug-in weights.
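A minimal sketch of the IRWLS iteration follows, assuming a two-parameter log-linear mean f(x, β) = exp(β0 + β1 x) and g(μ)² = μ, so that the 2×2 system can be solved by hand. The function name, the fixed iteration count, the starting value, and the noise-free test data are all assumptions for illustration.

```python
import math

def irwls_loglinear(xs, ys, beta, iterations=25):
    """IRWLS for f(x, beta) = exp(beta0 + beta1 * x) with g(mu)^2 = mu."""
    b0, b1 = beta
    for _ in range(iterations):
        # Accumulate X^T W X (symmetric 2x2) and X^T W (Y - f) at the current beta.
        a00 = a01 = a11 = r0 = r1 = 0.0
        for x, y in zip(xs, ys):
            mu = math.exp(b0 + b1 * x)
            grad = (mu, mu * x)      # row of X(beta): f_beta(x, beta)^T
            w = 1.0 / mu             # diagonal entry of W(beta): g^{-2}{f(x, beta)}
            a00 += w * grad[0] * grad[0]
            a01 += w * grad[0] * grad[1]
            a11 += w * grad[1] * grad[1]
            r0 += w * grad[0] * (y - mu)
            r1 += w * grad[1] * (y - mu)
        # Solve the 2x2 linear system by Cramer's rule and take the step.
        det = a00 * a11 - a01 * a01
        b0 += (a11 * r0 - a01 * r1) / det
        b1 += (a00 * r1 - a01 * r0) / det
    return b0, b1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.exp(0.5 + 0.3 * x) for x in xs]     # noise-free responses, for checking only
start = (math.log(sum(ys) / len(ys)), 0.0)     # intercept-only starting value
b0_hat, b1_hat = irwls_loglinear(xs, ys, start)
print(b0_hat, b1_hat)
```

A production version would use a matrix library, a convergence criterion instead of a fixed number of iterations, and possibly step-halving. σ² is omitted because it cancels from the update.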
Estimating σ²
Usual GLS estimator is
\[
\hat\sigma^2 = \frac{1}{n - p} \sum_{j=1}^{n}
\frac{\left\{ Y_j - f(x_j, \hat\beta) \right\}^2}{g\left\{ f(x_j, \hat\beta) \right\}^2}.
\]
This is not the MLE derived from the scaled exponential density.
It is unbiased in the special case of the Gaussian likelihood and a
linear model, provided only that the errors are uncorrelated with
constant variance.
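The estimator is a one-line computation once fitted means are available. The sketch below assumes the fitted means are supplied by the caller and uses g(μ) = √μ with made-up numbers; `pearson_scale_estimate` is a hypothetical name.

```python
import math

def pearson_scale_estimate(ys, fitted_means, g, p):
    """sigma^2 estimate: sum of squared weighted residuals divided by n - p."""
    n = len(ys)
    total = sum(((y - mu) / g(mu)) ** 2 for y, mu in zip(ys, fitted_means))
    return total / (n - p)

ys = [2.0, 5.0, 9.0, 4.0]
fitted_means = [3.0, 4.0, 8.0, 6.0]   # hypothetical f(x_j, beta_hat), not from a real fit
sigma2_hat = pearson_scale_estimate(ys, fitted_means, math.sqrt, p=2)
print(sigma2_hat)
```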
Interpreting σ²
Some members of the scaled exponential family have natural scale
parameters; e.g., Gaussian, gamma, inverse Gaussian.
Some do not; e.g., binomial, Poisson.
In the latter case, an estimated σ² that is not close to 1 indicates
lack of fit.
Most commonly, σ² > 1: over-dispersion.
Quasi-Likelihood
Consider the mean-variance specification
\[
E(Y_j \mid x_j) = f(x_j, \beta), \qquad
\operatorname{var}(Y_j \mid x_j) = \sigma^2\, g\{f(x_j, \beta)\}^2.
\]
If g(·) matches one of the scaled exponential distributions, and either
σ² = 1 or the distribution contains a scale parameter, the GLS
estimating equation
\[
\sum_{j=1}^{n} \frac{\{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta)}{g[f(x_j, \beta)]^2} = 0
\]
may be interpreted as the ML estimating equation.
This gives a formal justification to the GLS approach.
What about other cases?
Define the log quasi-likelihood function
\[
\ell_{QL}(\mu; y) = \frac{1}{\sigma^2} \int_{y}^{\mu} \frac{y - u}{g(u)^2}\, du.
\]
The log quasi-likelihood for a sample is the sum
\[
\sum_{j=1}^{n} \ell_{QL}(\mu_j; y_j)
\]
where
µj = E ( Yj | xj ) = f (xj , β) .
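As a worked check of the definition, the integral can be evaluated in closed form for two familiar variance functions. These are standard calculations, shown here only as an illustration: constant variance, g(u) = 1, and Poisson-type variance, g(u)² = u (taking y > 0).

```latex
% g(u) = 1: the log quasi-likelihood is the Gaussian least-squares criterion
\ell_{QL}(\mu; y) = \frac{1}{\sigma^2} \int_{y}^{\mu} (y - u)\, du
                  = -\frac{(y - \mu)^2}{2 \sigma^2}

% g(u)^2 = u: the log quasi-likelihood matches the Poisson log-likelihood
% up to terms not involving \mu, divided by \sigma^2
\ell_{QL}(\mu; y) = \frac{1}{\sigma^2} \int_{y}^{\mu} \frac{y - u}{u}\, du
                  = \frac{1}{\sigma^2} \left\{ y \log\frac{\mu}{y} - (\mu - y) \right\}
```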
Maximizing the log quasi-likelihood leads directly to the same GLS
estimating equation
\[
\sum_{j=1}^{n} \frac{\{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta)}{g[f(x_j, \beta)]^2} = 0.
\]
So even when it is not ML for some scaled exponential family
distribution, GLS may be interpreted as maximum quasi-likelihood.
This may reassure you!
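The step from the log quasi-likelihood to the estimating equation is a direct application of the fundamental theorem of calculus and the chain rule, written out here for completeness:

```latex
\frac{\partial \ell_{QL}(\mu_j; Y_j)}{\partial \mu_j}
  = \frac{Y_j - \mu_j}{\sigma^2\, g(\mu_j)^2},
\qquad
\frac{\partial}{\partial \beta} \sum_{j=1}^{n} \ell_{QL}(\mu_j; Y_j)
  = \frac{1}{\sigma^2} \sum_{j=1}^{n}
    \frac{\{Y_j - f(x_j, \beta)\}\, f_\beta(x_j, \beta)}{g[f(x_j, \beta)]^2}.
```

Setting the derivative to zero and multiplying through by σ² gives the GLS estimating equation.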
Deviance
Consider Y1 , Y2 , . . . , Yn following the scaled exponential density
\[
f(y_j; \xi_j, \sigma) = \exp\left\{ \frac{y_j \xi_j - b(\xi_j)}{\sigma^2} + c(y_j, \sigma) \right\}.
\]
This may be written as a function of
\[
\mu_j = E(Y_j \mid x_j) = b_\xi(\xi_j).
\]
Write L(µ; Y) for the log-likelihood.
Consider two cases:
µj , and hence ξj , unconstrained;
a model µj = f (xj , β).
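In the unconstrained case, the maximized log-likelihood is L(Y; Y).
In the model case, the maximized log-likelihood is L(µ̂; Y).
The deviance is
\[
D(\mathbf{Y}, \hat{\boldsymbol{\mu}}) = 2 \sigma^2 \left\{ L(\mathbf{Y}; \mathbf{Y}) - L(\hat{\boldsymbol{\mu}}; \mathbf{Y}) \right\}.
\]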
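For a concrete case, take the Poisson distribution with σ² = 1, where the deviance reduces to 2 Σ {yj log(yj/µ̂j) − (yj − µ̂j)}. The sketch below implements that formula with the usual convention that y log y = 0 at y = 0; the data and fitted means are invented for illustration.

```python
import math

def poisson_deviance(ys, fitted_means):
    """2 * sum{ y log(y / mu) - (y - mu) }, with y log(y / mu) taken as 0 when y = 0."""
    total = 0.0
    for y, mu in zip(ys, fitted_means):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total

ys = [2.0, 0.0, 5.0]

# A saturated fit (mu_j = y_j wherever y_j > 0) contributes nothing to the deviance.
deviance_example = poisson_deviance(ys, [2.0, 0.5, 5.0])

# Any other set of fitted means gives a strictly positive deviance.
deviance_other = poisson_deviance(ys, [3.0, 1.0, 4.0])
```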
If the model is correctly specified and σ² = 1, then
\[
D(\mathbf{Y}, \hat{\boldsymbol{\mu}}) \sim \chi^2_{n-p}.
\]
This is in general an approximation, but exact for the Gaussian linear
model.
For nested models, the reduction in deviance
\[
D(\mathbf{Y}, \hat{\boldsymbol{\mu}}_{\text{reduced}}) - D(\mathbf{Y}, \hat{\boldsymbol{\mu}}_{\text{full}}) \sim \chi^2_q
\]
where q is the difference in model degrees of freedom, under the null
hypothesis that the reduced model is a correct specification.
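A small sketch of the nested-model comparison follows. To stay within the standard library it only covers q = 1 and q = 2, where the χ² upper-tail probability has a closed form; `chi2_upper_tail`, `deviance_test`, and the two deviance values are hypothetical, and σ² = 1 is assumed.

```python
import math

def chi2_upper_tail(x, q):
    """P(chi-square with q degrees of freedom > x), for q = 1 or q = 2 only."""
    if q == 1:
        # A chi-square(1) variable is the square of a standard normal.
        return math.erfc(math.sqrt(x / 2.0))
    if q == 2:
        # A chi-square(2) variable is exponential with mean 2.
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only covers q = 1 or q = 2")

def deviance_test(deviance_reduced, deviance_full, q):
    """Return the reduction in deviance and its approximate p-value."""
    reduction = deviance_reduced - deviance_full
    return reduction, chi2_upper_tail(reduction, q)

# Hypothetical deviances from a reduced and a full model differing by one parameter.
reduction, p_value = deviance_test(12.3, 7.1, q=1)
print(reduction, p_value)
```

For general q, a statistics library's χ² distribution function would replace `chi2_upper_tail`.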
If the model is correctly specified and σ² ≠ 1, then the scaled
deviance D(Y, µ̂)/σ² is (exactly or approximately) χ² distributed.
If the model is not correctly specified (e.g. σ² > 1 in a model with no
natural scale parameter), little is known about the distribution of
deviance.
The special case of the Gaussian linear model suggests that scaled
deviance, with σ² replaced with σ̂², might be approximately χ²
distributed.