October 13

ST 762
Nonlinear Statistical Models for Univariate and Multivariate Response
Sampling Distributions
Recall the general mean-variance specification
$$E(Y \mid x) = f(x, \beta),$$
$$\mathrm{var}(Y \mid x) = \sigma^2 g(\beta, \theta, x)^2.$$
Closed-form estimators with exactly known sampling distributions exist
only in special cases, principally the linear model $f(x, \beta) = x^T \beta$ with
Gaussian errors and known variances.
$\hat\beta \sim N\{\beta, \sigma^2 (X^T X)^{-1}\}$ for any fixed $n$.
Otherwise, we must use large-sample approximations by letting
$n \to \infty$.
Issues we aim to address
Analogs of the “unbiasedness” and “minimum variance” properties.
Large sample approximations that do not depend on specific
distributions for the errors.
Consequences of mis-specification of the variances.
Tradeoffs between linear and quadratic estimating equations for $\beta$,
and the effect of knowing $\theta$ versus needing to estimate it.
Fundamental concepts
Asymptotic distribution.
Asymptotic relative efficiency.
Disclaimer
A casual treatment.
Review of Large Sample Tools
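Definition (almost sure convergence): $Y_n \xrightarrow{a.s.} Y$ iff
$$P\left(\lim_{n \to \infty} Y_n = Y\right) = 1.$$
Definition (convergence in probability): $Y_n \xrightarrow{p} Y$ iff, for all $\delta > 0$,
$$\lim_{n \to \infty} P(|Y_n - Y| < \delta) = 1.$$
$Y_n \xrightarrow{a.s.} Y \Rightarrow Y_n \xrightarrow{p} Y$, but not conversely.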
If $h(\cdot)$ is continuous, then
$$Y_n \xrightarrow{a.s.} Y \Rightarrow h(Y_n) \xrightarrow{a.s.} h(Y),$$
$$Y_n \xrightarrow{p} Y \Rightarrow h(Y_n) \xrightarrow{p} h(Y).$$
Terminology
If $\hat\eta_n$ is an estimator from a sample of size $n$ and $\eta_0$ is the true value
of the parameter, then we have two definitions of consistency:
Strong consistency: $\hat\eta_n \xrightarrow{a.s.} \eta_0$;
Weak consistency: $\hat\eta_n \xrightarrow{p} \eta_0$.
Interpretation
Weak consistency: if the sample size $n$ is sufficiently large, the
probability is small that $\hat\eta_n$ assumes a value outside an arbitrarily
small neighborhood of $\eta_0$; i.e., for $n$ large enough, the probability that
$\hat\eta_n$ will wander away is small.
Order in probability – Op
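$Y_n$ is at most of order $n^k$ in probability if, for every $\epsilon > 0$, there exist
constants $M_\epsilon$ and $n_\epsilon$ such that
$$P\left(n^{-k} \|Y_n\| < M_\epsilon\right) > 1 - \epsilon$$
for all $n > n_\epsilon$. Here $\|\cdot\|$ denotes some appropriate norm to measure
the magnitude of $Y_n$.
Notation: $Y_n = O_p(n^k)$; “Big” $O_p$.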
Remarks on Op
If $k = 0$, $Y_n = O_p(1)$.
Practically this says that, as $n$ gets large, $Y_n$ does not become
negligible, nor does it “blow up.” Instead, it is “nicely behaved.”
If $k = -1/2$, $Y_n = O_p(n^{-1/2})$.
As $n \to \infty$, $n^{-1/2} \to 0$, so $Y_n$ itself “approaches” (converges in
probability to) zero at least at the same “rate” as $n^{-1/2}$.
Order in probability – op
$Y_n$ is of smaller order than $n^k$ in probability if
$$n^{-k} Y_n \xrightarrow{p} 0$$
as $n \to \infty$.
Notation: $Y_n = o_p(n^k)$; “Little” $o_p$.
Remarks on op
If $k = -1/2$, $Y_n = o_p(n^{-1/2})$.
As $n \to \infty$, $n^{-1/2} \to 0$, so $Y_n$ itself “approaches” (converges in
probability to) zero at a faster “rate” than $n^{-1/2}$.
Properties
If $X_n = o_p(n^{k_1})$ and $Y_n = o_p(n^{k_2})$, then $X_n Y_n = o_p(n^{k_1 + k_2})$.
If $X_n = o_p(n^{k_1})$ and $Y_n = O_p(n^{k_2})$, then $X_n Y_n = o_p(n^{k_1 + k_2})$.
Note that $Y_n = o_p(n^k) \Rightarrow Y_n = O_p(n^k)$, so the second property implies
the first.
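A standard worked example of this notation, added here as an illustration rather than taken from the slides: the sample mean $\bar{Y}_n$ of i.i.d. observations with mean $\mu$ and finite variance $\sigma^2$ (both assumed). Chebyshev's inequality gives the $O_p$ rate, and the product rule gives the rate of the squared deviation.

```latex
% Chebyshev: var(\bar{Y}_n) = \sigma^2 / n, so for any M > 0
P\left( n^{1/2} \, |\bar{Y}_n - \mu| \ge M \right) \le \frac{\sigma^2}{M^2} .
% Choosing M_\epsilon = \sigma / \sqrt{\epsilon} makes the bound equal to \epsilon, hence
\bar{Y}_n - \mu = O_p(n^{-1/2}) , \qquad \bar{Y}_n - \mu = o_p(1) .
% Multiplying the O_p(n^{-1/2}) and o_p(1) statements (second property):
(\bar{Y}_n - \mu)^2 = o_p(n^{-1/2}) .
```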
Convergence in distribution
Definition (convergence in distribution, or in law): $Y_n \xrightarrow{L} Y$ iff, for
each continuity point $y$ of $F_Y(\cdot)$,
$$\lim_{n \to \infty} F_{Y_n}(y) = F_Y(y).$$
Note: $Y_n \xrightarrow{p} Y \Rightarrow Y_n \xrightarrow{L} Y$ but, in general, not conversely.
Special (and trivial) case: if $Y_n \xrightarrow{L} y$ where $y$ is a constant, then
$Y_n \xrightarrow{p} y$.
Practical interpretation
If we are interested in probability and distributional statements about
$Y_n$, we may approximate these with statements about $Y$.
Asymptotic normality
If we can find sequences $\{a_n\}$ and $\{c_n > 0\}$ such that
$$c_n (Y_n - a_n) \xrightarrow{L} N(0, 1),$$
we say that $Y_n$ is asymptotically normal with asymptotic mean $a_n$ and
asymptotic variance $c_n^{-2}$.
We write
$$Y_n \overset{\cdot}{\sim} N\left(a_n, \frac{1}{c_n^2}\right).$$
Central Limit Theorem
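$Z_j$ are independent with $E(Z_j) = \mu_j$, $\mathrm{var}(Z_j) = \Sigma_j$.
The variance matrices satisfy
$$\lim_{n \to \infty} \frac{1}{n} (\Sigma_1 + \Sigma_2 + \cdots + \Sigma_n) = \Sigma.$$
The tails of the distributions of $Z_j$ satisfy the Lindeberg condition:
for all $\epsilon > 0$,
$$\frac{1}{n} \sum_{j=1}^{n} \int_{\|z - \mu_j\| \ge \epsilon \sqrt{n}} \|z - \mu_j\|^2 \, dF_j(z) \to 0 \text{ as } n \to \infty.$$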
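Then
$$\frac{1}{\sqrt{n}} \sum_{j=1}^{n} (Z_j - \mu_j) \xrightarrow{L} N(0, \Sigma).$$
In the language of asymptotic normality: if
$$\bar{Z}_n = \frac{1}{n} \sum_{j=1}^{n} Z_j$$
and
$$\bar{\mu}_n = \frac{1}{n} \sum_{j=1}^{n} \mu_j,$$
then
$$\bar{Z}_n \overset{\cdot}{\sim} N\left(\bar{\mu}_n, \frac{1}{n} \Sigma\right).$$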
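A small simulation can make this statement concrete. The sketch below is only an illustration under assumed settings: the $Z_j$ are scalar, independent, and uniform on $(-h_j, h_j)$ with half-widths cycling through 1, 2, 3 (a hypothetical choice), so they are not identically distributed. If the normal approximation is adequate, the standardized sum should fall inside $\pm 1.96$ roughly 95% of the time.

```python
import math
import random

def standardized_sum(n, rng):
    """Sum of (Z_j - mu_j) divided by the square root of the summed variances."""
    total = 0.0
    var_sum = 0.0
    for j in range(1, n + 1):
        half_width = 1.0 + (j % 3)                     # h_j in {1, 2, 3}
        total += rng.uniform(-half_width, half_width)  # mu_j = 0
        var_sum += half_width ** 2 / 3.0               # var of Uniform(-h, h) is h^2 / 3
    return total / math.sqrt(var_sum)

def coverage(n=200, reps=4000, seed=762):
    """Fraction of replications with |standardized sum| < 1.96."""
    rng = random.Random(seed)
    hits = sum(abs(standardized_sum(n, rng)) < 1.96 for _ in range(reps))
    return hits / reps

cov = coverage()
print("empirical coverage of +/- 1.96:", cov)
```

Dividing by the square root of the summed variances is the scalar version of standardizing $\bar{Z}_n - \bar{\mu}_n$ by its own variance.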
More general CLT
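If we also write
$$\bar{\Sigma}_n = \frac{1}{n} \sum_{j=1}^{n} \Sigma_j,$$
the variance matrix condition can be written
$$\lim_{n \to \infty} \bar{\Sigma}_n = \Sigma.$$
A more general CLT does not require this convergence:
$$\bar{Z}_n \overset{\cdot}{\sim} N\left(\bar{\mu}_n, \frac{1}{n} \bar{\Sigma}_n\right).$$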
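The Lindeberg condition becomes: for all $\epsilon > 0$,
$$\frac{1}{n \lambda_n} \sum_{j=1}^{n} E\left[ 1_{\{\|Z_j - \mu_j\| \ge \epsilon \sqrt{n \lambda_n}\}} \, \|Z_j - \mu_j\|^2 \right] \to 0 \text{ as } n \to \infty,$$
where $\lambda_n$ is the smallest eigenvalue of $\bar{\Sigma}_n$.
Under only this modified Lindeberg condition,
$$\bar{Z}_n \overset{\cdot}{\sim} N\left(\bar{\mu}_n, \frac{1}{n} \bar{\Sigma}_n\right).$$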
In terms of convergence in distribution:
$$C_n (\bar{Z}_n - \bar{\mu}_n) \xrightarrow{L} N(0, I),$$
where $C_n$ is any inverse square root of $\frac{1}{n} \bar{\Sigma}_n$:
$$C_n \left(\frac{1}{n} \bar{\Sigma}_n\right) C_n^T = I.$$
Slutsky’s Theorem
If $Y_n \xrightarrow{L} Y$ and $V_n \xrightarrow{p} c$, a constant, then:
$$Y_n + V_n \xrightarrow{L} Y + c,$$
$$Y_n V_n \xrightarrow{L} cY,$$
and, if $c \ne 0$,
$$Y_n / V_n \xrightarrow{L} Y / c.$$
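A standard application of the ratio form, added as an illustration rather than taken from the slides: the studentized mean of i.i.d. observations with mean $\mu$, variance $\sigma^2 > 0$, and sample standard deviation $S_n$, assuming enough moments that $S_n^2 \xrightarrow{p} \sigma^2$.

```latex
% CLT for the numerator, consistency of S_n for the denominator:
\frac{\sqrt{n}\,(\bar{Y}_n - \mu)}{\sigma} \xrightarrow{L} N(0, 1) ,
\qquad
\frac{S_n}{\sigma} \xrightarrow{p} 1 .
% Slutsky's theorem, ratio form with c = 1:
\frac{\sqrt{n}\,(\bar{Y}_n - \mu)}{S_n}
  = \frac{\sqrt{n}\,(\bar{Y}_n - \mu)/\sigma}{S_n/\sigma}
  \xrightarrow{L} N(0, 1) .
```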
Multivariate version
If $Y_n \xrightarrow{L} Y$ and $V_n \xrightarrow{p} C$, a constant matrix, then:
$$Y_n + V_n \xrightarrow{L} Y + C,$$
$$V_n Y_n \xrightarrow{L} CY,$$
and, if $C$ is nonsingular,
$$V_n^{-1} Y_n \xrightarrow{L} C^{-1} Y.$$
Weak Law of Large Numbers
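$\{Z_j\}$ are uncorrelated and $\{a_j\}$ are constants.
If
$$\mathrm{var}\left(\frac{1}{n} \sum_{j=1}^{n} a_j Z_j\right) = \frac{1}{n^2} \sum_{j=1}^{n} a_j^2 \, \mathrm{var}(Z_j) \to 0 \text{ as } n \to \infty,$$
then
$$\frac{1}{n} \sum_{j=1}^{n} a_j Z_j - \frac{1}{n} \sum_{j=1}^{n} a_j E(Z_j) \xrightarrow{p} 0.$$
Furthermore, if $n^{-1} \sum_{j=1}^{n} a_j E(Z_j) \to c$, then $n^{-1} \sum_{j=1}^{n} a_j Z_j \xrightarrow{p} c$.
The condition is satisfied if $n^{-1} \sum_{j=1}^{n} a_j^2 \, \mathrm{var}(Z_j)$ converges to a finite limit, which is a
similar requirement to the CLT.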
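The weak law can be checked by simulation. The sketch below is an illustration under assumed settings, not part of the slides: the $Z_j$ are independent exponentials with mean 1, and the constants $a_j$ alternate between 3 and 1, so that $n^{-1} \sum a_j E(Z_j) = 2$ for even $n$. It estimates $P(|n^{-1} \sum a_j Z_j - 2| < \delta)$ for increasing $n$.

```python
import random

def weighted_mean(n, rng):
    """n^{-1} * sum_j a_j Z_j with a_j alternating 3, 1 and Z_j ~ Exponential(mean 1)."""
    total = 0.0
    for j in range(n):
        a_j = 1.0 if j % 2 else 3.0
        total += a_j * rng.expovariate(1.0)
    return total / n

def prob_within(n, delta=0.2, reps=1000, seed=1):
    """Monte Carlo estimate of P(|weighted mean - 2| < delta)."""
    rng = random.Random(seed)
    hits = sum(abs(weighted_mean(n, rng) - 2.0) < delta for _ in range(reps))
    return hits / reps

probs = {n: prob_within(n) for n in (10, 100, 1000)}
for n, p in probs.items():
    print(n, p)
```

Here $n^{-2} \sum a_j^2 \, \mathrm{var}(Z_j) = 5/n \to 0$, so the condition of the theorem holds and the estimated probabilities should increase toward 1.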
How do we use all these results?
Suppose that some estimator $\hat\eta$ satisfies
$$\sqrt{n} (\hat\eta - \eta_0) = A_n^{-1} C_n + o_p(1),$$
where:
$A_n$ satisfies the WLLN, and $A_n \xrightarrow{p} A$, a constant matrix;
$C_n$ satisfies the CLT, and is asymptotically normal with zero
mean.
Then $\hat\eta \overset{\cdot}{\sim} N(\eta_0, \Sigma_n)$ for some asymptotic variance matrix $\Sigma_n$,
typically of the form $n^{-1} \Sigma$.
Comparing estimators
Suppose that $\hat\eta^{(1)}$ and $\hat\eta^{(2)}$ are both asymptotically normal with
asymptotic mean $\eta_0$, but with different asymptotic variance matrices
$n^{-1} \Sigma^{(1)}$ and $n^{-1} \Sigma^{(2)}$.
Which should we prefer?
In the univariate case, the one with the smaller asymptotic variance.
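In the multivariate case, consider estimating the linear combination
$\lambda^T \eta$:
$$\mathrm{var}\left(\lambda^T \hat\eta^{(1)}\right) = \frac{1}{n} \lambda^T \Sigma^{(1)} \lambda,$$
$$\mathrm{var}\left(\lambda^T \hat\eta^{(2)}\right) = \frac{1}{n} \lambda^T \Sigma^{(2)} \lambda.$$
So if $\lambda^T \Sigma^{(1)} \lambda \le \lambda^T \Sigma^{(2)} \lambda$ for every $\lambda$, we prefer $\hat\eta^{(1)}$.
That is, if $\Sigma^{(2)} - \Sigma^{(1)}$ is non-negative definite.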
Asymptotic relative efficiency
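To measure how much better $\hat\eta^{(1)}$ is than $\hat\eta^{(2)}$, use the generalized
asymptotic relative efficiency
$$\left( \frac{|\Sigma^{(1)}|}{|\Sigma^{(2)}|} \right)^{1/k},$$
where $k$ is the dimension of $\eta$.
Note: for a given $\lambda$, the ARE is
$$\frac{\lambda^T \Sigma^{(1)} \lambda}{\lambda^T \Sigma^{(2)} \lambda}.$$
As $\lambda$ varies, this ratio varies between the smallest and largest
eigenvalues of $\Sigma^{(1)} \{\Sigma^{(2)}\}^{-1}$.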
The eigenvalues of $\Sigma^{(1)} \{\Sigma^{(2)}\}^{-1}$ could be called the canonical AREs.
The product of the canonical AREs is
$$\left| \Sigma^{(1)} \{\Sigma^{(2)}\}^{-1} \right| = \frac{|\Sigma^{(1)}|}{|\Sigma^{(2)}|} = (\text{generalized ARE})^k,$$
so the generalized ARE is the geometric mean of the canonical AREs.
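The relationship between canonical and generalized AREs can be checked numerically for $k = 2$. The sketch below is illustrative only: the two variance matrices are hypothetical, the helper names are not from the slides, and the eigenvalues are obtained from the trace and determinant of a 2x2 matrix.

```python
import math

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]

def matmul2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def canonical_ares(sigma1, sigma2):
    """Eigenvalues of sigma1 * inverse(sigma2) for 2x2 symmetric positive definite inputs."""
    m = matmul2(sigma1, inv2(sigma2))
    trace, det = m[0][0] + m[1][1], det2(m)
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (trace - root) / 2.0, (trace + root) / 2.0

sigma1 = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical Sigma^(1)
sigma2 = [[4.0, 1.0], [1.0, 3.0]]   # hypothetical Sigma^(2); Sigma^(2) - Sigma^(1) = diag(2, 1)

low, high = canonical_ares(sigma1, sigma2)
generalized = (det2(sigma1) / det2(sigma2)) ** (1.0 / 2.0)   # k = 2
print("canonical AREs:", low, high)
print("generalized ARE:", generalized, "geometric mean:", math.sqrt(low * high))
```

Because $\Sigma^{(2)} - \Sigma^{(1)}$ is positive definite in this example, both canonical AREs are below 1.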
M-estimators
$\{Z_j\}$ are independent, each with a distribution depending on a
parameter $\eta$.
An M-estimator $\hat\eta$ satisfies either:
$\hat\eta$ minimizes a scalar criterion
$$\sum_{j=1}^{n} \rho_j(Z_j, \eta);$$
$\hat\eta$ is the solution of estimating equations
$$\sum_{j=1}^{n} \Psi_j(Z_j, \eta) = 0.$$
If the $\rho_j(\cdot)$ are differentiable, the minimum of the scalar criterion can
be found by solving the corresponding gradient equation.
Not all estimating equations can be interpreted as gradient equations,
so the estimating equation approach is somewhat more general.
The MLE minimizes a scalar criterion of this form, with
$\rho_j(\cdot) = -2 \times$ log likelihood of $Z_j$;
M-estimators may be thought of as generalized MLEs.
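As a concrete instance of the estimating-equation form, the sketch below solves $\sum_j \psi(Z_j - \eta) = 0$ for a location parameter. The choice of Huber's clipped $\psi$ (with the scale fixed at 1 and cutoff `c = 1.5`) is an assumption made for illustration; the slides do not single out any particular $\Psi_j$. Since the sum is non-increasing in $\eta$, bisection is enough.

```python
def huber_psi(r, c=1.5):
    """Huber's psi: the residual r clipped to the interval [-c, c]."""
    return max(-c, min(c, r))

def huber_location(z, c=1.5, tol=1e-10):
    """Solve sum_j psi(z_j - eta) = 0 by bisection on [min(z), max(z)]."""
    lo, hi = min(z), max(z)

    def g(eta):
        return sum(huber_psi(zj - eta, c) for zj in z)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:   # g is non-increasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [1.0, 2.0, 3.0, 4.0, 100.0]          # hypothetical sample with one outlier
eta_hat = huber_location(data)
print("Huber estimate:", eta_hat, "sample mean:", sum(data) / len(data))
```

With $\psi(r) = r$ the same equation returns the sample mean, which is pulled toward the outlier; the clipped $\psi$ limits that influence.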
Consistency
Generally, $\hat\eta$ is a consistent estimator of the true parameter value $\eta_0$ if
$$E_{\eta_0} \{\Psi_j(Z_j, \eta_0)\} = 0$$
and
$$E_{\eta_0} \{\Psi_j(Z_j, \eta^*)\} \ne 0$$
for any $\eta^* \ne \eta_0$ (unique $\eta_0$).
Asymptotic normality
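Assuming $\hat\eta \xrightarrow{p} \eta_0$, we expand
$$\sum_{j=1}^{n} \Psi_j(Z_j, \hat\eta) = 0$$
around $\eta_0$:
$$0 = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \Psi_j(Z_j, \hat\eta) \approx \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \Psi_j(Z_j, \eta_0) + \left( \frac{1}{n} \sum_{j=1}^{n} \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right) \sqrt{n} (\hat\eta - \eta_0).$$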
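Rearrange as:
$$\sqrt{n} (\hat\eta - \eta_0) = -A_n^{-1} C_n + o_p(1),$$
where
$$A_n = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta}$$
and
$$C_n = \frac{1}{\sqrt{n}} \sum_{j=1}^{n} \Psi_j(Z_j, \eta_0).$$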
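First we can show that $A_n$ satisfies the WLLN:
$$A_n - \frac{1}{n} \sum_{j=1}^{n} E\left\{ \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right\} \xrightarrow{p} 0.$$
If in addition
$$\frac{1}{n} \sum_{j=1}^{n} E\left\{ \frac{\partial \Psi_j(Z_j, \eta_0)}{\partial \eta} \right\} \to A$$
for some $A$, assumed to be nonsingular, then
$$A_n \xrightarrow{p} A.$$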
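Next we show that $C_n$ satisfies the CLT: $C_n \overset{\cdot}{\sim} N(0, B)$, where
$$B = \lim_{n \to \infty} B_n$$
(assumed to exist), and
$$B_n = \frac{1}{n} \sum_{j=1}^{n} E\left\{ \Psi_j(Z_j, \eta_0) \, \Psi_j(Z_j, \eta_0)^T \right\}.$$
Finally Slutsky’s theorem implies that
$$\sqrt{n} (\hat\eta - \eta_0) \xrightarrow{L} N\left\{ 0, A^{-1} B (A^{-1})^T \right\}.$$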
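Equivalently,
$$\hat\eta \overset{\cdot}{\sim} N\left\{ \eta_0, \frac{1}{n} A^{-1} B (A^{-1})^T \right\}.$$
We can use the alternative version of the CLT to show that
$$\hat\eta \overset{\cdot}{\sim} N\left\{ \eta_0, \frac{1}{n} A_n^{-1} B_n (A_n^{-1})^T \right\}$$
without requiring the existence of either $\lim_{n \to \infty} A_n$ or $\lim_{n \to \infty} B_n$.
To use this asymptotic distribution as a small sample approximation,
we need an approximation to $\frac{1}{n} A_n^{-1} B_n (A_n^{-1})^T$.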
For $A_n$, plug in $\hat\eta$ for $\eta_0$:
$$\hat{A}_n = \frac{1}{n} \sum_{j=1}^{n} \frac{\partial \Psi_j(Z_j, \hat\eta)}{\partial \eta}.$$
For $B_n$, two strategies:
Plug in $\hat\eta$ for $\eta_0$, in $\Psi_j(\cdot)$ and in the expectation; a model-based
variance estimator.
Replace the expectation with its sample analog:
$$\hat{B}_n = \frac{1}{n} \sum_{j=1}^{n} \Psi_j(Z_j, \hat\eta) \, \Psi_j(Z_j, \hat\eta)^T;$$
a sandwich variance estimator.
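A one-parameter sketch of the two strategies, under assumptions made purely for illustration: a straight line through the origin fitted by OLS, so that $\Psi_j = x_j (y_j - \beta x_j)$ and $\partial \Psi_j / \partial \beta = -x_j^2$. The data and function names are hypothetical, and the $n - 1$ divisor in the model-based version is one common choice rather than something the slides specify.

```python
def ols_through_origin(x, y):
    """Solve sum_j x_j (y_j - beta x_j) = 0 for beta."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

def variance_estimates(x, y):
    """Return (beta_hat, sandwich variance, model-based variance)."""
    n = len(x)
    beta = ols_through_origin(x, y)
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    a_hat = -sum(xi * xi for xi in x) / n                         # (1/n) sum dPsi_j/dbeta
    b_hat = sum((xi * ri) ** 2 for xi, ri in zip(x, resid)) / n   # (1/n) sum Psi_j^2
    sandwich = b_hat / (a_hat ** 2) / n                           # (1/n) A^{-1} B A^{-1}
    sigma2_hat = sum(ri * ri for ri in resid) / (n - 1)           # assumes constant variance
    model_based = sigma2_hat / sum(xi * xi for xi in x)
    return beta, sandwich, model_based

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # hypothetical data
y = [1.2, 1.9, 3.3, 3.6, 5.8, 5.1]
beta_hat, sandwich_var, model_var = variance_estimates(x, y)
print(beta_hat, sandwich_var, model_var)
```

The model-based version trusts the assumed constant variance; the sandwich version uses the squared residuals directly and so does not rely on it.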
WLS as an M-estimator
The usual nonlinear mean model
$$E(Y_j \mid x_j) = f(x_j, \beta),$$
but with $x_1, x_2, \ldots, x_n$ treated as fixed.
Simple variance structure
$$\mathrm{var}(Y_j \mid x_j) = \frac{\sigma^2}{w_j}.$$
Suppose we fit using working variances
$$\mathrm{var}(Y_j \mid x_j) = \frac{\sigma^2}{u_j}.$$
That is, we estimate $\beta$ by solving
$$\sum_{j=1}^{n} u_j \{Y_j - f(x_j, \beta)\} f_\beta(x_j, \beta) = 0$$
using working weights $u_j$.
Note that
$$E_\beta\{\Psi_j(Y_j, \beta) \mid x_j\} = 0$$
because the mean is specified correctly, regardless of the
mis-specification of the variances.
So $\hat\beta_u$ is still consistent.
This is in M-estimator form, with $Z_j = Y_j$, $\eta = \beta$, and
$$\Psi_j(Y_j, \beta) = u_j \{Y_j - f(x_j, \beta)\} f_\beta(x_j, \beta).$$
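To see the weighted estimating equation in action, the sketch below solves it for a one-parameter mean function. Everything specific here is an assumption for illustration: the model $f(x, \beta) = e^{-\beta x}$, the working weights, the starting value, and the use of Gauss-Newton iterations (the slides do not say how the equation is solved). The responses are generated without noise, so the root is the generating value 0.5 whatever working weights are used.

```python
import math

def f(x, beta):
    return math.exp(-beta * x)            # hypothetical mean function

def f_beta(x, beta):
    return -x * math.exp(-beta * x)       # derivative of f with respect to beta

def estimating_function(beta, x, y, u):
    """sum_j u_j {y_j - f(x_j, beta)} f_beta(x_j, beta)."""
    return sum(uj * (yj - f(xj, beta)) * f_beta(xj, beta)
               for xj, yj, uj in zip(x, y, u))

def solve_wls(x, y, u, beta_start=0.3, max_iter=100, tol=1e-12):
    """Gauss-Newton: beta <- beta + {sum u f_beta^2}^{-1} * estimating function."""
    beta = beta_start
    for _ in range(max_iter):
        step = (estimating_function(beta, x, y, u)
                / sum(uj * f_beta(xj, beta) ** 2 for xj, uj in zip(x, u)))
        beta += step
        if abs(step) < tol:
            break
    return beta

x = [0.5, 1.0, 2.0, 3.0, 4.0]
y = [f(xj, 0.5) for xj in x]              # noise-free responses, generating beta = 0.5
u = [1.0, 1.0, 2.0, 2.0, 4.0]             # hypothetical working weights
beta_u = solve_wls(x, y, u)
print(beta_u, estimating_function(beta_u, x, y, u))
```

With noisy data the root would depend on the weights, but it would remain consistent for the same reason given above: the estimating function has mean zero at the true $\beta$.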
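Following the same argument as before,
$$n^{1/2} (\hat\beta - \beta_0) \approx -A_n^{-1} C_n,$$
where
$$C_n = \sigma_0 n^{-1/2} \sum_{j=1}^{n} u_j w_j^{-1/2} \epsilon_j f_\beta(x_j, \beta_0),$$
$$A_n = A_{n1} + A_{n2},$$
$$A_{n1} = \sigma_0 n^{-1} \sum_{j=1}^{n} u_j w_j^{-1/2} \epsilon_j f_{\beta\beta}(x_j, \beta_0),$$
$$A_{n2} = -n^{-1} \sum_{j=1}^{n} u_j f_\beta(x_j, \beta_0) f_\beta(x_j, \beta_0)^T,$$
and $\epsilon_j = w_j^{1/2} \{Y_j - f(x_j, \beta_0)\} / \sigma_0$ is the standardized error, with mean 0 and variance 1.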
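Then $\hat\beta_u$ is asymptotically normal:
$$\hat\beta_u \overset{\cdot}{\sim} N\left\{ \beta_0, \sigma_0^2 (X^T U X)^{-1} X^T U W^{-1} U X (X^T U X)^{-1} \right\},$$
where $X$ is the gradient matrix
$$X = X(\beta) = \begin{pmatrix} f_\beta(x_1, \beta)^T \\ f_\beta(x_2, \beta)^T \\ \vdots \\ f_\beta(x_n, \beta)^T \end{pmatrix}$$
and $U = \mathrm{diag}(u_1, u_2, \ldots, u_n)$, $W = \mathrm{diag}(w_1, w_2, \ldots, w_n)$.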
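For a single parameter ($k = 1$) the matrix products in this variance reduce to sums, which makes the effect of the working weights easy to compute. The sketch below is illustrative only: the function name and numbers are hypothetical, and `grad` stands for the column of gradients $f_\beta(x_j, \beta_0)$.

```python
def wls_asymptotic_variance(grad, u, w, sigma2=1.0):
    """k = 1 case of sigma0^2 (X'UX)^{-1} X'U W^{-1} U X (X'UX)^{-1}."""
    bread = sum(uj * g * g for uj, g in zip(u, grad))                    # X'UX
    meat = sum(uj * uj * g * g / wj for uj, wj, g in zip(u, w, grad))    # X'U W^{-1} U X
    return sigma2 * meat / bread ** 2

grad = [1.0, 1.0, 2.0, 3.0]      # hypothetical gradient values
w = [4.0, 2.0, 1.0, 0.5]         # hypothetical true weights
ones = [1.0] * len(w)

var_ols = wls_asymptotic_variance(grad, ones, w)   # working weights U = I
var_opt = wls_asymptotic_variance(grad, w, w)      # working weights U = W
are_ols = var_opt / var_ols                        # ARE of OLS relative to correctly weighted WLS
print(var_ols, var_opt, are_ols)
```

Setting `u = w` collapses the expression to $\sigma_0^2 / \sum_j w_j g_j^2$, the scalar form of $\sigma_0^2 (X^T W X)^{-1}$.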
Special cases
If the working variances are equal to the true variances, the result
simplifies to
$$\hat\beta_w \overset{\cdot}{\sim} N\left\{ \beta_0, \sigma_0^2 (X^T W X)^{-1} \right\}.$$
If the true variances are constant, i.e., $W = I$, the result further
simplifies to the familiar OLS form
$$\hat\beta_w \overset{\cdot}{\sim} N\left\{ \beta_0, \sigma_0^2 (X^T X)^{-1} \right\}.$$
Comparisons via asymptotic relative efficiency
Fit OLS, i.e., $U = I$, when the true weight matrix is $W$.
The asymptotic variance matrix of $\hat\beta_{OLS}$ is
$$\sigma_0^2 (X^T X)^{-1} X^T W^{-1} X (X^T X)^{-1}.$$
The asymptotic variance matrix of $\hat\beta_w$ is
$$\sigma_0^2 (X^T W X)^{-1}.$$
We can write
$$(X^T X)^{-1} X^T W^{-1} X (X^T X)^{-1} - (X^T W X)^{-1} = Q^T (I - P) Q,$$
where
$$P = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}$$
and
$$Q = W^{-1/2} X (X^T X)^{-1}.$$
But $P$ and $I - P$ are symmetric and idempotent, hence nonnegative
definite.
So the variance of $\hat\beta_{OLS}$ exceeds that of $\hat\beta_w$ by a nonnegative definite
matrix.
More generally, suppose we fit WLS with a working weight matrix $U$,
when the true weight matrix is $W$.
The asymptotic variance matrix of $\hat\beta_u$ is
$$\sigma_0^2 (X^T U X)^{-1} X^T U W^{-1} U X (X^T U X)^{-1}.$$
The asymptotic variance matrix of $\hat\beta_w$ is
$$\sigma_0^2 (X^T W X)^{-1}.$$
For general $U$, a similar argument shows that the variance of $\hat\beta_u$
exceeds that of $\hat\beta_w$ by a nonnegative definite matrix.
So the correctly weighted $\hat\beta_w$ is asymptotically optimal.
All canonical asymptotic relative efficiencies of $\hat\beta_{OLS}$ or $\hat\beta_u$ with
respect to $\hat\beta_w$ are at most 1.
Best case: $PQ = Q \Rightarrow$ OLS is fully efficient.
This occurs when each column of $WX$ is in the column space of $X$;
e.g., each column of $X$ is an eigenvector of $W$.
Worst case: generalized ARE is
$$\left\{ \frac{4 w_1 w_n}{(w_1 + w_n)^2} \times \frac{4 w_2 w_{n-1}}{(w_2 + w_{n-1})^2} \times \cdots \times \frac{4 w_k w_{n-k+1}}{(w_k + w_{n-k+1})^2} \right\}^{1/k},$$
where $w_1 \ge w_2 \ge \cdots \ge w_n$ are the ordered weights (and $n \ge 2k$).
This occurs, e.g., when the $j$th column of $X$ is the sum of the $j$th and
$(n - j + 1)$th eigenvectors of $W$.
If $w_1 < 2 w_n$ then
$$\frac{4 w_1 w_n}{(w_1 + w_n)^2} > \frac{8}{9} \approx 0.89.$$
Also $w_2 \le w_1 < 2 w_n \le 2 w_{n-1}$, so
$$\frac{4 w_2 w_{n-1}}{(w_2 + w_{n-1})^2} > \frac{8}{9},$$
and so on.
So generalized ARE $> \frac{8}{9}$.
Conclusion: if optimal weights vary by less than 2 : 1, OLS is at least
about 89% efficient.
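The worst-case bound is simple to evaluate for a given set of weights. The sketch below is a direct transcription of the product of factors $4 w_j w_{n-j+1} / (w_j + w_{n-j+1})^2$; the function name and the example weights are hypothetical.

```python
def worst_case_generalized_are(weights, k):
    """Geometric mean of 4 w_j w_{n-j+1} / (w_j + w_{n-j+1})^2 for j = 1, ..., k."""
    w = sorted(weights, reverse=True)      # w_1 >= w_2 >= ... >= w_n
    n = len(w)
    if n < 2 * k:
        raise ValueError("the bound requires n >= 2k")
    product = 1.0
    for j in range(k):
        largest, smallest = w[j], w[n - 1 - j]
        product *= 4.0 * largest * smallest / (largest + smallest) ** 2
    return product ** (1.0 / k)

mild = worst_case_generalized_are([1.9, 1.5, 1.2, 1.0], k=2)    # weights within 2 : 1
wide = worst_case_generalized_are([10.0, 5.0, 2.0, 1.0], k=2)   # weights spread 10 : 1
print(mild, wide)
```

The first set of weights stays within a 2 : 1 range and so gives a value above 8/9; the second shows how the bound drops as the weights spread out.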
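In the worst case, the canonical AREs are the factors
$$\frac{4 w_j w_{n-j+1}}{(w_j + w_{n-j+1})^2}, \quad j = 1, 2, \ldots, k.$$
So if we are interested in estimating the linear combination $\lambda^T \beta$,
$$\frac{4 w_1 w_n}{(w_1 + w_n)^2} \le \frac{\mathrm{var}(\lambda^T \hat\beta_w)}{\mathrm{var}(\lambda^T \hat\beta_{OLS})} \le \frac{4 w_k w_{n-k+1}}{(w_k + w_{n-k+1})^2}.$$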
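So again, if $w_1 < 2 w_n$, then
$$\frac{\mathrm{var}(\lambda^T \hat\beta_w)}{\mathrm{var}(\lambda^T \hat\beta_{OLS})} > \frac{8}{9}$$
and, for all $\lambda$, the ARE of $\hat\beta_{OLS}$ for estimating $\lambda^T \beta$ is greater than $8/9 \approx 0.89$.
That is, there are no linear combinations $\lambda^T \beta$ for which $\hat\beta_{OLS}$
performs especially badly.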
Note: the above inequalities hold for a general, non-diagonal $W$ if
“optimal weight” is replaced by “eigenvalue of $W$”.
They also generalize to arbitrary $W$ and $U$ if $W$ is replaced by $U^{-1} W$.