Estimation of Missing Values in Linear Model

Jongtae Park*
This paper compares, for the case of missing observations, the methods of
generalized least squares estimation and maximum likelihood estimation
under general error structures. It shows that the two methods are
equivalent and provides an example for the randomized block design.
< Contents >
Ⅰ. Introduction
Ⅱ. Generalized Least Squares Estimation
Ⅲ. Maximum Likelihood Estimation
Ⅳ. Example: Randomized Block Design
* Department of Computer Science and Statistics, Pyeongtaek University
Ⅰ. Introduction
One common method for analyzing data in experimental design when
observations are missing was devised by Yates(1933), who developed
his procedure based upon R. A. Fisher's suggestion. Considering a
linear model with independent, equivariate errors, Yates substituted
algebraic values for the missing data and then minimized the error
sum of squares with respect to both the unknown parameters and the
algebraic values. Yates showed that this procedure yielded the correct
error sum of squares and a positively biased hypothesis sum of
squares.
Others have elaborated on this technique. Chakrabarti(1962) gave a
formal proof of Fisher's rule that produced a way to simplify the
calculations of the auxiliary values to be used in place of the missing
observations. Kshirsagar(1971) proved that the hypothesis sum of
squares based on these values was biased, and developed an easy way
to compute that bias. Sclove(1972) and others showed that Yates'
procedure was equivalent to setting the residuals of the auxiliary
values equal to zero. Feingold(1982) showed that these results extend
to the case of a general linear model with a non-singular
covariance matrix that is known up to a scalar factor. The equations
for this model are converted into a form that is congruent to that of
the simpler model. In this way, it becomes easy to see when results
applicable to an independent error model can also be used for a
general model.
This paper compares the methods of generalized least squares
estimation and maximum likelihood estimation with general error
structures when observations are missing. It also shows that maximum
likelihood estimation of the model is equivalent to least squares
estimation when there are missing values. As an example, the
maximum likelihood estimate of a single missing value in a randomized
block design is provided.
Ⅱ. Generalized Least Squares Estimation
Consider the model
$$y = X\beta + \varepsilon, \qquad (2.1)$$
where $y$ is the vector of $n$ observations, $X$ is the known design
matrix, $\beta$ is the vector of unknown parameters, and $\varepsilon$ has mean zero
and covariance matrix $\Sigma$, where $\Sigma$ is non-singular and known up to
a scalar factor. Suppose that some observations are missing.
We arrange $y$ so that it can be partitioned, and rewrite model (2.1) as
$$\begin{bmatrix} y_m \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix},$$
where the covariance matrix of the error vector is
$$\Sigma = \begin{bmatrix} \Sigma_{mm} & \Sigma_{me} \\ \Sigma_{em} & \Sigma_{ee} \end{bmatrix}. \qquad (2.2)$$
The subscript "m" corresponds to the missing observations, and "e"
refers to the existing observations. The correct residual sum of
squares (SSE) is, of course, derived only the existing observations, y e .
SSE = min β ( y e - X eβ )' Σ ee - 1 ( y e -X e )β.
(2.3)
The design matrix, X, need not be of full rank. Therefore, the solution
to (2.3) is given in terms of generalized inverses.
$$\hat\beta = (X_e'\Sigma_{ee}^{-1}X_e)^{-}X_e'\Sigma_{ee}^{-1}y_e. \qquad (2.4)$$
We use the notation $A^{-}$ to mean any matrix such that $AA^{-}A = A$. $\hat\beta$
is not unique, but allows us to find BLUEs of all estimable linear
functions of $\beta$.
The normal equations resulting from (2.3) are often more difficult to
solve than the normal equations derived from the complete design.
Therefore, we want to find auxiliary values that can be used in place
of the missing data, so that we can proceed as if we had a complete
data set. Let us first assume that we have known quantities, $f$, with
which we augment the existing observations to form a "complete"
vector of observations. We can then use the complete design and
covariance matrices and follow standard procedures to obtain the
normal equations. We compute
$$\mathrm{SSE}(f) = \min_{\beta}\left[\begin{bmatrix} f \\ y_e \end{bmatrix} - \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta\right]'\Sigma^{-1}\left[\begin{bmatrix} f \\ y_e \end{bmatrix} - \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta\right], \qquad (2.5)$$
where
$$\Sigma^{-1} = \begin{bmatrix} \Sigma_{mm.e}^{-1} & -\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1} \\ -\Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1} & \Sigma_{ee.m}^{-1} \end{bmatrix} \qquad (2.6)$$
and
$$\Sigma_{mm.e}^{-1} = (\Sigma_{mm} - \Sigma_{me}\Sigma_{ee}^{-1}\Sigma_{em})^{-1}. \qquad (2.7)$$
$\Sigma_{ee.m}^{-1}$ is defined similarly. We use $\mathrm{SSE}(t)$ to indicate the error sum
of squares from a model in which $t$ has been put in place of the
missing observations, and we write $\hat\beta(t)$ to indicate the value of $\beta$ that
minimizes $\mathrm{SSE}(t)$.
We can simplify (2.5) by using (2.6) and (2.7), and obtain, after
some tedious but perfectly straightforward algebra,
$$\mathrm{SSE}(f) = \min_{\beta}\,\big[(y_e - X_e\beta)'\Sigma_{ee}^{-1}(y_e - X_e\beta) + (f^* - X_m^*\beta)'\Sigma_{mm.e}^{-1}(f^* - X_m^*\beta)\big], \qquad (2.8)$$
where
$$f^* = f - \Sigma_{me}\Sigma_{ee}^{-1}y_e, \qquad X_m^* = X_m - \Sigma_{me}\Sigma_{ee}^{-1}X_e. \qquad (2.9)$$
Note that the first part of the right-hand side of (2.8) is identical to
the r.h.s. of (2.3). Therefore, it is clear that $\mathrm{SSE}(f)$ is equal to $\mathrm{SSE}$,
the correct error sum of squares for the existing observations, when
$$\hat\beta(f) = \hat\beta = (X_e'\Sigma_{ee}^{-1}X_e)^{-}X_e'\Sigma_{ee}^{-1}y_e, \qquad f^* = X_m^*\hat\beta. \qquad (2.10)$$
We can also see, by examining (2.9), that when $\Sigma_{me} = 0$, i.e., when the
existing observations are not correlated with the missing observations,
the minimizing value for $\mathrm{SSE}(f)$ is simply
$$f = X_m\hat\beta. \qquad (2.11)$$
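To make the algebra concrete, the following NumPy sketch (not part of the original paper; the toy design, covariance matrix, and data are invented for illustration, and `numpy.linalg.pinv` stands in for a generalized inverse) computes the generalized least squares estimate (2.4) from the existing observations and then forms the auxiliary values $f$ implied by (2.9) and (2.10).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy full-rank design split into "missing" (first 2 rows) and "existing" rows.
X = np.column_stack([np.ones(8), np.arange(8.0)])
X_m, X_e = X[:2], X[2:]

# A general (non-diagonal) covariance matrix, known up to a scalar factor.
A = rng.normal(size=(8, 8))
Sigma = A @ A.T + 8 * np.eye(8)
S_me, S_ee = Sigma[:2, 2:], Sigma[2:, 2:]

y_e = rng.normal(size=6)          # only the existing observations are available

# Generalized least squares estimate (2.4); pinv plays the role of a g-inverse.
S_ee_inv = np.linalg.inv(S_ee)
beta_hat = np.linalg.pinv(X_e.T @ S_ee_inv @ X_e) @ X_e.T @ S_ee_inv @ y_e

# Auxiliary values from (2.9)-(2.10): f* = X_m* beta_hat, hence
# f = X_m beta_hat + S_me S_ee^{-1} (y_e - X_e beta_hat).
X_m_star = X_m - S_me @ S_ee_inv @ X_e
f = X_m_star @ beta_hat + S_me @ S_ee_inv @ y_e

# When S_me = 0 this reduces to f = X_m beta_hat, as in (2.11).
print(beta_hat, f)
```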
We can simplify (2.8) further by letting
$$v_e = \Sigma_{ee}^{-1/2}y_e, \qquad u = \Sigma_{mm.e}^{-1/2}f^*, \qquad T_e = \Sigma_{ee}^{-1/2}X_e, \qquad T_m = \Sigma_{mm.e}^{-1/2}X_m^*. \qquad (2.12)$$
Here, we use the fact that, for any symmetric positive definite matrix $A$, there exists
a matrix $A^{1/2}$ such that $A = A^{1/2}(A^{1/2})'$. We can then rewrite (2.8) as
$$\mathrm{SSE}(u) = \min_{\beta}\,\big[(v_e - T_e\beta)'(v_e - T_e\beta) + (u - T_m\beta)'(u - T_m\beta)\big]. \qquad (2.13)$$
We therefore arrive at a quadratic form identical to that resulting from
the analysis of a simpler model,
$$\begin{bmatrix} u \\ v_e \end{bmatrix} = \begin{bmatrix} T_m \\ T_e \end{bmatrix}\beta + \begin{bmatrix} \delta_m \\ \delta_e \end{bmatrix}, \qquad (2.14)$$
where $u$ is the vector of known quantities that has been substituted
for the missing observations, $v_m$, and $\delta$ has zero mean and
independent, equi-variate errors, since
$$\delta = \begin{bmatrix} \Sigma_{mm.e}^{-1/2} & -\Sigma_{mm.e}^{-1/2}\Sigma_{me}\Sigma_{ee}^{-1} \\ 0 & \Sigma_{ee}^{-1/2} \end{bmatrix}\varepsilon.$$
$\mathrm{SSE}(u) = \mathrm{SSE}$ when
$$\hat\beta(u) = \hat\beta = (T_e'T_e)^{-}T_e'v_e, \qquad u = T_m\hat\beta. \qquad (2.15)$$
These solutions to $\mathrm{SSE}(u)$ are identical to the solutions (2.10) of $\mathrm{SSE}(f)$.
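The reduction (2.12)-(2.15) can be checked numerically. The sketch below (again a made-up illustration, not from the paper) uses Cholesky factors as the square roots $A^{1/2}$, sets $u = T_m\hat\beta$ as in (2.15), and confirms that ordinary least squares on the transformed model (2.14) reproduces the generalized least squares estimate and the correct error sum of squares (2.3).

```python
import numpy as np

rng = np.random.default_rng(1)

# Same kind of setting: rows 0-1 are "missing", rows 2-7 are observed.
X = np.column_stack([np.ones(8), np.arange(8.0)])
X_m, X_e = X[:2], X[2:]
A = rng.normal(size=(8, 8))
Sigma = A @ A.T + 8 * np.eye(8)
S_mm, S_me = Sigma[:2, :2], Sigma[:2, 2:]
S_em, S_ee = Sigma[2:, :2], Sigma[2:, 2:]
y_e = rng.normal(size=6)

S_ee_inv = np.linalg.inv(S_ee)
S_mm_e = S_mm - S_me @ S_ee_inv @ S_em            # Sigma_{mm.e}, cf. (2.7)

# Matrix square roots A = A^{1/2} (A^{1/2})' via Cholesky factors.
L_ee = np.linalg.cholesky(S_ee)
L_mm_e = np.linalg.cholesky(S_mm_e)

# Transformed quantities (2.12); solve() applies the inverse square roots.
v_e = np.linalg.solve(L_ee, y_e)
T_e = np.linalg.solve(L_ee, X_e)
X_m_star = X_m - S_me @ S_ee_inv @ X_e
T_m = np.linalg.solve(L_mm_e, X_m_star)

# GLS fit on the existing data (2.4) and its error sum of squares (2.3).
beta_gls = np.linalg.pinv(X_e.T @ S_ee_inv @ X_e) @ X_e.T @ S_ee_inv @ y_e
sse = (y_e - X_e @ beta_gls) @ S_ee_inv @ (y_e - X_e @ beta_gls)

# Set u = T_m beta_gls as in (2.15) and fit the stacked model (2.14) by OLS.
u = T_m @ beta_gls
T = np.vstack([T_m, T_e])
z = np.concatenate([u, v_e])
beta_u = np.linalg.lstsq(T, z, rcond=None)[0]
sse_u = np.sum((z - T @ beta_u) ** 2)

print(np.allclose(beta_u, beta_gls), np.isclose(sse_u, sse))   # True True
```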
Ⅲ. Maximum Likelihood Estimation
Now, the method of maximum likelihood estimation is considered.
Suppose the model is
$$y = X\beta + \varepsilon, \qquad (3.1)$$
where $y$ is the vector of observations, $X$ is the known design matrix, $\beta$ is
the vector of unknown parameters, and $\varepsilon$ has mean zero and covariance
matrix $\Sigma$, where $\Sigma$ is positive definite and known up to a scalar factor.
Suppose that some observations are missing. We arrange $y$ so that it can
be partitioned, and rewrite model (3.1) as
$$\begin{bmatrix} y_m \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix},$$
where the covariance matrix of the error vector is
$$\Sigma = \begin{bmatrix} \Sigma_{mm} & \Sigma_{me} \\ \Sigma_{em} & \Sigma_{ee} \end{bmatrix}.$$
Let us assume that we have algebraic quantities f , with which we
augment the existing observations to form a "complete" vector of
observations. Then the augmented model is
$$\begin{bmatrix} f \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix}. \qquad (3.2)$$
The likelihood function for the normal error model (3.2) is
$$\begin{aligned} L(\beta, f) &= (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left[-\frac{1}{2}\begin{pmatrix} f - X_m\beta \\ y_e - X_e\beta \end{pmatrix}'\Sigma^{-1}\begin{pmatrix} f - X_m\beta \\ y_e - X_e\beta \end{pmatrix}\right] \\ &= (2\pi)^{-n/2}|\Sigma|^{-1/2}\exp\left[-\frac{1}{2}\Big((y_e - X_e\beta)'\Sigma_{ee}^{-1}(y_e - X_e\beta) + (f^* - X_m^*\beta)'\Sigma_{mm.e}^{-1}(f^* - X_m^*\beta)\Big)\right], \end{aligned} \qquad (3.3)$$
where
$$\Sigma^{-1} = \begin{bmatrix} \Sigma_{mm.e}^{-1} & -\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1} \\ -\Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1} & \Sigma_{ee.m}^{-1} \end{bmatrix},$$
with
$$\Sigma_{mm.e}^{-1} = (\Sigma_{mm} - \Sigma_{me}\Sigma_{ee}^{-1}\Sigma_{em})^{-1},$$
$$\Sigma_{ee.m}^{-1} = (\Sigma_{ee} - \Sigma_{em}\Sigma_{mm}^{-1}\Sigma_{me})^{-1} = \Sigma_{ee}^{-1} + \Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1},$$
and
$$f^* = f - \Sigma_{me}\Sigma_{ee}^{-1}y_e, \qquad X_m^* = X_m - \Sigma_{me}\Sigma_{ee}^{-1}X_e. \qquad (3.4)$$
Then
$$\log L = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\Big((y_e - X_e\beta)'\Sigma_{ee}^{-1}(y_e - X_e\beta) + (f^* - X_m^*\beta)'\Sigma_{mm.e}^{-1}(f^* - X_m^*\beta)\Big). \qquad (3.5)$$
By taking partial derivatives of $\log L$ with respect to $\beta$ and $f$, and equating
each of the partials to zero, we obtain
$$(X_e'\Sigma_{ee}^{-1}X_e + X_m^{*\prime}\Sigma_{mm.e}^{-1}X_m^{*})\hat\beta = X_e'\Sigma_{ee}^{-1}y_e + X_m^{*\prime}\Sigma_{mm.e}^{-1}\hat f^{*}, \qquad (3.6)$$
$$\hat f^{*} = X_m^{*}\hat\beta, \qquad (3.7)$$
where $\hat f^{*} = \hat f - \Sigma_{me}\Sigma_{ee}^{-1}y_e$.
Substituting (3.7) into (3.6), the above system of equations reduces to
$$(X_e'\Sigma_{ee}^{-1}X_e)\hat\beta = X_e'\Sigma_{ee}^{-1}y_e.$$
This equation is equivalent to the normal equation of the model
$$y_e = X_e\beta + \varepsilon_e, \qquad \varepsilon_e \sim N(0, \Sigma_{ee}).$$
In the special case of $\Sigma = \sigma^2 I_n$, the likelihood function for the normal
error model (3.2) is
$$L(\beta, \sigma^2, f) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\Big((f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\Big)\right].$$
We can work with $\log L$, rather than $L$, because both $L$ and $\log L$
are maximized at the same values of $\beta$ and $f$:
$$\log L = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log \sigma^2 - \frac{1}{2\sigma^2}\Big((f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\Big).$$
Partial differentiation of this log-likelihood yields:
$$\frac{\partial \log L}{\partial \beta} = \frac{1}{\sigma^2}\big[(X_m'f + X_e'y_e) - (X_m'X_m + X_e'X_e)\beta\big],$$
$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\big[(f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\big],$$
$$\frac{\partial \log L}{\partial f} = -\frac{1}{\sigma^2}(f - X_m\beta).$$
Then, equating each of the partials to zero:
$$X_m'\hat f + X_e'y_e = (X_m'X_m + X_e'X_e)\hat\beta,$$
$$\hat f = X_m\hat\beta,$$
$$\hat\sigma^2 = \frac{1}{n}\big[(\hat f - X_m\hat\beta)'(\hat f - X_m\hat\beta) + (y_e - X_e\hat\beta)'(y_e - X_e\hat\beta)\big],$$
where $\hat\beta$, $\hat\sigma^2$ and $\hat f$ are the maximum likelihood estimators of $\beta$, $\sigma^2$
and $f$, respectively.
From the first two equations,
$$X_e'X_e\hat\beta = X_e'y_e.$$
This is equivalent to the normal equations for the existing observations.
Thus, the maximum likelihood estimators of $\beta$ and $f$ are the same as
those generated by the method of least squares estimation.
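As a quick numerical check of this equivalence in the $\Sigma = \sigma^2 I$ case (a minimal sketch with simulated data, not from the paper), the ordinary least squares estimate computed from the existing rows alone satisfies the stationarity equations above, with $\hat f = X_m\hat\beta$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Sigma = sigma^2 I case: two "missing" rows, six observed rows.
X = np.column_stack([np.ones(8), np.arange(8.0)])
X_m, X_e = X[:2], X[2:]
y_e = X_e @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=6)

# Least squares using the existing observations only.
beta_hat = np.linalg.lstsq(X_e, y_e, rcond=None)[0]

# Stationarity equations of the augmented likelihood:
# f_hat = X_m beta_hat, and beta_hat must satisfy
# X_m' f_hat + X_e' y_e = (X_m'X_m + X_e'X_e) beta_hat.
f_hat = X_m @ beta_hat
lhs = X_m.T @ f_hat + X_e.T @ y_e
rhs = (X_m.T @ X_m + X_e.T @ X_e) @ beta_hat
print(np.allclose(lhs, rhs))        # True: the MLE and the LSE coincide

# MLE of sigma^2; the f-residuals vanish, so only existing residuals contribute.
n = X.shape[0]
sigma2_hat = ((f_hat - X_m @ beta_hat) @ (f_hat - X_m @ beta_hat)
              + (y_e - X_e @ beta_hat) @ (y_e - X_e @ beta_hat)) / n
print(sigma2_hat)
```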
Ⅳ. Example: Randomized Block Design
Suppose the model is
$$y_{ij} = \mu + a_i + \beta_j + \varepsilon_{ij}, \qquad i = 1,\dots,t,\ j = 1,\dots,b,$$
where $\varepsilon \sim N(0, \sigma^2 I)$. If a single observation is missing, say $y_{lm}$, the
likelihood function of the model is
$$L(\mu, a_i, \beta_j, \sigma^2, f) = \frac{1}{(2\pi\sigma^2)^{tb/2}}\exp\left[-\frac{1}{2\sigma^2}\Big(\sum_{i}\sum_{\substack{j \\ (i,j)\neq(l,m)}}(y_{ij} - \mu - a_i - \beta_j)^2 + (f - \mu - a_l - \beta_m)^2\Big)\right].$$
By taking partial derivatives of $\log L$ with respect to $\mu$, $a_i$, $\beta_j$, $\sigma^2$, and $f$,
and equating each of the partials to zero, we obtain
$$y_{..}' - (tb-1)\hat\mu - (b-1)\hat a_l - b\sum_{i\neq l}\hat a_i - (t-1)\hat\beta_m - t\sum_{j\neq m}\hat\beta_j + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0,$$
$$y_{i.} - b\hat\mu - b\hat a_i - \sum_{j}\hat\beta_j = 0, \qquad i \neq l,$$
$$y_{l.}' - (b-1)\hat\mu - (b-1)\hat a_l - \sum_{j\neq m}\hat\beta_j + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0, \qquad i = l,$$
$$y_{.j} - t\hat\mu - \sum_{i}\hat a_i - t\hat\beta_j = 0, \qquad j \neq m,$$
$$y_{.m}' - (t-1)\hat\mu - \sum_{i\neq l}\hat a_i - (t-1)\hat\beta_m + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0, \qquad j = m,$$
$$\sum_{i}\sum_{\substack{j \\ (i,j)\neq(l,m)}}(y_{ij} - \hat\mu - \hat a_i - \hat\beta_j)^2 + (\hat f - \hat\mu - \hat a_l - \hat\beta_m)^2 = tb\,\hat\sigma^2,$$
$$\hat f - \hat\mu - \hat a_l - \hat\beta_m = 0,$$
where $y_{i.} = \sum_{j} y_{ij}$ and $y_{.j} = \sum_{i} y_{ij}$ are the treatment and block totals,
respectively, and $y_{l.}'$, $y_{.m}'$ and $y_{..}'$ are the totals of the existing
observations for treatment $l$, for block $m$, and overall (the grand total), i.e.,
$$y_{l.}' = \sum_{j\neq m} y_{lj}, \qquad y_{.m}' = \sum_{i\neq l} y_{im}, \qquad y_{..}' = \sum_{i}\sum_{\substack{j \\ (i,j)\neq(l,m)}} y_{ij}.$$
Here, assume that $\sum_{i} a_i = 0$ and $\sum_{j} \beta_j = 0$. Then the above linear
equations become
$$y_{..}' - (tb-1)\hat\mu + \hat a_l + \hat\beta_m = 0,$$
$$y_{i.} - b\hat\mu - b\hat a_i = 0, \qquad i \neq l,$$
$$y_{l.}' - (b-1)\hat\mu - (b-1)\hat a_l + \hat\beta_m = 0, \qquad i = l,$$
$$y_{.j} - t\hat\mu - t\hat\beta_j = 0, \qquad j \neq m,$$
$$y_{.m}' - (t-1)\hat\mu + \hat a_l - (t-1)\hat\beta_m = 0, \qquad j = m.$$
Solving these linear equations results in
$$\hat a_l = \frac{t\,y_{l.}' + y_{.m}' - y_{..}'}{t(b-1)}, \qquad \hat\beta_m = \frac{y_{l.}' + b\,y_{.m}' - y_{..}'}{b(t-1)}, \qquad \hat\mu = \frac{y_{..}' + \hat a_l + \hat\beta_m}{tb-1}.$$
Thus
$$\hat f = \hat\mu + \hat a_l + \hat\beta_m = \frac{t\,y_{l.}' + b\,y_{.m}' - y_{..}'}{(t-1)(b-1)}.$$
This result is the same as that of least squares estimation.
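As a numerical illustration (a sketch with simulated data, not part of the paper), the closed-form value $\hat f$ above can be checked against a direct least squares fit of the two-way model to the $tb - 1$ existing cells; the fitted value of the missing cell agrees with Yates' formula.

```python
import numpy as np

rng = np.random.default_rng(3)
t, b = 4, 5                         # t treatments, b blocks
l, m = 1, 2                         # position of the single missing cell

# Simulate a complete two-way table, then delete cell (l, m).
mu, a, beta = 2.0, rng.normal(size=t), rng.normal(size=b)
a -= a.mean()
beta -= beta.mean()
y = mu + a[:, None] + beta[None, :] + rng.normal(scale=0.2, size=(t, b))
y_obs = y.copy()
y_obs[l, m] = np.nan

# Totals over the existing observations only (nansum skips the missing cell).
y_l_dot = np.nansum(y_obs[l, :])    # treatment-l total
y_dot_m = np.nansum(y_obs[:, m])    # block-m total
y_dot_dot = np.nansum(y_obs)        # grand total

# Closed-form estimate of the missing value (Yates' formula).
f_hat = (t * y_l_dot + b * y_dot_m - y_dot_dot) / ((t - 1) * (b - 1))

# Check against least squares on the existing cells: dummy-coded design for
# mu, a_i, beta_j, fitted to the tb - 1 observed cells, then predict cell (l, m).
cells = [(i, j) for i in range(t) for j in range(b) if (i, j) != (l, m)]
def row(i, j):
    x = np.zeros(1 + t + b)
    x[0], x[1 + i], x[1 + t + j] = 1.0, 1.0, 1.0
    return x
X_e = np.array([row(i, j) for i, j in cells])
y_e = np.array([y_obs[i, j] for i, j in cells])
coef = np.linalg.lstsq(X_e, y_e, rcond=None)[0]   # one least squares solution
f_ls = row(l, m) @ coef                           # mu + a_l + beta_m is estimable
print(np.isclose(f_hat, f_ls))                    # True
```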
<References>
[1] Chakrabarti, M.C.(1962), Mathematics of Design and Analysis of
Experiments, New York: Asia Publishing House.
[2] Feingold, M. (1982), "Missing data in linear models with correlated
errors", Communications in Statistics, Series A.
[3] Graybill, F.A. (1969), Introduction to Matrices with Applications in
Statistics, Belmont, California: Wadsworth Publishing Company, Inc.
[4] Greenberg, B.G. and A.E. Sarhan (1962), "Exponential Distribution", in
Contributions to Order Statistics (A.E. Sarhan and B.G. Greenberg,
Eds.), New York: John Wiley.
[5] Kshirsagar, A.M. (1971), "Bias Due to Missing Plots", The American
Statistician 25(1), 47-50.
[6] Lloyd, E.H. (1962), "Generalized Least Squares Theorem", in
Contributions to Order Statistics (A.E. Sarhan and B.G. Greenberg,
Eds.), New York: John Wiley.
[7] Sclove, S.L.(1972), "On Missing Value Estimation in Experimental
Design Models", The American Statistician 26(2), 25-26.
[8] Yates, F.(1933), "The Analysis of Replicated Experiments When the
Field Results are Incomplete", The Empire Journal of Experimental
Agriculture 1, 129-142.
Estimation of Missing Values in a Linear Model
Jongtae Park
<Abstract>
Given missing observations, the methods of generalized least squares
estimation and maximum likelihood estimation are compared for a model
with a general error structure. The two methods are shown to be
equivalent, and, as an example, the maximum likelihood estimate of a
missing value in a randomized block design is obtained.