통계청 『통계분석연구』 (National Statistical Office, Journal of Statistical Analysis), Vol. 3, No. 2 (Fall 1998)

Estimation of Missing Values in the Linear Model

Jongtae Park*

This paper compares, for missing observations, the methods of generalized least squares estimation and maximum likelihood estimation under a general error structure. It shows that the two methods are equivalent and provides an example for the randomized block design.

< Contents >
Ⅰ. Introduction
Ⅱ. Generalized Least Squares Estimation
Ⅲ. Maximum Likelihood Estimation
Ⅳ. Example: Randomized Block Design

* Department of Computer Science and Statistics, Pyeongtaek University

Ⅰ. Introduction

One common method for analyzing data in experimental design when observations are missing was devised by Yates (1933), who developed his procedure from a suggestion of R. A. Fisher. Considering a linear model with independent, equivariate errors, Yates substituted algebraic values for the missing data and then minimized the error sum of squares with respect to both the unknown parameters and the algebraic values. He showed that this procedure yields the correct error sum of squares and a positively biased hypothesis sum of squares.

Others have elaborated on this technique. Chakrabarti (1962) gave a formal proof of Fisher's rule and derived a way to simplify the calculation of the auxiliary values used in place of the missing observations. Kshirsagar (1971) proved that the hypothesis sum of squares based on these values is biased, and developed an easy way to compute that bias. Sclove (1972) and others showed that Yates' procedure is equivalent to setting the residuals of the auxiliary values equal to zero. Feingold (1982) extended these results to the case of a general linear model with a non-singular covariance matrix that is known up to a scalar factor: the equations for this model are converted into a form congruent to that of the simpler model, which makes it easy to see when results applicable to an independent-error model can also be used for a general model.

This paper compares the methods of generalized least squares estimation and maximum likelihood estimation for missing observations under a general error structure. It also shows that maximum likelihood estimation in this model is equivalent to least squares estimation when there are missing values. As an example, the maximum likelihood estimate of a single missing value in a randomized block design is derived.

Ⅱ. Generalized Least Squares Estimation

Consider the model

\[ y = X\beta + \varepsilon, \tag{2.1} \]

where y is the vector of n observations, X is the known design matrix, β is the vector of unknown parameters, and ε has mean zero and covariance matrix Σ, where Σ is non-singular and known up to a scalar factor. Suppose that some observations are missing. We arrange y so that it can be partitioned, and rewrite model (2.1) as

\[ \begin{bmatrix} y_m \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix}, \]

where the covariance matrix of the error vector is

\[ \Sigma = \begin{bmatrix} \Sigma_{mm} & \Sigma_{me} \\ \Sigma_{em} & \Sigma_{ee} \end{bmatrix}. \tag{2.2} \]

The subscript "m" corresponds to the missing observations, and "e" refers to the existing observations. The correct residual sum of squares (SSE) is, of course, derived from the existing observations y_e only:

\[ SSE = \min_{\beta}\,(y_e - X_e\beta)'\,\Sigma_{ee}^{-1}\,(y_e - X_e\beta). \tag{2.3} \]

The design matrix X need not be of full rank. Therefore, the solution to (2.3) is given in terms of generalized inverses:

\[ \hat\beta = (X_e'\,\Sigma_{ee}^{-1} X_e)^{-}\, X_e'\,\Sigma_{ee}^{-1}\, y_e. \tag{2.4} \]

We use the notation A⁻ to mean any matrix such that AA⁻A = A. The estimate β̂ is not unique, but it allows us to find BLUEs of all estimable linear functions of β.
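As a numerical illustration of (2.4), the Moore-Penrose pseudoinverse is one admissible choice of generalized inverse. The following sketch is not part of the paper; the design, covariance, and data are made up. It checks that the fitted values X_e β̂, an estimable function, are invariant even when X_e is rank-deficient.

```python
import numpy as np

# Illustrative sketch of the GLS estimator (2.4); the Moore-Penrose
# pseudoinverse is one admissible generalized inverse A^-.
rng = np.random.default_rng(0)

X_e = np.array([[1., 1., 0.],     # hypothetical design for the existing rows
                [1., 1., 0.],
                [1., 0., 1.],
                [1., 0., 1.]])    # rank-deficient: col 0 = col 1 + col 2
Sigma_ee = 0.5 * np.eye(4) + 0.5  # hypothetical equicorrelated errors
beta_true = np.array([1., 2., -1.])
y_e = X_e @ beta_true + rng.multivariate_normal(np.zeros(4), Sigma_ee)

W = np.linalg.inv(Sigma_ee)       # Sigma_ee^{-1}
beta_hat = np.linalg.pinv(X_e.T @ W @ X_e) @ X_e.T @ W @ y_e

# beta_hat itself is not unique, but X_e @ beta_hat (and any estimable
# linear function of beta) is invariant to the choice of generalized inverse.
print(beta_hat, X_e @ beta_hat)
```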
The normal equations resulting from (2.3) are often more difficult to solve than the normal equations derived from the complete design. Therefore, we want to find auxiliary values that can be used in place of the missing data, so that we can proceed as if we had a complete data set.

Let us first assume that we have known quantities f with which we augment the existing observations to form a "complete" vector of observations. We can then use the complete design and covariance matrices and follow standard procedures to obtain the normal equations. We compute

\[ SSE(f) = \min_{\beta}\left(\begin{bmatrix} f \\ y_e \end{bmatrix} - \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta\right)'\,\Sigma^{-1}\left(\begin{bmatrix} f \\ y_e \end{bmatrix} - \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta\right), \tag{2.5} \]

where

\[ \Sigma^{-1} = \begin{bmatrix} \Sigma_{mm.e}^{-1} & -\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1} \\ -\Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1} & \Sigma_{ee.m}^{-1} \end{bmatrix} \tag{2.6} \]

and

\[ \Sigma_{mm.e}^{-1} = (\Sigma_{mm} - \Sigma_{me}\Sigma_{ee}^{-1}\Sigma_{em})^{-1}, \tag{2.7} \]

with Σ_ee.m^{-1} defined similarly. We use SSE(t) to indicate the error sum of squares from a model in which t has been put in place of the missing observations, and we write β̂(t) for the value of β that minimizes SSE(t).

We can simplify (2.5) by using (2.6) and (2.7), and obtain, after some tedious but perfectly straightforward algebra,

\[ SSE(f) = \min_{\beta}\Big[(y_e - X_e\beta)'\,\Sigma_{ee}^{-1}\,(y_e - X_e\beta) + (f^* - X_m^*\beta)'\,\Sigma_{mm.e}^{-1}\,(f^* - X_m^*\beta)\Big], \tag{2.8} \]

where

\[ f^* = f - \Sigma_{me}\Sigma_{ee}^{-1}\,y_e, \qquad X_m^* = X_m - \Sigma_{me}\Sigma_{ee}^{-1}\,X_e. \tag{2.9} \]

Note that the first part of the right-hand side of (2.8) is identical to the r.h.s. of (2.3). Therefore, it is clear that SSE(f) is equal to SSE, the correct error sum of squares for the existing observations, when

\[ \hat\beta(f) = \hat\beta = (X_e'\,\Sigma_{ee}^{-1}X_e)^{-}X_e'\,\Sigma_{ee}^{-1}\,y_e, \qquad f^* = X_m^*\,\hat\beta. \tag{2.10} \]

We can see also, by examining (2.9), that when Σ_me = 0, i.e., when the existing observations are not correlated with the missing observations, the minimizing value for SSE(f) is simply

\[ f = X_m\hat\beta. \tag{2.11} \]

We can simplify (2.8) further by letting

\[ v_e = \Sigma_{ee}^{-1/2}\,y_e, \quad u = \Sigma_{mm.e}^{-1/2}\,f^*, \quad T_e = \Sigma_{ee}^{-1/2}\,X_e, \quad T_m = \Sigma_{mm.e}^{-1/2}\,X_m^*. \tag{2.12} \]

Here we use the fact that for any symmetric positive definite matrix A there exists a matrix A^{1/2} such that A = A^{1/2}(A^{1/2})'. We can then rewrite (2.8) as

\[ SSE(u) = \min_{\beta}\Big[(v_e - T_e\beta)'(v_e - T_e\beta) + (u - T_m\beta)'(u - T_m\beta)\Big]. \tag{2.13} \]

We therefore arrive at a quadratic form identical to that resulting from analysis of the simpler model

\[ \begin{bmatrix} u \\ v_e \end{bmatrix} = \begin{bmatrix} T_m \\ T_e \end{bmatrix}\beta + \begin{bmatrix} \delta_m \\ \delta_e \end{bmatrix}, \tag{2.14} \]

where u is the vector of known quantities substituted for the missing (transformed) observations v_m, and δ has zero mean and independent, equivariate errors, since

\[ \delta = \begin{bmatrix} \Sigma_{mm.e}^{-1/2} & -\Sigma_{mm.e}^{-1/2}\Sigma_{me}\Sigma_{ee}^{-1} \\ 0 & \Sigma_{ee}^{-1/2} \end{bmatrix}\varepsilon. \]

Then SSE(u) = SSE when

\[ \hat\beta(u) = \hat\beta = (T_e'T_e)^{-}T_e'\,v_e, \qquad u = T_m\hat\beta. \tag{2.15} \]

These solutions to SSE(u) are identical to the solutions (2.10) of SSE(f).
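The following sketch, again illustrative with hypothetical matrices, computes the auxiliary values implied by (2.9) and (2.10), f = Σ_me Σ_ee^{-1} y_e + X_m* β̂, and checks numerically that refitting the "completed" data with the full covariance matrix reproduces the correct SSE from the existing observations alone.

```python
import numpy as np

# Sketch of the auxiliary values (2.9)-(2.10) and a check that SSE(f) = SSE.
# All matrices below are hypothetical illustrations.
rng = np.random.default_rng(1)

n, n_m = 6, 2                                  # 2 of 6 observations missing
X = rng.normal(size=(n, 3))
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)                # some positive definite Sigma
y = X @ np.array([1., -2., 0.5]) + rng.multivariate_normal(np.zeros(n), Sigma)

X_m, X_e = X[:n_m], X[n_m:]
S_me, S_ee = Sigma[:n_m, n_m:], Sigma[n_m:, n_m:]
y_e = y[n_m:]

W = np.linalg.inv(S_ee)
beta_hat = np.linalg.pinv(X_e.T @ W @ X_e) @ X_e.T @ W @ y_e   # eq. (2.4)

X_m_star = X_m - S_me @ W @ X_e                # eq. (2.9)
f = S_me @ W @ y_e + X_m_star @ beta_hat       # eq. (2.10): f* = X_m* beta_hat

# Refitting the "complete" data (f, y_e) with the full Sigma reproduces
# the correct SSE from the existing observations alone.
yc = np.concatenate([f, y_e])
Wc = np.linalg.inv(Sigma)
bc = np.linalg.pinv(X.T @ Wc @ X) @ X.T @ Wc @ yc
sse_f = (yc - X @ bc) @ Wc @ (yc - X @ bc)
sse = (y_e - X_e @ beta_hat) @ W @ (y_e - X_e @ beta_hat)
print(np.isclose(sse_f, sse))                  # expected: True
```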
Ⅲ. Maximum Likelihood Estimation

Now the method of maximum likelihood estimation is considered. Suppose the model is

\[ y = X\beta + \varepsilon, \tag{3.1} \]

where y is the vector of observations, X is the known design matrix, β is the vector of unknown parameters, and ε has mean zero and covariance matrix Σ, where Σ is positive definite and known up to a scalar factor. Suppose that some observations are missing. We arrange y so that it can be partitioned, and rewrite model (3.1) as

\[ \begin{bmatrix} y_m \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix}, \]

where the covariance matrix of the error vector is

\[ \Sigma = \begin{bmatrix} \Sigma_{mm} & \Sigma_{me} \\ \Sigma_{em} & \Sigma_{ee} \end{bmatrix}. \]

Let us assume that we have algebraic quantities f with which we augment the existing observations to form a "complete" vector of observations. Then the augmented model is

\[ \begin{bmatrix} f \\ y_e \end{bmatrix} = \begin{bmatrix} X_m \\ X_e \end{bmatrix}\beta + \begin{bmatrix} \varepsilon_m \\ \varepsilon_e \end{bmatrix}. \tag{3.2} \]

The likelihood function for the normal error model (3.2) is

\[
\begin{aligned}
L(\beta, f) &= (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left[-\frac{1}{2}\begin{pmatrix} f - X_m\beta \\ y_e - X_e\beta \end{pmatrix}'\Sigma^{-1}\begin{pmatrix} f - X_m\beta \\ y_e - X_e\beta \end{pmatrix}\right] \\
&= (2\pi)^{-n/2}\,|\Sigma|^{-1/2}\exp\left[-\frac{1}{2}\Big((y_e - X_e\beta)'\,\Sigma_{ee}^{-1}\,(y_e - X_e\beta) + (f^* - X_m^*\beta)'\,\Sigma_{mm.e}^{-1}\,(f^* - X_m^*\beta)\Big)\right],
\end{aligned} \tag{3.3}
\]

where

\[ \Sigma^{-1} = \begin{bmatrix} \Sigma_{mm.e}^{-1} & -\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1} \\ -\Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1} & \Sigma_{ee.m}^{-1} \end{bmatrix}, \]

with

\[ \Sigma_{mm.e}^{-1} = (\Sigma_{mm} - \Sigma_{me}\Sigma_{ee}^{-1}\Sigma_{em})^{-1}, \qquad \Sigma_{ee.m}^{-1} = (\Sigma_{ee} - \Sigma_{em}\Sigma_{mm}^{-1}\Sigma_{me})^{-1} = \Sigma_{ee}^{-1} + \Sigma_{ee}^{-1}\Sigma_{em}\Sigma_{mm.e}^{-1}\Sigma_{me}\Sigma_{ee}^{-1}, \]

and

\[ f^* = f - \Sigma_{me}\Sigma_{ee}^{-1}\,y_e, \qquad X_m^* = X_m - \Sigma_{me}\Sigma_{ee}^{-1}\,X_e. \tag{3.4} \]

Then

\[ \log L = -\frac{n}{2}\log 2\pi - \frac{1}{2}\log|\Sigma| - \frac{1}{2}\Big((y_e - X_e\beta)'\,\Sigma_{ee}^{-1}\,(y_e - X_e\beta) + (f^* - X_m^*\beta)'\,\Sigma_{mm.e}^{-1}\,(f^* - X_m^*\beta)\Big). \tag{3.5} \]

Taking partial derivatives of log L with respect to β and f, and equating each of the partials to zero, gives

\[ (X_e'\,\Sigma_{ee}^{-1}X_e + X_m^{*\prime}\,\Sigma_{mm.e}^{-1}X_m^*)\,\hat\beta = X_e'\,\Sigma_{ee}^{-1}\,y_e + X_m^{*\prime}\,\Sigma_{mm.e}^{-1}\,\hat f^*, \tag{3.6} \]
\[ \hat f^* = X_m^*\,\hat\beta, \tag{3.7} \]

where f̂* = f̂ − Σ_me Σ_ee^{-1} y_e. Substituting (3.7) into (3.6), the system reduces to

\[ (X_e'\,\Sigma_{ee}^{-1}X_e)\,\hat\beta = X_e'\,\Sigma_{ee}^{-1}\,y_e, \]

which is equivalent to the normal equation of y_e = X_e β + ε_e, ε_e ~ N(0, Σ_ee).

In the special case Σ = σ²I_n, the likelihood function for the normal error model (3.2) is

\[ L(\beta, \sigma^2, f) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left[-\frac{1}{2\sigma^2}\Big((f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\Big)\right]. \]

We can work with log L rather than L, because both are maximized by the same values of β and f:

\[ \log L = -\frac{n}{2}\log 2\pi - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\Big[(f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\Big]. \]

Partial differentiation of this log-likelihood yields:

\[ \frac{\partial \log L}{\partial \beta} = \frac{1}{\sigma^2}\Big[(X_m'f + X_e'y_e) - (X_m'X_m + X_e'X_e)\beta\Big], \]
\[ \frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\Big[(f - X_m\beta)'(f - X_m\beta) + (y_e - X_e\beta)'(y_e - X_e\beta)\Big], \]
\[ \frac{\partial \log L}{\partial f} = -\frac{1}{\sigma^2}(f - X_m\beta). \]

Equating each of the partials to zero:

\[ X_m'\hat f + X_e'y_e = (X_m'X_m + X_e'X_e)\,\hat\beta, \qquad \hat f = X_m\hat\beta, \]
\[ \hat\sigma^2 = \frac{1}{n}\Big[(\hat f - X_m\hat\beta)'(\hat f - X_m\hat\beta) + (y_e - X_e\hat\beta)'(y_e - X_e\hat\beta)\Big], \]

where β̂, σ̂² and f̂ are the maximum likelihood estimators of β, σ² and f, respectively. Substituting f̂ = X_m β̂ into the first equation gives

\[ X_e'X_e\,\hat\beta = X_e'\,y_e, \]

which is equivalent to the normal equations for the existing observations. Thus the maximum likelihood estimators of β and f are the same as those generated by the method of least squares estimation.
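A well-known way to solve the pair of equations f̂ = X_m β̂ and X_e'X_e β̂ = X_e'y_e is iterative substitution (fit, impute, refit). The sketch below is an illustration of the Σ = σ²I case with made-up data, not a procedure stated in the paper; it checks that the iteration reproduces the least squares fit based on the existing observations alone.

```python
import numpy as np

# In the Sigma = sigma^2 I case, iterating f <- X_m beta_hat and
# beta_hat <- OLS on the "completed" data converges to the OLS fit
# based on the existing observations alone.
rng = np.random.default_rng(2)

X = rng.normal(size=(8, 3))
X_m, X_e = X[:1], X[1:]                       # first observation missing
y_e = X_e @ np.array([2., 0., -1.]) + rng.normal(size=7)

f = np.zeros(1)                               # any starting value works
for _ in range(50):
    yc = np.concatenate([f, y_e])
    beta_hat = np.linalg.lstsq(X, yc, rcond=None)[0]
    f = X_m @ beta_hat                        # f_hat = X_m beta_hat

beta_ols = np.linalg.lstsq(X_e, y_e, rcond=None)[0]
print(np.allclose(beta_hat, beta_ols))        # expected: True
```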
Ⅳ. Example: Randomized Block Design

Suppose the model is

\[ y_{ij} = \mu + a_i + \beta_j + \varepsilon_{ij}, \qquad i = 1,\dots,t,\; j = 1,\dots,b, \]

where ε ~ N(0, σ²I). If a single observation, say y_{lm}, is missing, the likelihood function of the model is

\[ L(\mu, a_i, \beta_j, \sigma^2, f) = \frac{1}{(2\pi\sigma^2)^{tb/2}}\exp\left[-\frac{1}{2\sigma^2}\Big(\sum_i\sum_{j,\,(i,j)\ne(l,m)} (y_{ij} - \mu - a_i - \beta_j)^2 + (f - \mu - a_l - \beta_m)^2\Big)\right]. \]

By taking partial derivatives of log L with respect to μ, a_i, β_j, σ² and f, and equating each of the partials to zero:

\[
\begin{aligned}
&y_{..}' - (tb-1)\hat\mu - (b-1)\hat a_l - b\sum_{i\ne l}\hat a_i - (t-1)\hat\beta_m - t\sum_{j\ne m}\hat\beta_j + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0, \\
&y_{i.} - b\hat\mu - b\hat a_i - \sum_j \hat\beta_j = 0, \quad i \ne l, \\
&y_{l.}' - (b-1)\hat\mu - (b-1)\hat a_l - \sum_{j\ne m}\hat\beta_j + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0, \quad i = l, \\
&y_{.j} - t\hat\mu - \sum_i \hat a_i - t\hat\beta_j = 0, \quad j \ne m, \\
&y_{.m}' - (t-1)\hat\mu - \sum_{i\ne l}\hat a_i - (t-1)\hat\beta_m + (\hat f - \hat\mu - \hat a_l - \hat\beta_m) = 0, \quad j = m, \\
&\sum_i\sum_{j,\,(i,j)\ne(l,m)} (y_{ij} - \hat\mu - \hat a_i - \hat\beta_j)^2 + (\hat f - \hat\mu - \hat a_l - \hat\beta_m)^2 = n\hat\sigma^2, \quad n = tb, \\
&\hat f - \hat\mu - \hat a_l - \hat\beta_m = 0.
\end{aligned}
\]

Here y_i. = Σ_j y_ij and y_.j = Σ_i y_ij are the treatment and block totals, respectively, y_.. is the grand total, and the primed totals y_l.', y_.m' and y_..' are taken over the existing observations only:

\[ y_{l.}' = \sum_{j\ne m} y_{lj}, \qquad y_{.m}' = \sum_{i\ne l} y_{im}, \qquad y_{..}' = \sum_i\sum_{j,\,(i,j)\ne(l,m)} y_{ij}. \]

Assume that Σ_i a_i = 0 and Σ_j β_j = 0. Then, using f̂ = μ̂ + â_l + β̂_m, the equations become

\[
\begin{aligned}
&y_{..}' - (tb-1)\hat\mu + \hat a_l + \hat\beta_m = 0, \\
&y_{i.} - b\hat\mu - b\hat a_i = 0, \quad i \ne l, \\
&y_{l.}' - (b-1)\hat\mu - (b-1)\hat a_l + \hat\beta_m = 0, \quad i = l, \\
&y_{.j} - t\hat\mu - t\hat\beta_j = 0, \quad j \ne m, \\
&y_{.m}' - (t-1)\hat\mu + \hat a_l - (t-1)\hat\beta_m = 0, \quad j = m.
\end{aligned}
\]

Solving these linear equations yields

\[ \hat a_l = \frac{t\,y_{l.}' + y_{.m}' - y_{..}'}{t(b-1)}, \qquad \hat\beta_m = \frac{y_{l.}' + b\,y_{.m}' - y_{..}'}{b(t-1)}, \qquad \hat\mu = \frac{y_{..}' + \hat a_l + \hat\beta_m}{tb - 1}. \]

Thus

\[ \hat f = \hat\mu + \hat a_l + \hat\beta_m = \frac{t\,y_{l.}' + b\,y_{.m}' - y_{..}'}{(t-1)(b-1)}. \]

This result is the same as that of least squares estimation.
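As a check on the closed-form result, the following hypothetical numeric example (not from the paper) compares f̂ with the least squares fitted value for the missing cell, which is invariant to the parameterization because μ + a_l + β_m is estimable when only one cell is missing.

```python
import numpy as np

# Numerical check of f_hat = (t*y_l.' + b*y_.m' - y_..') / ((t-1)(b-1))
# for one missing cell (l, m) in a t x b randomized block layout.
rng = np.random.default_rng(3)

t, b, l, m = 4, 5, 1, 2                      # hypothetical layout and cell
y = rng.normal(size=(t, b)) + np.arange(t)[:, None] + np.arange(b)

obs = np.ones((t, b), dtype=bool)
obs[l, m] = False                            # y[l, m] is treated as missing

y_l = y[l, obs[l]].sum()                     # y_l.': row total, existing only
y_m = y[obs[:, m], m].sum()                  # y_.m': column total, existing only
y_g = y[obs].sum()                           # y_..': grand total, existing only

f_hat = (t * y_l + b * y_m - y_g) / ((t - 1) * (b - 1))

# Compare with the least squares fit of mu + a_i + beta_j on existing cells.
rows, cols = np.nonzero(obs)
X = np.hstack([np.ones((obs.sum(), 1)),
               np.eye(t)[rows], np.eye(b)[cols]])      # over-parameterized
coef = np.linalg.lstsq(X, y[obs], rcond=None)[0]
fitted_lm = coef[0] + coef[1 + l] + coef[1 + t + m]
print(np.isclose(f_hat, fitted_lm))          # expected: True
```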
<References>

[1] Chakrabarti, M. C. (1962), Mathematics of Design and Analysis of Experiments, New York: Asia Publishing House.
[2] Feingold, M. (1982), "Missing Data in Linear Models with Correlated Errors", Communications in Statistics, Series A.
[3] Graybill, F. A. (1969), Introduction to Matrices with Applications in Statistics, Belmont, California: Wadsworth Publishing Company, Inc.
[4] Greenberg, B. G. and Sarhan, A. E. (1962), "Exponential Distribution", in Contributions to Order Statistics (A. E. Sarhan and B. G. Greenberg, Eds.), New York: John Wiley.
[5] Kshirsagar, A. M. (1971), "Bias Due to Missing Plots", The American Statistician 25(1), 47-50.
[6] Lloyd, E. H. (1962), "Generalized Least Squares Theorem", in Contributions to Order Statistics (A. E. Sarhan and B. G. Greenberg, Eds.), New York: John Wiley.
[7] Sclove, S. L. (1972), "On Missing Value Estimation in Experimental Design Models", The American Statistician 26(2), 25-26.
[8] Yates, F. (1933), "The Analysis of Replicated Experiments When the Field Results are Incomplete", The Empire Journal of Experimental Agriculture 1, 129-142.

Estimation of Missing Values in the Linear Model

Jongtae Park

<Abstract>

Given missing observations, the methods of generalized least squares estimation and maximum likelihood estimation are compared for a model with a general error structure. The two methods are shown to be equivalent in this case, and, as an example, the maximum likelihood estimate of a missing value in a randomized block design is obtained.