Chapter 13
13.5 Comments on the Distribution of the Lack-of-Fit Statistics (optional)
In the previous sections of this chapter, we have used some distribution results without full justification. We now discuss the reasoning behind them, to the extent possible within the scope of this book.
We begin with a binomial random variable X, which is the number of trials in a sample of n independent random trials that have the characteristic A. Having observed X, we know that $Y = n - X$ is the number of trials in this sample that have the characteristic $\bar{A}$. Suppose the probability, $P(A)$, of obtaining A in a single trial is $\theta$; we then have

$$P(A) = \theta, \qquad P(\bar{A}) = 1 - \theta \tag{13.5.1}$$

We recall that $E(X) = n\theta$ and $\mathrm{Var}(X) = n\theta(1-\theta)$.
Further, using the Central Limit Theorem, the distribution of X when n is large can be approximated by the normal distribution, with

$$Z = \frac{X - n\theta}{\sqrt{n\theta(1-\theta)}} \tag{13.5.2}$$

where Z is a N(0, 1) random variable. From Chapter 7, we know that the square ($Z^2$ in this case) of an N(0, 1) random variable is distributed as a $\chi^2_1$ variable. Hence, we may write (13.5.2) in the form

$$\frac{(X - n\theta)^2}{n\theta(1-\theta)} \approx \chi^2_1 \quad \text{for large } n \tag{13.5.3}$$

Upon using the facts

$$\frac{1}{\theta(1-\theta)} = \frac{1}{\theta} + \frac{1}{1-\theta}$$

and $Y = n - X$, so that $X - n\theta = -[Y - n(1-\theta)]$, it is easily verified that the left-hand side of (13.5.3) may be written as

$$\frac{(X - n\theta)^2}{n\theta(1-\theta)} = \frac{(X - n\theta)^2}{n\theta} + \frac{[Y - n(1-\theta)]^2}{n(1-\theta)} \tag{13.5.4}$$
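As a quick numerical check of the identity (13.5.4), the following sketch evaluates both sides; the values of n, $\theta$, and X are arbitrary choices for illustration:

```python
# Numerical check of identity (13.5.4) with arbitrary illustrative values.
n, theta = 50, 0.3
X = 18                        # an observed binomial count
Y = n - X                     # count of A-bar outcomes

lhs = (X - n * theta) ** 2 / (n * theta * (1 - theta))
rhs = ((X - n * theta) ** 2 / (n * theta)
       + (Y - n * (1 - theta)) ** 2 / (n * (1 - theta)))
print(lhs, rhs)               # both sides agree: 0.857... each
```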
Now let $X = f_1$, $Y = f_2$, $A = A_1$, $\bar{A} = A_2$, $\theta_1 = P(A) = P(A_1)$, and $\theta_2 = P(\bar{A}) = P(A_2)$. Of course, $f_i$ is the number of trials in the sample of n trials that result in $A_i$ for i = 1 and 2, and since each $f_i$ has the binomial distribution, we know that

$$E(f_i) = n\theta_i \tag{13.5.5}$$
We may put (13.5.3) and (13.5.4) together to obtain, for large n,

$$\sum_{i=1}^{2} \frac{(f_i - n\theta_i)^2}{n\theta_i} \approx \chi^2_1 \tag{13.5.6}$$
Note that k = 2 here, but since $f_1 + f_2 = n$, we have $f_2 = n - f_1$, so if $f_1$ is observed, we automatically know $f_2$. Hence, the number of degrees of freedom for the approximate chi-square distribution is 1.
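To see this approximation in action, here is a minimal simulation sketch (the values n = 200 and $\theta$ = 0.4 are arbitrary choices): it draws many binomial samples, computes the statistic in (13.5.6) for each, and compares the empirical 95th percentile with that of $\chi^2_1$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, theta = 200, 0.4               # arbitrary illustrative values

# Draw many binomial counts and compute the statistic in (13.5.6).
f1 = rng.binomial(n, theta, size=100_000)
f2 = n - f1
chi2_stat = ((f1 - n * theta) ** 2 / (n * theta)
             + (f2 - n * (1 - theta)) ** 2 / (n * (1 - theta)))

# The empirical 95th percentile should be close to that of chi-square(1).
print(np.percentile(chi2_stat, 95))       # approximately 3.84
print(stats.chi2.ppf(0.95, df=1))         # 3.841...
```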
Now in the case of k characteristics $A_1, A_2, \ldots, A_k$, the respective frequencies $f_1, f_2, \ldots, f_k$ have a multinomial distribution, and we note that

$$\sum_{i=1}^{k} f_i = n, \quad \text{or} \quad f_k = n - (f_1 + \cdots + f_{k-1}) \tag{13.5.7}$$
Indeed, the probability function of $f_1, \ldots, f_{k-1}$ is

$$p(f_1, \ldots, f_{k-1}) = \frac{n!}{f_1! \cdots f_{k-1}!\left[n - \sum_{i=1}^{k-1} f_i\right]!}\; \theta_1^{f_1} \cdots \theta_{k-1}^{f_{k-1}} \left(1 - \sum_{i=1}^{k-1} \theta_i\right)^{n - \sum_{i=1}^{k-1} f_i} \tag{13.5.8}$$

for $0 \le f_i \le n$ and $0 \le \sum_{i=1}^{k-1} f_i \le n$.
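For concreteness, (13.5.8) can be evaluated directly or with scipy's multinomial distribution. This is a minimal sketch; the values of k, n, and the $\theta_i$ are invented for illustration (note that with all k cells included, the expression reduces to the ordinary multinomial probability, since $f_k$ and $\theta_k$ are determined by the rest):

```python
from math import factorial, prod
from scipy import stats

# Invented illustrative values: k = 3 categories, so (13.5.8) is a
# function of f1 and f2, with f3 = n - f1 - f2 determined.
n = 10
theta = [0.2, 0.5, 0.3]           # theta_3 = 1 - theta_1 - theta_2
f = [2, 5, 3]                     # f_3 = n - f_1 - f_2

# Direct evaluation of (13.5.8).
p_direct = (factorial(n) / prod(factorial(fi) for fi in f)) \
           * prod(t ** fi for t, fi in zip(theta, f))

# The same value from scipy's multinomial pmf.
p_scipy = stats.multinomial.pmf(f, n=n, p=theta)
print(p_direct, p_scipy)          # equal, up to floating-point error
```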
k 1
Now, just as in the case of k = 2 (k − 1 = 1), where we are dealing with a binomial random variable, it can be proved that the distribution (13.5.8) is well approximated for large n by a certain (k − 1)-dimensional normal distribution (see Chapter 6 for a discussion of the two-dimensional normal distribution, called the bivariate normal). It may be proved that, for large n, we have

$$\sum_{i=1}^{k} \frac{(f_i - n\theta_i)^2}{n\theta_i} \approx \chi^2_{k-1}, \qquad \text{where } f_k = n - \sum_{i=1}^{k-1} f_i \text{ and } \theta_k = 1 - \sum_{i=1}^{k-1} \theta_i \tag{13.5.9}$$

This, for k = 2, is the result presented at the beginning of this section.
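Result (13.5.9) is what underlies the chi-square goodness-of-fit statistics used earlier in this chapter. As an illustration (the observed counts and hypothesized $\theta_i$ below are invented), scipy.stats.chisquare computes the statistic and refers it to $\chi^2_{k-1}$:

```python
from scipy import stats

# Invented illustrative data: k = 4 categories.
f_obs = [28, 35, 31, 26]                  # observed frequencies f_i
theta = [0.25, 0.25, 0.25, 0.25]          # hypothesized theta_i
n = sum(f_obs)
f_exp = [n * t for t in theta]            # expected counts n * theta_i

# Statistic (13.5.9), referred to chi-square with k - 1 = 3 df.
result = stats.chisquare(f_obs, f_exp)
print(result.statistic, result.pvalue)
```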
Now, when the $n\theta_i = E(f_i)$, $i = 1, \ldots, k$, are unknown and we make c estimates, $c < k - 1$, based on the sample to furnish a set of estimates of the $n\theta_i$'s, we are placing c further restrictions on the $f_i$'s (see, for example, Section 13.3.2 or 13.3.3). These restrictions then result in the loss of c additional degrees of freedom, in addition to the loss of the one degree of freedom imposed by the restriction $\sum_{i=1}^{k} f_i = n$; the statistic is then approximately distributed as $\chi^2_{k-1-c}$.
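In software this adjustment is typically made explicitly; for instance, scipy.stats.chisquare accepts a ddof argument that reduces the reference degrees of freedom from k − 1 to k − 1 − ddof. A minimal sketch, with invented counts and assuming a Poisson model whose mean is estimated from the sample (so c = 1):

```python
import numpy as np
from scipy import stats

# Invented counts of 0, 1, 2, 3+ events in 100 intervals (k = 4 cells).
f_obs = np.array([30, 38, 22, 10])
values = np.array([0, 1, 2, 3])           # 3 stands in for "3 or more"

# Estimate the Poisson mean from the sample: c = 1 estimated parameter.
lam = (values * f_obs).sum() / f_obs.sum()

# Expected cell probabilities under the fitted Poisson model.
p = stats.poisson.pmf([0, 1, 2], lam)
p = np.append(p, 1 - p.sum())             # lump the tail into the last cell
f_exp = f_obs.sum() * p

# ddof=1 reduces the degrees of freedom from k - 1 = 3 to k - 1 - c = 2.
result = stats.chisquare(f_obs, f_exp, ddof=1)
print(result.statistic, result.pvalue)
```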
Chapter 15
15.4.4 Prediction Interval for a Future Observation Y with Confidence Coefficient $(1 - \alpha)$
Suppose we are working with the regression model $E(Y \mid X) = \beta_0 + \beta_1 X$, and we find the (least-squares) regression line

$$\hat{Y} = b_0 + b_1 X$$

based on the data $(X_1, Y_1), \ldots, (X_n, Y_n)$, where the independent $Y_i$'s are such that $Y_i \sim N(\beta_0 + \beta_1 X_i,\, \sigma^2)$. Having found the regression line, we may be interested in predicting the value of a future observation, Y, to be generated at X, independent of $(X_i, Y_i)$, $i = 1, \ldots, n$, where we assume that $Y \sim N(\beta_0 + \beta_1 X,\, \sigma^2)$. Of course, $E(Y \mid X) = \beta_0 + \beta_1 X = E(\hat{Y} \mid X)$, so that $\hat{Y} = b_0 + b_1 X$ is a point estimate of the future observation Y. To find a prediction interval for Y, we consider first the random variable $Y - \hat{Y}$. We have that

$$E(Y - \hat{Y}) = E(Y) - E(\hat{Y}) = 0 \tag{15.4.15}$$
and

$$V(Y - \hat{Y}) = V(Y \mid X) + V(\hat{Y} \mid X) = \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}\right],$$

or

$$V(Y - \hat{Y}) = \sigma^2\left[1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}\right] \tag{15.4.16}$$

Since we are assuming normality for $(Y_1, Y_2, \ldots, Y_n)$, we easily find that
$$(Y - \hat{Y}) \sim N\!\left(0,\; \sigma^2\left[1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}\right]\right) \tag{15.4.17}$$
Hence, by standardizing the random variable $(Y - \hat{Y})$, we find that

$$\frac{Y - \hat{Y}}{\sigma\sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{S_{XX}}}} \sim N(0, 1) \tag{15.4.18}$$
Now, replacing $\sigma$ in (15.4.18) by its estimator S, we have that

$$\frac{Y - \hat{Y}}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{S_{XX}}}} \sim t_{n-2} \tag{15.4.19}$$
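A small simulation can make (15.4.19) concrete. The following sketch (with invented X values, $\beta_0$, $\beta_1$, and $\sigma$) repeatedly fits the line, draws a new Y at a fixed X, forms the studentized prediction error, and compares its empirical quantiles to those of $t_{n-2}$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Invented setup: n = 12 design points, true beta0, beta1, sigma.
x = np.linspace(0, 10, 12)
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = len(x)
x_new = 4.0                               # the X at which Y is predicted
sxx = np.sum((x - x.mean()) ** 2)

t_vals = []
for _ in range(20_000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x.mean()
    resid = y - (b0 + b1 * x)
    s = np.sqrt(np.sum(resid ** 2) / (n - 2))       # the estimator S
    se = s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)
    y_future = beta0 + beta1 * x_new + rng.normal(0, sigma)
    t_vals.append((y_future - (b0 + b1 * x_new)) / se)

# Empirical 97.5th percentile vs. t with n - 2 = 10 df.
print(np.percentile(t_vals, 97.5))        # approximately 2.23
print(stats.t.ppf(0.975, df=n - 2))       # 2.228...
```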
From Equation (15.4.19), we have

$$P\left(-t_{n-2;\,\alpha/2} \le \frac{Y - \hat{Y}}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(X - \bar{X})^2}{S_{XX}}}} \le t_{n-2;\,\alpha/2}\right) = 1 - \alpha \tag{15.4.20}$$
so that we may write

$$P\left(\hat{Y} - t_{n-2;\,\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}} \;\le\; Y \;\le\; \hat{Y} + t_{n-2;\,\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}}\right) = 1 - \alpha \tag{15.4.21}$$
That is, the so-called prediction interval for Y, to be observed at X, having confidence coefficient $(1 - \alpha)$ is

$$\left(\hat{Y} \pm t_{n-2;\,\alpha/2}\, S\sqrt{1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{S_{XX}}}\right) \tag{15.4.21a}$$
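Putting the pieces together, here is a minimal sketch that computes (15.4.21a) from data; the x and y values below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Invented illustrative data.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 2.9, 3.6, 4.8, 5.1, 6.3, 6.8, 8.2])
x_new, alpha = 5.5, 0.05                  # predict Y at X = 5.5, 95% level

n = len(x)
sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # least-squares slope
b0 = y.mean() - b1 * x.mean()                        # least-squares intercept
y_hat = b0 + b1 * x_new                              # point prediction

# S estimates sigma, with n - 2 degrees of freedom.
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))

# Prediction interval (15.4.21a).
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
half_width = t_crit * s * np.sqrt(1 + 1 / n + (x_new - x.mean()) ** 2 / sxx)
print(y_hat - half_width, y_hat + half_width)
```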