Toro-Vizcarrondo, Carlos E. (1968). "Multicollinearity and the mean square error criterion in multiple regression: a test and some sequential estimator comparisons."

MULTICOLLINEARITY AND THE MEAN SQUARE ERROR
CRITERION IN MULTIPLE REGRESSION:
A TEST AND SOME SEQUENTIAL
ESTIMATOR COMPARISONS

by

Carlos Enrique Toro-Vizcarrondo

Institute of Statistics
Mimeograph Series No. 588
August 1968
TABLE OF CONTENTS

LIST OF TABLES

INTRODUCTION
    The Model
    Stepwise Regression
    Specification Error
    Multicollinearity
    Preliminary Test of Significance and the Mean Square Error

TWO ESTIMATORS AND A CRITERION
    Purpose
    Multicollinearity and the Mean Square Error Criterion--Two Regressors
    The "t-ratio" Test and the Mean Square Error Criterion--Two Regressors
    The Restricted and Full Least Squares Estimators

VARIANCE (MEAN SQUARE) OF A LINEAR FUNCTION, GENERALIZED VARIANCE (MEAN SQUARE), AND THE MEAN SQUARE CRITERION
    Variance of a Linear Function and the Generalized Variance
    Mean Square of a Linear Function and the Generalized Mean Square

THE NONCENTRAL F DISTRIBUTION AND THE MEAN SQUARE ERROR CRITERION
    Preliminaries
    A Uniformly Most Powerful Test for Multicollinearity
    Comparison of the Usual F Test and the Mean Square Error Criterion Test
    Minimum Variance Unbiased Estimator of λ

CRITICAL POINTS AND VALUES OF THE POWER OF THE "MULTICOLLINEARITY" TEST FOR ONE DEGREE OF FREEDOM IN THE NUMERATOR
    Critical Points
    Power Values
    Some Comments about the Critical Points and Power Tables

ESTIMATION AFTER PRELIMINARY TESTS OF SIGNIFICANCE
    Sequential Estimation
    Sequential Estimator Bias
    Sequential Estimator Mean Square

SUMMARY, CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH
    Summary
    Conclusions
    Suggestions for Further Research

LIST OF REFERENCES

APPENDICES
    APPENDIX A. Computer Program for the "Multicollinearity" Test Critical Points
    APPENDIX B. Computer Program for the Evaluation of Power Values
    APPENDIX C. Fortran Statement for the Evaluation of Sequential Estimators Bias and Mean Squares
LIST OF TABLES

1. Noncentral beta and noncentral F "multicollinearity" test critical points for one degree of freedom in the numerator
2. Largest values of i used in computing the power for λ
3. Power values for the "multicollinearity" test at 5 percent significance level
4. Power values for the "multicollinearity" test at 10 percent significance level
5. Power values for the "multicollinearity" test at 25 percent significance level
6. Power values for the "multicollinearity" test at 50 percent significance level
7. Comparison of Tang's 1 percent and "multicollinearity" 5 percent critical points
8. Comparison of critical points for selected denominator degrees of freedom when β(0) is approximately .01
9. Comparison of F critical points for selected denominator degrees of freedom when β(0) is approximately .025
10. Comparison of F critical points for selected denominator degrees of freedom when β(0) is approximately .10
11. Critical points used in the evaluation of the sequential estimator's bias
12. Powers for the "50 percent" test for selected values of the noncentrality parameter for one degree of freedom in the numerator
13. Values of λ and range i (in parentheses) used in computing the bias and mean square error of the sequential estimator of β₁
14. The bias in estimating β₁ via the "mean square," "usual," and "50 percent" sequential estimators
15. The mean squares of the "mean square," "usual," and "50 percent" sequential estimators
CHAPTER I
INTRODUCTION
The Model
The motivation of this thesis can be found in the following interrelated topics: stepwise regression, specification error, multicollinearity, minimum mean square estimators, and estimators after preliminary tests of significance.
Consider the linear model of full rank

(1)    Y = Xβ + ε,

where Y is an n×1 column vector of random variables whose variation is to be explained by the p nonstochastic regression vectors appearing as columns in the n×p matrix X.

The X matrix and β vector may be partitioned conformably so that (1) can be written as

(2)    Y = X₁β₁ + X₂β₂ + ε,

where X₁ is the n×k matrix of observations on the first k regressors x_i (i = 1, 2, …, k) with rank k, β₁ is the k×1 vector of coefficients of the first k regressors, X₂ is the n×m matrix of observations on the second m regressors x_j (j = k+1, k+2, …, k+m = p < n) with rank m, β₂ is the m×1 vector of coefficients of the second m regressors, and ε is the n×1 vector of disturbances distributed as a multivariate normal with mean vector 0 and covariance matrix σ²I. For convenience both the x's and the Y variables are assumed to have zero sample means.
Stepwise Regression

The stepwise least squares procedure first estimates β₁ as

(3)    b̃₁ = (X₁′X₁)⁻¹X₁′Y

and then regresses the residuals Y − X₁b̃₁ upon X₂ to estimate β₂ as

(4)    b̃₂ = (X₂′X₂)⁻¹X₂′(Y − X₁b̃₁).
Freund et al. (1961) examined the bias, estimates, and associated tests obtained by subjecting the residuals to a regression analysis. Goldberger and Jochems (1961) showed that the stepwise procedure underestimates both the regression coefficient of x₂ and the marginal contribution of x₂ to the explanation of y in the simple model of two regressors. The results were generalized for the estimate of β_{k+1} in the case of k+1 explanatory variables.
Goldberger (1961) presented the connection between the stepwise and the direct (classical estimator of β₁ including X₂) estimator of β₁ and concluded that the stepwise estimator (b̃₁) can be written as a function of the direct estimators (b₁, b₂), i.e.,

(5)    b̃₁ = b₁ + (X₁′X₁)⁻¹X₁′X₂ b₂.

From (5) it is easily seen that b̃₁ is biased by the factor (X₁′X₁)⁻¹X₁′X₂β₂ unless X₁ and X₂ are orthogonal or β₂ = 0. Zyskind (1963) established exact expressions for the difference between the "true" additional regression sum of squares due to X₂ obtained by direct regression analysis and the regression sum of squares due to additional variables obtained by residual analysis.
He also showed that a residual analysis test statistic considered by Freund et al. (1961) is subject to certain serious defects. Kabe (1963) gave the results obtained by Freund et al. (1961), Goldberger and Jochems (1961), and Goldberger (1961) for a multivariate regression model. Wallace (1964) showed that for the two-regressor case the stepwise estimator has a smaller variance than the direct estimator and that, using the criterion of relative mean square error, the stepwise estimators are "better" than the direct least squares estimators under certain circumstances.
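Goldberger's relation (5) between the stepwise and direct estimators is easy to verify numerically. The following sketch uses simulated data (the dimensions, seed, and coefficient values are illustrative assumptions, not taken from the thesis) and checks identity (5) with NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 2, 2
X1 = rng.normal(size=(n, k))
X2 = 0.6 * X1[:, :1] + rng.normal(size=(n, m))   # regressors correlated with X1
X1 = X1 - X1.mean(axis=0)                        # zero sample means, as assumed
X2 = X2 - X2.mean(axis=0)
beta1, beta2 = np.array([1.0, -2.0]), np.array([0.5, 0.3])
y = X1 @ beta1 + X2 @ beta2 + rng.normal(scale=0.5, size=n)

# Stepwise (restricted) estimator: regress y on X1 alone
b1_tilde = np.linalg.solve(X1.T @ X1, X1.T @ y)

# Full (direct) least squares on [X1 X2]
X = np.hstack([X1, X2])
b = np.linalg.solve(X.T @ X, X.T @ y)
b1, b2 = b[:k], b[k:]

# Identity (5): b1_tilde = b1 + (X1'X1)^{-1} X1'X2 b2
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
assert np.allclose(b1_tilde, b1 + A @ b2)
```

The identity is exact algebra, so the check holds for any data set, not merely on average.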
This thesis generalizes the results obtained by Wallace (1964). First, it is shown that when the mean square criterion is used, the selection between a stepwise estimator and a direct estimator as an estimate of β₁ is determined by a critical value of the noncentrality parameter of the noncentral F distribution. Second, a uniformly most powerful test is derived as a method for determining when to use the stepwise estimator. Third, a minimum variance unbiased estimator is obtained for the noncentrality parameter. Fourth, for various degrees of freedom and various significance levels, critical values of the noncentral F test are tabulated at the critical value of the noncentrality parameter. Fifth, the power of the test is tabulated over selected ranges of the relevant parameters.
Specification Error

The term specification error is used for the situation where the true hypothesis is (1) but we mistakenly work with

(6)    Y = X₁β₁ + V.

That is, we leave out variables which should be included. The stepwise estimator b̃₁ may be viewed as the direct estimator of β₁ in (6), taking V = X₂β₂ + ε as the "disturbance" vector. The stepwise estimator bias is attributable to the fact that there is a "specification error"; the classical model is not appropriate to (6) because the "disturbance" V does not have zero expectation. Griliches (1957) analyzed some common specification errors in Cobb-Douglas production function estimation; Theil (1961) presented some results on the problems of specification analysis; and Goldberger (1961) spelled out the connection between the stepwise and the direct estimator of β₁ as a special case of Theil's analysis of specification error. The problem of estimating β₁ by the stepwise estimator is analogous to misspecifying the model as (6) when the true model is (2). If our interest is to find the "best" estimate of β₁ between the stepwise and the direct estimator, the mean square error criterion may, under certain circumstances, lead us to specify the model as (6) and estimate β₁ as (X₁′X₁)⁻¹X₁′Y.
Multicollinearity

When linear relations exist among the observed values of the regressors we say multicollinearity exists. In Johnston (1963) and Goldberger (1964) multicollinearity is discussed in the limiting case where the correlation of the regressors in a two-regressor model is one. An intuitive explanation is also given for the case when the correlation is close to one. Wallace (1964) suggested that multicollinearity becomes a precise concept by allowing the mean square error criterion to tell us whether we are "better off" in estimating β₁ by ignoring x₂. In the more general model (2), the multicollinearity problem is whether or not to include X₂ in the estimation of β₁. Again the problem is analogous to deciding whether b̃₁ or b₁ is to be used.
Preliminary Test of Significance and the Mean Square Error

Stepwise regression, specification error, and multicollinearity are branches of one tree whose trunk is the mean square error concept. The purpose of this thesis is to give the researcher a practical procedure to decide when to use the stepwise estimator, or analogously when to misspecify the model, or delete variables from the model because they are collinear. Usually researchers make the selection between b̃₁ and b₁ using a preliminary test of significance. The most common is testing the null hypothesis β₂ = 0; when the hypothesis is accepted b̃₁ is used, when rejected b₁ is used. The bias involved in the estimation of β₁ after a preliminary test in a two-regressor model was found by Bancroft (1944). We propose an alternative preliminary test of significance (the mean square error test). A comparison will be given between the usual estimator and the mean square estimator in terms of bias and mean square in a two-regressor model.
CHAPTER II
TWO ESTIMATORS AND A CRITERION

Purpose

The purpose of this thesis is to find a "good" estimate of β₁ in the partitioned model (2). Two estimators are to be studied: the restricted least squares estimator (R.L.S.E.) of β₁ (generally known as the stepwise estimator) and the full least squares estimator (F.L.S.E.) of β₁ (sometimes called the direct least squares estimator). The R.L.S.E. ignores X₂ in the estimation of β₁, while the F.L.S.E. estimates β₁ including X₂ in the model. In the first section of this chapter the mean square error criterion is used in order to select between the R.L.S.E. and the F.L.S.E. when the model is simplified to the two-independent-variable case. An attempt to explain multicollinearity and comments about the usual procedure of deleting variables by using the t-test are presented. The restricted and full estimators are discussed in the subsequent sections.
Multicollinearity and the Mean Square Error Criterion--Two Regressors

Suppose the "true" model is

(7)    yᵢ = β₁x₁ᵢ + β₂x₂ᵢ + εᵢ,    i = 1, …, n.

The restricted estimator of β₁ in this simple model is

b̃₁ = Σx₁y / Σx₁²,

with mean β₁ + β₂(Σx₁x₂/Σx₁²) and variance σ²/Σx₁², and the full estimator is

b₁ = [Σx₁y − (Σx₁x₂/Σx₂²)Σx₂y] / [Σx₁²(1 − r₁₂²)],

with mean β₁ and variance σ²/[Σx₁²(1 − r₁₂²)], where r₁₂ is the "simple correlation" between x₁ and x₂;
i.e., the variance of b̃₁ is less than or equal to the variance of b₁, but since b̃₁ is biased, a more meaningful comparison can be made using mean square errors (M.S.E.). By making the comparison, Wallace (1964) showed that

(8)    Mean Square Error(b̃₁) ≶ Mean Square Error(b₁)  as  β₂² Σx₂²(1 − r₁₂²)/σ² ≶ 1.
The mean square criterion tells the investigator when he is "better off" in estimating β₁ by ignoring x₂, but even in the simple case of only two regressors, the criterion (8) is difficult to implement since it involves two parameters, β₂ and σ², that are unlikely to be known beforehand. The criterion also depends on the correlation between x₁ and x₂. When x₁ and x₂ are linearly dependent--"perfect multicollinearity"--the simple correlation coefficient is one. As r₁₂² gets very close to one, the variance of the F.L.S.E. for β₁ explodes. Any estimator with finite mean square will be better than b₁ as the correlation between x₁ and x₂ gets very close to one. If the two regressors are orthogonal--"zero multicollinearity"--the simple correlation coefficient is zero and the two estimators under consideration are equivalent (Goldberger, 1964, pp. 200-201).
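Criterion (8) can be illustrated with a small Monte Carlo sketch for the two-regressor model. The data-generating values below are illustrative assumptions; the check is that the restricted estimator attains the smaller simulated mean square error exactly when β₂²Σx₂²(1 − r₁₂²)/σ² is below one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 40, 4000
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.9 * x1 + 0.3 * rng.normal(size=n); x2 -= x2.mean()   # collinear pair
beta1, sigma = 1.0, 1.0
r12_sq = (x1 @ x2) ** 2 / ((x1 @ x1) * (x2 @ x2))

def mc_mse(beta2):
    """Monte Carlo mean square errors of the restricted and full estimators of beta1."""
    se_tilde = se_full = 0.0
    for _ in range(reps):
        y = beta1 * x1 + beta2 * x2 + rng.normal(scale=sigma, size=n)
        bt = (x1 @ y) / (x1 @ x1)                                   # restricted
        bf = ((x1 @ y) - (x1 @ x2) / (x2 @ x2) * (x2 @ y)) \
             / ((x1 @ x1) * (1.0 - r12_sq))                         # full
        se_tilde += (bt - beta1) ** 2
        se_full += (bf - beta1) ** 2
    return se_tilde / reps, se_full / reps

for beta2 in (0.0, 1.0):
    crit = beta2 ** 2 * (x2 @ x2) * (1.0 - r12_sq) / sigma ** 2     # criterion (8)
    mse_t, mse_f = mc_mse(beta2)
    assert (mse_t < mse_f) == (crit < 1.0)   # restricted wins iff criterion below one
```

The two β₂ values are chosen so that the criterion falls clearly on either side of one; near the boundary the two mean square errors are nearly equal and simulation noise would dominate.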
In some areas of research, say econometrics, the above situations rarely, if ever, exist.¹ Econometricians have to accept the experiments performed by nature, and these experiments are likely to feature collinearity rather than orthogonality. In practice an exact linear relationship is highly improbable, but the general interdependence of economic phenomena may result in the appearance of approximate linear relationships in the x's. This phenomenon is known as multicollinearity or intercorrelation (Goldberger, 1964; Johnston, 1963). Excluding the limiting cases, use of the mean square error criterion makes multicollinearity a precise concept.

¹In controlled laboratory experiments, orthogonality of the sample observations on the x's can be achieved.
If β₂² Σx₂²(1 − r₁₂²)/σ² ≤ 1, we should, according to the M.S.E. criterion, use the restricted estimator (b̃₁). Rearranging the inequality, we have

r₁₂² ≥ 1 − σ²/(β₂² Σx₂²).

Thus multicollinearity is a problem when r₁₂² is greater than or equal to a quantity that depends on the parameters β₂ and σ² (Σx₂² is a known constant). Since the square of the correlation coefficient is bounded by zero and one, we should throw away x₂ whenever

(9)    1 − σ²/(β₂² Σx₂²) ≤ r₁₂² < 1.

So a precise statement of "multicollinearity" is obtained, but it is a rather inconclusive one--it requires knowledge about two unknown parameters, i.e., β₂ and σ².
The "t-ratio" Test and the Mean Square Error Criterion--Two Regressors

Researchers generally will approach the problem of including or excluding the variable x₂ by using a "t-ratio." The question the researcher asks, formally described, is the verbal hypothesis that "y does not vary with x₂," which in turn is the hypothesis β₂ = 0. The so-called "t-ratio" b₂/s(b₂) is the test statistic appropriate to this hypothesis (s(b₂) is the sample standard deviation of b₂). The use of the t-ratio implicitly assumes that the researcher is interested in the prediction of y, not in the estimation of β₁. If the hypothesis β₂ = 0 is accepted, the dependent variable y is estimated by x₁b̃₁, and when β₂ = 0 is rejected, y is estimated by x₁b₁ + x₂b₂. However, if the researcher is interested in finding the minimum mean square estimator between b̃₁ and b₁, the following mean square rule should be followed: use b̃₁ if

−√V(b₂) ≤ β₂ ≤ √V(b₂);

use b₁ otherwise. Therefore, β₂ = 0 is included in the mean square rule, but x₂ might be deleted for other values of β₂ as long as |β₂| ≤ √V(b₂).
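The mean square rule above can be sketched as a simple decision function. Note that it presumes β₂ and σ² known, which is exactly the practical difficulty discussed in this chapter; all numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.8 * x1 + 0.4 * rng.normal(size=n); x2 -= x2.mean()
sigma2 = 1.0   # treated as known here, which is the rule's practical weakness
r12_sq = (x1 @ x2) ** 2 / ((x1 @ x1) * (x2 @ x2))
var_b2 = sigma2 / ((x2 @ x2) * (1.0 - r12_sq))   # Var(b2) in the full model

def use_restricted(beta2):
    # Mean square rule: keep b1_tilde whenever |beta2| <= sqrt(Var(b2))
    return abs(beta2) <= np.sqrt(var_b2)

assert use_restricted(0.0)                         # beta2 = 0 lies inside the rule
assert use_restricted(0.5 * np.sqrt(var_b2))       # ...as do small nonzero values
assert not use_restricted(10.0 * np.sqrt(var_b2))  # large beta2: use the full estimator
```

In contrast, the t-ratio deletes x₂ only when the data fail to reject β₂ = 0, so the two rules need not agree for small nonzero β₂.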
The Restricted and Full Least Squares Estimators

In the previous sections we considered the mean square error criterion in the simple case of two regressors. In the next chapter the mean square error criterion is extended to the estimation of β₁ in the general partitioned model described in (2). This section presents properties of the restricted and full estimators that are needed in the generalization of the mean square error criterion.

If X₂ is ignored, β₁ is estimated from the estimation model

(10)    Y = X₁β₁ + V,

where V = X₂β₂ + ε.
The restricted estimator for β₁ is obtained by applying the usual least squares procedure to equation (10). This yields the estimator

(11)    b̃₁ = (X₁′X₁)⁻¹X₁′Y.

In (5) we saw that the restricted estimator of β₁ may be a biased estimator. Equation (12) is obtained by substituting (2) into (11):

(12)    b̃₁ = β₁ + Aβ₂ + (X₁′X₁)⁻¹X₁′ε,

where A = (X₁′X₁)⁻¹X₁′X₂. The columns of A can be taken to be coefficients from auxiliary regressions of each variable in X₂ upon the set of variables contained in X₁. It is evident from (12) that b̃₁ is a biased estimator with corresponding bias equal to Aβ₂ unless X₁ and X₂ are orthogonal, i.e., X₁′X₂ = 0 (k×m), or β₂ = 0 (Freund et al., 1961; Goldberger and Jochems, 1961; and Goldberger, 1961).
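The auxiliary-regression interpretation of A can be confirmed numerically: each column of A = (X₁′X₁)⁻¹X₁′X₂ is exactly the coefficient vector from regressing the corresponding variable in X₂ on X₁. The data below are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 60, 3, 2
X1 = rng.normal(size=(n, k))
X2 = X1 @ rng.normal(size=(k, m)) + rng.normal(size=(n, m))

# A = (X1'X1)^{-1} X1'X2, the bias-transmission matrix in (12)
A = np.linalg.solve(X1.T @ X1, X1.T @ X2)

# Column j of A equals the coefficients of the auxiliary regression of the
# j-th variable in X2 on the variables in X1
for j in range(m):
    aux_coef, *_ = np.linalg.lstsq(X1, X2[:, j], rcond=None)
    assert np.allclose(A[:, j], aux_coef)
```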
Now let us turn to examining the second-order moments of the estimators and denote the covariance matrix of b̃₁ as Σ(b̃₁) and its mean square error matrix as Ω(b̃₁).²

²A mean square error matrix for a vector of estimators, say θ̂, is E[(θ̂ − θ)(θ̂ − θ)′] = covariance matrix + (bias vector)(bias vector)′, where E stands for expected value. If θ̂ is unbiased, the mean square error matrix is equivalent to the covariance matrix.

The covariance matrix of b̃₁ is:

(13)    Σ(b̃₁) = σ²(X₁′X₁)⁻¹.

And the mean square error matrix of b̃₁ is:

(14)    Ω(b̃₁) = σ²(X₁′X₁)⁻¹ + Aβ₂β₂′A′.
As is well known, the full least squares estimator (F.L.S.E.) for β is b = (X′X)⁻¹X′Y or, in the partitioned form that is used in this thesis,

(15)    (b₁, b₂)′ = [X₁′X₁  X₁′X₂; X₂′X₁  X₂′X₂]⁻¹ (X₁′Y, X₂′Y)′.

The above can be written as:

(16)    b₁ = b̃₁ − A b₂   and   b₂ = F₂⁻¹X₂′[I − X₁(X₁′X₁)⁻¹X₁′]Y,

where F₂ = X₂′X₂ − X₂′X₁(X₁′X₁)⁻¹X₁′X₂, employing a partitioned inversion procedure (Goldberger, 1964, p. 174). The properties of b are summed up in the following well-known theorem (Graybill, 1961, p. 113).
Theorem: If Y = Xβ + ε is a general linear hypothesis model of full rank and if ε is distributed N(0, σ²I), the estimators

b = (X′X)⁻¹X′Y   and   σ̂² = Y′[I − X(X′X)⁻¹X′]Y / (n−p)

have the following properties:

(1) Consistent
(2) Efficient
(3) Unbiased
(4) Sufficient
(5) b is distributed N(β, σ²(X′X)⁻¹)
(6) Complete
(7) Minimum Variance Unbiased
(8) (n−p)σ̂²/σ² is distributed χ²(n−p)
(9) b and σ̂² are independent

The F.L.S.E. of β₁ reduces to

(17)    b₁ = b̃₁ − A b₂,

which in view of the definition of the R.L.S.E. (b̃₁) in (11) and the definition of the F.L.S.E. (b₂) in (16) gives equation (5), where A has previously been defined. From (17) we can see that b₁ = b̃₁ when X₁ and X₂ are orthogonal.

The F.L.S.E. is unbiased; therefore the covariance and generalized mean square matrices of b₁ are identical. Using similar notation for b₁ as for b̃₁, the following result is obtained directly from (16):

(18)    Ω(b₁) = Σ(b₁) = σ²[(X₁′X₁)⁻¹ + A F₂⁻¹A′].
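Identities (17) and (18) can be checked numerically against the partitioned-inverse formula; the simulated data below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, m = 80, 2, 2
X1 = rng.normal(size=(n, k))
X2 = X1 @ rng.normal(size=(k, m)) + rng.normal(size=(n, m))
X = np.hstack([X1, X2])
y = rng.normal(size=n)

A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
F2 = X2.T @ X2 - X2.T @ X1 @ A        # F2 = X2'X2 - X2'X1(X1'X1)^{-1}X1'X2

b = np.linalg.solve(X.T @ X, X.T @ y)
b1, b2 = b[:k], b[k:]
b1_tilde = np.linalg.solve(X1.T @ X1, X1.T @ y)

# (17): the full estimator of beta1 equals the restricted one minus A b2
assert np.allclose(b1, b1_tilde - A @ b2)

# (18): upper-left block of (X'X)^{-1} equals (X1'X1)^{-1} + A F2^{-1} A'
G = np.linalg.inv(X.T @ X)
assert np.allclose(G[:k, :k],
                   np.linalg.inv(X1.T @ X1) + A @ np.linalg.inv(F2) @ A.T)
```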
CHAPTER III
VARIANCE (MEAN SQUARE) OF A LINEAR FUNCTION, GENERALIZED VARIANCE (MEAN SQUARE), AND THE MEAN SQUARE CRITERION

Variance of a Linear Function and the Generalized Variance

In the simple model of two regressors the variance of the restricted estimator is less than or equal to the variance of the full estimator. In looking at the general partitioned model (2), the following criterion is used to compare the restricted estimator, b̃₁′ = (b̃₁, b̃₂, …, b̃ₖ), with the full estimator, b₁′ = (b₁, b₂, …, bₖ). The restricted estimator is defined to be preferable from the viewpoint of efficiency if the variance of any arbitrary linear function of the full estimator is greater than or equal to the variance of that same arbitrary linear function of the restricted estimator. Denoting the variance by Var, the criterion can be written as: the restricted estimator is preferable if

(19)    Var(c′b₁) ≥ Var(c′b̃₁)

for all c except c = 0, where c is a k×1 vector. From (19) the following inequality is obtained:³

(20)    c′[Σ(b₁) − Σ(b̃₁)]c ≥ 0  for all c (k×1),

i.e., Σ(b₁) − Σ(b̃₁) is positive semidefinite. So the variance of an arbitrary linear function criterion asks whether the covariance matrix of the full estimator minus the covariance matrix of the restricted estimator is positive semidefinite.

³Var(c′θ̂) = E[c′θ̂ − E(c′θ̂)][c′θ̂ − E(c′θ̂)]′ = c′E[θ̂ − E(θ̂)][θ̂ − E(θ̂)]′c = c′Σ(θ̂)c.
Using (13) and (18) the difference of the covariance matrices can be written as:

(21)    Σ(b₁) − Σ(b̃₁) = σ² A F₂⁻¹A′,

where A and F₂ are previously defined. Since F₂⁻¹ is positive definite (σ²F₂⁻¹ is the covariance matrix of the full estimator b₂), A F₂⁻¹A′ is positive semidefinite (Goldberger, 1964, p. 37). Therefore the restricted estimator is preferable in terms of variance of a linear function. Furthermore, the variance of a linear function criterion implies the widely used method of comparing two sets of estimators known as the Generalized Variance Ratio.⁴
Since Σ(b₁) and Σ(b̃₁) are positive definite matrices and Σ(b₁) − Σ(b̃₁) is positive semidefinite, we can conclude by using a well-known theorem (Graybill, 1961, p. 6) that |Σ(b₁)| ≥ |Σ(b̃₁)|. Further, the ratio |Σ(b̃₁)|/|Σ(b₁)| is bounded by zero and one; hence the ratio provides a convenient percentage comparison of efficiency.
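A quick numerical illustration of the generalized variance ratio, built from the covariance expressions (13) and (18); the data are simulated and σ² is set to one for convenience:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, m = 80, 3, 2
X1 = rng.normal(size=(n, k))
X2 = X1 @ rng.normal(size=(k, m)) + 0.5 * rng.normal(size=(n, m))

A = np.linalg.solve(X1.T @ X1, X1.T @ X2)
F2 = X2.T @ X2 - X2.T @ X1 @ A

sigma2 = 1.0
cov_restricted = sigma2 * np.linalg.inv(X1.T @ X1)                 # (13)
cov_full = cov_restricted + sigma2 * A @ np.linalg.inv(F2) @ A.T   # (18)

# Generalized variance ratio: |cov_restricted| / |cov_full| lies in (0, 1]
ratio = np.linalg.det(cov_restricted) / np.linalg.det(cov_full)
assert 0.0 < ratio <= 1.0
```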
Mean Square of a Linear Function and the Generalized Mean Square

A comparison of the restricted estimator and the full estimator based on both bias and variance must then rest on the personal preferences of the researcher.⁴ If bias is considered to be extremely important, the full estimator would be used, while if the variance criterion is given great weight the restricted estimator would be used. One criterion, encompassing both bias and variance, is the mean square error criterion. Since we are dealing with vectors of estimators, the criterion is reduced to a scalar by looking at the mean square of an arbitrary linear function of the estimators. The criterion is the following:

(22)    M.S.E.(c′b₁) ≥ M.S.E.(c′b̃₁)

for all c (k×1) except c = 0; if (22) holds, the restricted estimator would be preferable. The above inequality can be written as

(23)    c′[Ω(b₁) − Ω(b̃₁)]c ≥ 0

for all c, which implies that the mean square matrix of the full estimator minus the mean square matrix of the restricted estimator must be positive semidefinite. Using (14) and (18) this difference is written as

(24)    Ω(b₁) − Ω(b̃₁) = σ² A F₂⁻¹A′ − Aβ₂β₂′A′.

Since Ω(b₁) and Ω(b̃₁) are symmetric and Ω(b̃₁) is positive definite,⁵ if Ω(b₁) − Ω(b̃₁) is positive semidefinite then we can conclude by a well-known theorem (Graybill, 1961, p. 6) that |Ω(b₁)| ≥ |Ω(b̃₁)|. So the mean square error of a linear function criterion implies what can be called the Generalized Mean Square Ratio Criterion.

⁴If θ̂ is a random vector with covariance matrix Σ, then the determinant of Σ, |Σ|, is called the generalized variance of θ̂. The estimator θ̂ is preferable over θ̃ if |Σ(θ̂)| ≤ |Σ(θ̃)|. A geometric interpretation of the generalized variance is given in T. W. Anderson (1964).

⁵The mean square matrix of the restricted estimator is σ²(X₁′X₁)⁻¹ + Aβ₂β₂′A′, where the first term is positive definite (Graybill, 1961, p. 4) and the second term is positive semidefinite (Graybill, 1961, p. 4; Goldberger, 1964, p. 37).
Using the more general M.S.E. criterion, whether to include X₂ to estimate β₁ depends on the properties of A(F₂⁻¹ − (1/σ²)β₂β₂′)A′, since the positive semidefinite property is not changed when (24) is multiplied by the positive scalar 1/σ². When it is positive semidefinite we will estimate β₁ using only X₁ by means of the restricted estimator. We shall present two cases for which the difference in the mean square error matrices is positive semidefinite.

I. A sufficient condition for A(F₂⁻¹ − (1/σ²)β₂β₂′)A′ to be positive semidefinite is that F₂⁻¹ − (1/σ²)β₂β₂′ be positive semidefinite (Goldberger, 1964, p. 37).

II. If m ≤ k and the rank of A is equal to m, a necessary and sufficient condition for A(F₂⁻¹ − (1/σ²)β₂β₂′)A′ to be positive semidefinite is that F₂⁻¹ − (1/σ²)β₂β₂′ be positive semidefinite.
The sufficiency part of II is proven immediately by usage of I. To show necessity we assume A(F₂⁻¹ − (1/σ²)β₂β₂′)A′ is positive semidefinite and the rank of A = (X₁′X₁)⁻¹X₁′X₂ is m, where A is a k×m matrix and m ≤ k.⁶ The A′ matrix consists of m linearly independent column vectors (say, e.g., the first m columns) and k−m column vectors that can be written as linear combinations of the first m column vectors, since its rank is m. Since any m linearly independent vectors from m-space span the entire space, we obtain all m-vectors d by changing the values of the k-vector c in the equation A′c = d. So we can conclude that under the assumptions of II:

(25)    c′A(F₂⁻¹ − (1/σ²)β₂β₂′)A′c ≥ 0  for all c (k×1)

implies that

(26)    d′(F₂⁻¹ − (1/σ²)β₂β₂′)d ≥ 0  for all d (m×1),

i.e., F₂⁻¹ − (1/σ²)β₂β₂′ is positive semidefinite.

⁶Rank of A ≤ min(m, k). Here we assume it is exactly equal to m.
It is not too optimistic to expect that in most econometrics problems the matrix A will be of rank m. Therefore, the condition that F₂⁻¹ − (1/σ²)β₂β₂′ be positive semidefinite will be a necessary and sufficient condition for deleting X₂ from the "true model" if our criterion is to select, between the full and the restricted estimator, the one that minimizes the mean square of an arbitrary linear function of the estimates of β₁.
We can arrange (26) as

(27)    (1/σ²) d′β₂β₂′d ≤ d′F₂⁻¹d  for all d (m×1).

Since F₂⁻¹ is positive definite and β₂ is an m-vector,

(28)    sup_d (d′β₂β₂′d)/(d′F₂⁻¹d) = β₂′F₂β₂,

where the supremum is attained at d* = F₂β₂ (Rao, 1965, p. 48). Using equations (27) and (28) we can show that F₂⁻¹ − (1/σ²)β₂β₂′ is positive semidefinite if and only if

(29)    β₂′F₂β₂/σ² ≤ 1.
Sufficiency: Let d′(F₂⁻¹ − (1/σ²)β₂β₂′)d ≥ 0 for all d. Since (1/σ²)(d′β₂β₂′d)/(d′F₂⁻¹d) ≤ 1 for all d, the inequality will be satisfied for the particular value d* = F₂β₂, i.e., β₂′F₂β₂/σ² ≤ 1.

Necessity: Let β₂′F₂β₂/σ² ≤ 1. From (28), (1/σ²)(d′β₂β₂′d)/(d′F₂⁻¹d) ≤ 1 for all d, i.e., F₂⁻¹ − (1/σ²)β₂β₂′ is a positive semidefinite matrix.

Since F₂ is positive definite, the expression β₂′F₂β₂/σ² is greater than zero for all nonnull β₂ vectors; it is zero only if β₂ = 0. Thus, equation (29) can be written as

(30)    0 ≤ β₂′F₂β₂/σ² ≤ 1.
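The equivalence established in (29) can be spot-checked numerically by scaling β₂ so that the quadratic form lies below or above one. The matrix F₂ below is an arbitrary positive definite matrix, purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)
m = 3
M = rng.normal(size=(m, m))
F2 = M @ M.T + m * np.eye(m)          # an arbitrary positive definite "F2"
F2_inv = np.linalg.inv(F2)
sigma2 = 1.0

def is_psd(S, tol=1e-10):
    """Positive semidefinite check via the eigenvalues of a symmetric matrix."""
    return bool(np.all(np.linalg.eigvalsh(S) >= -tol))

for target in (0.5, 2.0):             # quadratic form forced below / above one
    beta2 = rng.normal(size=m)
    beta2 = beta2 * np.sqrt(target * sigma2 / (beta2 @ F2 @ beta2))
    crit = beta2 @ F2 @ beta2 / sigma2
    psd = is_psd(F2_inv - np.outer(beta2, beta2) / sigma2)
    assert psd == (crit <= 1.0)       # the equivalence stated in (29)
```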
In the next chapter we present the relationship of β₂′F₂β₂/σ² with the noncentral F distribution that arises in multiple regression problems and develop a test of hypothesis for (30). We conclude this chapter with two simple examples.
I. Using only two regressors, the inequality (30) simplifies to

(31)    0 ≤ β₂² Σx₂²(1 − r₁₂²)/σ² ≤ 1.

Rearranging terms and using the definition of the variance of b₂, the inequality can be written as

(32)    0 ≤ β₂²/Var(b₂) ≤ 1.

II. When we are interested in deciding to include or exclude only one regressor, say X₂ = x_{k+1}, the mean square error criterion (30) simplifies to

(33)    0 ≤ β²_{k+1} Σx²_{k+1}(1 − R²_{k+1·1,2,…,k})/σ² ≤ 1,

where R²_{k+1·1,2,…,k} is the "coefficient of determination" (Wallace, 1964, p. 1182). Since Var(b_{k+1}) = σ²/[Σx²_{k+1}(1 − R²_{k+1·1,2,…,k})], the above inequality can be written as

(34)    0 ≤ β²_{k+1}/Var(b_{k+1}) ≤ 1.
In the two situations described above the mean square error criteria (32) and (34) are necessary and sufficient conditions without making the assumption found in footnote 6. The mean square error criterion, (22), requires that the difference of the mean square matrices be positive semidefinite. In the above situations the mean square error criterion simplifies to the requirement that a scalar be nonnegative. For two regressors the M.S.E. criterion simplifies to

(35)    σ²/[Σx₁²(1 − r₁₂²)] − σ²/Σx₁² − β₂²(Σx₁x₂/Σx₁²)² ≥ 0;

(35) is equivalent to (32). For including or excluding one regressor the M.S.E. criterion simplifies to the requirement that

(36)    [σ²F₂⁻¹ − β²_{k+1}] A A′

be positive semidefinite. Since the matrix A A′ in (36) is positive semidefinite (Graybill, 1961, p. 4), the M.S.E. criterion as stated in (36) is equivalent to (34).
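The equivalence of the scalar two-regressor criterion with (32) can be verified numerically from the variance and bias formulas of Chapter II; all numbers below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50
x1 = rng.normal(size=n); x1 -= x1.mean()
x2 = 0.7 * x1 + 0.5 * rng.normal(size=n); x2 -= x2.mean()
sigma2 = 1.0
r12_sq = (x1 @ x2) ** 2 / ((x1 @ x1) * (x2 @ x2))
var_b2 = sigma2 / ((x2 @ x2) * (1.0 - r12_sq))

for beta2 in (0.1, 5.0):
    var_full = sigma2 / ((x1 @ x1) * (1.0 - r12_sq))   # Var(b1); b1 is unbiased
    bias = beta2 * (x1 @ x2) / (x1 @ x1)               # bias of the restricted b1~
    mse_restricted = sigma2 / (x1 @ x1) + bias ** 2
    # scalar criterion MSE(b1) - MSE(b1~) >= 0 holds iff beta2^2/Var(b2) <= 1, i.e. (32)
    assert (var_full - mse_restricted >= 0.0) == (beta2 ** 2 / var_b2 <= 1.0)
```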
CHAPTER IV
THE NONCENTRAL F DISTRIBUTION AND THE MEAN SQUARE ERROR CRITERION

Preliminaries

We found in the previous chapter that the mean square error criterion to include or exclude X₂ reduced to finding out if (30) is satisfied. We showed that, under the linear model of full rank, (30) is a necessary and sufficient condition when we are dealing with a two-regressor model or m = 1. When m ≥ 2, we must further assume that the rank of A is m in order to have (30) as necessary and sufficient. Using the mean square error criterion, the sets of variables X₁ (n×k) and X₂ (n×m) are defined to be "multicollinear" whenever β₂′F₂β₂/σ² is bounded by 0 and 1. When "multicollinearity" exists, we proceed to estimate β₁ by the restricted estimator, deleting X₂ from the model.

Since the mean square error criterion is a function of the unknown parameters β₂ and σ², it is impossible to compute the value of β₂′F₂β₂/σ² for a specific situation. However, the situation is not hopeless since, as is shown in the next section, (1/2σ²)(β₂′F₂β₂) is the noncentrality parameter of a widely used statistic, and we can turn to the uniformly most powerful test of λ = (1/2σ²)(β₂′F₂β₂) being less than or equal to one-half against the alternative of being greater than one-half.
A Uniformly Most Powerful Test for Multicollinearity

Usually researchers delete or include X₂ by testing the null hypothesis β₂ = 0. The hypothesis β₂ = 0 is equivalent to (1/2σ²)(β₂′F₂β₂) being zero, since F₂ is positive definite. The usual (likelihood ratio or conditional error) test statistic for the null hypothesis, β₂ = 0, under the model described in Chapter I is

(37)    u = (n−p)Q₁/(m Q₀),

where Q₁ is the quadratic form b₂′F₂b₂ and Q₀ is the quadratic form Y′[I − X(X′X)⁻¹X′]Y. The quadratic form Q₁/σ² is distributed as a noncentral chi-square with m degrees of freedom and noncentrality parameter λ = (1/2σ²)(β₂′F₂β₂). The quadratic form Q₀/σ² is distributed as a central chi-square with n−p degrees of freedom. Since Q₁ and Q₀ are independent, the random variable u is distributed as a noncentral F with m and n−p degrees of freedom and noncentrality parameter λ (Graybill, 1961, p. 135).
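Critical points of the test of λ ≤ 1/2 can be approximated by simulation. This is a sketch, not the thesis's Appendix A program, and it assumes that NumPy's `noncentral_f` uses the noncentrality convention nonc = β₂′F₂β₂/σ², i.e. nonc = 2λ in this thesis's parameterization:

```python
import numpy as np

rng = np.random.default_rng(8)
m, q, alpha = 2, 20, 0.05

# Draws of u at the boundary lambda = 1/2 of the null hypothesis
# (numpy convention assumed: nonc = 2 * lambda)
boundary = rng.noncentral_f(m, q, nonc=1.0, size=200_000)
c_multi = np.quantile(boundary, 1.0 - alpha)   # "multicollinearity" critical point

# The usual test has null lambda = 0, i.e. the central F
central = rng.f(m, q, size=200_000)
c_usual = np.quantile(central, 1.0 - alpha)

# Testing lambda <= 1/2 instead of lambda = 0 pushes the critical point up
assert c_multi > c_usual
```

The thesis's tabulated critical points (Table 1) are the exact analogues of `c_multi` for one numerator degree of freedom.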
The following theorem is extremely helpful in showing that the test statistic u, distributed as a noncentral F, provides us with a uniformly most powerful (U.M.P.) test for "multicollinearity," i.e., λ ≤ 1/2. Since λ is nonnegative, the "multicollinearity" criterion 0 ≤ λ ≤ 1/2 is simply λ ≤ 1/2.

Theorem (Lehmann, 1964, p. 68):

Let λ be a real parameter, and let the random variable u have probability density f_λ(u) with monotone likelihood ratio in T(u). The real-parameter family of densities f_λ(u) is said to have monotone likelihood ratio if there exists a real-valued function T(u) such that for any λ₀ < λ₁ the distributions are distinct and the ratio f_{λ₁}(u)/f_{λ₀}(u) is a nondecreasing function of T(u).
I. For testing H₀: λ ≤ λ₀ against K: λ > λ₀, there exists a uniformly most powerful test, which is given by:

(38)    φ(u) = 1 when T(u) > c (reject H₀ with probability 1 if T(u) > c);
        φ(u) = δ when T(u) = c (reject H₀ with probability δ if T(u) = c);
        φ(u) = 0 when T(u) < c (reject H₀ with probability 0 if T(u) < c),

where c and δ are determined by the expectation

(39)    E_{λ₀} φ(u) = α.

II. The power function β(λ) = E_λ φ(u) of this test is strictly increasing for all points λ for which β(λ) < 1.

III. For all λ′, the test determined by (38) and (39) is U.M.P. for testing H′: λ ≤ λ′ against K′: λ > λ′ at level α′ = β(λ′).

IV. For any λ < λ₀ the test minimizes β(λ) [the probability of an error of the first kind] among all tests satisfying (39).
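Part II of the theorem, the strictly increasing power function, can be illustrated by simulation; as before, nonc = 2λ is assumed for NumPy's noncentral F parameterization:

```python
import numpy as np

rng = np.random.default_rng(9)
m, q, alpha = 1, 30, 0.05

# Critical point at the boundary lambda0 = 1/2 (nonc = 2 * lambda assumed)
c = np.quantile(rng.noncentral_f(m, q, nonc=1.0, size=300_000), 1.0 - alpha)

def power(lam, size=300_000):
    """Estimate beta(lambda) = P(u > c) by simulation."""
    return float(np.mean(rng.noncentral_f(m, q, nonc=2.0 * lam, size=size) > c))

powers = [power(lam) for lam in (0.5, 2.0, 5.0)]
assert powers[0] < powers[1] < powers[2]   # beta(lambda) increasing, as in part II
assert abs(powers[0] - alpha) < 0.01       # size alpha attained at the boundary
```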
Let f(u; m, q, λ), for short f_λ(u), stand for the noncentral F probability density of the random variable u with m and q degrees of freedom and noncentrality parameter λ,

(40)  f_λ(u) = Σ_{i=0}^∞ [e^{−λ} λ^i / i!] · Γ((2i+m+q)/2) / [Γ(q/2) Γ((2i+m)/2)] · (m/q)^{(2i+m)/2} · u^{(2i+m−2)/2} / (1 + mu/q)^{(2i+m+q)/2} ,

where q = n−p and Γ stands for the gamma function.
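Equation (40) can be checked numerically. The sketch below (Python rather than the thesis's Fortran; the function name and the choices m = 2, q = 20, λ = ½ are illustrative, not from the thesis) evaluates the series in log space and verifies that the density integrates to approximately one. Note that λ here is the Poisson mean of the mixture weights, so references that define the noncentrality parameter as twice the Poisson mean would call this quantity 2λ.

```python
import math

def make_ncf_pdf(m, q, lam, terms=40):
    # log of each series coefficient in eq. (40); lam is the Poisson mean
    logs = [(-lam + i*math.log(lam) - math.lgamma(i + 1))
            + math.lgamma((2*i + m + q)/2) - math.lgamma(q/2)
            - math.lgamma((2*i + m)/2) + (2*i + m)/2*math.log(m/q)
            for i in range(terms)]
    def pdf(u):
        lu, l1 = math.log(u), math.log1p(m*u/q)
        return sum(math.exp(c + (2*i + m - 2)/2*lu - (2*i + m + q)/2*l1)
                   for i, c in enumerate(logs))
    return pdf

pdf = make_ncf_pdf(m=2, q=20, lam=0.5)
# midpoint rule on [0, 100]: the density should integrate to about one
mass = sum(pdf(0.005 + 0.01*k) for k in range(10000)) * 0.01
print(round(mass, 3))
```

Truncating at 40 terms is ample here, since the Poisson(½) weights decay factorially.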
Using the hint given by Lehmann (1964, p. 313, problem 4(i)), we proceed to show that f_{λ₁}(u)/f_{λ₀}(u) is a nondecreasing function of the real-valued function w = mu/(q+mu)  (0 ≤ mu/(q+mu) ≤ 1) for all λ₀ < λ₁. After performing some of the operations indicated in (40), we obtain

(41)  f_λ(u) = e^{−λ} Σ_{i=0}^∞ [λ^i / (Γ(q/2) Γ((2i+m)/2) i!)] · Γ((2i+m+q)/2) · (m/q)^{(2i+m)/2} · [u^{(m−2)/2} / (1 + mu/q)^{(m+q)/2}] · [u^i / (1 + mu/q)^i] .
Collecting all terms not involving i outside the summation sign, the following expression is obtained:

(42)  f_λ(u) = [e^{−λ} (m/q)^{m/2} u^{(m−2)/2}] / [Γ(q/2) (1 + mu/q)^{(m+q)/2}] · Σ_{i=0}^∞ [Γ((2i+m+q)/2) / Γ((2i+m)/2)] (λ^i / i!) w^i ,

where w = mu/(q + mu) is a monotone increasing function of u, and vice versa.
Substituting the corresponding values for f_{λ₁}(u) and f_{λ₀}(u), their ratio can be written as in (43):

(43)  f_{λ₁}(u) / f_{λ₀}(u) = [e^{−λ₁} Σ_{i=0}^∞ b_i w^i] / [e^{−λ₀} Σ_{i=0}^∞ a_i w^i] ,

where

b_i = Γ((2i+m+q)/2) λ₁^i / [Γ((2i+m)/2) i!] > 0 ,   a_i = Γ((2i+m+q)/2) λ₀^i / [Γ((2i+m)/2) i!] > 0 ,

and Σ_{i=0}^∞ a_i w^i and Σ_{i=0}^∞ b_i w^i converge for all w (Kaplan, 1959, p. 350).
For f_{λ₁}(u)/f_{λ₀}(u) to be a nondecreasing function of w, its first derivative must be nonnegative for all values of w, i.e.,

(44)  (e^{−λ₁}/e^{−λ₀}) · [Σ_{n=0}^∞ a_n w^n · Σ_{k=0}^∞ k b_k w^{k−1} − Σ_{n=0}^∞ b_n w^n · Σ_{k=0}^∞ k a_k w^{k−1}] / (Σ_{k=0}^∞ a_k w^k)² ≥ 0

for all w. The above inequality is obtained by differentiating (43) with respect to w; to avoid confusion, two indexes of summation are used. The differentiation process is valid since we are dealing with convergent power series and continuous derivatives (Kaplan, 1959, pp. 148-150).

Since e^{−λ₁}/e^{−λ₀} is always greater than zero and (Σ_{k=0}^∞ a_k w^k)² is a finite positive number, our attention shall be focused on

(45)  Σ_{n=0}^∞ a_n w^n · Σ_{k=0}^∞ k b_k w^{k−1} − Σ_{n=0}^∞ b_n w^n · Σ_{k=0}^∞ k a_k w^{k−1} .

Our task is to show (45) to be nonnegative for all w.
We can rewrite (45) as follows:

(46)  Σ_n Σ_k (n − k) a_k b_n w^{n+k−1} .

Partitioning the sum over k into values of k less than n and values of k greater than n (when k = n the term n − k in (46) vanishes), we rewrite (46) as

(47)  Σ_n Σ_{k<n} (n − k) a_k b_n w^{n+k−1} + Σ_n Σ_{k>n} (n − k) a_k b_n w^{n+k−1} .

Adding and subtracting Σ_n Σ_{k<n} (n − k) a_n b_k w^{n+k−1}, we obtain

(48)  Σ_n Σ_{k<n} (n − k)(a_k b_n − a_n b_k) w^{n+k−1} + Σ_n Σ_{k<n} (n − k) a_n b_k w^{n+k−1} + Σ_n Σ_{k>n} (n − k) a_k b_n w^{n+k−1} .

Changing the order of summation in the third term of (48), we obtain

(49)  Σ_k Σ_{n<k} (n − k) a_k b_n w^{n+k−1} .

Since n and k are indexes of summation we can substitute them by i and j. The second term in (48) is written as

(50)  Σ_i Σ_{j<i} (i − j) a_i b_j w^{i+j−1} ,

and the third term in (48) as

(51)  Σ_i Σ_{j<i} (j − i) a_i b_j w^{i+j−1} .

Then the sum of (50) and (51), i.e., the sum of the second and third terms in (48), is zero, and (45) simplifies to

(52)  Σ_n Σ_{k<n} (n − k)(a_k b_n − a_n b_k) w^{n+k−1} .
Since k < n and 0 ≤ w ≤ 1, it follows that n − k and w^{n+k−1} are nonnegative; but what about a_k b_n − a_n b_k? Taking the ratio of b_i to a_i for the i-th and (i+1)-th terms, expressions (53) and (54) are obtained:

(53)  b_i / a_i = (λ₁/λ₀)^i ,

(54)  b_{i+1} / a_{i+1} = (λ₁/λ₀)^{i+1} .

Since λ₀ < λ₁, b_i/a_i < b_{i+1}/a_{i+1}, and generally a_k b_n − a_n b_k > 0 for k < n. Therefore (44) is nonnegative, and hence f_{λ₁}(u)/f_{λ₀}(u) is a nondecreasing function of w (0 ≤ w ≤ 1).
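The monotone-likelihood-ratio conclusion can be illustrated numerically. A minimal sketch (hypothetical values λ₀ = ½, λ₁ = 2, m = 1, q = 20, chosen for illustration only) evaluates the series of (42)–(43) and checks that the ratio is nondecreasing over a grid of w:

```python
import math

def series(lam, w, m=1, q=20, terms=60):
    # the sum in (42): Gamma((2i+m+q)/2)/Gamma((2i+m)/2) * lam^i * w^i / i!
    s, logc = 0.0, 0.0
    for i in range(terms):
        s += math.exp(logc + math.lgamma((2*i + m + q)/2)
                      - math.lgamma((2*i + m)/2))*w**i
        logc += math.log(lam) - math.log(i + 1)
    return s

def ratio(w, lam0=0.5, lam1=2.0):
    # f_{lam1}(u)/f_{lam0}(u) of (43); the factors free of lambda cancel
    return math.exp(lam0 - lam1)*series(lam1, w)/series(lam0, w)

vals = [ratio(k/100) for k in range(1, 100)]
print(all(b >= a for a, b in zip(vals, vals[1:])))
```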
The density function of w, h(w; m, q, λ), is easily derived by using transformation-of-variables techniques in (40):

(55)  h(w; m, q, λ) = Σ_{i=0}^∞ [λ^i e^{−λ} / i!] · Γ((2i+m+q)/2) / [Γ((2i+m)/2) Γ(q/2)] · w^{(2i+m−2)/2} (1 − w)^{(q−2)/2} ,

where 0 < w < 1. If λ = 0, h(w; m, q, λ=0) is the beta distribution. Similarly, when λ > 0, h(w; m, q, λ) is termed the noncentral beta distribution (Graybill, 1961, p. 79).
We have shown that the family of noncentral F densities has the monotone likelihood ratio property in w. Using the above theorem (Lehmann, 1964, p. 68), we conclude that the U.M.P. test for "multicollinearity," i.e., for

H₀: λ ≤ ½ against K: λ > ½,

at an α level of significance is given by:

Accept H₀ if w* < w_α ;   Reject H₀ if w* ≥ w_α ,

where the critical point w_α is determined by

(56)  ∫₀^{w_α} h(w; m, q, ½) dw = 1 − α ,

and w* stands for the computed value of w.
Carrying out the term-by-term integration indicated in (56) for the distribution presented in (55), we have the following:

(57)  Σ_{i=0}^∞ [(½)^i e^{−½} / i!] I_{w_α}(m/2 + i, q/2) = 1 − α ,

where

I_{w_α}(m/2 + i, q/2) = ∫₀^{w_α} [Γ((m+q)/2 + i) / (Γ(m/2 + i) Γ(q/2))] w^{m/2+i−1} (1 − w)^{q/2−1} dw .
Hence, (57) is an infinite series whose terms are the product of a term from a Poisson series with an Incomplete Beta Function Ratio. As the values of the Poisson term and the Incomplete Beta Function Ratio lie between 0 and 1, the calculation of w_α will not be too heavy if the number of terms needed to carry Σ_{i=0}^∞ [(½)^i e^{−½}/i!] I_{w_α}(m/2 + i, q/2) to a desired accuracy is not too large.

In Chapter V critical points w_α are given for several levels of significance for m = 1 and selected values of n−p = q. Also the power,

(58)  β(λ) = ∫_{w_α}^1 h(w; 1, q, λ) dw = 1 − ∫₀^{w_α} h(w; 1, q, λ) dw ,

is presented for several λ's.

If we transform w to u by the monotone increasing function u = qw/(m − mw), we get
(59)  α = ∫_{w_α}^1 h(w; m, q, λ₀) dw = ∫_{u_α}^∞ f(u; m, q, λ₀) du ,

where u_α = q w_α / (m − m w_α). Hence we can perform the uniformly most powerful test for the null hypothesis λ ≤ λ₀ by using the beta distribution or the F distribution.⁷ The test statistic is u* = qQ₁/(mQ₀) for the F distribution and w* = mu*/(q + mu*) for the beta distribution.

For the null hypothesis λ = 0, the critical point will be denoted as w°_α for the beta distribution and u°_α for the F distribution. Under the null hypothesis λ ≤ ½, the critical point will be denoted as w_α for the beta distribution and u_α for the F distribution.
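Because the mapping between u and w is strictly monotone, the two forms of the test are interchangeable; a small sketch (the critical value 8.05 is taken from Table 1 for q = 20, α = .05, purely for illustration) confirms the round trip and that either statistic yields the same accept/reject decision:

```python
def to_w(u, m, q):
    # w = mu/(q + mu), the transform preceding (59)
    return m*u/(q + m*u)

def to_u(w, m, q):
    # the monotone inverse: u = qw/(m - mw)
    return q*w/(m - m*w)

m, q, u_crit = 1, 20, 8.05          # illustrative critical point
w_crit = to_w(u_crit, m, q)
for u in (0.5, 1.14, 5.78, 8.05, 40.0):
    assert abs(to_u(to_w(u, m, q), m, q) - u) < 1e-12   # round trip
    assert (u >= u_crit) == (to_w(u, m, q) >= w_crit)   # same decision
print(round(w_crit, 3))
```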
Comparison of the Usual F Test and the Mean Square Error Criterion Test

In most instances researchers interested in estimating β₁ decide whether or not to include X₂ by performing the central F test for the null hypothesis β₂ = 0, i.e., λ = 0. If the computed u (u*) is nonsignificant at some assigned significance level, we omit the terms containing X₂ and use b̃₁ as the estimate of β₁. If the computed u (u*) is significant, we retain the terms containing X₂ and use b̂₁ as the estimate of β₁. Transforming the variable u into w, the test for β₂ = 0 can be carried out by using the central beta distribution for the null hypothesis λ = 0.

The testing of β₂ = 0, or alternatively λ = 0, weights heavily the interest in an unbiased estimator of β₁ without paying due consideration to the variance of the estimator. If λ > 0, the unbiased estimator b̂₁ (F.L.S.E.) shall be used according to the usual test. This procedure disregards the following: (1) the linear function of the restricted estimators (ℓ'b̃₁), though biased, has smaller variance than ℓ'b̂₁; (2) if the noncentrality parameter λ is between zero and one-half, the mean square error of ℓ'b̃₁ is smaller than the mean square error of ℓ'b̂₁. In the mean square error criterion test we consider the mean square of a linear function; hence both bias and variance are considered. We delete X₂ from our model when we consider X₂ to be "multicollinear" with X₁, i.e., when the null hypothesis λ ≤ ½ is accepted.

⁷In Chapter V we present critical points and powers for the case of deleting one variable, i.e., m = 1.
The following results are immediate from the properties of the uniformly most powerful test used to test the hypothesis λ ≤ ½:

I. Testing H₀: λ = 0 at an α₀ level is the same as testing H₀: λ ≤ ½ at an α₁ level, where α₁ > α₀, since the power function is strictly increasing for all points λ for which β(λ) < 1.

II. For the same α level the critical point for the M.S.E. test is greater than the critical point for the usual test, i.e., w_α > w°_α in terms of the beta distribution, or u_α > u°_α in terms of the F distribution. Therefore, if X₂ is deleted under the usual test at the α level, i.e., H₀: λ = 0 is accepted, then X₂ is deleted under the M.S.E. criterion, i.e., H₀: λ ≤ ½ is accepted at the α level. Analogously, if X₂ is kept using the M.S.E. test at the α level, then X₂ is kept at the α level under the usual test. If w* lies between w°_α and w_α, we will delete X₂ under the M.S.E. test but keep X₂ under the usual test.
Minimum Variance Unbiased Estimator of λ

From a very well-known theorem (Graybill, 1961, p. 113) we have that σ̂², b̂₁, b̂₂, ..., b̂_p form a set of estimators that are jointly sufficient and complete, so that if a function h(σ̂², b̂₁, b̂₂, ..., b̂_p) can be found such that E[h(σ̂², b̂₁, b̂₂, ..., b̂_p)] = g(σ², β₁, β₂, ..., β_p), then h(σ̂², b̂₁, b̂₂, ..., b̂_p) is an unbiased estimator of g(σ², β₁, β₂, ..., β_p) and has smaller variance for a given sample size than any other unbiased estimator of g(σ², β₁, β₂, ..., β_p) (Rao-Blackwell theorem).
The random variable previously defined as

(60)  u = (n−p) Q₁ / (m Q₀)

can be written as

(61)  u = (Q₁/m) / (Q₀/(n−p)) ,

and u is distributed as a noncentral F with m and n−p degrees of freedom with noncentrality parameter λ. Now

(62)  E(u) = q/(q−2) + 2λq/[m(q−2)] ,

where q = n−p > 2 (Patnaik, 1949, p. 221).

Rearranging (62) we find that

E[u m(q−2)/(2q) − m/2] = λ .

Since u m(q−2)/(2q) − m/2 is an unbiased estimator of λ and is a function of σ̂², b̂₁, b̂₂, ..., b̂_p, therefore u m(q−2)/(2q) − m/2 is the minimum variance unbiased estimator of λ.
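The unbiasedness of u m(q−2)/(2q) − m/2 can be verified by simulation; the sketch below (hypothetical choices m = 1, q = 20, λ = 1, not from the thesis) draws Q₁ as a noncentral chi square (with the Poisson-mean convention used here, the classical noncentrality is 2λ) and Q₀ as a central chi square:

```python
import math
import random

random.seed(1)
m, q, lam = 1, 20, 1.0               # hypothetical values; q = n - p
reps, acc = 100_000, 0.0
for _ in range(reps):
    z = random.gauss(0.0, 1.0) + math.sqrt(2*lam)   # Q1 ~ noncentral chi2_1
    q1 = z*z
    q0 = sum(random.gauss(0.0, 1.0)**2 for _ in range(q))   # Q0 ~ chi2_q
    u = q*q1/(m*q0)                                 # eq. (60)
    acc += u*m*(q - 2)/(2*q) - m/2                  # the unbiased estimator
est = acc/reps
print(round(est, 2))
```

The simulated mean should settle near λ = 1.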
CHAPTER V
CRITICAL POINTS AND VALUES OF THE POWER OF THE
"MULTICOLLINEARITY" TEST FOR ONE DEGREE OF
FREEDOM IN THE NUMERATOR
Critical Points
When we are interested in deciding whether to exclude only one regressor in the linear model of full rank, the necessary and sufficient condition under the "multicollinearity" test simplifies to

β²_{k+1} / [2 Var(b̂_{k+1})] ≤ ½

(see page 20). Hence, using the results from Chapter IV, we can perform the uniformly most powerful test for the null hypothesis λ ≤ ½ by using the beta or the F distribution. The test statistic for the F distribution simplifies to u* = b̂²_{k+1}/V̂(b̂_{k+1}), the well-known test statistic for the null hypothesis β_{k+1} = 0, when only one regressor is under study for exclusion, i.e., m = 1.
In Chapter IV the theoretical framework of the U.M.P. test for "multicollinearity" and the comparison with the usual test (β_{k+1} = 0) are presented. We proceed now to describe the procedure by which the critical points were obtained for the "multicollinearity" test for m = 1. The computations were made using the noncentral beta distribution, and via the monotonic inverse transformation u_α = q w_α/(1 − w_α) the noncentral F critical points were obtained, for purposes of ready comparison with critical points of the central F distribution, which is so widely used to make classical regression tests.

In order to estimate the critical points in the noncentral beta distribution, we used the approximation

(63)  Σ_{i=0}^{7} [(½)^i e^{−½} / i!] I_{w_α}(½ + i, q/2) = 1 − α

for α = .05, .10, .25, .50, where I_{w_α} is the Incomplete Beta Function Ratio and q = n − p.
We truncated the series to the first eight terms, with error less than 10⁻⁶, for the following reasons:

(i) The sum of the Poisson weights for λ = ½ beyond the first seven terms is less than 10⁻⁶ (Molina, 1947).

(ii) The Incomplete Beta Function Ratio is bounded by [0,1] and for 0 < w < 1 is a monotonically decreasing function of ½ + i (Gun, 1965).

A searching process is carried out for w_α until equation (63) is satisfied. The searching process is simplified by the fact that equation (63) is a monotonically increasing function of w_α with domain and range [0,1].
The searching process for a particular α and a given q was:

(i) Try w_α = ½ in equation (63).

(ii) If w_α yields an evaluation of equation (63) equal to 1 − α within the absolute error 5 × 10⁻⁴, we stop.

(iii) If the evaluation of (63) lies outside the range (1 − α) ± (5 × 10⁻⁴), we step w_α by one-half of the remaining interval, either to the right or to the left depending on whether the error is negative or positive.

(iv) The process continues until the evaluation is within 5 × 10⁻⁴ of 1 − α.
A 1410 IBM Computer was used, employing the Incomplete Beta Ratio subroutine written by Gautschi (1964). The values of w_α were carried to eight decimals and then were used to compute u_α by means of the monotonic transformation u_α = q w_α/(1 − w_α).
A Fortran statement of the computation of the critical points for selected values of n − (k+1) = q is presented in Appendix A. In Table 1 the critical points for the noncentral beta and F distributions were rounded to three and two decimal points, respectively. Due to the great sensitivity of transforming the noncentral beta critical points to the noncentral F critical points through u_α = q w_α/(1 − w_α), an inconsistent result in the second decimal point was obtained for the critical point corresponding to q = 28, α = .05. For that reason the critical points for q = 26, 27, 28, 29 were deleted from Table 1. Intuitively, we recognized the sensitivity of mapping the domain [0,1] of the noncentral beta distribution into the noncentral F distribution domain [0,∞). Furthermore, we truncated an infinite series to an eight-term sum and approximated the α level with an absolute error of 5 × 10⁻⁴. On the other hand, we do claim that for the three decimals in the noncentral beta and for the two decimals in the noncentral F critical points, the results presented in Table 1 are accurate. The supporting evidence for this claim is postponed until some values of the power of the "multicollinearity" test are presented.

We have provided four choices for the level of significance to emphasize the fact that the researcher should make his choice of α by considering the specific problem under study and his personal "utility function" by which he weights the Type I and Type II error "costs." If a large α level is chosen, we implicitly regard as more "costly" the smaller-variance but biased estimates of β₁ than the B.L.U.E. of β₁ (b̃₁ versus b̂₁). On the other hand, if a small α level is chosen, we implicitly regard as more "costly" the B.L.U.E. of β₁ (b̂₁) than the smaller-variance but biased estimator (b̃₁).
Table 1. Noncentral beta and noncentral F "multicollinearity" test critical points for one degree of freedom in the numerator

                    w_α                              u_α
   q      .05    .10    .25    .50       .05     .10    .25    .50

   1     .997   .988   .927   .696    340.34   86.15  12.65   2.29
   2     .949   .897   .734   .438     37.38   17.50   5.53   1.56
   3     .869   .788   .591   .315     19.92   11.16   4.33   1.38
   4     .789   .693   .492   .246     14.96    9.04   3.88   1.30
   5     .717   .617   .420   .201     12.66    8.06   3.62   1.26
   6     .654   .554   .366   .170     11.36    7.44   3.47   1.23
   7     .602   .502   .325   .147     10.57    7.05   3.37   1.21
   8     .555   .459   .292   .130      9.96    6.79   3.30   1.20
   9     .516   .422   .265   .117      9.58    6.57   3.24   1.19
  10     .480   .391   .242   .105      9.25    6.41   3.20   1.18
  11     .451   .364   .223   .096      9.04    6.30   3.16   1.17
  12     .424   .341   .207   .088      8.83    6.20   3.13   1.16
  13     .400   .320   .193   .082      8.68    6.13   3.11   1.16
  14     .379   .302   .181   .076      8.54    6.05   3.09   1.15
  15     .359   .285   .170   .071      8.41    5.98   3.07   1.15
  16     .342   .270   .160   .067      8.31    5.93   3.05   1.15
  17     .326   .257   .152   .063      8.23    5.88   3.04   1.15
  18     .312   .245   .144   .060      8.18    5.84   3.03   1.14
  19     .299   .234   .137   .057      8.10    5.82   3.02   1.14
  20     .287   .224   .131   .054      8.05    5.78   3.01   1.14
  21     .275   .215   .125   .052      7.98    5.75   3.00   1.14
  22     .266   .206   .120   .049      7.96    5.71   3.00   1.14
  23     .256   .198   .115   .047      7.91    5.69   2.99   1.14
  24     .246   .191   .110   .045      7.83    5.68   2.98   1.14
  25     .238   .184   .106   .043      7.82    5.66   2.98   1.13
  30     .203   .157   .090   .036      7.65    5.60   2.95   1.13
  40     .158   .121   .068   .027      7.52    5.48   2.92   1.12
  60     .109   .082   .046   .018      7.33    5.40   2.89   1.12
 120     .056   .042   .023   .009      7.14    5.31   2.86   1.11
Furthermore, as will be discussed in Chapter VI, the choice of the significance level has implications for the bias and mean square of sequential estimators. The use of a critical point of one in the noncentral F, which corresponds approximately to a 50 percent level as Table 1 indicates, will be of special interest in Chapter VI.
The context for using Table 1 is a multiple regression model of a dependent variable regressed on k+1 nonstochastic regressors. It is assumed that the regression error is normally and independently distributed with zero mean and constant variance. The researcher considers the exclusion of x_{k+1} in order to provide a "best" estimator for β₁ (k×1). The null hypothesis is λ ≤ ½, i.e., the R.L.S.E. (b̃₁) is to be preferred over the F.L.S.E. (b̂₁). The procedure is:

(i) Compute u* = [SSE(b̃₁) − SSE(b̂)] / [SSE(b̂) / (n − (k+1))], where SSE(b̃₁) is the error sum of squares when x_{k+1} is excluded from the model and SSE(b̂) is the error sum of squares when x_{k+1} is included in the model.

(ii) For a chosen level of α, compare u* in (i) with the critical point u_α in Table 1 that corresponds to the degrees of freedom n − (k+1).

(iii) If u* ≥ u_α, reject the hypothesis that the restricted estimators are better in MSE. If u* < u_α, accept the hypothesis that the restricted estimators are better in MSE.
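As a worked illustration of steps (i)–(iii) (the data below are synthetic and the variable names hypothetical; the thesis itself works with economic data), one can compute u* from the two error sums of squares and compare it with u_α from Table 1:

```python
import random

random.seed(7)
n = 24                                      # synthetic sample; q = n - 3 = 21
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.3) for a in x1]       # nearly collinear with x1
y = [2.0*a + 0.1*b + random.gauss(0, 1) for a, b in zip(x1, x2)]

def sse(xcols, y):
    """Error sum of squares of OLS of y on the given columns plus an intercept."""
    cols = [[1.0]*len(y)] + xcols
    k = len(cols)
    # normal equations X'Xb = X'y, solved by Gauss-Jordan elimination
    A = [[sum(ci[t]*cj[t] for t in range(len(y))) for cj in cols]
         + [sum(ci[t]*y[t] for t in range(len(y)))] for ci in cols]
    for p in range(k):
        piv = A[p][p]
        A[p] = [v/piv for v in A[p]]
        for rr in range(k):
            if rr != p:
                A[rr] = [vr - A[rr][p]*vp for vr, vp in zip(A[rr], A[p])]
    b = [A[i][k] for i in range(k)]
    fit = [sum(b[j]*cols[j][t] for j in range(k)) for t in range(len(y))]
    return sum((yt - ft)**2 for yt, ft in zip(y, fit))

sse_restr = sse([x1], y)                    # x_{k+1} candidate (x2) excluded
sse_full = sse([x1, x2], y)                 # x2 included
q = n - 3                                   # n - (k+1): intercept, x1, x2
u_star = (sse_restr - sse_full)/(sse_full/q)
print(round(u_star, 2), "delete x2" if u_star < 7.98 else "keep x2")
# 7.98 is u_alpha for q = 21, alpha = .05, in Table 1
```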
Power Values

In this section power values of the U.M.P. test for the null hypothesis λ ≤ ½ are presented. The values of the power were evaluated for the following noncentrality parameters: 0, .25, .3333, .5, 1.0, 2.25, 4.0, 6.25, 7.5625 and 9.0. When λ = 0 the power is evaluated directly by the Incomplete Beta Ratio subroutine, since in this case the noncentral beta distribution reduces to the central beta distribution, i.e.,

(64)  β(0) = 1 − I_{w_α}(½, q/2) ,

where β(λ) stands for the power corresponding to the noncentrality parameter λ. For the other values of the noncentrality parameter the power evaluation is made by the following approximation,

(65)  β(λ) = 1 − Σ_{i=0}^{N(λ)} [λ^i e^{−λ} / i!] I_{w_α}(½ + i, q/2) ,

where the truncation N(λ) depends on λ. Table 2 indicates the relationship between the truncation and the parameter λ and also indicates the maximum truncation error for the evaluation of the power presented in Tables 3, 4, 5, and 6.
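The tabled power values can also be checked independently by Monte Carlo. The sketch below takes the illustrative case q = 20, λ = 4, α = .05 (whose Table 1 critical point is 8.05) and samples the noncentral F directly; with the Poisson-mean convention, the classical noncentrality of the numerator chi square is 2λ:

```python
import math
import random

random.seed(3)
q, lam, u_crit = 20, 4.0, 8.05      # u_crit from Table 1 (alpha = .05, q = 20)
reps, hits = 100_000, 0
for _ in range(reps):
    z = random.gauss(0.0, 1.0) + math.sqrt(2*lam)   # noncentral chi2_1, delta = 2*lam
    q0 = sum(random.gauss(0.0, 1.0)**2 for _ in range(q))
    if q*z*z/q0 >= u_crit:                          # u = (Q1/1)/(Q0/q)
        hits += 1
power_hat = hits/reps
print(round(power_hat, 2))
```

The estimate should agree with the corresponding entry of Table 3 to within simulation error.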
In Appendix B the Fortran statement used in the evaluation of the power is presented. As for the search of the critical points, a 1410 IBM Computer and the Incomplete Beta Ratio subroutine written by Gautschi (1964) were used. The input of the Fortran statement consists of:

(i) the truncation term, N(λ);

(ii) the noncentral beta critical point rounded to four decimals for the corresponding α level and denominator degrees of freedom.
From the power tables we can obtain the corresponding significance level for the usual test by looking at the power for λ = 0. The power for λ = ½ was evaluated in order to see how close we come, with four decimal points in the critical point, to the intended α level, since we rounded the critical points in Table 1 to three decimals.

Table 2. Largest values of i used in computing the power for λ

      λ       N(λ)    Maximum truncation error*
    .2500       6          3 × 10⁻⁷
    .3333       6          4 × 10⁻⁶
   1.0000       9              10⁻⁶
   2.2500      13              10⁻⁶
   4.0000      17              10⁻⁶
   6.2500      22              10⁻⁶
   7.5625      24          2 × 10⁻⁶
   9.0000      27

*See Molina (1947, Table 2).

For the 5 percent level we found one case in which we had an absolute error in the level of significance greater than 5 × 10⁻⁴; this happened for q = 1, where the value of the power was .0509. For the 10 percent level all values of β(½) were within the allowed absolute error. For the 25 percent level the only case for which β(½) was beyond the absolute error was q = 120, where the value of the power was .2506. For the 50 percent level two values of β(½) were not within the allowed absolute error: they were .5008 for q = 16 and .5006 for q = 22. We must realize that in practice we are satisfied to be reasonably close to the intended α level, and that extreme accuracy in the evaluation of critical points is partially lost when we use only two- or three-decimal critical points.
Table 3. Power values for the "multicollinearity" test at 5 percent significance level

                             Noncentrality parameter λ
   q      0    .2500  .3333  1.0000 2.2500 4.0000 6.2500 7.5625 9.0000

   1    .035   .043   .046                        .154   .169   .184
   2    .026   .038   .042                        .291   .336   .383
   3    .021   .035   .040                        .408   .478   .548
   4    .018   .033   .039                        .490   .574   .654
   5    .016   .032   .038   .089   .199   .362   .550   .641   .724
   6    .015   .032   .038   .092   .212   .391   .593   .688   .770
   7    .014   .031   .037   .093   .221   .412   .624   .719   .801
   8    .013   .031   .037   .096   .231   .432   .650   .746   .825
   9    .013   .030   .037   .096   .236   .444   .668   .763   .841
  10    .012   .030   .036   .098   .243   .457   .684   .779   .855
  11    .012   .030   .036   .098   .246   .465   .694   .789   .864
  12    .012   .029   .036   .099   .250   .474   .705   .800   .873
  13    .011   .029   .036   .100   .253   .480   .713   .807   .880
  14    .011   .029   .036   .100   .256   .487   .721   .814   .885
  15    .011   .029   .036   .101   .259   .493   .728   .821   .890
  16    .011   .029   .036   .102   .262   .498   .734   .826   .895
  17    .011   .029   .035   .102   .264   .502   .738   .830   .898
  18    .010   .028   .035   .102   .264   .504   .741   .833   .900
  19    .010   .028   .035   .102   .266   .508   .746   .837   .904
  20    .010   .028   .035   .102   .267   .510   .749   .839   .906
  21    .010   .028   .035   .103   .270   .514   .753   .843   .908
  22    .010   .028   .035   .103   .270   .515   .754   .844   .910
  23    .010   .028   .035   .103   .271   .518   .757   .847   .911
  24    .010   .028   .035   .104   .274   .522   .762   .850   .914
  25    .010   .028   .035   .104   .274   .523   .763   .852   .915
  30    .010   .028   .035   .105   .279   .533   .773   .860   .921
  40    .009   .027   .034   .105   .282   .540   .781   .867   .927
  60    .009   .027   .034   .107   .289   .552   .793   .877   .933
 120    .009   .027   .034   .109   .296   .564   .804   .886   .940
Table 4. Power values for the "multicollinearity" test at 10 percent significance level

                             Noncentrality parameter λ
   q      0    .2500  .3333  1.0000 2.2500 4.0000 6.2500 7.5625 9.0000

   1    .068   .085                 .181   .238   .295          .351
   2    .053   .077                 .248   .372   .501          .624
   3    .044   .072   .081          .290   .456   .624          .767
   4    .040   .070   .080          .318   .511   .696          .838
   5    .036   .067   .078          .336   .545   .738          .876
   6    .034   .066   .077          .351   .571   .768          .900
   7    .033   .065   .077          .362   .590   .788          .915
   8    .031   .064   .076          .369   .603   .803          .925
   9    .030   .064   .076   .178   .377   .616   .815          .933
  10    .030   .064   .076   .179   .382   .625   .824          .938
  11    .029   .063   .075   .179   .386   .631   .830   .897   .942
  12    .028   .062   .075   .180   .389   .637   .836   .902   .946
  13    .028   .062   .074   .181   .392   .642   .841   .906   .948
  14    .028   .062   .074   .182   .395   .647   .845   .909   .951
  15    .027   .062   .074   .183   .398   .651   .849   .912   .953
  16    .027   .062   .074   .183   .400   .654   .852   .914   .955
  17    .027   .062   .074   .184   .403   .658   .856   .917   .956
  18    .026   .061   .074   .184   .404   .660   .857   .919   .958
  19    .026   .061   .074   .184   .405   .662   .859   .920   .958
  20    .026   .061   .074   .185   .407   .665   .861   .922   .960
  21    .026   .061   .074   .185   .409   .667   .863   .923   .960
  22    .026   .061   .074   .186   .411   .670   .865   .924   .961
  23    .026   .061   .074   .186   .412   .672   .867   .926   .962
  24    .025   .061   .074   .186   .412   .672   .867   .926   .962
  25    .025   .061   .074   .186   .413   .674   .868   .927   .963
  30    .025   .060   .073   .186   .415   .678   .872   .930   .965
  40    .024   .060   .073   .188   .421   .686   .879   .934   .968
  60    .024   .060   .073   .190   .426   .693   .884   .938   .970
 120    .023   .059   .072   .191   .431   .700   .890   .942   .973
Table 5. Power values for the "multicollinearity" test at 25 percent significance level

                             Noncentrality parameter λ
   q      0    .2500  .3333  1.0000 2.2500 4.0000 6.2500 7.5625 9.0000

   1                                .437          .749
   2           .198                 .529          .837          .922
   3    .129   .191                 .571          .894          .961
   4    .120   .187                 .594          .917          .974
   5    .116   .184                 .609   .811   .930          .981
   6    .112   .183                 .619   .822   .938          .984
   7    .109   .181   .204          .625   .830   .943   .971   .986
   8    .107   .180   .204          .630   .835   .947   .974   .988
   9    .105   .179   .203          .635   .840   .950   .975   .989
  10    .104   .179   .203          .638   .844   .952   .977   .990
  11    .103   .178   .202   .383   .641   .846   .954   .978   .990
  12    .102   .178   .202   .384   .643   .849   .955   .979   .991
  13    .102   .177   .202   .385   .645   .851   .956   .979   .991
  14    .101   .177   .202   .385   .647   .852   .957   .980   .991
  15    .100   .176   .201   .386   .648   .854   .958   .980   .992
  16    .100   .176   .201   .387   .650   .856   .959   .981   .992
  17    .099   .176   .201   .387   .650   .856   .959   .981   .992
  18    .099   .176   .201   .387   .652   .857   .960   .982   .992
  19    .098   .176   .201   .387   .652   .858   .960   .982   .992
  20    .098   .176   .201   .388   .653   .859   .960   .982   .993
  21    .098   .175   .201   .388   .654   .860   .961   .982   .993
  22    .098   .175   .200   .388   .655   .860   .961   .982   .993
  23    .097   .175   .200   .389   .655   .861   .962   .983   .993
  24    .097   .175   .200   .389   .656   .862   .962   .983   .993
  25    .097   .175   .200   .389   .656   .862   .962   .983   .993
  30    .096   .175   .200   .390   .659   .864   .963   .984   .993
  40    .095   .174   .200   .391   .661   .866   .964   .984   .994
  60    .094   .174   .200   .392   .664   .869   .966   .985   .994
 120    .093   .173   .199   .394   .667   .871   .967   .986   .994
Table 6. Power values for the "multicollinearity" test at 50 percent significance level

                             Noncentrality parameter λ
   q      0    .2500  .3333  1.0000 2.2500 4.0000 6.2500 7.5625 9.0000

   1    .372   .440   .461   .600   .764   .882                 .981
   2    .338   .425   .451   .622   .813   .930                 .996
   3    .324   .419   .447   .631   .829   .943                 .998
   4    .317   .415   .445   .635   .837   .948                 .998
   5    .313   .414   .440   .638   .842   .952                 .998
   6    .309   .412   .443   .640   .844   .954                 .999
   7    .308   .411   .443   .642   .847   .955                 .999
   8    .305   .410   .442   .642   .848   .956   .991          .999
   9    .304   .410   .441   .643   .849   .957   .992          .999
  10    .303   .409   .441   .643   .850   .957   .992   .997   .999
  11    .303   .409   .441   .644   .851   .958   .992   .997   .999
  12    .302   .409   .441   .644   .852   .958   .992   .997   .999
  13    .301   .408   .440   .644   .852   .958   .992   .997   .999
  14    .301   .408   .440   .645   .853   .959   .992   .997   .999
  15    .300   .408   .440   .645   .853   .959   .992   .997   .999
  16    .300   .408   .440   .646   .854   .959   .993   .997   .999
  17    .299   .407   .440   .646   .854   .959   .993   .997   .999
  18    .299   .407   .440   .646   .854   .960   .993   .997   .999
  19    .299   .407   .440   .646   .854   .960   .993   .997   .999
  20    .298   .407   .440   .646   .854   .960   .993   .997   .999
  21    .298   .407   .440   .646   .854   .960   .993   .997   .999
  22    .298   .407   .440   .646   .855   .960   .993   .997   .999
  23    .298   .407   .440   .646   .855   .960   .993   .997   .999
  24    .298   .407   .440   .647   .855   .960   .993   .997   .999
  25    .298   .407   .440   .647   .855   .960   .993   .997   .999
That is, we might be accurate to eight decimals, but by rounding to two decimals we are not better off than having accuracy to, say, five decimals.
Some Comments about the Critical Points and Power Tables

A comparison of Table 1 (noncentral beta and noncentral F "multicollinearity" test critical points for one degree of freedom in the numerator) with a standard central F table indicates that for the same α level the critical point for the "multicollinearity" test is greater than the critical point for the usual test. Tables 3, 4, 5, and 6 indicate that the power function is strictly increasing for all points λ for which β(λ) < 1. They also provide the level of significance at which the usual test is carried out (λ = 0). It is also evident from the power value tables that, for a given α level, as q increases the power decreases for values of λ smaller than one-half, while for values of λ larger than one-half the power increases.

From Table 3 we find that for q = 18(1)25 and q = 30 the corresponding α level for the usual test is 1 percent. Since Tang (1938) presented critical points in the noncentral beta for the usual test at the 1 percent level, we can compare the corresponding values of Table 1 with Tang's values. The differences between Table 1 and Tang's critical points can be attributed to the fact that the values of the power for λ = 0 are rounded to the third decimal point. For that reason, in Table 7 we incorporate the value of β(0) at four decimals in order to provide a justification for the discrepancy in the critical points.
Table 7. Comparison of Tang's 1 percent and "multicollinearity" 5 percent critical points

   q     β(0)    w°.01*    w.05
  18    .0103    .315      .312
  19    .0103    .301      .299
  20    .0101    .288      .287
  21    .0101    .276      .275
  22    .0099    .265      .266
  23    .0098    .255      .256
  24    .0099    .246      .246
  25    .0098    .237      .238
  30    .0096    .201      .203

*See Tang (1938).

To indicate the sensitivity of transforming the noncentral beta critical point into the noncentral F critical point, we present a comparison of the critical points of the noncentral F distribution and the central F distribution. The comparison is presented in Table 8 for q = 20, 24, and 30, using Table 1 and a standard central F table. In Table 8 we include the corresponding critical points for the beta distribution.
Table 8. Comparison of critical points for selected denominator degrees of freedom when β(0) is approximately .01

   q     u°.01    u.05    w°.01    w.05
  20     8.10     8.05    .288     .287
  24     7.82     7.83    .246     .246
  30     7.56     7.65    .201     .203

The difference for q = 24 in the F critical points is due to the fact that, even though for three decimals w°.01 = w.05 = .246, the computations for u°.01 and u.05 were carried out by different procedures and accuracy.

From Table 4 we find that for q = 24, 25, and 30 the corresponding α level for the usual test is 2½ percent. In Table 9 we include the power for λ = 0 using four decimals and compare the corresponding usual and "multicollinearity" critical points via Table 1 and a standard central F table.

Table 9. Comparison of F critical points for selected denominator degrees of freedom when β(0) is approximately .025

   q     β(0)    u°.025    u.10
  24    .0254     5.72     5.68
  25    .0253     5.69     5.66
  30    .0246     5.57     5.60
From Table 5 we find that for q = 15, 16 the corresponding α level for the usual test is 10 percent. In Table 10 we include the power for λ = 0 using four decimals and compare the corresponding usual and "multicollinearity" critical points via Table 1 and a standard central F table.

Table 10. Comparison of F critical points for selected denominator degrees of freedom when β(0) is approximately .10

   q     β(0)    u°.10    u.25
  15    .1001    3.07     3.07
  16    .0999    3.05     3.05

This section provides neither rigorous nor exhaustive accuracy checks of the critical points table for the "multicollinearity" test. There are two reasons for this: first, there are not, to my knowledge, available central F tables with the wide range of α levels implied in Tables 3, 4, 5, and 6 for the usual test; second, our main purpose in this section is to emphasize the relationship in the levels of significance for the usual and "multicollinearity" tests.
CHAPTER VI
ESTIMATION AFTER PRELIMINARY TESTS OF SIGNIFICANCE
Sequential Estimation

The context in which we considered the mean square error test in Chapter IV assumes a well-specified model where the error term is normally and independently distributed with mean zero and constant variance. But the nature of economic data being what it is, a restriction on the parameter space may need to be considered. In considering the restriction, the researcher may be more interested in the properties of his structural estimates than in the "truth" or "falsity" of the restriction. In this thesis we decide between the F.L.S.E. of β₁ and the R.L.S.E. of β₁, i.e., b̂₁ vs. b̃₁, via the mean square error criterion. The underlying restriction used is β₂ = 0, with the purpose of finding the "best" estimator of β₁, where the decision is based upon the existence of "multicollinearity."

In Chapter IV a U.M.P. test for "multicollinearity" is obtained, and in Chapter V the critical points for this test given m = 1, i.e., β_{k+1} = 0, are presented. For this simple case the criterion simplifies to testing whether

β²_{k+1} / [2 V(b̂_{k+1})] ≤ ½

in order to decide if x_{k+1} should be excluded from the well-specified regression model of full rank so as to obtain the "best" estimator of β₁.

Traditionally, researchers have approached the problem of excluding variables from a different context. They have started with an uncertain specification in the linear regression model of full rank. After working out the F.L.S.E. of β, they have decided about the inclusion of a nonstochastic x_{k+1} by testing the hypothesis that β_{k+1} = 0, i.e., β²_{k+1}/[2V(b̂_{k+1})] = 0.
A comparison of the M.S.E. test (test for multicollinearity) and the usual test is found in Chapter IV. As we noticed in Chapter V, the basic difference is the level of significance chosen for the test, discounting the motivation issue.

The analogy in the above approaches also lies in the fact that in both cases a preliminary test of significance is used as an aid in deciding the estimator to be used for β₁. When the "multicollinearity" test is used, the sequential estimator of β₁ would be the R.L.S.E. (b̃₁) if H₀: β²_{k+1}/[2V(b̂_{k+1})] ≤ ½ is accepted, and the F.L.S.E. (b̂₁) otherwise. When the usual test is used, the estimator of β₁ would be the R.L.S.E. (b̃₁) if H₀: β_{k+1} = 0 is accepted, and the F.L.S.E. (b̂₁) otherwise.
Bancroft (1944) pointed out that sequential estimators of regression coefficients in a two-regressor model based on a t-test are biased, the degree of bias depending on several parameters including the level of significance at which the test is made. In the following sections a comparison of the bias of the sequential estimators of β₁ under the M.S.E. test, the usual test (both at the 5 percent significance level), and a test whose critical point is one is provided. From Chapter V we noticed that a critical point of one in the F distribution is approximately equivalent to the M.S.E. test at the 50 percent level of significance. An intuitive explanation for such a critical point would be that the computed t² is the "estimating analogy" of β₂²/V(b₂), i.e., the computed t² is equal to b₂²/V̂(b₂). Since b̃₁ is a biased estimator and b₁ is an unbiased estimator, and realizing that the basic difference among the above-mentioned tests is the selection of the significance level, or, equivalently, the selection of the critical point, the bias of the sequential estimators disappears as α goes to one, i.e., as the critical point in the F distribution goes to zero. If the critical point is zero, the null hypothesis is rejected with probability one and the unbiased estimator b₁ is used.
Henceforth an analysis is carried out to determine the effect of the chosen level of significance on the mean square of the sequential estimator in the two-regressor model as specified by Bancroft (1944). The analysis would seem to indicate that even though the bias decreases as the critical point goes to zero, holding constant the parameters of interest, the mean square behavior depends on the value of the noncentrality parameter of the noncentral F distribution.
Sequential Estimator Bias
The regression model of two regressors to be used is the same as in Bancroft (1944). Let

y_i = β₁x₁ᵢ + β₂x₂ᵢ + εᵢ,  i = 1, ..., n,

where y_i, x₁ᵢ, and x₂ᵢ are deviations from their respective means; β₁ and β₂ are the respective population regression coefficients; and εᵢ is normally independently distributed with zero mean and unit variance. The x's are nonstochastic with unit variances and "correlation coefficient" r₁₂, so that

Σx₁ᵢ² = Σx₂ᵢ² = n − 1,  Σx₁ᵢx₂ᵢ = (n − 1)r₁₂,

i.e., the nonstochastic variables x₁ᵢ and x₂ᵢ are standardized variables (Goldberger, 1964, p. 198).
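A minimal numerical sketch (our own, not in the thesis) of a pair of regressors satisfying these side conditions — zero means, sums of squares n − 1, and exact cross product (n − 1)r₁₂ — can be built by Gram-Schmidt:

```python
import numpy as np

def standardized_pair(n, r12, seed=0):
    """Return nonstochastic x1, x2 with zero means, sums of squares
    n-1, and sample correlation exactly r12."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n)
    b = rng.standard_normal(n)
    a -= a.mean()
    a /= np.sqrt(a @ a / (n - 1))          # now sum(a^2) = n-1
    b -= b.mean()
    b -= (b @ a) / (a @ a) * a             # orthogonalize b against a
    b /= np.sqrt(b @ b / (n - 1))
    x1 = a
    x2 = r12 * a + np.sqrt(1 - r12 ** 2) * b
    return x1, x2

x1, x2 = standardized_pair(11, 0.4)
print(round(x1 @ x1, 6), round(x2 @ x2, 6), round(x1 @ x2, 6))
# n-1 = 10, n-1 = 10, and (n-1)*r12 = 4, up to rounding
```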
For the purpose of easing the calculation of the sequential estimator bias, the following orthogonal transformation is carried out: x₁ᵢ is retained and x₂ᵢ is replaced by x₂ᵢ − r₁₂x₁ᵢ. Then the transformed model is

(67)  y_i = (β₁ + r₁₂β₂)x₁ᵢ + β₂(x₂ᵢ − r₁₂x₁ᵢ) + εᵢ.

The above transformation provides us, after applying least squares estimating procedures, with the following results:

(i) an unadjusted estimator for β₁ (b̃₁),
(ii) an adjusted estimator for β₂ (b₂),
(iii) the mean square due to x₁ ignoring x₂ (S₁²),
(iv) the mean square due to x₂ after fitting x₁ (S₂²),
(v) the residual mean square (S₃²),
(vi) b̃₁ is independent of b₂, and
(vii) S₁², S₂², and S₃² are independent sums of squares.

In Goldberger (1964, pp. 176-196), a general presentation of the above results can be found.
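The unadjusted and adjusted estimators can be checked numerically; a sketch (our own, using an arbitrary standardized design with the stated moments; ε is set to zero so the estimates equal their expectations):

```python
import numpy as np

n, r12, beta1, beta2 = 11, 0.4, 1.0, 2.0
# a deterministic standardized design: any pair with the stated moments works
t = np.arange(n, dtype=float)
x1 = t - t.mean(); x1 /= np.sqrt(x1 @ x1 / (n - 1))
e = t ** 2; e -= e.mean(); e -= (e @ x1) / (x1 @ x1) * x1
e /= np.sqrt(e @ e / (n - 1))
x2 = r12 * x1 + np.sqrt(1 - r12 ** 2) * e

y = beta1 * x1 + beta2 * x2            # epsilon = 0
z2 = x2 - r12 * x1                     # orthogonalized second regressor

b1_tilde = (x1 @ y) / (x1 @ x1)        # unadjusted estimator of beta1
b2 = (z2 @ y) / (z2 @ z2)              # adjusted estimator of beta2
b1 = b1_tilde - r12 * b2               # full (adjusted) estimator of beta1

print(b1_tilde)   # 1.8 = beta1 + r12*beta2: E(b1~) carries the r12*beta2 bias
print(b2, b1)     # 2.0 and 1.0: the adjusted estimators are unbiased
```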
Since b₁ = b̃₁ − r₁₂b₂,

E(b₁) = β₁ + r₁₂β₂ − r₁₂E(b₂).

Hence the unconditional expectation of b₁ is β₁, and any expectation (conditional or unconditional) is β₁ if r₁₂ or β₂ is zero.
The bias and mean square comparisons will be carried out for the following sequential estimators:

(i) The "usual" sequential estimator (B₁), where we test if β₂ = 0, i.e., λ = β₂²/(2V(b₂)) = 0, at the α level, where λ has been previously defined as the noncentrality parameter in the noncentral F distribution and V(b₂) = 1/[(n−1)(1−r₁₂²)]:

B₁ = b̃₁ if u* < u°_α, i.e., w* < w°_α,
B₁ = b₁ if u* ≥ u°_α, i.e., w* ≥ w°_α.

(ii) The "mean square" sequential estimator (B₂), where we test if λ = β₂²/(2V(b₂)) ≤ 1/2 at the α level:

B₂ = b̃₁ if u* < u_α, i.e., w* < w_α,
B₂ = b₁ if u* ≥ u_α, i.e., w* ≥ w_α.

(iii) The "50 percent" sequential estimator (B₃), where we test if λ ≤ 1/2 at approximately the .50 level:

B₃ = b̃₁ if u* < 1, i.e., w* < 1/(1+q),
B₃ = b₁ if u* ≥ 1, i.e., w* ≥ 1/(1+q).

We notice that the basic difference among the above sequential estimators is the chosen level of significance, where u*, w*, u°_α, w°_α, u_α, and w_α have been previously defined in Chapter IV. Therefore we will proceed to obtain the bias of a sequential estimator for a general α level and subsequently make the proper substitutions for each special case.
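The three estimators differ only in the critical point applied to the same statistic u* = S₂²/S₃²; schematically (a sketch with hypothetical numbers; the critical points are those tabulated for n = 5 in Table 11 below):

```python
def sequential_estimate(b1_tilde, b1, u_star, u_crit):
    """Return the restricted estimator when the preliminary test
    accepts (u* < critical point), the full estimator otherwise."""
    return b1_tilde if u_star < u_crit else b1

# n = 5: "mean square" critical point 37.38, "usual" 18.51, "50 percent" 1
b1_tilde, b1, u_star = 1.8, 1.0, 6.0
print(sequential_estimate(b1_tilde, b1, u_star, 37.38))  # 1.8 (B2 keeps b1~)
print(sequential_estimate(b1_tilde, b1, u_star, 18.51))  # 1.8 (B1 keeps b1~)
print(sequential_estimate(b1_tilde, b1, u_star, 1.0))    # 1.0 (B3 switches to b1)
```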
Let us call the "general" sequential estimator B, where a preliminary test of significance of β₂ = 0 at level α has been used. Hence,

B = b̃₁ if u* < u_α, i.e., w* < w_α,
B = b₁ if u* ≥ u_α, i.e., w* ≥ w_α,

and

E[B] = E[b̃₁ | u* < u_α] Pr(u* < u_α) + E[b₁ | u* ≥ u_α][1 − Pr(u* < u_α)],
where E[· | ·] stands for the conditional expectation of the first argument given the second, Pr stands for probability, and u* = S₂²/S₃². Since S₁², S₂², and S₃² are independent and b̃₁ = S₁/√(n−1), the expectation of b̃₁ given S₂²/S₃² < u_α is the unconditional expectation of b̃₁.
Henceforth the expectation of B simplifies to
(69)  E(B) = (β₁ + r₁₂β₂) Pr(u* < u_α) + [(β₁ + r₁₂β₂) − r₁₂E(b₂ | u* ≥ u_α)][1 − Pr(u* < u_α)]
           = β₁ + r₁₂β₂ − r₁₂E(b₂ | u* ≥ u_α) Pr(u* ≥ u_α).

From Bancroft (1944) we know that

(70)  E(b₂ | u* ≥ u_α) Pr(u* ≥ u_α) = β₂ Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2),

where

λ = (β₂²/2)(n−1)(1−r₁₂²),  x = (n−3)/[(n−3)+u],  x₀ = (n−3)/[(n−3)+u_α],

and

I_{x₀}((n−3)/2, (3+2i)/2) = ∫₀^{x₀} [Γ((n+2i)/2) / (Γ((n−3)/2)Γ((3+2i)/2))] x^{(n−3)/2 − 1}(1−x)^{(3+2i)/2 − 1} dx

(the incomplete beta ratio).
By the relationship between the F distribution and the beta distribution we have

w = u/[(n−3)+u] = 1 − x,  x = 1 − w,

so that, using the well-known identity

I_x((n−3)/2, (3+2i)/2) = 1 − I_{1−x}((3+2i)/2, (n−3)/2),

we obtained

(71)  Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2) = 1 − Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{w₀}((3+2i)/2, (n−3)/2),

where w₀ = 1 − x₀.
Substituting (70) into (69) we obtained

(72)  E(B) = β₁ + r₁₂β₂ Σ_{i=0}^∞ (λⁱe^{−λ}/i!)[1 − I_{x₀}((n−3)/2, (3+2i)/2)].

Therefore the bias of B is

(73)  Bias(B) = E(B) − β₁ = r₁₂β₂ Σ_{i=0}^∞ (λⁱe^{−λ}/i!)[1 − I_{x₀}((n−3)/2, (3+2i)/2)].
The following deductions follow immediately:

(i) There is no bias in estimating β₁ if r₁₂ or β₂ is zero,
(ii) the absolute value of the coefficient of β₂ is at most one,
(iii) the sign of the bias depends on the signs of r₁₂ and β₂,
(iv) the bias is independent of β₁,
(v) the absolute bias decreases as the significance level approaches one, i.e., as u_α approaches zero, and
(vi) the absolute bias decreases as n increases for u_α = 1, since then x₀ approaches one.
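The bias series (73) is straightforward to evaluate with a modern regularized incomplete beta routine (the counterpart of the Gautschi (1964) subroutine used in Appendix C); a sketch, truncating the series once the Poisson weights are negligible:

```python
import math
from scipy.special import betainc   # regularized I_x(a, b)

def seq_bias(n, r12, beta2, u_alpha, tol=1e-12):
    """Bias of the sequential estimator of beta1, equation (73)."""
    lam = 0.5 * beta2 ** 2 * (n - 1) * (1 - r12 ** 2)
    x0 = (n - 3.0) / ((n - 3.0) + u_alpha)
    a = (n - 3.0) / 2.0
    total, w, i = 0.0, math.exp(-lam), 0
    while True:
        total += w * (1.0 - betainc(a, 1.5 + i, x0))
        i += 1
        w *= lam / i
        if w < tol and i > lam:        # drop negligible Poisson weights
            break
    return r12 * beta2 * total

# "mean square" test, n = 5 (u.05 = 37.38), r12 = .2, beta2 = 0.1:
print(seq_bias(5, 0.2, 0.1, 37.38))    # ~.018, as in Table 14
print(seq_bias(5, 0.8, 1.0, 1.0))      # ~.095, the "50 percent" entry
print(seq_bias(5, 0.2, 0.1, 0.0))      # 0: b1 is always used
print(seq_bias(5, 0.2, 0.1, 1e12))     # ~.02 = r12*beta2: b1~ is always used
```

The two limiting cases reproduce deductions (v) and the R.L.S.E. bias r₁₂β₂.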
The "mean square" sequential estimator bias was evaluated for sample
sizes 5, 11, 21, for r
and 4.0.
12
=
.2, .4, .6, .8, and S2 = 0.1, 0.4, 1.0, 2.0,
These are the same special cases analyzed by Bancroft (1944)
under the "usual" test and the "50 percent" level test.
In Table 11 the critical points in terms of u' s (F distribution),
w's (beta distribution), and the corresponding x 's are summarized.
o
The
critical points for the "mean square" test at 5 percent level can be
found in Table 1 in Chapter V.
The critical points for the "usual"
test at 5 percent level can be found for u in any standard F table and in
Tang (1938) for w.
In Tang (1938) and Table 3 in Chapter V,
an evaluation of the "mean
square" and "usual" tests of power function for selected values of the
noncentrality parameter (;\) can be obtained.
In Table 12 the "50 percent"
test power function for selected values of ;\ is presented.
The implications of the behavior of the power for the tests considered
.
will be shown to be more important in the next section when the mean square
errors for the sequential estimators are derived.
As it will be shown, the
mean square error is a function of the significance level and the noncentrality parameter.
Table 11. Critical points used in the evaluation of the sequential estimator's bias

                         "Mean square"             "Usual"                   "50 percent"
                         sequential estimators     sequential estimators     sequential estimators
n        n−3
sample   degrees of      u.05     w.05     x₀      u°.05    w°.05    x₀      u       w       x₀
size     freedom
 5        2              37.38    .949    .051     18.51    .903    .097     1       .333    .667
11        8               9.96    .555    .445      5.32    .399    .601     1       .111    .889
21       18               8.18    .312    .688      4.41    .197    .803     1       .053    .947

Table 12. Powers for the "50 percent" test for selected values of the noncentrality parameter for one degree of freedom in the numerator

Denominator                                       λ
degrees of
freedom
n−3 = q     0      .2500   .3333   .5000   1.0000   2.2500   4.0000   6.2500   7.5625   9.0000
 2          .423   .511    .538    .586    .704     .871     .960     .991     .996     .998
 8          .347   .451    .482    .540    .678     .870     .964     .994     .998     .999
18          .331   .439    .471    .531    .673     .870     .966     .994     .998     .999
58
A 1410 IBM Computer and the Incomplete Beta Ratio lubroutine written
by Gautlch1 (1964) were uled in the evaluation of ehe ,equantial e.timator bie.el.
For the "ulue1" and "SO percont" .equential eltim.torl,
analoaoul relult. to Bancroft (1944) were obtained.
A Fortran .tatement
of the proaram u.ed to evaluate the bie. can be found in Appondix C.
In Table 13 the corro.pondina value ot the noncAntrAlity param.ter(A)
i. obtained for the different combinAtion. of n, r 12 , And 92 , We
delated from the infinite .erie. of tho wltaheAd Ineompleto Seea aatiol
thOle term. for which the Poillon woilhtl WiUI l@u tluu\ 10.. 1 (ilil
Molina, 1947).
A .liahe modUie.don
Will
mado 111. the
{~omp\lt.t1on
at
the bi•• for lome A'. (repro.onted with Altlrlikl In TAbl. 14 ) lineA
tho computinl time rlquirld for t,hl oVAluaUc}fl of tho Ineomplot. Deta
Ratio wa. relatively lona_
one for tho torm. I
!3 + 1)'
xo
Tho modificAtion WAI to iAt I
Xo
'n-3
, - , -23 + i ' + k). k • 1,2,.. q whon I
~
.99999 linea Ix i. an inc:I.I1na !unct1on gf eho
o
ment (loa Gun, 1965),
Aqual to
n-3
XC)
(-2 '
~.egnd
aflu-
In Table 14 tho corro.pondinl bi.,UU) tot tb. "m@An iq,UAn", "ulual"
and "50 porcont" IIlquanti.l I.timatorl Aro prllHulciid for Ull iplcia.l
call. conlidor.d.
Th. ro.u1tl lo.m to
1nd1~.t.1
(i)
Th. bia. decr.a", A' WI fix r 12t B2, And the .1an1fieAncl
level and iner•••• tho lampl. 11~.~
(ii)
the b1al dlereale. conliderably A. we f1x f'12'
and incr.a.e the lovel of eian1ficancI,
a2 -
and n
(111) the b1a. incraa", without exception Ai wo fix n. 82• and
the 111nificanco lav.l and 1ncrOBB81 r 12 (wh-.n the colline.rbotwe.n Xli and x2i incraall. or when ~ d.er••••') and
Table 14. The bias in estimating β₁ via the "mean square," "usual," and "50 percent" sequential estimators

n = 5         "Mean square" (u.05 = 37.38)       "Usual" (u°.05 = 18.51)           "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .018   .037   .055   .074          .017   .034   .051   .069         .004   .008   .011   .015
     0.4     .073   .146   .220   .294          .067   .134   .202   .272         .012   .026   .040   .057
     1.0     .168   .340   .520   .713          .142   .292   .455   .640         .011   .025   .049   .095
     2.0     .250   .526   .855   1.278         .162   .358   .627   1.038        .000   .002   .008   .043
     4.0     .155   .378   .784   1.648         .035   .101   .282   .898         .000   .000   .000   .000

n = 11        "Mean square" (u.05 = 9.96)        "Usual" (u°.05 = 5.32)            "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .018   .037   .055   .074          .018   .036   .055   .074         .004   .008   .011   .015
     0.4     .067   .135   .208   .285          .056   .117   .187   .271         .009   .019   .032   .051
     1.0     .076   .175   .326   .567          .010   .032   .100   .332         .001   .003   .010   .039
     2.0     .002   .009   .049   .335          .000   .000   .000   .010         .000   .000   .000   .000
     4.0     .000   .000   .000   .000          .000*  .000*  .000*  .000         .000   .000   .000   .000

n = 21        "Mean square" (u.05 = 8.18)        "Usual" (u°.05 = 4.41)            "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .015   .030   .046   .061          .014   .029   .044   .059         .004   .008   .011   .015
     0.4     .049   .101   .159   .227          .033   .071   .122   .193         .005   .011   .022   .041
     1.0     .028   .072   .164   .350          .001   .006   .025   .132         .000   .000   .001   .009
     2.0     .000   .000   .001   .083          .000   .000   .000   .001         .000   .000   .000   .000
     4.0     .000   .000   .000   .000          .000   .000   .000   .000         .000   .000   .000   .000
(iv) the bias increases and then decreases as we fix r₁₂, n, and the level of significance and increase β₂.
Sequential Estimator Mean Squares
Following a procedure analogous to the derivation of the sequential estimator bias, the expression for the sequential estimator mean square is obtained. The sequential estimator mean square for a general significance level is derived, and subsequently the appropriate changes are made to calculate the corresponding mean squares for the "mean square," "usual," and "50 percent" sequential estimators. As for the bias, the mean square is dichotomized:

(74)  MSE(B) = MSE(b̃₁ | u* < u_α) Pr(u* < u_α) + MSE(b₁ | u* ≥ u_α) Pr(u* ≥ u_α).

Since the variance and bias of the R.L.S.E. (b̃₁) are 1/(n−1) and r₁₂β₂, and b̃₁ is independent of u*, MSE(b̃₁ | u* < u_α) is equal to the unconditional mean square error, i.e.,

MSE(b̃₁ | u* < u_α) = 1/(n−1) + r₁₂²β₂².

Since b₁ = b̃₁ − r₁₂b₂, MSE(b₁ | u* ≥ u_α) is equal to MSE[(b̃₁ − r₁₂b₂) | u* ≥ u_α]. By denoting a conditional expectation by E[· | ·], we can write MSE(b₁ | u* ≥ u_α) as

(75)  E[(b̃₁ − r₁₂b₂ − β₁)² | u* ≥ u_α].

Since E[(b̃₁ − β₁)² | u* ≥ u_α] = MSE(b̃₁), and using the independence between b̃₁ and b₂, MSE(b₁ | u* ≥ u_α) can be written as
(76)  MSE(b₁ | u* ≥ u_α) = 1/(n−1) + r₁₂²β₂² − [2r₁₂²β₂² / Pr(u* ≥ u_α)] Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2) + r₁₂²E(b₂² | u* ≥ u_α),

where we have substituted the previous result (70) for E(b₂ | u* ≥ u_α).
Following the same technique used by Bancroft (1944) to obtain the expectation of b₂ given u* ≥ u_α, we obtained the expectation of b₂² given u* ≥ u_α. We wish the expectation of b₂² when u* ≥ u_α or, since S₂² = b₂²/c, when z = V₃/b₂² ≤ 1/(u_α c), where

c = 1/[(n−1)(1−r₁₂²)]  and  V₃ = S₃².

Since b₂ and S₃² are independent, their joint distribution is

(77)  K e^{−(b₂−β₂)²/2c} V₃^{(n−3)/2 − 1} e^{−(n−3)V₃/2} db₂ dV₃.

Making the transformation of variables z = V₃/b₂², the joint distribution of b₂ and z is

(78)  K e^{−(b₂−β₂)²/2c} (zb₂²)^{(n−5)/2} e^{−(n−3)zb₂²/2} b₂² dz db₂.

Taking the expected value of b₂² when u* ≥ u_α, i.e., z ≤ 1/(u_α c), we have
(79)  E(b₂² | u* ≥ u_α) = [1/Pr(u* ≥ u_α)] ∫∫ b₂² K e^{−(b₂−β₂)²/2c} (zb₂²)^{(n−5)/2} e^{−(n−3)zb₂²/2} b₂² dz db₂,

where Pr(z ≤ 1/(u_α c)) = Pr(u* ≥ u_α), −∞ < b₂ < ∞, and 0 < z ≤ 1/(u_α c). Expanding e^{−(b₂−β₂)²/2c} in a power series, the odd terms of the series will vanish, whether n is odd or even, when b₂ is integrated out, since for those terms we have an odd function. After integrating out with respect to b₂ for the even terms of the series, using the properties of integrating an even function of the gamma-variate type, we have an infinite series whose typical term is of the form

(80)  [K e^{−β₂²/2c} / Pr(u* ≥ u_α)] [(β₂²/c²)ⁱ/(2i)!] Γ((n+2i)/2) (2c)^{(n+2i)/2} ∫₀^{1/(u_α c)} z^{(n−5)/2} [1 + (n−3)cz]^{−(n+2i)/2} dz.

By making the transformation of variable

x = c(n−3)z/[1 + (n−3)cz],  dz = dx/[c(n−3)(1−x)²],  1 − x = 1/[1 + (n−3)cz],

and noting that z = 1/(u_α c) corresponds to x₀ = (n−3)/[(n−3)+u_α],
"2
the expectation of b given u *>u can be written as
2
-a.
(81)
i -;\
00
c
E
Pr(u*>u )
Ae
(2i+1)
i=o
-a.
I
0'
~o
x
n-3
2
1+ i)
2
(0
(using the following identity,
r(l
2
+ i) 22i+1
=
(II) 1/2 (Zi)!
Z
and letting ;\ = f3 /
Z
\
(82)
Zc
2i+1
i!
Since

MSE(b̃₁ | u* < u_α) Pr(u* < u_α) = [1/(n−1) + r₁₂²β₂²] Pr(u* < u_α)

and

(83)  MSE(b₁ | u* ≥ u_α) Pr(u* ≥ u_α) = [1/(n−1) + r₁₂²β₂²] Pr(u* ≥ u_α) − 2r₁₂²β₂² Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2) + r₁₂²c Σ_{i=0}^∞ (2i+1)(λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2),

we finally obtain

(84)  MSE(B) = 1/(n−1) + r₁₂²β₂² − 2r₁₂²β₂² Σ_{i=0}^∞ (λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2) + r₁₂²c Σ_{i=0}^∞ (2i+1)(λⁱe^{−λ}/i!) I_{x₀}((n−3)/2, (3+2i)/2).
As a check on our algebra, we could let u_α be zero. This value of the critical point implies that b₁ is always used. If u_α = 0 then x₀ = 1 and I_{x₀}((n−3)/2, 3/2 + i) = 1 for all i, and the mean square error of B simplifies to

(85)  MSE(B) = 1/(n−1) + r₁₂²β₂² − 2r₁₂²β₂² + r₁₂²c Σ_{i=0}^∞ (2i+1)λⁱe^{−λ}/i!.

But since

Σ_{i=0}^∞ λⁱe^{−λ}/i! = 1,  Σ_{i=0}^∞ iλⁱe^{−λ}/i! = λ,  and  cλ = β₂²/2,

the mean square of B simplifies to

(86)  MSE(B) = 1/(n−1) + r₁₂²c = 1/[(n−1)(1−r₁₂²)],

which is the mean square of b₁. If b̃₁ is always used, this implies u_α = ∞, i.e., x₀ = 0 and I_{x₀}((n−3)/2, 3/2 + i) = 0 for all i, and the mean square of B simplifies to

(87)  MSE(B) = 1/(n−1) + r₁₂²β₂²,

which is the mean square of b̃₁.
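Equation (84), like the bias, is a weighted incomplete beta series, and the limiting checks (86) and (87) provide unit tests for an implementation; a sketch:

```python
import math
from scipy.special import betainc

def seq_mse(n, r12, beta2, u_alpha, tol=1e-14):
    """Mean square error of the sequential estimator, equation (84)."""
    lam = 0.5 * beta2 ** 2 * (n - 1) * (1 - r12 ** 2)
    c = 1.0 / ((n - 1) * (1 - r12 ** 2))
    x0 = (n - 3.0) / ((n - 3.0) + u_alpha)
    a = (n - 3.0) / 2.0
    s1, s2, w, i = 0.0, 0.0, math.exp(-lam), 0
    while True:
        ix = betainc(a, 1.5 + i, x0)
        s1 += w * ix
        s2 += (2 * i + 1) * w * ix
        i += 1
        w *= lam / i
        if (2 * i + 1) * w < tol and i > lam:
            break
    return (1.0 / (n - 1) + (r12 * beta2) ** 2
            - 2.0 * (r12 * beta2) ** 2 * s1 + r12 ** 2 * c * s2)

n, r12, beta2 = 5, 0.4, 2.0
print(seq_mse(n, r12, beta2, 0.0))     # ~.2976 = 1/((n-1)(1-r12^2)), check (86)
print(seq_mse(n, r12, beta2, 1e12))    # ~.89 = 1/(n-1) + r12^2*beta2^2, check (87)
print(seq_mse(n, r12, beta2, 37.38))   # ~.708: a "mean square" entry of Table 15
```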
In this section the simplified model of Bancroft (1944) was used for the purpose of comparison, but we could use the same approach for a general regression model with two regressors. The two basic modifications will be that the variance of ε will be σ² instead of 1, and c = σ²/[Σx₂²(1−r₁₂²)] instead of 1/[(n−1)(1−r₁₂²)]. The orthogonalization process will be carried out with a new transformation matrix, but the properties of independent sums of squares will remain invariant.
Substituting the corresponding critical points, the mean squares of the sequential estimators were calculated for the special cases under consideration. The results are presented in Table 15, in which the single (double) asterisks mean that I_{x₀}((n−3)/2, 3/2 + i + k), k = 1, 2, ..., was set equal to one whenever I_{x₀}((n−3)/2, 3/2 + i) > .99999 (.9999), since I_x is an increasing function of the second argument.

The results in Table 15 seem to indicate:

(i) The mean square decreases as we fix r₁₂, β₂, and the significance level and increase the sample size.
(ii) The mean square increases as we fix n, β₂, and the significance level and increase r₁₂.
(iii) The mean square increases and then decreases as we fix r₁₂, n, and the level of significance and increase β₂.
(iv) For λ ≤ 1/2, the mean square gets smaller as we reduce the significance level and fix n, r₁₂, and β₂. For λ > 1/2, the mean square gets smaller as we increase the significance level and fix n, r₁₂, and β₂ (see Table 13 for the values of λ).

There are the following three exceptions to this behavior of the mean square for λ > 1/2, where the mean square increases and then decreases as we increase the significance level:

(i) sample size 5 and λ = .720,
(ii) sample size 11 and λ = .512, and
(iii) sample size 21 and λ = .576.
Table 15. The mean squares of the "mean square," "usual," and "50 percent" sequential estimators

n = 5         "Mean square" (u.05 = 37.38)       "Usual" (u°.05 = 18.51)           "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .251   .255   .264   .290          .252   .258   .273   .319         .258   .289   .365   .611
     0.4     .257   .279   .317   .385          .258   .281   .325   .413         .260   .297   .383   .646
     1.0     .287   .400   .596   .898          .284   .391   .583   .901         .263   .311   .428   .768
     2.0     .359   .708   1.369  2.488         .328   .590   1.142  2.228        .261   .300   .405   .802
     4.0     .389   .921   2.322  6.008         .290   .472   1.118  3.723        .260   .298   .391   .696

n = 11        "Mean square" (u.05 = 9.96)        "Usual" (u°.05 = 5.32)            "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .101   .103   .108   .121          .101   .106   .117   .148         .104   .116   .147   .245
     0.4     .107   .128   .164   .222          .107   .128   .169   .246         .105   .122   .162   .278
     1.0     .121   .196   .361   .684          .112   .156   .272   .576         .104   .121   .166   .325
     2.0     .105   .128   .231   .919          .104   .120   .164   .459         .104   .119   .156   .281
     4.0     .104   .119   .156   .280          .104   .119*  .156   .278*        .104*  .119*  .156   .278

n = 21        "Mean square" (u.05 = 8.18)        "Usual" (u°.05 = 4.41)            "50 percent" (u = 1)
     r₁₂ =    .2     .4     .6     .8            .2     .4     .6     .8           .2     .4     .6     .8
β₂ = 0.1     .050   .052   .056   .064          .051   .054   .061   .080         .052   .058   .074   .124
     0.4     .056   .075   .110   .165          .055   .071   .105   .170         .053   .062   .085   .151
     1.0     .055   .076   .155   .447          .052   .063   .099   .279         .052   .060   .079   .151
     2.0     .052   .060   .078   .161          .052*  .060   .078   .140         .052   .060   .078   .139
     4.0     .052*  .060*  .078*  .139          .052*  .060*  .078*  .139         .052** .060** .078** .139**
SUMMARY, CONCLUSIONS AND SUGGESTIONS FOR FURTHER RESEARCH

In a linear model of full rank we have defined "multicollinearity" in terms of the mean square error of a linear combination of the estimators of β₁, i.e., in terms of the noncentrality parameter of the F distribution.

In Chapter I the linear model of full rank is presented, and a partitioning of the model is carried out in order to pose the choice of an estimator of β₁ between the R.L.S.E. and the F.L.S.E. The relationship among stepwise regression, specification error, "multicollinearity," preliminary tests of significance, and the mean square error criterion is discussed, and related literature is also cited in Chapter I.

In Chapter II the two estimators and the mean square error criterion are presented for the two-regressor model. An explanation of "multicollinearity" and of the procedure of deleting a variable in terms of the criterion is given, and comments about the usual "t-ratio" test are made.

In Chapter III the properties of the variance (mean square) of a linear function and of the generalized variance (mean square) that are needed in the sequel are presented, and the mean square error criterion for choosing between the R.L.S.E. and the F.L.S.E. of β₁ is given in terms of these quantities.
The theoretical framework of the U.M.P. test for "multicollinearity" is discussed in Chapter IV. Also a comparison of critical points and levels of significance for the usual test and the "multicollinearity" test is presented. The minimum variance unbiased estimator of the noncentrality parameter is derived in this chapter.
In Chapter V the procedure to obtain the "multicollinearity"
critical points and power values is described.
A table of critical
points in terms of the noncentral beta and the noncentral F distributions
is presented for one degree of freedom in the numerator and four levels
of significance, with an explanation of its usage.
"Multicollinearity"
test power value tables for selected noncentrality parameters under four
levels of significance are presented in this chapter.
A comparison with
standard F tables and Tang's power tables is given in order to emphasize
the basic difference between the usual test and the "multicollinearity"
test.
In Appendices A and B the computer programs for the evaluation of
critical points and power values are presented.
The bias and mean square error of a sequential estimator of β₁ in a special two-regressor model are presented in Chapter VI. The evaluation of bias and mean square error is carried out for selected sample sizes, r₁₂, and β₂. Also a comparison of bias and mean square error of the
Also a comparison of bias and mean square error of the
sequential estimators corresponding to three significance levels is
provided in Chapter VI.
Tables summarizing the results for bias and mean
square error are presented in this chapter with the corresponding computer
program for evaluating bias and mean square error appearing in Appendix C.
Conclusions
Using the criterion of minimizing the mean square error of any linear combination of the estimators of β₁ in a linear model of full rank, we were able to integrate the topics of stepwise regression, specification error, "multicollinearity," and preliminary tests of significance under a single framework. The decision of selecting the R.L.S.E. by the criterion of MSE(ℓ′b̃₁) ≤ MSE(ℓ′b₁) for all nonnull ℓ can be articulated as preferring the stepwise estimator or, analogously, as model misspecification, or as deletion of variables from the model because they are "multicollinear." Since the mean square error criterion of selection requires testing a hypothesis about the noncentrality parameter λ in the noncentral F distribution, our estimator is a sequential estimator, i.e., the selection of the "best" estimator is based on a test of significance.
For the two-regressor model we were able to derive a precise but inconclusive statement of "multicollinearity," since we should delete x₂ whenever r₁₂² is bounded by [1 − σ²/(β₂²Σx₂²), 1]. Also the usual test for deleting x₂ is found to be more restrictive than required (under the MSE criterion), since b̃₁ should be used when β₂² ≤ V(b₂). Analogous statements in terms of β²_{k+1} and V(b_{k+1}) are derived when we are interested in deciding to exclude only one regressor in the general partitioned linear model of full rank.

The biased restricted estimator is (X₁′X₁)⁻¹X₁′Y with mean β₁ + Aβ₂, covariance matrix σ²(X₁′X₁)⁻¹, and mean square error matrix σ²(X₁′X₁)⁻¹ + Aβ₂β₂′A′. The full estimator of β₁ is b̃₁ − Ab₂ with mean β₁ and, since it is unbiased, covariance matrix and mean square error matrix both equal to σ²(X₁′X₁)⁻¹ + σ²AF₂⁻¹A′.
Under the criterion of minimizing the variance of any arbitrary linear function, the restricted estimator is better than the full estimator. Consequently the restricted estimator is better in terms of the generalized variance ratio. Under the criterion of minimizing the mean square error of any arbitrary linear function, the restricted estimator of β₁ is better than the full estimator if the noncentrality parameter is less than or equal to one-half, and the mean square error of a linear function criterion implies what can be called the generalized mean square ratio criterion. The condition that the noncentrality parameter is less than or equal to one-half is a sufficient but not a necessary condition for deleting X₂ from the true model if m < k and the rank of A is m, which is not too optimistic to expect in most econometric problems. When we are interested in deciding whether to exclude only one regressor, the mean square error criterion is necessary and sufficient, since the criterion simplifies to a scalar.
Since the family of noncentral F densities has the monotone likelihood ratio property in T(u) = mu/(q+mu) = w, a U.M.P. test for "multicollinearity," i.e., of H₀: λ ≤ 1/2, is provided by using the noncentral F distribution or, equivalently, the noncentral beta distribution. The basic difference between the usual test and the "multicollinearity" test is the level of significance at which the corresponding hypotheses are tested. In Chapter V, "multicollinearity" test critical points for one degree of freedom in the numerator are presented in terms of the beta and F distributions. Also the evaluation of the power for selected values of the noncentrality parameter is given. From this
table we can readily see the relationship between the levels of significance for H₀: λ = 0 versus H₀: λ ≤ 1/2. The minimum variance unbiased estimator of the noncentrality parameter is um(q−2)/(2q) − m/2 in terms of the F variable, and w(q−2)/[2(1−w)] − m/2 in terms of the beta variable.
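Unbiasedness can be checked against the known mean of the noncentral F distribution (with scipy's noncentrality parameter equal to 2λ), and the beta-variable form is an algebraic restatement of the F-variable form; a sketch:

```python
from scipy.stats import ncf

def lam_mvue(u, m, q):
    """Minimum variance unbiased estimator of lambda from the F variable."""
    return u * m * (q - 2) / (2.0 * q) - m / 2.0

m, q, lam = 1, 8, 0.7
mean_u = ncf.mean(m, q, 2 * lam)     # q(m + 2*lam)/(m(q-2)) = 3.2
print(lam_mvue(mean_u, m, q))        # 0.7: E[lam-hat] equals lambda

w = m * mean_u / (q + m * mean_u)    # the corresponding beta variable
beta_form = w * (q - 2) / (2 * (1 - w)) - m / 2
print(beta_form)                     # 0.7 again: the two forms coincide
```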
In the regression model of two regressors used by Bancroft (1944), the bias and mean square error of the sequential estimators of β₁ implied by a usual test at 5 percent, a "multicollinearity" test at 5 percent, and a "50 percent" test were evaluated for selected sample sizes, r₁₂, and β₂. The results seem to indicate that the bias decreases for increases in the sample size and the level of significance, with corresponding ceteris paribus. The bias increases as the "collinearity" increases, ceteris paribus, and the bias increases and then decreases as β₂ increases, ceteris paribus. For increases in sample size, "collinearity," and β₂, with corresponding ceteris paribus, the behavior of the mean square error of the sequential estimator of β₁ is similar to the behavior of the bias. For increases in the level of significance, ceteris paribus, the behavior of the mean square error depends on the value of the noncentrality parameter: for λ < 1/2 the mean square increases and for λ > 1/2 the mean square decreases, eventually attaining maximums for noncentrality parameters close to one-half.
Suggestions for Further Research

In this thesis we have used a very special restriction on the parameter space. Since our interest was to decide between b̃₁ and b₁, the implicit restriction was β₂ = 0. An investigation of the properties of the R.L.S.E. and F.L.S.E. under the mean square error criterion for a more general set of restrictions seems natural.
Analogously to experimental designs where an optimum design matrix is of
great interest, a search for optimal restrictions should be carried out
especially for the case where only one linear combination of the
regression coefficients is studied.
The properties of the residual estimator of β₁ in terms of the mean square error criterion should be analyzed. The residual estimator of β₁ is obtained by regressing Y on X₂ and then regressing the residual on X₁.

Following the mean square error criterion, a "best" estimator for β₁ should be investigated without restricting ourselves to least squares estimators.
Also it would be interesting to apply the mean square error
criterion in estimating linear combinations of regression coefficients
in error in variables models as well as in simultaneous equation models
where a linear combination of the structural parameters is of interest.
A Bayesian approach for estimating the noncentrality parameter and an investigation of the distributional properties of the minimum variance unbiased estimator of the noncentrality parameter seem relevant.

"Multicollinearity" critical point tables should be provided for more degrees of freedom in the numerator. Due to the great sensitivity of the critical point evaluation, greater precision should be required in their computation.
The empirical results obtained on the mean square error of sequential estimators of β₁ in the two-regressor model raise the possibility of an optimum choice of α for a given objective function.
LIST OF REFERENCES
Anderson, T. W. 1964. An Introduction to Multivariate Statistical Analysis. John Wiley and Sons, Inc., New York.

Bancroft, T. A. 1944. On biases in estimation due to the use of preliminary tests of significance. Annals of Mathematical Statistics 15:190-204.

Freund, R. J., R. W. Vail, and C. W. Clunies-Ross. 1961. Residual analysis. Journal of the American Statistical Association 56:98-104.

Gautschi, W. 1964. Algorithm 222: Incomplete beta function ratios. Communications of the Association for Computing Machinery 7:143-144.

Goldberger, A. S. 1961. Stepwise least squares: Residual analysis and specification error. Journal of the American Statistical Association 56:998-1000.

Goldberger, A. S. 1964. Econometric Theory. John Wiley and Sons, Inc., New York.

Goldberger, A. S. and D. B. Jockems. 1961. Note on stepwise least squares. Journal of the American Statistical Association 56:105-110.

Graybill, F. A. 1961. Introduction to Linear Statistical Models. McGraw-Hill Book Company, Inc., New York.

Griliches, Z. 1957. Specification bias in estimates of production functions. Journal of Farm Economics 39:8-20.

Gun, A. M. 1965. Monotonicity of incomplete beta and gamma functions. Journal of Science and Engineering Research 9:329-331.

Johnston, J. 1963. Econometric Methods. McGraw-Hill Book Company, Inc., New York.

Kabe, D. G. 1963. Stepwise multivariate linear regression. Journal of the American Statistical Association 58:770-773.

Kaplan, W. 1959. Advanced Calculus. Addison-Wesley Publishing Company, Reading, Massachusetts.

Lehmann, E. L. 1964. Testing Statistical Hypotheses. John Wiley and Sons, Inc., New York.

Molina, E. C. 1947. Poisson's Exponential Binomial Limit. D. Van Nostrand Company, Inc., New York.

Patnaik, P. B. 1949. The non-central χ²- and F-distributions and their applications. Biometrika 36:202-232.

Rao, C. R. 1965. Linear Statistical Inference and Its Applications. John Wiley and Sons, Inc., New York.

Tang, P. C. 1938. The power function of the analysis of variance tests with tables and illustrations of their use. Statistical Research Memoirs 2:126-149.

Theil, H. 1961. Economic Forecasts and Policy. North-Holland Publishing Company, Amsterdam.

Wallace, T. D. 1964. Efficiencies for stepwise regressions. Journal of the American Statistical Association 59:1179-1182.

Zyskind, G. 1963. A note on residual analysis. Journal of the American Statistical Association 58:1125-1132.
APPENDICES
Appendix A. Computer Program for the
"Multicollinearity" Test Critical Points
MESSAGE          FORTRAN LISTING          1410-FO-970

      DIMENSIONA(19),B(19),BETA(61),U(4),V(4),W(4)
00100 FORMAT(6X,4F9.2,4X,4F9.2)
00200 FORMAT(32HSUBROUTINE ERROR,MACHINE STOPPED)
00300 FORMAT(F6.0,4F9.4,8X,4F9.4)
00400 FORMAT(I3)
00500 FORMAT(1H1,7HF1 IS 1)
00600 FORMAT(3I4,F6.4)
      M=1
      WRITE(3,500)
      EPS=.5E-6
      GM=M
      V(1)=.05
      V(2)=.1
      V(3)=.25
      V(4)=.5
      WRITE(3,100)V(1),V(2),V(3),V(4),V(1),V(2),V(3),V(4)
00025 READ(1,400)N
      GN=N
      Q=.5*GN
      AZERO=EXP(-.5)
      DO10J=1,4
      SLIMIT=1.-V(J)
      XA=1.
      Y=0.
      X=.5
00004 P=.5*GM
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)14,15,14
00014 WRITE(3,200)
      STOP
00015 BZERO=BETA(NMAX+1)
      A(1)=.5
      S=AZERO*BZERO
      P=P+1.
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)16,17,16
00016 WRITE(3,200)
      STOP
00017 B(1)=BETA(NMAX+1)
      S=S+AZERO*A(1)*B(1)
      DO8I=2,7
      DI=I
      A(I)=A(I-1)*.5/DI
      P=P+1.
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)6,7,6
00006 WRITE(3,200)
      STOP
00007 B(I)=BETA(NMAX+1)
00008 S=S+A(I)*B(I)*AZERO
      P=S-SLIMIT
      IF(ABS(P)-.0005)2,2,5
00005 IF(P)1,2,3
00001 Y=X
      X=(XA+X)/2.
      GOTO4
00003 XA=X
      X=(Y+X)/2.
      GOTO4
00002 W(J)=X
      WRITE(2,600)M,N,J,X
00010 U(J)=GN*X/((1.-X)*GM)
      WRITE(3,300)GN,W(1),W(2),W(3),W(4),U(1),U(2),U(3),U(4)
      GOTO25
      STOP
      END
Appendix B. Computer Program for the
Evaluation of Power Values
MESSAGE          FORTRAN LISTING          1410-FO-970

      DIMENSIONA(27),B(27),C(9),BETA(61),ID(9)
00100 FORMAT(33H F1    F2    J    LAMBDA    POWER)
00200 FORMAT(32HSUBROUTINE ERROR,MACHINE STOPPED)
00300 FORMAT(3I5,2F10.4)
00500 FORMAT(F7.4,I2)
00600 FORMAT(3I4,F6.4)
      EPS=.5E-6
      WRITE(3,100)
      DO2I=1,9
00002 READ(1,500)C(I),ID(I)
00001 READ(1,600)M,N,K,X
      DO13J=1,9
      ALAMBA=C(J)
      JJ=ID(J)
      GM=M
      GN=N
      Q=.5*GN
      P=.5*GM
      AZERO=EXP(-ALAMBA)
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)18,19,18
00018 WRITE(3,200)
      STOP
00019 BZERO=BETA(NMAX+1)
      S=AZERO*BZERO
      P=P+1.
      A(1)=ALAMBA
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)20,21,20
00020 WRITE(3,200)
      STOP
00021 B(1)=BETA(NMAX+1)
      S=S+AZERO*A(1)*B(1)
      DO12III=2,JJ
      DI=III
      A(III)=A(III-1)*ALAMBA/DI
      P=P+1.
      CALLDRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)22,23,22
00022 WRITE(3,200)
      STOP
00023 B(III)=BETA(NMAX+1)
00012 S=S+AZERO*A(III)*B(III)
      T=1.-S
      WRITE(3,300)M,N,K,ALAMBA,T
00013 CONTINUE
      GOTO1
      STOP
      END
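Appendix B evaluates the power as one minus the same series at the tabled critical point, now with Poisson weights exp(-lambda)*lambda**k/k!. A sketch using SciPy, under the assumption (consistent with the weights above) that SciPy's noncentrality parameter nc equals twice the thesis's lambda; a built-in consistency check is that the power equals alpha at the null boundary lambda = 1/2:

```python
from scipy.stats import ncf  # noncentral F distribution

def multicollinearity_test_power(m, n, lam, alpha=0.05):
    """Power of the lambda = 1/2 (mean square error boundary) test.
    Weights exp(-lam)*lam**k/k! correspond to SciPy's nc = 2*lam."""
    fcrit = ncf.ppf(1.0 - alpha, m, n, nc=1.0)  # critical point at lam = 1/2
    return ncf.sf(fcrit, m, n, nc=2.0 * lam)    # P(F > fcrit) at lam
```

At lam = 0.5 this returns alpha, and the power increases with lam, since the noncentral F is stochastically increasing in its noncentrality.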
Appendix C. Fortran Program for the Evaluation of
Sequential Estimator Bias and Mean Squares
C     BIAS AND MEAN SQUARE OF THE SEQUENTIAL (PRETEST) ESTIMATOR,
C     EVALUATED FROM THE POISSON-WEIGHTED INCOMPLETE BETA SERIES.
C     ONE CARD PER CASE:  N, BETA, CRITICAL POINT X, SERIES LIMITS
C     IA AND IO, AND THE REGRESSOR CORRELATION RHO.
      DIMENSION BETA(220),A(216),B(216),U(4),V(4)
  100 FORMAT(I2,F2.1,F4.4,2I3,F2.2)
  200 FORMAT(32HSUBROUTINE ERROR,MACHINE STOPPED)
  300 FORMAT(40H1      BIAS            MEAN SQUARE ERROR)
  400 FORMAT(22H.2    .4     .6     .8)
  700 FORMAT(F4.1,F8.4,18X,F9.4)
  800 FORMAT(F4.1,6X,F8.4,18X,F9.4)
  900 FORMAT(F4.1,12X,F8.4,18X,F9.4)
  950 FORMAT(F4.1,18X,F8.4,18X,F9.4)
      EPS=.5E-6
      V(1)=.2
      V(2)=.4
      V(3)=.6
      V(4)=.8
      WRITE(3,300)
      WRITE(3,400)
   13 READ(1,100)N,BE,X,IA,IO,RHO
      S=0.
      T=0.
      AN=N
      P=(AN-3.)/2.
      Q=1.5
      K=RHO*5.
      CA=RHO*BE
      CB=1./(AN-1.)
      CC=CA*CA
      CD=-2.*CC
      CE=RHO*RHO/((AN-1.)*(1.-RHO*RHO))
      ALPHA=(1.-RHO*RHO)*(AN-1.)*BE*BE/2.
      A(1)=ALPHA
      AZERO=EXP(-ALPHA)
      IF(IA-3)3,3,2
    3 CALL DRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)4,5,4
    4 WRITE(3,200)
      STOP
    5 BZERO=BETA(NMAX+1)
      Q=Q+1.
      CALL DRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)6,7,6
    6 WRITE(3,200)
      STOP
    7 B(1)=BETA(NMAX+1)
      S=AZERO*(BZERO+A(1)*B(1))
      T=AZERO*A(1)*B(1)
      IA=2
      GO TO 8
    2 IAM=IA-1
      DO 9 I=2,IAM
      DI=I
    9 A(I)=A(I-1)*ALPHA/DI
      AI=IA
      Q=1.5+AI
    8 DO 10 I=IA,IO
      DI=I
      A(I)=A(I-1)*ALPHA/DI
      CALL DRIVER(X,P,Q,EPS,NMAX,BETA,KEY)
      IF(KEY)11,12,11
   11 WRITE(3,200)
      STOP
   12 B(I)=BETA(NMAX+1)
      Z=A(I)*B(I)*AZERO
      Q=Q+1.
      S=S+Z
   10 T=T+Z*DI
      BIAS=CA*(1.-S)
      SSQE=CB+CC+(CD+CE)*S+2.*CE*T
      GO TO (15,16,17,18),K
   15 WRITE(3,700)BE,BIAS,SSQE
      GO TO 13
   16 WRITE(3,800)BE,BIAS,SSQE
      GO TO 13
   17 WRITE(3,900)BE,BIAS,SSQE
      GO TO 13
   18 WRITE(3,950)BE,BIAS,SSQE
      GO TO 13
   14 CONTINUE
      STOP
      END
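The listing evaluates the sequential (pretest) estimator's bias and mean square from the closed-form series. The same quantities can be approximated by simulation; the sketch below is a Monte Carlo illustration under assumed conditions (two standardized regressors with correlation rho, unit error variance, and an illustrative F cutoff), not the thesis's exact evaluation:

```python
import numpy as np

def pretest_bias_mse(beta1, beta2, rho, n=20, fcrit=4.0, reps=20000, seed=0):
    """Monte Carlo bias and MSE of the pretest estimator of beta1:
    fit y on (x1, x2); keep x2 only if its F statistic exceeds fcrit,
    otherwise re-estimate beta1 from the regression of y on x1 alone."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.standard_normal((2, n))
    x1 = (z1 - z1.mean()) / z1.std()            # standardized regressor
    u = z2 - (z2 @ x1 / (x1 @ x1)) * x1         # component orthogonal to x1
    u = (u - u.mean()) / u.std()
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * u   # sample corr(x1, x2) = rho
    X = np.column_stack([x1, x2])
    XtXinv = np.linalg.inv(X.T @ X)
    Y = beta1 * x1 + beta2 * x2 + rng.standard_normal((reps, n))
    B = Y @ X @ XtXinv                          # full-model OLS, one row per rep
    resid = Y - B @ X.T
    s2 = (resid**2).sum(axis=1) / (n - 2)
    f2 = B[:, 1]**2 / (s2 * XtXinv[1, 1])       # F statistic for beta2
    restricted = Y @ x1 / (x1 @ x1)             # beta1-hat with x2 dropped
    ests = np.where(f2 > fcrit, B[:, 0], restricted)
    return ests.mean() - beta1, ((ests - beta1)**2).mean()
```

With the cutoff set so high that x2 is always dropped, the simulated bias approaches rho*beta2, the restricted-estimator bias that the series formula BIAS = CA*(1 - S) reduces to when S goes to zero.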