Helms, R. W. (1969). "A procedure for the selection of terms and estimation of coefficients in a response surface model with integration-orthogonal terms."

A PROCEDURE FOR THE SELECTION OF TERMS AND ESTIMATION
OF COEFFICIENTS IN A RESPONSE SURFACE MODEL WITH
INTEGRATION-ORTHOGONAL TERMS
by
Ronald W. Helms*, R. J. Hader**,
and A. R. Manson**
*Department of Biostatistics
University of North Carolina
**Department of Experimental Statistics
North Carolina State University
University of North Carolina
Institute of Statistics Mimeo Series No. 646
August 1969
ABSTRACT
HELMS, RONALD WILLIAM.  A Procedure for the Selection of Terms and Estimation of Coefficients in a Response Surface Model with Integration-Orthogonal Terms.  (Under the direction of ROBERT JOHN HADER and ALLISON RAY MANSON.)
General linear approximation theory and general linear estimation theory are combined to produce a generalization of the Minimum Bias Estimator (Karson et al., 1969) in which a linear model is transformed to an equivalent representation in terms of integration-orthogonal functions.  The integrated mean square error of an estimator η̌ for a response function η is shown to have the form

    \mathrm{IMSE}(\check{\eta}) = \sum_j E(\check{\beta}_j - \beta_j)^2,

where β̌ⱼ is the estimator of the coefficient βⱼ.  One can thus consider the deletion or inclusion of terms in an estimation model one at a time.  A term can be "deleted" from a model by setting the corresponding coefficient estimator to zero (β̌ⱼ = 0); various procedures are considered for using the least squares estimator β̌ⱼ = β̂ⱼ for certain sets of β̂ⱼ-values and zero (deletion of the term; β̌ⱼ = 0) for other sets of β̂ⱼ-values.  The expected value, bias, and mean square error are derived for each estimator.  Techniques are given for the selection of the region on which β̌ⱼ = 0 when prior information is available about βⱼ in the form of a prior distribution.
BIOGRAPHY

The author was born October 30, 1941, in Charlotte, North Carolina.  He was reared primarily in Charlotte and in Oak Ridge, Tennessee.  He graduated from Oak Ridge High School.  He received the Bachelor of Arts degree with a major in Mathematics from the University of Tennessee in 1963.  After working as a laboratory assistant and programmer at the Atomic Energy Commission Agricultural Research Laboratory in Oak Ridge, he accepted a graduate assistantship at the Computation Center of the University of Tennessee.  He received the Master of Arts degree in Mathematics in 1966.  In September, 1966, he enrolled in the Graduate School of North Carolina State University.  After two semesters of study he was awarded a National Aeronautics and Space Administration traineeship.  He was inducted into Phi Kappa Phi Honor Society in 1968.  He accepted a position as instructor with the Biostatistics Department of the University of North Carolina in August, 1968, and has accepted the position of Assistant Professor effective autumn 1969.

The author married Miss Mary Irene Wagner, of Knoxville, Tennessee, in 1963; they have no children.
ACKNOWLEDGMENTS

The author wishes to express his appreciation to all who have contributed assistance in the preparation of this study.  Appreciation is particularly extended to Dr. R. J. Hader and Dr. A. R. Manson for their continued guidance and to the other members of the Advisory Committee, Professors J. W. Bishir, C. P. Quesenberry, and R. G. D. Steel, for their aid.  A special note of appreciation is extended to Dr. James E. Grizzle of the University of North Carolina for continual moral and intuitive support throughout the course of this study.

The computer time required for this study was granted by the Computation Center of North Carolina State University, which is supported in part by a grant from the National Science Foundation.

To Mrs. Gay Goss and Mrs. Mary Donelan, who had the unenviable task of typing final and rough drafts, respectively, of this thesis, the author expresses his special thanks for a job well done.  Appreciation is also given to Mrs. Mary Flanagan who typed much of the rough draft and helped with the figures.

Finally, it is not possible for the author to express adequately his appreciation for the continual support of his wife, Mary.
TABLE OF CONTENTS
                                                                      Page
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . .    vi
LIST OF FIGURES  . . . . . . . . . . . . . . . . . . . . . . . . . .   vii
1.  INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . .     1
2.  REVIEW OF LITERATURE . . . . . . . . . . . . . . . . . . . . . .     4
3.  MINIMUM BIAS ESTIMATION WITH INTEGRATION-ORTHOGONAL
    FUNCTIONS  . . . . . . . . . . . . . . . . . . . . . . . . . . .    11
    3.1  Least-Squares Linear Approximation with Orthogonal
         Functions . . . . . . . . . . . . . . . . . . . . . . . . .    12
         3.1.1  Some Basic Theory from Linear Algebra  . . . . . . .    12
         3.1.2  The Approximation Theorem  . . . . . . . . . . . . .    16
         3.1.3  Some Results for Integration-Orthogonal
                Polynomials  . . . . . . . . . . . . . . . . . . . .    22
    3.2  Minimum Bias Estimation: Best Linear Unbiased Estimation
         of the Best Approximating Function  . . . . . . . . . . . .    31
         3.2.1  Definitions and Notation . . . . . . . . . . . . . .    31
         3.2.2  Best Linear Unbiased Estimation of the Best
                Approximating Function . . . . . . . . . . . . . . .    36
         3.2.3  Equivalence of the Minimum Bias Estimator and
                BLUE of the Best Approximating Function  . . . . . .    37
4.  A PROCEDURE FOR SELECTION OF TERMS IN THE MODEL WITH
    MINIMUM BIAS ESTIMATION AND THE INTEGRATED MEAN
    SQUARE ERROR CRITERION . . . . . . . . . . . . . . . . . . . . .    41
    4.1  Minimization of Integrated Mean Square Error with
         Respect to Choice of Terms in the Model,
         Parameters Known  . . . . . . . . . . . . . . . . . . . . .    43
         4.1.1  Experimental Model . . . . . . . . . . . . . . . . .    45
         4.1.2  An Example . . . . . . . . . . . . . . . . . . . . .    46
         4.1.3  Results Useful for Evaluating Integrals  . . . . . .    50
    4.2  Properties of the Estimators Based on Two-Tail Tests  . . .    57
         4.2.1  Properties with σ² Known . . . . . . . . . . . . . .    57
         4.2.2  Properties with σ² Unknown . . . . . . . . . . . . .    64
    4.3  Estimators Based on One-Tail Tests  . . . . . . . . . . . .    81
         4.3.1  Properties of the One-Tail Estimators When
                σ² is Known  . . . . . . . . . . . . . . . . . . . .    84
         4.3.2  Properties of One-Tail Estimators When
                Variance is Unknown  . . . . . . . . . . . . . . . .    85
    4.4  Generalized Estimators  . . . . . . . . . . . . . . . . . .    90
5.  SOME TECHNIQUES FOR THE SELECTION OF CUTOFF POINTS . . . . . . .   112
    5.1  The Bayes Decision Procedure  . . . . . . . . . . . . . . .   114
    5.2  The Posterior C=1 Procedure . . . . . . . . . . . . . . . .   115
    5.3  Cutoff Point Selection with a Normal Prior  . . . . . . . .   116
    5.4  Cutoff Point Selection with a Uniform Prior . . . . . . . .   122
6.  SUMMARY  . . . . . . . . . . . . . . . . . . . . . . . . . . . .   125
7.  LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . . . . .   130
LIST OF TABLES
                                                                      Page
5.1  A comparison of the expected values, with respect to a
     N(μ, τ²) prior, of the Mean Square Error function for
     the Bayes decision procedure and the Posterior C=1
     Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . .   120
LIST OF FIGURES
                                                                      Page
4.1   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ and C; σ² assumed known . . . . . . . . . .    61
4.2   Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ and C; σ² assumed known . . .    63
4.3   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 2, σ² estimated  . . . . . . . .    71
4.4   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 5, σ² estimated  . . . . . . . .    72
4.5   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 10, σ² estimated . . . . . . . .    73
4.6   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 25, σ² estimated . . . . . . . .    74
4.7   Mean Square Error of the two-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 50, σ² estimated . . . . . . . .    75
4.8   Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 2, σ² estimated  .    76
4.9   Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 5, σ² estimated  .    77
4.10  Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 10, σ² estimated .    78
4.11  Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 25, σ² estimated .    79
4.12  Absolute value of the bias of the two-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 50, σ² estimated .    80
LIST OF FIGURES (continued)
                                                                      Page
4.13  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ and C; σ² assumed known . . . . . . . . . .    86
4.14  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ and C; σ² assumed known . . .    87
4.15  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 2, σ² estimated  . . . . . . . .    91
4.16  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 5, σ² estimated  . . . . . . . .    92
4.17  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 10, σ² estimated . . . . . . . .    93
4.18  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 25, σ² estimated . . . . . . . .    94
4.19  Mean Square Error of the one-tail estimator θ̌ = β̌/s.d.(β̂)
      as a function of θ, C and ν = 50, σ² estimated . . . . . . . .    95
4.20  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 2, σ² estimated  .    96
4.21  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 5, σ² estimated  .    97
4.22  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 10, σ² estimated .    98
4.23  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 25, σ² estimated .    99
4.24  Absolute value of the bias of the one-tail estimator
      θ̌ = β̌/s.d.(β̂) as a function of θ, C and ν = 50, σ² estimated .   100
4.25  Mean Square Error of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   106
LIST OF FIGURES (continued)
                                                                      Page
4.26  Mean Square Error of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   107
4.27  Mean Square Error of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   108
4.28  Absolute value of the bias of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   109
4.29  Absolute value of the bias of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   110
4.30  Absolute value of the bias of the generalized estimator
      θ̌ = a·1_A(θ̂) + θ̂·1_B(θ̂), variance assumed known  . . . . . . .   111
5.1   Configurations of q(θ) for various combinations of prior
      distribution parameters  . . . . . . . . . . . . . . . . . . .   118
1. INTRODUCTION
In many experimental situations it is convenient to suppose that a functional relationship,

    \eta = g(x; \theta),                                                    (1.1)

exists between a response η and m continuous variables x₁, ..., x_m.  When the form of the function (1.1) is known, the object of the experiment may be to estimate the parameter vector, θ.  In many cases, however, the form of the function is unknown, or, if known, may not be linear in the parameters.  In such cases the objective may be to estimate an approximation of the function g(x, θ) over some given region R of the m-dimensional space of the x-variables.

If the response function is linear in the parameters,

    \eta = g(x; \theta) = \sum_{i=1}^{m} \theta_i f_i(x),                   (1.2)
where the fᵢ are known functions (polynomials, for example), then, under certain assumptions about the nature of the experimental errors, it is possible to apply traditional general linear model (least squares) estimation theory to obtain Best (minimum variance) Linear Unbiased Estimators for the parameters θ and the function η.  However, if the assumed model (1.2) is inadequate, in the sense that the true model is of the form

    \eta = g(x; \theta) = \sum_{i=1}^{m+q} \theta_i f_i(x),                 (1.3)
then the estimators for the parameters θ and for the function are biased.  In the presence of bias the use of the minimum variance criterion for the selection of an estimation procedure is subject to question.

Since bias attributable to an inadequate model is usually not constant over the region of interest, R, the mean square error of an estimator, η̂, integrated over the region R, viz.,

    \mathrm{IMSE}(\hat{\eta}) = \int_R E\{[\hat{\eta}(x) - \eta(x)]^2\}\, dx,    (1.4)

would appear to be a good criterion for use in the selection of an estimator.  In this thesis the IMSE is used as a criterion for the comparison of estimators in inadequate model situations.
It is useful to notice that the IMSE can be decomposed into bias and variance components:

    \mathrm{IMSE}(\hat{\eta}) = \int_R \{E[\hat{\eta}(x)] - \eta(x)\}^2\, dx + \int_R \mathrm{Var}[\hat{\eta}(x)]\, dx.    (1.5)
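The decomposition (1.5) is easy to check numerically for a simple inadequate-model situation.  The short sketch below is an illustration only (it is not part of the thesis); the quadratic response, the three-point design, and the error variance are arbitrary assumptions chosen for the demonstration.  It fits a first-degree polynomial to a quadratic response over R = [-1, 1] and evaluates the bias and variance contributions to the IMSE by Gaussian quadrature.

    # Illustrative sketch: the IMSE decomposition (1.5) when a straight line
    # is fitted to a quadratic response over R = [-1, 1].
    import numpy as np

    def eta(x):                                  # "true" response (assumed quadratic)
        return 1.0 + 0.5 * x + 2.0 * x**2

    design = np.array([-1.0, 0.0, 1.0])          # arbitrary 3-point design
    sigma2 = 0.25                                # assumed error variance

    X1 = np.column_stack([np.ones_like(design), design])   # fitted model: a0 + a1*x
    A = np.linalg.inv(X1.T @ X1) @ X1.T                     # alpha-hat = A @ y

    xs, ws = np.polynomial.legendre.leggauss(50)            # quadrature on [-1, 1]
    F = np.column_stack([np.ones_like(xs), xs])             # f(x)' at each x
    Ey_hat = F @ (A @ eta(design))                          # E[eta-hat(x)] (errors have mean 0)
    Var_hat = sigma2 * np.einsum('ij,jk,ik->i', F, A @ A.T, F)  # Var[eta-hat(x)]

    B = np.sum(ws * (Ey_hat - eta(xs))**2)       # integrated squared bias
    V = np.sum(ws * Var_hat)                     # integrated variance
    print("bias contribution =", B, " variance contribution =", V, " IMSE =", B + V)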
Box and Draper (1959) showed that in considering the selection of a design, the bias contribution to integrated mean square error generally far outweighed the variance contribution.  Motivated by this result, Karson et al. (1969) derived a Minimum Bias Estimator for η by the technique of first considering a class of estimators which minimize the bias contribution in (1.5) and then selecting the estimator from that class which has minimum variance.  From (1.5) one can see that E{η̂(x)} is an approximating function for η.  Thus, the minimum bias estimator is the minimum variance estimator of the minimum bias
approximating function.  Linear approximation with respect to the integrated squared error criterion is considered in Section 3 of this thesis.  Some underlying connections are established between least squares estimation based on integration-orthogonal functions and the minimum bias estimator.

An important problem in minimum bias estimation and in many other regression problems is the selection of the approximating function, i.e., selection of the independent variables or terms to be included in the analysis.  For the case in which one is using minimum bias estimation based on integration-orthogonal functions ("independent variables"), a particularly simple term-by-term procedure can be used.  This procedure and its properties are derived and discussed in Section 4.  Techniques for selection of the parameters of the estimation procedure are discussed in Section 5.
2. REVIEW OF LITERATURE
Box and Draper (1959) motivated the study of minimum bias estimation.  They investigated the problem of finding optimal designs for least squares estimation in the following setting.  It is desired to fit a response function η(x), assumed to be a polynomial of degree d₂ over the region of interest R, by a polynomial ŷ(x) of degree d₁ < d₂.  The polynomial ŷ(x) is a least squares estimator.  Since ŷ(x) is of lower degree than η(x), E(ŷ(x)) ≠ η(x), and both variance error (due to sampling error) and bias error must be considered in the selection of an estimation procedure and the construction of designs.  Define the Integrated Mean Square Error of ŷ as

    \mathrm{IMSE}(\hat{y}) = \int_R E\{\hat{y}(x) - \eta(x)\}^2\, dx
                          = \int_R \mathrm{Var}[\hat{y}(x)]\, dx + \int_R \{E[\hat{y}(x)] - \eta(x)\}^2\, dx
                          = V + B.

(The V and B above differ from Box and Draper's V, B by a normalizing scale factor.)  In their paper Box and Draper considered designs for the situation in which η(x) is a second degree polynomial and ŷ(x) is a first degree polynomial.  In a later paper, Box and Draper (1963), they extended their results to fitting a quadratic polynomial, ŷ(x), to a cubic response, η(x).  In both studies they found that the
contribution to IMSE due to bias (B, above) far outweighed the contribution due to variance (V, above), in the sense that (Box and Draper, 1959, p. 622) "the optimal design in typical situations in which both variance and bias occur is very nearly the same as would be obtained if variance were ignored completely and the experiment designed so as to minimize bias alone."  (The emphasis is theirs.)

Karson, et al. (1967) questioned the use of minimum variance as a criterion for choice of a biased estimator.  In the Box and Draper setting the least squares estimators are certainly biased, although one can construct designs to minimize the bias contribution to IMSE.  Karson, et al. (1969), took the approach of considering design and estimation as two phases of the problem of producing estimators with small IMSE.  Recognizing from the work of Box and Draper (1959, 1963) that the bias contribution to IMSE exerts much more influence on the choice of design than does the variance contribution, Karson et al. (1969) derived a class of estimators which minimize the bias contribution to IMSE, and from that class selected the estimator with smallest variance.  The resulting estimator is called the Minimum Bias Estimator (MBE) for η.  Since the MBE minimizes the bias contribution to IMSE for any design, one is free to choose designs which minimize the variance contribution to IMSE.  Karson, et al. (1969) illustrated this technique by demonstrating optimum designs for fitting a linear polynomial (in one variable) to a quadratic over a three point design, to a quadratic over a four point design, and to a cubic model over a four point design.
Karson, et al. (1969), assumed throughout their study the point of view of choosing in advance the terms to be included in the fitted model, ŷ(x).  Specifically, they assumed that all terms of degree less than or equal to d₁ (which is specified in advance) will be included in the fitted model.  No consideration was given to the problem of choosing the terms in the fitted model as a result of tests based on the data.

However, considerable work has been performed on the problem of selection of terms in the model in the general linear model setup (least squares estimation) and with problems involving an ordering of the terms in the model (as in a polynomial, for example).

Draper and Smith (1967, Chapter 6) discuss the following six techniques "in current use" for the selection of the "best fit": (1) comparison of the R² (coefficient of determination) for all possible regressions, (2) backward elimination, (3) forward selection, (4) stepwise regression, (5) two variations on the four methods above, and (6) stagewise regression.  The expected values and mean square errors of the coefficient estimators and function estimators (ŷ) produced by these techniques apparently have not been investigated except in special cases.

Bancroft (1944) produced an expression for the bias of the estimator b* for β₁ under the assumption that the full (true) model is

    Y = \beta_1 x_1 + \beta_2 x_2 + e,

where b* is obtained by the following procedure:

(1)  Obtain the usual least squares estimators (β̂₁, β̂₂) for (β₁, β₂);
(2)  Use an F-test to test the hypothesis H: β₂ = 0 vs. the alternative K: β₂ ≠ 0.

(3)  If the F-test is significant at the predetermined α-level, b* = β̂₁; otherwise fit the model Y = β₁x₁ + e to the data, and b* is the resulting least squares estimator for β₁.

Bancroft did not consider estimators based on one-tail tests, bias of the resulting estimator for β₂, mean square errors or variances, or integrated mean square errors.

Larson and Bancroft (1963b) considered an extension of the above model with essentially the same procedure as above, with βᵢ replaced by the vector βᵢ, i = 1, 2.  In this situation one performs an F-test of the compound hypothesis H: β₂ = 0.  They assumed that the elements of the least squares vector estimator, β̂₂, were uncorrelated.  Thus, β* is (β̂₁′, β̂₂*′)′, where β̂₂* is either β̂₂ or 0, depending on the outcome of the test.  The particular elements of β₁ and β₂ are specified in advance.  They derived expressions for the bias and mean square error of β* under the assumption that the errors, e, are normally and independently distributed with zero mean and variance σ².  The derivation includes a derivation of the bias and mean square error of the elements of β̂₂*.  In a companion paper (Larson and Bancroft, 1963a) they considered successive tests of the hypotheses H_j: βⱼ = 0, j = 1, 2, ..., stopping at the first hypothesis rejected.
Gorman and Toman (1966), Hocking and Leslie (1967), and Schatzoff, et al. (1968) have presented several efficient methods for calculating and comparing subsets of all possible regressions.  These techniques are based on choosing a small subset of the independent variables which produces a great reduction in the residual sum of squares.  Since the actual selection procedure is somewhat imprecise in each case, the properties of the resultant estimators are difficult to investigate.

Toro and Wallace (1968) considered the comparison of the usual least squares estimator, β̂, for β in the general linear model (Graybill, 1961) with β̂*, the least squares estimator subject to the linear constraints Hβ = h.  They suggested as a final estimator whichever one of β̂, β̂* produces the smallest mean square error, i.e., if E{(β̂ − β)(β̂ − β)′} − E{(β̂* − β)(β̂* − β)′} is positive definite, use β̂* as the estimator.  They also derived the UMP test of the hypothesis that the above condition holds.  However, they did not derive the properties of the resulting estimator.

A number of procedures have been proposed for problems in which there is a natural ordering of the coefficients, as in polynomials of a single variable, for example.  Graybill (1961) discussed a procedure for finding the degree of a polynomial which "describes" a set of data, and proposed the use of summation-orthogonal polynomials (which yield uncorrelated coefficients) as a labor saving technique.  He did not discuss the properties of the resultant estimators.  Hoel (1968) proposed a sequential procedure for determining the degree of a polynomial, when the model is written in terms of summation-orthogonal
polynomials.  A feature of Hoel's procedure (and other sequential decision procedures) is that it allows control of the probability of both types of error in hypothesis testing.

Sclove (1968) discussed estimators which have uniformly smaller mean square error than the usual least squares estimator when there are more than three terms in the model.  He also discussed a stepwise procedure for successively testing the hypotheses H_j: βⱼ = 0, j = p, p−1, ..., (until one hypothesis is rejected), for the case in which the terms are ordered, as in polynomial models in one variable.  Although he referred to proofs that the proposed estimators have uniformly smaller mean square error than least squares estimators, he did not evaluate the mean square errors, noting that (Sclove, 1968, p. 599), "Computation of [the mean square error] seems difficult."

Each of the above mentioned procedures for selection of terms to be included in the model uses a point-wise criterion to guide the selection.  The criteria are "point-wise" or "local" in the sense that they are based solely on information obtained at the experimental points; no consideration is given to how well the resulting function estimator fits the true function over the region of interest.  In contrast, the IMSE criterion may be referred to as a "global" criterion.  In addition, the properties of some of the resultant estimators above are difficult to evaluate; some of the estimators are difficult to evaluate in comparison to the least squares estimator.

Michaels (1969) considered the problems of estimation and design for linear versus quadratic estimation (for an assumed quadratic model) for univariate polynomials over the scaled region of interest,
R = [−1, 1], using the integrated mean square error criterion.  Considering the expectation of the IMSE function with respect to the prior density of β₂, the coefficient of x², he found that for normal or uniform prior densities one would optimally choose between least-squares linear estimation and least-squares quadratic (full model) estimation by comparing E[β₂²] (with respect to the prior) and σ²_{β̂₂}, the variance of β̂₂.  Minimum Bias Estimation of the linear model was considered, but for all β₂ values the MBE was dominated by either the linear or the quadratic (full model) least-squares procedure.  Michaels (1969) also investigated the selection of an optimum design-estimation procedure with respect to the IMSE criterion and various prior distributions for β₂.

In Chapter 4 of this paper a procedure is presented which allows one to select terms in the model with respect to the IMSE criterion.  All possible "regressions" are considered; the calculations are straightforward (only the least squares estimator and estimated variance need be calculated), and the bias, mean square error and IMSE of the estimates are derived and computed.  Moreover, the resulting estimator for the response function, η, is, conditional upon the model chosen, the Minimum Bias Estimator for η.

Chapter 3 of this paper presents a generalization of the Minimum Bias Estimation scheme for quite general weighting functions, over a general region of interest, and for any finite set of continuous, linearly independent functions in several variables.
3. MINIMUM BIAS ESTIMATION WITH INTEGRATION-ORTHOGONAL FUNCTIONS
The material in this chapter is mostly expository in nature; the
theory of linear approximation and the theory of linear estimation are
standard material in the fields of numerical analysis and statistics,
respectively.
A summary of some pertinent sections of the two bodies
of theory is presented here because the combination apparently does not
appear elsewhere and because the combined results lead to an elegant
and useful description of minimum bias estimation.
In the first subsection some results are presented for vector
spaces with inner products.
Examples are given for finite dimensional
Euclidean vector spaces, En, and for infinite dimensional vector spaces
of real, continuous, square-integrable functions.
Orthogonal functions
are defined for each of these spaces, and are illustrated with orthogonal polynomials in several variables.
Since the primary application
of this theory involves the use of integration-orthogonal polynomials,
some properties are derived for these functions, including algorithms
for their construction and for conversions from standard polynomial
models to and from orthogonal polynomial models.
In the second subsection, a model is defined for a response
function as a linear combination of orthogonal functions (polynomials,
for example).
Standard least squares estimation theory is applied to find estimators for the response function and for "best" approximations
to the response function.
The equivalence of minimum bias estimation
and least squares estimation of the best approximating function is
established, and some of the properties of minimum bias estimators which follow from this equivalence are given.

3.1.  Least-Squares Linear Approximation with Orthogonal Functions

In this section approximation of functions which are linear in the parameters is considered.  The sense in which the approximation is "least-squares" will be explained in the development of the theory.

Orthogonal functions, particularly orthogonal polynomials in one variable, are basic tools in numerical analysis and in linear approximation theory in particular.  Treatments of least-squares linear approximation of a function of one variable are given in several introductory numerical analysis texts, such as Todd (1962), Hildebrand (1956) and Handscomb (1966).  The present treatment will assume the function being approximated may be a function of several variables.

In introductory texts the region, R, over which the function is to be approximated is usually assumed to be the interval [−1, 1] for univariate approximations and the "unit cube" (the same interval in each variable) for multivariate approximations.  Our assumptions about R are somewhat more general.

3.1.1.  Some Basic Theory from Linear Algebra

Throughout this section general vectors (unspecified vector space) will be denoted by lower case Latin letters, with or without subscripts; most will be from the end of the alphabet (x, y, z).  Scalars will also be lower case Latin letters, with or without subscripts, mostly from the first of the alphabet (a, b, c).  Vectors from finite dimensional vector spaces, such as Eⁿ, will be
~);
underscored (x,
scripts, will denote vectors from infinite dimensional vector spaces
~.~.,
functions), and will not be underscored.
,e
I
The letters i, j, k,
m and n will be used for sub- and super-scripts and for limits of summation.
Definition:
Inner product.
dimensional real vector space.
Let V denote a finite or infinite
An inner product is a function, de-
I
noted ( , ), from V x V to E , the real line, such that for all x, y,
z in V and for all real constants a, the following four conditions
hold:
(i) (x,y) = (y ,x) ;
(ii) (x,x)
~
0, and =
° only if x = ° (the zero vector in V);
(iii) (ax,y) = a(x,y);
(iv) (x+y,z) = (x,z) + (y,z).
Corollary:  Let x₁, ..., x_m, y₁, ..., y_n all be vectors in V, and let a₁, ..., a_m, b₁, ..., b_n be real constants.  Then

    \Bigl(\sum_{i=1}^{m} a_i x_i,\; \sum_{j=1}^{n} b_j y_j\Bigr) = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j (x_i, y_j).    (3.1)

Definition: Norm based on an inner product.  With the setting above, define the norm of the vector x, denoted ||x||, by:

    ||x|| = \sqrt{(x, x)}, \quad \text{or} \quad ||x||^2 = (x, x).    (3.2)

It should be noted that there are norms which are not based on inner products.
Definition: Orthogonal, orthonormal vectors.  Two vectors x, y in V are said to be orthogonal if (x, y) = 0, and are said to be orthonormal if they are orthogonal and (x, x) = ||x||² = (y, y) = ||y||² = 1.  A set of vectors is said to be an orthogonal set if all pairs of vectors in the set are orthogonal; it is said to be an orthonormal set of vectors if it is an orthogonal set and the norm of each vector in the set is one.

Consider two rather different vector spaces and inner products.  Since both will be used in the following sections, the (x, y) notation for these special inner products will be altered slightly.

Let V = Eⁿ = n-dimensional Euclidean space over the reals.  Let H be a real, symmetric, n × n positive definite matrix.  For all x, y in V, define and denote the inner product:

    S_H(x, y) = x'Hy = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i h_{ij} y_j.    (3.3)

The S refers to summation and the H denotes the matrix of the inner product.  A special case is for H = I, the n × n identity matrix.  In this case,

    S_I(x, y) = x'y = \sum_{i=1}^{n} x_i y_i.

This is the "usual" inner product (also referred to as the "dot product") over Eⁿ.  Returning to the general case, x and y in Eⁿ are orthogonal with respect to the inner product S_H if

    S_H(x, y) = x'Hy = 0.
The norm of a vector x in Eⁿ with respect to the inner product S_H is

    ||x||_{S_H} = \sqrt{S_H(x, x)} = \sqrt{x'Hx}.

Notice the special notation for the norm with respect to S_H.

For the second example, let R be some non-empty subset of Eⁿ; R does not denote the real line except as a special case.  Let x denote a vector in Eⁿ, and let W(x) denote either a finite measure over the Borel subsets (B) of R or a distribution function of finite variation defined over R.  (See Halmos (1950) and Loeve (1955) for discussions of finite measures, distribution functions of finite variation, and integration with respect to such measures and distribution functions.)  Let V be the space of continuous functions which are square-integrable with respect to W over R, i.e., f is in V if f is continuous and

    0 \le \int_R f^2(x)\, dW(x) < +\infty.    (3.4)

Denote V by V = L₂(R, B, W) = L₂.  Functions which are equal almost everywhere with respect to W are considered to be identical, i.e., if f(x) = g(x) except on a set B such that W(B) = 0, then f and g are considered to be the same function.  Under these conditions, the function I_W defined by:

    I_W(f, g) = \int_R f(x)\, g(x)\, dW(x)    (3.5)

for all f, g in L₂ is an inner product over L₂ (Loeve, 1955).  Here, the I denotes integration (to distinguish from the summation inner product) and W denotes the measure.
There are two special cases of this inner product which are very useful.  First, let w(x) be the Radon-Nikodym derivative of W with respect to Lebesgue measure; w(x) is a finite, non-negative, integrable weight function over R.  Then (3.5) becomes

    I_W(f, g) = \int_R f(x)\, g(x)\, w(x)\, dx,    (3.6)

which is a multidimensional integral over the set R.  This is the most common case and the one which will be applied most often.

The second special case of interest arises when W is a discrete measure which assigns the weight wᵢ to the point xᵢ, i = 1, 2, ..., m.  In such a case,

    I_W(f, g) = \sum_{i=1}^{m} f(x_i)\, g(x_i)\, w_i.

This is, of course, just a special case of (3.3).  In the general case of V = L₂(R, B, W), the norm will be denoted:

    ||f||_W = \sqrt{I_W(f, f)}.
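As a concrete illustration (not part of the thesis), the short sketch below implements the summation inner product S_H of (3.3) and the integration inner product I_W of (3.6) for the univariate case R = [-1, 1] with w(x) = 1, and verifies numerically that the first few Legendre polynomials are I_W-orthogonal.  The quadrature rule and the particular polynomials are assumptions made only for the example.

    # Sketch: the two inner products of Section 3.1.1, assuming R = [-1, 1]
    # and the uniform weight w(x) = 1; quadrature by Gauss-Legendre nodes.
    import numpy as np

    def S_H(x, y, H):
        """Summation inner product (3.3): x'Hy."""
        return float(x @ H @ y)

    def I_W(f, g, nodes=None, weights=None):
        """Integration inner product (3.6) on [-1, 1] with w(x) = 1."""
        if nodes is None:
            nodes, weights = np.polynomial.legendre.leggauss(64)
        return float(np.sum(weights * f(nodes) * g(nodes)))

    # Check: Legendre polynomials P0, P1, P2 are orthogonal under I_W.
    P = [np.polynomial.legendre.Legendre.basis(k) for k in range(3)]
    gram = np.array([[I_W(P[i], P[j]) for j in range(3)] for i in range(3)])
    print(np.round(gram, 10))   # diagonal matrix: off-diagonal entries vanish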
3.1.2.  The Approximation Theorem

Theorem 3.1.  Let V be a real vector space with inner product ( , ) and associated norm || ||.  Let {x₁, ..., xₙ} be a fixed, finite set of orthonormal vectors from V.  Then for any y in V, the minimum of

    \Bigl\| y - \sum_{i=1}^{n} b_i x_i \Bigr\|    (3.7)
is attained if and only if

    b_i = a_i = (y, x_i), \quad i = 1, 2, \ldots, n.    (3.8)

The number aᵢ is called the Fourier coefficient of xᵢ and y with respect to the inner product ( , ).

Proof.  The quantity (3.7) is minimized if and only if the following is minimized:

    \Bigl(y - \sum b_i x_i,\; y - \sum b_i x_i\Bigr) = (y, y) - 2\sum a_i b_i + \sum b_i^2    (3.9)
                                                    = ||y||^2 - \sum a_i^2 + \sum (a_i - b_i)^2.    (3.9a)

All summations are over the range i = 1, 2, ..., n.  In (3.9a) only the term Σ(aᵢ − bᵢ)² depends upon the bᵢ.  Being a sum of real squares, it takes on its minimum value of zero if and only if all bᵢ = aᵢ, i = 1, 2, ..., n.

Most of the theory of linear approximation and of linear estimation can be derived from this theorem.  The result is now extended to include the case of orthogonal (not necessarily orthonormal) vectors.  These results are given in many linear algebra and numerical analysis texts.

The approximation

    \tilde{y} = \sum_{i=1}^{n} a_i x_i
is called a "least-squares" approximation because it minimizes the square of the norm (3.9).  In most applications, since the xᵢ are orthogonal, the square of the norm is either a "sum of squares" or the integral of the squared error.  Throughout the remainder of this thesis, the least squares approximator of a vector (or function) will be denoted by the use of a tilde (~): ỹ is the least squares approximator of y.

Corollary 1.  The Approximation Theorem for Orthogonal Vectors.  Let {z₁, z₂, ..., zₙ} be a finite set of non-zero orthogonal vectors from the vector space V, with inner product ( , ) and associated norm || ||.  Then for any y in V, the minimum of

    \Bigl\| y - \sum_{j=1}^{n} c_j z_j \Bigr\|    (3.10)

is attained if and only if

    c_j = \frac{(y, z_j)}{(z_j, z_j)} = \frac{(y, z_j)}{||z_j||^2}, \quad j = 1, 2, \ldots, n.    (3.11)

Proof.  The set of vectors xᵢ, i = 1, 2, ..., n, defined by

    x_i = \frac{z_i}{||z_i||}, \quad i = 1, 2, \ldots, n,

is an orthonormal set.  By Theorem 3.1 the minimum of ||y − Σaᵢxᵢ|| is attained when aᵢ is the Fourier coefficient of xᵢ and y; then aᵢxᵢ = cᵢzᵢ and the minimum of (3.10) is attained.  But
    c_i = \frac{a_i}{||z_i||} = \frac{(y, x_i)}{||z_i||} = \frac{(y, z_i)}{||z_i||^2},

as was to be shown.
The following corollary establishes a result which will be useful in consideration of minimum bias estimation.

Corollary 2.  The Truncation Approximation Theorem.  Let Q = {xⱼ, j = 1, 2, ..., m} be a fixed orthogonal set of nonzero vectors from V and define

    y = \sum_{j=1}^{m} c_j x_j,    (3.12)

where the cⱼ are also fixed.  Of course, y ∈ V.  Now approximate y by a subset of the vectors in Q.  Let n be such that 0 < n < m.  The quantity

    \Bigl\| y - \sum_{i=1}^{n} b_i x_i \Bigr\| = \Bigl\| \sum_{i=1}^{m} c_i x_i - \sum_{i=1}^{n} b_i x_i \Bigr\|    (3.13)

is minimized if and only if

    b_j = c_j, \quad j = 1, 2, \ldots, n;\; n < m.    (3.14)

Proof:  By Corollary 1, the bⱼ which minimize (3.13) satisfy, for j = 1, 2, ..., n:
    b_j ||x_j||^2 = (y, x_j) = \Bigl(\sum_{i=1}^{m} c_i x_i,\; x_j\Bigr) = \sum_{i=1}^{m} c_i (x_i, x_j) = c_j ||x_j||^2,

which implies bⱼ = cⱼ, j = 1, 2, ..., n.

Example.  As an example let V = L₂(R, B, W) as described above.  Let Q = {fᵢ, i = 1, 2, ..., m} be an orthogonal set of nonzero functions from V, and let y = f(x) be in V.  From the orthogonality,

    I_W(f_i, f_j) = \int_R f_i(x) f_j(x)\, dW(x) = \begin{cases} 0 & \text{if } i \ne j \\ ||f_i||^2 > 0 & \text{if } i = j. \end{cases}    (3.15)

Also

    I_W(y, f_j) = I_W(f, f_j) = \int_R f(x) f_j(x)\, dW(x).
Suppose one decides to approximate y = f(x) with a linear combination of a subset Q* of the vectors (functions) in Q.  Consider the following notation, which will be useful in a later example.  Let K = {1, 2, ..., m}; K is the set of subscripts of functions in Q.  Let K* be the subset of K which contains the subscripts of the functions to be used in approximating y, i.e., the subscripts of the functions in Q*.  Then the best linear approximation of y with respect to the set Q*, i.e., the set of coefficients bᵢ which minimizes

    \Bigl\| y - \sum_{i \in K^*} b_i f_i \Bigr\|

or, equivalently, which minimizes

    \int_R \Bigl[ f(x) - \sum_{i \in K^*} b_i f_i(x) \Bigr]^2 dW(x),    (3.16)

is defined by

    b_i = \frac{I_W(f, f_i)}{||f_i||^2} = \frac{1}{||f_i||^2} \int_R f(x) f_i(x)\, dW(x)

for each i ∈ K*.  Since these bᵢ values minimize the integral of the square of the "error" (3.16), the approximation is called a least squares approximation.
Now suppose further that y has the following structure:

    y = f(x) = \sum_{i \in K} a_i f_i(x).    (3.17)

That is, y is a linear combination of the functions in Q.  By Corollary 2, the best ("least-squares") approximation of y with respect to the set Q* is given by

    \tilde{y} = \sum_{i \in K^*} a_i f_i(x).    (3.18)

Consider the special case where x is a real variable (x ∈ E¹) and R = [−1, 1].  Also suppose the weight function is w(x) = 1, so that the inner product of functions f, g over V is given by

    I_W(f, g) = \int_{-1}^{1} f(x) g(x)\, dx.    (3.19)

If the set of orthogonal functions is the set of polynomials of degree 0, 1, 2, ..., m, which are orthogonal with respect to the inner product (3.19), the functions we obtain are the Legendre Polynomials (Abramowitz and Stegun, 1964, Chapter 25).  The same region, R, with the
weighting function dW(x) = (1 − x²)^{-1/2} dx, generates the family of Tchebychev Polynomials.  Note also that if R = [−π, π] and dW(x) = dx, the family of functions {sin nx, cos nx; n = 0, 1, 2, ...} forms a set of orthogonal functions.
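A brief numerical illustration of (3.18)-(3.19) follows; it is not from the thesis, and the target function and degrees are arbitrary choices.  The Fourier-Legendre coefficients of a function on [-1, 1] are computed by quadrature, and the truncated expansion is compared with a direct least-squares fit of the same degree; by the Truncation Approximation Theorem the two agree.

    # Sketch: Fourier-Legendre coefficients via (3.11) and truncation per
    # Corollary 2, assuming R = [-1, 1], w(x) = 1, and f(x) = exp(x).
    import numpy as np
    from numpy.polynomial import legendre as L

    nodes, wts = L.leggauss(80)                      # quadrature for I_W on [-1, 1]
    f = np.exp

    def fourier_coef(j):
        """c_j = I_W(f, P_j) / ||P_j||^2, with ||P_j||^2 = 2/(2j+1)."""
        Pj = L.Legendre.basis(j)
        return np.sum(wts * f(nodes) * Pj(nodes)) / (2.0 / (2 * j + 1))

    m = 6                                            # full expansion degree
    c = np.array([fourier_coef(j) for j in range(m + 1)])

    # Truncate to degree n < m and compare with a direct weighted least
    # squares fit of degree n on the quadrature grid: the two coincide.
    n = 3
    V = L.legvander(nodes, n)                        # columns P_0 ... P_n at nodes
    Wh = np.sqrt(wts)
    b, *_ = np.linalg.lstsq(Wh[:, None] * V, Wh * f(nodes), rcond=None)
    print(np.allclose(b, c[: n + 1]))                # True: truncation = best fit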
3.1.3.  Some Results for Integration-Orthogonal Polynomials

Although the approximation theorem can be extended to cover non-orthogonal sets of vectors, approximation with orthogonal sets of vectors is easier and, in many cases, less subject to round-off error in computational processes (Davis, 1962).  Before proceeding to the presentation of some useful results for integration-orthogonal polynomials, two techniques are presented for producing orthogonal (or orthonormal) vectors from sets of linearly independent vectors.  While the two methods are equivalent, one, the Gram-Schmidt orthonormalization process, is standard material in linear algebra texts, and the other, based on the square root decomposition of positive definite symmetric matrices, is a more useful algorithm in the present study.

Theorem 3.2.  Gram-Schmidt Orthogonalization.  Let V be a real vector space with inner product ( , ) and associated norm || ||.  If Z = {z₁, ..., z_m} is any linearly independent set in V, i.e.,

    \sum_{i=1}^{m} a_i z_i = 0 \in V    (3.20)

is impossible for any set of real scalars {aᵢ} unless all aᵢ = 0, then there exists an orthonormal set X = {x₁, x₂, ..., x_m} such that

    x_k = \sum_{i=1}^{k} a_{ik} z_i.    (3.21)
The proof consists of the Gram-Schmidt process, which is reproduced here for reference purposes.  The procedure is recursive.  Let x₁ = z₁/||z₁||.  Clearly ||x₁|| = 1.  (Notice that ||z₁|| = 0 is impossible, for ||z₁|| = 0 implies z₁ = 0, and the zero vector cannot belong to any linearly independent set.)  Suppose that {x₁, ..., x_r} has been found so that it is an orthonormal set satisfying (3.21).  Let

    u_{r+1} = z_{r+1} - \sum_{i=1}^{r} x_i (x_i, z_{r+1}).    (3.22)

Clearly for j = 1, 2, ..., r,

    (x_j, u_{r+1}) = (x_j, z_{r+1}) - \sum_{i=1}^{r} (x_j, x_i)(x_i, z_{r+1}) = 0.

Set

    x_{r+1} = \frac{u_{r+1}}{||u_{r+1}||};

the set {x₁, ..., x_{r+1}} is orthonormal and x_{r+1} satisfies (3.21), as was to be shown.
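The recursion (3.22)-(3.23) translates directly into code.  The sketch below is an illustration only (not part of the thesis): it performs Gram-Schmidt orthonormalization of the monomials 1, z, z² with respect to the integration inner product I_W on [-1, 1] with w(z) = 1, representing each function by its values on a quadrature grid; the result is the normalized Legendre family.

    # Sketch: Gram-Schmidt (Theorem 3.2) under the integration inner product
    # I_W on R = [-1, 1] with w(z) = 1.
    import numpy as np

    nodes, wts = np.polynomial.legendre.leggauss(64)

    def inner(f, g):                       # I_W(f, g) on the grid
        return np.sum(wts * f * g)

    def gram_schmidt(Z):
        """Z: list of function-value arrays (linearly independent).
           Returns the orthonormal set x_1, ..., x_m of (3.21)-(3.22)."""
        X = []
        for z in Z:
            u = z - sum(x * inner(x, z) for x in X)      # (3.22)
            X.append(u / np.sqrt(inner(u, u)))           # normalization step
        return X

    monomials = [nodes**k for k in range(3)]             # 1, z, z^2
    x0, x1, x2 = gram_schmidt(monomials)
    P2 = (3 * nodes**2 - 1) / 2                          # Legendre P2(z)
    print(np.allclose(x2, P2 / np.sqrt(inner(P2, P2))))  # x2 is P2, normalized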
The primary application of the Gram-Schmidt process in this study will involve transformations of sets of "standard" polynomials to orthogonal polynomials.  The actual calculations will be performed in a slightly different format.  In order to motivate the calculation procedure which follows, consider a set of standard univariate polynomials,

    F = \{f_0, f_1, \ldots, f_m\},    (3.23)

where f_j(z) = z^j, j = 0, 1, ..., m.
Consider the vector space V = L₂(R, B, W) where R = [−1, 1] and dW(z) = dz.  As noted earlier, the application of the Gram-Schmidt procedure to the set F (3.23) produces the Legendre Polynomials.  Denote the set of Legendre polynomials through degree m by P = {p₀, p₁, ..., p_m}.  From (3.21),

    p_k(z) = \sum_{i=0}^{k} a_{ik} f_i(z), \quad k = 0, 1, \ldots, m.    (3.24)

Form 1 × (m+1) row vectors (with values in E^{m+1}) of the functions:

    f = (f_0, f_1, \ldots, f_m), \qquad p = (p_0, p_1, \ldots, p_m).    (3.25)

Then equation (3.24) can be written in matrix notation using the matrix A = [a_{ij}]:

    p = fA; \qquad p(x) = f(x)A.    (3.26)

Note that the (m+1) × (m+1) matrix A is upper triangular; a_{ij} = 0 for j < i.  As noted above, the A matrix is non-singular; letting B = A⁻¹,

    f = pA^{-1} = pB; \qquad f(x) = p(x)B.    (3.27)

The B matrix is also upper triangular.

Now consider a linear combination of the f_j, a polynomial:

    g(z) = \sum_{j=0}^{m} c_j f_j(z) = \sum_{j=0}^{m} c_j z^j    (3.28a)
    g(z) = f(z)c,    (3.28b)

where the (m+1) × 1 vector c = (c₀, c₁, ..., c_m)′.  Evaluate g in terms of the orthogonal functions:

    g(z) = f(z)c = p(z)Bc = \sum_{j=0}^{m} d_j p_j(z),    (3.29)

where

    d = Bc,    (3.30)

which implies

    c = B^{-1}d = Ad.    (3.31)

Using the relations (3.30) and (3.31) a polynomial function g(z) is easily converted from a standard polynomial representation (3.28a) to an orthogonal polynomial representation (3.29) and vice versa.
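For the Legendre case the conversions (3.30)-(3.31) can be carried out with standard routines.  The sketch below is illustrative only (the degree and the coefficients of the cubic are arbitrary choices); it uses numpy's change-of-basis helpers in place of explicitly forming A and B, and checks that the two representations describe the same function.

    # Sketch: converting between the standard representation (3.28a) and the
    # Legendre (integration-orthogonal) representation (3.29) on [-1, 1].
    import numpy as np
    from numpy.polynomial import legendre as L
    from numpy.polynomial import polynomial as Pnm

    c = np.array([1.0, -2.0, 0.5, 3.0])      # g(z) = 1 - 2z + 0.5 z^2 + 3 z^3
    d = L.poly2leg(c)                         # d = Bc : Legendre coefficients
    c_back = L.leg2poly(d)                    # c = Ad : back to powers of z

    z = np.linspace(-1.0, 1.0, 9)
    print(np.allclose(Pnm.polyval(z, c), L.legval(z, d)))   # same function g
    print(np.allclose(c, c_back))                            # round trip exact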
It should be evident that the results above hold for conversions from any linearly independent set to an orthogonal set.  In particular, the functions may be polynomials in any number of variables.  The following theorem demonstrates a computational algorithm for finding the A and B matrices in the general case.

Theorem 3.3.  Assume the general setting of Theorem 3.2; let A be the m × m nonsingular upper triangular matrix defined by (3.21) and let B = A⁻¹.  Define the m × m symmetric matrix M with typical element m_{ij}, where

    m_{ij} = (z_i, z_j), \quad i = 1, 2, \ldots, m;\; j = 1, 2, \ldots, m.    (3.32)
Then M = B′B = (A⁻¹)′(A⁻¹).  Therefore M is positive definite.

Proof.  As before, form the 1 × m row vectors

    x = (x_1, \ldots, x_m), \qquad z = (z_1, \ldots, z_m).

Of course, each element of each vector is a member of the vector space V.  From (3.21),

    x = zA,

or, equivalently,

    z = xA^{-1} = xB.    (3.33)

That is, since B is also triangular,

    z_j = \sum_{k=1}^{j} b_{kj} x_k, \quad j = 1, 2, \ldots, m.    (3.34)

Consider the inner product (the (i,j)-element of M),

    m_{ij} = (z_i, z_j) = \Bigl(\sum_{k} b_{ki} x_k,\; \sum_{l} b_{lj} x_l\Bigr) = \sum_{k} \sum_{l} b_{ki} b_{lj} (x_k, x_l).    (3.35)

But {x_k, k = 1, 2, ..., m} is orthonormal:

    (x_k, x_l) = \begin{cases} 1 & \text{if } k = l \\ 0 & \text{if } k \ne l. \end{cases}

Therefore (3.35) becomes

    m_{ij} = (z_i, z_j) = \sum_{k=1}^{m} b_{ki} b_{kj}.    (3.36)
But the summation in (3.36) defines the (i,j)-element of the matrix B′B, as was to be shown.

The theorem provides a computational method for finding the A and B matrices by straightforward matrix computations.  Given the positive definite symmetric M, one can apply the "square root algorithm" for matrices (see Faddeeva, 1959, for example) to compute an upper triangular matrix B such that B′B = M.  One can readily compute A = B⁻¹, since triangular matrices are easily inverted.
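In modern numerical libraries the "square root algorithm" corresponds to the Cholesky factorization.  The sketch below is an illustration under the same Legendre setting as before (not the thesis' own computations): it forms the moment matrix M of the monomials 1, z, z² on [-1, 1], factors it as M = B′B with B upper triangular, recovers A = B⁻¹, and checks that the columns of A define orthonormal polynomials as in (3.21).

    # Sketch: Theorem 3.3 in matrix form.  M[i, j] = I_W(z^i, z^j) on [-1, 1]
    # with w(z) = 1; B'B = M with B upper triangular; A = B^{-1}.
    import numpy as np

    m = 3                                    # monomials 1, z, z^2
    def moment(i, j):                        # integral of z^(i+j) over [-1, 1]
        p = i + j
        return 0.0 if p % 2 else 2.0 / (p + 1)

    M = np.array([[moment(i, j) for j in range(m)] for i in range(m)])

    B = np.linalg.cholesky(M).T              # upper triangular, B'B = M
    A = np.linalg.inv(B)                     # also upper triangular

    print(np.allclose(B.T @ B, M))           # the square-root decomposition
    # Columns of A give orthonormal polynomials p_k = sum_i a_{ik} z^i; their
    # Gram matrix under I_W is therefore the identity:
    print(np.allclose(A.T @ M @ A, np.eye(m)))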
The following result will be useful in the next subsection.

Theorem 3.4.  Assume the general setting of Theorem 3.2.  Suppose it is desired to approximate

    y = \sum_{i=1}^{m} \gamma_i z_i

by

    \tilde{y} = \sum_{i=1}^{n} c_i z_i,    (3.37)

where 0 < n < m.  Partition the coefficient vector γ = (γ₁, ..., γ_m)′ and the row vector x = (x₁, ..., x_m) as

    \gamma = \begin{pmatrix} \gamma_1 \\ \gamma_2 \end{pmatrix}, \qquad x = (x_1 : x_2),

where γ₁ is n × 1 and x₁ is 1 × n.  Also partition the A and B = A⁻¹ matrices:

    A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad B = \begin{pmatrix} B_{11} & B_{12} \\ 0 & B_{22} \end{pmatrix},

where A₁₁ and B₁₁ are n × n (the lower-left blocks are zero because A and B are upper triangular).
If δ = (δ₁′, δ₂′)′ is the vector of coefficients of y with respect to the x-vector, then

    \delta = B\gamma.    (3.38)

Then the vector of coefficients c₁ which gives the best approximation, ỹ, is given by

    c_1 = \gamma_1 + A_{11} B_{12} \gamma_2.

Proof.  Since

    y = \sum_{i=1}^{m} \delta_i x_i = x\delta,

where δ = (δ₁, ..., δ_m)′ is given by (3.38), by the Approximation Theorem,

    \tilde{y} = \sum_{i=1}^{n} \delta_i x_i = x_1 \delta_1.

Then by (3.31), the coefficients of ỹ with respect to z₁, ..., z_n are

    c_1 = A_{11}\delta_1 = A_{11}(B_{11}\gamma_1 + B_{12}\gamma_2) = \gamma_1 + A_{11}B_{12}\gamma_2,

since A₁₁B₁₁ = I.
as it might appear
0
.
The approxima.tionit.produces depends.upon the
order in .which .the...z",v.ectors are.arranged ..iu.. the.. orthogonalization process; . t.hose.z~vectors(functi.ons) . which. are to receive zero . coefficients
must be the last.(m.".n} .v:ector.s.inthe Gram...,Schmidt.procedure for the
result.above to holdo
The . 'point. is. that. the.. ordar,. in.. which.. the . .z"'ve.c..tors are entered
into. the. orth.og.oua.lizat.ion.proc.edure.det:ermines .the .. resulting orthogonal .vectors; .. diffe.rent.orderings--pro.duc.e~-different-se.ts ... of orthogonal
vectors •.. Theuorthogonal vect.o.rs .. are.linear. combinations. of the originaLvectors;-different.. ord:er.iugs~.pr.odu:c:e....diff:erent-sets
of linear
combina"t.ions. (different A matrices).
In.many.probl:ems., ..especially.those- involving. sets·of polynomials
in one .variable,.. there.is .. a.natu.r.al.. o.rdering •.. However,. for polynomials
in sev.eraL.va:t:iables. the.ordering. is.. s.omewhat . arbitrary.
consider~.-for...twoindep.endentvariables,-the.following
For example,
ordered set of
functions:
Another "reasonable" ordering is:
2
2
(1, xI,x2'~:X2' Xl' Xl)·
The difference isreal., but.. uot.a.ltogether.easy to. grasp.
If y is a
linear combination of the functionsabove.andone.decides to use an
approximation
y which
is a linear ·combination.. of ..the functions
then the approximations produced using the two different orderings above would be identical.  However, Theorem 4 applies only to the second ordering.

Now consider how different approximations might be produced.  Suppose the orthogonal functions produced from the first ordering are:

    (p_1,\; p_2,\; p_3,\; p_4,\; p_5,\; p_6),

and the orthogonal functions produced from the second ordering are denoted:

    (q_1,\; q_2,\; q_3,\; q_4,\; q_5,\; q_6).

The results will be identical, pᵢ = qᵢ, for i = 1, 2, and 3.  For the other functions we have the "correspondences" (which are not exact, of course):

    p_4 \leftrightarrow x_1^2 \leftrightarrow q_5, \qquad p_5 \leftrightarrow x_2^2 \leftrightarrow q_6, \qquad p_6 \leftrightarrow x_1 x_2 \leftrightarrow q_4.

Now consider approximation of y by ỹ₁, a linear combination of p₁, p₂, p₃, and p₆; and by ỹ₂, a linear combination of q₁, q₂, q₃, and q₄; the approximations would, in general, be different:

    \tilde{y}_1 \ne \tilde{y}_2.

The approximations are different even though both are linear combinations of terms which "correspond" to the same subset of original functions.
However, once the ordering has been chosen and the orthogonal vectors (functions) have been produced, the approximation theorem is more general than Theorem 4, and order is unimportant in the following sense: if

    y = \sum_{i=1}^{m} d_i x_i,

then for any non-empty subset X* ⊂ X,

    \tilde{y} = \sum_{x_i \in X^*} d_i x_i

is the "best" approximation of y over the subset X*, regardless of which subset was chosen.
3.2.  Minimum Bias Estimation: Best Linear Unbiased Estimation of the Best Approximating Function

3.2.1.  Definitions and Notation

Some consideration is now given to the estimation of approximating functions.  The following assumptions, definitions and notation will be useful through the remainder of the study.

It is assumed that η = η(x) is a response function which belongs to the function vector space L₂(R, B, W) described in Section 3.1.1; i.e., η is a continuous function of the r variables x = (x₁, x₂, ..., x_r)′, and is square integrable with respect to the finite measure (or distribution function) W over R ⊂ E^r.  It is assumed that η has the structure

    \eta = \sum_{j=1}^{m} \alpha_j f_j,    (3.50)
where the αⱼ are unknown parameters and the fⱼ are known functions (usually polynomials) of x.  It is also assumed that each fⱼ ∈ L₂(R, B, W).  Arrange the functions fⱼ in a 1 × m row vector

    f = (f_1, f_2, \ldots, f_m).

The Gram-Schmidt process is applied to produce the I_W-orthogonal (not necessarily orthonormal) vector of functions

    p = (p_1, p_2, \ldots, p_m),

with the conversion matrices A and B = A⁻¹, such that

    p = fA, \qquad f = pB,    (3.51)

as explained in the previous section.  Thus, on setting

    \alpha = (\alpha_1, \ldots, \alpha_m)', \qquad \beta = B\alpha,    (3.52)

from (3.50)-(3.52):

    \eta = f\alpha = p\beta = \sum_{j=1}^{m} \beta_j p_j(x).    (3.53)

The function η is not directly observable.  Experiments are performed at N > m points xᵢ, i = 1, 2, ..., N.  Compute the N × m "design matrix," traditionally denoted by X:
33
I.
I
I
I
I
I
I
I
In terms of the orthogonal.. func.tions...,thadesign matrix is
Ie
It is assumed that X and P are of rank m, which implies that X'X and
I
I
I
I
I
I
I
I·
I
(3.54)
p ::;: XA ::;:
(3.55)
Pi'J :;:pj(x.);
1< j< m; 1 < i < N.
--:L
-
p'p are.nonsingular.
At the
point.~theexp.erimenterobserves
Y ::;:
i
These observations'
n (x
.) + e:.,
--:L
~
i ::;: 1, 2, ••• , N.
are.,arranged,into.-an.NX~vector -with"
the following
structure
(3.56)
where the vector e: ::;: (e: , e: , •.. , e:N)' is a continuous N-variate ran1
2
2
2
dam vector with:E.(e:) =, 0". E(~ E') ::;: O'e:r. - '. One may assume O'e: is known or
unknown.- When e:is assumed to have
tion,
them~tivariate normal
thisassumptionwil~.be-explicitlystated.
distribu-
I
Under these assumptions, the application of classical least squares estimation theory (Graybill, 1961) yields the Best Linear Unbiased Estimators (BLUE) for α and β:

    \hat{\alpha} = (X'X)^{-1} X'Y,
    \hat{\beta} = (P'P)^{-1} P'Y = (A'X'XA)^{-1}(A'X')Y = A^{-1}(X'X)^{-1}X'Y = B\hat{\alpha}.    (3.57)

Also:

    E(\hat{\alpha}) = \alpha; \quad E[(\hat{\alpha} - \alpha)(\hat{\alpha} - \alpha)'] = \sigma_\varepsilon^2 (X'X)^{-1};
    E(\hat{\beta}) = \beta; \quad E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)'] = \sigma_\varepsilon^2 (P'P)^{-1};    (3.58)

and, moreover, α̂ᵢ is the BLUE of αᵢ and β̂ᵢ is the BLUE of βᵢ, i = 1, 2, ..., m.  (If one assumes ε has the multivariate normal distribution, the estimators above are all Minimum Variance Unbiased Estimators.)  Also, the BLUE for η at the point x is

    \hat{\eta}(x) = f(x)\hat{\alpha} = p(x)\hat{\beta},    (3.59)

where f(x) = (f₁(x), ..., f_m(x)) and p(x) = (p₁(x), ..., p_m(x)).
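A compact numerical check of (3.57) follows; it is illustrative only, and the design, the true coefficients, and the error draw are arbitrary assumptions.  With the f-basis taken to be powers of x on [-1, 1] and the p-basis the Legendre polynomials, the two least squares fits satisfy β̂ = Bα̂ (here B is realized by numpy's power-to-Legendre coefficient conversion) and give identical fitted responses.

    # Sketch: least squares in the f-basis and in the p-basis, per (3.57).
    import numpy as np
    from numpy.polynomial import legendre as L

    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 8)                  # N = 8 design points
    m = 4                                          # f = (1, x, x^2, x^3)

    X = np.vander(x, m, increasing=True)           # X_ij = f_j(x_i)
    P = L.legvander(x, m - 1)                      # P_ij = p_j(x_i)  (P = XA)

    alpha_true = np.array([1.0, -1.0, 2.0, 0.5])
    Y = X @ alpha_true + 0.1 * rng.standard_normal(len(x))

    alpha_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # (X'X)^{-1} X'Y
    beta_hat, *_ = np.linalg.lstsq(P, Y, rcond=None)    # (P'P)^{-1} P'Y

    print(np.allclose(beta_hat, L.poly2leg(alpha_hat))) # beta-hat = B alpha-hat
    print(np.allclose(X @ alpha_hat, P @ beta_hat))     # identical fitted values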
Now consider the effect of an inadequate model.  Suppose that only n < m of the fⱼ functions are used to "fit" the model (3.56).  For
convenience, assume the functions used are f₁, f₂, ..., f_n, and partition the design matrix and coefficient vectors so the true model is

    Y = X_1 \alpha_1 + X_2 \alpha_2 + \varepsilon,    (3.60)

where X₁ is N × n, α₁ is n × 1, X₂ is N × (m − n), and α₂ is (m − n) × 1.  If only the first n functions (those represented in X₁) are used in the estimation process, the estimator α̂₁ for α₁ is:

    \hat{\alpha}_1 = (X_1'X_1)^{-1} X_1'Y,    (3.61)

and

    E(\hat{\alpha}_1) = \alpha_1 + (X_1'X_1)^{-1} X_1'X_2\, \alpha_2.    (3.62)

That is, α̂₁ is biased by an amount depending on the design matrix and α₂.  Denote by η̂_L the estimator for η based on α̂₁, viz.,

    \hat{\eta}_L(x) = f_1(x)\hat{\alpha}_1.    (3.63)

Its expected value is:

    E[\hat{\eta}_L(x)] = f_1(x)\,[\alpha_1 + (X_1'X_1)^{-1} X_1'X_2\, \alpha_2].    (3.64)

Unless α₂ = 0, η̂_L is biased.
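The bias in (3.62) is governed by the alias matrix (X₁′X₁)⁻¹X₁′X₂.  As an illustration (not from the thesis; the design and the partition below are arbitrary choices), the sketch computes this matrix for a three-point design when a straight line is fitted but the true model also contains a quadratic term.

    # Sketch: the alias matrix of (3.62) -- f1 = 1, f2 = x fitted; f3 = x^2
    # omitted; design {-1, 0, 1}.
    import numpy as np

    x = np.array([-1.0, 0.0, 1.0])
    X1 = np.column_stack([np.ones_like(x), x])       # columns for f1, f2
    X2 = (x**2).reshape(-1, 1)                       # column for the omitted f3

    alias = np.linalg.solve(X1.T @ X1, X1.T @ X2)    # (X1'X1)^{-1} X1'X2
    print(alias)    # E[alpha1-hat] = alpha1 + alias @ alpha2, per (3.62)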
In the presence of bias, the variance criterion would seem to be inappropriate as a criterion for comparing estimators of η.  The following criterion is more appropriate.  Let
η* be any estimator for η; the Integrated Mean Square Error of η* is:

    \mathrm{IMSE}(\eta^*) = \int_R E\{[\eta(x) - \eta^*(x)]^2\}\, dW(x)    (3.65)
                         = \int_R E\{[\eta(x) - E(\eta^*(x)) + E(\eta^*(x)) - \eta^*(x)]^2\}\, dW(x)
                         = \int_R E\{[\eta(x) - E(\eta^*(x))]^2\}\, dW(x) + \int_R E\{[\eta^*(x) - E(\eta^*(x))]^2\}\, dW(x).    (3.66)

3.2.2.  Best Linear Unbiased Estimation of the Best Approximating Function

The function actually being estimated by η̂_L above (3.63), i.e., E(η̂_L), is an approximating function for η.  Since E(η̂_L) depends upon the design matrix, it may or may not be a "good" approximating function.  Consideration is now given to estimation of the best approximating function in a somewhat more general setting.

Suppose one chooses to approximate η as a linear combination of a subset of the functions p = (p₁, p₂, ..., p_m).  Let J* be the (non-empty) set of subscripts of the selected functions.  By the Approximation Theorem the best linear approximating function for η with respect to the chosen subset of functions is obtained by simply ignoring terms in the orthogonal-function representation of η:

    \tilde{\eta} = \sum_{i \in J^*} \beta_i p_i.    (3.67)

Notice that η̃ is not equivalent to

    \eta_L = \sum_{i \in J^*} \alpha_i f_i,
0
,
I
I.
I
I
,I
I
I
I
I
Ie
I
I
I
I
I
I
I
I·
I
37
which would be obtained by ignoring terms in the representation of η as a linear combination of the original nonorthogonal functions.  By classical least squares estimation theory (Graybill, 1961, Theorem 6.3), the best linear estimator for η̃ at the point x is:

    \hat{\tilde{\eta}}(x) = \sum_{i \in J^*} \hat{\beta}_i p_i(x).    (3.68)

This representation can be converted back to a representation in terms of the original f-functions as follows.  Define the m × 1 vector β̃ by:

    \tilde{\beta}_i = \begin{cases} \hat{\beta}_i & \text{if } i \in J^* \\ 0 & \text{if } i \notin J^*. \end{cases}    (3.69)

Then

    \tilde{\alpha} = A\tilde{\beta}    (3.70)

and

    \hat{\tilde{\eta}}(x) = \sum_{i=1}^{m} \tilde{\alpha}_i f_i(x).    (3.71)

3.2.3.  Equivalence of the Minimum Bias Estimator and BLUE of the Best Approximating Function

In general, in the work above, β̃ᵢ = 0 does not imply α̃ᵢ = 0; that is, in general all m terms will be used in the representation of η̃ in terms of the original nonorthogonal functions.

Consider, however, the following special case which occurs, for example, if the fᵢ are polynomials in one variable, arranged in order of increasing degree, and if the approximator η̃ is of degree n − 1 < m − 1.  (The special case being discussed arises in any situation in which η̃
is a linear function of the first n elements of the ordered set.)  That is,

    \tilde{\eta} = \sum_{i=1}^{n} \beta_i p_i.

Partition the coefficient vectors

    \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}, \qquad \alpha = \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix},

where α₁ and β₁ are each n × 1.  Partition the estimator vectors α̂, β̂, the function vectors f, p, and the A and B matrices to correspond.  By Theorem 4 of the previous section,

    \tilde{\eta} = f_1(\alpha_1 + A_{11}B_{12}\alpha_2).

Thus the BLUE of η̃ is

    \tilde{\eta}_1 = f_1\tilde{\alpha} = f_1(\hat{\alpha}_1 + A_{11}B_{12}\hat{\alpha}_2),    (3.72)

where

    \tilde{\alpha} = [\,I : A_{11}B_{12}\,]\,\hat{\alpha}.    (3.73)

The setting described above is equivalent to the setting Karson et al. (1969) used in deriving the Minimum Bias Estimator under the following restrictions.  W(x) represents the uniform weighting function over R, i.e., dW(x) = dx₁ dx₂ ··· dx_r.  The integration used in defining the
inner product is Riemann integration.  The functions fᵢ(x) are standard polynomials in the r variables, arranged in order of increasing degree in the following sense: if i ≤ n (fᵢ to be used in the approximation) and n < j ≤ m (fⱼ not to be used in the approximation), then it is assumed that the degree of fᵢ in x_k is less than or equal to the degree of fⱼ in x_k, k = 1, 2, ..., r.  For example, one might have:

    f_1 = 1,\; f_2 = x_1,\; f_3 = x_2,\; f_4 = x_1 x_2,\; f_5 = x_1^2,\; f_6 = x_2^2.

With such functions, the M matrix of inner products becomes the matrix of "moments" of the region R:

    m_{ij} = \int_R f_i(x) f_j(x)\, dx.

In the above example, one element of M is

    m_{24} = \int_R x_1 (x_1 x_2)\, dx = \int_R x_1^2 x_2\, dx.
The MBE for α under the above assumptions is

    \tilde{\alpha} = [\,I : M_{11}^{-1} M_{12}\,]\,\hat{\alpha},    (3.74)

where α̂ is the BLUE of α (using the full model) and

    M = \begin{pmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{pmatrix}

is partitioned to match the partition of α (M₁₁ is n × n).  To show the equivalence of the estimators (3.73) and (3.74) it is only necessary to show
    A_{11} B_{12} = M_{11}^{-1} M_{12}.    (3.75)

Recall that A₁₁ = B₁₁⁻¹; (3.75) becomes

    B_{11}^{-1} B_{12} = M_{11}^{-1} M_{12}.    (3.76)

Since B′B = M, we have

    M_{11} = B_{11}' B_{11}, \qquad M_{12} = B_{11}' B_{12}.

Hence

    M_{11}^{-1} M_{12} = (B_{11}' B_{11})^{-1} B_{11}' B_{12} = B_{11}^{-1} B_{12},

and (3.76) holds.  This demonstrates the equivalence of the estimators in the special case considered by Karson et al. (1969).  Although they did not explicitly derive the Minimum Bias Estimator as the BLUE of the Best Approximating Function, this is clearly the principle which guided their work.  Thus, in the present exposition an estimation procedure has been developed which represents a generalization of the Minimum Bias Estimator.
4.
A
PROCEDUB.E.~FOR-$ELECl'I.ON·OF..
TERMSH IN-THE -MODEL... WITH· MINIMUM BIAS
ESTlMATI-ON.-AND..1'HR-.INTEGRAT-ED.ME,AN..SQUARE .ERROR CRITERION
Many procedu:resc have been proposed for a t.ta-ekon the problem"of
selecting. independent:.variables.. .in. a regression. problem..
References
to- pap.ers-des.ctib:l..ng.. ,many...of".these--j;echniques-have_been.. given in the
ReView of. Literature-secti.on. of this paper.. .. All of the procedures
mentionedthere·suffer-one-common affliction., which derives from the
fact that .the ..proceO.ures.. are- based on,,· "locaill. criteria,. such as the
2
R criterion.. Letthe--true response (or "model",.or "regression equa-
tion") be denoted by
n (x) . =
m
Ea. f . (x) ,
J J -
j=l
where x is an nxl column vector, ~ = (xl' x ' ••• , x )' and f j (~) is
n
2
a polynomial in then variables .. x
1
&
For example, ifn= 4, one might
have
Let R, thel'reg.ionof interest", be a subset of Euclidean n-space, En.
In essence,. procedures. for .. selecting. terms. in a model produce an estimator for
n,
say
v
n,
of the form
v
n (x) =
m V
Ea. f (x) .
j=l J j -
I
I.
I
I
I
I
I
I
I
Ie
If the j .,.th term is .excludedfromthe modeL (bywhatever procedure .is
v
., J
being applied), the effect is to setda. =0 •.. Suppose, for example,
that applicati.on.of...the.procedure ..being. used .. results in setting
'&j
= 0, j
= m + 1, ~ +.2, ..• ,m~ sothat.the-estimator is
l
v
The point is that. thefunc.t.i.on really. being estimated by n · (that is,
L
v
E [IlL.])' isnot(e~ept~..in.. special.-cases)..-the"bes~ ..approximating function
for. 11.· That is, define
m
nL
l
=
L
the funct.ionbeing estimated by
J j -
nL .· vnL
v
proximating.functionof.Iloftheform
I
I
I
I
I
I
I.
I
I
(*)
a.f (x)
j=l
is. not (usually) the best ap.,.
(~)~.aswasshown.in.theprevious
v
chapter.. That is, . in generaL.the as timat·or.. IlL"" produced .by . the. pro":" .
cedures.mentioned..above.,. .. is.;.. not...a.minimum...bias-.estima.ting function.
. In this.chap.tera.-procedure.. is.devel:opedwhich is '.'globaln.in.
the sense. that the criteriontakes.intoacc.ount the properties of the
estimation~approx:i.mationprocedureoverthe.whole.region
of
interest,
R.. Moreover, as will be seen.,..-the.pro.cedureisapplicable to full
general . linear.models including polynomials. in many variables.
The
procedure ..allows one to--test.any term.in.the.model-{low.,.orderpoly"',
.nomials.,as .. well:as....1 U.gh.-order .. pol¥nomials).. for. "significance", and
the one '.'best" .. regression equation of. all--possible. regression equations. (as.. measured:.by:...the~.criterion. defined:.. below) is. simply obtained •.
The ..properti.es....of_ .the..coefficien.t..estima.tors- are also derived.
I
43
II
I
I
I
I
I
I
4.1.
Minimization of Integrated Mean Square Error with
Respect to Choice·ofTermsin the Model,
. Parameters Known
Assume that the true . response. func·tion n is a polynomial function
of n variables, ~ ... (x '·.·•. ,x )', and assume a "region of interest",
1
ri
R
C
En.
Assume.also-awe.ight function.or.finite measure W is defined
over R and that n
L " (R,... B,.W) the vector space discussed in the
2
E
previous chapter.
That is
o<
In
2
(x)
dW(~)
< +
co
R
Pk ~), k
Let
=.1,
••• , m .". he integration....,.orthogonal. polynomials in
(4~2)
Ie
I
I
I
I
I
I
I
,I
(4.1)
Assumethatthe.degreeofeach.Pk(~)in
x!' ... , x
ri
each. of the variables
is specified •. Assume. also. that the true response function,
n,can.be written as
n(x)
Note that. Sr=;t
v
o for
Let f3 , j = 1, 2, .•
j
=
m
L a,Pj (x)
j=l J
-
(4.3)
anyj s·{l,. 2., ..... ,m} is. explicitly allowed.
eo,
m.be. . .estimatorsof .the.{3., and define
..
J
v
n (~)...
m
v
L f3 P , (x) •
j=l j J -
(4.4)
v
Certain. of. the estimatorsf3 rare. explici.tly. allowed-to be zero with
non..,.zero;· probability,
.!. e. ,
I
I.
I
I
I
I
I
I
44
v
P[Sj = 0] > 0 .
Note that setting: Sj = O.is. equivalen-t.. todeleting. thej,..,th term from.
the model (or the "regression.
v
2
. E[(Sj)] <
+
foreachj
00
.;:::.1,.-.2~
I
I
I
I
I
I
I
e
I
I
.. It .is. assumed that
..... '7m,... and that each Sj is esti-
mable.
Defineutheintegratedmeansquareerror (IMSE) for
n to be
~(x)]2} dW(~) .
IMSEdl) = JE{IT1(x)."..
(4.5)
R
Theorem4.L, With thesetting,described above
v
I
Ie
equation'.~;);.•:
IMSE(n) =
Proof.
m
v
l:E[ (S
j=l
2
- Sj) ].
(4.6)
j
Since.all.,termsarefini.te,., the.in,t.egraL and. the expected
value.. op.er.ator ..can-be, iuterchanged in (4.5):
IMSE= EJ
rn~)·u"" n(~)]2 dW(~)
'R
=E
I[ ~
. R j-l
(Sj -
~j) Pj ("'~J 2 dW(x)
= E f' .[ ~
R
= E
j=l
[~
j=l
=E[~
j=l
as was to be shown.
(4.7)
I
I.
I
I
I
I
I
I
I
Ie
I
45
Some remarks on equation. (4.7) are in order •
eventhoughtheestimators.
I
I
I
I.
I
I
v
f3 j may be.cor.related.,... no cov.ariance terms
appear in this expression.. of ..the-. IMSEh" Als.o.,notice .tha.t .the.summation
form of equation (4.7) allows .. one. to. consider the coefficient estimation one.·term at a time.
4.1.1.. Exper-imental Model
The following experimental.modelwill be. used throughout the
paper.
It is assumed that a "true".response function
n(x).~
\:2:
m
l: S.P.(x)
JJ-
(4.8)
j=l
is defined.over.the.
region of interest R.C: En and that the {P.J(x)}
are
.
integration-or.thonormalfunctions(usually polynomials) of the n variab1es.x!= (xl' •... , x )
n
Jpi(x)P
I
I
First. notice that
R
I ,
i.e.,
--
j (x) dW
(~) =
{
l if i
=
o if
:f:
i
j}
(4.9)
j
where. W(x) .. isueither.. afinite . measure.defined.ove.r--R or.a.. distribution
function of:..finit.e'.variation.over,R. as. discussed in. the previous chapter.
It is further. assumed.tha.t.:.N...observations are taken at the points
x. :
---J.,
(4.10)
Also, E:
=
(E: ,
1
H., E:NJ'has the.multivariatenormaldistribution,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
46
N(O,O'
-
2
e:
2
I),.where O'may or may not be known.
Thus the vector of obser-
e:
vations may be expressed as
Y ==
Nxl
+ e:
P{3
NXm mXl
(4.11)
Nxl
where the matrix P == [Pij]is defined by
(4.12)
The matrix. P is assumad to have fulL rank so that the ,m1.nimum variance
unbiased. estimator of
~
is
(4.13)
which
hasthem:ul.tivaria.t.e~normal
'
i
mi n~mumvar
A
2'
d:.. estJ.ma.tor
'.,
0
The
f2
0' ,~s
e:
;2e: == ~(Y'Y
_ S'P'Y)
N-m - - -
(4.14)
2
(N-m)Oe:
and.
......
ance~·unu.Joase
dis tribiition NU3,
.' . has., thecelltra.l X2 distribution.with N.,.,.m degrees of
O'e:
A
freedo~-.allCl
4.1.2. .
.. is, .dist.ri.b.u.ted~.indep.endentlyof
~.
An Example
Suppose one. decides.,in-advance.of.an~experiment:~
that. certain ones
v
of. the f3 == 0 <.!.e_,..cer.tain.terms .. are .. to be deleted. from the model),
j
and that. others ...are-to.be set equal" ,to. the. corresponding. least squares
estimators (t.erms.to be included. in the mode.l).
Under:..these assumptions,. for. coefficients. corresponding to terms
"in the model",
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
47
Sj ,the least:...s.qua.res...,es.timator,. and
(4.15)
V
For terms not '.'in the model", ~.~.,. those_.te:rms for which Sj = 0 (set
inadvan.c.e.of the experiment),
(4.16)
Then:
IMSE.(~). ..=
R
=
L
var(Sj)
terms in
the model
+
L
-
2
Sj •
(4.17)
terms not in
the model
As explained in Chapter.3-r-the-proc.edure described ..above is just the
Minimum.. B.ias.Es.timation procedure.
The.. result.above$ug.g.ests..an.-i.n.triguingpossibili.ty for obtainA
ing a smaller IMSE.
y
If it ..were.knowninadvance.that
S~>.var(Sj)'
one
A
wouldset.(3j :::: Sj .(keepthe. j-thterm. in the model) . and obtain. a
small.er-IMSE-thanwould.be.ohtained.if.the.. j.,..th.term.were.deleted.from
the model •.:.,..The..p.rocedure.:.would. be.applied. individually. to each term
in the model.
An additional. bonus is-obtained:
no matter which terms are de.,.
y
v
leted from the. model (by setting S. :::: O) . theresultingestimator, n(.!.)
J
is theminimum.bias.estimatorfor
,..,
n(x)
I.
I
I
f
v
=E.[n~)l
chosen.
n(~)
ofthe·form chosen, and
is.. the- best-.approximating function of.
n of
the form
I
I.
I
I
I
I
I
I
I
48
The procedure above. can.be.-summarized as follows:
v
13
for j = 1, 2, •.• , m.
Ofcourse-,if. one knew .the - 13 j values in. advance there would be
no-..need.-for -.experimenta-ti-OtJ...-and es timation_· However.,--. the above procedure ..motiv&te8 u--the-f.ol~owing... estimation_-procedure . for each term
(l3 ) in the model:
j
(1) Test the hypothesis:
I.
I
I
113
H :
o
Ie
I
I
I
I
I
I
=
I
<
har (13J.)
(2) Define the estimator:
v
13 j =
Q.if.H. is accepted
o
13. ifH is rejected (13. is-the least.squares
J
0
J
estimator.)
The test in step{l)
W-ilLdepend_()n.whether(j~= var(Sj)
is known and
on the level (a)· of. the . .test'n ..Xhe,. pr.operties. of.the -estimator above
for the.. two. cases .(cr~
-
- E
kn~wn, (j~.
unknown.) are derived in· the next two
E
sections •. When ..thesa.prop.erties.are. developed .(based.on UMP tests),
it is
seen~that.the_estimator.is_.of
the form:
I
I.
I
I
I
I
I
I
I
49
A
v
13j
A
if
l13
13j
j
=
A
OU l13
j
I
> C
/A2
0.
J
(4.18)
I
< C
/A2
0.
J
2.
A2
2
(if OJ is. known, replace cr. by 0.) ,and. one can discuss. the "cutoff
J
J
point/I. C., .t::a.ther. than ..the. level ,C'i, of the test.. A. discussion of the
choice of thecutoff... poiuts.follows development of- the. properties of
d
the. estimators.
The. procedure--outlined ... above .is ..based ona.two-,tail. test.
are occasionswhen.a. one,,-tail.procedure is appropriate.
There
In such a
case, if it is known that 13. Y. 0 ,·thees tima tor is:
J
Ie
I
I
I
I
I
I
I.
I
I
A
2
(where OJ is
2
re p1aced.by.Oj'
if known) •. If13 .<.O.oneconsiders - J3 .
j
j
The properties of "one tail estillla.tors'! are. developed in.section 4.3.
Consider the following
generall~ed
procedure . . Let A be a sub-
set of. the real line, and define:
V
0 if
13j =
I
A
13:]
A2
e: A
0.
J
A
13j otherwise
This type of procedure. is suggested by Bayesian techniques for selection
of
region~on.which
v
13r"'O •.. The. properties .of.this .generalized proce-
dure fo.r..·certain simple.types of.-Jbsets (intervals, complements of intervals., .andhalf.lines).are.developedinsection 4.4.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
50
4.1. 3 •.. Res,ults ·UsefuLfor Evaluating Integrals
Theevaluationofexpectations~and
mean-square errors in suc-
ceedingsectionswill . requirethe..numaricaLevaluat.ion .. of the cumulative distribution.functionandpartiaLfirst.and.second.moments of the
standard.no.rmalprobabilitydensity. function•..The.. algorithms given
here are.intended.for.use on.a computer;. the. computer used for computing charts ..in.. thisstudywasthe..IIDL360 -ModeL 7-5 •
Since . IBM. and
mos t. other..computer- manufacturers .. supplj'.. very-. accuJ::ate ..algori thma for
evaluating,.,the-error-£uncti.on.,. defined by:
er£(x) =
x
....L
I
rrr
e
_t 2
dt,
0
this function is.used.as a.basis.for.evaluatingthe functions below.
The cumulative normal distribution function., denoted by <1>, can
be .evaluated ..in_ terms. of.."the.error.. function.(.Abramowitz and Stegun,
1964,.·. equation 7.1.22):
x
<1>(x)
= --L
121T
I
e
-t
2/2
l'
dt =2 [1
+ erf
(x/I2:")]
(4.19)
-00
The.partial. firs.t.moment.(deno.ted PFM) of the . standard normal
probability. density, defined by
1
PFM(a,.b)
= I21T
b
.J
te
-t
2 /2
a
can be.. evaluateddir..ectly.-by.. observing that
dt,
(4.20a)
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
51
Thus the PFM function may be evaluated by:
PFM(a,b) =
--L
;z.rr-
2
2
[exp(-a /2} - exp(-b /2)].
Thepartia~-second.momemt..
(4.20b)
(PSM)....of the standard. normal probabil-
ity density, defined by:
I
1
PSM(a, b)
b
=·h'rr
2
2
texp(-t /2) dt
(4.2la)
a
can be evaluated.by.integ.ration",by..,-parts.The result is:
(4.2lb)
Therefore thePSbLfunctionmay be. evaluated as:
(/f.2lc)
The ¢i, PSM,. and-PFM.functions ..were programmed-.as.double precision
FORTRAN .subprograms.for..use.. invevaluating.. results. discussed in the
following sections.
The.. following. ~mmas.-wi~l_aJ.so._be.....use£~.in- ..--the.. following sections.
Lemma 4.1. . Let (x,u) be jointly independently distributed random. variables. such ..that ··x. has the.. (margfu,a 1}. normal.· dis tribution,
~(e,l), and Vu has; .the(marginal)chi.-sq.uare..-distributionwith V
degrees of. freedom,.Le., the joint density function of (x,Vu) is:
f (x,vu;
e, v) =
exp [-(x.,..e)2/ 2 ] .(vu)V/2-l exp (-u/2)
;z.rr-
2v/2 r (v/2)
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
52
over the region
{(x,u) ;
-00
< x <
+
00,
0 < u} •
=
Let a , a 2 ,·a , .9.4 be given constants satisfying
l
3
-00
and
definethefol~owing.. subsetsof .the.~.(x,u)...,
= {(x,u): u
A.
~
B.~ =--. {(x,u):
~
0,. a
sample space:
ru}
<
i lu< x - a i +l
u.> 0, (x,u) t- Ai}'
= 1, 2, 3.
i
B.~ is the complement of A.relative-to.thesample
.space.
~
be.acontinuous function.-of.x.and.a parameter,. e..
= xor
tions, h(x.,8)
h(x,8) "" (x-e)2.
Let h(x,e)
In the applica-
Let kbe a.real constant; in
the applications-.k~=.0 ·0:1:- k-:;; e~ .•. Define. the functions
gi(x,u) = h(x., e.) 1 B. (x,u). + k lAo (x,u)
~
~
g~(x,u)
~
= h(x,8) lAo (x,u) +k
~
IB
(x,u)
i
and
I (e; -a, b; c, d) =
t
a
[k-h(t+6,6l]
II (t;0,1l
2
p[V(:;6l
<
~ < V(:;6l 2j
dt.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
53
Then,
E[g2(X'u)] = E[h(x,8)] + 1(8; .-00, -8; a 2 , 0)
+ 1(8; .,.8, 00; a , 0);
3
E[g~(x,u)]
= k - 1(8; .-00, .,.8; a , 0).- 1(8; ..,.8, 00; a , 0);
2
3
Proof.· . First note that
E[gi(x,u)]
~. 11
h(x,8) f(x,ti) du.dx.+ k
Bi
11 f(x,u)
du dx
Ai
= E[h(x,8)]
+
II
[k - h(x,8)] f(x,u) du dx,
A.J.
and
E[g~(x,u)J= 11
h(x,8) f(x,u) du dx + k
If f(x,u)
du dx
Bi
Ai
=ko-fJ
[k - h(x,8)] f(x,u) du dx
Ai
= k + E[h(x,8)] - E[g.(x,u)].
J.
The integrals over thesetsA. must be evaluated separately.
I.
I
I
J.
integral over A is:
1
The
I
I.
I
I
I
I..
I
I
I
Ie
I
I
I
I
I
I
I
I·
I
II
54
[k-h(x,8)] f(x,u) du dx
Al
2
.. I
_00
2
x /a 2
o
f
[k.,..h(x,8)] f (x;8,1)
l
. 2
x /a
f (u,V) du dx·
2
2
l
Change the variable.of integration for the outer integral to
t .. x - 8.
The inner-integral becomes
the limits.ofintegra.tion change from (-00, 0) to (':"00, -0), and
k - h (x, 8) .. k - h (t+8, 8) •
l-e[k-
Mt+ll)] f
1
«;
Thus-'-dthe whole quantity is
0;1) P
f(~e)2
dt
adding -E[h.(x, 8) J produces -the . result _-stated -in the Lemma.
Theotherderivationspare essentially-the-same and-use the same
change of variable:
If
A
2
[k_- h(x,.8)J f(x,u) du dx
00
o
.I
-00
[k - h(x,B)] f (x;8,1)
r
I
f 2 (u,v) du dx
2 - 2
x -/a
2
I
I.
I
I
I
I
I
I
I
55
I
00
00
+
[k -
h(x~8)J
f (x;8,1)
l
Jf 2 (u,V)
du dx
2 2
a
x/a
= 1(8;-00,-8; a , 0)+ 1(8; -8,
2
3
00, a , 0).
3
Addition of '. E[h (x, 8) J produces the desired result.
[k.~ h{x~8)Jf(x,u)
du dx
,·.2;a 2
00
=I
[k - h(x,8)] f
1
JX
3 f2 (U,V) du dx
2 2
,x ;a4
Addi tion of E[h (x, e) J produces the desired result.
Ie
I
I
I
I
I
I
The
integralsabove.with.the:chi~square.probability
i n,the
integrand must· be. evaluated by." numericaL approximation.
methods-it isimportant.t.oevaluatetheintegrand.very accurately;
Abramowitz .and.Stegun...(1964", Chapter 26) .give equations. for the direct
evaluation. of ..the.. chi-square.distribution:function.. -... These formulas
were incorpOJ:a.ted . in:. a. -double.. precision:..FORTRAN FUNCTION. subprogram,
QCHI ~-which~ was_usedfor:.- calculations. in this - paper.
Lemina 4.2.
Let x be a random.variable.with the N(8,l) distribu-
tion, ...with.density.. denoted . f (x;8 ,l).te·t.a.
1
I.
I
I
In, such
such that
-00
sample' space:
sal <a
2
<·+oo
r
a
2
be given. constants
and define-the following. subsets of the
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
56
Lath(x,.8)
stant.
be.a.g.iven~-con.tinuous.pfunc.ti.on.and.le.tkbe
a
given con-
Define
Then,
E[g(x)] ::: E[h(x,8)] + k[W(a Z':'"8) - W(a1-e)]
a -8
2
h(t+S,8) f 1 (t;O,1) dt
a -8
l
-I.
and
E[g*(x)]
Proof.
~
k +
E[h~x,8)]
- E[g(x)].
First consider
E[g{x)]
=I hex,S)
f
1(x;e,1)dx +k
B
A
a
=E[h(x,8)] +
I f 1 (x;8,1)dx
I
a
Z
[k,..h(x,8).]f1 (x;e,l) dx.
1
Let t ::: x-8; then
a
E[g(x)]
= E.[h(x,S)] +
I
2
-e
[k-h(t+8,8)] f 1 (t;O,l) dt
a -8
l
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
57
a -8
= E[h(x,8)]
I
2 f1 (t;O,1) dt
a -8
1
a -8
-I
2 h(t+8,8) f (t;O,l) dt
1
a -8
l
which is equivalent to the desired result.
Elg*(~)] =
Now
J
If 1 (X;e,1)
A
B
h(x,8) f (x;8,1) dx + k
1
=k
+
I
dx
(h(x,8) - k) f (x;8,1)
1
A
:;: k + E[h(x,8)] - E[g(x)].
as was to be shown.
4.2.
Properties .oftheEstimatorsBasedonTwo-Tai1 . Tests
Throughout this section the experimental model is assumed to·· be
as described indsect.ion4.Ll.
The distribution .. function, expected
value, and mean square error of the estimator (4.18) will be derived.
4 • 2.~1.
p.
Properties. wi th
In this
2
fore, OJ'
Known
sUbsectionuitisassumedthatO~is
A
=!I..
(l
var(l3j)~can
known, and, there-
2
be compu.ted as 0t·times ..the.}-th diagonal
element of (p'p)-l.
Since.-the ..-intag.rat.ed.mean- square. error is .the. criterion under
considera.ti.on, . by ..Theoretn.-4.. l-.the.es.tima.tors.may.be ..considered one at
a.time.
I.
I
I
+k
Define:
I
I.
I
I
I
I
I
I
I
58
"f3 j =
(4.22)
The real questionof.interest is whetherlf3j
equivalent to
I > ~ar <aj ),
If3j I
Ivar
(e
j
which is
(4.23)
)
Define
e =
(4.24)
and its· minimunLvariance'unbiased es timator,
Ie
N
I
I
I
I
I
I
I.
I
I
N(e,l).
(The symbol ":';" means "is distributed as".)
Now, inequality (4.23}.is equivalent to
n
H:
o
leI
~1
(4.25)
and the Frocedure (4.22).. is.. motiva.ted.by.the -fact.-that. the UMP level
C/. test of H.. (4.25) is .(Lehman.,.1959.,. section 3.1):
o
(4.26)
I
I.
I
I
I
I
I
I
I
59
v
= 13l (13j) •
where C depends on a, and 13 j
v
The properties of 13
j
are more easily derived indterms of the
standardized variables:
=
f:
I.
I
I
(4.27)
else
\.
A
=
A
8 IB (8)
(4.28)
where IB is the indicator function for the set B
=
(-C(), -C]u[C, +00):
(4.29)
Ie
I
I
I
I
I
I
A
if 18:1
v
The probability distribution of 8 is mixed.
2
v
LetF(8;8) denote the
v
exp[,.,.,(t-8) /2J dt,-for -C() < 8 < -C
F(-C;8) for -c
.~
v
8 < 0
v
F(8;e) =
1
F(+C;8) = 12'IT
C
J..exp[--(t - 8)2/2] dt,
-C()
o<
(4.30)
v
8 < C
v
8
1
I27r
,.f
exp[-(t - 8)2/2] dt for C <
-co
F(8;8) is continuous except at ~
v
8 < +00 ..
= 0;
it has a continuous derivative,
v
~
f(8;8), everywhere except for the set of points, 8dc, -C, D./'.
I
60
I.
II
I
il
il
I
II
I
Ie
I
I
I
I
I
I
I
Ie
I
y'
Now consider the mean square error of 8, which can be found by
application of Lemma 4.2. In that lemma, set k = 82 , x = 8;
A
A
2
2
h(8,8) = (8-8) so h(t+8,8) = t ; -al = a 2 = C so A = [a1'a J
2
and B = A
C
as above.
v
V
A
=
[-C,C]
2
By Lemma 4.2., since g(8) = (8 - 8) ,
2
A
MsE = E[(8-8) 18,C]
= E[g(8)J
C-8
=
2
2
t f (t;0,1) dt
l
J
Var(8) + 8 [¢(C-8) - ¢(-C-8)
-C-8
- @(-C-8)] - PSM(-C-8, C-8)
(4.31)
dt;
(4.32)
where @ denotes the cumulative distribution function and PSM denotes
,
the partial second moment function for the normal distribution, as
discussed in section 4.1.3.
Although (4.32) is a compact representa-
tion of MSE (8,C), equation (4.31) is suitable.for numerical computations.
A graph of the MSEfunction for. various values of C and for
o<
82.4 is displayed in Figure 4.1.
function is symmetric about 0 in 8:
It should be noted that. the
MSE(8,C)
= MSK(-8,C) ,
so that
only positive values of 8 need be considered.
v
It shoul.d.. also- be noted-that . for C = 0,
V
(i.e.,
e
= 8 identically
A
B. = B.); this is equivalent to always using the least squares
J
estimator for
J
B..
J
Therefore
MSE(8,C
= 0)
= VAR(8) = 1, all 8.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
61
o
•
C\J
U")
c::
....•
0
c::
c::
w
UJ
c::
«
:::>
0
....•
C?1
C./)
z
«
UJ
:E
U")
0
•
o•
o.
1.0
2.0
3.0
9
Figure 4.1
Mean Square Error of the two-tail estimator
v
e=
v
A
S/s.d.CS) as a function of
a Z assumed known.
e and C;
I
62
I.
I
I
I
I
I
I
I
This straight line is also plotteddfor reference in Figure 4.1.
cussion of Figure 4.1 is deferred until similar figures are developed
for the case where
0
is unknown.
v
J
v
v
MSE(S., S.) = MSE.(8, 8) var (S.),
J
J
J
where Sj is the true value and 8 =
Sj/~ar(Sj)
.
v
The expected value of 8 can be found by applying Lemma 4.2 with
k = 0; h (8,8) = 8; -a
1
= a
2
c
= C; A = [-C,C] = [a ,a ]; B =A, and
1 2
v
A
g (8) = 8; by Lemma 4.2,
v
E(818,C) = E[g(8)]
Ie
I.
I
I
2
The mean square error ofS.·can be computed from:
= 8 +
I
I
I
I
I
I
Dis-
0[~(C-8)
-
~(-C-8)]
-C-8
-
I
(t+8) f (t;O,l) dt
1
-C-8
= 8 where
~
8[~(C-8)
-
~(-C-8)]
- PFM(-C-8, C-8).
(4.35)
is the cumulative distribution function andPFM is the partial
first moment function for the standard normal distribution, as disv
cussed in section 4.1.3. The bias of 8 is:
v
BIAS(8,C) = E(8 - 8)
=
8[~('-C-e)- ~(C-8)]
This function is plotted in Figure 4.2
° < 8 < 4.
is zero.
(4.36)
- PFM(-C-8,C-8).
various C values and for
v
A
Again it should be noted that if C = 0, 8
8, so the bias
fo~
=
Points are not plotted for 8 <
relationship,
° because of the symmetry
I
63
I.
I
I
I
I
Lf)
o
::1'
o•
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
•
(T')
0
•
C/)
c:::e
.......
JXl
(\J
0
•
....•
o
o•
O.
2.0
1.0
3.0
8
Figure 4.2
Absolute value of the bias of the two-tail
v
estimator
e
e=
V
A
S/s.d.CS) as a function of
and C; 0 2 assumed known.
lI.O
I
.,.
I
I
I
I
I
I
I
--I
I
I
I
I
I
l-I
I
64
BIAS(-8,C) = -BIAS(8,C)
which can be inferred from (4.25).
To
thispoint.thedistribution.function~e.xpected
value, and
v
mean squareerror.of 8.have,been.displayed as functions of 8 and C.
v
The corresponding functions for the coefficient estimators, Sj' may
be found f-romthedefiningrelation (4024).
In alLof these deriva-
tions it.has been~assumedthat cr~_~s.knO-wn (and, therefore, that
e:
v
(var(Sj) is known), and also. that the estimator SJ is based on a twotail test. - The -next section contains an. investigation.. of -estimator
basedonthe.two.",tail testwhencr
2
e:
is estimated•. The theory for
estimators. based .onone'='"tailt-estsand.more.general. ·estimators will
be developed. in later sections.
4.2.2 •..' Properties..wi.th 0
2
Unknown
.In this section. i,t. is assumed that the experimental model is
as describedin.section.4~l~l;.02is. assumed unknown.
e:
The.
tion.
pro.c.edur.e--,is~
essentially the. same.. as· in-the---previous sec-
For each.texm.in the model
n (x) =
m
~
j=l
SjP. (x),
J -
(4.37)
the hypothesis
ISj I -<
H:
o
har (S.).
J
(4.38)
v
is tested and the estima.tor 13. is defined by
J
v
Sj =
~ if Ho is accepted}
13. if H is rejected
J
0
(4.39)
I
I.
I
I
I
I
I
I
I
•I
I
I
I
I
I
,
I
I
65
Consider the procedure:
(4.40)
Since the same· procedure will be applied to each term in the model
(with possibly different values for C), the notation can be simpli£led.
Let:
.2
A
(a) I3V\N«(3,O' );
i..=...,
. (3/0';
(c) 8 .. 13/0'V'\ N(8,1);
(b) 8
A
(4.41)
A
(d) . ~2 be the estimator. of CJ
2
A
var(l3) based on V degrees of freedom;
..
thus:
V ~2
(e) - 2 "'X 2
0'
V
A
independently of 13.
(Graybill, 1961).
Let
(4.42)
which has the noncentralF distribution with 1 andv.degrees of freedom and noncentralityparameter 82/2.
H (4.38) is equivalent to
o
181~1.
H:
o
(4.43)
Toro and::Wallace (1968) ..have shown that. the UMP test of H (4.43) is
o
given by:
Accept H:
if F < K
Rej ect H:
o
if F
o
(4.44)
2: K
I
66
I.
I
where theconstantK depends. on the degrees of fr·eedom and the level
of the test •. The acceptance criterion (4.44) is equivalent to:
I
I
I
I
(4.45)
which is equivalent to the procedure (4.40).
In the . derivation-of. themea·n- squar.e-eJ::.t'.o.r ancl ..expected value
v
of S it is more-convenient to.-work with.the standardized random
" .and-- the standardized parameter ,.
variables .a'" .. and.B,.
a..
In order to be
able to apply Lemma 4.1, define:
I
I
•I
I
I
I
I
I
l-I
I
(4.46)
(vu).is a.chi.,,-,square random variable. with V degrees of freedom, dis-
'"
tributed independentJ.¥.-of .. (3.,.as previously mentioned... From the relations above,
v
2
E[(S - S)
Is,
v
C, v] :;:E[(e.,. 8)
2
la,
'"
C, v]~var(S).
(4.47)
v
The mean square-.error.. and.-expected. value -of. e will now be
developed •... Consider. the. following two· subsets, A and.B, of the twodimensional. sample. space of.
'"
B :;: {(a,
u) :
:;:
'"
{(a, u):
'" u) :
A:;: {(a,
:;:
'"
{(a, u):
o~
u,
0
~
u,
o~
u,
~
0,
u
'"
(a.,
u:;:
'"
a IeI
"'2
e
'"
leI
"'2
e
a"'2 /0 2 ):
'"
1131 -> C
/~2 = C 1u~2
}
2
> c u}
a
'"
= lsi
2
< c
(4.48)
< C
uN}.
H
::;c 1u~2/v}
I
67
I.
I
I
I
I
I
Clearly A andB are disjoint and A +B.is.the whole sample space.
Also,
V
e
I
II
I
I
il
I
I
I
~
I
I
A
lB(e,
u)
where lB· is the.indj,cator-func ~on of .the set . . B.. .. For . the mean square
error of
v
e,
at given values of
e,
C, v:
MSE(8, C, v) =E(8
_-e)21 a,
C v]
apply Lemma 4.1 with the following correspondences:
a;
x =
I
A
=e
A
h(a,a)
u = u; -a
2
= a 3 = c;
2
= (a
A
-e) ; h(t+8,8)
=t 2;
Define the function
~(B; ••b;c.d) •
b
J
2 2
(B "t ) f1(t;O,l) P
~v<:~e)2 < ~ < v<t~)2:J
dt.
a
(4.49)
Then, from Lemma 4.1,
MSE.(8,C~v)
=
E[g2(e,u)]
A
= Var(e)
=1 +
+ IM
(e·
'
~(8;-CO,
To get the expectation of
v
a,
-00
-e·
'"
+co; C, 0).
-C 0) + I
(a·
-8
M'"
00'
CO)'
,.
(4.50)
apply Lemma 4.1 with the correspondences:
I
68
I.
I
I
I
I
I
I
I
II
I
I
I
I
I
I
~
I
I
A
X = 8;
A
u
= u;
-a
2
= a
A
h(8,8)
= 8;
h(t+8,8)
3
= C;
= t+8;
"
k = 0; g2(8,u) = 8;
and
b
= J - (t+8) £1 (t;O;1)
I E(8;a,b;c,d)
P
[,,(:~)2 < X~ < "(t~l2J dt.
a
(4.51)
From Lemma 4.1,
v
E(818,C,v)
= E[g2(8,u)]
= 8 + I (8; -00, -8;
E
= 8
+
I
E
~C,
0) .,. I (8; -8, 00; C, 0)
E
(8; -00, +00; C,O).·
v
Thus,the biasin.8 at the values 8, C, V is
v
BIAS(8,C,.V) :;: E(8.,..ej8,C,V)= I (8; -00,00; c, 0) •.
E
(4.52)
The integrals (4.50).and (4.52)Hmust be approximated by numerical
2
methods ... Due .to. the factor.e~t./2 and. the doubly.,.infinite range,
these integrals would. appear .. to be . good. candidates for Hermite quadrature (Hildebrand., .. 1956) .. .. However, . for.. large . degrees of freedom the
cumulative -chi.,..square.. distribu.tion function is. almost a·· step function
at the mean.·value, V"'. The shiU'pness. of~.the curvature .. of. the integrand
near the values
= V, i .~., t
= -8
± C
(4.53)
I
I.
I
I
I
I
I
I
I
--I
I
I
I
I
I
l-I
I
69
cause the Hermite quadrature algorithm to fail.
Since theintegrands tend to zero very quickly as t becomes
2
large (due to the exp(-t /2) and chi-square probability factors) the
integralscanbeadequatelyapproximatedover.a relatively short
(finite) interval, as-follows.
Let X-be an integer such that
o
The one can use as upper and lower limits of integration:
t
o
+
2
a
=
c
xo2 ,
i.e., t
- -
0
= ±
c
X 0
since the integrands are negligible outside the
c
X o
al.
a,
(4.54)
int~rval
[-c
X o
a,
Since the integrandshave.greatest curvature in the
neighborhood .of the. points-(4 .53)., -the ..three subintervals defined by
the points
-c
were used.
Xo
-a,-c.~ e~
A 32."point Gaussian
c-- a, c X0 - a
(4.55)
quadrature algorithm was applied to
theintegrands in (4.50) and (4.52)-over each--ofthe subintervals
definedbythe-.points (4.55).
The
integraJ.s-(4.50)~
(4.. S2) were ap-
proximatedasthe surne of .. thecorres.ponding. approximations over the
three.. subintervals.
All.arithme.tic-was... parfonne.d. in- double precision
(about-1S.decimaLplaces).on.the IBM 360/75.
Of . course, the final
results are. not accur.ate ..to.15 decimals.
A check..ou--the. -.accuracy ..of. the-approximation is available.
\> becomes~
As
large.the.integrands.b.ecome. more.. Ul"'.condi.tioned and the
accuracy. of the. approxi.mations.-decrease~.. However, for large \>, the
I
I.
I
I
I
I
I
I
I
II
I
I
I
I
I
I
L
I
I
70
functions MSE(8, C, v) and.BIAS (8,. C.,. v)., _0'2 unknown,tend.to MSE(8, C)
and BIASe8, C), 0'2. known, .respectively. .Convergence of· the corresponding families ..of plotted . curves convinces. us. that .. the., approximations are
adequate.
An additional check on the accuracy is available.
Guassianquad-rature.a~go.r..ithm.isequival.entto.
The n-point
fitting.an.n - 1 degree
polynomiaL to. the integrand ...(.the,p,olynomial equals the .integrand at n
selected pCiints), . and then finding the "exact" integral.. of the polynomiaL
If the polynomial very closely approximates the integrand,
then the integral. of the polynomial will be very close to the integral
of the integrand.
For several-of. the parameter. value. combinations con-
. sidered~ . the . integrand and ..corresponding . approximating polynomial were
plotted •.. The__ approxima.tion.was.varyclose in each case; in addition
the,. approximating.polynomialoscilla,tes. ab·out. the integrand, which.
implies. that-on .int.e.gra,Uo.n .. the arrors _of ..approximation tend to cancel
and produce. a.· very. accurate approximate integral.
v
The curves computed. for .. the mean square error of 8, MSE(8,C,v),
are plotted.in.Figures.4..3::-4.. 7 for various C-values and for
v = 2, 5, 10, 25, 50.
v
The... curvesrcomputed. for. the bias of.8, BIAS{8, C,.v), are plotted
in Figures
4.8~4_l2for.various C,..values
and for v
The actua.1.values .plott.ad. are.. JBIAS (8, C, v)
= 2,
5, 10, 25, 50.
I.
In each case the. functions are plottedonly.for 8
~
MSE function and·IBIASI·function.are,.s.ymmetric about· z.ero.
BIAS{·:"e~._c.,.v) .'"'.
.",BIAS(8,
0, since·the
However,
C, v).
The mean square.error..and .expectation of. the. original variable,
v
(3,. can.be.computecl-.. .from.the.relations (4.47).
I
I.
I
I
I
I
I
,I
,I
71
o
N
•
Lf)
0:::
0
....•
0:::
0:::
W
UJ
0:::
<C
::>
C?J
0
....•
C/)
~
Ie
<C
UJ
i'
I
II
I
'I
I
..
z
I.
I
I
:E:
Lf)
0
•
o•
O.
2.0
1.0
3.0
lI.0
8
Figure 4.3
Mean Square Error of the two-tail estimator
v
v
"
e = B/s.d.(S)
02
estimated.
as a function of
e,
C and V
= 2,
I
I.
I
I
I
I
I
I
I
Ie
72
o
N
•
LJ')
a::
0
....•
a::
a::
W
UJ
a::
<C
:::>
Gl
0
....•
(/)
z
<C
UJ
I
~
LJ')
0
•
I
(I
I
I
I
o•
o.
1.0
3.0
8
Figure 4.4
Mean Square Error of the two-tail estimator
v
8
v
A'
= S/s.d.(S)
cr2. estimated.
I.
I
I
2.0
as a function of 8, C and
V
= 5,
I
I.
I
I
I
I
I
I
I
Ie
73
o
N
•
Lf')
0:::
a
....•
0:::
0:::
W
UJ
0:::
«
::::>
Gl
0
....•
(/)
z
«
UJ
I
I
I
I
I
I
I.
I
I
:E
Lf')
0
•
o•
O.
1.0
2.0
LI.O
3.0
8
Figure 4.5
Mean Square Error of the two-tail estimator
v
v
A
e = S/s.d.(S)
0'2
estimated.
as a function of
e.
C and v
=
10.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
74
o
N
~
0
~
•
...
lJ')
•
~
Ll.J
UJ
~
«
:::>
C?l
(/)
...
0
•
z
«
UJ
:E
lJ')
0
•
o•
o.
1.0
3.0
9
Figure 4.6
Mean Square Error of the two-tail estimator
v
8
I.
I
I
2.0
0-2.
v
A
= S/s.d.(S)
estimated.
as a function of 8, C and v
= 25,
I
I.
I
I
I
I
I
I
I
Ie
75
o
N
•
lJ')
0:::
....•
0
0:::
0:::
W
UJ
0:::
<C
=:l
CJ
0
....•
(/)
Z
<C
UJ
I
I
I
I
I
I
:E
lJ')
0
•
o•
o.
1.0
3.0
9
Figure 4.7
Mean Square Error of the two-tail estimator
v
v
A
e = S/s.d.(S)
0
I.
I
I
2.0
2
estimated.
as a function of
e,
C and
V
= 50,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
76
LJ')
•
C)
::r
•
C)
(Y')
•
C)
........
(\J
•
C)
•
C)
•
C)
o.
2.0
1.0
3.0
9
Figure 4.8
Absolute value of the bias of the two-tail
v
estimator
e,
C and
V
v
A
e = S/s.d.(S)
= 2,
0
2
as a function of
estimated.
4.0
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
77
If)
a
•
::r
a
•
('I'")
a
•
U)
c:::r:
(\J
a
a
a
•
•
•
o.
Figure 4.9
4.0
3.0
Absolute value of the bias of the two-tail
v
v
A
= S/s.d.CS) as a function
V = 5,a 2 estimated.
estimator 8
8, C and
I.
I
I
2.0
8
1.0
of
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
78
a
a
•
•
(lI
a
•
if)
<C
.......
P=l
(\J
a
a
a
•
•
•
o.
3.0
2.0
1.0
9
Figure 4.10
Absolute value of the bias of the two-tail
v
estimator
e,
V
A
e = S/s.d.CS)
C and V
= 10,cr2.
as a function of
estimated.
4.0
I
I.
I
I
I
I
I
I
I'
Ie
I
I
I
I
I
I
I.
I
I
79
I.f')
o
o
•
•
en
(/)
0
•
c::::r:
........
l=O
N
a
o
0
•
•
•
O.
1.0
3.0
2.0
8
Figure 4.11
Absolute value of the bias of the two-tail
e = S/s.d.CS)
A
estimator
as a function of
8, C and V = 25,cr 2 estimated.
4.0
I
80
I.
-I
If)
o
I
•
II
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
o
•
en
(/)
0
•
c:::r:
.......
t:Q
(\J
0
•
o•
o
•
o.
2.0
1.0
3.0
4.0
9
Figure 4.12
Absolute value of the bias of the two-tail
v
estimator
e,
C and v
V
A
e = S/s.d.(S)
2
as a function of
= 50, cr estimated.
I
81
I.
I
I
I
I
I
I
I
4.3.
Estimators Based 6nOne-Tail Tests
To this point estimators. based on two-tail. tests .have been considered.
Since an investigator. often.knows .. the algebraic sign of
certain of . the. coefficien.ts~nproceduresnw hich.. utilize .. this information
wilL be investigated... ,Throughout this . chapter .the. experimental model
is assumed to be as described in .section 4.1.1.
Themotivating,idea.continuestobethat setting.the estimator
2
A
"A
to Sj = 0 whenever ISjl < Var(l3 j ), and to Sj = Sj'the least squares
"
estimator~ when
13:J
A
> Var(S.).
known, it can be assumed without loss of generality that Sj .:: 0, for
if (3j < 0, one can consider -Sj'
The procedure. is totest.the hypothesis
Ie
I
I
I
I
I
I
I.
I
I
Since the sign of. the co.efficient· is
J
H:
o
(3. <
J -
;Gar(~j)
(4.59)
and use the estimator:
" =
(3j
{B
j if H0 is rejected}
o ifnHo
is accepted
For notational simplicity in the following derivations, let:
A
(3
A
denote Sj
A
0
so that:
2 denote Var(S,)
J
I
82
I.
I
I
I
I
I
I
I
Ie
2
A
13 "'" N(I3,cr ),
= l3/cr,
8
A
A
8 =S/crV\N(8,1),
A2
cr
is anunbiased.estimator of cr
based on V degrees
of freedom,
A2
V
cr
g \l\X~,
A
independently of
13.
The hypothesis (4059).,.intermsoftheHabovenotation, becomes:
H:
8 < 1
H:
e>
o
a
(4.60)
1
There are. two. situations, .' according ·to . whether cr
is. known, .·the.testis based. on· the statistic 8.
I
I
I
I
I
I
I·
e
I
I
2
2
is known.
When cr
2
It .is well known that
the normaLprobabilitydensityhas ....monotone likelihood ratio (for
variation inHthe.maan~.8)~ . app.lication . of. Lehman's. Theorem 2 (Lehman,
1959i·P... 68}.. yi.elds.the;,.UMP ...tes.t..(for a given a level.) .of the hypothesis (4.60):
where C.is.determined from a.
Ifa
2
is. unknown, . consider. the statistic:
t
=
f3Lcr
;;2/cr2 =
e
I
83
I.
I
I
I
I
I
I
I
which has the noncentral t distribution with noncent:ralityparameter 8.
Lehman (1959, p. 233) shows that this: distribution has the monotone
likelihood: ra.tio ..property(for . variationin e), so ano.ther application
of Lehmann's. Theorem. 2. (Lehmann, .1959, . p •. 68) . yields . the. UMP test of
the hypothesis (4.60) as:
ep(t)
I.
I
I
1 if t >
e
where He is.. determined from Ci.andthe noncentral tdistribution.
simplicity, .rewrite the test above in the form:
ep (t)
Ie
I
I
I
I
I
I
=
;;2/ri'
"
0 if e < e ;;2/ i'
"
=
1 if 8 >
e
r
Thus, ineithercase(cr 2 known.ornot),.theestimator for 8
based ontheUMP.one...,tail test can be written:
e" if e" > e
V
e=
o if
"
/;2/ 0 2
e < e /~2 2
/0
"
= e lB(8)
where
B
"
= fe,
"2
O'
"
> 0: 8 > eA2/ 0 2}
" > 0, cr"2 > 0:
= {8
and where
estimator
"2
(J
is replaced by cr2 , When known.
\) 82
e2
\) ~2
2
cr
-->~}
This also yields the
For
I
84
I.
I
I
I
I
I
I
I
Ie
where
A
with cr
2
2
replacedby.cr, if known.
The properties of the estimators may now. bed.erived,. depending
on whether
4.3.1.
ri
is known.
Properties of. the·One.,-TaiLEstimatorsWhen
I.
I
I
is Known
For notational convenience the expected value and mean square
V
A
error of6 will be evaluated.
Let .. f(6; 6) denote theN(8, 1) density
and consider:
A
B
=
MSE(8,C) =
A
{8:
8 > C}
E[(~ - 8)218, C]
A
A
I
I
I
I
I
I
ri
= E{[81B(8) - 8]2/ 8 , C}
00
=
I
A
A
A
A
[8 1 (6) ,.. 8]2 £ (8; 8) d6
13
-00
C
00
=
I
(6 -
A
6)2 £(8 ; 8)d8+8
C
Application.. of the. change of variable t =
MSE(8, C) =PSM(C- 8, +00) + 8
2
J f(6;
A
8) d8
_00
e2
8 yields:
He - 8)
(4.61)
where<P denotesthe.cumulative distribution function and PSM denotes
the partial second moment function,. discussed in section 4.1.3.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
85
Similarly,
I
00
v
E[e!e, C] =
I
00
a lB(a) f(a; e) de=
A
A
A
e fee) de
C
-00
C
=e-
C-8
e f(8;8)d8 = 8- I27T
-----L J (t+8)
I
-00
2
exp(-t !2)dt
_00
=e
- e
~(C
- 8) - PFM(-oo, C - 8)
(4.62)
where PFM denotes the partial first moment function for the standard
normaL density, discussed ind section 4. L 3.
v
Thus the bias of e at 8,
C is
BIAS(e, .c)
=
-PFM(-oo,
.c - e)- e
~(C
- e).
(4.63)
The MSEand BIAS functions are plotted for various C-va1ues in Figures
4.13 and 4.14 respectively.
Discussion of these Figures is deferred
until corresponding res~lts are developed for the unknown variance
case.
4.3.2.
Properties-of- Ona.,..Tail-Est:ima.tors-~When·
Vat'iance is Unknown
v
The mean. square .. error of e-canbe found by-applying Letnma 4.1
with the following--cot'respondences:
x
= 8;
u
= u;
a
2
=
-00;
a
3
= C;
A
A
2
2
h(e,S) = (8- 8). ;h(t + 8,8) = t ;
2
A
V
2
k = 8 ; and g2{8,u) = (8 - 8) •
I
I.
I
I
I
I
I
I
I
Ie
86
o
•
C\J
If)
~
0
.....•
~
~
w
UJ
~
<C
::::>
C?l
0
.....•
(/)
z
<C
UJ
I
I
I
I
I
I
:E
If)
0
o
•
•
-1.0
D.
3.0
8
Figure 4.13
Mean Square Error of the one-tail estimator
v
e=
02
I.
I
I
2.0
1.0
v
A
S/s.d.(S) as a function of
assumed known.
e and
C;
4.0
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
I·
I
87
Lf')
o•
::Jf
o•
('I")
C/)
0
•
c:::e:
-
1.4
l:Cl
C\J
0
•
.....
o•
o
•
-1.0
o.
2.0
1.0
3.0
9
Figure 4.14
Absolute value of the bias of the one-tail
v
estimator
e and
V
A
e = S/s.d.CS)
as a function of
C;a 2 assumed known.
4.0
I
I.
I
I
I
I
I
I
I
88
Let I (8; a,b;c,d)-be defined.as in the derivation of the MSE of the
M
two-taiL estimator -(equation 4.49).
= 1 + I H(8;
I
.,...00, .,...8; ""'00, 0) + 1MC8;-8, 00; C,O).
The-chi-square.probabilityin the expression.IMC8;-00, -8; -00, 0)
is identically-l.O;-thus-the-term has value
82 (CP(-8) .,. . <I>(-oo)] -PSM(-oo, -8).
Hence,
MSE(8,C,v)
=1
+
Ie
I
I
I
I
I
I
I
I·
Then,. by Lemma 4.1,
+ 82 11>(..,.8) -PSM(-00,-8)
~(8;
-8, 00; C, 0).
(4.63)
v
The expected value-of 8 can also be.foundby application of
Lemma. 4.1, with. the . following .. correspondences:
A
x
= _00;
a
= 8;h(t+8,8) = t+8;
k
= S;
A
h(8,8)
u
= u;
a
2
3
= C''
A
= 0;
v
A
=
elf
Let IE(s;a,b;c,d) bedefined·as at equation (4.51) in the derivation
of
thedexpected~value--of-the
V
E(SIS,C,v)
A
. two.".taiLes.timator;-- then by Lemma 4.1,
= E[g2(8,u)]
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
89
Because the chi....,square probability is identicallylrO.in the second
term,
I (8; -00, -8;....,00, 0) =....,8 \p(-8) - PFM(-oo, -8).
E
Thus,
v
E(8jS,C,v)
= 8..,..8.\p(,..8)
- PFM(-oo, -8)
+ I E(8; -8, 00; C, 0).
(4.64)
The last. terms in (4.63). and (4.64)..must ..beappr.oximatec:Lnumerically.
Since the. integrands. of these termeare identical. to.the . corresponding
integrands inthe .. two,..tail case. similar techniques ..were.used for evaluation and.. similar. remarks on. the accuracy. of the. approximation apply.
Specifically~.. the.in.teg1;."als.were
[-8, CXo,..
81~where.. Xo.is
As before, fo·r large
approxima.ted over the interval
an. integer satisfying.
v the integrands. have greatest curvature in the
neighborhood of:
v
(e ~ t)
2 _
v,
i.~.,
t_ e.
± C-
The integrals were approximated.bya32--p.oint.Gaussi.an quadrature
technique. (double precision. arithmetic). over each of the two inter....,
vals [,..8, C-81,[C ..,.8,.Cxo ...,.8]; .. the.sum of the integrals over
the subintervals. was taken as· the approximation for the integral over
the whole interval [-8, + 00].
I
I.
I
I
I
I
I
90
The
remarks~on
the.accuracypof.th.e appro:x1mation in the two-
tail case apply here also.
Graphs of the MSE.. and-IBIASI .functions for. various C-values, for
'V
= 2,
4~19
5,.10,.25, 50. and for.-LO £.6 £.4.0 are· given in Figures 4.15-
and 4.20...,4.24 respectively.
4.4.
Generalized Estimators
In somesit-uations..it.. may.. be des.irable.to-use.. th.e generalized
estimator,
0 if Sj e: A'*
I
v
Sj
=
A
Sj if Sj
I
*
E B
= A*c
Ie
* a subset of the sample space and B* is the complement of
W'hereAis
I
I
*
will be.. dev.elop.ed.fo.r. the ..casesin .which.Ai·s.·a
finite. interval , the
I
I
I
I
I.
I
I
A* relativeto.thesample .space •... The properties of this estimator
complement of . a finite interval, or· a
half~line...
If the variance is
assumed unknown.,. the endpoints of A* will be random.
The properties. of. the. estimator. will be developed in terms of
the standardized variables
and
The properties.of this. 6.wil.!. bedeveloped.along the same lines as
derivations in.pr.evious sections.
I
I.
I
I
I
I
I
I
I
91
o
•
C\J
Lf')
0:::
....•
0
0:::
0:::
W
UJ
0:::
<t:
:::::>
0
....•
C?1
C/)
Ie
I
I
I
I
I
I
Z
<t:
UJ
:E Lf')
0
o
•
•
-1.0
o.
4.0
3.0
8
Figure 4.15
Mean Square Error of the one-tail estimator
v
v
A
e = S/s.d.CS)
0'2.
I.
I
I
2.0
1.0
estimated.
as a function of
e,
C and v
= 2,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
92
o
•
C\J
Lf)
0:::
0
....•
0:::
0:::
W
LU
0:::
<C
:::>
Cl
0
....•
(/)
z
<C
LU
::iE:
Lf)
0
•
o•
-1.0
o.
2.0
1.0
3.0
4.0
8
Figure 4.16
Mean Square Error of the one-tail estimator
v
v
A
e = S/s.d.(S)
0
2
estimated.
as a function of
e,
C and v
= 5,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
93
o
•
C\J
Lf)
•
a:::
0
~
a:::
a:::
l.L.J
UJ
a:::
<t 0
::::>
QJ
•
~
C/)
Z
<t
UJ
:E
Lf)
0
o
•
•
-1.0
o.
2.0
1.0
4.0
3.0
8
Figure 4.17
Mean Square Error of the one-tail estimator
v
8
0-2.
v
A
= S/s.d.CS)
estimated.
as a function of 8, C and V
= 10,
I
I.
I
I
I
I
I
I
I
Ie
t
I
I
I
I
I
..I
I
94
o
C\J"
Lf)
0::
0
....."
0::
0::
W
UJ
0::
«
::J
Ql
0
....."
(/)
z
«
UJ
:E
Lf)
0
"
o"
-1.0
o.
2.0
1.0
4.0
3.0
9
Figure 4.18
Mean Square Error of the one-tail estimator
v
v
A
e = S/s.d.CS)
02
estimated.
as a function of
e,
C and v
= 25,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
95
o
•
C\J
Lf')
•
0:::
0
r-f
0:::
0:::
W
UJ
0:::
«
0
:::>
Ql
•
r-f
U)
Z
«
UJ
::E:
Lf')
0
•
o•
-1.0
o.
2.0
1.0
3.0
4.0
8
Figure 4.19
Mean Square Error of the one-tail estimator
v
v
cr
A
= S/s.d.CS)
8
2
estimated.
as a function of 8, C and V
= 50,
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
96
Lf)
o•
en
0
•
(/)
1.4
«
...P:::l
C\I
0
•
....•
o
o
•
-1.0
O.
2.0
1.0
3.0
9
Figure 4.20
Absolute value of the bias of the one-tail
v
estimator
e,
C and
v
A
e = B/s.d.CS) as a function
V = 2,(J2 estimated.
of
lI.O
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
97
Lf)
o
•
:::J'
o
•
(T1
0
•
(/)
1.4
c::::r:
........
CCl
C\J
0
o
o
•
•
•
-1.0
o.
2.0
1.0
3.0
9
Figure 4.21
Absolute value of the bias of the one-tail
v
8, C and v
V
A
= S/s.d.(S) as a function
= 5,0 2 estimated.
estimator 8
of
4.0
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
98
Lf)
o
•
::::Jf
o•
en
(/)
0
•
<C
1.4
........
t:r.l
C\I
0
•
~
o•
o•
-1.0
o.
3.0
8
Figure 4.22
Absolute value of the bias of the one-tail
v
estimator
e,
I.
I
I
2.0
1.0
e=
V
A
S/s.d.(S) as a function of
C and v = 10,
.cy 2
estimated.
14.0
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
99
Lf)
o
o
•
•
en
0
•
(/)
c:::r:
1.4
t:O
C\J
0
•
....
o
o
•
•
-1.0
o.
3.0
8
Figure 4.23
Absolute value of the bias of the one-tail
A
estimator
8, C and V
I.
I
I
2.0
1.0
e = S/s.d.CS)
= 25,a 2
as a function of
estimated.
4.0
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
100
Lf)
o
•
=:Jf
o•
(rl
(/)
0
•
1.4
c:::(
......
!=Q
C\J
0
o
o
•
•
•
-1.0
o.
2.0
1.0
3.0
8
Figure 4.24
Absolute value of the bias of the one-tail
e = S/s.d.CS)
A
estimator
e,
C and v= 50, .a
2
as a function of
estimated.
4.0
I
I.
I
I
I
I
I
I
I
101
Consider the case in which the variance is known.
const~ts
C , C (C < C2 ) be given, with C
l
2 l
l
allowed.
Define A
= [Cl,C Z]
B = AC •
=
~
Let the
and/or C
2
= +00
Let
The expected value, bias, and MSE of each of these estimators are
found by simple applications of Lemma 4.2.
A
define (in Lemma 4.Z) k
For expected values,
V
A
A
V
*A
= 0, h(8,8) = 8, 81 = g(8), and 8Z = g (8).
Then, by Lemma 4.Z,
Ie
I
I
I
I
I
I
I.
I
I
C -8
fZ
(t+8) fl(t;O,l) dt
C -8
l
and
,
Thus,
BIAS(8,A)
v
= E(8 l
- 818,A)
I
102
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
and
v
BIAS(e,B) = E(e .- ele,A)
2
For the MSE functions, define k
v
2
*A
(e 1 - e) , g (e)
t
2
•
=
2
A
, h(e,e)
A
= (e
2
A
- e) , gee) =
V
2
2
(e 2 - e) , and note that h(e+e,e) = [(t+e) - e] =
By Lemma 4.2,
MSE(e,A) = E[(el-e)2Ie,A]
= 1+
e2[~(C2-e)
= E[g(8)]
~(Cl-e)]
-
-e
C
J2
-
2
t f (t;0,1) dt
l
cl-e
\I
*A
2
MSE(e,B)= E[(e -e) le,A] = E[g (e)]
2
= e2
+ 1 - MSE(e,A)
Consider the case in which Var(S.) is unknown.
Let constants
J
Cl ' C2 ' C3 ' and C4 be given such that
Define the three subsets of the sample space:
A
A
Ai = {(B,u):
u 2: 0, Ci
ru < e
~ C+
i l
ru:},
i
=
and the complements (relative to the sample space),
A
I.
I
I
=e
Hi
= {(B,u):
A
u ~ 0, (e,u) ~ Ai}' i = 1,2,3,
1,2,3,
I
I.
I
I
I
I
I
I
I
Ie
103
A2 . 2
= ala
where u
~1
t
as in previous sections.
A
·
J
A
i f (8, u) E A
A
Define the estimators:
A
8 1 . (8, u) , i
B
= 1,
2, 3;
J.
i f (8, u) E B
i
v
8 .
CJ.
={O
1, 2, 3.
i f (e,u)
A
A
8 if (8,u)
The expected values and BIAS functions for these estimators are found
from Lemma 4.1 by setting k
*
A
g.(8,u)
J.
v
= 8CJ..•
=0
and h(8,8) = 8, so that gi(8,u)
= v8i ,
Temporarily define
Then by Lemma 4.1,
I
I
I
I
I
I
The BIAS functions for the various estimators are easily written from
I.
I
I
the equations above.
I
I.
I
I
I
I
I
I
I
104
k
=
and
The MSE functions can be found by applying Lemma 4.1 with
2
A
A
2.
2
A
V
2
8 , h(8,e) = (8 - 8) , so that h(t+8,8) = t , g.(8,u) = (8.-8)
"
*
g.(8,u) =
1
A
1
V
2
(8 i-8).
c
Temporarily define the function
b
I M(8;a,b;c,d)
=
I
1
(8 2-t 2 ) f 1 (t;0,1) P
[V (t+8)
2 2
a
c
< X
2 <V(t+8) 2 ]
. 2
dt.
d
Then, by Lemma 4.1,
Ie
I
I
I
I
I
I
It is easily seen that these results include the results of the previous sections as special cases.
The symmetric two-tail estimator is
obtained by setting C
2
= -c 3 and using "82 , The one-tail estimator is
obtained by setting C
2
= -po and using
v
8 ,
2
The functions above are evaluated numerically in the same manner
as the corresponding functions for estimators discussed in previous
sections.
The computer subroutines used for the previous estimators
were altered to compute the properties of these general estimators.
Previous comments on accuracy apply here, also.
I.
I
I
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
~
I
I
105
It is important to note that the MSE and BIAS functions in this
case are not symmetric.
The figures on the following pages illustrate
MSE and BIAS curves for various (C1,C ) values for the known variance
Z
case.
- - e- - - - - - - -e -- - - - -
Iii"" ... _
e
0
•
en
If)
•
C\J
0::
a
0
N
•
0::
0::
W
UJ
0::
«
:::J
as
If)
•
--....•
0
(/)
U1
z:
«
0
•
UJ
:.E:
0
•
- 3.0 -2.0 -1.0
Figure 4.25
o.
1.0
2.0
3.0
4.0
8
Mean Square Error of the generalized estimator
v
A
A
A
S = 0 lACS) + S ~(S), variance assumed known.
.....
o
0\
- -e- - - - - - - -e -- - - - - - •- o
•
('I"')
U1
•
(\J
B = [0, 2.667]
C)
•
0::
o
C\J
0::
0::
W
UJ
0::
c:x:
::>
lJ")
•
a
B = [-2, 0]
.....•
,
Ql
C/)
z
~
U1
a
•
y
[0,-1=]
c:x:
LU
:E:
a•
O.
- 3.0 -2. 0 -1.0
1.0
2.0
3.0
l!.0
5.0
8
Figure 4.26
Mean Square Error of the generalized estimator
e= 0
A
A
A
lAce) + e IBce), variance assumed known.
f-'
o
-....J
- _.e-- -------- - - - - e- e
a
•
en
Lf')
•
C\J
0::
0
o
C\J
•
1
\
B = [0, 5.333]
0::
0::
W
UJ
0::
<C
:::>
C3S
-
Lf')
•
a
=
[0, 00] }
•
B
Ul
<C
=
~
U)
z
B
a
A = [-4, 0]
•
UJ
:E:
o
•
O.
-2.0 -1.0
1.0
2.0
3.0
1l.0
5.0
6.0
8
Figure 4.27
Mean Square Error of the generalized estimator
v
6
A
=
A
A
0 lA(6) + 6 lB(6), variance assumed known.
I-'
o
00
- -e- - - - - - - -e - - - -- - -e-0
•
(r')
If)
•
(\J
0
N
•
If)
•
.......
(/)
c:::::e:
0--1
0
•
I
B
= [0, -too]
.,;.....I
~
If)
0
o
•
•
o.
-3.0 -2aO -1.0
1.0
2.0
3.0
4.0
8
Figure 4.28
Absolute value of the bias of the generalized estimator
v
8
A
=0
A
A
LA(8) + 8 lB(8), variance assumed known.
f-'
o
\.0
- -e- - - - - - - ; - - - - - - - e- a
·
(T')
lJ)
•
(\J
a
•
I
\
B = [0, 2.667]
(\J
~
lJ)
•
~
/
/
B = [0, 4]
(/)
c::::r::
~
O
•
~
Ln
•
a
a
-3.0 -2.0 -1.0
Figure 4.29
1.0
9
0.
2.0
3.0
4.0
5.0
Absolute value of the bias of the generalized estimator
v
e
A
A
A
= 0 lAce) + e lB(e), variance assumed known.
I-'
.....
o
- -e- - - - - .. - r - - - - - .. -e- a
·
('r")
~t
a
N
B
=
1/
..
..
B
=
[0, 8]
•
lJ1
.-..•
en
c:::I:
.........
a
-=
--b....A
=
[-4, 0]
l:Q
lJ1
•
a
a
•
O.
-2.0 -1.0
Figure 4.30
2.0
8
1.0
3.0
ll.O
5.0
6.0
Absolute value of the bias of the generalized estimator
e=
A
A
A
0 lA(8) + 8 lB(8), variance assumed known.
I-'
I-'
I-'
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
5.
SOME TECHNIQUES FOR THE SELECTION OF CUTOFF POINTS
In order to apply the procedures discussed in the previous
sections one must select the cutoff points (C, or C and C ).
l
2
Pre-
sumably, if there is prior information about a particular parameter,
one would like to use this information in the selection of cutoff
points in an attempt to decrease the mean square errors,6fan estiy
mator, S .•
J
Isjl
For example, if one were absolutely certain that
< JCar"(Bj ) ,one would
v
mation equation <Sj
= 0);
two-tail procedure with C
always delete the j-th term from the esti!.~.,
= +00.
in effect one would use a symmetric
On the other hand, if one were abso-
lutely certain that Isjf > kr(;j) one would use the symmetric twotail procedure with C
= 0;
i.~.,
v
estimation equation, setting Sj
always include the j-th term in the
= Sj'
These two
extreme~
the relationship between 8-values and optimum C-values.
help explain
In general,
if prior information indicates that S: < Var(S.), one would tend to
J
J
use large C-values with either the one-tailor symmetric two-tail
procedures.
If prior information indicates that
v
would tend to use small C-values (use Sj
S~
> var(Sj), one
A
= Sj
a larger proportion
v
of the time, or increase the probability that Sj
A
= Sj)'
In this section two techniques are discussed and illustrated
for the selection of cutoff points when prior information is available
in the form of a "prior distribution" for a particular parameter, 8.
I
Ie
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I
.e
I
113
It is assumed throughout this discussion that the variance is
known.
Although the techniques can also be applied to the unknown
variance situation, the MSE curves for the known and unknown variance
situations are so similar that it is expected that the selected cutoff
points for the two situations would not be very different.
The following notation will be useful.
Let:
g(8) denote the prior density of 8;
A
f(818) denote the conditional density of 8 given 8,
i.~.,
the N(8,1) density;
A
r-
A
f(8,S) = g(8)f(SI8) denote the joint density of 8, 8 ;
A
A
f(8) =
I
f(8,8)d8 denote the marginal density of 8;
A
h(818) = f(8,8)/f(8) denote the conditional density of 8 given 8,
i.~.,
the posterior density of 8.
It is assumed that all of the functions above exist and are valid
probability density functions.
In Bayesian terminology, the "decision
function" is
A
..,
d(8) = 8 =
where the set A is the set on which
V
8
=
A
e.
The loss function is:
L(8,d(8»
=
v
(8
v
e = 0,
and B is the set on which
I
Ie
I
I
I
I
I
114
the risk (expected loss) is:
5.1.
The Bayes Decision Procedure
One strategy for the selection of the cutoff points (equivalent
to the selection of the decision function, d(8»
is to use the Bayes
decision procedure (Lindgren, 1962; Michaels, 1969), which consists of
choosing d
(i.~.,
C and C ) to minimize the expected risk with respect
2
1
to the prior of 8,
i.~.,
choose C and C to minimize
2
1
I
I
Ie
I
I
I
I
I
I
I.
I
I
Li~dgren
(1962, p. 285) shows this strategy is equivalent to minimiza-
tion of the expected posterior loss,
i.~.,
after 8 is observed, com-
'"
pute h(818) and choose the sets A and B to minimize:
m
{l1, <:2)
'"
i f 8 e; A
"'2
'"
E (8-8)2 = 8 - 28 E (8)
h
h
'"
2
+ Eh (8 ) if 8
E:
B.
v
'"
Thus, the set A should be chosen so that 8 e; A (8 = 0) i f
<~
'"8[2E (8) - '"8] < O.
h
I
I.
I
I
I
I
'I
I
I
Ie
I
I
I
I
I
I
I
e
I
I
115
The Bayes decision procedure (for selecting A and B or C and C )
1
2
can be summarized as follows:
(1) Compute the posterior density of e for the observed value
of
8 = Bj l'Gar(Sj);
(2) Compute the mean of the posterior density, Eh(e).
pute q(e) =
e[2~(e)
A
Then com-
- e].
"
(3) If q(e) < 0, set e
= V0;
v
otherwise e = e.
The procedure does not yield the sets A and B directly; one method for
finding the "effective ll cutoff points for the above procedure is to
"search" for the cutoff points by repeating the procedure above for
various e values.
For the normal prior distribution it is possible
to find Eh(e) and the sets A, B explicitly in terms of the parameters
A
of the prior and e (see section 5.3).
The Bayes decision procedure is optimal in the sense that the
sets A and B are chosen so as to minimize the expected value of
MSE(e,c ,c ) with respect to the prior distribution.
1 2
However, in some
cases the procedure is very sensitive to small changes in the parameters of the prior distribution.
5w2.
The Posterior C=l Procedure
If the parameter e(or So) were known exactly, the appropriate
J
cutoff points would be -1 and 1, which motivates the following procedure:
A
(1) Compute the mean, Eh(ele), of the posterior distribution;
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
116
(2) Apply the rule:
otherwise
While this rule is not optimum in the sense that the Bayes decision
procedure is optimum, it does have intuitive appeal.
The mean of
the posterior distribution in a sense summarizes the prior information and the information from the sample regarding the value of 8;
the same procedure is applied to the posterior mean that would be
applied to the parameter itself, if it were known.
Aside from any intuitive appeal, the cutoff points produced by
the "Posterior C=l Procedure" for a normal prior are not as sensitive
to small changes in the prior parameters as the Bayes procedure, and
may be more appealing in other ways than the cutoff points produced
by the Bayes decision procedure.
The Posterior C=l Procedure does not produce explicit algebraic
.
2
formulas for cutoff points. For the normal N(~,t ) prior, the cutoff
points can be found explicitly in terms of ~ and t
2
(see section
5.3); for other priors a search procedure may be necessary.
5.3.
Cutoff Point Selection with a Normal Prior
Hogg and Craig (1965, p. 165) show that for the N(~,t2) prior
A
for 8, and N(8,l) density for 8, the posterior distribution of 8 given
A
8 is the normal distribution with mean
I
I.
I
I
I
I
I
I
I
117
and variance
t
*2
Note that the posterior mean is a weighted average of the prior mean
A
and the observed
e.
From the results of section 5.1, the Bayes decision procedure
depends on the sign of the function
A
A
A
A:.
8] ,;;,' 8[2lJ
*
A
.. 8]
which simplifies to:
q(8) = 8
2
2
(t 2
A
A
e=
e=
_1) + 8(...1l:t-)
+l
t
Ie
I
I
I
I
I
I
I
e
I
I
values are presented in Figure 5.1.
t
2
= 1.
2
+l
22'
A
-2lJ/(t -1) if t ~ 1, or just 8 = 0 if
with roots at
0 and
t
Typical graphs of q(8) for various combinations of lJ,t 2
A
A
q(8) < 0, and
v
Remember that 8 = 0 wherever
A
e = e wherever
q(8)
~
O.
The application of the Posterior C=l Procedure is as follows:
v
e=
0 if IlJ * I
=
{ e otherwise
which is equivalent to:
A
v
e=
A
e otherwise
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
118
o
&----.........---A.
q(6)=Q
5?----.....
--~­
;:l.
Figure 5.1
Configurations of q(6) for various combinations
of prior distribution parameters.
I
I.
I
I
I
I
I
I
I
Ie
I
I
I
I
I
I
I.
I
I
119
where:
C = _Q - (
2
1
t
C = _ l!.
2
2
t
1:~2 ) = - ( 7 )-1
2
+ ( 1+t2 )
m
t
(7)
+1.
Various properties of the two procedures may be compared, the
most important being the expected value of the MSE function with
respect to the prior distribution.
tion, optimum for this criterion.
The Bayes procedure is, by definiA comparison of the expected MSE
functions is given in Table 5.1 for several
(~,t
2
) -values in the
ranges likely to be encountered in practice. In examining the table
v
A
v
one should note that if e e the expected MSE is 1.0; if 8 = 0 the
=
2
expected MSE is E (8 2 ) = ~2 + t •
g
The expected MSE for the optimal
(Bayes) procedure must lie between 0.0 and 1.0.
2
The Bayes procedure is sharply superior for small t «1) and
~ ~l
values.
Note that in some cases the expected MSE is greater
than 1.0 for the Posterior C=l Procedure; in such cases the use of
V
8
A
= 8 would
be a better procedure.
There are other properties of the procedures which can be
compared.
The Bayes procedure is "sensitive" to certain prior
parameter values; at certain points slight changes in parameter values
v
can make striking changes in the cutoff region A(over which 8
and its complement, B.
The procedure is sensitive at
2
2
value of t , at t = 1 and any value of
tive at ~
~,
= 0 and t 2 = 1. For example, if
= 0)
= 0 and any
~
and is particularly sensi~
= -0.01, t
2
= 0.99, then
Table 5.1. A comparison of the expected values, with respect to a N(μ, t²) prior, of the Mean Square Error functions for the Bayes decision procedure and the Posterior C=1 Procedure.

               Posterior C=1 Procedure          Bayes Procedure             Ratio =
  μ     t²     C1        C2      E(MSE)       C1        C2       E(MSE)     E(MSE: P C=1)/E(MSE: Bayes)
 0.0   0.25   -5.000     5.000   0.2501      0.0       0.0       0.2500     1.0005
 0.0   0.50   -3.000     3.000   0.5558      0.0       0.0       0.5000     1.1116
 0.0   1.00   -2.000     2.000   1.0000      0.0       0.0       1.0000     1.0000
 0.0   2.00   -1.500     1.500   1.1386      0.0       0.0       1.0000     1.1386
 0.0   4.00   -1.250     1.250   1.1270      0.0       0.0       1.0000     1.1270
 0.5   0.25   -7.000     3.000   0.5539      0.0       1.3333    0.4192     1.3215
 0.5   0.50   -4.000     2.000   0.8171      0.0       2.0000    0.6238     1.3099
 0.5   1.00   -2.500     1.500   0.9959      0.0       0.0       0.8254     1.2064
 0.5   2.00   -1.750     1.250   1.1117     -1.0000    0.0       0.9892     1.1238
 0.5   4.00   -1.375     1.125   1.1179     -0.3333    0.0       0.9994     1.1186
 1.0   0.25   -9.000     1.000   0.9466      0.0       2.6667    0.6763     1.3996
 1.0   0.50   -5.000     1.000   0.9243      0.0       4.0000    0.7489     1.2341
 1.0   1.00   -3.000     1.000   0.9438      0.0       0.0       0.8004     1.1792
 1.0   2.00   -2.000     1.000   1.0468     -2.0000    0.0       0.9469     1.1055
 1.0   4.00   -1.500     1.000   1.0933     -0.6667    0.0       0.9956     1.0982
 2.0   0.25  -13.000    -3.000   0.9999      0.0       5.3333    0.9379     1.0661
 2.0   0.50   -7.000    -1.000   0.9683      0.0       8.0000    0.9217     1.0505
 2.0   1.00   -4.000     0.0     0.8996      0.0       0.0       0.8995     1.0001
 2.0   2.00   -2.500     0.500   0.9414     -4.0000    0.0       0.9108     1.0335
 2.0   4.00   -1.750     0.750   1.0274     -1.3333    0.0       0.9792     1.0493
 4.0   0.25  -21.000   -11.000   1.0000      0.0      10.6667    0.9997     1.0003
 4.0   0.50  -11.000    -5.000   1.0000      0.0      16.0000    0.9990     1.0010
 4.0   1.00   -6.000    -2.000   0.9999      0.0       0.0       0.9961     1.0038
 4.0   2.00   -3.500    -0.500   0.9891     -8.0000    0.0       0.9858     1.0034
 4.0   4.00   -2.250     0.250   0.9796     -2.6667    0.0       0.9770     1.0026

NOTE: For t² < 1, the cutoff region (on which $\tilde\theta = 0$) for the Bayes procedure is the complement of [C1, C2]; for t² = 1, the cutoff region is (-∞, 0); for t² > 1 the region is [C1, C2]. The cutoff region for the Posterior C=1 Procedure is [C1, C2].
For example, if $\mu = -0.01$, $t^2 = 0.99$, then A = $(-\infty, -2) + (0, +\infty)$ and B = $[-2, 0]$. But if $\mu = +0.01$, $t^2 = 1.01$, then A = $[-2, 0]$ and B = $(-\infty, -2) + (0, +\infty)$. A change of only 0.02 in each of the parameters completely reverses the procedure!
Table 5.1 reveals why the Bayes procedure is so sensitive; in the neighborhood of $\mu = 0$, $t^2 = 1$ it makes little difference whether one sets $\tilde\theta = \hat\theta$ or $\tilde\theta = 0$; the expected MSE's are nearly identical, and approximately equal to 1.0. A slight departure from the point $\mu = 0$, $t^2 = 1$ produces a very slight change in the expected MSE, which makes one of $\tilde\theta = \hat\theta$ or $\tilde\theta = 0$ slightly superior to the other; slight changes in opposite directions would produce opposite cutoff regions but nearly the same expected MSE. The Posterior C=1 Procedure is insensitive to small parameter value changes.
All the results discussed in this section hold for the normal
prior distribution; other prior distributions may produce quite different results.
5.4. Cutoff Point Selection with a Uniform Prior
This section is included as an example of the application of the Bayes decision and Posterior C=1 procedures for a non-normal prior. The posterior distribution is messy; no detailed analyses or comparisons of the two procedures will be attempted. Let
$$g(\theta; a, b) = \begin{cases}\dfrac{1}{b - a} & a \leq \theta \leq b \\ 0 & \text{elsewhere}\end{cases}$$
be the prior of $\theta$, and let $f(\hat\theta \mid \theta)$ denote the $N(\theta, 1)$ density. Then the joint, marginal, and posterior densities are found as follows:
$$f(\hat\theta, \theta) = \frac{\exp\{-(\hat\theta - \theta)^2/2\}}{(b - a)\sqrt{2\pi}}, \qquad -\infty < \hat\theta < +\infty, \quad a < \theta < b,$$
$$f(\hat\theta) = \int_a^b f(\hat\theta, \theta)\, d\theta = \frac{1}{b - a}\left[\Phi(b - \hat\theta) - \Phi(a - \hat\theta)\right], \qquad -\infty < \hat\theta < +\infty,$$
$$h(\theta \mid \hat\theta) = \frac{\exp\{-(\theta - \hat\theta)^2/2\}}{\sqrt{2\pi}\left[\Phi(b - \hat\theta) - \Phi(a - \hat\theta)\right]}, \qquad a < \theta < b.$$
The posterior is essentially the $N(\hat\theta, 1)$ distribution truncated at $a$ and $b$.
The expected value of the posterior is:
$$E_h[\theta \mid \hat\theta] = \int_a^b \theta\, h(\theta \mid \hat\theta)\, d\theta
= \frac{\displaystyle\int_{a - \hat\theta}^{b - \hat\theta} (t + \hat\theta)\exp\{-t^2/2\}\, dt}{\sqrt{2\pi}\left[\Phi(b - \hat\theta) - \Phi(a - \hat\theta)\right]}
= \hat\theta + \frac{\mathrm{PFM}(a - \hat\theta,\, b - \hat\theta)}{\Phi(b - \hat\theta) - \Phi(a - \hat\theta)} = \mu^*.$$
This function is easily evaluated on a computer for a particular $(\hat\theta, a, b)$ value. Application of the Posterior C=1 Procedure is trivial once $E_h[\theta \mid \hat\theta]$ is available. The Bayes decision procedure is also trivial: $\tilde\theta = 0$ if $\hat\theta\,[2\mu^* - \hat\theta] < 0$. It appears to be difficult to find explicit formulas for the C-values for these procedures. For particular $(a, b)$ values the effective C-values may be found by a search procedure on a computer.
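A minimal Python sketch of these computations follows. It assumes that PFM(x, y) above denotes the difference of standard normal densities, φ(x) − φ(y), which is what the truncated-normal mean requires; the function names are illustrative only.

```python
from math import erf, exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def posterior_mean_uniform(theta_hat, a, b):
    """Posterior mean mu* for a uniform (a, b) prior on theta when
    theta_hat | theta ~ N(theta, 1): the mean of the N(theta_hat, 1)
    density truncated at a and b."""
    denom = Phi(b - theta_hat) - Phi(a - theta_hat)
    return theta_hat + (phi(a - theta_hat) - phi(b - theta_hat)) / denom

def bayes_deletes(theta_hat, a, b):
    """Bayes decision rule: delete (theta_tilde = 0) when
    theta_hat * (2*mu* - theta_hat) < 0."""
    mu_star = posterior_mean_uniform(theta_hat, a, b)
    return theta_hat * (2.0 * mu_star - theta_hat) < 0.0

def posterior_c1_deletes(theta_hat, a, b):
    """Posterior C=1 rule: delete when |mu*| <= 1 (here s.d.(theta_hat) = 1)."""
    return abs(posterior_mean_uniform(theta_hat, a, b)) <= 1.0

# The effective C-values for a given (a, b) can be located by scanning a grid
# of theta_hat values for the points at which each rule changes its decision.
```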
6. SUMMARY

Techniques have been presented for attacking one of the very difficult problems facing applied statisticians, that of using one set of data both for determining which terms to include in a general linear model and for estimating the parameters of the resulting model.

The major result of Section 3 was the extension of the concept of Minimum Bias Estimation to a very general setting. First, some linear approximation theory was presented leading to the definition of the best linear approximating function of a given function $\eta$ with respect to a given region of interest, R, weight measure W, and set of linearly independent functions, F. The best linear approximating function was given in terms of integration-orthogonal functions; algorithms were given for computing the set of integration-orthogonal functions from the set F of linearly independent functions, and for transforming representations of $\eta$ in terms of the linearly independent functions in F into representations in terms of the set of integration-orthogonal functions, and vice versa. It was shown that the best linear approximating function of $\eta$ is easily computed; one simply deletes unwanted terms from the representation in terms of integration-orthogonal functions. Similarly, one computes the Best Linear Unbiased Estimator (BLUE) of the best linear approximating function by simply deleting terms from the BLUE of the integration-orthogonal-function representation of $\eta$. Finally, it was shown that the Minimum Bias Estimator of $\eta$ is simply the BLUE of the best linear approximating function of $\eta$.
In Section 4 a procedure was introduced for using one set of data both for determination of terms to be included in an estimation model and for estimation of the parameters in the resulting model. Because the estimator is biased when non-zero terms are deleted from an estimation model, the variance criterion was replaced by the Integrated Mean Square Error (IMSE) criterion for the comparison of estimators. It was shown in Section 4 that the IMSE of an estimator $\tilde\eta$ for $\eta$ (both expressed in terms of integration-orthonormal functions) can be written as the termwise sum of the mean square errors of each of the coefficients. Because there are no covariance terms in this expression for the IMSE, one can select terms to be included in the estimation model on a term-by-term basis. The procedure consists of examining each term in the model and setting the estimator of the corresponding coefficient equal to either zero (to delete the term from the model) or to the least squares estimate of the coefficient.

Three estimation procedures were considered. The first is called a "two-tail" estimator and is based on the UMP test of the hypothesis $H_0\colon |\beta_j| \geq \mathrm{s.d.}(\hat\beta_j)$ versus $H_a\colon |\beta_j| < \mathrm{s.d.}(\hat\beta_j)$. The estimator is defined as:
$$\tilde\beta_j = \begin{cases}\hat\beta_j & \text{if } |\hat\beta_j| > C\,\mathrm{s.d.}(\hat\beta_j) \\ 0 & \text{if } |\hat\beta_j| \leq C\,\mathrm{s.d.}(\hat\beta_j),\end{cases}$$
where the constant C is determined from other considerations (see Section 5) and where $\mathrm{s.d.}(\hat\beta_j)$ denotes the known or estimated standard deviation of $\hat\beta_j$.

The distribution, expected value (and bias), and mean square error of the estimator $\tilde\beta_j$ were derived, and the bias and MSE were plotted for various C-values and various values of $\nu$, the degrees of freedom for the estimate of the variance of $\hat\beta_j$. This estimator was called the "two-tail" estimator because the resulting estimator can take positive or negative (or zero) values.
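A minimal Python sketch of this two-tail rule (the function and variable names are illustrative only):

```python
def two_tail_estimate(beta_hat, sd_beta_hat, c):
    """Two-tail rule: keep the least squares estimate beta_hat when it
    exceeds C times its (known or estimated) standard deviation in
    magnitude; otherwise delete the term by returning zero."""
    return beta_hat if abs(beta_hat) > c * sd_beta_hat else 0.0
```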
For the case in which the sign of a particular coefficient, $\beta_j$, is known from prior consideration, a "one-tail" estimation procedure was presented. Again, the BLUE, $\hat\beta_j$, of $\beta_j$ is compared with the (known or estimated) standard deviation of $\hat\beta_j$; the resulting estimator, $\tilde\beta_j$, is defined as:
$$\tilde\beta_j = \begin{cases}\hat\beta_j & \text{if } \hat\beta_j > C\,\mathrm{s.d.}(\hat\beta_j) \\ 0 & \text{otherwise.}\end{cases}$$
(The above procedure is for $\beta_j$ assumed positive; if $\beta_j$ is assumed negative, $\tilde\beta_j = \hat\beta_j$ if $-\hat\beta_j > C\,\mathrm{s.d.}(\hat\beta_j)$.) The distribution, expected value (and bias), and mean square error of the estimator $\tilde\beta_j$ were derived, and the bias and mean square error were plotted for various C-values and various values of $\nu$, the degrees of freedom for the estimate of the variance of $\hat\beta_j$.
Finally, a generalized estimator of the form
$$\tilde\beta_j = \begin{cases}\hat\beta_j & \text{if } \hat\beta_j \in B \\ 0 & \text{if } \hat\beta_j \in A\end{cases} \;=\; \hat\beta_j\, I_B(\hat\beta_j)$$
was considered, where A is a subset of the sample space and B is the complement of A relative to the sample space. The expected value (and bias) and mean square error of $\tilde\beta_j$ were derived for A-sets which are finite intervals, complements of finite intervals, or half-lines. For the unknown variance case, the endpoints of A and B depend on the estimate of the variance.
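A minimal Python sketch of the generalized rule (names and the parameterization of A are illustrative assumptions of this example):

```python
from math import inf

def generalized_estimate(beta_hat, a_lo, a_hi, a_is_interval=True):
    """Generalized rule beta_tilde = beta_hat * I_B(beta_hat): the term is
    deleted (return 0) when beta_hat falls in the cutoff set A, taken here
    to be either the finite interval [a_lo, a_hi] or its complement;
    half-lines are obtained by setting a_lo = -inf or a_hi = inf."""
    in_interval = a_lo <= beta_hat <= a_hi
    in_a = in_interval if a_is_interval else not in_interval
    return 0.0 if in_a else beta_hat
```

With a_lo = −C·s.d.($\hat\beta_j$) and a_hi = +C·s.d.($\hat\beta_j$) this reduces to the two-tail estimator above; with a_lo = −inf and a_hi = C·s.d.($\hat\beta_j$) it reduces to the one-tail estimator for $\beta_j$ assumed positive.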
Two techniques were presented in Section 5 for selection of the A and B sets for the generalized estimator. By specifying a prior distribution for a particular coefficient, one can select those sets which give the smallest expected value (with respect to the prior) of the mean square error of the estimator. A formula was given for computation of the estimator as a function of the mean of the posterior distribution. For the normal prior the boundaries of A and B were expressed in terms of the mean and variance of the prior. A second technique was presented and compared with the optimum technique for the case in which the prior is normal.

All of the results above hold for the response function expressed in terms of integration-orthonormal functions; it was shown in Section 4 that the IMSE of the estimator resulting from the above procedures is simply the sum of the mean square errors of the individual coefficients. Thus, the procedures can be applied term-by-term to each term in the "full model" (each term of "potential significance").
Since each term may or may not enter the final estimation model, the procedures, in effect, allow consideration of "all possible regressions" subject to the restriction that terms are deleted by setting corresponding coefficients to zero. The procedure is remarkably easy to apply (compared with stepwise regression, for example): one simply compares each coefficient estimate with its standard deviation times a constant (C, or $C_1$ and $C_2$). Moreover, the properties of the final estimation model are known. (The properties of the final estimation model in stepwise regression are not known.)
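The term-by-term character of the selection is easy to see in code. The following Python sketch applies the two-tail rule across a vector of coefficient estimates (the numbers and names are hypothetical, for illustration only):

```python
def select_and_estimate(beta_hats, sd_beta_hats, c):
    """Term-by-term selection over the coefficients of the integration-
    orthonormal representation: each least squares estimate is either kept
    or set to zero by the two-tail rule, which determines the final
    estimation model and its coefficient estimates at the same time."""
    return [b if abs(b) > c * sd else 0.0
            for b, sd in zip(beta_hats, sd_beta_hats)]

# Example with hypothetical estimates and standard deviations, C = 1:
print(select_and_estimate([2.3, -0.4, 1.1, 0.05], [0.5, 0.5, 1.2, 0.2], 1.0))
# -> [2.3, 0.0, 0.0, 0.0]
```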
In summary, a procedure has been presented which allows one to determine an estimation model and to estimate the resulting model with the same data. The procedure is applied to an expression of a response function in terms of integration-orthogonal functions. (Transformations between the representation in terms of integration-orthogonal functions and the representation in terms of "standard" (linearly independent) functions consist of multiplication of a vector by an upper triangular matrix.) The procedure is easy to apply, is applied on a term-by-term basis to each potential term in the model, has known properties, and is closely related to Minimum Bias Estimation.

There are at least three areas in which further study is to be recommended:

(1) A comparison of the estimators above with those produced by least squares estimation of the reduced model. The properties of the least squares estimator are design-dependent; comparisons would have to be made for various standard designs.
(2) A study should be made to determine efficient designs for the estimator $\tilde\eta$ in various situations; the designs would depend on the region of interest and the weight function.

(3) The procedures given here should be extended to multivariate general linear model settings.
LIST OF REFERENCES
Abramowitz, M. and I. A. Stegun. 1964. Handbook of Mathematical Functions. National Bureau of Standards Applied Mathematics Series, No. 55. U. S. Government Printing Office, Washington, D. C.

Bancroft, T. A. 1944. On Biases in Estimation Due to the Use of Preliminary Tests of Significance. Annals of Mathematical Statistics 15: 190-204.

Box, G. E. P. and N. R. Draper. 1959. A Basis for the Selection of a Response Surface Design. Jour. Amer. Stat. Assoc. 54: 622-654.

Box, G. E. P. and N. R. Draper. 1963. The Choice of a Second Order Rotatable Design. Biometrika 50: 335-352.

Davis, Phillip J. 1962. Orthonormalizing Codes in Numerical Analysis, pp. 347-379. In John Todd (ed.), Survey of Numerical Analysis. McGraw-Hill Book Company, Inc., New York.

Draper, N. R. and H. Smith. 1967. Applied Regression Analysis. John Wiley and Sons, Inc., New York.

Faddeeva, V. N. 1959. Computational Methods of Linear Algebra. Translated from the Russian by Curtis D. Benster. Dover Publications, Inc., New York.

Gorman, J. W. and R. J. Toman. 1966. Selection of Variables for Fitting Equations to Data. Technometrics 8: 27-51.

Graybill, F. A. 1961. An Introduction to Linear Statistical Models, Volume 1. McGraw-Hill Book Company, Inc., New York.

Halmos, Paul R. 1950. Measure Theory. D. Van Nostrand Company, Inc., New York.

Handscomb, D. C. 1965. Methods of Numerical Approximation. Pergamon Press, New York.

Hildebrand, F. B. 1956. Introduction to Numerical Analysis. McGraw-Hill Book Company, Inc., New York.

Hocking, R. R. and N. N. Leslie. 1967. Selection of the Best Subset in Regression Analysis. Technometrics 9: 531-540.

Hogg, Robert V. and Allen T. Craig. 1965. Introduction to Mathematical Statistics. The MacMillan Company, New York.

Karson, M. J., A. R. Manson, and R. J. Hader. 1969. Minimum Bias Estimation and Experimental Design for Response Surfaces. (To appear in Technometrics, September, 1969.)

Larson, Harold J. and T. A. Bancroft. 1963a. Sequential Model Building for Prediction in Regression Analysis, I. The Annals of Mathematical Statistics 34: 462-479.

Larson, Harold J. and T. A. Bancroft. 1963b. Biases in Prediction by Regression for Certain Incompletely Specified Models. Biometrika 50: 391-402.

Lehmann, E. L. 1959. Testing Statistical Hypotheses. John Wiley & Sons, Inc., New York.

Lindgren, B. W. 1962. Statistical Theory. The MacMillan Company, New York.

Lindley, D. V. 1968. The Choice of Variables in Multiple Regression. Journal of the Royal Statistical Society B 30: 31-66.

Loeve, Michel. 1963. Probability Theory. D. Van Nostrand Company, Inc., New York.

Michaels, Scott E. 1969. Optimum Design and Test/Estimation Procedures for Regression Models. Unpublished Ph.D. thesis, Department of Experimental Statistics, North Carolina State University, Raleigh, North Carolina.

Schatzoff, M., R. Tsao, and S. Fienberg. 1968. Efficient Calculation of All Possible Regressions. Technometrics 10: 769-779.

Sclove, S. L. 1968. Improved Estimators for Coefficients in Linear Regression. Jour. Amer. Stat. Assoc. 63: 596-606.

Todd, John. 1962. Survey of Numerical Analysis. McGraw-Hill Book Company, Inc., New York.

Toro, Carlos and T. D. Wallace. 1969. A Test of the Mean Square Error Criterion for Restrictions in Linear Regression. Jour. Amer. Stat. Assoc. 63: 558-572.