Mallios, W. S. (1961). "Some aspects of linear regression systems." Ph.D. Thesis.

SOME ASPECTS OF LINEAR REGRESSION SYSTEMS

by

William S. Mallios

Institute of Statistics
Mimeograph Series No. 298
November, 1961
ERRATA SHEET

Page v: 7.1 should read "The Estimation of Stationary Points of Linear Functions of Linear Regressions"; 7.2 should read "... when φ(z) is a ...".

Page 4: 4th line from bottom, Roy's [26] instead of Roy's [27].

Page 11: 2nd paragraph, line one, 1959 instead of 1958.

Page 26: in (5.1), y_T − δ_T, y_L − δ_L, y_V − δ_V instead of y_T + δ_T, y_L + δ_L, y_V + δ_V; 2nd paragraph, next to last line, in (5.2), and equation ...

Page 29: y_L = b_0 + b_1 y_T + b_2 x_A + b_3 x_D + b_4 x_A x_D + b_5 x_A² + b_6 x_D² + ε_L instead of y_L = b_0 + b_1 y_T + b_2 x_A + b_3 x_D + b_4 x_A² x_D + b_5 x_A² + b_6 x_D² + ε_L; 7th line from bottom, in (5.5), x' = (x_A, x_D, x_A x_D, x_A², x_D²) instead of (x_A, ..., x_A x_D, x_A², x_D²).

Pages 31-37, 44: read causal (non-causal) instead of casual (non-casual).

Page 31: 2nd paragraph, 4th line, "For example, in the model" instead of "For example, the model".

Page 35: first paragraph, last line, E(ε₁''ε₂) = b₂σ₂₂ for b₂ ≠ 0 and σ₁₂ = 0, instead of E(ε₁''ε₂) = b₂σ₂₂ for b₂ ≠ 0 and σ₁₂ ≠ 0.

Page 45: Figure 6.1, ỹ₂ = b̃₀ + b̃₁x instead of ỹ₁ = b̃₀ + b̃₁x.

Page 53: in the corollary, change "test" to "best".

Page 55: 3rd line from bottom, ρ_jj' instead of ρ.

Pages 64, 73: in (6.60), (6.61), (6.62), (∂h/∂b | b = b̂) instead of (∂h/∂b | b = b).

Pages 88, 106: in (6.70), Y = A_p(x)/B_q(x) + ε instead of Y = A_p(x)/B_p(x) + ε; in theorem, 2nd line, ...; 2nd paragraph, 1st line, Hancock [...] instead of Hancock [...]; 3rd paragraph, 3rd line from bottom, method of steepest descent [...] instead of method of steepest descent [...].

Page 107: 2nd paragraph, 4th line, Frank and Wolf [...] instead of Frank and Wolf [42]; 2nd paragraph, 7th line, Kuhn-Tucker [...] instead of Kuhn-Tucker [...]; 2nd paragraph, last line, Lagrangian form [...] instead of Lagrangian form; same paragraph, 4th line from bottom, E(ε₁'ε₂) = σ₁₂ = 0 instead of E(ε₁ε₂) = σ₁₂ = 0.

Page 108: 2nd line, top of page, Umland and Smith [...] instead of Umland and Smith [...]; 12th line, Bodmer [...] instead of Bodmer [...].

Page 119: 2nd paragraph, 1st line, "in finding not" instead of "in finding out"; 3rd line from bottom, < instead of >.
iv

TABLE OF CONTENTS

                                                             Page

LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . .  vi

1.0  INTRODUCTION . . . . . . . . . . . . . . . . . . . . . .   1

2.0  REVIEW OF LITERATURE . . . . . . . . . . . . . . . . . .   7

3.0  TOXOPLASMOSIS  . . . . . . . . . . . . . . . . . . . . .  12
     3.1  A Discussion of Toxoplasmosis . . . . . . . . . . .  12
     3.2  A Serologic Test for Detecting the Presence
          or Absence of Toxoplasmic Antibodies  . . . . . . .  12

4.0  STATISTICS AND THE DYE TEST  . . . . . . . . . . . . . .  14
     4.1  The Probit Transform and LD 50  . . . . . . . . . .  14
     4.2  Linear Weighted Regression in Estimating
          the Line of Descent . . . . . . . . . . . . . . . .  16
     4.3  The Role of Compound Distributions in the
          Dye Test  . . . . . . . . . . . . . . . . . . . . .  20

5.0  THE CYCLE OF INVESTIGATION AS APPLIED TO ONE STAGE
     OF THE DYE TEST  . . . . . . . . . . . . . . . . . . . .  24

6.0  SYSTEMS OF LINEAR REGRESSION EQUATIONS . . . . . . . . .  31
     6.1  The Nature of Errors Associated with
          Linear Regression Systems . . . . . . . . . . . . .  31
     6.2  Estimation of the Parametric Coefficients in a
          Linear Regression System When W Is Known  . . . . .  36
     6.3  Estimation of σ² and ρ  . . . . . . . . . . . . . .  53
     6.4  Estimation of b When W Is Unknown . . . . . . . . .  55
     6.5  A Sufficient Condition for Convergence and
          an Approximate Variance of the Iterated
          Estimate  . . . . . . . . . . . . . . . . . . . . .  61
     6.6  General Comments on Estimation in the
          Predictive Regression System  . . . . . . . . . . .  66
     6.7  A Note on the Variance of a Vector Coefficient
          Estimate Obtained Through Iteration in a
          Single Rational Regression Model  . . . . . . . . .  73

v

TABLE OF CONTENTS (continued)

7.0  SOME UNCONSTRAINED EXTREME VALUE PROBLEMS  . . . . . . .  78
     7.1  The Estimation of Stationary Points of Linear
          Functions of Linear Regressions . . . . . . . . . .  78
     7.2  A Confidence Region for E(ẑ) when φ(y) is a
          Second-order Linear Function of the ẑ's . . . . . .  79
     7.3  Remarks Concerning the Confidence Region  . . . . .  83
     7.4  Intersection Region Confidence Procedures
          in Finding an Approximate Confidence
          Region for E(ẑ) . . . . . . . . . . . . . . . . . .  90

8.0  SOME PROBLEMS OF CONSTRAINED EXTREMA . . . . . . . . . .  99
     8.1  Uses of Lagrangian Multipliers in a System
          of Linear Regression Equations  . . . . . . . . . .  99
     8.2  Errors in the Optimal Solution  . . . . . . . . . . 108
     8.3  An Approximate Confidence Interval for an
          Individual Lagrangian Multiplier  . . . . . . . . . 112

9.0  SUMMARY AND GENERAL REMARKS  . . . . . . . . . . . . . . 117

10.0 LIST OF REFERENCES . . . . . . . . . . . . . . . . . . . 120
vi

LIST OF FIGURES

                                                             Page

4.1  Percentage modified plotted on the arithmetic
     scale at twofold dilutions from 1:256
     through 1:4096 . . . . . . . . . . . . . . . . . . . . .  17

4.2  Percentage modified plotted on the probit
     scale at twofold dilutions from 1:256
     through 1:4096 . . . . . . . . . . . . . . . . . . . . .  18

5.1  A possible series of events characterizing the
     phenomenon of interest . . . . . . . . . . . . . . . . .  25

6.1  Unbiased and best unbiased prediction lines  . . . . . .  45

7.1  The intersection confidence region, R(I), for
     E[(ẑ₁, ẑ₂)] for specific k₁, k₂ vectors  . . . . . . . .  95

7.2  The R(1) approximation to R(I) . . . . . . . . . . . . .  95

7.3  The R(2) approximation to R(I) . . . . . . . . . . . . .  96

7.4  The R(3) approximation to R(I) . . . . . . . . . . . . .  97

8.1  A geometrical example of the compromise solutions
     and the optimal solution . . . . . . . . . . . . . . . . 102
1.0 INTRODUCTION

Suppose an inquiry were made concerning a certain phenomenon. The deeper inquiry, as opposed to phenomenology, may involve questions such as, "What are the mechanics of the phenomenon?", and "What are the laws of nature governing such an event?"

Though the second question is nebulous and/or perhaps beyond the generation of the scientist questioned, the first may fall within the framework of causal theory. As such, one may first sketch the network of hypothesized occurrences characterizing the phenomenon in a path diagram. From a statistical viewpoint, such diagrams may be described mathematically in terms of linear and/or nonlinear structural regression models; i.e., a system of models deduced possibly from one or more differential equations. Thereafter comes experimentation designed to support or discount the hypothesized models. Then the process is repeated, with inferences gained in the first cycle dictating modifications in the hypothesized models used in the second cycle of investigation. The Humean reader may now expect the statement, "then at the termination of some future cycle, the true law of nature is uncovered." But does there exist a true law of nature; e.g., is the functional relationship [19, 20]¹ exact? More precisely, does the error entering in the functional relationship stem from extraneous effects on the true values of

¹Numerals appearing in brackets refer to citations in the List of References.
associated variables, or is there perhaps an error associated
with the relationship itself?
No attempt is made to answer
this due to the uncertainty of the writer.
Rather, we will
say that the cycle of investigation is continued until a
structural regression system is determined which describes
adequately the mechanics of the phenomenon and hence, predicts
adequately the event.
"Almost any causal theory comes sooner or later to deal with structural regression rather than predictive regression" [35]. (See Chapter 5 for definitions of structural and predictive regressions.)
Though the preceding
quote was, most
likely, not intended to slight predictive regression, one
might wonder whether there has been a tendency to neglect predictive regression in favor of structural regression.
The
former may yield solutions to certain problems unsolvable by
structural regression systems (hypothesized in early cycles
and therefore possibly inadequate) and also offer aid in the
cycles of investigation, particularly when structural regression coefficients are nonestimable.
Further, interest does
not always lie in the structural regression, but, even so, the
latter may not be totally neglected, since the predictive regression models usually stem from the structural models.
Consider now the experimental stage of the cycle.
In general, let us say the phenomenon of interest occurs "within" the experimental unit.
Whether it be a human body (a
group of individuals), a corn plant (an acre of corn plants),
or a missile warhead, the unit can be characterized by
infinitely many variable (from one unit to the next) endogenous responses (or response types), only some of which are of interest. Let these be measurable and, of course, descriptive of the phenomenon of interest. Further, to alleviate the possible time difficulty, we will, throughout this writing, be confined to responses which can be regarded as occurring simultaneously or approximately so; e.g., sufficiently small time differentials between occurring responses may offer possible justification to an assumption of simultaneity.
By either theoretical or primitive means, the structural
regression system is established, whereupon each endogenous
response type is described in terms of a set of variables,
part of which are exogenous (assumed controllable) and the
others being endogenous (some or all of the other response
types).
After the structural system has been reduced to the predictive system, i.e., the endogenous response types are functions of only the exogenous variables, the first directive is to estimate jointly the parametric coefficients in a system of linear regression models; i.e., models which are linear in the unknown coefficients and errors, where errors within, but not between, an experimental unit are correlated, and the independent variables, i.e., the exogenous variables and functions thereof, are not necessarily the same for each response type. If the latter is true and the covariance matrix (or the relative magnitude of its elements) of the
"within" experimental unit errors is unknown, then, under the
normality assumption, the maximum likelihood estimates of the
covariances and coefficients are usually unattainable (in a
practical sense) due to the complexity of the nonlinear equations resulting from equating to zero the partial derivatives
of the likelihood function.
As a result, an iteration scheme
is proposed for finding estimates of the parameters.
Moreover, a sufficient condition for convergence when iterating
is given along with an approximate covariance matrix of the
resulting coefficient estimates.
There is no need to consider the latter regression system as such if interest is centered on the effects of levels of the exogenous variables (with, say, one level of each exogenous variable combined into a so-called treatment as in a factorial experiment) on the pattern of any particular endogenous response type, for then a series of univariate analyses of variance may be the most fruitful recourse; i.e., the response types are analyzed separately. Of course, joint probability statements across response types (ignoring the "within" experimental unit correlations) are erroneous unless errors within experimental units are uncorrelated or approximately so. In this connection, if it is desired to study the effects of a common set of exogenous variables on the patterns of the endogenous response types jointly, then Roy's multivariate analysis of variance techniques [26] are available. Here, one may test contrasts between treatments (for each response type) and within treatments (for different
response types) simultaneously with associated confidence
bounds following therefrom.
Unfortunately these methods are,
at present, somewhat esoteric.
In any event, the "real world" statistician is conscious
of the fact that many pertinent questions regarding relationships between endogenous response types are left unanswered
by analysis of variance.
Further, it is inconceivable to the
writer that certain questions could be answered without at
least establishing
(i) that the syndromic response types (or functions
thereof) can be predicted to some degree and
(ii) the pattern or shape of such surfaces resulting
from the prediction.
One such class of questions leads to problems of extrema
or constrained extrema for which the predictive regression
system is particularly useful.
For problems of extrema, conditions on the exogenous variables are found (point estimates
and confidence regions) under which linear functions of the
predicted endogenous response types are extremal.
Problems
of constrained extrema may arise in regression systems when
extreme values of linear functions of certain response types
(say, those in the same unit of measure) are wanted subject
to constraints, the constraining functions being defined by,
say, one or more of the other response types (possibly those
in different units of measure).
All of the foregoing is discussed with respect to a factorial experiment designed for one stage of a laboratory test used in the quantitative determination of toxoplasma antibodies. However, prior to the outright consideration of the aforementioned, a detailed discussion of the statistical aspects of this laboratory test is presented to give perhaps a greater insight into the nature of the problem.
2.0
REVIEW OF LITERATURE
One of the earliest and perhaps most fruitful applications of regression systems could have occurred with Wright's original work in path analysis [42] shortly after World War I. However, Wright, undoubtedly influenced by the statistical vogue of that period, chose to construe causal relationships in terms of correlation coefficients, although he repeatedly and properly stressed the distinction between causation and correlation. More recent criticisms of using correlation analyses in terms of causation were given in 1953 by Tukey [35], who, very convincingly, related the priority of regression over correlation. Tukey's approach to path analysis was further developed in extensive studies by Turner and Stevens [36] and Turner, Monroe, and Lucas [37].

In the middle thirties came possibly the first direct application of linear regression systems with the publication of Hotelling's Most Predictable Criterion [16]. In another article [17] in 1935, he considered the problem in a more symmetrical manner. Denote a system of linear regression equations by
$$
\hat{y} = \begin{pmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_j \\ \vdots \\ \hat{y}_q \end{pmatrix} = \hat{B}'x = \begin{pmatrix} \hat{b}_{11} & \hat{b}_{21} & \cdots & \hat{b}_{p1} \\ \vdots & \vdots & & \vdots \\ \hat{b}_{1q} & \hat{b}_{2q} & \cdots & \hat{b}_{pq} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} \qquad (2.1)
$$

where, if ŷ_j does not depend on x_i, the corresponding b̂_ij in B̂ is defined to be zero, and the x's are deviations from their respective means.
In [16], a linear function of ŷ is taken, say

$$\ell'\hat{y} = \ell'\hat{B}'x,$$

where ℓ is chosen such that ℓ'ŷ will have the greatest multiple correlation, say R², with x. Since

$$1 - R^2 = \frac{\ell'V\ell}{\ell'U\ell},$$

where V is the matrix of residual sums of squares and cross products and U is the corresponding matrix of total sums of squares and cross products of the y's, j, j' = 1, 2, ···, q, then by setting the derivative with respect to ℓ equal to zero, ℓ is easily found to be the characteristic vector corresponding to the smallest characteristic root of U⁻¹V.
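As a numerical aside, this extraction of ℓ reduces to a generalized symmetric eigenproblem; the following minimal sketch (with illustrative U and V, not data from the thesis) computes the vector for the smallest root of U⁻¹V.

```python
import numpy as np
from scipy.linalg import eigh

# Most predictable criterion: choose l to minimize l'Vl / l'Ul, where V is
# the residual SSCP matrix and U the total SSCP matrix (illustrative values).
U = np.array([[4.0, 1.0], [1.0, 3.0]])
V = np.array([[2.0, 0.5], [0.5, 2.5]])

# Solve V l = lambda U l; the eigenvector for the smallest root of U^{-1}V
# maximizes the multiple correlation of l'y with x.
roots, vecs = eigh(V, U)
l = vecs[:, 0]            # eigenvalues are returned in ascending order
print(roots[0], l)
```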
Hotelling assumed the endogenous response types (or criteria variables in his terminology) to be either mutually independent or regressed on the same vector x, both of which allow a separate least squares estimation of each parameter vector b_j. However, in many situations, response types are neither independent nor dependent on the same exogenous variables (at least not in the same linear functional form). By assuming independence when there exist dependencies between response types, the coefficient estimates b̂_j may be inefficient (though unbiased and consistent) when each b̂_j is estimated separately by least squares (see Chapter 6).
In the immediate period prior to the development of high-speed computers (lasting roughly through the 1940's), methods were given by Hotelling [18], Turing [33], and others for finding better approximations to the inverse of a nonsingular square matrix so as to reduce the error in the solution z in the linear simultaneous equations

$$u = Az, \qquad (2.2)$$

where u and z are q×1 and A is q×q. Lonseth [23] considered these errors, which may have stemmed from rounding in the inversion of A and/or observational errors in u. Other sources of error in z could be given; e.g., if (2.2) is a regression system, A is subject to error due to its determination by least squares, but it was not intended in [23] to single out any particular cause. Rather, the aim was to find a series expansion for the error in the z's, assuming that (2.2) is a nonhomogeneous system of linear equations with A, u, and z having additive error components. By means of a Taylor's expansion, a series was obtained for the solution error as a function of the q(q+1) coefficient errors. Conditions for its convergence were then found, giving not only approximations to the maximum error but also an upper limit. Earlier writings also considering errors in the z's individually are cited in [23].
In the early fifties and even before, the treatment of multiple equations (with dependent errors) was being carried on in econometrics. Here, main efforts were directed to finding estimates of structural parameters. The estimable or nonestimable character of the latter was investigated in detail, with the method of limited information, published in 1953 by Hood and Koopmans [15], enabling the estimation of overidentified structural parameters.

At this same time, closely related works were published by Scheffe [29] in "tests for all possible contrasts" and by Roy and Bose [27] in simultaneous confidence interval estimation. (Since then, both topics have been extensively developed by Roy [26] in many different connections.) It was natural, then, to eventually consider the errors in the z's in (2.2) not individually, as did Lonseth [23], but simultaneously, as did Box and Hunter [4] in 1954. In application of the latter article, Box and Hunter extended an argument by Fieller [10] in finding a confidence region for the functionally independent x's which give rise to the stationary point in a single second-order regression model. However, difficulties may occur not only in establishing the confidence region (laborious algebra), but also in visualizing it for applied purposes. As a result, two subsequent papers serve as possible remedies. First, Box and Hunter [5] have shown that "a very considerable simplification in the expression for the exact simultaneous confidence region occurs when a rotatable design is used to generate the experimental data." Another alternative is to give an approximation to the true confidence region, as did Wallace [40] by using a theorem by Scheffe [29]. Here the approximations take the form of convex polyhedra or parallelepipeds, with the approximations being easy to compute. References [4] and [40] are discussed in greater detail in Chapter 7.
Finally, attention is directed to a 1959 publication by Williams [41], who used regression systems for inverse estimation as, say, in a multivariate bioassay situation. Assuming (2.2) to be a regression system with the z's still functionally independent but A of order q×q', it is Williams' intent that the u's be the observed variables and the z's the predicted; i.e., given a new vector of observations u = ₀u, what is the estimate of z (attainable if the dimension of z exceeds the row rank of A by one) and the fiducial region for that z, say E(ẑ), which would have been attained had ₀u not been subject to error. Williams presents methods for finding the region R in the statement: given a new observation vector u = ₀u, P[E(ẑ) ∈ R] = 1 − α; i.e., the fiducial probability that E(ẑ) lies within a q'-dimensional hyperellipse is 1 − α.
3.0 TOXOPLASMOSIS

3.1 A Discussion of Toxoplasmosis
A discussion of toxoplasmosis is given by Sonnenworth [30].
In brief review, toxoplasmosis is a disease of man and
animal stemming from the protozoan parasite, Toxoplasma
gondii.
In humans, the disease is produced in two forms,
congenital (existing at birth) and acquired.
In the former case, it is believed that toxoplasmosis occurs when the organism invades a nonimmune pregnant woman, with parasitemia occurring upon infection of the fetus. The infection is unapparent in the mother but, depending on the stage of pregnancy in which the infection was contracted,

(i) the fetus may die in utero or after abortion;

(ii) the fetus may be born prematurely with active toxoplasmosis; or

(iii) the fetus may be normal at birth and appear well for days, even months, before symptoms are noted.

In (ii) and (iii), possible consequences include cerebral calcification and retardation. Postnatal infections may be so mild as to be unrecognized, with more severe infections afflicting the central nervous system.
3.2 A Serologic Test for Detecting the Presence or Absence of Toxoplasmic Antibodies
From the many contradictory findings reviewed in [30], it is evident that the symptoms of toxoplasmosis are extremely variable, necessitating a heavy reliance on laboratory test findings. The serologic test considered in this writing is the Sabin-Feldman [28] cytoplasm-modifying dye test.

Sabin and Feldman, while searching for some in vitro method of detecting the presence of toxoplasma antibodies in human serum, discovered that certain chemical compositions are capable of indicating antibody activity. Namely, the cytoplasm of toxoplasmic organisms in fresh exudate (extracted from inoculated mice) is stained a deep blue by alkaline methylene blue. However, when the cytoplasm of toxoplasma is acted upon by specific antibodies and accessory factor, it loses its affinity for methylene blue and the cytoplasm remains unstained. An explanation of this is found in a recent theory that the antibodies disrupt the cell wall of the toxoplasma organism, releasing the cytoplasm so that no staining can take place. Consequently, the dye test offers a quantitative measure of the absence or presence of toxoplasma organisms.
4.0 STATISTICS AND THE DYE TEST
4.1 The Probit Transform and LD 50
When determinations of unknown sera are made, one or more positive controls are tested. A positive control is serum with toxoplasma antibodies. The degree of positiveness
exhibited by a given serum is measured quantitatively in
terms of titers.
In explanation of the latter, serum is
diluted most easily in dilutions of 1:1, 1:2, 1:4, 1:8, 1:16,
1:32, 1:64, etc.
A dilution of, say, 1:256 is one part serum
to 255 parts saline.
The more antibodies contained in serum,
the higher the dilution at which they will be detected.
From
what has been said previously, the more unstained organisms
detected (by means of the dye test), the more toxoplasma
antibodies present.
At each dilution, organisms are counted from five fields on a particular slide so that

$$\text{Percent stained} = \frac{\text{number of organisms stained}}{\text{number of organisms counted}}.$$

That dilution at which 50 percent of the organisms are stained is called the LD 50.
LD 50 is a characterization used in early toxicity tests by Trevan [34]. In applying various doses of a stimulus, e.g., poison, to a group of subjects, it is convenient to characterize the stimulus by its effectiveness at a certain dose. It is often difficult to kill all subjects, except possibly at very high doses, so that the lethal dose for which
all subjects are killed (denoted by LD 100) is a poor measure.
Conversely, the dose at which no subjects are killed
(LD 0) would be an extremely poor characterization, since some
subjects would die when exposed to a placebo.
As a characteristic of the stimulus which can be more easily determined and interpreted, Trevan has advocated the median lethal dose (LD 50), or, as a more general term to include responses other than death, the median effective dose (ED 50). This is defined as the dose which will produce a response in half the population, and thus, from another point of view, is the mean tolerance [11].
In the dye test we consider those unstained
organisms as destroyed by the action of toxoplasmic antibodies.
For this reason LD 50 is used as the characteriza-
tion of any serum rather than ED 50.
Consequently the LD 50
gives a quantitative measure of the degree of positiveness,
and in this context we define a titer as the LD 50.
An individual is usually regarded as positive if 50 percent or more of the organisms are unstained at a dilution of 1:16 or above, though the criterion is controversial.

It should be mentioned that, for a high positive control with known LD 50 (known from many previous determinations), countings are made at dilutions ranging from 1:256 to 1:8192 or higher; i.e., the percent of unstained organisms is found at 1:256, 1:512, 1:1024, 1:2048, 1:4096, and 1:8192. Low positive controls, i.e., those with LD 50 ≤ 1:64, are unsatisfactory, since they have been found to be unstable in this dye test.
The dilutions are coded to log₂ on the abscissa with the percent unstained plotted on the ordinate axis as in Figure 4.1. But a straight line relationship is desirable for ease in estimating the LD 50. Since the resultant curve resembles a cumulative normal distribution, the percent unstained, p̂, is transformed to t̂, where

$$\hat{p} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\hat{t}} e^{-x^2/2}\, dx, \qquad (4.1)$$

and t̂ = probit − 5. This transformation, called the probit transform, produces approximately a straight line in the range of interest. For a complete discussion of the probit transform see Finney [11]. Figure 4.2 depicts a typical high positive control plotted on probit graph paper. The resultant curve is three intersecting straight lines which we designate as the upper plateau, the line of descent, and the lower plateau. By inverse estimation, the LD 50 is read from the line of descent. In Figure 4.2 the LD 50 = 1:1024.

At first glance two statistical problems are posed:

(i) What are point and interval estimates of the LD 50 for the control?

(ii) What are the abscissal point and interval estimates of intersection of the straight lines?
4.2 Linear Weighted Regression in Estimating the Line of Descent
In answer to the first question, it is evident from the previous discussion that one would initially assume the scatter around the curve prior to transformation to be binomial.

[Figure 4.1: Percentage modified plotted on the arithmetic scale at twofold dilutions from 1:256 through 1:4096 (log₂(dilution) = 8 through 12), with percent modified on the ordinate.]

[Figure 4.2: Percentage modified plotted on the probit scale at twofold dilutions from 1:256 through 1:4096. The upper plateau, the line of descent, and the lower plateau appear as three intersecting straight lines; the LD 50 = 1:1024.]
As such, the variability differs from dilution to dilution and is maximum around the true response corresponding to the LD 50. With n large, one might not choose to use probit analysis [11], i.e., nonlinear regression, but rather linear weighted regression. (Besides, from what is to be said in Section 4.3, there is reason to question the probit model in the sense that there may be "interaction in the design," in which case inferences may go no further than the data on hand.) If we were to fit a cubic regression line to the untransformed data, as in Figure 4.1, weights could be assigned from the value of p̂q̂/n at each dilution. But with a straight line relationship being desirable in the range of interest, a linear weighted regression of the transformed p̂'s on the coded dilutions is in order. In this case, weights are found approximately by the propagation of error technique [8]. From (4.1),

$$\hat{p} = F(\hat{t}), \qquad d\hat{p} = f(\hat{t})\, d\hat{t}, \qquad \operatorname{var}\hat{p} \approx [f(\hat{t})]^2 \operatorname{var}\hat{t},$$

where F(t) denotes the cumulative normal distribution. Hence

$$\operatorname{var}(\hat{t}) \approx [\operatorname{var}\hat{p}][f(\hat{t})]^{-2} = (pq/n)[f(\hat{t})]^{-2}$$
designates weights to the variances in the line of descent regression, where f(t) is found in the tabulated normal tables. The method of obtaining a confidence interval for the true LD 50 is well known; e.g., see exercise 13.1, Anderson and Bancroft [1].
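As a minimal computational sketch of the weighting just described (with made-up counts, not the laboratory data), the probit weights and the inverse estimate of the LD 50 may be obtained as follows.

```python
import numpy as np
from scipy import stats

# Percent unstained at twofold dilutions 1:256 through 1:4096 (illustrative).
x = np.array([8.0, 9.0, 10.0, 11.0, 12.0])      # log2 of the dilution
p_hat = np.array([0.88, 0.80, 0.62, 0.35, 0.15])
n = 100                                          # organisms counted per dilution

t_hat = stats.norm.ppf(p_hat)                    # the probit - 5 transform (4.1)
f_t = stats.norm.pdf(t_hat)                      # normal density at t_hat

# var(t_hat) ~ p*q / (n * f(t)^2) by propagation of error, so the weights
# are the reciprocal variances n * f(t)^2 / (p*q).
w = n * f_t**2 / (p_hat * (1.0 - p_hat))

# Weighted least squares fit of t_hat on x (the line of descent).
X = np.column_stack([np.ones_like(x), x])
W = np.diag(w)
b = np.linalg.solve(X.T @ W @ X, X.T @ W @ t_hat)

# Inverse estimation: the LD 50 is the dilution at which t = 0 (50 percent).
ld50_log2 = -b[0] / b[1]
print("LD 50 = 1:%d" % round(2.0 ** ld50_log2))
```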
With regard to question (ii) of the last section, the
abscissal values of the estimated intersection points are
easily found by estimating the upper and lower plateaus and
the line of descent, then finding their intersections.
The
question of corresponding confidence intervals can be resolved
by applying the propagation of error technique or Fieller's
theorem.
4.3 The Role of Compound Distributions in the Dye Test

Experimentation was designed to determine whether the slope of the line of descent (for a given control) remained roughly constant from one dye test to the next; i.e., the line of descent contrast by dye test interaction was tested for significance. With a nonsignificant interaction, one would merely have to adjust or shift the control line of descent by a certain amount to account for the dye test effect. Then sera determinations would be adjusted by the same amount. Unfortunately, the interaction was highly significant.

It was found that the pᵢ (the probability of an organism surviving at dilution i, i = 1, 2, ···, N) appeared to vary from one field to the next for all i. It was then reasoned that there existed some type of "contagion" [24] between organisms
on reasonably distinct fields on a given slide. As such, p was hypothesized to follow a beta distribution,

$$f(p_i) = \frac{p_i^{\alpha_i - 1}(1 - p_i)^{\beta_i - 1}}{B(\alpha_i, \beta_i)}, \qquad 0 \le p_i \le 1,$$

i = 1, 2, ···, N, N being the number of dilutions. By fixing the number of organisms counted on any field of a particular slide at n', compounding the beta and binomial distributions, and integrating out p, the resultant distribution is

$$f_i(x) = \binom{n'}{x} \frac{B(x + \alpha_i,\; n' - x + \beta_i)}{B(\alpha_i, \beta_i)}, \qquad x = 0, 1, \cdots, n',$$

where αᵢ and βᵢ are easily, though not most efficiently, estimated by equating ratios of moments. This distribution was first published by Skellam [31] in 1948, but it is likely that many were aware of its existence since 1920, at which time Yule and Greenwood [44] derived the Poisson-gamma (the compound Poisson distribution) analogy of the beta-binomial distribution.

The beta-binomial distribution gave a close fit at each dilution, so that the assumption of "contagion" between organisms on a field appears warranted.
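A sketch of the moment estimation just mentioned, using the beta-binomial variance identity var(x) = n'pq[1 + (n' − 1)ρ] with ρ = 1/(α + β + 1); the counts below are illustrative, not the dye test data.

```python
import numpy as np

def beta_binomial_moments(counts, n_prime):
    """Moment estimates of (alpha, beta) from field counts out of n_prime."""
    m = counts.mean()
    s2 = counts.var(ddof=1)
    p = m / n_prime
    q = 1.0 - p
    # "Contagion" (intraclass) correlation implied by the extra-binomial
    # variance: var = n*p*q*(1 + (n-1)*rho), rho = 1/(alpha + beta + 1).
    rho = (s2 / (n_prime * p * q) - 1.0) / (n_prime - 1.0)
    theta = 1.0 / rho - 1.0          # alpha + beta
    return p * theta, q * theta

counts = np.array([14, 22, 9, 18, 25])   # stained organisms per field
print(beta_binomial_moments(counts, n_prime=40))
```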
Since n' is by no means constant, another refinement can be made on the explanatory distribution [24] around the untransformed line; e.g., the curve or untransformed line of Figure 4.1. The nonconstant n' may stem from the fact that each of the original organisms inoculated into the system may produce a different number of organisms. With n' varying according to the probability law, say, g(n'), a further compounding results in

$$f_i(x) = \sum_{n' > 0} g(n') \binom{n'}{x} \int_0^1 p^x q^{n'-x}\, \frac{p^{\alpha_i - 1} q^{\beta_i - 1}}{B(\alpha_i, \beta_i)}\, dp, \qquad (4.2)$$

where q = 1 − p. Letting g(n') be the Poisson distribution with parameter λ, the moment generating function of the distribution given by (4.2) is a confluent hypergeometric function (4.3) [7]. The distribution defined by (4.2) is similar to Neyman's Type A contagious distribution [24] and was derived during the course of Gurland's extensions [12] of Neyman's original work.
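The compounding chain behind (4.2) is easily simulated; the sketch below (illustrative parameter values) draws n' from a Poisson law, p from a beta law, and the count from the resulting binomial, exhibiting the extra-binomial variance.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, alpha, beta = 30.0, 2.0, 3.0        # illustrative parameters

n_prime = rng.poisson(lam, 10_000)       # g(n'): organisms per field
p = rng.beta(alpha, beta, 10_000)        # beta "contagion" between fields
x = rng.binomial(n_prime, p)             # observed count, distributed as (4.2)

p_bar = alpha / (alpha + beta)
# The empirical variance well exceeds the fixed-n binomial benchmark n*p*q.
print(x.mean(), x.var(), lam * p_bar * (1.0 - p_bar))
```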
The explanatory distribution approach plays a vital role in many stable biological processes, as in Neyman's first application [24]. We have attempted to use this approach in an unstable process; i.e., this particular dye test procedure leads to a wide variety of contradictory conclusions. In doing so, (4.2) was obtained under a set of conjectured occurrences, whereas the same moment generating function given by (4.3) may well have been derived under a totally different set of occurrences; e.g., one may arrive at the negative binomial distribution under at least three different sets of assumptions. On intuitive grounds, it is felt that if the explanatory distribution approach is used in an unstable process rather than in a stable process, then there is a higher risk of ascribing the end result to one chain of events when, in fact, something different is occurring with the same end or distribution.

In statistical terminology, the instability of the dye test is characterized by the slopes of the line of descent and the plateaus, and the LD 50 of the control, which vary widely from one dye test to the next. Further, the slopes for unknown sera at times differ drastically from the control slopes in a given test.

Consequently, the intent is to isolate the source or sources of trouble in the dye test. As a result, experimentation has progressed "backwards" through the various stages of the dye test.
5.0 THE CYCLE OF INVESTIGATION AS APPLIED TO ONE STAGE OF THE DYE TEST

As was mentioned previously, exudate, extracted from inoculated mice, is used in the dye test. The question arises as to what amount and dilution of inoculum to use in order that the exudate extracted at a particular time, t₀ (t₀ = 72 hours), have a maximum number of toxoplasma organisms, a minimum number of leucocytes (white blood cells, which destroy the toxoplasma organisms), and maximum volume. Time from inoculation to extraction is not considered, since previous experiments have determined the optimal and most economical time differential.

A factorial experiment was designed to gain perspective into the three measured response types: number of toxoplasma organisms (y_T), number of leucocytes (y_L), and volume of exudate (y_V). The experiment included various levels of dilution of inoculum (x_D) and amount of inoculum (x_A), with exudate from three mice being pooled at each treatment combination.

Data are not presented, as the residual variability was excessive for all three response types. Further, there appeared to be a substantial block (= day) × treatment interaction. However, the majority of the mice were dry or contaminated, so that many observations were missing or based on exudate from one mouse. Consequently, it is questionable that the block × treatment interaction is real and that the additive models are completely inadequate.
Suppose we now consider the problem with regard to the cycle of investigation. With the experimental unit being those mice inoculated with the same amount and dilution of inoculum and the exudate being the phenomenon of interest, then the three endogenous variables characterizing the phenomenon are y_T, y_L, and y_V; the two exogenous variables are x_A and x_D.

[Figure 5.1: A possible series of events characterizing the phenomenon of interest]

Figure 5.1 illustrates a hypothetical chain of occurrences at a fixed time, t₀, after inoculation. During t₀, the L (leucocyte organisms) are engulfing the T (toxoplasma organisms). The more T produced, the more L the body produces [arrow (1), Figure 5.1]. On the other hand, the more L the body produces, the fewer the number of T [arrow (2)], since the former supposedly destroy the latter. Finally, volume results from the interchange between the T and L [arrows (3) and (4)]. It should be stressed that Figure 5.1 vastly oversimplifies the true situation, and the preceding discussion thereof could be wrong, especially if the T thrive on the L. Further, to be realistic with regard to the structural system, the element of time should be considered. However,
the writer is at this time more concerned with the method than
the problem.
From Figure 5.1, the following relationships result:

$$y_T - \delta_T = f_T(y_L - \delta_L,\; x_A,\; x_D) + \epsilon_T,$$
$$y_L - \delta_L = f_L(y_T - \delta_T,\; x_A,\; x_D) + \epsilon_L, \qquad (5.1)$$
$$y_V - \delta_V = f_V(y_T - \delta_T,\; y_L - \delta_L) + \epsilon_V,$$

where the δ's represent measurement error, the y's measured values, and the ε's are inherent errors; i.e., the ε's are errors associated with the models. The δ's can be made arbitrarily small, or even negligible, depending on the refinement of the measurement technique. Usually the measurement error is pooled with the inherent error through no other recourse. Further, it is assumed that the δ's and ε's are uncorrelated with each other (within a particular response type) and with the x's, E(ε) = 0, E(ε_T ε_L) = σ_TL, E(ε_T ε_V) = σ_TV, E(ε_L ε_V) = σ_LV, E(ε_T²) = σ_TT, E(ε_L²) = σ_LL, and E(ε_V²) = σ_VV.
As a very crude first attempt, one might choose

$$y_T = a_0 + a_1 y_L + a_2 x_A + a_3 x_D + a_4 x_A x_D + a_5 x_A^2 + a_6 x_D^2 + \epsilon_T',$$
$$y_L = b_0 + b_1 y_T + b_2 x_A + b_3 x_D + b_4 x_A x_D + b_5 x_A^2 + b_6 x_D^2 + \epsilon_L', \qquad (5.2)$$
$$y_V = c_0 + c_1 y_T + c_2 y_L + \epsilon_V'$$

as the structural system of models at time t₀, where ε'_T = ε_T + a₁δ_L + δ_T; ε'_L = ε_L + b₁δ_T + δ_L; and ε'_V = ε_V + c₁δ_T + c₂δ_L + δ_V.
The equations of (5.2) are called the structural regression models. Ordinary regression analyses are not appropriate, since the "independent" variables are correlated with the errors [1]; e.g., in the first equation, y_L is correlated with ε'_T, since y_L = η_L + δ_L, where η_L is the true measurement value. However, the structural regression models of (5.2) can be reduced to
$$y_T = a_0' + a_1' x_A + a_2' x_D + a_3' x_A x_D + a_4' x_A^2 + a_5' x_D^2 + \epsilon_T'',$$
$$y_L = b_0' + b_1' x_A + b_2' x_D + b_3' x_A x_D + b_4' x_A^2 + b_5' x_D^2 + \epsilon_L'', \qquad (5.3)$$
$$y_V = c_0' + c_1' x_A + c_2' x_D + c_3' x_A x_D + c_4' x_A^2 + c_5' x_D^2 + \epsilon_V'',$$

where, by substituting for the y's occurring as "independent" variables, it is seen that

$$a_0' = \frac{a_0 + a_1 b_0}{1 - a_1 b_1}, \quad a_1' = \frac{a_2 + a_1 b_2}{1 - a_1 b_1}, \quad a_2' = \frac{a_3 + a_1 b_3}{1 - a_1 b_1},$$
$$a_3' = \frac{a_4 + a_1 b_4}{1 - a_1 b_1}, \quad a_4' = \frac{a_5 + a_1 b_5}{1 - a_1 b_1}, \quad a_5' = \frac{a_6 + a_1 b_6}{1 - a_1 b_1}, \qquad (5.4)$$

and similarly for the b''s and c''s. The equations of (5.3) are called the reduced equations or the predictive regression models, and for the system of (5.3), ordinary regression analyses are appropriate.
If, however, the y's should not be dependent, functionally, on the same x's, and the errors "between models," i.e., within experimental units, are correlated, then the joint estimation of the parametric coefficients may present difficulties. This problem is considered in Chapter 6.
After estimating the coefficients in the predictive regression system, the latter are used to estimate the parameters of the structural regression system. But in this problem, we have the difficulty of overidentification; i.e., there exist 17 unknowns, a₀, ···, a₆; b₀, ···, b₆; c₀, c₁, c₂, in 18 equations, 6 of which are given by (5.4). In this case, the Hood-Koopmans method of limited information [15] is available for estimating the structural parameters, with the variances of the resulting estimates being obtained by the propagation of error technique [8]. With more unknowns than equations, there is the problem of underidentification, with no known solution. (This may possibly be remedied by imposing further relationships derived from physical considerations.)

It would be surprising if the structural regression system (5.2) proves to be anything but a rough and possibly invalid approximation. It is thought that eventually (in some future cycle of investigation), models of a nonlinear nature, and involving the element of time after inoculation, will be deduced (from one or more differential equations) to replace the system (5.2). This should, however, not dampen one's hopes of answering the question posed at the beginning of this section, due to the fact that any one of many structural systems (possibly the "correct one") could have given rise to (5.3). Thus, even though the structural system (5.2) may be totally unrealistic, the resulting predictive system (5.3) may offer valid answers to what amount and dilution of
inoculum to choose in order that exudate (extracted at time t₀) have maximum volume and a maximum difference in opposing organisms.
There were scant and possibly misleading indications that, within the range of the factor space considered, each of the response types achieved a maximum. Further, it appeared that these maximums occurred for different x_A, x_D values, with max y_T and max y_L being roughly in the same locality of the x_A, x_D dimensions. In this connection, one function of interest is φ(y_T, y_L) = y_T − y_L, which is estimated by the difference in predicted values,

$$\hat{y}_T - \hat{y}_L = (\hat{a}_0' - \hat{b}_0') + (\hat{a}_1' - \hat{b}_1')x_A + (\hat{a}_2' - \hat{b}_2')x_D + (\hat{a}_3' - \hat{b}_3')x_A x_D + (\hat{a}_4' - \hat{b}_4')x_A^2 + (\hat{a}_5' - \hat{b}_5')x_D^2, \qquad (5.5)$$

where the x's are deviations from their respective means, and within each response type the variance is heterogeneous, necessitating, possibly, a transformation. By taking partial derivatives in (5.5), we are able to estimate by (x̂_A, x̂_D) any (x_A, x_D) value, say E[(x̂_A, x̂_D)], for which ŷ_T − ŷ_L is maximum. Further, if ŷ_T and ŷ_L are second-order regressions, a confidence region is available for E[(x̂_A, x̂_D)] which, among other things, provides the laboratory worker with a range of amounts and dilutions of inoculum to aim at rather than one value which may be difficult to make up. The problem of bringing y_V into the picture is dealt with in detail in Chapter 8.
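For a second-order fit, the stationary point (x̂_A, x̂_D) of (5.5) is the solution of a 2×2 linear system; a minimal sketch with illustrative coefficient values (not the thesis estimates) follows.

```python
import numpy as np

# Fitted difference (5.5): d0 + d1*xA + d2*xD + d3*xA*xD + d4*xA^2 + d5*xD^2,
# with illustrative coefficients.
d1, d2, d3, d4, d5 = 1.2, 0.8, -0.3, -0.5, -0.4

# Setting both partial derivatives to zero:
#   d1 + 2*d4*xA + d3*xD = 0  and  d2 + d3*xA + 2*d5*xD = 0.
H = np.array([[2*d4, d3], [d3, 2*d5]])
xA_hat, xD_hat = np.linalg.solve(H, [-d1, -d2])

# H is the Hessian; it must be negative definite for a maximum.
print(xA_hat, xD_hat, np.linalg.eigvalsh(H))
```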
Note that we have considered the difference in regressions ŷ_T − ŷ_L rather than regressing y_T − y_L on the x's. It is easily seen that ŷ_T − ŷ_L = ŷ_{T−L} if y_T, y_L, and y_T − y_L are regressed, linearly, on the same x's. However, the error mean square estimating σ²_{T−L} pools the entities σ_TT, σ_LL, and σ_TL, which can be calculated with little effort and should be for a better conception of the situation. Further, if one can adequately predict the response patterns for y_T and y_L, then the difference in responses is automatically known. Obviously, the converse is not true, and by regressing y_T − y_L on the x's without investigating the response patterns of y_T and y_L separately, one may run the risk of fitting, say, a cubic regression when, in fact, y_T and y_L follow quadratic patterns.

As the problem now stands (in the first cycle of investigation), the methods of solution are relatively simple, partly due to the fact that the correlated response types are (in the predictive system) related (linearly) to precisely the same x's. We now pass on to other aspects of estimation in more difficult predictive systems.
6.0 SYSTEMS OF LINEAR REGRESSION EQUATIONS

6.1 The Nature of Errors Associated with Linear Regression Systems

In analyzing relationships between different types of dependent responses, where each response type is to be regressed on a set of fixed or controlled variables, one should first give careful scrutiny to the nature of the errors associated with the models. (Since we are concerned with linear regression, the discussion will pertain to models which are linear in the unknown coefficients and errors, unless otherwise stated.)

It should be realized that in any regression relationship the inherent error, ε (the error associated with the model), is never truly noncausal and, in fact, is a function of other extraneous, uncontrolled variables or disturbances. For example, in the model

$$\eta = b_0 + \sum_{i=1}^{p} b_i x_i + \epsilon, \qquad \epsilon = \epsilon(\zeta_1, \zeta_2, \cdots),$$

the ζ's are disturbances. In applying the above model, it is hoped that the ζ's contribute little to η in terms of causality; i.e., that the major factors causing η are included in the x's, and that the ζ's tend to offset each other. Let x'_k = (x_1k, ···, x_ik, ···, x_pk) denote a treatment applied at random to an experimental unit. Randomization not only tends to make
the correlation between ε_k and ε_k' (k ≠ k') cancel as the treatments x_k and x_k' are applied to an increasing number of experimental units, but also neutralizes many of the disturbances (or biases). However, as the treatment x_k is applied to an increasing number of units (resulting in η_k1, η_k2, ···), it does not follow that ε_k1, ε_k2, ··· neutralize each other to the extent that E(ε|x_k) = 0, since it is inconceivable that all the "minor" causal agents (the ζ's) will be offset (either within or between experimental units) in the repeated application of x_k. As such, the most that should be said with regard to the inherent error associated with a particular regression model is that the average value of ε is approximately zero; i.e., E(ε) ≈ 0. Consequently, for the class of all linear models, the inherent error will be called weakly causal when we assume E(ε) = 0, when actually E(ε) ≈ 0, and strongly causal when we correctly assume that E(ε) ≠ 0. Note that weakly causal is not a necessary requirement for an adequate model; e.g., a simple, though biased, model may be a better predictor than a complex though unbiased model. It is, however, convenient to use terms such as weakly and strongly causal to describe the nature of errors.
For purposes of illustrating the types of correlation between response types (arising from dependent errors), consider the following example. Among the characteristic responses of an experimental unit, let, say, two of interest be measured with error by y₁ = η₁ + δ₁ and y₂ = η₂ + δ₂, where the η's are true measurement values and the δ's are measurement errors. If η₁ and η₂ can be expressed adequately and linearly in terms of x₁, and x₁ and x₂, respectively, then

$$\eta_1 = b_0 + b_1 x_1 + \epsilon_1 \qquad (6.1)$$

and

$$\eta_2 = a_0 + a_1 x_1 + a_2 x_2 + \epsilon_2, \qquad (6.2)$$

where it is assumed the x's are controlled or fixed. Now, E(ε₁ε₂) ≠ 0, since the experimental unit forces a relation between ε₁ and ε₂ or between η₁ and η₂; i.e., the extraneous variables influencing the experimental unit induce the nonindependence between ε₁ and ε₂. With there existing no causal exchanges between the two response types, and with ε₁ and ε₂ being weakly causal, the correlation between η₁ and η₂ stems from a linear type association between the random variables ε₁ and ε₂; i.e., the correlation between η₁ and η₂ originates from the nonindependence of the inherent errors, ε₁ and ε₂, which are weakly causal for η₁ and η₂, or, in a practical sense, are noncausal for η₁ and η₂. As such, a "regression type" correlation does not exist between η₁ and η₂.

By introducing the measurement errors, (6.1) and (6.2) become

$$y_1 = b_0 + b_1 x_1 + \epsilon_1 + \delta_1 \qquad (6.3)$$

and

$$y_2 = a_0 + a_1 x_1 + a_2 x_2 + \epsilon_2 + \delta_2. \qquad (6.4)$$

Usually the measurement errors are assumed almost negligible, so that δ and ε are pooled. Note that if E(ε₁ε₂) = 0 and E(εδ) = 0, while E(δ₁δ₂) ≠ 0, then there results a linear association type correlation between y₁ and y₂. In either case, we still have correlated errors, and the parameters of the models (6.3) and (6.4) may be estimated jointly by methods given in the next sections.
Suppose, after one cycle of investigation, it is found that there are causal exchanges between the two characteristic responses, with, say, η₂ influencing η₁; e.g., the causal exchanges may be in terms of competition. Then, assuming additivity, the error ε₁ in (6.1) takes the form

$$\epsilon_1 = b_2\eta_2 + \epsilon_1' \qquad (b_2 \neq 0) \qquad (6.5)$$

so that (6.1) becomes

$$\eta_1 = b_0 + b_1 x_1 + b_2\eta_2 + \epsilon_1'. \qquad (6.6)$$

Under (6.5) the error ε₁ in (6.1) is strongly causal for η₁, with ε'₁ now taken as weakly causal. Then not only do the errors ε'₁ and ε₂ give rise to the linear association type correlation between η₁ and η₂ [E(ε'₁ε₂) ≠ 0], but further, ε₁, in the role of a strongly causal agent for η₁, describes an additional correlation between η₁ and η₂, a "regression type" correlation.

Under (6.5), the structural models, (6.6) and (6.2), are referred to the reduced equations or predictive models, (6.2) and

$$\eta_1 = b_0' + b_1' x_1 + b_2' x_2 + \epsilon_1'', \qquad (6.7)$$

where b'₀ = b₀ + b₂a₀; b'₁ = b₁ + b₂a₁; b'₂ = b₂a₂; and ε''₁ = ε'₁ + b₂ε₂. Then, assuming that

$$E(\epsilon_1') = E(\epsilon_2) = E(\epsilon_1'') = 0 \qquad (6.8)$$

and E(ε'₁ε₂) = σ₁₂ ≠ 0 (the latter stemming from the effect of extraneous variables on the experimental unit), it follows that E(ε''₁ε₂) = σ₁₂ + b₂σ₂₂. Although from (6.8) ε₂ and ε'₁ are weakly causal, a regression relationship exists between ε''₁ and ε₂, since ε''₁ = ε'₁ + b₂ε₂. The latter will be true whenever it becomes necessary to refer a system of linear models to the reduced equations. Further, even if one were to assume that E(ε'₁ε₂) = σ₁₂ = 0 (which may be a precarious assumption when η₁ and η₂ stem from the same experimental unit), we still have correlated errors, since E(ε''₁ε₂) = b₂σ₂₂ for b₂ ≠ 0 and σ₁₂ = 0.

By introducing the measurement error, (6.7) becomes

$$y_1 = b_0' + b_1' x_1 + b_2' x_2 + \epsilon_1'' + \delta_1.$$

Assuming that E(δ) = 0; E(δ₁²) = σ²_{δ₁}; E(δ₂²) = σ²_{δ₂}; E(δ₁δ₂) = σ_{δ₁δ₂}; and E(δε) = 0, then E[(ε''₁ + δ₁)(ε₂ + δ₂)] = σ₁₂ + b₂σ₂₂ + σ_{δ₁δ₂}.
Consequently, it is very likely that the pooled errors (for each response type taken from the same experimental unit) will be correlated, since nonindependence will arise if at least one of the following conditions holds: (i) E(ε'₁ε₂) ≠ 0; (ii) b₂ ≠ 0; and/or (iii) E(δ₁δ₂) ≠ 0.

Main emphasis in this section has been given to the nature of the errors. Whether the latter are strongly or weakly causal may, of course, be determined through one or more cycles of investigation. In any event, we have presented a strong argument for the existence of correlated errors between models, and this forms a premise for developments in the following sections.
6.2 Estimation of the Parametric Coefficients in a Linear Regression System When W Is Known

Let there exist n randomly selected experimental units, with ₀y'_k = (₀y_1k, ···, ₀y_jk, ···, ₀y_qk) being a vector of q dependent measurements which characterize a phenomenon of interest from the kth unit, k = 1, 2, ···, n. Assume that ₀y_jk conforms to the general linear model,

$${}_0y_{jk} = a_j + b_j'\,{}_0x_{jk} + {}_0\epsilon_{jk}, \qquad (6.9)$$

where ₀x'_jk = (₀x_1jk, ···, ₀x_ijk, ···, ₀x_{p_j jk}), i = 1, 2, ···, p_j, is a vector of known constants (or at least controlled variables) or independent variables; a_j is an unknown parameter; b'_j = (b_1j, ···, b_ij, ···, b_{p_j j}) is a vector of unknown parameters; and ₀ε_jk is a weakly causal error associated with the model. (For the remainder of this writing, we will, for convenience, assume the ε's to include both the measurement and inherent errors.) Further assume that

$${}_0\epsilon_k = \begin{pmatrix} {}_0\epsilon_{1k} \\ \vdots \\ {}_0\epsilon_{jk} \\ \vdots \\ {}_0\epsilon_{qk} \end{pmatrix} : N(0, \Sigma) \qquad (6.10)$$

and

$$E({}_0\epsilon_k\,{}_0\epsilon_{k'}') = \underset{q \times q}{0}, \qquad k \neq k', \qquad (6.11)$$

where the symmetric matrix

$$\underset{q \times q}{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1q} \\ & \sigma_{22} & \cdots & \sigma_{2q} \\ & & \ddots & \vdots \\ & & & \sigma_{qq} \end{pmatrix}. \qquad (6.12)$$
Set

$$y_j' = (y_{j1}, \cdots, y_{jk}, \cdots, y_{jn})$$

and

$$\underset{n \times p}{X} = (x_1, \cdots, x_i, \cdots, x_p), \qquad (6.13)$$

where, if y_j depends on p_j ≤ p of the vectors of X in (6.13), then x_ij implies that y_j depends on x_i, and the latter is included once and only once in X. Then an alternative population form of the model is

$$\underset{n \times 1}{y_j} = b_{j0}\mathbf{1} + X_j b_j + \epsilon_j \qquad (6.14)$$

for j = 1, 2, ···, q, where the subscript j on X_j implies that X_j includes the p_j ≤ p column vectors of X upon which y_j depends. Further, the x's are taken as deviations from their respective means. Correspondingly, let the sample form of the model be written as

$$y_j = \hat{b}_{j0}\mathbf{1} + X_j\hat{b}_j + e_j. \qquad (6.15)$$

Consider now the determination of b̂_j0 and b̂_j in (6.15). From (6.10), (6.11), and (6.14), it follows that

$$\underset{nq \times 1}{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_j \\ \vdots \\ y_q \end{pmatrix} : N\left[\begin{pmatrix} b_{10}\mathbf{1} + X_1 b_1 \\ \vdots \\ b_{j0}\mathbf{1} + X_j b_j \\ \vdots \\ b_{q0}\mathbf{1} + X_q b_q \end{pmatrix},\; \underset{n \times n}{I} * \underset{q \times q}{\Sigma}\right], \qquad (6.16)$$

where var(y) = I*Σ is a symmetric matrix written as the direct product or Kronecker product of I and Σ; i.e., var y = I*Σ.

Suppose now X_j ≠ X for some j and no a priori information of Σ is available. If an attempt should be made to find estimates of b_j0, b_j, and Σ such that the likelihood function,

$$L = (2\pi)^{-nq/2}\,|\Sigma|^{-n/2} \exp\left\{-\tfrac{1}{2}[y - E(y)]'(I*\Sigma)^{-1}[y - E(y)]\right\}, \qquad (6.17)$$

is maximized, then the nonlinear system of equations (in the parameters), resulting from equating to zero partial derivatives of L, is unsolvable in a practical sense, except possibly
in very simple cases. A discussion of estimation in this case is deferred until Section 6.4 in order that supplementary material can be presented.

If I*Σ can be rewritten as (I*W)σ², where σ² is unknown and W denotes a known weight matrix such that Σ = Wσ², then, under (6.16), it is easily verified that the maximum likelihood estimates of the b_j0 and b_j are the same as those attained through the method of weighted (or joint) least squares. This fact should be obvious since maximum likelihood, under (6.16) and known W, and weighted least squares, under y : [E(y), (I*W)σ²] and known W, both provide minimum variance, linear, unbiased estimates of the b_j0 and b_j.

We will now derive these estimates by means of weighted least squares and refer the reader to Anderson [2] for their derivation by maximizing the likelihood function when X_j = X for all j.

Denoting K^{1/2} as the square root of a nonsingular k×k matrix K, where K^{1/2'}K^{1/2} = K and K^{-1/2'}KK^{-1/2} = I, the transformed vector (I*W)^{-1/2}y is distributed as

$$(I*W)^{-1/2}y : N[(I*W)^{-1/2}E(y),\; I\sigma^2].$$

Now let E(y) in (6.16) be rewritten as

$$E(y) = \underline{b}_0 + Xb, \qquad (6.18)$$

where

$$\underline{b}_0' = (b_{10}\mathbf{1}', \cdots, b_{j0}\mathbf{1}', \cdots, b_{q0}\mathbf{1}'), \qquad X = \begin{pmatrix} X_1 & & & 0 \\ & \ddots & & \\ & & X_j & \\ 0 & & & X_q \end{pmatrix},$$

and b' = (b'₁, ···, b'_j, ···, b'_q).
,~).
From (6.15) we have
el
~
e =
1\
= y -
.!?o -
01\
Xb.
Since ejk has variance directly proportional to 6 jj and, say,
6 jj "-6 j 'j"
it is intuitively evident that ej should have a
"greater influence" than e., in the determination of the esti-J
mates and as such, ej should be weighted more heavily; i.e.,
the weights imply that greater information is contained in
those
~
ances.
with smaller variances than those with larger variHence, if the squares and cross products of the
residuals are weighted inversely proportional to their weights
in W (implying that the squares and cross products of the
residuals are to be weighted by their counterpart with W- 1 )
with the resulting quadratic form being minimized with respect
/\
1\
to bO and b, then we are utilizing that quadratic in the e's
with the greatest information to obtain the estimates.
The latter argument Iwith the mathematical justification following from (6.1817 directs one to minimize
(6.19)
with respect to
zero, we have
1\
.!?o
/\
and b.
Taking partials and equating to
the normal equations in b̂₀ and b̂, so that b̂_j0 = ȳ_j (correspondingly, â_j0 = ȳ_j − Σ_{i=1}^{p_j} b̂_ij x̄_ij in (6.9)), and

$$\hat{b} = [X'(I*W)^{-1}X]^{-1}X'(I*W)^{-1}y. \qquad (6.20)$$

For example, let q = 2 and

$$W = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad \Sigma = W\sigma^2.$$

Then (6.20) becomes

$$\begin{pmatrix} \hat{b}_1 \\ \hat{b}_2 \end{pmatrix} = \begin{pmatrix} X_1'X_1 w^{11} & X_1'X_2 w^{12} \\ X_2'X_1 w^{12} & X_2'X_2 w^{22} \end{pmatrix}^{-1} \begin{pmatrix} X_1'(w^{11}y_1 + w^{12}y_2) \\ X_2'(w^{12}y_1 + w^{22}y_2) \end{pmatrix}, \qquad (6.21)$$

where w^{jj'} denotes the (j, j') element of W⁻¹.
Clearly, when the response types are independent, i.e., ρ = 0, (6.21) reduces to the usual separate least squares estimation of each b_j.
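A numerical sketch of (6.20) and (6.21) for q = 2 with simulated data (all values illustrative) shows the joint estimates recovering the coefficients when X₁ ≠ X₂.

```python
import numpy as np

rng = np.random.default_rng(0)
n, rho, sigma2 = 200, 0.6, 1.0
W = np.array([[1.0, rho], [rho, 1.0]])            # known weight matrix

# y1 depends on x1 only; y2 on x1 and x2, so X1 != X2.
x1 = rng.normal(size=n); x1 -= x1.mean()          # deviations, as in the text
x2 = rng.normal(size=n); x2 -= x2.mean()
X1, X2 = x1[:, None], np.column_stack([x1, x2])

E = rng.multivariate_normal([0.0, 0.0], W * sigma2, size=n)
y1 = 2.0 * x1 + E[:, 0]
y2 = 1.0 * x1 - 1.5 * x2 + E[:, 1]

# Stack y = (y1', y2')' and build the block-diagonal X of (6.18); under this
# stacking, (I*W)^{-1} is the Kronecker product of W^{-1} with I_n.
y = np.concatenate([y1, y2])
X = np.block([[X1, np.zeros((n, 2))], [np.zeros((n, 1)), X2]])
V_inv = np.kron(np.linalg.inv(W), np.eye(n))

b_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
print(b_hat)          # approximately (2.0, 1.0, -1.5)
```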
Consider next the case when X_j = X for all j, where X is given by (6.13). Then (6.20) reduces to

$$\hat{b}_j = (X'X)^{-1}X'y_j. \qquad (6.22)$$

Hence, when X_j = X for all j, the weight matrix offers no information regarding b̂, and the separate least squares estimation of each b̂_j is appropriate.
The intuitive arguments underlying the foregoing developments become apparent in the following examples.

Example 6.1. Consider the models

$$y_1 = a_0 + a_1 x + \epsilon_1, \qquad y_2 = b_0 + \epsilon_2,$$

where x is a deviation; i.e., x takes on the potentially observable values, or the "already observed" values, depending on which came first, the model or the data.
Assume the errors to be weakly causal [for any error, E(ε) = 0] and that

$$E(\epsilon_{1k}\epsilon_{2k}) = \rho\sigma^2, \qquad E(\epsilon_{jk}^2) = \sigma^2, \qquad E(\epsilon_{jk}\epsilon_{j'k'}) = 0 \quad \text{if } k \neq k'.$$

With ρ known, the joint or weighted least squares estimates of a₀, a₁, and b₀ given by (6.20) are

$$\hat{a}_0 = \bar{y}_1, \qquad \hat{b}_0 = \bar{y}_2, \qquad \hat{a}_1 = \frac{\sum x y_1 - \rho\sum x y_2}{\sum x^2},$$

and

$$\operatorname{var}\begin{pmatrix} \hat{a}_0 \\ \hat{b}_0 \\ \hat{a}_1 \end{pmatrix} = \begin{pmatrix} 1/n & \rho/n & 0 \\ \rho/n & 1/n & 0 \\ 0 & 0 & (1-\rho^2)/\sum x^2 \end{pmatrix}\sigma^2.$$
Consider Figure 6.1, where ŷ₂ = b̂₀ and ŷ₁ = â₀ + â₁x are the best unbiased prediction lines for y₂ and y₁, ỹ₂ = b̃₀ + b̃₁x is an unbiased prediction line for y₂ which is obtained by regressing y₂ on x, and ỹ₁ = ã₀ + ã₁x is the unbiased prediction line for y₁ determined by least squares independently of y₂. Clearly the inherent (and measurement) error will induce a positive or negative slope, say, b̃₁ = Σxy₂/Σx², if y₂ is regressed on x. But if, as in Figure 6.1, b̃₁ > 0, then it necessarily follows that ã₁ is too large (or that ỹ₁ = ã₀ + ã₁x is overly steep) if ρ > 0. Consequently, estimation through weighted least squares accounts for the positive correlation and corrects ã₁ by an amount minus ρb̃₁, making â₁ = ã₁ − ρb̃₁ the best unbiased estimate of a₁ under the given system of models.

[Figure 6.1: Unbiased and best unbiased prediction lines]
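A numerical check of Example 6.1 (simulated, illustrative values): the joint slope estimate equals the separate estimate corrected by minus ρ times the spurious slope for y₂.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 200, 0.7
x = rng.normal(size=n); x -= x.mean()
E = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
y1 = 1.0 + 2.0 * x + E[:, 0]
y2 = 0.5 + E[:, 1]

a1_tilde = (x @ y1) / (x @ x)   # separate least squares slope for y1
b1_tilde = (x @ y2) / (x @ x)   # slope induced by error when y2 is regressed on x
a1_hat = a1_tilde - rho * b1_tilde
print(a1_tilde, b1_tilde, a1_hat)
```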
Example 6.2. Likewise, for the models

$$y_1 = a_0 + \sum_{i=1}^{p} a_i x_i + \epsilon_1, \qquad y_2 = b_0 + \epsilon_2,$$

under the same assumptions following the models of Example 6.1, it is seen that the joint least squares estimates of the aᵢ are

$$\hat{\underline{a}} = (X'X)^{-1}X'(y_1 - \rho y_2) = \tilde{\underline{a}} - \rho\tilde{\underline{b}}.$$

Clearly, the separate or unweighted least squares estimates of the aᵢ, given by ã = (X'X)⁻¹X'y₁, are corrected by an amount minus ρ(X'X)⁻¹X'y₂ to account for the inherent (and measurement) error which makes b̃ = (X'X)⁻¹X'y₂ a nonzero vector.
Example 6.3. The models

$$y_1 = a_0 + a_1 x_1 + a_2 x_2 + \epsilon_1, \qquad y_2 = b_0 + b_1 x_1 + \epsilon_2,$$

under the same assumptions as the previous examples, also have a direct analogy to the other arguments. Suppose that x₁ and x₂ are functionally independent and that Σx₁x₂ = 0. Then, under the given system of models, at every point on the x₂ axis, E(y₂) retains the same straight line relationship in the direction of the x₁ axis. But due to sampling error, if y₂ is regressed on both x₁ and x₂, b̃₂ = Σx₂y₂/Σx₂² will be nonzero. Since ρ ≠ 0, the weighted least squares estimate â₂ of a₂ takes into account the nonzero b̃₂ and corrects ã₂ = Σx₂y₁/Σx₂² by an amount minus ρb̃₂, so that â₂ = ã₂ − ρb̃₂.
Example 6.4. Consider now the models

$$y_1 = a_0 + \sum_{i=1}^{p} a_i x_i + \epsilon_1, \qquad y_2 = b_0 + \sum_{i=1}^{p} b_i x_i + \epsilon_2,$$

under the same assumptions as previously. In this case, it has been seen by (6.22) that the correlations do not enter in the estimation of the coefficients (a known correlation offers no information in the estimation of coefficients corresponding to response types based on precisely the same set of independent variables). From the given examples, it was seen that the correlation was used in adjusting one set of coefficient estimates when a type of constraint (viz., that certain coefficients be zero) was placed on the other set of coefficients. That is, under previous models, the expectation of the second prediction line was to remain constant along the added dimensions of the first prediction line. Clearly, sampling variation prevents this from being true if we pay no attention to the restriction, so that weighted least squares corrects the coefficients corresponding to the added dimensionality of the first prediction line. However, in the above models both lines have the same dimensionality, being based on precisely the same x's. Clearly, there is no possible way for a known correlation to enter in the estimation of coefficients, since adjustments are impossible. Hence a knowledge of the correlations in this case offers irrelevant information in the estimation of the coefficients.

Now, if X₁ ≠ X₂, let the notation X₁ ⊂ X₂ imply that the matrix X₁ is contained in the matrix X₂, so that X₂ can be augmented into parts, X₂ = (X₁ | X̃₂), where X̃₂ is that part of X₂ not contained in X₁.
X2=(xl,x2'!3'~)'
For example, if
Xl=(xl'~2)
and
then XlC=X2' since the columns vectors of Xl
are the first two column vectors of X2; i.!., X2 =(X
l
N
X2=(!3'~)'
]X2 ),
where
Considering the examples 6.1 through 6.4, the fol-
lowing theorem becomes apparent.
Theorem: If y : N[1*b₀ + Xb, (I*W)δ²], W is known, X₁ ≠ X₂ ≠ ··· ≠ X_j ≠ ··· ≠ X_q, and

X₁ ⊂ X₂ ⊂ ··· ⊂ X_j ⊂ ··· ⊂ X_q ,  (6.23)

then y_j enters in the joint least squares estimation of some or all of the elements of the vectors b_j, b_{j+1}, ···, b_q, but does not enter in the joint least squares estimation of b_{j−1}, b_{j−2}, ···, b₁.

Proof:
Let X_q = X in (6.13). Transform the vectors x_i of X to x*_i such that

Σ_{k=1}^n x*_{ik} = Σ_{k=1}^n x*_{ik} x*_{i'k} = 0 for i ≠ i' = 1, 2, ···, p.  (6.24)

Note that this is directly analogous to the determination of orthogonal polynomials [?]. Then the model (6.14) is rewritten as y_j = b*_{j0} 1 + X*_j b*_j + ε_j. For example, if X is of order n×5 and a typical row of X is (x_{1k}, x²_{1k}, x_{2k}, x²_{2k}, x_{1k}x_{2k}), then x*_{1k} = x_{1k} + c₁, and c₁ is obtained from

Σ_{k=1}^n x*_{1k} = 0; Σ_{k=1}^n x*_{2k} = Σ_{k=1}^n x*_{2k} x*_{1k} = 0;

and so on, through x*_{5k} = x_{1k}x_{2k} + c₁₁x*_{4k} + c₁₂x*_{3k} + c₁₃x*_{2k} + c₁₄x*_{1k} + c₁₅, where c₁₁ through c₁₅ are obtained from Σ_{k=1}^n x*_{5k} = Σ_{k=1}^n x*_{5k}x*_{1k} = ··· = 0.
Now, from (6.20),

$$\hat b^* = \begin{pmatrix} X_1^{*\prime}X_1^{*}w^{11} & X_1^{*\prime}X_2^{*}w^{12} & \cdots & X_1^{*\prime}X_q^{*}w^{1q}\\ X_2^{*\prime}X_1^{*}w^{12} & X_2^{*\prime}X_2^{*}w^{22} & \cdots & X_2^{*\prime}X_q^{*}w^{2q}\\ \vdots & & & \vdots\\ X_q^{*\prime}X_1^{*}w^{1q} & X_q^{*\prime}X_2^{*}w^{2q} & \cdots & X_q^{*\prime}X_q^{*}w^{qq} \end{pmatrix}^{-1} \begin{pmatrix} X_1^{*\prime}\bigl(\sum_j w^{1j}y_j\bigr)\\ X_2^{*\prime}\bigl(\sum_j w^{2j}y_j\bigr)\\ \vdots\\ X_q^{*\prime}\bigl(\sum_j w^{qj}y_j\bigr) \end{pmatrix}\qquad (6.25)$$
We will refer to the inverse matrix on the right-hand side of
(6.25) as having q pseudo rows and q pseudo columns.
But under (6.23), X*_j may be written as the partitioned matrix, X*_j = (X̃*₁ | X̃*₂ | X̃*₃ | ··· | X̃*_j); e.g., X*₃ = (X̃*₁ | X̃*₂ | X̃*₃), where X̃*₂ is that part of X*₂ not contained in X*₁, and X̃*₃ is that part of X*₃ not contained in X*₂. Correspondingly,

b*_j' = [b̃*'_{j(1)}, b̃*'_{j(2)}, ···, b̃*'_{j(j)}] .
Hence (6.25) becomes the system

b̂* = (X̃*_i'X̃*_{i'} w^{jj'})⁻¹ (X̃*_i' Σ_{j'} w^{jj'} y_{j'}) ,  (6.26)

in which the pseudo rows and pseudo columns are indexed by the pairs (j, i), i = 1, ···, j, j = 1, ···, q: the pseudo element of the coefficient matrix linking b̃*_{j(i)} with b̃*_{j'(i')} is X̃*_i'X̃*_{i'} w^{jj'}, and the corresponding element of the right-hand side is X̃*_i'(Σ_{j'} w^{jj'} y_{j'}). The vector b̂* [which denotes the left-hand side of (6.26)] has (q/2)(q+1) pseudo elements, and the inverse matrix on the right-hand side of (6.26) has the same number of pseudo rows and columns.
Now, from (6.24), X̃*_j'X̃*_{j'} = 0 for j ≠ j'. Further, it is easily seen that if the elements of b̂* are rearranged in the order

[b̃*'_{1(1)}, b̃*'_{2(1)}, ···, b̃*'_{q(1)}; b̃*'_{2(2)}, ···, b̃*'_{q(2)}; ···; b̃*'_{q−1(q−1)}, b̃*'_{q(q−1)}; b̃*'_{q(q)}] ,

then (6.26) becomes block diagonal,

b̂* = diag[ X̃*₁'X̃*₁W⁻¹, ···, X̃*_j'X̃*_j W^{j−1,j−1}, ···, X̃*_q'X̃*_q w^{q−1,q−1} ]⁻¹ [ (X̃*₁'W⁻¹)', (X̃*₂'W¹)', ···, (X̃*_j'W^{j−1})', ··· ]' y ,

where W^{jj} [of order (q−j)²] is W⁻¹ with the first j rows and columns deleted, and W^j = (w_j | W^{jj}) [of order (q−j)×q] is W⁻¹ with the first j rows deleted, j = 1, 2, ···, q−1. For example, W^{q−1,q−1} = w^{qq}, the lower right-hand element of the matrix W⁻¹.
Clearly, (X̃*₁'X̃*₁)⁻¹X̃*₁'y₁ = b̂*₁, which implies that (X₁'X₁)⁻¹X₁'y₁ = b̂₁. Further, a typical pseudo element of b̂* is

(b̂*'_{j+1(j)}, ···, b̂*'_{q(j)})' = [(X̃*_j'X̃*_j)⁻¹X̃*_j'] * [(W^{j−1,j−1})⁻¹(w^{j−1,j−1} | W^{j−1,j−1})] y ,

and the matrix (W^{j−1,j−1})⁻¹(w^{j−1,j−1} | W^{j−1,j−1}) reduces to a matrix whose typical row is

(w̃_{j+v,1}, w̃_{j+v,2}, ···, w̃_{j+v,j−1}, 0, ···, 1, ···, 0) , v = 0, 1, ···, q−j ,

with the unit entry in the (j+v)th position. Hence, for the (j+t)th regression line, the vector estimate b̂*_{j+t(j)} entails some or all of the y's up to and including y_{j+t}, but not y_{j+t+1}. Since the same is necessarily true for b̂_{j+t(j)}, the theorem is proved.
Corollary: Under the assumptions of the previous theorem, the best estimate of b₁ is b̂₁ = (X₁'X₁)⁻¹X₁'y₁.

Proof: Given during the course of the proof of the previous theorem.
Finally, from (6.20) and (6.16), it follows that

var(b̂) = [X'(I*W)⁻¹X]⁻¹ δ² .  (6.27)

Also, when X_j = X for all j, then (6.27) becomes

var(b̂) = (X'X)⁻¹ * Σ .  (6.28)

The reader is referred to Anderson [2] for an alternate derivation of (6.28) and to Williams [41] for a specific application thereof.
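For computation, the joint estimate (6.20) and its variance (6.27) can be assembled directly from the stacked system. The following sketch (Python with numpy; the function name and the block arrangement are ours) assumes W known and the observations stacked by equation, so that the error covariance is the direct product of W with an identity matrix:

    import numpy as np

    def joint_gls(T_list, y_list, W):
        """Stacked weighted least squares for q regressions, cov = W kron I_n."""
        n = len(y_list[0])
        cols = sum(Tj.shape[1] for Tj in T_list)
        T = np.zeros((n * len(T_list), cols))
        col = 0
        for j, Tj in enumerate(T_list):           # block-diagonal design T
            T[j * n:(j + 1) * n, col:col + Tj.shape[1]] = Tj
            col += Tj.shape[1]
        y = np.concatenate(y_list)
        A_inv = np.kron(np.linalg.inv(W), np.eye(n))   # (I*W)^{-1} in the text's notation
        G = T.T @ A_inv @ T
        c_hat = np.linalg.solve(G, T.T @ A_inv @ y)
        var_c = np.linalg.inv(G)                  # times delta^2, cf. (6.27)
        return c_hat, var_c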
6.3 Estimation of δ² and Σ

Consider now finding the best unbiased estimate of δ² in (6.18). Let c_j' = (b_{j0}, b_j'); c' = (c₁', c₂', ···, c_j', ···, c_q'); ĉ_j' = (b̂_{j0}, b̂_j'); ĉ' = (ĉ₁', ĉ₂', ···, ĉ_q'); T_j = (1 | X_j); and

T = diag(T₁, ···, T_j, ···, T_q) .
By equating (6.14) and (6.15) for all j and solving for ê' = (ê₁', ···, ê_j', ···, ê_q'), we have

ê = y − Tĉ .  (6.29)

Letting A = I*W, then from (6.14) and (6.20) it follows that

ĉ − c = (T'A⁻¹T)⁻¹T'A⁻¹ε ,  (6.30)

and thence,

ê = [I − T(T'A⁻¹T)⁻¹T'A⁻¹] ε .  (6.31)

Now

E[ê'(I*W)⁻¹ê] = E(ê'A⁻¹ê) = E{ε̃'[I − A^{−1/2}T(T'A⁻¹T)⁻¹T'A^{−1/2}'] ε̃} ,

where, from (6.10), ε̃ = A^{−1/2}'ε : N(0, I_{nq×nq} δ²). Then with tr denoting trace,

E(ê'A⁻¹ê) = nqδ² − δ² tr[A^{−1/2}T(T'A⁻¹T)⁻¹T'A^{−1/2}'] = nqδ² − δ² tr I_{(Σp_j+q)×(Σp_j+q)} = [nq − (Σ_{j=1}^q p_j + q)] δ² .  (6.32)
Hence, the best unbiased estimate of δ² is

δ̂² = ê'(I*W)⁻¹ê / [nq − (Σ_{j=1}^q p_j + q)] .  (6.33)

Next we will find the best unbiased estimate of Σ when X_j = X for all j. By equating (6.14) and (6.15), using ĉ_j = (T_j'T_j)⁻¹T_j'y_j and (6.14), and solving for ê_j, the solution is

ê_j = [I − T(T'T)⁻¹T'] ε_j .  (6.34)

Then E(ê_j'ê_{j'}) = [n − (p+1)] δ_{jj'}. Hence the best unbiased estimate of the element δ_{jj'} of Σ is

δ̂_{jj'} = ê_j'ê_{j'} / [n − (p+1)] = (y_j − T ĉ_j)'(y_{j'} − T ĉ_{j'}) / [n − (p+1)] .  (6.35)
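A sketch of (6.35) in the same vein (Python; the divisor n − (p+1) applies only when every response is regressed on the same T = (1 | X)):

    import numpy as np

    def sigma_hat(T, Y):
        """Y has one column per response; returns the q x q estimate of Sigma."""
        n, p1 = T.shape                        # p1 = p + 1
        H = T @ np.linalg.solve(T.T @ T, T.T)  # hat matrix T(T'T)^{-1}T'
        E = Y - H @ Y                          # residuals, one column per response
        return E.T @ E / (n - p1)              # divisor n - (p+1), cf. (6.35)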
6.4 Estimation of b When W Is Unknown

Prior to (6.18), Σ was rewritten as Wδ². In doing so, it was assumed that one had information leading to a known W. Such would be the case if, say, the simple correlations ρ_{jj'} and the ratios of the diagonal elements of Σ were known. However, such knowledge is rarely at hand in practice.
In this connection, with X_j ≠ X, ρ_{jj'} ≠ 0, and no prior knowledge of W, it is the intent to estimate b through an iterative scheme where W is estimated from the data on hand. If we estimate (by least squares) each c_j = (b_{j0}, b_j')' separately by, say, c̃_j = (T_j'T_j)⁻¹T_j'y_j (note that the b̂_{j0} are not influenced by the ρ_{jj'}) when X_j ≠ X and ρ_{jj'} ≠ 0, then one finds analogously to (6.31) that

ẽ_j = [I − T_j(T_j'T_j)⁻¹T_j'] ε_j = (I − U_j) ε_j (say),  (6.36)

where U_j (n×n) has rank p_j + 1. For the expectation of ẽ_j'ẽ_{j'}, we have

E(ẽ_j'ẽ_{j'}) = E[ε_j'(I−U_j)'(I−U_{j'})ε_{j'}] = E[ε_j'ε_{j'} − ε_j'U_j'ε_{j'} − ε_j'U_{j'}ε_{j'} + ε_j'U_j'U_{j'}ε_{j'}] .  (6.37)

But tr U_{j'} = p_{j'} + 1, and
T_{j'} = (T_j | T̃_{j'}) , the blocks being of orders n×(p_j+1) and n×(p_{j'}−p_j), and

tr(U_jU_{j'}) = tr[ T_j(T_j'T_j)⁻¹T_j' T_{j'}(T_{j'}'T_{j'})⁻¹T_{j'}' ] = tr[ (T_j'T_j)⁻¹T_j'T_{j'} (T_{j'}'T_{j'})⁻¹T_{j'}'T_j ] = p_j + 1 ,  (6.38)

the product reducing, under the containment T_j ⊂ T_{j'}, to the trace of an identity matrix of order p_j + 1 bordered by a null block of order (p_j+1)×(p_{j'}−p_j),
so that, from (6.37) and (6.38), E(ẽ_j'ẽ_{j'}) = (n − p_m − 1)δ_{jj'}, and an unbiased estimate of δ_{jj'} is

δ̂_{jj'} = ẽ_j'ẽ_{j'} / (n − p_m − 1) ,

where p_m = max(p_j, p_{j'}) if X_j ⊂ X_{j'} or j = j'. That is, to obtain an unbiased estimate of δ_{jj'} when ρ_{jj'} ≠ 0, X_j ≠ X_{j'}, and X_j ⊂ X_{j'}, divide the sum of the residual cross products by n minus the larger number of coefficients occurring in the jth and j'th regression lines.
In general, we have δ̂_{jj'} = ẽ_j'ẽ_{j'}/(n − p_m − 1). The weight matrix may then be estimated by

Ŵ = (c δ̂_{jj'}) (say)  (6.39)

for j, j' = 1, 2, ···, q, where c is an arbitrary constant. Now, for the initial estimate of W, say Ŵ^(0), we may use ĉ^(0) to obtain the residual sums of squares and cross products; i.e.,

ẽ_j'ẽ_{j'} = (y_j − T_j ĉ_j^(0))'(y_{j'} − T_{j'} ĉ_{j'}^(0)) .

By substituting Ŵ^(0) for W in

ĉ = [T'(I*Ŵ)⁻¹T]⁻¹T'(I*Ŵ)⁻¹y ,  (6.40)

one may proceed to estimate c by, say, ĉ^(1), which is an improved estimate over ĉ^(0); improved in the sense that the correlations are now being accounted for.
Now use the new estimate, ĉ^(1), to obtain new residual sums of squares and cross products and hence a new weight matrix, say, Ŵ^(1). Substituting Ŵ^(1) into (6.40), we obtain another estimate of c, say, ĉ^(2). This process is continued until Ŵ^(N) and Ŵ^(N+1) (and consequently ĉ^(N+1) and ĉ^(N+2)) are identical up to a desired decimal place, which assumes that convergence takes place at some stage of the iteration cycle.
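The cycle just described may be sketched as follows (Python; joint_gls is the routine sketched after (6.28), and the arbitrary proportionality constant of (6.39) is absorbed by using the raw residual moments):

    import numpy as np

    def iterate_w(T_list, y_list, n_iter=20, tol=1e-8):
        # separate least squares start, c_hat(0)
        c = [np.linalg.lstsq(Tj, yj, rcond=None)[0]
             for Tj, yj in zip(T_list, y_list)]
        W = None
        for _ in range(n_iter):
            E = np.column_stack([yj - Tj @ cj
                                 for Tj, yj, cj in zip(T_list, y_list, c)])
            W_new = E.T @ E / len(y_list[0])      # residual moments, cf. (6.39)
            c_stacked, _ = joint_gls(T_list, y_list, W_new)   # cf. (6.40)
            c_new, col = [], 0
            for Tj in T_list:                     # unstack the joint estimate
                k = Tj.shape[1]
                c_new.append(c_stacked[col:col + k]); col += k
            if W is not None and np.max(np.abs(W_new - W)) < tol:
                return c_new, W_new
            c, W = c_new, W_new
        return c, W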
To illustrate another iterative scheme, suppose we write the normal equations leading to (6.40) as

Σ_{j=1}^q w^{j'j} T_{j'}' y_j = Σ_{j=1}^q w^{j'j} T_{j'}' T_j ĉ_j ,  j' = 1, 2, ···, q, T_j = (1 | X_j).  (6.41)

Further, let W_{jj} be that matrix including the first j rows and columns of W and W_{jj}⁻¹ = (w^{j'j''}_{jj}). Now assume that the T_j of equations (6.41) obey the hypothesis of the theorem in Section 6.2. Then we know that y_j enters in the least squares estimation of some of the elements of c_j, c_{j+1}, ···, c_q, but does not enter in the estimation of c_{j−1}, ···, c₁. But this implies that c₁ can be estimated independently of y₂, ···, y_q, c₂ independently of y₃, ···, y_q, c₃ independently of y₄, ···, y_q, and so forth. Consequently, the ĉ_j of (6.41) are attainable in a stepwise fashion as follows,
T₁'T₁ĉ₁ = T₁'y₁  (6.42)

T₁'T₁ĉ₁ = T₁'y₁  (6.43)

T₂'T₂ĉ₂ = T₂'[ (w²¹₂₂/w²²₂₂) ê₁ + y₂ ]  (6.44)

T₁'T₁ĉ₁ = T₁'y₁  (6.45)

T₂'T₂ĉ₂ = T₂'[ (w²¹₃₃/w²²₃₃) ê₁ + y₂ ]  (6.46)

T₃'T₃ĉ₃ = T₃'[ (w³¹₃₃/w³³₃₃) ê₁ + (w³²₃₃/w³³₃₃) ê₂ + y₃ ]  (6.47)

where ê_j = y_j − T_j ĉ_j; (6.42) is obtained by setting q = 1 in (6.41); (6.43) and (6.44) are obtained by setting q = 2 in (6.41); (6.45), (6.46), and (6.47) are obtained by setting q = 3 in (6.41), and so on.
Note that with W unknown, the best estimate ĉ₁ of c₁ is still available. Then, fixing ĉ₁, we may converge (assuming convergence at the N₂th iterate) to an estimate ĉ₂^(N₂) of c₂. Next, fixing ĉ₁ and ĉ₂^(N₂), we converge to an estimate ĉ₃^(N₃) of c₃, and so forth.
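For q = 2 the stepwise scheme reduces to (6.42) and (6.44); a minimal sketch (Python), assuming the containment T₁ ⊂ T₂ of the theorem and taking the ratio w²¹/w²² as given (in practice it would be re-estimated within the iteration):

    import numpy as np

    def stepwise_two(T1, T2, y1, y2, w21_over_w22):
        c1 = np.linalg.lstsq(T1, y1, rcond=None)[0]    # (6.42): best, W-free
        e1 = y1 - T1 @ c1                              # fixed residuals from stage 1
        rhs = T2.T @ (w21_over_w22 * e1 + y2)          # right-hand side of (6.44)
        c2 = np.linalg.solve(T2.T @ T2, rhs)
        return c1, c2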
Though this stepwise convergence to estimates of c_j entails considerably less work than converging to estimates of the c_j jointly, it should be noted that in the stepwise convergence the variability associated with each progressive ĉ_j^(N_j) necessarily increases, since it includes the variability of the preceding ĉ₁, ĉ₂^(N₂), ···, ĉ_{j−1}^(N_{j−1}), which are fixed in determining ĉ_j^(N_j). However, if q is small, the errors of the stepwise estimates will, most likely, be of approximately the same magnitude as the joint estimates.
6.5 A Sufficient Condition for Convergence and an Approximate Variance of the Iterated Estimate

Consider now the question of convergence when using the first iterative scheme of Section 6.4 to obtain an estimate of c. Let ĉ be the best estimate of c; i.e., if c̃_t is any other estimate of c_t, then var ĉ_t ≤ var c̃_t. Clearly all the elements of ĉ are attainable if W is known; i.e., ĉ = f(T, y; W) (say), where

f' = [f₁(T, y; W), ···, f_t(T, y; W), ···, f_{Σp_j+q}(T, y; W)] .

Further, since W does not enter in the joint least squares estimation of the b_{j0}, disregard the latter so that

b̂ = (b̂₁', ···, b̂_j', ···, b̂_q')' = f(X, y; W) ,  (6.48)

b̂ being of order (Σ_{j=1}^q p_j) × 1.
It was seen that

b̂^(N+1) = f(X, y; Ŵ^(N)) ,  (6.49)

where one method of obtaining b̂^(0) is through the separate least squares estimation of each b_j. Then (6.49) defines a general recurrence relationship,

b̂^(N+1) = h(b̂^(N)) or b̂_α^(N+1) = h_α(b̂^(N)) ,  (6.50)

which is used in solving the Σ_j p_j nonlinear equations

b̊ = h(b̊)  (6.51)

in the same number of unknowns. The superscript (N) in (6.50) denotes the N+1st iterate, and for convenience X, b̂₀, and y are not written in the functional forms, since they are held constant throughout the iteration process. Let

b̂^(N) = b̂* + δ̂^(N) ,  (6.52)
where b̂* is a solution of (6.51) and δ̂^(N) is the error of the N+1st iterate. Substituting (6.52) into (6.50) yields

b̂^(N+1) = h(b̂* + δ̂^(N)) or b̂_α^(N+1) = h_α(b̂* + δ̂^(N)) .  (6.53)

Expanding the right-hand side of (6.53) in a Taylor's series about b̂*, we have

b̂_α^(N+1) = h_α(b̂*) + [(∂h_α/∂b)|_{b=b̂*}]' δ̂^(N) + O(δ̂²) ,  (6.54)

or generally,

δ̂^(N+1) = H δ̂^(N) + O(δ̂²) ,  (6.55)

where

H = [(∂h_α/∂b)'|_{b=b̂*}] .  (6.56)

Using (6.52) along with b̂* = h(b̂*), i.e., substituting the solution b̂* into (6.51), then (6.55) takes the form δ̂^(N+1) = Hδ̂^(N) + O(δ̂²), or, neglecting O(δ̂²), δ̂^(N) = H̊θ^N, where the θ's are the characteristic roots of the matrix H, H̊ is the spectral decomposition of H, and δ̂^(0) is determined from b̂^(0). Clearly, if the θ largest in modulus is less than unity, then δ̂^(N) → 0 as N → ∞. This, then, is a sufficient condition for the iteration process to converge to a solution b̂*, assuming that O(δ̂²) may be neglected. (The preceding argument is taken from reference [?].)
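The condition is easily checked numerically once the recurrence map is programmed. The following sketch (Python; h is any vector-valued recurrence map, and the forward-difference Jacobian is ours) evaluates the largest characteristic root of H at a trial solution:

    import numpy as np

    def converges(h, b_star, eps=1e-6):
        """h: R^m -> R^m recurrence map; H is its numerical Jacobian at b_star."""
        m = len(b_star)
        H = np.zeros((m, m))
        f0 = h(b_star)
        for i in range(m):
            d = np.zeros(m); d[i] = eps
            H[:, i] = (h(b_star + d) - f0) / eps
        # sufficient condition: largest root of H below one in modulus
        return np.max(np.abs(np.linalg.eigvals(H))) < 1.0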
Now consider the approximate variance of such an estimate. Let (6.52) be rewritten in the form,

b̂^(N) = b̂* + δ̂^(N) = b̂ + η + δ̂^(N) ,  (6.57)

where η + δ̂^(N) is the error distance of the N+1st iterate from the best estimate b̂, and η is analogous to ĉ − c in (6.30).
Substituting the middle term of (6.57) into (6.50) results in (6.58), or

b̂_α^(N+1) = h_α(b̂ + η + δ̂^(N)) .  (6.59)

Expanding the right-hand side of (6.59) in a Taylor's series about b̂ yields (6.60). Further, since b̂* = b̂ + η, h_α(b̂*) may be expanded about b̂ as in (6.61), where the latter follows by expanding the middle term of (6.61) in a Taylor's series about b̂. Substituting the expansion of h_α(b̂*) in (6.61) into (6.60) and using (6.57), we have

b̂_α^(N+1) = h_α(b̂) − b̂_α + [(∂h_α/∂b)|_{b=b̂}]' δ̂^(N) + [(∂h_α/∂b)|_{b=b̂}]' η + O(δ̂²) + O(η²) .  (6.62)

Neglecting O(δ̂²) and O(η²), the result in general is

b̂^(N+1) = h(b̂) − b̂ + G₁δ̂^(N) + G₂η ,  (6.63)
where G₁ = [(∂h_α/∂b)'|_{b=b̂}] and G₂ = [(∂h_α/∂b)'|_{b=b̂}] for α = 1, 2, ···, Σ_{j=1}^q p_j. Note that if W is known, it should follow that δ̂^(N) = 0 for all N. This is seen by noting that a known W is not a function of the b̂^(N); further, with W known, h(b̂) becomes f(X, y; W) in (6.48), all of which implies that δ̂^(N) = 0. The same can be said for particular b̂_α for which best estimates are attainable independently of W.
Now, (6.63) can be written in the form (6.64), where the θ's are the characteristic roots of G₁, and G̊₁ is the spectral decomposition of G₁. If the largest θ in modulus is less than unity, then b̂^(N) → h(b̂) − b̂ + G₂η for N sufficiently large. Further, as n increases, h(b̂) rapidly approaches b̂. (The latter statement is based on the following intuitive argument. If Ŵ = [SSE_{jj'}/(df)_{jj'}], where df implies degrees of freedom, then when b is known, (df)_{jj'} = n − 1 rather than n − p_m − 1, p_m = max(p_j, p_{j'}). Clearly, as n increases (say n > 20), then Ŵ should be very close to W, so that the difference between h(b̂) and b̂ is negligible.) Then assuming convergence (which presupposes O(δ̂²), O(η²) negligible), we have (6.65), and by substituting (6.65) into (6.57),
Since E(η) = 0, E(b̂*) ≈ b̂, and the approximate variance of b̂* follows as (6.66). Assuming convergence, it is proposed that

b̂* ~ N(b, var b̂*) ,  (6.67)

where var b̂* is estimated by substituting b̂* for b and Ŵ^(N−1) = Ŵ* for W (with convergence occurring at the Nth iterate) in (6.66). It should be noted that we have neglected in var(b̂*) the variability of Ŵ*, which arises in repeated sampling. As such, var(b̂*) in (6.66) will tend to underestimate the true small sample variance of the iterated estimate b̂*.

6.6 General Comments on Estimation in the Predictive Regression System
This discussion pertains to cases where the y_j (j = 1, 2, ···, q) depend on different sets of overlapping independent variables with no a priori information regarding Σ. If the sole intent is the individual prediction of each y_j for a given x_j, then separate least squares may be applied in estimating the parametric coefficients and diagonal elements of Σ. But for analyses requiring the use of the off-diagonal elements of Σ, separate least squares neglecting the correlations may yield misleading results. For example, the
q dimensional confidence ellipse for the average value of y,
i.e., E(y), given a particular value of x̊', is defined by (6.68), which stems from the distribution of the estimated responses, where V is the coefficient matrix of δ² in (6.66). With W known, Ŵ* is replaced by W and V by [X'(I*W)⁻¹X]⁻¹. If the correlations in Σ are neglected in (6.68), then the ellipsoidal region defined by the latter will not be "tilted," so that values of y given x̊ may, in repeated sampling, fall outside the confidence region with great regularity. The same can be said for the q dimensional tolerance ellipse, the tolerance hyperellipse being such that 1−α percent of the vectors y given x̊ will fall within it.
Should the simple correlations of Σ be "near zero," then the separate least squares estimation of each b_j may be used. If, however, "near zero" correlations cannot be assumed, then one should go from the restricted regression system (the system in which the models include different sets of overlapping independent variables) to the unrestricted regression system; i.e., X_j = X for all j. After regressing each response type on all the independent variables (in which case best estimates are available independently of Σ = Wδ²), the result should be that the coefficients not appearing in the restricted regression system should have near zero estimates in the unrestricted system, as is apparent from examples 6.1 through 6.3. A test of significance can be made to determine whether the matrix,

B̂_U (q×p) = (b̂₁, ···, b̂_j, ···, b̂_q)'

(the estimated coefficient matrix of the unrestricted regression system), differs significantly from B̂_R, which is the matrix B̂_U with zeros replacing those b̂_U's not appearing in the restricted system. However, methods such as the likelihood ratio test used for testing H₀: B_U = B_R (see Anderson [2]) are not sensitive to small departures from the null hypothesis, whereas we are particularly interested in detecting small departures.
Further, due to the impossibility of proving H₀, a test of significance, particularly in this problem, leaves much to be desired with regard to deciding whether B̂_R is "sufficiently close" to B̂_U. On the other hand, one may have no choice other than to resort to a test of significance.

If B̂_R is "sufficiently close" to B̂_U, then the coefficients in the models of the restricted regression system may be estimated by separate least squares, and from these estimates, say b̂_j**, the matrix (SSE_{jj'}) may be determined. The b̂_j** will be unbiased, consistent, and possibly most efficient under the given circumstances.
If those coefficients not appearing in the restricted regression system have "decidedly" nonzero estimates in the unrestricted system, then one should either

(i) question the models of the restricted regression system, or

(ii) make adjustments by using some iterative scheme to estimate b.

With the models having strong theoretical justification, there may be no other choice than to make adjustments through iteration. Then, assuming convergence, there is the formidable task of estimating (6.66), the estimated variance of an iterated vector estimate. The likely resolution will be the use of the estimated asymptotic variance of b̂*; namely, (6.69),
even though the variances of the b̂*'s may be considerably underestimated, particularly if there are many parameters to estimate through iteration (or rather, by the first iterative scheme of Section 6.4). For more simple systems, such as in example 6.1, it is seen in the forthcoming example (example 6.5) that the estimate of (6.66) is very close to (6.69). This is due possibly to the fact that iteration is required in the estimation of only one parameter, b₁, and that ã₁ = Σy₁x/Σx² (the least squares estimate of the coefficient not appearing in the restricted system) is not significantly different from zero. Further, it is again stressed that the variance of the iterated solution does not account for the variability contributed by the b̂_{0j} in Ŵ*.

In general, if there are very few parameters which must be estimated through iteration, the estimate of (6.66) and the estimated asymptotic covariance matrix (6.69) may be close, particularly if those coefficients in the restricted regression system have nonsignificant (from zero) estimates in the unrestricted regression system. If there exist many parameters and if adjustments need be made, then one should be well aware that the variances of such estimates increase, possibly by very large amounts.
Example 6.5. From a bivariate normal population, the data were as follows:

x:   −5    −3    −1     1     3     5
y₁:  8.8   9.6  10.6   8.3  10.4  10.1
y₂:  3.7  12.5  19.8  24.4  31.6  40.1

Best estimates of a₀ and b₀ were â₀ = ȳ₁ = 9.6333 and b̂₀ = ȳ₂ = 22.0167.
Even though ã₁ = (Σxy₁/Σx²) = (6.6/70) was not significantly greater than zero, an adjustment was made by means of b̂^(N) = (Σxy₂/Σx²) − ρ̂^(N−1)ã₁, where

ρ̂^(N) = MSE₁₂^(N−1)/MSE^(N−1) = [Σ(y₁ − â₀)(y₂ − b̂₀ − b̂^(N−1)x)/4] / {[Σ(y₁ − â₀)²/5][Σ(y₂ − b̂₀ − b̂^(N−1)x)²/4]}^{1/2} ,

since δ₁₁ = δ₂₂; i.e., δ²_{y₁} = δ²_{y₂}. Three iterations were required to converge to the third decimal place of b̂* and the second decimal place of ρ̂*:

b̂^(0) = 3.4843   ρ̂^(0) = 0.6930
b̂^(1) = 3.4190   ρ̂^(1) = 0.7677
b̂^(2) = 3.4119   ρ̂^(2) = 0.7729
b̂^(3) = 3.4114   ρ̂^(3) = 0.7703 .
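The iteration can be reproduced from the data above. In the sketch below (Python), the correlation is computed with divisors 4, 5, and 4, which is one reading of the partly illegible formula, so the later decimals may differ slightly from the values just listed:

    import numpy as np

    x  = np.array([-5., -3., -1., 1., 3., 5.])
    y1 = np.array([8.8, 9.6, 10.6, 8.3, 10.4, 10.1])
    y2 = np.array([3.7, 12.5, 19.8, 24.4, 31.6, 40.1])

    a0, b0 = y1.mean(), y2.mean()            # 9.6333, 22.0167
    a1 = np.sum(x * y1) / np.sum(x**2)       # 6.6/70, separate slope for y1
    b  = np.sum(x * y2) / np.sum(x**2)       # b_hat(0) = 3.4843
    for _ in range(3):
        e1, e2 = y1 - a0, y2 - b0 - b * x
        rho = (np.sum(e1 * e2) / 4) / np.sqrt(
            np.sum(e1**2) / 5 * np.sum(e2**2) / 4)
        b = np.sum(x * y2) / np.sum(x**2) - rho * a1   # adjust by minus rho * a1
        print(b, rho)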
Then

∂b̂^(N+1)/∂b̂ |_{b = b̂*} = [ −(Σxy₁/Σx²)(∂ρ̂^(N)/∂b) ]|_{b = b̂*} = 0.0639 ,

so that the variance of b̂* is estimated to be (1.0639)² = 1.1319 times the estimated asymptotic variance of b̂*.
The asymptotic covariance matrix of the estimates is

$$\operatorname{var}(\hat a_0,\hat b_0,\hat b_1^*)' = \begin{pmatrix} 1/n & \rho/n & 0\\ \rho/n & 1/n & 0\\ 0 & 0 & (1-\rho^2)/\sum x^2 \end{pmatrix}\delta^2 ,$$

which is estimated by

$$\begin{pmatrix} 1/6 & 0.77/6 & 0\\ 0.77/6 & 1/6 & 0\\ 0 & 0 & [1-(0.77)^2]/70 \end{pmatrix}\hat\delta^2 = \begin{pmatrix} 0.167 & 0.128 & 0\\ 0.128 & 0.167 & 0\\ 0 & 0 & 0.006 \end{pmatrix}\hat\delta^2 ,$$

where δ̂² = 1.329. The estimate of the covariance matrix given by (6.66) is

$$\begin{pmatrix} 0.167 & 0.128 & 0\\ 0.128 & 0.167 & 0\\ 0 & 0 & 0.007 \end{pmatrix}\hat\delta^2 .$$
Note that the zeros in the above matrix are fictitious, since b̂₁* is dependent on â₀ and b̂₀; i.e., b̂₁* depends on ρ̂*, which in turn depends on â₀ and b̂₀. Also, the variance of b̂₁* would be somewhat larger, since the variation of â₀ and b̂₀ is neglected in ρ̂*.
6.7 A Note on the Variance of a Vector Coefficient Estimate Obtained Through Iteration in a Single Rational Regression Model

The initial iteration scheme of Section 6.4 is well known. See, for example, applications by Williams [41] and Turner [37]. Then, assuming convergence as discussed in the previous section (though convergence is rarely discussed in statistical applications using iteration processes), the usual recourse is to use the asymptotic variance to describe the variability of the resulting estimates. In doing so, the variability of the estimated weight matrix, Ŵ*, is neglected. Consequently, probability statements, such as those associated with confidence intervals for the parametric coefficients, may be grossly in error.

Turner [37] considered the rational regression model,

y = A_p(x)/B_q(x) + ε ,  (6.70)

where A_p(x) = a₀ + a₁x + a₂x² + ··· + a_p x^p and B_q(x) = 1 + b₁x + b₂x² + ··· + b_q x^q. The problem is one of estimating the a's and b's under the assumption that the ε's are NID(0, δ²). It is readily seen that even in the simple case, p = 0 and q = 1, the maximum likelihood equations are intractable, there being no systematic method of iteration. To counter this difficulty, (6.70) is put in another form; namely,

y = A_p(x) − [B_q(x) − 1] y + B_q(x) ε .  (6.71)
If one were to neglect the obvious defect that the "independent" variables are correlated with the error term, B_q(x)ε, then estimates of the a's and b's are attainable (assuming convergence) through weighted least squares by using the first iterative scheme of Section 6.4; i.e.,

$$\begin{pmatrix}\hat a^{(N+1)}\\ \hat b^{(N+1)}\end{pmatrix} = (T_0'\hat\Delta_N^{-1}T_0)^{-1}T_0'\hat\Delta_N^{-1}y ,\qquad (6.72)$$

where

$$T_0 = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^p & -x_1y_1 & \cdots & -x_1^qy_1\\ \vdots & & & & & & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^p & -x_ny_n & \cdots & -x_n^qy_n \end{pmatrix},$$

â^(N)' = (â₀^(N), ···, â_p^(N)), b̂^(N)' = (b̂₁^(N), ···, b̂_q^(N)), and Δ̂_N⁻¹ is a diagonal matrix with typical element (1 + b̂₁^(N)x_k + ··· + b̂_q^(N)x_k^q)⁻², k = 1, 2, ···, n. The estimated asymptotic covariance matrix of the coefficient estimates, â*, b̂*, is (T₀'Δ̂*⁻¹T₀)⁻¹δ̂².
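A sketch of the scheme for the simplest case, p = 0 and q = 1, so that y = a₀/(1 + b₁x) + ε (Python; the starting values and the fixed number of sweeps are ours):

    import numpy as np

    def fit_rational(x, y, n_iter=25):
        a0, b1 = y.mean(), 0.0                  # crude starting values
        for _ in range(n_iter):
            T0 = np.column_stack([np.ones_like(x), -x * y])   # columns for a0, b1
            w = (1.0 + b1 * x) ** -2            # diagonal of Delta^{-1}, cf. (6.72)
            G = T0.T @ (w[:, None] * T0)
            a0, b1 = np.linalg.solve(G, T0.T @ (w * y))
        return a0, b1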
If we proceed as in (6.51), the recurrence relationship (6.72) is used in obtaining solutions to

(â', b̂')' = h(b̂) .  (6.73)

But (6.73) is a set of p+q+1 equations in q unknowns, so that attempts to find the approximate variance of â* and b̂* by the methods of Section 6.5 are futile.
Suppose, however, that the model (6.70) is altered so as to take the form

y = A_p(x)/[B_q(x) + A_p(x)] + ε ,  (6.74)

under the same assumptions following (6.70). Multiplying through (6.74) by B_q(x) + A_p(x) and rearranging terms, we have

y = (1 − y) A_p(x) − [B_q(x) − 1] y + [B_q(x) + A_p(x)] ε ,  (6.75)

so that, again, one may proceed with the weighted least squares iteration scheme in estimating the a's and b's. Instead of (6.72), we have the new vector estimate (assuming convergence)

$$\begin{pmatrix}\hat a^{**}\\ \hat b^{**}\end{pmatrix} = (T_0'\hat\Delta^{**-1}T_0)^{-1}T_0'\hat\Delta^{**-1}y ,\qquad (6.76)$$

where T₀ now has typical row [(1−y_k), (1−y_k)x_k, ···, (1−y_k)x_k^p, −x_ky_k, ···, −x_k^qy_k], and Δ̂**⁻¹ is a diagonal matrix with typical element

(1 + b̂₁**x_k + ··· + b̂_q**x_k^q + â₀** + ··· + â_p**x_k^p)⁻² , k = 1, 2, ···, n.
Now, as in (6.51), the iteration scheme is used to obtain the solutions to (â**', b̂**')' = h(â**, b̂**) under the model (6.74), and in precisely the same manner as in Section 6.5 it is seen that

$$\operatorname{var}\begin{pmatrix}\hat a^{**}\\ \hat b^{**}\end{pmatrix} \approx \Bigl[I + \bigl(\partial h_\alpha/\partial(a',b')'\bigr)\Bigr](T_0'\Delta^{**-1}T_0)^{-1}\Bigl[I + \bigl(\partial h_\alpha/\partial(a',b')'\bigr)\Bigr]'\,\delta^2 ,\qquad (6.77)$$

α = 1, 2, ···, p+q+1. Then (6.77) is estimated by setting δ² = δ̂² and evaluating (â, b̂) at (â**, b̂**).

By setting p = 0 and q = 1 in (6.70), there results the rectangular hyperbola, y = [a₀/(1+b₁x)] + ε, with asymptotes at x = −(1/b₁) and y = 0. But under the model (6.70), it is difficult to measure the worth of those estimates obtained through iteration, other than by using the asymptotic covariance matrix.
If, instead, we refer to model (6.74) and set p = 0 and q = 1, then y = [a₀/(b₀+b₁x)] + ε, where b₀ = a₀ + 1 and the asymptotes exist at x = −(b₀/b₁) and y = 0. In this case, different estimates will, most likely, be obtained, but one will at least be able to gain a better measure of their variation in small samples. Note that the disadvantage of using the model (6.74) instead of (6.70) is that additional parameters are introduced, thus possibly increasing the variability of the iterated estimate. Hence, while the gain in using (6.74) is possibly a more realistic covariance matrix for the estimates, the loss may be a larger variability associated with the estimates due to the additional parameters. It should also be noted that another alternative is to generate small sample estimates of the parameters by omitting single observations, pairs of observations, and so forth. In this way, estimates of the small sample dispersion for the parameter estimates are available. We have chosen another course in attempting to find an approximate expression for small sample dispersion.
7.0 SOME UNCONSTRAINED EXTREME VALUE PROBLEMS

7.1 The Estimation of Stationary Points of Linear Functions of Linear Regressions

Referring to the p columns of X in (6.13), let x̊ᵢ = xᵢ − x̄ᵢ (x̄ᵢ = Σ_{k=1}^n x_{ik}/n), where x̊ᵢ is a potentially observable value in the range min_k x_{ik} < x̊ᵢ < max_k x_{ik}, as opposed to x_{ik}, which has already been observed. Let Z' = [Z₁, ···, Z_t, ···, Z_h] be the h dimensional vector (t = 1, 2, ···, h) including those x's in x̊' = (x̊₁, ···, x̊ᵢ, ···, x̊_p) which are functionally independent; e.g., if x̊' = (x₁, x₂, x₁x₂, x₁², x₂²), then Z' = (Z₁, Z₂) = (x₁, x₂).

If an estimated linear function of interest, say,

φ̂(ŷ) = ℓ'ŷ ,  (7.1)

is differentiable, where ℓ' = (ℓ₁, ℓ₂, ···, ℓ_q) is a non-null vector of preselected constants and the ŷ_j taken in linear combination are in the same unit of measure, then by taking partial derivatives of φ̂ with respect to the Z's and solving the h equations,

∂φ̂(ŷ)/∂Z = 0 ,  (7.2)

simultaneously, we obtain estimates, say Ẑ, of those Z values for which φ[E(y)] attains stationary values, assuming the existence of at least one stationary point.
7.2 A Confidence Region for E(Ẑ) When φ(y) Is a Second-order Linear Function of the Z's

At this point and for the remainder of Chapter 7, we shall restrict φ(y) to the role of a second-order linear function of the Z's. With Ẑ denoting the Z value for which φ̂(ŷ) attains its stationary value, let E(Ẑ) be the Z value for which φ[E(y)] attains its stationary point. But E(y) is unknown, so that a natural consequence is to determine an h dimensional confidence region for E(Ẑ). Such a region is useful in situations where, say, the achievement of Ẑ is difficult, making a range or region of Z values desirable.

One approach to its attainment is to define the confidence region as that region in the h dimensional Z space such that, with probability 1−α, the null hypothesis,

H₀: ∂φ[E(y)]/∂Z = 0 ,  (7.3)

is not rejected. Consequently, the problem is one of finding the region of Z values which is in concordance with the null hypothesis.
If X_j ≠ X for some j, then ∂φ̂(ŷ)/∂Z is of the form

$$\frac{\partial\hat\varphi(\hat y)}{\partial Z} = \begin{pmatrix} \ell_1\hat\tau_{11}' & \ell_2\hat\tau_{12}' & \cdots & \ell_q\hat\tau_{1q}'\\ \vdots & & & \vdots\\ \ell_1\hat\tau_{h1}' & \ell_2\hat\tau_{h2}' & \cdots & \ell_q\hat\tau_{hq}' \end{pmatrix} \begin{pmatrix}\hat b_1\\ \vdots\\ \hat b_q\end{pmatrix}$$

or

∂φ̂(ŷ)/∂Z = Ĉb̂ ,  (7.4)

where C is of order h × Σ_{j=1}^q p_j and b̂ is of order (Σ_{j=1}^q p_j) × 1. For example, if q = 2; x' = (x₁, x₂, x₃, x₁x₂, x₁x₃, x₂x₃, x₁², x₂², x₃²); y₁ is regressed on x' and y₂ on (x₁, x₂, x₁x₂, x₁², x₂²); then Z' = (x₁, x₂, x₃); t = 1, 2, 3 = h; ℓ₁τ̂₁₁' = ℓ₁(1, 0, 0, x₂, x₃, 0, 2x₁, 0, 0); ℓ₂τ̂₁₂' = ℓ₂(1, 0, x₂, 2x₁, 0); and so forth. Under (7.3),

∂φ̂(ŷ)/∂Z : N(0, CVC'δ²) ,  (7.5)

where from (6.27), V = [X'(I*W)⁻¹X]⁻¹ if W is known and, for W unknown, V is the coefficient matrix of δ² given in (6.66).
If δ² is known, then

[∂φ̂(ŷ)/∂Z]'(CVC'δ²)⁻¹[∂φ̂(ŷ)/∂Z]  (7.6)

is distributed as χ² with h degrees of freedom under (7.3). Since a quadratic form may be written

$$r'A^{-1}r = -\begin{vmatrix}0 & r'\\ r & A\end{vmatrix}\Big/|A| ,\qquad (7.7)$$

we may write (7.6) as a determinantal equation; the boundary of the region within which (7.3) is not rejected is

$$\begin{vmatrix}0 & [\partial\hat\varphi(\hat y)/\partial Z]'\\ \partial\hat\varphi(\hat y)/\partial Z & CVC'\end{vmatrix} + \chi_h^2(\alpha)\,\delta^2\,|CVC'| = 0 ,\qquad (7.8)$$

where χ²_h(α) is the upper α percentage point of the χ² distribution with h degrees of freedom. The (h+1)×(h+1) determinant in (7.8) is a function of the Z's which defines an exact confidence region for E(Ẑ). This, then, is the region of all Z values such that, with probability 1−α, the null hypothesis in (7.3) is not rejected.
If δ² is unknown, then

[∂φ̂(ŷ)/∂Z]'(CVC')⁻¹[∂φ̂(ŷ)/∂Z] / (hδ̂²) = F ,  (7.9)

where F follows the F distribution with h and nq − (Σ_{j=1}^q p_j + q) degrees of freedom. In this case, the exact confidence region for E(Ẑ) is determined from

$$\begin{vmatrix}0 & [\partial\hat\varphi(\hat y)/\partial Z]'\\ \partial\hat\varphi(\hat y)/\partial Z & CVC'\end{vmatrix} + h\hat\delta^2 F_\alpha\bigl[h,\; nq-(\textstyle\sum_j p_j+q)\bigr]\,|CVC'| = 0 .\qquad (7.10)$$

When X_j = X for all j, i.e., all the ŷ_j are second-order regressions in the same Z's,
$$\frac{\partial\hat\varphi(\hat y)}{\partial Z} = \begin{pmatrix} \ell_1\hat{\tilde\tau}_1' & \ell_2\hat{\tilde\tau}_1' & \cdots & \ell_q\hat{\tilde\tau}_1'\\ \vdots & & & \vdots\\ \ell_1\hat{\tilde\tau}_h' & \ell_2\hat{\tilde\tau}_h' & \cdots & \ell_q\hat{\tilde\tau}_h' \end{pmatrix} \begin{pmatrix}\hat b_1\\ \vdots\\ \hat b_q\end{pmatrix}$$

or ∂φ̂(ŷ)/∂Z = C̃b̂, and CVC' becomes

$$\tilde C V\tilde C' = \begin{pmatrix} \tilde\tau_1'(X'X)^{-1}\tilde\tau_1 & \cdots & \tilde\tau_1'(X'X)^{-1}\tilde\tau_h\\ \vdots & & \vdots\\ \tilde\tau_h'(X'X)^{-1}\tilde\tau_1 & \cdots & \tilde\tau_h'(X'X)^{-1}\tilde\tau_h \end{pmatrix}\,\ell'\Sigma\ell = D\,\ell'\Sigma\ell \;(\text{say}),\qquad (7.11)$$

where from (6.28), var b̂ = (X'X)⁻¹ * Σ. From (6.34) and (6.10) we have Σ_{j=1}^q ℓ_j ê_j : N(0, (I−U) ℓ'Σℓ), where U = T(T'T)⁻¹T'. And since I−U is idempotent, it necessarily follows that
(Σ_{j=1}^q ℓ_j ê_j)'(Σ_{j=1}^q ℓ_j ê_j) / ℓ'Σℓ = ℓ'(ê_j'ê_{j'})ℓ / ℓ'Σℓ = (n−p−1) ℓ'Σ̂ℓ / ℓ'Σℓ

is distributed as χ² with n−p−1 degrees of freedom. Hence,

[∂φ̂(ŷ)/∂Z]' D⁻¹ [∂φ̂(ŷ)/∂Z] / (h ℓ'Σ̂ℓ) = F  (7.12)

is distributed as F with h and n−p−1 degrees of freedom, and the exact confidence region for E(Ẑ) is formed from

$$\begin{vmatrix}0 & [\partial\hat\varphi(\hat y)/\partial Z]'\\ \partial\hat\varphi(\hat y)/\partial Z & D\end{vmatrix} + h\,\ell'\hat\Sigma\ell\,F_\alpha(h,\,n-p-1)\,|D| = 0 .\qquad (7.13)$$

The determinant in (7.13) is precisely the confidence region derived by Box and Hunter [4], if ℓ'y is regressed on x. This is so because the equation resulting from the latter and (7.13) are identical, as was noted earlier. However, by dealing with linear functions of regressions rather than one regression of a linear combination of the observations on x, it may be possible to explain and/or remedy irregularities in the confidence region by omitting certain ŷ_j. Irregularities in the confidence region will be discussed in the next section.
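In practice the region may also be traced numerically. The following sketch (Python, assuming scipy is available; grad_phi, D, and ℓ'Σ̂ℓ are taken as already computed from the fitted regressions) scans a grid of Z values and retains those at which the F statistic of (7.12) does not exceed its critical value:

    import numpy as np
    from scipy.stats import f as f_dist

    def confidence_region(grid, grad_phi, D, lSl, h, df2, alpha=0.05):
        """grid: iterable of Z points; grad_phi(Z) returns the h x 1 gradient."""
        crit = f_dist.ppf(1 - alpha, h, df2)
        keep = []
        for Z in grid:
            g = grad_phi(Z)
            stat = g @ np.linalg.solve(D, g) / (h * lSl)   # cf. (7.12)
            if stat <= crit:
                keep.append(Z)
        return keep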
7.3 Remarks Concerning the Confidence Region

Following the reasoning and terminology of Box and Hunter [4], the region allowed by (7.8), (7.10), or (7.13) will now be discussed.

The size and shape of the confidence region depends not only on the adequacy of the model, φ[E(y)] = ℓ'Bx̊, but also on the nature of the response surface itself. Roughly speaking, the experimental error influences the size of the region, while the nature of the response surface is ascendant with regard to shape. It is evident that a large error mean square stemming from either an inadequate model or insufficient error degrees of freedom will give rise to a large confidence region. If, on the other hand, the experimental error is small and the response surface portrays, say, a stationary ridge, then clearly the confidence region for E(Ẑ) must certainly be attenuated in the direction of the ridge. Further, consider the following situation where the nature of the surface influences the size of the region. If the surface is a nearly symmetrical concave paraboloid with the curvature in the vicinity of the maximum being sufficiently small, then there would exist many values close to the true maximum such that in repeated sampling diverse Ẑ values may occur even with a small experimental error. In such a situation, a large confidence region may result.

At this point we will consider the detailed aspects of the shape of the confidence region as determined by the nature of the surface. An aspect greatly influencing the shape of the region is whether the equations
∂φ̂(ŷ)/∂Z = 0 ,  (7.14)

where ∂φ̂(ŷ)/∂Z is given by (7.4), are well or poorly conditioned. In explanation of this term, we first rewrite φ̂(ŷ) in the well-known form,

φ̂(ŷ) = r'Z + Z'RZ ,  (7.15)

assuming, of course, that φ̂(ŷ) is a second-degree function of all the Z's. Then ∂φ̂(ŷ)/∂Z = r + RZ. The quadratic function, Q, defined by

Q = [∂φ̂(ŷ)/∂Z]'[∂φ̂(ŷ)/∂Z]  (7.16)

is the surface which Box and Hunter call the conditioning surface in the h+1 dimensional space of Q, Z₁, Z₂, ···, Z_h. For |R| ≠ 0 and for fixed values of Q, (7.16) defines hyperellipsoids in the Z space. Neglected in (7.16) is the covariance matrix of [∂φ̂(ŷ)/∂Z], which is usually placed "between" the product vectors so that well-known distributions are applicable. Clearly, any extreme elongations in any of the h dimensions of the hyperellipsoid defined by fixing Q must induce the same effect when the covariance matrix is accounted for. Hence, an examination of (7.16) allows a critical study of the confidence region apart from any lack of fit of one or more of the regressions taken in linear combination.
Suppose now we refer the hyperellipsoids to their principal axes, say, Z₁°, Z₂°, ···, Z_h°. First translate the center of the Z axes to Ẑ, the center of the hyperellipsoids, so that Q = (Z−Ẑ)'R'R(Z−Ẑ); then apply the orthogonal transformation which transforms the above quadratic into

Q = Z°'ΘZ° = Σ_{t=1}^h θ_t Z_t°² ,

where the θ_t are the h characteristic roots of R'R, Z_t° = u_t'(Z−Ẑ), and u_t is the characteristic vector corresponding to the root θ_t.

Notice now that if one or more of the characteristic roots, say θ₁, ···, θ_s, s < h, is small compared with the remainder of the roots, then the conditioning surfaces are attenuated in the direction of the axes corresponding to the smaller roots. As such, it is possible for values of Z which differ greatly from the correct solution to correspond to points near a hyperplane passing through the axes of Z₁°, Z₂°, ···, Z_s° such that there result small values of Q and hence of ∂φ̂(ŷ)/∂Z. From this it is seen that there will exist many "nearly correct" solutions to the equations (7.2). In this case, Box and Hunter call the equations (7.14) ill or poorly conditioned, as opposed to well conditioned equations where all the roots of R'R are equal or nearly so.

Turing [33] proposed

g(R'R) = h ( Σ_{t=1}^h θ_t · Σ_{t=1}^h θ_t⁻¹ )^{−1/2}

as a criterion for conditioning of equations. Clearly, if g(R'R) = 1, then θ_t = θ for all t (well conditioned equations). On the other hand, if some of the θ_t are small compared with the remainder of the roots, then g(R'R) is small, implying poorly conditioned equations.
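Turing's criterion is immediate to compute from the fitted R. A minimal sketch (Python), assuming the roots of R'R are all nonzero:

    import numpy as np

    def g_condition(R):
        """g(R'R) = h / sqrt(sum of roots * sum of reciprocal roots); 1 iff equal."""
        theta = np.linalg.eigvalsh(R.T @ R)    # characteristic roots of R'R
        return len(theta) / np.sqrt(theta.sum() * (1.0 / theta).sum())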
If φ̂(ŷ) is a linear combination of second-order regression functions in the same Z's, then ŷ_j = â_j'Z + Z'Â_jZ and

φ̂(ŷ) = Σ_{j=1}^q ℓ_j ŷ_j = Σ_{j=1}^q ℓ_j (â_j'Z + Z'Â_jZ) ,  (7.17)

so that

R = Σ_{j=1}^q ℓ_j A_j .  (7.18)

Noting that the roots of R'R are the squares of the roots of R, with the characteristic vectors being the same, it is easily seen that the nature of the response surface as written in (7.15) or (7.17) reflects the conditioning of the equations in (7.14). Under (7.18), if g(R), and consequently g(R'R), is small, then the ill conditioning is due either

(i) to the inherent nature of the surface, φ̂(ŷ), even though each set of equations ∂ŷ_j/∂Z = 0, j = 1, 2, ···, q, is well conditioned; i.e., a linear combination of well conditioned equations may yield poorly conditioned equations;

(ii) or to one or more of the sets of equations ∂ŷ_j/∂Z = 0 being ill conditioned such that when taken
in linear combination, the resultant equations in (7.14) are ill conditioned; i.e., a linear combination of well and poorly conditioned equations may result in poorly conditioned equations. On the other hand, a linear combination of poorly conditioned equations may produce well conditioned equations. In case (ii), g(R) would be small and the diversity of the roots of R = Σℓ_jA_j is due to one or more of the ℓ_jA_j. A recent theorem by Ostrowski [24] allows a study of linear functions of poorly and well conditioned equations (resulting in poorly conditioned equations due to the poorly conditioned equations taken in linear combination) and may possibly direct one to that/those A_j with roots distinctly different from R without laborious computations.

Theorem. Let H = (h_{ij}), H̊ = (h̊_{ij}) be two n×n matrices with ψ_H(θ) = |H − θI| = 0, ψ_H̊(θ) = |H̊ − θI| = 0 being the corresponding characteristic equations of H and H̊, respectively. Denote the roots of ψ_H(θ) and ψ_H̊(θ) by θᵢ and θ̊ᵢ, respectively. Set m = max(|h_{ij}|, |h̊_{ij}|), i, j = 1, 2, ···, n, and (1/nm) Σᵢ Σⱼ |h_{ij} − h̊_{ij}| = d. Then for at least one pair, θᵢ, θ̊ᵢ, the difference |θᵢ − θ̊ᵢ| is bounded above, in terms of m and d, by (7.19), and for all θᵢ, θ̊ᵢ, by (7.20).
Now, let the inclusion of any subscript on R, say R_j, imply that the corresponding A_j is not taken in linear combination. For example, R₁,₂,₃ = Σ_{j=4}^q ℓ_jA_j = R − (ℓ₁A₁ + ℓ₂A₂ + ℓ₃A₃). Denote by R₍₋₎ the exclusion of at least one A_j.

Suppose we find upper bounds on the difference between at least one pair and all pairs of roots of R and R₍₋₎ by applying (7.19) and (7.20), respectively. Suppose further that (7.20) yields a small value (though in examples taken, (7.19) and (7.20) tend to give conservative upper bounds; i.e., they are too large). Then if the roots of R are close, so must be the roots of R₍₋₎. Hence, R and R₍₋₎ both correspond to sets of well conditioned equations. If, on the other hand, the upper bound on the difference between at least one pair of roots of R and R₍₋₎ is large when g(R) is small, then there is the indication that R₍₋₎ corresponds to a set of poorly conditioned equations (though this is not necessarily true).

In another example, suppose the A_j are to be taken in linear combination one at a time, or the inclusion of each A_j is through a stepwise procedure. Starting with A₁ and supposing its roots are close, we then form A₁ + ℓ₂A₂. If (7.19) is small [and (7.20) is relatively small] for the matrices A₁ and A₁ + ℓ₂A₂, then A₂ corresponds to a set of well conditioned equations. Assuming this to be true until the ith step, where (7.19) is relatively small and (7.20) is excessively large for the matrices Σ_{j=1}^{i−1} ℓ_jA_j and Σ_{j=1}^i ℓ_jA_j, then it necessarily follows that ℓᵢAᵢ in Σ_{j=1}^i ℓ_jA_j corresponds to a set of poorly conditioned equations,
since those associated with Σ_{j=1}^{i−1} ℓ_jA_j are well conditioned. In this situation one may possibly omit Aᵢ from consideration after comparing desired analyses including and excluding Aᵢ.
Finally, mention is given to open and closed confidence regions. The θ_t determine the type of conic the second-degree function, φ̂(ŷ), describes. The reader is referred to reference [6] for illustrations of the basic types of contours (for h = 2 and h = 3) arising from fixing φ̂(ŷ). If the contours are closed with the θ_t being of the same sign, and if the plausible variation associated with each coefficient is such that none of the θ_t will change sign, then the confidence region will be closed and will describe a region of Z values giving maximum attainment for φ̂(ŷ). If, on the other hand, the plausible variation associated with the coefficients is such that the θ_t may change sign (so that some are positive and some negative), then open contours result, and an open confidence region may result. The latter may be true for, say, a saddle point in a saddle type surface.
7.4 Intersection Region Confidence Procedures in Finding an Approximate Confidence Region for E(Ẑ)

Consider the null hypothesis given in (7.3). For any non-null, preselected h×1 vector k, H₀ is rejected if

H₀k: k'[∂φ[E(y)]/∂Z] = 0  (7.21)

is rejected. With the customary normality assumption, no mathematical difficulties are encountered in defining confidence regions or intervals. However, with k dependent on the data, only approximate methods were available prior to Scheffé's work in multiple comparisons [29] and Roy's extensive developments in simultaneous confidence bounds [26]. Wallace [40] draws on this theory in simplifying the confidence region defined by (7.10) and (7.13).

The algebra necessary for obtaining the confidence regions in Section 7.2 is, even with two Z variables, cumbersome to the point of possibly curtailing its use. Fully aware of this, Wallace states "what is important is that the region be represented geometrically or analytically so that the user can comprehend its size, shape and location. Approximations to the region which simplify this representation will be valuable so long as they do not greatly change the confidence level" [40, p. 455]. Motivated by Scheffé's method of making all contrasts in the ellipsoidal type confidence region, Wallace approximates nonellipsoidal regions such as (7.10) or (7.13) by parallelepipeds or convex polyhedra.

From (7.5), it follows that for preselected k ≠ 0 under H₀k, k'[∂φ̂(ŷ)/∂Z] : N(0, k'Ak), where A = CVC'δ² under (7.5) and A = D ℓ'Σℓ under (7.11). The set or region of all Z such that H₀k is not rejected is found from

|k'[∂φ̂(ŷ)/∂Z]| / (k'Âk)^{1/2} ≤ t_{α/2}(f) ,  (7.22)

where t_{α/2}(f) is the upper α/2 percentage
point of Student's distribution with degrees of freedom

f = nq − (Σ_{j=1}^q p_j + q) if Â = CVC'δ̂² ; f = n − p − 1 if Â = D ℓ'Σ̂ℓ .

Thus (7.22) implies that

P{ |k'[∂φ̂(ŷ)/∂Z]| ≤ t_{α/2}(f)(k'Âk)^{1/2} } = 1 − α .  (7.23)
Consider the preselected non-null vectors, k₁, k₂, ···, k_t, ···, k_h. With each k_t there is associated a confidence region or interval via

[k_t'(∂φ̂(ŷ)/∂Z)]² / (k_t'Âk_t) ≤ F_α(1, f) .

Suppose that the k_t were selected after an examination of the data. Clearly, then, the probability in (7.23) is not equal to 1−α. In this case, one may attempt to find the exact distribution of the above statistic. However, when k is functionally dependent on the data, attempts in this direction may be futile. Due to a theorem by Scheffé [29], another recourse is available. Here the distribution of

sup_k { [k'(∂φ̂(ŷ)/∂Z)]² / (k'Âk) }  (7.24)

is found; i.e., the distribution of the least upper bound
of the statistic is obtained over all non-null k. Clearly, the main disadvantage of this approach is over-protection, or the possibility of drastic conservatism.

Proceeding in detail, F* is to be found such that

P{ sup_k [(7.24)] ≤ F* } = 1 − α .  (7.25)

It is easily seen (Roy [26], p. 95) that (7.24) is equal to [∂φ̂(ŷ)/∂Z]'Â⁻¹[∂φ̂(ŷ)/∂Z]. But if Â = CVC'δ̂², then from (7.9) it follows that (7.24) is distributed as hF, where F is an F variable with h and nq − (Σ_{j=1}^q p_j + q) degrees of freedom; further, if Â = D ℓ'Σ̂ℓ, then from (7.12) it is seen that (7.24) is distributed as hF, where F is an F variable with h and n−p−1 degrees of freedom. Hence (7.25) ⟺ (7.26), where

P{ [k'(∂φ̂(ŷ)/∂Z)]² ≤ h F_α k'Âk for all non-null k } = 1 − α ;  (7.26)

i.e., (7.25) implies (7.26) and (7.26) implies (7.25), so that (7.25) ⟺ (7.26). The significance of (7.26) is that with each k ≠ 0 defining a confidence region or interval, the probability of the infinity of such intervals or regions being true
simultaneously is 1−α. Note that k is no longer required to be independent of the data. However, if only h non-null k vectors are chosen (resulting in h intervals or regions), the probability of the h statements holding simultaneously is necessarily ≥ 1−α. (See Roy [26], p. 105, or Wallace [40], p. 461.)

Wallace writes each statement of (7.26) as a function h_t(Z) and observes that the latter is a quadratic equation in the Z's which defines a region R(t) "between the sheets of a two-sheeted hyperboloid, the exterior of an ellipse, or limiting and transitional forms of these. The boundary, D_t, can never be one of the one-sheeted hyperboloids." The region formed by the intersection of the h_t(Z), t = 1, 2, ···, h, is defined as the intersection confidence region and is denoted by R(I). Observe that the confidence region is approximate in the sense that P > 1−α with finite h, where P → 1−α as h increases indefinitely.

If, say, h = 2, then from k₁ and k₂ result h₁(Z) and h₂(Z). Letting both quadratic equations define two-sheeted hyperbolas which intersect, an example of the intersection confidence region, R(I), is depicted in Figure 7.1 with transformed axes v₁ and v₂.
Figure 7.1. The intersection confidence region, R(I), for E[(Ẑ₁, Ẑ₂)] for specific k₁, k₂ vectors
Wallace gives three types of approximations to the region R(I). The first approximation region, R(1), is obtained by connecting the 2^h intersection points of the h_t(Z) = 0, assuming there exist 2^h intersection points. Figure 7.2 illustrates R(1) as the area enclosed by lines joining the four intersection points.

Figure 7.2. The R(1) approximation to R(I)
The second approximation region, R(2), for hyperboloids (two-sheeted), and probably the most useful in practical situations, is formed by the intersection of hyperplanes tangent to each of the pairs of surfaces (hyperboloids), D_t, the tangencies occurring at points of intersection of the hyperboloids and the transformed axes. Figure 7.3 depicts R(2) as the closed area formed by the intersection of tangent lines to the pairs of curves, D₁ and D₂.

Figure 7.3. The R(2) approximation to R(I)
Finally, the third type of approximation, or the parallelopiped approximation region, R(3) (again for hyperbolas),
is formed by approximating the pair Dt by hyperplanes parallel
to ~[O~(Y)/bZJ=O.
Figure 7.4 depicts this region in reference
to the other figures.
The reader is referred to Wallace /407
for the detailed mathematical and geometrical aspects of
these regions.
97
1-----------------~Zl
Figure 7.4
The R(3) approximation to R(I)
For purposes of simplifying the formulation of R(1), R(2), and R(3), new coordinates, v₁, v₂, ···, v_h, are defined by v_t = k_t'[∂φ̂(ŷ)/∂Z] in the Z space for specific choices of k₁, k₂, ···, k_h. Then each statement of (7.26) becomes a quadratic, H_t(v), in the v_t. Further, letting P_t be the hyperplane v_t = 0, then "the pair of points lying on D_t and every P_{t'}, t ≠ t', have all v coordinates but the tth zero, and that one given by the roots (r_t' < r_t'') of the quadratic, H_t(0, ···, 0, v_t, ···, 0) = 0." If r_t' < 0 < r_t'', then the equation of the tangent hyperplane approximation, R(2), to the part of D_t with v_t > 0 is

Σ_{t'≠t} v_{t'} [∂H_t(v)/∂v_{t'}]|_{v=(0,···,r_t'',···,0)} + (v_t − r_t'') [∂H_t(v)/∂v_t]|_{v=(0,···,r_t'',···,0)} = 0

(see Taylor [32], p. 151). Thus, with H_t(v) = 0 given, R(2) is easily formed. It follows that R(3) is given as r_t' ≤ v_t ≤ r_t'', t = 1, ···, h. From this point, one may transform the inequalities back to the Z's.
With regard to the choice of k₁, ···, k_h, if k_t = u_t/(θ_t)^{1/2}, where u_t and θ_t were previously defined as the characteristic vectors and roots of R'R [or u_t and θ_t^{1/2} are the vectors and roots of R, R being defined in (7.14)], then

v_t = [u_t'/(θ_t)^{1/2}][∂φ̂(ŷ)/∂Z] = [u_t'/(θ_t)^{1/2}](r + RZ) = [u_t'r/(θ_t)^{1/2}] + u_t'Z = u_t'(Z − Ẑ) ,

the last step following by equating ∂φ̂(ŷ)/∂Z to zero, which gives r = −RẐ, so that u_t'r/(θ_t)^{1/2} = −u_t'RẐ/(θ_t)^{1/2} = −u_t'Ẑ. But from Section 7.3, Z_t° = u_t'(Z − Ẑ). Hence the transformation to the new coordinate v system is precisely the same as referring φ̂(ŷ) to canonical form (or is the same as referring the hyperellipses, Q = [∂φ̂(ŷ)/∂Z]'[∂φ̂(ŷ)/∂Z], to their principal axes). Then from the canonical form of φ̂(ŷ), one can proceed to the formulation of H_t(v) = H_t(Z°), t = 1, 2, ···, h, and the various approximate confidence regions.
8.0 SOME PROBLEMS OF CONSTRAINED EXTREMA

8.1 Uses of Lagrangian Multipliers in a System of Linear Regression Equations

When linear functions of the ŷ_j were first considered in Section 7.1, it was assumed that all ŷ_j taken in linear combination (for purposes, say, of maximization or minimization) were in the same or comparable units of measure. Developments under this assumption are restrictive since, even in the present problem, toxoplasma exudate yields three characteristic response types in two units of measure: volume and number. Maximizing (ŷ_T − ŷ_L) + ŷ_V is inappropriate, as Ẑ is not invariant under transformation of scale unless ŷ_T − ŷ_L and ŷ_V attain maximums for the same Z value, or approximately so, in which case no further developments are necessary.
The intent in this chapter is to consider extremal relationships between response types in different units of measure.
This concept is best exemplified with the toxoplasma problem.
Suppose that YT-YL and Yv can be described adequately by
~
A
A
concave parabolic surfaces with YT-YL and Yv attaining maxiA
A
mums for distinctly different Z values, say, ZT-L and Zv.
If
that amount and dilution of inoculum which maximizes YT-YL
:
also produces an unacceptably low volume of exudate, and conversely, if that amount and dilution of inoculum which maxiA
mizes Yv also produces an unacceptably small difference in opposing organisms, then the bacteriologist is forced to compromise, assuming that the inoculation of a greater number of
100
mice is unfeasible.
Compromise values of Z, estimated by,
/\
say Zfi' fall in the range
!.~.,
compromise values of Zl stem from the interval
and those of Z2 from
since YT-~L and ~v describe concave parabolic surfaces.
A
compromise implies the acceptance of a lower volume of exudate
and a smaller difference between opposing organisms.
The
sacrifice inherent in such a compromise is, in the former case,
fewer sera determinations, and, in the latter, a lower precision in the LD 50.
One approach to a proper selection among the Ẑ_β is as follows. Between the functions ŷ_V and ŷ_T − ŷ_L, one must be chosen as the constraining function and the other as the function to be maximized. If both functions describe concave paraboloids, the choice is somewhat arbitrary. If not, then the function which lends itself to maximization most conveniently within the range of interest becomes (possibly) the function to be maximized. Let ŷ_T − ŷ_L and ŷ_V take the roles of maximizing function and constraining function, respectively. Select a lower bound for the acceptable exudate volumes, say,

ŷ_V = C_β ≥ C_M ,  (8.1)

where C_M is the lowest acceptable volume of exudate.
With the surfaces ŷ_T − ŷ_L and ŷ_V being concave paraboloids, Ẑ_β is that Z value such that ŷ_T − ŷ_L is maximum for Z on the boundary of ŷ_V = C_β. Consequently, by varying C_β under (8.1), the set

{Ẑ_β}  (8.2)

includes all possible compromise solutions. From physical considerations,

Ẑ₀  (8.3)

is selected and defined to be the optimal solution, with ŷ_V = C₀ and max(ŷ_T − ŷ_L | C₀) being the optimal values of ŷ_V and ŷ_T − ŷ_L, and Ẑ₀ the optimal Z vector. For example, the selection may possibly involve associating the precision of the LD 50 with values of max(ŷ_T − ŷ_L | C_β). An acceptable precision (specified from the risk of deciding that a serum is negative when in fact it is positive) would then determine the optimal solution.

The process is depicted in Figure 8.1. The closed contours surrounding the point Q₄, which corresponds to Ẑ_V, stem from dropping the boundaries formed by the intersection of the paraboloid ŷ_V and the planes ŷ_V = C_β onto the Z axes. The maximum values of ŷ_T − ŷ_L for Z on the boundaries of ŷ_V = C_β are on the curve Q₁Q₂, where the Z vectors defined by Q₁Q₃ comprise the entire set {Ẑ_β}. Max(ŷ_T − ŷ_L | C_M) is denoted by point Q₁, which represents the largest difference in opposing
Figure 8.1. A geometric example of compromise solutions and the optimal solution
organisms at the lowest acceptable volume of exudate. After a study of (8.2) and through other considerations, (8.3) is selected as the optimal solution where, say, Q' = max(ŷ_T − ŷ_L | C₀). From the above discussion, it is clear that an expedient method of determining values of Ẑ_β is by the use of Lagrangian multipliers, since one function is maximized subject to a varied constraint.
Proceeding with a general formulation of the problem, let each of the q response types be in one of t (t < q) units of measure. Let φ*(ŷ), a linear combination of ŷ's in one unit of measure, be the function to be maximized. Let φ_γ(ŷ) be a function of those ŷ's in unit of measure γ, γ = 1, 2, ···, t−1, where the φ_γ will assume the role of constraining functions. If

ŷ_j = f̂_j(x̊), j = 1, 2, ···, q,  (8.4)

then φ_γ(ŷ) = φ_γ[f̂(x̊)] = L_γ(Z) and φ*(ŷ) = L*(Z); e.g., ŷ_T − ŷ_L = L̂*(Z) = Z'r̂₁ + Z'R̂₁Z and ŷ_V = L̂₁(Z) = Z'r̂₂ + Z'R̂₂Z, where L is estimated by L̂.

Assume that L* and L_γ(Z) − C_{βγ} ≥ 0 are regular functions for Z ∈ K₁, an h dimensional region in the Z space, where the C's are constants such that

C_{β1} ≥ C_{M1} , ···, C_{β,t−1} ≥ C_{M,t−1} ,

or, for brevity, C_β ≥ C_M.  (8.5)

Further assume that h > t−1; i.e., the number of constraints is less than the dimensionality of Z. Denote by K_β the region defined by

∩_{γ=1}^{t−1} { Z : L_γ(Z) − C_{βγ} = 0 } ;  (8.6)

K_β is that region, or the ellipse in Figure 8.1, formed by the boundary of ŷ_V = C_β.
To determine the extremal positions of L* subject to the constraints L_γ(Z) − C_{βγ} ≥ 0, we use the Lagrangian form

ψ(Z, λ_β) = L*(Z) − Σ_{γ=1}^{t−1} λ_{βγ} [L_γ(Z) − C_{βγ}] ,  (8.7)

where λ_β' = (λ_{β1}, ···, λ_{βγ}, ···, λ_{β,t−1}) is a vector of Lagrangian multipliers, take partial derivatives of ψ with respect to Z and λ_β, then equate the h + (t−1) expressions ∂ψ/∂Z and ∂ψ/∂λ_β to zero; i.e.,

∂ψ/∂Z = ∂L*/∂Z − Σ_{γ=1}^{t−1} λ_{βγ} (∂L_γ/∂Z) = 0 (h×1) ,  (8.8)

∂ψ/∂λ_β = −[L_t(Z) − C_β] = 0 ((t−1)×1) ,  (8.9)

where L_t'(Z) = [L₁(Z), ···, L_γ(Z), ···, L_{t−1}(Z)].
Solving equations (8.8) and (8.9) simultaneously, we find among the Z vectors on the boundary of K_β (or rather, that part of the boundary of K_β in K₁) those roots (vectors) for which L*(Z) is extremal. Denote these s_β vectors by

Ẑ_{(β)} = (Ẑ₁, Ẑ₂, ···, Ẑ_{s_β}) (say).  (8.10)

Ideally, we would like K_M, defined by putting β = M in (8.6), to be a convex region such that K_β ⊂ K_M for all β ≠ M under (8.5), for then the problem of estimating those Z for which L* is optimal is narrowed to a certain bounded locale of the Z space, namely, on and within K_M. For example, in Figure 8.1, K_M is the region formed by ŷ_V = C_M, with K_β ⊂ K_M under (8.6) for β ≠ M. Certainly one may proceed with solving (8.8) and (8.9) for, say, K_M as the outside and boundary of a hyperellipse (so that K_M is closed); however, the choice of which pattern of C's to choose under (8.5) [for substitution into (8.9)] such that a representative set of compromise solutions is found may be difficult.
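For a quadratic L* and a single quadratic constraint, (8.8) and (8.9) can be solved by scanning the multiplier. A minimal sketch (Python; the multiplier grid and all matrices are illustrative), writing L*(Z) = r₁'Z + Z'R₁Z and L₁(Z) = r₂'Z + Z'R₂Z with R₁, R₂ symmetric, so that (8.8) reads (2R₁ − 2λR₂)Z = λr₂ − r₁:

    import numpy as np

    def compromise(r1, R1, r2, R2, C, lams):
        best = None
        for lam in lams:
            try:
                Z = np.linalg.solve(2*R1 - 2*lam*R2, lam*r2 - r1)   # cf. (8.8)
            except np.linalg.LinAlgError:
                continue
            gap = abs(r2 @ Z + Z @ R2 @ Z - C)     # distance from boundary, cf. (8.9)
            if best is None or gap < best[0]:
                best = (gap, lam, Z)
        return best   # (constraint residual, lambda, Z_beta)

Repeating the search over a range of C values generates the compromise set (8.2).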
If the L_γ(Z) are concave functions, then each forms a convex region in the Z space. Since the intersection of a finite number of convex regions is convex, K_M is convex, and it follows that K_β ⊂ K_M for all β under (8.5). Then extremal values of L* for Z on and within K_M are estimated by varying C_β; clearly, the infinity of regions K_β "fill" the region of K₁ within K_M. Then through the variation of C_β under (8.5), the set of solutions,

{ s_β, Ẑ_{(β)} } ,  (8.11)
is generated which, upon substituting each Ẑ into L̂*, gives the extremal values of L̂*

(i) for Z ∈ K_M under the assumptions of the last paragraph,

(ii) for Z on the boundary of K_β, otherwise.

The set,

{ L̂*(Ẑ_{(β)}) } ,  (8.12)

includes all compromise solutions, with only one, say,

L̂*(Ẑ₀) ,  (8.13)

being the optimal solution. The reader is referred to Hancock [39], Chapter 6 (particularly pp. 115-116), for determining whether L* attains a maximum or minimum for any particular vector of the set (8.11).

The general aim of the literature dealing with problems of constrained extrema is that of finding the Z vector, say Ẑ₊, in (8.11) such that L*(Ẑ₊) is extremal over all Z ∈ K_M (for example, Ẑ₊ is the vector which gives rise to Q₁ in Figure 8.1), or such that L*(Z) is extremal on the boundary of K_β. Oftentimes equations (8.8) and (8.9) are considered in a different light so as to render general theories applicable. For example, the method of steepest descent [41] is one by which Ẑ₊ is sought among those Z vectors minimizing
(8.14)
107
the solution being found by iterating along the normal to
surfaces of constant
rV
y.
In those problems where L* is a quadratic function of
the Z's, sufficient conditions for the existence of a Ẑ_+
(for which L* is supremum) are given in a theorem by Frank
and Wolfe [9]: if L*(Z) is a quadratic function and is
bounded on a polyhedral convex region, X_M, then L* achieves
its supremum on X_M.
If, further, L* is concave and the conditions
of the Kuhn-Tucker theorem [39] are met with regard
to the constraints, then the constrained extrema problem
falls into the realm of concave programming. In this case,
solving for Ẑ_+ reduces to estimating the saddle point of a
Lagrangian form [39].
In the concave programming situation, Ẑ_+ is termed the
optimal vector among the set of vectors (8.11). This would
necessarily lead to L*(Ẑ_+) as the optimal value of L*.
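A sketch of the Frank-Wolfe idea for an illustrative concave quadratic over a polyhedral region (the objective, region, and constants are hypothetical): each step solves a linear program over X_M in the direction of the current gradient and moves part way toward its solution.

import numpy as np
from scipy.optimize import linprog

# Maximize the concave quadratic L*(Z) = -|Z - m|^2 over the polyhedral
# region X_M = {Z : A Z <= b, Z >= 0}; all data are illustrative.
m = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0]]); b = np.array([3.0])

Z = np.zeros(2)
for k in range(50):
    g = -2.0 * (Z - m)                  # gradient of L* at Z
    # linear subproblem: maximize g'S over X_M (linprog minimizes -g'S)
    lp = linprog(-g, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
    Z = Z + (2.0 / (k + 2.0)) * (lp.x - Z)   # classical step size
print("Z-hat_+ =", Z, " L* =", -np.sum((Z - m) ** 2))   # about (1, 2)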
Though our definition and the econometric definition of optimality may coincide, they are in general different. With
optimality as defined earlier in this section, one is not
led to L*(Ẑ) [or to L̂*(Ẑ) if L* is estimated] directly;
rather, a selection process must take place with respect
to (8.12) after its determination. The selection may be influenced by, say, considerations of economy, expediency, or
some combination thereof.
For instances where L* or one or more of the L_γ are estimated by least squares, the writer is unaware of any publication
using this method of solution except for one application by
Umland and Smith [38].
8.2 Errors in the Optimal Solution
Errors in any of the entities of the optimal solution
(8.13) will tend to be more serious than those associated with
the econometric optimal solution, since the true value of
(8.13) [or the expected value of (8.13) if L* and some or all
of the L_γ are unknown and estimated, say, jointly by weighted
least squares] may not have been chosen as the optimal solution, whereas the econometric optimal solution entails no
selection process.
If L* and the Ly are known in exact form, then the only
error entering in (8.13) is through the estimation of the roots
of (8.8) and (8.9) when the latter describe a system of nonlinear equations.
To illustrate this error, suppose we were to estimate the
roots of the nonlinear system (8.8) and (8.9) directly, by
some iterative process, without reworking them into the framework given by (8.14). Suppose further that equations (8.8) and
(8.9) can be rewritten in the form

p = F(p) ,   (8.15)

where p' = (Z', \Lambda_\beta') is of order (h + t - 1) \times 1 and
F'(p) = [F_1(p), \ldots, F_\tau(p), \ldots, F_{h+t-1}(p)]. Then, in iterating
for each root of (8.15), if p^{(0)} is the initial estimate, there
result p^{(1)}, p^{(2)}, \ldots, p^{(N)}, \ldots by means of the recurrence
relation

p^{(N+1)} = F(p^{(N)}) .   (8.16)
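For a scalar illustration of the recurrence (8.16) (the equation and its root here are hypothetical, chosen only for display), each error is roughly a constant multiple of the previous one, which is convergence of the first order in the sense defined below:

import numpy as np

# Fixed-point iteration p(N+1) = F(p(N)) as in (8.16), for the scalar
# equation p = cos(p), whose root is about 0.7390851.
p, root = 0.0, 0.7390851332151607
for N in range(8):
    p = np.cos(p)                                # the recurrence (8.16)
    print("N =", N + 1, " error =", abs(p - root))
# successive errors shrink by a factor near |F'(root)| = sin(root) ~ 0.67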
Analogous to (6.52), let

p^{(N)} = \Pi + \delta^{(N)} ,   (8.17)

where \Pi is a root of (8.15) and \delta^{(N)} is the error of the Nth
iterate. Substituting (8.17) into (8.16) and expanding the
right-hand side in a Taylor's series around \Pi results in

\delta_\tau^{(N+1)} = \sum_{r=1}^{\infty} \frac{1}{r!} \left[ \delta^{(N)\prime} \frac{\partial}{\partial \Pi} \right]^r F_\tau \Big|_{p = \Pi} ,   (8.18)

\tau = 1, 2, \ldots, h + t - 1, where the bracketed operator is applied
r times and then evaluated at p = \Pi.
If the terms \left[ \delta^{(N)\prime} (\partial/\partial \Pi) \right]^r F_\tau in (8.18) vanish up to and
including r - 1 for all \tau, then Hartree [14] defines the iterative process "convergent to order r."
Bodmer [3] has shown
that the generalized Newton-Raphson process is convergent
to the second order; i.e., the matrix \left[ \partial F_\tau / \partial p_{\tau'} \right],
\tau, \tau' = 1, 2, \ldots, h + t - 1, vanishes at p = \Pi when F(p) is defined to be

F(p) = p - \left[ \frac{\partial \Psi'}{\partial p} \right]^{-1} \Psi(p) ,   (8.19)

where \Psi'(p) = \left[ (\partial W/\partial Z)', \; (\partial W/\partial \Lambda_\beta)' \right] as in (8.8) and (8.9). Note
that, from (8.19), the equations in (8.15) are equivalent to
(8.8) and (8.9).
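A minimal sketch of this recurrence for a hypothetical nonlinear version of (8.8)-(8.9) (quadratic objective, elliptical boundary as in Figure 8.1; all constants illustrative) displays the second-order behavior:

import numpy as np

# Newton-Raphson recurrence p(N+1) = p(N) - [d(psi)'/dp]^(-1) psi(p(N)),
# for the stationarity system of L*(Z) = -(z1-1)^2 - (z2-2)^2 subject to
# the elliptical boundary z1^2 + z2^2 = 1, with p' = (Z', lambda).
def psi(p):
    z1, z2, lam = p
    return np.array([-2*(z1 - 1) - 2*lam*z1,
                     -2*(z2 - 2) - 2*lam*z2,
                     z1**2 + z2**2 - 1.0])

def dpsi(p):
    z1, z2, lam = p
    return np.array([[-2 - 2*lam, 0.0,        -2*z1],
                     [0.0,        -2 - 2*lam, -2*z2],
                     [2*z1,       2*z2,        0.0]])

p = np.array([1.0, 1.0, 0.0])                    # initial estimate p(0)
for N in range(6):
    p = p - np.linalg.solve(dpsi(p), psi(p))     # one Newton step
    print("N =", N + 1, " residual norm =", np.linalg.norm(psi(p)))
# the residual norm is roughly squared at each step, in line with
# Bodmer's second-order result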
If in (8.19) O(\delta^2) is negligible, then the recurrence
relationship

p^{(N+1)} = p^{(N)} - \left[ \frac{\partial \Psi'}{\partial p} \right]^{-1} \Psi(p) \bigg|_{p = p^{(N)}}   (8.20)

is used to estimate the roots of (8.19) or (8.15), and hence
of (8.8) and (8.9). Unless O(\delta^2) is sufficiently small, convergence (to a given decimal place) to the roots of (8.15)
is only "seeming." That is, we want to find the roots of
one system of equations, (8.8) and (8.9), but find it more
convenient to work with a second system, (8.19) or (8.15)
(whose roots are the same as the first system's), since the
latter has an associated systematic, iterative scheme, (8.20).
But in the iterative scheme a quantity, O(\delta^2), is assumed
negligible, and if the latter is, in fact, substantial, then
the roots obtained may differ greatly from those of (8.19),
thus resulting in pseudo roots. However, by neglecting
terms of, say, O(\delta^3) or O(\delta^4) in (8.18) rather than O(\delta^2),
and/or by obtaining a closer initial estimate of \Pi, the
pseudo roots may be corrected.
When O(\delta^3) is sufficiently small, the error may be negligible so long as one iterates a sufficient number of times,
especially when the roots of (8.19), and consequently of (8.8)
and (8.9), are close. Then it could be concluded that the
true value of (8.13) would have been chosen as the optimal
solution.
Consider next the case when L* and some or all of the L_γ
are unknown in exact form and must be estimated by, say,
L̂* and L̂_γ. Here we consider those cases for which the L's can
be estimated by least squares (or by weighted least squares
when the errors within the experimental units are correlated),
as is described in Chapter 6; i.e., y_j = f_j(Z) in (8.4) can be
adequately approximated by a Taylor's expansion up to a
certain order. In this case, the error associated with each
member of the optimal solution may be real and therefore cannot be discounted as previously, since, besides the error of
iteration, we have the errors stemming from the estimation
of the L's.
As such, it is not unlikely that the expected
value of the optimal solution, (8.21), may not have been chosen
as the optimal solution, unless (8.13) and (8.21) are sufficiently close. But there exist
no means of measuring this closeness aside from the narrowness
of associated confidence bounds for each member of (8.21)
(assuming the existence of unbiased statistics), which are
difficult, if not impossible, to construct from E(Ẑ),
except in possibly very simple problems.
Regardless of the usual indetermination of closeness just
mentioned, there are situations where there is an especially
high risk of choosing the optimal solution when, in fact, the
expected value of the optimal solution would not have been
desired. This occurs when L̂*(Ẑ) is at a point on the surface
of L̂* where the curvature is large, for then small changes in
Ẑ produce large changes in L̂*(Ẑ). Further, for cases where L̂*
is neither concave nor convex and/or the L̂_γ are not concave,
the latter mentioned risk is high, since small changes in C_β
could produce large changes in Ẑ.
If, in the immediate neighborhood of Ẑ_0 and C_0, the distance
|C_β - C_β'| is large compared with any of the distances

\left[ ( \hat{Z}_\beta - \hat{Z}_{\beta'} )' ( \hat{Z}_\beta - \hat{Z}_{\beta'} ) \right]^{1/2} , \quad \beta \ne \beta' ,

where Ẑ_β < Ẑ_0 < Ẑ_β' and C_β < C_0 < C_β', then one should consider
a sufficient number of interpolated values of C_β in the
neighborhood of C_0 before finally selecting the optimal solution.
In passing, it is worth noting that for L̂* roughly
planar over the constraint region, one might approximate L̂*
by a plane (hyperplane) and simplify the computations considerably.
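A sketch of that planar shortcut (surface, grid, and coefficients all hypothetical): fit a least-squares plane to L̂* at points of the constraint region, after which a linear objective need only be examined on the region's boundary.

import numpy as np

# Fit the plane a0 + a1*z1 + a2*z2 to a nearly planar surface L*-hat
# sampled over the constraint region; all numbers are illustrative.
rng = np.random.default_rng(0)
Zgrid = rng.uniform(0.0, 2.0, size=(200, 2))        # points in the region
Lhat = 5.0 + 1.5*Zgrid[:, 0] - 0.8*Zgrid[:, 1] \
       + 0.01*Zgrid[:, 0]*Zgrid[:, 1]               # small curvature term
X = np.column_stack([np.ones(len(Zgrid)), Zgrid])
coef, *_ = np.linalg.lstsq(X, Lhat, rcond=None)
print("fitted plane coefficients (a0, a1, a2):", coef)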
8.3 An Approximate Confidence Interval for an Individual Lagrangian Multiplier
Suppose that the q response types are in one of two units
of measure; i.e., t = 2. Let L*(Z) be a linear combination of
q' linear regression functions in one unit of measure and
L(Z) a linear function of the q - q' linear regression functions in the other unit of measure. Then (8.7) becomes

W(Z, \lambda) = L^*(Z) - \lambda \left[ L(Z) - C \right] .   (8.22)

Assuming there exists at least one value of Z on the boundary
defined by L(Z) = C for which L* is extremal, this point
(or points) is found by setting

\frac{\partial W}{\partial Z} = \frac{\partial L^*}{\partial Z} - \lambda \frac{\partial L}{\partial Z} = 0   (8.23)

and L(Z) = C. The \tau th equation of (8.23) is
(8.24)
where b̂' = (b̂_I', b̂_II'), b̂_I' = (b̂_1, ..., b̂_{q'}), b̂_II' = (b̂_{q'+1}, ..., b̂_q), and λ̂
is the appropriate estimate of λ. From (8.24), λ̂ may also
be written in the form

\hat{\lambda} = \tilde{k}_\tau'(\hat{b}) \, \hat{b}   (8.25)
[by a proper multiplication and rearrangement in (8.24)],
where \tilde{k}_\tau(\hat{b}) implies that the elements of \tilde{k}_{\tau 1} and \tilde{k}_{\tau 2} are dependent on b̂. Clearly, these dependencies are in nonlinear
forms, and except in trivial cases it is difficult, if not
unfeasible, to obtain the \tilde{k}_\tau in explicit form.
With
\hat{\lambda} = \tilde{k}'(\hat{b}) \hat{b}, it may be evident to the reader at this point that
we intend to use the argument of Section 7.4 in finding a
confidence interval for λ, since in (8.25) \tilde{k}_\tau is functionally
dependent on the data [which is implied by \tilde{k}_\tau(\hat{b})].
Now, if X_j ≠ X for some j = 1, 2, ..., q, then var(b̂) = Vσ²,
where V, of order \sum_{j=1}^{q} p_j \times \sum_{j=1}^{q} p_j, is given in (7.5) and discussed thereafter; i.e.,

(8.26)

if W is known, and

(8.27)

if W is unknown (assuming convergence).
If k is a preselected non-null vector,

(8.28)

Then, using the same argument given in Section 6.4, we find,
similar to (7.24) and (7.25), that (8.29) is distributed as
P_1 F_\alpha(f_1, f_2), where F_\alpha is the upper \alpha percent point of the
F distribution with f_1 = \sum_{j=1}^{q} p_j, f_2 = qn - ( \sum_{j=1}^{q} p_j + q ),
P_1 = f_1 / f_2, and \hat{\sigma} is the proper estimate of \sigma. Then

(8.30)

for all non-null k.
Now, the statement (8.31), for all non-null k and with probability 1 - \alpha,
implies (8.32), subject to k'k = constant and with probability
\ge 1 - \alpha; i.e., (8.31) \Rightarrow (8.32), but (8.32) \nRightarrow (8.31). But subject to
k'k = constant, \sup (k'Vk) = ch_{max} V, so that (8.32) implies (8.33),
subject to k'k = constant and with probability \ge 1 - \alpha; i.e.,
(8.32) \Rightarrow (8.33). Since (8.32) and (8.33) hold for all non-null k subject to k'k = constant, we may certainly choose
k = \tilde{k}(\hat{b}), so that \tilde{k}'(\hat{b})(\hat{b} - b) = \hat{\lambda} - \tilde{k}'(\hat{b}) b = \hat{\lambda} - \lambda_0. Hence, from (8.33),

( \hat{\lambda} - \lambda_0 )^2 < P_1 F_\alpha(f_1, f_2) \, \hat{\sigma}^2 \, ch_{max} V   (8.34)

with probability \ge 1 - \alpha, so that (8.34)
defines an exact confidence region for \lambda_0 = \tilde{k}'(\hat{b}) b. If we
assume that \lambda_0 \approx \lambda and replace \lambda_0 by \lambda in (8.34), then
(8.34) will define an approximate confidence interval for \lambda.
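As a sketch of how the interval implied by (8.34) might be computed (the numerical inputs below are illustrative placeholders, not data from this thesis):

import numpy as np
from scipy.stats import f

# Approximate interval from (8.34):
# lambda-hat +/- [P1 * F_alpha(f1, f2) * sigma2-hat * ch_max(V)]^(1/2)
lambda_hat = 1.24                                # illustrative estimate
sigma2_hat = 0.09                                # illustrative sigma^2 estimate
V = np.array([[0.50, 0.10], [0.10, 0.30]])       # var(b-hat) = V * sigma^2
f1, f2, alpha = 2, 20, 0.05

P1 = f1 / f2
F_alpha = f.ppf(1.0 - alpha, f1, f2)             # upper alpha percent point
ch_max = np.linalg.eigvalsh(V).max()             # largest characteristic root
hw = np.sqrt(P1 * F_alpha * sigma2_hat * ch_max)
print("approximate interval: (%.3f, %.3f)" % (lambda_hat - hw, lambda_hat + hw))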
If X_j = X for all j, consider the regression of t'y on X,
so that the resultant coefficient vector estimate of, say,
c, is \hat{c} = t_1 \hat{b}_1 + t_2 \hat{b}_2 + \cdots + t_q \hat{b}_q, and, from previously,
var(\hat{c}) = (X'X)^{-1} \, t' \Sigma t. Then in (8.25) we will have \hat{\lambda}^* = \tilde{k}^{*\prime}(\hat{c}) \hat{c},
so that, following through with the entire argument just given,
one arrives at
P\Big[ \hat{\lambda}^* - \big( P_2 F_\alpha(f_1^*, f_2) \big)^{1/2} (t' \hat{\Sigma} t)^{1/2} \, ch_{max}^{1/2} (X'X)^{-1} < \lambda_0^*
< \hat{\lambda}^* + \big( P_2 F_\alpha(f_1^*, f_2) \big)^{1/2} (t' \hat{\Sigma} t)^{1/2} \, ch_{max}^{1/2} (X'X)^{-1} \Big] \ge 1 - \alpha ,   (8.35)

where f_1^* = p, f_2 = qn - qp - q, P_2 = f_1^* / f_2, and \lambda_0^* = \tilde{k}^{*\prime}(\hat{c}) c. As before,
if we assume \lambda_0^* \approx \lambda^* = \tilde{k}^{*\prime}(c) c and replace \lambda_0^* by \lambda^* in (8.35),
then there results an approximate confidence interval for an
individual Lagrangian multiplier, \lambda^*, whose estimate is \hat{\lambda}^*.
Since the determination of ch_{max} V or ch_{max} (X'X)^{-1} may involve
tedious computations should either of these matrices be of
large order, one may use the following approximation. The
largest characteristic root of a matrix is less than the
largest sum of the moduli of the terms in any row or column.
Taking this value as the largest characteristic root, one might
at the same time increase the value of \alpha by some amount, which
reduces F_\alpha, to counteract the possible increase in ch_{max} V or
ch_{max} (X'X)^{-1} due to its approximation.
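A small numerical check of that bound (the matrix is illustrative): the largest row sum of moduli always dominates the largest characteristic root, so substituting it can only widen the interval.

import numpy as np

V = np.array([[0.50, 0.10, 0.05],
              [0.10, 0.30, 0.02],
              [0.05, 0.02, 0.40]])
row_bound = np.abs(V).sum(axis=1).max()     # largest row sum of moduli
ch_max = np.linalg.eigvalsh(V).max()        # exact largest root
print("bound:", row_bound, " exact:", ch_max)   # bound >= exact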
9.0 SUMMARY AND GENERAL REMARKS
Concepts in various parts of this article have long been
known and applied. However, to the writer's knowledge, no
one has considered various aspects of Chapter 6. Its purpose
is to allow the construction of an adequate system of linear
regressions in order to cope with problems dealing with certain relationships between dependent response types.

First, a phenomenon of interest (exudate extracted from
inoculated mice) is characterized by q response types, each of
which is, in turn, expressed in terms of other variables.
The latter include certain exogenous variables (which are
assumed to be at least controlled) and certain endogenous
variables (the response types). The reduced equations, or
predictive regression system, is assumed to include only
linear models depending on the same or different sets of overlapping independent variables; e.g., if x_1 and x_2 denote the
exogenous variables, then the response type y_j is regressed
on the independent variables, say, x_1, x_2, x_1^2, x_1 x_2, and x_2^2,
while another response type, y_{j'}, is regressed on, say, x_1 and
x_1^2. It is stressed that while the hypothesized structural
regression system may be incorrect, the predictive regression
system may lead to valid inferences.
Should each y_j be regressed on precisely the same set of
independent variables, then Σ, the covariance matrix of the q
errors stemming from an experimental unit, does not enter in
the joint least squares estimation of the parametric
coefficients of the regression system. If Σ is known (or at
least the simple correlations and the ratios of the diagonal
elements of Σ are known), and the y_j depend on different
sets of overlapping independent variables, then best estimates of the coefficients are available. Otherwise (Σ unknown), an iterative scheme is proposed for finding joint
estimates of the coefficients. Further, an approximate covariance matrix is given for the iterated vector estimate.
In this connection, a computer study should now be made to
determine whether the approximate covariance matrix is a
realistic measure of dispersion arising in small samples.
Then linear functions, say φ(ŷ), of the predicted values
ŷ_j (in the same or comparable units of measure) are taken,
with the objective being the estimation of the points, say Ẑ,
in the functionally independent factor space such that φ[E(ŷ)]
attains stationary points, assuming the existence of at least
one. Should φ(ŷ) be of second order in the Z's, then exact
and approximate confidence regions are given for E(Ẑ).
Finally, if the ŷ_j are in different units of measure, functions of certain ŷ_j are maximized (minimized) subject to
varied constraints, the latter being defined by other ŷ_j.
It is unfortunate that the iterative methods and approximate covariance matrix given in Chapter 6 are not applicable
to all single nonlinear models, such as the "revised" rational
regression model of Section 6.7. Nonlinear regression models
are gaining increasing prominence, and rightly so. The real
world is not linear but nonlinear, and by referring to
realistic nonlinear models one is, in effect, coming closer
to nature. Further work is needed to determine the worth of
parameter estimates of nonlinear models, since large sample
variances tend to underestimate the actual dispersion. On
the other hand, the more basic problem lies in estimation.
Nonlinear estimation by known methods oftentimes requires
some type of iteration, while iteration, in turn, may induce a
large variability in the estimates. Perhaps at a future date,
some method of estimation will be devised voiding the "necessity" of iteration in nonlinear estimation.
The theorem by Scheffé [29] was used in finding not only
an approximate confidence region for E(Ẑ), but also an approximate confidence interval for an individual Lagrangian multiplier. Though no other recourse is usually available with
such a sound mathematical basis, the application of this
theorem presupposes nature to react at her worst, while, in
fact, she does not always do so, or may rarely do so. Consequently, to obtain a relatively simple solution to a very complex problem, one is forced to become ultra-conservative.
With regard to needed research, few statistical problems hold
such high priority as the one dealing with overprotection. In
other words, when applying the aforementioned theorem, to what
extent should α be increased (see Section 8.3) in order that
realistic confidence intervals (regions) and tests of significance are attained?
10.0 LIST OF REFERENCES
1.
Anderson, R. L., and Bancroft, T. A. 1952. Statistical
Theory in Research. McGraw-Hill Book Company, Inc.,
New York.
2.
Anderson, T. W. 1958. An Introduction to Multivariate
Statistical Analysis. John Wiley and Sons, New
York.
3.
Bodmer, W. F. 1960. On the convergence of iterative
processes for the solution of simultaneous equations
in several variables. Proceedings of the Cambridge
Philosophical Society 56:286-289.
4.
Box, G. E. P., and Hunter, J. S. 1954. A confidence
region for the solution of a set of simultaneous
equations with an application to experimental design. Biometrika 41:190-199.
5.
Box, G. E. P., and Hunter, J. S. 1957. Multifactor experimental designs for exploring response surfaces.
Annals of Mathematical Statistics 28:195-241.
6.
Box, G. E. P., and Hunter, J. S. 1959. Experimental designs for exploring response surfaces. Part B.
Proceedings of Symposium on Design of Industrial Experiments. Ed. V. Chew, John Wiley and Sons, New
York.
7.
Copson, E. T. 1935. Theory of Functions of a Complex
Variable. Oxford University Press, Oxford.
8.
Deming, W. E. 1943. Statistical Adjustment of Data.
John Wiley and Sons, New York.
9.
Frank, M., and Wolfe, P. 1956. An algorithm for quadratic programming. Naval Research Logistics
Quarterly 3:95-110.
10.
Fieller, E. C. 1940. The biological standardization of
insulin. Journal of the Royal Statistical Society, Supplement 7:1-53.
11.
Finney, D. J. 1947. Probit Analysis. Oxford University
Press, Oxford.
12.
Gurland, J. 1958. A generalized class of contagious
distributions. Biometrics 14:229-249.
13.
Hancock, H. 1960. Theory of Maxima and Minima. Dover
Publications, New York.
14.
Hartree, D. R.
1949. Notes on iterative processes.
Proceedings of the Cambridge Philosophical Society
45:230-236.
15.
Hood, W. C., and Koopmans, T. C. 1953. Studies in
Econometric Method, Cowles Commission Monograph
No. 14, John Wiley and Sons, New York.
16.
Hotelling, H.
1935. The most predictable criterion.
Journal of Educational Psychology 26:139-142.
17.
Hotelling, H. 1936. Relations between two sets of
variates. Biometrika 28:321-377.
18.
Hotelling, H. 1943. Some new methods in matrix calculation. Annals of Mathematical Statistics 14:1-34.
19.
Kendall, M. G. 1951. Regression, structure, and functional relationship. Biometrika 38:ll-ff.
20.
Kendall, M. G. 1952. Regression, structure, and functional relationship. Biometrika 39:96-108.
21.
Kendall, M. G. 1957. A Course in Multivariate Analysis.
Hafner Publishing Company, New York.
22.
Kunz, K. S. 1957. Numerical Analysis. McGraw-Hill
Book Company, Inc., New York.
23.
Lonseth, A. T. 1942. Systems of linear equations with
coefficients subject to error. Annals of Mathematical Statistics 13:332-337.
24.
Neyman, J. 1939. On a new class of contagious distributions applicable in entomology and bacteriology.
Annals of Mathematical Statistics 9:35-67.
25.
Ostrowski, A. M. 1960. Solution of Equations and Systems of Equations. Academic Press, New York.
26.
Roy, S. N. 1957. Some Aspects of Multivariate Analysis.
John Wiley and Sons, New York.
27.
Roy, S. N., and Bose, R. C. 1953. Simultaneous confidence interval estimation. Annals of Mathematical
Statistics 24:513-536.
28.
Sabin, A. B., and Feldman, H. A.
1948. Dyes as microchemical indicators of a new immunity phenomenon
affecting a protozoan parasite toxoplasma. Science
108:660-663.
29.
Scheffé, H. 1953. A method for judging all contrasts
in the analysis of variance. Biometrika 40:87-104.
30.
Sonnenwirth, A. C. 1958. Toxoplasmosis. The Laboratory Digest 7:38-41.
31.
Skellam, J. G. 1948. A probability distribution derived
from the binomial distribution by regarding the
probability of success as variable between sets of
trials. Journal of the Royal Statistical Society, Series B 10:257-261.
32.
Taylor, A. E. 1955. Advanced Calculus. Ginn and
Company, New York.
33.
Turing, A. M. 1948. Rounding off errors in matrix processes. Quarterly Journal of Mechanics and Applied
Mathematics 1:287-303.
34.
Trevan, J. W. 1927. The error of determination of
toxicity. Proceedings of the Royal Society, Series B 101:483-514.
35.
Tukey, J. W. 1954. Causation, Regression, and Path
Analysis. Chapter 3, Statistics and mathematics in
biology. Eds. Kempthorne, Bancroft, Gowen, and
Lush. The Iowa State College Press, Ames.
36.
Turner, M. E., and Stevens, C. D.
1959. The regression
analysis of causal paths. Biometrics 15:236-258.
37.
Turner, M. E., Monroe, R. J., and Lucas, H. L.
1961.
Generalized asymptotic regression and nonlinear
path analysis. Biometrics 17:120-143.
38.
Umland, A. W., and Smith, W. N. 1959. The use of Lagrange multipliers with response surfaces. Technometrics 1:289-292.
39.
Uzawa, H. 1958. Iterative methods for concave programming. Studies in Linear and Nonlinear Programming.
Eds. Arrow, Hurwicz, and Uzawa. Stanford University Press, Stanford.
40.
Wallace, D. L. 1958.
Intersection region confidence
procedures with an application to the location of
the maximum in quadratic regression. Annals of
Mathematical Statistics 29:455-475.
41.
Williams, E. J. 1959. Regression Analysis. John Wiley
and Sons, New York.
42.
Wright, S. 1918. On the nature of size factors.
Genetics 3:367-374.
43.
Wright, S. 1921. Correlation and causation. Journal
of Agricultural Research 20:557-585.
44.
Yule, G. U., and Greenwood, M.
1920. An inquiry into
the nature of frequency distribution representative
of multiple happenings with particular reference to
the occurrence of multiple attacks of disease or of
repeated accidents. Journal of the Royal Statistical
Society 83:255-279.