ALGORITHMS FOR THE GENERATION OF DESIGN AND DEFINITIONAL MATRICES
IN LINEAR MODELS FOR CROSSED-FACTOR DESIGNS
by
David H. Christiansen
Department of Biostatistics
University of North Carolina at Chapel Hill
Institute of Statistics Mimeo Series No. 1416
September 1982
ALGORITHMS FOR THE GENERATION OF DESIGN AND DEFINITIONAL MATRICES
IN LINEAR MODELS FOR CROSSED-FACTOR DESIGNS
by
David H. Christiansen
A Dissertation submitted to the faculty of
The University of North Carolina at Chapel Hill
in partial fulfillment of the requirements for
the degree of Doctor of Public Health in the
Department of Biostatistics.
Chapel Hill
1982
DAVID HOWARD CHRISTIANSEN. Algorithms for the Generation of Design and
Definitional Matrices in Linear Models for Crossed-Factor Designs
(Under the direction of RONALD W. HELMS.)
The increased availability of statistical software over the last
decade has generated considerable discussion regarding the inability of
linear models programs to convey the details of their analyses to the
user in an interpretable form. In some instances, the model fitted is
determined by the computational method used by the computer program and
by the number of observations in the cells of the design, with the
result that the user may not know what model was fitted and what
hypotheses were tested.
Algorithms for generating linear models for crossed-factor
designs, including designs involving missing cells, are presented along
with methods for communicating information about these models to the
researcher. The concept of parameter definition is used to describe
each parameter element in terms of a linear combination of the expected
cell means of the design. Secondary parameters of interest can also be
described in terms of their parameter definitions.
Normalized Gauss-Jordan reduction of the design (X) matrix of a
linear model results in a "definitional matrix" that frequently
provides an easily interpretable and intuitively appealing definition
of the model parameters. Designs involving missing cells often result
in some parameters being undefined because their definitions involve
the expected means of missing cells. In certain instances, these
parameters can be "recovered" by finding an equivalent definition that
does not involve the missing cells. In addition, the concept of
recoverability of parameters generalizes the definition of
connectedness to higher order designs and provides a test for
connectedness of these designs.
For those missing cell designs where one or more parameters are
not recoverable, three alternative strategies and associated algorithms
are presented. First, the user can specify that subparameter of the
original model that is recoverable. Second, he or she may make
additional assumptions about the model in the form of restrictions
which will allow more parameters to be recovered. Third, the user may
specify recoverable parameters that are linear combinations of the
non-recoverable original parameters.
ACKNOWLEDGMENTS
First, I would like to acknowledge and thank my adviser, Dr.
Ronald W. Helms, not only for his help and guidance on this
dissertation, but also for his role as an instructor, mentor, and
friend.
I would like to thank the members of my committee, Dr. James E.
Grizzle, Dr. James D. Hosking, Dr. Gary G. Koch, and Dr. James E.
Watson for their input, support, and assistance.
I would like to
extend a special thanks to Jerry Hosking, whose diligence and word
processing skills contributed so much to this work.
Finally, I wish to thank my family:
my grandparents, Mr. and Mrs.
Elmer Nielson, for their strong belief in education; my parents, H. O.
and Elma Jean Christiansen, for their support and encouragement; and my
wife Roberta and daughters Laura and Jennifer for their love,
understanding, and patience.
Contents

LIST OF TABLES
LIST OF FIGURES
LIST OF EXAMPLES

1  Introduction
   1.1  Statement of the Problem
   1.2  Overview
   1.3  Research Plan
   1.4  Notation
        1.4.1  General Linear Models Notation
        1.4.2  Experimental Design Notation
        1.4.3  Reduction in Sums of Squares: The R() Notation
        1.4.4  INFL: An Informal Programming Language
   1.5  Model Definitions
        1.5.1  Primary Parameters
               Model 0 - LTFR ANOVA Model
               Model 1 - LTFR ANOVA with Restrictions
               Model 2 - Sigma Restricted Model
               Model 3 - Reference Cell Model
               Model 4 - Cell Mean Model
        1.5.2  Secondary Parameters
               ABDL Effects
               DFM Effects
               Reference Cell Effects

2  Review of Literature
   2.1  Parameter Definitions
        2.1.1  Full Rank Model Definitions
        2.1.2  LTFR Model Definitions
        2.1.3  Alternative Definitions
        2.1.4  Secondary Parameter Definitions
   2.2  Transformation Between Models
        2.2.1  Isomorphic Models
        2.2.2  Full Rank Isomorphic Models
        2.2.3  LTFR Transformations
        2.2.4  Reduced Models
        2.2.5  Comparison of Model Effects
   2.3  Unbalanced and Incomplete Designs
        2.3.1  Method of Fitting Constants
        2.3.2  Method of Weighted Squares of Means
        2.3.3  Hypotheses of Interest
        2.3.4  Missing Cells (Incomplete Designs)
   2.4  Algorithms for Model Generation

3  The Complete Model
   3.1  Generation of the Essence Model
        3.1.1  Model Specifications
        3.1.2  The Method of Kurkjian and Zelen
        3.1.3  Equivalence of Design Matrices
               Theorem 1.  Generation of ANOVA Parameters
        3.1.4  Generation of LTFR Parameter Restrictions
        3.1.5  Generation of X and R Matrices
               Algorithm 1.  Generation of Design Matrices
   3.2  Primary Parameter Definitions
        3.2.1  Definitional Matrix
        3.2.2  Normalized Gauss-Jordan Reduction
               Algorithm 2.  Normalized Gauss-Jordan Reduction
        3.2.3  Conditions for Equivalence of Definitions
               Theorem 2.  Equivalence of FR Model Definitions
               Theorem 3.  Equivalence of LTFR Model Definitions
   3.3  Secondary Parameter Definitions
        3.3.1  θ Definitional Matrix
        3.3.2  Conditions for Equivalence of Secondary Parameters
               Theorem 4.  Definition of Secondary Parameters
        3.3.3  Complete Definitional Matrix
               Algorithm 3.  Complete Definitional Matrix
        3.3.4  Generation of C Matrices
               Theorem 5.  Equivalence of Secondary Parameters
               Algorithm 4.  Generation of C Matrices

4  Missing Cells
   4.1  Recoverable Parameters
        4.1.1  Conditions for Recoverability
               Theorem 6.  Recoverable Secondary Parameters
        4.1.2  NGJ Reduction for Missing Cells Designs
               Algorithm 5.  Recovery of Missing Cell Parameters
        4.1.3  Connectedness and Recoverability
   4.2  Nonrecoverable Parameters
        4.2.1  Strategy 1 - Recoverable Subparameters of θ
        4.2.2  Strategy 2 - Convert Nonrecoverable Parameters
               Algorithm 6.  Convert Parameters to Restrictions
        4.2.3  Strategy 3 - Define Similar Parameters

5  Summary
   5.1  Results
   5.2  Applications
   5.3  Directions for Further Research

REFERENCES
LIST OF TABLES

2.1  Hypotheses Usually Tested for a Two-Way Factorial Design
2.2  Hypotheses Tested by Various Methods and Computer Programs
3.1  Main Effect Design Matrices and Parameters
3.2  C Matrices and Secondary Parameters

LIST OF FIGURES

1.1  Hypothetical Cell Means for a Two-Way Factorial

LIST OF EXAMPLES

3.1  Additive 3x4 LTFR ANOVA without Restrictions (Model 0)
3.2  Additive 3x4 LTFR ANOVA with Restrictions (Model 1)
3.3  3x4 LTFR ANOVA with Interactions
3.4  3x4 Reference Cell by LTFR Restricted ANOVA
3.5  ABDL Effects for 3x4 Reference Cell Model
3.6  DFM Effects for 3x4 Reference Cell Model
3.7  Reference Cell by DFM Effects for 3x4 Reference Model
4.1  "L" Design for 3x3 Factorial
4.2  "Box and One" Design for 3x3 Factorial
4.3  "Missing Diagonal" Design for 3x4 Factorial
4.4  One-Half Replication of a 2^4 Additive Model
4.5  One-Half Replication of a 2^4 Model with Two-Way IA
4.6  "L" Design for 3x3 Factorial with Interactions
Chapter 1
Introduction
1.1 Statement of the Problem
Historically the analysis of experimental designs has focused on
balanced designs for reasons of efficiency and computational
practicality.
The wide-spread availability of digital computers and
the subsequent development of statistical software provide the
computational capability for dealing with unbalanced and incomplete
data.
With this increased computational ability comes the
responsibility for selecting an appropriate approach for the particular
problem at hand.
Although computer programs can perform the calculations swiftly and accurately, they cannot, in general, choose the "best" method.
In many cases the current software does not communicate to the user the specific details of the analysis performed or the hypotheses tested.
Recent literature reflects considerable concern over the analysis
of classification designs by currently available linear models
software.
The question being asked is: "What hypothesis is really being tested?"
One problem is the inappropriate application of a
statistical program.
For example, some ANOVA programs are written
specifically for balanced, complete designs and explicitly state that
they should not be used for unbalanced data.
Unfortunately, some such
programs do not check for balance and will produce output that can be
misinterpreted by a naive user.
Programs designed for unbalanced, incomplete data have problems as
well.
There are several alternative analyses possible with
non-orthogonal data, none of which is appropriate for all applications.
Some programs use only one of these analyses, while others offer the
user a choice.
In either case, they may not convey this information to
the user in an interpretable form.
Thus the occasional or
inexperienced user of statistical software may select a package or
option without being made aware of the implications that his or her
choice has on the analysis.
Even the experienced researcher may be
misled by reading a printout that does not adequately describe the
analysis performed.
Another problem is that no single statistical package provides a
complete selection of statistical techniques.
Therefore, the analyses
available to a researcher are limited, not by statistical
considerations, but by the design and availability of appropriate
computer software.
Even at large computer installations offering a
variety of statistical packages, most researchers do not have the time
or inclination to become proficient in the use of more than one or two
statistical packages.
The major goal of this research is the development of methods and algorithms to generate fixed-effect models for a variety of parameterizations involving arbitrary combinations of crossed factors. The definitions of the parameters associated with the models will be presented and a mechanism provided for warning the user when the model is not well-defined. Emphasis will be placed on development of methods, algorithms, and decision rules for the case of missing cells.
Computer software will be implemented for the testing and evaluation of
the various algorithms.
1.2 Overview
The remaining sections of Chapter 1 describe the overall research
plan, present the notation to be used, and describe the various models
and parameters of interest to be discussed.
Notation for linear
models, experimental designs, reduction in sums of squares, and
algorithm description are all discussed.
Commonly used types of models
and various primary and secondary parameters are also shown.
Chapter 2 is a review of the pertinent literature and includes
sections on parameter definitions, transformation between models,
unbalanced and incomplete designs, and algorithms for model generation.
Chapters 3 and 4 present theory, methods, and algorithms for the
complete case (no missing cells) and incomplete case (missing cells),
respectively.
Summary, discussion, and recommendations are presented
in Chapter 5.
1.3 Research Plan
The development of the algorithms can be broken into two phases.
First, methods and algorithms for generating complete models and
parameter definitions will be developed and tested.
This task includes
some theoretical results concerning the definition of model parameters,
methods of model specification, and introduction of the "definitional
matrix" as a mechanism for displaying results. The second phase
3
involves extension of these methods to the case of missing cells.
Several alternative strategies are presented and implemented via
additional theorems and algorithms.
A general requirement of any algorithm developed will be that the
resulting parameters of interest be well-defined.
Previous experience
suggests that no algorithm or program can handle all conceivable
pathological special cases, particularly those involving missing cells.
A variety of special cases will be examined from the aspects of the
failure of the algorithm and the definitions of the parameters.
These
special cases, some of which exist in the literature, will be used to
evaluate the algorithms and to "tune" them in an attempt to broaden the
class of designs for which they are valid.
An attempt will also be made to develop algorithms or decision
rules which determine whether the model parameters and hypotheses are
"reasonable" or whether the underlying design is "too incomplete" to
support the model requested.
Due to the subjective nature of the
criterion, i.e., "reasonable," this task is not well-defined but may contribute to an understanding of the boundary between "acceptable" and
"unacceptable" incomplete designs.
1.4 Notation
Unfortunately, a standard notation for linear models does not
exist at this time.
This section will present the general linear model
notation to be used throughout the remainder of this research.
In
addition, notation for expressing various experimental designs will be developed. The Reduction in Sums of Squares, R(), notation will also
be introduced.
An informal programming language, INFL, suggested by Stewart (1973) will be used to describe algorithms.
Except where
noted, matrix valued variables will be denoted as capital letters with
underscores and both row and column vectors as lower case letters with
underscores.
Scalar variables may be either upper or lower case, with or without lower case subscripts. Special column vectors of constants are represented as 1_k for a k×1 column of ones and 0_k for a k×1 column of zeros. A unit vector is a column of zeros, except for a one as the j-th element.
1.4.1
General Linear Models Notation
Consider a linear model of the form

(1.1)  y = Xβ + e,

where

  y is the N×1 vector of observed values,
  X is the N×q known design matrix, Rank(X) = r ≤ q,
  β is the q×1 unknown parameter vector, and
  e is the N×1 vector of errors, distributed N(0, σ²I),

so that E(y) = Xβ with V(y) = σ²I.
If Rank(X) = r = q, then the model (1.1) is full rank (FR); if r < q, then the model is less than full rank (LTFR). The model may be subject to restrictions of the form

(1.2)  Rβ = 0, where R is t×q.

If the rows of R are linearly independent of the rows of X, the restrictions are said to be "non-estimable"; otherwise they are "estimable."
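A minimal numerical sketch of these objects, assuming numpy and a hypothetical 2×3 additive layout with arbitrary cell counts, is given below; it builds a less than full rank X, a pair of non-estimable sum-to-zero restriction rows R, and verifies that appending R restores full column rank.

```python
import numpy as np

# Hypothetical 2x3 additive LTFR ANOVA (model 1.1): columns are
# mu, A1, A2, B1, B2, B3, so q = 6 but Rank(X) = 4 (less than full rank).
def row(a, b):
    x = np.zeros(6)
    x[0] = 1          # overall mean
    x[1 + a] = 1      # A effect indicator, a = 0, 1
    x[3 + b] = 1      # B effect indicator, b = 0, 1, 2
    return x

# one observation per cell except cell (1, 2), which has two
cells = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (1, 2)]
X = np.array([row(a, b) for a, b in cells])

# sum-to-zero restrictions of the form (1.2): A1 + A2 = 0 and B1 + B2 + B3 = 0
R = np.array([[0, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)

print("q =", X.shape[1], " rank(X) =", np.linalg.matrix_rank(X))
print("rank of X with restrictions appended =",
      np.linalg.matrix_rank(np.vstack([X, R])))   # full rank once R is added
```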
1.4.2 Experimental Design Notation
Consider a classification design consisting of factors A, B, ..., H, where the levels of each factor are denoted

  A_a, a = 1, ..., A,
  B_b, b = 1, ..., B,
  ...
  H_h, h = 1, ..., H.

Thus a particular treatment combination or cell can be represented by the sequence of subscripts (ab...h) and the number of observations in that cell by N_ab...h. The element of the observation vector y representing the k-th observation in the (a, b, ..., h) cell is Y_ab...hk, k = 1, ..., N_ab...h.
The expected value of a treatment combination, or expected cell mean, is given by

  E(Y_ab...h) = μ_ab...h,

and the vector of expected cell means is

(1.3)  μ = {μ_ab...h},

where the lexicographic ordering of the cells starts with the last or rightmost subscript. For example, if A=2, B=3, and C=2, the subscript order is 111, 112, 121, 122, 131, ..., 231, 232. Henceforth, where the generalization to higher order designs is obvious, the two factor design will be used for illustrative purposes.
Collapsing across a particular factor in order to obtain the mean or sum can be represented by using the "." (mean) or "+" (sum) reduction operator in place of the appropriate subscript. For example, in the two-way model:

  μ_a. = (1/B) Σ_{b=1}^{B} μ_ab for all a = 1, ..., A,
  μ_.b = (1/A) Σ_{a=1}^{A} μ_ab for all b = 1, ..., B,
  N_a+ = Σ_{b=1}^{B} N_ab for all a = 1, ..., A,
  N_+b = Σ_{a=1}^{A} N_ab for all b = 1, ..., B, and
  N_++ = N, the total number of observations.
The extension to a higher number of factors is straightforward. When the order of summation is important, explicit summation notation will be used.

Several of the parameterizations to be studied involve terms such as "overall mean," "main effects," "interactions," etc. These terms will be rigorously defined later for each specific model. Here we wish only to specify the generic notation to be used:

  Cell Mean = μ_ab,
  Overall or Grand Mean (Mean of Means) = μ_..,
  Main Effect = A_a, B_b, ..., H_h, and
  Interaction Effect = AB_ab, AC_ac, ..., AB...H_ab...h.

In this notation, A_a is the main effect of Factor A at level a; AB_ab is the AB interaction at the a-th level of A and b-th level of B; and so on. A vector containing all the levels of a particular effect will be written A, B, AB, etc. The extension to more than two factors is usually straightforward. Specific terms will be defined in ambiguous situations.
1.4.3 Reduction in Sums of Squares:
The R () Notation
The R () notation is often used as an aid in explaining and
calculating the sums of squares associated with various ANOVA
hypotheses.
Searle (1971) defines R () as the reduction in total sum
of squares due to fitting a model.
Comparisons of different models for
a given set of data can then be made by comparing their respective
values of R ( ).
Consider the LTFR model (1.1). The R() notation is defined as

(1.4)  R(β) = b°'X'y,

where b° is any solution to X'Xb° = X'y. If X and β are partitioned as [X_1, X_2] and [β_1', β_2']', then

  R(β_1, β_2) = R(β),
  R(β_2) = b_2°'X_2'y, and
  R(β_1 | β_2) = R(β_1, β_2) - R(β_2).
R(β_1 | β_2) can then be viewed as the additional reduction in sums of squares due to fitting a model containing (β_1, β_2) over and above the reduction in sums of squares due to fitting a model containing just β_2, or "fitting β_1, having already fit β_2."

Speed and Hocking (1976) point out that the R() notation can be applied to a LTFR ANOVA model, as described by Searle (1971), or to a reparameterized FR model as in Carlson and Timm (1974). Speed and Hocking note that the R()'s obtained by these two procedures are not always the same and may even differ if the same sets of non-estimable conditions are applied to both. They further note that, in the case of unbalanced data, the actual hypothesis being tested by the sum of squares generated using the R() notation may be extremely difficult to identify.
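A small sketch of how the reductions in (1.4) might be computed, assuming numpy and a made-up unbalanced two-way layout (lstsq supplies a solution of the normal equations even when X is less than full rank):

```python
import numpy as np

def reduction(X, y):
    """R(beta) = b0' X' y, where b0 is any solution to X'X b0 = X'y."""
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(b0 @ X.T @ y)

rng = np.random.default_rng(0)
a = np.array([0, 0, 0, 0, 1, 1, 1])          # factor A levels (unbalanced)
b = np.array([0, 1, 2, 2, 0, 1, 2])          # factor B levels
y = rng.normal(size=a.size)

mu = np.ones((a.size, 1))
XA = np.eye(2)[a]                            # A indicator columns
XB = np.eye(3)[b]                            # B indicator columns

X_full = np.hstack([mu, XA, XB])             # fits mu, A, B
X_sub  = np.hstack([mu, XB])                 # fits mu, B only

# R(A | mu, B) = R(mu, A, B) - R(mu, B)
print(reduction(X_full, y) - reduction(X_sub, y))
```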
1.4.4 INFL:
An Informal Programming Language
Stewart (1973) discusses the problems encountered in transforming
an algorithm from a mathematical description to a workable computer
program.
Rather than restrict the algorithm description to a specific
programming language, he introduces an INFormal Language (INFL).
This
language is very unstructured and is intended to communicate
information about the algorithm to an informed reader rather than to
provide actual computer code for a specific computer language.
A
description and several examples of INFL can be found in Stewart (1973, pp. 83-91).
1.5 Model Definitions
There are four types of models, or parameterizations, that appear
repeatedly in the literature.
In this section we will give the model
equations and parameter definitions for these models in the complete,
but possibly unbalanced, case.
Parameter definitions will be given in terms of expected cell means, μ_ab...h = E(Y_ab...hk), as previously defined. For lack of a much-needed standard notation, the model parameters will be designated by a subscript and each model defined by

  Model m:  E(y) = X[m] β[m],  m = 1, 2, 3, 4.
Each model type will be presented as a two-way factorial with interaction, since this simplifies notation and since generalizations to higher order models present no problems for complete designs. We will first mention Model 0, the classic LTFR ANOVA. This model is not well-defined, but it forms the basic structure on which the other models are built. Models 0, 1, 2, and 3 all have main effect and interaction terms and can be written in the general form

(1.5)  Y_abk = μ + A_a + B_b + AB_ab + e_abk.

As we shall see, the number and definition of these parameters depend on the model and the restrictions placed on the parameters.
1.5.1 Primary Parameters
Model 0 - LTFR ANOVA Model

This familiar overparameterized model has the form shown in (1.5) and has parameters

  β[0] = ( μ[0], A[0], B[0], AB[0] )', of dimensions 1×1, A×1, B×1, and AB×1 respectively.

β[0] is not well-defined in this model. Restricting the parameters with A+B+1 independent, non-estimable restrictions produces a different model which has well-defined parameters. The most common restrictions are the "sum to zero" restrictions that result in Model 1.
Model 1 - LTFR ANOVA with Sum-to-Zero Restrictions

Model 1 has the form shown in (1.5) and has parameters

  β[1] = ( μ[1], A[1], B[1], AB[1] )', of dimensions 1×1, A×1, B×1, and AB×1 respectively,

with the restrictions

  A[1]+ = 0,
  B[1]+ = 0,
  AB[1]a+ = 0, a = 1, ..., A, and
  AB[1]+b = 0, b = 1, ..., B.

This results in the following definitions:

  μ[1] = μ_..
  A[1]a = μ_a. - μ_..
  B[1]b = μ_.b - μ_..
  AB[1]ab = μ_ab - μ_a. - μ_.b + μ_..

for a = 1, ..., A, b = 1, ..., B. The restricted parameters are now well defined, as discussed in Chapter 2, but A+B+1 of them are linear combinations of the others, as a consequence of the restrictions.
Model 2 - Sigma Restricted Model

Note that in Model 1 a parameter at the last level of a factor is equal to the negative sum of all the other levels:

  A[1]_A = -Σ_{a=1}^{A-1} A[1]_a,
  B[1]_B = -Σ_{b=1}^{B-1} B[1]_b, etc.

Searle, Speed, and Henderson (1979) use this fact to delete the last level of each factor in order to create Model 2. The selection of the last level is arbitrary but will be used for consistency. This results in a FR model with the following parameters and definitions:

  β[2] = ( μ[2], A[2], B[2], AB[2] )', of dimensions 1×1, (A-1)×1, (B-1)×1, and (A-1)(B-1)×1,

where

  μ[2] = μ_..
  A[2]a = μ_a. - μ_..
  B[2]b = μ_.b - μ_..
  AB[2]ab = μ_ab - μ_a. - μ_.b + μ_..

for a = 1, ..., A-1, b = 1, ..., B-1. Note that, to the extent that particular parameters exist in both Models 1 and 2, they have the same definitions.
Model 3 - Zero Restricted or Reference Cell Model

Another full rank model of interest is the Reference Cell Model described by Speed, Hocking, and Hackney (1979) and others. By arbitrarily designating the first level of each factor as the "reference level" and the treatment combination involving all first levels as the "reference cell," one can define Model 3, again based on (1.5), as

  β[3] = ( μ[3], A[3], B[3], AB[3] )', of dimensions 1×1, (A-1)×1, (B-1)×1, and (A-1)(B-1)×1,

where

  μ[3] = μ_11
  A[3]a = μ_a1 - μ_11
  B[3]b = μ_1b - μ_11
  AB[3]ab = μ_ab - μ_a1 - μ_1b + μ_11

for a = 2, ..., A, b = 2, ..., B. Note that the reference level effects do not exist in this model. If the restrictions

  A_1 = B_1 = AB_1b = AB_a1 = 0, for all a, b,

were added to Model 0, and these parameters were then omitted from the model, the result would be Model 3.
Model 4 - Cell Mean Model

This model is simplest to define; the parameters are simply the means of all non-empty cells:

  β[4] = μ, where μ = {μ_ab} = {E(Y_ab)} for all ab where N_ab > 0.

In both the two-way and three-way layouts with multiple observations per cell, Scheffe (1959) first assumed a cell mean model and then defined the standard ANOVA parameters in terms of the cell mean estimates. Recent authors, including Searle, Speed, and Henderson (1979); Urquhart, Weeks, and Henderson (1973); Hocking and Speed (1975); Hocking, Hackney, and Speed (1978); Bryce, Carter, and Scott (1980); and Hocking, Speed, and Coleman (1980), advocate the cell mean model for multi-way designs. Note that the cell mean model is the only model with all interactions that is full rank with an arbitrary pattern of missing cells. If a particular cell has no observations, that element of the parameter vector is omitted from the model. As will be shown, Models 1, 2, and 3 become somewhat more complicated in the case of missing cells.
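The definitions above can be read directly off a table of expected cell means. A sketch, assuming numpy and a hypothetical 2×3 table of cell means μ, of the Model 1, Model 2, and Model 3 definitions:

```python
import numpy as np

# hypothetical expected cell means for a 2x3 design, mu[a, b]
mu = np.array([[10.0, 12.0, 14.0],
               [11.0, 15.0, 16.0]])

mu_a_dot = mu.mean(axis=1)      # row means  mu_a.
mu_dot_b = mu.mean(axis=0)      # column means  mu_.b
mu_dd    = mu.mean()            # mean of means  mu_..

# Model 1 (sum-to-zero) definitions
mu1 = mu_dd
A1  = mu_a_dot - mu_dd
B1  = mu_dot_b - mu_dd
AB1 = mu - mu_a_dot[:, None] - mu_dot_b[None, :] + mu_dd

# Model 2 keeps the same definitions but drops the last level of each factor
A2, B2, AB2 = A1[:-1], B1[:-1], AB1[:-1, :-1]

# Model 3 (reference cell) definitions, a = 2..A, b = 2..B
mu3 = mu[0, 0]
A3  = mu[1:, 0] - mu[0, 0]
B3  = mu[0, 1:] - mu[0, 0]
AB3 = mu[1:, 1:] - mu[1:, [0]] - mu[[0], 1:] + mu[0, 0]

print("Model 1 A effects:", A1, " (sum to zero:", A1.sum(), ")")
print("Model 3 A effects:", A3)
```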
1.5.2 Secondary Parameters
Typically the terms "main effects" and "interactions" are used in
the analysis of factorial designs.
Speed, Hocking, and Hackney (1975),
Helms (1978), and Vivaldi (1982) point out that these terms do not have
precise meanings, and they demonstrate this point by listing several
different and distinct main effects for the same experimental designs
and linear models.
Three types of effects will be described here by
defining a Cell Mean Model for a 2×3 factorial design:

(1.6)  E(y) = X[4] β[4] = X μ,

where μ = [μ_11  μ_12  μ_13  μ_21  μ_22  μ_23]'. The three types of effects will be designated as

(1.7)  θ[i] = C[i] μ, for i = 1, 2, 3.
Figure (1.1) plots hypothetical cell means versus the levels of Factor
A and Factor B.
Average Distance Between the Lines (ABDL) Effects
Consider Figure 1.1a. One measure of the "A" effect is the average difference between the lines connecting the points at each level of A, or the average of the differences (μ_2j - μ_1j), j = 1, 2, 3:

  θ_A[1] = (1/3) Σ_{b=1}^{3} (μ_2b - μ_1b) = (1/3) [ -1  -1  -1   1   1   1 ] μ.

[Figure 1.1 - Hypothetical Cell Means for a Two-Way Factorial. Panel 1.1a plots the cell means against the levels of Factor B, with one line for each level of A; panel 1.1b plots them against the levels of Factor A, with one line for each level of B.]

Referring to Figure 1.1b, a similar measure of the differences between levels of factor B is the average difference between the lines connecting the points at each level of B.
There are three different sets of "differences":

  (1)  (μ_i2 - μ_i1), (μ_i3 - μ_i1),
  (2)  (μ_i2 - μ_i1), (μ_i3 - μ_i2), and
  (3)  (μ_i3 - μ_i2), (μ_i3 - μ_i1).

These three sets of differences are said by Vivaldi (1982) to be different versions of the same effect. By convention, we will take the difference between the first level and all other levels, such that the ABDL effect for factor B is

  θ_B[1] = (1/2) [ Σ_{i=1}^{2} (μ_i2 - μ_i1) ]  =  (1/2) [ -1   1   0  -1   1   0 ] μ.
                 [ Σ_{i=1}^{2} (μ_i3 - μ_i1) ]           [ -1   0   1  -1   0   1 ]
Interaction effects measure the extent to which the lines in Figure 1.1 are parallel. As was the case with factor B, there are several versions. Again, by convention, we will take the difference between the first level and all other levels in defining the interaction effects:

  θ_AB[1] = [ μ_22 - μ_12 - μ_21 + μ_11 ]  =  [ 1  -1   0  -1   1   0 ] μ.
            [ μ_23 - μ_13 - μ_21 + μ_11 ]     [ 1   0  -1  -1   0   1 ]
Deviations from the Means (DFM) Effects

The DFM main effect for a level of factor A is defined as the difference between the mean of all cells at that level and the overall mean of means:

  θ_A[2] = [ μ_1. - μ_.. ]
           [ μ_2. - μ_.. ].

Note that, as a consequence of these definitions, the effect at the last level of each factor is equal to the negative sum of the effects at all the other levels. These effects are of course the effects defined in the LTFR ANOVA model with sum-to-zero restrictions. By removing the elements containing the last level of each factor, the DFM effects become the primary parameters of the Sigma Restricted Model defined in Section 1.5.1:

  θ_A[2] = [ μ_1. - μ_.. ] = (1/6) [ 1   1   1  -1  -1  -1 ] μ,

  θ_B[2] = [ μ_.1 - μ_.. ]  =  (1/6) [  2  -1  -1   2  -1  -1 ] μ, and
           [ μ_.2 - μ_.. ]           [ -1   2  -1  -1   2  -1 ]

  θ_AB[2] = [ μ_11 - μ_1. - μ_.1 + μ_.. ]  =  (1/6) [  2  -1  -1  -2   1   1 ] μ.
            [ μ_12 - μ_1. - μ_.2 + μ_.. ]           [ -1   2  -1   1  -2   1 ]
Reference Cell Effects (RC)

The RC main and interaction effects are the primary parameters of the Reference Cell Model described in Section 1.5.1:

  θ_A[3] = [ μ_21 - μ_11 ] = [ -1  0  0  1  0  0 ] μ,

  θ_B[3] = [ μ_12 - μ_11 ]  =  [ -1  1  0  0  0  0 ] μ, and
           [ μ_13 - μ_11 ]     [ -1  0  1  0  0  0 ]

  θ_AB[3] = [ μ_22 - μ_12 - μ_21 + μ_11 ]  =  [ 1  -1   0  -1   1   0 ] μ.
            [ μ_23 - μ_13 - μ_21 + μ_11 ]     [ 1   0  -1  -1   0   1 ]
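The three sets of effects are fully described by their C matrices. A sketch, assuming numpy and the μ_11, ..., μ_23 ordering of (1.6), that assembles the C matrices above and evaluates θ[i] = C[i] μ for a hypothetical vector of cell means:

```python
import numpy as np

# mu ordered lexicographically: mu11 mu12 mu13 mu21 mu22 mu23  (eq. 1.6)
mu = np.array([10.0, 12.0, 14.0, 11.0, 15.0, 16.0])

# ABDL effects (i = 1)
C_A1  = np.array([[-1, -1, -1,  1,  1,  1]]) / 3.0
C_B1  = np.array([[-1,  1,  0, -1,  1,  0],
                  [-1,  0,  1, -1,  0,  1]]) / 2.0
C_AB1 = np.array([[ 1, -1,  0, -1,  1,  0],
                  [ 1,  0, -1, -1,  0,  1]])

# DFM effects (i = 2): deviations from the mean of means
C_A2  = np.array([[ 1,  1,  1, -1, -1, -1]]) / 6.0
C_B2  = np.array([[ 2, -1, -1,  2, -1, -1],
                  [-1,  2, -1, -1,  2, -1]]) / 6.0

# Reference cell effects (i = 3)
C_A3  = np.array([[-1, 0, 0, 1, 0, 0]])
C_B3  = np.array([[-1, 1, 0, 0, 0, 0],
                  [-1, 0, 1, 0, 0, 0]])

for name, C in [("ABDL A", C_A1), ("DFM A", C_A2), ("RC A", C_A3)]:
    print(name, "effect:", C @ mu)
```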
Chapter 2
Review of the Literature
Although the current statistical literature does not contain many references to algorithms for generation of linear models per se, there are several areas which are pertinent to this research. One key topic is parameter definition, the explicit definition of the parameters of a linear model in terms of the expected cell means. Next, since a variety of different models can be fitted to a particular design, transformation among various models will be described. Third, an historical review of methods used to handle unbalanced and incomplete designs is presented. Finally, currently implemented algorithms and methods of model generation are discussed.
2.1 Parameter Definitions
As
mentioned previously, considerable attention has been focused on
the question of what hypothesis is being tested by a particular
program.
Helms (1980) addresses the more basic question of the
definitions of the model parameters.
With clear definitions of these
parameters, an hypothesis often can be simply and unambiguously
stated.
2.1.1 Full Rank Model Definitions
A primary parameter β is definable, or well-defined (Helms, 1980), iff there exists a unique solution to the consistent equations

(2.1)  E(y) = Xβ,

possibly subject to the restrictions

(2.2)  Rβ = 0,

where R is t×q and of full row rank.
Consider the FR model (2.1) with no restrictions. β is unique and estimable. The canonical definition of β is given by Helms (1980) as

(2.3)  Dc:  β = (X'X)⁻¹ X' E(y) = A_1 E(y).

The matrix A_1 defines β in the sense that it gives the linear combination of the expected values of y that make up each element of β. He further states that a particular parameter is well-defined if and only if it is estimable. A "well-defined model" is one whose parameters are all definable and estimable.
Now consider a second definition for β:

(2.4)  D:  β = A E(y).

Dc and D are said by Helms (1980) to be "equivalent definitions" if and only if they define the same vector of parameters. He then shows that Dc and D are equivalent if any of the following conditions hold:

  A E(y) = A_1 E(y),
  A X = A_1 X, or
  A = A_1 + K, where KX = 0.
The distinction should be noted between β and its least squares estimate b,

(2.5)  b = (X'X)⁻ X'y.

The definition of β requires no data, while b is a linear combination of the vector of observed values, y. Equations (2.3) and (2.5) are computationally similar, and E(b) = β in the full rank model. Confusion arises when the model is LTFR and a unique inverse of X'X does not exist.
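A numerical sketch of the canonical definition (2.3), assuming numpy and a hypothetical full rank (reference cell) one-way design; each row of A_1 shows the linear combination of E(y) that defines one parameter:

```python
import numpy as np

# hypothetical full rank (reference cell) design for a one-way, 3-level factor
X = np.array([[1, 0, 0],
              [1, 0, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)

A1 = np.linalg.inv(X.T @ X) @ X.T          # definitional matrix of (2.3)
Ey = np.array([5.0, 5.0, 7.0, 9.0, 9.0])   # a hypothetical E(y), constant within cells

print(np.round(A1, 3))        # each row: the combination of E(y) defining one parameter
print(A1 @ Ey)                # beta = A1 E(y) -> [5, 2, 4]
print(np.allclose(A1 @ X, np.eye(3)))      # A1 X = I, one of the equivalence conditions
```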
2.1.2 LTFR Model Definitions
If the model defined by (2.1) is LTFR, then a unique inverse of X'X does not exist, the parameter β is not estimable, and β is not defined in the sense of Helms (1980). Searle (1971) points out that the normal equations in the LTFR case have an infinite number of solutions. He denotes these solutions as

(2.6)  b° = (X'X)⁻ X'y,

where (X'X)⁻ is any generalized inverse of X'X. He emphasizes quite strongly that b° is only a solution to the normal equations and is not an estimator of β. He states:
In a general discussion of linear models that are not of full rank, it is essential to realize that what is obtained as a solution of the normal equations is just that, a solution and nothing more. It is misleading and in most cases quite wrong for b° to be termed an estimator, particularly an estimator of β. It is true that b° is an estimator of something ... but not of β, and indeed the expression it estimates depends entirely upon which generalized inverse of X'X is used in obtaining b°. (p. 169)
An alternative method of finding a particular solution to (2.6) involves imposing the "usual sum to zero constraints" on the solution. Searle again stresses that these constraints are merely a method of obtaining a particular solution:

They [constraints] can be used whether or not a similar relationship holds for the elements of the model ... constraints on the solutions do not necessarily imply restrictions on the model and therefore constraints do not affect estimable functions or testable hypotheses ... But if the model is such that there are restrictions on its parameters, these same restrictions can be used as constraints. (p. 212)
The reason for belaboring this distinction between "constraints on the solution" and "restrictions on the parameters" is to make the following points pertaining to LTFR models:

1. Constraints on the solution are applied to the normal equations X'Xb° = X'y and have no effect on the estimability or definability of β.

2. Restrictions on the parameters are applied to E(y) = Xβ, or, equivalently, to (X'X)β = X'E(y), and will lead to well-defined parameters only if the proper number of linearly independent, non-estimable restrictions are imposed.

3. Since we are interested mainly in models that have well-defined parameters, we will focus on FR models and on LTFR models which include the proper restrictions, as defined below.
Consider the LTFR model (2.1) with Rank(X) = r < q. If the restrictions in (2.2) with t = q - r exist, such that

(2.7)  [ X'X  R' ]⁻¹
       [  R   0  ]

exists, then the canonical definition of the restricted β is given by the first q elements of

(2.8)  [ X'X  R' ]⁻¹ [ X'E(y) ]
       [  R   0  ]   [   0    ].

Thus FR models of the form (2.1) will have parameters defined by (2.3), while LTFR models (2.1) with restrictions (2.2) that meet the conditions (2.7) will have parameters defined by (2.8).
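A sketch of the restricted definition (2.7)-(2.8), assuming numpy, a hypothetical one-way LTFR design (μ plus three level effects), and the sum-to-zero restriction as the single non-estimable restriction:

```python
import numpy as np

# one-way LTFR ANOVA: columns mu, A1, A2, A3; Rank(X) = 3 < q = 4
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1]], dtype=float)
R = np.array([[0, 1, 1, 1]], dtype=float)   # non-estimable restriction A1+A2+A3 = 0
q, t = X.shape[1], R.shape[0]

# bordered matrix of (2.7); t = q - rank(X) suitable restrictions make it invertible
M = np.block([[X.T @ X, R.T],
              [R, np.zeros((t, t))]])
Minv = np.linalg.inv(M)

Ey = np.array([4.0, 4.0, 6.0, 8.0, 8.0])    # hypothetical E(y)
beta = (Minv @ np.concatenate([X.T @ Ey, np.zeros(t)]))[:q]   # restricted definition (2.8)
print(np.round(beta, 6))    # mu = 6 and A = (-2, 0, 2): the sum-to-zero definitions
```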
2.1.3 Alternative Definitions
Although the canonical definition can be computed in a straightforward manner, it is not always the simplest or most intuitive definition. This section will present some results given by Helms (1980) which provide useful ways of computing equivalent definitions for FR models and LTFR models with restrictions.

The first technique involves using Gaussian elimination (Stewart, 1973) to operate on the matrix [X, I] such that

(2.9)  [X, I]  →  [ I_q  A_2 ]
                  [  0    Q  ].

Since A_2 X = I_q, β = A_2 E(y) forms an equivalent definition to the canonical definition of β. In addition, QX = 0, so Q represents restrictions on the rows of X.
The second result selects the rows of the X matrix corresponding to the first observation from each non-empty cell to construct the "essence model"

(2.10)  E(y_e) = X_e β,  with X_e of order N_c × q,

where N_c is the number of non-empty cells and E(y_e) = μ, the vector of cell means as defined in (1.3). The relationship between models (2.1) and (2.10) is defined by an N_c × N matrix F which consists of those rows of the N × N identity matrix that "pick off" the rows of E(y) and X corresponding to the first observation in each cell: y_e = F y and X_e = F X. The canonical definition of β for the essence model is then given by

(2.11)  Dec:  β = (X_e' X_e)⁻¹ X_e' E(y_e) = A_e1 E(y_e).

Let a second definition of model (2.1) be

(2.12)  De:  β = A_e1 F E(y).

Helms shows that this definition is equivalent to the canonical definition Dc given in (2.3). Since F E(y) = μ, this allows the parameter vector β to be defined in terms of the expected value of the first observation in each cell, which is equal to the cell mean. This can result in a simpler and more intuitive definition, as will be shown later by an example.
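A sketch of the Gaussian elimination technique in (2.9), assuming numpy and a hypothetical 2×3 reference cell essence matrix X_e; the right-hand partition that remains after reducing X_e to the identity is a definitional matrix A_2 with A_2 X_e = I, so each parameter is read off as a combination of cell means:

```python
import numpy as np

def gauss_jordan(M, ncols):
    """Normalized Gauss-Jordan reduction over the first `ncols` columns of M."""
    M = M.astype(float).copy()
    r = 0
    for c in range(ncols):
        piv = r + np.argmax(np.abs(M[r:, c]))
        if abs(M[piv, c]) < 1e-12:
            continue                      # no pivot in this column (LTFR case)
        M[[r, piv]] = M[[piv, r]]
        M[r] /= M[r, c]
        for i in range(M.shape[0]):
            if i != r:
                M[i] -= M[i, c] * M[r]
        r += 1
    return M

# essence design matrix of a hypothetical 2x3 reference cell model:
# columns mu, A2, B2, B3, AB22, AB23; one row per cell (mu11 ... mu23)
Xe = np.array([[1, 0, 0, 0, 0, 0],
               [1, 0, 1, 0, 0, 0],
               [1, 0, 0, 1, 0, 0],
               [1, 1, 0, 0, 0, 0],
               [1, 1, 1, 0, 1, 0],
               [1, 1, 0, 1, 0, 1]], dtype=float)

reduced = gauss_jordan(np.hstack([Xe, np.eye(6)]), 6)
A2 = reduced[:, 6:]            # definitional matrix: beta = A2 mu (the cell means)
print(np.round(A2).astype(int))
print(np.allclose(A2 @ Xe, np.eye(6)))
```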
2.1.4 Secondary Parameter Definitions
In the analysis of linear models, it is often convenient to look at linear combinations of the parameter estimates. Given the FR or LTFR model (2.1), these secondary parameters can be given in terms of the primary parameters, or of E(y), as

(2.15)  θ = C β - ξ

or

(2.16)  θ = H E(y) - ξ,

where

  θ is the s×1 secondary parameter vector,
  C is an s×q matrix of specified constants,
  H is an s×N matrix of specified constants,
  β is the q×1 primary parameter vector, and
  ξ is an s×1 vector of specified constants.
Helms (1980) presents secondary parameter definitions and conditions for equivalence that are analogous to those given for the primary parameters. A secondary parameter θ is definable or well-defined iff C X⁻X = C or, equivalently, for an arbitrary value of E(y) there exists a unique value of θ satisfying (2.16). The equivalent canonical definitions of θ are

(2.17)  Dc1:  θ = C (X'X)⁻ X' E(y) - ξ
or
        Dc2:  θ = H E(y) - ξ = H X (X'X)⁻ X' E(y) - ξ.
Necessary and sufficient conditions for equivalence of secondary parameters are given by Helms (1982). Given a LTFR linear model (2.1) subject to the restrictions (2.2), let four secondary parameters of the form described in (2.16) be given by

  D1j:  θ1j = C_j β - ξ1, for j = 1, 2,
  D2j:  θ2j = H_j E(y) - ξ2, for j = 1, 2.

Then for any generalized inverse R⁻ of R:

(2.18)
  D11 and D12 are equivalent iff (C_1 - C_2) = (C_1 - C_2) R⁻R,
  D21 and D22 are equivalent iff (H_1 - H_2) X = (H_1 - H_2) X R⁻R, and
  D1i and D2j are equivalent iff ξ1 = ξ2 and (C_i - H_j X) = (C_i - H_j X) R⁻R, for i, j = 1, 2.

For the FR case, R and R⁻ are null and the conditions for equivalence reduce to:

  D11 and D12 are equivalent iff C_1 = C_2,
  D21 and D22 are equivalent iff H_1 X = H_2 X, and
  D1i and D2j are equivalent iff ξ1 = ξ2 and C_i = H_j X.
The concept of an "essence" model is extended to the secondary parameter definitions by Helms (1980). Given a linear model (2.1) and the corresponding essence model (2.10), let the following two essence model definitions be equivalent:

(2.19)  De1:  θ = C β - ξ
        De2:  θ = H_e E(y_e) - ξ.

De1 and De2 are then equivalent to the canonical definitions of the full model:

  Dc1:  θ = C (X'X)⁻ X' E(y) - ξ
  Dc2:  θ = H E(y) - ξ,

where H = H_e F and C = H_e X_e = H_e F X = H X. F consists of those rows of the N × N identity matrix which pick off the first observation in each cell.

Note that an unrestricted LTFR model which has undefined primary parameters may have well-defined secondary parameters. The most common example is the ANOVA model where the main effects are undefined, but contrasts involving these main effects are well defined.
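A sketch of the definability condition C X⁻X = C, assuming numpy (with the Moore-Penrose pseudoinverse standing in for a generalized inverse) and a hypothetical one-way LTFR design; a contrast of two level effects is well defined while a single level effect is not:

```python
import numpy as np

# one-way LTFR design: columns mu, A1, A2, A3
X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1]], dtype=float)
Xg = np.linalg.pinv(X)                     # one choice of generalized inverse X-

def definable(C):
    return np.allclose(C @ Xg @ X, C)      # condition C X-X = C

contrast = np.array([[0, 1, -1, 0]])       # A1 - A2: estimable, hence well defined
single   = np.array([[0, 1,  0, 0]])       # A1 alone: not defined in the LTFR model

print(definable(contrast), definable(single))   # True False
```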
2.2 Transformation Between Models
It is often the case that more than one model will be defined for
the same experimental design.
Certain models, especially in "messy"
data applications, are easy to generate but have parameters that are
not directly interpretable.
Other models may have more useful
parameters but are difficult or impossible to implement using current
statistical software.
This section will provide tools for transforming
one model to another and for studying the relationship of one model to
another.
This will allow the researcher and statistician to define a
model that is straightforward in terms of available software and then
transform to one or more equivalent models for analysis.
2.2.1 Isomorphic Models
Helms (1978) defines two models, M1 and M2,

(2.20)  M1:  E(y) = X_1 β_1,
        M2:  E(y) = X_2 β_2,

as equivalent if and only if

(2.21)  for any β_1 there exists a β_2 such that X_1 β_1 = X_2 β_2, and
        for any β_2 there exists a β_1 such that X_2 β_2 = X_1 β_1.

He states that equivalence can exist if and only if X_1 and X_2 span the same space. He further notes that for any two equivalent models M1 and M2, Rank(X_1) = Rank(X_2). Shilov (1974) denotes matrices that span the same space as isomorphic. In order to avoid confusion between equivalent definitions and equivalent models, we will adopt Shilov's nomenclature and use "isomorphic models" as meaning models that meet Helms' condition for equivalence (2.21). Necessary and sufficient conditions for isomorphism are given by Helms (1978).
Assume two linear models described by (2.20). The two models are said to be isomorphic if and only if there exist matrices H_12 and H_21 such that X_1 = X_2 H_21 and X_2 = X_1 H_12, where

(2.22)  H_21 = X_2⁻ X_1 + (X_2⁻ X_2 - I) Z,
        H_12 = X_1⁻ X_2 + (X_1⁻ X_1 - I) Z,

and Z is arbitrary. Further, if the two models are isomorphic, then β_1 = H_12 β_2 and β_2 = H_21 β_1.
Note that this result does not require that the two models have equivalent parameter definitions; in fact, it does not even require that the models be well defined. As stated earlier, the primary emphasis of this research is on well-defined models. Therefore, we will concentrate on transformations that result in one of the isomorphic models, say M2, always being well-defined.

There are several cases of interest. These cases are identified by various combinations of several characteristics of the models: Rank(X_1), Rank(X_2), the restrictions (if any), and the relative sizes of β_1 and β_2.
Using the results of Christiansen and Helms (1980) and the notation of (2.17), let W = [W_1, W_2] be a non-singular q×q matrix with inverse U = [U_1', U_2']', such that

(2.23)  X_2 = X_1 W_1,  β_2 = U_1 β_1,

where W_1 is q×s, W_2 is q×t, U_1 is s×q, and U_2 is t×q. We can then write the original model as

  E(y) = X_1 β_1 = X_1 W U β_1
       = X_1 W_1 U_1 β_1 + X_1 W_2 U_2 β_1.

Now if W and U are selected such that

(2.24)  X_1 W_2 U_2 β_1 = 0

for all β_1 in the solution space of the model, then

  E(y) = X_1 W_1 U_1 β_1 = X_2 β_2,

and M1 and M2 are isomorphic.

2.2.2 Full Rank Isomorphic Models
Helms (1978) shows that in the FR case the models given by (2.20) that meet the conditions of (2.22) will have unique transformation matrices:

  H_12 = (X_1' X_1)⁻¹ X_1' X_2,
  H_21 = (X_2' X_2)⁻¹ X_2' X_1,

where H_12 = H_21⁻¹. Thus a FR Model 1 could be fitted and H_12 calculated. The parameter vector β_1 can then be transformed by β_2 = H_12⁻¹ β_1 without refitting the entire model. Note that this result can be stated in the notation of (2.23) by setting t = 0, so that

  W = W_1 = H_12 (W_2 is null), and
  U = U_1 = H_21 = W_1⁻¹ (U_2 is null).
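A sketch of these transformation matrices, assuming numpy and two hypothetical full rank codings (reference cell and cell mean) of the same one-way design:

```python
import numpy as np

# two full rank parameterizations of the same one-way, 3-level design
cells = np.array([0, 0, 1, 2, 2])
X2 = np.eye(3)[cells]                                     # M2: cell mean coding
X1 = np.column_stack([np.ones(5), X2[:, 1], X2[:, 2]])    # M1: reference cell coding

H12 = np.linalg.inv(X1.T @ X1) @ X1.T @ X2
H21 = np.linalg.inv(X2.T @ X2) @ X2.T @ X1

print(np.allclose(X1, X2 @ H21), np.allclose(X2, X1 @ H12))   # True True
print(np.allclose(H12 @ H21, np.eye(3)))                       # H12 is the inverse of H21

beta1 = np.array([5.0, 2.0, 4.0])     # mu11, A2, A3 in the reference cell coding
print(H21 @ beta1)                     # the same model in cell mean form: [5, 7, 9]
```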
2.2.3 LTFR Transformations
Historically, transformation was used to reparameterize the LTFR ANOVA model (Model 1) into a FR model (Model 2) for computational purposes. Helms (1978) and Bryce, Scott, and Carter (1980) perform such a reparameterization on a LTFR model with Rank(X_1) = r < q. One wishes to construct the matrices described in (2.23) such that

(2.26)  X_1 W_2 = 0,

thus satisfying condition (2.24) and establishing that M1 and M2 are isomorphic. Selecting W_2 orthogonal to X_1 can be thought of as "taking up the slack" in the LTFR model. W and U are not unique, and any W_1 and W_2 satisfying (2.23) and (2.26) will result in M1 and M2 being isomorphic. The selection of W_1 and W_2 will, however, affect the definition of the parameters of M2. That is, different choices of W_1 and W_2 lead to different (but isomorphic) models M2.
2.2.4 Reduced Models
In the previous two sections we started with a LTFR model and either added restrictions or reduced the number of parameters in order to obtain a model with well-defined parameters. In contrast, one can start with Model 1 as a FR model and add restrictions of the form U_2 β_1 = 0. Finding W_1, W_2, U_1, and U_2 as described in (2.23) allows one to transform M1 with its restrictions to M2. Examples are shown in Bryce, Scott, and Carter (1980) and Christiansen and Helms (1980). Note that, in this case, the number of parameters is reduced by the number of independent rows of U_2, and the estimation space is reduced accordingly.
2.2.5 Comparison of Model Effects
The concept of parameter definition is extremely useful in this context. A primary or secondary parameter may be thought of as an effect. Using the transformation techniques discussed in the previous section, a model with well-defined parameters can be transformed into an isomorphic model with a different set of primary parameter definitions which may, in fact, be the effects of interest.

A vector of secondary parameters θ = Cβ can also represent the desired effects. Equivalence of effects can be tested using the results of Section 2.1.4. Using (2.15) to test for equivalence of the main effect and interaction effects for the three methods described in Section 1.5, we observe that the main effects are not equivalent for the three methods and that the DFM interaction effects are different from the other two.

It has been shown by Helms (1980), Vivaldi (1982), and others that hypotheses testing whether or not an effect is zero are equivalent for the three methods. Helms (1982) gives the conditions for equivalent hypotheses resulting from different secondary parameter vectors θ.
2.3 Unbalanced and Incomplete Designs
The analysis of variance of experiments involving unbalanced data
(unequal cell frequencies) was discussed by Brandt (1933) for the
special case of a 2xb design.
He stated the problem succinctly when he
wrote, "Equal cell frequencies are frequently physically impossible to
obtain•••also experimental units are lost during the experimental
period.
Thus, it is easy to understand that the idea of an equal
number in each subgroup or classification is rarely realized."
The
various methods used in the analysis of unbalanced and incomplete data
will be reviewed in the following sections.
.
Algorithms for generating
models for messy data situations will then be discussed.
2.3.1 Method of Fitting Constants
Brandt (1933), acting on a suggestion by Fisher, adjusted the cell means such that there was no interaction, but the marginal totals remained unchanged. Since the adjusted cell means then differ by a constant amount, the interaction sum of squares obtained does not represent true interaction but "is due to the disproportionate frequencies only ..." and should be subtracted from the interaction sum of squares calculated using the original data.
Formulas for
calculating the main effects and interactions are given for any
unbalanced, complete (no missing cells) 2xB design.
Yates (1934) generalized the fitting constants method to the axb
case.
He pointed out that Brandt implicitly assumed an additive (no interaction) model when he required that the "interior 2-way means differ only by a constant." The method is given in several contemporary texts and is referred to by some authors, including Yates, as the "method of least squares." Speed, Hocking, and Hackney (1978) point out that the least squares label can be misleading since, with the exception of approximate methods, the majority of all unbalanced data methods are also "least squares methods." Searle (1971) explains that "fitting constants" refers to fitting the coefficients of the terms in a linear model.
His reference to this method as "regression on dummy
variables" is likewise too imprecise, since the other least squares
techniques are also regression on dummy or indicator variables.
Overall and Spiegel (1969) describe three least squares methods of
analysis, two of which are fitting constants.
They recommend their
"experimental design method" of analysis in those applications where
"the problem is conceived as a multiclassification factorial design and
where conventional analysis of variance would have been employed except
for unequal cell frequencies."
The method consists of adjusting each
main effect for all other main effects but ignoring higher order
effects.
Interactions are adjusted for all effects of equal or lower
order while higher order interactions are ignored.
The second fitting constants approach discussed by Overall and
Spiegel is the "a priori ordering" method in which the researcher has
reason to believe a logical hierarchy of effects exists.
An effect is
tested by adjusting for all effects preceding it and ignoring all effects following it, in the specified sequence.
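A sketch of the distinction between these fitting constants reductions, assuming numpy and made-up unbalanced two-way data; the "ignoring" and "adjusting for" sums of squares for the A effect generally differ:

```python
import numpy as np

def reduction(X, y):
    """R(beta) = b0' X' y, with b0 any solution of the normal equations."""
    b0, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(b0 @ X.T @ y)

rng = np.random.default_rng(1)
a = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])     # unbalanced factor A
b = np.array([0, 0, 1, 2, 0, 1, 1, 2, 2])     # unbalanced factor B
y = 1.0 + 0.5 * a + 0.3 * b + rng.normal(scale=0.2, size=a.size)

mu = np.ones((a.size, 1))
XA, XB = np.eye(2)[a], np.eye(3)[b]

R_mu   = reduction(mu, y)
R_muA  = reduction(np.hstack([mu, XA]), y)
R_muB  = reduction(np.hstack([mu, XB]), y)
R_muAB = reduction(np.hstack([mu, XA, XB]), y)

print("R(A | mu)    =", R_muA - R_mu)         # A ignoring B  (unadjusted)
print("R(A | mu, B) =", R_muAB - R_muB)       # A adjusting for B (adjusted)
```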
2.3.2 Method of Weighted Squares of Means
Yates (1934) suggested that the fitting constants method is appropriate if the additive model is assumed. If not, he recommended
that the method of weighted squares of means be used.
Yates noted that
Fisher described the proper sums of squares for a one-way unbalanced
ANOVA as involving a weighted sum of the squares of the cell means,
where the weighting is proportional to the cell frequencies.
He
generalized this result to the multiple classification case using
harmonic marginal means of the cell frequencies as the weights for the
main effects.
This method is the most frequently referenced procedure for
analyzing unbalanced data.
It is given in most statistical methods
texts, e.g., Neter and Wasserman (1974) and Searle (1971), and is mentioned in recent papers dealing with unbalanced or incomplete data. Overall and Spiegel (1969) refer to this method as "complete least squares" or "general linear models analysis" and explain that it is "simply a conventional least squares multiple regression solution in which each effect, whether it be main effect or interaction, is adjusted for relationship to all other effects in the model."
Speed,
Hocking, and Hackney (1978) point out that the weighted squares of
means method actually results in testing unweighted means in the main
"-
effect hypotheses.
They argue that, in general, hypotheses that do not
depend on the non-zero cell frequencies are desirable.
This view is
shared by Searle (1971), Carlson and Timm (1974), Herr and Gaebelein
(1978) and Hocking and Speed (1975).
2.3.3 Hypotheses of Interest
In the complete, balanced, multiple classification design there is
general agreement on the hypotheses to be tested for main effects and
interactions.
With incomplete designs and/or unbalanced data, however,
many generalizations are possible, and no apparent consensus exists among statisticians as to which method is appropriate. Witness the controversy created by Overall and Spiegel (1969), Francis (1973), and
Kutner (1974) related to complete, unbalanced problems.
These articles
generated a considerable number of notes, letters to the editor,
comments, and rebuttals concerning which hypothesis was correct for a
two-way unbalanced analysis of variance.
See, for example, Bryce
(1975), Carlson (1975), Gianola (1975), and Kutner (1975). The dispute prompted Nelder (1975) to comment, "I am confirmed in my belief that
multi-way tables are the least understood of data structures and the
worst taught in statistics courses."
Some of the disagreement among statisticians stems from the fact
that many of the commonly used computer programs and statistical
packages do not explicitly describe the model being fitted or the
hypotheses being tested.
Francis (1973), Golhar and Skillings (1976),
and Hosking and Hamer (1979) compared much of the currently available
software and found no more consistency in their approaches to missing
and unbalanced data than exists among the statisticians commenting on
the articles mentioned above. Searle and Henderson (1978) have prepared
a series of annotated outputs from the major statistical packages in an
attempt to answer the question, "What hypothesis is being tested?"
The lack of agreement on the appropriate analysis of unbalanced
data is further evidenced by a session on "Computing Approaches to the
Analysis of Variance for Unbalanced Data," presented by Herberger and
Laster (1977) at the 13th Interface of Computer Science and Statistics,
Washington, D.C.
The participants (Frane, Goodnight, Searle,
Wilkinson, Hocking, Hackney, and Speed) all addressed the same
unbalanced design and arrived at several different analyses.
More
recently, an entire issue of Communications in Statistics (1980) was
dedicated to "Analysis of Variance with Unbalanced Data."
(Specific
papers in this issue are described in the appropriate sections of this
review.)
Speed, Hocking, and Hackney (1978) summarized the common
hypotheses associated with the two-way ANOVA in the unbalanced but complete case. Table 2.1 (Speed, Hocking, and Hackney, Table 1) lists the various A main effect, B main effect, and AB interaction hypotheses that can be tested in the two-way factorial design. The hypotheses are given both in terms of the Cell Mean Model (Model 4) and the LTFR ANOVA Model with "the usual restrictions" (Model 1).
The four different main effect hypotheses given for each factor may be characterized as comparing either weighted or unweighted averages of the expected cell means.
Table 2.1 - Hypotheses Usually Tested for a Two-Way Factorial Design
(all a ≠ a', b ≠ b'; Model 4 = Cell Mean Model; Model 1 = LTFR ANOVA Model with "usual restrictions")

A Main Effects

H1.  Weighted Squares of Means / General Linear Models; R(A[1] | μ[1], B[1], AB[1]).
     Model 4:  μ_a. = μ_a'.
     Model 1:  A_a = A_a'

H2.  Unadjusted Fitting Constants; testing A, ignoring B, AB; R(A[1]).
     Model 4:  Σ_b N_ab μ_ab / N_a+ = Σ_b N_a'b μ_a'b / N_a'+
     Model 1:  A_a + Σ_b N_ab (B_b + AB_ab)/N_a+ = A_a' + Σ_b N_a'b (B_b + AB_a'b)/N_a'+

H3.  Adjusted Fitting Constants; testing A, adjusting for B, ignoring AB; R(A | μ, B).
     Model 4:  Σ_b N_ab μ_ab = Σ_b N_ab (Σ_a' N_a'b μ_a'b / N_+b)
     Model 1:  (N_a+ - Σ_b N_ab²/N_+b) A_a + Σ_b (N_ab - N_ab²/N_+b) AB_ab
               = Σ_{a'≠a} [ Σ_b (N_ab N_a'b/N_+b) A_a' + Σ_b (N_ab N_a'b/N_+b) AB_a'b ]

H4.  Reference Cell; R(A[3] | μ[3], B[3], AB[3]).
     Model 4:  μ_a1 = μ_a'1
     Model 1:  A_a = 0, a = 2, ..., A

B Main Effects

H5.  Weighted Squares of Means / General Linear Models; R(B[1] | μ[1], A[1], AB[1]).
     Model 4:  μ_.b = μ_.b'
     Model 1:  B_b = B_b'

H6.  Unadjusted Fitting Constants; testing B, ignoring A, AB; R(B[1]).
     Model 4:  Σ_a N_ab μ_ab / N_+b = Σ_a N_ab' μ_ab' / N_+b'
     Model 1:  B_b + Σ_a N_ab (A_a + AB_ab)/N_+b = B_b' + Σ_a N_ab' (A_a + AB_ab')/N_+b'

H7.  Adjusted Fitting Constants; testing B, adjusting for A, ignoring AB; R(B | μ, A).
     Model 4:  Σ_a N_ab μ_ab = Σ_a N_ab (Σ_b' N_ab' μ_ab' / N_a+)
     Model 1:  (N_+b - Σ_a N_ab²/N_a+) B_b + Σ_a (N_ab - N_ab²/N_a+) AB_ab
               = Σ_{b'≠b} [ Σ_a (N_ab N_ab'/N_a+) B_b' + Σ_a (N_ab N_ab'/N_a+) AB_ab' ]

H8.  Reference Cell; R(B[3] | μ[3], A[3], AB[3]).
     Model 4:  μ_1b = μ_1b'
     Model 1:  B_b = 0, b = 2, ..., B

AB Interaction

H9.  Testing AB, adjusting for A, B; R(AB | μ, A, B).
     Model 4:  μ_ab - μ_a'b - μ_ab' + μ_a'b' = 0
     Model 1:  AB_ab - AB_a'b - AB_ab' + AB_a'b' = 0
The comparisons consisting of the unweighted average of one of the main effects across all levels of the other main effect (H1, H5) are associated with the method of weighted squares of the means. The weighted means hypotheses result from the application of the fitting constants approach. They relate to adjusted (H3, H7) and unadjusted (H2, H6) numerator sums of squares. "Adjusted" refers to whether or not the sums of squares due to other main effects are "adjusted for" in the hypothesis, as described in the experimental design and a priori methods of Overall and Spiegel (1969). Several authors refer to the unadjusted method as "ignoring" the other effects and to the adjusted method as "including" them. Comparisons involving only the first level of each factor (H4, H8) are also given. Speed, Hocking, and Hackney (1978) associate these hypotheses with what they call the "Dummy Variable or Regression Method." This method is equivalent to using Model 3, the Reference Cell or Zero Restricted Model, and testing H4 and H8 in Table 2.1.

Table 2.2 presents the hypotheses tested by the various methods previously mentioned and by some of the most commonly used statistical software. Speed and Hocking (1976) observe that the R() notation does not uniquely define the hypothesis being tested. They compare H1 and H4 and note that "R(A | μ, B, AB)" is used for both hypotheses. The fault is not with the R() notation per se but with the fact that μ, A, B, and AB are not well defined. Note that in Table 2.1, the subscripts [1] and [3] define the parameters and uniquely identify the hypotheses.
Table 2.2 - Hypotheses Tested by Various Methods and Computer Programs

[The table marks, with an asterisk for each of the hypotheses H1-H9 of Table 2.1, which hypotheses are tested by each of the following methods and programs: Weighted Squares of Means (Overall and Spiegel I); Unadjusted Fitting Constants; Adjusted Fitting Constants (Overall and Spiegel II); A Priori Ordering (Overall and Spiegel III; order is A, B, AB); Dummy Regression / Reference Cell Model / Zero Restricted Model; SAS GLM Procedure, Type I Sums of Squares (order is A, B, AB); SAS GLM Procedure, Type II Sums of Squares; SAS GLM Procedure, Type III and IV Sums of Squares; BMD; and SPSS ANOVA.]

Sources: Speed and Hocking (1976); Speed, Hocking, and Hackney (1978); Hosking and Hamer (1979).
2.3.4 Missing Cells (Incomplete Designs)
The extreme case of an unbalanced design is that in which one or
more of the cells contain no observations.
Some experiments such as
BIB designs and Latin squares are designed to be incomplete, but these
and other purposefully incomplete designs usually involve certain
assumptions regarding interactions which make possible the definition
and estimation of effects of interest.
Accidental loss of data and naturally occurring incomplete designs
were studied by Elston and Bush (1964).
They presented hypotheses for
testing two-way crossed-factor designs with interactions for certain
patterns of missing cells.  An extension of the method is suggested
that requires that at least one level of the factor contain no missing
cells.
For the case of a single missing cell, say the (p,q) cell,
Elston and Bush (1964) tested a weighted A main effect hypothesis for
the Cell Mean Model (Model 4) as

    H0:   Σ_b W_b μ_ab  =  Σ_b W_b μ_a'b,    for all a ≠ a' ≠ p.

For purposes of illustration and simplification W_b will be set to 1
so that the hypothesis becomes H1 of Table 2.1:

(2.23)    H0:   Σ_b μ_ab  =  Σ_b μ_a'b,    for a, a' ≠ p.
This hypothesis has A-2 degrees of freedom and involves only those rows
of the design that contain no missing cells.  Elston and Bush suggest
adding the following one degree of freedom hypothesis:

(2.24)    H0:   (1/AB) Σ_{a≠p} Σ_{b≠q} μ_ab  =  ((A-1)/AB) Σ_{b≠q} μ_pb.

This hypothesis can be interpreted as comparing a weighted average
of the existing cell means in the row containing the missing cell to
the average of the corresponding columns from the complete rows.
Testing (2.23) and (2.24) jointly gives an hypothesis with A-1 degrees
of freedom that, according to Elston and Bush, "tests, as far as the
data will allow, for the main effects of the A factor."  An extension
of this method to the case where multiple cells are missing at the same
level of A is also given.  Similar hypotheses can of course be tested
for Factor B main effects.
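As a concrete illustration (a minimal sketch, not part of Elston and Bush's presentation), the following Python fragment builds the coefficient vector of hypothesis (2.24) on the AB cell means for a single missing cell (p,q); the function name and the cell ordering are assumptions made here for illustration only.

    import numpy as np

    def elston_bush_contrast(A, B, p, q):
        """Coefficients of (2.24), cells ordered (a,b) with b varying fastest.
        Existing cells in row p get weight -(A-1)/(A*B); cells in the complete
        rows (a != p, b != q) get weight 1/(A*B); all other cells get 0."""
        c = np.zeros((A, B))
        for a in range(1, A + 1):
            for b in range(1, B + 1):
                if b == q:
                    continue                      # column of the missing cell excluded
                if a == p:
                    c[a - 1, b - 1] = -(A - 1) / (A * B)
                else:
                    c[a - 1, b - 1] = 1.0 / (A * B)
        return c.reshape(-1)                      # one coefficient per cell mean

    # Example: 3x4 design with cell (2,3) missing
    print(elston_bush_contrast(3, 4, 2, 3))

The coefficients sum to zero, as required of a contrast on the cell means.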
The most general case covered by Elston and Bush requires that at
least one level of A have no missing cells and that the design be
"connected," as discussed by Bose ( 1947).
Connectedness, paraphrasing
Bose in order to conform to current notation, is defined as follows:
Definition.
A two-way design is said to be connected if it is
possible to pass between any two non-empty cells by alternately
changing a row or column and never landing on an empty cell.
Consider an AxB design with missing cells (p1,q1), ..., (pm,qm).  It
is assumed that the m missing cells are at m different levels of Factor
A.  The Elston-Bush hypotheses to test A main effects would then be

(2.25)    H0:   Σ_b μ_ab  =  Σ_b μ_a'b,    for all a, a' ≠ p1, ..., pm,

          H0:   (1/AB) Σ_{a≠p1,...,pm} Σ_{b≠qi} μ_ab  =  ((A-m)/AB) Σ_{b≠qi} μ_{pi,b},    i = 1, ..., m.

The first line of (2.25) represents A-m-1 tests of equality among the
A-m complete rows, while the next m lines test each incomplete row
against a weighted average of the appropriate columns of the complete
rows.
Similar hypotheses exist for B main effects, assuming that at
least one level of B has no missing cells.
Interaction hypotheses of the form

    H0:   μ_ab - μ_a'b - μ_ab' + μ_a'b'  =  0

can be tested only for a, a', b, b' not involving missing cells.
Frane
(1980) suggested that the complete-design hypothesis should be tested
"to the extent that it can be tested" in the incomplete case.
That is,
only the A-m-l hypotheses represented by the first line of (2.25)
should be tested as the "reduced A main effect."
Hocking and Speed (1975), Hocking, Hackney, and Speed (1978), and
Hocking, Speed, and Coleman (1980) all give examples of incomplete,
unbalanced designs with the highest order interaction restricted to
zero.
Given these restrictions, it is possible to test main effect
hypotheses that are either the same as, or are similar to, those tested
in the complete case.
Other authors take the view that hypotheses
"similar to" the balanced hypotheses are appropriate.
Hocking, Speed, and Coleman (1980) suggest using a cell mean model
and modifying the restrictions and hypotheses for a particular missing
cell pattern and testing the "effective" hypotheses.
Their method
consists of defining the hypothesis that would be tested if all cell
frequencies were non-zero using a Cell Mean Model (Model 4) with
interactions restricted to zero:
    M4:    E(Y)  =  X[4] β[4],    subject to  R β[4]  =  0.

The "effective constraints," say R*, are found by reordering β[4]
and R[4] such that the missing cells occur first:

    β[4]  =  [ β[4]m ]
             [ β[4]o ]

where m indexes the missing cells and o indexes the occupied cells.
Row operations are applied to R to form

    R   →   [ R*m   R*o ]
            [ 0     H*  ].

Note that R β[4] = 0 implies

(2.26)    R*m β[4]m  +  R*o β[4]o  =  0    and    H* β[4]o  =  0.

The "effective hypothesis" is then given by H0:  H* β[4]o  =  0.
Hocking and Speed (1976), Henderson and McAllister (1978) and
Speed, Hocking, and Hackney (1978) present interaction models and
suggest restricting the interactions of missing cells to zero.
Hocking, Hackney, and Speed (1978) extend this concept and give several
theorems covering general missing cell cases for both interaction and
additive models.
Dodge and Majumdar (1979) give an algorithm for
determining connectedness, and a geometric description of the concept
is offered by Burdick, Herr, O'Fallon, and O'Neill (1974).  Unconnected
portions of a design can be analyzed separately as shown by Searle
(1971).
It should be noted that all the above authors are in general
agreement that, whenever possible, hypotheses should not depend on
non-zero cell frequencies unless those frequencies are representative
of the population of interest.
2.4 Algorithms for Model Generation
This section will review algorithms and computational schemes
useful in the automatic generation of linear models by computer
programs and statistical packages.
The computational methods used to
obtain estimates are not of interest here.
Rather, the methods used in
generating the design matrix and in determining the hypotheses to be
tested are explored.
Kurkjian and Zelen (1962) developed a calculus for factorial
arrangements based on the use of "primitive elements," which are
actually vectors representing each factor of the design.
These
primitive vectors can represent either numeric or symbolic values.  A
generalization of the matrix right direct product, or Kronecker
product, is developed.
The operator, denoted @, performs the normal
Kronecker product on numeric elements but performs a "Symbolic Direct
Product" (SDP) on symbolic primitive vectors.
The results of SDP
operations are new symbolic elements, representing effects to be
included in the model.
Consider a 2x3 factorial design with one observation per cell.  Let
A = [A1, A2]' and B = [B1, B2, B3]' represent the main effect parameter
vectors.  The SDP operator, @, can be used to generate the AB
interaction parameters for this design as follows:

    A @ B  =  [AB11, AB12, AB13, AB21, AB22, AB23]'.
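A minimal sketch of the symbolic direct product on string-valued primitive vectors (Kurkjian and Zelen define the operation symbolically, not computationally; the function name and label convention are assumptions made here for illustration):

    def sdp(u, v):
        """Symbolic direct product: concatenate labels in right-Kronecker order."""
        return [a + b for a in u for b in v]

    A = ["A1", "A2"]
    B = ["B1", "B2", "B3"]
    print(sdp(A, B))   # ['A1B1', 'A1B2', 'A1B3', 'A2B1', 'A2B2', 'A2B3']

The resulting labels correspond, cell for cell, to the interaction parameters AB11, ..., AB23 above.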
The models developed by Kurkjian and Zelen are classic LTFR ANOVA
models with the usual sum to zero constraints, that is, Model 1.
Bock
(1963) uses the Kurkjian and Zelen method to develop ANOVA computer
programs.
Bock does not deal with the overparameterization of the
ANOVA model by adding constraints.
Instead he separates the design
matrix into two sub-matrices, "one providing a column basis for the
model and the other representing linear functions of the parameters."
In Bock's example involving missing cells, the highest order
interaction terms from the missing cells are deleted from the model.
Models involving both nested and crossed factors are also discussed.
Bradley (1968) uses the essence model described in Chapter 1 to
construct a design matrix containing the first observation from each
non-empty cell.
He then uses a "marking" algorithm to eliminate
columns of the design matrix that are not independent.
The algorithm
is structured so that, in case of no missing cells, a Reference Cell
Model (Model 3) is obtained.
Starmer and Grizzle (1968) developed a multivariate general linear
model program called MGLM.
This program computed estimates of the
model parameters b and tested multivariate hypotheses of the form
    C B A  =  0,

or, for the univariate hypotheses,

(2.27)    C b  =  0.
Algorithms were provided to generate the design matrix for the
crossed-factor ANOVA model with two-way interactions and the
C matrices needed to test main effect and two-way interaction
hypotheses.  The model used for computation was the Sigma Restricted
model (Model 2).
MGLM could be used for other designs as well, but the user was required
to "build" the design and C matrices in order to estimate the
parameters and test the desired hypotheses.
This could be a cumbersome
task for large or complex designs, but at least the user knew what
hypothesis was being tested!
Frane (1980) describes the algorithms used by the BMDI0V program.
This program also creates the design matrix for Model 2.
For other
model types, the user must build the design matrix in order to estimate
the desired parameters.
The user may specify a particular hypothesis by building a C
matrix and testing the hypothesis (2.27).
RUMMAGE (Bryce, Scott, and Carter, 1980) starts with a Cell Mean
Model (Model 4) and reparameterizes to a FR "main effect and
interaction" model.
The A main effect parameters in this new model are
described as A-I contrasts involving the levels of the A main effect.
It should be noted that the parameters consisting of these contrasts
are not definable unless it is known exactly what the "contrasts" are.
Fowlkes (1969) describes a series of operators that perform ANOVA
calculations by regression methods.
These operators "unbundle" the
tasks performed by Bock's program and introduce a simpler, more
flexible method of model definition based on specifying a model
equation.
The Statistical Analysis System (SAS) uses a similar model
definition equation in PROC GLM, a general linear models program
described in the SAS User's Guide (1979).
In the case of
classification variables, GLM generates the indicator variables for the
LTFR ANOVA design matrix (Model 0).
A generalized inverse is used to
find a solution to the normal equations, therefore the resulting
parameters are not well-defined.
Four types of sums of squares are
available for hypothesis testing, as shown in Table 2.2.
Schlater and Hemmerle (1966) present a computational method for
balanced, complete ANOVA analysis.
Although the method is not of
interest here, the specification of the model by a model equation and
the conversion of the factors and subscripts to binary arrays may be
useful in future algorithm development.
Bryce, Scott, and Carter
(1980) give a recursively generated matrix operator which, when given
vectors representing main effects, will generate the corresponding
interactions.
This method is similar to Kurkjian and Zelen's and to
the SAS PROC MATRIX "horizontal direct product."
CHAPTER 3
The Complete Model
This chapter will present the basic results for complete
classification designs, those with at least one observation in each
cell.
Exactly one observation per cell will be used in all the
examples and computations in this and the following chapters.
This
"essence model," described in section 2.1.3, has been shown to have
canonical parameter definitions equivalent to those of the
corresponding full model.
An algorithm for generating the essence model design matrix and
the necessary sum to zero restrictions for a model containing an
arbitrary combination of LTFR ANOVA, Sigma Restricted, Reference Cell,
and Cell Mean effects will be developed.  These design matrices will
serve several important functions.
First, they will facilitate the
generation of test data for the other algorithms to be developed in the
course of this research.
Second, as will be shown in Chapter 4, they
will serve as a starting point for dealing with the problem of missing
cells.
Finally, since the essence model design matrix defines the
pattern of the X matrix for each cell of the experimental design, it
can provide a method for transforming the cell indices of actual
observations into the rows of the X matrix of the full model for
computational purposes.
3.1 Generation of the Essence Model
Kurkjian and Zelen (1962) developed a calculus for factorial
arrangements to allow efficient ANOVA model computation for complete
factorial designs.
Although the computational rationale for using
their method may have been abrogated by advances in computer software
and hardware, we have adopted their notation for LTFR ANOVA design
matrices. The notation will be expanded to include models containing
arbitrary combinations of all types of factors described in section
1.5.1.
A method of specifying these models and algorithms for
generating their design (X) and restrictions (R) matrices will be
developed and illustrated with examples.
3.1.1
Model Specifications
For the purposes of this research, a simple indexing and coding
scheme will be used to specify a model.
Actual implementation of these
algorithms will be done in PROC MATRIX (SAS User's Guide, 1979).
This
implementation is for testing and demonstration purposes.
Factors in the model will be named A, B, C, ••• , H.
(The computer
program contains an artificial limit of eight factors.) For each
factor, the number of levels and the type of factor must be specified.
Factor types can be those described in Section 1.4.1:
LTFR ANOVA,
Sigma Restricted,
Reference Cell, and
Cell Mean.
The parameters to be included in the model are classified into
"effects" of one of the following forms:
MU -the "overall mean" or intercept,
Cell Mean Effects,
Main Effects, or
Interactions (abbreviated "IA").
Unless the user explicitly specifies otherwise, all main effects
and interactions will be included.
The overall mean term will be
included unless cell mean effects are present.
Cell mean effects
represent separate intercepts for each level of cell mean effect.
There are three ways the user may define the effects to be
included in the model:
1.  Default:  all appropriate main effects, IA, and intercept
    effects are included as described above.
2.  Explicitly specify the effects.
3.  Specify the highest order of IA term to be included.
The above specifications are the input to Algorithm 1, which generates
the X matrix and, in the case of LTFR effects, the necessary sum to
zero restrictions on the parameters.
3.1.2
The Method of Kurkjian and Zelen
Using the notation of section 1.4.2, consider an n-way factorial
design with factors A, B, ••• , H.
Kurkjian and Zelen (1962) denote the
treatment effect of the (a b ••• h)th cell of the design as
    t_{ab...h}  =  A_a + B_b + ... + H_h + AB_ab + AC_ac + ... + AB...H_{ab...h}.

The vector of treatments corresponding to each cell of the design is
represented as a sum of "primitive element symbolic direct products:"
(3.1)    t  =  Σ_{k=1}^{n}  Σ_{j+=k}  a^{j1} @ b^{j2} @ ... @ h^{jn},

where

    1) the ji are (0,1) indicators of the absence or presence of the
       ith factor in a particular term;

    2) j+ = j1 + j2 + ... + jn and the summation Σ_{j+=k} goes through all
       combinations of ji where j+ is exactly k, i.e., k is the number
       of factors involved in the effect;

    3) a is the primitive element for factor A, a^0 = 1_A, and a^1 = A,
       with similar definitions for b, ..., h; and

    4) the operator @ is defined as the right hand Kronecker product
       for numeric vectors and as the symbolic direct product (SDP)
       for primitive element vectors as described in section 2.4.
For example, consider a 2x3 factorial design.  The vector of
treatment effects (3.1) is

    t    =  a^1 @ b^0  +  a^0 @ b^1  +  a^1 @ b^1
   (6x1)
         =  A @ 1_3  +  1_2 @ B  +  A @ B

         =  [A1, A1, A1, A2, A2, A2]'  +  [B1, B2, B3, B1, B2, B3]'
            +  [AB11, AB12, AB13, AB21, AB22, AB23]'.
The sum to zero constraints on the LTFR ANOVA parameters are given
in terms of the primitive vectors as
    1_A' A  =  1_B' B  =  ...  =  1_H' H  =  0        for main effects,

    [1_i' @ I_j] (ij)  =  0_j   and   [I_i @ 1_j'] (ij)  =  0_i,
        i ≠ j = A, B, ..., H,                          for 2-way interactions,

      ...

(3.2)    [ 1_A' @ I_B  @ ... @ I_H  ]
         [ I_A  @ 1_B' @ ... @ I_H  ]  (AB...H)  =  0_A @ 0_B @ ... @ 0_H
         [             ...          ]
         [ I_A  @ I_B  @ ... @ 1_H' ]              for n-way interactions.
In the 2x3 factorial example

    1_2' A  =  A1 + A2  =  0,

    1_3' B  =  B1 + B2 + B3  =  0,

    [ 1_2' @ I_3 ]        [ 1 0 0 1 0 0 ]        [ AB11 + AB21        ]
    [            ] AB  =  [ 0 1 0 0 1 0 ] AB  =  [ AB12 + AB22        ]  =  0_5.
    [ I_2 @ 1_3' ]        [ 0 0 1 0 0 1 ]        [ AB13 + AB23        ]
                          [ 1 1 1 0 0 0 ]        [ AB11 + AB12 + AB13 ]
                          [ 0 0 0 1 1 1 ]        [ AB21 + AB22 + AB23 ]
The notation of Kurkjian and Zelen has three deficiencies with
respect to this research.
First, although (3.1) and (3.2) provide
compact descriptions of the treatment effects and restrictions, they do
not generate the X and R matrices for the ANOVA model.  Second, the SDP
does not lend itself to conventional matrix operations, thus making the
algorithms and computer programs more difficult to develop.  Third,
only LTFR models with ANOVA parameters and sum to zero restrictions are
considered.  In the next section, the method and notation of Kurkjian
and Zelen will be expanded and modified in order to ameliorate these
shortcomings.
3.1.3. Equivalence of Design Matrices
The factorial design described by the vector of treatment effects
(3.1) can be specified in matrix form as
(3.3)    t*  =  X* β*

where

    X*   =  [X(J1), X(J2), ..., X(Jv)],

    β*'  =  [β(J1)', β(J2)', ..., β(Jv)']'  =  [A', B', ..., H', AB', ..., AB...H']',

    J_i  =  [j1, j2, ..., jn]    for i = 1, ..., v, and

    v    =  the number of terms in the summation of (3.1), which
corresponds to the number of terms in the model.
The following theorem provides an algorithm for generating the design
matrix to define the treatment effects in (3.1).
By adding an overall mean term, the LTFR ANOVA model design matrix can
be generated.

Theorem 1.  Assume an n-way LTFR ANOVA factorial design with the
treatment effects associated with each cell given by t in (3.1).  Let
X* in (3.3) be constructed for i = 1, ..., v such that

(3.4)    X(J_i)  =  X_{j1,i} @ X_{j2,i} @ ... @ X_{jn,i},

where

    X_{j1,i}  =  1_A    if j1 = 0,
    X_{j1,i}  =  I_A    if j1 = 1, and

    X_{j2,i}, ..., X_{jn,i} are defined similarly.

Then (1) the treatment effects t and t* are equal, and (2) the essence
design matrix X for the LTFR ANOVA model can be constructed by
augmenting (3.3) as follows:

    X  =  [X(J0), X*],    where X(J0) = 1_N and β0 = μ.
Proof (1):  By definition J1, ..., Jv correspond to each term in (3.1), so

(3.5)    t   =  Σ_{i=1}^{v} ( a^{j1,i} @ b^{j2,i} @ ... @ h^{jn,i} ),    and

(3.6)    t*  =  X* β*  =  Σ_{i=1}^{v} X(J_i) β(J_i).

β(J_i) represents the main effect or interaction parameter in the ith
term, and can be written as:

(3.7)    β(J_i)  =  β_{j1,i} @ β_{j2,i} @ ... @ β_{jn,i},

where

    β_{j1,i}  =  1    if j1 = 0,
    β_{j1,i}  =  A    if j1 = 1, and

    β_{j2,i}, ..., β_{jn,i} are defined similarly.
Kurkjian and Zelen (1962) show that for expressions involving both
Kronecker direct product and symbolic direct product operations, the
following relationship holds:

    (X_{j1} @ X_{j2} @ ... @ X_{jn}) (β_{j1} @ β_{j2} @ ... @ β_{jn})
        =  X_{j1} β_{j1} @ X_{j2} β_{j2} @ ... @ X_{jn} β_{jn},

and thus (3.6) can be written as:
(3.8)    t*  =  Σ_{i=1}^{v} ( X_{j1,i} β_{j1,i} @ X_{j2,i} β_{j2,i} @ ... @ X_{jn,i} β_{jn,i} ).

If the ith term of (3.5) and (3.8) are equal for i = 1, ..., v, then
t = t*.

For j1 = 0:    X_{j1} = 1_A,  β_{j1} = 1,   so  X_{j1} β_{j1} = 1_A = a^0.

For j1 = 1:    X_{j1} = I_A,  β_{j1} = A,   so  X_{j1} β_{j1} = A = a^1.

Identical results hold for j2, ..., jn.    QED (1).
Proof (2):  Adding an overall mean (μ) to the treatments defined in
(3.1) and (3.3) results in the LTFR model (Model 1) defined in section
1.5.1:

    E(Y_e)  =  μ + t  =  [X(J0), X*] [μ, β*']'.

Since μ is involved in every cell, X(J0) must be a column of ones:

    X(J0)  =  1_A @ 1_B @ ... @ 1_H  =  1_N.    QED (2).
For example, in the 2x3 factorial design in 3.1.2, let J = [00, 10, 01, 11],
so β(00) = μ, β(10) = A, β(01) = B, β(11) = AB, and

    X  =  [ 1,  A @ 1_3,  1_2 @ B,  A @ B ]

          μ    A1 A2    B1 B2 B3    AB11 AB12 AB13 AB21 AB22 AB23
       =  [ 1   1  0     1  0  0      1    0    0    0    0    0 ]
          [ 1   1  0     0  1  0      0    1    0    0    0    0 ]
          [ 1   1  0     0  0  1      0    0    1    0    0    0 ]
          [ 1   0  1     1  0  0      0    0    0    1    0    0 ]
          [ 1   0  1     0  1  0      0    0    0    0    1    0 ]
          [ 1   0  1     0  0  1      0    0    0    0    0    1 ]
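A minimal numerical sketch of Theorem 1 for this 2x3 design (numpy; the variable names are illustrative assumptions, not part of the original notation):

    import numpy as np

    A, B = 2, 3
    one_A, one_B = np.ones((A, 1)), np.ones((B, 1))
    I_A, I_B = np.eye(A), np.eye(B)

    # One Kronecker factor per design factor: 1 (absent) or I (present).
    X_mu = np.kron(one_A, one_B)        # J = 00 : overall mean
    X_A  = np.kron(I_A,   one_B)        # J = 10 : A main effect
    X_B  = np.kron(one_A, I_B)          # J = 01 : B main effect
    X_AB = np.kron(I_A,   I_B)          # J = 11 : AB interaction

    X = np.hstack([X_mu, X_A, X_B, X_AB])   # 6 x 12 essence design matrix
    print(X.astype(int))

The printed matrix reproduces X[1] as displayed above.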
An algorithm to generate X[1] consists of two parts.  First, the index
vectors J1, ..., Jv are generated, and then these vectors are used to
generate X according to Theorem 1.  The problem lies in the generation
of the indices Ji.  Using the binary values 0 to 2^n - 1 will, in fact,
generate all terms in an n-way model with interaction, but the order is
not desirable.  In a three way factorial the order would be
J = [000, 001, 010, 011, 100, 101, 110, 111], which would generate
β' = [μ, C', B', BC', A', AC', AB', ABC']'.  With more factors, the
ordering becomes even less desirable.  A preferred ordering would be:
intercept or cell mean effects, main effects, 2-way interactions, 3-way
interactions, etc.  It is also desirable to have a consistent ordering
of effects within each level of interaction.  Thus by generating
J1, J2, ..., Jv in the proper order, Theorem 1 can be used to create
the desired X[1] matrix.

The method involves starting with the main effect indices and
iteratively concatenating the interaction terms in the proper sequence.
Creating Ji for any IA term is simply a matter of adding all the J
vectors of the factors to be included in the interaction.  For example,
in the 4-way factorial, start with the overall mean and main effects.
    Index                       Effect
    J0  = 0000                  μ
    J1  = 1000                  A        ←  A pointer
    J2  = 0100                  B        ←  B pointer
    J3  = 0010                  C        ←  C pointer
    J4  = 0001                  D        ←  D pointer

Now take each main effect in turn and add it to all effects after its
pointer, then add the result to the end of J and increment the pointer
to the last term of J involving that main effect.

    J5  = J1 + J2  = 1100       AB
    J6  = J1 + J3  = 1010       AC
    J7  = J1 + J4  = 1001       AD       ←  A pointer
    J8  = J2 + J3  = 0110       BC
    J9  = J2 + J4  = 0101       BD       ←  B pointer
    J10 = J3 + J4  = 0011       CD       ←  C pointer

Repeat the process for three way interactions,

    J11 = J1 + J8  = 1110       ABC
    J12 = J1 + J9  = 1101       ABD
    J13 = J1 + J10 = 1011       ACD      ←  A pointer
    J14 = J2 + J10 = 0111       BCD      ←  B pointer

and, finally, four way interactions.

    J15 = J1 + J14 = 1111       ABCD
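A minimal sketch of this index-generation scheme (Python; the function name is an assumption, and it uses itertools.combinations rather than the pointer bookkeeping described above, producing the same ordering for n = 4):

    from itertools import combinations

    def term_indices(n):
        """Generate 0/1 index vectors: intercept, main effects, then k-way
        interactions for k = 2..n, each level ordered consistently."""
        J = [tuple(0 for _ in range(n))]                 # J0: overall mean
        for k in range(1, n + 1):
            for combo in combinations(range(n), k):      # factors in the term
                J.append(tuple(1 if i in combo else 0 for i in range(n)))
        return J

    for j in term_indices(4):
        print("".join(map(str, j)))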
As previously noted, Kurkjian and Zelen only provided for LTFR
ANOVA models in their notation.
Consider the submatrices of X and β
corresponding to a main effect, say A:

    J1      =  100...0,
    X(J1)   =  X_11 @ 1_B @ ... @ 1_H,
    β(J1)   =  β_11 @ 1 @ ... @ 1,

where, for the LTFR ANOVA model, X_11 = I_A and β_11 = A.  In order to
use a different main effect, the definitions of X_11 and β_11 must be
changed.  A subscript corresponding to each of the four models
described in section 1.5.1 can be added to X(J_i) in Theorem 1 and to
β(J_i) in (3.7) so that

(3.9)    X(J_i)  =  X[m1]_{j1,i} @ X[m2]_{j2,i} @ ... @ X[mn]_{jn,i}    and

         β(J_i)  =  β[m1]_{j1,i} @ β[m2]_{j2,i} @ ... @ β[mn]_{jn,i},

where m = 1, 2, 3, 4 corresponds to the definition of main effects
shown in Table 3.1.

Since IA terms are formed as direct products involving the appropriate
main effect submatrices, interactions involving different types of main
effects can be computed in a straightforward manner.  Models involving
one or more terms with m = 1 (LTFR) will require restrictions in order
to be well-defined.  These models are discussed in the next section.
3.1.4 Generation of LTFR Parameter Restrictions
Models involving LTFR ANOVA parameters will not be well-defined
unless the proper non-estimable restrictions are placed on the
parameters, usually the sum to zero restrictions described in section
1.5.1, Model 1.
For the 2x3 factorial example in (3.3) the restrictions are

                   μ     A      B      AB
(3.10)    R   =   [ 0    1_2'   0'     0_6'       ]      (R(J1))
                  [ 0    0'     1_3'   0_6'       ]      (R(J2))
                  [ 0    0'     0'     1_2' @ I_3 ]      (R(J3))
                  [ 0    0'     0'     I_2 @ 1_3' ]

that is,

    A+  =  0,    B+  =  0,    AB+b  =  0, b = 1, 2, 3,    ABa+  =  0, a = 1, 2,

where Ji indexes the absence or presence of each factor with a 0 or 1
and R(J_i) forms the sum to zero restrictions for the effect involving
the factors indexed by Ji.

Kurkjian and Zelen represent each main effect restriction as a row
of ones for that effect with all other elements zero, as shown in
R(J1) and R(J2) of (3.10).  Interaction restrictions are represented as
direct products of identity and unity matrices that conform to the
appropriate portion of the X matrix.  For example, in (3.10)

    R(J3)  =  [ 1_2' @ I_3 ]        (5x6).
              [ I_2 @ 1_3' ]

In general, Kurkjian and Zelen form the non-zero portion of the k-way
interaction restrictions as

    [ 1' @ I  @ I  @ ... @ I  ]
    [ I  @ 1' @ I  @ ... @ I  ]
    [            ...          ]
    [ I  @ I  @ ... @ I  @ 1' ]

where each submatrix conforms to the k parameters involved in the
interaction.  In the AxBxCxD factorial example previously shown,
the four-way interaction would be:

(3.11)    [ 1_A' @ I_B  @ I_C  @ I_D  ]
          [ I_A  @ 1_B' @ I_C  @ I_D  ]
          [ I_A  @ I_B  @ 1_C' @ I_D  ]
          [ I_A  @ I_B  @ I_C  @ 1_D' ]
These restrictions are not, however, independent.  The first row of
(3.11) represents the restrictions

    ABCD+bcd  =  0    for all b, c, d.

In all subsequent restrictions, the rows involving the last level of A
can be omitted, since each such restriction will be a linear
combination of previous restrictions.  Thus the set of independent sum
to zero restrictions for the AxBxCxD LTFR factorial can be given by

(3.12)    [ 1_A' @ I_B  @ I_C  @ I_D  ]
          [ K_A  @ 1_B' @ I_C  @ I_D  ]
          [ K_A  @ K_B  @ 1_C' @ I_D  ]
          [ K_A  @ K_B  @ K_C  @ 1_D' ]

where K_n = [I_{n-1}, 0].  The generalization to k-way interactions is
straightforward.  If the ith factor in (3.12) is not a LTFR factor,
then the ith row of (3.12) is deleted and the ith column remains I_n in
all other rows.
Independent sum to zero restrictions for models generated by
Theorem 1 can be constructed for the G effects in the model containing
LTFR factors as:

(3.13)    R    =  [ 0  ...  R(J_1)  ...  0 ]
         (txq)    [ 0  ...  R(J_2)  ...  0 ]
                  [            ...         ]
                  [ 0  ...  R(J_G)  ...  0 ]

where each R(J_g) occupies the columns of X corresponding to the gth
LTFR effect, z1, z2, ..., zk index the k elements of J_g that are 1
for a kth order interaction, and

(3.14)    R(J_g)  =  [ R_11 @ R_12 @ ... @ R_1k ]
                     [ R_21 @ R_22 @ ... @ R_2k ]
                     [            ...           ]
                     [ R_k1 @ R_k2 @ ... @ R_kk ]

for g = 1 to G.  Consider the mth factor indexed by J_g, with nm levels:

    If the mth factor is not a LTFR factor, then R_im = I_nm for all i,
    and R_m1, ..., R_mk are deleted.

    If the mth factor is a LTFR factor, then

        R_im  =  I_nm                        if i < m,
              =  1_nm'                       if i = m,
              =  K_nm  =  [I_{nm-1}, 0]      if i > m.
3.1.5
or
Q] if i ) m.
Generation of X and R Matrices
The notation of section 3.1.1 is used to specify models containing
the type of effects described in section 1.5.1.
The methods of .
sections 3.1.3 and 3.1.4 are then applied in order to generate! and R
respectively.
discussion.
Models containing LTFR or cell mean effects warrant some
All submatrices involving LTFR factors will require sum to
zero restrictions.
If; no LTFR (Modell) main effects are included in
the model, the matrix
~
is not generated and the model is FR.
In models involving cell mean factors, the overall mean ( il)
parameter is not included in the model.
Instead, the highest order
interactions among all cell mean (Model 4) factors. are generated.
example, if factors iI, i2' ""
ik with numbers of levels L1' L2' ""
Lk were specified as cell mean factors, then
model.
For
The number of parameters in
~
~
would be included in the
is L1xL2x ••• xL3'
Interaction
between these cell means and other effects are not included in the
model automatically but can be added by explicitly specifying the
interaction terms to be included as described in section 3.1.1.
The following algorithm describes the process in terms of INFL,
the informal programming language discussed in section 1.4.4.

Algorithm 1.  Generation of Essence Model Design Matrices

Input:
    F (1xn)  =  (N1, N2, ..., NH)  vector of levels for each factor.
    T (1xn)  =  (T1, T2, ..., TH)  type of each factor:
                    1 = LTFR ANOVA,
                    2 = Sigma Restricted,
                    3 = Reference Cell,
                    4 = Cell Mean.
    IA       =  highest order IA (default = n).
    MODEL    =  [J1, ..., Jv]  explicit indicators for each term.
                (Default = all terms in design.)

1.  If MODEL is explicitly defined then:
       1t.  J = MODEL.
    Otherwise:
       1f.  Generate the index for μ as J0 = [j1, j2, ..., jn], where
            ji = 1 if the ith factor is a cell mean effect and
            ji = 0 otherwise.
       2f.  Jj = e_j' for all j = 1, ..., n where the jth factor is not
            a cell mean factor.
       3f.  For each interaction order 2 to IA, generate J for all IA
            terms using section 3.1.3.
2.  Generate main effect design matrices X[mj] using Table 3.1.
3.  Initialize X and R to null matrices (0x0).
4.  For i = 1 to v:
    1.  Use J_i and the main effect matrices X[m] from step 2 to
        generate X(J_i) using Theorem 1 as modified by (3.9).
5.  Index effects containing LTFR terms and create J_g for g = 1, ..., G.
6.  For g = 1 to G:
    1.  Use J_g to generate R(J_g) by (3.13) and (3.14).
    2.  Append R(J_g) to R.
Table 3.1
Main Effect Design Matrices and Parameters

    T    Name                Design Matrix X[m], j=1                  Main Effect Parameter
    1    LTFR ANOVA          I_A                      (A x A)         A
    2    Sigma Restricted    [ I_{A-1}  ]             (A x (A-1))     A
                             [ -1_{A-1}' ]
    3    Reference Cell      [ 0_{A-1}' ]             (A x (A-1))     A
                             [ I_{A-1}  ]
    4    Cell Mean           I_{a x b x ... x h}                      μ_{11...1}, μ_{11...2}, ..., μ_{ab...h}
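A minimal sketch combining Table 3.1 and the design-matrix part of Algorithm 1 for the complete-design case (numpy; the function names, the combinations-based term ordering, and the default intercept are assumptions made for illustration; cell mean bookkeeping and LTFR restrictions are not handled here):

    import numpy as np
    from itertools import combinations

    def main_effect_matrix(levels, ftype):
        """Design submatrix X[m] for one factor when j = 1 (Table 3.1)."""
        A = levels
        if ftype == 1:                                   # LTFR ANOVA
            return np.eye(A)
        if ftype == 2:                                   # Sigma Restricted
            return np.vstack([np.eye(A - 1), -np.ones((1, A - 1))])
        if ftype == 3:                                   # Reference Cell
            return np.vstack([np.zeros((1, A - 1)), np.eye(A - 1)])
        return np.eye(A)                                 # Cell Mean (sketch only)

    def essence_X(levels, types):
        """Essence design matrix: intercept, main effects, all interactions."""
        n = len(levels)
        cols = [np.ones((int(np.prod(levels)), 1))]      # overall mean column
        terms = [c for k in range(1, n + 1) for c in combinations(range(n), k)]
        for term in terms:
            block = np.ones((1, 1))
            for i, A in enumerate(levels):
                piece = main_effect_matrix(A, types[i]) if i in term else np.ones((A, 1))
                block = np.kron(block, piece)
            cols.append(block)
        return np.hstack(cols)

    # 3x4 Reference Cell (A) by LTFR ANOVA (B) model, as in Example 3.4
    X = essence_X([3, 4], [3, 1])
    print(X.shape)          # (12, 15): mu + 2 + 4 + 8 columns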
3.2 Primary Parameter Definitions
Given a particular X matrix for a FR model, or an X matrix and
appropriate restriction matrix R for a LTFR model, it is desirable to
define the parameters of the model in an easily interpretable manner.
This section will introduce the concept of using a "definitional
matrix" to display parameter definitions.
This matrix is also utilized
by the algorithm that generates the definitions and by the theorems
that prove that the results of the algorithm are equivalent to the
canonical definitions.
3.2.1 Definitional Matrix
Let the essence linear model be given by

(3.15)    E(Y_e)  =  X β,
           Nx1       Nxq qx1

where E(Y_e) represents the N cell means of the design.  The canonical
definition of β for a FR model is given by

(3.16)    β  =  (X'X)^{-1} X' E(Y_e)  =  A_c E(Y_e).

Helms (1981) shows that an equivalent definition A can be obtained by
performing row operations on [X  I_N] such that:

    [X  I]   --row operations-->   [ I_q   A ]      A:  qxN
                                   [ 0     Q ]      Q:  sxN

where s = N - q.  Performing such row operations is equivalent to
premultiplying [X  I] by a matrix such that:

(3.17)    [ A ] [X  I]  =  [ AX  A ]  =  [ I_q  A ]
          [ Q ]            [ QX  Q ]     [ 0    Q ]

so that AX = I_q and QX = 0.  AX = I is a necessary and sufficient
condition for A <=> A_c by (2.4).  Q represents the s restrictions on
the rows of X in those cases where q < N.  For example, in an additive
model Q would represent the linear combinations of cell means that
define the interactions as zero.  In a fully parameterized model q = N
and Q is the null matrix.
Definition:  Consider a matrix D of the form:

                  β      E(Y)
(3.18)    D  =  [ T        A  ]      T:  qxq,   A:  qxN
                [ 0        Q  ]      Q:  sxN

where T is arbitrary, s = q - Rank(X), and N = number of cells in the
design.  D is defined as the primary definitional matrix of the linear
model (3.15).  Each row of T indicates a linear combination of the
elements of β, and the corresponding row of A defines that linear
combination in terms of the cell means E(Y).
The rationale for this definition will be illustrated using the
special case where T = I_q.  Rewrite the essence linear model (3.15) as

    X β - E(Y)  =  0,    or    [X  I] [  β    ]  =  0.
                                      [ -E(Y) ]

Premultiplying by [A; Q] as described in (3.17) gives:

    [ A ] [X  I] [  β    ]   =   [ I  A ] [  β    ]   =   0,
    [ Q ]        [ -E(Y) ]       [ 0  Q ] [ -E(Y) ]

or

    β  =  A E(Y)    and    Q E(Y)  =  0.

By labeling the columns of the definitional matrix (3.18) with the
elements of β and the cell means E(Y), we provide a convenient notation
for presenting definitions of linear models parameters.
Using this same notation,

          β     E(Y)
    [     X       I  ]

represents

    [X  I] [  β    ]  =  0,    or    X β  =  E(Y).
           [ -E(Y) ]

A definition of β can then be obtained from the design matrix X as

          β     E(Y)                       β     E(Y)
    [     X       I  ]    --->    [        I       A ]   =  D.
                                  [        0       Q ]

This notation is useful for computer generated output since the
elements of β and E(Y) can be used as column headings.  It will be
valuable later in defining secondary parameters and in computing
alternative definitions for the case of missing cells.
3.2.2 Normalized Gauss-Jordan Reduction
Helms (1981) uses Gram-Schmidt orthonormalization to compute the
definitional matrix D.  Although the resulting definition is equivalent
to β_c, it is often just as complicated and non-intuitive.  A method
that preserves the integer properties of the factorial design matrix
will result in simpler definitions, for example, the Gauss-Jordan
reduction of a matrix by row operators:

    [X  I]   --G-J-->   [ G  A ]
                        [ 0  Q ]

where G is a diagonal matrix.  By modifying this method such that
G = I, the desired result can be obtained.  This method will be referred to
Definition.  An elementary NGJ matrix of order N and index j,
designated M_j, is I_N except for the jth column, which is m_j.  For an
Nx1 column vector d_j with d_jj ≠ 0, m_j exists such that

    M_j d_j  =  e_j,

where e_j' = (0 ... 0 1 0 ... 0), as defined in section 1.4.  NGJ
reduction of D(0) = [X, I], where Rank(X) = q, consists of
premultiplying by a series of elementary NGJ matrices such that:

(3.19)    M_j D(j-1)  =  D(j),    for j = 1, ..., q.

Let M = M_q M_{q-1} ... M_1, so that

    M D(0)  =  D(q)  =  D  =  [ I_q  A ]
                              [ 0    Q ].

The Normalized Gauss-Jordan reduction

    [X, I]   --NGJ-->   D

is thus equivalent to premultiplication by M, with AX = I and QX = 0.
M_j can be computed from (3.19) by writing

    [ 1           m_1j            ]  [ d_1j ]     [ d_1j + m_1j d_jj ]     [ 0 ]
    [     1       m_2j            ]  [ d_2j ]     [ d_2j + m_2j d_jj ]     [ 0 ]
    [       .       .             ]  [  .   ]  =  [         .        ]  =  [ . ]
    [             m_jj            ]  [ d_jj ]     [     m_jj d_jj    ]     [ 1 ]
    [               .          1  ]  [ d_Nj ]     [ d_Nj + m_Nj d_jj ]     [ 0 ]

so

(3.20)    m_ij  =  -d_ij / d_jj    for i ≠ j,    and    m_jj  =  1 / d_jj.

Note that if the pivot element d_jj = 0, the computation (3.20) cannot
be performed and the algorithm fails.
Interchanging the jth and kth rows of D(j-1), where k > j and
d_kj ≠ 0, will allow the algorithm to continue.  If no d_kj ≠ 0 for
k > j exists, the jth column of M, say m_j, will not equal e_j.  In
this case M_j will be set to I_N, and the algorithm will continue.
Failure of the algorithm in this manner indicates that A does not
describe a well-defined vector of parameters, as will be shown later in
Theorem 2.  The Normalized Gauss-Jordan algorithm is described below.
Algorithm 2.  Normalized Gauss-Jordan Reduction

Input:  D = [X  I].

1.  For j = 1 to q:
    1.  M_j = I_N.
    2.  If any d_kj ≠ 0 for k = j, ..., q:
        1.  Find the first d_kj ≠ 0  (interchange rows j and k if k ≠ j).
        2.  m_ij = -d_ij / d_jj  for i ≠ j.
        3.  m_jj = 1 / d_jj.
    3.  D = M_j D.

The results of this algorithm are

    [X  I]   --ALG 2-->   [ T  A ]
                          [ 0  Q ]

where T = I if a non-zero pivot can be found at each stage of
computation, and T ≠ I is upper triangular if one or more zero pivots
are found.
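A minimal sketch of Algorithm 2 (numpy; the function name is an assumption, and the pivot search scans all rows below the current one):

    import numpy as np

    def ngj_reduce(X, tol=1e-12):
        """Normalized Gauss-Jordan reduction of [X I], returning T, A, Q."""
        N, q = X.shape
        D = np.hstack([X.astype(float), np.eye(N)])
        for j in range(q):
            rows = [k for k in range(j, N) if abs(D[k, j]) > tol]
            if not rows:
                continue                       # zero pivot: skip, T will not be I
            k = rows[0]
            if k != j:
                D[[j, k]] = D[[k, j]]          # interchange rows j and k
            D[j] = D[j] / D[j, j]              # normalize pivot row (m_jj = 1/d_jj)
            for i in range(N):
                if i != j:
                    D[i] -= D[i, j] * D[j]     # m_ij = -d_ij/d_jj
        T, A = D[:q, :q], D[:q, q:]
        Q = D[q:, q:]
        return T, A, Q

Applied to a full-column-rank essence X, the returned T is the identity and A X = I, which is the condition of Theorem 2 below.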
3.2.3  Conditions for Equivalence of Definitions
The following theorems can be used in conjunction with Algorithm 2
in order to compute definitions which are equivalent to the canonical
definitions but that are, in many cases, more intuitive and preserve
the integer properties of the design matrix.
The first theorem applies
to unrestricted models and provides a method for testing whether a
model is FR, as well as providing a definition of the model parameters
if that is the case.
Theorem 2.  Given the design matrix X for the essence linear model
(3.15), apply Algorithm 2 as follows:

(3.21)    [X  I]   --ALG 2-->   [ T  A ]      T:  qxq,   A:  qxN
           Nxq                  [ 0  Q ]      Q:  sxN

The definition formed by matrix A is equivalent to the canonical
definition β_c given in (3.16) iff T = I.  Q represents independent,
non-estimable restrictions, where QX = 0.

Proof:  Applying Algorithm 2 is equivalent to multiplying

    [ A ] [X  I]  =  [ AX  A ]
    [ Q ]            [ QX  Q ].

A necessary and sufficient condition for A <=> A_c is given in (2.4) as
AX = T = I.  QX = 0, so Q is not in the solution space of X.    QED.

Note that if T ≠ I, this implies that the elements of β are not
linearly independent and that the model is LTFR.
In the LTFR case,
non-estimable restrictions R as described in (2.7) must be placed on
the parameters such that
(3.22)    [ X'X   R' ] ^{-1}      exists.
          [ R     0  ]

The canonical definition of the essence model is then given by (2.8)
as:

(3.23)    D_c:    β  =  Z_1 X' E(Y)  =  A_c E(Y).
The following theorem shows that Algorithm 2 results in parameter
definitions that are equivalent to the canonical definition.

Theorem 3.  Given the design matrix X for LTFR models of the form
shown in (3.15), add non-estimable restrictions R as shown in (3.22).
Apply Algorithm 2 as follows:

    [ X  I ]   --ALG 2-->   [ I_q  A ]
    [ R  0 ]                [ 0    Q ]

The definition formed by A is equivalent to the canonical definition
given by (3.23).  If Q exists, it represents independent, non-estimable
restrictions of the form QX = 0.
Proof:  From (3.22),

    [ Z_1  Z_2 ] [ X'X  R' ]   =   I,
    [ Z_3  Z_4 ] [ R    0  ]

so Z_1 X'X + Z_2 R = I_q.  Applying Algorithm 2 is equivalent to

    [ A  A_2 ] [ X  I ]   =   [ AX + A_2 R    A ]
    [ Q  Q_2 ] [ R  0 ]       [ QX + Q_2 R    Q ]

where AX + A_2 R = I and QX = 0.  Using the notation of (2.15), define
two secondary parameter vectors equal to the primary parameter vector β:

    D1:    θ_1  =  β  =  A E(Y).
    D2:    θ_2  =  β  =  A_c E(Y)  =  Z_1 X' E(Y).

By (2.15), a necessary and sufficient condition for equivalence of θ_1
and θ_2 is

    (A - A_c) X  =  (A - A_c) X R^- R.

Set AX = I - A_2 R and A_c X = Z_1 X'X = I - Z_2 R.  Then

    (A - A_c) X  =  I - A_2 R - I + Z_2 R  =  Z_2 R - A_2 R,

and, since R R^- R = R,

    (A - A_c) X R^- R  =  Z_2 R R^- R - A_2 R R^- R  =  Z_2 R - A_2 R.    QED.
The following examples illustrate Algorithms 1 and 2 by generating
a variety of models for a 3x4 factorial design.
For each model the X
and R matrices generated by Algorithm 1 are displayed.
Example 3.1 is a LTFR ANOVA without the necessary parameter
restrictions to make the model well-defined.
The definitional matrix
(3.18) resulting from the application of Algorithm 2 is shown.
Note that T ≠ I, and A is not an equivalent definition to the canonical
definition by Theorem 2.  This result is not unexpected, since the
model is not well defined and has no canonical definition.
Example 3.2 adds the sum to zero restrictions to the LTFR ANOVA
parameters, resulting in a well defined model.  In examples where the
conditions of Theorem 2 or Theorem 3 are met (i.e., well defined
models), the T = I portion of the definitional matrix is omitted and
only A and Q are displayed.
Example 3.3 generates a LTFR Restricted
ANOVA model with interactions for the same 3x4 factorial design.
Consider the case where one of the levels of the A factor in the
above example represents a placebo or standard treatment.
Example 3.4
presents a Reference Cell by LTFR Restricted ANOVA model for such a
design.
Note that sum to zero restrictions are required only for
factor B, the LTFR Restricted ANOVA factor.
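A minimal numerical illustration of the well-definedness question behind Examples 3.1 and 3.2 (numpy; this checks the rank condition underlying Theorems 2 and 3 rather than reproducing the PROC MATRIX output that follows):

    import numpy as np

    # 3x4 additive LTFR ANOVA essence matrix: mu + A + B (Example 3.1)
    A, B = 3, 4
    X = np.hstack([np.ones((A * B, 1)),
                   np.kron(np.eye(A), np.ones((B, 1))),
                   np.kron(np.ones((A, 1)), np.eye(B))])
    print(np.linalg.matrix_rank(X), X.shape[1])     # 6 < 8: parameters not well defined

    # Sum to zero restrictions on A and B (Example 3.2)
    R = np.zeros((2, X.shape[1]))
    R[0, 1:1 + A] = 1                               # A1 + A2 + A3 = 0
    R[1, 1 + A:] = 1                                # B1 + B2 + B3 + B4 = 0
    XR = np.vstack([X, R])
    print(np.linalg.matrix_rank(XR) == X.shape[1])  # True: restrictions make the model well defined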
[Examples 3.1 through 3.4:  computer-generated X and R matrices and the
resulting definitional matrices for the 3x4 factorial designs described
above (Example 3.1, additive LTFR ANOVA without restrictions, Model 0;
Example 3.2, LTFR ANOVA with sum to zero restrictions; Example 3.3,
LTFR Restricted ANOVA with interactions; Example 3.4, Reference Cell by
LTFR Restricted ANOVA).  The matrix listings are not reproduced here.]
3.3
Secondary Parameter Definitions
In this section the concept of a definitional matrix will be
expanded to include secondary parameters.
The conditions for
equivalence of definitions will be set forth and a method of using
Algorithm 2 to compute equivalent definitions will be described.
Finally, a method for generating secondary parameter matrices will be
given.
3.3.1  θ Definitional Matrix
Let θ = C β be a secondary parameter of interest for the possibly
   ax1  axq qx1
LTFR model given by

    [X  I] [  β    ]  =  0.
           [ -E(Y) ]

Then

    [ CA ] [X  I] [  β    ]  =  [ CAX   CA ] [  β    ]  =  0.
    [ Q  ]        [ -E(Y) ]     [ QX    Q  ] [ -E(Y) ]

If CA and Q exist such that CAX = C and QX = 0, then

    θ  =  C β  =  CA E(Y)    and    Q E(Y)  =  0.
This can be written as a secondary definitional matrix
                   β      E(Y)
(3.24)    D_θ  =  [ C      A_θ ]   =   [ C    CA ].
                  [ 0      Q   ]       [ 0    Q  ]
This matrix can be used to determine the definability of a particular
secondary parameter θ and to provide a definition which is equivalent
to the canonical definition, as will be shown in the next section.
3.3.2
Conditions for Equivalence of Secondary Parameters
As was noted in Section 2.1.4, a model with an undefined primary
parameter β may have a well-defined secondary parameter θ.  Helms
(1980) shows that if β is well defined then θ = C β will also be well
defined.  The following theorem gives necessary and sufficient
conditions for θ to be well defined and provides a method for
determining the definition using Algorithm 2.
Theorem 4.  Apply Algorithm 2 to a linear model with or without
restrictions:

    1)   [ X  I ]   --ALG 2-->   [ T  A ]  =  D,        or
         [ R  0 ]                [ 0  Q ]

    2)   [ X  I ]   --ALG 2-->   [ T  A ]  =  D.
                                 [ 0  Q ]

For the secondary parameter vector θ = C β, form the definitional
matrix

    D_θ  =  [ C   0 ] D  =  [ CT   CA ].

Then the parameter θ specified by

    D_θ:    θ  =  CA E(Y)

is well-defined, and D_θ is equivalent to the canonical definition
given in (2.14) as

    D_c:    θ  =  C (X'X)^- X' E(Y)

iff CT = C.
Proof:  Applying Algorithm 2 is equivalent to multiplying by
[A  A_2; Q  Q_2], so

    D_θ  =  [ C  0 ] [ A  A_2 ] [ X  I ]   =   [ C(AX + A_2 R)   CA ]   =   [ CT   CA ].
            [ 0  I ] [ Q  Q_2 ] [ R  0 ]       [ QX               Q  ]       [ 0    Q  ]

By (2.15), a necessary and sufficient condition for equivalence of θ_1
and θ_2 is

(3.25)    (H_1 - H_2) X  =  (H_1 - H_2) X R^- R,

where H_1 = CA and H_2 = C (X'X)^- X'.  From section 2.1.4,

    θ  =  C (X'X)^- X' E(Y)

is a definition iff C X^- X = C.  One generalized inverse of X is
(X'X)^- X', so C (X'X)^- X' X = C.  If CT = C(AX + A_2 R) = C, then
CAX = C - CA_2 R, so

    (H_1 - H_2) X  =  CAX - C (X'X)^- X' X  =  C - CA_2 R - C  =  -CA_2 R,    and

    (H_1 - H_2) X R^- R  =  -CA_2 R R^- R  =  -CA_2 R.    QED (1).

If no restrictions are placed on β, then the condition for equivalence
reduces to H_1 X = H_2 X, and T = AX.  Substituting using (3.25),

    CAX  =  C (X'X)^- X' X  =  C    iff    CAX  =  C.    QED (2).
3.3.3  Complete Definitional Matrix
It will be useful to display the definitions of the secondary
parameters in terms of both the primary parameters and the cell means.
The following complete definitional matrix or simply "definitional
matrix" combines both primary and secondary definitions as follows:
                     θ        β         E(Y)
(3.26)    D   =   [  I_θ      C T_β     A_θ  ]      (a rows)
                  [  0        T_β       A_β  ]      (q rows)
                  [  0        0         Q    ]      (s rows)

where Q represents non-estimable restrictions such that QX = 0.  If
T_β = I_q, then A_β is equivalent to A_βc for E(Y) = X β, and A_θ is
equivalent to A_θc for θ = C β.  D can be constructed by the following
algorithm.
Algorithm 3.  Complete Definitional Matrix

Input:
    D  =  [ T_β   A_β ]      from Algorithm 2,
          [ 0     Q   ]
    C  (axq)  =  secondary parameter matrix.

1.  Form the secondary rows  [ CT_β   CA_β ]  by premultiplying D by [C  0].

2.  If CT_β = C, the secondary parameters are well defined (Theorem 4).

3.  D  =  [ I_a    CT_β    CA_β ]
          [ 0      T_β     A_β  ]
          [ 0      0       Q    ].
3.3.4
Generation of C Matrices
As was previously noted, it is sometimes desirable to fit one linear
model and have parameters of interest that are defined in terms of a
different but isomorphic model.  The following theorem provides a
method for constructing the C matrix necessary for defining the
secondary parameters of interest in terms of the primary parameters of
the model fitted.
Theorem 5.  Consider the two isomorphic models described in
Section 2.2.1:

    M1:    E(Y)  =  X1 β1,    and
    M2:    E(Y)  =  X2 β2.

Assume model M1 is fitted and the secondary parameter of interest is
well-defined in terms of model M2 primary parameters as

    D_θ2:    θ2  =  C2 β2  =  A_θ2 E(Y).

An equivalent definition to D_θ2 in terms of model M1 is

    D_θ1:    θ1  =  C1 β1  =  C1 A_β1 E(Y)  =  A_θ1 E(Y),

where C1 = C2 A_β2 X1.
Proof:  From (2.19),

    β2  =  H21 β1,    where    H21  =  X2^- X1 + (I - X2^- X2) Z,

Z is arbitrary and X2^- is any generalized inverse of X2.  One such
inverse is (X2'X2)^- X2', so letting Z = 0 gives

    H21  =  (X2'X2)^- X2' X1.

The canonical definition of β2 is given by (2.14) as

    D_c2:    β2  =  (X2'X2)^- X2' E(Y)  =  (X2'X2)^- X2' X1 β1  =  A_β2 X1 β1  =  H21 β1,

so

    θ2  =  C2 β2  =  C2 H21 β1  =  C1 β1  =  θ1.    QED.

Thus, the parameters of interest θ can be defined in terms of cell
means or as selected elements of one of the models described in Section
1.5.1.
The C1 matrix needed to define the parameters of interest in
terms of the model fitted can then be generated using Theorem 5.
The secondary parameter matrix C2 can be any definitional matrix
conforming to E(Y).  For example, the primary or secondary definition
A_β2 or A_θ2 could be used to convert the parameters β to θ.  An
algorithm to generate C2 for the secondary parameters described in
Section 1.5.2 can now be developed.
The Cell Mean model (Model 4) is used for this purpose because the
primary parameters are simply the expected cell means:

    M4:    E(Y)  =  X[4] β[4],

and the canonical definition is

    θ  =  C β[4]  =  C E(Y)  =  A_θ2 E(Y).

By applying Theorem 5, the parameters of interest can be expressed in
terms of Model 1 as

    D1:    θ1  =  C X1 β1  =  C X1 A_β1 E(Y)  =  A_θ1 E(Y).
For the sake of generality, the algorithm will allow for
interactions between two different types of effects.
Consider the
LTFR ANOVA and Reference Cell by LTFR ANOVA models shown in
Examples 3.3 and 3.4.  Note that although the "B effect" is LTFR ANOVA
in both models, the definitions of these B effects are not equivalent
due to the fact that different types of A effects are used in the
models.
This implies that the algorithm must not only "know" what
factors are in a particular effect, but must also "know" the types of
the other factors in the design.
Let the secondary parameter vector of interest defined in (2.13)
be partitioned into T sections:

(3.27)    θ  =  C β[4]  =  [ θ(1)', θ(2)', ..., θ(T)' ]' ;

    ...  2 for DFM effect, or 3 for Reference Cell effect; and

    3) each section of C is generated by

(3.29)    C(J_t, K_t)  =  C[j1,k1]_t @ ... @ C[jn,kn]_t,

where C[ji,ki]_t is defined by Table 3.2 for i = 1 to n, the number of
factors in the design.
Algorithm 4.  Generation of C Matrices

Input:
    X1 (Nxq)  -  a design matrix for an n-way experimental design as
                 derived in Algorithm 1.
    J  (Txn)  -  a matrix of indicators where each row represents the
                 presence or absence of the factors in θ(t) of (3.27).
    K  (Txn)  -  a matrix of types of secondary effects desired as
                 described in Section 1.5.2,
       or  K (Tx1)  -  if all effects are the same type within θ(t),
       or  K (1xn)  -  if the same pattern of factor types repeats
                       across all θ(t),
       or  K (1x1)  -  if all effects are the same type throughout θ.

1.  If the rows of K = 1 then K = K @ 1_T.
2.  If the columns of K = 1 then K = K @ 1_n'.
3.  C2 = null matrix (0x0).
4.  For t = 1 to T:
    1.  Generate C(J_t, K_t) by (3.29) and Table 3.2.
    2.  Append C(J_t, K_t) to C2.
5.  Transform C2 into C1 in terms of the model fitted using Theorem 5.
Table 3.2
C Matrices and Secondary Parameters (A levels)

    k_i   Name of Effect                               ji = 0              ji = 1
    1     Average Distance Between the Lines (ADBL)    (1/A) 1_A'
    2     Deviations From the Means (DFM)              (1/A) 1_A'          [I_{A-1}, 0] - (1/A) 1_{A-1} 1_A'
    3     Reference Cell                               [1, 0_{A-1}']       [-1_{A-1}, I_{A-1}]
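A minimal sketch of (3.29) for the DFM and Reference Cell pieces of Table 3.2 (numpy; the function names are assumptions, and only those two effect types are implemented):

    import numpy as np

    def c_piece(j, k, A):
        """C[j,k] from Table 3.2 for one factor with A levels (k = 2 DFM, k = 3 Reference Cell)."""
        if j == 0:
            return (np.ones((1, A)) / A) if k == 2 else np.hstack([np.ones((1, 1)), np.zeros((1, A - 1))])
        if k == 2:        # deviations from the means
            return np.hstack([np.eye(A - 1), np.zeros((A - 1, 1))]) - np.ones((A - 1, A)) / A
        # k == 3: reference cell
        return np.hstack([-np.ones((A - 1, 1)), np.eye(A - 1)])

    def c_section(J_t, K_t, levels):
        """One section of C per (3.29): Kronecker product of per-factor pieces."""
        out = np.ones((1, 1))
        for j, k, A in zip(J_t, K_t, levels):
            out = np.kron(out, c_piece(j, k, A))
        return out

    # DFM "B main effect" section for the 3x4 design (A averaged over, B in the effect)
    print(c_section(J_t=[0, 1], K_t=[2, 2], levels=[3, 4]))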
Steps 1 through 4 of Algorithm 4 generate the C matrix
representing the effects described in section 1.5.2 in terms of the
expected cell means.
These expected cell means are the primary
parameters of the Cell Mean Model (Model 4), so the C matrix forms a
definition for the effects.  Step 5 then transforms the effects into a
C1 matrix in terms of some other model as described in Theorem 5.  This
allows the statistician to fit any model that is isomorphic to the cell
mean model, define secondary effects of interest as combinations of the
effects given in 1.5.2, and use Theorem 5 to create the C1 matrix
needed to test the hypothesis in terms of the model fitted.
In Examples 3.5 and 3.6, Algorithm 4 is used to generate the
secondary parameter matrix C1 representing the ADBL and DFM effects for
a Reference Cell model.  Algorithm 3 is then used to compute and
display the resulting primary and secondary parameter definitions:

    θ  =  A_θ E(Y),        β  =  A_β E(Y).

For the same 3x4 Reference Cell model, Example 3.7 examines the A, B,
and AB effects resulting from specifying Reference Cell effects for
factor A and DFM effects for factor B.  Note that the DFM B main
effects in this example are not equivalent to the DFM B main effects in
Example 3.5 because of the different specification of secondary A
effects.
[Examples 3.5 through 3.7:  secondary parameter matrices C and the
resulting definitional matrices for the 3x4 Reference Cell model
(Example 3.5, ADBL effects; Example 3.6, DFM effects; Example 3.7,
Reference Cell effects for factor A by DFM effects for factor B).
The matrix listings are not reproduced here.]
Chapter 4
Missing Cells
In this chapter the results obtained in Chapter 3 will be extended
to the incomplete case, models for designs with one or more missing
cells.
Consider an n-way model where not all interactions are present,
for example, an additive, two-way factorial model.
Certain patterns of
missing cells will allow "recovery" of the expected means of missing
cells.
This in turn will allow for the definition of the parameters,
even though one or more of the expected cell means involved in the
definition are missing.
Such parameters will be termed "recoverable
parameters."
The first section of this chapter will give necessary and
sufficient conditions for equivalence of the parameter definitions in
the complete and incomplete cases.
It will provide an algorithm for
determining the existence of and computing the definition of any such
recoverable parameters.
The relationship between recoverability and
connectedness will be discussed and, in the process, a test for
connectedness will be provided.
The second section will present three
strategies for those instances when one or more parameters are not
recoverable.  Algorithms applicable to these situations will then be
developed.
4.1
Recoverable Parameters
The concept of recoverability can be illustrated with a simple
example.
Consider a 2x2 Reference Cell model with no interactions:

             E(Y11)        1  0  0
   E(Ye)  =  E(Y12)    =   1  0  1      mu
             E(Y21)        1  1  0      A2     =  X beta[3] .
             E(Y22)        1  1  1      B2

The canonical definition of beta[3] is given by (3.16) as

   Dc:   beta[3]  =  (X'X)^(-1) X' E(Ye)

                       3/4   1/4   1/4  -1/4
                  =   -1/2  -1/2   1/2   1/2    E(Ye) .
                      -1/2   1/2  -1/2   1/2

A more intuitive definition would be

                        1    0    0    0
   D2:   beta[3]  =    -1    0    1    0    E(Ye) .
                       -1    1    0    0

Since the coefficient matrix A2 of D2 satisfies

              1  0  0
   A2 X   =   0  1  0   =  I ,
              0  0  1

the definitions Dc and D2 are equivalent by (2.4), and D2 does not require E(Y22).  This means that cell (2,2) could contain no observations, and the parameters in beta[3] would still be well-defined and recoverable.

If cell (1,2) were missing, the definition

                        1    0    0    0
   D3:   beta[3]  =    -1    0    1    0    E(Ye)
                        0    0   -1    1

would be equivalent to Dc and D2, since its coefficient matrix A3 also satisfies A3 X = I, and the element B2 would be recoverable.
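This equivalence check - that a candidate coefficient matrix A satisfies A X = I - is easy to verify numerically.  The following sketch is illustrative only; the variable names and the use of Python/numpy are assumptions of this note, not part of the original algorithms.

import numpy as np

# Design matrix for the 2x2 Reference Cell model with no interaction;
# columns are mu, A2, B2; rows are cells (1,1), (1,2), (2,1), (2,2).
X = np.array([[1, 0, 0],
              [1, 0, 1],
              [1, 1, 0],
              [1, 1, 1]])

# Canonical definition Dc = (X'X)^(-1) X'.
Dc = np.linalg.solve(X.T @ X, X.T)

# Alternative definitions: D2 avoids E(Y22), D3 avoids E(Y12).
D2 = np.array([[ 1, 0, 0, 0],
               [-1, 0, 1, 0],
               [-1, 1, 0, 0]])
D3 = np.array([[ 1, 0, 0, 0],
               [-1, 0, 1, 0],
               [ 0, 0,-1, 1]])

# A definition is equivalent to the canonical one when its coefficient
# matrix A reproduces beta for every E(Ye) = X beta, i.e. A X = I.
for name, A in [("Dc", Dc), ("D2", D2), ("D3", D3)]:
    print(name, np.allclose(A @ X, np.eye(3)))   # all print True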
4.1.1  Conditions for Recoverability

Consider the general case of an essence linear model (2.10) with a parameter theta = C beta represented by the secondary definitional matrix

                      theta     E(Ye)
   (4.1)    D   =   [  I_a      A_theta ]      theta
                    [   0          Q    ]       0    .

The goal is to find a definition of theta = C beta that is equivalent to the canonical definition but has all zeros in the columns of A_theta corresponding to the missing cells.  Primary parameters are a special case where C = I.

Assume E(Ye) and X are reordered and partitioned such that the m missing cells are first:

             E(Y1)    (m x 1)                   X1
   E(Ye)  =                        and   X  =
             E(Y2)    ((N-m) x 1)               X2  .

The definitional matrix (4.1) can then be written as

                      theta    E(Y1)    E(Y2)
   (4.2)    D   =   [  I_a      A1       A2  ]      theta
                    [   0       Q1       Q2  ]       0   .

Premultiplying (4.2) by

   M  =  [ I_a    M1 ]
         [  0     M2 ]        (M2 is s x s)

gives

                              theta      E(Y1)          E(Y2)
   (4.3)   D*  =  M D  =   [   I_a    A1 + M1 Q1    A2 + M1 Q2 ]      theta
                           [    0       M2 Q1          M2 Q2   ]       0   .

Theorem 6.  The definition of theta given by A_theta in (4.1) is equivalent to A2 + M1 Q2 in (4.3) iff M1 exists such that A1 + M1 Q1 = 0.

Proof:  Let a second definition be H2 = [A1 + M1 Q1   A2 + M1 Q2].  A necessary and sufficient condition for the two definitions to be equivalent is given by (2.15) as H1 X = H2 X, where H1 = [A1   A2], so that

   H1 X  =  A1 X1 + A2 X2 .

By definition Q X = 0, so Q1 X1 = -Q2 X2.  If A1 + M1 Q1 = 0, then A1 = -M1 Q1, and

   H2 X  =  (A1 + M1 Q1) X1 + (A2 + M1 Q2) X2
         =  A2 X2 + M1 Q2 X2
         =  A2 X2 - M1 Q1 X1
         =  A2 X2 + A1 X1
         =  H1 X .                                              QED.
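Theorem 6 reduces recoverability to the existence of an M1 with A1 + M1 Q1 = 0.  Purely as an illustration, such an M1 can be sought with a least-squares solve; this is an assumed shortcut for exposition, whereas the dissertation's own computation of M1 uses the NGJ reduction of section 4.1.2.

import numpy as np

def recover_definition(A1, A2, Q1, Q2, tol=1e-10):
    """Illustrative check of Theorem 6: look for M1 with A1 + M1 Q1 = 0.

    A1 (a x m), A2 (a x (N-m)): a parameter definition split into
    missing-cell and observed-cell columns.
    Q1 (s x m), Q2 (s x (N-m)): the model restrictions, split the same way.
    Returns the equivalent definition A2 + M1 Q2 if one exists, else None.
    """
    # Solve Q1' M1' = -A1' in the least-squares sense; an exact solution
    # exists exactly when each row of A1 lies in the row space of Q1.
    M1t, *_ = np.linalg.lstsq(Q1.T, -A1.T, rcond=None)
    M1 = M1t.T
    if np.allclose(A1 + M1 @ Q1, 0, atol=tol):
        return A2 + M1 @ Q2          # definition free of the missing cells
    return None                      # parameter(s) not recoverable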
4.1.2  NGJ Reduction for Recovery of Missing Cell Parameters

Note that the original restrictions can be returned without affecting the parameter definitions as follows:

   [ I_a      0      ]           [ I_a    A1 + M1 Q1    A2 + M1 Q2 ]     E(Y1)
   [  0    M2^(-1)   ]  D*   =   [  0         Q1             Q2    ]     E(Y2)  .

The question arises: why bother with M2 in the first place?  The same end result would be obtained by selecting M2 = I.  The answer lies in the computational approach taken to construct M.  By using the NGJ reduction defined in section 3.2.2,

   M  =  M_(a+s) ... M_(a+1) ,    where   M_j = [e_1, ..., m_j, ..., e_(a+s)] ,

and

   M D  =  [ I_a    A1 + M1 Q1    A2 + M1 Q2 ]     E(Y1)
           [  0        M2 Q1         M2 Q2   ]     E(Y2)  .

Thus, if the algorithm runs to completion, all s restrictions may be used to eliminate missing cells from the parameter definitions.  After a column of Q1 (q_1j) is used to eliminate the corresponding column of A1, that column becomes a unit vector (q_1j = e_j).  This prevents the re-introduction of missing cells into A1 in subsequent iterations.  Since the matrix M consists of a product of normalized elementary matrices, the restriction restoration matrix M* can be constructed in the same manner.

From section 3.2.2, for j > a,

   M_j = [e_1, ..., m_j, ..., e_n] ;    let   M_j* = [e_1, ..., m_j*, ..., e_n] ,

where m_ij* = 0 for i < a, m_ij* = -m_ij / m_jj for i > a with i /= j, and m_jj* = 1/m_jj.  Using the relationship

   (4.4)    M_j M_j*  =  [ I_a    0  ]
                         [  0    I_s ]

and the definition of m_ij from (3.20), m_ij* = 0 for i < a and m_ij* = d_ij for i > a.  So

   M*  =  M*_(a+1) M*_(a+2) ... M*_(n)

and

   D**  =  M* M D  =  [ I_a    A1 + M1 Q1    A2 + M1 Q2 ]
                      [  0         Q1             Q2    ]  .

The matrix M can be obtained by applying Algorithm 2 for j = a+1 to N.  M* can be constructed using (4.4).  Failure of Algorithm 2 indicates that M1 does not exist such that A1 + M1 Q1 = 0 and, by Theorem 6, there is no equivalent definition that does not involve the missing cells.  For specific missing cell patterns, the following algorithm gives equivalent definitions for recoverable parameters and identifies those parameters that are not recoverable.
Algorithm 5.  Recover Parameters Involving Missing Cells and Identify Nonrecoverable Parameters

Input:
            D  =  [ I_a    A  (a x n) ]
                  [  0     Q  (s x n) ]

   n_1, ..., n_m  =  indices of the missing cells.

1.  Reorder D such that the m missing cells are first:

            D  =  [ I_a    A1 (a x m)    A2 (a x (N-m)) ]
                  [  0     Q1 (s x m)    Q2             ]
                              E(Y1)          E(Y2)

2.  M* = I_(a+s).

3.  For j = a+1 to J, where J = min(s, m):
    1.  M_j = I_(a+s).
    2.  M_j* = I_(a+s).
    3.  If any d_kj /= 0 for k = j, ..., J:
        1.  Find the first d_kj /= 0.
        2.  Interchange rows d_j and d_k if k /= j.
        3.  m_j = -d_j / d_jj.
        4.  m_jj = 1 / d_jj.
        5.  m_ij* = d_ij for i = a+1 to a+s.
    4.  D = M_j D.
    5.  M* = M* M_j*.

4.  For i = 1 to a:
    1.  If d_ij = 0 for all j = a+1 to a+m, then theta_i is recoverable (in theta_1);
        else theta_i is unrecoverable (in theta_2).

5.  Reorder D such that

            D  =  [ I_theta1      0         0        A12   ]     theta_1
                  [    0       I_theta2     A21      A22   ]     theta_2
                  [    0          0       M2 Q1    M2 Q2   ]
                                          E(Y1)     E(Y2)

6.  D = M* D.
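A compact numerical sketch of the idea behind Algorithm 5 follows.  It is an illustration under stated assumptions only: it works on the stacked coefficients [A; Q] directly, uses ordinary Gauss-Jordan pivoting instead of the normalized elementary-matrix bookkeeping (M, M*) above, and omits the restoration step 6; all names are invented for the sketch.

import numpy as np

def algorithm5(A, Q, missing, tol=1e-10):
    """Sketch: use the restriction rows Q to sweep the missing-cell
    columns out of the parameter definitions A.

    A: (a x N) parameter definitions in terms of all N cell means.
    Q: (s x N) restrictions (Q E(Ye) = 0).
    missing: list of column indices of the missing cells.
    Returns (A_new, recoverable), where recoverable[i] is True when row i
    of A_new has zero coefficients on every missing cell.
    """
    D = np.vstack([A, Q]).astype(float)
    a = A.shape[0]
    used = set()
    for j in missing:
        # find an unused restriction row with a nonzero pivot in column j
        piv = next((k for k in range(a, D.shape[0])
                    if k not in used and abs(D[k, j]) > tol), None)
        if piv is None:
            continue
        used.add(piv)
        D[piv] /= D[piv, j]                       # normalize the pivot row
        for i in range(D.shape[0]):               # eliminate column j elsewhere
            if i != piv and abs(D[i, j]) > tol:
                D[i] -= D[i, j] * D[piv]
    A_new = D[:a]
    recoverable = np.all(np.abs(A_new[:, missing]) <= tol, axis=1)
    return A_new, recoverable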
Examples 4.1, 4.2, and 4.3 apply Algorithm 5 to three missing cell designs.  In each example, an additive Reference Cell model is used as the primary parameterization and ADBL and DFM secondary effects are specified.  By Theorem 6, the secondary effects are recoverable only for those parameters where A1 + M1 Q1 = 0.  All secondary effects are recoverable in Examples 4.1 and 4.3, but only the mean, ADBL A2, and ADBL B2 effects are recoverable in Example 4.2, which is an unconnected design.  The relationship between recoverable parameters and connected designs is discussed in the next section.
EXAMPLE 4.1 - "L" DESIGN FOR 3 X 3 FACTORIAL
[Cell counts and definitional matrix output omitted; the tabular listing is not legible in this reproduction.]
EXAMPLE 4.2 - MISSING CELL DESIGN FOR 3 X 3 FACTORIAL
[Cell counts and definitional matrix output omitted; the tabular listing is not legible in this reproduction.]

EXAMPLE 4.3 - "MISSING DIAGONAL" DESIGN FOR 3 X 3 FACTORIAL
[Cell counts and definitional matrix output omitted; the tabular listing is not legible in this reproduction.]
4.1.3
Connectedness and Recoverability
In the two-way factorial model, the concept of recoverability of all well-defined parameters is analogous to "connectedness" of an additive model as described by Bose (1947) and Dodge and Majumdar (1979).  Neither of these authors generalized the concept of connectedness beyond 2-way designs.  The following definition will extend connectedness to any factorial design and will provide a method for testing connectedness of a given missing cell pattern.

Definition:  Consider a complete design and a linear model with well-defined parameters meeting the conditions of Theorem 4.  If the conditions of Theorem 6 with theta = beta are met for a particular pattern of missing cells (incomplete design), then the model parameters are said to be "recoverable" and the incomplete design is "connected"; if these conditions are not met, then some parameters are "unrecoverable" and the incomplete design is "unconnected."

Consider the one-half fractional replication of a 2^4 factorial design shown in Example 4.4.  Since all primary parameters in this model are recoverable for this missing cell design, the design is considered "connected" for the given parameterization.  In Example 4.5, the same fractional factorial missing cell design is applied to a model with two-way interactions.  In this instance, the parameters are not recoverable, so the design is not connected for the model containing two-way interactions.
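For the two-way additive case only, the classical connectedness criterion of Bose (1947) can be checked as graph connectivity.  The sketch below is a hedged illustration (the function name and the union-find implementation are assumptions); the dissertation's general test, which also covers higher-order designs, is the recoverability check of Algorithm 5.

def two_way_connected(observed, r, c):
    """A two-way additive missing-cell pattern is connected iff the
    bipartite graph that links row level i to column level j for every
    observed cell (i, j) has a single component.

    observed: iterable of (i, j) pairs of non-missing cells (0-based).
    r, c: numbers of row and column levels.
    """
    parent = list(range(r + c))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in observed:
        parent[find(i)] = find(r + j)        # merge row i with column j

    roots = {find(k) for k in range(r + c)}
    return len(roots) == 1

# The "L" design of Example 4.1 (row 1 and column 1 present) is connected:
L_cells = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0)]
print(two_way_connected(L_cells, 3, 3))      # True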
EXAMPLES 4.4 AND 4.5 - ONE-HALF REPLICATION OF A 2x2x2x2 FACTORIAL
[Cell counts and definitional matrices for the main-effects model (Example 4.4) and for the model with two-way interactions (Example 4.5); the tabular listings are not legible in this reproduction and are omitted.]
4.2  Nonrecoverable Parameters

When a specific missing cell pattern causes one or more parameters to be nonrecoverable, several strategies are possible.

1.  Define only that subparameter that is recoverable.
2.  Make additional assumptions which add restrictions to the model and allow the parameters to be defined.
3.  Define parameters that are "similar to" the original parameters.

Each of these strategies involves identifying which parameters are recoverable for the missing cells n_1, ..., n_m.  Algorithm 5 will provide this information as follows:

            D  =  [ I_theta1      0         0      A12 ]     theta_1
                  [    0       I_theta2    A21     A22 ]     theta_2
                                          E(Y1)   E(Y2)

where

   theta_1  =  the a_1 recoverable parameters,
   theta_2  =  the a_2 unrecoverable parameters,
   E(Y1)    =  the m missing cell expected means,
   E(Y2)    =  the N-m nonmissing cell expected means, and
   A12      =  the definitions of the recoverable parameters.

4.2.1  Strategy 1 - Recoverable Subparameters of theta

Strategy 1 can be implemented by simply applying Algorithm 5 to identify the subvector of theta that is well-defined, i.e., theta_1.  Strategies 2 and 3 involve further manipulation of the definitional matrix generated by Algorithm 5.
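Strategy 1 can be read directly off the output of Algorithm 5.  A minimal sketch in terms of the illustrative algorithm5() helper from section 4.1.2 (again an assumption-level sketch, not the dissertation's implementation):

import numpy as np

def strategy1(A, Q, missing):
    """Report only the recoverable subvector theta_1 and its definitions."""
    A_new, recoverable = algorithm5(A, Q, missing)
    observed = [j for j in range(A.shape[1]) if j not in missing]
    # Definitions of theta_1 in terms of the non-missing cell means only.
    theta1_def = A_new[np.asarray(recoverable)][:, observed]
    theta2_idx = np.where(~np.asarray(recoverable))[0]   # unrecoverable parameters
    return theta1_def, theta2_idx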
4.2.2
Strategy 2 - Convert Nonrecoverable Parameters to Restrictions
Consider the results of Algorithm 5, steps 1 through 5, before the restoration of the original restrictions:

   D*  =  M D  =  [ I_theta1      0         0        A12   ]     theta_1
                  [    0       I_theta2     A21      A22   ]     theta_2
                  [    0          0       M2 Q1    M2 Q2   ]
                                          E(Y1)     E(Y2)

where at least one column of A21, say a_i, is not all zeros and the corresponding column of M2 Q1, say q_i, is not equal to e_i.  Any attempt to use row operations involving q_i to reduce a_i to zero will reintroduce nonzero elements elsewhere in A21.  What is needed are additional non-estimable restrictions.  Since X and R together completely span the row space, it is not possible to find additional restrictions which are not linear combinations of either R or X.

Henderson and McAllister (1978) and Hocking, Hackney and Speed (1978) suggest restricting the highest order interactions of missing cells to zero.  If these interaction terms were included in the complete case model, then setting them to zero will involve changing a linear combination of the rows of X, specifically the definition of the interaction term in question, to a restriction.

The following example will demonstrate how the primary definitional matrix and a variation of Algorithm 5 can be used to implement this strategy.
Consider a 2x3 Reference Cell model with interactions.  The complete design primary definitional matrix for this model is

           E(Y11)  E(Y12)  E(Y13)  E(Y21)  E(Y22)  E(Y23)
   mu        1       0       0       0       0       0
   A2       -1       0       0       1       0       0
   B2       -1       1       0       0       0       0
   B3       -1       0       1       0       0       0
   AB22      1      -1       0      -1       1       0
   AB23      1       0      -1      -1       0       1
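Because this reference cell model has one parameter per cell, it is full rank and its definitional matrix is simply the inverse of the design matrix, so the display above can be checked numerically.  A minimal sketch (numpy assumed):

import numpy as np

# Cells ordered (1,1),(1,2),(1,3),(2,1),(2,2),(2,3); parameters
# mu, A2, B2, B3, AB22, AB23 for the 2x3 Reference Cell model.
X = np.array([[1, 0, 0, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 0, 0, 1, 0, 0],
              [1, 1, 0, 0, 0, 0],
              [1, 1, 1, 0, 1, 0],
              [1, 1, 0, 1, 0, 1]])

# Each row of X^(-1) expresses a parameter as a linear combination of
# the expected cell means.
D = np.linalg.inv(X).round().astype(int)
print(D)
# row for AB22, for example: [ 1 -1  0 -1  1  0]
#   AB22 = E(Y11) - E(Y12) - E(Y21) + E(Y22)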
Assume cell (2,2) is missing.  Applying Algorithm 5 results in the following definitional matrix:

           theta_1 = [mu  A2  B2  B3  AB23],   theta_2 = [AB22]

           E(Y22)   E(Y11)  E(Y12)  E(Y13)  E(Y21)  E(Y23)
   mu        0        1       0       0       0       0
   A2        0       -1       0       0       1       0
   B2        0       -1       1       0       0       0
   B3        0       -1       0       1       0       0
   AB23      0        1       0      -1      -1       1
   AB22      1        1      -1       0      -1       0

Following the suggestion of the above cited authors, the interaction term of the missing cell is restricted to zero.  This can be implemented by setting the one corresponding to AB22 to zero, thus converting that row from a parameter to a restriction.  This example is slightly contrived, since no other parameter definitions use the missing cell.
Consider the same model with the (1,3) cell missing.  Applying Algorithm 5:

           theta_1 = [mu  A2  B2  AB22],   theta_2 = [B3  AB23]

           E(Y13)   E(Y11)  E(Y12)  E(Y21)  E(Y22)  E(Y23)
   mu        0        1       0       0       0       0
   A2        0       -1       0       1       0       0
   B2        0       -1       1       0       0       0
   AB22      0        1      -1      -1       1       0
   B3        1       -1       0       0       0       0
   AB23     -1        1       0      -1       0       1

The two choices for additional restrictions are setting either B3 or AB23 equal to zero.  The (2,3) cell is not missing, and there is no AB13 interaction in the reference cell model.  In keeping with the general philosophy that main effects should not be deleted when the corresponding interactions are present, the interaction term AB23 is restricted to zero, thus recovering B3.
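Under the added assumption AB23 = 0, the recovery of B3 amounts to a single row operation on the matrix above.  A small illustrative sketch (the dictionary layout and names are assumptions):

import numpy as np

# Definitional matrix rows for the 2x3 Reference Cell model, cell (1,3)
# missing; columns ordered E(Y13) | E(Y11) E(Y12) E(Y21) E(Y22) E(Y23).
rows = {
    "mu":   np.array([ 0,  1,  0,  0,  0,  0]),
    "A2":   np.array([ 0, -1,  0,  1,  0,  0]),
    "B2":   np.array([ 0, -1,  1,  0,  0,  0]),
    "AB22": np.array([ 0,  1, -1, -1,  1,  0]),
    "B3":   np.array([ 1, -1,  0,  0,  0,  0]),
    "AB23": np.array([-1,  1,  0, -1,  0,  1]),
}

# Convert AB23 to the restriction AB23 = 0: its row may now be added to
# other rows.  Adding it to the B3 row sweeps out the E(Y13) coefficient.
restriction = rows["AB23"]
B3_recovered = rows["B3"] + restriction          # +1 and -1 pivots cancel
print(B3_recovered)                              # [ 0  0  0 -1  0  1]
# i.e. under the added assumption AB23 = 0, B3 = E(Y23) - E(Y21).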
Carrying this strategy to the extreme, if all cells were missing we would restrict all parameters to zero.  Obviously, there is a point at which no further restriction assumptions are acceptable.  In addition, some sort of decision rule is needed to choose the next restriction to be added.  By ordering the parameters from "most important" to "least important" and instructing the algorithm to start its restriction search at the bottom of theta_2, the user can control the order in which parameters will be restricted without knowing the missing cell pattern a priori.  Typically this ordering would go from main effects down to the highest order interactions, but the user could specify his or her own ordering.

By defining the potential restrictions in advance, the minimally acceptable model can be specified.  The parameters in theta can then be partitioned as follows:

                   theta_1   theta_2   theta_3    E(Y1)    E(Y2)
   (4.6)  D  =  [ I_theta1      0         0         0       A12 ]     theta_1
                [    0       I_theta2     0        A21      A22 ]     theta_2
                [    0          0      I_theta3    A31      A32 ]     theta_3

The elements of I_theta3 index the parameters in theta_3, which are the potential restrictions to be used to reduce A21 to zero.  The order of conversion is determined by the order of the elements of theta_3 and by the specific missing cell pattern.  Algorithm 6 first defines theta_1, theta_2, and theta_3.  Algorithm 6 then determines the next potential restriction and uses the new restriction to recover parameter elements from theta_2.
The example discussed assumed that the primary parameters were the parameters of interest.  The definitional matrix (4.6) assumes that the potential restrictions are given in terms of some secondary parameters.  Typically, secondary parameters theta are of interest, and the potential restrictions are given as primary parameters such as interactions.  This need to have definitions of secondary parameters while identifying potential restrictions in terms of primary parameters is one of the motivations for using the complete definitional matrix described in (3.2.6).

Consider the case where theta and beta are well defined for the complete case.  The complete definitional matrix is then given by:

                  theta    beta     E(Y)
   D  =  [  I_a     C      A_theta ]     theta
         [   0     I_q     A_beta  ]     beta
         [   0      0        Q     ]

Assume that m cells are missing such that some elements of theta and beta are undefined.  Partition beta using the notation given in (4.6).  That is, beta_1 is well defined, beta_2 is not well defined, and beta_3 is not well defined but its elements are potential candidates for restriction to zero.  theta will be partitioned into definable and nondefinable segments, theta_1 and theta_2.
   (4.7)  D  =
              theta_1  theta_2    beta_1   beta_2   beta_3     E(Y1)        E(Y2)
   theta_1 [ I_theta1     0        C11      C12      C13         0        A_theta12 ]
   theta_2 [    0      I_theta2    C21      C22      C23      A_theta21   A_theta22 ]
   beta_1  [    0         0       I_beta1    0        0          0        A_beta12  ]
   beta_2  [    0         0         0      I_beta2    0       A_beta21    A_beta22  ]
   beta_3  [    0         0         0        0      I_beta3   A_beta31    A_beta32  ]

The goal is to reduce all elements of A_theta21 to zero so that all elements of theta are defined in this missing cell case.  The method involves converting a diagonal element of I_beta3 from 1 to 0 and treating that row as a restriction.  Linear combinations of this row can be added to the rows of A_theta21 in order to reduce one of its columns to zero.  Elements of beta_2 may sometimes be recovered during this process, but the primary objective is to recover all secondary parameters.

Assume the jth element of beta_3 is converted to a restriction.  Since b_3j = 0, the jth columns of C13 and C23 can be removed from the definitional matrix.  This can be accomplished by post-multiplying (4.7) by a block-diagonal matrix P whose diagonal blocks are identity matrices, except for the block applied to the beta_3 columns,

   P_3j  =  [ e_1, e_2, ..., e_(j-1), e_(j+1), ..., e_(b3) ] ,

where b3 is the number of elements in beta_3.  In general, P_3 is an identity matrix with a column removed for each element of beta_3 that is converted from a parameter to a restriction.  If all elements of beta_3 are converted, P_3 is a null matrix (0 x 0).
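As a concrete illustration of the column-removal step for a single conversion (an assumption-level sketch, not the dissertation's code):

import numpy as np

def drop_column_matrix(n, j):
    """P_3 for a single conversion: an n x (n-1) identity matrix with
    column j removed.  Post-multiplying a block of the definitional
    matrix by it deletes the column of the parameter that has been
    turned into a restriction."""
    return np.delete(np.eye(n, dtype=int), j, axis=1)

block = np.arange(12).reshape(3, 4)       # stand-in for a beta_3 column block
print(block @ drop_column_matrix(4, 2))   # same block with column 2 removed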
Selection of the restrictions can be made by examining the following submatrix of D:

                E(Y1)
   G  =  [ A_theta21 ]   theta_2        [ a_theta2,1  ...  a_theta2,m ]
         [ A_beta21  ]   beta_2     =   [ a_beta2,1   ...  a_beta2,m  ]
         [ A_beta31  ]   beta_3         [ a_beta3,1   ...  a_beta3,m  ]

Start with the first column of E(Y1).  In order to make a_theta2,1 = 0, at least one element of a_beta3,1 must be non-zero.  Select a non-zero element of a_beta3,1, say the ith, and set the corresponding diagonal element to 0 in I_beta3.  The ith row of A_beta31 may then be used in row operations to make a_theta2,1 and a_beta2,1 equal to zero.  If a_beta3,1 = 0, converting the corresponding parameter to zero will not help in recovering any parameters.  This is equivalent to a zero pivot in Algorithm 3.  Continue with subsequent columns until either A_theta21 = 0 or all of beta_3 has been converted to restrictions.
A decision rule for the selection of the next parameter to be converted to a restriction merits some discussion at this time.  The potential candidates are the parameters corresponding to the non-zero elements of a_beta3,j for j = 1 to m.  The simplest rule would be to choose the last eligible element, assuming that the ordering of the parameters corresponds to their relative importance.  This rule can, unfortunately, lead to logically indefensible results.  Consider the 3x4 Sigma Restricted Model in Example 3.3 of section 3.2.3, where all interaction terms are eligible for conversion to restrictions.  If cell (1,1) were missing, this decision rule would restrict the AB23 interaction to zero.  As noted in the literature review, most authors would suggest restricting the interaction of the missing cell to zero, in this case AB11.  A second method would be to let the user determine which parameters to restrict for a particular missing cell pattern.  This approach requires the user to examine the definition for each particular pattern of missing cells, but it does provide the statistician with maximum flexibility and control.

An easily automated rule that is logical and consistent with other authors would be to select the highest order interaction term corresponding to the missing cell as the parameter to be restricted.  If such a term does not exist, then a parameter having the same level as the missing cell on one or more factors will be chosen.  An examination of the definitions of the models constructed in Examples 3.1 through 3.7 will reveal that, for each cell (column of D), if an interaction term corresponding to that cell exists, then it will have a larger absolute coefficient than any other term in that column.  If such a term does not exist, then the largest absolute coefficient will have the most factors where the level of the parameter and of the cell are the same.  Algorithm 6 will use the latter decision rule, with the option of allowing the user to explicitly specify those parameters to be restricted to zero.
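A sketch of such a decision rule is given below.  It is illustrative only: the function name, argument layout, and tie-breaking are assumptions, and it approximates rather than reproduces the exact rule coded in Algorithm 6 and Subroutine A.

import numpy as np

def select_restriction(col, eligible, explicit=None):
    """Pick the parameter to convert to a restriction for one missing cell.

    col:      coefficients of the eligible (potential-restriction)
              parameters for this cell, e.g. one column of A_beta31.
    eligible: parameter names in "importance" order, last = least important.
    explicit: optional user-specified names (assumed to be a subset of
              eligible), tried in order.
    Returns the selected parameter name, or None if nothing can help.
    """
    if explicit is not None:
        # honour the user's list: take the last explicit choice that has a
        # nonzero coefficient for this cell
        for name in reversed(explicit):
            if col[eligible.index(name)] != 0:
                return name
        return None
    if not np.any(col):
        return None                      # zero pivot: nothing helps this cell
    return eligible[int(np.argmax(np.abs(col)))]   # largest |coefficient|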
This process involves premultiplying by a matrix M, as in Chapter 3, and postmultiplying by the column-deletion matrix P described above.  Assuming that all elements of beta_3 are converted to restrictions, the result is the definitional matrix (4.8): the beta_3 rows of (4.7) become restriction rows, their columns are deleted, and the theta and beta rows carry the updated definitions.  [The partitioned displays of (4.8) and of the associated restoration matrix are not legible in this reproduction and are omitted.]  The original definition of beta_3 can be recovered by premultiplying (4.8) by the restoration matrix M*.
Using Theorem 6, the definitions of theta_2 and beta_2 given by A_theta22 + M1 A_beta32 and A_beta22 + M2 A_beta32 will be equal to those in the complete case iff A_theta21 + M1 A_beta31 = 0 and A_beta21 + M2 A_beta31 = 0, respectively.  As was stated previously, the objective here is to recover theta_2, not beta_2.  For this reason, the algorithm will stop as soon as all elements of A_theta21 are 0.  If the primary parameters are the parameters of interest, setting C = I will specify theta = beta and give the desired result.
Algorithm 6.  Recovery of Secondary Parameters by Converting Eligible Primary Parameters to Restrictions

Input:  the complete definitional matrix

                  theta    beta     E(Y)
   D  =  [  I_a     C      A_theta ]
         [   0     I_q     A_beta  ]
         [   0      0        Q     ]

        or a secondary parameter matrix C (a x q), where theta = C beta;
        index of missing cells n_1, ..., n_m;
        index of potential restrictions p_1, ..., p_(b3) (default = all), or
        index of explicit restrictions r_1, ..., r_R.

1.  Apply Algorithm 5 to both primary and secondary parameters, giving

              theta_1  theta_2   beta_1   beta_2*      E(Y1)        E(Y2)
   theta_1 [ I_theta1     0       C11      C12*          0        A_theta12 ]
   theta_2 [    0      I_theta2   C21      C22*       A_theta21   A_theta22 ]
   beta_1  [    0         0      I_beta1    0            0        A_beta12  ]
   beta_2* [    0         0        0      I_beta2*    A_beta21    A_beta22  ]

2.  Use the indices of potential or explicit restrictions to partition beta_2* into beta_2 and beta_3, thus forming (4.7).

3.  For j = 1 to m:
    1.  If a_beta31,j /= 0 and a_theta21,j /= 0, then:
        1.  Call Subroutine A to select the ith element of beta_3 to be converted to a restriction.
        2.  p = a_beta31,ij.
        3.  Remove the column of D corresponding to the ith column of I_beta3.
        4.  M_j = M_j* = I.
        5.  Replace the ith column of M_j by

                       theta_1 rows:          0
                       theta_2 rows:     -a_theta21,j / p
                m_j =  beta_1 rows:           0
                       beta_2 rows:      -a_beta21,j / p
                       beta_3 rows:      -a_beta31,j / p
                       restriction rows:      0

        6.  m_beta3,ij = 1/p.
        7.  Replace the ith column of M_j* by

                m_j*  =  [  0            ((a + b1 + b2) x 1) ]
                         [  a_beta31,j         (b3 x 1)      ]
                         [  0                   (s x 1)      ]

        8.  M* = M* M_j*.
        9.  D = M_j D.

4.  D = M* D.

5.  Redefine theta_1, theta_2, beta_1, beta_2, and beta_3, and reorder D accordingly.

Subroutine A.  Alternatives for Selection of the Next Restriction

If r = r_1, ..., r_R is specified, then:
    1.  Select i corresponding to r such that a_beta31,ij is the last remaining non-zero element of a_beta31,j.
Otherwise:
    2.  Select i such that |a_beta31,ij| >= |a_beta31,kj| for k = 1, ..., b3.
Return.
Example 4.6 generates a 3x4 Reference Cell model with interactions
for the missing cell design shown in Example 4.1. Algorithm 6 is then
applied with all two-way interaction parameters designated as potential
restrictions.
By allowing the interaction terms to be restricted to
zero, the remaining parameters can be recovered and are equivalent to
those shown in Example 4.1.
EXAMPLE 4.6 - "L" DESIGN FOR 3 X 3 FACTORIAL WITH INTERACTIONS
[Cell counts and definitional matrix output omitted; the tabular listing is not legible in this reproduction.]
4.2.3  Strategy 3 - Define Similar Parameters

The third alternative for dealing with nonrecoverable parameters involves constructing linear combinations of non-definable parameter elements that are definable.  Consider the 2x3 Reference Cell model with cell (1,3) missing.  As was shown in section 4.2.2, the parameters B3 and AB23 are nonrecoverable.  Rather than restricting one of these parameters to zero, in this strategy they will be added together, resulting in the following definitional matrix:

                E(Y13)   E(Y11)  E(Y12)  E(Y21)  E(Y22)  E(Y23)
   mu             0        1       0       0       0       0
   A2             0       -1       0       1       0       0
   B2             0       -1       1       0       0       0
   AB22           0        1      -1      -1       1       0
   B3 + AB23      0        0       0      -1       0       1

Although B3 and AB23 are each nonrecoverable, their sum, which is a B main effect confounded with the AB interaction, is recoverable and well-defined.
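As an illustration, the combination can be written as a C matrix applied to the complete-case primary definitions (a numpy sketch under assumed names):

import numpy as np

# Primary-parameter definitions for the 2x3 Reference Cell model
# (complete case); rows mu, A2, B2, B3, AB22, AB23, columns E(Y11..Y23).
A_beta = np.array([[ 1,  0,  0,  0,  0,  0],
                   [-1,  0,  0,  1,  0,  0],
                   [-1,  1,  0,  0,  0,  0],
                   [-1,  0,  1,  0,  0,  0],
                   [ 1, -1,  0, -1,  1,  0],
                   [ 1,  0, -1, -1,  0,  1]])

# C keeps mu, A2, B2, AB22 unchanged and combines the nonrecoverable
# B3 and AB23 into the single parameter B3 + AB23.
C = np.array([[1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0, 1, 0, 1]])

print(C @ A_beta)          # last row: [0 0 0 -1 0 1] = E(Y23) - E(Y21),
                           # which does not involve the missing cell (1,3).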
This combining of parameters can be specified by the user in the form of a C matrix (as in the sketch above) defining the desired linear combinations of the original primary parameters and can be implemented using Algorithm 3.
Chapter 5
Summary
The increased availability of statistical software over the last
decade has generated considerable discussion regarding the analyses
performed using these statistical packages or programs.
One major area
of concern is the inability of linear models programs to convey the
details of their analyses to the user in an interpretable form.
In
some instances, the model fitted is determined by the computational
method used by the computer program and by the number of observations
in the cells of the design, with the result that the user may not know
what model was fitted and what hypotheses were tested.
This research
provides algorithms for generating linear models for classification
designs, including designs involving missing cells, and for
communicating information about these models to the researcher.
Several authors have approached the problem of communicating the
details of an analysis to the user by examining the hypotheses tested
by various statistical routines, while others have looked at the
parameters of the linear models themselves.
This research uses the
concept of parameter definition to describe each parameter element in
terms of a linear combination of the expected cell means of the design.
Secondary parameters of interest can also be described in terms of
their parameter definitions.
It has been shown that a parameter may have multiple definitions
which are "equivalent," that is, they define the same parameter.
Since
our purpose is to communicate information about the linear model to the
researcher, we are interested in a definition that is easily
interpretable and intuitively appealing.
Normalized Gauss-Jordan
reduction of the design (X) matrix of a linear model results in a
"definitional matrix" that frequently provides such a description.
Designs involving missing cells often result in some parameters being
undefined because their definitions involve the expected means of
missing cells.
In certain instances, these parameters can be
"recovered" by finding an equivalent definition that does not involve
the missing cells.
The concept of "recoverable parameters" is
analogous to the concept of "connectedness" in a two-way design.
The
recovery of all parameters in a particular missing cell design provides
a test for connectedness of the design.
In addition, the concept of
recoverability of parameters generalizes the definition of
connectedness to higher order designs.
For those missing cell designs where one or more parameters are
undefined, three alternative strategies are available.
First, the user
can specify that subparameter of the original model that is
recoverable.
Second, she or he may make additional assumptions about
the model in the form of restrictions which will allow more parameters
to be recovered.
Third, the user may specify recoverable parameters
that are linear combinations of the non-recoverable original
parameters.
5.1
Results
The objectives of this research were met through a series of
theorems and algorithms which were then tested on various experimental
designs by implementing the algorithms as computer programs.
The
output from these programs is shown in the examples.
Generation of the design (X) and secondary parameter (C) matrices
is performed by Algorithms 1 and 4, respectively.
The first algorithm
provides for the construction of a cross-classification design
involving any combination of LTFR ANOVA, Sigma Restricted, Reference
Cell and Cell Mean effects.
In the case of the over-parameterized LTFR
ANOVA effects, the appropriate sum to zero restrictions are also
generated.
Algorithm 4 allows the user to specify any combination of
Average Distance Between the Lines (ADBL), Deviations From the Means
(DFM), or Reference Cell effects for the design.
The algorithms will
generate the definitions of these effects in terms of the cell means,
and will provide the C matrix for the model generated by Algorithm 1.
The theoretical justification for these algorithms is provided by
Theorems 1 and 5.
For complete (no missing cells) designs, Theorems 2,3, and 4
prove that Normalized Gauss-Jordan reduction can be used to generate
definitions of primary and secondary parameters that are equivalent to
their canonical definitions.
These definitions usually involve fewer
cells of the design and provide definitions that are easier to
interpret than the canonical definitions.
Algorithms 2 and 3 are used
to generate these definitions as shown in the examples.
One
interesting and unexpected result obtained by examining these
definitions pertains to the main effects and lower order interactions
in models involving more than one type of parameter.
The examples show
that the types of the factors not in a particular effect determine the
definition of that effect.
For example, in an AxBxC factorial one
could not speak of the "ADBL A main effect" without indicating the type
of parameterization of factors B and C.
The complete definitional matrix for a well-defined design with no
missing cells is used as the starting point for dealing with designs
with one or more cells missing.
The first step is to identify those
parameters that were defined in the complete case, but are now
undefined because their definitions involve the expected means of cells
that are missing.
Theorem 6 gives the necessary and sufficient
conditions under which an equivalent definition not involving the
missing cells exists.
Parameters that meet these conditions are said
to be "recoverable."
Algorithm 5 identifies and recovers parameters
for any given missing cell design.
The concept of recoverability of
parameters is similar to "connectedness" of a two-way model.
Recoverability can be used to generalize connectedness to an n-way
model and Algorithm 5 can be used to test for connectedness of a
particular parameterization with a specific missing cell pattern.
In the event of nonrecoverable parameters, three strategies
suggested by the literature are implemented using Algorithms 3, 5, and
6.
The first strategy is to test only those hypotheses that involve
the recoverable subvector of the parameters.
This subvector is defined
by Algorithm 5.
Second, many authors suggest adding additional
restrictions to the parameters in the case of missing cells.
For
example, restricting the highest order interaction of the missing cells
to zero.
Algorithm 6 will implement this strategy automatically for a
specified list of potential restrictions, or it will allow the user to
explicitly restrict certain parameters to zero after examining the
definitions for a particular missing cell design.
The third strategy
involves combining nonrecoverable parameters into new, recoverable
parameters.
This can be implemented by simply specifying the desired
linear combination of parameters with a C matrix and applying Algorithm
3.
5.2
Applications
The algorithms, theorems, and computer programs developed during
the course of this research have potential applications in several
areas.
First, the ability to easily generate X and C matrices for
n-way designs with different types of factors and effects will be
useful for analyses, research, and instructional purposes.
At the
present time, the users of statistical software must either construct
these matrices themselves or use the parameterization supplied by the
program.
For complex designs, construction of these matrices is a
tedious and error prone process. The use of Algorithms 1 and 4 will
allow statisticians to fit the most appropriate model for the analyses,
rather than being restricted by the computer program that does the
calculations.
Students and researchers can use the algorithms to
generate matrices to explore the relationship among the various types
of models and effects.
Algorithms 2 and 3, along with their supporting theorems, provide
intuitively appealing definitions of primary and secondary parameters.
These definitions can be helpful in addressing the long-standing
questions of "What hypothesis is being tested?"
by answering the more
basic question of "What do the parameters involved in the hypothesis
mean?"
Again, these definitions can be useful to the statistician
doing data analyses and to the student or researcher exploring the
relationship among various models.
Finally, Algorithms 5 and 6 provide a mechanism for allowing
statistical software to deal with missing cell designs in a consistent
and straightforward manner. Algorithm 5 will recover those parameters
that it can, and will identify those that are not recoverable.
Algorithm 6 will then select certain parameters from a prespecified
list and convert them to restrictions in order to recover the remaining
parameters.
If the missing cell structure is such as to make this
impossible, the algorithms will indicate which parameters are
nonrecoverable.
The intent of these algorithms is not to always select
the "best" analysis for messy designs, but to take some reasonable
action, if possible, and always to inform the user of that action.
5.3.
Direction for Further Research
Further research on this topic can be divided into implementation
of existing results, extension to other types of linear models, and
further theoretical development.
Implementation issues include a
"user-friendly" method of specifying the factors and effects; a method
of generating the full design matrix for computational purposes; and
interfaces with existing statistical software.
Other types of models that would obviously be of interest are
those involving nested factors, models with covariables, and
multivariate models. Nested factors can be constructed by modification
of existing algorithms.
The application of parameter definitions to
covariables and multivariate models, however, will require some
additional theoretical work as well as additional algorithms and
programs.
References
Bock, R.D. (1963), "Programming Univariate and Multivariate Analysis of
Variance," Technometrics, 5, 95-117.
Bose, R.C. (1947), "The Design of Experiments," Presidential Address to
the Section of Statistics, 34th Indian Science Congress, Delhi,
1-25.
Bradley, H.E. (1968), "Multiple Classification Analysis for Arbitrary
Experimental Arrangements," Technometrics, 10, 13-27.
Brandt, A.E. (1933), "The Analysis of Variance in a '2xs' Table with
Disproportionate Frequencies," JASA, 28, 164-173.
Bryce, G.R. (1975), "Letter to the Editor," The American Statistician,
    29, 70.
Bryce, G.R., Carter, M.W., and Scott, D.T. (1980), "Recovery of
Estimability in Fixed Models with Missing Cells," B.Y.U.
Statistics Department Report Series, SD-002-R, Provo, Utah.
Bryce, G.R., Scott, D.T., and Carter, M.W. (1980), "Estimation and
Hypothesis Testing in Linear Models. A Reparameterization
Approach to the Cell Means Model," Communications in Statistics,
A9, 131-150.
Burdick, D.S. and Herr, D.G. (1980), "Counterexamples in Unbalanced
Two-Way Analysis of Variance," Communications in Statistics, A9,
231-241.
Burdick, D.S., Herr, D.G., O'Fallon, W.M., and O'Neill, B.V. (1974),
"Exact Methods in the Unbalanced, Two-Way Analysis of Variance--A
Geometric View," Communications in Statistics, 3, 581-595.
Carlson, J.E. (1975), "Letter to the Editor," The American
Statistician, 29, 133.
Carlson, J.E. and Timm, N.H. (1974), "Analysis of Nonorthogonal
Fixed-Effects Designs," Psychological Bulletin, 81, 563-570.
Christiansen, D.H. (1981). "Algorithms for the Automatic Generation of
Linear Models for Incomplete Classification Designs," Proceedings
of the Statistical Computing Section of the 1981 Annual Meeting of
the ASA, 140-145.
Christiansen, D.H. and Helms, R.W. (1980), "Definitions of Parameters
in Constrained Linear Regression," Proceedings of the 1980 Annual
Meeting of the ASA, 233-238.
131
Christiansen, D.H., Hosking, J.D., and Helms, R.W. (1978), "LINMOD: A
System for Linear Models Analysis," Proceedings of the 1979 Annual
Meeting of the ASA.
Dodge, Y. and Majumdar, D. (1979), "An Algorithm for Finding Least
Square Generalized Inverses for Classification Models with
Arbitrary Patterns," Journal of Statistical Computation and
Simulation, 9, 1-17.
Elston, R.C. and Bush, N. (1964), "The Hypotheses That Can Be Tested
When There Are Interactions in an Analysis of Variance Model,"
Biometrics, 681-698.
Fowlkes, E.B. (1969), "Some Operators for ANOVA Calculations,"
Technometrics, 11, 511-526.
Francis, I. (1973), "A Comparison of Several Analysis of Variance
Programs," Journal of the American Statistical Association, 68,
860-865.
Frane, J.W. (1977), "BMD and BMDP Approach to Unbalanced Data,"
Proceedings of Computer Science and Statistics: Tenth Annual
Symposium on the Interface, National Bureau of Standards Special
Publication 503, 40-47.
Frane, J.W. (1980), "Some Computing Methods for Unbalanced Analysis of
Variance and Covariance," Communications in Statistics, A9,
151-166.
Gianola, D. (1975), "Letter to the Editor," The American Statistician,
29, 133.
Golhar, M.B. and Skillings, J.H. (1976), "A Comparison of Several
Analysis of Variance Programs with Unequal Cell Size,"
Communications in Statistics, B5, 43-53.
Goodnight, J.H. (1977), "Hypothesis Testing in Multi-Way Anova Models,"
Proceedings of Computer Science and Statistics: Tenth Annual
Symposium on the Interface, National Bureau of Standards Special
Publication 503, 48-53.
Goodnight, J.H. (1979), "A Tutorial on the Sweep Operator," The
American Statistician, 33, 149-158.
Goodnight, J.H. (1980), "Tests of Hypotheses in Fixed Effects Linear
Models," Communications in Statistics, A9, 168-180.
Heiberger, R.M. and Laster, LL. (1977), "Computing Approaches to the
Analysis of Variance for Unbalanced Data," Proceedings of Computer
Science and Statistics: Tenth Annual Symposium on the Interface,
National Bureau of Standards Special Publication 503, 37-39.
132
Helms, R.W. (1978), "Transformation Between Equivalent Linear Models."
Unpublished Class Notes, University of North Carolina, Chapel
Hill.
Helms, R.W. (1980), The Definition of Parameters in General Linear
Models, Institute of Statistics Mimeo Series No. 1320, University
of North Carolina, Chapel Hill.
Henderson, C.R. and McAllister, A.J. (1978), "The Missing Subclass
Problem in Two-Way Fixed Models," Journal of Animal Science, 46,
1125-1137.
Hemmerle, W.J. (1980), "Recognizing Balance with Unbalanced Data,"
Communications in Statistics, A9, 201-212.
Herr, D.G. and Gabelein, J. (1978), "Nonorthogonal Two-Way Analysis of
Variance," Psychological Bulletin, 85, 207-216.
Hocking, R.R., Hackney, O.P., and Speed, F.M. (1977), "Analysis of
Linear Models with Unbalanced Data," Proceedings of Computer
Science and Statistics: Tenth Annual Symposium on the Interface,
National Bureau of Standards Special Publication 503, 66-70.
Hocking, R.R., Hackney, O.P., and Speed, F.M. (1978), "The Analysis of
Linear Models with Unbalanced Data," Contributions to Survey
Sampling and Applied Statistics, New York: Academic Press, Inc.
Hocking, R.R. and Speed, F.M. (1976), "The Use of the R( )-Notation with
Unbalanced Data," Journal of the American Statistical Association,
30, 30-33.
Hocking, R.R., Speed, F.M., and Coleman, A.T. (1980), "Hypotheses to be
Tested with Unbalanced Data," Communications in Statistics, A9,
117-130.
Hosking, J.D. and Hamer, R.M. (1979), "Nonorthogonal Analysis of
Variance Programs: An Evaluation," Journal of Educational
Statistics, 4, 161-188.
Johnson, A.F. (1971), "Linear Combinations in Designing Experiments,"
Technometrics, 13, 575-587.
Kurkjian, B. and Zelen, M. (1962), "A Calculus for Factorial
Arrangements," Annals of Mathematical Statistics, 34, 600-619.
Kurkjian, B. and Zelen, M. (1963), "Applications of the Calculus of
Factorial Arrangements," Biometrika, 50, 63-73.
Kutner, M.H. (1974), "Hypothesis Testing in Linear Models (Eisenhart
Model 1)," Journal of the American Statistical Association, 28,
98-99.
133
Kutner, M.H. (1975), "Letter to the Editor," The American Statistician,
29, 134.
Neter, J. and Wasserman, W. (1974), Applied Linear Statistical Models,
Irwin, Homewood, Illinois.
Nonlezun, C.J. and Speed, F.M. (1980), "The Geometry of Estimation and
Hypothesis Testing in the Constrained Linear Model: The Full Rank
Case," Communications in Statistics, A9, 213-230.
Nelder, J.A. (1975), "Letter to the Editor," Journal of the Royal
Statistical Society, 23, 232.
Overall, J.E. and Spiegel, D.K. (1969), "Concerning Least Squares
Analysis of Experimental Data," Psychological Bulletin, 72,
311-322.
Rao, C.R. (1973), Linear Statistical Inference and its Applications,
New York: John Wiley & Sons, Inc.
SAS User's Guide (1979), SAS Institute, Cary, NC.
Scheffe, H. (1959), The Analysis of Variance, Wiley, New York.
Schlater, J.E. and Hemmerle, W.J. (1966), "Statistical Computations
Based on Algebraically Specified Models," Communications of the
ACM, 9, 865-869.
Scott, D.T. and Bryce, G.R. (1980), "Development of Some Computational
Algorithms for Linear Models with Discrete or Continuous Data,"
B.Y.U. Statistics Department Report Series, SD-026-R, Provo, Utah.
Searle, S.R. (1977), "Analyses of Variance of Unbalanced Data from
3-Way and Higher-Order Classifications," National Bureau of
Standards Special Publication 503, 54-57.
Searle, S.R. (1971), Linear Models, New York:
John Wiley & Sons, Inc.
Searle, S.R. (1980), "Arbitrary Hypothesis in Linear Models with
Unbalanced Data," Communications in Statistics, A9, 181-201.
Searle, S.R. and Henderson, H.V. (1978), "Annotated Computer Output for
Analyses of Unbalanced Data: SAS GLM," The Biometrics Unit Mimeo
Series, BU-641-M, Cornell University.
Searle, S.R., Speed, F.M., and Henderson, H.V. (1979), "Some
Computational and Model Equivalences in Analyses of Variance of
Unequal-Subclass-Number Data," Biometrics Unit Mimeo Series,
BU-668-M, Cornell University.
Shilov, G.E. (1974), An Introduction to the Theory of Linear Spaces,
    Dover, New York.
Speed, F.M. and Hocking, R.R. (1975), "A Full Rank Analysis of Some
Linear Model Problems," Journal of the American Statistical
Association, 70, 706-712.
Speed, F.M., Hocking, R.R., and Hackney, O.P. (1978), "Methods of
Analysis of Linear Models with Unbalanced Data," Journal of the
American Statistical Association, 73, 105-112.
Starmer, C.F. and Grizzle, J.E. (1968), A Computer Program for Analysis
of Data by General Linear Models, Institute of Statistics Mimeo
    Series No. 560, University of North Carolina, Chapel Hill.
Stewart, G.W. (1973), Introduction to Matrix Computation, Academic
Press, New York.
Urquhart, N.S., Weeks, D.L., and Henderson, C.R. (1973), "Estimation
Associated with Linear Models: A Revisitation," Communications in
Statistics, 1, 303-330.
Wilkinson (1977), "Anova for Non-Orthogonal Data," Proceedings of
Computer Science and Statistics: Tenth Annual Symposium on the
Interface, National Bureau of Standards Special Publication 503,
58-65.
Yates, F. (1934), "The Analysis of Multiple Classifications with
Unequal Numbers in the Different Classes," Journal of the
American Statistical Association, 29, 51-66.
Zelen, M. and Federer, W.T. (1965), "Application of the Calculus for
Factorial Arrangements," Sankhya, 25, 383-400.