Smith, H.F. (1955). "Variance components, finite populations and experimental inference." (U.S. Army Ordnance)

VARIANCE COMPONENTS, FINITE POPULATIONS, AND EXPERIMENTAL INFERENCE

Prepared Under Contract No. DA-36-034-ORD-1517 (RD)
(Experimental Designs for Industrial Research)

by
H. F. Smith

Institute of Statistics
Mimeo Series No. 135
July, 1955

H. F. Smith
North Carolina State College

Abstract

Recent development of models for analyses of variance has been concerned with
the effect of postulating that an observed set of treatments is a sample from a
finite class of treatment variants, and with consequent effects on definitions of
variance components. In this paper the word population is used to imply an
infinite population, the word universe to imply a finite population. The one extreme,
infinite populations of all classifications, is Eisenhart's model II, commonly
described as the "random model". The other extreme, observing all members of small
universes, is commonly taken to be equivalent to Eisenhart's model I. These two
do not however have the same philosophical background. Model I postulates that
we wish to evaluate the means, parameters or statistics of location, for each
observed treatment irrespective of any universe from which they may have been
selected: it is described as a regression model, regression techniques being used
to evaluate the statistics of location. The variance component model on the other
hand postulates no interest in individual treatment means but only in the dispersion
of the elements of their universe: it becomes similar to model I only incidentally,
when a complete universe or sub-universe is observed, in the sense that we then
have the means to completely describe the universe, including its dispersion, by
an enumeration of its elements.

Section 1 notes that statistical analyses are formal. The parts into which
we imagine an observation or variance to be divisible are not physical entities
dictated by nature: they are defined empirically for purposes of statistical
description. The criterion for preference among alternative models is simplicity
and convenience of consequent statistics: nothing more fundamental.

Sections 3 to 6 review standard models and their interpretation, setting
the stage for subsequent discussion.

Section 7 endeavours to tighten up the definition of some statistical concepts,
because, although the topic may seem pedantic, a deal of controversy seems to trace
back to small differences in usage and interpretation of words by different workers.
Among other things the conclusion is reached that Tukey's $k_2$ statistics for a
universe cannot be defined as variances without consequent inconsistencies. They
are therefore referred to throughout this paper as k-statistics. Their parametric
values are denoted by the capital, K; the greek letter, usual designation for a
parameter, having been reserved for the analogous parameters of infinite populations.

Section 8 presents the general model for universes of any size. Analysis of
variance in terms of generalized symmetric means then indicates how we can define
components of mean squares which are "inherited on the average" (Tukey, 1950).
They are therefore an extension to components of the generalized k-statistics
defined by Tukey for moments of universes and samples from them. Like these the
definition of sample values is independent of the size of parent universe and
therefore is the same as for the variance components of a random (infinite
population) model. Being invariant for alterations of hypothetical universe sizes
they are christened canonical variance components. Variance components as usually
defined for finite universes are here distinguished as 'specific variance
components'. Their formulae appear to be most easily derivable as a linear function
of the canonical components (Sec. 9). However, since inferences may be made just as
well, and often better, in terms of canonical components, the specific ones may
seldom be worth evaluating. Recollecting that the criterion of a good statistical
model is simplicity of the descriptive and test statistics to which it leads, a
modified model is proposed (the canonical latent equation) such that the variances
of its groups of elements may be the canonical variance components.

Sections 10 and 11 consider the interpretation of regression and mixed models
as limiting cases of the generalized model. The most important point to emerge
is that controversy over the mixed model derives from universal oversight that
when one turns to examine the means of observed variants of what was defined as the
random factor one has abandoned the mixed model as originally defined and is now
applying regression interpretation to that factor. When that is realized the
restrictions formerly laid down on the "fixed factor" are seen to be irrelevant to
evaluating means of observed variants of the postulated "random factor"; and
the debate is resolved. Section 12 maintains that a significance test, like
statistical estimation discussed in Sec. 7, always involves a hypothetical infinite
population and that appropriate error mean squares are better taken to be those
indicated by canonical, rather than by specific, variance components.

Section 13 examines the ideas promulgated by Kempthorne and Wilk for "logical
derivation" of linear models and their analyses for experimental situations. It
finds: (1) that their definition of the random variables generated by sampling
elements of a universe lacks theoretical validity and merely produces an unnecessarily
cumbersome algebraic system; (2) the randomization tests which they advocate do
not (within the framework of the theory) produce the answers which an experimenter
requires; and (3) insertion of unidentifiable interactions in the model introduces
redundant complications. The last follows from recognizing that an interaction can
properly be defined only for reproducible effects of variants which can be identified
in replications of the experimental units. Unidentifiable interactions are a part
of experimental error and need to be considered only in so far as errors may be
heterogeneous, for example to distinguish between main-plot and split-plot errors.

Section 14 (not yet written) presents some examples to illustrate the principles
discussed.

by H. Fairfield Smith, North Carolina State College

Contents

0. Introduction.
1. Analyses are formal.
2. General notation and definitions.
3. Variance component models.
4. Regression on quantitative factors.
5. Regression on qualitative factors.
6. Random model.
7. Some finite population theory.
   Appendix to Section 7. (Definition of parameters)
8. General model.
   Appendix to Section 8. (Variances of mean squares)
9. The canonical variance model.
10. Regression as limit of general model.
11. The mixed model.
12. Error terms and significance tests.
13. Randomization tests and unidentified interactions.
14. Examples.
Summary
Literature Cited

Introduction
In practice analysis of variance is not difficult to interpret relative to a
given purpose even although there may be some ambiguity about the underlying model
to best represent facts. The two ways in which analysis of variance may be used
were expounded by Eisenhart (1947). Often however one may want (as it has been
described) to view a given set of data "through either pair of spectacles" and the
alternative may occasion no trouble. Nevertheless, with usage being extended over
an ever increasing range of applications, many have felt it desirable to get the
various models more rigorously formulated and synthesized. To this end,
consideration has been given to interpretations centering around postulates that
variants or levels of factors in a factorial experiment are representatives of limited
(finite and perhaps small) universes of variants. Alternative postulates, sometimes
implied rather than defined, have led to controversy on whether or not main
effect mean squares should contain interaction variance components
in the mixed model and what coefficients should be associated with them in more
general models.

The main purpose of this paper is to define a set of "canonical" variance
components (Sec. 6) which depend only on the design of an experiment and are
invariant under alternative postulates about the universes supposed to have been
sampled. They have useful properties, in particular that there are unique unbiased
estimators which are "inherited on the average" (Tukey, 1950), and they
lend themselves to flexible interpretation of experimental data with universe
postulates introduced only as part of the process of interpretation instead of as
a part of the model. Their sample estimators are defined in the same way as for
variance components in the "random" model, as is required by their invariance and
the condition that both should be asymptotically equivalent. For elementary
exposition they may be regarded as parameters and estimators for hypothetical
infinite populations, leading to inferences relative to finite universes with
"finite population corrections" as developed by Yates and Zacopanay (1935),
Cochran (1939), Hendricks (1947, 1951), Irwin and Kendall (1944), (Sec. 9).

Endeavor to obtain definitions which would be always consistent indicated that
ambiguity is creeping into current usage of many terms such as parameter, variance,
fixed effects. Indeed much contemporary discussion on analysis of variance and
variance components seems to derive from varying use and interpretation of words;
it poses, at least in part, a problem in semantics. Therefore the first seven
sections endeavor to set down just what we mean by certain concepts. The later
sections develop the application of canonical variance components to experimental
inference.

1. Statistical analyses are formal.
Analyses of observations, y, into components according to a linear model such
as

$$y = \mu + \alpha + \beta + \cdots + e \qquad (1.1)$$

or an analysis of variance symbolized by

$$\sigma_y^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \cdots + \sigma_e^2 \qquad (1.2)$$

are purely formal, the respective components being defined solely for simplicity of
statistical description. The respective components may be associated with causal
influences; but they do not represent physically real or distinguishable parts
of y or of $\sigma_y^2$; nor even of the "response" to a treatment, which should
naturally be evaluated as deviation from the yield with no treatment rather than
from the mean for a set of treatments.

If we make chemical analyses of, for example, equal volumes of normal solutions
of NaCl and of KCl, the analysis is dictated by nature: so much water,
chlorine, and sodium or potassium. We can say that the solutions have the
same amounts of water and chlorine in common but differ in their cations.
For contrast suppose hardness of rubber may be as indicated in Table 1 after
vulcanizing with alternative qualities of sulphur, $\alpha_i$, and of carbon black,
$\beta_j$, in all combinations. The results are to be expressed as a linear
function representing contributions from each factor. A logical suggestion
might be to define $\mu$ = the base of reference = 1; $\alpha$ = effect of sulphur = 6
for either type; $\beta$ = effect of carbon black = 1; $(\alpha\beta)_{ij}$ = interaction
of $\alpha_i$ with $\beta_j$ = 1, 3, 2 and 4 for ij = 11, 12, 21 and 22 respectively, the
gains over 9 being regarded as due to chemical interactions. But these
definitions fail to simplify things in the way we want. The zero levels
of either ingredient are likely to be uninteresting and unobserved; how much
simpler to summarize the other four combinations by $\mu = 11.5$, $\alpha_i = \pm 0.5$,
$\beta_j = \pm 1$, $(\alpha\beta)_{ij} = 0$, although chemically speaking these may be
arbitrary definitions.

Table 1

                         Sulphur
                      0      1      2
Carbon black   0      1      7      7
               1      2     10     11
               2      2     12     13

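
As a small numerical check on the simpler summary, the following sketch recovers elements whose group means are zero from the four combinations implied by the text (10, 12, 11 and 13 for ij = 11, 12, 21 and 22). The function name `sum_to_zero_elements` and the list layout are assumptions made for illustration; they are not part of the original exposition.

```python
def sum_to_zero_elements(cells):
    """Split a two-way table into mu, row, column and interaction
    elements, each group pinned so that its mean is zero."""
    a, b = len(cells), len(cells[0])
    mu = sum(sum(row) for row in cells) / (a * b)
    alpha = [sum(row) / b - mu for row in cells]
    beta = [sum(cells[i][j] for i in range(a)) / a - mu for j in range(b)]
    inter = [[cells[i][j] - mu - alpha[i] - beta[j] for j in range(b)]
             for i in range(a)]
    return mu, alpha, beta, inter

# hardness[i][j]: sulphur variant i (rows) with carbon black variant j (columns),
# i.e. 9 plus the gains 1, 3, 2, 4 quoted in the text for ij = 11, 12, 21, 22.
hardness = [[10, 12],
            [11, 13]]

mu, alpha, beta, inter = sum_to_zero_elements(hardness)
print(mu, alpha, beta, inter)
```

Under these assumptions the interaction elements vanish and the four values are described by three constants, which is the simplification the text points to.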
In that example we considered only description of certain fixed figures free
of sampling variations which are the statistician's major concern. Consider next
measuring sizes of seeds in pods of a legume. They are too multitudinous to be
individually enumerated and we must be content with representative observations
from which a picture of the whole may be formulated. Earlier statisticians
described the complex by saying that seeds within a pod are "correlated" and
sought to devise a measure of the correlation. Nowadays it is obviously more
clearly and comprehensibly described by analysis of variance and its associated
"linear model" for the jth seed in the ith pod

$$y_{ij} = \mu + \pi_i + \epsilon_{ij} \qquad (1.3)$$

We speak of seeds in the same pod as having $\mu + \pi_i$ "in common", but it is a very
different communality from that exhibited in the chemical analysis: the seeds
are not physically divisible into corresponding parts, and definitions of the parts
are arbitrary. A classical taxonomist would perhaps define $\mu$ for a "type" pod,
and $\epsilon$ for a type position, say the proximal seed. A "finite populationist"
describing a single tree would define $\mu$ as the mean weight of all seeds in the
tree, $\mu + \pi_i$ as the mean seed weight in the ith pod, and his definition of the
variance component $\sigma_\pi^2$ would coincide exactly with the actually existing variance
of pod means. The biometrician will usually prefer to define them for a hypothetical
population which will not be exactly realizable in any observed tree.
All are reasonable formulations and the biometrician's choice is determined only
by statistical convenience. It is nothing more or less than a device to simplify
distribution of correlated variates by division into parts whose distributions
are uncorrelated.
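
The "finite populationist" definitions can be sketched numerically. The seed weights below are hypothetical, the function name is an assumption, and the divisor used for the variance of pod means (the number of pods) is a choice made only for this sketch.

```python
def finite_universe_elements(pods):
    """Finite-universe elements for y_ij = mu + pi_i + eps_ij,
    assuming every pod holds the same number of seeds."""
    n_pods, n_seeds = len(pods), len(pods[0])
    mu = sum(sum(p) for p in pods) / (n_pods * n_seeds)      # mean of all seeds
    pi = [sum(p) / n_seeds - mu for p in pods]               # pod mean minus mu
    eps = [[y - mu - pi[i] for y in p] for i, p in enumerate(pods)]
    # Variance actually existing among pod means (divisor n_pods is an assumption).
    var_pod_means = sum(x * x for x in pi) / n_pods
    return mu, pi, eps, var_pod_means

seed_weights = [[4, 5, 6], [7, 8, 9], [1, 2, 3]]   # hypothetical tree with 3 pods

mu, pi, eps, var_pod_means = finite_universe_elements(seed_weights)
# Cross-product of the pod part and the within-pod part over all seeds.
cross = sum(pi[i] * e for i, row in enumerate(eps) for e in row)
print(mu, pi, var_pod_means, cross)
```

The zero cross-product shows the sense in which the two parts are uncorrelated within the enumerated tree: the within-pod deviations sum to zero in every pod.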

Adopting a linear model is from the start arbitrary. The realistic analysis
of yield of a cereal plant into components which have physical meaning, namely ear
number (e), number of grains per ear (n) and weight per grain (g), is

$$y = e \times n \times g$$

Treatment effects often combine in a similar or more complex way, yet for
statistical purposes the empirical linear equation may be worth using. Analogous variants
are possible for the variance equation (1.2) and will be illustrated in the sequel.

Statistical components, either of an observation y or of its variance $\sigma_y^2$,
are dictated only in part by the structure of a population, sample or experiment.
Detail of definitions is arbitrary and can be formulated to produce as much
simplicity of description as possible.

2. General notation and definitions.
The structure of a population or experiment will mean the ways in which data
may be classified according to factors, the number of variants of each of these,
and whether they are "crossed" (e.g. potato varieties x insect sprays, temperature
x concentration of a reagent) or "nested" (e.g. districts within states). When
a factor is continuous (e.g. temperature) we shall speak of its levels; when
discrete (e.g. varieties) of its variants.

There must always be a close relation between an experiment and the population
about which it gives information. Ideally we would consider first the population
structure and select an experimental design to match; frequently action precedes
such thought and we reconstruct the population which a given design can reasonably
represent. For our purpose the order is immaterial as we can take the one to
dictate the structure of the other, except that the design need not state the
number of variants of each factor in the population. Actually a given experiment
can answer questions relative to a variety of populations, for example we can seek
to evaluate yields for variants of one factor when crossed by certain selected
variants of another or as averaged for a population of them. We shall therefore
take the experimental design as basic structure, and suggest a general analysis
which will depend only on that design and be readily adaptable to answering questions
relative to any reasonable postulates about the sorts of populations which it might
represent. These will be introduced only as secondary postulates with specific
questions, and not be laid down as part of the initial model.

We speak of yield as the measure of an observation on each experimental unit
or plot whether this be actually a yield as of a field crop or any other
characteristic (e.g. hardness). We write yield as a linear function of elements (e.g. 1.1
or 2.1) which can be associated with each factor level or variant in a given
treatment combination. A linear function is used as routine merely because it is
the simplest mathematically. Since we always have at least as many elements as
observations these equations are identities.

The number of elements into which we imagine any yield analyzable is dictated
by the structure except that there may often be some discretionary choice about
the number of interaction terms to be inserted, i.e. that are deemed worth
considering, depending on what we already know about the response of yield to the
treatments and how well it may be represented by a linear function with fewer than
the maximum number of elements. Throughout this paper we assume "balanced"
arrangements in both experiment and population. (The meaning of balanced in this
context is defined by Crump, 1951.)

For the sake of specificity consider a standard example which represents all
main features, namely a two-factor experiment with a variants of factor $\alpha$
crossed by b variants of factor $\beta$ and with d random replications of each
treatment combination. The linear model is

$$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk} \qquad (2.1)$$

$$i = 1, 2, \ldots, a, \ldots, A; \quad j = 1, 2, \ldots, b, \ldots, B; \quad k = 1, 2, \ldots, d, \ldots, D$$

B = the potential number of variants of $\beta$ of which b are observed, etc.
Elements indicated by the same letter will be called a group.
When dealing with finite populations, a given interaction element $(\alpha\beta)_{ij}$ must
be associated with a particular $\alpha_i$ and $\beta_j$. Whether or not a similar condition
applies for $\epsilon_{ijk}$ depends on circumstances: the postulated cause of deviation
represented by this element and whether or not it can be randomized with the
treatment combinations.

A subscript zero will indicate a mean of a finite universe, for example

$$y_{ij0} = \sum_{k=1}^{D} y_{ijk} / D ;$$

a subscript dot will indicate a mean of a sample

$$y_{ij\cdot} = \sum_{k=1}^{d} y_{ijk} / d .$$

So far as these definitions have gone the elements are still indeterminate.
Without altering y a constant could be added to every element of one group and
subtracted from every element of another; or added to just one $\alpha_i$ or $\beta_j$ and
subtracted from the corresponding sub-group of $(\alpha\beta)_{ij}$; similarly for exchanges
with the $\epsilon$ group but with sub-groups depending on the randomization postulates.
To obtain unique definition of the elements the location of each group or
sub-group must be pinned and we can do this arbitrarily. For algebraic convenience
we define $\mu$ to be the mean of the whole complex population, and the means of every
group or sub-group to be zero.
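
The indeterminacy just described can be illustrated with a short sketch. The element values are hypothetical (they reuse the 2 x 2 summary of Sec. 1), and the helper name `cell_values` is an assumption made for illustration.

```python
def cell_values(mu, alpha, beta, inter):
    """Cell yields mu + alpha_i + beta_j + (alpha beta)_ij, error omitted."""
    return [[mu + alpha[i] + beta[j] + inter[i][j]
             for j in range(len(beta))] for i in range(len(alpha))]

# Elements pinned so that every group has mean zero.
mu, alpha, beta = 11.5, [-0.5, 0.5], [-1.0, 1.0]
inter = [[0.0, 0.0], [0.0, 0.0]]
original = cell_values(mu, alpha, beta, inter)

# (i) Add a constant to every element of one group and subtract it
#     from every element of another.
c = 2.0
same_1 = cell_values(mu, [x + c for x in alpha], [x - c for x in beta], inter)

# (ii) Add a constant to just one alpha_i and subtract it from the
#      corresponding sub-group of interaction elements.
k = 3.0
alpha_2 = [alpha[0] + k, alpha[1]]
inter_2 = [[v - k for v in inter[0]], inter[1]]
same_2 = cell_values(mu, alpha_2, beta, inter_2)

print(original, same_1 == original, same_2 == original)
```

Both altered sets of elements reproduce the same yields, so only a convention such as zero group means makes the elements unique.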

3. Classification of models.
Eisenhart (1947) gave a clear and explicit exposition of two alternative models
underlying analyses of variance.

I: The objective is to evaluate mean yields of an observed set of treatments;
the analysis of variance is an algorithm for tests of significance for differences
between these means, usually by groups. It is a standardized formulation of tests
which follow from classical least squares procedure for estimating the means, the
parameters of location, which can be formulated as regression coefficients.
This is therefore described as the regression model.

II: The objective is to estimate the contribution of each factor to
variability in a complex population: an analysis of variance in the literal meaning
of the words. In effect we now say that we are not interested in the yields
associated with individual factor variants, but only in the dispersion of yields
associated with a population of variants. Interpretation is simplest when we can
take the observed variants to be a sample from a potentially infinite number; but
this is not an essential part of the model. Even if every potential variant of a
factor be observed the objective in view may require only an assessment of the
variability for which that factor can be held responsible. For example in studying
variability of a given kind of cloth we may observe every spindle in the looms
which produce it and all the information we may need is how much of cloth variation
is due to variability among the spindles irrespective of the effect of each one
individually. This is described as variance component analysis.

These two formulations have more recently been well described by Hoel (1954),
under the names "linear hypothesis model" and "components of variance model", in a
review which skillfully avoids complexities of the subject.

Tukey (1949) gave a more elaborate classification. Crump (1951) notes that
its "general features ... are common to any set of data arranged in a multiple
classification and described by a linear model". However it was introduced under
the section heading "Estimating effect variances", its exposition included the
sentence "To each kind of effect there corresponds a corresponding effect variance
or component of variance", and it is frequently taken as a classification of
variance component models. Tukey writes of row, column and cell "effects"
indiscriminately whether they are to be regarded as parameters of location or random
variables, and the above quotations apparently relate to the stand taken in
his 1950 paper "to define the variance of any finite set of numbers as $k_2$, whether
the finite set is a sample or a population". (Contrast the practice of Anderson
and Bancroft (1952) and of other writers who, when dealing with a regression or
mixed model, write expectations of mean squares with an explicit function of
"fixed effects" which they refrain from regarding as a variance.) Sec. 7 will
demonstrate that there is justification for these views when finite universes are
being considered, but that they lead to inconsistencies of terminology. They seem
to have engendered a tendency for contemporary writers to discriminate between
models I and II in terms of population sizes. Furthermore the place of the
Eisenhart models in the Tukey classification is determined by their postulates of
normality, postulates which were no more than riders to justify tests of
significance. These two circumstances have had the unfortunate effect of masking the
more basic distinction between models I and II, relative to which population sizes
and forms of distributions are mere incidentals.
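
The $k_2$ statistic quoted above, and the sense in which such a statistic is "inherited on the average", can be illustrated by exhaustive enumeration. In this sketch the universe of five numbers is hypothetical, the function name `k2` is an assumption, and samples are taken without replacement.

```python
from fractions import Fraction
from itertools import combinations

def k2(values):
    """Second k-statistic: sum of squared deviations divided by (n - 1)."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / (n - 1)

universe = [1, 2, 4, 7, 11]                 # hypothetical finite universe
samples = list(combinations(universe, 3))   # every sample of 3 without replacement

# Average of the sample k2 over all equally likely samples.
average_sample_k2 = sum(k2(s) for s in samples) / len(samples)
print(k2(universe), average_sample_k2)
```

The same function is applied to the universe and to each sample; the average of the sample values over all possible samples equals the universe value, whatever the size of the universe.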

The Tukey classification is primarily a classification of two-factor universes
and of ways of sampling therefrom. It cuts across the distinction of models I and
II which this paper will endeavour to maintain. In so far that the regression
model can be treated as a degeneration of the general model (Secs. 8-11) the two
formulations inevitably merge to some extent. But in the viewpoint to be developed
here, universe postulates will be regarded as secondary conditions to be imposed
when asking specific questions of a set of data, not as an essential part of the
initial model.

4. Regression on quantitative factors.
We consider first application of the regression model to an experiment with
quantitative factors because its interpretation may aid in visualizing interpretation
for the more general case. It differs from the general case in that the levels of
each factor can be ordered a priori with a measurable distance between each on the
scale of 'intensity' of each factor. Analysis can be, and most often is, carried
through on the general model (2.1), as if each treatment were discrete. With
factors at only two levels that is the only practicable procedure since two levels
give no handle on the response surface (unless it can be assumed a priori to be a
plane, rare for quantitative factors). With three levels per factor one usually
proceeds similarly for initial analysis, later partitioning treatment sum of squares
for linear and curvature effects which are simple functions of the treatment means
(cf. Yates, 1931). To bring out the points to be made here assume many levels
per factor and a standard regression approach to evaluate the response surface.
Let x be the levels of $\alpha$ measured in any convenient units, similarly z for
$\beta$, and write the regression

$$y_{ijk} = m + a_1 f_1(x_i) + a_2 f_2(x_i) + \cdots + b_1 g_1(z_j) + \cdots + c_1 h_1(x_i, z_j) + \cdots + d_{ij} + e_{ijk} \qquad (4.1)$$

where $f_p(x)$ are arbitrary functions of x chosen so that the surface may approximate
to the anticipated form of response with as few terms as possible, commonly (in
ignorance) $x, x^2, \ldots$; $p \le (a-1)$; similarly for $g_q$ and $h_r$; $d_{ij}$ represent deviation
of the means $y_{ij\cdot}$ from the regression surface, and $e_{ijk} = y_{ijk} - y_{ij\cdot}$. If we use
polynomials for $f_p$ and $g_q$ and the maximum number of terms (viz. $p = 1 \ldots (a-1)$,
$q = 1 \ldots (b-1)$, $r = 1 \ldots (a-1)(b-1)$), the regression surface passes identically
through all observed means $y_{ij\cdot}$, and $d_{ij}$ are zero.

For simplicity of exposition suppose we fit a quadratic

$$\hat{y} = m + a_1 x + b_1 z + a_2 x^2 + c\,xz + b_2 z^2 \qquad (4.2)$$

The associated analysis of variance will be as in Table 4.1.

Table 4.1. Analysis of variance for a quadratic regression fitted to a two
factor experiment with a x b levels and d replications of each
treatment combination.

                                            d.f.       S.Sq.                                        M.Sq.
Quadratic regression                        5          $S_q$
Deviations of $y_{ij\cdot}$ from regression        (ab-6)     $S_t - S_q$                                  $M_r$
Total between treatments                    (ab-1)     $d \sum_{ij} (y_{ij\cdot} - y_{\cdot\cdot\cdot})^2 = S_t$
"Internal error"                            ab(d-1)    $\sum_{ijk} (y_{ijk} - y_{ij\cdot})^2$                 $M_e$

$M_r$ may be greater than $M_e$ either because the regression fails adequately to describe
treatment effects, or because experimental conditions have permitted errors to creep
into $y_{ij\cdot}$ which are not fully represented in variation between replicates and hence
in the "internal estimate of error" evaluated therefrom (in other words, replicate
observations on the same treatment are correlated). The conventional 5 per cent
significance level for $F = M_r/M_e$ is not adequate to distinguish these because it is
not sensitive to systematic deviations (cf. Yule and Kendall, 1937, Sec. 22.20);
adequate statistical test requires fitting a further regression term. Sometimes
graphical examination may be more expeditious to indicate if this is worth doing
and useful in showing just what kind of deviations are occurring. If deviations
appear to be at random and a few more terms cannot reduce $M_r$ to equivalence with
$M_e$, then $M_r$ (termed by Edwards Deming the "external estimate of error") may be the
preferred estimate of error from which to evaluate precision of regression
coefficients and "predictions". Inflation of $M_r$ may not be due to a real error component
but to failure to find a proper functional form for the surface which should be
fitted, or to having deliberately used a simple form of surface to facilitate
deductions to an adequate degree of approximation (Box and Hader's "lack of fit").
In such cases $M_r$ will not behave properly in accordance with statistical
distribution theory of random variables, and the uses we make of it as an estimate of
error variance may not be theoretically justifiable. We may nevertheless feel
obliged to ignore these complications to get a working compromise in place of
impracticable complexity. Consequent inaccuracy of tests may not be appreciably
worse than results from failure of real data to conform with postulates of normality.
Interaction mean squares in the general case will often be in that role.
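
The entries of Table 4.1 can be computed for a small hypothetical data set as follows. This is a sketch only: the function names `solve` and `table_4_1`, the 3 x 3 layout with two replicates, and the use of exact fractions for the least squares fit are assumptions made for illustration, not a procedure given in the paper.

```python
from fractions import Fraction

def solve(A, rhs):
    """Gauss-Jordan elimination in exact fractions (A assumed non-singular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(A)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def table_4_1(y, x, z):
    """y[i][j]: list of d integer replicates at levels x[i], z[j].
    Returns sums of squares and mean squares for the quadratic (4.2)."""
    a, b, d = len(x), len(z), len(y[0][0])
    mean = {(i, j): Fraction(sum(y[i][j]), d) for i in range(a) for j in range(b)}
    grand = sum(mean.values()) / (a * b)
    # Regressors 1, x, z, x^2, xz, z^2 for every treatment combination.
    rows = {(i, j): [1, x[i], z[j], x[i] ** 2, x[i] * z[j], z[j] ** 2] for (i, j) in mean}
    XtX = [[sum(r[p] * r[q] for r in rows.values()) for q in range(6)] for p in range(6)]
    Xty = [sum(rows[k][p] * mean[k] for k in mean) for p in range(6)]
    coef = solve(XtX, Xty)
    fitted = {k: sum(c * t for c, t in zip(coef, rows[k])) for k in mean}
    s_q = d * sum((fitted[k] - grand) ** 2 for k in mean)
    s_t = d * sum((mean[k] - grand) ** 2 for k in mean)
    s_e = sum((obs - mean[i, j]) ** 2 for (i, j) in mean for obs in y[i][j])
    return {"S_q": s_q, "S_t": s_t, "S_e": s_e,
            "M_r": (s_t - s_q) / (a * b - 6), "M_e": s_e / (a * b * (d - 1))}

levels = [1, 2, 3]
means = [[2, 2, 3], [2, 4, 6], [3, 6, 8]]                  # hypothetical cell means
y = [[[m - 1, m + 1] for m in row] for row in means]       # two replicates per cell
result = table_4_1(y, levels, levels)
print(result, result["M_r"] / result["M_e"])
```

With cell means lying exactly on a quadratic surface (for example the products of the levels), the same function gives $S_q = S_t$ and hence $M_r = 0$.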

Suppose we have obtained a regression with satisfactory fit. What we have done
is to evaluate a surface which states the mean yield (for the average environment
of the given experiment) for any treatment combination xz, including values
intermediate between those observed, within the range of observation. (Some
extrapolation may be considered depending on one's faith that the fitted surface
represents a functional form. Purely empirical polynomials should not be
extrapolated. They can do 'crazy' things at very short distances from the observed
region.) In addition to being able to interpolate between observed points the
regression improves accuracy of estimation everywhere within the region because
estimates even at observed points are reinforced by interpolated information from
adjacent points.
A few coefficients may be of individual interest for testing
some hypothesis; for example, if we know that the functional relation follows a
quadratic form, the quadratic coefficients tell the rate of change of slope, valid
everywhere. But if the surface is curved the linear coefficients tell only the
slope at an arbitrary location where we have put the independent variables
equal to zero. By and large the coefficients individually tell little; they are
only stepping stones to estimates of mean yield at any point.
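
The dependence of the linear coefficient on the choice of origin can be checked by re-expressing a one-variable quadratic about a new zero point. The function name `shift_origin` and the numerical coefficients are hypothetical.

```python
def shift_origin(m, a1, a2, c):
    """Coefficients of m + a1*x + a2*x**2 rewritten in terms of u = x - c.
    Substituting x = u + c gives
    (m + a1*c + a2*c**2) + (a1 + 2*a2*c)*u + a2*u**2."""
    return m + a1 * c + a2 * c ** 2, a1 + 2 * a2 * c, a2

m, a1, a2 = 1, 2, 3
print(shift_origin(m, a1, a2, 0))   # (1, 2, 3): origin unchanged
print(shift_origin(m, a1, a2, 1))   # (6, 8, 3): new linear term, same quadratic term
```

The quadratic coefficient is the same for every origin, whereas the linear coefficient is simply the slope of the curve at whatever point has been labelled zero.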
If, as commonly, we have fitted orthogonal polynomials, an estimate which omits
some effective terms, for example using only linear terms (equivalent to main
effects) when quadratic ones are significant, is a mean yield of some subset of
treatments and represents a point off the regression surface, the centre of a chord.
Relative to what it represents its standard error is correctly evaluable from $M_e$,
but what it represents may be of little interest. On the other hand if curvature is
small, deviation of a chord from the surface proper may be small relative to random
error and we may choose to use it for simplicity, but there is no strictly valid answer
to what standard error should be attached to such estimates. As described above
some form of $M_r$ may be used as an expedient. This interpretation applies to the
$\alpha_i$ and $\beta_j$ as these would be evaluated for the general equation (2.1).

Variance component analysis is (almost) never relevant to an experiment with
quantitative factors. We cannot reasonably formulate a probability distribution of
treatments over a surface. The variance of yield between arbitrarily selected
points is meaningless. The only exception is for a factor like location on coils
in the example given by Vaurio and Daniel (1954), used by Wilk and Kempthorne
(1955), and briefly described in Sec. 13 below; that is a "factor" which will not
be at choice for future working but for which a given range must always be present
in production. We can imagine location selectable at random from a uniform
distribution over the length of a coil. Some systematic trend may be describable by
regression, and further random variability representable by a variance component
between randomly chosen locations. This is different from the ordinary quantitative
factor which is studied for the purpose of selecting that level at which operating
conditions are best for most profitable yield, and for which variation between
selected levels will be irrelevant. (We might be interested in variability which
could be associated with a specified amount of variation of one factor as may
occur under operating conditions of manufacture. But this is something different
from the variance component between the arbitrarily selected levels observed in an
experiment.)

5. Regression model for qualitative factors.
In the general case for qualitative factors we term Eisenhart's I the "regression
model" because the linear equation (2.1) can be written

$$y_{ijk} = m + \sum_i a_i x_i + \sum_j b_j z_j + \sum_{ij} (ab)_{ij} x_i z_j + e_{ijk} \qquad (5.1)$$

where x, z are now "dummy" variables taking the values 0, 1 as required to reduce
(5.1) to (2.1) (except that here we use roman letters for the regression coefficients
in order to conform with notation for the general model and analogy to it as will
be described in Sec. 10). The difference from (4.1) is that we are now fitting a
"surface" to ab discrete points in (a+b) space, and interpolation between points
has no meaning.
-•
Expectations of mean squares in the analysis of variance are commonly expressed as in Table 5.1, where $\theta(a) = \sum_i a_i^2/(a-1)$, etc. The excess of mean squares over error is expressed in this form to indicate that they are functions of location parameters not subject to variance as in sampling random variables; but they can be read as proportional to the second moment of a restricted subuniverse of variants. As stated in Sec. 3 the analysis is an algorithm to indicate tests of significance for groups of regression coefficients; the $\theta$ indicate how each term may be inflated if real effects are present.

Table 5.1

Factor    d.f.          E(Mean Square)
a         (a-1)         $\sigma^2 + bd\,\theta(a)$
b         (b-1)         $\sigma^2 + ad\,\theta(b)$
(ab)      (a-1)(b-1)    $\sigma^2 + d\,\theta(ab)$
Error     ab(d-1)       $\sigma^2$
If the $(ab)_{ij}$'s can be assumed zero or negligible the 'curved' or 'crooked' surface becomes a 'plane' which can be evaluated with only (a+b-1) constants; in other words we can bring averages to bear to improve accuracy without thereby making projections off the plane. If there is a systematic pattern among the $(ab)_{ij}$ it may be possible to reduce them to a fewer number of constants, analogous to fitting a curved surface. If the $(ab)_{ij}$ are not negligible, and whether systematic or erratic, averages represented by $a_i$ and $b_j$, "main effects", are projections onto artificial points off the 'surface' in the same way as estimating points on a curved surface using only linear terms. They represent yields not realizable in practice, except perhaps by deliberate mixtures of treatments such as would not ordinarily be used. The "internal estimate of error" is technically valid for comparisons between such means interpreted strictly for what they are, averages of certain sets of treatments, but only circumstances can determine whether or not such comparisons are good ones to make.
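The distinction between a 'plane' and a 'crooked' surface can be illustrated with a small sketch. It assumes the usual sum-to-zero definitions of main effects and interactions computed from a table of cell means; the function name `decompose` and both tables of numbers are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def decompose(cell_means):
    """Split an a x b table of cell means into grand mean, main effects
    a_i and b_j, and interactions (ab)_ij, all under sum-to-zero constraints."""
    a, b = len(cell_means), len(cell_means[0])
    cells = [[Fraction(v) for v in row] for row in cell_means]
    grand = sum(sum(row) for row in cells) / (a * b)
    a_eff = [sum(row) / b - grand for row in cells]
    b_eff = [sum(cells[i][j] for i in range(a)) / a - grand for j in range(b)]
    ab_eff = [[cells[i][j] - grand - a_eff[i] - b_eff[j] for j in range(b)]
              for i in range(a)]
    return grand, a_eff, b_eff, ab_eff


# Hypothetical additive table: the 'surface' is a plane, every (ab)_ij is zero.
plane = decompose([[1, 2, 3], [3, 4, 5]])

# Hypothetical table with one aberrant cell: interactions are no longer zero.
grand, a_eff, b_eff, ab_eff = decompose([[1, 2, 3], [3, 4, 8]])

# Value 'predicted' for cell (2, 3) from main effects alone; it differs from
# the observed cell mean 8, i.e. it is a point off the 'surface'.
projected = grand + a_eff[1] + b_eff[2]
print(plane[3], ab_eff, projected)
```

In the second table the main-effect projection for the aberrant cell does not coincide with any observed cell mean, which is the sense in which such averages are described above as artificial points.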
Interaction elements play approximately the same role as coefficients for curvature in quantitative regression. Substantial ones indicate that treatment combinations should be individually considered. When they are relatively small the main effects may be read as general estimators for treatment effects over more or less wide ranges, analogous to using a linear regression for approximate description, or for interpolation, although the true relation may not be quite straight. The question of suitable error variance is then the same as for $M_r$ in Sec. 4. The interaction mean square may be appropriate as an "external estimate of error", alias variance of deviations from the fitted linear 'regression'. But if the interactions do not occur at random and unpredictably with respect to future applications, the decision may be more a question of expediency than of statistical theory. Preferred procedure may be suggested by proposed applications to future working rather than be dictated by the treatment combinations observed or the population which the experiment was originally designed to sample.
Main effects and interactions as originally defined for factorial experiments (Yates, 1935, 1937; Fisher, 1935) belong to the pure regression model. These definitions were explicitly formulated to summarize informatively and effectively contrasts between certain selected treatment combinations, equivalent to fitting a regression to describe gradients between a given set of points. As such they are correctly judged relative to the internal error which assesses precision of measurement of the mean yields of each treatment combination individually. If one wishes to go on to a wider class of hypothesis, to make inferences about related but unobserved treatments, the definitions and precision of their estimators relative to the wider field of inference, in effect extrapolation, must be reconsidered. We will return to these considerations under the heading of the generalized and mixed models. The regression model can be regarded as a limiting case of such more general hypotheses wherein the universe of study is defined to be just the set of treatments observed, but such definition will often be merely a conventional way of reverting attention to individual means.
Although the typical regression model has only one group of stochastic elements, sometimes, for example in a split-plot experiment, there may be random errors at two levels of magnitude. In such cases the model breaks into two regressions: that for main-plots has split-plot treatments balanced in every unit, that for split-plots has main-plot treatments as blocks, each with its respective single error group. Complications are possible but can usually be sorted out on general principles and need not be taken up in detail here.
6. The "random" model.
By the "random" model is usually implied a variance component analysis with the observed elements of each group assumed to be random samples from potentially infinite populations. With respect to these populations the individual elements of the observed groups are of little interest; the objective now is to assess the share of variance of the complex population, $\sigma_y^2$, which may be ascribable to each factor operating within it.
The linear equation (2.1) (except that in this case A, B, D do not exist) may be termed the latent equation to signify that values of the individual elements, usually described as random variables, are no longer of interest. The function of the latent equation is first to define the structure of the population and experiment, and assumptions about interactions; second to provide machinery to evaluate expectations of functions of the observations, particularly mean squares, as functions of distribution parameters of the random elements, and thence to define the variance components.
For a complex infinite population of the type postulated its total variance is the sum of variances of its composite elements. We therefore write the variance equation

$$\sigma_y^2 = \sigma_\alpha^2 + \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma_\delta^2 \qquad (6.1)$$

to indicate the parametric variance components which are to be evaluated.
Strictly speaking the letters of the linear equation are "random variables" only when regarded as functional forms. If the linear equation be read as representing a particular observation and set of elements, these are not now random variables, though it may be convenient to speak of them as such (Sec. 7.6). Furthermore the nature of a random variable is a function of the method of sampling. The elements postulated in (2.1) are uncorrelated between groups only because of the balanced sampling design which has been specified. The elements of an unbalanced experiment will not in general be uncorrelated. Relative to unbalanced experiments the appropriate postulate about balance or unbalance in the hypothetical population, and thence the consequent correlation of elements in a sample and the appropriate definitions of the variance components, seem never to have been threshed out.
For balanced experiments and populations these postulates suffice for estimating the variance components by analysis of variance. To use maximum likelihood estimation we have to add more. We have to specify also the probability distribution of a given sample of Y; or equivalently, given the definition of the elements, to postulate their joint probability distribution. The complication over ordinary likelihood problems arises from the condition that observations are correlated within classes and that we have therefore to consider an n-variate probability function. The likelihood of a sample must be expressible as a function of the observations (y), of the parameters (6.1), and of not more than an estimable number of nuisance parameters. The random elements may not appear.
For example consider the simplest case -- a nested classification symbolized by the latent equation

$$y_{ij} = \mu + \alpha_i + \delta_{ij}, \qquad i = 1, \ldots, a; \quad j = 1, \ldots, d \qquad (6.2)$$

with the assumptions that $\alpha$ is NID$(0, \sigma_\alpha^2)$, $\delta$ is NID$(0, \sigma_\delta^2)$ for every i. The probability of the sample can be formulated as a probability element of an ad-variate normal distribution, with common variance $(\sigma_\alpha^2 + \sigma_\delta^2)$, and with covariances $\sigma_\alpha^2$ between all pairs within the same class. Some heavy algebra then leads to the usual analysis of variance solution. Easier formulation is given by noting that the mean square within classes, $s_\delta^2$, must be a sufficient estimator for $\sigma_\delta^2$. (Recollect that we are limiting ourselves to balanced designs, i.e., d is constant in all classes. If $d_i$ were variable the mean square within classes does not contain all the information on $\sigma_\delta^2$ and complications ensue.) We can then reduce all available information to that given by $s_\delta^2$ and the $\bar{y}_{i.}$, which are NID$(\mu, \sigma_\alpha^2 + \sigma_\delta^2/d)$. The log likelihood for these can then be written as

$$\ln L = \text{Const} + \tfrac{1}{2}a(d-1)\ln(s_\delta^2/\sigma_\delta^2) - \frac{a(d-1)\,s_\delta^2}{2\sigma_\delta^2} - \frac{a}{2}\ln(\sigma_\alpha^2 + \sigma_\delta^2/d) - \frac{\sum_i(\bar{y}_{i.} - \mu)^2}{2(\sigma_\alpha^2 + \sigma_\delta^2/d)} \qquad (6.3)$$

(cf. Anderson and Bancroft, pp. 319-320, who outline a more complex case; and Cochran, 1937, who gives corresponding formulation when d and $\sigma_\delta^2$ are both variable between classes.)
The direct likelihood approach has forced us back to considering correlation of grouped observations, a complexity which the linear model and analysis of variance approach was formulated to avoid. It does this in taking account of postulated or given forms of distribution within and between classes, extra information which the variance analysis does not need and does not use. Variance analysis estimation is a distribution-free technique.
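As a minimal sketch of the analysis of variance route for the balanced nested classification, the following computes the two mean squares and equates them to their expectations, taken here in the standard form E(within) = $\sigma_\delta^2$ and E(between) = $\sigma_\delta^2 + d\,\sigma_\alpha^2$. No distributional form is used. The function name `nested_anova` and the data are hypothetical.

```python
def nested_anova(classes):
    """Analysis of variance estimates of the variance components for a
    balanced nested classification (a classes, d observations in each)."""
    a, d = len(classes), len(classes[0])
    if any(len(c) != d for c in classes):
        raise ValueError("balanced design required: d must be constant")
    class_means = [sum(c) / d for c in classes]
    grand_mean = sum(class_means) / a
    ss_within = sum((y - m) ** 2
                    for c, m in zip(classes, class_means) for y in c)
    ss_between = d * sum((m - grand_mean) ** 2 for m in class_means)
    ms_within = ss_within / (a * (d - 1))    # estimates sigma_delta^2
    ms_between = ss_between / (a - 1)        # estimates sigma_delta^2 + d*sigma_alpha^2
    return {
        "ms_within": ms_within,
        "ms_between": ms_between,
        "var_delta": ms_within,
        "var_alpha": (ms_between - ms_within) / d,
    }


# Hypothetical data: a = 3 classes, d = 2 observations per class.
result = nested_anova([[1, 3], [5, 7], [9, 11]])
print(result)
```

The estimate of $\sigma_\alpha^2$ obtained this way can be negative when the between-class mean square falls below the within-class mean square; the sketch does not truncate it.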
Without stopping here to define a random variable (Sec. 7.5) notice that a simple random variable as usually considered is a function whose observed value on any event cannot be predicted in advance and we cannot in making another observation demand that an element similar to one previously observed should appear again. Yet models (2.1) or (6.2) imply that all elements except $\delta$ can be re-sampled at will. Imagine the $\alpha_i$ of (6.2) as values corresponding to urns which can be specified by location, and the $\delta_{ij}$ as balls in the ith urn. With respect to sampling the urns $\alpha$ is a random variable which takes the value $\alpha_i$ on the ith drawing. But after selecting the ith urn we return to it again and again, i.e. to the same $\alpha_i$, to sample various $y_{ij}$. With respect to the distribution of repeated observations from the same urn $\alpha$ is a parameter. In the simplest case being considered here the duality is not troublesome, but in more complex cases it can become confusing when elements may seem to flip back and forth between parameters and random variables like the faces of a cube drawn on flat paper for which psychologists sometimes ask us to record how frequently they snap from lower to upper views and back again.
If the dual role of the $\alpha$'s be not watched one might be tempted to formulate the above likelihood problem by saying that the probability of the sample is given by the probability of selecting a $\alpha$'s multiplied by the conditional probability of then selecting n $\delta$'s, leading to

$$\ln L = \text{const} - a\ln\sigma_\alpha - \frac{\sum_i \alpha_i^2}{2\sigma_\alpha^2} - n\ln\sigma_\delta - \frac{\sum_i\sum_j (y_{ij} - \mu - \alpha_i)^2}{2\sigma_\delta^2} \qquad (6.4)$$
Furthermore maximizing this with respect to variation of the $\alpha_i$, as well as of $\mu$, $\sigma_\alpha^2$ and $\sigma_\delta^2$, yields a solution (if between class mean square be not too small) which appears not nonsensical. At least it is consistent in that it predicts a set of $\alpha_i$ whose second moment is $\sigma_\alpha^2$, whereas if we estimate $\alpha_i$ as independent parameters the variance of these estimates will not be the $\sigma_\alpha^2$ which we seek. The solution might however be described as 'schizophrenic'. In the third term of (6.4) the $\alpha_i$ are variates which the procedure is trying to draw together as estimators of their central parameter, zero; whereas in the last term they are parameters being spread out to minimize variances within urns. The procedure is irrational in attempting to treat $\alpha_i$ simultaneously in two roles, and the fault is here easily exposed. But is not a similar misdemeanour in effect committed, albeit more indirectly, when one purports to estimate both a variance component for a larger population and the means of observed variants under guise of using the same linear model (Sec. 11)?
In my opinion Neyman and Scott (1948) also commit this misdemeanour, although it can be argued that they have evaded it by inserting only the $\alpha_i$ without the parameters of their distribution in (6.4). When some parameters of the probability distribution of a sample occur only with a number of observations which remains finite as the sample is increased they term them "incidental parameters"; parameters occurring in the distribution of all observations are termed "structural". Their two main examples are: (1) The observations are normally distributed about a common mean, $\alpha$, the structural parameter, but fall into groups of $n_i$ observations with each group having its own variance $\sigma_i^2$, incidental parameters. Example (2) is similar but with common $\sigma^2$ and variable means, $\alpha_i$ (this being the case discussed above except that no distribution is postulated for the $\alpha_i$). Kendall's (1952) maximum likelihood derivation of Kummell's solution for a linear functional relation, $Y = \alpha_0 + \alpha_1 X$, is similar. Here the structural parameters are $\alpha_0$ and $\alpha_1$; the groups have each two observations, the paired observations $x_i$, $y_i$, with the "true" values, $X_i$, as incidental parameters.
Neyman and Scott claim to have established two propositions:

"(1) Maximum-likelihood estimates of the structural parameters relating to a partially consistent series of observations need not be consistent."

"(2) Even if the maximum-likelihood estimate of a structural parameter is consistent, if the series of observations is only partially consistent, the maximum-likelihood estimate need not possess the property of asymptotic efficiency."

They demonstrate proposition (1) with example (2), and vice versa. In both cases the failure depends on $n_i$ remaining finite.
The properties of maximum-likelihood are asymptotic ones, and they proceed to the limit by taking an indefinitely increasing number of sub-samples of $n_i$ observations. My feeling is that this is an improper use of the asymptotic process. If the incidental parameters are parameters in any true sense it is, at least theoretically, possible to return again and again to resample the sub-populations of which they are parameters (cf. Secs. 7.18 and 13 below), hence the only valid asymptotic procedure is to allow all $n_i$ to tend to infinity. In that case the usual maximum likelihood properties obtain and the propositions fail. If the set-up is imagined to be such that $n_i$ cannot be increased, so that the only available approach to infinity is by increasing the number of groups, then the 'incidental parameters' must be random variables with their own distribution, and should appear as such in the likelihood function as in (6.3) as opposed to (6.4). Christening them incidental parameters is a dialectical evasion of ignorance about what probability distribution should be postulated for them and hence what parameters should be inserted in their place. The only other way round is to treat the incidental parameters as if from a finite universe whose parameters are the individual values which may occur (Sec. 7.11). In that case Sec. 7.12 shows that maximum likelihood cannot be used, and again the propositions fail because maximum likelihood estimates do not exist.
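The behaviour at issue in example (2) can be seen in a small simulation. This sketch keeps $n_i$ fixed at 2 and lets only the number of groups grow, which is the limiting process Neyman and Scott use and the one questioned above. The group means, the seed, the number of groups and the function name are all hypothetical choices made for illustration.

```python
import random


def neyman_scott_demo(groups=20000, n=2, sigma=1.0, seed=0):
    """Common variance sigma^2, a separate mean for every group of n
    observations, each group mean estimated as a free parameter."""
    rng = random.Random(seed)
    ss_within = 0.0
    for i in range(groups):
        alpha_i = float(i % 10)   # arbitrary fixed group mean (incidental parameter)
        ys = [rng.gauss(alpha_i, sigma) for _ in range(n)]
        ybar = sum(ys) / n
        ss_within += sum((y - ybar) ** 2 for y in ys)
    mle = ss_within / (groups * n)               # maximum likelihood estimate of sigma^2
    corrected = ss_within / (groups * (n - 1))   # divisor = within-group degrees of freedom
    return mle, corrected


mle, corrected = neyman_scott_demo()
print(mle, corrected)
```

With n = 2 the maximum likelihood estimate settles near $\sigma^2 (n-1)/n$, half the true value, however many groups are added, while the within-group mean square settles near $\sigma^2$. Letting n itself increase, as the text argues is the proper asymptotic process, removes the discrepancy.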
Some texts and lecture notes appear to imply that a variance component analysis can be deduced from a regression model. Recognition of their incidental, almost accidental, association may therefore be advisable. Least squares estimation can be directed only to model I, i.e. to estimate parameters of location. Analysis of variance, without reference to regression, indicates an estimation procedure for variance components. It is not necessarily efficient (except for normal distributions and balanced samples, for which it gives the maximum likelihood solution) but it is manageable where maximum likelihood solutions are impracticable or impossible. The best partition of the sum of squares of an unbalanced sample is however not obvious, and it is customary to use a quasi-regression approach merely to indicate a workable partition. It is not ideal because it does not lead to unique solutions independently of order of "elimination" of the (dummy) independent variables. But at least it is a pointer among innumerable possibilities. In principle the procedure is not necessarily restricted to partitions of sums of squares. We could consider expectations of all sorts of functions of observations; but quadratic forms are known to be most efficient for normal distributions, and among the many such which might be concocted those indicated by a quasi-regression analysis of variance seem reasonable.
7. Some basic theory relating to samples from finite populations.
1. Most of us approach variance component analysis from the side of elementary theory applicable to infinite populations as in Sec. 6. When we start postulating finite populations for some of the classifications consequent effects on theory are not at once obvious. Invariably at least one group of elements are random variables (elements from a hypothetical infinite population) and therefore so also are the observable Y's. Thence the elements of any group, although finite in number, can still be defined only asymptotically and more or less arbitrarily. In these circumstances one may overlook things which would be obvious in working with a simple finite population whose elements can be directly observed; and other subtleties which can usually be slurred over without noticeable consequence now become more critical for clear understanding. This section endeavours to bring these explicitly to attention and remove lurking ambiguities.
2. To avoid wearisome reiteration of 'infinite' and 'finite', population will be used to denote a hypothetical infinite population, universe will denote a potentially existent or postulated finite population.
3. Element will be used to imply the quantitative value of an individual, a fixed quantity.
4. The condition that a universe has just a definite number, say N, of elements distinguishes its study from other statistical methods more sharply than is perhaps generally realized. Firstly the number N has to be assumed known. (We exclude here investigations to discover N, for example number of fish in a lake, which introduce other matters with which we are not concerned.) Secondly all observations and statistics are going to be limited to combinations and functions of a selection from these N fixed quantities. Since N is finite the frequency distribution of the universe is necessarily discrete. The space denoting the values which the elements may have (before we observe them and thus know what they are) may however be continuous, or if discrete is likely to have very many more points than are actually represented. Complete specification of the frequency distribution of a universe means specifying both the number and quantitative value of each type of element present. Since the number of types may be N, or nearly N, it will often be convenient to suppose each type to be present only once, and to imagine the universe distribution defined by listing every element individually, say $X_1 \ldots X_N$ (obviously with repetitions of the same numerical value as may be necessary). This distribution is of course not a probability distribution, though many writings appear to treat it as if it were so -- we get so accustomed to thinking of frequency distributions and probability distributions as synonymous.
5. A probability distribution is associated only with a random variable. In fact, conversely, a random variable (or variate) is usually defined (e.g. Kendall, 1943) as "a variable with which is associated a probability distribution". Feller (1950) emphasizes that a random variable has the characteristics of a function and is defined only in a given sample space. Cramér (1946) describes its observed values as determined by a "random experiment which may be repeated a large number of times under uniform conditions".
6. The act of sampling from a universe (the "random experiment") generates a random variable, say x (without subscript). An 'observation', $x_i$, is the value assumed by a random variable in a particular experiment (sampling). Equivalently it is the element sampled. An observation once taken is again a fixed quantity, sometimes regarded as a constant but (to me at least) to so regard it may be more confusing than helpful. A mathematician speaking of the function $f(x) = x^2$ writes $f(2) = 4$, and tabulates the "function" for given arguments. Therefore following Feller's description of a random variable as a function whose "independent variable (is) a point in sample space i.e. outcome of an experiment", to describe an observation as the random variable of an experiment seems permissible. An estimator is a random variable, an estimate is a value observed in a given experiment; either is called a statistic and this bracketing seems preferable to bracketing both an estimate and a parameter as constants. Furthermore the suggested usage is in effect adopted when a standard error is attached to an observation or estimate, and spoken of as its standard error.
7. The probability distribution of a random variable generated from a universe states the asymptotic relative frequency with which observed values may appear in infinitely many samplings of the same kind (with replacement -- the repetitions of a random experiment under uniform conditions). The universe dictates those points of the sample space whose probability is non-zero; the probability function is a function of the method of sampling. The probability distribution must specify both. If single elements are "sampled with equal probability" the probability distribution is

$$p(x) = \frac{1}{N} \qquad (7.1)$$

It is easy to specify innumerable other methods of sampling to each of which would correspond a different function p(x), for example sampling with probability proportional to size as is common in sample surveys.
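A small sketch may make the point concrete: the same universe yields different probability functions under different sampling methods. The universe of four elements and the function names are hypothetical; probability proportional to size is taken here to mean that element $X_i$ is drawn with probability $X_i / \sum X$.

```python
from fractions import Fraction

# Hypothetical universe of N = 4 fixed elements (one value is repeated).
universe = [2, 4, 4, 10]


def equal_probability(elements):
    """p(x) = 1/N for every listed element, as in (7.1)."""
    n_elements = len(elements)
    return [Fraction(1, n_elements) for _ in elements]


def proportional_to_size(elements):
    """p(x) proportional to the element's own value (assumes positive values)."""
    total = sum(elements)
    return [Fraction(x, total) for x in elements]


p_equal = equal_probability(universe)
p_size = proportional_to_size(universe)
print(p_equal, p_size)
```

In both cases the points of sample space carrying non-zero probability are fixed by the universe; only the function attached to them changes with the sampling method.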
8. The probability function p(x) has meaning and interpretation only with respect to the relative frequencies of a hypothetical infinite population of observations resulting from samplings with replacement, i.e. repetitions of a "random experiment" under uniform conditions (cf. Kendall, 1949). We thus commonly think of the observations, and also of the random variable itself, as being elements of that population. In thinking of an ordinary mathematical function as a curve or surface in space we identify it with the infinity of points on that surface; so why not identify the function known as "random variable" with an infinite population of elements representing the values it can assume? The hypothetical population is a model which portrays its probability distribution as a function of a function, p(x(expt.)).
9. Statistical theory, while concerned also with other matters, is to a large degree synonymous with probability theory. In particular the theory of statistical estimation is essentially the theory of how to estimate the parameters of a probability distribution, that is of a hypothetical infinite population. Thus a statistical estimate always invokes the concept of an infinite population and can not do otherwise.
Tukey (1950) essentially reaches the same concept, but he does so empirically as "a price to pay for simplicity" and with a later exclamation mark. I come to it independently as a fundamental concept found necessary in trying to write down basic ideas which will remain logically consistent with each other under all circumstances.
10. The bridge to inference about an existent or potentially existent universe involves (1) assuming that sampling procedure has done its duty and (2) invoking the a priori theory of probability to formulate a known relation between characteristics of the universe and parameters of the probability distribution of a random variable generated by the postulated sampling procedure (cf. Kendall, 1949).*
11. I do not know any satisfactory definition of 'parameters'; text books seem to evade stating one. Most often we think of them as the constants $\theta$ in expressing a probability function as $f(x; \theta_1 \ldots \theta_p)$. However the population moments are commonly referred to as parameters and the usage seems justified since they are functions of $\theta$, and conversely the $\theta$ can be expressed as functions of any p moments. Similarly for any other suitable set of population characteristics. Less common usage, but one which seems to be necessary, is that the basic parameters of a distribution derived from a universe are the N values $X_1 \ldots X_N$; all of them are necessary to completely specify any such distribution (with some possible compression if several $X_i$ have the same numerical value).

* Kendall, in this interesting and stimulating paper "On the reconciliation of theories of probability", demonstrates that the frequentist and non-frequentist theories of probability must each invoke the other at some point en route from theory to application. At the same time (among other statements which might be disputed) when criticising maximum likelihood and confidence intervals for ignoring a priori probability he seems to forget the "bridge". His example is to suppose observation of 1000 births of which 600 were male. He asks: "Are we then to conclude that the sex ratio lies between .59 and .61 when from tremendous previous experience we know it to be close to .51?" But the question skips a chasm and is irrelevant. We do not do an experiment to ascertain something already known. It would be futile to observe 1000 births to modify an estimate already based on millions. We do an experiment to get information about something unknown. If there are other samples which we know to be properly drawn from the same population (and this "knowledge" is not contradicted by evidence of heterogeneity), they are really all one experiment and estimation properly proceeds from all pooled together. Estimation from a single sample may have several objectives. We may know the population but have used a new sampling procedure, or novitiate observers, and require to check the "bridge". We may think it might be from a certain population or a similar one, and assuming the bridge sound, wish to check if the supposition is reasonable. We may know it is from a population sui generis, perhaps an inbred one affected by sex-linked genes whose frequency is to be estimated. To begin by seeking an estimate already biased by a prior guess, about what population the sample might have come from, would be poor procedure. The need is to get an independent estimate about the population which this sample in fact represents, free of prejudicial suppositions. Only then can we objectively consider relations to hypothetical possibilities. That, maximum likelihood and confidence intervals very properly do; how can objective reasoning proceed otherwise? Use of a priori knowledge or supposition requires that one should be more than ordinarily sure that all bridges lead to the same island.
12. The probability function of the distribution of samples from a universe is (usually) a function of N only; the parameters $X_i$ appear in the place of what would usually be a specification of known sample space. (Some sampling schemes, for example sampling with probability proportional to size, may introduce the $X_i$ into p(x); but it is of little help since they still remain in specification of those parts of sample space for which $p(x) \neq 0$.) Two consequences follow: (1) We can never estimate all the parameters from a sample of less than the universe itself. "To achieve reasonable simplicity it is necessary to describe the probability distributions rather summarily by a few 'typical values'" (quoted from Feller, 1950, p. 171, with alteration of two words; the context is different). For the groups of elements with which analysis of variance is concerned the mean (location) is given by definition as zero; the only typical value with which we concern ourselves is the "variance" or its analogue. (2) Since neither the specific parameters nor that single function of them ('variance') to be estimated, enters into the probability function, p(x), the method of maximum likelihood cannot be used to indicate a preferred estimator, nor to provide an absolute measure of efficiency for any other estimator. The estimation problem is not, as usual, to estimate the probability function; but to estimate points of sample space to which the probability function applies, or a function of them. Herein lies a major difference distinguishing problems of estimating characteristics of a universe from those relating to parameters of probability distributions more commonly discussed.
I have never before seen this explicitly stated. It is indeed obvious. But it seems worth stating to clarify consequences of imposing finite universe postulates on our usual approach to analysis of variance. For example: on first reading Tukey's (1950) classification of models for data in two-way tables one can be irritated by his vague use of "effects", ignoring whether these are to be regarded as "fixed effects" to be individually evaluated, or as "random variables" whose dispersion only is of interest. On formulating these consequences of postulating a finite universe one realizes excuse for what at first seemed careless terminology.
It might seem simpler to say that, whereas ordinary statistical theory states the sample space a priori and the probability function contains the parameters, universe theory postulates the probability function as given and has a sample space defined by the parameters (either those of the sample distribution itself or of the parent universe). Such formulation however would imply more fundamental divergence between "ordinary" and "universe" theory than does the above formulation, and would not, I guess, be acceptable to theorists. (But see appendix to this section.)
13. Unless stated otherwise a (single) sample is assumed to be drawn as a whole, i.e. without replacement. The observations in such a sample are of course not independent. Consequently for most purposes it is convenient to regard the whole sample as one observation of a single random variable. For generality it may be described as a vector; but often the relevant random variable may be merely a single statistic of interest, e.g. the sample mean.
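The lack of independence can be checked by direct enumeration. The sketch below lists every ordered sample of two elements drawn without replacement from a hypothetical universe, treats each as equally likely (an assumption about the sampling method), and evaluates the covariance between the two draws and the variance of the sample mean.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical universe of N = 4 fixed elements.
universe = [2, 4, 4, 10]
N = len(universe)

mean = Fraction(sum(universe), N)
second_moment = sum((x - mean) ** 2 for x in universe) / N

# Every ordered pair of distinct elements, each assumed equally likely.
pairs = list(permutations(range(N), 2))
p = Fraction(1, len(pairs))

# Covariance between the first and second observation of a sample.
cov = sum(p * (universe[i] - mean) * (universe[j] - mean) for i, j in pairs)

# Variance of the sample mean, regarded as a single random variable.
var_mean = sum(p * (Fraction(universe[i] + universe[j], 2) - mean) ** 2
               for i, j in pairs)

print(second_moment, cov, var_mean)
```

For this universe the covariance equals minus the second moment divided by (N-1), so the two observations are negatively correlated, and the variance of the sample mean is smaller than it would be for two independent draws.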
14. 'Variance', a measure of a random variable's 'state of being variant', is defined by

$$\operatorname{var}(x) = \sigma^2 = \mu_2 = E\,(x - E(x))^2 \qquad (7.2)$$

The definition involves the probability distribution of x and thus, strictly interpreted, applies only to random variables. Since, as the second moment, it is a parameter of the probability distribution it may be said to be the 'variance of the (hypothetical) population'.
15. A universe, being an assembly of fixed quantities, does not have a variance. Symbolized by mass points along a line it does have a second moment, which, by analogy with measures of dispersion found effective in describing probability distributions, is naturally adopted as a measure of the dispersion of its elements. Inevitably this second moment gets called a 'variance', although the usage seems inadvisable not only etymologically and semantically but because it helps to blur distinctions made above.* It may be excused on the grounds that it states the variance of a random variable generated by sampling single elements with equal probability, and this may be regarded as the most primitive sampling distribution derivable from the universe.
* Edwards Deming (1950, p.56) notes that, on the analogy to physics from which many
of our terms derive, the "moment" should be defined as $\sum (X-\bar{X})^2$, and
$\sum (X-\bar{X})^2/N$ should be termed the "moment coefficient". However accepted
usage has gone too far to retract, and a total and a mean being effectively the same
statistic a student may usefully be encouraged so to think of them (provided he does
not mix them in computing!). Clearly the moment is a characteristic of the universe,
not of its individual (fixed) elements; therefore, for logic as well as for
brevity, and despite leading texts, the phrase "moment (variance) of a universe"
seems preferable to "variance of X in the universe".
Only the struggle for consistency throughout this paper has sensitized me to such
shades of phrasing. At other times I might have been numbered among those who
say: "Admittedly 'variance' is not etymologically applicable to a universe; but
the idea of using its moment to measure spread of its elements is the same, so
why quibble on a name?". However as a general rule careful use of analogies
should be guarded. For example: the probability of a discrete distribution
should not be termed a "density". To do so destroys the whole utility of our
analogy of space and probability measures to volume and mass. Yet this erroneous
use of a physical term is becoming regrettably common in texts and handbooks.
The distinction is not trivial. My first meeting with that error in an excellent
and authoritative text, before my own understanding had crystallized, cost many
hours trying to rationalize it before explaining the analogy and its uses to
students reading that text. In a similar way many students have lost time
worrying over "variance of a universe", though this one is less critical and does
not do comparable damage.

16. The 'variance of a sample' is justifiable on the same lines, only more so
if we agree to regard an observation as a 'random variable of an experiment'.
It also has the excuse that estimates of a parameter commonly go by the same name
and are differentiated only symbolically. Variance of a sample is seldom wanted
for itself alone, i.e. merely as description of n given quantities, but has interest
mainly as an estimate of a population parameter. Restricting oneself for the moment
to samples of n independent observations, the true (population) mean being unknown,
the logical definition of sample variance seems to be

$$s^2 = k_2 = \sum^{n} (x-\bar{x})^2/(n-1) \tag{7.3}$$

A surprising number of writers still insist on $\sum (x-\bar{x})^2/n$. The same
writers would object that, ever since Gauss, it has been wrong to describe variance
about a regression on p independent variables as other than
$\sum (x-\hat{x})^2/(n-p)$. Since $\bar{x}$ is essentially a regression coefficient
on a constant independent variable they are thence being inconsistent. If required,
$\sum (x-\bar{x})^2/n$ is properly describable as the second moment of the sample.
17. When, as happens within samples from a universe, observations are not
independent, proper usage is more problematic. Tukey (1950) has given the most
emphatic statement of the inclination of many authorities by choosing for
convenience "to define the variance of any finite set of numbers as $k_2$, whether
the finite set is a sample or a population". (It was my own choice and practice until
endeavouring to find consistent terminology for this paper.) The convenience of
course arises from the fact that $k_2$ is "inherited on the average" (Tukey, 1950),
a stronger quality than being unbiased, meaning that it is an unbiased estimator of
$K_2 = \sum^{N} (X-\bar{X})^2/(N-1)$ independently of N. (I follow Wishart, 1952, in
using capital letters to designate universe homologues of sample statistics.
Cochran, 1953, and Hansen et al, 1953, use $K_2$ under designation $S^2$ without
other name. Wishart has given the name "generalized k-statistics" to Tukey's
k-statistics for a universe.) These properties give $K_2$, whatever its name, a
central place in universe theory; but it is not the variance by definition (7.2) of
any random variable derivable from the universe by ordinary sampling. The variance
of single observations on repeated sampling is, by (7.2),
$\mu_2 = \sum^{N} (X-\bar{X})^2/N$, and cannot be arbitrarily redefined without
becoming inconsistent with the variance of means of samples of n observations,
which is

$$K_2\left(\frac{1}{n} - \frac{1}{N}\right) = \mu_2 \quad \text{when } n = 1.$$
Furthermore, if we form a sum of two elements drawn singly from each of two
universes then the variance of such sums is

$$\operatorname{var}(x+y) = E\,\bigl((x+y) - E(x+y)\bigr)^2 = \mu_2(x) + \mu_2(y) = \mu_2(x+y)$$

where the last term is the second moment of the universe of $N_x N_y$ possible sums.
Contrariwise $K_2(x+y)$ is the average for all possible samples of the $k_2$
determined within samples of n sums drawn without replacement;

$$K_2(x+y) = K_2(x) + K_2(y) = \operatorname{Ave} \sum^{n} (x+y-\bar{x}-\bar{y})^2/(n-1), \qquad n \le \min(N_x, N_y) \tag{7.4}$$

It is not a $K_2$ for any universe of sums and, while extremely useful, is not to be
interpreted in the same way as a sum of variances.
Other inconsistencies will appear below. Therefore, perhaps rather ruefully
since one might like so to describe it to experimenters, one seems forced to
conclude that $K_2$ cannot properly be called a variance. In some contexts it may
for expediency be regarded as a variance of a hypothetical infinite population, but
in general distinction may be advisable.
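
The statements of this paragraph can be checked by complete enumeration on small universes. The following is a minimal sketch only: the two universes `X` and `Y` and all function names are hypothetical, chosen for illustration, and exact rational arithmetic is used so that equalities are exact. It shows that $k_2$ averages to $K_2$ over all samples, that the second moments of singly drawn elements add to the second moment of the universe of sums, and that $K_2(x) + K_2(y)$ is the average $k_2$ of paired sums without being the $K_2$ of the universe of sums.

```python
from fractions import Fraction
from itertools import combinations, permutations, product


def moment(values, lost_df):
    """Sum of squared deviations about the mean, divided by (count - lost_df)."""
    count = len(values)
    mean = Fraction(sum(values), count)
    return sum((v - mean) ** 2 for v in values) / (count - lost_df)


def k2(values):
    """k2 of a sample (K2 when applied to a whole universe): divisor count - 1."""
    return moment(values, 1)


def mu2(values):
    """Second moment of a finite set of numbers: divisor count."""
    return moment(values, 0)


def average_k2(universe, n):
    """Average of k2 over all samples of n drawn without replacement."""
    samples = list(combinations(universe, n))
    return sum(k2(s) for s in samples) / len(samples)


def average_k2_of_sums(xs_universe, ys_universe, n):
    """Average k2 of n paired sums; each universe is sampled without replacement."""
    total, count = Fraction(0), 0
    for xs in combinations(xs_universe, n):
        for ys in permutations(ys_universe, n):  # every pairing of the drawn elements
            total += k2([x + y for x, y in zip(xs, ys)])
            count += 1
    return total / count


# Hypothetical universes, chosen only for illustration.
X = [1, 4, 6, 13]
Y = [2, 3, 9, 10, 21]
all_sums = [x + y for x, y in product(X, Y)]  # universe of N_x * N_y sums

print("K2(X) =", k2(X), "; average k2 for n = 2:", average_k2(X, 2))
print("mu2(X) + mu2(Y) =", mu2(X) + mu2(Y), "; mu2 of sums:", mu2(all_sums))
print("K2(X) + K2(Y) =", k2(X) + k2(Y),
      "; average k2 of paired sums, n = 2:", average_k2_of_sums(X, Y, 2))
print("K2 of the universe of sums:", k2(all_sums))
```

The pairing loop uses ordered selections from `Y` so that every association of the drawn elements is counted once; this is one reading of "samples of n sums drawn without replacement" and is an assumption of the sketch.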
18. Observations in samples from a universe have a peculiar ambivalency
which is related to ambiguity often seen in interpretations of analyses of variance.
Each observation, $x_i$, is immediately an evaluation, without error, of one of the
parameters, $X_j$, of the universe. Nevertheless, in absence of the whole N, the
sub-class of n are still random variables in the sense that their values could not
be predicted, and the summary statistic $k_2 = \sum^{n} (x-\bar{x})^2/(n-1)$ is a
random variable and an unbiased estimator of $K_2$ of the universe. Except in that
its distribution depends on the ratio n:N, in such a way that its dispersion tends
to zero as $n \to N$, it is not essentially different from the variance of a sample
from an infinite population, despite the fact that the individual elements composing
it are, in a sense, parameters themselves.
19. The concept that both the observed sample and the universe may be regarded
as samples from a hypothetical infinite population with variance $K_2$ and
unspecified mean was used by Cochran (1939) in first introducing that parameter,
and has been advocated by Hendricks (1947, 1951). Their objective was to evaluate
the variance of means of observed samples, $\bar{x}_n$, and it is easy to show that
if we imagine $\bar{X}$ to be distributed about the population $\mu$ with variance
$K_2/N$, similarly $\bar{x}_n$ with variance $K_2/n$, and if the sample of n is a
sub-sample of the super-sample of N, then it follows that the mean square deviation
of $\bar{x}_n$ about $\bar{X}$ is

$$E(\bar{x}_n - \bar{X})^2 = K_2\left(\frac{1}{n} - \frac{1}{N}\right).$$

In recent years sample survey workers seem to have dropped this approach,
presumably for reasons similar to those outlined above. However it may sometimes
(Sec. 9) appear convenient to think of what we shall designate as K (parameters)
or k (statistics) with greek subscripts as variances of hypothetical infinite
populations.
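
The mean square deviation of sample means about the universe mean is easily confirmed by enumeration. The sketch below is illustrative only: the universe and the function names are hypothetical, and the check runs over every sample size from 1 to N.

```python
from fractions import Fraction
from itertools import combinations


def universe_K2(universe):
    """K2 of a universe: sum of squared deviations divided by N - 1."""
    N = len(universe)
    mean = Fraction(sum(universe), N)
    return sum((x - mean) ** 2 for x in universe) / (N - 1)


def mean_square_deviation_of_means(universe, n):
    """Average of (sample mean - universe mean)^2 over all samples of size n."""
    N = len(universe)
    universe_mean = Fraction(sum(universe), N)
    samples = list(combinations(universe, n))
    squared = [(Fraction(sum(s), n) - universe_mean) ** 2 for s in samples]
    return sum(squared) / len(samples)


# Hypothetical universe of N = 6 fixed quantities.
universe = [3, 5, 8, 12, 20, 21]
N = len(universe)
K2 = universe_K2(universe)

for n in range(1, N + 1):
    observed = mean_square_deviation_of_means(universe, n)
    predicted = K2 * (Fraction(1, n) - Fraction(1, N))
    print(n, observed, predicted)
```

At n = 1 the value reduces to the second moment of the universe, and at n = N it is zero, as the formula requires.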
20. Like "variance", the terms "correlation" or "covariance" and "independence"
are properly applicable only to random variables. In a universe with multiple
classifications the analogue of covariance is the product moment; the analogue of
independence is a certain symmetry of associated elements. Consider a fixed
universe of N = ABD elements corresponding to the standard model

$$y_{ijk} = m + a_i + b_j + (ab)_{ij} + d_{ijk}, \qquad i = 1 \ldots A,\ j = 1 \ldots B,\ k = 1 \ldots D$$

We may picture the universe space as N points representing all possible values of
y plotted against co-ordinate axes, one for each group of elements. In most cases
$d_{ijk}$ will have a distribution which is the same for all ij (often it will be
specified only as a random variable with a probability distribution, pictured as
an infinite population) and may be said to be "independent" of the other elements.
If not, considerations regarding it will be similar to those for $(ab)_{ij}$.
Imagine three co-ordinate axes for (a), (b) and (ab), functional forms which may
take values $(a)_i$, $(b)_j$, $(ab)_{ij}$. The subscripts with brackets refer to
values as opposed to elements; we want to picture many elements of each group
having the same value. In the space thus formed plot a point for every
$y' = a_i + b_j + ab_{ij}$ which may occur in the universe of such y'. Two groups
will be said to be "independent in the universe" if the conditional distributions
of such points for values of one group, given any single value of the other, are
the same.
21. By definition of a balanced universe every $a_i$ occurs equally often with
every $b_j$; they are therefore independent in the universe. Furthermore by the
method of sampling to form a balanced experiment they are also independent in
experiments -- i.e. independent in the universe sense, regarding observations as
fixed quantities, which is the reason why $K_a$ does not appear in the mean square
for $\mathcal{B}$ and conversely. They are not only uncorrelated in the general
sense, their sampling correlations are identically zero.

22. Since a given treatment combination determines a particular $ab_{ij}$ with a
prescribed $a_i$ and $b_j$, (ab) will not usually be independent of (a) and (b). But
it may be if there are many elements $a_i$ and $b_j$ having the same value, and for
every subset of these with given $(a)_i (b)_j$ the associated $(ab)_{ij}$ have the
same distribution. The condition is unlikely to occur in a real finite universe, but
this is what we in effect assume for "large" universes or populations in the random
model. The question is often raised as to how $(\alpha\beta)$ can be independent of
$\alpha$ and $\beta$ even for an infinite universe. We commonly assume each group to
be normally distributed, and since by definition they are uncorrelated,
independence must also be implied. The postulate of normality necessarily invokes
the concept of an infinite population, and the point that seems often to be missed
is that that concept implies, not only an infinity of elements for each group as a
whole, but also infinitely many elements at every pair of $\alpha$ and $\beta$
values corresponding to the conditional distribution of $(\alpha\beta)$. What we are
really doing is to postulate that the relative probabilities with which given values
can appear in samples can be approximated by sampling distributions for which these
populations of elements are only conceptual models to aid thought.
23. In a universe of a finite number of elements, $(ab)_{ij}$ will usually not
be independent of $a_i$ and $b_j$. But if the distribution of $(ab)_{ij}$ is
effectively at random, that is shows no systematic pattern relative to the
magnitudes of associated $(a)_i$ and $(b)_j$, it may be said to be 'practically
independent'. (We can picture this by saying that if we section the (a)(b) plane in
squares and consider histograms for the associated frequencies of $(ab)_{ij}$, then
all of these conditional histograms should be approximately similar and with equal,
zero, means.) In any case, by definition, the universe product moments
$\sum a_i (ab)_{ij}$ and $\sum b_j (ab)_{ij}$ are zero; and the random variables
formed by simple sampling are uncorrelated. The conditional covariance of say $a_i$
with $(ab)_{ij}$ for a given sub-set of $b_j$ is however not necessarily zero.
Attention to this point may be relevant for some problems which are being considered
by one of my colleagues. In this paper we will always assume averaging to be over
the whole universe.
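
The distinction between product moments over the whole universe and over a sub-set of columns can be seen on a small table. The following sketch is illustrative only: the 3 x 3 table of cell values is hypothetical, and the elements are obtained by the usual decomposition into deviations of row means, column means and residuals, which is assumed here as the defining convention.

```python
from fractions import Fraction


def decompose(table):
    """Split a complete A x B universe of cell values into m, a_i, b_j, (ab)_ij."""
    A, B = len(table), len(table[0])
    m = Fraction(sum(map(sum, table)), A * B)
    a = [Fraction(sum(row), B) - m for row in table]
    b = [Fraction(sum(table[i][j] for i in range(A)), A) - m for j in range(B)]
    ab = [[table[i][j] - m - a[i] - b[j] for j in range(B)] for i in range(A)]
    return m, a, b, ab


def product_moment_with_a(a, ab, columns):
    """Sum of a_i * (ab)_ij over all rows i and the listed columns j."""
    return sum(a[i] * ab[i][j] for i in range(len(a)) for j in columns)


def product_moment_with_b(b, ab, rows):
    """Sum of b_j * (ab)_ij over the listed rows i and all columns j."""
    return sum(b[j] * ab[i][j] for i in rows for j in range(len(b)))


# Hypothetical universe of cell values (rows: variants of a, columns: variants of b).
table = [[1, 2, 6],
         [4, 9, 2],
         [7, 3, 11]]
m, a, b, ab = decompose(table)

whole_a = product_moment_with_a(a, ab, columns=[0, 1, 2])
whole_b = product_moment_with_b(b, ab, rows=[0, 1, 2])
first_column_only = product_moment_with_a(a, ab, columns=[0])

print("over the whole universe:", whole_a, whole_b)
print("restricted to the first column:", first_column_only)
```

Over the whole universe both product moments vanish identically; restricted to a sub-set of the columns the product moment of the $a_i$ with the $(ab)_{ij}$ need not.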
Appendix to Section 7
Definition of Parameters

Subsequent to writing Section 7 it was brought to my attention that Kendall
and Sundrum (1953) have given reasons for preferring to restrict the definition of
parameters to the constants commonly designated $\theta_i$ in the formulation of a
probability function of a variate as $f(x; \theta_1, \theta_2 \ldots \theta_p)$.
Their primary problem was to define "non-parametric hypotheses" and
"distribution-free inference"; consequently their definition is conditioned by that
context with arguments like the following.
"7. Consider the hypotheses
(a) the population is normal
(b) the population has finite cumulants but $\kappa_r = 0$ for $r > 2$.
The first is not a parametric hypothesis under any ordinary interpretation.
For consistency, therefore, the second cannot be parametric
for it is completely equivalent to the first. This reinforces our conclusions
that the parameters must be finite in number to constitute a
parametric hypothesis."

Their paragraphs 8 and 9 then find "that the question whether a given hypothesis
is parametric or not has to be decided in the light of alternatives against which
it is to be tested." whence

"10. We therefore define a statistical hypothesis as parametric if
(a) it makes an assertion about a distribution;
(b) it specifies the distribution completely except for the
values of a finite number of parameters;
(c) it is considered against alternatives of the same distributional
form which, therefore, differ only in the values of
the parameters involved.
If a hypothesis is not parametric it may be said to be nonparametric.

"11. It is important to confine the adjectives "parametric" and "nonparametric"
to statistical hypotheses. They should not be applied to
statistics, tests or types of inference. This may sound rather austere,
but we have found a great deal of confusion arising from the use of
phrases like 'non-parametric tests' and 'non-parametric inference'.

"12. It remains open whether, in view of the subtlety and relativity of
the expressions, it is worth while continuing to make a distinction
between parametric and non-parametric hypotheses. On the whole we
think that it is worth preserving but we should not quarrel with
anyone who took the opposite view, or who felt that some other
classification was more desirable."
These arguments thus relate mainly to defining the adjectival usage, "parametric
hypothesis", but recognising a logical need to embrace the noun, earlier
paragraphs state:

"2. It is necessary in the first place to define what is meant by a 'parameter'
in relation to a statistical distribution. Traditionally, the term,
which was borrowed from pure mathematics, denotes a variable,
appearing in the equation defining the distribution, which can be regarded as
having one of a set of values. Thus, the quantities m and $\sigma$ in

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}(x-m)^2/\sigma^2}\, dx, \qquad -\infty \le x \le \infty \tag{1}$$

are parameters; and so is the quantity $\theta$ in

$$dF = dx, \qquad \theta \le x \le \theta + 1. \tag{2}$$

"3. But statistical practice has not stopped here. For example, it is sometimes
said that the mean is a parameter of the normal distribution (1).
This is inaccurate; what should be said is that the parameter m is equal
to the mean. The point of the distinction is that if we allow ourselves
to refer to measures of location, dispersion, etc., as parameters, we
get into difficulties. In fact, all the moments of the normal distribution
would then be parameters (though, of course, functionally dependent); and
every distribution would have an infinite number of parameters. We should
have to admit the median, the quantiles, the individual frequencies of a
discrete distribution and perhaps even the distribution function itself
as parameters. This is not at all what is intended, and we conclude that
the practice of referring to the summarizing measures of a distribution as
parameters is to be condemned."
Their arguments are weighty but they create difficulty for other contexts.
Fisher made a fundamental clearing of a statistical tangle by emphasizing the
importance of distinguishing between a sample estimate and the population value
which it estimates: in that context the contrast statistic versus parameter is
firmly entrenched in our terminology. Sample moments, medians, etc., will continue
to be statistics (or estimators) whether or not we can make an assertion about the
form of the parent distribution. But if Kendall and Sundrum's restriction on
parameters is to be maintained what shall we call their antitheses?

Neither least squares estimation of location 'parameters' nor analysis of
variance estimation of variance components depends on any postulate about the form
of distribution of the observations. Therefore on the restricted definition
neither the elements of our linear model nor the components of the variance equation
can be called parameters. 'Things to be estimated' is intolerable. 'Elements'
or 'typical values' do not convey the required distinction of population values.
'Population' is an unsatisfactory adjective. 'True' regression or variance equation
implies an attempt at colloquial explanation unsatisfactory for a technical term.
To think of the elements of these equations as other than parameters is now difficult.
In a brief chance conversation before I knew of his paper - much too brief for
thorough consideration - Professor Kendall expressed the opinion that, following
his definition of "parameters", the probability distribution of samples from a
finite universe has no parameters, and that the elements of the universe define the
sample space. Contrariwise Sec. 7.12 had taken the view that sample space is
always definable a priori to cover all possible outcomes of an experiment, that
the probability distribution defines the relative frequency with which each point
in sample space may be observed (including those which have "probability measure
zero" for a given population), and that all population characteristics required
completely to define the probability distribution must be parameters. Suppose an
experiment is a sample survey to determine the distribution of some 'event' in a
district. Several districts and the same district in different months are a class
of similar experiments. Would it not be contrary to established usage to postulate
a different sample space for each of these "experiments"? Is not specification of
the potential outcome more closely related to the $\theta$ of the second example of
Kendall and Sundrum's para. 2 than to a specification of sample space?
For these reasons it seems more consistent with custom to define a parameter
as any characteristic of a population; including both general characteristics such
as mean and variance and all those required to specify a probability distribution.
To do otherwise would seem to require unpleasant periphrases in the contexts where
the word has been used in this paper; and these usages may be considered to have
historical precedence over endeavours to distinguish between parametric and
nonparametric hypotheses. If it is now impracticable to have different words for the
general and restricted classes presumably we will, as so often, have to get along
with two usages depending on context for differentiation. It may not be too
difficult once the alternative definitions have been explicitly formulated.
8. Generalized variance model

The generalized model postulates that the complex population of y is built
up with only a finite number of elements in one or more (possibly all) groups of
the linear model. Let the specific latent equation be

$$y_{ijk} = m + a_i + b_j + (ab)_{ij} + d_{ijk} \tag{8.1}$$

$$i = 1 \ldots a \ldots A \le \infty, \qquad j = 1 \ldots b \ldots B \le \infty, \qquad k = 1 \ldots d \ldots D \le \infty, \qquad ABD = N.$$

Some readers may object to the duplicate roles for the same letter, e.g. a = the
sample number of elements $a_i$. But if one notes the convention that an element
will always be indicated with a subscript and a letter (other than m) without
subscript will denote a frequency (lower case for a sample, capital for a universe)
the mnemonic value outweighs risk of confusion.
As description of the (complex) universe of y the elements of (8.1) are not
random variables.
Relative to a sample they may be interpreted either as random
variables or as possible values which such random variables may take.
For
precision notice that each group of elements corresponds to only one random
variable; the position might be most clearly exhibited by writing the random
variable for the first group as a(i) to indicate a function which takes on the
value $a_i$ at the i-th sampling of factor $\mathcal{A}$.
Suppose first that A, B, D are all finite, and that, case (i), there is a
nested sub-universe of $d_{ijk}$ for each ij combination. To obtain unique
definition of the elements we impose the restrictions

$$\sum^{A} a_i = \sum^{B} b_j = \sum^{A} (ab)_{ij} = \sum^{B} (ab)_{ij} = \sum^{D} d_{ijk} = 0$$

More commonly, case (ii), there may be only one universe of $d_{ijk}$ whose elements
associate at random with all treatments. Here we will consider them as subject
to only the one restriction, $\sum^{N} d_{ijk} = 0$, although more generally "block"
restriction may be imposed. Case (i) is commonly described as "nested sampling";
case (ii) represents a "completely randomized experiment".
Tukey (1949), Bennett and Franklin (1954) and Wilk and Kempthorne (1955)
define the universe $K_2$-parameters (which they designate $\sigma^2$)

$$K_a = \sum^{A} a_i^2/(A-1), \qquad K_b = \sum^{B} b_j^2/(B-1), \qquad K_{ab} = \sum^{AB} (ab)_{ij}^2/(A-1)(B-1),$$
$$K_{d(i)} = \sum^{N} d_{ijk}^2/AB(D-1) \quad \text{or} \quad K_{d(ii)} = \sum^{N} d_{k}^2/(N-1) \tag{8.2}$$

To simplify notation a K or k without numerical subscript is to be read as
$K_2$ or $k_2$. (K's with other subscripts will be needed only in the appendix.)
Definitions (8.2) will be called "specific variance components". They will also be
referred to as K(r), indicating K for "roman elements", both for brevity in
distinction from K($\gamma$) for "greek elements" to be defined later and as reminder
that they are K-parameters rather than variances (Sec. 7.17).
If the universe of every group is finite and there are separate sets of D
units for each ij combination (case (i)) the elements are defined as unique means
of a complex universe which is potentially observable. In case (ii) the elements
are defined only as averages over all possible randomizations with $d_{ijk}$. We can
imagine them as means of a conceptual universe of $A^2B^2D$ elements, that is with
all N conceptually possible replications on every ij combination. However such
universe is not potentially observable: therefore, we are justified in describing
such averages as "expectations", indicated by the operator E, implying the limits in
probability, random sampling with equal probability being assumed.

More usually the size of at least one group, usually D, will be imagined as
tending to infinity. The elements are then defined only as asymptotic limits in
probability. The K are therefore, in general, functions of hypothetical quantities.
($K_{ab}$ is an extension of Tukey's statistics for a universe with two-way
restrictions. $K_{d(i)}$, being merely an average for AB sub-universes, involves no
essential extension.)
The authors quoted above state the expectations of mean squares of analyses
of variance in terms of K(r). These expectations are shown in Table 8.1 for nested
classification based on the model

$$y_{ijk} = m + a_i + b_{j(i)} + d_{k(ij)} \tag{8.3}$$

where $b_{j(i)}$ indicates nesting in classification i, etc. (notation introduced by
Bennett and Franklin); and in Table 8.2 for the model (8.1). Table 8.2 is presented
as for case (i), for which the last term of (8.1) might better be written as
$d_{k(ij)}$ in conformity with (8.3); if $d_{ijk}$ randomize with all treatments
(case ii) the factors d/D are to be deleted.
Table 8.1
Nested Classification

                        E(MSq) in terms of K(r)        E(MSq) in terms of K(gamma)
Source   d.f.           K_d        K_b         K_a     K_delta   K_beta   K_alpha
A        (a-1)          (1 - d/D)  d(1 - b/B)  bd      1         d        db
B        a(b-1)         (1 - d/D)  d                   1         d
D        ab(d-1)        1                              1

Table 8.2
Crossed Classification (error terms nested)

                        E(MSq) in terms of K(r)               E(MSq) in terms of K(gamma)
Source   d.f.           K_d        K_ab        K_b    K_a     K_delta   K_alphabeta   K_beta   K_alpha
A        (a-1)          (1 - d/D)  d(1 - b/B)         db      1         d                      db
B        (b-1)          (1 - d/D)  d(1 - a/A)  da             1         d             da
AB       (a-1)(b-1)     (1 - d/D)  d                          1         d
Error    ab(d-1)        1                                     1

Delete d/D if $d_{ijk}$ randomize with treatments.
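
The coefficients of $K_{ab}$, $K_a$ and $K_b$ in Table 8.2 can be verified by complete enumeration for the special case of one observation per cell and no error element (d = 1, $K_d$ = 0). The sketch below makes those simplifying assumptions explicitly; the 3 x 4 universe of cell values and all function names are hypothetical, and the averages run over every sample of a rows and b columns.

```python
from fractions import Fraction
from itertools import combinations


def specific_components(table):
    """K_a, K_b, K_ab of a complete A x B universe, as in definitions (8.2)."""
    A, B = len(table), len(table[0])
    m = Fraction(sum(map(sum, table)), A * B)
    a = [Fraction(sum(row), B) - m for row in table]
    b = [Fraction(sum(table[i][j] for i in range(A)), A) - m for j in range(B)]
    ab = [table[i][j] - m - a[i] - b[j] for i in range(A) for j in range(B)]
    K_a = sum(x * x for x in a) / (A - 1)
    K_b = sum(x * x for x in b) / (B - 1)
    K_ab = sum(x * x for x in ab) / ((A - 1) * (B - 1))
    return K_a, K_b, K_ab


def mean_squares(sample):
    """Row, column and interaction mean squares of an a x b sample table."""
    a, b = len(sample), len(sample[0])
    g = Fraction(sum(map(sum, sample)), a * b)
    r = [Fraction(sum(row), b) for row in sample]
    c = [Fraction(sum(sample[i][j] for i in range(a)), a) for j in range(b)]
    ms_rows = b * sum((x - g) ** 2 for x in r) / (a - 1)
    ms_cols = a * sum((x - g) ** 2 for x in c) / (b - 1)
    ms_inter = sum((sample[i][j] - r[i] - c[j] + g) ** 2
                   for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))
    return ms_rows, ms_cols, ms_inter


def expected_mean_squares(table, a, b):
    """Average of the mean squares over all samples of a rows and b columns."""
    A, B = len(table), len(table[0])
    totals, count = [Fraction(0)] * 3, 0
    for rows in combinations(range(A), a):
        for cols in combinations(range(B), b):
            sample = [[table[i][j] for j in cols] for i in rows]
            totals = [t + ms for t, ms in zip(totals, mean_squares(sample))]
            count += 1
    return tuple(t / count for t in totals)


# Hypothetical universe with A = 3, B = 4; samples of a = 2 rows, b = 3 columns.
table = [[3, 7, 1, 9],
         [4, 4, 8, 2],
         [10, 5, 6, 13]]
A, B, a, b = 3, 4, 2, 3
K_a, K_b, K_ab = specific_components(table)
E_rows, E_cols, E_inter = expected_mean_squares(table, a, b)

print("E(MSq A) :", E_rows, " predicted:", (1 - Fraction(b, B)) * K_ab + b * K_a)
print("E(MSq B) :", E_cols, " predicted:", (1 - Fraction(a, A)) * K_ab + a * K_b)
print("E(MSq AB):", E_inter, " predicted:", K_ab)
```

With replication the same enumeration would add the $K_d$ column; that extension is not attempted here.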
The coefficients of the specific variance components in the expectations of
mean squares can be evaluated in several ways. Bennett and Franklin, following
suggestions by Tukey, have formulated an algorithm for most ordinary experimental
designs. It does not, however, cover all situations. A general and simple method
derives them from the coefficients for K($\gamma$) as described later. Wilk and
Kempthorne give a general procedure (discussed in Sec. 13 below) but one which is
too complex, with consequent risk of algebraic errors, to be suitable for regular
use. Main effect and error mean squares can be fairly directly evaluated in terms
of Tukey's k-statistics (with a slight complication for randomized errors which do
not form with the treatment effects simple "random sums" as defined by him); but
they are not adapted to evaluate the interaction terms since $K_{ab}$ as defined in
(8.2) is not a member of that class of parameters. However going back to his
"brackets" or "symmetric means" yields a straight-forward and general method which
will be developed here because we shall want to refer to it later.
For initial simplicity consider the following abbreviated models:

$$y_{ij} = m + t_i + d_{ij} \tag{8.4}$$

where $t_i$ represent a simple set of treatments, and $d_{ij}$ may be read as
$d_{j(i)}$ to imply nested classification (case i), or as $d_k$,
k = 1 ... (ad) ... (AD) = N, to imply a completely randomized experiment (case ii):
and

$$y_{ij} = m + a_i + b_j + (ab)_{ij} \tag{8.5}$$

which is (8.1) without its error element. For brevity, to save writing formulae
twice, the range of i ('rows') will be taken for both cases as 1 ... a ... A, though
to conform with our usual notation these will represent t, T when applied to
(8.4). The range of j is 1 ... d ... D in (8.4), 1 ... b ... B in (8.5); for
formulae which apply to both models we will write c, C to represent either case as
required. No other indication will be given as to whether formulae apply to one
or either model.
Let $[x_{ij}]$ be an A x C matrix representing the elements of a two-way universe.
Initially we take $x_{ij}$ equivalent to $y_{ij}$ but use a different letter because
it will later be set equal to elements. Relative to (8.4) rows represent treatments
and order within rows is at random. An experiment is formed by sampling d elements
from each of a rows. There are $\binom{A}{a}\binom{D}{d}^{a}$ possible samples.
Relative to (8.5) rows and columns represent crossed variants of two factors. An
experiment is formed by sampling a rows and b columns. There are
$\binom{A}{a}\binom{B}{b}$ ways of drawing such a sample. In each case every sample
arrangement is assumed to be equally probable.
We want to express analyses of variance of samples or universes in terms of
symmetric means analogous to those defined by Tukey (1950); but now products can
be formed in more than one way with different implications. A more extended
notation is therefore required. The following angular brackets are the Tukey-Hooke
notation for "generalized symmetric means", abbreviated "g.s.m.". Square brackets
are used to indicate sums in conformity with the notation of David and Kendall (1949).
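Define, for a sample of a rows and c columns, the sum of squares, the sum of
products of pairs in the same row, and the sum of products of pairs from different
rows:

$$[2] = \sum^{a}\sum^{c} x_{ij}^2 = ac\,\langle 2 \rangle$$
$$[1\,1] = \sum^{a}\sum_{j \ne j'} x_{ij}x_{ij'} = ac(c-1)\,\langle 1\,1 \rangle$$
$$\left[{1 \atop 1}\right] = \sum_{i \ne i'}\sum_{j,\,j'} x_{ij}x_{i'j'} = a(a-1)c^2 \left\langle {1 \atop 1} \right\rangle$$

and, for the crossed classification, where pairs from different rows are
distinguished as being in the same or in different columns,

$$\left[{1\,\cdot \atop 1\,\cdot}\right] = \sum_{i \ne i'}\sum^{c} x_{ij}x_{i'j} = a(a-1)c \left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle, \qquad \left[{1\,\cdot \atop \cdot\,1}\right] = \sum_{i \ne i'}\sum_{j \ne j'} x_{ij}x_{i'j'} = a(a-1)c(c-1) \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$$

Let $[pq]^*$ and $\langle pq \rangle^*$ be the analogous functions for the universe,
that is replacing a, c by A, C; for example

$$\sum^{A}\sum^{C} x_{ij}^2 = [2]^* = AC\,\langle 2 \rangle^*.$$

In Tukey's phrase each symmetric mean, represented by angular brackets, is
"inherited on the average", that is

$$\operatorname{ave}\,\langle pq \rangle = \langle pq \rangle^*$$

For example, consider $\langle 1\,1 \rangle$. When summed over all possible samples
symmetry requires that every ordered pair of elements must appear an equal number of
times, namely $\binom{A-1}{a-1}\binom{C-2}{c-2}$ = the number of samples which can
contain a given row and a given pair of columns. Therefore

$$\operatorname{ave}\,\langle 1\,1 \rangle = \frac{\binom{A-1}{a-1}\binom{C-2}{c-2}\,[1\,1]^*}{\binom{A}{a}\binom{C}{c}\, ac(c-1)} = \frac{[1\,1]^*}{AC(C-1)} = \langle 1\,1 \rangle^*$$

Similarly for the others.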
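Note that the various types of products do not occur an equal number of times
over all samples and do not have equal expectations.

Write $\langle 2 \rangle$ for the mean of squares, $\langle 1\,1 \rangle$ for the
mean product of pairs in the same row, and $\left\langle {1 \atop 1} \right\rangle$
for the mean product of pairs from different rows. From elementary algebraic
relations

$$\Bigl(\sum^{a}\sum^{d} x_{ij}\Bigr)^2 = [2] + [1\,1] + \left[{1 \atop 1}\right], \qquad \sum^{a}\Bigl(\sum^{d} x_{ij}\Bigr)^2 = [2] + [1\,1]$$

Whence

$$ad\,\bar{x}_{..}^2 = \langle 2 \rangle + (d-1)\,\langle 1\,1 \rangle + d(a-1) \left\langle {1 \atop 1} \right\rangle$$
$$d \sum^{a} \bar{x}_{i.}^2 = a\,\langle 2 \rangle + a(d-1)\,\langle 1\,1 \rangle$$
$$\sum^{a}\sum^{d} x_{ij}^2 = ad\,\langle 2 \rangle$$

Taking differences and dividing by degrees of freedom leads to Table 8.3 for nested
classification. Similar algebra using
$\left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle$ (different rows, same column)
and $\left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$ (different rows, different
columns) leads to Table 8.4 for the crossed classification.

Table 8.3

Treatments M. Sq.   $M_t = \langle 2 \rangle - \langle 1\,1 \rangle + d\left( \langle 1\,1 \rangle - \left\langle {1 \atop 1} \right\rangle \right)$
Deviations M. Sq.   $M_d = \langle 2 \rangle - \langle 1\,1 \rangle$

Table 8.4

A M. Sq.    $M_a = M_{ab} + b\left( \langle 1\,1 \rangle - \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle \right)$
B M. Sq.    $M_b = M_{ab} + a\left( \left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle - \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle \right)$
AB M. Sq.   $M_{ab} = \langle 2 \rangle - \langle 1\,1 \rangle - \left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle + \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$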
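To obtain expectations note first that on squaring the linear models, or any
averages of them used in computing sums of squares, and averaging over all samples,
the expectations of products between any two groups of elements vanish. We can
therefore find expectations by letting $x_{ij}$ stand for single groups of elements
in turn and replacing brackets by their universe values. In what follows
$\langle 2 \rangle$ is the mean of squares, $\langle 1\,1 \rangle$ the mean product
within rows, and $\left\langle {1 \atop 1} \right\rangle$ the mean product between
rows.

Letting $x_{ij} = t_i$, since this is constant within rows
$\langle 1\,1 \rangle = \langle 2 \rangle$ and the contribution of $t_i$ to $M_d$
vanishes. Also $\left\langle {1 \atop 1} \right\rangle$ is the simple mean product
for a one way universe. Therefore
$\langle 1\,1 \rangle - \left\langle {1 \atop 1} \right\rangle = \langle 2 \rangle - \left\langle {1 \atop 1} \right\rangle$
is a simple $k_2$-statistic as defined by Tukey; in our notation $k_t$, with
expectation $K_t$.

Letting $x_{ij} = d_k$ (case ii) the matrix is degenerate since there is no
restriction of rows, and mean products in any direction must have the same
expectation, namely $-\langle 2 \rangle^*/(N-1)$ since $[1]^* = 0$. Therefore

$$E(M_t) = E(M_d) = \langle 2 \rangle^* - \langle 1\,1 \rangle^* = \langle 2 \rangle^*\, N/(N-1) = K_d.$$

In case (i) (nested) $\sum^{D} d_{ij} = 0$ within every row; therefore

$$\langle 1\,1 \rangle^* = -\langle 2 \rangle^*/(D-1) \quad \text{and} \quad \left\langle {1 \atop 1} \right\rangle^* = 0$$

Therefore

$$E(M_d) = \langle 2 \rangle^* - \langle 1\,1 \rangle^* = \langle 2 \rangle^*\, D/(D-1) = K_d$$
$$E(M_t) = E(M_d) + d\left( \langle 1\,1 \rangle^* - \left\langle {1 \atop 1} \right\rangle^* \right) = \langle 2 \rangle^*\, (D-d)/(D-1) = \left(1 - \frac{d}{D}\right) K_d$$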
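
These expectations for the nested case (i), together with the contribution $d K_t$ of the treatment elements, can be checked by enumerating every sample of a rows and d elements per row. The sketch below is illustrative only: the universe of A = 3 rows with D = 4 elements each is hypothetical, as are the function names.

```python
from fractions import Fraction
from itertools import combinations, product


def nested_components(universe):
    """K_t and K_d (case i) of a universe given as A rows of D elements each."""
    A, D = len(universe), len(universe[0])
    row_means = [Fraction(sum(row), D) for row in universe]
    grand = sum(row_means) / A
    K_t = sum((r - grand) ** 2 for r in row_means) / (A - 1)
    within = sum((x - row_means[i]) ** 2
                 for i, row in enumerate(universe) for x in row)
    K_d = within / (A * (D - 1))
    return K_t, K_d


def nested_mean_squares(sample):
    """Treatment and deviation mean squares for a sample of a rows of d values."""
    a, d = len(sample), len(sample[0])
    row_means = [Fraction(sum(row), d) for row in sample]
    grand = sum(row_means) / a
    M_t = d * sum((r - grand) ** 2 for r in row_means) / (a - 1)
    M_d = sum((x - row_means[i]) ** 2
              for i, row in enumerate(sample) for x in row) / (a * (d - 1))
    return M_t, M_d


def expected_nested_mean_squares(universe, a, d):
    """Average the two mean squares over all samples: a rows, then d per row."""
    total_t, total_d, count = Fraction(0), Fraction(0), 0
    for rows in combinations(range(len(universe)), a):
        choices_per_row = [list(combinations(universe[i], d)) for i in rows]
        for sample in product(*choices_per_row):
            M_t, M_d = nested_mean_squares(sample)
            total_t += M_t
            total_d += M_d
            count += 1
    return total_t / count, total_d / count


# Hypothetical universe: A = 3 treatments, D = 4 units nested in each.
universe = [[2, 5, 9, 4],
            [7, 7, 1, 13],
            [6, 0, 3, 11]]
D, a, d = 4, 2, 2
K_t, K_d = nested_components(universe)
E_Mt, E_Md = expected_nested_mean_squares(universe, a, d)

print("E(M_d):", E_Md, " K_d:", K_d)
print("E(M_t):", E_Mt, " (1 - d/D) K_d + d K_t:", (1 - Fraction(d, D)) * K_d + d * K_t)
```

The enumeration covers 3 x 6 x 6 = 108 samples, so the agreement is exact rather than approximate.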
Extension to triple classification gives the formulae in Table 8.1. If the t
treatments be regarded as made up of a x b crossed variants of two factors it is
not difficult to show that the expectations of the $d_{ijk}$ terms are maintained in
analysis of the two factors, and hence the factors for $K_d$ in Table 8.2, the
remainder of which we proceed to examine on model (8.5) ignoring error terms.
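In what follows $\langle 1\,1 \rangle$ denotes the mean product of pairs in the same
row, $\left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle$ that of pairs in the same
column, and $\left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$ that of pairs in
different rows and different columns. As before for $t_i$, $a_i$ and $b_j$ being
constant in rows and columns respectively they vanish from $M_{ab}$ (Table 8.4), and
$\langle 1\,1 \rangle - \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$ and
$\left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle - \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle$
either vanish or are simple k statistics with expectations $K_a$ and $K_b$
respectively. Letting $x_{ij} = ab_{ij}$ we have

$$\sum^{A}\Bigl(\sum^{B} ab_{ij}\Bigr)^2 = 0 = [2]^* + [1\,1]^*$$
$$\sum^{B}\Bigl(\sum^{A} ab_{ij}\Bigr)^2 = 0 = [2]^* + \left[{1\,\cdot \atop 1\,\cdot}\right]^*$$
$$\Bigl(\sum^{A}\sum^{B} ab_{ij}\Bigr)^2 = 0 = [2]^* + [1\,1]^* + \left[{1\,\cdot \atop 1\,\cdot}\right]^* + \left[{1\,\cdot \atop \cdot\,1}\right]^*$$

which lead to

$$\langle 1\,1 \rangle^* = -\langle 2 \rangle^*/(B-1), \qquad \left\langle {1\,\cdot \atop 1\,\cdot} \right\rangle^* = -\langle 2 \rangle^*/(A-1), \qquad \left\langle {1\,\cdot \atop \cdot\,1} \right\rangle^* = \langle 2 \rangle^*/(A-1)(B-1)$$

and substituting these as expectations for the brackets in Table 8.4 (and
multiplying by number of replications) leads to the coefficients of $K_{ab}$ in
Table 8.2. These results are of course already known, having been given by the
authors quoted above.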
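
The universe values of the symmetric means for the interaction elements can be computed directly by classifying every ordered pair of cells. The sketch below is illustrative only: the table of cell values is hypothetical, the interaction elements are obtained by the usual decomposition into row, column and residual deviations, and the four keys `square`, `same_row`, `same_col`, `different` are names invented here for the four types of product.

```python
from fractions import Fraction


def interaction_elements(table):
    """(ab)_ij of a complete A x B universe: residuals after row and column means."""
    A, B = len(table), len(table[0])
    m = Fraction(sum(map(sum, table)), A * B)
    a = [Fraction(sum(row), B) - m for row in table]
    b = [Fraction(sum(table[i][j] for i in range(A)), A) - m for j in range(B)]
    return [[table[i][j] - m - a[i] - b[j] for j in range(B)] for i in range(A)]


def symmetric_means(x):
    """Universe g.s.m. of a two-way array: mean square and three mean products."""
    A, B = len(x), len(x[0])
    cells = [(i, j) for i in range(A) for j in range(B)]
    sums = {"square": Fraction(0), "same_row": Fraction(0),
            "same_col": Fraction(0), "different": Fraction(0)}
    for i, j in cells:
        for p, q in cells:  # ordered pairs, so each product is counted both ways
            if i == p and j == q:
                key = "square"
            elif i == p:
                key = "same_row"
            elif j == q:
                key = "same_col"
            else:
                key = "different"
            sums[key] += x[i][j] * x[p][q]
    counts = {"square": A * B, "same_row": A * B * (B - 1),
              "same_col": A * B * (A - 1), "different": A * B * (A - 1) * (B - 1)}
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical universe of cell values, A = 3 rows and B = 4 columns.
table = [[3, 7, 1, 9],
         [4, 4, 8, 2],
         [10, 5, 6, 13]]
A, B = 3, 4
ab = interaction_elements(table)
g = symmetric_means(ab)
K_ab = sum(v * v for row in ab for v in row) / ((A - 1) * (B - 1))

print("same row :", g["same_row"], " predicted:", -g["square"] / (B - 1))
print("same col :", g["same_col"], " predicted:", -g["square"] / (A - 1))
print("different:", g["different"], " predicted:", g["square"] / ((A - 1) * (B - 1)))
print("K_ab:", K_ab)
```

Combining the four values with signs +, -, -, + reproduces $K_{ab}$, which is the expectation of the interaction mean square.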
The following indicates, without detailed working, extension of this form
of analysis to three-way universes. Consider a three-factor experiment without
replication:
(8.7)
with our usual convention on numbers of variants.

Since we cannot conveniently write a three-way matrix on plane paper the
preceding notation is inconvenient. Imagine the $\mathcal{A}$-variants designated by
rows, $\mathcal{B}$-variants by columns (horizontal), and $\mathcal{D}$-variants by
verticals. The various sets of products of pairs of elements will be designated as
follows: A will indicate a pair of elements in the same row, a a pair each of which
comes from a different row; similarly B, b for within and between columns; D and d
for within and between verticals. Since these letters appear overworked it might at
this stage seem advisable to use R, C, and V; but the reason is that they will lead
to a simple algorithm for the mean squares. They will be distinguished by their
associated brackets. We then define symmetric sums and means as follows, giving just
a few examples:

$$[ABD] = \sum^{a}\sum^{b}\sum^{d} x_{ijk}^2 = abd\,\langle ABD \rangle, \qquad [aBD] = \sum_{i \ne i'}\sum^{b}\sum^{d} x_{ijk}x_{i'jk} = a(a-1)bd\,\langle aBD \rangle \tag{8.8}$$

The rule for the number of terms in each sum (including permutations of each pair
of elements) is easily seen from the examples. Then following the same procedure
as above, the analysis of variance in terms of g.s.m. is found to be as in Table 8.5
which shows the multiples of symmetric means to be added to the mean squares
indicated in the second column to give the total mean square for each row.
It is
-55..
now visible that the above notation yields an algorithm for the formation of the
mean squares similar to the well known one for treatment effects of a factorial
experiment.
For example, with letters inside angular brackets indicating symmetric
mean products, those outside being replication multipliers:
\[ \mathrm{MSq}(ABD) = \langle (A-a)(B-b)(D-d) \rangle \]
\[ \mathrm{MSq}(A) = bd\,\langle (A-a)\,bd \rangle + \mathrm{MSq}(AB) + \mathrm{MSq}(AD) - \mathrm{MSq}(ABD) \]
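The algorithm can be checked numerically on a small three-way table. The following is only an illustrative sketch: the data values and the helper names (`symmetric_means`, `contrast`, `mean_squares`) are assumptions introduced here and do not appear in the original report. A capital letter in a key means the two elements of a pair share that index, a lower-case letter that they differ; `"*"` in a pattern stands for the difference (capital minus lower case).

```python
from fractions import Fraction
from itertools import product

def symmetric_means(y):
    """Mean product of every ordered pair of elements, grouped by pattern."""
    a, b, d = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(a), range(b), range(d)))
    sums, counts = {}, {}
    for (i, j, k) in cells:
        for (i2, j2, k2) in cells:
            key = (("A" if i == i2 else "a") + ("B" if j == j2 else "b")
                   + ("D" if k == k2 else "d"))
            sums[key] = sums.get(key, 0) + Fraction(y[i][j][k]) * y[i2][j2][k2]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}

def contrast(g, pattern):
    """Expand a bracket such as '*bd', read as (A - a) b d, into signed means."""
    options = [(up, low) if p == "*" else (p,)
               for p, (up, low) in zip(pattern, ("Aa", "Bb", "Dd"))]
    total = Fraction(0)
    for letters in product(*options):
        flips = sum(1 for p, c in zip(pattern, letters) if p == "*" and c.islower())
        total += (-1) ** flips * g["".join(letters)]
    return total

def mean_squares(y):
    """Ordinary analysis-of-variance mean squares, three factors, no replication."""
    a, b, d = len(y), len(y[0]), len(y[0][0])
    R, C, V = range(a), range(b), range(d)
    g = Fraction(sum(y[i][j][k] for i in R for j in C for k in V), a * b * d)
    mi = [Fraction(sum(y[i][j][k] for j in C for k in V), b * d) for i in R]
    mj = [Fraction(sum(y[i][j][k] for i in R for k in V), a * d) for j in C]
    mk = [Fraction(sum(y[i][j][k] for i in R for j in C), a * b) for k in V]
    mij = [[Fraction(sum(y[i][j][k] for k in V), d) for j in C] for i in R]
    mik = [[Fraction(sum(y[i][j][k] for j in C), b) for k in V] for i in R]
    mjk = [[Fraction(sum(y[i][j][k] for i in R), a) for k in V] for j in C]
    ss_a = b * d * sum((mi[i] - g) ** 2 for i in R)
    ss_ab = d * sum((mij[i][j] - mi[i] - mj[j] + g) ** 2 for i in R for j in C)
    ss_ad = b * sum((mik[i][k] - mi[i] - mk[k] + g) ** 2 for i in R for k in V)
    ss_abd = sum((y[i][j][k] - mij[i][j] - mik[i][k] - mjk[j][k]
                  + mi[i] + mj[j] + mk[k] - g) ** 2
                 for i in R for j in C for k in V)
    return {"A": ss_a / (a - 1),
            "AB": ss_ab / ((a - 1) * (b - 1)),
            "AD": ss_ad / ((a - 1) * (d - 1)),
            "ABD": ss_abd / ((a - 1) * (b - 1) * (d - 1))}

# Hypothetical 2 x 3 x 2 table of observations.
y = [[[3, 1], [4, 1], [5, 9]],
     [[2, 6], [5, 3], [5, 8]]]
a, b, d = 2, 3, 2
g = symmetric_means(y)
ms = mean_squares(y)
rhs_A = b * d * contrast(g, "*bd") + ms["AB"] + ms["AD"] - ms["ABD"]
```

Exact rational arithmetic is used so that the two sides of each identity can be compared for equality rather than to within rounding error.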
We next consider the model (8.1), a two factor experiment with replication. So far as concerns the analysis of variance it makes no difference whether we consider the error terms, $d_{ijk}$, nested within each treatment combination or associated at random with all treatments. The difference lies in restrictions which only come into play when we want to evaluate the expectations in terms of $K_d$. The difference from the foregoing is that since elements can be assigned to verticals at random there is no difference in the products within and between verticals when associated with between rows or columns. Thus in place of, for example, $\langle aBD \rangle$ and $\langle aBd \rangle$ we get
\[ \langle aB. \rangle = [aB.]/a(a-1)bd^2, \qquad [aB.] = [aBD] + [aBd], \]
etc.; but $\langle ABD \rangle$ and $\langle ABd \rangle$ remain distinct, being respectively squares and products. These amalgamations lead to Table 8.6. Expectations of the g.s.m. for each group of elements lead directly to Table 8.2 without the composite inference previously used.
Table 8.6
Analysis of variance of two-factor experiment with replication: model (8.1)

Mean square   M.Sq. +   $\langle ABD\rangle$   $\langle ABd\rangle$   $\langle Ab.\rangle$   $\langle aB.\rangle$   $\langle ab.\rangle$
(A)           (AB)          -          -          bd         -         -bd
(B)           (AB)          -          -          -          ad        -ad
(AB)          (D)           -          d          -d         -d        d
(D)           -             1          -1         -          -         -
In Tables 6.1 and 6.2 the expectations are also written as they would be evaluated for a completely random model, the components which would thus be indicated being collectively termed $k(\gamma)$ and $K(\gamma)$. By comparison with later tables we see that their definitions in terms of g.s.m. are: for one-way classification, from Table 8.3,
for a two factor analysis, from Table 8.4 or 8.6, in the more general notation of Table 8.6,
\[ k_\alpha = \langle Ab. \rangle - \langle ab. \rangle \]
\[ k_\beta = \langle aB. \rangle - \langle ab. \rangle \]
\[ k_{\alpha\beta} = \langle ABd \rangle - \langle Ab. \rangle - \langle aB. \rangle + \langle ab. \rangle \tag{8.10} \]
and so on for more complex cases.
Hooke (1954) obtained Table 8.4, and starting from the linear functions of g.s.m. (8.10) developed a family of statistics and parameters which he christened "bipolykays" to be used as mechanism for evaluating sampling moments of mean squares and specific variance components, $k(r)$, for model (8.5). He evidently regards $k(r)$ as basic definitions of variance components for finite models.
The $k(\gamma)$, now extended to more than two dimensions, are all linear functions of g.s.m. with coefficients $\pm 1$ independently of the sizes of samples and universes. They are therefore inherited on the average. Contrariwise the $k(r)$ are linear functions with coefficients which involve both the observed sample size and the postulated group universe sizes. These introduce nuisance factors depending on universe postulates which are nearly always dubious, they destroy inheritance on the average, and make the sampling distributions clumsier than those of $k(\gamma)$.
The linear model whence all derive is a quite arbitrary approximation to reality. If without making a model substantially more arbitrary we can get rid of these troubles we may be well served. Defining variances of the group universes by (8.2), instead of as the more classic and basic second moments, was already a move to gain some of the simplicity which was introduced by Tukey's generalized k-statistics, or "polykays", to study of simple universes. The logical continuation is to consider adopting the $k(\gamma)$ as basic descriptive statistics for complex universes. Subsequent sections consider their use in interpreting experiments. Since their formulae are invariant for finite or infinite group populations I shall term them canonical variance components.
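Inheritance on the average can be illustrated by complete enumeration for an unreplicated two-way universe. The sketch below is illustrative only: the universe values and the function name `canonical_k` are assumptions made here. It forms the canonical components from symmetric means, with coefficients $\pm 1$, for every sub-table of 2 rows and 3 columns of a 3 by 4 universe, and compares their average with the same functions of the whole universe.

```python
from fractions import Fraction
from itertools import combinations

def canonical_k(y):
    """Canonical components of a two-way table from its symmetric means."""
    rows, cols = len(y), len(y[0])
    sq = sum(Fraction(v) ** 2 for r in y for v in r)
    row_sq = sum(Fraction(sum(r)) ** 2 for r in y)
    col_sq = sum(Fraction(sum(r[j] for r in y)) ** 2 for j in range(cols))
    tot_sq = Fraction(sum(sum(r) for r in y)) ** 2
    AB = sq / (rows * cols)                               # squares
    Ab = (row_sq - sq) / (rows * cols * (cols - 1))       # same row, different column
    aB = (col_sq - sq) / (rows * (rows - 1) * cols)       # different row, same column
    ab = (tot_sq - row_sq - col_sq + sq) / (rows * (rows - 1) * cols * (cols - 1))
    return {"alpha": Ab - ab, "beta": aB - ab, "alphabeta": AB - Ab - aB + ab}

# Hypothetical finite universe with A = 3 rows and B = 4 columns.
universe = [[4, 7, 1, 8],
            [2, 9, 5, 3],
            [6, 0, 7, 5]]
K = canonical_k(universe)

# Every possible sample of a = 2 rows and b = 3 columns.
a, b = 2, 3
subtables = [[[universe[i][j] for j in cols] for i in rows]
             for rows in combinations(range(3), a)
             for cols in combinations(range(4), b)]
stats = [canonical_k(t) for t in subtables]
average = {name: sum(s[name] for s in stats) / len(stats) for name in K}
```

No universe size enters `canonical_k`, which is why the same function serves for the sample and for the universe.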
Since comments both in the literature and at statistical meetings indicate that many feel confused about consequences of the interaction elements $(ab)_{ij}$ being not independent of $a_i$ and $b_j$, notice in passing that independence is not assumed in these definitions. Notice also that, although in a small universe the conditional arrays of $(ab)_{ij}$ for given $i$ are unlikely to be similarly distributed, and conversely for given $j$, nevertheless the mean second moment of the $i$ arrays must be identically equal to the mean second moment of the $j$ arrays by virtue of the definitions,
\[ \operatorname{Aver}(\mu_2 \mid a_i) = \frac{1}{A}\sum_i^A \Big(\sum_j^B (ab)_{ij}^2/B\Big) = \frac{1}{B}\sum_j^B \Big(\sum_i^A (ab)_{ij}^2/A\Big) = \operatorname{Aver}(\mu_2 \mid b_j). \]
This remains true for $K(\gamma)$, but is not true of the mean $K(r)$ within arrays unless $A = B$.
Appendix to Section 8
Variances of mean squares
The theme of this paper is concerned with estimating variance components and, for that purpose, with expectations of mean squares. To consider distribution of estimated variances has not been part of its purpose. Tukey (1950) indicated that his generalized k-statistics could be applied to that problem, apparently implying that association of elements of the linear models could be treated as randomized sums. The combination of treatment and error elements is, however, not quite of that form since (1) with nested universes the error mean to associate with each treatment comes from a different sub-universe, and (2) with randomized errors the association of several $d_j$ with each $t_i$ leads to a greater multiplicity of arrangements than for simple randomized sums. In an unpublished report Tukey (1950a) sidesteps these complications without mentioning them and obtains variances of specific variance components for cases where groups associate at random by an ingenious method of inference from particular cases. Before that report was available to me I had obtained variances of mean squares for one-way classifications, model (8.4), by direct application of his formulae for randomized sums. The procedure seems of some interest for its incidental evaluation of generalized k-statistics of means and relative to a speculation by Wishart (1952). We consider here only the treatment mean square which is denoted $d\,v_t$, $v_t$ being a statistic determined from $t$ class means, $y_{i.} = m + t_i + d_{i.}$:
\[ v_t = \sum_i^t (y_{i.} - y_{..})^2/(t-1) \]
Case (i), nested sub-universes, that is each set of $d$ values of $d_{ij}$ contributing to $d_{i.}$ is from a different sub-universe. Initially we assume that $d_{ij}$ have the same distribution in each of the $T$ sub-universes, with K-parameters $K_2$, $K_{22}$ and $K_4$ as defined by Tukey. (Since we now have to use subscripts to distinguish different orders of the parameters, these parameters will be written $K_2(d)$ etc. when they have to be distinguished from those for the $t$ group.) Then the moments of $d_{i.}$, $M_r = E(d_{i.})^r$, are:
M =
4
where ~
e
d,.«~3 + D-3)K + ~2K22
(Al)
4
• (1 _1)
d
D
as may be derived from formulae given by Wishart (1952). We define also
•
(A2)
because the $d_{i.}$ are in effect sampled with replacement and are therefore as if from an infinite population with $K_r(d_{i.})$ equivalent to ordinary cumulants and
~2 (di .)
=
1..
~ (di .) "" ~. Note however that the K on right hand sides of (Al) and
(A2) are defined for the universes of dij for which
K~
0:
~:~22
+
~.
Now $(y_{i.} - m) = t_i + d_{i.}$ is a "random sum" in Tukey's sense, with the $t_i$ selected (without replacement) from a universe of $T$ elements, $d_{i.}$ selected from an infinite population. It follows from Tukey (1950) p. 507 that
(A3)
as already given in sec. 8. And extending Tukey's argument as indicated by him, pp. 517-8,
(A4)
This formula is equivalent to, for $n$ random pairs of, say, $a$ and $b$,
\[ \operatorname{var}(k_2(a+b)) = \operatorname{var}(k_2(a)) + \operatorname{var}(k_2(b)) + 4\,\mathrm{cov}^2(ab)/(n-1) \tag{A5} \]
where $\mathrm{cov}^2(ab) = E\{\sum (a_i - \bar a)(b_i - \bar b)\}^2/(n-1)$ is the variance of the covariance, since by definition of the random sum $E(\mathrm{cov}) = 0$. (Note however that the last term of (A5) is not $2\,\mathrm{cov}(k_2(a), k_2(b))$, whose expectation is zero, but a term which arises from random association of sets of $a$ and $b$ and still exists even though the whole universes be used so that either or both of $\operatorname{var}(k_2(a))$ and $\operatorname{var}(k_2(b))$ may be zero.)
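The parenthetical remark can be illustrated by complete enumeration when both whole universes are used, so that $k_2(a)$ and $k_2(b)$ are fixed and only the pairing is random. The sketch below is an illustration under stated assumptions: it borrows the two small universes used later in this appendix for the empirical check, and it compares the enumerated variance with the standard permutation result $4\,k_2(a)\,k_2(b)/(n-1)$, which is a textbook identity quoted here and not a formula taken from the report. The helper name `k2` is likewise an assumption.

```python
from fractions import Fraction
from itertools import permutations

def k2(x):
    """Second k-statistic: sum of squares about the mean, divided by (n - 1)."""
    n = len(x)
    m = Fraction(sum(x), n)
    return sum((Fraction(v) - m) ** 2 for v in x) / (n - 1)

a = [-1, 0, 0, 1]
b = [-1, -1, 1, 1]
n = len(a)

# k2 of the paired sums for every one of the n! equally likely pairings.
values = [k2([ai + bi for ai, bi in zip(a, perm)]) for perm in permutations(b)]
mean_k2 = sum(values) / len(values)
var_k2 = sum((v - mean_k2) ** 2 for v in values) / len(values)

# Variance arising from random association alone.
expected_var = 4 * k2(a) * k2(b) / (n - 1)
```

Here `var_k2` is not zero although neither set varies from pairing to pairing, which is the point of the note.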
If we put $t_i$ = constant = 0, only the last two terms of (A4) survive, representing $\operatorname{var}(k_2(d_{i.}))$, and is the usual formula for variance of a sample variance from an infinite population, as clearly it should be since in effect the same population of $d_{i.}$ is being sampled with replacement. In terms of the K-parameters of the original $d_{ij}$,
$\operatorname{var}(k_2(d_{i.}))$
» .. ~[~
- ~]K.
+ 2.l [...L - J
~
t d
D+l --4
t-l
t(D+l)
G
(A6)
Wishart (1952, p.a) notes the natural extension of moments of generalized k-statistics to cumulants but seems dubious about it. He writes: "... if we accept Tukey's concept of an infinite population of samples of size n from a population of size N ... we might define a cumulant [in the manner of (A2)] ... But this question really needs further consideration." In Section 1.9 it was noted that all such statistics are random variables formed by the act of sampling and that the distribution of a random variable is inevitably that of a hypothetical infinite population; that this is fundamental and not merely a device. Therefore the use of $K_4(d_{i.})$ as defined in (A2), and $M_2$ for $K_2$, in the above formulae appear to need no further justification. However in view of Wishart's doubt I have checked the usage in two ways. First by deriving $\operatorname{var}(k_2(d_{i.}))$ by expansion and averaging over all possible combinations; cf. equation (A8). Secondly the complete formulae (A4) and (A6) were empirically checked by forming all possible values of $k_2(y_{i.})$ from a universe of $t_i$ = -1, 0, 0, 1, combined with all combinations of $d_{i.}$ = -1, -1, 1, 1, which in turn can be derived either as singletons ($d$ = 1) from similar $d_{ij}$ universes, or as samples of $d$ = 3 from universes of $d_{ij}$ = -3, -3, 3, 3.
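That samples of three from the universe -3, -3, 3, 3 reproduce the set of means -1, -1, 1, 1 is easily confirmed by enumeration. The sketch below is illustrative only; the variable names are assumptions, and the final comparison uses the familiar result that the variance of the mean of $d$ elements drawn without replacement from $D$ is $(1/d - 1/D)K_2$.

```python
from fractions import Fraction
from itertools import combinations

universe = [-3, -3, 3, 3]
D, d = len(universe), 3

# Mean of every sample of d elements drawn without replacement.
means = sorted(Fraction(sum(s), d) for s in combinations(universe, d))

# K2 of the universe: sum of squares about the mean divided by (D - 1).
centre = Fraction(sum(universe), D)
K2 = sum((v - centre) ** 2 for v in universe) / (D - 1)

# Second moment of the sample means about the universe mean.
M2 = sum((m - centre) ** 2 for m in means) / len(means)
```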
If the distribution of $d_{ij}$, and thence also of $d_{i.}$, differs from class to class, expanding $k_2(d_{i.})$ and its square in terms of symmetric polynomials of the variables and evaluating their expectations leads to
(A7)
as would be expected, where $M_{2i}$ = the second moment of $d_{i.}$ for samples from the $i$-th universe; and to
var(~) =
1-
tOt -
;2
31'~) +
:2
• ~(M4 - 31"~)
+
~
t=i -
2il
~ var(~)
t.i (t(::i) - ~) var(~)
+
(AB)
(A9)
-63-
..
where var (~) = ~ - ~ and p
2
t -2t+3
• The first two tems are the same as
t(t-l) (T-l)
(A6), or the last two terms of (A4) if M is constant. (A9) is equivalent to using
2i
the average of \ (di ) evaluated for each universe; but clearly we do better to use
(Ae), with ~ in stead of ~, since ~ is negligible if T is large, and then (A8) is
=
little affected by variation of M •
2i
Further complications may arise when we consider the effect on $\operatorname{var}(v_t)$ of allowing variable types of universes of $d_{ij}$. Writing $k_2(t_i) = k'$, $E(k') = K'$, $k_2(d_{i.}) = k''$, $E(k'') = K''$, and $\mathrm{cov} = \sum (t_i - t_.)\,d_{i.}/(t-1)$, one finds
\[ \operatorname{var}(v_t) = \operatorname{var}(k') + \operatorname{var}(k'') + 4E(\mathrm{cov}^2) + 2\{E(k'k'') - K'K''\} + 4E(k''\,\mathrm{cov}) + 4E(k'\,\mathrm{cov}). \tag{A10} \]
The first three terms are the standard formula (A5). The last is zero because along with any given set of $t_i$, ave $d_{i.}$ = 0. The other two terms are not necessarily zero: $k'$ and $k''$ will be correlated if $t_i$ is correlated with $M_{2i}$; $k''$ and cov will be correlated if $t_i$ is correlated with skewness of the distributions of $d_{i.}$. However in practical situations, e.g. Wilk and Kempthorne's second example, such correlations can safely be assumed to be negligible; so that we need consider only average moments of the error universes along with standard formulae as if all their distributions were the same.
Case (ii): a single universe of $N = DT$ experimental units with deviations $d_j$, $j = 1 \ldots N$, randomizable with all treatments, and with parameters $K_2$, $K_{22}$, $K_4$.
Consider first a single partition of the $N$ elements into sets of $d$ elements each, giving a set of $U = N/d$ values of $d'_{i.}$ of which $t$ may be associated with the $t_i$ to form a set of $y_{i.}$. The $t$ values of $(y_{i.} - m) = (t_i + d'_{i.})$ thus formed are a sample of "randomized sums" as defined by Tukey (1950), the paired elements being selected from two universes of $T$ and $U$ elements respectively with parameters $K_r(t)$ and $K'_r(d_{i.})$. There are $W = N!/\{(d!)^U\,U!\}$ possible sets of $d'_{i.}$, and averaging over all such sets
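When we expand $K'_2(d_{i.})$ and sum over the $W$ arrangements, every possible $d_j^2$ and product $d_j d_{j'}$ ($j = 1 \ldots N$, $j \ne j'$) occurs an equal number of times. Therefore, writing
\[ \operatorname{Ave}(d_j^2) = \sum_j^N d_j^2/N \quad\text{and}\quad \operatorname{Ave}(d_j d_{j'}) = \sum_{j \ne j'} d_j d_{j'}/N(N-1) \]
we obtain
\[ \operatorname{Ave} K'_2(d_{i.}) = \frac{WU}{W(U-1)d^2}\,\big[\,d\,\operatorname{Ave}(d_j^2) + d(d-1)\operatorname{Ave}(d_j d_{j'})\,\big]. \]
But since $\sum_j d_j = 0$ and $U = N/d$, we have $\sum_{j \ne j'} d_j d_{j'} = -\sum_j d_j^2 = -(N-1)K_2$, and substituting these leads to
\[ \operatorname{Ave} K'_2(d_{i.}) = K_2/d. \tag{A11} \]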
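The averaging over partitions can be checked by enumeration on a small universe. The sketch below is illustrative only: the six unit deviations and the helper names (`k2`, `partitions`) are assumptions made here. It forms every partition of $N = 6$ units into $U = 3$ sets of $d = 2$, computes the second k-statistic of the set means for each partition, and averages.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial

def k2(x):
    """Second k-statistic: sum of squares about the mean, divided by (n - 1)."""
    n = len(x)
    m = sum(x, Fraction(0)) / n
    return sum((v - m) ** 2 for v in x) / (n - 1)

def partitions(items, size):
    """Yield every split of items into unordered sets of the given size."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partners in combinations(range(len(rest)), size - 1):
        group = [first] + [rest[p] for p in partners]
        remaining = [v for idx, v in enumerate(rest) if idx not in partners]
        for tail in partitions(remaining, size):
            yield [group] + tail

# Hypothetical universe of N = 6 unit deviations summing to zero.
units = [Fraction(v) for v in (-4, -1, 0, 0, 2, 3)]
N, d = len(units), 2
U = N // d

all_partitions = list(partitions(units, d))
W = factorial(N) // (factorial(d) ** U * factorial(U))

# k2 of the U set means, averaged over all W partitions.
per_partition = [k2([sum(group) / d for group in part]) for part in all_partitions]
average_k2 = sum(per_partition) / len(per_partition)
```

The first element is always placed in the first set, so each unordered partition is generated exactly once.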
Extending this argument yields
\[ \operatorname{Ave} K'_{22}(d_{i.}) = K_{22}/d^2, \qquad \operatorname{Ave} K'_4(d_{i.}) = K_4/d^3. \tag{A12} \]
Tukey's (1950) argument then gives
1)
/l
Ave vart(vt )
1 1
=l'i .. T It (t)
1
+
(t: -
+ 2 (t-1 - T:I)K22 (t) +
1 K4(d)
ij' d3
1
+ 2 (t-1 -
1
~
'> between sets
u:r' i
2
1.
!!:!:!
K~ (d )
U-1 -""22 i.
+
!U K41 (di. )
The average value follows trom (m).
Similarly
Thence
var(K '.(d » .. Ave (K ' (d. »2 - .(Ave K' (d. »2
2 1.
2 1.
2 i•
2 (1
-
.""'W
de.
U-1
~
1
-)K(d)
N-1-c 2
Adding this to (A13) and substituting n • dt, N .. dU,
(Al3)
To this has to be
hilich is independent of t ).
i
Tukeyls relations
(K' (d. »2 ...
(t-1)d
K22 (d)
for the average variance ot random sums formed within sets.
added the variance of ~ (di
~(t).~ (d)
By
The formula was empirically checked on a numerical example. Subsequently it has been found to agree with Tukey's (1950a, unpublished) formulae which were obtained in different manner for variances and covariances of the components.
Case (ii) is more generally applicable than case (i), and is much the more important one. Although at first sight it appears more complex, owing to there being no single universe of $d_{i.}$ from which to form random sums with $t_i$, it is interesting that it works out more neatly. Furthermore no secondary complications arise from variable distributions of several error universes.
9. The canonical variance model
Define a canonical latent equation
\[ y = \mu + \alpha + \beta + (\alpha\beta) + \delta \tag{9.1} \]
(In mathematics "canonical" implies merely a "standard" form.) It is written without subscripts to imply random variables in the functional sense. Subscripts may be added to indicate elements of hypothetical population, but without bothering about how to define individual elements (9.1) already serves the first function of a latent equation: it defines the structure of the population and sample, and directs that a sum of squares, either of a sample or universe, be analyzed as in tables 9.1. Since $k(\gamma)$ of a sample, table 9.1 (a), are inherited on the average (Sec. 6), their expectations are the analogous functions $K(\gamma)$ of the universe, Table 9.1 (b) or (c) depending on error term postulates. $K(r)$ are by definition mean squares of the universe. Table 9.1 (b) or (c) therefore provides an explicit linear transformation between the two classes of parameters, namely:
\[ K_a = K_\alpha + K_{\alpha\beta}/B + K_\delta/BD \]
\[ K_b = K_\beta + K_{\alpha\beta}/A + K_\delta/AD \]
\[ K_{ab} = K_{\alpha\beta} + K_\delta/D \]
\[ K_d = K_\delta \]
and conversely
\[ K_\delta = K_d \]
\[ K_{\alpha\beta} = K_{ab} - K_d/D \]
\[ K_\beta = K_b - K_{ab}/A \]
\[ K_\alpha = K_a - K_{ab}/B \tag{9.3} \]
with terms in $1/D$ deleted if error terms randomize with treatments.
(Here and in expectations of mean squares in terms of $K(\gamma)$ we will make the convention that $D = \infty$ for randomized error terms, as well as for postulated infinite error populations, in order to save re-writing separate formulae for the nested and randomized postulates. It is justifiable by noting that when errors randomize we
Table 9.1
Mean Squares

(a) Sample
Classification   d.f.          Mean square    $k_\delta$   $k_{\alpha\beta}$   $k_\alpha$   $k_\beta$
A                (a-1)         bd $v_a$       1       d        bd      -
B                (b-1)         ad $v_b$       1       d        -       ad
AB               (a-1)(b-1)    d $v_{ab}$     1       d        -       -
Error            ab(d-1)       $v_d$          1       -        -       -

(b) Universe (all groups finite and sub-groups of $d_{ijk}$ nested)
Classification   d.f.          Mean square    $K_\delta$   $K_{\alpha\beta}$   $K_\alpha$   $K_\beta$
A                (A-1)         BD $K_a$       1       D        BD      -
B                (B-1)         AD $K_b$       1       D        -       AD
AB               (A-1)(B-1)    D $K_{ab}$     1       D        -       -
Error            AB(D-1)       $K_d$          1       -        -       -

(c) Universe (if $D = \infty$, or $d_{ijk}$ randomize with other elements: analysis of $E(y_{ij.})$)
Classification   d.f.          Mean square    $K_{\alpha\beta}$   $K_\alpha$   $K_\beta$
A                (A-1)         B $K_a$        1        B       -
B                (B-1)         A $K_b$        1        -       A
AB               (A-1)(B-1)    $K_{ab}$       1        -       -
in effect define treatment means as expectations of infinitely many samplings from the error universe with replacement, and the $K(\gamma)$ are defined in terms of such expectations. But the convention does not carry over to higher moments of k-statistics, nor to sampling variances of class means, e.g. equations (11.6).)
Substituting (9.3) in Table 9.1 (a) leads at once to the expectations of mean squares in terms of $K(r)$ as given in Table 8.2. This seems to be the simplest method for getting the coefficients of $K(r)$ in such analyses, if they be required and considered worth getting.
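The substitution is mechanical and can be sketched in a few lines. The following is an illustrative sketch only: the function names, the argument `inv_D` (standing for $1/D$, set to zero under the convention $D = \infty$ for randomized errors) and all numerical values are assumptions made here. It applies (9.3), checks that the forward relations recover the starting values, and evaluates the expectation of the sample mean square for A from the coefficients of Table 9.1 (a).

```python
from fractions import Fraction

def canonical_from_specific(Ka, Kb, Kab, Kd, A, B, inv_D):
    """Equations (9.3): canonical components from specific ones."""
    return {"alpha": Ka - Kab / B,
            "beta": Kb - Kab / A,
            "alphabeta": Kab - Kd * inv_D,
            "delta": Kd}

def specific_from_canonical(K, A, B, inv_D):
    """The forward relations read from Table 9.1 (b) or (c)."""
    return {"a": K["alpha"] + K["alphabeta"] / B + K["delta"] * inv_D / B,
            "b": K["beta"] + K["alphabeta"] / A + K["delta"] * inv_D / A,
            "ab": K["alphabeta"] + K["delta"] * inv_D,
            "d": K["delta"]}

def expected_ms_A(K, b, d):
    """Row A of Table 9.1 (a): coefficients 1, d and bd."""
    return K["delta"] + d * K["alphabeta"] + b * d * K["alpha"]

# Hypothetical specific components and sizes; errors randomized, so 1/D = 0.
Ka, Kb, Kab, Kd = Fraction(5), Fraction(3), Fraction(2), Fraction(1)
A, B, inv_D = 6, 4, Fraction(0)
a, b, d = 3, 2, 2

K = canonical_from_specific(Ka, Kb, Kab, Kd, A, B, inv_D)
round_trip = specific_from_canonical(K, A, B, inv_D)
ems_A = expected_ms_A(K, b, d)

# The same expectation written directly in specific components.
ems_A_specific = Kd + d * (1 - Fraction(b, B)) * Kab + d * b * Ka
```

Setting `b` equal to `B` makes the coefficient of `Kab` in `ems_A_specific` vanish, which is the limiting case taken up in Sec. 10.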
Any universe is usually an arbitrary postulate made to focus interest on some region of temporary interest. We may today be concerned with variation between spindles of a particular loom manufacturing a particular type of cloth. But there are other looms and more spindles of which our observations might reasonably be a sample and to which we might at another time wish to extend our inferences. Conversely we might be interested to consider variation in a smaller loom, or in cloth made using only part of the observed loom. All such changes of postulates mean changes in the definitions of $k(r)$ and $K(r)$ which are tied to each postulate in turn. If having evaluated $K_a$ for universes $B$ and $D$, we want to evaluate $K'_a$ for universes $B'$ and $D'$, we have so to speak to undo the old postulates and substitute the new with formulae such as:
\[ K'_a \doteq k'_a = k_a + \Big(\frac{1}{B'} - \frac{1}{B}\Big)k_{ab} + \frac{1}{B'}\Big(\frac{1}{D'} - \frac{1}{D}\Big)k_d \tag{9.4} \]
The canonical $k(\gamma)$ statistics are by definition invariant under such changes of postulates. They remind us that $K_a$ are not so much fixed parameters as functions of means associated with certain groupings; in terms of $k(\gamma)$ the formula for $k'_a$ involves only the new postulates without regard to prior ones.
\[ k'_a = k(y_{i..}') = k_\alpha + k_{\alpha\beta}/B' + k_\delta/B'D' \tag{9.5} \]
We write (9.4) and (9.5) in terms of estimated k's because theoretically the K which they estimate may alter with the particular factor variants supposed included in a universe, although the estimating k do not so change, being dependent only on the sample whatever universe it be supposed to represent. These conditions give $k(\gamma)$ a more fundamental position than the $k(r)$ which depend on mutable postulates.
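The agreement of the two routes, undoing the old postulates by (9.4) or starting afresh from the canonical statistics by (9.5), can be verified directly. The sketch below is illustrative only; the function name `specific_k_a` and all numerical values are assumptions made here.

```python
from fractions import Fraction

def specific_k_a(k_alpha, k_alphabeta, k_delta, B, D):
    """Equation (9.5): the specific component for postulated universe sizes B, D."""
    return k_alpha + k_alphabeta / B + k_delta / (B * D)

# Hypothetical canonical statistics from one experiment.
k_alpha, k_alphabeta, k_delta = Fraction(7, 2), Fraction(3, 2), Fraction(2)

# Old postulates (B, D) and new postulates (B_new, D_new).
B, D = 4, 5
B_new, D_new = 10, 20

# Specific statistics under the old postulates.
k_a = specific_k_a(k_alpha, k_alphabeta, k_delta, B, D)
k_ab = k_alphabeta + k_delta / D
k_d = k_delta

# Route 1, equation (9.4): correct the old specific statistic.
via_9_4 = (k_a
           + (Fraction(1, B_new) - Fraction(1, B)) * k_ab
           + Fraction(1, B_new) * (Fraction(1, D_new) - Fraction(1, D)) * k_d)

# Route 2, equation (9.5): use only the new postulates.
via_9_5 = specific_k_a(k_alpha, k_alphabeta, k_delta, B_new, D_new)
```

Route 2 never refers to the old sizes `B` and `D`, which is the invariance claimed for the canonical statistics.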
If Ave$(a_i^2) = \sum^A a_i^2/A = K_a(A-1)/A$, etc., be termed variances, distinction from sampling variances of statistics sometimes becomes a matter of some subtlety since we find ourselves applying the same name, var$(a_i)$, both to Ave$(a_i^2)$ and to $E(\hat a_i - a_i)^2 = E(y_{i..} - \eta_{i..})^2$, the sampling variance of the estimate of $a_i$ on repeated sampling from the sub-universe with mean $(m + a_i)$. In an earlier draft I tried to distinguish the two types of variance by different symbols, var and V, but the endeavour failed of consistency because all variances are essentially sampling variances (sec. 7.15). When the former is called a variance it means $E(a_i^2)$ for repeated samples of one with replacement from the universe of $a_i$. Thus the only basic distinction between the two, when both are regarded as variances, lies in the universes being sampled and not in the word 'variance'. Therefore it seems best to distinguish Ave$(a_i^2)$ as the second moment of the universe $a_i$ or as proportional to the K-parameter.
For simplicity in the following discussion assume randomized errors. It is much the commoner and more important case. We then have
Recollect that a variance component model says that we are not interested in individual elements but only in dispersions which may be associated with variants of each factor. When interaction exists effects are a function of both factors acting jointly, and effects of one factor are not definable in isolation from the other. If we choose to associate certain magnitudes of effects and variances with factors singly and in interaction the partition can be only empirical and to some degree dependent on formal definition of what we shall mean by such phrases. At the initial stage of defining a latent equation the specific latent equation (8.1) seems simplest and most straightforward. But it is only empirical and if at a later stage of analysis it leads to non-simple statistics we can go back to consider modified models which may be simpler over the whole run of deductions. Since $k(\gamma)$ are substantially simpler statistics than $k(r)$, and form a complete set, it is reasonable to consider formulating a model for which they may themselves be interpreted as the dispersion components, not merely as a stepping stone to the specific components.
Consider the general variance equation (1.2) or (6.1). In a complex finite universe variances are still additive in the same way, but the K-parameters of either type are not. For a two-way universe:
\[ \sigma^2(y_{ij}) = \sum^{AB} (y_{ij} - y_{..})^2/AB = \sigma^2(a) + \sigma^2(b) + \sigma^2(ab) \tag{9.6} \]
\[ = \Big(1 - \frac{1}{A}\Big)K_a + \Big(1 - \frac{1}{B}\Big)K_b + \Big(1 - \frac{1}{A}\Big)\Big(1 - \frac{1}{B}\Big)K_{ab} \tag{9.7} \]
\[ = \sigma^2(\alpha) + \sigma^2(\beta) + \sigma^2(\alpha\beta) \tag{9.8} \]
\[ = \Big(1 - \frac{1}{A}\Big)K_\alpha + \Big(1 - \frac{1}{B}\Big)K_\beta + \Big(1 - \frac{1}{AB}\Big)K_{\alpha\beta} \tag{9.9} \]
(9.6) follows directly from the definition of elements in the specific latent equation, (9.7) gives its parts in terms of $K(r)$ as these were defined by (8.2). The definitions of $K(\gamma)$ necessitate (9.9). Assume that its parts are the variances of the elements of (9.1). Remembering the relation between K-parameters and variances we see that $K_{\alpha\beta}$ appears as the K-parameter for a simple universe of $AB$ elements, that is for a universe with only one location restriction, in contrast to the $(A + B - 1)$ restrictions on $ab_{ij}$. We have merely to allow that the mean $y$ for all variants of one factor with a single variant or subset of variants of the other factor may still contain a small part derived by definition from interaction. For example
\[ y_{i.} = \mu + \alpha_i + \alpha\beta_{i.} \]
Since elements of $y_{ij}$ are anyway empirical and imaginary (Sec. 1) this is a small price to pay for simplification of the variance analysis and its estimators.
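The equality of the specific and canonical forms of the variance equation can be confirmed on a small two-way universe without error terms. The sketch below is illustrative only: the universe values and variable names are assumptions made here, and the canonical parameters are obtained from the specific ones by (9.3) with the terms in $K_d$ omitted.

```python
from fractions import Fraction

# Hypothetical two-way universe with A = 3 and B = 4, no error terms.
y = [[4, 7, 1, 8],
     [2, 9, 5, 3],
     [6, 0, 7, 5]]
A, B = len(y), len(y[0])

grand = Fraction(sum(map(sum, y)), A * B)
row = [Fraction(sum(r), B) for r in y]
col = [Fraction(sum(y[i][j] for i in range(A)), A) for j in range(B)]

# Elements of the specific latent equation.
a_el = [r - grand for r in row]
b_el = [c - grand for c in col]
ab_el = [[y[i][j] - row[i] - col[j] + grand for j in range(B)] for i in range(A)]

# Specific K-parameters: sums of squares divided by degrees of freedom.
K_a = sum(v ** 2 for v in a_el) / (A - 1)
K_b = sum(v ** 2 for v in b_el) / (B - 1)
K_ab = sum(v ** 2 for r in ab_el for v in r) / ((A - 1) * (B - 1))

# Canonical K-parameters from (9.3).
K_alpha = K_a - K_ab / B
K_beta = K_b - K_ab / A
K_alphabeta = K_ab

fA, fB = 1 - Fraction(1, A), 1 - Fraction(1, B)
total = sum((v - grand) ** 2 for r in y for v in r) / (A * B)
specific_form = fA * K_a + fB * K_b + fA * fB * K_ab
canonical_form = fA * K_alpha + fB * K_beta + (1 - Fraction(1, A * B)) * K_alphabeta
```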
Equating the specific (8.1) and canonical (9.1) latent equations
\[ y_{ij} = m + a_i + b_j + ab_{ij} = \mu + \alpha_i + \beta_j + \alpha\beta_{ij} \]
and retaining arbitrary restrictions, $\alpha_. = \beta_. = \alpha\beta_{..} = 0$, the relations between the two types of elements are seen to be
\[ m = y_{..} = \mu \]
\[ a_i = (y_{i.} - y_{..}) = \alpha_i + \alpha\beta_{i.} \]
\[ b_j = (y_{.j} - y_{..}) = \beta_j + \alpha\beta_{.j} \]
\[ ab_{ij} = (y_{ij} - y_{i.} - y_{.j} + y_{..}) = \alpha\beta_{ij} - \alpha\beta_{i.} - \alpha\beta_{.j} \tag{9.10} \]
The canonical elements are not uniquely defined but we do not require that they should be. It was part of the model to state that we are not interested in the elements individually but only in their dispersions. These and their estimators are uniquely and explicitly defined by Tables 9.1 and equations (9.8) and (9.9).
If we ask for estimates of the elements we alter the model by raising them from the status of random variables to that of location parameters (Sec. 7.9). Suppose we may wish to make this extension: the canonical model is over-determinate in the sense that it now has more parameters than observations available for estimating them, a not uncommon situation, for example: factor analysis. We can add the requirements that $\sigma^2(\alpha) = (1 - \frac{1}{A})K_\alpha$, $\sigma^2(\alpha\beta_{i.}) = (1 - \frac{1}{A})K_{\alpha\beta}/B$, etc., but innumerable solutions are still possible. In practice one would postulate $\alpha\beta_{i.} = 0$ and estimate $\hat\alpha_i = (y_{i..} - y_{...}) = \hat a_i$, etc. If we could make observations such that the element estimators $\hat a_i$ conformed to the dispersion requirement of the model to yield $E(k_2(\hat a_i)) = K_a$ there would be some attraction in retaining the specific model. But in practice this never happens since $E(k_2(\hat a_i)) = K_a + (\frac{1}{b} - \frac{1}{B})K_{ab} + K_d/bd$. There is therefore no lost advantage in accepting similar estimators for $\hat\alpha_i$ subject to $E(k_2(\hat\alpha_i)) = K_\alpha + K_{\alpha\beta}/b + K_\delta/bd$. However, as indicated, this is all supererogatory since it really means a change from the variance component model to the regression model.
Error terms add little complication. With randomized errors (9.10) are defined by expectations of the $y$ means and otherwise remain unaltered. With finite nested universes of experimental units we similarly remove the restrictions $d_{ij.} = 0$ and add means $\delta_{i..}$ etc. to the right hand sides of (9.10). They make a nice distinction between the two cases by reminding us that in nested universes the experimental units are integral parts of the treatments and not experimental errors in the true sense. They are fundamentally similar to the interaction effects as discussed above since treatment effects cannot now be isolated from the associated units.
A term
(1 - j)K o now stands to be added to equation (9.9) irrespective of assumptions
on randomization; contrariwise the term to be added to (9.5) for specific elements is not invariant for the alternative assumptions.
The purpose of the foregoing is to develop the parameters and statistics $K(\gamma)$ and $k(\gamma)$ as a complete and sufficient set for describing the composition of dispersion of a complex finite universe, without need to invoke hypothetical infinite populations which some workers seem to regard as undesirable abstractions; and to indicate that their latent model is no more empirical than the usual specific model. All that has been done is to relax restrictions on the sub-group means of $ab_{ij}$, allowing them to take arbitrary values with variance inversely proportional to numbers in each subgroup. Such relaxation is permissible because the original restrictions, pinning these means at zero, were imposed on the specific latent equation only with intent to simplify algebra (Sec. 2); they need not be retained when found not to serve that purpose to best advantage.
Having thus relaxed restrictions on sub-group means it is reasonable to consider the effect of similar relaxation on all group means, $\alpha_.$, $\beta_.$, $\alpha\beta_{..}$, etc. Allow them to take arbitrary values conformable with the universe being a super-sample from hypothetical infinite populations in all groups, subject only to the restriction that the expectations of every group, and covariances of pairs of groups, are zero. The extension is reasonable because postulated universes are almost invariably restricted segments of potentially larger populations. This being done the canonical model (9.1) becomes identical with the general "random" model, $K(\gamma)$ can be interpreted as variance components of the infinite population, and inferences about a restricted universe (super-sample) of which an observed sample is a part follow from well known relations between statistics of a sample and of a sub-sample. Doing this adds obvious further terms to equations (9.8), for example
\[ m = \mu + \alpha_. + \beta_. + \alpha\beta_{..} \]
\[ a_i = \alpha_i - \alpha_. + \alpha\beta_{i.} - \alpha\beta_{..}, \text{ etc.} \tag{9.8a} \]
With this convention the canonical latent equation most easily performs its second function: to provide machinery for evaluating expectations of mean squares and of sampling variances. Examples will be given in Sec. 11.
Defining 'canonical variance components' in the way suggested has then the following advantages over the 'specific variance components' (equations 8.2):
1. Being generalized k-statistics which are inherited on the average they are unique unbiased estimators of the universe K-parameters independently of size of sample and of universe postulates. This is probably the chief advantage. Postulates about universe sizes are nearly always introduced with a note of apology and uncertainty for the dubious numbers assumed. At other times they depend on, and may alter with, different sorts of questions to be asked of the same data (Sec. 11). Consequently they are better treated as tentative hypotheses several of which may be considered. What we need are statistics which can be estimated once and for all from each experiment and used as required to indicate conditions under each such hypothesis which may seem of interest. The $k(\gamma)$ supply this requirement.
2. If a universe be everywhere finite the canonical components are just as uniquely and concretely defined in terms of the universe elements as are the specific components. If, as is usual, any group is postulated as infinite or randomizable the specific components themselves are defined only in terms of asymptotic or expected values. They are therefore not really any more concrete than are the canonical components.
3. The expectations of mean squares can be evaluated simply and unambiguously by well-known rules. For a balanced experiment they can be written down at sight by Lee Crump's (1946) rule. Although Bennett and Franklin have given a reasonable algorithm for the specific components, it still remains more onerous if only because one has to evaluate different coefficients for each row.
4. For a balanced experiment the coefficients for a given canonical component are constant for all rows in which it occurs. This both simplifies writing down the expectations and facilitates solution of equations to estimate the components.
5. Any variance which may be required is expressible as a simple linear function of the canonical components. The specific components of course also yield linear combinations; but, except for the particular sample size observed and the postulated universe sizes on which their computation has been based, their coefficients are more complicated (examples in Sec. 11).
6. Interaction was originally, and is still, basically defined with respect to means in a factorial experiment. In that context it is essentially symmetric with respect to both (or all) factors involved. The interaction of A with B is identically the same as the interaction of B with A. Although a trivial point it seems aesthetically desirable that its analogue in variance components should be similarly symmetric. The canonical components meet this pleasure, the specific ones do not.
7. Although contrary to fashionable current practice, Sec. 12 will argue that the canonical components are more meaningful for interpreting most experiments, and more relevant to defining tests of significance, than are specific components.
10. Regression as a limiting case of the generalized model.
Table 10.1, derived from table 8.2, gives the analysis of variance for a two-factor experiment with randomized replications in terms of specific variance components. If we now let $a = A$ and $b = B$ the factors for $K_{ab}$ in A and in B become zero. Since then also the definitions of the $K(r)$, equations (8.2), become identical with those of $\theta$ in Sec. 5, Table 10.1 becomes the same as Table 5.1 for the regression model.

Table 10.1
Classification   d.f.          $K_d$   $K_{ab}$        $K_a$   $K_b$
A                (a-1)         1       d(1 - b/B)      db      -
B                (b-1)         1       d(1 - a/A)      -       da
AB               (a-1)(b-1)    1       d               -       -
Error            ab(d-1)       1       -               -       -
Sec. 1.11 has noted that the parameters of a universe are an enumeration of its elements. With a universe of observable elements there is, of course, no estimation problem if all are observed; here the elements of each universe are central parameters of other universes or populations and an estimating problem still remains. But with representation of all elements of such a universe all its specific parameters become estimable, which in effect is the same as saying that the universe stands to be described by evaluating the means of its sub-classes individually, exactly what the regression approach does. Accordingly the regression model is commonly presented as the general analysis with the whole universe of elements observed, these elements then being commonly described as "fixed effects". The only confusing feature is that students often feel ambiguity about what is then meant by the universe of treatments. Few experiments cover all variants of a factor whose evaluation would be useful if resources permitted. To say that the potential universe is observed is false; consequently the idea is often introduced with appearance of apology for an artificial postulate, although what is intended is not that ab has been increased to AB but that a sub-universe is selected for study which by definition reduces AB to ab. Although reasonable, the appearance of artificiality is often difficult to avoid, and one finds students thoroughly confused with the query: "When is a variable 'fixed' or 'random'?" The answer is that "fixed effect" (rarely defined) implies a focus of interest. To give an element that name is to say that one is interested in determining its value, as a parameter in its own right, independently of, and not merely as a representative of, the class of similar elements whose potential number is then irrelevant; that is, that it is to be treated as a regression coefficient and not as a random element from a universe. The definition relative to a restricted sub-universe is however useful in directing attention to the composition of means to which a regression surface is fitted.
Sec. 5 noted that, if we want to make a simplifying approximation by fitting fewer regression coefficients than are required for complete fitting to all class means, the variance of deviations about such simplified regression surface may be acceptable as an external estimate of error which may be preferred to the internal estimate with respect to deductions using the approximating surface. Analogous considerations in terms of the generalised model are most easily formulated in terms of canonical variance components for which the interaction components will not disappear from the main effect mean squares. We will return to this in the next section.
11. The "mixed model"
Relation of the "mixed model" to the regression and random models can be exhibited by splitting the linear model into two parts. Suppose $\alpha_i$ to be "fixed" effects; $\beta_j$ to be "random". We can write the regression equation:
$$y_{ijk} = m + \alpha_i + e_{ijk}, \qquad i = 1, \ldots, a; \tag{11.1}$$
a canonical latent equation for the random variables:
$$e_{ijk} = \beta_j + (\alpha\beta)_{ij} + \delta_{ijk}; \tag{11.2}$$
and a variance equation:
$$\sigma_e^2 = \sigma_\beta^2 + \sigma_{\alpha\beta}^2 + \sigma_\delta^2 \tag{11.3}$$
Equation (11.1) points up the composite random variable, $e$, which determines deviations about the "regression". However, since groups of $e_{ijk}$ contain the same $\beta_j$ and $(\alpha\beta)_{ij}$ they are no longer independent and the regression cannot be fitted and interpreted quite so simply as formerly. The usual analysis of variance in effect fits b parallel regressions, one for each $\beta$-variant observed, minimising the sum of squares of the independent deviations $(\alpha\beta)_{ij} + \bar{\delta}_{ij\cdot}$, that is the interaction sum of squares which accordingly becomes the error component of MSq(A).
Current custom has endeavoured to avoid these complications by treating the case according to the general model for a universe with a = A. Interpretation of $\alpha$ effects is then straightforward. For example a treatment contrast $(\alpha_i - \alpha_{i'})$ is estimated by $(\bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot})$. The average $\beta$ effects cancel from this difference and its sampling error depends on the interaction mean square (AB) which measures deviations about the partial regressions.
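Following the canonical formulation its sampling variance is
$$\begin{aligned}
V(\hat{\alpha}_i - \hat{\alpha}_{i'}) &= E(\bar{y}_{i\cdot\cdot} - \bar{y}_{i'\cdot\cdot} - \bar{y}_{i\circ\circ} + \bar{y}_{i'\circ\circ})^2 \\
&= E\left[(\overline{\alpha\beta}_{i\cdot} - \overline{\alpha\beta}_{i'\cdot} - \overline{\alpha\beta}_{i\circ} + \overline{\alpha\beta}_{i'\circ}) + (\bar{\delta}_{i\cdot\cdot} - \bar{\delta}_{i'\cdot\cdot} - \bar{\delta}_{i\circ\circ} + \bar{\delta}_{i'\circ\circ})\right]^2 \\
&= 2\left[K_{\alpha\beta}\left(\frac{1}{b} - \frac{1}{B}\right) + K_{\delta}\left(\frac{1}{bd} - \frac{1}{BD}\right)\right]
\end{aligned} \tag{11.4}$$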
Finite population corrections apply to each bracket since the elements forming each universe mean include the corresponding sample means, and bracket pairs are uncorrelated on the average. The term in 1/D is not here to be deleted for randomized errors. For the usual mixed model B and D tend to infinity and (11.4) becomes 2 MSq(AB)/bd as is well known. A contrast between two A-variants for a given $\beta_j$ is equivalent to considering a restricted sub-universe with b = B = 1; (11.4) then yields its error variance as $2K_\delta/d$ (when D is infinite) as is also well known from elementary considerations. A "main effect" as usually defined for a factorial experiment is to consider the average contrast in a sub-universe with B = b, and thence with variance $2K_\delta/bd$; and so on.
The absolute mean for treatment $A_i$, namely $\eta_{i\cdot} = m + \alpha_i$, is estimated by $\bar{y}_{i\cdot\cdot}$. This estimate being derived from the single average regression, instead of from contrasts within the partial regressions, its discrepancy contains a component due to observing only a sample of $\beta$-variants and its sampling variance must accordingly include $K_\beta$:
$$(\bar{y}_{i\cdot\cdot} - \bar{y}_{i\circ\circ}) = (\bar{\beta}_{\cdot} - \bar{\beta}_{\circ}) + (\overline{\alpha\beta}_{i\cdot} - \overline{\alpha\beta}_{i\circ}) + (\bar{\delta}_{i\cdot\cdot} - \bar{\delta}_{i\circ\circ}) \tag{11.5}$$
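Whence its variance is
$$\operatorname{var}(\bar{y}_{i\cdot\cdot}) = E(\bar{y}_{i\cdot\cdot} - \bar{y}_{i\circ\circ})^2 = (K_\beta + K_{\alpha\beta})\left(\frac{1}{b} - \frac{1}{B}\right) + K_\delta\left(\frac{1}{bd} - \frac{1}{N}\right) \tag{11.6}$$
where N is the universe of elements from which the bd $\delta_{ijk}$'s are sampled: BD for nested classification, ABD for randomized error.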
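The limiting cases quoted for (11.4) and the form of (11.6) are easy to verify numerically. The sketch below is illustrative only: it assumes (11.4) reads 2[K_αβ(1/b - 1/B) + K_δ(1/bd - 1/BD)] and (11.6) reads (K_β + K_αβ)(1/b - 1/B) + K_δ(1/bd - 1/N); the function names and the numerical values of the components are hypothetical.

```python
import math


def contrast_variance(K_alphabeta, K_delta, b, d, B=math.inf, D=math.inf):
    """Sampling variance of a contrast between two A-variants, as in (11.4)."""
    return 2 * (K_alphabeta * (1 / b - 1 / B)
                + K_delta * (1 / (b * d) - 1 / (B * D)))


def mean_variance(K_beta, K_alphabeta, K_delta, b, d, B=math.inf, N=math.inf):
    """Sampling variance of a single treatment mean, as in (11.6).

    N is the universe from which the b*d error elements are drawn
    (BD for nested classification, ABD for randomized error).
    """
    return ((K_beta + K_alphabeta) * (1 / b - 1 / B)
            + K_delta * (1 / (b * d) - 1 / N))


# Hypothetical canonical components and experiment sizes.
K_beta, K_alphabeta, K_delta, b, d = 1.5, 3.0, 2.0, 4, 5

# Usual mixed model (B and D infinite): 2 E[MSq(AB)] / bd.
mixed = contrast_variance(K_alphabeta, K_delta, b, d)
expected_msq_ab = K_delta + d * K_alphabeta

# Contrast at one given beta-variant: sub-universe with b = B = 1.
single = contrast_variance(K_alphabeta, K_delta, b=1, d=d, B=1)

# "Main effect" in a sub-universe with B = b.
main = contrast_variance(K_alphabeta, K_delta, b, d, B=b)

# Variance of a treatment mean with all universes infinite.
mean_var = mean_variance(K_beta, K_alphabeta, K_delta, b, d)
```

The treatment mean picks up K_β in addition to the interaction and error components, whereas every contrast is free of it; that is the distinction the two equations draw.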
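In terms of K(r) the variance of (11.5) would be, for nested sampling,
$$\left(K_b + \frac{A-1}{A}K_{ab}\right)\left(\frac{1}{b} - \frac{1}{B}\right) + K_d\left(\frac{1}{bd} - \frac{1}{BD}\right) \tag{11.6a}$$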
The factor (A-1)/A enters because of the nuisance that the average K(ab) within a row for given $a_i$, viz.
$$\frac{1}{A}\sum_{i=1}^{A}\frac{1}{B-1}\sum_{j=1}^{B}(ab)_{ij}^2,$$
is not equal to $K_{ab}$. The last term follows from $\bar{d}_{i\cdot\cdot}$ being an unrestricted mean of bd $d_{ijk}$'s each with variance $(1 - 1/BD)K_d$. If error terms randomize, the form of the last term has to be altered to that in (11.6). Similar modifications apply for (11.4) in terms of K(r).
Using canonical components, variances can be written down at sight from equations like (11.5), paying attention only to the number of elements entering into each mean, of sample or of universe. With specific components the sampling structure must also be considered in determining the forms of the coefficients.
Interpretations concerning the factor $\beta$ whose variants are postulated as "random" have occasioned considerable controversy. Both parties to the controversy appear to be operating within the framework of the general model, but evaluate mean square ($\beta$) differently in terms of variance components; in effect one group evaluate specific components with a = A, the other evaluate canonical components. Both then assume that the appropriate error term, both for testing significance of mean square ($\beta$) and for attaching to means of $\beta$-variants, is an estimate of the part of E(mean square ($\beta$)) additional to that part respectively labelled ($K_b$ or $K_\beta$ in our notation). At this stage both seem to be overlooking precise postulates of the model and that, although the mixed model can for many purposes be interpreted as a limiting case of the general model, the two are not identical. We shall return to this after first examining the initial difference between the two interpretations of the mean square.
Although involving some repetition of Section 9, it seems worth while to try to elucidate the apparent discrepancy of models because it has occasioned considerable debate. One group (for example Mood, 1950; Hald, 1952; Mentzer, 1953; Scheffé, 1954) note that postulating an infinite population of $\beta_j$ necessarily implies an infinite population of $(\alpha\beta)_{ij}$. Exactly how they reason next is not always clear. Usually both are assumed to be normally distributed, whence the definitional condition that they are uncorrelated implies independence, and a bivariate normal population. Whether or not normality be assumed, they do assume, perhaps without full consideration, that the population of $(\alpha\beta)_{ij}$ is infinite for given $\beta$ as well as for given $\alpha$, and that the sample of a or A values associated with a given $\beta_j$ is from such conditional population with expectation zero but not necessarily with sample mean zero. This leads to defining variance components as "canonical components", Table 9.1, except that $K_\alpha$ may be written as $Q(\alpha)$ or $Q(a)$.
The other group (for example Anderson and Bancroft, 1951; Kempthorne, 1952; Bennett and Franklin, 1954), avoiding assumption of independence, assume sampling whole columns from an A x B array of $(ab)_{ij}$ with $B \to \infty$, and the restriction $\sum_{i=1}^{A}(ab)_{ij} = 0$ in every column. The effect is most easily seen by starting from the general model for finite universes, elements being defined by the specific latent equation (8.1), and analysis of variance as in Table 10.1. Putting $B = \infty$ and a = A causes $K_{ab}$ to drop out of E(MSq(B)). Exactly what happens as B tends to infinity is not discussed, being apparently assumed obvious. But most users of analysis of variance are not mathematicians, so let us consider what may be supposed to happen.
The postulated distribution of $(ab)_{ij}$ appears as a (or A) distinct continuous distributions, one for each row (assumed similar, and normal if one wishes to add that postulate), but linked in such manner that a selection from any one row determines a column of elements, one from every row, satisfying the above restriction. Imagine a column of A elements for every j, the number of columns increasing indefinitely. There is no intention to suggest the distributions in every column to be the same apart from randomization with rows; first because such formulation would be contradicted by facts; second because it would imply the row distributions to be made up of the same A elements (with repetitions), and they could not be continuous, let alone normal. At first sight therefore this model seems to preclude independence of b and (ab). But a continuous distribution of b implies not only infinitely many elements in the whole population, but also infinitely many at every b value (as distinct from $b_j$ as an element, Sec. 7.22), and therefore also infinitely many superimposed column distributions to form collectively a 'b-array'. The progression goes little further by allowing that all such b-arrays may have similar distributions, b and (ab) then being independent, although this is not necessary.* In the limit the b-array distributions may be supposed to become continuous. We thus reach a continuous bivariate distribution with the variates (b and ab) uncorrelated and perhaps independent. The difference left from the previous formulation is that sampling from any b-array (given $b_j$) is made conditional on choosing sets of a (or A) elements $(ab)_{ij}$ with means identically zero.
* This formulation indicates that for two variates, each normally distributed and uncorrelated, it is possible to formulate a non-continuous bivariate distribution in which the two variates are not independent, and the regressions not necessarily linear.
Now suppose that the sets of a elements of $(ab)_{ij}$ can be generated by random sampling from an infinite population of $(\alpha\beta)_{ij}$ with variance $K_{\alpha\beta} = K_{ab}$, followed by subtracting the mean of each set: $(ab)_{ij} = (\alpha\beta)_{ij} - (\overline{\alpha\beta})_{\cdot j}$ where $(\overline{\alpha\beta})_{\cdot j} = \sum_{i=1}^{a}(\alpha\beta)_{ij}/a$. Since the whole analysis is formal (Sec. 1) we can further imagine $b_j$ to be made up of two uncorrelated parts: $b_j = \beta_j + (\overline{\alpha\beta})_{\cdot j}$. Given a set of $\beta_j$ and $(\alpha\beta)_{ij}$ the $b_j$ and $(ab)_{ij}$ are uniquely determined, but not conversely: cf. equations (9.10). If we could observe without error the elements of real complete universes, retention of the consequently uniquely defined $b_j$ in the linear model might be preferred. But this rarely happens. Invariably any observation contains random variation and individual elements can be evaluated only as estimates of statistical parameters, variance between such estimates being greater than that between the postulated parameters.
Estimates of $b_j$ and $\beta_j$ will be the same; the only practical effect of postulating one rather than the other in the model is to alter the definition of what is measured by variance between the estimates ($\bar{y}_{\cdot j\cdot}$), proportionate to MSq(B). Since anyway this is going to have a component $K_\delta$, to postulate the remainder as having a component $K_{\alpha\beta}$ is of no consequence if by so doing we can slightly simplify general procedures. Using the transformation
$$b_j = \beta_j + (\overline{\alpha\beta})_{\cdot j}, \qquad (ab)_{ij} = (\alpha\beta)_{ij} - (\overline{\alpha\beta})_{\cdot j} \tag{11.7}$$
the variance relations are
$$\operatorname{var}(b) = K_b = K_\beta + K_{\alpha\beta}/a \tag{11.8}$$
$$\operatorname{var}(ab) = K_{\alpha\beta}(1 - 1/a) \tag{11.9}$$
These are consistent with all postulates. $K_b$ is equal to var(b) because the population of $b_j$ has been postulated as infinite. Var(ab) follows from (11.7) by the usual argument for variance about a sample (or universe) mean, and corresponds with the original definition (Sec. 7.17) of $K_{ab}$, which differs from var(ab) since a is finite.
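The variance relations following from the transformation (11.7) can be illustrated by simulation. This is a rough sketch under added assumptions not made in the text: the $\beta_j$ and $(\alpha\beta)_{ij}$ are drawn as independent normal variates, and the component values, the number of columns and the seed are arbitrary.

```python
import random


def simulate_transformation(a, K_beta, K_alphabeta, n_columns, seed=0):
    """Apply transformation (11.7) to simulated canonical elements.

    Returns the observed variance of b_j, the observed variance of (ab)_ij,
    and the largest absolute column sum of (ab)_ij (which should vanish).
    """
    rng = random.Random(seed)
    b_values, ab_values, worst_sum = [], [], 0.0
    for _ in range(n_columns):
        beta_j = rng.gauss(0.0, K_beta ** 0.5)
        alphabeta = [rng.gauss(0.0, K_alphabeta ** 0.5) for _ in range(a)]
        column_mean = sum(alphabeta) / a
        ab = [value - column_mean for value in alphabeta]
        b_values.append(beta_j + column_mean)   # b_j = beta_j + mean of column
        ab_values.extend(ab)                    # (ab)_ij = (alpha beta)_ij - mean
        worst_sum = max(worst_sum, abs(sum(ab)))

    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    return variance(b_values), variance(ab_values), worst_sum


a, K_beta, K_alphabeta = 4, 2.0, 3.0
var_b, var_ab, worst_column_sum = simulate_transformation(
    a, K_beta, K_alphabeta, n_columns=20000)

# Values expected from the variance relations for the transformation.
expected_var_b = K_beta + K_alphabeta / a       # 2.75
expected_var_ab = K_alphabeta * (1 - 1 / a)     # 2.25
```

The simulated variances should fall close to the expected values, and every column of $(ab)_{ij}$ sums to zero apart from rounding, which is the restriction the second group of writers imposes.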
The difference between the two groups of writers is thus seen to lie merely in the linear transformation (11.7) of the nominal components, the first group using $\beta_j + (\alpha\beta)_{ij}$, the second $b_j + (ab)_{ij}$ in their linear model. The transformation consists in taking a part $(\overline{\alpha\beta})_{\cdot j}$ out of $b_j$ and adding it to $(ab)_{ij}$, the part being randomly selected subject to having variance $K_{\alpha\beta}/a = K_{ab}/a$ and being uncorrelated with $\beta_j$. The transformation is permissible since the analysis of $y_{ijk}$ into component elements is from the start formal. In recombining parts and their variances to answer specific questions both formulations must lead to the same answers. The more flexible canonical formulation seems preferable for basic analysis because the restriction a = A may be undesirable with respect to defining $K_b$ and $K_{ab}$.
Equations (11.1) to (11.3) define a mixed model as regression on A-variants while $\beta$-variants are a random selection. That is, it diverts attention to mean yields of each observed A-variant, and says that we are to evaluate only dispersion of $\beta$-effects without regard to their individual values. The first of these introduces the distinction between the mixed model and a true general model. The former does not say that the whole universe of A-variants is observed. Secondly, when we turn to consider individual $\beta$ effects we abandon the model originally postulated, and change over to regression on the $\beta$-variants. With this change of model the background universe of A-variants should be reconsidered. The restriction A = a was only a device for directing attention to means of observed A-variants; relative to assessing $\beta$-effects it may be unnecessarily restrictive and artificial. We are now free to consider defining $\beta$-effects relative to any A universe which may seem appropriate for this new purpose. Similarly the specific variance component $K_b$ can be defined for a general model with any AB universe independently of the limiting case. The model used by the first group of writers above in effect uses this freedom to define $\beta$-effects for an infinite population of A-variants; nominally they do it only for an infinite $(\alpha\beta)_{ij}$ universe for every j, but it comes to the same thing since the $\alpha_i$ cancel out whether this number be assumed infinite or restricted.
The mixed model is commonly assumed when one of the factors is quantitative (for example crop varieties by levels of a fertilizer), the quantitative factor being taken as the fixed one. Sec. 4 argued that variance component analysis is not suited to quantitative factors whose observed levels cannot rationally be regarded as a random sample from a universe of levels; so what should we do when a quantitative factor is crossed by a (random) qualitative one?
Some writers (for example, Bennett and Franklin, pp. 369-370) argue that assumption of a continuous relation between two variables is a technical decision, and that statistical inference can be applicable only to observed levels of a quantitative factor. But it is futile to suggest that we should concern ourselves only with such isolated observations. The purpose of research is to seek general laws of as wide applicability as possible (which these writers, despite their disclaimer, of course proceed to do), and the province of statistics is the whole chain of inductive reasoning from observations to general law. When dealing with quantitative factors we have no parent frequency distribution and no random sample, therefore the bridge from particular to general cannot be based on the consequences of random sampling from a frequency distribution. The analogous function as stepping stone from particular to general, and an essential part of the inductive procedure, is now played by the assumption of a continuous relationship. Either to disclaim onus for the assumption, or to restrict statistical inference to the observed points, is pedantic and unreasonable.
Unless for some reason "external" error is greater than "internal" as indicated by replications (in which case the "interaction" is properly a random error rather than a real interaction of the two factors) interaction with a quantitative factor would not ordinarily occur at random. Desirable analysis is therefore to subdivide it in search of systematic components. Practicable procedure may be to fit regressions on levels (x) of the quantitative factor for each qualitative variant ($\beta_j$) and to study variation of the regression coefficients (examples Sec.   ). I do not believe that fixed definitions of main effects etc. should be laid down. A research worker should rely on his wits according to circumstances of each case.
Suppose the regression on x may be adequately fitted by a quadratic polynomial for each $\beta_j$:
$$y'_{ij} = \bar{y}_{ij\cdot} - \bar{e}_{ij\cdot} = c_{0j} + c_{1j}x_i + c_{2j}x_i^2 \tag{11.8}$$
Dropping subscript i gives the functional or interpolation form for all x at given $\beta_j$. Suppose $x_i$ to be measured as deviations from its mean ($\bar{x} = 0$). (Since this paper deals only with balanced experiments we are supposing the same set of $x_i$ for every $\beta_j$; but similar considerations carry over, with some complications, to more general cases.) Modern practice would usually fit orthogonal polynomials, the quadratic form being
$$y'_j = \bar{y}_{\cdot j\cdot} + c_{1j}x + c_{2j}(x^2 - \overline{x^2}) \tag{11.9}$$
Contrasts between $\bar{y}_{\cdot j\cdot}$'s correspond to the usual definition of main effects, but these values do not fall on the curves. In effect they estimate $y'_j$ at
$$x_{0j} = -\frac{c_{1j}}{2c_{2j}} \pm \left(\frac{c_{1j}^2}{4c_{2j}^2} + \overline{x^2}\right)^{1/2}$$
Therefore unless $c_{1j}/c_{2j}$ is constant for all j (or can be supposed constant but for experimental error), the usual definition of main effects is making comparison of estimates at different x levels for each $\beta_j$. In many circumstances or for some purposes these comparisons may sufficiently well evaluate average $\beta_j$ effects, but the point may occasionally deserve more consideration relative to individual circumstances than it usually receives.
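A small numerical sketch may make the point concrete. It assumes the orthogonal quadratic is $y'_j = \bar{y}_{\cdot j\cdot} + c_{1j}x + c_{2j}(x^2 - \overline{x^2})$, so that the fitted curve passes through the variant mean where $c_{1j}x + c_{2j}(x^2 - \overline{x^2}) = 0$; the coefficients and x levels below are hypothetical.

```python
import math


def main_effect_x_levels(c1, c2, mean_x2):
    """x levels at which the fitted quadratic equals the variant mean.

    Solves c1*x + c2*(x**2 - mean_x2) = 0 for x (c2 must be non-zero).
    """
    centre = -c1 / (2 * c2)
    half_width = math.sqrt(c1 ** 2 / (4 * c2 ** 2) + mean_x2)
    return centre - half_width, centre + half_width


def fitted_deviation(x, c1, c2, mean_x2):
    """Departure of the fitted curve from the variant mean at level x."""
    return c1 * x + c2 * (x ** 2 - mean_x2)


# Three equally spaced levels measured from their mean: x = -1, 0, 1.
levels = [-1.0, 0.0, 1.0]
mean_x2 = sum(x ** 2 for x in levels) / len(levels)

# Two hypothetical variants with different ratios c1/c2 ...
roots_first = main_effect_x_levels(c1=1.0, c2=-0.5, mean_x2=mean_x2)
roots_second = main_effect_x_levels(c1=0.2, c2=-0.5, mean_x2=mean_x2)
# ... and a third with the same ratio as the first.
roots_third = main_effect_x_levels(c1=2.0, c2=-1.0, mean_x2=mean_x2)
```

Variants sharing the same ratio $c_{1j}/c_{2j}$ are compared at the same x levels; the first and second variants above are in effect being compared at different levels.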
When a quadratic function is adequate, it is evident from (11.8) or (11.9) that $c_{1j}$ measure the slopes of all regressions at the same arbitrary (but not necessarily most interesting) level, x = 0. If higher order polynomials have to be fitted, considerations similar to those for $\bar{y}_{\cdot j\cdot}$ would apply to the linear coefficients of the orthogonal forms; and so on as the order of polynomial is increased.
When using the average regression to describe average A effects, that is average variation with the quantitative factor, the standard errors of its coefficients will depend on their variation with the $\beta$-variants. Variance of the constant term, $\bar{y}_{\cdot\cdot\cdot}$, evidently involves variation of the "main" effects, $\bar{y}_{\cdot j\cdot}$, as in equation (11.6); while the other coefficients correspond to contrasts like $(\alpha_i - \alpha_{i'})$ and their variances will involve only the formal interactions (in addition to experimental error) analogous to equation (11.4).
12. Error terms and tests of significance.
The chief use to which specific variance components have been put is to indicate appropriate error variances for testing significance of any given mean square in an analysis of variance. We exclude here interpretation of main effects and interactions according to their classical definition. That was noted in Section 5 to be a pure regression formulation, defining contrasts between selected treatment combinations without reference to potential universes which they might represent. We have since noted that that position can be reproduced as a limiting case of the general model by defining sub-universes containing only the observed treatments, but that this is little more than a convention to link the regression (attention to individual means) with the general model. The idea here is that we are interested in dispersion of real potential universes, and the question is what measure of dispersion do we really want to evaluate and test, independently of individual contrasts which can be handled as in Section 11. The question is one of considerable current debate.
The prevailing trend, as illustrated by Bennett and Franklin, seems to be to recommend formulating models according to the specific latent equation, specifying the potential universes of each group, and seeking to interpret the analysis of variance as indicated by Table 10.1. Some recent writers imply, somewhat dogmatically, that error variances against which each mean square of an analysis of variance should be weighed must be determined according to its composition as evaluated for specific components with finite population correction factors. In fact, when it comes to examples, they almost invariably make arbitrary decisions that the variants of some factors will be considered individually, others will be taken as samples of indefinitely large universes. In practice the general model is rarely used except in this conventional degenerate form which might be better replaced by a less rigid mixed regression and random model. The only example known to me with a realistic finite universe is that of parts of a loom given by Daniels (1939). Furthermore, as illustrated in discussion of contrasts in Section 11, such arbitrary conventional partitions merely tie one's hands unnecessarily with respect to evaluating comparisons among those variants which were postulated as random.
Compare example
I
Section
•
However, suppose that universe sizes for each group can be postulated, giving rise to a genuine finite universe model. To follow the advocated procedure means: firstly that a different error variance has to be computed for each mean square to be tested; secondly that these error variances will be linear functions of observed mean squares, with ensuing complications including the chore of computing quasi-degrees-of-freedom numbers for approximating F' tests. Are such troubles really necessary or worth while? Do they gain anything?
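For readers who have not met the chore in question, a common way of obtaining approximate degrees of freedom for a linear function of mean squares is Satterthwaite's approximation. The text does not say which approximation is meant, so the following is only an illustrative sketch; the function name and the numbers are hypothetical.

```python
def satterthwaite_df(coefficients, mean_squares, dfs):
    """Approximate degrees of freedom for sum(c_i * MS_i).

    Satterthwaite's approximation:
        (sum c_i MS_i)^2 / sum((c_i MS_i)^2 / df_i).
    It is usually regarded as unreliable when some coefficients are negative.
    """
    terms = [c * ms for c, ms in zip(coefficients, mean_squares)]
    total = sum(terms)
    return total ** 2 / sum(t ** 2 / v for t, v in zip(terms, dfs))


# A single mean square keeps its own degrees of freedom.
df_single = satterthwaite_df([1], [5.0], [12])

# Two equal mean squares with 10 d.f. each behave like one with 20 d.f.
df_equal = satterthwaite_df([1, 1], [4.0, 4.0], [10, 10])

# A hypothetical composite error term: 0.5 * MS_1 + 0.5 * MS_2.
df_composite = satterthwaite_df([0.5, 0.5], [7.0, 4.0], [6, 12])
```

A separate computation of this kind is needed for every mean square whose error term is a composite, which is the labour the paragraph above questions.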
If interactions can be assumed zero so that the interaction K can be dropped, the main effect K's become effectively the same in both specific and standard formulations and no problem arises, except by mischance when an interaction which should be zero accidentally appears significant. If A be really inert, B's cannot behave differently with different A-variants. Therefore if the interaction be significant and we conclude that it is real we must be concluding that both factors produce effects. To test the main effect of A for the null hypothesis that it produces no effect is then pointless. The problem therefore is, assuming interaction present, what do we really want to test under the name of main effect variance component? The 'specific school' say that they want to test for differences among the universe means of A-variants; but if interactions are real it would be very surprising if they should exactly balance to make these means equal, so we can safely bet, without test, that they do differ and the test is redundant. It tests for $bK_\alpha + bK_{\alpha\beta}/B$ being greater than zero, and if b/B is not too small it could be just another test for the interaction. Of more practical interest is to ask whether one variant is consistently more effective than another over and above such average difference as might be anticipated from chance arrangement of interaction effects.
In practical applications one will normally use only one treatment combination; rarely will future action produce the average of a universe of crossed treatments. The question therefore is, the cross variant to be used in a future application being unknown, what can we say about average response that may be anticipated whatever one be used and supposed selected by chance? (If we can specify a particular cross variant to be used we have to consider individual treatment combinations instead of average main effects.) All of which merely says that to test for evidence that $K_\alpha$ is greater than zero will generally be of more interest than to test for $K_a$.
For example, suppose the A-variants are drugs of a certain chemical family, B-variants are varieties of a genus of cocci bacteria; y being some measure of health of test animals. The universes of these factors may have reasonable assignable sizes. The action of the drugs may be fairly specific. If we can have accurate diagnosis of the particular variety of coccus to be treated we choose the drug according to the best specific combination. But if, from lack of time or of facilities, the particular variety is unknown, or is one which was not included in the experiment, we must choose the drug which does best on the average and the measure of confidence in its effectiveness depends on the interaction variance. The dispersion of average A effects per single chance cross variant is evaluated by $K_\alpha$. Although dealing with finite universes, the universe means whose spread is measured by $K_a$ are irrelevant to practical application.
A similar point of view was expressed, all too briefly, by Yates (1946, pp. 17 and 42) relative to sample surveys. Before doing a survey we can safely say that no two cities or states, etc. would show the same mean for any character in a complete census. A test for that is of no interest. If we make a test for significance of a difference between classes of a sample survey it can only be to answer the question: is there some influence operating to make the individuals of one state by and large different from those of another state? If we could sample again and again in unlimited time while general environment remains the same, or if we could subject innumerable different samples of individuals to the environment of two states, is there some factor operating differently in each state so that a difference similar to that observed would be consistently reproduced? In other words a test of significance, like random variables, always has in its background hypothetical infinite populations. Tests of significance should be those indicated by a model allowing infinitely repeated sampling. (Cf. also Deming and Stephan, 1941, on interpreting censuses as samples.)
Some difficulty of interpretation occurs when A is not significantly greater than AB and AB is not significantly greater than Error, but A is significantly greater than Error. No rigid rule can be given; some discretion should be allowed depending on what ancillary information may be available as to most reasonable interpretation. As a rough general rule we might pool A and AB as a portmanteau test for magnitude of A effects whether due to interactions or consistently similar with all $b_j$. (Whether or not B should also be thrown in the pool again depends on circumstances. If B is clearly significant it would be kept out since the problem then relates only to A. If it also is insignificant relative to AB it would go in to test whether or not treatments in general are having effect irrespective of combinations. And of course conversely for testing B if A is or is not independently significant.) If the pooled test is significant we may conclude that effects are produced though we remain uncertain whether they are mainly interactions, are consistent across the board, or are a mixture of the two.
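The arithmetic of the suggested portmanteau test is simple: sums of squares and degrees of freedom for A and AB are added before the ratio to Error is formed. The sketch below uses hypothetical sums of squares and hypothetical helper names; reading the resulting ratio against F tables is left to the user.

```python
def pooled_mean_square(sums_of_squares, dfs):
    """Mean square obtained by pooling several lines of an analysis of variance."""
    return sum(sums_of_squares) / sum(dfs)


def variance_ratio(numerator_ms, denominator_ms):
    """Ratio of two mean squares, to be referred to F tables."""
    return numerator_ms / denominator_ms


# Hypothetical analysis: A with 2 d.f., AB with 6 d.f., Error mean square 4.0.
ss_a, df_a = 30.0, 2
ss_ab, df_ab = 42.0, 6
ms_error = 4.0

ratio_a_vs_ab = variance_ratio(ss_a / df_a, ss_ab / df_ab)
ratio_ab_vs_error = variance_ratio(ss_ab / df_ab, ms_error)

# Portmanteau test: A and AB pooled, on df_a + df_ab degrees of freedom.
pooled_ms = pooled_mean_square([ss_a, ss_ab], [df_a, df_ab])
ratio_pooled_vs_error = variance_ratio(pooled_ms, ms_error)
pooled_df = df_a + df_ab
```

Adding B to the pool, when circumstances warrant it, only means appending its sum of squares and degrees of freedom to the two lists.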
If a main effect is not clearly significant relative to interaction, while differences between treatments at large are big enough to merit attention, the indication is that we cannot categorically recommend one quality of a factor as best irrespective of crossed variants. $K_{\alpha\beta}$ being large relative to $K_\alpha$ (what we mean by relative depends on circumstances and the economics of the case) is a warning that decision about best operating procedure requires that we study combinations individually.
Canonical variance components ($K(\rho)$) may be negative. A negative sample value is not necessarily to be regarded as an accident of sampling and interpreted as zero. A being significantly less than AB is evidence that some form of compensation is taking place, evidence which will usually be worth having. Appearance of a $k(\rho)$ with negative sign and appreciable numerical magnitude will bring it to attention; whereas analysis based on specific components would tend to overlook it. (Cf. Yates and Zacopanay, 1935; dealing with special conditions they termed the effect "competition".)
In view of these considerations, recent worry over what are appropriate error terms, entangled by certain postulates about potential universes and complicated by finite population adjustment factors, seems to be creating unnecessary complications for routine working of statistical tools. Attention to practical operating conditions to which consequent recommendations will be applied, and to the potentially infinite resamplings throughout which a reported effect may be expected, may usually indicate the straightforward classical tests to be relevant.
In general usage the conventional significance levels, P = .05, .01, .001, are arbitrary and do not have to be taken as literally exact. The vaguer terminology (significant, highly significant, very highly significant) is, I believe, intended to remove the statements from the aura of accuracy implied by precise figures and to indicate merely a rougher classification of "degrees of belief" which may suffice to accept or reject a hypothesis for practical working. Inferences from experimental data will usually be made in circumstances which may approximate more or less to one of the following three 'climates':
(1) We have to deal with a problem in pure science. We know we cannot make a perfectly exact statement; the whole inferential apparatus (linear models, etc.) leads only to approximate description. The point at which we stop saying 'Get more evidence' and rest with 'This is accurate enough for the time being' depends on contentment, perhaps forced by available resources, which will be determined by a subjective degree of belief or confidence in the results and only roughly conditioned by a nominal P value.
(2) Immediate action is required and we have to advise on what seems best from available evidence. This will be the treatment which was best in our sample and has to be recommended whether the significance of its advantage over second best was at a .01, .3 or only .9 significance level.
(3) We have to weigh benefit of a new process against cost of bringing it into operation. For this decision we do need accurate probability assessment, but the relevant null hypothesis will rarely be that the specific variance component $K_a = 0$. The probabilities to be assessed will have to be determined in light of all the circumstances of each individual problem, in particular with respect to influences bearing on future operation which will not usually be those of an arbitrary universe supposed sampled by an experiment and belonging more to history of the experiment than to future application.*
* For a sophisticated discussion of these points and an approach towards a definitive decision theory see Lindley (1953) and discussion on that paper.
Interpretation according to the specific model and its concomitant Table 10.1 may be correct for some purposes. But research means keeping alert to alternatives. Those who advocate the specific model seem to be tying themselves too rigidly to arbitrary or dubious universe postulates for all interpretations from a given experiment. The chief thing is to avoid being dogmatic. The overriding advantage of the canonical analysis is to remain free from such entanglements, while still allowing them to be easily introduced at a later stage as and when they may be relevant to answer specific questions.
..
13. Randomization tests and unidentifiable interactions.
Wilk and Kempthorne (1955) have endeavoured to work out a "logical derivation
of linear models for experimental situations" such that, given the pattern by which
observations are taken, the model can be written by objective rules, whence may
follow without ambiguity the expectations of mean squares and estimates of error
variances. For brevity that paper will be referred to as WK. Their formulation
appears intriguing; but deeper inspection shows it to be unsatisfactory at three
points: inference from randomization tests is too restrictive, their algebraic
mechanism to represent random sampling is computationally cumbersome and wraps the
theory in a dialectic haze, and their treatment of unidentifiable interactions drags
in redundant complications; besides that it suffers from the rigidity in interpreting
mixed models which has been indicated above to be undesirable.
Kempthorne (1952) stated a preference whenever possible to base significance
tests on randomization without appeal to infinite population theory. WK note, what
no one will dispute, "that it is possible to draw inferences statistically only
about the population from which samples are drawn according to probability
considerations", and that the parent universe of a randomization test is the particular set
of plots or experimental units on which an experiment was done. They continue to
the logical conclusion about randomization tests: "extensions of such inferences to
wider circumstances cannot be assessed probability wise". In other words a
statistician using randomization tests can act only as historian evaluating what might
have been on a certain past occasion if the dice rolled different ways. They
recognize that this is not the information wanted, and are endeavouring to emphasize
"the tenuous relation between the physical situation and mathematical or statistical
abstractions".
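The restriction can be made concrete with a minimal sketch of a randomization test. The six plot yields and the allocation used below are hypothetical, chosen only for illustration, and the function name `randomization_p_value` is an assumption of this sketch rather than anything taken from WK. The reference set is every allocation of the two treatments to these same six units, so the resulting probability refers to these units and to no others.

```python
from fractions import Fraction
from itertools import combinations

def randomization_p_value(yields, treated):
    """One-sided exact randomization test for a difference of two treatment means.

    yields  : responses of the experimental units actually used
    treated : indices of the units that received the first treatment
    The reference set is every way of choosing len(treated) of these same units.
    """
    n_units, n_treated = len(yields), len(treated)
    total = sum(yields)

    def mean_difference(group):
        s = sum(yields[i] for i in group)
        return Fraction(s, n_treated) - Fraction(total - s, n_units - n_treated)

    observed = mean_difference(treated)
    reference = [mean_difference(g) for g in combinations(range(n_units), n_treated)]
    at_least = sum(1 for d in reference if d >= observed)
    return Fraction(at_least, len(reference))

# Hypothetical yields of six plots; plots 1, 3 and 5 received the first treatment.
plot_yields = [12, 15, 11, 18, 14, 16]
p_value = randomization_p_value(plot_yields, treated=(1, 3, 5))
print(p_value)
```

Nothing in the calculation refers to plots other than the six listed, which is the sense in which the parent universe of the test is the set of units actually used.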
The stand seems to veer close to encasing the statistician in an
ivory tower where he can say: "I guarantee that arrows shot in my tower will hit
their mark in the way I say; but I wash my hands of any responsibility for what
they will do when used in the big world outside."
An inference which is applicable only to a historical group of experimental
units which will never be used again is of no interest to anyone. An experiment
is useful only in so far that we can expect phenomena similar to those which it
displays to be reproduced in a wider sphere. As far as possible a scientist should
state only that of which he can be sure. But there comes a point where some
speculation must be risked to gain objectives, and a statistician's duty would seem
to include sharing responsibility for inferential jumps rather than putting them
entirely on the shoulders of those who may be ill equipped to assess the risks.
One of my early clients was a chemist who at first did not like probability statements.
"To heck with your five per cents: is this result right, yes or no?" If I
retreated even further to say: "There is a five per cent chance that the conclusion
is wrong for the particular hundred specimens you observed in the laboratory. I
can give you no answer at all about what will happen in the factory"; he would
presumably reply: "In other words, I must judge for myself as of yore. Therefore
you are quite useless to me, I need not consult you again"; and he would appear to
be justified.
No one will disguise that there is a difficult jump from an experiment to
factory or farm conditions. Whatever pains we take to sample realistic circumstances
we can never be sure that future commercial production will draw its materials from
a supply exactly similar to a population we sampled. But to retreat to the point
of saying that the universe we sampled is only the N units actually observed and
nothing more is unpractical and stultifying. We must endeavour to arrange that
experimental material will be a sample of a realistically large bulk of material
available for commercial use, describe as well as possible the bulk which was
properly sampled and for which inferences are valid, and leave to the farmer or
production manager only to decide whether or not the resources available to him are
reasonably similar or differ in a way for which rational allowance may be made.
I would go further to say that (with rare exceptions) economics prohibit doing an
experiment unless its inferences may be applicable to a bulk of material so vastly
greater than that used in the experiment that it constitutes virtually an infinite
population. In other words infinite population theory is almost always applicable
to the error elements of an experiment. We do not hide that it is often difficult
to define just what population we did sample; the difficulty is especially great
and is well recognized in agricultural experimentation. We can but do our best to
state what sort of fields and weathers our experiments reasonably represent.
These aspects of randomization tests were recognized in the discussion given
by Welch (1937), Pitman (1938), Pearson (1937), and Johnson (1948). Pearson notes
a further point, namely that randomization tests are distribution-free only with
respect to control of type I error: the optimum choice of critical region, or of
the criterion to be randomized, depends on the form of the population sampled. He
notes further, as illustrated by the work of Tedin on blank experiments, that an
experimenter is rather more concerned with repetitions which will belong to
different randomization sets. "Some of these distributions ... would be biased in
one way, some in another, so that when they are all combined together the resulting
z-distribution should approach that of normal theory. From each randomization set
the experimenter is concerned in fact with only one value of z, and this has been
selected at random ...; consequently from the point of view of his long run
experience, the appropriate probability distribution for him to use would appear
to be that of normal theory." "Possibly we have here another instance of the
difference ... between regarding a test as giving essentially a rule to be applied
and justified by long-run experience, rather than a probability measure associated
with an isolated experiment." "The conception of randomization illustrated in the
examples given above is both exceedingly suggestive and often practically useful,
but perhaps it should be described as a valuable device rather than a fundamental
principle. Its adoption, when it can be followed by the calculation necessary to
determine what I have described as the class I elements, ensures accuracy in the
determination of the probability level of a test criterion, but without the aid
of some further principle it cannot help us to decide which of a number of
alternative tests to choose. It seems hardly possible to build the methods of statistics
into a consistent whole without facing squarely the why of that choice."
That to postulate an infinite population of experimental units simplifies
statistical treatment is a welcome sequel; it is not the reason for making the
postulate.
A characteristic of WK's formulation is the algebraic method which they use
to represent random sampling. Cornfield (1944) suggested writing a function of a
sample from a universe as a function of all the elements $X_i$ in the universe with
dummy variables:
$$\delta_i = 1 \text{ if the } i\text{th element is included in the sample}, \qquad \delta_i = 0 \text{ otherwise.}$$
For example, the mean of a sample of n observations from a universe of N elements is
written
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{N} \delta_i X_i .$$
Expectations of $\bar{x}$ and of functions of it, such as its variance, are then derived
by operating as if $\delta_i$ were the random variable. Cornfield presented the scheme
merely as a "device that has been found useful in the derivation of expected values
and variances of statistics .... Its only advantage is that it reduces the
manipulations ... to a simple algebraic routine." What is being done is to recognize that
over all possible randomizations every $X_i$ must appear an equal number of times,
therefore that the expectation must be a function of the average $X$ (or $X^2$, etc.)
multiplied by some constant which can be obtained by averaging the constant
coefficients $\delta$ without bothering about individual $X_i$ values. It merely lends an algebraic
formulation to precisely the same argument as was used in section 8 to evaluate
expectations of generalized product means. The random variable produced by the
act of sampling is a variable vector of n elements $(X_1 \ldots X_n)$. The formulation
writes the function to be considered, for example the sum, as the matrix product
of two vectors $(\delta_1 \ldots \delta_N)(X_1 \ldots X_N)'$. Then over all possible randomizations the
vector of elements remains constant and we have to average only the simpler vector
of zeros and ones.
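The averaging over zeros and ones can be illustrated by a minimal sketch that enumerates every possible sample from a very small universe. The four universe values and the helper names (`indicator_vectors`, `expectation`) are hypothetical, introduced only for this illustration. The sketch averages the indicator coefficients on their own, then checks that the sample mean written as a vector product has expectation equal to the universe mean and variance equal to the familiar finite-universe value $(1 - n/N) S^2 / n$.

```python
from fractions import Fraction
from itertools import combinations

def indicator_vectors(N, n):
    """Yield each 0/1 vector of length N with exactly n ones, one per possible sample."""
    for chosen in combinations(range(N), n):
        yield [1 if i in chosen else 0 for i in range(N)]

def expectation(values):
    """Average over the equally likely randomizations."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

X = [Fraction(v) for v in (2, 3, 7, 12)]   # hypothetical universe, N = 4
N, n = len(X), 2
deltas = list(indicator_vectors(N, n))

# Averages of the indicator coefficients alone; no X value is involved.
E_delta = expectation(d[0] for d in deltas)              # equals n / N
E_delta_pair = expectation(d[0] * d[1] for d in deltas)  # equals n(n-1) / N(N-1)

# Sample mean written as the product (delta_1 ... delta_N)(X_1 ... X_N)' divided by n.
means = [sum(d_i * x_i for d_i, x_i in zip(d, X)) / n for d in deltas]
E_mean = expectation(means)
Var_mean = expectation((m - E_mean) ** 2 for m in means)

# Universe mean and universe variance with divisor N - 1.
mu = expectation(X)
S2 = sum((x - mu) ** 2 for x in X) / (N - 1)
print(E_mean, Var_mean)
```

Exact fractions are used so that the agreement is an identity rather than a numerical approximation; the only quantities averaged over randomizations are the zeros and ones.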
Kempthorne (1952, sec. 8.2) has put this device to more extended usage. He
postulates that a randomized block experiment is performed on a particular set of
bt plots, each with its own fixed error $e_{ij}$, $i = 1 \ldots b$, $j = 1 \ldots t$. He writes the
linear model,
$$\text{"} y_{ik} = \mu + b_i + t_k + \sum_j \delta_{ij}^{k} e_{ij} \tag{4}$$
where $\delta_{ij}^{k}$ is equal to unity if treatment k occurs on plot j in the i th block and
is zero otherwise." This is already more complicated than Cornfield's formulation
because the sample (experiment) has bt of these vectors instead of only one; an
extension which appears unavoidable in view of the more complex sampling pattern
and variety of quadratic forms in the analysis of variance.
However, more important in Kempthorne's use of the device is the theoretical
role which he assigns to it. He continues: "The random error attached to any
observed yield is the whole expression $\sum_j \delta_{ij}^{k} e_{ij}$. Any particular $e_{ij}$ is a fixed
variable which we do not know. The random variable in the expression (4) is the
term $\delta_{ij}^{k}$ ..." Wilk and Kempthorne (1955) extend the device to all elements of the
general linear model, that for the two-factor experiment, which has been used as
standard example in above sections, being written
$$y_{i^*j^*f} = \mu + \sum_i \alpha_i^{i^*} a_i + \sum_j \beta_j^{j^*} b_j + \sum_{ij} \alpha_i^{i^*} \beta_j^{j^*} (ab)_{ij} + \sum_k \rho_k^{i^*j^*f} e_k ,$$
summations being over the whole of respective group universes and the Greek letters
being unity when relevant universe elements indicated by subscripts i, j, k are
the same as those indicated by the sample subscripts i*, j*, f, and zero otherwise.
On introducing this formulation they remark: "The quantities (Greek letters) can
be treated as random variables because random methods of selection and allocation
are employed." But caution which that sentence might imply is dispersed by the
next paragraph which continues: "Of course the random variables in the statistical
model are the $\alpha_i^{i^*}$, $\beta_j^{j^*}$ and $\rho_k^{i^*j^*f}$, which take on the values 0 and 1 with known
probabilities. All other quantities in the model are fixed, unknown, parameters."
The formulation appears to be an ingenious method of making the transfer from
universe elements as fixed constants to their appearance as random variables in a
sample. But if it were true that $\alpha_i^{i^*}$ etc. are the random variables, $a_i$ etc. are
the constants, the expectations of quadratic forms should appear as variances of
$\alpha$ multiplied by functions of $a_i$ as constants. Quite the opposite is produced like
a conjuring trick whose deception the authors have apparently overlooked. The
expectations which come out of the hat are variances of the universe elements with
the nominal variances and covariances of the $\alpha$'s as constants! The definitions
of "variances" used by the authors are the K parameters of the group universes
(8.2); whether one chooses to regard these as proportionate to the second moments
of the universes, or to variances of random variables produced by the act of
sampling (sec. 7.15) is incidental. The switch of notation reveals that the only
variables being studied are the $a_i$; the $\alpha$'s are dummy constants of the same nature
as the $X_i$'s in the regression equation (5.1), and are no more random variables than
the universe elements. What the model really postulates is A fixed vectors
(0...1...0); the act of random sampling picks a of these vectors, which is
identically the same as saying that it selects a of the $a_i$. The only thing that has been
done is to write the element $a_i$ as a matrix product $(0 \ldots 1 \ldots 0)(a_1 \ldots a_i \ldots a_A)'$.
Presentation of the individual $\alpha_i^{i^*}$ as basic random variables independently of the
$a_i$ is a dialectical artifact which does nothing but confuse the issues. The only
excuse for using them is as mathematical operators if they simplify algebra. This
they do not do; they lead to very complex algebra, which can be evaded either by
using Tukey's "brackets" or the canonical variance components K(γ) followed by
substituting the simple relations between K(γ) and K(r).
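A minimal sketch may help to fix the point. The universe values and the helper names (`unit_vector`, `pick`, `variance`) are hypothetical, introduced only for illustration. The sketch shows that multiplying the universe vector by a fixed vector (0...1...0) merely returns one element, and that the expectation of a quadratic form over all equally likely samples (here the sample variance) comes out as the variance of the universe elements with divisor A - 1.

```python
from fractions import Fraction
from itertools import combinations

universe = [Fraction(v) for v in (1, 4, 6, 9)]   # hypothetical a_1 ... a_A
A, a = len(universe), 2                          # universe size, sample size

def unit_vector(i, length):
    """The fixed vector (0...1...0) with its single 1 in position i."""
    return [1 if k == i else 0 for k in range(length)]

def pick(i):
    """The product (0...1...0)(a_1 ... a_A)': it simply returns a_i."""
    return sum(e * x for e, x in zip(unit_vector(i, A), universe))

def variance(xs, divisor):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / divisor

# Variance of the universe elements, divisor A - 1.
K_a = variance(universe, A - 1)

# Average the sample variance over every equally likely choice of a vectors.
sample_variances = [
    variance([pick(i) for i in chosen], a - 1)
    for chosen in combinations(range(A), a)
]
expected_sample_variance = sum(sample_variances) / len(sample_variances)
print(K_a, expected_sample_variance)
```

The zeros and ones enter only as bookkeeping for which elements were drawn; the quantity that emerges is a property of the universe elements.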
The last feature of WK's formulation to be discussed here is their insistence
that potential interaction of treatments with experimental units should be explicitly
recognized in the linear model. The problem seems to have been first stated by
Neyman et al (1935). It has been treated by Anderson and Bancroft (1952, Chap. 2)
in the slightly disguised form of discussing whether blocks should be treated as
random or fixed effects.
WK illustrate on an example by Vaurio and Daniel (1954) which has also been
discussed by Scheffé (1954). It concerns the effect of different methods of
annealing tinned coils. Material for observation is taken from certain prescribed
locations (head, middle and tail) on each coil, which are regarded as subsidiary
treatments. The anneal treatments must be applied to whole coils, random samples
of c coils being selected for each treatment. Comparisons between locations are
within coils. A surprising feature of all three discussions of this experiment is
that they all treat coils formally (like locations) as a treatment. Although
surely they must have recognized the circumstance, none of the three presents the
experiment in the form which most clearly reveals its structure, as a split-plot
experiment. The coils are main-plots on which anneal treatments are tested, segments
of the coils are split-plots for evaluation of location effects. An agriculturist
would write the model as
[model equation lost in this copy]
$\delta_{ij}$ being the main-plot error and $\varepsilon_{ijk}$ being the split-plot error. (The main-plots
are here completely randomized for treatments without blocking.) It is true that
the theoretical effects $(y_{ij} - y_{i'j})$ and $(y_{ij'} - y_{i'j'})$ may differ (theoretical
because two different anneals, i and i', cannot be applied to the same coil), and
WK want this to be recognized by writing the main plot part of the model as
$\mu + \alpha_i + \gamma_{(i)j} + (\alpha\gamma)_{ij}$. But once started on such refinements surely they ought
also to recognize $\delta_{ij}$ to represent all sources of random experimental variation
affecting whole plots independently of the intrinsic nature of the coil itself?
That they do not do so illustrates the impracticability of carrying models to the
objective detail for which they ask and write as if they were achieving. Apart
from their theoretical argument the point is trivial since $\gamma_{(i)j}$ and $\delta_{ij}$ are
completely confounded.
For our present purpose we can ignore the split-plots, which are irrelevant to
main treatment comparisons, the mean of split-plot elements over variation of k
becoming part of $\delta_{ij}$ to give the main plot model
$$y_{ij} = m + a_i + c_{(i)j} + (ac)_{ij} + d_{ij}$$
which I write with roman letters to indicate the specific latent equation. After
the preceding exposition I dispense with the Greek letter dummy variables of WK's
formulation.
Treating the coils as experimental units, $c_{(i)j} + d_{ij}$ is identical with the
$e_k$, $(ac)_{ij}$ with the $n_{ijk}$, of the earlier part of WK's paper. Following the first
part of this section it would be superfluous in practice to consider the coils as
anything except a sample from a large population of coils. But, to follow out the
theory, suppose that a finite universe of coils is being sampled. Only the
regression view of anneal effects is of interest so a is read as A without thereby
implying that we are saying anything about a potential universe of anneal treatments.
The analysis of variance is

                            d.f.         E(M.Sq.)
  Between anneals           (A - 1)      K_δ + K_αγ + K_γ + c K_α
  Coils within anneals      A(c - 1)     K_δ + K_αγ + K_γ

(The bracket expressions for the mean squares themselves are illegible in this copy.)
Formulation of the mean squares in terms of brackets exhibits both the sample values
and their expectations. The coefficients of the standard components, K(γ), are
obvious, or follow from Crump's rules remembering that coil effects include (AC).
Alternatively they follow from their definitions in terms of brackets (8.13) after
expanding the bracket expressions [expansion illegible in this copy].
The specific components (8.2) follow from the usual relations
$$K_a = K_\alpha + K_{\alpha\gamma}/C, \qquad K_c = K_\gamma + K_{\alpha\gamma}/A .$$
Alternatively one can consider the theoretical A x C matrices for each group of
elements, with the usual restrictions, to obtain the bracket values for $a_i$, constant
in rows, for $c_j$, constant in columns, and for $(ac)_{ij}$ [bracket identities illegible
in this copy], certain of the bracket values being however here redundant since the
sample contains only one element per column and does not evaluate these products.
Either substitution leads at once to

               K_d     K_ac                  K_c     K_a
  Anneals      1       (1 - 1/A - c/C)       1       c
  Coils        1       (1 - 1/A)             1

which are the first two rows of WK's table 4, except that they have illogically
redefined $K_{ac}$ as $\sum_{ij} (ac)_{ij}^2 / A(C-1)$, a form which implies absence of the restriction
$\sum_i (ac)_{ij} = 0$ which is none the less imagined, whereas the above table uses the
denominator (A-1)(C-1) in accordance with (8.2).
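The expectations in terms of specific components can be checked by direct enumeration on a very small universe. The sketch below is an illustration under stated assumptions, not WK's or the present paper's derivation: the 2 x 4 table of potential responses is hypothetical, the error term $d_{ij}$ is taken as zero, and the coefficients tested, $(1 - 1/A - c/C)$ for anneals and $(1 - 1/A)$ for coils within anneals, are this sketch's reading of the coefficients of $K_{ac}$, with $K_{ac}$ using the divisor (A-1)(C-1). Every equally likely allocation of c distinct coils to each anneal is enumerated and the mean squares averaged exactly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical universe: Y[i][j] is the response coil j would give under anneal i.
Y = [[Fraction(v) for v in row] for row in ([3, 5, 8, 12], [6, 5, 13, 8])]
A, C, c = len(Y), len(Y[0]), 2   # anneals, coils in the universe, coils per anneal

def mean(xs):
    xs = list(xs)
    return sum(xs, Fraction(0)) / len(xs)

# Specific effects with the usual restrictions (each set sums to zero).
m = mean(y for row in Y for y in row)
a_eff = [mean(row) - m for row in Y]
c_eff = [mean(Y[i][j] for i in range(A)) - m for j in range(C)]
ac_eff = [[Y[i][j] - m - a_eff[i] - c_eff[j] for j in range(C)] for i in range(A)]
K_a = sum(x ** 2 for x in a_eff) / (A - 1)
K_c = sum(x ** 2 for x in c_eff) / (C - 1)
K_ac = sum(x ** 2 for row in ac_eff for x in row) / ((A - 1) * (C - 1))

def allocations(coils, anneals_left):
    """Yield every way of giving c distinct coils to each remaining anneal."""
    if anneals_left == 0:
        yield []
        return
    for first in combinations(coils, c):
        rest = [j for j in coils if j not in first]
        for tail in allocations(rest, anneals_left - 1):
            yield [first] + tail

def mean_squares(allocation):
    """Between-anneal and within-anneal mean squares for one allocation."""
    group_means = [mean(Y[i][j] for j in coils) for i, coils in enumerate(allocation)]
    grand = mean(group_means)
    between = c * sum((g - grand) ** 2 for g in group_means) / (A - 1)
    within = sum((Y[i][j] - group_means[i]) ** 2
                 for i, coils in enumerate(allocation) for j in coils) / (A * (c - 1))
    return between, within

results = [mean_squares(alloc) for alloc in allocations(range(C), A)]
E_between = mean(b for b, _ in results)
E_within = mean(w for _, w in results)
print(E_between, E_within)
```

With A = 2, C = 4 and c = 2 the coefficient of $K_{ac}$ in the anneal line vanishes for this particular universe, which is a convenient reminder that such coefficients depend on the assumed universe size as much as on the data.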
Is this formulation in terms of hypothetical specific components of any use?
One result is that Wilk (discussion at the Montreal meeting) claimed that estimates
of variance components are always biased by treatment-unit interaction. But that
deduction is seen to depend merely on the definition of what is to be estimated.
The canonical components are, as always, unbiased. So that detail is easily
by-passed by those who will agree that the K(γ) are anyway more convenient statistics.
Evidently $K_c$ and $K_{ac}$ cannot be separated and $K_a$ cannot be estimated. But
these sub-divisions are academic. In this example a variance component between
treatments is irrelevant because we are concerned only with treatment means. But
even suppose we were concerned with a universe of treatments, still there can be
no universe for which $K_a$ is a variance component. Distinction of $K_c$ and $K_{ac}$ is
equally an artifact. The practical situation is that coils annealed in one way
will have a certain mean and distribution; annealed in another way another mean
and distribution. In the models under discussion the distributions are supposed
to be similar and only the means have to be differentiated. The worth to a factory
of a difference of means depends on their separation relative to the total
variability of coils with a given treatment, measured by $K_\gamma + K_{\alpha\gamma}$. (We omit
$K_\delta$ because, subject to an irreducible processing variability which is inseparable
from $K_\gamma$ and must be included therewith, it can be freely modified according to
number of specimens observed from each coil.) Similarly a test of significance
between two experimentally observed means depends on the total variance within
treatments; the variance of a difference between two treatment means is
$\frac{2}{c}(K_\delta + K_{\alpha\gamma} + K_\gamma)$, independently of any C.
The finite population correction factors appearing with $K_{ac}$ in the analysis
of variance are therefore irrelevant for any practical consideration. This does
not say that an interaction should be ignored just because it cannot be estimated;
only that it is of no interest when it is an integral part of the only variability
which does interest us. Interactions of this sort exist merely as part of a
formal analysis (sec. 1). I assert that interactions of treatments with
unidentifiable random variables have no meaning, not even an academic one. Behind the
WK argument lies the idea of a pure scientific investigation as to how a treatment
operates. But there cannot be any "how" attached to a non-repeatable random effect.
Scientific investigation depends on being able to repeat something definable. An
interaction can have meaning only between identifiable characters. Suppose anneal
effect is affected by manganese content: we can investigate the anneal and manganese
interaction either by introducing different manganese contents as a deliberate
treatment or by a covariance study on manganese contents as they appear by chance.
Either way the manganese content gives an identifiable link between one coil and
another; and the interaction has meaning. But so long as there is no such
identifiable link the interaction coil x treatment is a non-reproducible random effect
and a formal abstraction which can be nothing more than an unidentifiable part of
coil variability.
In a similar manner Anderson and Bancroft (Chap. 8) consider block x treatment
interaction composed of both error and "real" interaction, and what should be the
error variance for treatment effects according as blocks are regarded as "random"
or "fixed". Block x treatment interaction is measurable by adding replication
within blocks; but it is never identifiable in the above sense. The fact that
internal replication is almost never used is prima facie evidence that experimenters
instinctively feel little interest in the matter however it may be debated
theoretically. A block is defined as a particular group of experimental units as used
on a particular occasion. It is an experimental unit standing in the same relation
to ultimate units as main-plots to split-plots. Hence block x treatment
interaction is the same as treatment x experimental unit interaction, merely pushed back
to a larger unit. We may be able to repeat some characteristic nominally used in
forming blocks, but the whole complex of a given block as experimental unit is not
repeatable. Fertility of a lump of soil changes from year to year, different plants
are growing on it, and are affected by weather conditions and their incidence with
plant phases. Even if we could be interested in particular areas of land,
identification of location x treatment interaction needs replication in years; it is not
synonymous with block x treatment interaction in one year. If blocks for a
nutrition experiment are formed by grouping animals according to age, evidence of
varying response to age may be worth looking for, but needs to be evaluated against
treatment x block interaction for blocks of similar age. Comparison to variation
of duplicates within an oven batch may show treatment x batch interaction, but to
link it to a treatment x oven interaction we need replication of batches. Any
character of experimental interest occurring between blocks in effect gives the
experiment a split-plot design, and analysis proceeds accordingly. A block in its
general sense being unrepeatable, nominal interaction with it is an unreproducible
chance effect. I therefore assert that unidentified treatment x block interaction
is an integral part of experimental error.
Note in passing that the inferred population is not one of similar blocks,
characterized by the mean and variance of their absolute yields. It is a
population having similar mean and variance of differences (cf. variances (11.4) and
(11.6)). With further experimentation we may be able to identify parts of
treatment x block interaction (by location, by weather, by age, by oven, etc.) but
this is rather beside the point. On the above definition of a block we can never
identify the whole of a block x treatment interaction. Being unreproducible we
are led to describe blocks always as "random", never as "fixed" effects.
When the population sampled cannot be rigorously defined this point of view
will not satisfy those who repudiate inference to an undefined population. It
seems however to be a lesser evil than retreating to the extent of regarding
inference as being only for the observed set of experimental units.
REFERENCES
Anderson, R. L. and Bancroft, T. A. 1952 Statistical theory in research.
McGraw-Hill: New York.
Bennett, C. A. and Franklin, N. L. 1954 Statistical analysis in chemistry
and the chemical industry. Wiley: New York.
Cochran, W. G. 1937 Problems arising in the analysis of a series of similar
experiments. J. Roy. Stat. Soc. Suppl. 4:102-118.
Cochran, W. G. 1939 The use of analysis of variance in enumeration by sampling.
J. Am. Stat. Assoc. 34:492-510.
Cochran, W. G. 1953 Sample survey techniques. Wiley: New York.
Cornfield, J. 1944 On samples from finite populations. J. Am. Stat. Assoc.
39:236-239.
Cramer, H. 1946 Mathematical methods of statistics. Princeton Univ. Press.
Crump, S. L. 1946 The estimation of variance components in analysis of variance.
Biometrics Bull. 2:7-11.
Crump, S. L. 1951 The present status of variance component analysis.
Biometrics 7:1-16.
Daniels, H. E. 1939 The estimation of components of variance.
J. Roy. Stat. Soc. Suppl. 6:186-197.
David, F. N. and Kendall, M. G. 1949-53 Tables of symmetric functions.
Biometrika 36:431-449; 38:435-462; 40:427-446.
Deming, W. E. 1950 Some theory of sampling. Wiley: New York.
Deming, W. E. and Stephan, F. F. 1941 On the interpretation of censuses as
samples. J. Am. Stat. Assoc. 36:45-49.
Eisenhart, C. 1947 The assumptions underlying the analysis of variance.
Biometrics 3:1-21.
Feller, W. 1950 An introduction to probability theory and its applications.
Wiley: New York.
Fisher, R. A. 1925 Statistical methods for research workers. Oliver and Boyd:
Edinburgh.
Fisher, R. A. 1935 Design of experiments. Oliver and Boyd: Edinburgh.
Hald, A. 1952 Statistical theory with engineering applications. Wiley: New York.
Hansen, M. H., Hurwitz, W. N. and Madow, W. G. 1953 Sample survey methods.
II. Theory. Wiley: New York.
Hendricks, W. A. 1948 Mathematics of sampling. Va. Ag. Expt. Sta.: Special
Tech. Bull.
Hendricks, W. A. 1951 Variance components as a tool for the analysis of sample
data. Biometrics 7:97-101.
Hoel, P. G. 1954 Introduction to mathematical statistics. 2nd Ed. Wiley: New York.
Irwin, J. O. and Kendall, M. G. 1944 Sampling moments of moments for a finite
population. Ann. Eugen. 12:135-142.
Johnson, N. L. 1948 Alternative systems in the analysis of variance.
Biometrika 35:80-87.
Kaplan, E. L. 1952 Tensor notation and the sampling cumulants of k-statistics.
Biom. 39:319-323.
Kempthorne, O. 1952 The design and analysis of experiments. Wiley: New York.
Kendall, M. G. 1943 The advanced theory of statistics. Vol. I. Griffin: London.
Kendall, M. G. 1949 On the reconciliation of theories of probability.
Biom. 36:101-116.
Kendall, M. G. 1951 Regression, structure and functional relationship.
Biom. 38:11-25.
Kendall, M. G. and Sundrum, R. M. 1953 Distribution-free methods and order
properties. Rev. Intern. Stat. Inst. 3:124-134.
Lindley, D. V. 1953 Statistical inference. J. Roy. Stat. Soc. B 15:30-65-76.
Mentzer, E. G. 1953 Tests by the analysis of variance. Wright Air Development
Center Tech. Report 53-23.
Mood, A. M. 1950 Introduction to the theory of statistics. McGraw-Hill: New York.
Neyman, J. 1935 Statistical problems in agricultural experimentation.
J. Roy. Stat. Soc. Suppl. 2:108-144-154-180.
Neyman, J. and Scott, E. L. 1948 Consistent estimates based on partially consistent
observations. Econometrica 16:1-32.
Pearson, E. S. 1937 Some aspects of the problem of randomization.
Biom. 29:53-64.
Pitman, E. J. G. 1938 Significance tests which may be applied to samples from
any populations. III The analysis of variance test. Biom. 29:322-335.
Scheffé, H. 1954 Statistical methods for evaluation of several sets of constants
and sources of variability. Chem. Eng. Progress 50:200-205.
Smith, H. F. 1951 The analysis of variance with unequal but proportionate
numbers of observations in the sub-classes of a two-way classification.
Biometrics 7:70-74.
Tukey, J. W. 1949 Dyadic anova, an analysis of variance for vectors.
Human Biology 21:65-110.
Tukey, J. W. 1950 Some sampling simplified. J. Am. Stat. Assoc. 45:501-519.
Wilk, M. B. 1955 The randomization analysis of a generalized randomized block
design. Biom. 42:70-79.
Vaurio, V. W. and Daniel, C. 1954 Evaluation of several sets of constants and
several sources of variability. Chem. Eng. Progress 50:81-86.
Welch, B. L. 1937 On the z-test in randomized blocks and latin squares.
Biom. 29:21-52.
Wishart, J. 1952 Moment coefficients of the k-statistics in samples from a
finite population. Biom. 39:1-13.
Yates, F. 1935 Complex experiments. J. Roy. Stat. Soc. Suppl. 2:181-247.
Yates, F. 1937 The design and analysis of factorial experiments. Imp. Bur. Soil
Sc. Tech. Comm. 35.
Yates, F. A review of recent statistical developments in sampling and sampling
surveys. J. Roy. Stat. Soc. 109:12-30-43.
Yates, F. and Zacopanay, I. 1935 The estimation of the efficiency of sampling,
with special reference to sampling for yield in cereal experiments.
J. Ag. Sci. 25:545-577.
Yule, G. U. and Kendall, M. G. 1950 An introduction to the theory of statistics.
14th Ed. Griffin: London.
UNPUBLISHED REPORTS
Hooke, R. 1953 Sampling from a matrix, with applications to the theory of testing.
Statistical Research Group, Princeton Univ., Memo. Rept. 53.
Hooke, R. 1954 Moments of moments in matrix sampling: An extension of polykays.
ibid. 55.
Hooke, R. 1954 The estimation of polykays in the analysis of variance. ibid. 56.
Tukey, J. W. 1949a Interaction in a row-by-column design. ibid. 18.
Tukey, J. W. 1950a Finite sampling simplified. ibid. 45.
Wilk, M. B. and Kempthorne, O. 1955 The logical derivation of linear models
and their use in selecting the appropriate error term in the analysis of
variance. IV Fixed, mixed and random models.