,
.
~
."
,-
First Draft
ANALYSIS. OF
COVJ~qIANCE
by
w. G. Cochran
Institute of Statistics
University of North Carolina
'~"
~:
!~
"
1.
2.
3.
4.
5.
6.
7.
8.
9.
Introduction
Uses of the technique
Computational procedure
The adjusted sum of squares for rows
Theory of the technique
Numerical example
Hultiple covariance
More complex classifications: the split-plot design
The estimation of components of variance
1.
Introduction.
Analysis of oovariance differs from analysis of variance in
that, instead of analyzing the variation of a quantity y, we analyze the deviations
\~from its linear regression on some quantity x.
For illustration, consider the two-
way (row, column) classification with a single observation in each cell.
If the
suffix (ij) denotes -the i th row and the jth column, and if ~ •• is the mean of the
x's, the model appropriate to the analysis of
cov~rianoe
( 1)
+ -y.
J
e , fJ·,
where
~
i and
Yj
is
+
are population parameters, with the usual interpretations.
The eij are random variables assumed to be normally and independently distributed
with zero means and variance
and
2
0.
e
,,;.
As a rule all quantities
a,
fA. '
f
i' 'Y j
are unlmown.
The principal objects of the analysis are usually to estimate the f i and Y j
and to test certain null hypotheses about them.
These quantitiBs are ofton
described as adjusted ·row or column effects, to emphasize tho fact that the original
e
data Yij have been adjusted so as to remove their linear regression on the Xij.
Since the parameters i i and V j are no oded only to rnoasure differences between tho
effe.cts of different rows or colunms, we may assume that t ") i = t Y j = O. With this
oonvention, tho moen value of the right hand side of (1), taken over the i th row,
is (j-+ ~ i).
This quantity may therefore be interpretod as the true (or population)
adjusted mann of the i th row.
2.
Uses of the technique.
Analysis of covariance was first applied by Fisher (1)
in agricultural experimentation, where the y's were plot yields and the rows (say)
oorresponded to tho experimental treatments.
two sots of circumst8ncos.
First#
Tho techniquo has provod valuable in
the x's mQY be variables which oro correlQtod
with the y's although not themsolves affoctod by the
troQ~T.onts.
In other words
tho x's measure some environmental factor which influences tho results of the
experimont.
~werc tho
For lllstanco, in the application first suggested by Fisher, the y's
yields of 16 plots of toa 'bushes in a hypothetical experimont and the x's
were tho yiolds of tho same 16 plots during a period, beforo the beginning of tho
- 2 -
experiment, in which all plots were being uniformly treatod.
in a sense, tho inherent productiveness of tho 16 plots.
~ to
Thus tho x's measured,
Clenrly, one would expect
obtain a more accurate comparison of th0 effects of the treatments by adjusting
the experimental yields y so as to correct for the effects of such difforoncos in
productiveness.
In this way, covariance may reduce the experimontal orror by ad-
justing for the effects of somo disturbing
Competent
~xperimenters
environmorrc~l
factor.
have, of course, always atternptedto nullify such dis-
turbing effects by careful control of the exper imentel technique.
of oovariance places a now weapon at tlioi r dis posal.
The introduction
If i t is not feasible or con-
veniant to oontrol such factors, it may still be possible to measure them. Their
•
effeots on the experimental results are then estimated in the covori.'1!lce analysis
and the appropriato corrections are made.
It need no'c be stressed tha.t the "linear"
regression oan be extended to include non-linear offects in the usual way.
applications of covariance Imve been made in biological
4It
~ng
success.
eA~erimontation with
Numerous
strik-
Thore seoms no doubt that equally fruitful uses could be made in
other typos of experimentation .TI10re covariance at present appears to be little
known.
Socondly, a covarianoe Cll1D.lysis somotimes helps to clnrify the intorprotation
in exporiments
~merG
the x's nre affeoted by the troatments.
to be presented lator is u case in point.
The numerical Gxample
This exporiment tested tho effects of
oortain soil fumigants on oelvrorms, 1:vhich live in the soil and attack some English
farm orops.
After tho troatmonts had boon appliod, tho numbers of oolworm cysts
per plot and the yields of tho crop (spring octs) woro both reoorded.
menta produced significont effects both on eelworms and on tho oats.
Tho treutIt is cleQrly
of interest to discover whothor tho effects on the crop can be regardod as a
reflaotion of the effects on the worms.
This can be done by finding .mether tho
troo.tmont effects on tho oats persist or disappear aftor adjusting; for the dife-feronces in oelworm numbors.
In this way a covnriance analysis mo.y throw some light
on tho mechanism by which tho troutments produce their results.
- 3 -
As BartlGtt (8) has pointod out, the intorprotQtion of this socond use of co__ variancQ must
involved.
~e
considered ca.reful 1.y, since a rather dongorous extra.polc.tion may be
Suppose in on experiment that the plot plont numbers for one vr.riety vary
from 400 to 600 with
1,000 with
0.
0.
moan of 500, [h'1d thnt for nnothor vo.riety the runge is 800 to
monn of 900.
If yields arc adjusted for differences in stand, th0
o.djustod yiolds for ea.ch variety would formally correspond to a plant number of 700,
this being the moan of 500 end 900.
Unless thore is additiona.l infonrration not
supplied by tho oxperimont, howevor, it would be risky to interpret these adjusted
yields literally, sinoo neithor vo.riety hus any plots with
plan'~
numbors ncar 700.
It is proper to sto.to mlother tho differences III yield oan or oannot be attributed
to a lineo.r r<:::gression on plo.nt number, but the possibility exists of ourvilinear
effeots that the do.ta at hend arc inoapo.blo of detooting.
3.
e
Computational prooedure.
Before describing the theory of the
outline the computational prooedure for the two-way olassifioation.
n~thod
we shall
The notation
to be used is given below.
I Row means
Columns
Yn
Y12
ypl
Yp2
YI.
-YZ·
Rows
---_._
Ypq
..._~--------
Column means
Y·2
-
Y.q
yp.
-
Y••
The information wanted from the statistical analysis is usually the following:
(i) An estimate b of the regression ooefficient
signifioance of the null hypothesis B = O.
B and a test of
(ii) Estima.tes of the true adjusted rOV! means (fA- +1 .), and a test
of significance for any giVGn linear combinatio:6. of these means.
Probably the most cornmon linear function is simply the difference
between two adjusted row moans.
L
- 4 (iii) A test of the null hypothesis that all adjusted row means
are equal.
Similar tests may, of course, be required for the column effects.
The first step in the computations is to carry out separate analyses of varianoe
for y and x, and a corresponding analysis of the oross-products (yx).
Table I.
Sums of squares and produots
(x2 )
(yx)
(y2)
(q-l)
(p-l)
(p-l)(q-l)
Cn:
Rxx
Cyy
Exx:
Cyx
Ryx
Eyx
Ryy
Eyy
(p-l)q
Sn:
Syx
Syy
d.f.
Columns
ROV/S
Error
Rows + Error
The only unusual feature is that a line is added at the foot giving totnls for
rows and error.
Thus
Syy
= Ryy
+ Eyy , etc.
The sum of squares for y in the error
lL~e
is now divided into two parts: the
sum of squares for regression on x (l.d.f.) and the sum of squares of
devia~ions.
The same subdivision is carried out for the "Rows plus Error" line.
Table II. Sums o£ squares for regression and deviations.
Deviations
Regression
d.f.
Error
1
SS
I
I
2
: E yx/Exx
I
I
Rows + Error
I
: S2yx;Sxx
Rows (by subtraction)
d.f.
I
2 SS
I
E yx
I
(p-l)(q-l)-l ,YY
IE - En:
I
S2
I
yx
(p-l)q-l
• Syy_·sI
xx
I
S2 yx ~2
I
'" yx
(p-l)
; Ryy - Sn: + Exx
f
I
: M.S.
t
, s2
I
e
I
I
I
I
I
,
: s2l'
o
Finally, the adjusted sum of squares for rows, with (p-l) d.f., is obtainod
.by. subtraot.ing tho deviations S.S. for error from that for rows plus error.
The information mentioned above is now obtainable as follows.
~
.,..-
(i) The regression
be
Eyx/Exx.
the regression
is distributed
=
coefficient a is estimated from tho orror line; thus
The signifioanoe .of be is tested byto.king 'bhe ratio of
to tho deviations ,mean square; i.e. E2yx/s ?eEx:;~. This
as F with 1 and t(p-l)(q-l)-l} d.f.
- 5 -
= Yi'.-
(ii) The adjusted moo.n of the i th row is estLmated by Ri
e
If the x's nro r.:.lgarded us fixed, any'spocified linear function
1;. ei ( )1-+ ~
normally distributed with moun
be(xi.- x •• )
I:. giRi is
i) IIDd vnrinnce
lJ
Honce u t-tost of tho hypothesis E gi (jJ-+ ~ i)
So for
00
in the usual way.
In
p~rticular,
=0
is obtained by substituting
for testing the difforonco between
the adj ustod means of tho i th and jth rovvs, we have
(1)'
t
=
with {(p-l)(q-l)-l) d.f.
(iii) For a t0St of the hypothesis that 0.11 adjusted row mednS are equal,
e
'write F
= s2r /s2e ,
Table II.
where s2r is the adjusted menn squD.ro for rows,
This F vnriate hns (p-l) Dnd
{(p-l)(q-l)-l \
D.S
W0
gi von in
d.f.
If the corr esponding test is wIIDt3d for th0 columns, Tc,blo II is ro-culcu1cctod·
vdth oolumns III p1nce of rows.
Similarly, for testing a particular subset of the
(p-l) d.f. for rows, Table II is re-computed with rmvs replaced by the subset in
question.
4.
The adjusted sum of squares for rows.
~~en
analysis of covariance is first
enoountered, the feature which seems least familiar is the procedure for obtaining
the adjusted sum of squares for raws (ASSR).
It may be helpful to study the ASSR
in more detail.
As shown in Table II, the ASSR is obtained by subtraction.
By analogy with
ordinary analysis of variance, one might expect instoad to compute it as some kind
e-of sum of squares of deviations of tho adjusted row means Ri = ~Yi.
However, the ASSR is
~
-
be(:Ki. -
x.)}.
equal to the sum of squares of deviations of the ad,just0d
- 6 row means, as will now be shown.
(2) ASSR ::
FrJm 'ra.ble II 'l:;he ASSR is given by
l)y -
It will be rea.lizod bhat the sum of squares of deviations of the
is the same as the rows sum of squares of (y - bex).
(3)
SSD ::
=
(4)
2Ey~
Exx
row meons
Henco
2
+ be
Rxx
Ryy - 2b e Ryx
Ryy -
adjus~Gd
+
2
EyX
Rxx
2
E xx
Subtracting (2) from (4) and taking the common denomina.tor, wo find that the difterence can be oxpressed as tho single square
( 5)
(6) or a.lterna.tively
Rix (b r - be )2
(R xx + Exx )
whore b r :: Ryx/Rxx ' is tho regression coeffioient calculated from the rows line in
the
•
analysis 'of vuriunco.
Exprossion
(5)
wus first given by Yatos
(2).
Consequently tho SSD of tho adjusted row mcC\l1S is o.lways too largo.
Before
considering why this is so, we may noto that if x exhibits no significont effoct of
rows, tho correction term (5) or (6) is usually smo.ll rclo:bivo to the SSD or ASSR.
to
Usc of this rnct/shorton tho computations will be illu~tro.tod later.
The discropo.ncy between tho SSD rmd tho ASSR is duo to the: sampling error of
boo
If
~
wore substituted for bo,it is easy to sec that the SSD would bo distri-
buted as a multiplo of
x. 2
(whon -che true adjusted row merms were 0.11 equal) end
vlOuld be appropriate for use us the numerator of the F-tcst.
in b o complicates the issue in two
difforent
e.
st~d~rd
error, since
WfxyS.
Tho sc.mpling error
First, ouch n.djusted row mCOcn Ri hus a.
- 7 -
Further, the o.dj us·ted moans for rmy two rows nro correlatod, since b o onters
into both.
~bC oxpected
A quo.dro.tic form in the Ri rQthor than
0.
simplo SSD might thorofore
o.s tho best mensure of tho vo.riation c:.mong tho adjusted row manns.
This quo.dratic form is found as follows.
From (5)
(Ryx - beRx,uZ
Rxx + Exx
ASSR = SSD -
( 7)
p
now
:::
yx -bR
OXX
R
q I;
i=l
(xi~
- x." {Gie -
Y••) -
bo(Xie - i .. )\
5
p
=
q I; 1
i=
(x.J.. - x • • )(R.J. - R)
SUbstituting this exprossion in (7) wo find
tr~t
p
ASSR
_
:E
- i, j=l
=
...,
J
~'xx"T'
E
xx
As might bo oxpoctod from gonoro.l theory, the inverse of tho
l~~trix
varianco - covnriunco matrix of tho Ri (o.pQrt from the CO~Jnon factor
o.ij is tho
a~).
'-'
It is o.lso instructi vo to coupe-era tho ASSR with tho sum of squnros of d07io.:I.::10::,s
of tho row morms Yi' from the
sum of squares of (y-brx).
and 1ms (p-2) d.f.
always
~.
~
(9)
~
regression on x, or in othor words ,tith the rOl'IS
This sum of squares is
By compo.rison vdth (2), it vdll be found tho.t this SSD is
than tho ASSR by tho a.mount
RxXExx(b r - b o )2
Rxx + Exx
.. 8 -
Consequently, the adjusted
s~
of squares for rows divides into two parts: a
single d.f., as shovm in (9), whieh compares the regression coefficients from the
-ravis and error lines in the analysis of variance, and (p-2) d.f., as shown in (8),
representing the SSD from the rows regression.
It sometimes happens that the ASSR is large solely
in interpreting analyses of data.
because of the difference between b r and be.
meaning
c~
This result is occasionally useful
In many problems, however, no simple
be attached to b r , and this approach is not then helpful.
To complete the circle of comparisons, we may note from (6) and (9) that the
rows sum of squares for (y - boX) exceeds that for (y-brx) by
( 9a)
Consequently, the rows sum of squares for (y - box) contains (p-2)
comparing (9a) with
5.
veight3d
The remaining d.f. is inflated by th8 factor (R xx + Exx)!Exx, as is seen by
d.f.
e
~roperly
(9).
Thoory of the technique.
A complotelygenoral proof, starting from first prin-
oiples, would be ru,thor lengthy.
of least squares
u.s
We will aSSQ11le somo O-cquo.intanoo with tho method
applied in tho analysis of vl:triance.
The mathemutical model is
where the x's are fixod and the ets arc normally and independently distributod with
zoro
zero.
mo~s
- -Vu.rla.ncos
""
an\},
0
2
e.
Also tho sums of the
)0,
s nnd tho y
tS
::tre both
By the method of least squares, the unknovm paromotors arc to bo estimated
by minimizing
( 11 )
·itj. { Y " - m - ri .. Cj - b(x
ij
iJ
subject to
!: r i
=:
t·Oj
=:
O.
Co.lculation of tho minimum is facilitated by means of
"'"e..Tho prediction oquation
(12)
Yij
= m + ri
+ Cj + b(xij - xu)
0..
slight tronsform'1+'ion.
- 9 -
may be
~
~~do
identioal with the prediotion equation
(13)
if we write
(14)
m = m'
OJ ::: 0
Moreover, sinoe
,j - b(x.j- x ..)
I: r i ::: l.::
= 0,
OJ
it is clear that
Hence, writing X'ij;;: Xij - xi. ( 15)
(
,,'
. ""
~ Yij - m - r i
1J
- c ,j
x. j
t
ri:::
l.::c'j;;:
x.• , we may minimize,
+
o.
instead of (11),
- b
x 'ij) 2
Thosa~fnL'li1iar with tho analysis of vaI'I.ance ""ill r,Gc0i.".izo x' as t:}O error CO:·l~(j:"l()!".t
of x in its analysis of'varianoe.
The reason for introducing X~j is that its
sum
is .zero over any row or column.
Thus (15) may be v~itten
(16)
I:
- o'J' )2 - 2b
• ° (YiJ' - m' - I"l'
~J
l~J'
Yl'J'x' ...4Jo + b
2
,2
.••
lL:' x ~J
j
Differentiating with respect to b, we find
( 17)
=b e
sinoe X~j is the error component of x ij •
Now the minimization of the first term in (16) is equivalent to an ord:i.nary·
analysis of varianoe of
( 18 )
ID':::
Y..,
y(~~thout
I' 'i
::
Yi.
-
covarianoe).
Y.. ,
0' j
:::
The solutions are
Y. j
-
Y..
Henoe, by (14), the least squares estimates for tlw covarianoe problem are
(19)
f
l
ID:::
Y
••
ri::: (Yi • Y• .)
Cj=(Y·r Y••)
be(Xi'-
x••)
\1
- 10 ..
The value (~+ ri) will be found to agree with the expression given in section
:3 for the estimate of the adjusted row mean.
Further, from (16), the residual sum of squares is
2
E
_ 2E XX
yy
Exx
(20)
with
t(p-l)(q-l)
-
2
E yx
Exx
+
I}
d.f.
II.
Hence the mean square is s2e as defined in 'l'able
With the x's fixed, be. ri and Cj are normally distributed, being linear
funotions of the yts.
Also, by the general theory of least squares, they are
unbiased estimhtes of the oorresponding population parameters.
The tosts of sig-
nifioanoe given in seotion 3 for these quantities are established by tho usual
methods and will not be given in detail.
With
~egard
to the F-tast of the adjusted row means, it is perhaps bost to
quote the general theorem on the
Theorem:
F-tGs~
in l0ast squarvs, as applied to this
problc~.
With the assumptions spocifiod in (10), let
Da
=
Dr
=
~
- m - ri - Cj - b(xij - x ••
ij { Yij
1:
ij
\ Yij
• mil
-
Cl!
-
J
b"(x ..
J.J
)1 2
. x•• »)2
where the oonstants are ohosen in each caso so as to minimize the correspondin::;
of squares.
Then if all
~i
are zero, the quantity
(Dr· Da ) /
. (p;.1 )
S'l1J'
Da
/ {("""'p-_lM )':'r(q--.....t ,..,)_""'"'I}
is distr~buted as F with (p-1) and
{(p~l)(q.l).l}
d.f.
Genoral proofs of this useful result, which is not
uS
well known as it deserves
to be, hnvo beon given by Yates (3) and Wald (4).
It has a1roady been shown in
~O)
that De. is tho SSD from t"no orror rogression
and that the oorresponding meun square is s2e •
..
By tho s~no QpproQc~, Dr will bo
the SSD from the "error" rogression when only column effects Dora climino.tod
j"1
t'VJ
c~nnalysis of variance. But; in tha.t event the "error" will bo equo.l to "rows pIns
error" in the present ana.lysis.
to 82r· us defined in Ta.ble II.
Hence the numerator of F, (D r - Dn)/(p-l), is oquo.l
This oompletes the proCJf.
- 11 ..
~
. An
alternative and more direct proof may be developed from (9).
In this it
was shown that the adjusted sum of squares for rows divides into 1 d.f. which compares b r and be, and (p-2) d.f.
representing deviations from the raws regression.
If the null hypothesis holds, each sum of squares may be shoVJl1 to be distributed
independently as ;t2 0 2e and to be independent of s2e •
The appropriate F test follows.
While the theory above has been developed for a particular type of classifieation, tho methods apply without diffioulty to other olassifications.
6 _ :Numerical example.
This example is intended to illustrate the computations and
to prosent two short cuts that are sometimes useful.
at tho Rothamsted Experimental Station.
Report, pp 176-178.
The experiment was conducted
Tho plQU and yields are given in the 1935
There were four fumigants, chlorodil1:1.trobenzene (eN), carbon
disulphide jolly (es), "Cymag" (CM) and "Sookayll (CK), each applied in a single and
double drossing.
~oach block und
The ninth troatment, no
fumig~t,
tilo exporiment oomprizod four
was roplicated four timeswlthin
rfu~domiz0d blooks
of 12 plots eaoh
(plots 1/80 acre).
Twa weeks beforo the fumigants wore a.pplied, an ostimnte
[
![
I
I
[
san~los
of tho n\uubcr ofoclworm cysts on oach plot.
It
WaS
WaS
made from soil
plannod to usc thoso
data. as on x-variable in oovariance in the hope of removing tho influence of no.turu.l
variations in tho degree of eelworm infestation.
after tho fumigants had boon ploughodin.
explained lutor.
~ere
drillod five days
To moasure the treutmont effects on the
worms, n seoond count was mnde on ouch plot after
the two counts arc shown in Table III.
Tho ants
h~rvost.
Trontmont totals for
Tho lines markod tLt und
tQt
will bo
It will be notod that tho oysts inoreasod from tho first totho
~.
socond counts under 0.11 treatments.
,
,
neoessarily inorrootivs sino. they
[~
,
r
I
This does not mean that troutncnts wore
~y hove
kept do>n tho rOGO or incrco'.,
- 12 Troatnent toto.ls foZ" nVI.lbors of cysts.*
Table III.
After fW1igo.tion (y)
Before funiga.tion. (x)
Tot 0.1
1975
eN
~O2
2289
1066
1265
3596
867
Love 1
CM
CN CS
CK
0
1975
1
402 417
513
2
389 554
568
778
L 1180 1525 1649 2126
415 280
458
362
Q
i
6840
1515
CM
CS
Total
5858
4317
4505
13327
4129
CK
5858
928
1431~92
I 877 . 1241
1122
2682 3913 13136
979 1621 i 662
*Numbors of cysts per plot wore tho numbors fOQ~d in 400 ~lS. of soil.
Thus the toto.ls in "I:;ho to.ble represent 1600 gns of soil, excopt for the
no-fumigcnt totals, which represont 6400 gLS.
Tho experiment.constitutes 0.
~ro-wo.y
clo.ssificction with 48 plots and is
annlyzod lllto roplioa.tos (3 d.f.), troo.tnents (8 d.f.) mld error (36 d.E.).
Tho
36 d.f. for error contcl.in 24 \ihich reprosent tho interactions of treatnonts and
roplic"ates and 12 which represent
vt'.rir~:tions
within the sots of 4 no-fUl-.Iigc.nt plotz
in ench roplicate.
e
The simplest hypothesis to consider ubout tho effects of the funigants is thnt
tho ro sponse is c, l:irl.oar function of tho level.
the number of cysts for tho uth fumigont o.s
in question o.nd
:motor
i~
~. (responso
(q.
Under this hypothosis we reprosent
-~... L) whore L (0,1,2) is tho level
is tho decroo.se in cysts for oo.oh increo.se in level.
at zoro lovel) mo.y be pre surned thQ
-I:;his hypothesis is tono.blo, tho quo.ntities
~~( or
SO.1.10
The
for c.ll f'unigo..nts.
po.r~-
If
ostilnatos of them) nrc c.pproprio.te
for compo..ring the effectivenoss of tho fur.ligQllts.*
With this o.pproo..oh III mind, tho 8 d.f. for tN[ttments Bny be sopo.ratod into
four orthogono.l ports.
Consider first the tot0.1s over all four
figures 5858, 4317 nnd 4505 for the 'o.ftor fu:;ligt,tion' do.to..
fill~igonts,
o.g. tho
These threo figures
provido 2 d.f. o.s follows:
Avoro.gc lincnr response = (5858-4505)2/32 Z 57,207
Avoro.go curvc,turo == {5858 + 4505 - 2( 43l7)} / 96 == 31,140.·
~*ThO 100.st
squo.ros solution for this problem (in c. DOl'O gonoro.l Co.so) ho.s boon given
rocently by Bliss (5).
- 13 Next, there are 3 d.f. vnlich compare the differences in the linear responses
'~to the
four fumigants.
First we add the total at the single level to
total at the double level, obtaining the figures
L~
row L of table III.
level, being cmmnon to all fwnigants, doc3 not enter.
data, tho 3 d.f. are thon given
bY~ 1(3596)2 +
,!i~
The zero'
For the tafter fumigation'
(2682)2 + (3913)2 +
(3136)~-~(l3~7f
:=
Finally,
~lere
the four respense
1
W
f(867) 2 +
43,409.
are 3 d.f. which measure tho differonces in tho curvatures of
Ct~vcs.
To obtain these wo subtract the doublo lovol from tvnco
the single (line Q in Table III).
arc
the
(979)
2
Again, the zero levol is not used.
Tho 3 d.f.
l '
+ (1621)2 + (G62)2} ... 00 (4l29)2}
:=
25,693.
Ono reason for making this subdivision is that i f curvature offocts arc significant
the simple modol will huvo to be abmldoncd.
The sums of squares and products
Table IV.
0.1'0
shown in Table IV.
Sums of squares and products.
d.f.
3
8
Replications
Treatmonts
1
rAvcragO linear rosponse
Avera.ge curvo.turc
1
3
lDiffcrences in linear
Differencos in curvature 3
36
Error
x
(x 2 )
159,597
29,141
3,081
2,204
22,976
881
121,429
= before,
(yx)
175,873
-9,222
-13,276
8,285
-6,837
(y2)
489 ,427
157,449
57,207
31,140
2,606~
25,693
544,690
189,278
Tho regrossion coefficient b o is (189,278)/121,429
to ma.ke an F-tcst of the adjusted sum of
of troo.tmonts.
y= after fumigo.tion
squ~ros
43~409
= 1.558754.
Wo vnsh first
for oach of tho four oomponents
Striotly speaking, it is necessary to carry out the procedure of
seotion 3 separately for oach component.
SL~co
this i8
r~thor
laborious whon thoro
arc numerous components, we first mako approximate F-tests from an analysis of
.. 14 Table V Analysis of variance of (y - bex)
4tTreatments
Average linear response
Average ourvature
Differences in linear
Differenoes in ourvature
Error
d.f.
S.S.
8
267,003
106,081
10,667
120,546
19,709
249,652
1
1
3
3
35
F
M.S.
32,125
106,081(-2,625)
10,667(- 191)
40,182(-4,199)
6,570( -4)
7,133
14.87
1.50
5.63
0.92
It will be recalled from section 4 that these F-values are all too large.
Sinoe.
however, the averago ourvature and the differences :in curvature in Table V are both
non-significant, there is no need for the exact tests of those components.
the two linear terms are both significant at the 1 perccmt level.
Further,
To one experienced
in such analyses, it would also be evident that the exact tests would not chango
these verdicts matorially.
~
As a matter of. interest the correction termS for tho
squares, as calculated from (5), are shown in parentheses in Table V.
sh~uld
It
again be emphasized, howover, that where treatments affoct tho x-variable
_tho correction terms ma.y be very substo.ntial, and this o.pproximde method may give
misleading rosults.
The adjusted trcatment moans
0.1'0
shovm in Table VI.
For examplo, for tho un-
treated plots the mean numbers of worms a.re 123.4 (first count) and 366.1 (second
COUll·C).
Hence the adjusted moon is
366.1 - 1.56(123.4 - 128.5)
= 374
the figure 128.5 being the gonerul moun nt the fir£t count.
Tublo VI
Level
Adjusted trontmont menns (cysts pCI' 400 gms)
. ON
cs .
310
270
203
o
OK
358
201
178
374
1
2
_..
OM
365
289
For t-tests of tho differences betwoen puirs of troatment mouns, it is ruthor
.
.•
inconvenient to havo to oalculate u soparato standard error for every pair, though
for oxuotness this is requirod beca.use of tho term
(xi.- Xj. )2/Exx. Finney (6) and
- 15 Coohran (7) have suggested as an approximation the use of an average derived as
~ollows.
The estimated variance of the difference between two adjusted treatment
means (averaged over q oolumns) is
s2
e
f~+
\q
The average over all pairs of treatments is found to be
where t xx is the treatments
~
square for x.
This suggests tha'c we may regard
t
s~ (1 + ~ ) as the estimated varianoe per plot, to be used for all t-tests based
Exx
.
on the adjusted means. As Finney points out, this device cannot be used with safety
unless the treatments produco no significant effects on x.
From Tables V and IV, the effective error variance per plot in this example is
found to be 7,133 x 1,030 = 7,347.
For the figures in Table VI which are based on
4 replicates, the standard error of the difference between two nwans would therefore
bo taken as
~
~ (2 x 7,347) -/ .f4= 60.6*
Since it takos aocount of the sampling errors of be' tho effective errorvarianoo
7,347, is also the appropriate error to assign to the experiment for answering the
question:
how much has the error been roduced by covariance?
If the first counts
had not boon taken, the error vo.riartce would havo been 544,690/36, or 15, 130, (from
Table IV).
It may therofore be estimated that tho use of covQriance was equivalent
to doubling the number of replicates.
*If we accept tho hypothesis of linear responses to tho .fwmigants, the results
of this experiment would be sWTh~arizod in terms of tho ostimates a, dl, ••• d4 (after
adjustment for tho first count), rl.lthcr than from Table VI. The reador 'who is
interested in the details may wish to verify that (i) the weighted moan of the tvvo
levels, giving double woight to the second leval, (e.g. (310 + 2 x 365)/3 for CN) is
tho' least-squaros Gstimata (a -~C\). This result implies, incidontr,lly, that"t-tosts
of tho differencos between thes3 weightod mGnns nrc identical with 'b-tests of the
difforenoos be~.oon tho oorresponding d's. (1i) while the moan for no-fumigant, 374,
is an ostimate of ex. , it is not tho lea.st-squaros cstirnato. This he.ppens beoause
quantities like (2 x 310 - 365) arc also estimates of ex. on the null h~2othosis.
The solutiml assigns a weight of 20 to the ostimnto 374 nnd a weight of 1 to tho
osti:nw.to from each fumigant. The resulting estimato of 0. is 363. (iii) the csti_~utes of the d's (i.e. tho docroasos in oelworms to the singlo lovel of fmnigant) aro
10 for CN~ 83 for CS, 31 for CM and 106 for CK.
- 16 As mentioned previously, this experiment also serves to illustrate the second
-euse of covariance, in whioh the oysts at the second oount are taken as x, and the oats
yields as y.
Since the error
re~ression
of y on x was no·!; signifioe.nt, no details
will be given.
7.
Multiple covarianoe.
The extension to the case where there is more than one
independent variable presents no essential difficulty, though the computations
na.turally become more involved.
With two variates x and z, the regression coefficients
arc obtained from the normal equations
If t-tests of the adjusted row means are wanted, it is advisable to compute the
inverse of the Exz
~trix,
say wxz •
The adjusted moan of the i th row is estimated'12Y
~(21)
Yi. - bx(x~ -
x•• ) -
bz(zi. -
z•• )
whilo tho variance of the difference
between the i th and jth rows is ostimated by
2
(22) s2e
+ (xi.- xj.)2 wxX+ 2(xi.- Xj.)(Zi.- Zj.)wxz+ (zi.- Zj.)2 w zz )
{q
The average ovor all pairs of rows is
2
(23)
q
s2e
\ 1 + r xxwxx + 2r xz wxz + r zzw zz }
Where r xx is the rows mean square for x, etc.
In experiments whore tho raws do not
affoot x or z, this expres s ion may be used to dorivo an o.vorage stonda.rd error for
t-tests.
Caloulation of tho exact F-test is la.borious, since ·cwo sots of rogrossion
oquations must bo solved for oo.ch F.
If rows do not affoct x or z, it is usually
best to begin with an approximato F-tost based on tlw ano.lysis of varianoe of
_
(y - bxx - bzz).
In cases where thisF is highly significant or non-signifioant
it will not bo nooessary to obtain the oorrect F.
..
•
the snfest prooodure is to compute the corroot F.
If, howover, rows affeot x or z
- 17 8.
More complex olassifications - the split-plot design.
__eadil y to marc oomplex classifioations.
The procedures also extend
A now situation arises, however, with
classifioations where the mathema.tioal model requires more thun one orror variance.
Although thorough discussion of such cases is boyond tho scope of this
na~ure
of the problem will bo
independent variate.
p~por,
tho
for the split-plot design with a single
illustr~ted
In this case, owing to tho positive correlation between aub-
plots in tho same whole-plot, treatments upplied to Whole-plots have in goneral a
larger orror than those applied to sub-plots.
In the analysis of varianoe, two independont estimates of the regression nrc
obtained: one, bw' from the whole-plot error und one, b s ' from tho sub-plot
error~
Often thore will bo a priori reasons for be Hoving that S w and 8 s are equal.
If
this hypothesis is not contradictod by the data, what is wantod is u pooled estimAto.
As might be guessed, the
mv~imum
is the corresponding Exx /s2 e •
likelihood estimato assigns to oach b u woight that
Since tho denominutors
820
depend on tilo value of b,
~suocessivo approximation is neoded and the resulting s~~pling theory is not simplo.
Often, however, it will be evidont that b
be used sntisfactorily as tho estimato.
supplies most of tho informa.tion and can
s
An example in which this wns done has been
presentod by Bnrtlett (9).
If the whole-plot regression differs from the sub-plot regression, bw may bo
used to adjust the Whole-plot yields ond b
s
to adjust tho sub-plot yields.
struction of the table of adjusted tron.tment moans roquires a li-[jtle cnre.
ndjustment to the mco.n Yt
Where
- Xw
- nnd xg
x.c,
fOl'
- bs(x t
ConTho
lUly treo.tmcmt combination is
-
Xv;) - bwCiw - Xg }
nrc respeotivoly the x moons for the treatment combinution,
the corresponding whole-plot troa.tment end the whole experiment.
Tho sto.ndn.rd error
requirod for any t-test of the differonce botwoon a.djusted treo.tmont moans can be
worked out from the algebraic struoture of the differenoe.
Evon with a relatively simple classifica.tion, it may occasionn.lly be oonsidered
a will v~ry in differont pnrts of t!ID
tion, a might vary from oolumn to column.
thnt
data..
Thus with a two-way clo.ssifica-
If it soor.1S vvorlhwhilo. a sepc..rate b nay
- 18 be fitted for each column.
Usually there is no convenient simplification of the
normal equations in these cases, which are best treated by straight-forward regression
- - methods.
9.
The estimation of components- of variance.
Just as in the analysis of variance,
the mathematical model for covariance may be appropriate in cases where senne or all
of the unknowns Sf i' Y j
etc. are regarded not as parameters, but as random variates
whose variances we wish to estimate.
In other words, the function of the analysis
is to estimate certain components of the vari~~ce of (y - ax).
problems involved appear to have been little discussed.
The statistical
Only an introductory account
will be attempted.
C'Onsider for simplicity a one-way classification, where Yij is the jth member
of the J.'th ro'" (~
" . . = 1 "2 •• p;
J' c. 1 , 2 , ••• q ) •
The model is
e
Y~J'=)A.+~J.'+
ex··+e
..
...
P
J.J
J.J
(24)
~ i'
where
VD.r i
llucas
e ij are all normally and independently distributed with zero means nnd
2
0 r'
0
2
respectively, whilep.JB mld tho x's arc fixod.
c
By frumiliar transformations the log. of tho likolihood is expressod by tho
=
-2L
equation
>;
{(Y1j- Y
i.) - 8(xii - Xi.)} 2 +
2
i,j
00
2
o g
0
2
Th~
g
=0 2e
+ q
0
\(Y1.... Y• .) - S(xi.wg
J.:?'
+ pq \(Y.. -,.,.- 8x• .)2~+ 2plog
where
q~,
0(1'
0
+ 2p(q-l) log
0
~••• )}2
0
2
r •
closo parallcl vnth tho problem of tho split-plot design vdll be noted.
The lnaximum likelihood ostimD.to of
a
is n weighted moan of tho b' s dorivcd from
the rov/s nnd error linos in tho analysis of variance, and successive approximation
is involved.
If, hovlovor, the information about B from the rows line cnn be
ignored without too much loss of
in~ormntion,
the procoduro bocomes cnsier.
It may be shmvn by stnndnrd mothods that the residuQI error sum of squares
(E yy - E2yx!Exx) is rn unbiased ostimato of p(q-l)o 2e •nnd , iVith normality assumod,
is distributed asX2a2
e
with p(q-l) "d.f.
The adjusted sum of squares for rows.
oaloulated by the usual subtraotion prooedure, will now be oonsidered.
As in the
oase of the two-way classification, this divides into two parts.
q
t!cY
l' i •:. Y.. ) -
b
(i. - i •• )}
r 1·
~
2
(b
r
_ b )
2
e
R
E
xx xx
RxxtExx
where b
r
is the regression ooeffioient from the rows line.
Yi. = ~+ ./ i
(26)
-
a
+
-
X
Now from (24)
+ ai_
ii
so that the quantities Yi. are independent and hava a linoar regression on the x.:L. ,
2
2
with residual variance ( a r + 0 e!q.) Hence by ordinary regression theory tho
2
first tern in (25) is distributed as V2 (a
+ q02 ) with (p-2) d.f.
"
For the seoond term,
e
~
we
r
have
-••
=a+
- x
)
Rxx
=a +
ii.)
be = i:j Yij(x ij -
E e ij (x ij - ~i.)
Exx
. . Consequently (b
•
r - b e ) is normally distributed with mean zero and variance
2 )
2
( q2 + q0
a
+
-------R
0
r
(3
xx
Honee the second term
(b
r
_ b )2 R E
e
=
Xltxx
Rxx+E xx
with 1 d.!'.
To s \II!1larize
2
= (p-l)s r =
ASSA
.
Residual error 55
2
(a e +
= [Jp( g-l)
2
C1 a r)
}2
- 1
S
2
222
(p-2) + (0 e + ~ qo- r)
J...
e
1-
1-
= a2e 2
')/,
"
Iv .; p( q-l )-1 \
....
)
Further, by oustomary analysis of varianoe theory, it is easy to show that the 1ihreo
are independent.
It follows that
2
of a
is
r
8
2
e
is an wbia.sed estimate of
(s
2
r
- s
q
2
)
e
(p-l)
(p-2+ ~ )
0
2
e
; while an unbiasod
o~ti.m.ato
- 20 -
This estimate involves some loss of information at tyro stages (i) information
~in
the rows about S is ignored (ii) the single d.f. in the ASSR is presumably
given slightly too muoh weight.
It seems likely that the loss of information will
often be trivial.
If the estimate of
S from the error line is used, the results for the two-way
olassification are exactly parallel.
The best methorl for obtaining confidence limits for the ratio
is not quite clear.
N = 0- 2r. /0 2 e
One procedure would be to use the fact that
(
(ASSR)(p_2)
.. (l+qv)
+
\
.
(ASSR)(l) _~
(l+lI.qv)-J / (p_l)s2 e
I
is distributed as F with (p-l) and [p(q - 1) - 1] d.f., Whore the suffixes distinguish the two components of the ASSR.
found by trial and error.
The upper and lower limits can then be
Referoncos
(1) R. A. Fisher. St~tistical Methods for Research Workers.
Edinburgh, Oliyor and Boyd, 4th ed., (1932), c,49.1
(2) F. Yatos. A complex pig-feeding exporiment.
Vol 24 (1934), P 519.
Jour. Agr. Sci.,
(3) F. Yatos. Orthogonal functions and tests of significance in the
analysis of variance. Jour. Roy. sta.t. Soc. Suppl., Vol 5,
(1938), pp. 177-180.
(4) A. Wnld. Lectures on the analysis of vnrinnce and covnrionco.
Columbia University. (1946).
(5)
C. I. Bliss. An experimontal design for slope-ra.tio
Ann. Mnth. Stnt., Vol 17, (1946), pp 232-237.
(6)
D. J. Finney. Stnndcrd errors of yields adj usted for rogression
on an indopendent mea.suroment. Biom. Bull. Vol 2, (1946),
pp 53-55.
(7)
w.
QSSQY.
G. Cochrc~. Tho ona.lysis of lattico and triple Inttico exporiments in corn vnriotQl tosts. II. Mnthcm~ticnl theory.
IowD. Agr. Exp. sta. Res. Bull. 281, (1940), pp 64-65.
(8) Iv!. S. Bc.rtlott. A note on tho one.lysis of cova.rianco.
Sci., Vol 26, (1936), pp 488-491.
Jour. Agr.
(9) M. S. B:)r-blett. Somo oxcunplcs of stc,tisticn1 mothods of research
in agrioulturo c~d ~pp1iod biology. Jour. Roy. stnt. Soc.
Suppl., Vol 4, (1937), pp 142-146.
© Copyright 2026 Paperzz