Hatheway, W. H. and E. J. Williams (1957). "Efficient estimation of the relationship between plot size and the variability of crop yields." Reprint 119.

Efficient Estimation of the Relationship between Plot Size
and the Variability of Crop Yields
W. H. Hatheway and E. J. Williams
Institute of Statistics
Mimeo Series No. 174
June, 1957
Efficient Estimation of the Relationship between Plot Size
and the Variability of Crop Yields*
W. H. Hatheway and E. J. Williams**
1. Introduction
The optimum size of plot in field experimentation depends on the relationship
between fixed costs and costs varying with number of units, and on soil variability.
Perhaps the most useful measure of soil heterogeneity yet devised is that of Smith
(1938), who showed empirically that the logarithm of the variance between plots of a
given size was linearly related to the logarithm of the size of the plot. In the
present paper we consider only the relationship between size and variability. The
objects of the paper are, firstly, to show how efficient estimates of the constants
in this relationship may be determined, and secondly, to illustrate a general method
of determining efficient linear estimates when the data are, as in the present
instance, correlated and of unequal variability.
Koch and Rigney (1951) demonstrated that the regression coefficient of the
logarithm of variance on the logarithm of plot size could be estimated from
experimental data in which treatment effects are present, as well as from the data of
uniformity trials. They noted that Smith had recommended that, in determining the
regression coefficient $b$, the variances of the different sized plots should be
weighted by their respective degrees of freedom. In fact, since the variance
estimates for different sizes of plot, both in uniformity trials and experimental data,
are built up from common components, they are frequently highly correlated, so that
a simple weighting by degrees of freedom is not accurate. Koch and Rigney point out
this difficulty for experimental data, but do not seem to have realized that their
arguments apply with equal force to uniformity trial data.
* Paper No. 57 of the Agricultural Journal Series of the Rockefeller Foundation.
** Rockefeller Foundation, Agricultural Field Staff, and the Institute of Statistics,
Raleigh, N. C.
The present paper presents a method of weighting observed variances of different-sized
plots which leads to an unbiased estimate of $b$ with asymptotically minimum variance.
It is applicable both to uniformity trial data and to experimental data; in the latter
case the analysis of variance is in effect reconstructed to simulate one derived
directly from uniformity trial data, in the manner suggested by Koch and Rigney.
2. Estimation from Uniformity Trial Data
Koch and Rigney showed that a uniformity trial subdivided to simulate a split-plot
or lattice design could be analyzed in the manner shown below; a randomized
block arrangement could similarly be superimposed on the trial, though it would not
provide so much information about the relationship of variability to plot size.
Source                        Degrees of    Mean      Expectation of
                              Freedom       Square    Mean Square

Replications                  d-1           $V_1$     S + aP + abQ + abcR
Blocks within replications    d(c-1)        $V_2$     S + aP + abQ
Plots within blocks           cd(b-1)       $V_3$     S + aP
Subplots within plots         bcd(a-1)      $V_4$     S
The variance of plots the size of a complete replication is $V'_1 = V_1$, the replication
mean square as it appears in the analysis of variance. The variance of plots the
size of blocks contains, in addition to the variation due to blocks within
replications, that removed by the stratification of groups of blocks into replications in
the analysis of variance. Thus the total sum of squares for blocks is
$$d(c-1)V_2 + (d-1)V_1,$$
and since there are cd blocks, its mean square is
$$V'_2 = [d(c-1)V_2 + (d-1)V_1]/(cd-1).$$
Similarly, the variance between plots over the entire area is
$$V'_3 = [cd(b-1)V_3 + d(c-1)V_2 + (d-1)V_1]/(bcd-1),$$
and the variance between subplots over the entire area is
$$V'_4 = [bcd(a-1)V_4 + cd(b-1)V_3 + d(c-1)V_2 + (d-1)V_1]/(abcd-1).$$
These formulas are formally identical to those given by Koch and Rigney, who
expressed their results in terms of components of variance.
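The pooling of mean squares described above can be sketched in a few lines. This is only an illustration: the function name `pooled_plot_variances` is hypothetical, and the inputs are the mean squares of the numerical example (Table 1) with a = 2, b = 2, c = 12, d = 3.

```python
def pooled_plot_variances(mean_squares, dfs):
    """Pool nested mean squares into the variances V'_1, V'_2, ... over the whole area.

    mean_squares and dfs are ordered from the largest unit (replications)
    down to the smallest (subplots).
    """
    pooled = []
    ss_total = 0.0   # running sum of squares, e.g. d(c-1)V2 + (d-1)V1
    df_total = 0     # running degrees of freedom, e.g. cd - 1
    for ms, df in zip(mean_squares, dfs):
        ss_total += df * ms
        df_total += df
        pooled.append(ss_total / df_total)
    return pooled


a, b, c, d = 2, 2, 12, 3
# Pooling degrees of freedom: d-1, d(c-1), cd(b-1), bcd(a-1)
dfs = [d - 1, d * (c - 1), c * d * (b - 1), b * c * d * (a - 1)]
v_prime = pooled_plot_variances([452, 10589, 5938, 2862], dfs)
print([round(v) for v in v_prime])
```

The rounded results agree with the values 452, 10010, 7945 and 5386 quoted in the numerical example.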
Smith's regression coefficient b is defined by the formula
$$\log V_x = \log V_1 - b \log x,$$
where x is the number of units per plot, $V_1$ is the variance among plots one unit in
size, and $V_x$ is the variance of mean per unit area for plots of size x units. For
purposes of estimating optimum plot size, the coefficient b is alone of interest.
In the computations suggested by Koch and Rigney, the values of $V_x$ are obtained by
dividing each value of $V'$ by the number of units per replication, block, plot or
subplot, thus putting them on a unit basis. According to them, b is given as an
unweighted regression coefficient:
$$b_1 = \frac{\sum_j y_j (x_j - \bar{x})}{\sum_j (x_j - \bar{x})^2}, \qquad (1)$$
where $y_j = \log (V'_j/x)$, $x_j = \log x$, and $\bar{x} = \sum_j x_j / n$.
As was pointed out to one of the authors by D. D. Mason, Department of Experimental
Statistics, N. C. State College, application of this formula for $b_1$ sometimes
gave results less than -1, which are unacceptable on physical grounds. It was
realized that the above estimate (1) would often be inaccurate owing to the equal
weighting of y-values of differing variability. It was therefore decided to apply
to the different terms in the sums of squares and products defining the regression
coefficient, weights that would lead to an estimate of minimum variance.
The appropriate weights are the elements of the inverse of the covariance matrix (i.e.
the information matrix) of the values of y. If these elements are designated $w_{jk}$,
the estimate is
$$b_2 = \frac{\sum_j \sum_k w_{jk}\, y_j (x_k - \bar{x})}{\sum_j \sum_k w_{jk}\, x_j (x_k - \bar{x})} = U/T \ \text{(say)}, \qquad (2)$$
where
$$\bar{x} = \sum_j \sum_k w_{jk} x_j \Big/ \sum_j \sum_k w_{jk}.$$
The weights $w_{jk}$ will have to be estimated from the data and will be to that
extent inaccurate; but apart from this source of error, the effect of which we do
not consider, the estimate will be of minimum variance; this variance is in fact $T^{-1}$.
When, as is often the case, there are more than two variance estimates from
which to compute the regression, we may also test the significance of departure
from regression.
The weighted total sum of squares of the $y_j$ is
$$V = \sum_j \sum_k w_{jk}\, y_j (y_k - \bar{y}),$$
where
$$\bar{y} = \sum_j \sum_k w_{jk} y_j \Big/ \sum_j \sum_k w_{jk},$$
with n-1 degrees of freedom, n being the number of variance estimates. The sum of
squares attributable to regression on $x_j$ is $U^2/T$.
Hence the sum of squares for departure from regression is
$$V - U^2/T,$$
which is distributed approximately as $\chi^2$ with n-2 degrees of freedom, and may be
tested accordingly.
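As a minimal sketch of estimate (2) and the departure test, the weighted sums can be evaluated directly from a weight matrix. The function name, the dictionary keys and the toy data below are illustrative assumptions; with identity weights the estimate reduces to the ordinary unweighted slope, which gives a simple check.

```python
def weighted_regression(w, x, y):
    """Weighted slope b2 = U/T and departure sum of squares V - U^2/T.

    w is a symmetric n-by-n weight (information) matrix; x and y are lists.
    """
    n = len(x)
    w_sum = sum(w[j][k] for j in range(n) for k in range(n))
    # Weighted column sums: wx[k] = sum_j w_jk x_j, wy[k] = sum_j w_jk y_j
    wx = [sum(w[j][k] * x[j] for j in range(n)) for k in range(n)]
    wy = [sum(w[j][k] * y[j] for j in range(n)) for k in range(n)]
    t = sum(x[j] * wx[j] for j in range(n)) - sum(wx) ** 2 / w_sum
    u = sum(y[j] * wx[j] for j in range(n)) - sum(wx) * sum(wy) / w_sum
    v = sum(y[j] * wy[j] for j in range(n)) - sum(wy) ** 2 / w_sum
    return {"b2": u / t, "var_b2": 1.0 / t, "departure": v - u * u / t, "df": n - 2}


# Toy data lying exactly on y = 1 - 0.5 x, with identity weights (hypothetical values)
identity = [[1.0 if j == k else 0.0 for k in range(4)] for j in range(4)]
result = weighted_regression(identity, [0.0, 1.0, 2.0, 3.0], [1.0, 0.5, 0.0, -0.5])
print(result)
```

Because the toy points lie exactly on a line, the slope is recovered and the departure sum of squares is zero apart from rounding.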
It now remains to estimate the weights. Since $V_1$, $V_2$, $V_3$, and $V_4$ are
independent, and their variances are $2V_1^2/(d-1)$, $2V_2^2/d(c-1)$, $2V_3^2/cd(b-1)$,
and $2V_4^2/bcd(a-1)$ respectively, it is not difficult to determine the variances and
covariances of $V'_1$, $V'_2$, $V'_3$, and $V'_4$, which are linear functions of the
former set. In fact, not only the variance of $V'_1$, but also its covariances with the
other $V'_i$, are proportional to $V_1^2$. Likewise the variance of $V'_2$ is estimated as
$$2[d(c-1)V_2^2 + (d-1)V_1^2]/(cd-1)^2,$$
and its covariances with $V'_3$ and $V'_4$ are proportional to this. Thus we find the
covariance matrix of the $V'_i$ to be as follows:
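$$
\begin{pmatrix}
\dfrac{D}{(d-1)^2} & \dfrac{D}{(d-1)(cd-1)} & \dfrac{D}{(d-1)(bcd-1)} & \dfrac{D}{(d-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(cd-1)} & \dfrac{C+D}{(cd-1)^2} & \dfrac{C+D}{(cd-1)(bcd-1)} & \dfrac{C+D}{(cd-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(bcd-1)} & \dfrac{C+D}{(cd-1)(bcd-1)} & \dfrac{B+C+D}{(bcd-1)^2} & \dfrac{B+C+D}{(bcd-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(abcd-1)} & \dfrac{C+D}{(cd-1)(abcd-1)} & \dfrac{B+C+D}{(bcd-1)(abcd-1)} & \dfrac{A+B+C+D}{(abcd-1)^2}
\end{pmatrix}
$$
where
$$D = 2(d-1)V_1^2, \quad C = 2d(c-1)V_2^2, \quad B = 2cd(b-1)V_3^2, \quad A = 2bcd(a-1)V_4^2.$$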
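The inverse matrix is found to be even simpler in form; as may be verified, it is
$$
\begin{pmatrix}
(d-1)^2\left(\dfrac{1}{C}+\dfrac{1}{D}\right) & -\dfrac{(d-1)(cd-1)}{C} & 0 & 0 \\[2ex]
-\dfrac{(d-1)(cd-1)}{C} & (cd-1)^2\left(\dfrac{1}{B}+\dfrac{1}{C}\right) & -\dfrac{(cd-1)(bcd-1)}{B} & 0 \\[2ex]
0 & -\dfrac{(cd-1)(bcd-1)}{B} & (bcd-1)^2\left(\dfrac{1}{A}+\dfrac{1}{B}\right) & -\dfrac{(bcd-1)(abcd-1)}{A} \\[2ex]
0 & 0 & -\dfrac{(bcd-1)(abcd-1)}{A} & \dfrac{(abcd-1)^2}{A}
\end{pmatrix}
$$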
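The claim that the tridiagonal matrix is the inverse of the covariance matrix can be checked numerically. The sketch below builds both matrices for a uniformity trial and multiplies them; the function names and the trial dimensions and mean squares passed in are hypothetical values chosen only for the check.

```python
def covariance_and_inverse(a, b, c, d, v1, v2, v3, v4):
    """Build the covariance matrix of V'_1..V'_4 and its stated inverse."""
    n = [d - 1, c * d - 1, b * c * d - 1, a * b * c * d - 1]
    D = 2 * (d - 1) * v1 ** 2
    C = 2 * d * (c - 1) * v2 ** 2
    B = 2 * c * d * (b - 1) * v3 ** 2
    A = 2 * b * c * d * (a - 1) * v4 ** 2
    cum = [D, C + D, B + C + D, A + B + C + D]
    # Element (j, k) of the covariance matrix uses the cumulative term of the
    # smaller index, divided by the two pooled degrees of freedom.
    cov = [[cum[min(j, k)] / (n[j] * n[k]) for k in range(4)] for j in range(4)]
    inv = [[0.0] * 4 for _ in range(4)]
    inv[0][0] = n[0] ** 2 * (1 / C + 1 / D)
    inv[1][1] = n[1] ** 2 * (1 / B + 1 / C)
    inv[2][2] = n[2] ** 2 * (1 / A + 1 / B)
    inv[3][3] = n[3] ** 2 / A
    for j, term in enumerate((C, B, A)):
        inv[j][j + 1] = inv[j + 1][j] = -n[j] * n[j + 1] / term
    return cov, inv


def matmul(p, q):
    """Plain 4-by-4 matrix product."""
    return [[sum(p[i][m] * q[m][k] for m in range(4)) for k in range(4)] for i in range(4)]


# Hypothetical trial: a = 2, b = 2, c = 3, d = 2 with mean squares 4, 3, 2, 1
cov, inv = covariance_and_inverse(2, 2, 3, 2, 4.0, 3.0, 2.0, 1.0)
product = matmul(cov, inv)
max_error = max(abs(product[i][k] - (1.0 if i == k else 0.0))
                for i in range(4) for k in range(4))
print(max_error)
```

The product differs from the identity matrix only by floating-point rounding.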
The weights for $y_j$ (= $\log V'_j$) are obtained by multiplying each row and each
column of this inverse matrix by the corresponding $V'_j$. This result follows from
the approximate formula
$$\mathrm{cov}(\log V'_j, \log V'_k) \approx \mathrm{cov}(V'_j, V'_k)/(V'_j V'_k).$$
If, as is usual in practical computation, logarithms to base 10 rather than natural
logarithms are taken, the weights will need to be multiplied by the factor
$$M^{-2} = (\log_{10} e)^{-2}.$$
We shall deal here with the transformation to natural logarithms, and indicate the
adjustments necessary for common logarithms below.
Thus, from the inverse matrix,
$$w_{11} = [(d-1)V'_1]^2 \left(\frac{1}{C} + \frac{1}{D}\right), \qquad w_{12} = w_{21} = -[(d-1)V'_1][(cd-1)V'_2]/C,$$
and so on. The weights may thus be determined from the inverse matrix without too much
difficulty.
It will be found that the sum of the elements of the weight matrix is equal to
half the total number of degrees of freedom for the sums of squares from which variance
estimates are derived. This may be seen in the following way. If the variances are
unaffected by size of plot, then all the available sums of squares are estimates of
the same basic variance. The different estimates of the logarithm of the variance,
derived from different lines of the analysis of variance, are independent, and have
asymptotic variance equal to twice the reciprocal of the corresponding degrees of
freedom. Consequently the information from each is half the degrees of freedom,
whence the total information is half the total degrees of freedom.
Thus, for data from uniformity trials, the sum of the weights is
$$\tfrac{1}{2}(abcd-1),$$
while for data from split-plot experiments, the sum of the weights will be
$$\tfrac{1}{2}\,bc(ad-1).$$
Similar results may be derived for lattices and other types of experimental
design. They provide a convenient check on the computation of the weights.
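To determine the regression coefficient and to test the departure from regression,
the calculation is best carried out in stages, as follows. Let
$$X_k = \sum_j w_{jk} x_j \quad \text{and} \quad Y_k = \sum_j w_{jk} y_j.$$
Then the sum of squares of x is
$$T = \sum_j x_j X_j - \Big(\sum_j X_j\Big)^2 \Big/ \sum_j \sum_k w_{jk};$$
similarly the sum of products of y with x is
$$U = \sum_j X_j y_j - \Big(\sum_j X_j\Big)\Big(\sum_j Y_j\Big) \Big/ \sum_j \sum_k w_{jk}
    = \sum_j Y_j x_j - \Big(\sum_j X_j\Big)\Big(\sum_j Y_j\Big) \Big/ \sum_j \sum_k w_{jk},$$
and the sum of squares of y is
$$V = \sum_j Y_j y_j - \Big(\sum_j Y_j\Big)^2 \Big/ \sum_j \sum_k w_{jk}.$$
Then $b_2 = U/T$.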
The variance of the estimate $b_2$ is, to the degree of approximation of the
analysis, $T^{-1}$.
Hence, approximate confidence limits for the population regression coefficient $\beta$
are
$$b_2 \pm t\,T^{-1/2},$$
t being the normal deviate at the required level of probability.
Departure from regression is tested, as indicated above, by means of
$$V - U^2/T,$$
which is regarded as $\chi^2$ with n-2 degrees of freedom.
When common logarithms are used, the value of $b_2$ is determined as above, but
its variance is now
$$T^{-1}/5.302 = 0.1886\,T^{-1},$$
and the corresponding confidence limits are
$$b_2 \pm 0.4343\,t\,T^{-1/2}.$$
The sum of squares for departure from regression is also altered, to
$$5.302\,(V - U^2/T).$$
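The constants 5.302, 0.1886 and 0.4343 all derive from the modulus of common logarithms, and the confidence limits can be sketched as follows. The function name, its default deviate of 1.96, and the example inputs are assumptions made for illustration only.

```python
import math

M = math.log10(math.e)   # modulus of common logarithms, about 0.4343
WEIGHT_FACTOR = M ** -2  # about 5.302, multiplies the weights and chi-square
VARIANCE_FACTOR = M ** 2  # about 0.1886, multiplies the variance 1/T


def confidence_limits(b2, t_sum_sq, deviate=1.96, common_logs=True):
    """Approximate limits b2 -/+ deviate * sqrt(var(b2)).

    t_sum_sq is the weighted sum of squares T computed without the
    common-logarithm factor, as in the text.
    """
    var_b2 = (VARIANCE_FACTOR if common_logs else 1.0) / t_sum_sq
    half_width = deviate * math.sqrt(var_b2)
    return b2 - half_width, b2 + half_width


# Hypothetical inputs: b2 = -0.5 and T = 25
lower, upper = confidence_limits(-0.5, 25.0)
print(round(WEIGHT_FACTOR, 3), round(VARIANCE_FACTOR, 4), lower, upper)
```

With common logarithms the half-width is 0.4343 times the value that would apply to natural logarithms.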
It should be observed that the key to these computations is the covariance
matrix of the variances $V'_i$ of the plots of different sizes. Because these variances
are expressed as linear combinations of the original mean squares, which are
independent, and not in terms of the variance components, which are correlated with one
another, the resulting covariance matrix, and its inverse, take on a relatively
simple form.
3. Estimation from experimental data
When variance components are to be estimated from experimental data, the estimates
are calculated in the same way as from uniformity trial data. However, since
a number of comparisons are given over to the estimation of treatment effects, the
different plot and block variances are estimated with fewer degrees of freedom,
and hence less precision, than they could have been in a uniformity trial. Apart
from this complication, for which allowance must be made in determining the weights
for the various components, the determination of a linear unbiased estimate with
asymptotic minimum variance follows the same lines as that given in the previous
section. The method is illustrated by the analysis for a split-plot experiment in
the form given by Koch and Rigney. It will be noted that, in this model, it is
assumed that block-treatment interactions do not exist.
Source                             Degrees of      Mean      Expectation of
                                   freedom         Square    mean square

Replications                       d-1             $V_1$     S + aP + abQ + abcR
Treatments (1)                     c-1                       S + aP + abQ + treatment effects
Error (1)                          (c-1)(d-1)      $V_2$     S + aP + abQ
Total between whole plots          cd-1
Treatments (2) and interactions    c(b-1)                    S + aP + treatment effects
Error (2)                          c(b-1)(d-1)     $V_3$     S + aP
Split-plots                        cd(b-1)
Sampling error                     bcd(a-1)        $V_4$     S
As for a uniformity trial, the estimated variance of plots the size of a complete
replication is $V_1$. Since it is estimated with the full d-1 degrees of
freedom, its variance is as given in the previous section.
In estimating the mean square for blocks (i.e. whole plots) we must allow for
the fact that, of the d(c-1) comparisons between blocks within replications, only
(c-1)(d-1) are available for estimating the variance, the other c-1 containing
treatment effects. Thus, as before, the estimated variance between blocks is
$$V'_2 = [d(c-1)V_2 + (d-1)V_1]/(cd-1),$$
but its estimated variance is now increased to
$$2[d^2(c-1)V_2^2/(d-1) + (d-1)V_1^2]/(cd-1)^2.$$
The variance of $V_3$ has similarly to be adjusted by a factor $d/(d-1)$.
The analysis now proceeds as for uniformity trials, and the inverse matrix is
as given above, provided we redefine
$$D = 2(d-1)V_1^2,$$
$$C = 2d^2(c-1)V_2^2/(d-1),$$
$$B = 2cd^2(b-1)V_3^2/(d-1),$$
$$A = 2bcd(a-1)V_4^2.$$
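The redefined quantities are simple to evaluate. As an illustrative sketch (the function name `split_plot_terms` is hypothetical), the call below uses the mean squares and dimensions of the numerical example in Section 4.

```python
def split_plot_terms(a, b, c, d, v1, v2, v3, v4):
    """Return (A, B, C, D) as redefined for a split-plot experiment.

    C and B carry the extra factor d/(d-1) because Error (1) and Error (2)
    have fewer degrees of freedom than in a uniformity trial.
    """
    D = 2 * (d - 1) * v1 ** 2
    C = 2 * d ** 2 * (c - 1) * v2 ** 2 / (d - 1)
    B = 2 * c * d ** 2 * (b - 1) * v3 ** 2 / (d - 1)
    A = 2 * b * c * d * (a - 1) * v4 ** 2
    return A, B, C, D


A, B, C, D = split_plot_terms(2, 2, 12, 3, 452, 10589, 5938, 2862)
print(A, B, C, D)
```

To the number of figures printed in the numerical example, A, B and C agree with the values quoted there; D evaluates to 817,216, which the example rounds to 800,000.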
4. Numerical example
The computations required in the proposed method are illustrated in a numerical
example, the data for which, set out in Table 1, were kindly furnished by D. D.
Mason.
Table 1
Soybean Yield Trial Conducted by C. A. Brim,
U. S. Department of Agriculture, at Willard, North Carolina, 1956.

Source                Degrees of Freedom      Mean Square

Replications           2 = d-1                   452 = $V_1$
Varieties             11 = c-1                30,401
Experimental Error    22 = (d-1)(c-1)         10,589 = $V_2$
Rows in Plots         36 = cd(b-1)             5,938 = $V_3$
Subplots in Rows      72 = bcd(a-1)            2,862 = $V_4$

Here a = 2, b = 2, c = 12, d = 3.
I
In the determination of the weights we calf work with multiples of the V j more
I
e
conveniently than with the V. themselves. This cdevice makes the computations
J
simpler as well as more accurate. We have
I
(d-l) VI ..
I
=
(cd-l)V 2
I
=
=
2 VI
35 V
I
2
"2V
+33 V
l
904
.. 350341
2
I
71 V .. 2 VI + 33 V + 36 V
= 564109
2
3
3
3
I
I
(abed-I) V = 143 v .. 2 VI + 33 V + 36 V + 72 V = 770173
2
4
4
3
4
This gives
(bcd-I) V
I
= 452
V2 = 10010
V = 7945
3
V .. 5386
4
VI
I
I
I
The number of units per plot corresponding to the different-sized plots the
variances of which are $V'_1$, $V'_2$, $V'_3$, and $V'_4$ are 48, 4, 2, and 1
respectively. Putting the variances $V'$ on a unit basis and taking logarithms, we
obtain the values given in Table 2.

Table 2
Logarithms of Relative Plot Sizes (x) and Unit Variances (y)

  x          y
0.0000     3.7313
0.3010     3.5891
0.6021     3.3984
1.6812     0.9739
The unweighted regression coefficient is then
$$b_1 = \frac{4.7638 - 7.5544}{3.2796 - 1.6697} = -1.7334.$$
As Koch and Rigney point out, b is an index of soil variability; it should
vary between zero and minus one. A value of zero indicates perfect correlation
(extreme uniformity) among the units making up a plot; a value of minus one
indicates no correlation. Clearly the value obtained in the present example can
have no unambiguous physical interpretation. Here it is apparent that weighting
a low mean square for replications, based on only two degrees of freedom, equally
with others based on many more degrees of freedom, has led to an unreasonable
estimate of soil variability.
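The unweighted coefficient can be reproduced from the entries of Table 2 with formula (1). The following is a small sketch; the function name `unweighted_slope` is an illustrative choice.

```python
def unweighted_slope(x, y):
    """Formula (1): sum of y_j (x_j - xbar) over sum of (x_j - xbar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    numerator = sum(yj * (xj - x_bar) for xj, yj in zip(x, y))
    denominator = sum((xj - x_bar) ** 2 for xj in x)
    return numerator / denominator


# Table 2: common logarithms of plot size (x) and of unit variance (y)
x = [0.0000, 0.3010, 0.6021, 1.6812]
y = [3.7313, 3.5891, 3.3984, 0.9739]
b1 = unweighted_slope(x, y)
print(round(b1, 4))
```

The single point for whole replications (x = 1.6812) lies far from the other three and dominates the unweighted fit.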
Using the method proposed in the present paper, y and x are as before. The
weights are the elements $w_{jk}$ of the information matrix of the y. To obtain these
numbers it is convenient first to calculate
$$A = 2bcd(a-1)V_4^2 = 1,179,500,000$$
$$B = 2cd^2(b-1)V_3^2/(d-1) = 3,808,100,000$$
$$C = 2d^2(c-1)V_2^2/(d-1) = 11,100,600,000$$
$$D = 2(d-1)V_1^2 = 800,000$$
Then
$$w_{11} = [(d-1)V'_1]^2 \left(\frac{1}{C} + \frac{1}{D}\right) = 1.00$$
$$w_{12} = w_{21} = -[(d-1)V'_1][(cd-1)V'_2]/C = -0.03$$
The remaining elements are computed in similar fashion. The completed information
matrix is
$$
W = \begin{pmatrix}
1.00 & -0.03 & 0.00 & 0.00 \\
-0.03 & 43.29 & -51.90 & 0.00 \\
0.00 & -51.90 & 353.36 & -368.34 \\
0.00 & 0.00 & -368.34 & 502.89
\end{pmatrix},
$$
the sum of whose elements is $\tfrac{1}{2}\,bc(ad-1) = 60$, as may be verified.
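The information matrix can be rebuilt from the multiples $(d-1)V'_1$, ..., $(abcd-1)V'_4$ and the quantities A, B, C, D. The sketch below uses unrounded values of A, B, C, D computed from Table 1 (an assumption about working precision; the text prints them rounded), and the function name `weight_matrix` is hypothetical.

```python
def weight_matrix(s, terms):
    """Weights for the log variances.

    s holds the multiples n_j * V'_j, largest plot first; terms is (A, B, C, D).
    """
    A, B, C, D = terms
    w = [[0.0] * 4 for _ in range(4)]
    w[0][0] = s[0] ** 2 * (1 / C + 1 / D)
    w[1][1] = s[1] ** 2 * (1 / B + 1 / C)
    w[2][2] = s[2] ** 2 * (1 / A + 1 / B)
    w[3][3] = s[3] ** 2 / A
    for j, term in enumerate((C, B, A)):
        w[j][j + 1] = w[j + 1][j] = -s[j] * s[j + 1] / term
    return w


s = [904, 350341, 564109, 770173]
terms = (1179510336, 3808063152, 11100565179, 817216)  # unrounded A, B, C, D
W = weight_matrix(s, terms)
total = sum(sum(row) for row in W)
for row in W:
    print([round(value, 2) for value in row])
print(round(total, 6))
```

The sum of the elements serves as the check described in the text: it should equal 60 for these data.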
We now compute the set
$$Y_k = \sum_j w_{jk} y_j.$$
Thus
$$Y_1 = 0.87;$$
similarly
$$Y_2 = -39.19, \quad Y_3 = -282.52, \quad Y_4 = 554.42, \quad \sum_j Y_j = 233.58.$$
In the same way we compute the $X_k$:
$$X_1 = 1.66, \quad X_2 = 10.39, \quad X_3 = 75.11, \quad X_4 = -110.87, \quad \sum_j X_j = -23.71.$$
Then
$$T = \sum_j x_j X_j - \Big(\sum_j X_j\Big)^2 \Big/ \sum_j \sum_k w_{jk}
    = 31.65 - (-23.71)^2/60 = 22.28.$$
Similarly
$$U = -14.87$$
and
$$V = 13.05.$$
Hence
$$b_2 = -14.87/22.28 = -0.667,$$
$$V(b_2) = 0.1886/22.28 = 0.00846,$$
$$\text{Standard error} = 0.092.$$
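The staged calculation can be retraced from the printed information matrix and Table 2. This is a sketch under two stated assumptions: the index order 1 to 4 runs from the largest plot (replications, x = 1.6812) to the smallest (x = 0), which is the order implied by the printed $X_k$ and $Y_k$; and the rounded printed weights are used, so the results agree with the text only to about two decimals.

```python
import math

# Information matrix as printed (rounded to two decimals)
W = [
    [1.00, -0.03, 0.00, 0.00],
    [-0.03, 43.29, -51.90, 0.00],
    [0.00, -51.90, 353.36, -368.34],
    [0.00, 0.00, -368.34, 502.89],
]
# Table 2, reordered so that index 0 is the largest plot
x = [1.6812, 0.6021, 0.3010, 0.0000]
y = [0.9739, 3.3984, 3.5891, 3.7313]
n = len(x)

w_sum = sum(sum(row) for row in W)
X = [sum(W[j][k] * x[j] for j in range(n)) for k in range(n)]
Y = [sum(W[j][k] * y[j] for j in range(n)) for k in range(n)]

T = sum(x[j] * X[j] for j in range(n)) - sum(X) ** 2 / w_sum
U = sum(y[j] * X[j] for j in range(n)) - sum(X) * sum(Y) / w_sum
V = sum(y[j] * Y[j] for j in range(n)) - sum(Y) ** 2 / w_sum

b2 = U / T
# Common logarithms were used, so the variance 1/T is scaled by (log10 e)^2
std_error = math.sqrt(math.log10(math.e) ** 2 / T)
print(round(b2, 3), round(std_error, 3))
```

Small differences from the printed T, U and V arise from the rounding of the weights, since each is a difference of much larger terms.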
As a matter of interest, the variance of $b_1$ was also determined. This
variance is given by
$$V(b_1) = \sum_j \sum_k w^{jk} (x_j - \bar{x})(x_k - \bar{x}) \Big/ \Big[\sum_j (x_j - \bar{x})^2\Big]^2,$$
where the $w^{jk}$ are the elements of the inverse of the weight matrix; in other words
they are the elements of the covariance matrix of the $y_j$'s. Here $\bar{x}$ is the
unweighted mean of the $x_j$.
The variance of $b_1$ was found to be 0.1644, giving a standard error of 0.406.
Thus the efficiency of this estimate is
$$\frac{0.00846}{0.1644} = 5 \text{ per cent}.$$
With such a large standard error, any estimate of a quantity lying between 0 and 1
is of little value.
To test departure from regression, we have
$$\chi^2_{(2)} = 5.302\,(V - U^2/T) = 5.302 \times 3.125 = 16.57.$$
Since this value exceeds the 1 per cent point of the $\chi^2$ distribution, the data
depart significantly from the assumed linear relationship.
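The test statistic follows from the printed values of V, U and T. For two degrees of freedom the upper tail of the chi-square distribution is exactly exp(-x/2), so a tail probability can be attached without tables; that step is an addition for illustration, as is the function name.

```python
import math


def departure_chi_square(v, u, t, common_logs=True):
    """Departure sum of squares V - U^2/T, scaled by (log10 e)^-2 for common logs."""
    factor = math.log10(math.e) ** -2 if common_logs else 1.0
    return factor * (v - u * u / t)


chi2 = departure_chi_square(13.05, -14.87, 22.28)
p_value = math.exp(-chi2 / 2)   # exact upper tail for 2 degrees of freedom
one_per_cent_point = -2 * math.log(0.01)
print(round(chi2, 2), p_value, round(one_per_cent_point, 2))
```

The statistic lies well above the 1 per cent point for two degrees of freedom.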
5. Allowance for departure from empirical relationship
Departure from linearity (i.e., from the empirical law of Fairfield Smith)
may cause concern in some examples. In such cases, provided cost data are available,
optimum plot size may be estimated with reasonable accuracy without the assumption
that the empirical law holds.
Suppose the cost of r replications is
$$r(K_1 + K_2 x),$$
where $K_1$ is the cost of a plot (regardless of size), $K_2$ is the cost per unit of
plot and x is the number of units per plot.
We then require to minimize $V_x/r$, subject to the condition that $r(K_1 + K_2 x)$ is
fixed. This is equivalent to minimizing
$$F(x) = V_x (K_1 + K_2 x)$$
with respect to x. If F(x) can be determined from experimental data for a few
values of x, its minimum may be fairly easily determined graphically.
Example 2
Johnson and Hixon (1952) have reported a 100 per cent cruise of 40 acres of
old-growth Douglas fir timber in Oregon. The data consist of timber volume on each
of 1600 1/40-acre plots in a 40 x 40 square. The analysis of variance, with
stratification to eliminate systematic variation between sets of 8 rows and sets of 8
columns, has been worked out in the manner shown in the table below:

Source                                        Degrees of     Mean Square
                                              Freedom

Among 1.6-acre plots                             16          277106 = $V_1$
Among 0.4-acre plots in 1.6-acre plots           75           93947 = $V_2$
Among 0.1-acre plots in 0.4-acre plots          300           73012 = $V_3$
Among 0.025-acre plots in 0.1-acre plots       1200          100744 = $V_4$

The large mean square among 0.025-acre plots indicates competitive effects.
Since the average diameter of trees measured was 45 inches, such a result is
hardly surprising.
Here

                        Number of 0.025-acre
                        units per plot

$V'_1$ = 277106               64
$V'_2$ = 138349               16
$V'_3$ =  89223                4
$V'_4$ =  97869                1

When the values of the $V'$ are adjusted to a unit basis and plotted, departures
from linearity appear serious.
Johnson and Hixon also estimate the number of plots of different sizes which
can be measured in a four-hour cruise of 40 acres. Converting these data to mean
minutes per plot for plots of different sizes, we obtain

Kind of Plot        Number of                 Mean Minutes
                    0.025-acre Units (x)      per Plot (t)

1/2 x 1 chain              2                      7.50
1/2 x 2 chains             4                     11.10
1/2 x 4 chains             8                     16.14
1/2 x 6 chains            12                     22.16
Assuming cost per plot (in minutes) to be linearly related to size of plot,
we may write
$$T = K_1 + K_2 x,$$
where T is total cost per plot (measured in minutes per plot),
$K_1$ is a constant (measured in minutes per plot),
$K_2$ is a constant (measured in minutes per unit area),
x is the number of 0.025-acre units per plot.
From the data given above we obtain
$$T = 4.9 + 1.43x.$$
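The paper does not state how the cost line was fitted. Assuming an ordinary least-squares fit to the four tabulated points, a short sketch reproduces the quoted constants; the function name `fit_line` is illustrative.

```python
def fit_line(x, t):
    """Ordinary least-squares intercept K1 and slope K2 for t = K1 + K2 x."""
    n = len(x)
    x_bar = sum(x) / n
    t_bar = sum(t) / n
    k2 = (sum((xi - x_bar) * (ti - t_bar) for xi, ti in zip(x, t))
          / sum((xi - x_bar) ** 2 for xi in x))
    k1 = t_bar - k2 * x_bar
    return k1, k2


# Units per plot and mean minutes per plot from the table above
k1, k2 = fit_line([2, 4, 8, 12], [7.50, 11.10, 16.14, 22.16])
print(round(k1, 1), round(k2, 2))
```

The agreement suggests, but does not prove, that least squares was the method used.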
Thus we may compute F(x):

x (number of            $K_1 + K_2 x$     $V_x = V'/x$     $F(x) = V_x (K_1 + K_2 x)$
0.025-acre units)

 1                           6.33            97869               619500
 4                          10.62            22306               236900
16                          27.78             8647               240200
64                          96.42             4330               417500
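The table of F(x) can be regenerated from the pooled variances and the fitted cost line. In this sketch the dictionary layout and variable names are illustrative; the inputs are the values of $V'$ and the constants 4.9 and 1.43 given above.

```python
K1, K2 = 4.9, 1.43

# Pooled variances V' keyed by the number of 0.025-acre units per plot
v_prime = {1: 97869, 4: 89223, 16: 138349, 64: 277106}

rows = []
for units, variance in sorted(v_prime.items()):
    v_x = variance / units        # variance on a unit basis
    cost = K1 + K2 * units        # minutes per plot
    rows.append((units, cost, v_x, v_x * cost))

for units, cost, v_x, f_x in rows:
    print(units, round(cost, 2), round(v_x), round(f_x))

best_units = min(rows, key=lambda row: row[3])[0]
print(best_units)
```

Among the four tabulated sizes the smallest F(x) is at x = 4, with the value at x = 16 only slightly larger, so a minimum located graphically would fall between these two points.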
Plotting F(x) as a function of x we find that its minimum occurs between x = 4 and
x = 16 units. It is suggested that in this region departures from linearity in the
relation
$$\log V_x = \log V_1 - b \log x$$
will not be serious. Hence b is given approximately by