Efficient Estimation of the Relationship between Plot Size and the Variability of Crop Yields*

W. H. Hatheway and E. J. Williams**

Institute of Statistics Mimeo Series No. 174
June, 1957

1. Introduction

The optimum size of plot in field experimentation depends on the relationship between fixed costs and costs varying with number of units, and on soil variability. Perhaps the most useful measure of soil heterogeneity yet devised is that of Smith (1938), who showed empirically that the logarithm of the variance between plots of a given size was linearly related to the logarithm of the size of the plot. In the present paper we consider only the relationship between size and variability. The objects of the paper are, firstly, to show how efficient estimates of the constants in this relationship may be determined, and secondly, to illustrate a general method of determining efficient linear estimates when the data are, as in the present instance, correlated and of unequal variability.

Koch and Rigney (1951) demonstrated that the regression coefficient of the logarithm of variance on the logarithm of plot size could be estimated from experimental data in which treatment effects are present, as well as from the data of uniformity trials. They noted that Smith had recommended that, in determining the regression coefficient $b$, the variances of the different sized plots should be weighted by their respective degrees of freedom. In fact, since the variance estimates for different sizes of plot, both in uniformity trials and experimental data, are built up from common components, they are frequently highly correlated, so that a simple weighting by degrees of freedom is not accurate.
Koch and Rigney point out this difficulty for experimental data, but do not seem to have realized that their arguments apply with equal force to uniformity trial data.

* Paper No. 57 of the Agricultural Journal Series of the Rockefeller Foundation.
** Rockefeller Foundation, Agricultural Field Staff, and the Institute of Statistics, Raleigh, N. C.

The present paper presents a method of weighting observed variances of different-sized plots which leads to an unbiased estimate of $b$ with asymptotically minimum variance. It is applicable both to uniformity trial data and to experimental data; in the latter case the analysis of variance is in effect reconstructed to simulate one derived directly from uniformity trial data, in the manner suggested by Koch and Rigney.

2. Estimation from Uniformity Trial Data

Koch and Rigney showed that a uniformity trial subdivided to simulate a split-plot or lattice design could be analyzed in the manner shown below; a randomized block arrangement could similarly be superimposed on the trial, though it would not provide so much information about the relationship of variability to plot size.

Source                        Degrees of Freedom   Mean Square   Expectation of Mean Square
Replications                  d-1                  V1            S + aP + abQ + abcR
Blocks within replications    d(c-1)               V2            S + aP + abQ
Plots within blocks           cd(b-1)              V3            S + aP
Subplots within plots         bcd(a-1)             V4            S

The variance of plots the size of a complete replication is $V_1' = V_1$, the replication mean square as it appears in the analysis of variance. The variance of plots the size of blocks contains, in addition to the variation due to blocks within replications, that removed by the stratification of groups of blocks into replications in the analysis of variance.
Thus the total sum of squares for blocks is

$$d(c-1)V_2 + (d-1)V_1,$$

and since there are $cd$ blocks, its mean square is

$$V_2' = \frac{d(c-1)V_2 + (d-1)V_1}{cd-1}.$$

Similarly, the variance between plots over the entire area is

$$V_3' = \frac{cd(b-1)V_3 + d(c-1)V_2 + (d-1)V_1}{bcd-1},$$

and the variance between subplots over the entire area is

$$V_4' = \frac{bcd(a-1)V_4 + cd(b-1)V_3 + d(c-1)V_2 + (d-1)V_1}{abcd-1}.$$

These formulas are formally identical to those given by Koch and Rigney, who expressed their results in terms of components of variance.

Smith's regression coefficient $b$ is defined by the formula

$$\log V_x = \log V - b \log x,$$

where $x$ is the number of units per plot, $V$ is the variance among plots one unit in size, and $V_x$ is the variance of mean per unit area for plots of size $x$ units. For purposes of estimating optimum plot size, the coefficient $b$ is alone of interest. In the computations suggested by Koch and Rigney, the values of $V_x$ are obtained by dividing each value of $V'$ by the number of units per replication, block, plot or subplot, thus putting them on a unit basis. According to them, $b$ is given as an unweighted regression coefficient:

$$b_1 = \frac{\sum_j y_j (x_j' - \bar{x}')}{\sum_j (x_j' - \bar{x}')^2}, \tag{1}$$

where $y_j = \log(V_j'/x_j)$, $x_j' = \log x_j$, and $\bar{x}' = \sum_j x_j'/n$.

As was pointed out to one of the authors by D. D. Mason, Department of Experimental Statistics, N. C. State College, application of this formula for $b_1$ sometimes gave results less than -1, which are unacceptable on physical grounds. It was realized that the above estimate (1) would often be inaccurate owing to the equal weighting of $y$-values of differing variability. It was therefore decided to apply to the different terms in the sums of squares and products defining the regression coefficient, weights that would lead to an estimate of minimum variance. The appropriate weights are the elements of the inverse of the covariance matrix (i.e. the information matrix) of the values of $y$.
If these elements are designated $w_{jk}$, the estimate is

$$b_2 = \frac{\sum_j \sum_k w_{jk}\, y_j (x_k' - \bar{x}')}{\sum_j \sum_k w_{jk}\, x_j' (x_k' - \bar{x}')} = \frac{U}{T} \text{ (say)}, \tag{2}$$

where

$$\bar{x}' = \frac{\sum_j \sum_k w_{jk}\, x_j'}{\sum_j \sum_k w_{jk}}.$$

The weights $w_{jk}$ will have to be estimated from the data and will be to that extent inaccurate; but apart from this source of error, the effect of which we do not consider, the estimate will be of minimum variance; this variance is in fact $1/T$.

When, as is often the case, there are more than two variance estimates from which to compute the regression, we may also test the significance of departure from regression. The weighted total sum of squares of the $y_j$ is

$$V = \sum_j \sum_k w_{jk}\, y_j (y_k - \bar{y}), \qquad \text{where } \bar{y} = \frac{\sum_j \sum_k w_{jk}\, y_j}{\sum_j \sum_k w_{jk}},$$

with $n-1$ degrees of freedom, $n$ being the number of variance estimates. The sum of squares attributable to regression on $x_j'$ is $U^2/T$. Hence the sum of squares for departure from regression is

$$V - U^2/T,$$

which is distributed approximately as $\chi^2$ with $n-2$ degrees of freedom, and may be tested accordingly.

It now remains to estimate the weights. Since $V_1$, $V_2$, $V_3$, and $V_4$ are independent, and their variances are $2V_1^2/(d-1)$, $2V_2^2/[d(c-1)]$, $2V_3^2/[cd(b-1)]$, and $2V_4^2/[bcd(a-1)]$ respectively, it is not difficult to determine the variances and covariances of $V_1'$, $V_2'$, $V_3'$, and $V_4'$, which are linear functions of the former set. In fact, not only the variance of $V_1'$, but also its covariances with the other $V_i'$, are proportional to $V_1^2$. Likewise the variance of $V_2'$ is estimated as

$$\frac{2\left[d(c-1)V_2^2 + (d-1)V_1^2\right]}{(cd-1)^2},$$

and its covariances with $V_3'$ and $V_4'$ are proportional to this. Thus we find the covariance matrix of the $V_i'$ to be as follows:

$$
\begin{pmatrix}
\dfrac{D}{(d-1)^2} & \dfrac{D}{(d-1)(cd-1)} & \dfrac{D}{(d-1)(bcd-1)} & \dfrac{D}{(d-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(cd-1)} & \dfrac{C+D}{(cd-1)^2} & \dfrac{C+D}{(cd-1)(bcd-1)} & \dfrac{C+D}{(cd-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(bcd-1)} & \dfrac{C+D}{(cd-1)(bcd-1)} & \dfrac{B+C+D}{(bcd-1)^2} & \dfrac{B+C+D}{(bcd-1)(abcd-1)} \\[2ex]
\dfrac{D}{(d-1)(abcd-1)} & \dfrac{C+D}{(cd-1)(abcd-1)} & \dfrac{B+C+D}{(bcd-1)(abcd-1)} & \dfrac{A+B+C+D}{(abcd-1)^2}
\end{pmatrix}
$$

where

$$D = 2(d-1)V_1^2, \quad C = 2d(c-1)V_2^2, \quad B = 2cd(b-1)V_3^2, \quad A = 2bcd(a-1)V_4^2.$$

The inverse matrix is found to be even simpler in form; as may be verified, it is

$$
\begin{pmatrix}
(d-1)^2\left(\dfrac{1}{C}+\dfrac{1}{D}\right) & -\dfrac{(d-1)(cd-1)}{C} & 0 & 0 \\[2ex]
-\dfrac{(d-1)(cd-1)}{C} & (cd-1)^2\left(\dfrac{1}{B}+\dfrac{1}{C}\right) & -\dfrac{(cd-1)(bcd-1)}{B} & 0 \\[2ex]
0 & -\dfrac{(cd-1)(bcd-1)}{B} & (bcd-1)^2\left(\dfrac{1}{A}+\dfrac{1}{B}\right) & -\dfrac{(bcd-1)(abcd-1)}{A} \\[2ex]
0 & 0 & -\dfrac{(bcd-1)(abcd-1)}{A} & \dfrac{(abcd-1)^2}{A}
\end{pmatrix}
$$

The weights for $y_j$ ($= \log V_j'$) are obtained by multiplying each row and each column of this inverse matrix by the corresponding $V_j'$. This result follows from the approximate formula

$$\operatorname{cov}(\log V_j', \log V_k') \approx \frac{\operatorname{cov}(V_j', V_k')}{V_j' V_k'}.$$

If, as is usual in practical computation, logarithms to base 10 rather than natural logarithms are taken, the weights will need to be multiplied by the factor

$$M^{-2} = (\log_{10} e)^{-2}.$$

We shall deal here with the transformation to natural logarithms, and indicate the adjustments necessary for common logarithms below. Thus, from the inverse matrix,

$$w_{11} = \left[(d-1)V_1'\right]^2\left(\frac{1}{C}+\frac{1}{D}\right), \qquad w_{12} = w_{21} = -\frac{\left[(d-1)V_1'\right]\left[(cd-1)V_2'\right]}{C},$$

and so on. The weights may thus be determined from the inverse matrix without too much difficulty.

It will be found that the sum of the elements of the weight matrix is equal to half the total number of degrees of freedom for the sums of squares from which variance estimates are derived. This may be seen in the following way. If the variances are unaffected by size of plot, then all the available sums of squares are estimates of the same basic variance. The different estimates of the logarithm of the variance, derived from different lines of the analysis of variance, are independent, and have asymptotic variance equal to twice the reciprocal of the corresponding degrees of freedom. Consequently the information from each is half the degrees of freedom, whence the total information is half the total degrees of freedom.
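The structure of the covariance matrix and of its tridiagonal inverse can be checked numerically. The following Python sketch builds both matrices for a uniformity trial, multiplies them to confirm that the product is the identity, and forms the weights for the $y_j$. The design constants and mean squares are hypothetical illustration values, and the function names are ours rather than the paper's.

```python
# Sketch: covariance matrix of the pooled variances V_j' for a uniformity
# trial, its tridiagonal inverse, and the weights for y_j = log V_j'.
# All numerical inputs below are hypothetical illustration values.

def covariance_and_inverse(a, b, c, d, ms):
    """ms = [V1, V2, V3, V4]; returns (cov, inv, pooled variances V_j')."""
    df = [d - 1, d * (c - 1), c * d * (b - 1), b * c * d * (a - 1)]
    n = [d - 1, c * d - 1, b * c * d - 1, a * b * c * d - 1]
    # D, C, B, A as defined in the text for the uniformity-trial case.
    comp = [2.0 * f * v * v for f, v in zip(df, ms)]
    cum = [sum(comp[: j + 1]) for j in range(4)]  # D, C+D, B+C+D, A+B+C+D
    cov = [[cum[min(j, k)] / (n[j] * n[k]) for k in range(4)] for j in range(4)]
    inv = [[0.0] * 4 for _ in range(4)]
    for j in range(4):
        inv[j][j] = n[j] ** 2 / comp[j]
        if j < 3:
            inv[j][j] += n[j] ** 2 / comp[j + 1]
            inv[j][j + 1] = inv[j + 1][j] = -n[j] * n[j + 1] / comp[j + 1]
    # Pooled variances: cumulative sums of squares divided by n_j.
    pooled, running = [], 0.0
    for f, v, nj in zip(df, ms, n):
        running += f * v
        pooled.append(running / nj)
    return cov, inv, pooled

def matmul(p, q):
    size = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

a, b, c, d = 2, 2, 12, 3                 # hypothetical design constants
ms = [900.0, 700.0, 500.0, 300.0]        # hypothetical mean squares V1..V4
cov, inv, pooled = covariance_and_inverse(a, b, c, d, ms)
product = matmul(cov, inv)               # should be the 4 x 4 identity

# Weights for natural logarithms: each row and column of the inverse
# multiplied by the corresponding V_j'.
weights = [[inv[j][k] * pooled[j] * pooled[k] for k in range(4)] for j in range(4)]
weight_sum = sum(sum(row) for row in weights)
print(weight_sum, 0.5 * (a * b * c * d - 1))
```

The final line compares the sum of the weights with half the total degrees of freedom, the check described in the preceding paragraph.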
Thus, for data from uniformity trials, the sum of the weights is

$$\tfrac{1}{2}(abcd-1),$$

while for data from split-plot experiments, the sum of the weights will be

$$\tfrac{1}{2}\,bc(ad-1).$$

Similar results may be derived for lattices and other types of experimental design. They provide a convenient check on the computation of the weights.

To determine the regression coefficient and to test the departure from regression, the calculation is best carried out in stages, as follows. Let

$$X_k = \sum_j w_{jk}\, x_j' \qquad \text{and} \qquad Y_k = \sum_j w_{jk}\, y_j.$$

Then the sum of squares of $x_j'$ is

$$T = \sum_j X_j x_j' - \Big(\sum_j X_j\Big)^2 \Big/ \sum_j \sum_k w_{jk};$$

similarly the sum of products of $y_j$ with $x_k'$ is

$$U = \sum_j X_j y_j - \Big(\sum_j X_j\Big)\Big(\sum_j Y_j\Big) \Big/ \sum_j \sum_k w_{jk} = \sum_j Y_j x_j' - \Big(\sum_j X_j\Big)\Big(\sum_j Y_j\Big) \Big/ \sum_j \sum_k w_{jk},$$

and the sum of squares of $y_j$ is

$$V = \sum_j Y_j y_j - \Big(\sum_j Y_j\Big)^2 \Big/ \sum_j \sum_k w_{jk}.$$

The variance of the estimate $b_2$ is, to the degree of approximation of the analysis, $T^{-1}$. Hence, approximate confidence limits for the population regression coefficient $\beta$ are

$$b_2 \pm t\,T^{-1/2},$$

$t$ being the normal deviate at the required level of probability. Departure from regression is tested, as indicated above, by means of

$$V - U^2/T,$$

which is regarded as $\chi^2$ with $n-2$ degrees of freedom.

When common logarithms are used, the value of $b_2$ is determined as above, but its variance is now

$$T^{-1}/5.302 = 0.1886\,T^{-1},$$

and the corresponding confidence limits are

$$b_2 \pm 0.4343\,t\,T^{-1/2}.$$

The sum of squares for departure from regression is also altered, to

$$5.302\,(V - U^2/T).$$

It should be observed that the key to these computations is the covariance matrix of the variances $V_i'$ of the plots of different sizes. Because these variances are expressed as linear combinations of the original mean squares, which are independent, and not in terms of the variance components, which are correlated with one another, the resulting covariance matrix, and its inverse, take on a relatively simple form.

3. Estimation from Experimental Data

When variance components are to be estimated from experimental data, the estimates are calculated in the same way as from uniformity trial data. However, since a number of comparisons are given over to the estimation of treatment effects, the different plot and block variances are estimated with fewer degrees of freedom, and hence less precision, than they could have been in a uniformity trial. Apart from this complication, for which allowance must be made in determining the weights for the various components, the determination of a linear unbiased estimate with asymptotic minimum variance follows the same lines as that given in the previous section.

The method is illustrated by the analysis for a split-plot experiment in the form given by Koch and Rigney. It will be noted that, in this model, it is assumed that block-treatment interactions do not exist.

Source                            Degrees of freedom   Mean square   Expectation of mean square
Replications                      d-1                  V1            S + aP + abQ + abcR
Treatments (1)                    c-1                                S + aP + abQ + treatment effects
Error (1)                         (c-1)(d-1)           V2            S + aP + abQ
  Total between whole plots       cd-1
Treatments (2) and interactions   c(b-1)                             S + aP + treatment effects
Error (2)                         c(b-1)(d-1)          V3            S + aP
  Split-plots                     cd(b-1)
Sampling error                    bcd(a-1)             V4            S

As for a uniformity trial, the estimated variance of plots the size of a complete replication is $V_1'$. Since it is estimated with the full $d-1$ degrees of freedom, its variance is as given in the previous section. In estimating the mean square for blocks (i.e. whole plots) we must allow for the fact that, of the $d(c-1)$ comparisons between blocks within replications, only $(c-1)(d-1)$ are available for estimating the variance, the other $c-1$ containing treatment effects.
Thus, as before, the estimated variance between blocks is

$$V_2' = \frac{d(c-1)V_2 + (d-1)V_1}{cd-1},$$

but its estimated variance is now increased to

$$\frac{2\left[d^2(c-1)V_2^2/(d-1) + (d-1)V_1^2\right]}{(cd-1)^2}.$$

The variance of $V_3$ has similarly to be adjusted by a factor $d/(d-1)$.

The analysis now proceeds as for uniformity trials, and the inverse matrix is as given above, provided we redefine

$$D = 2(d-1)V_1^2, \quad C = 2d^2(c-1)V_2^2/(d-1), \quad B = 2cd^2(b-1)V_3^2/(d-1), \quad A = 2bcd(a-1)V_4^2.$$

4. Numerical Example

The computations required in the proposed method are illustrated in a numerical example, the data for which, set out in Table 1, were kindly furnished by D. D. Mason.

Table 1
Soybean Yield Trial Conducted by C. A. Brim, U. S. Department of Agriculture, at Willard, North Carolina, 1956.

Source               Degrees of Freedom   Mean Square
Replications          2 = d-1                452 = V1
Varieties            11 = c-1             30,401
Experimental Error   22 = (d-1)(c-1)      10,589 = V2
Rows in Plots        36 = cd(b-1)          5,938 = V3
Subplots in Rows     72 = bcd(a-1)         2,862 = V4

Here a = 2, b = 2, c = 12, d = 3.

In the determination of the weights we can work with multiples of the $V_j'$ more conveniently than with the $V_j'$ themselves. This device makes the computations simpler as well as more accurate. We have

$$(d-1)V_1' = 2V_1' = 2V_1 = 904$$
$$(cd-1)V_2' = 35V_2' = 2V_1 + 33V_2 = 350{,}341$$
$$(bcd-1)V_3' = 71V_3' = 2V_1 + 33V_2 + 36V_3 = 564{,}109$$
$$(abcd-1)V_4' = 143V_4' = 2V_1 + 33V_2 + 36V_3 + 72V_4 = 770{,}173$$

This gives

$$V_1' = 452, \quad V_2' = 10{,}010, \quad V_3' = 7{,}945, \quad V_4' = 5{,}386.$$

The number of units per plot corresponding to the different-sized plots the variances of which are $V_1'$, $V_2'$, $V_3'$, and $V_4'$ are 48, 4, 2, and 1 respectively. Putting the variances $V'$ on a unit basis and taking logarithms, we obtain the values given in Table 2.
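The multiples of the $V_j'$ are simple running sums and can be checked with a few lines of Python. This is only a sketch of the arithmetic using the Table 1 mean squares; the variable names are ours.

```python
# Table 1 (split-plot soybean trial): a = 2, b = 2, c = 12, d = 3.
a, b, c, d = 2, 2, 12, 3
V1, V2, V3, V4 = 452, 10589, 5938, 2862

# Coefficients of V1..V4 in the worked example: 2, 33, 36 and 72.
coeffs = [d - 1, d * (c - 1), c * d * (b - 1), b * c * d * (a - 1)]
# Divisors (d-1), (cd-1), (bcd-1), (abcd-1): 2, 35, 71 and 143.
divisors = [d - 1, c * d - 1, b * c * d - 1, a * b * c * d - 1]

multiples = []          # (d-1)V1', (cd-1)V2', (bcd-1)V3', (abcd-1)V4'
running = 0
for f, v in zip(coeffs, (V1, V2, V3, V4)):
    running += f * v
    multiples.append(running)

pooled = [m / n for m, n in zip(multiples, divisors)]   # V1'..V4'
print(multiples)
print([round(p) for p in pooled])
```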
Table 2
Logarithms of Relative Plot Sizes (x') and Unit Variances (y)

x'        y
0.0000    3.7313
0.3010    3.5891
0.6021    3.3984
1.6812    0.9739

The unweighted regression coefficient is then

$$b_1 = \frac{4.7638 - 7.5544}{3.2796 - 1.6697} = -1.7334.$$

As Koch and Rigney point out, $b$ is an index of soil variability; it should vary between zero and minus one. A value of zero indicates perfect correlation (extreme uniformity) among the units making up a plot; a value of minus one indicates no correlation. Clearly the value obtained in the present example can have no unambiguous physical interpretation. Here it is apparent that weighting a low mean square for replications based on only two degrees of freedom, equally with others based on many more degrees of freedom, has led to an unreasonable estimate of soil variability.

Using the method proposed in the present paper, $y$ and $x'$ are as before. The weights are the elements $w_{jk}$ of the information matrix of the $y$. To obtain these numbers it is convenient first to calculate

$$A = 2bcd(a-1)V_4^2 = 1{,}179{,}500{,}000$$
$$B = 2cd^2(b-1)V_3^2/(d-1) = 3{,}808{,}100{,}000$$
$$C = 2d^2(c-1)V_2^2/(d-1) = 11{,}100{,}600{,}000$$
$$D = 2(d-1)V_1^2 = 800{,}000$$

then

$$w_{11} = \left[(d-1)V_1'\right]^2\left(\frac{1}{C}+\frac{1}{D}\right) = 1.00$$
$$w_{12} = w_{21} = -\frac{\left[(d-1)V_1'\right]\left[(cd-1)V_2'\right]}{C} = -0.03$$

The remaining elements are computed in similar fashion. The completed information matrix is

$$
W = \begin{pmatrix}
1.00 & -0.03 & 0.00 & 0.00 \\
-0.03 & 43.29 & -51.90 & 0.00 \\
0.00 & -51.90 & 353.36 & -368.34 \\
0.00 & 0.00 & -368.34 & 502.89
\end{pmatrix}
$$

the sum of whose elements is $\tfrac{1}{2}\,bc(ad-1) = 60$, as may be verified.

We now compute the set $Y_k = \sum_j w_{jk}\, y_j$. Thus

$$Y_1 = 0.87, \quad Y_2 = -39.19, \quad Y_3 = -282.52, \quad Y_4 = 554.42, \quad \sum_j Y_j = 233.58.$$

In the same way we compute the $X_k$:

$$X_1 = 1.66, \quad X_2 = 10.39, \quad X_3 = 75.11, \quad X_4 = -110.87, \quad \sum_j X_j = -23.71.$$
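The staged computation can be reproduced in a short Python sketch. It takes the weight matrix exactly as printed (two decimals) and the Table 2 values reordered to match the rows of $W$, so that index 0 is the replication-sized plot of 48 units. Because $W$ is rounded, the results agree with the hand-computed figures only to within rounding error. The function names are ours.

```python
# Weight matrix W as printed in the text (two decimals).
W = [
    [1.00, -0.03, 0.00, 0.00],
    [-0.03, 43.29, -51.90, 0.00],
    [0.00, -51.90, 353.36, -368.34],
    [0.00, 0.00, -368.34, 502.89],
]
# Table 2 values in the order of W: plots of 48, 4, 2 and 1 units.
x = [1.6812, 0.6021, 0.3010, 0.0000]
y = [0.9739, 3.3984, 3.5891, 3.7313]
M2 = 5.302  # (log10 e)^-2, the adjustment for common logarithms

def unweighted_slope(x, y):
    """Equation (1): ordinary least-squares slope of y on x."""
    n = len(x)
    sxy = sum(p * q for p, q in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(p * p for p in x) - sum(x) ** 2 / n
    return sxy / sxx

def weighted_sums(W, x, y):
    """Staged computation of X_k, Y_k and the sums T, U, V."""
    n = len(x)
    sw = sum(sum(row) for row in W)
    X = [sum(W[j][k] * x[j] for j in range(n)) for k in range(n)]
    Y = [sum(W[j][k] * y[j] for j in range(n)) for k in range(n)]
    T = sum(p * q for p, q in zip(X, x)) - sum(X) ** 2 / sw
    U = sum(p * q for p, q in zip(X, y)) - sum(X) * sum(Y) / sw
    V = sum(p * q for p, q in zip(Y, y)) - sum(Y) ** 2 / sw
    return {"sum_w": sw, "X": X, "Y": Y, "T": T, "U": U, "V": V}

s = weighted_sums(W, x, y)
b1 = unweighted_slope(x, y)                    # unweighted estimate
b2 = s["U"] / s["T"]                           # weighted estimate
se_b2 = (1.0 / (M2 * s["T"])) ** 0.5           # standard error, common logs
chi2 = M2 * (s["V"] - s["U"] ** 2 / s["T"])    # departure from regression
print(round(b1, 4), round(b2, 3), round(se_b2, 3), round(chi2, 2))
```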
Then

$$T = \sum_j X_j x_j' - \Big(\sum_j X_j\Big)^2 \Big/ \sum_j \sum_k w_{jk} = 31.65 - (-23.71)^2/60 = 22.28.$$

Similarly

$$U = -14.87 \qquad \text{and} \qquad V = 13.05,$$

so that

$$b_2 = -14.87/22.28 = -0.667,$$
$$V(b_2) = 0.1886/22.28 = 0.00846,$$

Standard error = 0.092.

As a matter of interest, the variance of $b_1$ was also determined. This variance is given by

$$V(b_1) = \frac{\sum_j \sum_k w^{jk}\,(x_j' - \bar{x}')(x_k' - \bar{x}')}{\Big[\sum_j (x_j' - \bar{x}')^2\Big]^2},$$

where the $w^{jk}$ are the elements of the inverse of the weight matrix; in other words, they are the elements of the covariance matrix of the $y_j$'s. Here $\bar{x}'$ is the unweighted mean of the $x_j'$. The variance of $b_1$ was found to be 0.1644, giving a standard error of 0.406. Thus the efficiency of this estimate is

$$\frac{0.00846}{0.1644} = 5 \text{ per cent}.$$

With such a large standard error, any estimate of a quantity lying between 0 and 1 is of little value.

To test departure from regression, we have

$$\chi^2_{(2)} = 5.302\,(V - U^2/T) = 5.302 \times 3.125 = 16.57.$$

Since this value exceeds the 1 per cent point of the $\chi^2$ distribution, the data depart significantly from the assumed linear relationship.

5. Allowance for Departure from Empirical Relationship

Departure from linearity (i.e., from the empirical law of Fairfield Smith) may cause concern in some examples. In such cases, provided cost data are available, optimum plot size may be estimated with reasonable accuracy without the assumption that the empirical law holds. Suppose the cost of $r$ replications is

$$r(K_1 + K_2 x),$$

where $K_1$ is the cost of a plot (regardless of size), $K_2$ is the cost per unit of plot and $x$ is the number of units per plot. We then require to minimize $V_x/r$, subject to the condition that $r(K_1 + K_2 x)$ is fixed. This is equivalent to minimizing

$$F(x) = V_x (K_1 + K_2 x)$$

with respect to $x$. If $F(x)$ can be determined from experimental data for a few values of $x$, its minimum may be fairly easily determined graphically.

Example 2

Johnson and Hixon (1952) have reported a 100 per cent cruise of 40 acres of old-growth Douglas fir timber in Oregon.
The data consist of timber volume on each of 1600 1/40-acre plots in a 40 x 40 square. The analysis of variance, with stratification to eliminate systematic variation between sets of 8 rows and sets of 8 columns, has been worked out in the manner shown in the table below:

Source                                      Degrees of Freedom   Mean Square
Among 1.6-acre plots                          16                 277106 = V1
Among 0.4-acre plots in 1.6-acre plots        75                  93947 = V2
Among 0.1-acre plots in 0.4-acre plots       300                  73012 = V3
Among 0.025-acre plots in 0.1-acre plots    1200                 100744 = V4

The large mean square among 0.025-acre plots indicates competitive effects. Since the average diameter of trees measured was 45 inches, such a result is hardly surprising. Here

                 Number of 0.025-acre units per plot
V1' = 277106     64
V2' = 138349     16
V3' =  89223      4
V4' =  97869      1

When the values of the $V'$ are adjusted to a unit basis and plotted, departures from linearity appear serious.

Johnson and Hixon also estimate the number of plots of different sizes which can be measured in a four-hour cruise of 40 acres. Converting these data to mean minutes per plot for plots of different sizes, we obtain

Kind of Plot      Number of 0.025-acre Units (x)   Mean Minutes per Plot (t)
1/2 x 1 chain      2                                7.50
1/2 x 2 chains     4                               11.10
1/2 x 4 chains     8                               16.14
1/2 x 6 chains    12                               22.16

Assuming cost per plot (in minutes) to be linearly related to size of plot, we may write

$$T = K_1 + K_2 x,$$

where
$T$ is total cost per plot (measured in minutes per plot),
$K_1$ is a constant (measured in minutes per plot),
$K_2$ is a constant (measured in minutes per unit area),
$x$ is the number of 0.025-acre units per plot.

From the data given above we obtain

$$T = 4.9 + 1.43x.$$

Thus we may compute $F(x)$:

x (number of 0.025-acre units)   K1 + K2 x   V_x = V'/x   F(x) = V_x (K1 + K2 x)
 1                                6.33       97869        619500
 4                               10.62       22306        236900
16                               27.78        8647        240200
64                               96.42        4330        417500

Plotting $F(x)$ as a function of $x$ we find that its minimum occurs between $x = 4$ and $x = 16$ units.
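The cost line and the tabulated values of $F(x)$ can be checked with the short Python sketch below. The text does not say how the constants 4.9 and 1.43 were obtained; an ordinary least-squares fit to the four timing points is assumed here, and it reproduces them after rounding. The helper names are ours.

```python
# Timing data: (number of 0.025-acre units per plot, mean minutes per plot).
timing = [(2, 7.50), (4, 11.10), (8, 16.14), (12, 22.16)]

def fit_line(points):
    """Ordinary least-squares fit t = k1 + k2 * x (an assumed fitting method)."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_t = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxt = sum((p[0] - mean_x) * (p[1] - mean_t) for p in points)
    k2 = sxt / sxx
    return mean_t - k2 * mean_x, k2

k1_fit, k2_fit = fit_line(timing)

# Variances V' keyed by the number of 0.025-acre units per plot.
pooled = {1: 97869, 4: 89223, 16: 138349, 64: 277106}

def cost_weighted_variance(x, v_pooled, k1=4.9, k2=1.43):
    """F(x) = V_x (K1 + K2 x), with V_x = V'/x on a unit basis."""
    return (v_pooled / x) * (k1 + k2 * x)

F = {x: cost_weighted_variance(x, v) for x, v in pooled.items()}
best_x = min(F, key=F.get)   # smallest F among the tabulated plot sizes
print(round(k1_fit, 1), round(k2_fit, 2), best_x)
```

Only four plot sizes are tabulated, so `best_x` identifies the smallest tabulated value rather than the true minimum of $F(x)$, which the text locates graphically.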
It is suggested that in this region departures from linearity in the relation

$$\log V_x = \log V_1 - b \log x$$

will not be serious. Hence $b$ is given approximately by