The variance of the variance of samples from a finite population

Eungchun Cho, Kentucky State University*
Moon Jung Cho, Bureau of Labor Statistics
John Eltinge, Bureau of Labor Statistics

* [email protected]

Key Words: Sample Variance; Randomization Variance; Polykays; Moments of Finite Population

Abstract

A direct derivation of the randomization variance of the sample variance estimator $\hat{V}(\bar{x})$ and related formulae is presented. Examples for the special case of a uniformly distributed population are given.

1 Introduction

Introductory courses in the randomization approach to survey inference generally begin with a relatively parsimonious development based on without-replacement selection of simple random samples. For an arbitrary finite population one establishes the unbiasedness of the sample mean $\bar{x}$ for the corresponding finite population mean; evaluates the randomization variance of the sample mean; and develops an unbiased estimator, $\hat{V}(\bar{x})$, for this randomization variance. One subsequently uses similar developments for related randomized designs, e.g., stratified random sampling and some forms of cluster sampling.

In applications of this material to practical problems, it is often important to evaluate $V(\hat{V}(\bar{x}))$, the variance of the variance estimator. For example, some cluster sample designs may be considered problematic if the resulting $\hat{V}(\bar{x})$ is unstable, i.e., has an unreasonably large variance.

This note presents a relatively simple, direct derivation of the randomization variance of $\hat{V}(\bar{x})$ and related quantities. The derivation is pedagogically appealing because it builds directly on the standard whole-sample approach used in introductory texts such as Cochran [1], and does not require students to work with the more elaborate "polykay" approach used by Tukey [5] and Wishart [8].

2 Functions on Simple Random Samples

Consider a finite population of $N$ numbers $A = [a_1, a_2, \ldots, a_N]$.
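All of the randomization quantities used in this note are plain averages over the finite list of all samples, so for a small population they can be computed literally by exhaustive enumeration. The following Python sketch does exactly that (the population values and sample size are illustrative choices, not taken from the paper):

```python
from itertools import combinations
from math import comb
from statistics import mean, variance  # variance uses the n-1 divisor, as in v(S)

def randomization_moments(A, n, f):
    """Expectation and variance of f(S) over all C(N, n)
    without-replacement samples S of size n from A."""
    values = [f(S) for S in combinations(A, n)]
    assert len(values) == comb(len(A), n)  # every sample enumerated exactly once
    Ef = sum(values) / len(values)
    Vf = sum((v - Ef) ** 2 for v in values) / len(values)
    return Ef, Vf

A = [2, 5, 7, 11, 13]   # illustrative population, N = 5
n = 3
N = len(A)
V_A = variance(A)       # finite-population analogue V(A), divisor N - 1

E_mean, V_mean = randomization_moments(A, n, mean)
E_var, V_var = randomization_moments(A, n, variance)

print(E_mean, mean(A))                  # sample mean is unbiased for the population mean
print(E_var, V_A)                       # sample variance is unbiased for V(A)
print(V_mean, (1 - n / N) * V_A / n)    # variance of the sample mean
```

The printed pairs agree, illustrating the standard identities recalled in Section 3; the quantity `V_var` is the variance of the sample variance, for which Section 4 gives a closed form.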
Let $L_{n,A}$ be the list of all possible samples of $n$ elements selected without replacement from $A$:

    L_{n,A} = [S_1, S_2, \ldots, S_\alpha], \qquad \alpha = \binom{N}{n} = \frac{N!}{n!(N-n)!}    (1)

One selects a without-replacement simple random sample of size $n$ from $A$ by selecting one element from $L_{n,A}$ in such a way that each sample $S_j$ has probability $1/\alpha$ of being selected.

Consider a function $f$ on $L_{n,A}$, that is, $f$ assigns each sample $S \in L_{n,A}$ a value $f(S)$. Two prominent examples of $f$ are the sample mean and the sample variance:

    \bar{a}(S) = \frac{1}{n} \sum_{a_i \in S} a_i    (2)

    v(S) = \frac{1}{n-1} \sum_{a_i \in S} \{a_i - \bar{a}(S)\}^2    (3)

Evaluation of the randomization properties of $f(S)$ for $S \in L_{n,A}$ is conceptually straightforward. For example, $E\{f(S)\}$, the expected value of $f(S)$, is obtained by computing its arithmetic average over the $\binom{N}{n}$ equally likely samples in $L_{n,A}$:

    E\{f(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} f(S)    (4)

$V\{f(S)\}$, the variance of $f(S)$, is defined to be the expectation of the squared deviation $[f(S) - E\{f(S)\}]^2$:

    V\{f(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} [f(S) - E\{f(S)\}]^2    (5)

3 The Variance of the Sample Variance

Routine arguments (e.g., Cochran [1, Theorems 2.1, 2.2 and 2.4]) show

    E\{\bar{a}(S)\} = \bar{A}    (6)

    E\{v(S)\} = V(A)    (7)

    V\{\bar{a}(S)\} = \left(1 - \frac{n}{N}\right) \frac{V(A)}{n}    (8)

where $\bar{a}(S)$ is the mean of the sample $S$, $v(S)$ is the variance of the sample $S$, $\bar{A} = \sum_{i=1}^{N} a_i / N$ is the mean of $A$, and

    V(A) = \frac{1}{N-1} \sum_{i=1}^{N} (a_i - \bar{A})^2    (9)

is the full finite-population analogue of the sample variance $v(S)$.

The principal task in this paper is to obtain a relatively simple expression for the variance of $v(S)$,

    V\{v(S)\} = \binom{N}{n}^{-1} \sum_{S \in L_{n,A}} \{v(S) - V(A)\}^2    (10)

in terms of the $a_i$'s in the underlying population. The formula is useful for evaluating the variance of the variance when the straightforward computation from the definition is practically impossible due to the combinatorial explosion in the number of samples.

4 The Main Result

The following theorem gives a formula for the variance of the sample variance under simple random sampling without replacement.

Theorem. Let $A = [a_1, a_2, \ldots$
$, a_N]$ be a list of $N$ numbers, $N \ge 4$. Let $L_{n,A}$ be the list of all possible samples of $n$ numbers selected without replacement from $A$, $2 \le n \le N-1$:

    L_{n,A} = [S_1, S_2, \ldots, S_\alpha]

where $\alpha = \binom{N}{n} = N!/(n!(N-n)!)$, $S_i \subset A$, and $|S_i| = n$. Let $V_n$ be the list of the sample variances $v(S_i)$ for $S_i \in L_{n,A}$,

    V_n = [v(S_1), v(S_2), \ldots, v(S_\alpha)]

Then $E[v(S) - E\{v(S)\}]^2$, the variance of the variances of all the samples of $A$ of size $n$, is given by

    V\{v(S)\} = C_1 \sum_{i=1}^{N} a_i^4
              + C_2 \sum_{i \ne j} a_i^3 a_j
              + C_3 \sum_{i < j} a_i^2 a_j^2
              + C_4 \sum_{\substack{i \ne j,\, i \ne k \\ j < k}} a_i^2 a_j a_k
              + C_5 \sum_{i < j < k < l} a_i a_j a_k a_l

where

    C_1 = \frac{N - n}{N^2 n}

    C_2 = -4 \, \frac{N - n}{N^2 (N-1) n}

    C_3 = 2 \, \frac{N(N-1)\{(n-1)^2 + 2\} - n(n-1)\{(N-1)^2 + 2\}}{N^2 (N-1)^2 n (n-1)}

    C_4 = -4 \, \frac{N(N-1)(n-2)(n-3) - n(n-1)(N-2)(N-3)}{N^2 (N-1)^2 (N-2) n (n-1)}

    C_5 = 24 \, \frac{N(N-1)(n-2)(n-3) - n(n-1)(N-2)(N-3)}{N^2 (N-1)^2 (N-2)(N-3) n (n-1)}

Sketch of Proof. The proof involves determining the coefficients of all the fourth-degree terms $a_i^4$, $a_i^2 a_j^2$, $a_i^2 a_j a_k$, $a_i^3 a_j$, and $a_i a_j a_k a_l$ that appear in the summation. The summations in the formula are arranged so that all like terms are combined and thus appear only once; for example, $a_i a_j a_k a_l$ appears only for the indices arranged in increasing order $i < j < k < l$.

Recall that $\alpha$ is the number of without-replacement samples $S$ of $A$ of size $n$. Since $\frac{1}{\alpha}\sum_S v(S) = E\{v(S)\} = V(A)$, the variance of the variances of the samples of size $n$ decomposes as

    \frac{1}{\alpha} \sum_{S} \{v(S) - V(A)\}^2 = \frac{1}{\alpha} \sum_{S} v(S)^2 - V(A)^2

where the sum is taken over all the samples $S$ in $L_{n,A}$. For the second term, $V(A)^2$,
it follows that

    V(A)^2 = \frac{1}{N^2} \Big( \sum_i a_i^4 + 2 \sum_{i<j} a_i^2 a_j^2 \Big)
           - \frac{4}{N^2 (N-1)} \Big( \sum_{i \ne j} a_i^3 a_j + \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k \Big)
           + \frac{4}{N^2 (N-1)^2} \Big( \sum_{i<j} a_i^2 a_j^2 + 2 \sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k + 6 \sum_{i<j<k<l} a_i a_j a_k a_l \Big)

For the first term, each $v(S)^2$ expands in exactly the same way, with $n$ in place of $N$ and all sums restricted to the elements of $S$:

    v(S)^2 = \frac{1}{n^2} \Big( \sum_{a_i \in S} a_i^4 + 2 \sum_{\substack{i<j \\ a_i, a_j \in S}} a_i^2 a_j^2 \Big)
           - \frac{4}{n^2 (n-1)} \Big( \sum_{\substack{i \ne j \\ a_i, a_j \in S}} a_i^3 a_j + \sum_{\substack{i \ne j,\, i \ne k,\, j<k \\ a_i, a_j, a_k \in S}} a_i^2 a_j a_k \Big)
           + \frac{4}{n^2 (n-1)^2} \Big( \sum_{\substack{i<j \\ a_i, a_j \in S}} a_i^2 a_j^2 + 2 \sum_{\substack{i \ne j,\, i \ne k,\, j<k \\ a_i, a_j, a_k \in S}} a_i^2 a_j a_k + 6 \sum_{\substack{i<j<k<l \\ a_i, a_j, a_k, a_l \in S}} a_i a_j a_k a_l \Big)

Summing over all $S \in L_{n,A}$, a monomial involving $m$ distinct population indices occurs in exactly $\binom{N-m}{n-m}$ samples; for example,

    \sum_{S \in L_{n,A}} \sum_{a_i \in S} a_i^4 = \binom{N-1}{n-1} \sum_{i=1}^{N} a_i^4

and the pair, triple and quadruple sums pick up the factors $\binom{N-2}{n-2}$, $\binom{N-3}{n-3}$ and $\binom{N-4}{n-4}$, respectively. Dividing by $\alpha = \binom{N}{n}$ and simplifying the binomial coefficients leads to

    \frac{1}{\alpha} \sum_{S \in L_{n,A}} v(S)^2
      = \frac{\sum_i a_i^4}{n N}
      - 4 \, \frac{\sum_{i \ne j} a_i^3 a_j}{n (N-1) N}
      + 2 (n-1) \, \frac{\sum_{i<j} a_i^2 a_j^2}{n (N-1) N}
      + 4 \, \frac{\sum_{i<j} a_i^2 a_j^2}{n (n-1) (N-1) N}
      - 4 (n-2) \, \frac{\sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n (N-2) (N-1) N}
      + 8 (n-2) \, \frac{\sum_{\substack{i \ne j,\, i \ne k \\ j<k}} a_i^2 a_j a_k}{n (n-1) (N-2) (N-1) N}
      + 24 (n-2)(n-3) \, \frac{\sum_{i<j<k<l} a_i a_j a_k a_l}{n (n-1) (N-3) (N-2) (N-1) N}

Substitution of the expressions for $V(A)^2$ and for $\frac{1}{\alpha}\sum_S v(S)^2$ into the expression for the variance of the variances then collects, term by term, into the coefficients $C_1, \ldots, C_5$ of the theorem. ∎

The formula for $V\{v(S)\}$ becomes considerably simpler for the population of a discrete uniform distribution on a finite interval; in this case $A$ is a finite arithmetic sequence. This occurs, for example, in the important special case of equal-probability systematic sampling (Cochran [1], Chapter 8).

Corollary 1. Let $A = [1, 2, \ldots, N]$, $N \ge 3$. Let $S$, $L_{n,A}$ and $v(S)$ be as in the theorem above. The variance of the sample variances is

    V\{v(S)\} = \frac{N (N+1) (N-n) (2 n N + 3 n + 3 N + 3)}{360 \, n (n-1)}    (11)

For a more general arithmetic sequence:

Corollary 2. Let $A = [a_0, a_0 + d, \ldots, a_0 + (N-1) d]$, $N \ge 3$.
Then the variance of $v(S)$ is

    V\{v(S)\} = d^4 \, \frac{N (N+1) (N-n) (2 n N + 3 n + 3 N + 3)}{360 \, n (n-1)}

since $v(S)$ is invariant under the shift by $a_0$ and scales by the factor $d^2$.

Corollary 3. Let $A$ be a list of numbers uniformly distributed on the interval $[1/N, 1]$, $A = [\frac{1}{N}, \frac{2}{N}, \ldots, \frac{N-1}{N}, \frac{N}{N}]$, $N \ge 3$. The variance of $v(S)$ is

    V\{v(S)\} = \frac{(1 + \frac{1}{N}) (1 - \frac{n}{N}) \{2n + 3 + \frac{3(n+1)}{N}\}}{360 \, n (n-1)}    (12)

$V\{v(S)\}$ approaches $(2n+3)/\{360 \, n (n-1)\}$ as $N$ approaches $\infty$. For the two simplest special cases, $n = 2$ and $n = N-1$, we have

    V\{v(S)\} = \frac{(1 + \frac{1}{N}) (1 - \frac{2}{N}) (7 + \frac{9}{N})}{720}, \qquad n = 2    (13)

and

    V\{v(S)\} = \frac{(1 + \frac{1}{N}) (1 + \frac{2}{N})}{180 (N-1)(N-2)}, \qquad n = N-1    (14)

References

1. W. G. Cochran, Sampling Techniques (3rd ed.), John Wiley, 1977.
2. R. L. Graham, D. E. Knuth and O. Patashnik, Concrete Mathematics, Addison-Wesley, 1989.
3. J. W. Tukey, Some sampling simplified, Journal of the American Statistical Association, 45 (1950), 501-519.
4. J. W. Tukey, Keeping moment-like sampling computation simple, The Annals of Mathematical Statistics, 27 (1956), 37-54.
5. J. W. Tukey, Variances of variance components: I. Balanced designs, The Annals of Mathematical Statistics, 27 (1956), 722-736.
6. J. W. Tukey, Variances of variance components: II. Unbalanced single classifications, The Annals of Mathematical Statistics, 28 (1957), 43-56.
7. J. W. Tukey, Variance components: III. The third moment in a balanced single classification, The Annals of Mathematical Statistics, 28 (1957), 378-384.
8. J. Wishart, Moment coefficients of the k-statistics in samples from a finite population, Biometrika, 39 (1952), 1-13.
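The theorem and corollaries lend themselves to direct numerical verification. The following Python sketch (the population and sample sizes are illustrative) evaluates the coefficient formula of the theorem and the closed form of Corollary 1 for $A = [1, 2, \ldots, N]$, and compares both against brute-force enumeration of all $\binom{N}{n}$ samples:

```python
from itertools import combinations
from statistics import variance  # sample variance with the n-1 divisor, as in v(S)

def vv_bruteforce(A, n):
    """V{v(S)} by definition: the variance of the sample variances
    over all C(N, n) without-replacement samples."""
    vs = [variance(S) for S in combinations(A, n)]
    m = sum(vs) / len(vs)
    return sum((v - m) ** 2 for v in vs) / len(vs)

def vv_theorem(A, n):
    """Closed form of the theorem: five symmetric power sums
    weighted by the coefficients C1, ..., C5."""
    N = len(A)
    s1 = sum(a ** 4 for a in A)
    s2 = sum(A[i] ** 3 * A[j] for i in range(N) for j in range(N) if i != j)
    s3 = sum(A[i] ** 2 * A[j] ** 2 for i in range(N) for j in range(i + 1, N))
    s4 = sum(A[i] ** 2 * A[j] * A[k]
             for i in range(N) for j in range(N) for k in range(j + 1, N)
             if i != j and i != k)
    s5 = sum(A[i] * A[j] * A[k] * A[l]
             for i, j, k, l in combinations(range(N), 4))
    num = N * (N - 1) * (n - 2) * (n - 3) - n * (n - 1) * (N - 2) * (N - 3)
    C1 = (N - n) / (N ** 2 * n)
    C2 = -4 * (N - n) / (N ** 2 * (N - 1) * n)
    C3 = 2 * (N * (N - 1) * ((n - 1) ** 2 + 2)
              - n * (n - 1) * ((N - 1) ** 2 + 2)) \
         / (N ** 2 * (N - 1) ** 2 * n * (n - 1))
    C4 = -4 * num / (N ** 2 * (N - 1) ** 2 * (N - 2) * n * (n - 1))
    C5 = 24 * num / (N ** 2 * (N - 1) ** 2 * (N - 2) * (N - 3) * n * (n - 1))
    return C1 * s1 + C2 * s2 + C3 * s3 + C4 * s4 + C5 * s5

def vv_corollary1(N, n):
    """Corollary 1 for the arithmetic population A = [1, 2, ..., N]."""
    return (N * (N + 1) * (N - n) * (2 * n * N + 3 * n + 3 * N + 3)
            / (360 * n * (n - 1)))

N, n = 8, 3                  # illustrative sizes, N >= 4 and 2 <= n <= N - 1
A = list(range(1, N + 1))
bf = vv_bruteforce(A, n)
print(bf, vv_theorem(A, n), vv_corollary1(N, n))  # all three agree
```

For $N = 8$, $n = 3$ all three values equal $14$, and the agreement of the brute-force value with the two closed forms persists for any admissible choice of $N$ and $n$ small enough to enumerate.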