COMPARING k POPULATION MEANS WITH NO ASSUMPTION ABOUT THE VARIANCES

by

TONY YAACOUB

(Under the Direction of Charles W. Champ)

ABSTRACT

In the analysis of most statistically designed experiments, it is common to assume equal variances, along with the assumptions that the sample measurements are independent and normally distributed. Under these three assumptions, a likelihood ratio test is used to test for a difference in the population means. Typically, the assumption of independence can be justified by the sampling method used by the researcher, and the likelihood ratio test is robust to the assumption of normality. The equality of the variances, however, is often difficult to justify: it has been found that the assumption of equal variances cannot be made even after transforming the data. Our interest is to develop a method for comparing k population means assuming the data are independent and normally distributed but without assuming equal variances; for k = 2 this is the Behrens-Fisher problem. We propose a method that uses the exact distribution of the likelihood ratio (test) statistic. The data are used to estimate this exact distribution, from which an estimated critical value or an estimated p-value is obtained.

Key Words: ANOVA, fixed effects one-way design, likelihood ratio test, orthogonal designs, statistically designed experiments

2009 Mathematics Subject Classification:

COMPARING k POPULATION MEANS WITH NO ASSUMPTION ABOUT THE VARIANCES

by

TONY YAACOUB

B.S. in Mathematics, Clayton State University, 2012

A Thesis Submitted to the Graduate Faculty of Georgia Southern University in Partial Fulfillment of the Requirements for the Degree

MASTER OF SCIENCE

STATESBORO, GEORGIA
2014

© 2014 TONY YAACOUB
All Rights Reserved

Major Professor: Charles W. Champ
Committee: Broderick O. Oluyede, Hani Samawi
Electronic Version Approved: May 2014

DEDICATION

This thesis is dedicated to my family, who have always been supportive of me. Without them, I would not have been able to achieve what I have achieved so far.

ACKNOWLEDGMENTS

First, I would like to thank Dr. Charles W. Champ for his continuous encouragement and support, and for the long hours he put into this thesis. Second, I would like to thank the members of the thesis committee, Dr. Broderick O. Oluyede and Dr. Hani Samawi, for their support and helpful advice. Last, I would like to thank Dr. Jimmy Dillies and Dr. Enka Lakuriqi for helping us with one of the proofs in this thesis.
TABLE OF CONTENTS

DEDICATION
ACKNOWLEDGMENTS
LIST OF FIGURES
CHAPTER
1 Introduction
   1.1 Model and Statistical Hypotheses
   1.2 Literature Review
   1.3 Overview of Proposed Research
2 Some Useful Matrix Results
   2.1 Introduction
   2.2 Quadratic Forms
3 Some Distributional Results
   3.1 Introduction
   3.2 Joint Distribution of the Sample Mean and Variance
   3.3 Linear Combination of Central Chi Squares
   3.4 Linear Combination of Noncentral Chi Squares Each with One Degree of Freedom
   3.5 Ratio of a Linear Combination of Noncentral Chi Square and Central Chi Square Random Variables
   3.6 Conclusion
4 Analysis Assuming Equal Variances
   4.1 Model
   4.2 Sources of Variability
   4.3 The Classical F Test
5 Analysis with No Assumption of the Equality of Variances
   5.1 Introduction
   5.2 Distribution of SSE with No Assumption about the Variances
   5.3 Distribution of SSTR
   5.4 Distribution of the Likelihood Ratio Test Statistic
   5.5 Conclusion
6 Conclusion
   6.1 General Conclusions
   6.2 Areas for Further Research
REFERENCES

LIST OF FIGURES

3.1 Graph of $f_{2\chi^2_3+4\chi^2_6}(u)$
3.2 Graph of $f_{2\chi^2_3+4\chi^2_6+5\chi^2_7}(u)$

CHAPTER 1
INTRODUCTION

1.1 Model and Statistical Hypotheses

It is often of interest to researchers to compare two (k = 2) or more (k > 2) population means, for example, when comparing the means of the k levels of a one-factor, fixed effects statistically designed experiment. To make these comparisons, the researcher samples from the populations and observes the measurement Y on each individual in the samples. The measurement $Y_{ij}$ taken on the jth individual from Population i is assumed to have a normal distribution with mean $\mu_i$ and variance $\sigma_i^2$, for $j=1,\ldots,n_i$ with $n_i\ge 2$ and $i=1,\ldots,k$ with $k\ge 2$. Also, the $Y_{ij}$'s are assumed to be independent. We will refer to this model as the independent normal model.
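Throughout, it helps to have a concrete instance of this model in mind. The following minimal sketch (in Python; the group means, standard deviations, and sample sizes are illustrative assumptions, not values from the thesis) draws one data set from the independent normal model with unequal variances:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters for k = 3 populations (illustrative only).
mu = [10.0, 10.0, 10.0]      # population means (equal, so H0 holds here)
sigma = [1.0, 2.0, 4.0]      # unequal population standard deviations
n = [5, 8, 12]               # unequal sample sizes, each n_i >= 2

# One draw from the independent normal model: a list of k samples.
samples = [rng.normal(m, s, size=ni) for m, s, ni in zip(mu, sigma, n)]
```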
With the assumption of equal variances ($\sigma_1^2=\ldots=\sigma_k^2=\sigma^2$), a method known as the analysis of variance (ANOVA) is used to analyze these data. The ANOVA test of the null hypothesis $H_0:\mu_1=\ldots=\mu_k$ against the alternative hypothesis $H_a:\sim(\mu_1=\ldots=\mu_k)$ is a likelihood ratio test. In the derivation of the likelihood ratio test statistic, it is observed that
\[
\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{..}\right)^2
=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{i.}\right)^2+\sum_{i=1}^{k}n_i\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2,
\]
where
\[
\overline{Y}_{i.}=\frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij},\qquad
\overline{Y}_{..}=\frac{1}{m}\sum_{i=1}^{k}\sum_{j=1}^{n_i}Y_{ij},\qquad
m=n_1+\ldots+n_k,
\]
for $i=1,\ldots,k$. One form of the decision rule for the likelihood ratio test of size $\alpha$ rejects $H_0$ in favor of $H_a$ if
\[
V=\frac{\sum_{i=1}^{k}n_i\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2/(k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{i.}\right)^2/(m-k)}\ge F_{k-1,m-k,\alpha},
\]
where $F_{k-1,m-k,\alpha}$ is the $100(1-\alpha)$th percentile of an F distribution with $k-1$ numerator and $m-k$ denominator degrees of freedom. We will discuss this statistical procedure in Chapter 4.

A question that arises is "how robust is this test to the model assumptions of independence, normality, and equal variances?" Various authors, as will be discussed in the next section, have shown that the likelihood ratio test is not robust to the assumption of equal variances. In Chapter 5, we address this problem by deriving the exact distribution of V under the independent normal model with no assumption about the equality of the variances. It is shown that the distribution of V under the null hypothesis of equal means depends on the unknown population variances through the parameters
\[
\lambda_1^2=\frac{\sigma_1^2}{\sigma_k^2},\;\ldots,\;\lambda_{k-1}^2=\frac{\sigma_{k-1}^2}{\sigma_k^2}.
\]
In the case in which $k=2$, Welch (1938) developed a test based on an approximation of the distribution of the statistic
\[
T=\frac{\overline{Y}_{1.}-\overline{Y}_{2.}}
{\sqrt{\frac{1}{n_1(n_1-1)}\sum_{j=1}^{n_1}\left(Y_{1j}-\overline{Y}_{1.}\right)^2+\frac{1}{n_2(n_2-1)}\sum_{j=1}^{n_2}\left(Y_{2j}-\overline{Y}_{2.}\right)^2}}.
\]
He approximated the distribution of T with a t distribution in which the degrees of freedom are a function of the ratio $\lambda^2=\sigma_1^2/\sigma_2^2$ and the sample sizes. The approximate test was then estimated by estimating $\lambda^2$ with
\[
\widehat{\lambda}^2=\frac{\frac{1}{n_1-1}\sum_{j=1}^{n_1}\left(Y_{1j}-\overline{Y}_{1.}\right)^2}{\frac{1}{n_2-1}\sum_{j=1}^{n_2}\left(Y_{2j}-\overline{Y}_{2.}\right)^2}.
\]
Champ and Hu (2010) derived the exact distribution of T, whose null distribution depends only on the parameter $\lambda^2$, and developed a test that estimates the test based on the exact null distribution of T under the independent normal model. They demonstrated that this estimate of the exact test performs better than the approximate/estimated test of Welch (1938). In Chapter 5, we extend the Champ and Hu (2010) method to the case in which $k>2$.
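As a point of reference for Chapter 4, the size-α decision rule based on V can be sketched as follows, using SciPy's F distribution. This is a generic illustration of the classical equal-variance test, not code from the thesis:

```python
import numpy as np
from scipy import stats

def classical_anova(samples, alpha=0.05):
    """Classical likelihood ratio (F) test assuming equal variances."""
    k = len(samples)
    n = np.array([len(y) for y in samples])
    m = n.sum()
    ybar_i = np.array([y.mean() for y in samples])
    ybar = np.concatenate(samples).mean()
    sstr = np.sum(n * (ybar_i - ybar) ** 2)
    sse = sum(((y - yb) ** 2).sum() for y, yb in zip(samples, ybar_i))
    v = (sstr / (k - 1)) / (sse / (m - k))
    crit = stats.f.ppf(1 - alpha, k - 1, m - k)   # F_{k-1, m-k, alpha}
    pval = stats.f.sf(v, k - 1, m - k)
    return v, crit, pval   # reject H0 when v >= crit (equivalently pval <= alpha)
```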
1.2 Literature Review

Scheffé (1959) observed that the level of significance of the test may differ substantially from α if the assumption of equal variances does not hold. Thus, the F test can have an inflated Type I error rate when the assumption of equal variances is violated. This creates problems in biomedical experiments, where large samples are usually not available; in such experiments each data point can be vital and expensive (Ananda and Weerahandi, 1997). To avoid such problems, Welch (1951) developed a test that works well under heteroscedasticity, especially when the sample sizes are large. His method involves approximating the distribution of the likelihood ratio test statistic with a chi-square random variable divided by its degrees of freedom; the p-value is then approximated from this approximate distribution under the null hypothesis. Scott and Smith (1971) developed a test based on redefining the sample standard deviations; under the null hypothesis, their test statistic follows a chi-square distribution. Further, Brown and Forsythe (1974) developed a test based on a new test statistic that follows a central F distribution when the null hypothesis is true. Bishop and Dudewicz (1978) then developed an exact test based on a two-stage sampling procedure. To improve on these exact tests, Rice and Gains (1989) extended the argument given by Barnard (1984) for comparing two population means to obtain an exact solution to the one-way ANOVA problem with unequal variances.

Several other methods based on simulation have been proposed. Krutchkoff (1988), for example, provided a simulation-based method for obtaining an approximate solution that works fairly well even with small sample sizes. Several approximate tests appear in the literature, such as Chen and Chen (1998); Chen (2001); Tsui and Weerahandi (1989); Krishnamoorthy et al. (2006); and Xu and Wang (2007a, 2007b). Krishnamoorthy et al. (2006) developed an approximate test based on parametric bootstrapping. Weerahandi (1995a) developed a generalized p-value based on a generalized F test to solve such problems with no adverse effect on the size of the test. Yiğit and Gokpinar (2010) showed, using Monte Carlo simulations, that Welch's test, Weerahandi's generalized F test, and Krishnamoorthy et al.'s test appear to be more powerful than the other tests in most cases.

1.3 Overview of Proposed Research

In this thesis, we derive closed-form expressions for the probability density function of a linear combination of noncentral chi-squares each with one degree of freedom, of a linear combination of central chi-squares, and of their ratio multiplied by a certain constant. Furthermore, we examine the case in which the variances are equal as a guide to our research. Lastly, we derive the distributions of SSE, SSTR, and the likelihood ratio test statistic Q under the independent normal model when the variances are not assumed equal.

CHAPTER 2
SOME USEFUL MATRIX RESULTS

2.1 Introduction

In the study of the general linear model, vector and matrix notation are indispensable tools. In this chapter, some matrix results are given that are not typically found in the literature and that are useful in finding the distributions of various quadratic forms.

2.2 Quadratic Forms

For the vector $\mathbf{y}_{p\times 1}$ and the matrix $\mathbf{A}_{p\times p}$, the scalar value $\mathbf{y}^T\mathbf{A}\mathbf{y}$ is referred to as a quadratic form. For example, the variance $S^2$ of a sample of measurements $Y_1,\ldots,Y_n$ can be expressed as a quadratic form. The sample variance is defined by
\[
S^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i-\overline{Y}\right)^2,
\]
where $\overline{Y}=\frac{1}{n}\sum_{i=1}^{n}Y_i$. Observe that we can write
\[
S^2=\frac{1}{n-1}
\begin{bmatrix}Y_1-\overline{Y}\\ Y_2-\overline{Y}\\ \vdots\\ Y_n-\overline{Y}\end{bmatrix}^T
\begin{bmatrix}Y_1-\overline{Y}\\ Y_2-\overline{Y}\\ \vdots\\ Y_n-\overline{Y}\end{bmatrix}.
\]
Next, we observe that
\[
\begin{bmatrix}Y_1-\overline{Y}\\ Y_2-\overline{Y}\\ \vdots\\ Y_n-\overline{Y}\end{bmatrix}
=\begin{bmatrix}
\frac{n-1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n}\\
-\frac{1}{n} & \frac{n-1}{n} & \cdots & -\frac{1}{n}\\
\vdots & \vdots & \ddots & \vdots\\
-\frac{1}{n} & -\frac{1}{n} & \cdots & \frac{n-1}{n}
\end{bmatrix}
\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix}.
\]
Note that this matrix equals $\mathbf{I}-\frac{1}{n}\mathbf{J}$ and is symmetric and idempotent. Hence we see that
\[
S^2=\frac{1}{n-1}\mathbf{Y}^T\left(\mathbf{I}-\frac{1}{n}\mathbf{J}\right)\mathbf{Y},
\]
where $\mathbf{I}$ is the $n\times n$ identity matrix, $\mathbf{J}$ is the $n\times n$ matrix of ones, and $\mathbf{Y}=[Y_1,\ldots,Y_n]^T$. It is interesting to look at the following factorization of the matrix $\mathbf{I}-\frac{1}{n}\mathbf{J}$. We can write
\[
\mathbf{I}-\frac{1}{n}\mathbf{J}=\frac{n-1}{n}
\begin{bmatrix}
1 & -\frac{1}{n-1} & \cdots & -\frac{1}{n-1}\\
-\frac{1}{n-1} & 1 & \cdots & -\frac{1}{n-1}\\
\vdots & \vdots & \ddots & \vdots\\
-\frac{1}{n-1} & -\frac{1}{n-1} & \cdots & 1
\end{bmatrix}.
\]
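The quadratic-form expression for $S^2$ is easy to confirm numerically; a minimal check (illustrative, with arbitrary data) is:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
y = rng.normal(size=n)

M = np.eye(n) - np.ones((n, n)) / n      # the centering matrix I - (1/n)J
s2_quadratic = y @ M @ y / (n - 1)       # quadratic-form expression for S^2
s2_direct = y.var(ddof=1)                # usual sample variance

assert np.isclose(s2_quadratic, s2_direct)
assert np.allclose(M @ M, M)             # M is idempotent, as claimed
```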
The following theorem will be useful in examining this quadratic form further.

Theorem 2.1: For any positive real number $\rho$,
\[
\begin{bmatrix}
1 & \rho & \cdots & \rho\\
\rho & 1 & \cdots & \rho\\
\vdots & \vdots & \ddots & \vdots\\
\rho & \rho & \cdots & 1
\end{bmatrix}_{n\times n}=\mathbf{V}\mathbf{C}\mathbf{V}^T,
\]
where the $(i,j)$th element of $\mathbf{V}$ is given by
\[
v_{ij}=\begin{cases}
\dfrac{1}{\sqrt{j(j+1)}}, & i\le j<n;\\[1.5ex]
-\dfrac{j}{\sqrt{j(j+1)}}, & i=j+1,\ j<n;\\[1.5ex]
\dfrac{1}{\sqrt{n}}, & j=n;\\[1.5ex]
0, & \text{otherwise},
\end{cases}
\]
and $\mathbf{C}=\mathrm{Diagonal}\left(1-\rho,\ldots,1-\rho,\,1+(n-1)\rho\right)$. Further, $\mathbf{V}$ is a normalized orthogonal matrix.

Proof of Theorem 2.1: The proof of this theorem can be found in Champ and Rigdon (2007).

It follows from Theorem 2.1 (applied with $\rho=-\frac{1}{n-1}$) that
\[
\mathbf{I}-\frac{1}{n}\mathbf{J}=\frac{n-1}{n}\mathbf{V}\mathbf{C}\mathbf{V}^T,
\]
where
\[
\mathbf{C}=\mathrm{Diagonal}\left(\frac{n}{n-1},\ldots,\frac{n}{n-1},0\right)
=\frac{n}{n-1}\,\mathrm{Diagonal}(1,\ldots,1,0)=\frac{n}{n-1}\mathbf{H},
\]
with $\mathbf{H}=\mathrm{Diagonal}(1,\ldots,1,0)$. Thus,
\[
\mathbf{I}-\frac{1}{n}\mathbf{J}=\frac{n-1}{n}\mathbf{V}\left(\frac{n}{n-1}\mathbf{H}\right)\mathbf{V}^T=\mathbf{V}\mathbf{H}\mathbf{V}^T.
\]
Further, observe that $\mathbf{H}$ is symmetric and idempotent. Thus, we can write
\[
S^2=\frac{1}{n-1}\mathbf{Y}^T\mathbf{V}\mathbf{H}\mathbf{V}^T\mathbf{Y}
=\frac{1}{n-1}\left(\mathbf{H}\mathbf{V}^T\mathbf{Y}\right)^T\left(\mathbf{H}\mathbf{V}^T\mathbf{Y}\right).
\]

Theorem 2.2: If $n_1,\ldots,n_k$ are integers each greater than or equal to 1, with at least one greater than or equal to 2, and $m=n_1+\ldots+n_k$, then the eigenvalues of the matrix
\[
\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}
\]
are 0 and 1, with 1 having multiplicity $k-1$, where $\mathbf{I}$ is a $k\times k$ identity matrix, $\mathbf{N}=\mathrm{Diagonal}(n_1,\ldots,n_k)$, and $\mathbf{J}$ is a $k\times k$ matrix of ones. Further, the rank of this matrix is $k-1$.

Proof of Theorem 2.2: Let $\lambda$ be an eigenvalue of the matrix $\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}$. The characteristic polynomial $g(\lambda)$ can be expressed as
\[
g(\lambda)=\left|\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}-\lambda\mathbf{I}\right|=(-1)^k\lambda(1-\lambda)^{k-1}.
\]
Thus, the eigenvalues are 0 and 1, with 1 having multiplicity $k-1$. This also shows that the rank of $\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}$ is $k-1$.

Theorem 2.3: Suppose $n_1,\ldots,n_k$ are integers each greater than or equal to 1, with at least one greater than or equal to 2, and $m=n_1+\ldots+n_k$. Then we have
\[
\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}=\mathbf{V}\mathbf{H}\mathbf{V}^T,
\]
where $\mathbf{I}$ is the $k\times k$ identity matrix, $\mathbf{N}=\mathrm{Diagonal}(n_1,\ldots,n_k)$, $\mathbf{J}$ is a $k\times k$ matrix of ones, and the orthonormal eigenvectors $\mathbf{v}_1,\ldots,\mathbf{v}_k$ are the columns of $\mathbf{V}$, with the first $k-1$ vectors associated with the $k-1$ eigenvalues equal to 1 and the eigenvector $\mathbf{v}_k$ associated with the eigenvalue 0.

Proof of Theorem 2.3: The proof of this theorem can be found in Champ and Jones-Farmer (2007).
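Theorems 2.2 and 2.3 can likewise be verified numerically. The sketch below (arbitrary sample sizes, for illustration only) confirms that the spectrum of $\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}$ is $\{0,1\}$ with 1 of multiplicity $k-1$:

```python
import numpy as np

n = np.array([5, 8, 12])                   # arbitrary group sizes
k, m = len(n), n.sum()
N_half = np.diag(np.sqrt(n))
J = np.ones((k, k))

A = np.eye(k) - N_half @ J @ N_half / m    # I - (1/m) N^{1/2} J N^{1/2}
eigvals = np.linalg.eigvalsh(A)            # symmetric matrix, use eigvalsh

# Eigenvalue 0 once and eigenvalue 1 with multiplicity k - 1 (Theorem 2.2).
assert np.allclose(sorted(eigvals), [0.0] + [1.0] * (k - 1))
```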
Theorem 2.4: If $\boldsymbol{\Sigma}=\mathrm{Diagonal}(\sigma_1^2,\ldots,\sigma_k^2)$, $\mathbf{I}$ is a $k\times k$ identity matrix, $\mathbf{N}=\mathrm{Diagonal}(n_1,\ldots,n_k)$, and $\mathbf{J}$ is a $k\times k$ matrix of ones, with $n_1,\ldots,n_k$ positive integers with at least one greater than or equal to 2, $m=n_1+\ldots+n_k$, and $\sigma_i^2>0$ for $i=1,\ldots,k$, then the matrix
\[
\boldsymbol{\Sigma}^{1/2}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Sigma}^{1/2}=\mathbf{V}\mathbf{C}\mathbf{V}^T
\]
is of rank $k-1$, where the diagonal elements of $\mathbf{C}=\mathrm{Diagonal}(\xi_1,\ldots,\xi_{k-1},0)$ are the eigenvalues of the matrix, with $\xi_i>0$, and $\mathbf{V}=[\mathbf{v}_1,\ldots,\mathbf{v}_k]$ is the matrix whose column vectors are the associated orthonormal eigenvectors. Further, we have
\[
\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}=\mathbf{0},
\]
where $\mathbf{0}$ and $\mathbf{1}$ are $k\times 1$ vectors of zeros and ones, respectively.

Proof of Theorem 2.4: Since the matrix $\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}$ is a real symmetric matrix of rank $k-1$ and $\boldsymbol{\Sigma}$ is of rank $k$, the matrix $\boldsymbol{\Sigma}^{1/2}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Sigma}^{1/2}$ has rank $k-1$. Thus, it has $k-1$ positive real eigenvalues $\xi_1,\ldots,\xi_{k-1}$ and one zero eigenvalue, and it can be expressed as $\mathbf{V}\mathbf{C}\mathbf{V}^T$. It is convenient in what follows to define
\[
\mathbf{C}^{-1/2}=\mathrm{Diagonal}\left(\xi_1^{-1/2},\ldots,\xi_{k-1}^{-1/2},0\right).
\]
Note that $\mathbf{C}^{-1/2}\mathbf{C}^{1/2}=\mathrm{Diagonal}(1,\ldots,1,0)=\mathbf{H}$ and $\mathbf{H}\mathbf{C}^{1/2}=\mathbf{C}^{1/2}$. Using the results in Dillies and Lakuriqi (2014), we can write
\begin{align*}
\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}
&=\mathbf{H}\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}
=\mathbf{C}^{-1/2}\mathbf{V}^T\mathbf{V}\mathbf{C}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}\\
&=\mathbf{C}^{-1/2}\mathbf{V}^T\boldsymbol{\Sigma}^{1/2}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Sigma}^{1/2}\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}\\
&=\mathbf{C}^{-1/2}\mathbf{V}^T\boldsymbol{\Sigma}^{1/2}\left(\mathbf{N}^{1/2}\mathbf{1}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}\mathbf{1}\right)\\
&=\mathbf{C}^{-1/2}\mathbf{V}^T\boldsymbol{\Sigma}^{1/2}\left(\mathbf{N}^{1/2}\mathbf{1}-\mathbf{N}^{1/2}\mathbf{1}\right)=\mathbf{0},
\end{align*}
since the diagonal matrices commute and $\mathbf{J}\mathbf{N}\mathbf{1}=m\mathbf{1}$.

As discussed in Champ and Jones-Farmer (2007), it is convenient to define the $p\times p$ matrix $\mathbf{B}$ by
\[
\mathbf{B}=\frac{1}{d(\delta_1+d)}\left(\boldsymbol{\delta}+d\mathbf{e}_1\right)\left(\boldsymbol{\delta}+d\mathbf{e}_1\right)^T-\mathbf{I},
\]
where $\boldsymbol{\delta}=\mathbf{P}_0^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)$, $\delta_1$ is the first component of the vector $\boldsymbol{\delta}$, $d^2=\boldsymbol{\delta}^T\boldsymbol{\delta}=(\boldsymbol{\mu}-\boldsymbol{\mu}_0)^T\boldsymbol{\Sigma}_0^{-1}(\boldsymbol{\mu}-\boldsymbol{\mu}_0)$, and $\mathbf{e}_1$ is a $p\times 1$ vector with first coordinate one and the remaining coordinates zero. It is easy to show that the matrix $\mathbf{B}$ is an orthogonal matrix that transforms $\boldsymbol{\delta}$ into $\mathbf{B}\boldsymbol{\delta}=[d,0,\ldots,0]^T=d\mathbf{e}_1$.

CHAPTER 3
SOME DISTRIBUTIONAL RESULTS

3.1 Introduction

In the analysis of various designed experiments, the distributions of linear combinations of central and noncentral chi square random variables are of interest, as well as the ratio of two such linear combinations. We will examine the distributions of a linear combination of independent central chi square random variables, a linear combination of independent noncentral chi square random variables each with one degree of freedom, and the ratio of these linear combinations. The linear combinations and ratio are expressed symbolically as
\[
(1)\ \sum_{i=1}^{s}a_i\chi^2_{\nu_i};\qquad
(2)\ \sum_{i=1}^{t}c_i\chi^2_{1,\tau_i^2};\qquad
(3)\ \sum_{i=1}^{t}c_i\chi^2_{1,\tau_i^2}\Big/\sum_{i=1}^{s}a_i\chi^2_{\nu_i}.
\]
We derive a closed form expression for (1) in Section 3.3, followed by two sections in which we give closed form expressions for (2) and (3).

3.2 Joint Distribution of the Sample Mean and Variance

A proof is given in Bain and Engelhardt (1991) that the sample mean and variance are independent under the independent normal model. Further, they show that the sample mean has a normal distribution and that the distribution of the sample variance is that of a constant times a chi square random variable. In particular, if $Y_1,\ldots,Y_n$ is a random sample from a $N(\mu,\sigma^2)$ distribution, then
\[
\overline{Y}\sim N\left(\mu,\sigma^2/n\right)\quad\text{and}\quad\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}.
\]

3.3 Linear Combination of Central Chi Squares

First, we recall the form of the probability density function $f_X(x)$ of a central chi square distribution with $\nu$ degrees of freedom. We have
\[
f_X(x)=\frac{x^{\nu/2-1}e^{-x/2}}{\Gamma\left(\frac{\nu}{2}\right)2^{\nu/2}}I_{(0,\infty)}(x),
\]
where $I_A(x)=1$ if $x\in A$ and zero otherwise. The mean and variance are, respectively, $\mu_X=\nu$ and $\sigma_X^2=2\nu$. The moment generating function $MGF_X(t)$ of the distribution of X is
\[
MGF_X(t)=\left(\frac{1}{1-2t}\right)^{\nu/2}.
\]
See Bain and Engelhardt (1991) for more details about the chi square distribution.

Theorem 3.1: If $X_1\sim\chi^2_{\nu_1},\ldots,X_m\sim\chi^2_{\nu_m}$ are independent random variables with $\nu_1,\ldots,\nu_m$ positive integers, then $X_1+\ldots+X_m\sim\chi^2_\nu$, with $\nu=\nu_1+\ldots+\nu_m$.

Proof of Theorem 3.1: Bain and Engelhardt (1991) give a proof of this theorem using the moment generating function method.

Theorem 3.2: If $X_i\sim\chi^2_{\nu_i}$, $i=1,\ldots,k$, are independent random variables and $a_1,\ldots,a_k$ are distinct positive real numbers, then the probability density function of $U=a_1X_1+a_2X_2+\cdots+a_kX_k$ is given by
\[
f_U(u)=\sum_{r_1=0}^{\infty}\cdots\sum_{r_{k-1}=0}^{\infty}\xi_{r_1,\ldots,r_{k-1}}\,f_{\chi^2_{\nu_1+\ldots+\nu_k+2r_1+\ldots+2r_{k-1}}}\left(a_k^{-1}u\right)a_k^{-1}
=\sum_{\mathbf{r}}\xi_{\mathbf{r}}\,f_{\chi^2_{\nu_{\mathbf{r}}}}\left(a_k^{-1}u\right)a_k^{-1},
\]
where $\sum_{\mathbf{r}}=\sum_{r_1=0}^{\infty}\cdots\sum_{r_{k-1}=0}^{\infty}$; $\mathbf{r}=[r_1,\ldots,r_{k-1}]^T$; $\nu_{\mathbf{r}}=\nu_1+\ldots+\nu_k+2r_1+\ldots+2r_{k-1}$; $r_0=0$; and
\[
\xi_{\mathbf{r}}=\xi_{r_1,\ldots,r_{k-1}}
=\frac{(-1)^{r_1+\ldots+r_{k-1}}\left(\prod_{i=1}^{k-1}(a_{i+1}-a_i)^{r_i}\right)a_k^{(\nu_1+\ldots+\nu_{k-1}+2r_1+\ldots+2r_{k-2})/2}}
{a_1^{(\nu_1+2r_1)/2}\left(\prod_{i=2}^{k-1}a_i^{(\nu_i+2r_i+2r_{i-1})/2}\right)\prod_{i=1}^{k-1}r_i!}
\times\frac{\prod_{i=1}^{k-1}\Gamma\left(\frac{\nu_1+\ldots+\nu_i+2r_1+\ldots+2r_i}{2}\right)}
{\Gamma\left(\frac{\nu_1}{2}\right)\prod_{i=2}^{k-1}\Gamma\left(\frac{\nu_1+\ldots+\nu_i+2r_1+\ldots+2r_{i-1}}{2}\right)}.
\]

Proof of Theorem 3.2: Let $X_i\sim\chi^2_{\nu_i}$ be independent random variables and $a_i$ distinct positive real values for $i=1,\ldots,k$. Consider the one-to-one transformation
\[
U_i=\sum_{j=1}^{i}a_jX_j,\quad i=1,\ldots,k.
\]
Then we can write the inverse transformation and the Jacobian as
\[
X_1=a_1^{-1}U_1,\qquad X_i=a_i^{-1}\left(U_i-U_{i-1}\right),\ i=2,\ldots,k,\qquad J=\prod_{j=1}^{k}a_j^{-1}.
\]
For $0<u_1<\cdots<u_k$, the joint density of $\mathbf{U}=[U_1,U_2,\ldots,U_k]^T$ can be expressed as
\begin{align*}
f_{\mathbf{U}}(\mathbf{u})
&=f_{X_1}\left(a_1^{-1}u_1\right)\left(\prod_{i=2}^{k}f_{X_i}\left(a_i^{-1}(u_i-u_{i-1})\right)\right)\prod_{i=1}^{k}a_i^{-1}\\
&=\frac{\left(\prod_{i=2}^{k}\left(\frac{u_{i-1}}{u_i}\right)^{(\nu_1+\cdots+\nu_{i-1})/2-1}\left(1-\frac{u_{i-1}}{u_i}\right)^{\nu_i/2-1}u_i^{-1}\right)u_k^{(\nu_1+\cdots+\nu_k)/2-1}}
{\left(\prod_{i=1}^{k}a_i^{\nu_i/2}\right)\left(\prod_{i=1}^{k}\Gamma\left(\frac{\nu_i}{2}\right)\right)2^{(\nu_1+\cdots+\nu_k)/2}}\,
e^{-a_k^{-1}u_k/2}\prod_{i=1}^{k-1}e^{-u_i\left(a_i^{-1}-a_{i+1}^{-1}\right)/2}.
\end{align*}
Using the Maclaurin series expansion, we have
\[
\prod_{i=1}^{k-1}e^{-u_i\left(a_i^{-1}-a_{i+1}^{-1}\right)/2}
=\prod_{i=1}^{k-1}\sum_{r_i=0}^{\infty}\frac{(-1)^{r_i}u_i^{r_i}\left(a_i^{-1}-a_{i+1}^{-1}\right)^{r_i}}{2^{r_i}r_i!}.
\]
Substituting this expansion and integrating out $u_1,\ldots,u_{k-1}$ successively using
\[
\int_0^{u_i}\left(\frac{u_{i-1}}{u_i}\right)^{c-1}\left(1-\frac{u_{i-1}}{u_i}\right)^{d-1}u_i^{-1}\,du_{i-1}=\frac{\Gamma(c)\Gamma(d)}{\Gamma(c+d)},
\]
together with the identities
\[
\prod_{i=1}^{k-1}\left(a_i^{-1}-a_{i+1}^{-1}\right)^{r_i}=\frac{\prod_{i=1}^{k-1}(a_{i+1}-a_i)^{r_i}}{\prod_{i=1}^{k-1}(a_ia_{i+1})^{r_i}}
\]
and
\[
\left(\prod_{i=1}^{k}a_i^{\nu_i/2}\right)\left(\prod_{i=1}^{k-1}(a_ia_{i+1})^{r_i}\right)
=a_1^{(\nu_1+2r_1)/2}a_k^{(\nu_k+2r_{k-1})/2}\prod_{i=2}^{k-1}a_i^{(\nu_i+2r_{i-1}+2r_i)/2},
\]
yields the marginal density of $U=U_k$ in the stated form. The results follow.

Consider the cases in which $k=2$ and $k=3$. For $k=2$ with $\nu_1=3$, $\nu_2=6$, $a_1=2$, and $a_2=4$, the probability density function describing the distribution of $U=2\chi^2_3+4\chi^2_6$ is given by
\[
f_U(u)=\sum_{r=0}^{\infty}\frac{(-1)^r\,2^{3/2}\,\Gamma\left(\frac{3+2r}{2}\right)}{4\,\Gamma\left(\frac{3}{2}\right)r!}\,f_{\chi^2_{9+2r}}\left(4^{-1}u\right).
\]
The graph of $f_U(u)$ is given in Figure 3.1.

[Figure 3.1: Graph of $f_{2\chi^2_3+4\chi^2_6}(u)$]
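The series for this density is straightforward to evaluate by truncation. The following sketch (illustrative, not from the thesis) evaluates the $k=2$ series above and compares it with a crude Monte Carlo estimate of the density of $U=2\chi^2_3+4\chi^2_6$; sixty terms are more than enough on this grid:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def f_U(u, n_terms=60):
    """Truncated series density of U = 2*chi2_3 + 4*chi2_6 (Theorem 3.2)."""
    u = np.asarray(u, dtype=float)
    total = np.zeros_like(u)
    for r in range(n_terms):
        log_coef = (1.5 * np.log(2) + gammaln((3 + 2 * r) / 2)
                    - np.log(4) - gammaln(1.5) - gammaln(r + 1))
        total += (-1) ** r * np.exp(log_coef) * stats.chi2.pdf(u / 4, df=9 + 2 * r)
    return total

rng = np.random.default_rng(2)
u_sim = 2 * rng.chisquare(3, 200_000) + 4 * rng.chisquare(6, 200_000)
grid = np.linspace(10, 70, 4)
mc = np.array([((u_sim > g - 1) & (u_sim <= g + 1)).mean() / 2 for g in grid])
print(np.round(f_U(grid), 4))   # series density on the grid
print(np.round(mc, 4))          # Monte Carlo estimate of the same density
```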
For $k=3$ with $\nu_1=3$, $\nu_2=6$, $\nu_3=7$, $a_1=2$, $a_2=4$, and $a_3=5$, the probability density function describing the distribution of $U=2\chi^2_3+4\chi^2_6+5\chi^2_7$ is given by
\[
f_U(u)=\sum_{r=0}^{\infty}\sum_{t=0}^{\infty}
\frac{(-1)^{r+t}\left(\frac{1}{4}\right)^r\left(\frac{1}{20}\right)^t\Gamma\left(\frac{3+2r}{2}\right)\Gamma\left(\frac{9+2r+2t}{2}\right)5^{(7+2r+2t)/2}}
{2^{15/2}\,r!\,t!\,\Gamma\left(\frac{3}{2}\right)\Gamma\left(\frac{9+2r}{2}\right)}
\,f_{\chi^2_{16+2r+2t}}\left(5^{-1}u\right).
\]
The graph of this density is given in Figure 3.2.

[Figure 3.2: Graph of $f_{2\chi^2_3+4\chi^2_6+5\chi^2_7}(u)$]

3.4 Linear Combination of Noncentral Chi Squares Each with One Degree of Freedom

In this section, we look at the probability density function of a noncentral chi-square and some of its properties. We also derive a closed form expression for the probability density function describing the distribution of a linear combination of noncentral chi-squares, each with one degree of freedom. Several approximations to this distribution have been given in the literature (Press 1966; Davis 1977).

Theorem 3.3: If $Z$ has a standard normal distribution and $\tau$ is a real number, then the probability density function describing the distribution of $W=(Z+\tau)^2$ is
\[
f_W(w)=e^{-\tau^2/2}\sum_{r=0}^{\infty}\frac{\theta_{r,\tau^2}\,\Gamma\left(\frac{1+2r}{2}\right)2^r}{\sqrt{\pi}\,(2r)!}f_{\chi^2_{1+2r}}(w)
=e^{-\tau^2/2}\left(f_{\chi^2_1}(w)+\sum_{r=1}^{\infty}\frac{\left(\tau^2\right)^r\Gamma\left(\frac{1+2r}{2}\right)2^r}{\sqrt{\pi}\,(2r)!}f_{\chi^2_{1+2r}}(w)\right),
\]
where
\[
\theta_{r,\tau^2}=\begin{cases}1,&r=0;\\ \left(\tau^2\right)^r,&r>0.\end{cases}
\]

Proof of Theorem 3.3: The cumulative distribution function describing the distribution of W is given by
\[
F_W(w)=P\left((Z+\tau)^2\le w\right)=P\left(-\sqrt{w}-\tau\le Z\le\sqrt{w}-\tau\right)I_{(0,\infty)}(w)
=\left[\Phi\left(\sqrt{w}-\tau\right)-\Phi\left(-\sqrt{w}-\tau\right)\right]I_{(0,\infty)}(w),
\]
where $\Phi(z)$ is the cumulative distribution function of a standard normal distribution. It follows that the probability density function describing the distribution of W is
\begin{align*}
f_W(w)&=\left[\frac{1}{2}w^{-1/2}\phi\left(\sqrt{w}-\tau\right)+\frac{1}{2}w^{-1/2}\phi\left(-\sqrt{w}-\tau\right)\right]I_{(0,\infty)}(w)\\
&=\frac{e^{-\tau^2/2}}{2\sqrt{2\pi}}\,w^{-1/2}e^{-w/2}\left(e^{\tau\sqrt{w}}+e^{-\tau\sqrt{w}}\right)I_{(0,\infty)}(w),
\end{align*}
where $\phi(z)$ is the probability density function of a standard normal distribution. Next we observe that
\[
e^{\tau\sqrt{w}}+e^{-\tau\sqrt{w}}
=\sum_{r=0}^{\infty}\frac{\tau^rw^{r/2}}{r!}+\sum_{r=0}^{\infty}\frac{(-1)^r\tau^rw^{r/2}}{r!}
=\sum_{r=0}^{\infty}\left(1+(-1)^r\right)\frac{\tau^rw^{r/2}}{r!}
=2\sum_{r=0}^{\infty}\frac{\left(\tau^2\right)^rw^r}{(2r)!}.
\]
Hence, we have
\begin{align*}
f_W(w)&=e^{-\tau^2/2}\sum_{r=0}^{\infty}\frac{\left(\tau^2\right)^r}{\sqrt{2\pi}\,(2r)!}\,w^{(1+2r)/2-1}e^{-w/2}I_{(0,\infty)}(w)\\
&=e^{-\tau^2/2}\sum_{r=0}^{\infty}\frac{\left(\tau^2\right)^r\Gamma\left(\frac{1+2r}{2}\right)2^{(1+2r)/2}}{\sqrt{2\pi}\,(2r)!}\cdot
\frac{w^{(1+2r)/2-1}e^{-w/2}}{\Gamma\left(\frac{1+2r}{2}\right)2^{(1+2r)/2}}I_{(0,\infty)}(w)\\
&=e^{-\tau^2/2}\sum_{r=0}^{\infty}\frac{\left(\tau^2\right)^r\Gamma\left(\frac{1+2r}{2}\right)2^r}{\sqrt{\pi}\,(2r)!}\,f_{\chi^2_{1+2r}}(w).
\end{align*}
The random variable W is said to have a noncentral chi square distribution with 1 degree of freedom and noncentrality parameter $\tau^2$, or more compactly, $W\sim\chi^2_{1,\tau^2}$.

Theorem 3.4: If $W\sim\chi^2_{\alpha,\tau^2}$, then the moment generating function of the distribution of W is
\[
MGF_W(t)=(1-2t)^{-\alpha/2}e^{\tau^2t/(1-2t)}
\]
for $t\in(-1/2,1/2)$.

Proof of Theorem 3.4: The proof of this theorem can be found in Tanizaki (2004).
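Theorem 3.3 identifies the law of $(Z+\tau)^2$ with the noncentral chi-square $\chi^2_{1,\tau^2}$. As a quick check (illustrative only), simulated quantiles of $(Z+\tau)^2$ can be compared with SciPy's ncx2 distribution, whose noncentrality parameter here is $\tau^2$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
tau = 1.7
w = (rng.standard_normal(200_000) + tau) ** 2   # W = (Z + tau)^2

probs = [0.1, 0.5, 0.9]
print(np.quantile(w, probs))                    # empirical quantiles of W
print(stats.ncx2.ppf(probs, df=1, nc=tau**2))   # quantiles of chi2_{1, tau^2}
```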
Theorem 3.5: If $Z_1,\ldots,Z_\nu$ are $\nu$ independent random variables each having a standard normal distribution and $\tau_1,\ldots,\tau_\nu$ are real numbers, then the distribution of $W=(Z_1+\tau_1)^2+\ldots+(Z_\nu+\tau_\nu)^2$ is a noncentral chi square distribution with $\nu$ degrees of freedom and noncentrality parameter $\tau^2=\tau_1^2+\ldots+\tau_\nu^2$.

Proof of Theorem 3.5: The moment generating function of W is
\[
MGF_W(t)=\prod_{i=1}^{\nu}MGF_{(Z_i+\tau_i)^2}(t)
=\prod_{i=1}^{\nu}(1-2t)^{-1/2}e^{\tau_i^2t/(1-2t)}
=(1-2t)^{-\nu/2}e^{\left(\tau_1^2+\cdots+\tau_\nu^2\right)t/(1-2t)}.
\]
This is the moment generating function of a noncentral chi square distribution with $\nu$ degrees of freedom and noncentrality parameter $\tau_1^2+\cdots+\tau_\nu^2$.

Theorem 3.6: If $W_1\sim\chi^2_{1,\tau_1^2},\ldots,W_\eta\sim\chi^2_{1,\tau_\eta^2}$ are stochastically independent noncentral chi square random variables, each with one degree of freedom, and $c_1,\ldots,c_\eta$ are distinct positive real numbers for $\eta\ge 2$, then the probability density function describing the distribution of
\[
W=c_1W_1+\ldots+c_\eta W_\eta\sim c_1\chi^2_{1,\tau_1^2}+\ldots+c_\eta\chi^2_{1,\tau_\eta^2}
\]
is, for $w>0$, given by
\[
f_W(w)=\sum_{t_1=0}^{\infty}\cdots\sum_{t_\eta=0}^{\infty}\sum_{s_1=0}^{\infty}\cdots\sum_{s_{\eta-1}=0}^{\infty}
\zeta_{t_1,\ldots,t_\eta,s_1,\ldots,s_{\eta-1}}\,f_{\chi^2_{\eta+2t_1+\ldots+2t_\eta+2s_1+\ldots+2s_{\eta-1}}}\left(c_\eta^{-1}w\right)c_\eta^{-1}
=\sum_{\mathbf{t},\mathbf{s}}\zeta_{\mathbf{t},\mathbf{s}}\,f_{\chi^2_{\nu_{\mathbf{t},\mathbf{s}}}}\left(c_\eta^{-1}w\right)c_\eta^{-1},
\]
and $f_W(w)=0$ otherwise, where
\[
\zeta_{\mathbf{t},\mathbf{s}}=\zeta_{t_1,\ldots,t_\eta,s_1,\ldots,s_{\eta-1}}
=\frac{\left(\prod_{i=1}^{\eta}\theta_{t_i,\tau_i^2}\right)(-1)^{s_1+\cdots+s_{\eta-1}}\,2^{t_1+\cdots+t_\eta}\,e^{-\left(\tau_1^2+\cdots+\tau_\eta^2\right)/2}}
{\left(\sqrt{\pi}\right)^\eta\left(\prod_{i=1}^{\eta}(2t_i)!\right)\left(\prod_{i=1}^{\eta-1}s_i!\right)}
\times\frac{\left(\prod_{i=1}^{\eta-1}(c_{i+1}-c_i)^{s_i}\right)c_\eta^{(\eta-1+2t_1+\cdots+2t_{\eta-1}+2s_1+\cdots+2s_{\eta-2})/2}}
{c_1^{(1+2t_1+2s_1)/2}\prod_{i=2}^{\eta-1}c_i^{(1+2t_i+2s_{i-1}+2s_i)/2}}
\times\frac{\left(\prod_{i=1}^{\eta-1}\Gamma\left(\frac{i+2t_1+\cdots+2t_i+2s_1+\cdots+2s_i}{2}\right)\right)\left(\prod_{i=2}^{\eta}\Gamma\left(\frac{1+2t_i}{2}\right)\right)}
{\prod_{i=2}^{\eta-1}\Gamma\left(\frac{i+2t_1+\cdots+2t_i+2s_1+\cdots+2s_{i-1}}{2}\right)},
\]
with
\[
\theta_{t_i,\tau_i^2}=\begin{cases}1,&t_i=0;\\ \left(\tau_i^2\right)^{t_i},&t_i>0,\end{cases}
\qquad \mathbf{t}=(t_1,\ldots,t_\eta),\quad \mathbf{s}=(s_1,\ldots,s_{\eta-1}).
\]

Proof of Theorem 3.6: Let $X_i\sim\chi^2_{1,\tau_i^2}$ be independent random variables and $c_i$ distinct positive real values for $i=1,\ldots,\eta$. Consider the one-to-one transformation
\[
W_i=\sum_{j=1}^{i}c_jX_j,\quad i=1,\ldots,\eta,
\]
and note that $W=W_\eta$. Then we can write the inverse transformation and the Jacobian as
\[
X_1=c_1^{-1}W_1,\qquad X_i=c_i^{-1}\left(W_i-W_{i-1}\right),\ i=2,\ldots,\eta,\qquad J=\prod_{j=1}^{\eta}c_j^{-1}.
\]
For $0<w_1<\cdots<w_\eta$, the joint density of $\mathbf{W}=[W_1,\ldots,W_\eta]^T$ is
\[
f_{\mathbf{W}}(\mathbf{w})=f_{X_1}\left(c_1^{-1}w_1\right)\left(\prod_{i=2}^{\eta}f_{X_i}\left(c_i^{-1}(w_i-w_{i-1})\right)\right)\prod_{i=1}^{\eta}c_i^{-1}.
\]
Writing each $f_{X_i}$ in the series form of Theorem 3.3 and proceeding as in the proof of Theorem 3.2, we expand
\[
\prod_{i=1}^{\eta-1}e^{-w_i\left(c_i^{-1}-c_{i+1}^{-1}\right)/2}
=\prod_{i=1}^{\eta-1}\sum_{s_i=0}^{\infty}\frac{(-1)^{s_i}w_i^{s_i}\left(c_i^{-1}-c_{i+1}^{-1}\right)^{s_i}}{2^{s_i}s_i!}
\]
and integrate out $w_1,\ldots,w_{\eta-1}$ successively using
\[
\int_0^{w_i}\left(\frac{w_{i-1}}{w_i}\right)^{a-1}\left(1-\frac{w_{i-1}}{w_i}\right)^{b-1}w_i^{-1}\,dw_{i-1}=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.
\]
Together with the identities
\[
\prod_{i=1}^{\eta-1}\left(c_i^{-1}-c_{i+1}^{-1}\right)^{s_i}=\frac{\prod_{i=1}^{\eta-1}(c_{i+1}-c_i)^{s_i}}{\prod_{i=1}^{\eta-1}(c_ic_{i+1})^{s_i}}
\]
and
\[
\left(\prod_{i=1}^{\eta}c_i^{(1+2t_i)/2}\right)\left(\prod_{i=1}^{\eta-1}(c_ic_{i+1})^{s_i}\right)
=c_1^{(1+2t_1+2s_1)/2}c_\eta^{(1+2t_\eta+2s_{\eta-1})/2}\prod_{i=2}^{\eta-1}c_i^{(1+2t_i+2s_{i-1}+2s_i)/2},
\]
this yields the marginal density of $W=W_\eta$ in the stated form. The results follow.
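Because the closed form in Theorem 3.6 involves a $(2\eta-1)$-fold series, a simulation is a convenient cross-check. The sketch below (illustrative coefficients and noncentrality parameters, not from the thesis) simulates $W=\sum_i c_i\chi^2_{1,\tau_i^2}$ directly from its definition and compares the first two sample moments with $E[W]=\sum_i c_i\left(1+\tau_i^2\right)$ and $\mathrm{Var}[W]=\sum_i 2c_i^2\left(1+2\tau_i^2\right)$, standard facts about noncentral chi-squares:

```python
import numpy as np

rng = np.random.default_rng(4)
c = np.array([1.0, 2.5, 4.0])        # illustrative coefficients c_i
tau = np.array([0.5, 1.0, 2.0])      # illustrative noncentrality roots tau_i

# W = sum_i c_i * (Z_i + tau_i)^2, simulated directly from the definition.
z = rng.standard_normal((500_000, 3))
w = ((z + tau) ** 2 * c).sum(axis=1)

print(w.mean(), (c * (1 + tau**2)).sum())              # E[W]
print(w.var(), (2 * c**2 * (1 + 2 * tau**2)).sum())    # Var[W]
```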
3.5 Ratio of a Linear Combination of Noncentral Chi Square and Central Chi Square Random Variables

It is useful to examine the distribution of a constant times the ratio of W and U. The probability density function describing the distribution of this ratio is given in the following theorem.

Theorem 3.7: Let $X_i\sim\chi^2_{\nu_i}$, $i=1,\ldots,k$, be independent random variables, let $a_1,\ldots,a_k$ be distinct positive real numbers, and let $U=a_1X_1+a_2X_2+\cdots+a_kX_k$. Let $W_1\sim\chi^2_{1,\tau_1^2},\ldots,W_\eta\sim\chi^2_{1,\tau_\eta^2}$ be stochastically independent noncentral chi square random variables, each with one degree of freedom and independent of the $X_i$'s, let $c_1,\ldots,c_\eta$ be distinct positive real numbers with $\eta\ge 2$, and let $W=c_1W_1+\ldots+c_\eta W_\eta$. Then, for positive real numbers $d_w$ and $d_u$, the distribution of
\[
Q=\frac{W/d_w}{U/d_u}=\frac{d_u}{d_w}\cdot\frac{W}{U}
\]
is given by
\[
f_Q(q)=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{a_kd_w}{c_\eta d_u}\right)^{\nu_{\mathbf{t},\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)q^{\nu_{\mathbf{t},\mathbf{s}}/2-1}}
{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)\left(1+\frac{a_kd_w}{c_\eta d_u}\,q\right)^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}},
\]
where $\sum_{\mathbf{t},\mathbf{s}}$, $\sum_{\mathbf{r}}$, $\nu_{\mathbf{t},\mathbf{s}}$, $\nu_{\mathbf{r}}$, $\zeta_{\mathbf{t},\mathbf{s}}$, and $\xi_{\mathbf{r}}$ are defined as in Theorems 3.2 and 3.6.

Proof of Theorem 3.7: The distributions of U and W are given in Theorems 3.2 and 3.6 as
\[
f_U(u)=\sum_{\mathbf{r}}\xi_{\mathbf{r}}\,f_{\chi^2_{\nu_{\mathbf{r}}}}\left(a_k^{-1}u\right)a_k^{-1}
\quad\text{and}\quad
f_W(w)=\sum_{\mathbf{t},\mathbf{s}}\zeta_{\mathbf{t},\mathbf{s}}\,f_{\chi^2_{\nu_{\mathbf{t},\mathbf{s}}}}\left(c_\eta^{-1}w\right)c_\eta^{-1}.
\]
Define
\[
Q=\frac{W/d_w}{U/d_u}\quad\text{and}\quad Q_2=\frac{U}{d_u},
\]
so that $W=d_wQQ_2$ and $U=d_uQ_2$, with Jacobian $J=d_wd_uq_2$. Then
\begin{align*}
f_{Q,Q_2}(q,q_2)&=f_W\left(d_wqq_2\right)f_U\left(d_uq_2\right)d_wd_uq_2\\
&=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
f_{\chi^2_{\nu_{\mathbf{t},\mathbf{s}}}}\left(c_\eta^{-1}d_wqq_2\right)f_{\chi^2_{\nu_{\mathbf{r}}}}\left(a_k^{-1}d_uq_2\right)c_\eta^{-1}a_k^{-1}d_wd_uq_2\\
&=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(c_\eta^{-1}d_w\right)^{\nu_{\mathbf{t},\mathbf{s}}/2}\left(a_k^{-1}d_u\right)^{\nu_{\mathbf{r}}/2}q^{\nu_{\mathbf{t},\mathbf{s}}/2-1}}
{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)2^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}}\,
q_2^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2-1}e^{-\left(a_k^{-1}d_u+c_\eta^{-1}d_wq\right)q_2/2}.
\end{align*}
It follows, using
\[
\int_0^\infty q_2^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2-1}e^{-\left(a_k^{-1}d_u+c_\eta^{-1}d_wq\right)q_2/2}\,dq_2
=\frac{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)2^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}}
{\left(a_k^{-1}d_u+c_\eta^{-1}d_wq\right)^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}},
\]
that
\[
f_Q(q)=\int_0^\infty f_{Q,Q_2}(q,q_2)\,dq_2
=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{a_kd_w}{c_\eta d_u}\right)^{\nu_{\mathbf{t},\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)q^{\nu_{\mathbf{t},\mathbf{s}}/2-1}}
{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)\left(1+\frac{a_kd_w}{c_\eta d_u}\,q\right)^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}}.
\]

3.6 Conclusion

Some distributional results were given. A closed form expression was derived for a linear combination of (central) chi square variates. Also, a closed form expression was derived for a linear combination of noncentral chi square variates each with one degree of freedom. In both cases, the resulting probability density functions are series expansions in terms of probability density functions of central chi squares. This was followed by a closed form expression for the probability density function describing the distribution of a constant times the ratio of a linear combination of noncentral chi square variates, each with one degree of freedom, to a linear combination of central chi squares.
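The distribution of Q in Theorem 3.7 can also be approximated by simulation, which is useful for checking truncation levels of the double series. A minimal sketch with illustrative parameters (the choices of $a_i$, $\nu_i$, $c_i$, $\tau_i$, $d_w$, and $d_u$ below are assumptions for demonstration only):

```python
import numpy as np

rng = np.random.default_rng(5)
B = 400_000

a, nu = np.array([2.0, 4.0]), np.array([3, 6])        # U = sum a_i chi2_{nu_i}
c, tau = np.array([1.0, 3.0]), np.array([0.8, 1.5])   # W = sum c_i chi2_{1, tau_i^2}
d_w, d_u = len(c), int(nu.sum())                      # illustrative scaling constants

u = (a * rng.chisquare(nu, size=(B, 2))).sum(axis=1)
w = (c * (rng.standard_normal((B, 2)) + tau) ** 2).sum(axis=1)
q = (w / d_w) / (u / d_u)

print(np.mean(q >= 3.0))     # Monte Carlo estimate of the tail probability P(Q >= 3)
```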
CHAPTER 4
ANALYSIS ASSUMING EQUAL VARIANCES

4.1 Model

It is often of interest to compare the means of more than two populations. This occurs in the design of experiments when the researcher wishes to investigate the effect of one factor on a response variable in which the factor has more than two levels. The levels of this factor that are of interest to the researcher may be considered fixed or a sample from the possible values of the factor. We are interested in this thesis in examining the case in which (1) the researcher is interested in comparing the means of k fixed populations, or (2) a designed experiment has one factor in which the k levels of the factor are fixed. A commonly used statistical method for comparing these means is known as the ANalysis Of VAriance (ANOVA). However, this method is constructed under three assumptions about the data. The first assumes that the response variables $Y_{i,j}$, taken on the jth individual from the ith population, are uncorrelated. Second, it is assumed that $Y_{i,j}\sim N\left(\mu_i,\sigma_i^2\right)$. These first two assumptions imply that the $Y_{i,j}$'s are independent. Third, it is assumed that the k population variances are equal; that is, $\sigma_1^2=\ldots=\sigma_k^2=\sigma^2$. As demonstrated by Brown and Forsythe (1974), when this last assumption fails the size of the test can be "markedly" different from the desired size. As stated by Rice and Gains (1989), an ANOVA analysis of the data is most affected when the variances are unequal.

In this chapter, we derive the mathematical properties of the ANOVA procedure under the previously mentioned assumptions. While these results are derived in the literature, we will use this chapter as an outline for our method for comparing k means without the assumption of equal variances.

4.2 Sources of Variability

The method used to analyze the data begins by examining the statistic
\[
SST=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{i,j}-\overline{Y}_{..}\right)^2,
\]
which is a measure of the variability in the responses. The statistic SST is referred to as the sum of squares total. This variability can be partitioned as $SST=SSE+SSTR$, where
\[
SSE=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{i.}\right)^2
\quad\text{and}\quad
SSTR=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2.
\]
The statistics SSE and SSTR are commonly referred to as the sum of squares error and the sum of squares treatment, respectively. Observe that we can write SSTR as
\[
SSTR=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2
=\sum_{i=1}^{k}n_i\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2
=\begin{bmatrix}\sqrt{n_1}\left(\overline{Y}_{1.}-\overline{Y}_{..}\right)\\ \vdots\\ \sqrt{n_k}\left(\overline{Y}_{k.}-\overline{Y}_{..}\right)\end{bmatrix}^T
\begin{bmatrix}\sqrt{n_1}\left(\overline{Y}_{1.}-\overline{Y}_{..}\right)\\ \vdots\\ \sqrt{n_k}\left(\overline{Y}_{k.}-\overline{Y}_{..}\right)\end{bmatrix}.
\]
It is not difficult to show that
\[
SSTR=\left(\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right)^T\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\mathbf{N}^{1/2}\overline{\mathbf{Y}},
\]
where $\mathbf{I}$ is a $k\times k$ identity matrix, $\mathbf{J}$ is a $k\times k$ matrix of ones, $m=n_1+\ldots+n_k$, $\mathbf{N}=\mathrm{Diagonal}(n_1,\ldots,n_k)$, and $\overline{\mathbf{Y}}=\left[\overline{Y}_{1.},\ldots,\overline{Y}_{k.}\right]^T$. This is the quadratic form of SSTR. It was shown in Chapter 2 that the $k\times k$ matrix $\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}$ has $k-1$ eigenvalues equal to one and one eigenvalue equal to zero. Further, it was shown that the matrix can be expressed as
\[
\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}=\mathbf{V}\mathbf{H}\mathbf{V}^T,
\]
where the $k\times k$ matrix $\mathbf{H}$ has ones in the first $k-1$ diagonal components and zeroes elsewhere, the first $k-1$ columns of the $k\times k$ matrix $\mathbf{V}$ are the normalized eigenvectors associated with the $k-1$ eigenvalues equal to 1, and the kth column is the normalized eigenvector associated with the eigenvalue of zero. It is not difficult to show that $\mathbf{H}$ is idempotent. We can now write
\[
SSTR=\left(\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right)^T\mathbf{V}\mathbf{H}\mathbf{V}^T\left(\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right)
=\left(\mathbf{H}\mathbf{V}^T\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right)^T\left(\mathbf{H}\mathbf{V}^T\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right).
\]
Further, we can express $\mathbf{N}^{1/2}\overline{\mathbf{Y}}$ as
\[
\mathbf{N}^{1/2}\overline{\mathbf{Y}}
=\sigma\begin{bmatrix}\frac{\overline{Y}_{1.}-\mu_1}{\sigma/\sqrt{n_1}}+\sqrt{n_1}\frac{\mu_1}{\sigma}\\ \vdots\\ \frac{\overline{Y}_{k.}-\mu_k}{\sigma/\sqrt{n_k}}+\sqrt{n_k}\frac{\mu_k}{\sigma}\end{bmatrix}
=\sigma\begin{bmatrix}Z_1+\sqrt{n_1}\delta_1\\ \vdots\\ Z_k+\sqrt{n_k}\delta_k\end{bmatrix}
=\sigma\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\delta}\right),
\]
where $Z_i=\left(\overline{Y}_{i.}-\mu_i\right)/\left(\sigma/\sqrt{n_i}\right)\sim N(0,1)$, $\delta_i=\mu_i/\sigma$, $\mathbf{Z}=[Z_1,\ldots,Z_k]^T$, and $\boldsymbol{\delta}=[\delta_1,\ldots,\delta_k]^T$. It is not difficult to see that the $Z_i$'s are independent. It follows that
\[
SSTR=\sigma^2\left(\mathbf{H}\mathbf{V}^T\mathbf{Z}+\mathbf{H}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\delta}\right)^T\left(\mathbf{H}\mathbf{V}^T\mathbf{Z}+\mathbf{H}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\delta}\right).
\]
The random vector
\[
\mathbf{Z}^*=\mathbf{H}\mathbf{V}^T\mathbf{Z}=\left[Z_1^*,\ldots,Z_{k-1}^*,0\right]^T\sim N_k\left(\mathbf{0},\mathbf{H}\right).
\]
Hence, the first $k-1$ components of $\mathbf{Z}^*$ are independent standard normal random variables. Letting $\boldsymbol{\tau}=\mathbf{H}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\delta}$, we have
\[
SSTR=\sigma^2\left(\mathbf{Z}^*+\boldsymbol{\tau}\right)^T\left(\mathbf{Z}^*+\boldsymbol{\tau}\right)=\sigma^2\sum_{i=1}^{k-1}\left(Z_i^*+\tau_i\right)^2.
\]
Applying Theorem 3.5 of Chapter 3, we have $SSTR\sim\sigma^2\chi^2_{k-1,\tau^2}$, where
\[
\tau^2=\sum_{i=1}^{k-1}\tau_i^2=\boldsymbol{\tau}^T\boldsymbol{\tau}
=\left(\mathbf{N}^{1/2}\boldsymbol{\delta}\right)^T\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\mathbf{N}^{1/2}\boldsymbol{\delta}
=\sum_{i=1}^{k}n_i\left(\delta_i-\overline{\delta}\right)^2
=\sum_{i=1}^{k}\frac{n_i\left(\mu_i-\overline{\mu}\right)^2}{\sigma^2},
\]
with
\[
\overline{\delta}=\frac{1}{m}\sum_{i=1}^{k}n_i\delta_i=\frac{1}{m}\sum_{i=1}^{k}n_i\mu_i/\sigma=\overline{\mu}/\sigma
\quad\text{and}\quad
\overline{\mu}=\frac{1}{m}\sum_{i=1}^{k}n_i\mu_i.
\]
Further, we observe that
\[
SSE=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{i.}\right)^2
=\sum_{i=1}^{k}(n_i-1)S_i^2
=\sigma^2\sum_{i=1}^{k}\frac{(n_i-1)S_i^2}{\sigma^2}.
\]
Thus,
\[
\frac{SSE}{\sigma^2}\sim\sum_{i=1}^{k}\chi^2_{n_i-1}\sim\chi^2_{m-k}.
\]

4.3 The Classical F Test

In the previous section, we derived the distributions of $SSTR/\sigma^2$ and $SSE/\sigma^2$. It is important to note that SSTR and SSE are independent. Thus,
\[
Q=\frac{SSTR/(k-1)}{SSE/(m-k)}\sim\frac{\chi^2_{k-1,\tau^2}/(k-1)}{\chi^2_{m-k}/(m-k)}.
\]
The F distribution is the distribution of a ratio of two independent chi-squares, each divided by its degrees of freedom (Bain and Engelhardt 1991). Thus, $Q\sim F_{k-1,m-k,\tau^2}$. Under the null hypothesis, when all the population means are equal, the noncentrality parameter $\tau^2$ is zero. Thus, the distribution of Q becomes a central F distribution with $k-1$ and $m-k$ degrees of freedom. Therefore,
\[
\text{p-value}=P\left(F_{k-1,m-k}\ge q_{obs}\right),
\]
where $q_{obs}$ is the observed value of the ratio Q. The null hypothesis is rejected when the p-value is less than some significance level $\alpha$.
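The lack of robustness that motivates Chapter 5 is easy to exhibit by simulation: under $H_0$ with unequal variances, the rejection rate of the classical test can drift well away from $\alpha$, especially when the smaller samples have the larger variances. A minimal sketch with illustrative parameters (not a study from the thesis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = [5, 5, 25]               # small groups paired with the large variances
sigma = [4.0, 2.0, 1.0]      # unequal standard deviations; H0 holds (equal means)
alpha = 0.05

rejections = 0
for _ in range(20_000):
    samples = [rng.normal(0.0, s, ni) for s, ni in zip(sigma, n)]
    _, p = stats.f_oneway(*samples)   # classical equal-variance ANOVA F test
    rejections += p < alpha

print(rejections / 20_000)   # empirical size; typically well above 0.05 here
```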
CHAPTER 5
ANALYSIS WITH NO ASSUMPTION OF THE EQUALITY OF VARIANCES

5.1 Introduction

In this chapter, we use the methods and theorems provided in Chapters 2 through 4 to derive the distribution of the likelihood ratio test statistic under the independent normal model without the assumption of equal population variances.

5.2 Distribution of SSE with No Assumption about the Variances

Theorem 5.1: Under the independent normal model with no assumption of equality of the population variances,
\[
SSE/\sigma_k^2\sim\sum_{i=1}^{k}\lambda_i^2\chi^2_{n_i-1},
\]
where $\lambda_i^2=\sigma_i^2/\sigma_k^2$.

Proof of Theorem 5.1:
\[
SSE=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(Y_{ij}-\overline{Y}_{i.}\right)^2
=\sum_{i=1}^{k}(n_i-1)S_i^2
=\sum_{i=1}^{k}\sigma_i^2\frac{(n_i-1)S_i^2}{\sigma_i^2}
\sim\sum_{i=1}^{k}\sigma_i^2\chi^2_{n_i-1},
\]
since $(n_i-1)S_i^2/\sigma_i^2\sim\chi^2_{n_i-1}$ for $i=1,\ldots,k$. Dividing SSE by $\sigma_k^2$, the result follows.

The following theorem gives a closed form expression for the probability density function describing the distribution of $SSE/\sigma_k^2$.

Theorem 5.2: Under the independent normal model with no assumption about the population variances, and with distinct values of $\lambda_i^2=\sigma_i^2/\sigma_k^2$ for $i=1,\ldots,k$,
\[
f_{SSE/\sigma_k^2}(u)=\sum_{r_1=0}^{\infty}\cdots\sum_{r_{k-1}=0}^{\infty}\xi_{r_1,\ldots,r_{k-1}}\,f_{\chi^2_{n_1+\ldots+n_k-k+2r_1+\ldots+2r_{k-1}}}(u),
\]
where
\[
\xi_{r_1,\ldots,r_{k-1}}
=\frac{(-1)^{r_1+\ldots+r_{k-1}}\prod_{i=1}^{k-1}\left(\lambda_{i+1}^2-\lambda_i^2\right)^{r_i}}
{\lambda_1^{n_1-1+2r_1}\left(\prod_{i=2}^{k-1}\lambda_i^{n_i-1+2r_i+2r_{i-1}}\right)\prod_{i=1}^{k-1}r_i!}
\times\frac{\prod_{i=1}^{k-1}\Gamma\left(\frac{n_1+\ldots+n_i-i+2r_1+\ldots+2r_i}{2}\right)}
{\Gamma\left(\frac{n_1-1}{2}\right)\prod_{i=2}^{k-1}\Gamma\left(\frac{n_1+\ldots+n_i-i+2r_1+\ldots+2r_{i-1}}{2}\right)},
\]
with $\lambda_i^2=\sigma_i^2/\sigma_k^2$.

Proof of Theorem 5.2: Substituting $a_i$ by $\lambda_i^2$ and $\nu_i=n_i-1$ ($i=1,\ldots,k$) in Theorem 3.2, and noting that $\lambda_k^2=1$, gives the result.

5.3 Distribution of SSTR

Theorem 5.3: Under the independent normal model with no assumption about the population variances,
\[
SSTR/\sigma_k^2\sim
\begin{cases}
\displaystyle\sum_{i=1}^{k-1}c_i\left(\chi^2_1\right)_i,&\text{if }H_0\text{ holds};\\[2ex]
\displaystyle\sum_{i=1}^{k-1}c_i\chi^2_{1,\Delta_i^2},&\text{if }H_a\text{ holds},
\end{cases}
\]
where $\Delta_i=\sum_{j=1}^{k}v_{ji}\mu_j\sqrt{n_j}/\sigma_j$, and the $c_i$ are the eigenvalues and the $\mathbf{v}_i$ the orthonormal eigenvectors of the matrix $\boldsymbol{\Lambda}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Lambda}$, with $\boldsymbol{\Lambda}=\mathrm{Diagonal}(\lambda_1,\ldots,\lambda_{k-1},1)$ and $\lambda_i=\sigma_i/\sigma_k$ for $i=1,\ldots,k$.

Proof of Theorem 5.3:
\[
SSTR=\sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2
=\sum_{i=1}^{k}n_i\left(\overline{Y}_{i.}-\overline{Y}_{..}\right)^2
=\left(\mathbf{N}^{1/2}\overline{\mathbf{Y}}\right)^T\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\mathbf{N}^{1/2}\overline{\mathbf{Y}}.
\]
Now,
\[
\mathbf{N}^{1/2}\overline{\mathbf{Y}}
=\begin{bmatrix}\sigma_1\left(\frac{\overline{Y}_{1.}-\mu_1}{\sigma_1/\sqrt{n_1}}\right)+\sqrt{n_1}\mu_1\\ \vdots\\ \sigma_k\left(\frac{\overline{Y}_{k.}-\mu_k}{\sigma_k/\sqrt{n_k}}\right)+\sqrt{n_k}\mu_k\end{bmatrix}
=\begin{bmatrix}\sigma_1Z_1+\sqrt{n_1}\mu_1\\ \vdots\\ \sigma_kZ_k+\sqrt{n_k}\mu_k\end{bmatrix}
=\boldsymbol{\Sigma}^{1/2}\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right),
\]
where the $Z_i=\left(\overline{Y}_{i.}-\mu_i\right)/\left(\sigma_i/\sqrt{n_i}\right)\sim N(0,1)$ are independent. It follows that
\[
SSTR=\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)^T\boldsymbol{\Sigma}^{1/2}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Sigma}^{1/2}\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right).
\]
Then
\[
\frac{SSTR}{\sigma_k^2}=\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)^T\boldsymbol{\Lambda}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Lambda}\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right),
\]
where $\boldsymbol{\Lambda}=\mathrm{Diagonal}(\lambda_1,\ldots,\lambda_{k-1},1)$ with $\lambda_i^2=\sigma_i^2/\sigma_k^2$. Using Theorem 2.4,
\[
\boldsymbol{\Lambda}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Lambda}=\mathbf{V}\mathbf{C}\mathbf{V}^T,
\]
with $\mathbf{C}=\mathrm{Diagonal}(c_1,\ldots,c_{k-1},0)$. The diagonal elements of $\mathbf{C}$ are the eigenvalues, with corresponding column vectors of $\mathbf{V}$ the normalized orthogonal eigenvectors, of the matrix $\boldsymbol{\Lambda}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Lambda}$.
We can now write $SSTR/\sigma_k^2$ as
\[
\frac{SSTR}{\sigma_k^2}
=\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)^T\mathbf{V}\mathbf{C}\mathbf{V}^T\left(\mathbf{Z}+\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)
=\left(\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{Z}+\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)^T\left(\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{Z}+\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right).
\]
With $\mathbf{Z}^*=\mathbf{V}^T\mathbf{Z}$, observe that
\[
\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{Z}=\mathbf{C}^{1/2}\mathbf{Z}^*=\left[c_1^{1/2}Z_1^*,\ldots,c_{k-1}^{1/2}Z_{k-1}^*,0\right]^T,
\]
where $\mathbf{Z}^*\sim N_k(\mathbf{0},\mathbf{I})$. For convenience, we define $\boldsymbol{\delta}=\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}$ and $\boldsymbol{\Delta}=\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}=\mathbf{V}^T\boldsymbol{\delta}$. For clarity, we observe that
\[
\frac{SSTR}{\sigma_k^2}
=\begin{bmatrix}\sqrt{c_1}\left(Z_1^*+\Delta_1\right)\\ \vdots\\ \sqrt{c_{k-1}}\left(Z_{k-1}^*+\Delta_{k-1}\right)\\ 0\end{bmatrix}^T
\begin{bmatrix}\sqrt{c_1}\left(Z_1^*+\Delta_1\right)\\ \vdots\\ \sqrt{c_{k-1}}\left(Z_{k-1}^*+\Delta_{k-1}\right)\\ 0\end{bmatrix},
\]
with $\Delta_i=\sum_{j=1}^{k}v_{ji}\mu_j\sqrt{n_j}/\sigma_j$ ($i=1,\ldots,k$), where $v_{ji}$ is the jth element of the eigenvector $\mathbf{v}_i$. It then follows that
\[
\frac{SSTR}{\sigma_k^2}=\sum_{i=1}^{k-1}c_i\left(Z_i^*+\Delta_i\right)^2.
\]
As has been shown, $\left(Z_i^*+\Delta_i\right)^2\sim\chi^2_{1,\Delta_i^2}$. Hence,
\[
\frac{SSTR}{\sigma_k^2}\sim\sum_{i=1}^{k-1}c_i\chi^2_{1,\Delta_i^2}.
\]
The null hypothesis of equal means can be expressed as $H_0:\boldsymbol{\mu}=\mu\mathbf{1}$, where $\mu$ is an unknown constant and $\mathbf{1}$ is a $k\times 1$ vector of ones. Note that
\[
\boldsymbol{\Sigma}^{1/2}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Sigma}^{1/2}=\mathbf{V}\mathbf{D}\mathbf{V}^T,
\]
where $\mathbf{D}=\mathrm{Diagonal}(\gamma_1,\ldots,\gamma_{k-1},0)$ with $\gamma_i=c_i\sigma_k^2$. Using the results of Theorem 2.4, we see that, under $H_0$,
\[
\mathbf{D}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}
=\mu\,\mathbf{D}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\mathbf{1}=\mathbf{0},
\]
so that each $\Delta_i=0$. Thus, under $H_0$, we can write
\[
SSTR=\left(\mathbf{D}^{1/2}\mathbf{V}^T\mathbf{Z}\right)^T\left(\mathbf{D}^{1/2}\mathbf{V}^T\mathbf{Z}\right)=\sum_{i=1}^{k-1}\gamma_i\left(\chi^2_1\right)_i,
\qquad\text{so that}\qquad
\frac{SSTR}{\sigma_k^2}=\sum_{i=1}^{k-1}c_i\left(\chi^2_1\right)_i.
\]
We can also note from the previous theorem that
\begin{align*}
\sum_{i=1}^{k-1}c_i\Delta_i^2
&=\left(\mathbf{C}^{1/2}\boldsymbol{\Delta}\right)^T\mathbf{C}^{1/2}\boldsymbol{\Delta}
=\left(\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\right)^T\mathbf{C}^{1/2}\mathbf{V}^T\mathbf{N}^{1/2}\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\mu}\\
&=\left(\mathbf{N}^{1/2}\boldsymbol{\mu}\right)^T\boldsymbol{\Sigma}^{-1/2}\mathbf{V}\mathbf{C}\mathbf{V}^T\boldsymbol{\Sigma}^{-1/2}\mathbf{N}^{1/2}\boldsymbol{\mu}
=\left(\mathbf{N}^{1/2}\boldsymbol{\mu}\right)^T\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Lambda}\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\boldsymbol{\Lambda}\boldsymbol{\Sigma}^{-1/2}\mathbf{N}^{1/2}\boldsymbol{\mu}\\
&=\frac{1}{\sigma_k^2}\left(\mathbf{N}^{1/2}\boldsymbol{\mu}\right)^T\left(\mathbf{I}-\frac{1}{m}\mathbf{N}^{1/2}\mathbf{J}\mathbf{N}^{1/2}\right)\mathbf{N}^{1/2}\boldsymbol{\mu}
=\sum_{i=1}^{k}\frac{n_i\left(\mu_i-\overline{\mu}\right)^2}{\sigma_k^2},
\end{align*}
since $\boldsymbol{\Sigma}^{-1/2}\boldsymbol{\Lambda}=\sigma_k^{-1}\mathbf{I}$, where
\[
\overline{\mu}=\frac{1}{m}\sum_{i=1}^{k}n_i\mu_i.
\]

5.4 Distribution of the Likelihood Ratio Test Statistic

In this section, we derive the distribution of the likelihood ratio test statistic under the null and alternative hypotheses.

Theorem 5.4: Under the independent normal model with no assumption about the variances, the distribution of
\[
Q=\frac{MSTR}{MSE}=\frac{SSTR/(k-1)}{SSE/(m-k)}
\]
is given by
\[
f_Q(q)=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{k-1}{c_{k-1}(m-k)}\right)^{\nu_{\mathbf{t},\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)q^{\nu_{\mathbf{t},\mathbf{s}}/2-1}}
{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)\left(1+\frac{k-1}{c_{k-1}(m-k)}\,q\right)^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}},
\]
where $MSTR=SSTR/(k-1)$, $MSE=SSE/(m-k)$, and
\[
\sum_{\mathbf{t},\mathbf{s}}=\sum_{t_1=0}^{\infty}\cdots\sum_{t_{k-1}=0}^{\infty}\sum_{s_1=0}^{\infty}\cdots\sum_{s_{k-2}=0}^{\infty};\qquad
\sum_{\mathbf{r}}=\sum_{r_1=0}^{\infty}\cdots\sum_{r_{k-1}=0}^{\infty};
\]
\[
\nu_{\mathbf{t},\mathbf{s}}=k-1+2t_1+\ldots+2t_{k-1}+2s_1+\ldots+2s_{k-2};\qquad
\nu_{\mathbf{r}}=n_1+\ldots+n_k-k+2r_1+\ldots+2r_{k-1};
\]
\[
\xi_{\mathbf{r}}=\frac{(-1)^{r_1+\ldots+r_{k-1}}\prod_{i=1}^{k-1}\left(\lambda_{i+1}^2-\lambda_i^2\right)^{r_i}}
{\lambda_1^{n_1-1+2r_1}\left(\prod_{i=2}^{k-1}\lambda_i^{n_i-1+2r_i+2r_{i-1}}\right)\prod_{i=1}^{k-1}r_i!}
\times\frac{\prod_{i=1}^{k-1}\Gamma\left(\frac{n_1+\ldots+n_i+2r_1+\ldots+2r_i-i}{2}\right)}
{\Gamma\left(\frac{n_1-1}{2}\right)\prod_{i=2}^{k-1}\Gamma\left(\frac{n_1+\ldots+n_i+2r_1+\ldots+2r_{i-1}-i}{2}\right)};
\]
\[
\zeta_{\mathbf{t},\mathbf{s}}=\frac{\left(\prod_{i=1}^{k-1}\theta_{t_i,\Delta_i^2}\right)(-1)^{s_1+\cdots+s_{k-2}}\,2^{t_1+\cdots+t_{k-1}}\,e^{-\left(\Delta_1^2+\cdots+\Delta_{k-1}^2\right)/2}}
{\left(\sqrt{\pi}\right)^{k-1}\left(\prod_{i=1}^{k-1}(2t_i)!\right)\left(\prod_{i=1}^{k-2}s_i!\right)}
\times\frac{\left(\prod_{i=1}^{k-2}(c_{i+1}-c_i)^{s_i}\right)c_{k-1}^{(k-2+2t_1+\cdots+2t_{k-2}+2s_1+\cdots+2s_{k-3})/2}}
{c_1^{(1+2t_1+2s_1)/2}\prod_{i=2}^{k-2}c_i^{(1+2t_i+2s_{i-1}+2s_i)/2}}
\times\frac{\left(\prod_{i=1}^{k-2}\Gamma\left(\frac{i+2t_1+\cdots+2t_i+2s_1+\cdots+2s_i}{2}\right)\right)\left(\prod_{i=2}^{k-1}\Gamma\left(\frac{1+2t_i}{2}\right)\right)}
{\prod_{i=2}^{k-2}\Gamma\left(\frac{i+2t_1+\cdots+2t_i+2s_1+\cdots+2s_{i-1}}{2}\right)};
\]
\[
\theta_{t_i,\Delta_i^2}=\begin{cases}1,&t_i=0;\\ \left(\Delta_i^2\right)^{t_i},&t_i>0.\end{cases}
\]

Proof of Theorem 5.4: Using Theorems 3.2, 3.6, 3.7, 5.2, and 5.3, and substituting $\nu_i$ by $n_i-1$, $a_i$ by $\lambda_i^2$, $\boldsymbol{\tau}$ by $\boldsymbol{\Delta}$, $\eta$ by $k-1$, $d_w$ by $k-1$, and $d_u$ by $m-k$, gives the result.

Corollary 5.4.1: The cumulative distribution function describing the distribution of Q is given by
\[
F_Q(q)=\int_0^qf_Q(x)\,dx
=\sum_{\mathbf{t},\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{t},\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{k-1}{c_{k-1}(m-k)}\right)^{\nu_{\mathbf{t},\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)}
{\Gamma\left(\frac{\nu_{\mathbf{t},\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)}
\int_0^q\frac{x^{\nu_{\mathbf{t},\mathbf{s}}/2-1}}{\left(1+\frac{k-1}{c_{k-1}(m-k)}\,x\right)^{(\nu_{\mathbf{t},\mathbf{s}}+\nu_{\mathbf{r}})/2}}\,dx.
\]
Proof of Corollary 5.4.1: The proof is straightforward.
Theorem 5.5: Under the independent normal model with no assumption about the variances, and under the null hypothesis that the population means are equal, the probability density function describing the distribution of
\[
Q=\frac{MSTR}{MSE}=\frac{SSTR/(k-1)}{SSE/(m-k)}
\]
is given by
\[
f_Q(q)=\sum_{\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{k-1}{c_{k-1}(m-k)}\right)^{\nu_{\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)q^{\nu_{\mathbf{s}}/2-1}}
{\Gamma\left(\frac{\nu_{\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)\left(1+\frac{k-1}{c_{k-1}(m-k)}\,q\right)^{(\nu_{\mathbf{s}}+\nu_{\mathbf{r}})/2}},
\]
where $MSTR=SSTR/(k-1)$, $MSE=SSE/(m-k)$, and
\[
\sum_{\mathbf{s}}=\sum_{s_1=0}^{\infty}\cdots\sum_{s_{k-2}=0}^{\infty};\qquad
\sum_{\mathbf{r}}=\sum_{r_1=0}^{\infty}\cdots\sum_{r_{k-1}=0}^{\infty};
\]
\[
\nu_{\mathbf{s}}=k-1+2s_1+\ldots+2s_{k-2};\qquad
\nu_{\mathbf{r}}=n_1+\ldots+n_k-k+2r_1+\ldots+2r_{k-1};
\]
$\xi_{\mathbf{r}}$ is as in Theorem 5.4; and
\[
\zeta_{\mathbf{s}}=\frac{(-1)^{s_1+\cdots+s_{k-2}}\left(\prod_{i=1}^{k-2}(c_{i+1}-c_i)^{s_i}\right)c_{k-1}^{(k-2+2s_1+\cdots+2s_{k-3})/2}}
{c_1^{(1+2s_1)/2}\left(\prod_{i=1}^{k-2}s_i!\right)\prod_{i=2}^{k-2}c_i^{(1+2s_{i-1}+2s_i)/2}}
\times\frac{\prod_{i=1}^{k-2}\Gamma\left(\frac{i+2s_1+\cdots+2s_i}{2}\right)}
{\Gamma\left(\frac{1}{2}\right)\prod_{i=2}^{k-2}\Gamma\left(\frac{i+2s_1+\cdots+2s_{i-1}}{2}\right)}.
\]

Proof of Theorem 5.5: Setting $\Delta_1=\ldots=\Delta_{k-1}=0$ in Theorem 5.4, or using the same procedure as in Theorem 3.7 together with Theorem 5.3, gives the result.

Corollary 5.5.1: Under the null hypothesis that the population means are equal, the cumulative distribution function describing the distribution of Q is given by
\[
F_Q(q)=\int_0^qf_Q(x)\,dx
=\sum_{\mathbf{s}}\sum_{\mathbf{r}}\zeta_{\mathbf{s}}\,\xi_{\mathbf{r}}\,
\frac{\left(\frac{k-1}{c_{k-1}(m-k)}\right)^{\nu_{\mathbf{s}}/2}\Gamma\left(\frac{\nu_{\mathbf{s}}+\nu_{\mathbf{r}}}{2}\right)}
{\Gamma\left(\frac{\nu_{\mathbf{s}}}{2}\right)\Gamma\left(\frac{\nu_{\mathbf{r}}}{2}\right)}
\int_0^q\frac{x^{\nu_{\mathbf{s}}/2-1}}{\left(1+\frac{k-1}{c_{k-1}(m-k)}\,x\right)^{(\nu_{\mathbf{s}}+\nu_{\mathbf{r}})/2}}\,dx.
\]
Proof of Corollary 5.5.1: The proof is straightforward.

The value of $F_Q(q)$ for a given value of q must be obtained numerically, as would determining the $100(1-\alpha)$th percentile of the distribution of Q. Under the hypothesis that the population means are equal (the null hypothesis), this can only be done for given values of $\lambda_1^2=\sigma_1^2/\sigma_k^2,\ldots,\lambda_{k-1}^2=\sigma_{k-1}^2/\sigma_k^2$. On the other hand, the null distribution of Q can be estimated using the data (see Welch (1938)). For example, one could estimate the parameter $\lambda_i^2=\sigma_i^2/\sigma_k^2$ by estimating the population variance $\sigma_i^2$ with the sample variance $S_i^2$ for $i=1,\ldots,k$. One could then obtain an estimate of the $100(1-\alpha)$th percentile of the distribution of Q, as well as an estimate of the p-value of the test for the equality of means, as sketched below.
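A minimal, simulation-based sketch of this plug-in procedure follows (illustrative only; the thesis's proposal is to evaluate the closed-form null distribution, which is replaced here by Monte Carlo for brevity):

```python
import numpy as np

def estimated_null_test(samples, alpha=0.05, B=50_000, rng=None):
    """Plug-in estimate of the null distribution of Q = MSTR/MSE.

    Simulates Q under H0 from the independent normal model with the
    unknown variances replaced by the sample variances (so that
    lambda_i^2 is estimated by S_i^2 / S_k^2).  Illustrative sketch only.
    """
    rng = rng or np.random.default_rng()
    n = np.array([len(y) for y in samples])
    k, m = len(n), n.sum()
    s2 = np.array([np.var(y, ddof=1) for y in samples])  # S_i^2

    def q_stat(groups):
        means = np.array([g.mean() for g in groups])
        grand = np.concatenate(groups).mean()
        sstr = np.sum(n * (means - grand) ** 2)
        sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
        return (sstr / (k - 1)) / (sse / (m - k))

    q_obs = q_stat(samples)
    # Estimated null distribution: equal means, variances set to S_i^2.
    q_null = np.array([
        q_stat([rng.normal(0.0, np.sqrt(si), ni) for si, ni in zip(s2, n)])
        for _ in range(B)
    ])
    crit = np.quantile(q_null, 1 - alpha)   # estimated critical value
    pval = np.mean(q_null >= q_obs)         # estimated p-value
    return q_obs, crit, pval
```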
5.5 Conclusion

Closed form expressions were given for the probability density functions describing the distributions of $SSTR/\sigma_k^2$ and $SSE/\sigma_k^2$ under the independent normal model with no assumption about the equality of the variances. Further, closed form expressions for the probability density and cumulative distribution functions describing the distribution of the likelihood ratio test statistic $MSTR/MSE$ were derived.

CHAPTER 6
CONCLUSION

6.1 General Conclusions

To conclude, closed form expressions for the probability density function and the cumulative distribution function describing the distribution of the likelihood ratio test statistic, under the independent normal model with no assumption of equality of the population variances, were derived. However, values of $F_Q(q)$ for given values of q must be obtained numerically, as must critical values. An estimate of the p-value of the test for the equality of the population means must also be obtained numerically, based on the exact distribution of the likelihood ratio test statistic.

6.2 Areas for Further Research

We intend to develop a MATLAB program that computes an estimate of the $100(1-\alpha)$th percentile of the distribution of Q under the null hypothesis, as well as an estimate of the p-value of the test for the equality of means. Furthermore, we intend to compare our test to other tests found in the literature. This method can also be extended to comparing the levels of more than one factor. Further research can also be done for the cases in which the population means are ordered or in which two population variances are the same.

REFERENCES

[1] Ananda, M. and Weerahandi, S. (1997), "Two-Way ANOVA with Unequal Cell Frequencies and Unequal Variances," Statistica Sinica 7, 631-646.

[2] Barnard, G. A. (1984), "Comparing the Means of Two Independent Samples," Applied Statistics 33, 266-271.

[3] Bishop, T. A. and Dudewicz, E. J. (1978), "Exact Analysis of Variance with Unequal Variances: Test Procedures and Tables," Technometrics 20, 419-430.

[4] Brown, M. B. and Forsythe, A. B. (1974), "Robust Tests for the Equality of Variances," Journal of the American Statistical Association 69, 364-367.

[5] Champ, C. W. and Jones-Farmer, L. A. (2007), "Properties of Multivariate Control Charts with Estimated Parameters," Sequential Analysis 26(2), 153-169.

[6] Champ, C. W. and Hu, F. (2010), "An Exact Expression for the Univariate Behrens-Fisher Distribution," unpublished manuscript.

[7] Champ, C. W. and Rigdon, S. E. (2007), personal communication.

[8] Davis, A. W. (1977), "A Differential Equation Approach to Linear Combinations of Independent Chi-Squares," Journal of the American Statistical Association 72, 212-214.

[9] Dillies, J. and Lakuriqi, E. (2014), personal communication.

[10] Krutchkoff, R. G. (1988), "One-Way Fixed Effects Analysis of Variance When the Error Variances May Be Unequal," Journal of Statistical Computation and Simulation 30, 259-271.

[11] Press, S. J. (1966), "Linear Combinations of Non-central Chi-squared Variates," Annals of Mathematical Statistics 37(2), 480-487.

[12] Rice, W. R. and Gains, S. D. (1989), "One-way Analysis of Variance with Unequal Variances," Proceedings of the National Academy of Sciences 86, 8183-8184.

[13] Scheffé, H. (1959), The Analysis of Variance, Wiley, New York (pp. 351-358).

[14] Tanizaki, H. (2004), Computational Methods in Statistics and Econometrics, Marcel Dekker Inc., New York (pp. 118-119).

[15] Tsui, K. and Weerahandi, S. (1989), "Generalized p-Values in Significance Testing of Hypotheses in the Presence of Nuisance Parameters," Journal of the American Statistical Association 84, 602-607.

[16] Wang, Y. Y. (1971), "Probabilities of the Type I Errors of the Welch Tests for the Behrens-Fisher Problem," Journal of the American Statistical Association 66, 605-608.

[17] Weerahandi, S. (1995a), "ANOVA under Unequal Error Variances," Biometrics 51, 589-599.

[18] Welch, B. L. (1938), "The Significance of the Difference Between Two Means When the Population Variances Are Unequal," Biometrika 29, 350-362.

[19] Welch, B. L. (1951), "On the Comparison of Several Mean Values: An Alternative Approach," Biometrika 38, 330-336.

[20] Yiğit, E. and Gokpinar, F. (2010), "A Simulation Study on Tests for One-Way ANOVA Under the Unequal Variance Assumption," Communications Series A1 Mathematics & Statistics 59(2), 15-34.