Supplemental Material for Chapter 3

S3-1. Random Samples

To properly apply many statistical techniques, the sample drawn from the population of interest must be a random sample. To define a random sample, let x be a random variable that represents the result of selecting one observation from the population of interest, and let f(x) be the probability distribution of x. Now suppose that n observations (a sample) are obtained independently from the population under unchanging conditions; that is, we do not let the outcome of one observation influence the outcome of another. Let x_i be the random variable that represents the observation obtained on the ith trial. Then the observations x_1, x_2, \ldots, x_n are a random sample. In a random sample the marginal probability distributions f(x_1), f(x_2), \ldots, f(x_n) are all identical, the observations in the sample are independent, and by definition the joint probability distribution of the random sample is

f(x_1, x_2, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)

S3-2. Expected Value and Variance Operators

Readers should have prior exposure to mathematical expectation from a basic statistics course. Here some of the basic properties of expectation are reviewed. The expected value of a random variable x is denoted by E(x) and is given by

E(x) = \sum_{\text{all } x_i} x_i p(x_i), if x is a discrete random variable

E(x) = \int_{-\infty}^{\infty} x f(x)\,dx, if x is a continuous random variable

The expectation of a random variable is very useful in that it provides a straightforward characterization of the distribution, and it has a simple practical interpretation as the center of mass, centroid, or mean of the distribution.

Now suppose that y is a function of the random variable x, say y = h(x). Note that y is also a random variable. The expectation of h(x) is defined as

E[h(x)] = \sum_{\text{all } x_i} h(x_i) p(x_i), if x is a discrete random variable

E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx, if x is a continuous random variable

An interesting result, sometimes called the "theorem of the unconscious statistician," states that if x is a continuous random variable with probability density function f(x) and y = h(x) is a function of x having probability density function g(y), then the expectation of y can be found either by using the definition of expectation with g(y) or in terms of its definition as the expectation of a function of x with respect to the probability density function of x. That is, we may write either

E(y) = \int_{-\infty}^{\infty} y g(y)\,dy   or   E(y) = E[h(x)] = \int_{-\infty}^{\infty} h(x) f(x)\,dx

The name for this theorem comes from the fact that we often apply it without consciously thinking about whether it is true in our particular case.

Useful Properties of Expectation I: Let x be a random variable with mean \mu, and let c be a constant. Then

1. E(c) = c
2. E(x) = \mu
3. E(cx) = cE(x) = c\mu
4. E[c h(x)] = c E[h(x)]
5. If c_1 and c_2 are constants and h_1 and h_2 are functions, then E[c_1 h_1(x) + c_2 h_2(x)] = c_1 E[h_1(x)] + c_2 E[h_2(x)]

Because of property 5, expectation is called a linear (or distributive) operator.

Now consider the function h(x) = (x - c)^2, where c is a constant, and suppose that E[(x - c)^2] exists. To find the value of c for which E[(x - c)^2] is a minimum, write

E[(x - c)^2] = E[x^2 - 2xc + c^2] = E(x^2) - 2cE(x) + c^2

The derivative of E[(x - c)^2] with respect to c is -2E(x) + 2c, and this derivative is zero when c = E(x). Therefore, E[(x - c)^2] is a minimum when c = E(x).

The variance of the random variable x is defined as

V(x) = E[(x - \mu)^2] = \sigma^2

and we usually call V(x) = E[(x - \mu)^2] the variance operator.
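As a concrete illustration of the expectation and variance operators just defined, the short Python sketch below (not part of the original supplement; the discrete distribution and the functions used are purely hypothetical choices for demonstration) evaluates E(x) and V(x), checks the linearity property 5 numerically, and confirms that E[(x - c)^2] is smallest at c = E(x).

# Illustrative sketch (not from the original text): a small hypothetical
# discrete distribution is assumed purely for demonstration.
import numpy as np

xs = np.array([0.0, 1.0, 2.0, 3.0])     # support of a hypothetical discrete rv
ps = np.array([0.1, 0.4, 0.3, 0.2])     # its probabilities (they sum to 1)

def expect(h):
    # E[h(x)] = sum over all x_i of h(x_i) p(x_i)
    return np.sum(h(xs) * ps)

mu = expect(lambda x: x)                 # E(x)
var = expect(lambda x: (x - mu) ** 2)    # V(x) = E[(x - mu)^2]

# Linearity (property 5): E[c1*h1(x) + c2*h2(x)] = c1*E[h1(x)] + c2*E[h2(x)]
lhs = expect(lambda x: 2.0 * x + 3.0 * x ** 2)
rhs = 2.0 * expect(lambda x: x) + 3.0 * expect(lambda x: x ** 2)
print(mu, var, np.isclose(lhs, rhs))

# E[(x - c)^2] is smallest at c = E(x): scan a grid of candidate c values
cs = np.linspace(0, 3, 301)
msq = [expect(lambda x, c=c: (x - c) ** 2) for c in cs]
print(cs[np.argmin(msq)])                # approximately equal to mu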
It is straightforward to show that if c is a constant, then

V(cx) = c^2 \sigma^2

The variance is analogous to the moment of inertia in mechanics.

Useful Properties of Expectation II: Let x_1 and x_2 be random variables with means \mu_1 and \mu_2 and variances \sigma_1^2 and \sigma_2^2, respectively, and let c_1 and c_2 be constants. Then

1. E(x_1 + x_2) = \mu_1 + \mu_2
2. It is possible to show that V(x_1 + x_2) = \sigma_1^2 + \sigma_2^2 + 2\,\mathrm{Cov}(x_1, x_2), where \mathrm{Cov}(x_1, x_2) = E[(x_1 - \mu_1)(x_2 - \mu_2)] is the covariance of the random variables x_1 and x_2. The covariance is a measure of the linear association between x_1 and x_2. More specifically, we may show that if x_1 and x_2 are independent, then \mathrm{Cov}(x_1, x_2) = 0.
3. V(x_1 - x_2) = \sigma_1^2 + \sigma_2^2 - 2\,\mathrm{Cov}(x_1, x_2)
4. If the random variables x_1 and x_2 are independent, V(x_1 \pm x_2) = \sigma_1^2 + \sigma_2^2
5. If the random variables x_1 and x_2 are independent, E(x_1 x_2) = E(x_1) E(x_2) = \mu_1 \mu_2
6. Regardless of whether x_1 and x_2 are independent, in general E(x_1 / x_2) \neq E(x_1) / E(x_2)
7. For the single random variable x, V(x + x) = 4\sigma^2, because \mathrm{Cov}(x, x) = \sigma^2

Moments

Although we do not make much use of the notion of the moments of a random variable in the book, for completeness we give the definition. Let the function of the random variable x be h(x) = x^k, where k is a positive integer. Then the expectation of h(x) = x^k is called the kth moment about the origin of the random variable x and is given by

E(x^k) = \sum_{\text{all } x_i} x_i^k p(x_i), if x is a discrete random variable

E(x^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx, if x is a continuous random variable

Note that the first origin moment is just the mean \mu of the random variable x. The second origin moment is

E(x^2) = \mu^2 + \sigma^2

Moments about the mean are defined as

E[(x - \mu)^k] = \sum_{\text{all } x_i} (x_i - \mu)^k p(x_i), if x is a discrete random variable

E[(x - \mu)^k] = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx, if x is a continuous random variable

The second moment about the mean is the variance \sigma^2 of the random variable x.

S3-3. Proof That E(\bar{x}) = \mu and E(s^2) = \sigma^2

It is easy to show that the sample average \bar{x} and the sample variance s^2 are unbiased estimators of the corresponding population parameters \mu and \sigma^2, respectively. Suppose that the random variable x has mean \mu and variance \sigma^2, and that x_1, x_2, \ldots, x_n is a random sample of size n from the population. Then

E(\bar{x}) = E\left( \frac{1}{n} \sum_{i=1}^{n} x_i \right) = \frac{1}{n} \sum_{i=1}^{n} E(x_i) = \frac{1}{n} (n\mu) = \mu

because the expected value of each observation in the sample is E(x_i) = \mu. Now consider

E(s^2) = E\left[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \right] = \frac{1}{n - 1} E\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right]

It is convenient to write \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2, and so

E\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \right] = \sum_{i=1}^{n} E(x_i^2) - E(n\bar{x}^2)

Now E(x_i^2) = \mu^2 + \sigma^2 and E(\bar{x}^2) = \mu^2 + \sigma^2/n. Therefore

E(s^2) = \frac{1}{n - 1} \left[ \sum_{i=1}^{n} (\mu^2 + \sigma^2) - n(\mu^2 + \sigma^2/n) \right] = \frac{1}{n - 1} \left( n\mu^2 + n\sigma^2 - n\mu^2 - \sigma^2 \right) = \frac{(n - 1)\sigma^2}{n - 1} = \sigma^2

Note that:

a. These results do not depend on the form of the distribution of the random variable x. Many people think that an assumption of normality is required, but this is unnecessary.

b. Even though E(s^2) = \sigma^2, the sample standard deviation s is not an unbiased estimator of the population standard deviation \sigma. This is discussed more fully in Section S3-5.
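A short simulation makes both notes concrete. The following Python sketch is not part of the original supplement; the normal distribution, the parameter values, and the number of replications are arbitrary choices made only for illustration. Averaged over many samples, \bar{x} and s^2 land close to \mu and \sigma^2, while the average of s falls noticeably below \sigma (the topic of Section S3-5).

# Simulation sketch (not from the original text): arbitrary illustrative
# parameter choices; swapping in a non-normal distribution illustrates note (a).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 10.0, 4.0, 5, 200_000

samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)    # divisor n - 1, the sample variance
s = np.sqrt(s2)

print(xbar.mean())   # close to mu      -> E(xbar) = mu
print(s2.mean())     # close to sigma2  -> E(s^2) = sigma^2
print(s.mean())      # noticeably below sqrt(sigma2) -> s is biased (note b)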
S3-4. More About Parameter Estimation

Throughout the book estimators of various population or process parameters are given without much discussion of how these estimators are generated. Often they are simply "logical" or intuitive estimators, such as using the sample average \bar{x} as an estimator of the population mean \mu. There are formal methods for developing point estimators of population parameters; these methods are typically discussed in detail in courses in mathematical statistics. We now give a brief overview of some of them.

The Method of Maximum Likelihood

One of the best methods for obtaining a point estimator of a population parameter is the method of maximum likelihood. Suppose that x is a random variable with probability distribution f(x; \theta), where \theta is a single unknown parameter. Let x_1, x_2, \ldots, x_n be the observations in a random sample of size n. Then the likelihood function of the sample is

L(\theta) = f(x_1; \theta) f(x_2; \theta) \cdots f(x_n; \theta)

The maximum likelihood estimator of \theta is the value of \theta that maximizes the likelihood function L(\theta).

Example 1. The Exponential Distribution

To illustrate the maximum likelihood estimation procedure, let x be exponentially distributed with parameter \lambda. The likelihood function of a random sample of size n, say x_1, x_2, \ldots, x_n, is

L(\lambda) = \prod_{i=1}^{n} \lambda e^{-\lambda x_i} = \lambda^n e^{-\lambda \sum_{i=1}^{n} x_i}

Now it turns out that, in general, if the maximum likelihood estimator maximizes L(\lambda), it also maximizes the log likelihood, \ln L(\lambda). For the exponential distribution, the log likelihood is

\ln L(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i

Now

\frac{d \ln L(\lambda)}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i

Equating this derivative to zero and solving for the estimator of \lambda, we obtain

\hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}

Thus the maximum likelihood estimator (or MLE) of \lambda is the reciprocal of the sample average.

Maximum likelihood estimation can also be used in situations where there are several unknown parameters, say \theta_1, \theta_2, \ldots, \theta_p, to be estimated. The maximum likelihood estimators are found by equating the p first partial derivatives \partial L(\theta_1, \theta_2, \ldots, \theta_p)/\partial \theta_i, i = 1, 2, \ldots, p, of the likelihood (or the log likelihood) to zero and solving the resulting system of equations.

Example 2. The Normal Distribution

Let x be normally distributed with both parameters \mu and \sigma^2 unknown. The likelihood function of a random sample of size n is

L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2}

The log-likelihood function is

\ln L(\mu, \sigma^2) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2

Now

\frac{\partial \ln L(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0

\frac{\partial \ln L(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0

The solution to these equations yields the MLEs

\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x}, \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

Generally, we like the method of maximum likelihood because when n is large, (1) it results in estimators that are approximately unbiased, (2) the variance of an MLE is as small as or nearly as small as the variance that could be obtained with any other estimation technique, and (3) MLEs are approximately normally distributed. Furthermore, the MLE has an invariance property; that is, if \hat{\theta} is the MLE of \theta, then the MLE of a function of \theta, say h(\theta), is the same function h(\hat{\theta}) of the MLE. There are also some other "nice" statistical properties that MLEs enjoy; see a book on mathematical statistics, such as Hogg and Craig (1978) or Bain and Engelhardt (1987).

The unbiasedness property of the MLE is a "large-sample" or asymptotic property. To illustrate, consider the MLE for \sigma^2 in the normal distribution of Example 2 above. We can easily show that

E(\hat{\sigma}^2) = \frac{n - 1}{n} \sigma^2

so the bias in estimating \sigma^2 is

E(\hat{\sigma}^2) - \sigma^2 = \frac{n - 1}{n} \sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}

Notice that this bias goes to zero as the sample size n \to \infty. Therefore, the MLE is an asymptotically unbiased estimator.
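To connect the algebra of Example 1 with the general recipe of maximizing L(\theta), the sketch below (not part of the original supplement; the "true" \lambda, the sample size, and the use of SciPy's general-purpose optimizer are illustrative assumptions) maximizes the exponential log likelihood numerically and confirms that the result matches the closed-form MLE 1/\bar{x}.

# Sketch (not from the original text): numerical maximization of the exponential
# log likelihood, compared with the closed-form MLE 1/xbar.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(7)
lam_true, n = 0.5, 50                                 # arbitrary illustrative values
x = rng.exponential(scale=1.0 / lam_true, size=n)     # NumPy's scale is 1/lambda

def neg_log_lik(lam):
    # -ln L(lambda) = -(n ln lambda - lambda * sum(x_i))
    return -(n * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 10.0), method="bounded")
print(res.x, 1.0 / x.mean())    # the two estimates agree

# Invariance property: the MLE of the mean 1/lambda is h(lam_hat) = 1/lam_hat = xbar
print(1.0 / res.x, x.mean())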
The Method of Moments

Estimation by the method of moments involves equating the origin moments of the probability distribution (which are functions of the unknown parameters) to the corresponding sample moments and solving for the unknown parameters. We can define the first p sample moments as

M_k = \frac{1}{n} \sum_{i=1}^{n} x_i^k, \quad k = 1, 2, \ldots, p

and the first p moments about the origin of the random variable x are just

\mu_k' = E(x^k), \quad k = 1, 2, \ldots, p

Example 3. The Normal Distribution

For the normal distribution the first two origin moments are

\mu_1' = \mu, \quad \mu_2' = \mu^2 + \sigma^2

and the first two sample moments are

M_1 = \bar{x}, \quad M_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2

Equating the sample and origin moments results in

\mu = \bar{x}, \quad \mu^2 + \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2

The solution gives the moment estimators of \mu and \sigma^2:

\hat{\mu} = \bar{x}, \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2

The method of moments often yields estimators that are reasonably good. In the example above, for instance, the moment estimators are identical to the MLEs. In general, however, moment estimators are not as good as MLEs because they do not have equally attractive statistical properties; for example, moment estimators usually have larger variances than MLEs.

Least Squares Estimation

The method of least squares is one of the oldest and most widely used methods of parameter estimation. Unlike the method of maximum likelihood and the method of moments, least squares can be employed when the distribution of the random variable is unknown. To illustrate, suppose that the random variable x can be described by the simple location model

x_i = \mu + \epsilon_i, \quad i = 1, 2, \ldots, n

where the parameter \mu is unknown and the \epsilon_i are random errors. We do not know the distribution of the errors, but we can assume that they have mean zero and constant variance. The least squares estimator of \mu is chosen so that the sum of squares of the model errors \epsilon_i is minimized. The least squares function for a sample of n observations x_1, x_2, \ldots, x_n is

L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (x_i - \mu)^2

Differentiating L with respect to \mu and equating the derivative to zero results in the least squares estimator of \mu:

\hat{\mu} = \bar{x}

In general, the least squares function will contain p unknown parameters, and L will be minimized by solving the equations that result when the first partial derivatives of L with respect to the unknown parameters are equated to zero. These equations are called the least squares normal equations.

The method of least squares dates from work by Carl Friedrich Gauss in the early 1800s. It has a very well-developed and indeed quite elegant theory. For a discussion of the use of least squares in estimating the parameters in regression models, with many illustrative examples, see Montgomery, Peck and Vining (2001); for a very readable and concise presentation of the theory, see Myers and Milton (1991).
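To make the least squares idea concrete, the sketch below (not part of the original supplement; the data values are arbitrary illustrative numbers) minimizes the least squares function for the location model directly and also solves the single normal equation as a linear algebra problem, both of which return the sample average.

# Sketch (not from the original text): least squares for the location model
# x_i = mu + eps_i, using arbitrary illustrative data.
import numpy as np

x = np.array([9.8, 10.4, 10.1, 9.5, 10.7, 10.2])

# Minimize L(mu) = sum (x_i - mu)^2 directly on a grid of candidate values ...
grid = np.linspace(9.0, 11.0, 2001)
L = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
print(grid[np.argmin(L)])          # approximately xbar

# ... or solve the normal equation via linear algebra: the model matrix is a
# column of ones, so the normal equation reduces to n * mu = sum(x_i).
X = np.ones((x.size, 1))
mu_hat, *_ = np.linalg.lstsq(X, x, rcond=None)
print(mu_hat[0], x.mean())         # both equal the sample average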
S3-5. Proof That E(s) \neq \sigma

In Section S3-3 of this supplemental material we showed that the sample variance is an unbiased estimator of the population variance; that is, E(s^2) = \sigma^2, and this result does not depend on the form of the distribution. However, the sample standard deviation is not an unbiased estimator of the population standard deviation. This is easy to demonstrate for the case where the random variable x follows a normal distribution.

Let x have a normal distribution with mean \mu and variance \sigma^2, and let x_1, x_2, \ldots, x_n be a random sample of size n from this population. The distribution of (n - 1)s^2/\sigma^2 is chi-square with n - 1 degrees of freedom, denoted \chi^2_{n-1}. Therefore the distribution of s^2 is \sigma^2/(n - 1) times a \chi^2_{n-1} random variable. So when sampling from a normal distribution, the expected value of s^2 is

E(s^2) = E\left( \frac{\sigma^2}{n - 1} \chi^2_{n-1} \right) = \frac{\sigma^2}{n - 1} E(\chi^2_{n-1}) = \frac{\sigma^2}{n - 1} (n - 1) = \sigma^2

because the mean of a chi-square random variable with n - 1 degrees of freedom is n - 1. Now it follows that the distribution of \sqrt{n - 1}\, s/\sigma is a chi distribution with n - 1 degrees of freedom, denoted \chi_{n-1}. The expected value of s can therefore be written as

E(s) = E\left( \frac{\sigma}{\sqrt{n - 1}} \chi_{n-1} \right) = \frac{\sigma}{\sqrt{n - 1}} E(\chi_{n-1})

The mean of the chi distribution with n - 1 degrees of freedom is

E(\chi_{n-1}) = \frac{\sqrt{2}\,\Gamma(n/2)}{\Gamma[(n - 1)/2]}

where the gamma function is \Gamma(r) = \int_0^{\infty} y^{r-1} e^{-y}\,dy. Then

E(s) = \sigma \sqrt{\frac{2}{n - 1}} \frac{\Gamma(n/2)}{\Gamma[(n - 1)/2]} = c_4 \sigma

The constant c_4 is given in Appendix Table VI. Although s is a biased estimator of \sigma, the bias becomes small fairly quickly as the sample size n increases. From Appendix Table VI, note that c_4 = 0.9400 for a sample of n = 5, c_4 = 0.9727 for a sample of n = 10, and c_4 = 0.9896, very nearly unity, for a sample of n = 25.
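The constant c_4 is easy to evaluate directly from the formula above. The short sketch below is not part of the original supplement; it simply codes the expression E(s) = c_4 \sigma derived here and reproduces the tabulated values quoted in the text.

# Sketch (not from the original text): evaluating c4 = sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2)
import math

def c4(n):
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

for n in (5, 10, 25):
    print(n, f"{c4(n):.4f}")    # 0.9400, 0.9727, 0.9896, matching Appendix Table VI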
S3-6. More About Checking Assumptions in the t-Test

The two-sample t-test can be presented from the viewpoint of a simple linear regression model. This is a very instructive way to think about the t-test, as it fits in nicely with the general notion of a factorial experiment with factors at two levels. This type of experiment is very important in process development and improvement and is discussed extensively in Chapter 12. It also leads to another way to check assumptions in the t-test, one that is equivalent to the normal probability plotting of the original data discussed in Chapter 3. We will use the data on the two catalysts in Example 3-9 to illustrate.

In the two-sample t-test scenario, we have a factor x with two levels, which we can arbitrarily call "low" and "high". We will use x = -1 to denote the low level of this factor (Catalyst 1) and x = +1 to denote the high level of this factor (Catalyst 2). A scatter plot (from Minitab) of the yield data in Table 3-2 of the textbook shows the two samples plotted at the two levels of the catalyst factor.

[Figure: Scatterplot of Yield vs Catalyst]

We will fit a simple linear regression model to these data, say

y_{ij} = \beta_0 + \beta_1 x_{ij} + \epsilon_{ij}

where \beta_0 and \beta_1 are the intercept and slope, respectively, of the regression line, and the regressor or predictor variable takes the values x_{1j} = -1 and x_{2j} = +1. The method of least squares can be used to estimate the slope and intercept in this model. Assuming that we have equal sample sizes n for each factor level, the least squares normal equations are

2n\hat{\beta}_0 = \sum_{i=1}^{2} \sum_{j=1}^{n} y_{ij}

2n\hat{\beta}_1 = \sum_{j=1}^{n} y_{2j} - \sum_{j=1}^{n} y_{1j}

The solution to these equations is

\hat{\beta}_0 = \bar{y}, \quad \hat{\beta}_1 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1)

Note that the least squares estimator of the intercept is the average of all the observations from both samples, while the estimator of the slope is one-half of the difference between the sample averages at the "high" and "low" levels of the factor x. Below is the output from the linear regression procedure in Minitab for the catalyst data.

Regression Analysis: Yield versus Catalyst

The regression equation is
Yield = 92.5 + 0.239 Catalyst

Predictor      Coef   SE Coef        T       P
Constant    92.4938    0.6752   136.98   0.000
Catalyst     0.2387    0.6752     0.35   0.729

S = 2.70086   R-Sq = 0.9%   R-Sq(adj) = 0.0%

Analysis of Variance

Source            DF        SS      MS     F      P
Regression         1     0.912   0.912  0.13  0.729
Residual Error    14   102.125   7.295
Total             15   103.037

Notice that the estimate of the slope (given in the column labeled "Coef" and the row labeled "Catalyst" above) is 0.2387 = \frac{1}{2}(\bar{y}_2 - \bar{y}_1) = \frac{1}{2}(92.7325 - 92.255), and the estimate of the intercept is 92.4938 = \frac{1}{2}(\bar{y}_2 + \bar{y}_1) = \frac{1}{2}(92.7325 + 92.255). Furthermore, notice that the t-statistic associated with the slope is 0.35, exactly the same value (apart from sign, because we subtracted the averages in the reverse order) that we gave in the text.

Now in simple linear regression, the t-test on the slope is actually testing the hypotheses

H_0: \beta_1 = 0
H_1: \beta_1 \neq 0

and this is equivalent to testing H_0: \mu_1 = \mu_2. It is easy to show that the t-test statistic used for testing that the slope equals zero in simple linear regression is identical to the usual two-sample t-test. Recall that to test the above hypotheses in simple linear regression the t-statistic is

t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}}

where S_{xx} = \sum_{i=1}^{2} \sum_{j=1}^{n} (x_{ij} - \bar{x})^2 is the "corrected" sum of squares of the x's. In our specific problem \bar{x} = 0, x_{1j} = -1, and x_{2j} = +1, so S_{xx} = 2n. Therefore, since we have already observed that the estimate of \sigma is just s_p,

t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{\frac{1}{2}(\bar{y}_2 - \bar{y}_1)}{s_p / \sqrt{2n}} = \frac{\bar{y}_2 - \bar{y}_1}{s_p \sqrt{2/n}}

This is the usual two-sample t-test statistic for the case of equal sample sizes.

Most regression software packages will also compute a table or listing of the residuals from the model. The residuals from the Minitab regression model fit obtained above are as follows:

Obs  Catalyst   Yield     Fit  SE Fit  Residual  St Resid
  1     -1.00  91.500  92.255   0.955    -0.755     -0.30
  2     -1.00  94.180  92.255   0.955     1.925      0.76
  3     -1.00  92.180  92.255   0.955    -0.075     -0.03
  4     -1.00  95.390  92.255   0.955     3.135      1.24
  5     -1.00  91.790  92.255   0.955    -0.465     -0.18
  6     -1.00  89.070  92.255   0.955    -3.185     -1.26
  7     -1.00  94.720  92.255   0.955     2.465      0.98
  8     -1.00  89.210  92.255   0.955    -3.045     -1.21
  9      1.00  89.190  92.733   0.955    -3.543     -1.40
 10      1.00  90.950  92.733   0.955    -1.783     -0.71
 11      1.00  90.460  92.733   0.955    -2.273     -0.90
 12      1.00  93.210  92.733   0.955     0.477      0.19
 13      1.00  97.190  92.733   0.955     4.457      1.76
 14      1.00  97.040  92.733   0.955     4.307      1.70
 15      1.00  91.070  92.733   0.955    -1.663     -0.66
 16      1.00  92.750  92.733   0.955     0.017      0.01

The column labeled "Fit" contains the predicted values of yield from the regression model, which turn out to be just the averages of the two samples. The residuals, given in the sixth column of this table, are the differences between the observed values of yield and the corresponding predicted values.

[Figure: Normal probability plot of the residuals (response is Yield)]

A normal probability plot of the residuals shows that they fall approximately along a straight line, indicating that there is no serious problem with the normality assumption in these data. This is equivalent to plotting the original yield data on separate probability plots as we did in Chapter 3.
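The same fit can be reproduced outside Minitab. The sketch below is not part of the original supplement; it uses the 16 yield observations from the residual listing above and standard NumPy/SciPy routines (an assumption, since the textbook's analysis was done in Minitab) to show that the regression slope t-statistic and the pooled two-sample t-statistic coincide.

# Sketch (not from the original text): the two-sample t-test as a regression
# on the -1/+1 catalyst indicator, using the yield data listed above.
import numpy as np
from scipy import stats

y1 = np.array([91.50, 94.18, 92.18, 95.39, 91.79, 89.07, 94.72, 89.21])  # Catalyst 1 (x = -1)
y2 = np.array([89.19, 90.95, 90.46, 93.21, 97.19, 97.04, 91.07, 92.75])  # Catalyst 2 (x = +1)

x = np.r_[-np.ones(8), np.ones(8)]
y = np.r_[y1, y2]

# Least squares fit of y = b0 + b1*x
X = np.column_stack([np.ones(16), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(b0, b1)                            # about 92.4938 and 0.2387, as in the Minitab output

# t-statistic for the slope: b1 / (sigma_hat / sqrt(Sxx)), with Sxx = 2n = 16
resid = y - X @ np.array([b0, b1])
sigma_hat = np.sqrt(resid @ resid / (16 - 2))   # equals the pooled sp
t_slope = b1 / (sigma_hat / np.sqrt(16))
print(sigma_hat, t_slope)                # about 2.7009 and 0.35

# The ordinary pooled two-sample t-test gives the same statistic (up to sign)
t_two_sample, p_value = stats.ttest_ind(y2, y1, equal_var=True)
print(t_two_sample, p_value)             # about 0.35 and 0.729

The estimate of \sigma from the regression residuals equals the pooled standard deviation s_p because the fitted values are just the two sample averages, so the residual sum of squares is the pooled within-sample sum of squares.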
S3-7. Expected Mean Squares in the Single-Factor Analysis of Variance

In Section 3-5.2 we give the expected values of the mean squares for treatments and error in the single-factor analysis of variance (ANOVA). These quantities may be derived by straightforward application of the expectation operator. Consider first the mean square for treatments:

E(MS_{Treatments}) = E\left( \frac{SS_{Treatments}}{a - 1} \right)

For a balanced design (an equal number of observations n in each treatment),

SS_{Treatments} = \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 - \frac{1}{an} y_{..}^2

and the single-factor ANOVA model is

y_{ij} = \mu + \tau_i + \epsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, n

In addition, we will find the following useful:

E(\epsilon_{ij}) = E(\epsilon_{i.}) = E(\epsilon_{..}) = 0, \quad E(\epsilon_{ij}^2) = \sigma^2, \quad E(\epsilon_{i.}^2) = n\sigma^2, \quad E(\epsilon_{..}^2) = an\sigma^2

Now

E(SS_{Treatments}) = E\left( \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 \right) - E\left( \frac{1}{an} y_{..}^2 \right)

Consider the first term on the right-hand side of this expression:

E\left( \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 \right) = \frac{1}{n} \sum_{i=1}^{a} E(n\mu + n\tau_i + \epsilon_{i.})^2

Squaring the expression in parentheses and taking expectation results in

E\left( \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 \right) = \frac{1}{n} \left[ a(n\mu)^2 + n^2 \sum_{i=1}^{a} \tau_i^2 + an\sigma^2 \right] = an\mu^2 + n \sum_{i=1}^{a} \tau_i^2 + a\sigma^2

because the three cross-product terms are all zero. Now consider the second term on the right-hand side of E(SS_{Treatments}):

E\left( \frac{1}{an} y_{..}^2 \right) = \frac{1}{an} E\left( an\mu + n\sum_{i=1}^{a}\tau_i + \epsilon_{..} \right)^2 = \frac{1}{an} E(an\mu + \epsilon_{..})^2

since \sum_{i=1}^{a} \tau_i = 0. Upon squaring the term in parentheses and taking expectation, we obtain

E\left( \frac{1}{an} y_{..}^2 \right) = \frac{1}{an} \left[ (an\mu)^2 + an\sigma^2 \right] = an\mu^2 + \sigma^2

since the expected value of the cross-product is zero. Therefore,

E(SS_{Treatments}) = E\left( \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 \right) - E\left( \frac{1}{an} y_{..}^2 \right) = an\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + a\sigma^2 - (an\mu^2 + \sigma^2) = \sigma^2(a - 1) + n\sum_{i=1}^{a}\tau_i^2

Consequently the expected value of the mean square for treatments is

E(MS_{Treatments}) = E\left( \frac{SS_{Treatments}}{a - 1} \right) = \frac{\sigma^2(a - 1) + n\sum_{i=1}^{a}\tau_i^2}{a - 1} = \sigma^2 + \frac{n\sum_{i=1}^{a}\tau_i^2}{a - 1}

This is the result given in the textbook. For the error mean square, we obtain

E(MS_E) = E\left( \frac{SS_E}{N - a} \right) = \frac{1}{N - a} E\left[ \sum_{i=1}^{a} \sum_{j=1}^{n} (y_{ij} - \bar{y}_{i.})^2 \right] = \frac{1}{N - a} E\left[ \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{1}{n} \sum_{i=1}^{a} y_{i.}^2 \right]

Substituting the model into this last expression, we obtain

E(MS_E) = \frac{1}{N - a} E\left[ \sum_{i=1}^{a} \sum_{j=1}^{n} (\mu + \tau_i + \epsilon_{ij})^2 - \frac{1}{n} \sum_{i=1}^{a} \left( \sum_{j=1}^{n} (\mu + \tau_i + \epsilon_{ij}) \right)^2 \right]

After squaring and taking expectation, this last equation becomes

E(MS_E) = \frac{1}{N - a} \left[ N\mu^2 + n\sum_{i=1}^{a}\tau_i^2 + N\sigma^2 - N\mu^2 - n\sum_{i=1}^{a}\tau_i^2 - a\sigma^2 \right] = \frac{(N - a)\sigma^2}{N - a} = \sigma^2
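These two expected mean squares can also be checked by simulation. The sketch below is not part of the original supplement; the number of treatments, the treatment effects, \sigma^2, and the replication count are arbitrary illustrative choices (with the treatment effects summing to zero, as the fixed-effects model assumes).

# Simulation sketch (not from the original text): averaging MS_Treatments and
# MS_E over many simulated balanced single-factor experiments.
import numpy as np

rng = np.random.default_rng(3)
a, n, sigma2, reps = 4, 6, 2.0, 20_000
tau = np.array([-1.0, 0.5, 0.0, 0.5])     # hypothetical treatment effects, sum to zero

ms_trt = np.empty(reps)
ms_err = np.empty(reps)
for r in range(reps):
    y = 10.0 + tau[:, None] + rng.normal(0.0, np.sqrt(sigma2), size=(a, n))
    ybar_i = y.mean(axis=1)
    ss_trt = n * ((ybar_i - y.mean()) ** 2).sum()
    ss_err = ((y - ybar_i[:, None]) ** 2).sum()
    ms_trt[r] = ss_trt / (a - 1)
    ms_err[r] = ss_err / (a * n - a)

print(ms_err.mean(), sigma2)                                   # ~ sigma^2
print(ms_trt.mean(), sigma2 + n * (tau ** 2).sum() / (a - 1))  # ~ sigma^2 + n*sum(tau_i^2)/(a-1)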