Inference for Regression Equations In a beginning course in statistics, most often, the computational formulas for inference in regression settings are simply given to the students. Some attempt is made to illustrate why the components of the formula make sense, but the derivations are “beyond the scope of the course”. For advanced students, however, these formulas become applications of the expected value theorems studied earlier in the year. To derive the regression inference equations, students must remember that Var kX k 2 Var ( X ) , and, when X and Y are independent, Var X Y Var ( X ) Var Y and Var XY Var ( X )Var Y . Finally, Var X n Var X . n In addition, the Modeling Assumptions for Regression are: 1. There is a normally distributed subpopulation of responses for each value of the explanatory variable. These subpopulations all have a common variance. So, y | x ~ N ( y| x , e ) . 2. The means of the subpopulations fall on a straight-line function of the explanatory variable. This means that y| x x and that ŷ a bx estimates the mean response for a given value of the explanatory variable. Another way to describe this is to say that Y X with ~ N 0, e . Graphical Representation of regression assumptions 3. The selection of an observation from any of the subpopulations is independent of the selection of any other observation. The values of the explanatory variable are assumed to be fixed. This fixed (and known) value for the independent variable is essential for developing the formulae. The key to understanding the various standard errors for regression is to realize that the variation of interest comes from the distribution of y around y| x . This is ~ N 0, e . From our initial work on regression, we saw that ŷ a bx and ŷ y b x x . Now, if we let X i xi x XY and Y y y , then b . X i i i i i 2 i i equations originate with this computational formula for b. All of the regression To see that this is true, consider ŷ y b x x . In this form, we have a one variable problem. Since we know all the individual values of x and y, and, consequently, the means x and y , we can use first semester calculus to solve for b. Define 2 n n 2 S yi yˆi yi y b xi x . Now, let X i xi x and Yi yi y , so i 1 i 1 2 n S Yi bX i . Find the value of b that minimizes S. i 1 n dS dS 0 , then 2 Yi bX i X i . If db db i 1 i X iYi n n 2 b X i X iYi and b . i 1 i 1 X i2 X Y bX 0 . n i i i 1 2 i Solving for b, we find i The Standard Error for the Slope To compute a confidence interval for , we need to determine the variance of b, using the expected value theorems. XY XY . Since b , we compute Var b Var X X i i i i i i 2 i i assumed to be fixed, 2 i X Since the values of X are i 2 in the denominator is a constant. So, X iYi Var i 2 Xi i 1 Var X iYi 2 i X2 i i and Var X iYi Var X 1Y1 X 2Y2 X nYn . The X’s are constants and we are i interested in the variation of Yi for the given X i , which is the common variance e2 . So, Var X1Y1 X 2Y2 X nYn X 12Var Y1 X 22Var Y2 X n2Var Yn e2 X i2 . i X iYi Putting it all together, we find Var b Var i 2 Xi i is often written as Var b e2 x x 2 X i2 2 2 i e . This e 2 X i2 2 i Xi i . i i So, the standard error for the slope in regression can be estimated by se se or sbˆ . sbˆ 2 n 1 s x xi x i The Standard Error for ŷ , the Predicted Mean Confidence intervals for a predicted mean can now be obtained. The standard error can be determined by computing Var yˆ . We know that ŷ y b x x , so, as before, using the expected value theorems, we find Var yˆ Var y b x x Var y x x Var b , 2 with Var b e2 x x i 2 yi and Var y Var i n 2 2 1 Var y n e e . i n2 n2 n i So, Var yˆ Var y b x x e2 n e2 x x 2 x x 2 . i The standard error for predicting a mean response for a given value of x can be estimated x x . 1 n xi x 2 2 by s yˆ se The Standard Error for the Intercept The variance of the intercept a can be estimated using the previous formula for the standard error for ŷ . Since ŷ a bx , the variance of a is the variance of ŷ when x 1 . n xi x 2 2 x 0 . So, sa se The Standard Error for a Predicted Value Finally, to predict a y-value, y p , for a given x, we need to consider two independent errors. We know that y is normally distributed around y| x so y | x ~ N ( y| x , e ) . Given y| x , we can estimate our error in predicting y. But, as we have just seen, there is also variation in our predictions of y| x . First, we predict ŷ taking into account its own variation and then we use that prediction in predicting y. So 2 2 e2 x x 2 1 x x 2 2 e e e 1 . Var y p Var yˆ Var n x x 2 n x x 2 i i The standard error for this prediction can be estimated with 2 1 x x . s y p s 1 n x x 2 i 2 e Now we have all the equations found in the texts. Standard error for Slope: sbˆ se n 1 sx s yˆ se x x 1 n xi x 2 Standard error for the Intercept: sa se x 1 n xi x 2 Standard error for a Predicted Value: 2 1 x x s y p s 1 n x x 2 i Standard error for Predicted Mean: 2 2 2 e Reference: Kennedy, Joh B. and Adam M. Neville, Basic Statistical Methods for Engineers and Scientists, 3rd, Harper and Row, 1986.
© Copyright 2026 Paperzz