Ch. 14: Linear Least Squares

14.1: Introduction

Fitting a pth-order polynomial requires finding (p+1) coefficients from the data; thus a straight line (p = 1) is determined by its slope and intercept. The Least Squares (LS) method finds the parameters by minimizing the sum of the squared deviations of the fitted values from the actual observations.

Predicting y (response = dependent variable) from x (predictor = independent variable): choose the intercept $\beta_0$ and the slope $\beta_1$ that minimize
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \bigl( y_i - (\beta_0 + \beta_1 x_i) \bigr)^2 .$$
Setting $\partial S / \partial \beta_0 = 0$ gives
$$\hat\beta_0 = \frac{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i y_i}{n \sum_{i=1}^{n} x_i^2 - \bigl( \sum_{i=1}^{n} x_i \bigr)^2} = \bar{y} - \hat\beta_1 \bar{x},$$
and setting $\partial S / \partial \beta_1 = 0$ gives
$$\hat\beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \bigl( \sum_{i=1}^{n} x_i \bigr)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

14.2: Simple Linear Regression (linear in the parameters)

Regression is NOT about fitting a line; it is about modeling E(Y | X = x).

Examples:
- $Y = \beta_0 + \beta_1 \sin(X)$ is linear; $Y = \beta_0 e^{\beta_1 X}$ is NONLINEAR.
- $Y = \beta_0 + \beta_1 X^2$ is linear; $Y = \sin(\beta_0 + \beta_1 X)$ is NONLINEAR.

14.2.1: Properties of the estimated slope & intercept

Model: $y_i = \beta_0 + \beta_1 x_i + e_i$ for $i = 1, \ldots, n$, with the $x_i$ fixed and $e_i \overset{iid}{\sim} N(0, \sigma^2)$.

Theorem A: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.

Variance-covariance of the betas (under the assumptions of Theorem A):
$$\operatorname{Var}(\hat\beta_0) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} x_i^2 - \bigl( \sum_{i=1}^{n} x_i \bigr)^2}, \qquad
\operatorname{Var}(\hat\beta_1) = \frac{n \sigma^2}{n \sum_{i=1}^{n} x_i^2 - \bigl( \sum_{i=1}^{n} x_i \bigr)^2}, \qquad
\operatorname{Cov}(\hat\beta_0, \hat\beta_1) = \frac{-\sigma^2 \sum_{i=1}^{n} x_i}{n \sum_{i=1}^{n} x_i^2 - \bigl( \sum_{i=1}^{n} x_i \bigr)^2}.$$

Inferences about the betas:
1. $\sigma^2$ can be estimated (unbiasedly) by $s^2 = \dfrac{RSS}{n-2}$, where $RSS = \sum_{i=1}^{n} (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$ is the Residual Sum of Squares.
2. Confidence intervals and hypothesis tests are possible via $\dfrac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \sim t_{n-2}$ for $i = 0, 1$ (use t tables), where $s_{\hat\beta_i}$ is the standard deviation estimate of $\hat\beta_i$ obtained by replacing $\sigma^2$ with its unbiased estimate $s^2$.

14.2.2: Assessing the Fit

Recall that the residuals are the differences between the observed and the fitted values:
$$\hat e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) = \text{observed value} - \text{fitted value}.$$
The residuals should be plotted versus the x-values. Ideally the plot looks like a horizontal blur, which indicates that a linear model is reasonable. Caution: the errors are assumed to have zero mean and to be homoscedastic (constant variance), independently of the predictor x; that is, $e_i \sim N(0, \sigma^2)$.

Steps in linear regression:
1. Fit the regression model (mathematics)
   - Pick a method: Least Squares or another
   - Plot the data Y versus g(x)
   - Compute the regression estimates and residuals
   - Check for linearity and outliers (plot the residuals)
   - More diagnostics (beyond the scope of this class)
2. Statistical inference (statistics)
   - Check the error assumptions $e_i \sim N(0, \sigma^2)$
   - Check for normality (if it fails, transform the data)
   - If the form is nonlinear (beyond the scope of this class)

Least Squares Java applet: http://www.math.tamu.edu/FiniteMath/Classes/LeastSquares/LeastSquares.html

14.2.3: Correlation & Regression

A close relation exists between correlation analysis and fitting straight lines by the Least Squares method:
$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}} \quad \text{(correlation coefficient between x and y)},$$
where $s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$, $s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$, and $s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.

Proposition: zero slope $\iff$ zero correlation, because $r = \hat\beta_1 \sqrt{\dfrac{s_{xx}}{s_{yy}}}$, where $\hat\beta_1 = \dfrac{s_{xy}}{s_{xx}}$ is the slope.
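To make the formulas of 14.1 and 14.2.3 concrete, here is a minimal NumPy sketch (not part of the original notes): it computes the slope, intercept, unbiased variance estimate, and correlation for a small hypothetical data set, and checks the proposition $r = \hat\beta_1 \sqrt{s_{xx}/s_{yy}}$. The data values and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical data (for illustration only): x = predictor, y = response
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

n = len(x)
xbar, ybar = x.mean(), y.mean()

# Centered sums used throughout 14.1-14.2.3
sxx = np.sum((x - xbar) ** 2)
syy = np.sum((y - ybar) ** 2)
sxy = np.sum((x - xbar) * (y - ybar))

# Least squares slope and intercept
b1 = sxy / sxx              # beta1_hat = s_xy / s_xx
b0 = ybar - b1 * xbar       # beta0_hat = ybar - beta1_hat * xbar

# Residuals and the unbiased variance estimate s^2 = RSS / (n - 2)
resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)

# Correlation coefficient and the proposition r = beta1_hat * sqrt(s_xx / s_yy)
r = sxy / np.sqrt(sxx * syy)
assert np.isclose(r, b1 * np.sqrt(sxx / syy))

print(b0, b1, s2, r)
```

For real data, x and y would simply be replaced by the observed predictor and response vectors.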
14.3: Matrix Approach to Linear Least Squares

We have already fitted straight lines. What about models with more predictors? Some linear algebra tools help.

Model: $Y = X\beta + e$ (matrix form, compact notation), with $Y$ of size $n \times 1$, $X$ of size $n \times p$, $\beta$ of size $p \times 1$, and $e$ of size $n \times 1$, where
- $Y$ is the vector of observations: $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + e_i$ for $i = 1, \ldots, n$;
- $\beta$ is the vector of unknown parameters $\beta_0, \beta_1, \ldots, \beta_{p-1}$;
- $e$ is the vector of errors (discussed in Section 14.4.2);
- $X$ is the $n \times p$ design matrix
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1} \\ 1 & x_{21} & \cdots & x_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix}.$$

Formulation of the Least Squares problem: the vector of fitted (predicted) values is $\hat Y = X\beta$. Find the vector $\beta$ that minimizes
$$S(\beta) = \lVert Y - \hat Y \rVert^2 = \sum_{i=1}^{n} \bigl( y_i - \beta_0 - \beta_1 x_{i1} - \cdots - \beta_{p-1} x_{i,p-1} \bigr)^2,$$
where for $u = (u_1, u_2, \ldots, u_n)^T$, $\lVert u \rVert^2 = u^T u = u_1^2 + u_2^2 + \cdots + u_n^2 = \sum_{i=1}^{n} u_i^2$.

Differentiating S with respect to each $\beta_k$ and setting these derivatives to zero yields the Normal Equations
$$X^T X \hat\beta = X^T Y \quad (p \text{ equations}),$$
so that
$$\hat\beta = (X^T X)^{-1} X^T Y \quad \text{as long as } X^T X \text{ is nonsingular, i.e. } \operatorname{rank}(X) = p.$$
Alternative methods: the QR method (Exercise #6, pg. 554) and the Cholesky decomposition (Exercise #7, pg. 554).

14.4: Statistical Properties of Least Squares Estimates

14.4.1: Vector-valued Random Variables

$Y$ is a random vector with mean vector
$$E(Y) = \mu = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}$$
and covariance matrix
$$\operatorname{Var}(Y) = \Sigma_{YY} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \ddots & & \vdots \\ \vdots & & \ddots & \vdots \\ \sigma_{n1} & \cdots & \cdots & \sigma_{nn} \end{pmatrix} \quad (n \times n \text{ symmetric matrix}).$$

1. Let $Z = c + AY$ be a random vector of size $m \times 1$, where $c$ ($m \times 1$) is a fixed vector and $A$ ($m \times n$) is a fixed linear transformation of $Y$. Then $E(Z) = c + A E(Y)$ and $\operatorname{Var}(Z) = \Sigma_{ZZ} = A \Sigma_{YY} A^T$.
2. Let $X$ be a random n-vector with mean $\mu$ and covariance $\Sigma$. Then $E(X^T A X) = \operatorname{trace}(A\Sigma) + \mu^T A \mu$, where $A$ is a fixed matrix.

Cross-covariance matrix: let $X$ be a random vector with covariance matrix $\Sigma$. If $Y = AX$ (with $A$ of size $p \times n$) and $Z = BX$ (with $B$ of size $m \times n$), then the cross-covariance matrix of $Y$ and $Z$ is $\Sigma_{YZ} = A \Sigma_{XX} B^T$ (of size $p \times m$), where $A$ and $B$ are fixed matrices.

Application: let $X$ be a random vector with $E(X) = \mu \mathbf{1}$ and $\Sigma_{XX} = \sigma^2 I$. Let $Y = \bar X$ and let $Z$ be the vector with ith element $X_i - \bar X$; that is, $Y = \frac{1}{n}\mathbf{1}^T X$ and $Z = \bigl(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\bigr) X$. Then
$$\Sigma_{ZY} = \sigma^2 \Bigl(I - \frac{1}{n}\mathbf{1}\mathbf{1}^T\Bigr) \frac{1}{n}\mathbf{1} = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix},$$
an $n \times 1$ matrix of zeroes. Thus the mean $\bar X$ is uncorrelated with each of the $X_i - \bar X$ for $i = 1, \ldots, n$.

14.4.2: Mean and Covariance of Least Squares Estimates

Let $e$ be the vector of errors with $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I$; that is, $E(e_i) = 0$, $\operatorname{Var}(e_i) = \sigma^2$, and $\operatorname{Cov}(e_i, e_j) = 0$ for $i \neq j$. The model $Y = X\beta + e$ can then be viewed as:
Measurements = True values (fixed, not random) + Errors (random, uncorrelated, with constant variance).

Theorem A: $E(e) = 0 \implies E(\hat\beta) = \beta$ (the LSE $\hat\beta$ are unbiased).

Theorem B: $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I \implies \Sigma_{\hat\beta\hat\beta} = \sigma^2 (X^T X)^{-1}$ is the covariance matrix of the LSE $\hat\beta$.

14.4.3: Estimation of the Common Variance of the Random Errors

In order to make inferences about $\beta$, one must get an estimate of the parameter $\sigma^2$ (if unknown).

Lemma A: The $n \times n$ projection matrix $P = X (X^T X)^{-1} X^T$ satisfies $P = P^T = P^2$ and $(I - P) = (I - P)^T = (I - P)^2$, where the vector of residuals is $\hat e = Y - \hat Y = Y - X\hat\beta = Y - PY = (I - P)Y$.

Theorem A: Under the assumptions that $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I$,
$$s^2 = \frac{\lVert Y - \hat Y \rVert^2}{n - p} = \frac{RSS}{n - p}$$
is an unbiased estimate of $\sigma^2$.

14.4.4: Residuals & Standardized Residuals

The vector of residuals is $\hat e = Y - \hat Y = Y - PY = (I - P)Y$, and by Lemma A
$$\Sigma_{\hat e \hat e} = (I - P)\, \sigma^2 I \, (I - P)^T = \sigma^2 (I - P).$$

Definition: the ith standardized residual is
$$\frac{Y_i - \hat Y_i}{s \sqrt{1 - p_{ii}}},$$
where $p_{ii}$ is the ith diagonal element of $P$.

Theorem A: $\Sigma_{ee} = \sigma^2 I \implies \Sigma_{\hat e \hat Y} = (I - P)\, \sigma^2 I \, P^T = \sigma^2 (P^T - P P^T) = 0$; that is, the residuals and the fitted values are uncorrelated.
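The matrix formulas of Sections 14.3 through 14.4.4 translate almost line for line into NumPy. The sketch below is illustrative only (simulated data, assumed dimensions, hypothetical variable names): it solves the normal equations, forms the projection matrix $P$, and computes $s^2$, the estimated covariance matrix $s^2 (X^T X)^{-1}$, and the standardized residuals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix: n observations, p = 3 columns (intercept, x1, x2)
n, p = 30, 3
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations X^T X beta_hat = X^T Y, solved without forming an explicit inverse
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Projection (hat) matrix P = X (X^T X)^{-1} X^T, fitted values, residuals
P = X @ np.linalg.solve(X.T @ X, X.T)
Y_hat = P @ Y
resid = Y - Y_hat

# Unbiased variance estimate s^2 = RSS / (n - p)   (Theorem A of 14.4.3)
s2 = resid @ resid / (n - p)

# Estimated covariance matrix of beta_hat: s^2 (X^T X)^{-1}   (Theorem B of 14.4.2, sigma^2 -> s^2)
cov_beta_hat = s2 * np.linalg.inv(X.T @ X)

# Standardized residuals: (Y_i - Yhat_i) / (s * sqrt(1 - p_ii))   (14.4.4)
std_resid = resid / (np.sqrt(s2) * np.sqrt(1.0 - np.diag(P)))

print(beta_hat)
print(s2)
print(std_resid[:5])
```

Forming $P$ explicitly requires $O(n^2)$ storage, so for large n one usually works from a QR or Cholesky factorization of $X$ instead, in the spirit of the exercises cited in 14.3.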
14.4.5: Inference about β

Recall Section 14.4 for the statistical properties of the Least Squares estimates $\hat\beta$, with the additional assumption that the errors are $e_i \overset{iid}{\sim} N(0, \sigma^2)$. Then each component satisfies
$$\hat\beta_i \sim N(\beta_i, \sigma^2 c_{ii}), \quad \text{where } C = (X^T X)^{-1},$$
and
$$\frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \sim t_{n-p}, \quad \text{where } s_{\hat\beta_i} = s \sqrt{c_{ii}}.$$

1. A $100(1-\alpha)\%$ confidence interval for $\beta_i$ is $\hat\beta_i \pm t_{n-p}(\alpha/2)\, s_{\hat\beta_i}$.
2. The statistic $t = \dfrac{\hat\beta_i - \beta_i^{0}}{s_{\hat\beta_i}}$ can be used to test $H_0: \beta_i = \beta_i^{0}$ (a fixed number).

Exercise: test $H_0: \beta_i = 0$ versus $H_A: \beta_i \neq 0$ (under $H_0$, $t \sim t_{n-p}$).

14.5: Multiple Linear Regression

This section generalizes Section 14.2 (Simple Linear Regression) to Multiple Linear Regression through an example of polynomial regression.

$Y$ is the vector of observations $y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + e_i$ ($i = 1, \ldots, n$), and $\beta$ is the vector of unknown parameters $\beta_0, \beta_1, \ldots, \beta_{p-1}$.

Interpretation of the $\beta_k$: $\beta_k$ is the change in the expected value of y when $x_k$ increases by one unit while the other x's are held fixed. The $e_i$ are independent random variables with $E(e_i) = 0$ and $\operatorname{Var}(e_i) = \sigma^2$.

Polynomial regression: let $x_{i2} = x_{i1}^2$, $x_{i3} = x_{i1}^3$, etc.
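As a closing illustration of polynomial regression (14.5) together with the inference of 14.4.5, here is a sketch on simulated, hypothetical data: it builds a design matrix with columns $x$, $x^2$, $x^3$, fits by least squares, and reports, for each coefficient, the t statistic for $H_0: \beta_i = 0$ and a 95% confidence interval based on $t_{n-p}$. The data-generating values are assumptions for the example, not part of the notes.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data with a cubic trend plus noise
x = np.linspace(0.0, 2.0, 40)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 1.5 * x**3 + rng.normal(scale=0.2, size=x.size)

# Polynomial regression (14.5): x_{i2} = x_{i1}^2, x_{i3} = x_{i1}^3
X = np.column_stack([np.ones_like(x), x, x**2, x**3])
n, p = X.shape

# Least squares fit; in exact arithmetic this equals (X^T X)^{-1} X^T y
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)          # s^2 = RSS / (n - p)
C = np.linalg.inv(X.T @ X)            # C = (X^T X)^{-1}
se = np.sqrt(s2 * np.diag(C))         # s_{beta_i_hat} = s * sqrt(c_ii)

# t statistics for H0: beta_i = 0 and 95% confidence intervals with df = n - p
t_stat = beta_hat / se
t_crit = stats.t.ppf(0.975, df=n - p)

for i in range(p):
    lower = beta_hat[i] - t_crit * se[i]
    upper = beta_hat[i] + t_crit * se[i]
    print(f"beta_{i}: {beta_hat[i]: .3f}  t = {t_stat[i]: .2f}  95% CI = ({lower: .3f}, {upper: .3f})")
```

The same code covers any multiple linear regression: only the construction of the design matrix X changes.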