
Ch14: Linear Least Squares
14.1: INTRO:
Fitting a pth-order polynomial will require finding
(p+1) coefficients from the data. Thus, a straight
line (p=1) is obtained through its slope and intercept.
LS (Least Squares) method finds parameters by
minimizing the sum of the squared deviations of
the fitted values from the actual observations.
Predicting y (response=dependent)
from x (predictor=independent):
Formula: choose $\beta_1$ (slope) and $\beta_0$ (intercept) that minimize
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left[ y_i - (\beta_0 + \beta_1 x_i) \right]^2$$
$\beta_0$ solves $\dfrac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = 0$:
$$\hat\beta_0 = \frac{\left(\sum_{i=1}^{n} x_i^2\right)\left(\sum_{i=1}^{n} y_i\right) - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} x_i y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \bar{y} - \hat\beta_1 \bar{x}$$
$\beta_1$ solves $\dfrac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = 0$:
$$\hat\beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
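A minimal NumPy sketch of these closed-form estimates; the x and y arrays below are made-up illustration data, not values from the text:

```python
import numpy as np

# Illustrative data (made up for this sketch).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Closed-form least squares estimates from the formulas above.
beta1_hat = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Equivalent "centered" form of the slope.
beta1_alt = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)

print(beta0_hat, beta1_hat, beta1_alt)  # the two slope formulas agree
```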
14.2: Simple Linear Regression
(linear in the parameters)
Regression is NOT fitting a line to the data points but estimating E(Y|X=x).
Examples:
– $Y = \beta_0 + \beta_1 \sin(X) + \varepsilon$ is linear
– $Y = \beta_0 e^{\beta_1 X} + \varepsilon$ is NONLINEAR
– $Y = \beta_0 + \beta_1 X^2 + \varepsilon$ is linear
– $Y = \sin(\beta_0 + \beta_1 X) + \varepsilon$ is NONLINEAR
14.2.1 Properties of the estimated slope & Intercept
$$y_i = \beta_0 + \beta_1 x_i + e_i \quad \text{for } i = 1, \ldots, n$$
with the $x_i$ fixed and $e_i \overset{iid}{\sim} N(0, \sigma^2)$.
Theorem A: $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$
Variance-Covariance of the beta's:
Under the assumptions of Theorem A:
$$\mathrm{Var}\left(\hat\beta_0\right) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} \; ; \quad \mathrm{Var}\left(\hat\beta_1\right) = \frac{n \sigma^2}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
$$\mathrm{Cov}\left(\hat\beta_0, \hat\beta_1\right) = \frac{-\sigma^2 \sum_{i=1}^{n} x_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
Inferences about the beta’s:
In the previous result:
1. $\sigma^2$ can be estimated (UNBIASED) by
$$s^2 = \frac{RSS}{n-2}, \quad \text{where } RSS = \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2 \text{ is the Residual Sum of Squares.}$$
2. Confidence Intervals & Hypothesis Tests are possible via
$$\frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \sim t_{n-2} \quad \text{for } i = 0, 1 \quad (\text{use } t\text{-tables}),$$
where $s_{\hat\beta_i}$ are the standard deviation estimates of $\hat\beta_i$ $(i = 0, 1)$ obtained by replacing $\sigma^2$ with its unbiased estimate $s^2$ (see the sketch after this list).
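A short NumPy/SciPy sketch of these inferences, continuing with the hypothetical data of the earlier sketch; `scipy.stats.t.ppf` supplies the t-table quantile:

```python
import numpy as np
from scipy import stats

# Hypothetical data (same made-up sample as before).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

Sxx = np.sum((x - x.mean())**2)
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / Sxx
beta0_hat = y.mean() - beta1_hat * x.mean()

# Unbiased estimate of sigma^2: s^2 = RSS / (n - 2).
rss = np.sum((y - beta0_hat - beta1_hat * x)**2)
s2 = rss / (n - 2)

# Standard errors from the variance formulas above.
se_beta1 = np.sqrt(s2 / Sxx)
se_beta0 = np.sqrt(s2 * np.sum(x**2) / (n * Sxx))

# 95% confidence intervals using the t_{n-2} distribution.
tcrit = stats.t.ppf(0.975, df=n - 2)
ci_beta1 = (beta1_hat - tcrit * se_beta1, beta1_hat + tcrit * se_beta1)
ci_beta0 = (beta0_hat - tcrit * se_beta0, beta0_hat + tcrit * se_beta0)
```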
14.2.2: Assessing the Fit
Recall that the residuals are the differences between
the observed and the fitted values:
$$\hat{e}_i = \underbrace{y_i}_{\text{observed value}} - \underbrace{(\hat\beta_0 + \hat\beta_1 x_i)}_{\text{fitted value}} \quad (\text{the } i\text{th residual})$$
Residuals are to be plotted versus the x-values.
Ideal: the plot should look like a horizontal blur, which suggests that a linear model is reasonable.
Caution: the errors are assumed to have zero mean and to be homoscedastic (constant variance) and independent of the predictor x. That is to say: $e_i \sim N(0, \sigma^2)$.
Steps in Linear Regression:
1. Fit the Regression Model (Mathematics)
– Pick a method: Least Squares or else
– Plot the data Y versus g(x)
– Compute regression estimates & residuals
– Check for linearity & outliers (plot residuals; see the sketch after this list)
– More diagnostics (beyond the scope of this class)
2. Statistical Inference (Statistics)
– Check for error assumptions $e_i \sim N(0, \sigma^2)$
– Check for normality (if not, transform the data)
– If nonlinear form, then (beyond the scope of this class)
Least Squares Java applet:
http://www.math.tamu.edu/FiniteMath/Classes/LeastSquares/LeastSquares.html
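A minimal sketch of the residuals-versus-x diagnostic plot, assuming simulated data and matplotlib for the plotting:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data; any (x, y) sample would do for this diagnostic sketch.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=x.size)

# Fit the line and compute residuals.
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
beta0_hat = y.mean() - beta1_hat * x.mean()
residuals = y - (beta0_hat + beta1_hat * x)

# Plot residuals versus x: ideally a horizontal blur around zero.
plt.scatter(x, residuals)
plt.axhline(0, color="gray", linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```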
14.2.3: Correlation & Regression
A close relation exists between Correlation Analysis and
fitting straight lines by the Least Squares method.
$$r = \frac{s_{xy}}{\sqrt{s_{xx}\, s_{yy}}} \quad (\text{correlation coefficient between } x \text{ and } y)$$
where
$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$$
Proposition: Zero Slope $\Leftrightarrow$ Zero Correlation,
because $r = \hat\beta_1 \sqrt{\dfrac{s_{xx}}{s_{yy}}}$, where $\hat\beta_1 = \dfrac{s_{xy}}{s_{xx}}$ (slope).
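A quick numerical check of the proposition, using a made-up sample:

```python
import numpy as np

# Hypothetical sample used only to check the identity numerically.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

sxy = np.sum((x - x.mean()) * (y - y.mean()))
sxx = np.sum((x - x.mean())**2)
syy = np.sum((y - y.mean())**2)

r = sxy / np.sqrt(sxx * syy)          # correlation coefficient
beta1_hat = sxy / sxx                 # least squares slope
print(np.isclose(r, beta1_hat * np.sqrt(sxx / syy)))  # True
```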
14.3: Matrix approach to
Linear Least Squares
We’ve already fitted straight lines (p=1).
What if p > 1? $\Rightarrow$ Investigate some Linear Algebra tools
Model: $\underset{n \times 1}{Y} = \underset{n \times p}{X}\,\underset{p \times 1}{\beta} + \underset{n \times 1}{e}$ (matrix form = compact notation!)
$Y$ is the vector of observations, $y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p-1} x_{i,p-1} + e_i$ $(i = 1, \ldots, n)$
$\beta$ is the vector of unknown parameters $\beta_0, \beta_1, \ldots, \beta_{p-1}$
$e$ is the vector of errors (TBA in Section 14.4.2)
$X$ is the $n \times p$ matrix given by:
$$X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1,p-1} \\ 1 & x_{21} & \cdots & x_{2,p-1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{n,p-1} \end{pmatrix}$$
Formulation of the Least Squares problem:
Vector of fitted or predicted values: $\hat{Y} = X\beta$
Find the vector $\beta$ that minimizes:
$$S(\beta) = \left\| Y - \hat{Y} \right\|^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_{i1} - \ldots - \beta_{p-1} x_{i,p-1} \right)^2$$
where, for $u = (u_1, u_2, \ldots, u_n)^T$, $\|u\|^2 = u_1^2 + u_2^2 + \ldots + u_n^2 = \sum_{i=1}^{n} u_i^2$.
Differentiating $S$ with respect to each and every $\beta_k$ and setting these derivatives to zero gives the Normal Equations $X^T X \hat\beta = X^T Y$ ($p$ equations)
$$\Rightarrow \hat\beta = (X^T X)^{-1} X^T Y \quad \text{as long as } X^T X \text{ is a nonsingular matrix} \Leftrightarrow \mathrm{rank}(X) = p.$$
Alternative methods are the QR method (Exercise #6 pg 554) & the Cholesky Decomposition (Exercise #7 pg 554).
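A minimal NumPy sketch on a made-up quadratic example: solving the normal equations directly, and the same fit via `numpy.linalg.lstsq` (an SVD-based solver, usually preferable numerically to forming $X^T X$ explicitly):

```python
import numpy as np

# Hypothetical design: fit a quadratic (p = 3 columns) to made-up data.
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 30)
y = 1.0 - 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.3, size=x.size)

X = np.column_stack([np.ones_like(x), x, x**2])   # n x p design matrix

# Normal equations: (X^T X) beta_hat = X^T Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit via NumPy's least squares routine.
beta_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```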
14.4: Statistical Properties
of Least Squares Estimates
14.4.1:
Vector-valued Random Variables
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \text{ is a random vector with mean vector } E(Y) = \begin{pmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \mu_Y$$
and covariance matrix
$$\mathrm{Var}(Y) = \Sigma_{YY} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \cdot & \cdots & \cdot \\ \vdots & & \ddots & \vdots \\ \sigma_{n1} & \cdots & \cdots & \sigma_{nn} \end{pmatrix} \quad (n \times n \text{ symmetric matrix})$$
1. Let $\underset{m \times 1}{Z} = \underset{m \times 1}{c} + \underset{m \times n}{A}\,\underset{n \times 1}{Y}$ be a random vector, where $c$ is a fixed vector and $A$ is a fixed linear transformation of $Y$.
Then $E(Z) = c + A\,E(Y)$ and $\mathrm{Var}(Z) = A\, \Sigma_{YY} A^T = \Sigma_{ZZ}$.
2. Let $X$ be a random $n$-vector with mean $\mu$ and covariance $\Sigma$.
Then $E(X^T A X) = \mathrm{trace}(A\Sigma) + \mu^T A \mu$, where $A$ is a fixed matrix.
Cross-covariance matrix:
Let $X$ be a random vector with covariance matrix $\Sigma$.
If $\underset{p \times 1}{Y} = \underset{p \times n}{A} X$ and $\underset{m \times 1}{Z} = \underset{m \times n}{B} X$, then the cross-covariance matrix of $Y$ and $Z$ is
$$\underset{p \times m}{\Sigma_{YZ}} = A\, \Sigma_{XX} B^T,$$
where $A$ and $B$ are fixed matrices.
Application: let $X$ be a random vector with $E(X) = \mu \mathbf{1}$ and $\Sigma_{XX} = \sigma^2 I$.
Let $Y = \bar{X}$ and $Z$ = the vector with $i$th element $X_i - \bar{X}$.
That is, $Y = \frac{1}{n} \mathbf{1}^T X$ and $Z = \left( I - \frac{1}{n} \mathbf{1}\mathbf{1}^T \right) X$.
Then
$$\Sigma_{ZY} = \left( I - \frac{1}{n} \mathbf{1}\mathbf{1}^T \right) \sigma^2 I \left( \frac{1}{n} \mathbf{1} \right) = \frac{\sigma^2}{n} \left( \mathbf{1} - \mathbf{1} \right) = \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{pmatrix} \quad \text{is an } n \times 1 \text{ matrix of zeroes.}$$
Thus, the mean $\bar{X}$ is uncorrelated with each of $X_i - \bar{X}$ for $i = 1, \ldots, n$.
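A small numerical check of this application; the values of n and $\sigma^2$ below are arbitrary choices for illustration:

```python
import numpy as np

n, sigma2 = 5, 2.0
one = np.ones((n, 1))

A = one.T / n                     # Y = (1/n) 1^T X  (the sample mean)
B = np.eye(n) - one @ one.T / n   # Z = (I - (1/n) 1 1^T) X  (deviations)
Sigma_XX = sigma2 * np.eye(n)

Sigma_ZY = B @ Sigma_XX @ A.T     # cross-covariance of Z and Y
print(np.allclose(Sigma_ZY, 0))   # True: X-bar uncorrelated with deviations
```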
14.4.2: Mean and Covariance
of Least Squares Estimates
Let $e$ = vector of errors with $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I$
$\Rightarrow E(e_i) = 0$, $\mathrm{Var}(e_i) = \sigma^2$ and $\mathrm{Cov}(e_i, e_j) = 0$ for $i \neq j$.
The model $\underset{n \times 1}{Y} = \underset{n \times p}{X}\,\underset{p \times 1}{\beta} + \underset{n \times 1}{e}$ can be viewed as:
Measurements = True Values (fixed, not random) + Errors (random, uncorrelated, with constant variance)
Theorem A: $E(e) = 0 \Rightarrow E(\hat\beta) = \beta$ (the LSE $\hat\beta$ are unbiased)
Theorem B: $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I$
$\Rightarrow \Sigma_{\hat\beta\hat\beta} = \sigma^2 (X^T X)^{-1}$ is the covariance matrix of the LSE $\hat\beta$.
14.4.3: Estimation of the common
variance for the random errors
In order to make inferences about $\beta$, one must get an estimate of the parameter $\sigma^2$ (if unknown).
Lemma A: The $n \times n$ "projection" matrix $P = X (X^T X)^{-1} X^T$ satisfies:
$$P = P^T = P^2 \quad \text{and} \quad I - P = (I - P)^T = (I - P)^2$$
where the vector of residuals is $\hat{e} = Y - \hat{Y} = Y - X\hat\beta = Y - PY = (I - P)Y$.
Theorem A: Under the assumptions that $E(e) = 0$ and $\Sigma_{ee} = \sigma^2 I$,
$$s^2 = \frac{\left\| Y - \hat{Y} \right\|^2}{n - p} = \frac{RSS}{n - p} \quad \text{is an unbiased estimate of } \sigma^2.$$
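A NumPy sketch of Lemma A and Theorem A on hypothetical data, also computing the estimated covariance matrix $s^2 (X^T X)^{-1}$ of $\hat\beta$ from Theorem B of 14.4.2:

```python
import numpy as np

# Hypothetical design and response (made up for this sketch).
rng = np.random.default_rng(2)
n, p = 30, 3
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x, x**2])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(0, 0.5, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T      # projection ("hat") matrix
assert np.allclose(P, P.T) and np.allclose(P, P @ P)   # Lemma A

residuals = (np.eye(n) - P) @ y            # e_hat = (I - P) Y
s2 = residuals @ residuals / (n - p)       # unbiased estimate of sigma^2
cov_beta_hat = s2 * np.linalg.inv(X.T @ X) # estimated covariance of beta_hat
```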
14.4.4: Residuals
& Standardized Residuals
The vector of residuals: $\hat{e} = Y - \hat{Y} = Y - PY = (I - P)Y$
$\Rightarrow \Sigma_{\hat{e}\hat{e}} = (I - P)\,\sigma^2 I\,(I - P)^T = \sigma^2 (I - P)$ via Lemma A.
Definition: the $i$th standardized residual is
$$\frac{Y_i - \hat{Y}_i}{s \sqrt{1 - p_{ii}}}$$
where $p_{ii}$ is the $i$th diagonal element of $P$.
Theorem A: $\Sigma_{ee} = \sigma^2 I \Rightarrow \Sigma_{\hat{e}\hat{Y}} = (I - P)\,\sigma^2 I\, P^T = \sigma^2 (P^T - P P^T) = 0$.
That is, the residuals and the fitted values are uncorrelated.
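A minimal sketch of the standardized residuals for a hypothetical straight-line fit:

```python
import numpy as np

# Hypothetical data for a straight-line fit.
rng = np.random.default_rng(3)
n, p = 30, 2
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T       # projection matrix
residuals = (np.eye(n) - P) @ y
s2 = residuals @ residuals / (n - p)

# i-th standardized residual: residual_i / (s * sqrt(1 - p_ii)).
std_residuals = residuals / np.sqrt(s2 * (1 - np.diag(P)))
```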
14.4.5: Inference about $\beta$
Recall Section 14.4 for the statistical properties of the Least Squares Estimates $\hat\beta$, with the additional assumption that the errors are $e_i \overset{iid}{\sim} N(0, \sigma^2)$.
$\Rightarrow$ each component $\hat\beta_i \sim N(\beta_i, \sigma^2 c_{ii})$ where $C = (X^T X)^{-1}$,
and
$$\frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} \sim t_{n-p} \quad \text{where } s_{\hat\beta_i} = s \sqrt{c_{ii}}.$$
1. A $100(1 - \alpha)\%$ CI for $\beta_i$ is $\hat\beta_i \pm t_{n-p}\!\left(\tfrac{\alpha}{2}\right) s_{\hat\beta_i}$.
2. The statistic $t = \dfrac{\hat\beta_i - \beta_{i0}}{s_{\hat\beta_i}}$ can be used to test $H_0: \beta_i = \beta_{i0}$ (fixed #).
Exercise: Test $H_0: \beta_i = 0$ vs $H_A: \beta_i \neq 0$ (under $H_0$, $t \sim t_{n-p}$); a sketch follows below.
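A sketch of these intervals and tests with NumPy/SciPy; the design, coefficients, and sample size below are made up for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical multiple regression with an irrelevant predictor (true beta_2 = 0).
rng = np.random.default_rng(4)
n, p = 40, 3
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 + rng.normal(0, 1.0, size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
residuals = y - X @ beta_hat
s2 = residuals @ residuals / (n - p)
se = np.sqrt(s2 * np.diag(C))              # s * sqrt(c_ii)

# 95% confidence intervals for each beta_i.
tcrit = stats.t.ppf(0.975, df=n - p)
ci = np.column_stack([beta_hat - tcrit * se, beta_hat + tcrit * se])

# Two-sided test of H0: beta_i = 0 for each coefficient.
t_stat = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t_stat), df=n - p)
```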
14.5: Multiple Linear Regression
This section generalizes Section 14.2 (Simple Linear Regression) to Multiple Linear Regression through an example of polynomial regression.
$Y$ is the vector of observations, $y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p-1} x_{i,p-1} + e_i$
$\beta$ = vector of unknown parameters $\beta_0, \beta_1, \ldots, \beta_{p-1}$
Interpretation of the $\beta_k$ $(k = 1, \ldots, p-1)$?
$\beta_k$ is the change in the expected value of $y$ if $x_k$ increases by one unit while the other $x$'s are held fixed.
The $e_i$ are independent random variables with $E(e_i) = 0$ and $\mathrm{Var}(e_i) = \sigma^2$.
Polynomial Regression: let $x_{i2} = x_{i1}^2$, $x_{i3} = x_{i1}^3$, etc.
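A minimal sketch of polynomial regression as multiple regression, with a hypothetical cubic example:

```python
import numpy as np

# Polynomial regression: the extra "predictors" are powers of x1.
rng = np.random.default_rng(5)
x1 = np.linspace(-1, 1, 25)
y = 0.5 + x1 - 2.0 * x1**2 + 0.5 * x1**3 + rng.normal(0, 0.1, size=x1.size)

# Columns: 1, x, x^2, x^3  (x_{i2} = x_{i1}^2, x_{i3} = x_{i1}^3, ...).
X = np.column_stack([np.ones_like(x1), x1, x1**2, x1**3])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```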