Introduction to simple linear regression

IAPRI Quantitative Analysis
Capacity Building Series
Outline
1. Motivation – what are we trying to measure?
2. The simple linear regression model
3. Interpretation: intercept & slope parameters
4. Why “linear” regression?
5. Ordinary least squares (OLS)
6. Sums of squares & R-squared
7. Unbiasedness vs. consistency
8. Assumptions for OLS to be unbiased
9. Assumptions for OLS to be BLUE & consequences of violations
10. Simple linear regression in Stata
Motivation
  y and x: variables representing some population
  Goal:
o  Explain y in terms of x
o  Study how y varies with changes in x
  Examples?
     y                  x
     Maize yield        Fertilizer application
     Beef demand        Beef price
     Wheat acreage      Wheat price
The simple linear regression model
y = β0 + β1x + u     (simple, or bivariate, linear regression model)
  y: dependent (explained) variable
  x: independent (explanatory) variable or regressor
  β’s (betas): parameters to be estimated
  u: error term or disturbance (unobserved)
  y, x, and u are random variables
Simple linear regression
y = β0 + β1x + u
1.  The relationship between y and x is never exact
>> How do we allow other factors to affect y?
  They are captured by u
2.  What is the functional relationship between y and x?
  If we hold u fixed (Δu = 0), then x has a linear effect on y:
Δy = β1Δx
Simple linear regression (cont.)
y = β0 + β1x + u
3.  How do we capture a ceteris paribus effect of x on y?
  Restrict the relationship between x and u:
1.  Assume E(u) = 0
→ NOT restrictive if the model includes an intercept (β0)
2.  *** E(u|x) = E(u) = 0: zero conditional mean (exogeneity)
EX) Land quality, LQ (u), and fertilizer applied to maize in kg/ha (x):
E(LQ|10) = E(LQ|100) = E(LQ|400) = E(LQ) = 0
  If fertilizer kg/ha is chosen independently of unobserved factors, then fertilizer kg/ha will not depend on LQ and the zero conditional mean assumption will hold
Zero conditional mean
y = β0 + β1x + u
  If E(u|x) = E(u) = 0, then
E(y|x) = β0 + β1x
→ E(y|x) is a linear function of x
∂E(y|x)/∂x = β1    or    ΔE(y|x)/Δx = β1
E(y|x) is a linear function of x
[Figure: the line E(y|x) = β0 + β1x plotted with x on the horizontal axis and y on the vertical axis; β0 is the intercept and β1 is the slope]
Why “linear” regression?
y = β0 + β1x + u
  Linear in parameters
  Examples of models that are non-linear in parameters?
1. y = 1 / (β0 + β1x) + u
2. E(y|x) = Φ(β0 + β1x), where Φ(.) is the standard normal CDF
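By contrast, a model can be non-linear in x and still linear in parameters, so it can still be estimated by OLS. The sketch below assumes a hypothetical model y = β0 + β1·log(x) (this functional form and the numbers are chosen for illustration, not taken from the session materials). It transforms x first and then applies the usual OLS formulas to the transformed variable.

```python
import math


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data generated without an error term, to keep the point visible:
# y = 2 + 3*log(x) is non-linear in x but linear in the parameters (2 and 3).
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 + 3.0 * math.log(xi) for xi in x]

z = [math.log(xi) for xi in x]   # transformed regressor
b0, b1 = ols(z, y)               # regress y on log(x)
print(b0, b1)
```

The two examples on the slide cannot be handled this way, because no transformation of x alone makes them linear in β0 and β1.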
Regression line, fitted values
& residuals
Population model: y = β0 + β1x + u
Regression line: ŷ = β̂0 + β̂1x
  Slope: β̂1 = ∂ŷ/∂x
  Intercept: β̂0 = ŷ when x = 0
Fitted value for ith obs.: ŷi = β̂0 + β̂1xi
Residual for ith obs.: ûi = yi − ŷi = yi − β̂0 − β̂1xi
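The definitions of fitted values and residuals can be sketched in a few lines of Python. The data and the helper names (`ols`, `fitted_values`, `residuals`) are made up for illustration. The intercept is computed as ȳ − β̂1·x̄, which is the standard OLS result.

```python
def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def fitted_values(x, b0, b1):
    """Fitted value for each observation: b0 + b1 * x_i."""
    return [b0 + b1 * xi for xi in x]


def residuals(y, fitted):
    """Residual for each observation: y_i minus its fitted value."""
    return [yi - fi for yi, fi in zip(y, fitted)]


# Hypothetical sample of five observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols(x, y)
y_hat = fitted_values(x, b0, b1)
u_hat = residuals(y, y_hat)
for xi, yi, fi, ui in zip(x, y, y_hat, u_hat):
    print(f"x={xi:4.1f}  y={yi:5.2f}  fitted={fi:6.3f}  residual={ui:7.3f}")
```

When the regression includes an intercept, the OLS residuals sum to zero (up to rounding), which is a quick sanity check on any implementation.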
Fitted values & residuals
Source: Wooldridge (2002)
Ordinary Least Squares (OLS)
  How does OLS come up with estimates of
β0 and β1?
 Make the sum of squared residuals as small
as possible
$\sum_{i=1}^{N} \hat{u}_i^2 = \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$
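One way to see where the OLS estimates come from is to write out the first-order conditions of this minimization problem. The derivation below is the standard textbook one, sketched for reference; $\bar{x}$ and $\bar{y}$ denote the sample means of x and y.

```latex
\begin{aligned}
\frac{\partial}{\partial \hat{\beta}_0} \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
  &= -2 \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\
\frac{\partial}{\partial \hat{\beta}_1} \sum_{i=1}^{N} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2
  &= -2 \sum_{i=1}^{N} x_i (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i) = 0 \\
\Rightarrow \quad \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x}, \qquad
\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}
\end{aligned}
```

The first condition says the residuals sum to zero. The second says the residuals are uncorrelated with x in the sample. Solving for the slope requires that the denominator is not zero, that is, that x varies in the sample.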
OLS: minimize the sum of squared residuals
Source: Wooldridge (2002)
Total, explained, & residual sums of squares, & R²
Total sum of squares: $SST \equiv \sum_{i=1}^{N} (y_i - \bar{y})^2$
Explained sum of squares: $SSE \equiv \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$
Residual sum of squares: $SSR \equiv \sum_{i=1}^{N} \hat{u}_i^2$
$SST = SSE + SSR$
Coefficient of determination: $R^2 = SSE / SST = 1 - (SSR / SST)$
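The three sums of squares and R² can be computed directly from their definitions. The sketch below uses a small made-up sample and a hypothetical helper, `sums_of_squares`, to confirm numerically that SST = SSE + SSR and that the two expressions for R² agree.

```python
def sums_of_squares(x, y):
    """Fit y on x by OLS and return (SST, SSE, SSR)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in y
    sse = sum((fi - y_bar) ** 2 for fi in fitted)            # variation explained by x
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation
    return sst, sse, ssr


# Hypothetical sample of five observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

sst, sse, ssr = sums_of_squares(x, y)
r_squared = sse / sst
print(f"SST={sst:.4f}  SSE={sse:.4f}  SSR={ssr:.4f}  R2={r_squared:.4f}")
```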
My R² is too low!
  Does a low R² mean the regression results are useless? Why or why not?
ŷ = β̂0 + β̂1x
β̂1 may still be a good estimate of the ceteris paribus effect of x on y even if R² is low
Unbiasedness vs. Consistency
  Unbiased estimator: E(α̂) = α (alpha)
o  Finite sample concept
  Consistent estimator: plim(α̂) = α
o  Asymptotic (N→∞) concept
o  Distribution tightens around α as N increases
o  As N→∞, estimator collapses to α
o  Clive Granger: “If you can’t get it right as N goes to infinity, you shouldn’t be in this business” (Wooldridge, 2002: 163)
o  The assumptions for unbiasedness of OLS are sufficient but not necessary for consistency (i.e., they are stronger than consistency requires)
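A small Monte Carlo experiment can make both ideas concrete. The sketch below is an illustration under assumed values (true slope 1.5, x uniform on [0, 10], normal errors with standard deviation 2), and `slope_estimates` is a hypothetical helper. For each sample size it draws many samples, estimates the slope in each, and reports the mean and spread of the estimates.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope estimate from a simple regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def slope_estimates(n, reps=500, beta0=1.0, beta1=1.5, seed=3):
    """Draw `reps` random samples of size n and return the slope estimate from each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        x = [rng.uniform(0.0, 10.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 2.0) for xi in x]  # u independent of x
        estimates.append(ols_slope(x, y))
    return estimates


for n in (25, 400):
    est = slope_estimates(n)
    print(f"N={n:4d}  mean of estimates={statistics.mean(est):.3f}  "
          f"std. dev. of estimates={statistics.stdev(est):.3f}")
```

The mean of the estimates should sit near the true slope at both sample sizes (unbiasedness), while the spread should shrink as N grows (the distribution tightening that underlies consistency).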
Distribution tightens as N→∞
Source: Wooldridge (2002)
Unbiasedness of OLS
  Necessary assumptions for unbiasedness of
OLS estimators of β’s?
SLR.1. Linear in parameters: y = β0 + β1 x + u
SLR.2. Random sampling
**SLR.3. Zero conditional mean (exogeneity):
E(u | x) = 0
SLR.4. Sample variation in x
OLS slope estimator: $\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$
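The role of SLR.4 is visible in the slope formula: if every x value is the same, the denominator is zero and the slope cannot be computed. The sketch below is a hypothetical `ols_slope` helper, not code from the session materials. It checks for this case explicitly before dividing.

```python
def ols_slope(x, y):
    """OLS slope: sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2).

    Raises ValueError when x has no sample variation (SLR.4 fails),
    because the denominator of the formula is then zero.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, with at least 2 observations")
    if max(x) == min(x):
        raise ValueError("no sample variation in x: the slope cannot be estimated")
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = sum((xi - x_bar) ** 2 for xi in x)
    return numerator / denominator


# Hypothetical data with variation in x: the slope is well defined
print(ols_slope([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))

# Hypothetical data with no variation in x: the function refuses to estimate
try:
    ols_slope([3.0, 3.0, 3.0], [1.0, 2.0, 3.0])
except ValueError as err:
    print("Cannot estimate:", err)
```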
Can’t estimate slope parameter if no variation in x
Source: Wooldridge (2002)
Variance of OLS estimators
  SLR.5. Homoskedasticity (constant variance)
Var(u | x) = Var(u) = σ²
o  σ² (sigma squared) is the error variance
o  NOT necessary for unbiasedness of OLS estimators of β’s
Simple regression - homoskedasticity
Source: Wooldridge (2002)
Simple regression - heteroskedasticity
Source: Wooldridge (2002)
Variance of OLS estimator (cont.)
  Under SLR.1 through SLR.5
(homoskedasticity):
$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$
  If there is heteroskedasticity, the formula is more complicated
→ s.e.’s based on the “regular” formula are WRONG (biased & inconsistent)
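The homoskedastic variance formula can be checked by simulation. The sketch below holds a made-up set of x values fixed, draws homoskedastic normal errors many times, and compares the variance of the resulting slope estimates with σ² divided by the sum of squared deviations of x. The function `compare_variances` and all parameter values are assumptions for illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """OLS slope estimate from a simple regression of y on x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def compare_variances(sigma=3.0, reps=4000, seed=4):
    """Return (variance from the formula, variance of simulated slope estimates)."""
    rng = random.Random(seed)
    x = [float(i) for i in range(20)]             # x values held fixed across samples
    x_bar = sum(x) / len(x)
    sst_x = sum((xi - x_bar) ** 2 for xi in x)    # sum of squared deviations of x
    theoretical = sigma ** 2 / sst_x              # Var(slope) under SLR.1 to SLR.5
    slopes = []
    for _ in range(reps):
        # Same error variance for every x: homoskedasticity holds by construction
        y = [2.0 + 0.5 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(ols_slope(x, y))
    return theoretical, statistics.variance(slopes)


theoretical, simulated = compare_variances()
print(f"formula: {theoretical:.5f}   simulated: {simulated:.5f}")
```

The two numbers should be close. If the error standard deviation were made to depend on x, the simulated variance would generally no longer match the formula.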
Gauss-Markov Theorem
SLR.1/MLR.1 Linear in parameters
SLR.2/MLR.2 Random sampling
SLR.3/MLR.3 Zero conditional mean (exogeneity)
SLR.4 Sample variation in x / MLR.4 No perfect collinearity
SLR.5/MLR.5 Homoskedasticity (constant variance)
Gauss-Markov Theorem:
Under these 5 assumptions, OLS is BLUE
OLS is BLUE
B  Best (most efficient, i.e., smallest variance)
     Heteroskedasticity → OLS is inefficient
L  Linear (the estimator is a linear function of the observed y values)
U  Unbiased
     Violation of zero conditional mean/exogeneity → OLS is biased
E  Estimator
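The bias caused by violating the zero conditional mean assumption can be illustrated with a simulation of the fertilizer and land quality story. Everything in the sketch is assumed for illustration: the true slope of 5, the other parameter values, and the helper `estimated_slope`. Land quality is part of the error term, and the `dependence` argument controls whether fertilizer use is related to it.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def estimated_slope(dependence, n=5000, seed=2):
    """Simulate yields and return the OLS slope on fertilizer.

    dependence = 0  -> fertilizer unrelated to land quality (exogeneity holds)
    dependence > 0  -> better land gets more fertilizer (exogeneity fails)
    """
    rng = random.Random(seed)
    beta0, beta1 = 1000.0, 5.0                    # hypothetical true parameters
    x, y = [], []
    for _ in range(n):
        lq = rng.gauss(0.0, 1.0)                  # unobserved land quality
        xi = 200.0 + dependence * lq + rng.gauss(0.0, 50.0)   # fertilizer, kg/ha
        u = 100.0 * lq + rng.gauss(0.0, 50.0)     # error term includes land quality
        x.append(xi)
        y.append(beta0 + beta1 * xi + u)
    return ols(x, y)[1]


print("exogenous fertilizer: ", round(estimated_slope(0.0), 2))
print("endogenous fertilizer:", round(estimated_slope(50.0), 2))
```

In the exogenous case the estimate should be close to the true slope. In the endogenous case the estimate also picks up the effect of land quality on yield, so it overstates the effect of fertilizer.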
Basic Stata Commands
  regress y x               Linear regression of y on x
  predict newvar1, xb       Compute fitted values
  predict newvar2, resid    Compute residuals
REFERENCES
(**main reference)
  Beaver, M. 2010. Stata 11 – Sample Session: Cross-Sectional
Analysis Short Course Training Materials Designing Policy Relevant
Research and Data Processing and Analysis with Stata 11, 1st
Edition. East Lansing, Michigan: Department of Agricultural
Economics, Michigan State University.
  Myers, R. J. 2006. “Simple Linear Regression”. Handout for
Agricultural Economics 835: Introductory Econometrics. East
Lansing, Michigan: Department of Agricultural Economics,
Michigan State University.
  **Wooldridge, J. M. 2002. Introductory Econometrics: A Modern
Approach, Second Edition. Cincinnati, OH: South-Western College
Publishing.
  Wooldridge, J. M. 2006. “Rudiments of Stata”. Handout for
Economics 823: Applied Econometrics. East Lansing, Michigan:
Department of Economics, Michigan State University.
Session materials prepared by Nicole Mason with input from Bill Burke.
December 2011. [email protected]