IAPRI Quantitative Analysis Capacity Building Series
Introduction to simple linear regression

Outline
1. Motivation – what are we trying to measure?
2. The simple linear regression model
3. Interpretation: intercept & slope parameters
4. Why “linear” regression?
5. Ordinary least squares (OLS)
6. Sums of squares & R-squared
7. Unbiasedness vs. consistency
8. Assumptions for OLS to be unbiased
9. Assumptions for OLS to be BLUE & consequences of violations
10. Simple linear regression in Stata

Motivation
y and x: variables representing some population
Goal:
o Explain y in terms of x
o Study how y varies with changes in x
Examples?
y                  x
Maize yield        Fertilizer application
Beef demand        Beef price
Wheat acreage      Wheat price

The simple linear regression model
$$y = \beta_0 + \beta_1 x + u$$
Simple (bivariate) linear regression model
y: dependent (explained) variable
x: independent (explanatory) variable, or regressor
β’s (betas): parameters to be estimated
u: error term, or disturbance (unobserved)
y, x, and u are random variables

Simple linear regression
$$y = \beta_0 + \beta_1 x + u$$
1. The relationship between y and x is never exact. How do we allow other factors to affect y? They are captured by u.
2. What is the functional relationship between y and x? If u is held fixed ($\Delta u = 0$), then x has a linear effect on y: $\Delta y = \beta_1 \Delta x$.

Simple linear regression (cont.)
$$y = \beta_0 + \beta_1 x + u$$
3. How do we capture a ceteris paribus effect of x on y? Restrict the relationship between x and u:
   1. Assume $E(u) = 0$ → NOT restrictive as long as the model includes an intercept ($\beta_0$), because any nonzero mean of u can be absorbed into $\beta_0$.
   2. The key assumption:
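      *** $E(u \mid x) = E(u) = 0$, i.e., zero conditional mean (exogeneity)
   EX) Land quality, LQ (u), and fertilizer applied to maize in kg/ha (x):
   $E(LQ \mid x = 10) = E(LQ \mid x = 100) = E(LQ \mid x = 400) = E(LQ) = 0$
   If fertilizer kg/ha is chosen independently of unobserved factors, then fertilizer kg/ha will not depend on LQ, and the zero conditional mean assumption will hold.

Zero conditional mean
$$y = \beta_0 + \beta_1 x + u$$
If $E(u \mid x) = E(u) = 0$, then, because $E(y \mid x) = \beta_0 + \beta_1 x + E(u \mid x)$,
$$E(y \mid x) = \beta_0 + \beta_1 x$$
→ E(y|x) is a linear function of x:
$$\frac{\partial E(y \mid x)}{\partial x} = \beta_1 \quad \text{or} \quad \Delta E(y \mid x) = \beta_1 \Delta x$$

E(y|x) is a linear function of x
[Figure: the population regression line $E(y \mid x) = \beta_0 + \beta_1 x$ plotted with y against x, showing the intercept $\beta_0$ and the slope $\beta_1$.]

Why “linear” regression?
$$y = \beta_0 + \beta_1 x + u$$
Linear in parameters
Examples of models that are non-linear in parameters?
1. $y = 1/(\beta_0 + \beta_1 x) + u$
2. $E(y \mid x) = \Phi(\beta_0 + \beta_1 x)$, where $\Phi(\cdot)$ is the standard normal CDF

Regression line, fitted values & residuals
Population model: $y = \beta_0 + \beta_1 x + u$
Regression line: $\hat{y} = \hat\beta_0 + \hat\beta_1 x$, where $\hat\beta_1 = \partial \hat{y} / \partial x$ and $\hat\beta_0 = \hat{y}$ when $x = 0$
Fitted value for the ith observation: $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$
Residual for the ith observation: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$

Fitted values & residuals
[Figure: fitted values and residuals. Source: Wooldridge (2002)]

Ordinary Least Squares (OLS)
How does OLS come up with estimates of β0 and β1? It chooses them to make the sum of squared residuals as small as possible:
$$\sum_{i=1}^{N} \hat{u}_i^2 = \sum_{i=1}^{N} \left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right)^2$$

OLS: minimize the sum of squared residuals
[Figure: OLS minimizes the sum of squared residuals. Source: Wooldridge (2002)]

Total, explained, & residual sum of squares, & R²
Total sum of squares: $SST \equiv \sum_{i=1}^{N} (y_i - \bar{y})^2$
Explained sum of squares: $SSE \equiv \sum_{i=1}^{N} (\hat{y}_i - \bar{y})^2$
Residual sum of squares: $SSR \equiv \sum_{i=1}^{N} \hat{u}_i^2$
$SST = SSE + SSR$ (for OLS with an intercept)
Coefficient of determination: $R^2 = SSE/SST = 1 - SSR/SST$

My R² is too low!
Does a low R² mean the regression results are useless? Why or why not?
$\hat{y} = \hat\beta_0 + \hat\beta_1 x$
$\hat\beta_1$ may still be a good estimate of the ceteris paribus effect of x on y even if R² is low: R² only measures the share of the sample variation in y that is explained by x, whereas whether $\hat\beta_1$ is unbiased depends on assumptions such as zero conditional mean, not on the size of R².

Unbiasedness vs.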
Consistency
Unbiased estimator: $E(\hat\alpha) = \alpha$ (alpha)
o Finite-sample concept
Consistent estimator: $\operatorname{plim}(\hat\alpha) = \alpha$
o Asymptotic ($N \to \infty$) concept
o The distribution tightens around α as N increases
o As $N \to \infty$, the estimator collapses to α
o Clive Granger: “If you can’t get it right as N goes to infinity, you shouldn’t be in this business” (Wooldridge, 2002: 163)
o The assumptions for unbiasedness are sufficient but not necessary for consistency (i.e., they are stronger than what consistency requires)

Distribution tightens as $N \to \infty$
[Figure: the distribution of the estimator tightens as N increases. Source: Wooldridge (2002)]

Unbiasedness of OLS
Which assumptions make the OLS estimators of the β’s unbiased?
SLR.1. Linear in parameters: $y = \beta_0 + \beta_1 x + u$
SLR.2. Random sampling
**SLR.3. Zero conditional mean (exogeneity): $E(u \mid x) = 0$
SLR.4. Sample variation in x, which guarantees that the denominator of the OLS slope estimator is nonzero:
$$\hat\beta_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$

Can’t estimate slope parameter if no variation in x
[Figure: the slope parameter cannot be estimated when there is no variation in x. Source: Wooldridge (2002)]

Variance of OLS estimators
SLR.5. Homoskedasticity (constant variance): $Var(u \mid x) = Var(u) = \sigma^2$
o σ² (sigma squared) is the error variance
o NOT necessary for unbiasedness of the OLS estimators of the β’s

Simple regression - homoskedasticity
[Figure: simple regression under homoskedasticity. Source: Wooldridge (2002)]

Simple regression - heteroskedasticity
[Figure: simple regression under heteroskedasticity. Source: Wooldridge (2002)]

Variance of OLS estimator (cont.)
Under SLR.1 through SLR.5 (homoskedasticity), conditional on the sample values of x:
$$Var(\hat\beta_1) = \frac{\sigma^2}{\sum_{i=1}^{N} (x_i - \bar{x})^2}$$
Under heteroskedasticity the variance formula is more complicated → standard errors based on the “regular” formula are WRONG (biased & inconsistent)

Gauss-Markov Theorem
SLR.1/MLR.1 Linear in parameters
SLR.2/MLR.2 Random sampling
SLR.3/MLR.3 Zero conditional mean (exogeneity)
SLR.4 Sample variation in x / MLR.4 No perfect collinearity
SLR.5/MLR.5 Homoskedasticity (constant variance)
Gauss-Markov Theorem: Under these 5 assumptions, OLS is BLUE

OLS is BLUE
B Best (most efficient, i.e., smallest variance among linear unbiased estimators)
   Heteroskedasticity → OLS is inefficient (no longer best, but still unbiased)
L Linear (a linear function of the $y_i$)
U Unbiased
E Estimator
Violating zero conditional mean/exogeneity → OLS is biased

Basic Stata Commands
regress y x              Linear regression of y on x
predict newvar1, xb      Compute fitted values
predict newvar2, resid   Compute residuals

REFERENCES (** main reference)
Beaver, M. 2010. Stata 11 – Sample Session: Cross-Sectional Analysis. Short Course Training Materials, Designing Policy Relevant Research and Data Processing and Analysis with Stata 11, 1st Edition. East Lansing, Michigan: Department of Agricultural Economics, Michigan State University.
Myers, R. J. 2006. “Simple Linear Regression”. Handout for Agricultural Economics 835: Introductory Econometrics. East Lansing, Michigan: Department of Agricultural Economics, Michigan State University.
**Wooldridge, J. M. 2002. Introductory Econometrics: A Modern Approach, Second Edition. Cincinnati, Ohio: South-Western College Publishing.
Wooldridge, J. M. 2006. “Rudiments of Stata”. Handout for Economics 823: Applied Econometrics. East Lansing, Michigan: Department of Economics, Michigan State University.

Session materials prepared by Nicole Mason with input from Bill Burke. 26 December 2011.
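The OLS quantities in these slides (the slope formula, fitted values and residuals, SST, SSE, SSR, R², and the homoskedasticity-only variance of β̂1) can be reproduced by hand. The following is a minimal Python sketch for illustration, not the Stata workflow itself and not part of the original course materials. The function name `simple_ols` and the toy data are hypothetical; the intercept uses the standard OLS formula $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ (not shown on the slides), and σ² is estimated by SSR/(N − 2), the usual estimator under SLR.1 through SLR.5, assumed here.

```python
from math import sqrt


def simple_ols(x, y):
    """Hypothetical helper: OLS estimates for the simple regression y = b0 + b1*x + u."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 observations")
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # SST_x = sum of (x_i - x_bar)^2; SLR.4 (sample variation in x) requires SST_x > 0
    sst_x = sum((xi - x_bar) ** 2 for xi in x)
    if sst_x == 0:
        raise ValueError("no sample variation in x: the slope cannot be estimated")

    # Slope: sum of (x_i - x_bar)(y_i - y_bar) divided by SST_x
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sst_x
    # Intercept (standard OLS formula, assumed here): the line passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar

    # Fitted values y_hat_i = b0 + b1*x_i and residuals u_hat_i = y_i - y_hat_i
    y_hat = [b0 + b1 * xi for xi in x]
    u_hat = [yi - fi for yi, fi in zip(y, y_hat)]

    # Total, explained, and residual sums of squares, and R-squared
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sst == 0:
        raise ValueError("no variation in y: R-squared is undefined")
    sse = sum((fi - y_bar) ** 2 for fi in y_hat)
    ssr = sum(ui ** 2 for ui in u_hat)
    r2 = sse / sst

    # Homoskedasticity-only standard error of b1, estimating sigma^2 by SSR / (N - 2)
    sigma2_hat = ssr / (n - 2)
    se_b1 = sqrt(sigma2_hat / sst_x)

    return {"b0": b0, "b1": b1, "y_hat": y_hat, "u_hat": u_hat,
            "sst": sst, "sse": sse, "ssr": ssr, "r2": r2, "se_b1": se_b1}


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    res = simple_ols(x, y)
    print(round(res["b1"], 4), round(res["b0"], 4), round(res["r2"], 4))  # → 0.6 2.2 0.6
```

The fitted values and residuals returned here play the same role as the variables created by `predict newvar1, xb` and `predict newvar2, resid` after `regress y x`. With the toy data, the residuals sum to zero and SST = SSE + SSR holds (6 = 3.6 + 2.4), and `se_b1` should correspond to the default (non-robust) standard error that `regress` reports, which is only valid under homoskedasticity.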