The linear model

Jan Roestel (CAU), Crash Course, Winter term 2016

The linear regression model

Assume that X and Y are two random variables obtained from a population characterized by a bivariate distribution. We are interested in the features of this (unobserved) joint distribution. However, studying multivariate distributions is nontrivial. Instead, we consider a conditional analysis of the joint distribution of X and Y by estimating 'conditional' population means of the form E(Y | Xi) = µ_{Y|Xi}.

Consider estimating the unknown parameters of

Yi = β1 + β2 Xi + Ui,   i = 1, ..., n,

characterizing the average linear association of the variables in the underlying bivariate distribution. Given that the linear relationship only holds on average for the joint random variables X and Y, the model includes a random error component U.

We have n observations (Xi, Yi), i = 1, ..., n.
- X is the independent variable or regressor.
- Y is the dependent variable.
- β1 is the intercept.
- β2 is the slope.
- Ui is the regression error or disturbance.

The regression error consists of omitted factors, i.e., factors other than the variable X that influence Y. The regression error also includes errors in the measurement of Y.

General model with k regressors (including a constant term):

Yi = β1 + β2 xi2 + β3 xi3 + ... + βk xik + Ui,   i = 1, ..., n.

In terms of matrix algebra: y = Xβ + u. Notation: y (n×1), X (n×k), β (k×1), u (n×1). In detail:

\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
=
\begin{pmatrix}
1 & x_{12} & x_{13} & \cdots & x_{1k} \\
1 & x_{22} & x_{23} & \cdots & x_{2k} \\
1 & x_{32} & x_{33} & \cdots & x_{3k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n2} & x_{n3} & \cdots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{pmatrix}
+
\begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{pmatrix}

Important: to include a constant term β1 in the regression model, an n×1 vector of ones has to enter the X matrix in addition to the other regressors.

Model assumptions

- E(u|X) = 0: The suggested linear model form is the right one to describe conditional averages for the underlying distribution. Moreover, the mean of ui is zero and we cannot learn from X how large the corresponding u will be. → This all implies that β̂ is unbiased.
- (X2i, ..., Xki, Yi), i = 1, ..., n, are iid. → This is true if they are collected by simple random sampling. → It delivers the sampling distribution of β̂.
- Large outliers in X and/or Y are rare. → Technically, X and Y have finite fourth moments. → Important because outliers can result in meaningless values of β̂.

Derivation of the OLS estimator

The minimization problem: the model error is u = y − Xβ. The sum of the squared model errors is thus

RSS(β) = Σ_{i=1}^{n} u_i² = u'u = (y − Xβ)'(y − Xβ).

Minimization problem: choose the k elements of the parameter vector β such that the residual sum of squares RSS(β) gets as small as possible.

RSS(β) = u'u = (y − Xβ)'(y − Xβ)
       = (y' − β'X')(y − Xβ)
       = y'y − β'X'y − y'Xβ + β'X'Xβ
       = y'y − 2β'X'y + β'X'Xβ

∂RSS(β)/∂β = −2X'y + 2X'Xβ̂ = 0
⇔ X'Xβ̂ = X'y            (k normal equations)
⇔ β̂ = (X'X)⁻¹X'y        (OLS estimator)

Notes:
- For matrices A (m×n) and B (n×z) it holds that (AB)' = B'A'.
- y'Xβ and β'X'y are scalar terms, therefore y'Xβ = β'X'y.
- Differentiation rules for vectors and matrices: let A (m×m) be a square matrix and z (m×1) and w (m×1) vectors. Then ∂(w'z)/∂z = ∂(z'w)/∂z = w, ∂(z'A)/∂z = A, ∂(Az)/∂z = A', and ∂(z'Az)/∂z = (A' + A)z; if A = A' (A symmetric), ∂(z'Az)/∂z = 2Az.
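As a quick illustration (not part of the original slides), a minimal NumPy sketch of the estimator β̂ = (X'X)⁻¹X'y on simulated data; the data-generating process and all numbers are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (illustrative only): n = 100 observations, k = 3 parameters.
n = 100
beta = np.array([1.0, 2.0, -0.5])            # "true" beta_1 (constant), beta_2, beta_3
X = np.column_stack([np.ones(n),             # column of ones for the constant term
                     rng.normal(size=n),
                     rng.normal(size=n)])
u = rng.normal(size=n)                       # error term, E(u|X) = 0 by construction
y = X @ beta + u

# OLS estimator from the k normal equations X'X beta_hat = X'y;
# solving the linear system is numerically safer than inverting X'X.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                              # should be close to [1.0, 2.0, -0.5]
```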
The estimator's covariance matrix

An unbiased estimator for the variance of the error term is

σ̂² = û'û / (n − k),   where û = y − Xβ̂.

We divide by n − k because k parameters are estimated to obtain û.

The covariance matrix of the OLS estimators β̂1, ..., β̂k:

E[(β̂ − β)(β̂ − β)'] = Σ_β =
\begin{pmatrix}
\sigma^2_{\beta_1} & \sigma_{\beta_1 \beta_2} & \cdots & \sigma_{\beta_1 \beta_k} \\
\sigma_{\beta_2 \beta_1} & \sigma^2_{\beta_2} & \cdots & \sigma_{\beta_2 \beta_k} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{\beta_k \beta_1} & \sigma_{\beta_k \beta_2} & \cdots & \sigma^2_{\beta_k}
\end{pmatrix}

This expression can be estimated by means of Σ̂_β = σ̂²(X'X)⁻¹. The corresponding estimators for the variances are located on the main diagonal of Σ̂_β, i.e. σ̂²_{βi} = σ̂²(X'X)⁻¹_{ii}. In terms of standard deviations: σ̂_{βi} = √(σ̂²(X'X)⁻¹_{ii}).

Example

Consider the individual preparation time (xi) and the points (Yi) achieved by 6 students in a written exam. The associated realizations are (9 15 19 10 14 5) and (27 34 38 15 31 14), respectively. Assume that the following model characterizes the population:

Yi = β1 + β2 xi + Ui,   i = 1, ..., n.

The empirical model:

\begin{pmatrix} y_1 \\ \vdots \\ y_6 \end{pmatrix}
=
\begin{pmatrix} 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{62} \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+
\begin{pmatrix} u_1 \\ \vdots \\ u_6 \end{pmatrix}

We get the following sample moment matrices:

X'X = \begin{pmatrix} 6 & 72 \\ 72 & 988 \end{pmatrix},   X'y = \begin{pmatrix} 159 \\ 2129 \end{pmatrix}.

Hint: the inverse of a 2×2 matrix A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} can be obtained as

A⁻¹ = \frac{1}{a_{11} a_{22} - a_{12} a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.

The OLS estimate:

β̂ = (X'X)⁻¹X'y = \begin{pmatrix} 1.3280 & -0.0968 \\ -0.0968 & 0.0081 \end{pmatrix} \begin{pmatrix} 159 \\ 2129 \end{pmatrix} = \begin{pmatrix} 5.113 \\ 1.782 \end{pmatrix}.

Thus, the estimate is: yi = 5.11 + 1.78 xi + ûi.

[Figure: scatter plot of exam points (PUNKTE) against hours of preparation (LERNSTUNDEN) with the fitted regression line.]

Interpretation: an increase of 1 hour of preparation time increases the exam score by 1.78 points on average.

Note: β1, β2: model parameters; Ui: model errors; β̂1, β̂2: estimators; ûi: residuals; 5.11, 1.78: estimates.

Properties of the OLS estimator

- The OLS estimator is unbiased: E(β̂) = β.
- Since, in addition, the variances converge to zero when the sample size approaches infinity, the estimator β̂ is consistent.
- If the assumptions hold, the OLS estimator is efficient in the class of linear unbiased estimators. Put differently, it is BLUE (best linear unbiased estimator).

Hypothesis testing

Test for βi. Assumption: Ui ~ N(0, σ²).

Test statistic:

T0 = (β̂i − a) / √(Var̂(β̂i)) = (β̂i − a) / √(σ̂²(X'X)⁻¹_{ii}) ~ t_{n−k}

Hypotheses and critical regions:

H0: βi = a (H0: βi ≥ a)  vs.  H1: βi < a    → reject if T0 < −t_{1−α; n−k}
H0: βi = a (H0: βi ≤ a)  vs.  H1: βi > a    → reject if T0 > t_{1−α; n−k}
H0: βi = a               vs.  H1: βi ≠ a    → reject if |T0| > t_{1−α/2; n−k}

If ui is not normally distributed, T0 will be approximately standard normally distributed under H0 if the sample size gets large.
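To make the formulas of the last two sections concrete, here is a minimal NumPy/SciPy sketch (not from the original slides) that reproduces the exam example and tests H0: βi = 0; the choice a = 0 is ours for illustration:

```python
import numpy as np
from scipy import stats

# Exam example from above: preparation time x (hours) and points y of 6 students.
x = np.array([9.0, 15.0, 19.0, 10.0, 14.0, 5.0])
y = np.array([27.0, 34.0, 38.0, 15.0, 31.0, 14.0])

n, k = len(y), 2
X = np.column_stack([np.ones(n), x])         # design matrix with constant term

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                 # approx. [5.113, 1.782]

u_hat = y - X @ beta_hat                     # residuals u_hat = y - X beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)       # sigma^2_hat = u'u / (n - k)
cov_beta = sigma2_hat * XtX_inv              # Sigma_hat_beta = sigma^2_hat (X'X)^{-1}
se = np.sqrt(np.diag(cov_beta))              # standard errors of beta_hat

# t statistic for H0: beta_i = 0 vs. H1: beta_i != 0 (i.e. a = 0),
# with two-sided p-values from the t distribution with n - k df.
t0 = beta_hat / se
p = 2 * stats.t.sf(np.abs(t0), df=n - k)
print(beta_hat, se, t0, p)
```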
Degree of explanation

Goal: describe the degree to which the linear model can explain the observed variation of Y.

The variation of Y can be decomposed as follows: TSS = ESS + RSS.

The relative share of explained variation to total variation in Y is given as:

R² = ESS/TSS = (TSS − RSS)/TSS = 1 − RSS/TSS = 1 − û'û / (y'y − nȳ²)

Hint: Σ_{i=1}^{n} (yi − ȳ)² = Σ_{i=1}^{n} yi² − nȳ² = y'y − nȳ².

Properties:
- 0 ≤ R² ≤ 1
- R² = 1 ⇔ yi = ŷi for i = 1, ..., n
- R² = 0 ⇔ ŷi = ȳ for i = 1, ..., n
- The closer R² is to 1, the more variation in Y is explained by the linear model.
- For the bivariate regression case, R² corresponds to the square of the correlation coefficient between y and x.

Prediction

Prediction for yP = f(xP): ŷP = xP β̂, where xP = (1, xP2, xP3, ..., xPk).

Prediction error: ûP = yP − ŷP, with E(ûP) = 0.

The variance of the prediction error, E[(yP − ŷP)²], is estimated by

σ̂²_p = σ̂² [xP (X'X)⁻¹ xP' + 1].

The prediction interval for yP under normally distributed disturbances is

[ŷP − σ̂p t_{n−k; 1−α/2},  ŷP + σ̂p t_{n−k; 1−α/2}],

where t_{n−k; 1−α/2} is the 1−α/2 quantile of a t distribution with n − k degrees of freedom.
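Extending the sketch from the example section, the R² and prediction formulas above can be checked numerically; the new point xP = (1, 12), i.e. 12 hours of preparation, and α = 0.05 are our own illustrative choices:

```python
import numpy as np
from scipy import stats

x = np.array([9.0, 15.0, 19.0, 10.0, 14.0, 5.0])
y = np.array([27.0, 34.0, 38.0, 15.0, 31.0, 14.0])
n, k = len(y), 2
X = np.column_stack([np.ones(n), x])

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
u_hat = y - X @ beta_hat
sigma2_hat = (u_hat @ u_hat) / (n - k)

# R^2 = 1 - RSS/TSS, with TSS = y'y - n*ybar^2.
rss = u_hat @ u_hat
tss = y @ y - n * y.mean() ** 2
r2 = 1 - rss / tss

# 95% prediction interval at a hypothetical new point: 12 hours of preparation.
xP = np.array([1.0, 12.0])                         # (1, x_P2)
y_pred = xP @ beta_hat                             # y_hat_P = x_P beta_hat
sigma2_p = sigma2_hat * (xP @ XtX_inv @ xP + 1)    # prediction-error variance
t_crit = stats.t.ppf(0.975, df=n - k)              # t_{n-k; 1-alpha/2}, alpha = 0.05
half = np.sqrt(sigma2_p) * t_crit
print(r2, y_pred, (y_pred - half, y_pred + half))
```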