A Crash Course in Statistics and Econometrics

The linear model
The linear regression model
Assume that X and Y are two random variables obtained from a population
characterized by a bivariate distribution. We are interested in the features of
this (unobserved) joint distribution.
However, studying multivariate distributions is nontrivial. Instead, we
consider a conditional analysis of the joint distribution of X and Y by
estimating 'conditional' population means of the form $E(Y|X_i) = \mu_{Y|X}$.
Consider estimating the unknown parameters of
$$Y_i = \beta_1 + \beta_2 X_i + U_i, \qquad i = 1,\dots,n,$$
characterizing the average linear association of the variables in the underlying bivariate distribution.
Given that the linear relationship only holds on average for joint random
variables X and Y , the model includes a random error component U .
We have n observations, $(X_i, Y_i)$, $i = 1,\dots,n$.
X is the independent variable or regressor.
Y is the dependent variable.
$\beta_1$ is the intercept.
$\beta_2$ is the slope.
$U_i$ is the regression error or disturbance.
The regression error consists of omitted factors. In general, these omitted factors are factors other than the variable X that influence Y. The regression error also includes errors in the measurement of Y.
General model with k regressors (including a constant term):
$$Y_i = \beta_1 + \beta_2 x_{i2} + \beta_3 x_{i3} + \dots + \beta_k x_{ik} + U_i, \qquad i = 1,\dots,n.$$
In terms of matrix algebra:
$$y = X\beta + u.$$
Notation: $y_{(n \times 1)}$; $X_{(n \times k)}$; $\beta_{(k \times 1)}$; $u_{(n \times 1)}$.
In detail:
$$\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix}
1 & x_{12} & x_{13} & \dots & x_{1k} \\
1 & x_{22} & x_{23} & \dots & x_{2k} \\
1 & x_{32} & x_{33} & \dots & x_{3k} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n2} & x_{n3} & \dots & x_{nk}
\end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \vdots \\ \beta_k \end{pmatrix}
+ \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{pmatrix}$$
Important: to include a constant term $\beta_1$ in the regression model, an n × 1 vector of ones has to enter the X matrix in addition to the other regressors.
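To make the notation concrete, here is a minimal numpy sketch of assembling such a design matrix; the data values anticipate the exam example later in this section, and the variable names are illustrative, not from the slides:

    import numpy as np

    # illustrative regressor values; any n-vector of observations would do
    x = np.array([9.0, 15.0, 19.0, 10.0, 14.0, 5.0])

    # design matrix: a first column of ones for the intercept beta_1,
    # followed by the regressor values
    X = np.column_stack([np.ones_like(x), x])
    print(X.shape)  # (6, 2): n = 6 observations, k = 2 parameters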
Model assumptions
$E(u|X) = 0$: The suggested linear model form is the right one to describe conditional averages for the underlying distribution. Moreover, the mean of $u_i$ is zero and we cannot learn from X how large the corresponding u will be.
→ This all implies that $\hat\beta_1$ is unbiased.
$(X_{2i},\dots,X_{ki},Y_i)$, $i = 1,\dots,n$, are i.i.d.
→ This is true if they are collected by simple random sampling.
→ It delivers the sampling distribution of $\hat\beta$.
Large outliers in X and/or Y are rare.
→ Technically, X and Y have finite fourth moments.
→ Important because outliers can result in meaningless values of $\hat\beta$.
Derivation of the OLS estimator
The minimization problem
The model error is $u = y - X\beta$.
The sum of the squared model errors is thus:
$$RSS(\beta) = \sum_{i=1}^{n} u_i^2 = u'u = (y - X\beta)'(y - X\beta)$$
Minimization problem: choose the k elements of the parameter vector β such that the residual sum of squares RSS(β) becomes as small as possible.
$$\begin{aligned}
RSS(\beta) &= u'u = (y - X\beta)'(y - X\beta) = (y' - \beta'X')(y - X\beta) \\
&= y'y - \beta'X'y - y'X\beta + \beta'X'X\beta \\
&= y'y - 2\beta'X'y + \beta'X'X\beta
\end{aligned}$$
$$\frac{\partial RSS(\beta)}{\partial \beta} = -2X'y + 2X'X\hat\beta \overset{!}{=} 0$$
$$\Leftrightarrow \quad X'X\hat\beta = X'y \qquad (k \text{ normal equations})$$
$$\Leftrightarrow \quad \hat\beta = (X'X)^{-1}X'y \qquad \text{(OLS estimator)}$$
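A minimal numerical sketch of the closed-form solution just derived (the function name is mine; solving the k normal equations with np.linalg.solve is the numerically safer equivalent of explicitly inverting X'X):

    import numpy as np

    def ols(X, y):
        # OLS estimate: solve the normal equations X'X b = X'y for b
        return np.linalg.solve(X.T @ X, X.T @ y)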
The OLS estimator is thus: $\hat\beta = (X'X)^{-1}X'y$.
Note:
For matrices $A_{(m,n)}$ and $B_{(n,z)}$ it holds that $(AB)' = B'A'$.
$y'X\beta$ and $\beta'X'y$ are scalar terms, therefore $y'X\beta = \beta'X'y$.
Differentiation rules for vectors and matrices: let $A_{(m,m)}$ be a square matrix and $z_{(m,1)}$ and $w_{(m,1)}$ vectors. Then:
$$\frac{\partial z'A}{\partial z} = A, \qquad \frac{\partial Az}{\partial z} = A', \qquad \frac{\partial w'z}{\partial z} = \frac{\partial z'w}{\partial z} = w,$$
$$\frac{\partial z'Az}{\partial z} = (A' + A)z, \quad \text{and if } A' = A:\ \frac{\partial z'Az}{\partial z} = 2Az.$$
The estimator's covariance matrix
An unbiased estimator for the variance of the error term is:
$$\hat\sigma^2 = \frac{\hat u'\hat u}{n-k}, \quad \text{where } \hat u = y - X\hat\beta.$$
We divide by n − k because k parameters are estimated to obtain $\hat u$.
The covariance matrix of the OLS estimator $\hat\beta_1,\dots,\hat\beta_k$:
$$E[(\hat\beta - \beta)(\hat\beta - \beta)'] = \Sigma_\beta = \begin{pmatrix}
\sigma^2_{\beta_1} & \sigma_{\beta_1\beta_2} & \dots & \sigma_{\beta_1\beta_k} \\
\sigma_{\beta_2\beta_1} & \sigma^2_{\beta_2} & \dots & \sigma_{\beta_2\beta_k} \\
\vdots & \vdots & & \vdots \\
\sigma_{\beta_k\beta_1} & \sigma_{\beta_k\beta_2} & \dots & \sigma^2_{\beta_k}
\end{pmatrix}.$$
This expression can be estimated by means of $\hat\Sigma_\beta = \hat\sigma^2 (X'X)^{-1}$. The corresponding estimators for the variances are located on the main diagonal of $\hat\Sigma_\beta$, i.e. $\hat\sigma^2_{\beta_i} = \hat\sigma^2 (X'X)^{-1}_{ii}$. In terms of standard deviations: $\hat\sigma_{\beta_i} = \sqrt{\hat\sigma^2 (X'X)^{-1}_{ii}}$.
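These formulas translate directly into code; a sketch under the same assumptions as the earlier snippets (function and variable names are mine):

    import numpy as np

    def ols_with_stderr(X, y):
        # return (beta_hat, sigma2_hat, standard errors) for the linear model
        n, k = X.shape
        beta = np.linalg.solve(X.T @ X, X.T @ y)
        resid = y - X @ beta                    # u_hat = y - X beta_hat
        sigma2 = resid @ resid / (n - k)        # unbiased error-variance estimate
        cov = sigma2 * np.linalg.inv(X.T @ X)   # estimated covariance matrix
        return beta, sigma2, np.sqrt(np.diag(cov))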
Example
Consider the individual preparation time ($x_i$) and the points ($Y_i$) achieved by 6 students in a written exam. The associated realizations are
(9 15 19 10 14 5) and (27 34 38 15 31 14),
respectively. Assume that the following model characterizes the population:
$$Y_i = \beta_1 + \beta_2 x_i + U_i, \qquad i = 1,\dots,n.$$
The empirical model:
$$\begin{pmatrix} y_1 \\ \vdots \\ y_6 \end{pmatrix}
= \begin{pmatrix} 1 & x_{12} \\ \vdots & \vdots \\ 1 & x_{62} \end{pmatrix}
\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}
+ \begin{pmatrix} u_1 \\ \vdots \\ u_6 \end{pmatrix}$$
We get the following sample moment matrices:
$$X'X = \begin{pmatrix} 6 & 72 \\ 72 & 988 \end{pmatrix}, \qquad X'y = \begin{pmatrix} 159 \\ 2129 \end{pmatrix}.$$
Hint: the inverse of a (2×2) matrix A can be obtained as follows:
$$A^{-1} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^{-1}
= \frac{1}{a_{11}a_{22} - a_{12}a_{21}} \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}.$$
The OLS estimate:
$$\hat\beta = (X'X)^{-1}X'y = \begin{pmatrix} 1.3280 & -0.0968 \\ -0.0968 & 0.0081 \end{pmatrix}
\begin{pmatrix} 159 \\ 2129 \end{pmatrix} = \begin{pmatrix} 5.113 \\ 1.782 \end{pmatrix}$$
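The worked example can be verified in a few lines of numpy; the printed values reproduce the sample moments and estimates above up to rounding:

    import numpy as np

    x = np.array([9.0, 15.0, 19.0, 10.0, 14.0, 5.0])   # preparation time in hours
    y = np.array([27.0, 34.0, 38.0, 15.0, 31.0, 14.0]) # exam points
    X = np.column_stack([np.ones_like(x), x])

    print(X.T @ X)                             # [[6, 72], [72, 988]]
    print(X.T @ y)                             # [159, 2129]
    print(np.linalg.solve(X.T @ X, X.T @ y))   # approx. [5.113, 1.782]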
Thus, the estimate is: $y_i = 5.11 + 1.78\,x_i + \hat u_i$.
[Figure: scatter plot of exam points (PUNKTE, y-axis, 10-40) against preparation hours (LERNSTUNDEN, x-axis, 4-20).]
Interpretation: an increase of 1 hour of preparation time increases the exam score by 1.78 points on average.
Note: $\beta_1, \beta_2$: model parameters; $U_i$: model errors; $\hat\beta_1, \hat\beta_2$: estimators; $\hat u_i$: residuals; 5.11, 1.78: estimates.
Properties of the OLS estimator
The OLS estimator is unbiased: $E(\hat\beta) = \beta$.
Since, in addition, the variances converge to zero as the sample size approaches infinity, the estimator $\hat\beta$ is consistent.
If the assumptions hold, the OLS estimator is efficient in the class of linear estimators. Put differently, it is BLUE (best linear unbiased estimator).
Hypothesis testing
Test for $\beta_i$
Assumption: $U_i \sim N(0,\sigma^2)$
Test statistic:
$$T_0 = \frac{\hat\beta_i - a}{\sqrt{\widehat{Var}(\hat\beta_i)}} = \frac{\hat\beta_i - a}{\sqrt{\hat\sigma^2 (X'X)^{-1}_{ii}}} \sim t_{n-k}$$
Hypothesis                                                            Critical region
$H_0: \beta_i = a$ ($H_0: \beta_i \geq a$) vs. $H_1: \beta_i < a$     $T_0 < -t_{1-\alpha;\,n-k}$
$H_0: \beta_i = a$ ($H_0: \beta_i \leq a$) vs. $H_1: \beta_i > a$     $T_0 > t_{1-\alpha;\,n-k}$
$H_0: \beta_i = a$ vs. $H_1: \beta_i \neq a$                          $|T_0| > t_{1-\alpha/2;\,n-k}$
If $u_i$ is not normally distributed, $T_0$ will be approximately normally distributed under $H_0$ as the sample size gets large.
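As a sketch, the two-sided test can be carried out with scipy's t-distribution for the critical value (function and argument names are mine; a = 0 gives the usual significance test of a single coefficient):

    from scipy import stats

    def t_test(beta_hat_i, se_i, a, n, k, alpha=0.05):
        # two-sided t-test of H0: beta_i = a; returns (T0, critical value, reject?)
        T0 = (beta_hat_i - a) / se_i
        t_crit = stats.t.ppf(1 - alpha / 2, df=n - k)
        return T0, t_crit, abs(T0) > t_crit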
Degree of explanation
Goal: describe the degree to which the linear model can explain the observed variation of Y.
The variation of Y can be decomposed as follows: TSS = ESS + RSS (total, explained, and residual sum of squares). The relative share of explained variation to total variation in Y is given as:
$$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS} = 1 - \frac{\hat u'\hat u}{y'y - n\bar y^2}$$
Hint: $\sum_{i=1}^{n}(y_i - \bar y)^2 = \sum_{i=1}^{n} y_i^2 - \sum_{i=1}^{n} \bar y^2 = y'y - n\bar y^2$.
Properties:
$0 \leq R^2 \leq 1$
$R^2 = 1 \Leftrightarrow \hat y_i = y_i$ for $i = 1,\dots,n$
$R^2 = 0 \Leftrightarrow \hat y_i = \bar y$ for $i = 1,\dots,n$
The closer $R^2$ is to 1, the more variation in Y is explained by the linear model.
For the bivariate regression case, $R^2$ corresponds to the square of the correlation coefficient between y and x.
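Continuing the exam example in code (x, y, X, and beta as in the earlier sketches), $R^2$ follows directly from the residuals:

    import numpy as np

    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta                      # residuals u_hat
    rss = resid @ resid                       # residual sum of squares
    tss = y @ y - len(y) * y.mean() ** 2      # y'y - n*ybar^2
    r2 = 1 - rss / tss
    print(r2, np.corrcoef(x, y)[0, 1] ** 2)   # identical in the bivariate case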
Prediction
Prediction for $y_P = f(x_P)$:
$$\hat y_P = x_P \hat\beta, \quad \text{where } x_P = (1\ x_{P2}\ x_{P3}\ \dots\ x_{Pk})$$
Prediction error: $\hat u_P = y_P - \hat y_P$
Variance of the prediction error ($E(\hat u_P) = 0$):
$$E[(y_P - \hat y_P)^2] = \hat\sigma^2\,[x_P (X'X)^{-1} x_P' + 1] = \hat\sigma_p^2$$
The prediction interval for $y_P$ and normally distributed disturbances is
$$[\hat y_P - \hat\sigma_p t_{n-k;1-\alpha/2},\ \hat y_P + \hat\sigma_p t_{n-k;1-\alpha/2}],$$
where $t_{n-k;1-\alpha/2}$ is the quantile of a t-distribution with n − k degrees of freedom.
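A closing sketch of the prediction interval in code (assumed names; scipy supplies the t-quantile):

    import numpy as np
    from scipy import stats

    def prediction_interval(X, y, x_new, alpha=0.05):
        # prediction interval for a new observation with regressor row x_new
        n, k = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ (X.T @ y)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - k)                         # sigma2_hat
        y_hat = x_new @ beta
        se_p = np.sqrt(sigma2 * (x_new @ XtX_inv @ x_new + 1))   # sigma_hat_p
        t_q = stats.t.ppf(1 - alpha / 2, df=n - k)
        return y_hat - t_q * se_p, y_hat + t_q * se_p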