Regression model with stochastic regressor (RM2)
The multiple regression model (I)
The regression model with one stochastic regressor (summary) and multiple regression (start)
Ragnar Nymoen
University of Oslo
12 February 2013
Summary of Lecture 7 and 8

- Assumptions: The n pairs of random variables {Yi, Xi}, i = 1, 2, …, n, are IID and representative of the population distribution function fXY(X, Y).
- Argument: Using the factorization of the n identical joint densities fXY, we can establish the conditional expectation function, aka the regression function:

    E(Yi | Xi = xi) = β0 + β1 xi, ∀ i    (1)

  with homoskedasticity:

    Var(Yi | Xi = xi) = σ², ∀ i    (2)
- Remember that the linearity of E(Yi | Xi = xi) is, in most practical situations, a result of modelling choices: variable transformations and the choice of functional form (HGL Ch 4; BN Ch 2; Lectures 2 and 7).
- When there is no danger of misunderstanding, we use the more compact notation

    E(Yi | Xi) = β0 + β1 Xi, ∀ i
    Var(Yi | Xi) = σ², ∀ i

  where it is understood that, in a given sample of observable random variables, "| Xi" operates on Xi and turns it into a parameter xi.
- The OLS estimators β̂0 and β̂1 are unbiased. The proof uses the theorem of iterated expectations (Lectures 6 and 8), for example

    E[E(β̂1 | X)] = E[β1] = β1

  where we first find the expectation of the function in which X is a parameter, namely E(β̂1 | X). This is the same operation as in RM1, giving E(β̂1 | X) = β1.
- Inference: Since the distributions of the t-statistics that we use for hypothesis testing and confidence intervals are independent of X, the inference procedures for RM1 remain valid in the case with a stochastic regressor Xi.
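As a numerical illustration (not part of the lecture), a small simulation can exhibit the unbiasedness property: redraw both the stochastic regressor and the disturbance in every replication and average the OLS slope estimates. All parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.8      # true coefficients (illustrative values)
n, reps = 50, 5000           # sample size and number of replications

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 2.0, n)                      # stochastic regressor, redrawn each time
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)  # IID sample from the model
    xd = x - x.mean()
    estimates[r] = (xd @ y) / (xd @ xd)              # OLS slope estimate

print(estimates.mean())      # averages out close to the true beta1 = 0.8
```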
Model specification with disturbance term
Bringing back the disturbance term

- The disturbance term was a central concept in RM1.
- We can align the two models with respect to the disturbance term and give the model a specification similar to the one we had for RM1. This will be closer to the typical textbook specification, as in Ch 10.1.1 in HGL.
The disturbances and their properties

Definition
We have random variables {Xi, Yi}, i = 1, 2, …, n, and the conditional expectation E(Yi | Xi). Define n new random variables εi by

    εi := Yi − E(Yi | Xi), ∀ i    (3)

which are the disturbances.

Expectation of εi:

    E(εi | Xi) = E(Yi | Xi) − E[E(Yi | Xi) | Xi] = E(Yi | Xi) − E(Yi | Xi) = 0    (4)
Consequence: exogeneity of Xi. Since the conditional expectation of εi given Xi is constant, we have (Lecture 6):

    Cov(εi, Xi) = 0    (5)

In econometric terminology this is called (strict) exogeneity of Xi with respect to εi. This type of exogeneity is generic in our regression model.
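Equation (5) can be checked numerically: in a simulation the conditional expectation is known, so εi can be constructed exactly as the deviation of Yi from E(Yi | Xi), and its sample covariance with Xi computed. The data-generating values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.8 * x + rng.normal(0.0, 1.0, n)  # illustrative DGP with E(Y|X) = 1 + 0.8 X

eps = y - (1.0 + 0.8 * x)    # disturbance: deviation from the true E(Y|X)
print(np.cov(eps, x)[0, 1])  # sample covariance is close to 0, as (5) states
```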
Variance of εi:

    Var(εi | Xi) = Var{Yi − E(Yi | Xi) | Xi}

The second term, E(Yi | Xi), is a parameter for fixed Xi (= xi), hence

    Var(εi | Xi) = Var(Yi | Xi) = σ²    (6)
which is the conventional way of stating the homoskedasticity property.

Covariance: also from IID:

    Cov(εi, εj | Xi) = E(εi εj | Xi) = Cov(Yi, Yj | Xi) = 0, ∀ i ≠ j    (7)
“Classical” model specification and properties

The model can be specified by the linear relationship

    Yi = β0 + β1 Xi + εi, i = 1, 2, …, n    (8)

and the set of assumptions:

a. Xi (i = 1, 2, …, n) are IID stochastic variables with Var(Xi) = σX² > 0, ∀ i
b. E(εi | Xh) = 0, ∀ i and h
c. Var(εi | Xh) = σ², ∀ i and h
d. Cov(εi, εj | Xh) = 0, ∀ i ≠ j, and for all h
e. β0, β1 and σ² are constant parameters

For the purpose of statistical inference we assume normally distributed disturbances:

f. εi | Xh ∼ N(0, σ²)

The OLS estimators β̂1 and β̂0 are BLUE.
Asymptotic analysis for RM2
Consistency of estimators

- In Lecture 6 we showed that plim(α̂) = α in RM1.
- Exactly the same argument can be used for α̂ (= β̂0 + β̂1 X̄) in RM2.
- What about β̂1 and β̂0?
- For β̂1, we start with the familiar decomposition of β̂1

    β̂1 = β1 + [∑i (Xi − X̄)εi] / [∑i (Xi − X̄)²]

  (sums running over i = 1, …, n) and use the rules for probability limits:

    plim β̂1 = plim{β1 + [∑i (Xi − X̄)εi] / [∑i (Xi − X̄)²]}
            = β1 + plim[(1/n) ∑i (Xi − X̄)εi] / plim[(1/n) ∑i (Xi − X̄)²]
Given the model specification (and a weak extra assumption about finite 4th-order moments of Yi and Xi), the two "plims" converge to their theoretical counterparts:

    plim β̂1 = β1 + Cov(ε, X) / Var(X)

(The proof is by the Law of Large Numbers and the Central Limit Theorem (Lecture 6).) From the model specification (assumptions):

    plim β̂1 = β1 + 0 / Var(X) = β1

- What about plim β̂0?
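The convergence plim β̂1 = β1 can be seen numerically by computing the OLS slope on IID samples of increasing size; the DGP parameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1 = 1.0, 0.8   # true values (illustrative)

def ols_slope(n):
    """Draw one IID sample of size n and return the OLS slope estimate."""
    x = rng.normal(0.0, 2.0, n)
    y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)
    xd = x - x.mean()
    return (xd @ y) / (xd @ xd)

for n in (50, 500, 50_000):
    print(n, ols_slope(n))   # estimates concentrate around 0.8 as n grows
```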
Finite sample properties of the regression model

- Monte Carlo analysis of the IID model
- Moderate sample sizes; varying disturbance and X variation ("noise" and "signal")

                 MC-1       MC-2       MC-3       MC-4
    n            47         28         47         47
    σ²           3          3          1.5        3
    σX²          "high"     "high"     "high"     "low"
    ÊMC(β̂1)     0.80192    0.80601    0.80136    0.79731
    ŝeMC(β̂1)    0.066575   0.072909   0.047076   0.16021

ÊMC(β̂1) is the Monte Carlo estimate of E(β̂1); ŝeMC(β̂1) is the MC estimate of se(β̂1).
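A sketch of how such a Monte Carlo table can be produced. The slide does not give the exact data-generating process; the normal design, the zero intercept, and the numeric "high"/"low" values of σX² below are assumptions, so the output only matches the table qualitatively: the spread of β̂1 falls with larger n, smaller σ², and more X variation.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1 = 0.0, 0.8   # slope matches the table's target of roughly 0.8

def mc(n, sigma2, sigma_x2, reps=10_000):
    """Monte Carlo mean and spread of the OLS slope for one design."""
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(0.0, np.sqrt(sigma_x2), n)
        y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sigma2), n)
        xd = x - x.mean()
        est[r] = (xd @ y) / (xd @ xd)
    return est.mean(), est.std()

# MC-1..MC-4 designs; sigma_x2 = 9.0 ("high") and 1.0 ("low") are assumed values
for n, s2, sx2 in [(47, 3.0, 9.0), (28, 3.0, 9.0), (47, 1.5, 9.0), (47, 3.0, 1.0)]:
    m, s = mc(n, s2, sx2)
    print(f"n={n:2d} sigma2={s2:.1f} sigmaX2={sx2:.1f}: mean={m:.4f} se={s:.4f}")
```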
Asymptotic distributions and tests
- With the use of the Central Limit Theorem (Lecture 6) it is also possible to show convergence in distribution:

    √n (β̂1 − β1) →d N(0, σ²/σX²)

  in the case where the data are IID, but without being normally distributed.
- As a consequence, the t-statistics used for hypothesis tests and confidence intervals are also N(0, 1) asymptotically.
- This is important for doing approximately correct inference when exact normality is untenable (heteroskedasticity, for example).
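A simulation sketch of this result: generate deliberately non-normal (centred exponential) disturbances and check that the t-statistic for β1 still behaves like N(0, 1) in a moderate sample. All design values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
beta1, n, reps = 0.8, 200, 10_000

t = np.empty(reps)
for r in range(reps):
    x = rng.normal(0.0, 1.0, n)
    eps = rng.exponential(1.0, n) - 1.0      # skewed, mean-zero, clearly non-normal
    y = beta1 * x + eps
    xd = x - x.mean()
    b1 = (xd @ y) / (xd @ xd)
    resid = y - y.mean() - b1 * xd           # residuals from the fitted line
    s2 = (resid @ resid) / (n - 2)           # usual unbiased variance estimator
    t[r] = (b1 - beta1) / np.sqrt(s2 / (xd @ xd))

print(t.mean(), t.std())   # close to 0 and 1, as the N(0, 1) limit predicts
```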
Summary of summary

- All the properties of the OLS estimators, and the inference theory, hold in the regression model with a random regressor.
- The gap between RM1 and RM2 has now been bridged, and we do not need the distinction any longer.
- In applied modelling we are free to use deterministic and random explanatory variables, and to combine them in multiple regression models.
- The main bridge principles are conditional expectation and IID sampling. Later in the course we will return to both and ask new questions. For example: "What remains of all the nice OLS properties if the variables are not independent?"
- But first we will develop the multivariate regression model under the IID assumption.
References

- HGL Chapters 5 and 6
- BN Ch 7
Motivation for multivariate regression

- Economic variables are influenced by several factors, rather than by one single "primary causal factor".
- Economic theory often implies multivariate models.
- In order to test competing theories with the aid of regression, we must allow for at least two explanatory variables (bivariate).
- Even if there is only one explanatory variable, the use of a polynomial to model a non-linear relationship, e.g.,

    E(Y | Xi) = β0 + β1 Xi + β2 Xi²

  leads to regression models with two regressors.
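The quadratic case can be handled by ordinary OLS because the model is linear in the parameters: Xi and Xi² simply act as two regressors. A minimal sketch, with illustrative coefficient values:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(-2.0, 2.0, n)
# quadratic conditional expectation with illustrative coefficients
y = 1.0 + 0.5 * x - 0.3 * x**2 + rng.normal(0.0, 0.2, n)

# linear in the parameters: x and x**2 enter as two separate regressors
X = np.column_stack([np.ones(n), x, x**2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # close to the true (1.0, 0.5, -0.3)
```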
- Although "multiple regression" indicates that we will want to use models where the number of regressors (k) is large, all the new theoretical points can be understood with the use of the bivariate model (k = 2).
Model specification
The model can be specified by the linear relationship

    Yi = β0 + β1 X1i + β2 X2i + εi, i = 1, 2, …, n    (9)

and the set of assumptions (compare HGL p 173):

a. Xji (j = 1, 2; i = 1, 2, …, n) can be deterministic or stochastic. For a deterministic variable we assume that at least two values of the variable are distinct. For random Xs, we assume Var(Xji) = σXj² > 0 (j = 1, 2) and ρ²X1X2 < 1.
b. E(εi) = 0, ∀ i
c. Var(εi) = σ², ∀ i
d. Cov(εi, εj) = 0, ∀ i ≠ j
e. β0, β1, β2 and σ² are constant parameters

For the purpose of statistical inference we will assume normally distributed disturbances:

f. εi ∼ N(0, σ²)
Comments to specification

- a. is formulated to accommodate both data types.
- ρ²X1X2 < 1 is a way of saying that the two random variables are truly separate variables.
- In many presentations, incl. HGL p 173, you will find an assumption about absence of exact linear relationships between the variables, often called absence of exact collinearity. But this can only occur in the case of deterministic variables, and would be an example of "bad model specification", e.g., specifying X2i as a variable with the number 100 as its value for all i (an example of the "dummy-variable fallacy/pitfall").
- For random variables, we can of course be unlucky and draw a sample where r²X1X2 is very high. But this "near exact collinearity" is a property of the sample, not of the regression model.
- b.-d. and f. are the same as in the case with one variable. Since we want a model formulation that allows random explanatory variables, they should be interpreted as conditional on X1i = x1i and X2i = x2i. With reference to such a remark, it is OK to drop the explicit conditioning notation when you specify the multivariate regression model. It shows that you are aware of and precise about the interpretation of the assumptions, and it saves notation.
OLS estimation
Nothing new here. Choose the estimates that minimize

    S(β0, β1, β2) = ∑i (Yi − β0 − β1 X1i − β2 X2i)²    (10)

(sum over i = 1, …, n) or, equivalently,

    S(α, β1, β2) = ∑i (Yi − α − β1(X1i − X̄1) − β2(X2i − X̄2))²

where

    α := β0 + β1 X̄1 + β2 X̄2

- Rest of derivation and examples in class.
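Minimizing (10) leads to the normal equations X'X b = X'y. A sketch of solving them directly for the bivariate model; all numeric values, including the correlation between the regressors, are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
b_true = np.array([1.0, 0.8, -0.5])      # beta0, beta1, beta2 (illustrative)

x1 = rng.normal(0.0, 1.0, n)
x2 = 0.4 * x1 + rng.normal(0.0, 1.0, n)  # correlated, but rho^2 < 1 as assumption a. requires
y = b_true[0] + b_true[1] * x1 + b_true[2] * x2 + rng.normal(0.0, 1.0, n)

# minimizing S(beta0, beta1, beta2) gives the normal equations X'X b = X'y
X = np.column_stack([np.ones(n), x1, x2])
b_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(b_hat)   # close to the true (1.0, 0.8, -0.5)
```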