REV 00
Chapter 2
SIMPLE LINEAR REGRESSION
QMT 3033 ECONOMETRIC
Specification of Simple Linear Regression
• A linear model that relates two variables, x and y:
$$y = \beta_0 + \beta_1 x + u$$
where:
y = dependent variable
x = independent variable
$\beta_0$ = intercept
$\beta_1$ = slope
u = error term
Aim of Regression
• To examine whether there exists a significant relationship between any of the x's and y.
• To analyze the effects of changing one or more of the x's on y.
• To forecast the value of y for a given set of x's.
Deterministic and Stochastic Relationship
• There are two types of relationships:
a) Deterministic or exact relationships.
b) Stochastic or statistical relationships, which do not give unique values of y for given values of x.
• Statistical relationships are specified in probabilistic terms.
• Regression uses statistical models.
Deterministic Relationships
• Take the following model:
$$Y = 2500 + 100x - x^2$$
• The value of Y can be determined exactly for any given value of x.
Stochastic Relationship
• Suppose the model is specified as:
$$Y = 2500 + 100x - x^2 + u$$
• Because the random error u makes Y a probabilistic quantity, Y cannot be determined exactly for a given value of x.
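A small simulation can make the contrast concrete. The sketch below evaluates the deterministic model and a stochastic version of it for the same x. The distribution of u (normal with standard deviation 50) and the random seed are assumptions chosen only for illustration; the text does not specify them.

```python
import random

def deterministic_y(x):
    """Deterministic model: Y = 2500 + 100x - x^2, fully determined by x."""
    return 2500 + 100 * x - x ** 2

def stochastic_y(x, rng, sigma=50.0):
    """Stochastic model: same systematic part plus a random error u.
    Assumption (illustration only): u ~ N(0, sigma^2) with sigma = 50."""
    u = rng.gauss(0.0, sigma)
    return deterministic_y(x) + u

rng = random.Random(42)
x = 10
# The deterministic model returns the same value every time for x = 10.
det_values = [deterministic_y(x) for _ in range(3)]
# The stochastic model returns a different value on each draw for the same x.
sto_values = [stochastic_y(x, rng) for _ in range(3)]
print(det_values)   # [3400, 3400, 3400]
print(sto_values)
```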
Why Add an Error Term?
• Unpredictable element of randomness in human behaviour.
• Effect of a large number of omitted variables.
• Measurement errors in y.
• Model misspecification.
• Functional misspecification.
• Aggregation biases.
Estimation with Ordinary Least Squares (OLS)
Estimation Techniques
1) Consider a simple regression model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n$$
where $\varepsilon_i \sim N(0, \sigma^2)$.
2) The purpose of estimation techniques is to obtain numerical values for the regression coefficients (βs).
3) Suppose the estimated model is:
$$\hat{Y}_i = \hat\beta_0 + \hat\beta_1 X_i$$
4) The error term ($\varepsilon_i$) is a theoretical concept that can never be observed. However, it may be estimated by the regression model's residual:
$$e_i = Y_i - \hat{Y}_i = Y_i - \hat\beta_0 - \hat\beta_1 X_i$$
where $Y_i$ and $\hat{Y}_i$ are the observed and estimated values of Y for observation i.
5) OLS estimation is the simplest and most commonly used regression technique for obtaining the coefficients of econometric models.
6) OLS chooses the regression coefficients (βs) that minimize the residual sum of squares (RSS):
$$RSS = e_1^2 + e_2^2 + \cdots + e_n^2 = \sum_{i=1}^{n} e_i^2$$
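To see what "minimize the RSS" means in practice, the sketch below computes the RSS for any candidate pair of coefficients and compares the OLS pair with nearby alternatives. It borrows the six observations from the OLS illustration later in this chapter and uses the closed-form OLS formulae given there; the perturbation size of 0.01 is an arbitrary choice.

```python
def rss(b0, b1, xs, ys):
    """Residual sum of squares: sum of e_i^2 with e_i = Y_i - b0 - b1*X_i."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]

# OLS coefficients from the closed-form formulae (slope first, then intercept).
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
b0_hat = y_bar - b1_hat * x_bar

best = rss(b0_hat, b1_hat, xs, ys)
# Every small move away from the OLS pair gives a larger RSS.
neighbours = [rss(b0_hat + d0, b1_hat + d1, xs, ys)
              for d0 in (-0.01, 0.0, 0.01)
              for d1 in (-0.01, 0.0, 0.01)
              if (d0, d1) != (0.0, 0.0)]
print(round(best, 4))   # 0.9013
```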
7) Method of Moments Estimation (MME) and Maximum Likelihood Estimation (MLE) are two other estimation techniques.
8) The MME technique chooses the regression coefficients (βs) that set the sample moments to zero, that is,
$$\frac{1}{n}\sum_{i=1}^{n} e_i = 0 \quad \text{and} \quad \frac{1}{n}\sum_{i=1}^{n} X_i e_i = 0$$
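Because two coefficients must be found, the sketch below imposes both sample moment conditions and solves the resulting pair of linear equations directly (here with Cramer's rule, a choice made only for brevity). The data are the six observations from the OLS illustration later in this chapter; the function name `mme_fit` is hypothetical.

```python
def mme_fit(xs, ys):
    """Method-of-moments sketch: choose (b0, b1) so that (1/n)*sum(e_i) = 0
    and (1/n)*sum(X_i*e_i) = 0, where e_i = Y_i - b0 - b1*X_i.
    These conditions are the two linear equations
        n*b0      + b1*sum(X)   = sum(Y)
        b0*sum(X) + b1*sum(X^2) = sum(X*Y)
    solved here with Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx          # nonzero as long as the X values vary
    b0 = (sy * sxx - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1

xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
b0, b1 = mme_fit(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
moment_1 = sum(residuals) / len(xs)                             # approximately 0
moment_2 = sum(x * e for x, e in zip(xs, residuals)) / len(xs)  # approximately 0
print(round(b0, 3), round(b1, 3))   # -1.069 0.795
```

The estimates coincide with the OLS results reported later, because these moment conditions are exactly the OLS first-order conditions.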
9) The MLE technique chooses the coefficients that maximize the following log-likelihood function:
$$\log L(\beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1 X_i)^2$$
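As a numerical check of the log-likelihood above, the sketch below evaluates it at the OLS coefficients (with the error variance set to RSS/n, the maximum likelihood choice for a given set of coefficients) and at a few perturbed parameter values. The data are the six observations from the OLS illustration later in this chapter; the perturbation sizes are arbitrary.

```python
import math

def log_likelihood(b0, b1, sigma2, xs, ys):
    """Normal log-likelihood of the simple regression model:
    -(n/2)*log(2*pi*sigma2) - (1/(2*sigma2)) * sum((Y_i - b0 - b1*X_i)^2)."""
    n = len(xs)
    ssr = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ssr / (2 * sigma2)

xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
# ML estimate of the error variance divides by n (not n - 2).
sigma2 = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / n

ll_at_ols = log_likelihood(b0, b1, sigma2, xs, ys)
# Moving any one parameter away from these values lowers the log-likelihood.
ll_perturbed = [
    log_likelihood(b0 + 0.05, b1, sigma2, xs, ys),
    log_likelihood(b0, b1 - 0.005, sigma2, xs, ys),
    log_likelihood(b0, b1, 1.5 * sigma2, xs, ys),
]
print(ll_at_ols, ll_perturbed)
```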
10) Statisticians have shown that the above three estimation techniques yield identical estimates of the βs if the model's errors are normally distributed with zero mean and constant variance ($\sigma^2$).
OLS Estimator
1) An estimator (or statistic) is a rule for computing a numerical estimate of a population parameter from sample data. The OLS estimator is the estimator produced by the OLS technique.
2) The OLS estimates of $\beta_0$ and $\beta_1$ are given by the formulae:
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2}$$
where $\hat\beta_0$ and $\hat\beta_1$ are the OLS estimates of $\beta_0$ and $\beta_1$ respectively, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the average value of the independent variable X, and $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is the average value of the dependent variable Y.
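The two formulae translate directly into code. The sketch below is a minimal implementation (the function name `ols_fit` is hypothetical); it computes the slope first and then the intercept, and it refuses data in which X does not vary, since the slope's denominator is then zero. It is checked on points that lie exactly on the line Y = 2 + 3X, for which OLS must recover the intercept and slope exactly.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates for simple regression:
    b1 = sum((X_i - X_bar)*(Y_i - Y_bar)) / sum((X_i - X_bar)^2)
    b0 = Y_bar - b1 * X_bar   (b1 must be computed first)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X has no variation, so the slope cannot be estimated")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Points lying exactly on Y = 2 + 3X are recovered exactly.
xs = [1, 2, 3, 4, 5]
ys = [2 + 3 * x for x in xs]
b0, b1 = ols_fit(xs, ys)
print(b0, b1)   # 2.0 3.0
```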
3) Note that in order to obtain $\hat\beta_0$, we must first calculate $\hat\beta_1$.
4) The OLS estimates have a number of useful characteristics:
• By rearranging $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$, we have
$$\bar{Y} = \hat\beta_0 + \hat\beta_1 \bar{X}$$
This implies that the estimated regression line passes through the point of means $(\bar{X}, \bar{Y})$.
• The sum of the residuals is exactly zero, that is
$$\sum_{i=1}^{n} e_i = 0$$
• The sum of the products of the residuals e and the explanatory variable X is exactly zero, that is
$$\sum_{i=1}^{n} e_i X_i = 0$$
Together with $\sum e_i = 0$, this implies that the residuals and the explanatory variable are uncorrelated in the sample.
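These characteristics can be verified numerically. The sketch below fits OLS to the six observations from the illustration later in this chapter and checks that the residuals sum to zero, that their products with X sum to zero, and that the fitted line evaluated at the mean of X equals the mean of Y (all up to floating-point rounding).

```python
xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# OLS estimates: slope first, then intercept.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sum_e = sum(residuals)                               # should be 0
sum_xe = sum(x * e for x, e in zip(xs, residuals))   # should be 0
line_at_x_bar = b0 + b1 * x_bar                      # should equal y_bar

print(sum_e, sum_xe, line_at_x_bar, y_bar)
```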
• The OLS estimator $\hat\beta_1$ is the best linear unbiased estimator (BLUE) under the Classical Linear Regression Model (CLRM) assumptions.
OLS Estimation
1) OLS estimation can be easily done using
econometric software packages. However, it
is important to understand how they perform
the OLS estimation.
2) For illustration, consider the following 6 observations of variables Y and X:

| X | Y |
|---|---|
| 6 | 4 |
| 10 | 7 |
| 17 | 12 |
| 21 | 15 |
| 26 | 20 |
| 30 | 23 |
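3) Carrying out the calculations,
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n}(X_i - \bar{X})^2} = \frac{338.000}{425.333} = 0.795$$
and, using the unrounded slope,
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X} = 13.500 - 0.7947(18.333) = -1.069$$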
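The same arithmetic can be reproduced step by step. The sketch below computes the intermediate quantities for the six observations in the table and then the two estimates. It also shows why the unrounded slope matters: plugging the rounded 0.795 into the intercept formula gives -1.075 rather than -1.069.

```python
xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)

x_bar = sum(xs) / n                                              # 18.333
y_bar = sum(ys) / n                                              # 13.500
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))    # 338.000
s_xx = sum((x - x_bar) ** 2 for x in xs)                         # 425.333

b1_hat = s_xy / s_xx                     # 0.795 to 3 decimal places
b0_hat = y_bar - b1_hat * x_bar          # -1.069, using the unrounded slope
b0_from_rounded = y_bar - 0.795 * x_bar  # -1.075, using the rounded slope

print(f"X_bar={x_bar:.3f}  Y_bar={y_bar:.3f}  Sxy={s_xy:.3f}  Sxx={s_xx:.3f}")
print(f"b1_hat={b1_hat:.3f}  b0_hat={b0_hat:.3f}")
```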
4) The resulting estimated model for the sample may be expressed as:
$$\hat{Y}_i = -1.069 + 0.795X_i$$
which is the mathematical representation of the fitted line in the scatter diagram of this sample.
5) In interpreting the estimated coefficients, econometricians are usually more interested in the slope coefficient ($\hat\beta_1$). In our case, the positive sign of $\hat\beta_1 = 0.795$ implies that Y is positively related to X. Moreover, the estimated value of 0.795 means that Y increases (decreases) by an average of 0.795 units when X increases (decreases) by 1 unit.
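The slope interpretation can be checked with the fitted equation itself. In the sketch below (the function name `predict` is hypothetical), raising X by one unit changes the fitted Y by the slope, 0.795, whatever the starting value of X.

```python
B0_HAT, B1_HAT = -1.069, 0.795   # rounded estimates reported above

def predict(x):
    """Fitted value: Y_hat = -1.069 + 0.795 * X."""
    return B0_HAT + B1_HAT * x

# A one-unit increase in X changes the fitted Y by the slope at any X.
changes = [predict(x + 1) - predict(x) for x in (6, 18, 30)]
print([round(c, 3) for c in changes])   # [0.795, 0.795, 0.795]
print(round(predict(10), 3))            # 6.881
```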
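6) The intercept is $\hat\beta_0 = -1.069$. Technically, it means that when X = 0, the predicted value of Y is -1.069. It might not have any economic meaning at all, particularly because X = 0 lies outside the range of the sample data (6 to 30).
7) Econometric software such as EViews produces printouts that include additional statistical information.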
Properties of Estimators
Algebraic Properties of OLS Statistics
• Property 1:
$$\sum_{i=1}^{n} \hat{u}_i = 0 \quad \Rightarrow \quad \bar{\hat{u}} = 0$$
Proof: Follows from the first-order conditions of the OLS objective function.
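• Property 2: The sample covariance between x and the OLS residuals is zero:
$$S_{x,\hat{u}} = 0$$
Proof:
$$S_{x,\hat{u}} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(\hat{u}_i - \bar{\hat{u}}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i \hat{u}_i - n\bar{x}\,\bar{\hat{u}}\right)$$
From the first-order conditions we know that
$$\bar{\hat{u}} = \frac{1}{n}\sum_{i=1}^{n} \hat{u}_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} x_i \hat{u}_i = 0 \quad \Rightarrow \quad S_{x,\hat{u}} = 0$$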
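• Property 3:
The point $(\bar{x}, \bar{y})$ is always on the OLS regression line.
Proof: From the first-order condition
$$-2\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$$
$$\Rightarrow \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat\beta_0 - \hat\beta_1 x_i\right) = 0$$
$$\Rightarrow \bar{y} = \hat\beta_0 + \hat\beta_1 \bar{x}$$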
Five Assumptions of the Simple Linear Regression Model
1) In the population, the dependent variable y is truly related to the independent variable x and the error term u as:
$$y = \beta_0 + \beta_1 x + u$$
2) $\{(x_i, y_i): i = 1, 2, \ldots, n\}$ is a random sample of size n from the population.
3) $E(u \mid x) = 0$ (zero conditional mean).
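4) There is variation in the sample of the independent variable, i.e. $x_1 = x_2 = \cdots = x_n$ is not true.
5) $\text{var}(u \mid x) = \sigma^2$ (homoskedasticity).
It follows from Assumptions 3 and 5 that $\text{var}(u) = \sigma^2$:
$$\text{var}(u \mid x) = E(u^2 \mid x) - [E(u \mid x)]^2 = E(u^2 \mid x) = \sigma^2$$
(a constant that does not depend on x). Therefore, by the law of iterated expectations, $E(u^2) = E[E(u^2 \mid x)] = \sigma^2$. Since $E(u) = 0$, $\text{var}(u) = E(u^2) - [E(u)]^2 = \sigma^2$.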
Confidence Interval (CI)
• A confidence interval is an interval estimate of a population parameter.
• It is used to indicate the reliability of an estimate: a narrow CI indicates a more reliable estimate than a wide CI.
• It gives an estimated range of values that is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
• Confidence intervals are more informative than the simple result of a hypothesis test (reject H0 or do not reject H0), since they provide a range of plausible values for the unknown parameter.
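As an illustration, the sketch below builds a 95% confidence interval for the slope of the six-observation example using the standard formula $\hat\beta_1 \pm t_{\alpha/2,\,n-2}\,\text{se}(\hat\beta_1)$, where $\text{se}(\hat\beta_1) = \sqrt{\hat\sigma^2 / \sum (X_i - \bar{X})^2}$ and $\hat\sigma^2 = RSS/(n-2)$. The critical value 2.776 ($t_{0.025}$ with 4 degrees of freedom) is taken from a standard t table; the 95% level is an assumption for the example.

```python
import math

xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar

# Error variance estimated with n - 2 degrees of freedom (two coefficients estimated).
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
sigma2_hat = rss / (n - 2)
se_b1 = math.sqrt(sigma2_hat / s_xx)

# Two-tailed 5% critical value of t with 4 degrees of freedom, from a t table.
T_CRIT = 2.776
lower, upper = b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1
print(f"se(b1) = {se_b1:.4f}, 95% CI for beta1: ({lower:.3f}, {upper:.3f})")
```

Since the interval lies entirely above zero, it also tells us that a slope of zero is not a plausible value at the 5% level.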
Hypothesis Testing
• One method of statistical inference.
• Involves comparing a conjecture we make about the population with the information contained in our data sample.
• In general, these conjectures are statements about the parameters of the economic model.
Process of Hypothesis Testing
a) Formulation of a null hypothesis, H0.
b) Formulation of an alternative hypothesis, H1.
c) Computation of the test statistic.
d) Definition of a rejection region.
Null Hypothesis, H0
• Specifies a value for the parameter in question.
• A typical null hypothesis states that an explanatory variable has no (null) effect on the dependent variable.
• In the context of the simple linear regression model $y = \beta_0 + \beta_1 x + u$, this null is $\beta_1 = 0$.
• The null hypothesis is maintained until there is sufficient sample evidence to refute it.
Alternative Hypothesis, H1
• The null is always paired with a logical alternative, H1.
• If we reject the null hypothesis, then we accept the alternative hypothesis.
• There are three possible alternatives to the null $\beta_1 = 0$:
a) $\beta_1 \neq 0$
b) $\beta_1 > 0$
c) $\beta_1 < 0$
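Test Statistic
• The test statistic is defined as:
$$\text{test statistic} = \frac{\text{sample estimate} - \text{hypothesized value}}{\text{standard error of sample estimate}}$$
• The value of the test statistic determines whether we reject or do not reject the null hypothesis.
• The test statistic is a random variable; when the null hypothesis is true it follows a known statistical distribution (here, the t distribution).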
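Applied to the six-observation example, the definition above gives the t statistic for H0: $\beta_1 = 0$ against H1: $\beta_1 \neq 0$. The sketch below computes it and compares it with the two-tailed 5% critical value for 4 degrees of freedom (2.776, taken from a standard t table; the 5% level is an assumption for the example). The helper name `t_statistic` is hypothetical.

```python
import math

xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
b0 = y_bar - b1 * x_bar
rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
se_b1 = math.sqrt(rss / (n - 2) / s_xx)   # standard error of the slope

def t_statistic(estimate, hypothesized, std_error):
    """(sample estimate - hypothesized value) / standard error of the estimate."""
    return (estimate - hypothesized) / std_error

t_stat = t_statistic(b1, 0.0, se_b1)   # H0: beta1 = 0 vs H1: beta1 != 0
T_CRIT = 2.776                         # two-tailed 5% critical value, 4 d.f.
reject_h0 = abs(t_stat) > T_CRIT
print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
```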
Rejection Region
• The range of values of the test statistic that leads us to reject the null hypothesis.
• The rejection region is chosen so that, if the null hypothesis is true, there is only a low probability that the test statistic falls inside it.
• This probability is the level of significance ($\alpha$).
[Figure: two-tailed test on the t distribution. H0 is not rejected when the test statistic lies between the critical values $-t_{\alpha/2}$ and $+t_{\alpha/2}$; it is rejected in either tail.]
Terminology
• If the test statistic falls in the rejection region, then we say:
"… the data are inconsistent with our null hypothesis."
"… we reject the null hypothesis."
• If the test statistic falls in the acceptance (non-rejection) region, then we say:
"… the data are not inconsistent with our null hypothesis."
"… we do not reject the null hypothesis."
Example
• Suppose that, for a consumption model, the estimated mpc is 0.5091.
• Suppose we design the following test:
H0: β2 = 0.3
H1: β2 ≠ 0.3
• That is, under the null the mpc equals 0.3, but under the alternative it is either less than or greater than 0.3.
One-sided or One-tail Test
[Figure: t distribution centred at 0 with a single rejection region in one tail, beyond the critical value.]
Two-sided or Two-tail Test
[Figure: t distribution centred at 0 with a rejection region in each tail, below the lower critical value -t and above the upper critical value +t.]
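R² and Functional Form
R-Square (R²)
R-square, often called the coefficient of determination, is defined as the ratio of the sum of squares explained by a regression model to the "total" sum of squares around the mean:
$$R^2 = 1 - \frac{SSE}{SST}$$
where SSE is the sum of squared residuals and $SST = \sum_{i=1}^{n}(Y_i - \bar{Y})^2$ is the total sum of squares.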
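For the six-observation example, the sketch below computes R² as 1 - SSE/SST and confirms that, for a model with an intercept, it equals the ratio of the explained sum of squares to the total sum of squares.

```python
xs = [6, 10, 17, 21, 26, 30]
ys = [4, 7, 12, 15, 20, 23]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # sum of squared residuals
sst = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares around the mean
ess = sum((f - y_bar) ** 2 for f in fitted)           # explained sum of squares

r2 = 1 - sse / sst
print(round(r2, 4))   # 0.9967
```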
Functional Form
• Refers to the algebraic form of the relationship between a dependent variable and the explanatory variables.
• The simplest functional form is the linear functional form, where the relationship between the dependent and independent variables is represented graphically by a straight line.
Functional Forms for Econometric Analysis
a) Linear Functional Form
$$Y = \beta_0 + \beta_1 X + u$$
[Figure: two panels plotting Y against X, a straight line sloping upward for β1 > 0 and downward for β1 < 0.]
• Each time X goes up by 1 unit, Y goes up by $\beta_1$ units, and this is true no matter what the values of X and Y are.
• For example, if you have a cost curve of the form $C = \beta_0 + \beta_1 Q + u$, then the linear functional form implies that each time quantity Q goes up by one unit, costs C go up by $\beta_1$ dollars.
b) Quadratic Functional Form:
$$Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u$$
[Figure: two panels plotting Y against X, a curve that is concave for β2 < 0 and convex for β2 > 0.]
• When X goes up by one unit, Y goes up by approximately $\beta_1 + 2\beta_2 X$ units. (You can derive this yourself by calculating dY/dX from the equation if you know some calculus.)
• If $\beta_2 > 0$, then as X gets bigger, the additional effect of X on Y also gets bigger; if $\beta_2 < 0$, then as X gets bigger, the additional effect of X on Y gets smaller.
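The marginal-effect formula can be checked against a numerical derivative. The coefficients in the sketch below are hypothetical (with $\beta_2 < 0$), chosen only to illustrate the shrinking effect. It also computes the exact change for a full one-unit step, $\beta_1 + \beta_2(2X + 1)$, which is why the text's $\beta_1 + 2\beta_2 X$ is an approximation.

```python
# Hypothetical coefficients (not from the text), with beta2 < 0.
B0, B1, B2 = 1.0, 4.0, -0.5

def y_quadratic(x):
    """Y = beta0 + beta1*X + beta2*X^2 (error term omitted)."""
    return B0 + B1 * x + B2 * x ** 2

def marginal_effect(x):
    """dY/dX = beta1 + 2*beta2*X."""
    return B1 + 2 * B2 * x

h = 1e-6
points = [0.0, 2.0, 4.0]
formula = [marginal_effect(x) for x in points]
# Central-difference derivative as an independent check of the formula.
numeric = [(y_quadratic(x + h) - y_quadratic(x - h)) / (2 * h) for x in points]
print(formula)   # [4.0, 2.0, 0.0]

# Exact change for a full one-unit step from X = 2: beta1 + beta2*(2*2 + 1) = 1.5,
# compared with 2.0 from the derivative at X = 2.
step_change = y_quadratic(3.0) - y_quadratic(2.0)
```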
c) Logarithmic Functional Form:
$$\log Y = \beta_0 + \beta_1 \log X + u$$
[Figure: two panels plotting Y against X, for β1 > 0 and β1 < 0.]
• There are two ways to think about this functional form:
i. If X rises by 1%, then Y will rise by approximately $\beta_1$%; this is a special property of the logarithmic relationship.
ii. $\beta_1$ is the elasticity of Y with respect to X; this follows from the definition of elasticity.
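The constant-elasticity property can be seen numerically. With hypothetical coefficients $\beta_0 = 0.5$ and $\beta_1 = 0.6$ (chosen only for illustration), the sketch below raises X by 1% at several starting values and finds that Y rises by about 0.6% every time.

```python
import math

# Hypothetical coefficients for log Y = beta0 + beta1*log X (elasticity 0.6).
B0, B1 = 0.5, 0.6

def y_loglog(x):
    """Y implied by log Y = beta0 + beta1*log X (error term omitted)."""
    return math.exp(B0 + B1 * math.log(x))

# Percentage change in Y when X rises by 1%, at three different starting points.
pct_changes = [100 * (y_loglog(1.01 * x) / y_loglog(x) - 1) for x in (5.0, 50.0, 500.0)]
print([round(p, 3) for p in pct_changes])   # [0.599, 0.599, 0.599]
```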
• This functional form is commonly used to estimate an elasticity of some kind.
• It is also widely used for cost functions, production functions, utility functions, etc.
d) Translog Functional Form:
$$\log Y = \beta_0 + \beta_1 \log X + \beta_2 (\log X)^2 + u$$
• This functional form has the same relationship to the logarithmic functional form that the quadratic has to the linear: it adds a squared term to the equation.
• The elasticity of Y with respect to X is $\beta_1 + 2\beta_2 \log X$, which means that the elasticity can change as X gets bigger or smaller. This is useful if you think the elasticity might not be constant with respect to X.
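The sketch below checks the translog elasticity formula against a numerical derivative of log Y with respect to log X. The coefficients are hypothetical (chosen only for illustration); with $\beta_2 > 0$ the elasticity rises as X grows (0.5 at X = 1, 0.7 at X = e, 0.9 at X = e²).

```python
import math

# Hypothetical coefficients for log Y = b0 + b1*log X + b2*(log X)^2.
B0, B1, B2 = 0.2, 0.5, 0.1

def log_y(x):
    """log Y under the translog form (error term omitted)."""
    lx = math.log(x)
    return B0 + B1 * lx + B2 * lx ** 2

def elasticity_formula(x):
    """Elasticity d(log Y)/d(log X) = beta1 + 2*beta2*log X."""
    return B1 + 2 * B2 * math.log(x)

def elasticity_numeric(x, h=1e-6):
    """Central-difference derivative of log Y with respect to log X."""
    lx = math.log(x)
    return (log_y(math.exp(lx + h)) - log_y(math.exp(lx - h))) / (2 * h)

for x in (1.0, math.e, math.e ** 2):
    print(round(x, 3), round(elasticity_formula(x), 4), round(elasticity_numeric(x), 4))
```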
e) Semi-log Functional Form:
$$\log Y = \beta_0 + \beta_1 X + u$$
[Figure: two panels plotting Y against X, for β1 > 0 and β1 < 0.]
• The semi-log functional form has the property that if X rises by 1 unit, Y rises by approximately $[\beta_1 \times 100]$ percent. This is not a commonly desired property, but there are some applications where it is useful.
E.g.
The relationship between salaries and education is usually expressed in this functional form as $\log Sal = \beta_0 + \beta_1 Ed + u$, meaning that if a person's education goes up by 1 year, that person's salary rises by approximately $[\beta_1 \times 100]$ percent.
• If $\beta_1 = 0.08$, one extra year of education increases salary by approximately 8%.
• For $\beta_1 > 0$, as X gets large, the slope of the curve relating Y to X gets quite large, because $dY/dX = \beta_1 Y$: the same percentage increase in Y is a larger absolute increase when Y is larger.
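The "$\beta_1 \times 100$ percent" reading is an approximation; the exact percentage change for a one-unit rise in X is $100(e^{\beta_1} - 1)$. The sketch below compares the two for the text's $\beta_1 = 0.08$, then uses a hypothetical intercept ($\beta_0 = 2$, not from the text) to show that the slope $dY/dX = \beta_1 Y$ grows as education rises.

```python
import math

B1 = 0.08   # coefficient on years of education, from the example in the text

approx_pct = 100 * B1                  # rule of thumb: about 8%
exact_pct = 100 * (math.exp(B1) - 1)   # exact % change in salary for one more year
print(round(approx_pct, 2), round(exact_pct, 2))   # 8.0 8.33

# Hypothetical intercept, used only to show that dY/dX = b1*Y rises with Y.
B0 = 2.0

def salary(ed):
    """Salary implied by log Sal = b0 + b1*Ed (error term omitted)."""
    return math.exp(B0 + B1 * ed)

slopes = [B1 * salary(ed) for ed in (8, 12, 16)]
```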
f) Reciprocal Functional Form:
$$Y = \beta_0 + \beta_1 \frac{1}{X} + u$$
[Figure: two panels plotting Y against X, for β1 < 0 and β1 > 0.]
• The reciprocal functional form is usually used when Y and X both have to be positive and when the relationship between them probably slopes down (that is, $\beta_0 > 0$ and $\beta_1 > 0$).
• It is usually used for curves such as demand curves, which need to have this property.
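The sketch below illustrates why $\beta_0 > 0$ and $\beta_1 > 0$ give a downward-sloping curve that stays positive: as X rises, Y falls toward $\beta_0$ but never below it. The coefficient values are hypothetical, chosen only for illustration.

```python
# Hypothetical coefficients with beta0 > 0 and beta1 > 0.
B0, B1 = 2.0, 10.0

def y_reciprocal(x):
    """Y = beta0 + beta1*(1/X), defined for X > 0 (error term omitted)."""
    return B0 + B1 / x

xs = [1, 2, 5, 10, 100, 1000]
ys = [y_reciprocal(x) for x in xs]
print([round(y, 3) for y in ys])   # [12.0, 7.0, 4.0, 3.0, 2.1, 2.01]
```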