Multiple Regression Analysis

y = β0 + β1x1 + β2x2 + … + βkxk + u
2. Inference
Assumptions of the Classical
Linear Model (CLM)
! So far, we know:
1. The mean and variance of the OLS estimators
2. That OLS is BLUE, given the Gauss-Markov
assumptions
! This still leaves open the shape of the distribution
of the OLS estimators
! To do classical hypothesis testing, we need to
make an assumption about this distribution
Assumption 6 (Normality): u is independent of x1,
x2,…, xk and u is normally distributed with zero
mean and variance σ2: u ~ Normal(0, σ2)
CLM Assumptions (cont)
! This is a very strong assumption; it implies:
E(u|x)=E(u)=0, and
Var(u|x)=Var(u)=σ2
! Assumptions 1-6 are called the Classical Linear
Model Assumptions, “CLM Assumptions”
! Under CLM, OLS is not only BLUE, but is the
minimum variance unbiased estimator
- no longer limited to the class of linear estimators
CLM Assumptions (cont)
! We can summarize the population assumptions of CLM as
follows
! y|x ~ Normal(β0 + β1x1 +…+ βkxk, σ2)
! While for now we just assume normality, it is clear that
this is sometimes not the case
! Example: Suppose we’re interested in modeling house
prices (our y variable) as a function of x variables:
- House prices are never below zero (a normally distributed variable
takes on negative values); same problem with wages
- Could take the log of house prices to convert it to something close to
normal; analysis of wages as the y variable often involves taking logs
- Or, if our sample is large enough, non-normality ceases to be a
problem, thanks to the Central Limit Theorem.
The homoskedastic normal distribution with
a single explanatory variable
[Figure: the conditional density f(y|x) is a normal distribution centered at E(y|x) = β0 + β1x, drawn at two values x1 and x2.]
Normal Sampling Distributions
! Under the CLM assumptions, conditional on the
sample values of the independent variables, (x),
β̂j ~ Normal[βj, Var(β̂j)]
! Why? Because each estimated β is a linear
combination of the errors, u, in the sample. Recall:
β̂j = βj + [ Σ(i=1..n) (xij − x̄j)·ui ] / [ Σ(i=1..n) (xij − x̄j)² ],  and ui ~ Normal(0, σ²)
! Any linear combination of iid normal random
variables is normal (recall Appendix C)
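To see this result concretely, here is a minimal Python simulation sketch (not part of the original slides; all population values are hypothetical) showing that the OLS slope estimate is centered at the true β1 and is approximately normally distributed across repeated samples:
```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma, n = 1.0, 0.5, 2.0, 200    # hypothetical population values

slopes = []
for _ in range(5000):                          # draw many random samples
    x = rng.normal(size=n)
    u = rng.normal(0, sigma, size=n)           # CLM: u ~ Normal(0, sigma^2), independent of x
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    slopes.append((xd @ y) / (xd @ xd))        # OLS slope = sum((x - xbar)*y) / sum((x - xbar)^2)

slopes = np.array(slopes)
print(slopes.mean(), slopes.std())             # mean close to beta1 = 0.5; histogram would be bell-shaped
```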
Normal Sampling Distributions
! Thus,
(β̂j − βj) / sd(β̂j) ~ Normal(0, 1)
! To standardize (as before), we just subtract the
mean and divide by the standard deviation
! The fact that the estimated β’s are distributed
normally has further implications we will use:
1) Any linear combination of the estimated β’s will be
normally distributed.
2) Any subset of the estimated β’s will be jointly
normally distributed.
The t Test
! Testing hypotheses about a single population
parameter
! Remember, the βj are unknown
! However, we can “hypothesize” about the value of
βj and use statistical inference to test our
hypothesis
! To perform a hypothesis test we first note that
under CLM:
(β̂j − βj) / se(β̂j) ~ t(n−k−1)
The t Test (cont)
! Knowing the sampling distribution for the
standardized estimator allows us to carry out
hypothesis tests
! Start with a null hypothesis
! For example, H0: βj = 0
! The null here is making the claim that variable xj
has no effect on y.
The t test (cont)
! To perform our test (given this particular statement
of the null hypothesis) we first form the t statistic
for our estimate of βj:
t(β̂j) ≡ (β̂j − 0) / se(β̂j) = β̂j / se(β̂j)
! Notice that we subtract the value of βj under the
null (in this case, 0) from our estimate of βj.
- Intuitively, if our estimate is close to the null, our
t-statistic will be close to zero, and we will be unable to
reject the null that βj = 0.
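For instance, a minimal sketch of the computation with hypothetical numbers (not taken from the slides):
```python
# Hypothetical estimate of beta_j and its standard error
beta_hat = 0.093
se_beta = 0.031

t_stat = (beta_hat - 0) / se_beta   # t statistic for H0: beta_j = 0
print(t_stat)                       # about 3: the estimate lies roughly 3 standard errors above 0
```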
Things to notice about t-stat
! The denominator is always positive. So for a zero
null, the t-statistic will always take on the sign of
your estimate of βj. For a positive estimated βj it
will be positive; for a negative estimated βj it will be
negative.
! We can interpret the t-statistic as giving us an
estimate of how many standard deviations from
the null (0, in this case) our estimate of βj lies.
Say it equals 3. This suggests that it lies three
standard deviations to the right of 0. If it equals
-3, this suggests it lies 3 sd’s to the left of 0.
Intuition for t Test
! Because it is a continuous random variable, the
estimated βj will never precisely equal 0 even if
the underlying population parameter βj equals 0
! We want to know how far estimated βj is from 0
- Essentially, if our estimated βj lies near the null, we
think there’s a good chance that estimate would be
generated if the null is true (so, we will fail to reject the
null); if our estimated βj lies far away from the null, we
think it’s unlikely that estimate would be generated
under the null: thus we reject the null.
! The question is, “How far is far enough from the
null to warrant rejection?”
! We need a rejection rule!
t-test: One-Sided Alternatives
! Besides our null, H0, we need an alternative hypothesis,
H1, and a significance level
! H1 may be one-sided, or two-sided
! Examples of different one-sided alternatives include
H1: βj < 0  or  H1: βj > 0
! A 2-sided alternative is H1: βj ≠ 0
! We also need to choose a significance level of our test
(recall that the significance level is the probability of Type
I error: that is the probability of rejecting the null when the
null is true)
One-Sided Alternatives (cont)
! Having picked a significance level, α, we look up the (1 –
α)th percentile in a t distribution with n–k–1 df and call
this c, the critical value
! With a 5% significance level and a 1-sided positive
alternative, c is the 95th percentile. If H0 is true, 95% of
the time our estimate of t will lie below c, so 95% of the
time we will correctly fail to reject the null.
! If our estimate of t is greater than c, then either
1)  H0 is true, and an unlikely event (Type I error) has
occurred; or
2)  H0 is false.
! Since the odds of (1) are very small (assuming we choose
a conventional significance level like 5%), we will
choose the second interpretation (but never forget that (1)
is a possibility!)
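A minimal sketch of looking up the critical value c with scipy (the degrees of freedom here are hypothetical):
```python
from scipy import stats

alpha = 0.05
df = 60                          # hypothetical n - k - 1
c = stats.t.ppf(1 - alpha, df)   # 95th percentile of the t distribution with 60 df
print(round(c, 3))               # roughly 1.67; reject H0 in favor of beta_j > 0 if t > c
```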
Rejection Rule
! Thus, our rejection rule for H1: βj >0 is as follows:
! We reject the null hypothesis if the t-statistic
is greater than the critical value
! We fail to reject the null hypothesis if the t
statistic is less than the critical value
One-Sided Alternative
yi = β0 + β1xi1 + … + βkxik + ui
H0: βj = 0
H1: βj > 0
[Figure: the t(n−k−1) density under H0; the area (1 − α) to the left of the critical value c is the “fail to reject” region, and the area α in the right tail beyond c is the rejection region.]
One-sided vs Two-sided
! Because the t distribution is symmetric, testing
H1: βj < 0 is straightforward.
! The critical value is just the negative of before
! We can reject the null if the t statistic < –c, while
if t > –c then we fail to reject the null.
! For a two-sided test, we set the critical value
based on α/2 and reject H0 in favor of H1: βj ≠ 0 if the
absolute value of the t statistic > c
- In the 5% case, c would be the 97.5th percentile of the t
distribution with n−k−1 df
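A minimal sketch of the two-sided rule (again with hypothetical numbers):
```python
from scipy import stats

alpha, df = 0.05, 60
c = stats.t.ppf(1 - alpha / 2, df)   # 97.5th percentile of the t distribution with 60 df
t_stat = -2.3                        # hypothetical t statistic
print(round(c, 3))                   # roughly 2.00
print(abs(t_stat) > c)               # True -> reject H0: beta_j = 0 at the 5% level
```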
Two-Sided Alternative
yi = β0 + β1Xi1 + … + βkXik + ui
H 0: β j = 0
H1: βj ≠ 0
[Figure: the t(n−k−1) density under H0; the central area (1 − α) between −c and c is the “fail to reject” region, and each tail beyond ±c has area α/2 and forms a rejection region.]
Summary for H0: βj = 0
! Unless otherwise stated, the alternative is assumed
to be two-sided; in fact EViews will automatically
generate t-statistics for you assuming your null
value is 0 and your test is two sided.
! If we reject the null, we typically say that our
estimated βj is “statistically significant at the 5%
level” or “statistically different from zero (the
null) at the 5% level”
! If we fail to reject the null, we typically say our
estimate of βj is “statistically insignificant” or
“statistically indistinguishable from zero (the
null).”
Testing other hypotheses
! A more general form of the t statistic recognizes that we
may want to test something like H0: βj = aj
! aj is a constant, e.g. test H0: βj=1
! In this case, the appropriate t statistic is
"ˆ j # a j
t!
, where a j = 0 for the standard test
ˆ
se(" )
j
! The t-statistic tells us how many standard deviations from
the null our estimate lies.
! As before, this can be used with a one-sided or two-sided
alternative.
Confidence Intervals
! Another way to use classical statistical testing is to
construct a confidence interval using the same critical
value as was used for a two-sided test
! A 100·(1 − α)% confidence interval is defined as
β̂j ± c · se(β̂j), where c is the (1 − α/2) percentile
of a t(n−k−1) distribution
! e.g. α = 5%; 95% confidence interval
- This implies that the true value of βj will be contained
in the interval in 95% of random samples.
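A minimal sketch of constructing the interval (the estimate, standard error, and degrees of freedom are hypothetical):
```python
from scipy import stats

beta_hat, se_beta, df = 0.093, 0.031, 60   # hypothetical values
alpha = 0.05
c = stats.t.ppf(1 - alpha / 2, df)         # two-sided critical value

lower, upper = beta_hat - c * se_beta, beta_hat + c * se_beta
print(round(lower, 3), round(upper, 3))    # 95% CI; reject H0: beta_j = a_j if a_j falls outside it
```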
Confidence Intervals (cont)
! The confidence intervals are easily calculated,
since we have β̂j and se(β̂j)
! Once we have the confidence interval it is easy to
do a two-tailed test
! If testing H0: βj = aj against H1: βj ≠ aj, then
- Reject H0 if aj does not lie in the confidence interval for
your sample.
- Do not reject H0 if aj does lie in the confidence interval
for your sample.
Computing p-values for t tests
! An alternative to the classical approach is to ask, “what is
the smallest significance level at which the null would be
rejected?”
! So, compute the t statistic, and then find the probability of
observing a value at least as extreme in the appropriate t
distribution--this is called the p-value
Interpretation:
! The p-value gives the probability of observing a t-statistic
as large as we did, if the null is in fact true.
! If p=0.01, this tells us we would obtain a t-statistic as big
in magnitude (or bigger) in only 1% of samples, if the null
is true.
! Small p-values are evidence against the null hypothesis
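A minimal sketch of the two-sided p-value calculation (t statistic and degrees of freedom are hypothetical):
```python
from scipy import stats

t_stat, df = 3.0, 60                           # hypothetical values
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)  # P(|T| >= |t|) if the null is true
print(round(p_two_sided, 4))                   # roughly 0.004: strong evidence against H0
```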
EViews and p-values, t tests, etc.
! Most computer packages will compute the p-value
for you, assuming a two-sided test
! If you really want a one-sided alternative, just
divide the two-sided p-value by 2 (provided your estimate
lies on the side specified by the alternative)
! EViews provides the t statistic, and p-value for H0:
βj = 0 for you, in columns labeled “t-statistic”,
“Prob.”, respectively. It also gives you the
coefficient estimate and standard error, which can
be used to calculate confidence intervals.
Testing Hypotheses about a Linear
Combination of Parameters
! Suppose instead of testing whether β1 is equal to a
constant, you want to test whether it is equal to another
parameter, e.g. H0 : β1 = β2
! This is a hypothesis test concerning two parameters (we
haven’t seen this before)
! Can use same basic procedure as before to form the t
statistic if we write H0: β1-β2=0,
t = (β̂1 − β̂2) / se(β̂1 − β̂2)
Testing Linear Combo (cont)
! Problem/difficulty is calculating the standard error
se(β̂1 − β̂2). Recall (from review) that
Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2·Cov(β̂1, β̂2)
! Since [se(β̂j)]² is an unbiased estimate of Var(β̂j),
se(β̂1 − β̂2) = { [se(β̂1)]² + [se(β̂2)]² − 2·s12 }^(1/2),
where s12 is an estimate of Cov(β̂1, β̂2)
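A minimal sketch of this calculation (the standard errors and the covariance estimate s12 are hypothetical placeholders):
```python
import math

se1, se2 = 0.40, 0.25    # hypothetical se(beta1_hat) and se(beta2_hat)
s12 = 0.05               # hypothetical estimate of Cov(beta1_hat, beta2_hat)
b1, b2 = 1.10, 0.35      # hypothetical coefficient estimates

se_diff = math.sqrt(se1**2 + se2**2 - 2 * s12)   # se(beta1_hat - beta2_hat)
t_stat = (b1 - b2) / se_diff                     # t statistic for H0: beta1 = beta2
print(round(se_diff, 3), round(t_stat, 3))
```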
Testing a Linear Combo (cont)
! So, to use the formula, we need s12, which standard
EViews output does not give
! Many packages will have an option to get it, or
will just perform the test for you
! In lab you will learn how to use EViews to test
whether β1 = β2.
! More generally, you can always restate the
problem to get the test you want
Example
! Suppose you are interested in the effect of campaign
expenditures by Party A and Party B on votes received by
Party A’s candidate
! Model is voteA = β0 + β1log(expendA) + β2log(expendB) +
β3prtystrA + u
! You might think that expendA and expendB should have
an equal yet opposite effect on votes received by Party A’s
candidate
- i.e. H0: β1 = −β2
- We could define a new parameter, θ1, where θ1 = β1 + β2
Example (cont):
! If we could generate an estimate of se(θ1) we
could test H0: θ1 = β1 + β2 = 0
! We can do this by rewriting the model
! recall: β1 = θ1 – β2, so substitute in and rearrange
voteA = β0 + θ1·log(expendA) + β2[log(expendB) − log(expendA)] + β3·prtystrA + u
! This is the same model as before, but now we get
a direct estimate of θ1 and the standard error of θ1
which makes for easier hypothesis testing.
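A minimal statsmodels sketch of the reparameterized regression; the data below are simulated placeholders, not the voting data from the example:
```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 500
log_expendA = rng.normal(6, 1, n)            # placeholder regressors
log_expendB = rng.normal(6, 1, n)
prtystrA = rng.uniform(0, 100, n)
# Simulated votes with beta1 = 6 and beta2 = -6, so theta1 = beta1 + beta2 = 0
voteA = 40 + 6 * log_expendA - 6 * log_expendB + 0.2 * prtystrA + rng.normal(0, 5, n)

# Reparameterized model: voteA on log(expendA), [log(expendB) - log(expendA)], prtystrA
X = sm.add_constant(np.column_stack([log_expendA, log_expendB - log_expendA, prtystrA]))
res = sm.OLS(voteA, X).fit()

theta1_hat, se_theta1 = res.params[1], res.bse[1]   # direct estimate of theta1 and its standard error
print(theta1_hat / se_theta1)                       # t statistic for H0: theta1 = 0 (should usually be small here)
```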
Example (cont):
! The t-statistic can be written:
t = θ̂1 / se(θ̂1)
! In fact, any linear combination of parameters
could be tested in a similar manner
! Other examples of hypotheses about a single
linear combination of parameters:
- β1 = 1 + β2
- β1 = 5β2
- β1 = −(1/2)β2 ; etc.
Multiple Linear Restrictions
! Everything we’ve done so far has involved testing
a single linear restriction (e.g. β1 = 0 or β1 = β2)
! However, we may want to jointly test multiple
hypotheses about our parameters
! A typical example is testing “exclusion
restrictions” – we want to know if a group of
parameters are all equal to zero
Example: Suppose that you are a college
administrator and you want to know what
determines which students will do well
Exclusion Restrictions
! You might estimate the following model (US applicable):
ColGPA = β0 + β1·highscGPA + β2·VerbSAT + β3·MathSAT + u
! Once we control for high-school GPA, do the SAT scores
have any effect on (or at least predictive power for) college
GPA?
! Here, the null hypothesis is H0: β2 = 0, β3 = 0 (where each of
these is referred to as an “exclusion restriction”); the
alternative hypothesis is that H0 is not true.
! If we can’t reject the null, it suggests we should drop
VerbSAT and MathSAT from the model
! While it’s tempting to do separate t-tests on β2 and β3, this
is not appropriate, because the t-test is done assuming no
restrictions are simultaneously being placed on other
parameter values (i.e. the t-test is designed for single
exclusion restrictions)
Testing Exclusion Restrictions
! We want to know if the parameters are jointly
significant at a given level
! For example – it is possible that neither β2 nor β3
is individually significant at, say, the 5% level but
that they are jointly significant at that level
! To test the restrictions we need to estimate
1. The Restricted Model--without x2 and x3
2. The Unrestricted Model--with x1, x2 and x3
! We can use the SSR’s from each of these
regressions to form a test statistic
Exclusion Restrictions (cont)
! Clearly, the SSR will be greater in the restricted model
(because with fewer RHS variables, we explain less of the
variation in y and so get larger residuals, on average).
! Intuitively, we want to know if the increase in SSR is big
enough to warrant keeping x2 and x3 in the model
! The F-statistic in the general case with q exclusion
restrictions is:
F ≡ [(SSRr − SSRur) / q] / [SSRur / (n − k − 1)]
Where r refers to the restricted model, and ur refers to the
unrestricted model.
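A minimal sketch of the calculation (SSRs, q, n, and k are hypothetical):
```python
from scipy import stats

ssr_r, ssr_ur = 210.0, 190.0   # hypothetical restricted and unrestricted SSRs
q = 2                          # number of exclusion restrictions
n, k = 150, 5                  # hypothetical sample size and number of regressors in the unrestricted model

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
p_value = stats.f.sf(F, q, n - k - 1)   # P(F_{q, n-k-1} >= F) if H0 is true
print(round(F, 2), round(p_value, 4))
```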
The F statistic
! The F statistic is always positive, since the SSR
from the restricted model can’t be less than the
SSR from the unrestricted model
! Essentially the F statistic is measuring the relative
increase in SSR when moving from the
unrestricted to the restricted model
! q = number of restrictions, or the numerator
degrees of freedom dfr – dfur
! n – k – 1 = denominator degrees of freedom = dfur
The F statistic (cont)
! To decide if the increase in SSR when we move to
a restricted model is “big enough” to reject the
exclusions, we need to know about the sampling
distribution of our F statistic under H0
! It turns out that F ~ F(q, n−k−1), where q is the
numerator degrees of freedom and n−k−1 is the
denominator degrees of freedom.
! We can use the back of the text (Table G.3) or
EViews to tabulate the F-distribution
The F statistic (cont)
[Figure: the F(q, n−k−1) density under H0; the area (1 − α) below the critical value c is the “fail to reject” region, and the area α to the right of c is the rejection region. Reject H0 at the α significance level if F > c.]
Summary of the F statistic
! If H0 is rejected we say, for example, that x2 and x3
are “jointly significant”
! If H0 is not rejected the variables are “jointly
insignificant”
! The F-test is particularly useful when a group of
the x variables are highly correlated
- e.g. expenditures on hospitals, doctors, nurses, etc.
- We don’t know which are more important
- Multicollinearity won’t allow for individual t-tests (due
to high standard errors of individual coefficient
estimates)
The R2 form of the F statistic
! Because the SSR’s may be large and unwieldy, an
alternative form of the formula is useful
! We use the fact that SSR = SST(1 – R2) for any
regression, so can substitute in for SSRr and SSRur
F ≡ [(R²ur − R²r) / q] / [(1 − R²ur) / (n − k − 1)]
! Again, r = restricted and ur = unrestricted
! Notice in the numerator that the unrestricted R-squared comes first.
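A minimal sketch using the R-squared form (values hypothetical):
```python
r2_ur, r2_r = 0.38, 0.33   # hypothetical unrestricted and restricted R-squareds
q, n, k = 2, 150, 5        # hypothetical restrictions, sample size, regressors

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
print(round(F, 2))         # equivalent to the SSR form when both models share the same dependent variable
```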
Overall Significance
! A special case of exclusion restrictions is to test
H0: β1 = β2 =…= βk = 0
! Since the R2 from a model with only an intercept
will be zero, the F statistic is simply
F = (R² / k) / [(1 − R²) / (n − k − 1)]
! If we can’t reject the null, this suggests that the data
are consistent with the RHS variables explaining
none of the variation in y.
! Even if the R-squared is very small, we can use
this statistic to test whether the x’s are explaining
any of y.
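A minimal sketch of the overall-significance F statistic (values hypothetical):
```python
r2, k, n = 0.10, 4, 200    # hypothetical R-squared, number of regressors, and sample size

F = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(F, 2))         # roughly 5.4: even a small R-squared can be jointly significant
```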
General Linear Restrictions
! The basic form of the F statistic will work for any
set of linear restrictions (not just setting a couple
betas equal to zero)
! First estimate the unrestricted model and then
estimate the restricted model
! In each case, make note of the SSR
! Imposing the restrictions can be tricky – will
likely have to redefine variables again
Example:
! Use same voting model as before
! Model is voteA = β0 + β1log(expendA) +
β2log(expendB) + β3prtystrA + u
! Now suppose the null is H0: β1 = 1, β3 = 0
! Substituting in the restrictions: voteA = β0 + log(expendA) +
β2·log(expendB) + u, so use voteA − log(expendA) = β0 +
β2·log(expendB) + u as the restricted model.
! Note: Can’t use the R-squared form of the F-statistic here
because SSTs will be different.
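A minimal statsmodels sketch of this procedure with simulated placeholder data (not the actual voting data); the restrictions β1 = 1 and β3 = 0 hold in the simulated population, so the test should usually fail to reject:
```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(2)
n = 400
log_expendA = rng.normal(6, 1, n)
log_expendB = rng.normal(6, 1, n)
prtystrA = rng.uniform(0, 100, n)
# Simulated data in which the restrictions hold: beta1 = 1 and beta3 = 0
voteA = 30 + 1.0 * log_expendA - 2.0 * log_expendB + 0.0 * prtystrA + rng.normal(0, 4, n)

# Unrestricted model: voteA on log(expendA), log(expendB), prtystrA
X_ur = sm.add_constant(np.column_stack([log_expendA, log_expendB, prtystrA]))
ssr_ur = sm.OLS(voteA, X_ur).fit().ssr

# Restricted model under H0: move log(expendA) to the left-hand side and drop prtystrA
y_r = voteA - log_expendA
X_r = sm.add_constant(log_expendB)
ssr_r = sm.OLS(y_r, X_r).fit().ssr

q, k = 2, 3
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))
print(round(F, 2), round(stats.f.sf(F, q, n - k - 1), 3))   # should usually fail to reject here
```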
F-statistic Summary
! Just as with t statistics, p-values can be calculated
by looking up the percentile in the appropriate F
distribution
! In lab, you’ll learn how to do this in EViews.
! If only one exclusion is being tested, then F = t², and
the p-values will be the same
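A quick numerical check of this equivalence (the t value and degrees of freedom are hypothetical):
```python
from scipy import stats

t_val, df = 2.1, 60                    # hypothetical t statistic and degrees of freedom
p_t = 2 * stats.t.sf(abs(t_val), df)   # two-sided p-value from the t test
p_f = stats.f.sf(t_val**2, 1, df)      # p-value from the F test of the single restriction
print(round(p_t, 6), round(p_f, 6))    # equal (up to rounding)
```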