Gra6020-3-2007spring

GRA 6020
Multivariate Statistics: The Linear Probability Model and the Logit Model (Probit)
Ulf H. Olsson
Professor of Statistics
Binary Response Models
y is a binary response variable.
$x' = (x_1, x_2, \ldots, x_k)$ is the full set of explanatory variables.
$$P(y = 1 \mid x) = G(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k) = G(\beta_0 + x\beta)$$
• The goal is to estimate the parameters $\beta_0, \beta_1, \ldots, \beta_k$.
The Linear Probability Model
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$$
y = 1 or y = 0;
$P_i = P(y_i = 1)$ and $1 - P_i = P(y_i = 0)$;
$$E(y_i \mid x) = 1 \cdot P_i + 0 \cdot (1 - P_i) = P_i = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$
The Linear Probability Model
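• A number of problems:
• The predicted value can lie outside the interval [0, 1].
• The error term is not normally distributed: for given x, u takes only two values.
• The error term is heteroscedastic, since $Var(u \mid x) = P(x)[1 - P(x)]$ depends on x => non-efficient estimates.
• The usual t-tests are not reliable.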
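The first two problems can be seen in a minimal sketch, assuming a small made-up data set (the `x` and `y` values below are hypothetical, not from the course data). It fits the linear probability model by OLS with one regressor, lists the fitted values that fall outside [0, 1], and evaluates $Var(u \mid x) = p(x)[1 - p(x)]$ at two values of x.

```python
# Sketch: OLS fit of a linear probability model with one regressor.
# The data are hypothetical, chosen only to illustrate the problems above.

def ols_simple(x, y):
    """Return (b0, b1) for the OLS line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


x = list(range(10))                  # x = 0, 1, ..., 9
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]   # binary response

b0, b1 = ols_simple(x, y)
fitted = [b0 + b1 * xi for xi in x]

# Problem 1: some fitted "probabilities" lie outside [0, 1].
outside = [(xi, round(p, 3)) for xi, p in zip(x, fitted) if p < 0 or p > 1]
print("outside [0, 1]:", outside)

# Problem 2: Var(u | x) = p(x) * (1 - p(x)) changes with x (heteroscedasticity).
# Only computed where the fitted value is a valid probability.
var_u = {xi: p * (1 - p) for xi, p in zip(x, fitted) if 0 <= p <= 1}
print("Var(u|x=1) =", round(var_u[1], 4), " Var(u|x=5) =", round(var_u[5], 4))
```

With these data the fitted line gives about −0.127 at x = 0 and about 1.127 at x = 9, and the error variance is about 0.012 at x = 1 but about 0.245 at x = 5.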
The Logit Model
$$G(z) = \frac{e^{z}}{1 + e^{z}}$$
• The logistic function
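A minimal Python sketch of the logistic function (the name `logistic` is our own choice). It uses the $e^z/(1 + e^z)$ form for negative z and the equivalent $1/(1 + e^{-z})$ form otherwise, so `math.exp` is only called with a non-positive argument and cannot overflow.

```python
import math


def logistic(z: float) -> float:
    """Logistic function G(z) = e^z / (1 + e^z) = 1 / (1 + e^(-z))."""
    if z >= 0:
        # exp(-z) <= 1 here, so no overflow.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the e^z / (1 + e^z) form; exp(z) < 1.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"G({z:+.1f}) = {logistic(z):.4f}")
```

G(0) = 0.5, G is increasing in z, and G(−z) = 1 − G(z), so mathematically the response probability stays strictly between 0 and 1 for every finite z.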
The Probit Model
$$G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(u)\,du,$$
where $\Phi$ is the standard normal cumulative distribution function and $\phi$ is the standard normal density.
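$\Phi$ has no closed form, but Python's standard library provides the error function, and $\Phi(z) = \tfrac{1}{2}[1 + \mathrm{erf}(z/\sqrt{2})]$. The sketch below (function names are our own) also checks the integral definition numerically with the trapezoidal rule, using −10 as a stand-in for −∞.

```python
import math


def std_normal_pdf(u: float) -> float:
    """Standard normal density phi(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def cdf_by_integration(z: float, lower: float = -10.0, steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of phi from `lower` to z."""
    h = (z - lower) / steps
    total = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(z))
    for i in range(1, steps):
        total += std_normal_pdf(lower + i * h)
    return total * h


for z in (-1.96, 0.0, 1.0, 1.96):
    print(f"Phi({z:+.2f}) = {std_normal_cdf(z):.4f}  (numerical: {cdf_by_integration(z):.4f})")
```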
[Figure: The Logistic Curve G (The Cumulative Normal Distribution)]
The Logit Model
$$G(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k) = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$
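Logit Model for $P_i$
y = 1 or y = 0;
$$P_i = P(y_i = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}$$
Dividing $P_i$ by $1 - P_i$ gives the odds, $P_i/(1 - P_i) = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}$, so the log-odds is linear in the parameters:
$$\Rightarrow \ln\left(\frac{P_i}{1 - P_i}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$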
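A small numerical check of the logit formulas, with hypothetical coefficients and regressor values (nothing here comes from the course data): the log-odds of $P_i$ recovers the linear index, and raising $x_1$ by one unit multiplies the odds by $e^{\beta_1}$, the quantity SPSS reports as Exp(B).

```python
import math

# Hypothetical parameters (beta_0, beta_1, beta_2) and regressors (x_1, x_2).
beta = [-1.0, 0.5, 0.2]
x = [2.0, 3.0]


def linear_index(beta, x):
    """beta_0 + beta_1 x_1 + ... + beta_k x_k."""
    return beta[0] + sum(b * xj for b, xj in zip(beta[1:], x))


def logit_prob(index):
    """P = 1 / (1 + e^(-index)); fine for moderate index values."""
    return 1.0 / (1.0 + math.exp(-index))


def odds(p):
    return p / (1.0 - p)


z = linear_index(beta, x)
p = logit_prob(z)
print(f"index = {z:.3f}, P = {p:.4f}, ln(P/(1-P)) = {math.log(odds(p)):.3f}")

# One-unit increase in x_1: the odds are multiplied by exp(beta_1).
p_up = logit_prob(linear_index(beta, [x[0] + 1.0, x[1]]))
print(f"odds ratio = {odds(p_up) / odds(p):.4f}, exp(beta_1) = {math.exp(beta[1]):.4f}")
```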
The Logit Model
• The model is non-linear in the parameters => non-linear estimation => maximum likelihood (ML).
• How can estimates of the linear probability model and the logit model be compared?
• Amemiya (1981) proposes:
• Multiply the logit estimates by 0.25, and add 0.5 to the constant term.
• The model can be tested, but the usual R² does not work; several pseudo-R² measures have been proposed.
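Because the logit model is non-linear, the log-likelihood $\ln L(\beta) = \sum_i [y_i \ln G(x_i\beta) + (1 - y_i)\ln(1 - G(x_i\beta))]$ is maximized numerically. The sketch below uses Newton-Raphson in plain Python (a textbook method, not the code SPSS runs) and reports McFadden's pseudo-R², $1 - \ln L / \ln L_0$, one of the proposed measures, where $L_0$ is the likelihood of the intercept-only model. Each row of `X` starts with 1 for $\beta_0$. The data are hypothetical: with a single binary regressor the ML estimates have a closed form ($\beta_0 = \ln(1/3)$, $\beta_1 = 2\ln 3$ here), which makes the result easy to check.

```python
import math


def logistic(z):
    """Numerically stable logistic function G(z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(a, b):
    """Solve a s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        s[r] = (m[r][n] - sum(m[r][c] * s[c] for c in range(r + 1, n))) / m[r][r]
    return s


def log_lik(beta, X, y):
    """Logit log-likelihood: sum of y ln G + (1 - y) ln(1 - G)."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = logistic(sum(b * v for b, v in zip(beta, xi)))
        total += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return total


def fit_logit(X, y, tol=1e-10, max_iter=50):
    """Newton-Raphson ML for the logit model."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                        # gradient of ln L
        info = [[0.0] * k for _ in range(k)]    # minus the Hessian of ln L
        for xi, yi in zip(X, y):
            p = logistic(sum(b * v for b, v in zip(beta, xi)))
            for j in range(k):
                grad[j] += (yi - p) * xi[j]
                for h in range(k):
                    info[j][h] += p * (1.0 - p) * xi[j] * xi[h]
        step = solve(info, grad)                # Newton step: info^{-1} grad
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data: one binary regressor, 8 observations.
x1 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [0, 0, 0, 1, 0, 1, 1, 1]
X = [[1.0, float(v)] for v in x1]

beta_hat = fit_logit(X, y)
ll_model = log_lik(beta_hat, X, y)
ybar = sum(y) / len(y)
ll_null = len(y) * (ybar * math.log(ybar) + (1 - ybar) * math.log(1 - ybar))
pseudo_r2 = 1.0 - ll_model / ll_null      # McFadden's pseudo-R^2

print("beta_hat =", [round(b, 4) for b in beta_hat])
print("McFadden pseudo-R2 =", round(pseudo_r2, 4))
```

For these data McFadden's pseudo-R² is about 0.19; unlike the OLS R², it is not a share of explained variance.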
The Logit Model (example)
• Dependent variable: emp = 1 if a person has a job, emp = 0 if the person is unemployed.
• Independent variables: (x1) edu = years at a university; (x2) score = score in a dancing contest.
• Estimate a model to predict the probability that a person has a job, given years at a university and score in the dancing contest (data: SPSS file Binomgra1.sav).
The Logit Model (example)
Linear probability model (OLS): Coefficients(a)
B and Std. Error are unstandardized coefficients; Beta is the standardized coefficient.

Model 1        B        Std. Error   Beta     t        Sig.
(Constant)    -0.144    0.241                 -0.598   0.558
edu            0.124    0.065        0.402     1.907   0.074
score          0.050    0.034        0.312     1.478   0.158
a. Dependent Variable: emp

Logit model (ML): Variables in the Equation

Step 1(a)      B        S.E.     Wald    df   Sig.    Exp(B)
edu            0.703    0.413    2.903   1    0.088   2.020
score          0.282    0.196    2.060   1    0.151   1.325
Constant      -3.640    1.765    4.252   1    0.039   0.026
a. Variable(s) entered on step 1: edu, score.
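The Wald, Sig. and Exp(B) columns of the logit output follow from B and S.E.: Wald = (B/S.E.)², Sig. is the upper-tail probability of a χ² distribution with 1 df, and Exp(B) = e^B is the odds ratio. A sketch that recomputes them from the rounded values printed in the table, so the last digit can differ slightly from the SPSS output, which uses unrounded estimates. For 1 df, $P(\chi^2_1 > W) = 2[1 - \Phi(\sqrt{W})] = \mathrm{erfc}(\sqrt{W/2})$.

```python
import math

# B and S.E. as printed in the SPSS "Variables in the Equation" table.
logit_output = {
    "edu": (0.703, 0.413),
    "score": (0.282, 0.196),
    "Constant": (-3.640, 1.765),
}

results = {}
for name, (b, se) in logit_output.items():
    wald = (b / se) ** 2                  # Wald statistic, chi-square with 1 df
    sig = math.erfc(math.sqrt(wald / 2))  # P(chi2_1 > wald) = 2 * (1 - Phi(|b / se|))
    odds_ratio = math.exp(b)              # Exp(B)
    results[name] = (wald, sig, odds_ratio)
    print(f"{name:9s} Wald = {wald:6.3f}  Sig. = {sig:.3f}  Exp(B) = {odds_ratio:.3f}")
```

For example, Exp(B) for edu is e^0.703 ≈ 2.020: one more year at a university multiplies the odds of having a job by about 2, holding score fixed.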
The Latent Variable Model
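$$y^* = \beta_0 + x\beta + \varepsilon$$
y = 1 when $y^* > 0$ and y = 0 when $y^* \le 0$, where $\varepsilon$ is independent of x, has distribution function G, and is symmetric about zero, so that $1 - G(-z) = G(z)$.
$$P(y = 1 \mid x) = P(y^* > 0 \mid x) = P(\varepsilon > -(\beta_0 + x\beta) \mid x)$$
$$= 1 - P(\varepsilon \le -(\beta_0 + x\beta) \mid x) = 1 - G(-(\beta_0 + x\beta))$$
$$= G(\beta_0 + x\beta)$$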
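The latent-variable derivation can be checked by simulation. A sketch assuming a hypothetical index value $\beta_0 + x\beta = 0.5$: draw $\varepsilon$ from a standard normal (probit) or standard logistic (logit) distribution, set y = 1 when $y^* > 0$, and compare the share of ones with $G(0.5)$. Logistic draws use the inverse CDF, $\varepsilon = \ln(u/(1 - u))$ with u uniform on (0, 1); the seed is arbitrary.

```python
import math
import random


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def normal_error(rng):
    return rng.gauss(0.0, 1.0)


def logistic_error(rng):
    u = rng.random()
    while u == 0.0:          # avoid log(0); u is then in (0, 1)
        u = rng.random()
    return math.log(u / (1.0 - u))  # inverse of the logistic CDF


def share_of_ones(index, draw_error, n=200_000, seed=2007):
    """Simulate y* = index + eps and return the fraction with y* > 0."""
    rng = random.Random(seed)
    ones = sum(1 for _ in range(n) if index + draw_error(rng) > 0)
    return ones / n


index = 0.5  # hypothetical value of beta_0 + x beta
probit_share = share_of_ones(index, normal_error)
logit_share = share_of_ones(index, logistic_error)
print(f"probit: simulated {probit_share:.3f} vs Phi(0.5) = {normal_cdf(index):.3f}")
print(f"logit:  simulated {logit_share:.3f} vs G(0.5)   = {logistic_cdf(index):.3f}")
```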
The Latent Variable Model
$$P(y = 1 \mid x) = P(y^* > 0 \mid x)$$
Binary Response Models
• The magnitude of each $\beta_j$ is not especially useful on its own, since $y^*$ rarely has a well-defined unit of measurement.
• But the partial effects on the response probability can be found by taking partial derivatives.
• We are mainly interested in significance and direction (positive or negative).
• To find the partial effect of a roughly continuous variable $x_j$ on the response probability $p(x) = P(y = 1 \mid x)$:
$$\frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \quad \text{where } g(z) = \frac{dG(z)}{dz}$$
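For the logit, $g(z) = G(z)[1 - G(z)]$; for the probit, $g(z) = \phi(z)$. A sketch, with hypothetical coefficients and regressor values, that computes the partial effect $g(\beta_0 + x\beta)\beta_j$ and checks it against a central finite difference of $p(x) = G(\beta_0 + x\beta)$.

```python
import math


def logit_G(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit_g(z):
    G = logit_G(z)
    return G * (1.0 - G)              # derivative of the logistic function


def probit_G(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_g(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # phi(z)


def index(beta0, beta, x):
    return beta0 + sum(b * xj for b, xj in zip(beta, x))


def partial_effect(g, beta0, beta, x, j):
    """Analytic partial effect dp/dx_j = g(beta0 + x beta) * beta_j."""
    return g(index(beta0, beta, x)) * beta[j]


def numeric_effect(G, beta0, beta, x, j, h=1e-5):
    """Central difference of p(x) = G(beta0 + x beta) with respect to x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (G(index(beta0, beta, up)) - G(index(beta0, beta, down))) / (2 * h)


beta0, beta, x = -1.0, [0.8, 0.3], [1.5, 2.0]   # hypothetical values
effects = {}
for name, G, g in (("logit", logit_G, logit_g), ("probit", probit_G, probit_g)):
    effects[name] = (partial_effect(g, beta0, beta, x, 0),
                     numeric_effect(G, beta0, beta, x, 0))
    print(f"{name}: dp/dx_1 = {effects[name][0]:.5f} (finite difference {effects[name][1]:.5f})")
```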
Binary Response Models
• The partial effects always have the same sign as $\beta_j$, since $g(z) > 0$.
• The effects are largest when $\beta_0 + x\beta = 0$, where g attains its maximum:
$$g(0) = \phi(0) \approx 0.40 \text{ in the probit case}$$
$$g(0) = 0.25 \text{ in the logit case}$$
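Because g is positive, symmetric and peaks at z = 0, $g(0)|\beta_j|$ bounds the partial effect of $x_j$. A sketch that locates the peak on a grid and applies the logit factor 0.25 to the estimates in the example. This is the same 0.25 that appears in Amemiya's rule, since near z = 0 the logistic curve is approximately $G(z) \approx 0.5 + 0.25z$; the side-by-side OLS column is only illustrative.

```python
import math


def logit_g(z):
    """Logistic density e^{-|z|} / (1 + e^{-|z|})^2 (symmetric in z)."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2


def probit_g(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


grid = [i / 100 for i in range(-500, 501)]       # z from -5 to 5
print("logit peak at z =", max(grid, key=logit_g), " g(0) =", logit_g(0.0))
print("probit peak at z =", max(grid, key=probit_g), " g(0) =", round(probit_g(0.0), 4))

# Logit estimates from the example, with the OLS (LPM) estimates for comparison.
logit_b = {"edu": 0.703, "score": 0.282, "Constant": -3.640}
ols_b = {"edu": 0.124, "score": 0.050, "Constant": -0.144}

# Amemiya-style conversion: slopes * 0.25, constant * 0.25 + 0.5.
for name, b in logit_b.items():
    converted = 0.25 * b + (0.5 if name == "Constant" else 0.0)
    print(f"{name:9s} logit {b:7.3f}  converted {converted:7.3f}  OLS {ols_b[name]:7.3f}")
```

The converted slope for edu, about 0.176, is the largest possible change in the predicted probability of having a job per additional year at a university under the logit estimates.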