Review of Probability and Statistics

Binary independent and
dependent variables
FIN822 Li
1
Binary independent variables
y = b0 + b1x1 + b2x2 + . . . bkxk + u
FIN822 Li
2
Dummy Variables
A dummy variable is a variable that takes
on the value 1 or 0
Examples: male (= 1 if are male, 0
otherwise), NYSE (= 1 if stock listed in
NYSE, 0 otherwise), etc.
Dummy variables are also called binary
variables
FIN822 Li
3
A Dummy Independent Variable
Consider a simple model with one
continuous variable (x) and one dummy (d)
y = b0 + d0d + b1x + u
This can be interpreted as an intercept shift
If d = 0, then y = b0 + b1x + u
If d = 1, then y = (b0 + d0) + b1x + u
The case of d = 0 is the base group
FIN822 Li
4
Example,
Y=turnover=trading volume/outstanding shares
X=size of firms=log (stock price*outstanding
shares)
D=1 for NYSE stocks, 0 otherwise (AMEX and
NASDAQ)
If you assume the slope is the same for both
groups, then regress y = b0 + d0d + b1x + u
FIN822 Li
5
Example of d0 > 0
y
y = (b0 + d0) + b1x
d=1
{
slope = b1
d0
d=0
} b0
y = b0 + b1x
x
FIN822 Li
6
Dummies for Multiple Categories
We can use dummy variables to deal with
something with multiple categories
Suppose everyone in your data is either a
HS dropout, HS grad only, or college grad
To compare HS and college grads to HS
dropouts, include 2 dummy variables
hsgrad = 1 if HS grad only, 0 otherwise;
and colgrad = 1 if college grad, 0 otherwise
FIN822 Li
7
Multiple Categories (cont)
Any categorical variable can be turned into
a set of dummy variables
The base group is represented by the
intercept; if there are n categories there
should be n – 1 dummy variables
FIN822 Li
8
Example,
Y=turnover=trading volume/outstanding shares
X=size of firms=log (stock price*outstanding
shares)
D=1 for NYSE stocks, 0 otherwise (AMEX and
NASDAQ)
If you assume the slope and intercept are different
for both groups, you could run two separate
regression for each group, or…
FIN822 Li
9
Interactions with Dummies
Can also consider interacting a dummy
variable, d, with a continuous variable, x
y = b0 + d1d + b1x + d2d*x + u
If d = 0, then y = b0 + b1x + u
If d = 1, then y = (b0 + d1) + (b1+ d2) x + u
This is interpreted as a change in the slope
(and intercept).
FIN822 Li
10
Example of d0 > 0 and d1 <
0
y
y = b0 +
bd1=x 0
d=1
y = (b0 + d0) + (b1 + d1) x
x
FIN822 Li
11
Caveats
A typical use of a dummy variable is when we are
looking for a program effect
For example, we may have individuals that
received job training and wish to test the effect of
training.
We need to remember that usually individuals
choose whether to participate in a program, which
may lead to a self-selection problem
FIN822 Li
12
Self-selection Problems
If we can control for everything that is correlated
with both participation and the outcome of interest
then it’s not a problem
Often, though, there are unobservables that are
correlated with participation. For example,
people’s skill before the training. Low skill people
may be more likely to take the training.
In this case, the estimate of the program effect is
biased.
FIN822 Li
13
Self Selection Problem: An example
Suppose one professor finds that his
evening class students perform better that
his daytime class.
Does that mean the professor teaches better
at night or students learn better at night?
No. It turns out that students participating in
night classes might be more hard working.
They care to come at night, sometimes after
a daytime work, etc.
FIN822 Li
14
Self-selection Problems
If we can control (control means adding
more explanatory variables) for everything
that is correlated with both participation
(the dummy variable) and the outcome (y),
then it’s not a problem.
FIN822 Li
15
Binary Dependent Variables
Logit and Probit models
P(y = 1|x) = G(b0 + xb)
FIN822 Li
16
Binary Dependent Variables
A linear probability model can be written as
P(y = 1|x) = b0 + xb
A drawback to the linear probability model
is that predicted values are not constrained
to be between 0 and 1
An alternative is to model the probability as
a function, G(b0 + xb), where 0<G(z)<1
FIN822 Li
17
The intuition
FIN822 Li
18
The Probit Model
One choice for G(z) is the standard normal
cumulative distribution function (cdf)
G(z) = F(z) ≡ ∫f(v)dv, where f(z) is the
standard normal, so f(z) = (2p)-1/2exp(-z2/2)
This case is referred to as a probit model
Since it is a nonlinear model, it cannot be
estimated by our usual methods
Use maximum likelihood estimation
FIN822 Li
19
The Logit Model
Another common choice for G(z) is the
logistic function
G(z) = exp(z)/[1 + exp(z)] = L(z)
This case is referred to as a logit model, or
sometimes as a logistic regression
Both functions have similar shapes – they
are increasing in z, most quickly around 0
FIN822 Li
20
Probits and Logits
Both the probit and logit are nonlinear and
require maximum likelihood estimation
No real reason to prefer one over the other
Traditionally saw more of the logit, mainly
because the logistic function leads to a more
easily computed model
FIN822 Li
21
Interpretation of Probits and Logits
In general we care about the effect of x on
P(y = 1|x), that is, we care about ∂p/ ∂x
For the linear case, this is easily computed
as the coefficient on x
For the nonlinear probit and logit models,
it’s more complicated:
∂p/ ∂xj = g(b0 +xb)bj, where g(z) is dG/dz
FIN822 Li
22
For logit model
∂p/ ∂xj =bj exp(b0 +xb ) / (1 + exp(b0 +xb) ) 2
For Probit model:
∂p/ ∂xj = bj f (b0 +xb) = bj (2p)-1/2exp(-(b0
+xb)2/2)
FIN822 Li
23
Interpretation (continued)
Can examine the sign and significance
(based on a standard t test) of coefficients,
FIN822 Li
24