Basic Econometrics in Transportation
Behavioral Models
Amir Samimi
Civil Engineering Department
Sharif University of Technology
Primary Source: Discrete Choice Methods with Simulation (Train)
Outline
Features of the discrete choice models.
What is a choice set?
Utility-maximizing behavior.
The most prominent types of discrete choice models:
Logit,
Generalized extreme value (GEV),
Probit,
Mixed logit.
How the models are used for forecasting over time.
Discrete Choice versus Regression
With regression models, the dependent variable is continuous,
with an infinite number of possible outcomes (e.g. how much money to save).
When there is an infinite number of alternatives, discrete choice models cannot be applied.
Often regressions examine choices of “how much” and discrete choice models examine choices of “which.”
This distinction is not accurate.
Discrete choice models have also been used to examine choices of “how much.”
Whether to use regression or discrete choice models is a specification issue.
Usually a regression model is more natural and easier.
A discrete choice model is used only if there are compelling reasons.
The Choice Set
The set of alternatives within a discrete choice framework needs to exhibit three characteristics:
The alternatives must be mutually exclusive.
The decision maker chooses only one alternative from the choice set.
The choice set must be exhaustive.
The decision maker necessarily chooses one of the alternatives.
The number of alternatives must be finite.
The appropriate specification of the choice set in these situations is governed largely by the goals of the research and the data that are available to the researcher.
Example
Consider households’ choice among heating fuels:
Available fuels: natural gas, electricity, oil, and wood.
These alternatives violate both mutual exclusivity (a household can have two types) and exhaustiveness (a household can have no heating).
How to handle these issues?
To obtain mutually exclusive alternatives:
1. List every possible combination as an alternative.
2. Define the choice as the choice among fuels for the “primary” heating source.
To obtain exhaustive alternatives:
1. Include “no heating” as an alternative.
2. Redefine as the choice of heating fuel conditional on having heating.
The first two conditions can usually be satisfied.
In contrast, the third condition is actually restrictive.
Utility-maximizing Behavior
Discrete choice models are usually derived under an assumption of utility-maximizing behavior by the decision maker.
The underlying concept was framed in terms of psychological stimuli, which are also interpreted as utility.
Models that can be derived in this way are called random utility models (RUMs).
Note:
Models derived from utility maximization can also be used to represent decision making that does not entail utility maximization.
The models can be seen as simply describing the relation of explanatory variables to the outcome of a choice, without reference to exactly how the choice is made.
Notation:
A decision maker, labeled n, faces a choice among J alternatives.
The utility that decision maker n obtains from alternative j is Unj.
Decision Rule:
The decision maker chooses the alternative that provides the greatest utility:
choose alternative i if and only if Uni > Unj ∀ j ≠ i.
Note:
This utility is known to the decision maker but not by the researcher.
Derivation of Choice Probabilities
You do not observe the decision maker’s utility, rather
some attributes of the alternatives, xnj,
and some attributes of the decision maker, sn,
and then specify a representative utility function, Vnj = V(xnj, sn).
There are aspects that the researcher does not see: Unj = Vnj + εnj
These terms are treated as random, as the researcher does not know εnj.
Joint PDF of the random vector εn = (εn1, ... , εnJ) is denoted f(εn).
The probability that decision maker n chooses alternative i is:
Pni = Prob(Uni > Unj ∀ j ≠ i)
= Prob(Vni + εni > Vnj + εnj ∀ j ≠ i)
= Prob(εnj − εni < Vni − Vnj ∀ j ≠ i)
= ∫ε I(εnj − εni < Vni − Vnj ∀ j ≠ i) f(εn) dεn
I(·) is the indicator function.
This is a multidimensional integral over the density of εn.
Different discrete choice models are obtained from different
specifications of the unobserved portion of utility.
Logit and nested logit have closed-form expressions for this integral, under the
assumption that εn is distributed iid extreme value and a type of generalized
extreme value, respectively.
Probit is derived under the assumption that f is a multivariate normal.
Mixed logit is based on the assumption that the unobserved portion of utility
consists of a part that follows any distribution specified by the researcher plus a
part that is iid extreme value.
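The integral can be approximated by simulation: draw the error vector, check which alternative has the highest utility, and average the indicator over draws. The sketch below is illustrative only; the utilities in `v`, the function names, and the number of draws are assumptions, not values from the slides. With iid extreme value errors the simulated shares should land close to the logit closed form; with iid normal errors no closed form is used.

```python
import math
import random

def draw_gumbel(rng):
    """One standard type I extreme value draw via the inverse CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep both logs finite
    return -math.log(-math.log(u))

def simulated_probs(v, draw_error, n_draws=100_000, seed=0):
    """Share of draws in which each alternative has the highest utility."""
    rng = random.Random(seed)
    wins = [0] * len(v)
    for _ in range(n_draws):
        utilities = [vj + draw_error(rng) for vj in v]
        wins[utilities.index(max(utilities))] += 1
    return [w / n_draws for w in wins]

def logit_probs(v):
    """Closed-form logit probabilities exp(V_i) / sum_j exp(V_j)."""
    exps = [math.exp(vj - max(v)) for vj in v]
    return [e / sum(exps) for e in exps]

v = [4.0, 3.0, 3.5]  # hypothetical representative utilities
sim_logit = simulated_probs(v, draw_gumbel)          # iid extreme value errors
exact_logit = logit_probs(v)                         # closed form for comparison
sim_probit = simulated_probs(v, lambda rng: rng.gauss(0.0, 1.0))  # iid normal errors
```

The indicator-averaging step is the same for both error distributions; only the function that draws the errors changes.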
Meaning of Choice Probabilities
Consider a person who can take car or bus to work.
The researcher observes the time (T) and cost (M) that the person would incur:
Vc = αTc + βMc
and
Vb = αTb + βMb
There are other factors that affect the person’s choice.
If it turns out that Vc = 4 and Vb = 3:
This means that car is better for this person by 1 unit on observed factors.
It does not mean that the person chooses car.
Specifically, the person will choose bus if the unobserved portion of utility is
higher than that for car by at least 1 unit:
Pc = Prob(εb − εc < 1) and Pb = Prob(εb − εc > 1)
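To put numbers on this example, the distribution of the errors has to be assumed. The sketch below tries two assumptions that are not stated on the slide: iid extreme value errors (so εb − εc is logistic) and iid standard normal errors (so εb − εc is normal with variance 2). The variable names are illustrative.

```python
import math

v_car, v_bus = 4.0, 3.0
diff = v_car - v_bus  # car is ahead by 1 unit on observed factors

# Assumption 1: iid extreme value errors, so the error difference is logistic.
p_car_logit = 1.0 / (1.0 + math.exp(-diff))

# Assumption 2: iid standard normal errors, so the difference has variance 2.
def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_car_probit = std_normal_cdf(diff / math.sqrt(2.0))

p_bus_logit = 1.0 - p_car_logit
p_bus_probit = 1.0 - p_car_probit
```

Under either assumption car is the more likely choice (roughly 0.73 and 0.76), but bus is still chosen with substantial probability.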
What is meant by the distribution of εn?
The density is the distribution of unobserved utility within the population of people who face the same observed utility.
Pni is the share of people who choose alternative i within the population of
people who face the same observed utility for each alternative as person n.
The distribution can be considered in subjective terms.
Pni is the probability that the researcher assigns to the person’s choosing
alternative i given the researcher’s ideas about the unobserved utility.
The distribution can represent the effect of factors that are
quixotic to the decision maker (aspects of bounded rationality).
Pni is the probability that these quixotic factors induce the person to choose
alternative i given the observed, realistic factors.
Specific Models
A quick preview of logit, GEV, probit, and mixed logit is useful to
show how they relate to the general derivation of all choice models.
Logit
The most widely used discrete choice model.
It is derived under the assumption that εni is iid extreme value for all i:
Unobserved factors are uncorrelated with the same variance over alternatives.
This independence can be inappropriate: a person who dislikes travel by bus because of the presence
of other riders might have a similar reaction to rail travel.
GEV
Developed to avoid the independence assumption within logit.
A comparatively simple GEV model places the alternatives into nests, with
unobserved factors having the same correlation for all alternatives within a nest
and no correlation for alternatives in different nests.
Probit
Unobserved factors are distributed jointly normal.
With a full covariance matrix, any pattern of correlation and heteroskedasticity
can be accommodated.
Limitation: the normal distribution has density on both sides of zero, which is inappropriate in some situations.
For example, a customer’s willingness to pay for a desirable attribute of a product is necessarily positive.
Mixed logit
Allows the unobserved factors to follow any distribution.
Unobserved factors can be decomposed into a part that contains all the
correlation and heteroskedasticity, and another part that is iid extreme value.
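One way to read this decomposition: conditional on the first part, the model is a logit, so the choice probability can be approximated by averaging logit probabilities over draws of that part. The sketch below is a minimal illustration under assumptions that are not in the slides: three alternatives (car, bus, rail), made-up utilities, and a hypothetical normal term shared by bus and rail to stand in for "dislikes other riders".

```python
import math
import random

def logit_probs(utilities):
    """Logit probabilities for one set of utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

def mixed_logit_probs(v, draw_component, n_draws=5000, seed=0):
    """Average logit probabilities over draws of the non-iid error component."""
    rng = random.Random(seed)
    totals = [0.0] * len(v)
    for _ in range(n_draws):
        eta = draw_component(rng)  # one value per alternative
        probs = logit_probs([vj + ej for vj, ej in zip(v, eta)])
        totals = [t + p for t, p in zip(totals, probs)]
    return [t / n_draws for t in totals]

def shared_transit_term(rng):
    """Hypothetical component: bus and rail share one normal draw, car gets none."""
    z = rng.gauss(0.0, 1.0)
    return [0.0, z, z]

v = [1.0, 0.5, 0.5]  # made-up representative utilities for car, bus, rail
probs = mixed_logit_probs(v, shared_transit_term)
```

The shared draw induces correlation between bus and rail, while the remaining iid extreme value part is handled analytically by the logit formula inside the loop.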
Other models
These models are often obtained by combining concepts from other models.
A mixed probit has a similar concept to the mixed logit.
Identification of Choice Models
Several aspects of the behavioral decision process affect the
specification and estimation of any discrete choice model.
The issues can be summarized easily in two statements:
“Only differences in utility matter.”
“The scale of utility is arbitrary.”
Differences in Utility Matter
The level of utility doesn’t matter.
Pni = Prob(Uni − Unj > 0 ∀ j ≠ i)
This has several implications for the identification and specification
of discrete choice models.
It means that the only parameters that can be estimated are those that capture
differences across alternatives.
This general statement takes several forms:
Alternative-Specific Constants
Sociodemographic Variables
Number of Independent Error Terms
Alternative-Specific Constants
It is often reasonable to specify the observed part of utility to be
linear in parameters with a constant.
The alternative-specific constant for an alternative captures the average effect on
utility of all factors that are not included in the model.
Since alternative-specific constants force the εnj to have a zero mean, it is
reasonable to include a constant in Vnj for each alternative.
The researcher must normalize the absolute levels of the constants.
It is impossible to estimate the constants themselves, since an infinite number of
constants (any values with the same difference) result in the same choice probabilities.
The standard procedure is to normalize one of the constants to zero.
In our example: Uc = αTc + βMc + εc and Ub = αTb + βMb + kb + εb
With J alternatives, at most J − 1 constants can enter the model.
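A quick numerical check of why the constants themselves cannot be estimated, sketched under a logit assumption with made-up utilities and constants: two sets of constants with the same difference give the same probabilities.

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities for a list of representative utilities."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical observed utilities for car and bus, excluding constants.
v_car, v_bus = 4.0, 3.0

# Two sets of constants (k_car, k_bus) with the same difference of 0.5.
p_normalized = logit_probs([v_car + 0.0, v_bus + 0.5])
p_shifted = logit_probs([v_car + 10.0, v_bus + 10.5])
```

Since `p_normalized` and `p_shifted` coincide, the data cannot distinguish the two sets; setting the car constant to zero simply picks one representative.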
Sociodemographic Variables
Attributes of the decision maker do not vary over alternatives.
The same issue!
Consider the effect of a person’s income on the mode choice decision.
Suppose that a person’s utility is higher with higher income (Y):
Uc = αTc + βMc + θ0cY + εc and Ub = αTb + βMb + θ0bY + kb + εb
Since only differences matter, absolute levels of θ’s cannot be estimated.
To set the level, one of these parameters is normalized to zero:
Uc = αTc + βMc + εc and Ub = αTb + βMb + θbY + kb + εb, where θb = θ0b − θ0c
Sociodemographic variables can enter utility in other ways.
For example, cost is often divided by income.
Now, there is no need to normalize the coefficients.
Number of Independent Error Terms
Remember that Pni = ∫ε I(εnj − εni < Vni − Vnj ∀ j ≠ i) f(εn) dεn
This probability is a J-dimensional integral.
With J errors, there are J − 1 error differences.
The probability can be expressed as a (J − 1)-dimensional integral over the error differences,
where g(·) is the density of these error differences.
For any f, the corresponding g can be derived.
Normalization occurs automatically in logit.
Normalization should be performed in probit.
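The (J − 1)-dimensional form can be written out as below. The tilde notation for error differences is an assumption introduced here for illustration; g is the density of the vector of differences taken against alternative i.

```latex
\tilde{\varepsilon}_{nji} = \varepsilon_{nj} - \varepsilon_{ni}, \qquad
\tilde{\varepsilon}_{ni} = (\tilde{\varepsilon}_{n1i}, \ldots, \tilde{\varepsilon}_{nJi})
\ \text{(omitting } j = i\text{)}

P_{ni} = \int I\!\left(\tilde{\varepsilon}_{nji} < V_{ni} - V_{nj} \;\; \forall j \neq i\right)
         g(\tilde{\varepsilon}_{ni}) \, d\tilde{\varepsilon}_{ni}
```

This is the same event as in the J-dimensional integral, expressed in the J − 1 differences rather than the J original errors.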
Overall Scale of Utility Is Irrelevant
The alternative with the highest utility is the same no matter how
utility is scaled.
The model U0nj = Vnj + εnj is equivalent to U1nj = λVnj + λεnj for any λ > 0.
To take account of this fact, the researcher must normalize the
scale of utility.
The standard way is to normalize the variance of the error terms.
The scale of utility and the variance of the error terms are linked by definition:
Var(λεnj) = λ²Var(εnj)
We will discuss
Normalization with iid Errors,
Normalization with Heteroskedastic Errors,
Normalization with Correlated Errors.
Normalization with IID Errors
With the iid assumption, normalization for scale is straightforward.
Consider U0nj = x’njβ + ε0nj where Var(ε0nj) = σ²
The original model is equivalent to U1nj = x’nj(β/σ) + ε1nj where Var(ε1nj) = 1
As we will see, the error variances in a standard logit model are traditionally
normalized to π²/6, which is about 1.6.
Interpretation of results must take the normalization into account.
Suppose in a mode choice model, the estimated cost coefficient is −0.55 from a
logit and −0.45 from a probit model.
It is incorrect to say that the logit model implies more sensitivity to costs.
The same issue arises when a model is estimated on different data sets:
Suppose we estimate cost coefficients of −0.55 and −0.81 for Chicago and Boston.
Interpreted as a scale difference, this means unobserved utility has less variance in Boston.
Other (unobserved) factors have less effect on people in Boston.
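The logit versus probit comparison can be made concrete by putting both coefficients on the same scale. The sketch below assumes the probit errors were normalized to variance 1 (the slide does not say which normalization was used) and divides the logit coefficient by the standard deviation implied by π²/6. The function and variable names are illustrative.

```python
import math

LOGIT_ERROR_VARIANCE = math.pi ** 2 / 6  # about 1.6

def to_unit_variance_scale(logit_coef):
    """Rescale a logit coefficient so the implied error variance is 1."""
    return logit_coef / math.sqrt(LOGIT_ERROR_VARIANCE)

logit_cost = -0.55
probit_cost = -0.45  # assumed to come from a probit with unit error variance

rescaled_logit_cost = to_unit_variance_scale(logit_cost)
print(round(rescaled_logit_cost, 2))  # about -0.43
```

On the common scale the two estimates (about −0.43 and −0.45) are close, so the raw gap between −0.55 and −0.45 mostly reflects the normalization rather than a difference in cost sensitivity.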
Normalization with Heteroskedastic Errors
Variance of error terms can be different for different segments.
One sets the overall scale of utility by normalizing the variance for one segment,
and then estimates the variance for each segment relative to this one segment.
In our Chicago and Boston mode choice model:
Unj = αTnj + βMnj + εBnj ∀n in Boston
Unj = αTnj + βMnj + εCnj ∀n in Chicago
Define k = Var(εBnj)/Var(εCnj)
Dividing the Boston utilities by √k gives errors with the same variance in both cities:
Unj = (α/√k)Tnj + (β/√k)Mnj + εnj ∀n in Boston
The scale parameter, k, is estimated along with β and α.
Variance of the error term can differ over geographic regions, data
sets, time, or other factors.
Normalization with Correlated Errors
When errors are correlated, normalizing the variance of the error
for one alternative is not sufficient to set the scale of utility
differences.
Consider the utility for the four alternatives as Unj = Vnj + εnj.
The error vector εn = {εn1, εn2, εn3, εn4} has zero mean and
a covariance matrix with elements σij.
Since only differences in utility matter, this model is equivalent to
one in which all utilities are differenced from the first alternative.
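Written out for the four-alternative case, the covariance matrix and the errors differenced against the first alternative can be sketched as follows. The σij symbols follow the σ11 used elsewhere in these slides; the tilde notation is an assumption added here.

```latex
\Omega =
\begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} \\
\cdot & \sigma_{22} & \sigma_{23} & \sigma_{24} \\
\cdot & \cdot & \sigma_{33} & \sigma_{34} \\
\cdot & \cdot & \cdot & \sigma_{44}
\end{pmatrix},
\qquad
\tilde{\varepsilon}_{nj1} = \varepsilon_{nj} - \varepsilon_{n1}, \quad j = 2, 3, 4

\mathrm{Var}(\tilde{\varepsilon}_{nj1}) = \sigma_{11} + \sigma_{jj} - 2\sigma_{1j},
\qquad
\mathrm{Cov}(\tilde{\varepsilon}_{nj1}, \tilde{\varepsilon}_{nk1})
  = \sigma_{11} + \sigma_{jk} - \sigma_{1j} - \sigma_{1k}
```

Ω has ten distinct elements. The 3 × 3 covariance matrix of the differences has six, and fixing one of them to set the scale leaves five free parameters.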
The variance of each error difference depends on the variances
and covariances of the original errors.
Setting the variance of one of the original errors is not sufficient to set the
variance of the error differences (σ11 = k is not sufficient).
Instead, normalize the variance of one of the error differences to some number.
On recognizing that only differences matter and that the scale of utility is
arbitrary, the number of covariance parameters drops from ten to five.
Aggregation
Discrete choice models operate at the individual level.
The researcher is usually interested in some aggregate measure.
In linear regression models:
Estimates of aggregate Y are obtained by inserting aggregate values of X’s.
Discrete choice models are not linear in explanatory variables:
Inserting aggregate values of X’s will not provide an unbiased estimate of average response.
Estimating the average response by calculating derivatives and elasticities at the
average of the explanatory variables is similarly problematic.
Consistent alternatives: sample enumeration or segmentation.
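The gap between the two ways of aggregating is easy to see numerically. The sketch below assumes a binary logit and a made-up two-person weighted sample; none of the numbers come from the slides. It compares the probability evaluated at the average utility with the weighted average of individual probabilities.

```python
import math

def p_choose(v):
    """Binary logit probability for a utility difference v (hypothetical model)."""
    return 1.0 / (1.0 + math.exp(-v))

# Made-up sample: (weight w_n, utility difference V_n).
sample = [(300.0, -1.0), (100.0, 3.0)]
total_weight = sum(w for w, _ in sample)

# Averaging individual probabilities: weighted sum, then share.
expected_choosers = sum(w * p_choose(v) for w, v in sample)
share_from_individuals = expected_choosers / total_weight

# Naive aggregation: probability evaluated at the average utility.
average_v = sum(w * v for w, v in sample) / total_weight
share_at_average = p_choose(average_v)
```

Here the naive calculation gives 0.5 while averaging the individual probabilities gives about 0.44, because the logit probability is not linear in utility.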
Sample Enumeration
The most straightforward, and by far the most popular approach.
The choice probabilities of each decision maker in a sample are
summed, or averaged, over decision makers.
Each sampled decision maker n has some weight wn associated with him,
representing the number of decision makers similar to him in the population.
A consistent estimate of the total number of decision makers in the
population who choose i is Σn wnPni.
Average derivatives and elasticities are similarly obtained by
calculating the derivative and elasticity for each sampled person
and taking the weighted average.
Segmentation
When
The number of explanatory variables is small, and
Those variables take only a few values.
Consider
A model with only two variables in the utility function: education and gender.
The total number of different types of decision makers (segments) is eight, if
there are four education levels for each of the two genders.
Choice probabilities vary only over the segments, not over
individuals within each segment.
Aggregate outcome variables can be estimated by calculating the choice
probability for each segment and taking the weighted sum of these probabilities.
Forecasting
Procedures described earlier for aggregate variables are applied.
The exogenous variables and/or the weights are adjusted to reflect changes that
are anticipated over time.
Recalibration of constants
To reflect that unobserved factors are different for the forecast area or year.
An iterative process is used to recalibrate the constants.
α0j: The estimated alternative-specific constant for alternative j.
Sj: Share of agents in the forecast area that choose alternative j in base year.
Predicted share: the share of decision makers in the forecast area predicted to choose
each alternative.
Adjust the alternative-specific constants so that the predicted shares move toward the actual shares.
With the new constants, predict the share again, compare with the actual
shares, and if needed adjust the constants again.