Testing for Random Utility Maximization

by J. Scott Shonkwiler and Darek J. Nalle*
The use of multi-response qualitative response models is prevalent in
all fields of microeconomic inquiry. The most common formulation is the
conditional logit model which has been widely embraced to represent qualitative response variables arising from the choice of a single alternative from a set of J
alternatives. Extensions of the conditional logit model typically are chosen to better
account for more flexible patterns of substitution among the J alternatives. Such extensions comprise the nested logit, generalized nested logit, mixed multinomial logit,
and the multinomial probit estimators. Fundamental to these estimators is the maintained assumption of random utility maximization. While random utility maximization
might be a desirable construct in terms of generating utility theoretic welfare
measures, its imposition may result in a misspecified estimator of the data-generating
process.
Most tests of the conditional logit model have been concerned with detecting
violations of the independence from irrelevant alternatives assumption. Curiously,
little work has been directed at inferring consistency with random utility maximization in qualitative response models as contrasted to the extensive literature on testing
for optimizing behavior of agents in the allocation of continuous choice variables. A
principal reason for the lack of testing the random utility maximization hypothesis has
been a dearth of alternative qualitative choice models that are well defined, flexible,
and unconstrained by certain restrictions of random utility maximization. In particular, to test the Hotelling symmetry condition of random utility maximization, the
universal logit and the dogit models have been proposed as schemes for incorporating
the characteristics of other alternatives in the choice model of a given alternative. Yet
the universal logit model is typically computationally infeasible in its general form
and the dogit model is fairly inflexible (McFadden 1984, p. 1415; McFadden 1981,
footnote 33).
We propose the multinomial discrete normal joint probability mass function as a
flexible alternative to the set of statistical models consistent with random utility maximization. We will show that the multinomial discrete normal is a member of the exponential family and that it nests the conditional logit model. We will discuss its
parametric identification and describe how it fails to satisfy the conditions required
*
J. Scott Shonkwiler and Darek J. Nalle are Professor and Assistant Professor, respectively, at the
Department of Resource Economics, University of Nevada, Reno. This chapter has benefited from
discussions with Alan Ker, Klaus Moeltner, Kenneth Train, and Roger von Haefen. Work was
supported in part by the Nevada Agricultural Experiment Station and NRI Grant No. 2002-01815.
ESSAYS IN HONOR OF STANLEY R. JOHNSON
for random utility maximization. We will consider more flexible alternatives to the
conditional logit and adopt tests for discriminating between the multinomial discrete
normal and conditional logit specifications. We then implement our approach by analyzing a data set of recreational visits to reservoirs in the Columbia River Basin. First,
however, we will review the key features of random utility maximization for the
linear in income case and illustrate the requirements of random utility maximization
for the conditional logit model.
Random Utility Modeling
Assume that a relevant population chooses from i = 1,2,..., J mutually
exclusive alternatives on a given choice occasion. Let an indirect utility function
represent the individual's (n = 1,2,..., N) level of satisfaction associated with the ith alternative, according to
$$U_{ni} = f_i(I_n - p_{ni}) + \beta' x_i + \varepsilon_{ni} = \mu_{ni} + \varepsilon_{ni},$$
where $I_n$ is money income, $p_{ni}$ is individual n's cost of obtaining the ith alternative, and $x$
is a vector of attributes associated with each alternative. Assume that the indirect utility
function can be defined to be linear and additive in income and attributes. The probability that the ith alternative (suppressing individual subscripts) yields maximum utility
across the J alternatives is given by
$$P_i = \operatorname{Prob}\left[U(I, p_i, x_i, \varepsilon_i) > U(I, p_h, x_h, \varepsilon_h)\right] \quad \text{for all } h \neq i.$$
Obtaining an operational estimator requires finding a joint distribution for the errors
such that evaluation of the probability above is not computationally burdensome.
Assume the errors are independently and identically distributed according to the joint
extreme value cumulative distribution function F(ε); then the probability of selecting the
ith alternative results in the multinomial logit model with probabilities given by
$$P_i = e^{\mu_i} \Big/ \sum_{h=1}^{J} e^{\mu_h}.$$
Under the random utility model $U_i = \mu_i + \varepsilon_i$ with ε distributed F(ε), $E[\max U] = \ln \sum_{h=1}^{J} e^{\mu_h} + E$, where E is Euler's constant, i.e., E = 0.577215665 (McFadden 1999).
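The two closed forms above can be sketched numerically. This is a minimal illustration, not the authors' code: the function names and the utility values in `mu_example` are hypothetical, and the max-subtraction is only a standard overflow guard that leaves the probabilities unchanged.

```python
import numpy as np

def logit_probabilities(mu):
    """Logit probabilities P_i = exp(mu_i) / sum_h exp(mu_h)."""
    mu = np.asarray(mu, dtype=float)
    z = np.exp(mu - mu.max())  # subtract the max to avoid overflow
    return z / z.sum()

def expected_max_utility(mu):
    """E[max U] = ln(sum_h exp(mu_h)) + Euler's constant (iid extreme value errors)."""
    mu = np.asarray(mu, dtype=float)
    m = mu.max()
    return m + np.log(np.exp(mu - m).sum()) + np.euler_gamma

# Hypothetical systematic utilities for three alternatives.
mu_example = np.array([0.5, -0.2, 1.0])
probs = logit_probabilities(mu_example)
emax = expected_max_utility(mu_example)
```

With equal utilities the probabilities are uniform and the expected maximum reduces to ln J plus Euler's constant, which is a convenient sanity check.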
In the additive income random utility setup, µni is typically parameterized according to
$$\mu_{ni} = \beta_p p_{ni} + \beta' x_{ni}.$$
In this formulation the parameters do not vary across alternatives; therefore, the
statistical model is termed the conditional logit under the iid extreme value assumption
for the $\varepsilon_h$. It is important to note that the estimator has an equivalent representation in either utility levels or utility differences, i.e., the translation invariance property. For example, in the two-alternative case, the probability of alternative one
may be written in levels as
$$P_{n1} = \exp(\mu_{n1}) \big/ \left[\exp(\mu_{n1}) + \exp(\mu_{n2})\right]$$
or, if instead we consider the differences in the systematic components of utility, this
probability may be written as
$$P_{n1} = \exp(\mu_{n1} - \mu_{n2}) \big/ \left[\exp(\mu_{n1} - \mu_{n2}) + \exp(\mu_{n2} - \mu_{n2})\right],$$
which is estimably equivalent to the former expression. As Hanemann (1999) notes, it is
this characteristic that is necessary for an estimator to be consistent with random utility
maximization (RUM). Violation of this condition can occur when the joint distribution
function of ε depends in some manner on one or more elements of µ (Koning and
Ridder 2003). The literature provides no test of this condition.
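Translation invariance of the logit can be checked directly. The sketch below uses hypothetical utility values and an illustrative helper name; it only confirms that levels, differences, and a common shift all give the same probabilities.

```python
import numpy as np

def logit_probabilities(mu):
    """Logit probabilities computed from systematic utilities mu."""
    mu = np.asarray(mu, dtype=float)
    z = np.exp(mu - mu.max())
    return z / z.sum()

mu = np.array([1.2, 0.4])                        # hypothetical utilities, two alternatives
p_levels = logit_probabilities(mu)               # utility levels
p_differences = logit_probabilities(mu - mu[1])  # utilities relative to alternative 2
p_shifted = logit_probabilities(mu + 7.0)        # any common shift leaves P unchanged
```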
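A symmetry restriction imposed by RUM is that $\partial P_{ni}/\partial p_{nj} = \partial P_{nj}/\partial p_{ni}$ (McFadden
1981; Daly and Zachary 1978). This relationship follows from applying Roy's Identity
to a proper indirect utility function. For this restriction, which Daly and Zachary term
the Hotelling condition, we have in the MNL (multinomial logit) case, for i ≠ j,
$$\frac{\partial P_{ni}}{\partial p_{nj}} = \frac{\partial P_{ni}}{\partial \mu_{nj}} \frac{\partial \mu_{nj}}{\partial p_{nj}} = -P_{ni} P_{nj} \frac{\partial \mu_{nj}}{\partial p_{nj}} = -P_{ni} P_{nj} \beta_p.$$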
This condition can be tested empirically by allowing µh to depend on prices other
than ph.
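As an illustration of the Hotelling condition in the logit case, the following sketch builds the full matrix of price effects and compares it with a finite-difference approximation. The prices and the value of `beta_p` are hypothetical, and only the price term of the systematic utility is included.

```python
import numpy as np

def logit_probabilities(mu):
    mu = np.asarray(mu, dtype=float)
    z = np.exp(mu - mu.max())
    return z / z.sum()

def logit_price_effects(prices, beta_p):
    """Matrix with (i, j) element dP_i/dp_j = beta_p * (1[i=j] * P_i - P_i * P_j)."""
    p = logit_probabilities(beta_p * prices)
    return beta_p * (np.diag(p) - np.outer(p, p))

prices = np.array([10.0, 25.0, 18.0, 30.0])  # hypothetical travel costs
beta_p = -0.07                               # hypothetical price coefficient
analytic = logit_price_effects(prices, beta_p)

# Central finite differences as an independent check of the analytic matrix.
step = 1e-5
numeric = np.empty((4, 4))
for j in range(4):
    up, down = prices.copy(), prices.copy()
    up[j] += step
    down[j] -= step
    numeric[:, j] = (logit_probabilities(beta_p * up)
                     - logit_probabilities(beta_p * down)) / (2 * step)
```

The analytic matrix is symmetric, and with a negative price coefficient every cross effect is positive, which is the sign restriction the text later contrasts with the multinomial discrete normal.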
Finally, the non-negativity requirement is that each of the j ≤ J−1 mixed partials of $P_i$
with respect to the components of µ, excluding $\mu_i$, have signs given by
$$(-1)^j \frac{\partial^j P_i}{\partial \mu_1 \cdots \partial \mu_{i-1}\, \partial \mu_{i+1} \cdots \partial \mu_j} \geq 0.$$
This guarantees that the implied density function is non-negative (Daly and Zachary
1978) and also has the interpretation that if all other alternatives become more attractive,
the probability of choosing the ith alternative cannot increase (Koning and Ridder 2003).
Borsch-Supan (1990) examines the non-negativity condition because its violation can
arise in the estimation of nested logit models. He introduces the distinction between
global compatibility versus local compatibility with the conditions for random utility
maximization and suggests that for the non-negativity condition the Daly-Zachary–
McFadden conditions for the validity of random utility maximization may be unnecessarily strong. Borsch-Supan develops a weaker validity condition which depends on
the magnitudes of the systematic components of utility (see Herriges and Kling, 1996,
for an extension of this work).
Koning and Ridder (2003) generalize the relationships between global and local
compatibility under stochastic utility maximization. In an empirical application their
tests suggest that even with the fewest maintained assumptions the hypothesis of
random utility maximization cannot be supported. In fact, they claim that the conditions for stochastic utility maximization are much stronger than those underpinning
models of a representative agent facing a continuous choice problem. However, they
do not develop parametric tests for the key assumptions of random utility maximization. To accomplish this, a more general model that nests the conditional logit but is
not bound by the random utility maximization conditions would be needed. We
propose the multinomial discrete normal distribution for this purpose.
Kemp’s Discrete Normal Distribution
This study investigates a new model for multinomial data based on the
discrete normal distribution. The discrete normal distribution, introduced by
Kemp (1997), has support on all integers, not just non-negative integers. The
intriguing characteristic of this distribution is its generalization to the multivariate
discrete normal distribution. The discrete normal distribution is at the center of the proposed estimator and has recently been reparameterized by Szablowski (2001) and
Shonkwiler (2001) so that location and scale parameters are estimated directly. This
resultant probability mass function is easily shown to be a member of the exponential
family.
Kemp (1997) recognized that the discrete normal distribution (i) is analogous to
the normal distribution in that it is the only two-parameter discrete distribution on
(−∞, ∞) for which the first two moment equations are the maximum-likelihood equations; (ii) is log-concave (Johnson, Kotz, and Balakrishnan 1997, p. 27); and (iii) has
either a single mode or a joint mode at y and y−1. Unfortunately, Kemp's characterization does not permit closed form expressions of the mean and variance of Y. The
discrete normal may be represented as
$$P(Y = y) = \frac{e^{-.5 (y - \mu)^2 / \sigma^2}}{\sum_{Y} e^{-.5 (Y - \mu)^2 / \sigma^2}}$$
(Szablowski 2001; Shonkwiler 2001), such that E(Y) ≈ µ and V(Y) ≈ $\sigma^2$.
While there do not exist closed form expressions for the mean and variance, the
location and scale parameters, µ and $\sigma^2$, are the approximate mean and variance
respectively. We say approximate because problems arise when $\sigma^2 \to 0$. Essentially, in
these situations, the available points of support are not sufficient to reproduce a mass
function with mean and variance equal to µ and $\sigma^2$ respectively (see also Szablowski
2001). For $\sigma^2 > 1$, µ and $\sigma^2$ can effectively be considered the mean and variance of
the discrete normal distribution (with accuracy better than about $10^{-4}$ using the bounds
provided by Szablowski).
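The approximation can be seen numerically. The sketch below evaluates the mass function on a wide but finite grid of integers (the truncation at `half_width` is a computational assumption, not part of the distribution) and compares the implied moments with µ and σ² for one large and one small scale value; the parameter values are hypothetical.

```python
import numpy as np

def discrete_normal_pmf(mu, sigma2, half_width=200):
    """Discrete normal mass function on integers within half_width of mu."""
    center = int(round(mu))
    support = np.arange(center - half_width, center + half_width + 1)
    log_kernel = -0.5 * (support - mu) ** 2 / sigma2
    weights = np.exp(log_kernel - log_kernel.max())
    return support, weights / weights.sum()

def moments(support, pmf):
    mean = (support * pmf).sum()
    variance = ((support - mean) ** 2 * pmf).sum()
    return mean, variance

# Scale parameter above one: location and scale behave like mean and variance.
support, pmf = discrete_normal_pmf(mu=2.3, sigma2=4.0)
mean, variance = moments(support, pmf)

# Scale parameter near zero: too few support points carry mass.
support_small, pmf_small = discrete_normal_pmf(mu=2.3, sigma2=0.05)
mean_small, variance_small = moments(support_small, pmf_small)
```

In the second case nearly all mass sits on the integer 2, so the mean is pulled away from 2.3 and the variance falls well below 0.05.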
Multinomial Discrete Normal Distribution
The univariate discrete normal distribution can be generalized to the multivariate
case (Shonkwiler 2001) using the representation
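$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_m = y_m) = \frac{\exp\left(-.5 (\mathbf{y} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{y} - \boldsymbol{\mu})\right)}{\sum_{Y_m} \cdots \sum_{Y_2} \sum_{Y_1} \exp\left(-.5 (\mathbf{Y} - \boldsymbol{\mu})' \Sigma^{-1} (\mathbf{Y} - \boldsymbol{\mu})\right)}, \qquad y_i = \ldots, -2, -1, 0, 1, 2, \ldots \ \forall i,$$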
where Σ is a positive (semi) definite symmetric matrix and quantities in bold denote
the corresponding m-element vectors. Empirical application of this distribution can be
computationally intensive because the entire support of the distribution enters the
normalizing factor. We see that this problem can be mitigated under certain types of
truncation of the distribution. When each $y_j$ is truncated such that $y_h = 1$ and $y_{j \neq h} = 0$,
the support of the multivariate discrete normal for multinomial data with J alternatives is an identity matrix of dimension J. As a consequence, the application of the
multinomial discrete normal requires relatively few computations to obtain a normalizing factor. In contrast, the multinomial probit model requires integration of order
J-1 (Hausman and Wise 1978).
The multinomial discrete normal has several attractive properties. First, each
conditioning variable can exhibit both positive and negative marginal cross effects,
unlike the multinomial logit, which restricts all cross effects of a conditioning variable to have the same sign. Second, the multivariate binary discrete normal coincides
with the MNL when Σ is an identity matrix (see below). Third, while the multinomial
probit can accommodate both positive and negative correlations, it is typically more
computationally intensive than the multinomial discrete normal distribution.
The multinomial discrete normal joint probability mass function is
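$$P(Y_1 = y_1, Y_2 = y_2, \ldots, Y_J = y_J) = \frac{\exp\left(-.5 (\mathbf{y} - \boldsymbol{\mu})' \Sigma^{+} (\mathbf{y} - \boldsymbol{\mu})\right)}{\sum_{\substack{Y_J = 1 \\ Y_{j \neq J} = 0}} \cdots \sum_{\substack{Y_2 = 1 \\ Y_{j \neq 2} = 0}} \sum_{\substack{Y_1 = 1 \\ Y_{j \neq 1} = 0}} \exp\left(-.5 (\mathbf{Y} - \boldsymbol{\mu})' \Sigma^{+} (\mathbf{Y} - \boldsymbol{\mu})\right)}.$$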
This form, however, obscures the density’s relationship to the multinomial. This leads to
Proposition 1. The likelihood for the nth observation may be written as
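$$\prod_{j=1}^{J} P(y_{nj} = 1)^{y_{nj}},$$
such that
$$P(y_{nj} = 1) = \frac{\exp\left(\boldsymbol{\mu}_n' [\Sigma^{+}]_{\cdot j} - .5 [\Sigma^{+}]_{jj}\right)}{\sum_{h=1}^{J} \exp\left(\boldsymbol{\mu}_n' [\Sigma^{+}]_{\cdot h} - .5 [\Sigma^{+}]_{hh}\right)},$$
where $[\Sigma^{+}]_{jj}$ denotes the jth diagonal element of $\Sigma^{+}$, and $[\Sigma^{+}]_{\cdot j}$ denotes the jth column of $\Sigma^{+}$.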
Proposition 2. When Σ = I the multinomial discrete normal and multinomial logit
models coincide. Proof in Appendix.
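Both propositions can be checked numerically. The sketch below is illustrative only: it reads Σ⁺ as the Moore-Penrose generalized inverse (an assumption, since this section does not define the symbol), and the function names, `mu`, and the matrix `a` used to build Σ are hypothetical.

```python
import numpy as np

def mdn_probabilities(mu, sigma):
    """Proposition 1 form: P_j proportional to exp(mu' S_.j - .5 S_jj), with S = pinv(sigma)."""
    mu = np.asarray(mu, dtype=float)
    s_plus = np.linalg.pinv(np.asarray(sigma, dtype=float))  # assumed meaning of Sigma^+
    index = mu @ s_plus - 0.5 * np.diag(s_plus)
    w = np.exp(index - index.max())
    return w / w.sum()

def mdn_probabilities_direct(mu, sigma):
    """Same probabilities from the quadratic form evaluated at each unit vector."""
    mu = np.asarray(mu, dtype=float)
    s_plus = np.linalg.pinv(np.asarray(sigma, dtype=float))
    q = np.array([-0.5 * (e - mu) @ s_plus @ (e - mu) for e in np.eye(mu.size)])
    w = np.exp(q - q.max())
    return w / w.sum()

def logit_probabilities(mu):
    mu = np.asarray(mu, dtype=float)
    z = np.exp(mu - mu.max())
    return z / z.sum()

mu = np.array([0.3, -0.4, 0.8])                    # hypothetical systematic components
a = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.1, 0.0],
              [0.3, -0.2, 0.9]])                   # hypothetical lower-triangular factor
sigma = a @ a.T

p_prop1 = mdn_probabilities(mu, sigma)
p_direct = mdn_probabilities_direct(mu, sigma)
p_identity = mdn_probabilities(mu, np.eye(3))
```

The agreement of the two routines reflects the cancellation of the common term in µ from numerator and denominator; with Σ = I the index collapses to µ minus a constant, which is why the logit is recovered.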
Identification in the Discrete Normal Multinomial Model
Shonkwiler and Nalle (2003) discuss the conditions necessary for identifying the
elements of Σ and µn. Typically the jth element of the vector µn will be parameterized
as µnj = xnjβ so that both the elements of the β vector and the (potentially) J(J+1)/2
unique elements of Σ will be estimated. For the J>2 case, the identification requirement is that one diagonal element of Σ be normalized to unity and one off-diagonal
element be normalized to zero (Shonkwiler and Nalle). We choose to require that
Σ11 = 1 and Σ21 = 0. One reason for such a choice is that if the Cholesky decomposition
is employed to form Σ = AA' (where A is a lower triangular matrix), then the identification requirement is satisfied by imposing A11 = 1 and A21 = 0. In terms of identifying the elements of β, we consider the restrictions that identify the full multinomial
model. If there are K variables that change over observations and alternatives, then
(J-1)K parameters can be identified along with J-1 constant terms. As J increases, the
number of over-identifying restrictions increases as well. Consequently the model is
distinct from the universal logit.
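One way to impose the normalization through the Cholesky factor is sketched below. The function name and the ordering of the free parameters (row by row through the lower triangle) are illustrative choices, and the numbers in `theta` are hypothetical.

```python
import numpy as np

def sigma_from_free_parameters(theta, n_alternatives):
    """Build Sigma = A A' with A lower triangular, A11 = 1 and A21 = 0.

    theta holds the J(J+1)/2 - 2 free elements of A, ordered row by row.
    """
    J = n_alternatives
    theta = np.asarray(theta, dtype=float)
    n_free = J * (J + 1) // 2 - 2
    if theta.size != n_free:
        raise ValueError(f"expected {n_free} free parameters, got {theta.size}")
    A = np.zeros((J, J))
    rows, cols = np.tril_indices(J)
    free = iter(theta)
    for r, c in zip(rows.tolist(), cols.tolist()):
        if (r, c) == (0, 0):
            A[r, c] = 1.0          # normalization A11 = 1
        elif (r, c) == (1, 0):
            A[r, c] = 0.0          # normalization A21 = 0
        else:
            A[r, c] = next(free)
    return A @ A.T

# Hypothetical free parameters for J = 4 alternatives (8 values).
theta = [1.05, 0.9, -0.1, 1.3, 0.4, 0.7, 0.1, 1.1]
sigma = sigma_from_free_parameters(theta, 4)
```

Working with A rather than Σ keeps Σ positive semidefinite for any real parameter values, which is convenient inside an unconstrained optimizer.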
Violation of the Daly-Zachary–McFadden Conditions
Because the conditional logit is nested within the multinomial discrete normal
model, we show those conditions of stochastic utility maximization that may not be
satisfied in its general form. Consider the fundamental case of model equivalence in
terms of utility differences. In the two-alternative case,
$$P_{n1} = \frac{\exp(\mu_{n1} - 0.5)}{\exp(\mu_{n1} - 0.5) + \exp\left(\mu_{n2} s_{22}^{-1} - 0.5 s_{22}^{-2}\right)},$$
and only if s22 = 1 can the probability be equivalently written in terms of differences of
the µh. Consequently the multinomial discrete normal distribution does not satisfy translation invariance. The symmetry (Hotelling) property in the multinomial discrete normal
case is
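$$\frac{\partial P_{ni}}{\partial p_{nj}} = \frac{\partial P_{ni}}{\partial \mu_{nj}} \frac{\partial \mu_{nj}}{\partial p_{nj}} = P_{ni} \left(\Sigma^{ij} - P_n' \Sigma^{\cdot j}\right) \frac{\partial \mu_{nj}}{\partial p_{nj}} = P_{ni} \left(\Sigma^{ij} - P_n' \Sigma^{\cdot j}\right) \beta_p,$$
where $\Sigma^{\cdot j}$ denotes the jth column of $\Sigma^{-1}$, $\Sigma^{ij}$ denotes its (i, j) element, and $P_n$ denotes the J×1 vector of probabilities.1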
In general, there is no symmetry of cross-price effects, nor are cross-price effects restricted to be positive as in the conditional logit model. Thus the non-negativity condition may not be satisfied as well.
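The loss of symmetry can be illustrated with a small numerical example. The sketch below assumes a positive definite Σ (so that the ordinary inverse applies), uses the probability form from Proposition 1, and checks the analytic derivative matrix against central finite differences; `mu`, `sigma`, and `beta_p` are hypothetical values.

```python
import numpy as np

def mdn_probabilities(mu, sigma):
    """P_j proportional to exp(mu' S_.j - .5 S_jj), with S the inverse of sigma."""
    s_inv = np.linalg.inv(sigma)
    index = mu @ s_inv - 0.5 * np.diag(s_inv)
    w = np.exp(index - index.max())
    return w / w.sum()

def mdn_mu_effects(mu, sigma):
    """Matrix with (i, j) element dP_i/dmu_j = P_i * (S_ij - P' S_.j)."""
    s_inv = np.linalg.inv(sigma)
    p = mdn_probabilities(mu, sigma)
    return p[:, None] * (s_inv - (p @ s_inv)[None, :])

def numeric_mu_effects(mu, sigma, step=1e-5):
    """Central finite-difference approximation to the same matrix."""
    out = np.empty((mu.size, mu.size))
    for j in range(mu.size):
        up, down = mu.copy(), mu.copy()
        up[j] += step
        down[j] -= step
        out[:, j] = (mdn_probabilities(up, sigma)
                     - mdn_probabilities(down, sigma)) / (2 * step)
    return out

mu = np.array([0.2, -0.5, 0.4])            # hypothetical systematic components
sigma = np.array([[1.0, 0.0, 0.3],
                  [0.0, 1.2, -0.1],
                  [0.3, -0.1, 1.5]])       # hypothetical positive definite Sigma
beta_p = -0.07                             # hypothetical price coefficient

effects = mdn_mu_effects(mu, sigma)
price_effects = beta_p * effects           # dP_i/dp_j when mu_j = beta_p * p_j + ...
effects_identity = mdn_mu_effects(mu, np.eye(3))
```

Each column of the effects matrix still sums to zero because the probabilities sum to one, but the matrix is symmetric only in the identity case.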
Generalizing and Testing the MNL Model
Extensions of the conditional logit model typically are chosen to better
account for more flexible patterns of substitution among the J alternatives and
to relax the independence from irrelevant alternatives property which the MNL model
possesses. Such models comprise the nested logit, generalized nested logit, mixed
multinomial logit, and the multinomial probit estimators. Generally these models can
be made consistent with requirements of random utility maximization. But specification issues arise in the case of nested logit models due to the lack of theory to guide
the choice of nesting structures; typically the selection of the nesting structure is
somewhat arbitrary. Similarly, generalizations which may violate the conditions of
RUM, such as the universal logit, dogit, and approximate generalized extreme value
(GEV) models (Small 1994), have forms that are subject to the analyst's tastes as
opposed to being dictated by theory or convention.
1
More generally the symmetry condition relates to the systematic effects and requires that $\partial P_{ni}/\partial \mu_{nj} = \partial P_{nj}/\partial \mu_{ni}$, a property not maintained by the multinomial discrete normal. This condition is satisfied in the
conditional logit when attributes of the other h ≠ i alternatives do not enter $\mu_i$. This assumption is known as
weak complementarity.
In contrast, the mixed (or random) parameters logit model operates with the same
specification of the systematic component of utility as the MNL and introduces
flexibility by randomizing the parameters of the conditioning variables. Consequently
the IIA property does not hold. According to McFadden and Train (2000), mixed
multinomial logit can “closely approximate a very broad class of RUM models”
(p. 451). Additionally, it can be considered as an error components type model and
thus is attractive for modeling panel data (Train 2003). And unlike generalized
extreme value RUMs, mixed multinomial logit models are nested relative to multinomial discrete normal models with identical random parameters. We will exploit this
property in testing the random utility hypothesis implied by mixed logit models as
well as the constant parameter logit model.
In the empirical application, we entertain parameters of the form $\beta_{kn} = \beta_k + \Gamma_{k\cdot} \eta_n$,
where Γ is a lower triangular matrix of dimension K×K and η denotes a standard normal random error vector. If Γ is a diagonal matrix, then the random parameters are
uncorrelated. For the mixed logit, the probability of the jth alternative for the nth individual is given by
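$$P_{nj} = \int \left( e^{\mu(\beta)_{nj}} \Big/ \sum_{h} e^{\mu(\beta)_{nh}} \right) \phi(\beta, \Gamma\Gamma')\, d\beta = \int L(\beta)_{nj}\, \phi(\beta, \Gamma\Gamma')\, d\beta$$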
(Train 2003) when the β's are assumed normally distributed with mean vector β and
variance-covariance ΓΓ'. The estimator is implemented by taking R draws and evaluating $\hat{P}_{nj} = \frac{1}{R} \sum_{r=1}^{R} L(\beta_r)_{nj}$, j = 1,2,…,J, for each observation. The $\hat{P}_{nj}$'s are used to form
the maximum simulated likelihood. Estimation of the random parameter multinomial
discrete normal is performed analogously.
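The simulation step for a single observation can be sketched as follows. This is a minimal illustration rather than the authors' GAUSS code: the attribute matrix `x`, the parameter values, the number of draws, and the seed are all hypothetical.

```python
import numpy as np

def logit_rows(mu):
    """Row-wise logit probabilities for an (R, J) array of utilities."""
    z = np.exp(mu - mu.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def simulated_mixed_logit_probs(x, beta_mean, gamma, n_draws=500, seed=0):
    """Average logit probabilities over draws beta_r = beta_mean + Gamma @ eta_r.

    x is (J, K): K attributes for each of J alternatives.
    """
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n_draws, beta_mean.size))
    betas = beta_mean + eta @ gamma.T      # (R, K); covariance of the draws is Gamma Gamma'
    mu = betas @ x.T                       # (R, J) systematic utilities per draw
    return logit_rows(mu).mean(axis=0)

x = np.array([[10.0, -2.0],
              [25.0, 0.0],
              [18.0, -5.0]])               # hypothetical price and water-level deviation
beta_mean = np.array([-0.1, 0.02])         # hypothetical parameter means
gamma = np.array([[0.05, 0.0],
                  [0.01, 0.02]])           # hypothetical lower-triangular Gamma

p_sim = simulated_mixed_logit_probs(x, beta_mean, gamma)
```

Setting Γ to a zero matrix removes the parameter heterogeneity, and the routine then returns the constant-parameter logit probabilities.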
Testing Strategy
Given that either constant or random parameter conditional logit models are
nested relative to constant or random parameter multinomial discrete normal models,
our tests of compatibility of the random utility restrictions embodied by the logit
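models rest on the hypotheses that $\Sigma_{22} = \Sigma_{33} = \cdots = \Sigma_{JJ} = 1$ and $\Sigma_{ij} = 0$ for all i ≠ j other than the element $\Sigma_{21}$, which is already normalized to zero.
This yields a total of ½ J(J+1)−2 restrictions. Under the assumption that the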
multinomial discrete normal distribution characterizes the data-generating process,
then likelihood ratio tests may be used to discriminate between the two models.
Under misspecification of the data-generating process, however, robust tests of the
restrictions associated with the conditional logit model are obtained by using the
Wald test given in Theorem 3.4 of White (1982). We will present both tests.
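A Wald statistic built on a sandwich covariance matrix can be sketched as below. This is an assumption-laden illustration of the general technique, not a transcription of White's theorem: the function names are hypothetical, `scores` is taken to hold per-observation gradients of the log likelihood, `hessian` is the summed Hessian, and the numbers in the example are made up.

```python
import numpy as np

def n_restrictions(n_alternatives):
    """Number of restrictions that reduce the discrete normal to the logit."""
    J = n_alternatives
    return J * (J + 1) // 2 - 2

def sandwich_covariance(scores, hessian):
    """Robust covariance H^{-1} (S'S) H^{-1} from (N, P) scores and a (P, P) Hessian."""
    h_inv = np.linalg.inv(hessian)
    return h_inv @ (scores.T @ scores) @ h_inv

def wald_statistic(theta, cov, R, r):
    """W = (R theta - r)' [R cov R']^{-1} (R theta - r) for linear restrictions R theta = r."""
    diff = R @ theta - r
    return float(diff @ np.linalg.solve(R @ cov @ R.T, diff))

# Hypothetical example: two variance parameters tested against unity.
theta_hat = np.array([1.2, 0.9])
cov_hat = np.diag([0.04, 0.01])
W = wald_statistic(theta_hat, cov_hat, np.eye(2), np.ones(2))
```

Under the null the statistic is compared with a chi-square distribution whose degrees of freedom equal the number of restrictions, which is 8 for the four-alternative application.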
Empirical Application
The data used to estimate the model were obtained from a subset of data
collected using a mail survey of a sample of individuals who live in the Pacific
Northwest. The larger data set was developed for use in examination of water reallocation policy issues (see Callaway et al. 1995), the most important being related to
flushing salmon smolts down the Columbia River from spawning areas. The survey
questionnaire focuses mainly on reservoirs on the Columbia, and we select four such
reservoirs as destinations for the analysis: Lake Roosevelt (behind Grand Coulee Dam),
Dworshak, Lower Granite, and Lake Pend Oreille. We use only the actual behavior data
(as opposed to contingent behavior data, which were also collected) for the analysis
below. Respondents reported their visits each summer month to each of the four reservoirs. For this analysis, 234 randomly sampled respondents reported a total of 1,396
trips to these reservoirs.
Because we use travel costs to sites as a proxy for the price of a trip, one can
interpret the model below as a recreation demand, or travel cost model (Shonkwiler
1999; Shonkwiler and Shaw 2003), and, as such, satisfaction of the requirements of
optimizing behavior is necessary if conventional measures of welfare changes are to
be valid. In addition to the travel cost variable, which mainly drives the empirical
model, we also use the monthly average deviation of each water level away from its
full pool level (e.g., a negative ten means ten feet below full pool). We recognize that
our sample contains different types of recreators (e.g., anglers, water skiers, people
picnicking, etc.), and so we have no a priori expectation on the direction of influence
this variable should have on trips. One might hypothesize that shore-oriented users
prefer more shore to be available, suggesting a negative coefficient on a deviation
below full pool, and the reverse might be true for boaters who may prefer high water.
Constant Parameter Estimation Results
Under the assumption that the multiple trips made by respondents are independent
of each other, a conventional multinomial model was estimated with the systematic
component of utility modeled for the nth observation on the jth site as $\mu_{nj} = \beta_p p_{nj} + \beta_d\, dev_{nj} + \beta_3 d_{3j} + \beta_4 d_{4j}$, where p represents travel cost, dev represents deviation from
full pool, and the $d_{ij}$, $d_{ij} = I(i=j)$, are constants representing the Lower Granite and
Lake Pend Oreille sites, respectively. Although the model is not fully saturated, it is
nearly so because dev has only 8 unique values (some reservoir levels did not change
every month). Table 1 reports the parameter estimates and associated robust (White
1982) standard errors for the multinomial model under independence. Two multinomial discrete normal counterparts were also estimated under the independence
assumption. Table 2 reports the results for a model in which Σ is restricted to be
diagonal, and Table 3 provides estimates for the model when Σ is unconstrained
(aside from those restrictions required for identification). In each table the average

Table 1. Multinomial Logit. Log Likelihood: -740.30

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.06908    0.00282      -24.49460
βd           0.00849    0.00408        2.08096
β3          -1.12527    0.15292       -7.35832
β4           0.55066    0.08601        6.40226

Average Price Effects Matrix
         ∂π1       ∂π2       ∂π3       ∂π4
∂p1    -0.0062    0.0004    0.0006    0.0052
∂p2     0.0004   -0.0040    0.0031    0.0005
∂p3     0.0006    0.0031   -0.0042    0.0005
∂p4     0.0052    0.0005    0.0005   -0.0061

price effects matrix is included to provide information regarding the sensitivity of
E[Yi] = πi to changes in own and cross prices.
In the independent multinomial case we can test the restrictions necessary for
random utility maximization by considering the estimated elements of the Σ matrix.
For the model reported in Table 2, a test of RUM is that Σjj = 1 (j=2,3,4).2 The robust
Wald test (White 1982) and likelihood ratio tests of this restriction yield test statistics
of 42.90 (p=.000) and 43.52 (p=.000), respectively. A joint test that Σjj = 1 (j=2,3,4)
and Σij = 0, i≠j, determined from the model reported in Table 3, produces robust Wald
and likelihood ratio statistics of 1179.8 (p=.000) and 125.18 (p=.000), respectively.
Therefore, we conclude that under the hypothesis of independence with a constant-parameters specification, the data-generating process is not consistent with the logit
specification.
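The likelihood ratio statistics quoted above follow directly from the log likelihoods reported in Tables 1 to 3. The sketch below recomputes them using only the standard library; the helper `chi2_sf` is an illustrative implementation of the chi-square upper-tail probability (via the recursion for the regularized incomplete gamma function), not a routine from the source.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square(df) variate.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) with z = x / 2,
    starting from Q(1, z) = exp(-z) (even df) or Q(1/2, z) = erfc(sqrt(z)) (odd df).
    """
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))
    while a < df / 2.0 - 1e-12:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

# Log likelihoods as reported in Tables 1, 2 and 3.
ll_logit, ll_mdn_diagonal, ll_mdn_full = -740.30, -718.54, -677.71

lr_diagonal = 2.0 * (ll_mdn_diagonal - ll_logit)   # 3 restrictions: Sigma_jj = 1, j = 2, 3, 4
lr_full = 2.0 * (ll_mdn_full - ll_logit)           # 8 restrictions in total
p_diagonal = chi2_sf(lr_diagonal, 3)
p_full = chi2_sf(lr_full, 8)
```

Both p-values are far below any conventional significance level, in line with the p=.000 entries in the text.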
Random Parameter Estimation Results
Of course the conclusion above is not particularly surprising considering that the
logit model is of a restrictive form and the panel nature of the data set has been
ignored. Given that most of the individuals in the data set make repeated choices,
allowing parameters that vary over individuals but not over choice situations for an
individual is a defensible notion as long as the individual’s tastes are stable over the
choice period. All the β’s then were assumed to be normally distributed, and a set of
2
In the diagonal Σ specification, note that $\partial P_i / \partial \mu_j = -P_i P_j / \sigma_{jj}$ for i ≠ j.

Table 2. Multinomial Discrete Normal with Diagonal Σ Matrix. Log Likelihood: −718.54

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.08799    0.00488      -18.04770
βd           0.01212    0.00547        2.21698
β3          -2.14192    0.36279       -5.90410
β4          -0.54365    0.24551       -2.21436
Σ22          1.06465    0.05921       17.97966
Σ33          1.21849    0.05240       23.25269
Σ44          1.31493    0.05265       24.97494

Average Price Effects Matrix
         ∂π1       ∂π2       ∂π3       ∂π4
∂p1    -0.0076    0.0003    0.0008    0.0064
∂p2     0.0003   -0.0046    0.0038    0.0005
∂p3     0.0006    0.0033   -0.0045    0.0006
∂p4     0.0049    0.0004    0.0005   -0.0058

1,500 standard normal draws for each of the 234 individuals was created. Specifically, the kth parameter for the nth individual making a choice in the tth period is $\beta_{knt} = \beta_k + \Gamma_{k\cdot} \eta_n$, where, as before, Γ is a lower triangular matrix and η denotes a standard
normal random error vector. When the individual makes multiple choices, the probabilities for each choice occasion are calculated from a draw of β. Then the products
of these probabilities over choice occasions are averaged (Train 2003, p. 150). The
same set of draws is used in estimating each of the random parameter models. The
maximum simulated likelihood estimator (Train, p. 148) was programmed in GAUSS.
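The panel version of the simulated likelihood, in which the product of an individual's choice probabilities is averaged over draws, can be sketched in Python as follows. This is an illustration of the procedure described above and not the GAUSS program; the array layout, function name, and the randomly generated example data are all hypothetical.

```python
import numpy as np

def panel_simulated_loglik(x, choices, person, beta_mean, gamma, eta):
    """Simulated log likelihood with parameters fixed within an individual.

    x       : (T, J, K) attributes for each choice occasion
    choices : (T,) index of the chosen alternative
    person  : (T,) individual index, 0..N-1
    eta     : (N, R, K) standard normal draws, held fixed across evaluations
    """
    n_people = eta.shape[0]
    betas = beta_mean + eta @ gamma.T                      # (N, R, K)
    loglik = 0.0
    for n in range(n_people):
        rows = np.flatnonzero(person == n)
        mu = np.einsum("tjk,rk->rtj", x[rows], betas[n])   # (R, T_n, J)
        mu -= mu.max(axis=2, keepdims=True)
        probs = np.exp(mu) / np.exp(mu).sum(axis=2, keepdims=True)
        chosen = probs[:, np.arange(rows.size), choices[rows]]  # (R, T_n)
        # Product over the individual's occasions, then average over draws.
        loglik += np.log(chosen.prod(axis=1).mean())
    return loglik

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3, 2))               # hypothetical: 5 occasions, 3 sites, 2 attributes
choices = np.array([0, 2, 1, 1, 0])
person = np.array([0, 0, 0, 1, 1])           # two individuals
beta_mean = np.array([-0.5, 0.3])
gamma = np.array([[0.4, 0.0], [0.0, 0.2]])   # diagonal Gamma: uncorrelated parameters
eta = rng.standard_normal((2, 50, 2))        # same draws reused for every model

ll_sim = panel_simulated_loglik(x, choices, person, beta_mean, gamma, eta)
```

Passing the same `eta` array to every model mirrors the text's use of a common set of draws, which keeps simulation noise from contaminating model comparisons.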
We first consider the case where the random coefficients are assumed uncorrelated (Γ is diagonal). The estimation results for the mixed logit model are presented in
Table 4. Note the substantial increase in log likelihood due to introducing the panel
structure via the random parameters. The variance-covariance matrix of the random
parameters is obtained from the expression ΓΓ'. Therefore the (absolute values of)
estimated Γ values may be interpreted as standard deviations since Γ is diagonal in
this case. Note that the coefficient mean on the deviation from full pool variable is not
significantly different from zero at conventional levels (p=.121) based on the robust
statistic; however, its associated standard deviation is significantly greater than zero
(p=.025), so this variable should remain in the model.
We contrast the results in Table 4 to the random parameter multinomial discrete
normal model reported in Table 5. Because of its identical random parameters

Table 3. Multinomial Discrete Normal Estimator. Log Likelihood: -677.71

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.03436    0.03000       -1.14525
βd           0.00372    0.00447        0.83251
β3          -1.21937    0.50556       -2.41191
β4          -0.10902    0.57583       -0.18932
Σ22          1.11504    0.18152        6.14266
Σ33          1.78413    0.38381        4.64850
Σ44          1.16697    0.25340        4.60529
Σ13          0.87635    0.52767        1.66080
Σ14          0.42366    0.18327        2.31173
Σ23         -0.08902    0.13482       -0.66030
Σ24          0.73195    0.21993        3.32816
Σ34          0.13016    0.09937        1.30983

Average Price Effects Matrix
         ∂π1       ∂π2       ∂π3       ∂π4
∂p1    -0.0081   -0.0030    0.0041    0.0070
∂p2    -0.0037   -0.0040    0.0030    0.0048
∂p3     0.0035    0.0019   -0.0031   -0.0024
∂p4     0.0064    0.0035   -0.0026   -0.0073

specification, the mixed logit is nested relative to it. The test that Σ = I yields robust
Wald and likelihood ratio statistics of 172.34 (p=.000) and 72.60 (p=.000),
respectively. Therefore it appears that the assumption of RUM implied by the mixed
logit model is inconsistent with the data-generating mechanism. A more flexible
mixed logit specification still exists, however – we can allow the random parameters
to be correlated. This introduces six more parameters. Table 6 provides the estimation
results for the correlated parameter mixed logit model. Since the parameter variance-covariance matrix is not estimated directly, the implied parameter correlation matrix
is also presented. The robust Wald and likelihood ratio tests of the hypothesis that $\Gamma_{ij}$
= 0, i≠j, yield test statistics of 78.64 (p=.000) and 24.26 (p=.000), respectively,
suggesting that the introduction of non-zero covariances among random parameters
produces a better model. This model is nested relative to the random parameter

Table 4. Mixed Multinomial Logit. Log Likelihood: -546.55

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.13156    0.01587       -8.28729
βd           0.01390    0.00897        1.54987
β3          -2.04596    0.77970       -2.62405
β4           0.62461    0.36965        1.68974
Γ11          0.06652    0.01480        4.49386
Γ22          0.02902    0.01476        1.96581
Γ33          2.96007    0.77072        3.84066
Γ44          2.51068    0.47783        5.25435

Average Price Effects Matrix
         ∂π1       ∂π2       ∂π3       ∂π4
∂p1    -0.0139    0.0012    0.0020    0.0107
∂p2     0.0012   -0.0083    0.0055    0.0016
∂p3     0.0020    0.0055   -0.0101    0.0026
∂p4     0.0107    0.0016    0.0026   -0.0148

discrete normal model, whose results are shown in Table 7. A test that Σ = I yields
robust Wald and likelihood ratio test statistics of 163.87 (p=.000) and 60.60 (p=.000),
respectively. Again, we conclude that the random utility maximization restrictions embedded in the correlated mixed logit model are not consistent with the data-generating
mechanism.
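The implied parameter correlation matrix reported with the correlated mixed logit can be recovered from the Γ estimates in Table 6 by forming ΓΓ' and rescaling. The short sketch below does this; the variable names are illustrative, and the Γ values are those reported in the table.

```python
import numpy as np

# Lower-triangular Gamma from Table 6 (rows: beta_p, beta_d, beta_3, beta_4).
gamma = np.array([
    [ 0.06854,  0.0,      0.0,      0.0],
    [-0.03049,  0.02358,  0.0,      0.0],
    [ 1.70795, -0.59317,  2.83812,  0.0],
    [ 0.91033, -2.39703, -0.58727, -0.15355],
])

covariance = gamma @ gamma.T                       # variance-covariance of the random parameters
std_dev = np.sqrt(np.diag(covariance))
correlation = covariance / np.outer(std_dev, std_dev)
```

To three decimals the lower triangle reproduces the reported correlations (for example, −0.791 between the price and deviation coefficients).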
Note that the multinomial discrete normal model is always more highly parameterized than the corresponding logit model, and under a poorly specified conditional
mean function one could argue that this over-parameterization works in favor of
rejecting RUM. But comparison of the correlated mixed logit with the uncorrelated
random parameter multinomial discrete normal makes the degrees-of-freedom critique less of an issue. Here the latter model has only two more parameters than the
former. Since the models are non-nested, we use Vuong’s (1989) test to discriminate
between the two. The z-score for testing the superiority of the random parameter
multinomial discrete normal is 5.00 (p=.000), providing strong evidence for its
selection.3
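A basic form of the Vuong statistic for non-nested models is sketched below. It is the unadjusted version (no correction for the difference in the number of parameters), which is an assumption since the text does not say which variant was used; the per-individual log-likelihood contributions in the example are hypothetical.

```python
import math
import numpy as np

def vuong_z(loglik_a, loglik_b):
    """Unadjusted Vuong z-score: sqrt(N) * mean(m) / sd(m), m_i = ll_a_i - ll_b_i.

    Positive values favor model a; under the null of equivalent models
    the statistic is asymptotically standard normal.
    """
    m = np.asarray(loglik_a, dtype=float) - np.asarray(loglik_b, dtype=float)
    return math.sqrt(m.size) * m.mean() / m.std(ddof=0)

def upper_tail_p(z):
    """One-sided standard normal p-value, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical per-individual log-likelihood contributions for two models.
ll_model_a = np.array([-1.0, -2.0, -1.5, -0.5])
ll_model_b = np.array([-1.5, -2.2, -2.5, -0.9])
z = vuong_z(ll_model_a, ll_model_b)
p_value = upper_tail_p(z)
```

For reference, a z-score of 5.00 as reported in the text corresponds to an upper-tail probability of roughly 3e-7 under the standard normal.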
3
We could also consider whether the uncorrelated random parameter discrete normal model is to be
preferred to its counterpart with correlated random parameters. The robust Wald and likelihood ratio
tests of the hypothesis that Γij = 0, i≠j, yield test statistics of 26.34 (p=.000) and 12.26 (p=.056),
respectively. Because of the conflicting criteria and the fact that the likelihood ratio test is not robust to
distributional misspecification, a robust Lagrange multiplier test of this hypothesis was formulated as a
score test. It produced a test statistic of 9.082 (p=.169). Given that there were no compelling reasons
for including random parameter covariances, we would likely choose the more parsimonious model.

Table 5. Random Parameter Multinomial Discrete Normal. Log Likelihood: −510.25

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.09999    0.01884       -5.30628
βd           0.02570    0.02113        1.21633
β3          -2.28663    1.42798       -1.60130
β4          -2.31452    0.90489       -2.55780
Γ11          0.01075    0.01834        0.58609
Γ22          0.05859    0.03349        1.74937
Γ33          3.65608    0.88749        4.11959
Γ44          1.39979    0.34476        4.06014
Σ22          1.70904    0.36332        4.70395
Σ33          1.65143    0.29767        5.54780
Σ44          1.41251    0.10368       13.62381
Σ13          0.19257    0.32860        0.58603
Σ14          0.77832    0.07346       10.59450
Σ23         -0.04919    0.20416       -0.24093
Σ24          0.32941    0.22376        1.47215
Σ34          0.26567    0.20988        1.26585

Average Price Effects Matrix
         ∂π1       ∂π2       ∂π3       ∂π4
∂p1    -0.0173   -0.0016    0.0019    0.0171
∂p2    -0.0022   -0.0043    0.0025    0.0039
∂p3     0.0007    0.0026   -0.0052    0.0019
∂p4     0.0133    0.0023    0.0008   -0.0163

TESTING FOR RANDOM UTILITY MAXIMIZATION
15
Table 6. Mixed Multinomial Logit with Correlated Parameters. Log Likelihood: −534.42

Parameter   Estimate    Std. Error   Asymptotic t
βp          -0.13882    0.02043      -6.79645
βd           0.02379    0.00892       2.66735
β3          -2.93706    0.95972      -3.06033
β4           0.58871    0.39381       1.49489
Γ11          0.06854    0.01452       4.72106
Γ22          0.02358    0.00764       3.08538
Γ33          2.83812    0.52919       5.36317
Γ44         -0.15355    0.55384      -0.27724
Γ21         -0.03049    0.00884      -3.44772
Γ31          1.70795    0.85570       1.99596
Γ32         -0.59317    0.39354      -1.50726
Γ41          0.91033    0.52099       1.74731
Γ42         -2.39703    0.34666      -6.91461
Γ43         -0.58727    0.67463      -0.87050

Implied Parameter Correlation Matrix

         βp         βd         β3         β4
βp      1.000
βd     -0.791      1.000
β3      0.508     -0.509      1.000
β4      0.345     -0.830      0.148      1.000

Average Price Effects Matrix

         ∂p1        ∂p2        ∂p3        ∂p4
∂π1    -0.0141     0.0008     0.0022     0.0111
∂π2     0.0008    -0.0084     0.0061     0.0014
∂π3     0.0022     0.0061    -0.0110     0.0027
∂π4     0.0111     0.0014     0.0027    -0.0152
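The implied parameter correlation matrix of Table 6 can be recovered from the Γ estimates. The following sketch assumes that Γ is a lower-triangular factor of the random parameter covariance matrix, so that the covariance is ΓΓ′, with parameters ordered (βp, βd, β3, β4). That parameterization is an assumption made here for illustration, and the function name is hypothetical.

```python
import math

# Gamma estimates transcribed from Table 6, rows and columns ordered (bp, bd, b3, b4).
# Assumption: Gamma is lower triangular and the covariance matrix is Gamma * Gamma'.
GAMMA = [
    [0.06854, 0.0, 0.0, 0.0],
    [-0.03049, 0.02358, 0.0, 0.0],
    [1.70795, -0.59317, 2.83812, 0.0],
    [0.91033, -2.39703, -0.58727, -0.15355],
]


def implied_correlation(gamma):
    """Return the correlation matrix implied by the covariance gamma * gamma'."""
    n = len(gamma)
    cov = [
        [sum(gamma[i][k] * gamma[j][k] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]
    sd = [math.sqrt(cov[i][i]) for i in range(n)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


corr = implied_correlation(GAMMA)

# Print the lower triangle, mirroring the layout of the reported matrix.
for i, row in enumerate(corr):
    print("  ".join(f"{value:7.3f}" for value in row[: i + 1]))
```

Under this assumption the computed values agree with the reported correlations to about three decimal places.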
Table 7. Multinomial Discrete Normal with Correlated Random Parameters. Log Likelihood: −504.12

Parameter   Estimate    Std. Error   Asymp. t
βp          -0.11997    0.02319      -5.17254
βd           0.03260    0.01835       1.77675
β3          -4.18226    2.13511      -1.95880
β4          -3.55298    1.00296      -3.54249
Γ11          0.04021    0.01012       3.97235
Γ22          0.02543    0.04257       0.59749
Γ33          2.20081    4.21866       0.52168
Γ44          0.85609    0.24511       3.49270
Γ21         -0.03372    0.02224      -1.51592
Γ31          1.12011    0.88323       1.26819
Γ32          3.17119    2.25504       1.40627
Γ41          2.01848    0.50048       4.03306
Γ42         -0.00634    0.97266      -0.00652
Γ43         -0.52692    0.49827      -1.05749
Σ22          1.46969    0.54769       2.68342
Σ33          1.81879    0.40344       4.50822
Σ44          1.38213    0.12569      10.99609
Σ13          0.11408    0.29094       0.39211
Σ14          0.74669    0.1207        6.18655
Σ23         -0.09926    0.28881      -0.34369
Σ24          0.23594    0.18487       1.27624
Σ34          0.30563    0.25176       1.21399

Implied Parameter Correlation Matrix

         βp         βd         β3         β4
βp      1.000
βd     -0.798      1.000
β3      0.279      0.253      1.000
β4      0.895     -0.716      0.119      1.000

Average Price Effects Matrix

         ∂p1        ∂p2        ∂p3        ∂p4
∂π1    -0.0198    -0.0020    -0.0005     0.0154
∂π2    -0.0005    -0.0050     0.0026     0.0012
∂π3     0.0013     0.0033    -0.0052     0.0013
∂π4     0.0189     0.0037     0.0031    -0.0179
Discussion
It is tempting to speculate about the features of this data set that lead to the demonstrated superiority of the multinomial discrete normal estimator. As mentioned,
previous studies using data from the same survey found that own price explained
most of the variation in visitation patterns. We can analyze own price effects by
examining the diagonal elements of the Average Price Effects Matrix provided for
each estimated model. Upon inspection, there appear to be no uniform differences
between the magnitudes of own price effects across comparable models. On the
other hand, note that for all multinomial discrete normal models estimated (excepting, of
course, the one with diagonal Σ), there are at least two off-diagonal elements of the
Average Price Effects Matrix that are negative. Considering that every column of
the Average Price Effects Matrix is constrained to sum to zero and that the diagonal
elements are logically negative, this seems to be an important finding. The
multinomial discrete normal models apparently capture a complementarity
relationship between certain alternatives – a relationship that is precluded by
models that must satisfy the conditions of random utility maximization.
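These properties can be checked directly from the reported matrices. The sketch below transcribes the Average Price Effects Matrices of Tables 5, 6, and 7, storing each as a list of rows in which entry [i][j] is the average effect of price j on the probability of alternative i. That orientation is inferred from the statement that every column sums to zero. The helper names are hypothetical, and the tolerance only allows for the four-decimal rounding of the published figures.

```python
# Entry [i][j] = average d(pi_i)/d(p_j), transcribed from the reported matrices.
PRICE_EFFECTS = {
    "table5": [  # multinomial discrete normal, uncorrelated random parameters
        [-0.0173, -0.0022, 0.0007, 0.0133],
        [-0.0016, -0.0043, 0.0026, 0.0023],
        [0.0019, 0.0025, -0.0052, 0.0008],
        [0.0171, 0.0039, 0.0019, -0.0163],
    ],
    "table6": [  # mixed multinomial logit, correlated parameters
        [-0.0141, 0.0008, 0.0022, 0.0111],
        [0.0008, -0.0084, 0.0061, 0.0014],
        [0.0022, 0.0061, -0.0110, 0.0027],
        [0.0111, 0.0014, 0.0027, -0.0152],
    ],
    "table7": [  # multinomial discrete normal, correlated random parameters
        [-0.0198, -0.0020, -0.0005, 0.0154],
        [-0.0005, -0.0050, 0.0026, 0.0012],
        [0.0013, 0.0033, -0.0052, 0.0013],
        [0.0189, 0.0037, 0.0031, -0.0179],
    ],
}

ROUNDING_TOL = 2.5e-4  # assumption: slack for four-decimal rounding in the tables


def column_sums(m):
    """Sum over alternatives of the effect of each price."""
    return [sum(row[j] for row in m) for j in range(len(m))]


def negative_off_diagonals(m):
    """Count negative cross-price effects (candidate complementarity)."""
    return sum(1 for i, row in enumerate(m) for j, v in enumerate(row) if i != j and v < 0)


def max_asymmetry(m):
    """Largest absolute gap between d(pi_i)/d(p_j) and d(pi_j)/d(p_i)."""
    n = len(m)
    return max(abs(m[i][j] - m[j][i]) for i in range(n) for j in range(i + 1, n))


summary = {
    name: {
        "columns_sum_to_zero": all(abs(s) < ROUNDING_TOL for s in column_sums(m)),
        "negative_off_diagonals": negative_off_diagonals(m),
        "max_asymmetry": max_asymmetry(m),
    }
    for name, m in PRICE_EFFECTS.items()
}

for name, result in summary.items():
    print(name, result)
```

On these figures the mixed multinomial logit matrix is symmetric with no negative off-diagonal entries, while the two multinomial discrete normal matrices are asymmetric and contain two and three negative off-diagonal entries, respectively.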
Summary
Random utility maximization is a commonly maintained hypothesis in
multi-response qualitative response models. Most practitioners would likely
claim that it is a benign restriction that can be satisfied by proper (or flexible)
specification of the qualitative response model. Such confidence is manifested by the
work of McFadden and Train (2000), who claim that the mixed multinomial logit
model can closely approximate any random utility model and thus, by extension, can
represent behavioral choice in general. But random utility maximization is in fact a
very stringent requirement that is not satisfied when attributes of other alternatives
affect the utility of one alternative or when marginal cross effects are not identically
signed and symmetric.
Certainly more complicated GEV models might exist that may satisfy the Daly–Zachary–McFadden properties for additive-in-income random utility maximization.
These models would not likely be nested relative to a multinomial discrete normal
estimator, so non-nested hypothesis type tests would be needed in order to discriminate between them. Current practice has, however, gravitated away from generalized
extreme value models in favor of mixed logit models largely because of the specification and estimation difficulties GEV models present.
This study has introduced a new statistical distribution for estimating multi-response qualitative choice models. The multinomial discrete normal distribution
appears to be an ideal device for testing MNL models with either constant or random
coefficients since these latter models are nested in the former. Unlike previous
models that have been promulgated to verify RUM, the multinomial discrete normal
estimator has a well-defined form. Estimation is straightforward and it permits the
systematic introduction of attributes of alternatives. Based on the empirical example,
the multinomial discrete normal estimator provides strong evidence against random
utility maximization and produces substantially better fitting models than its MNL
counterparts.
References
Borsch-Supan, A. “On the Compatibility of Nested Logit Models with Utility Maximization.” Journal
of Econometrics 43(1990): 373-388.
Callaway, J.M., S. Ragland, S. Keefe, T.A. Cameron, and W.D. Shaw. “Columbia River Systems Operation Review of Recreation Impacts: Demand Model and Simulation Results.” Report prepared
for the U.S. Army Corps of Engineers by RGC/Hagler Bailly, Boulder, CO, 1995.
Daly, A.J., and S. Zachary. “Improved Multiple Choice Models.” In D. Hensher and Q. Dalvi, eds.
Determinants of Travel Choice. Farnborough, UK: Saxon House, 1978.
Hanemann, W.M. “Welfare Analysis with Discrete Choice Models.” In J. Herriges and C. Kling, eds.,
Valuing Recreation and the Environment. Northampton, MA: Edward Elgar, 1999.
Hausman, J.A., and D.A. Wise. “A Conditional Probit Model for Qualitative Choice: Discrete Decisions Recognizing Interdependence and Heterogeneous Preferences.” Econometrica 46(1978):
403-426.
Herriges, J.A., and C.L. Kling. “Testing the Consistency of Nested Logit Models with Utility
Maximization.” Economics Letters 50(1996): 33-39.
Johnson, N.L., S. Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. New York: John
Wiley, 1997.
Kemp, A. “Characterizations of a Discrete Normal Distribution.” Journal of Statistical Planning and
Inference 63(1997): 223-229.
Koning, R.H., and G. Ridder. “Discrete Choice and Stochastic Utility Maximization.” Econometrics
Journal 6(2003): 1-27.
Lindsey, J.K. Parametric Statistical Inference. New York: Oxford University Press, 1996.
McFadden, D. “Econometric Models of Probabilistic Choice.” In C. Manski and D. McFadden, eds.,
Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press,
1981.
McFadden, D. “Econometric Analysis of Qualitative Response Models.” In Z. Griliches and M.
Intriligator, eds., Handbook of Econometrics (Vol. II). Amsterdam: North Holland, 1984.
McFadden, D. “Computing Willingness-to-Pay in Random Utility Models.” In J.R. Melvin, J.C. Moore,
and R. Riezman, eds., Trade, Theory and Econometrics. New York: Routledge, 1999.
McFadden, D., and K. Train. “Mixed MNL Models for Discrete Response.” Journal of Applied
Econometrics 15(2000): 447-470.
Shonkwiler, J.S. “Recreation Demand Systems for Multiple Site Count Data Travel Cost Models.” In
J. Herriges and C. Kling, eds., Valuing Recreation and the Environment. Northampton, MA:
Edward Elgar, 1999.
Shonkwiler, J.S. “New Estimators for the Analysis of Univariate and Multivariate Discrete Data.”
Working paper, Applied Economics and Statistics Department, University of Nevada, 2001.
Shonkwiler, J.S., and D.J. Nalle. “An Empirical Test of the Random Utility Hypothesis.” Working
paper, Applied Economics and Statistics Department, University of Nevada, 2003.
Shonkwiler, J.S., and W.D. Shaw. “A Finite Mixture Approach to Analyzing Income Effects in Random Utility Models: Reservoir Recreation along the Columbia River.” In N. Hanley, W.D. Shaw,
and R.E. Wright, eds., The New Economics of Outdoor Recreation. Northampton, MA: Edward
Elgar, 2003.
Small, K.A. “Approximate Generalized Extreme Value Models of Discrete Choice.” Journal of
Econometrics 62(1994): 351-382.
Szablowski, P.J. “Discrete Normal Distribution and Its Relationship with Jacobi Theta Functions.”
Statistics and Probability Letters 52(2001): 289-299.
Train, K.E. Discrete Choice Methods with Simulation. Cambridge, UK: Cambridge University Press,
2003.
Vuong, Q.H. “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses.” Econometrica 57(1989): 307-333.
White, H. “Maximum Likelihood Estimation of Misspecified Models.” Econometrica 50(1982): 1-25.
Appendix
Proof of Proposition 1. We show the result for the case of J = 2; the case of J>2
follows by induction.
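$$
P(Y_1 = y_1, Y_2 = y_2) = \frac{\exp\left(-0.5\,(y-\mu)'\,\Sigma^{+}(y-\mu)\right)}{\sum_{Y_1=0}^{Y_2=1}\sum_{Y_2=0}^{Y_1=1}\exp\left(-0.5\,(Y-\mu)'\,\Sigma^{+}(Y-\mu)\right)}
$$

If $y_1 = 1$,

$$
P(y_1 = 1, y_2 = 0) = \frac{\exp\left(-[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}]/2\right)}{\exp\left(-[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}]/2\right) + \exp\left(-[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}]/2\right)}.
$$

If $y_2 = 1$,

$$
P(y_1 = 0, y_2 = 1) = \frac{\exp\left(-[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}]/2\right)}{\exp\left(-[(1-\mu_1)^2\Sigma_{11} - 2\mu_2(1-\mu_1)\Sigma_{12} + \mu_2^2\Sigma_{22}]/2\right) + \exp\left(-[\mu_1^2\Sigma_{11} - 2\mu_1(1-\mu_2)\Sigma_{12} + (1-\mu_2)^2\Sigma_{22}]/2\right)}.
$$

Note that terms involving $\mu_1^2\Sigma_{11}$, $\mu_1\mu_2\Sigma_{12}$, and $\mu_2^2\Sigma_{22}$ cancel from both numerator and
denominator, yielding

$$
P(y_1 = 1, y_2 = 0) = \frac{\exp\left((\mu_1 - 0.5)\Sigma_{11} + \mu_2\Sigma_{12}\right)}{\exp\left((\mu_1 - 0.5)\Sigma_{11} + \mu_2\Sigma_{12}\right) + \exp\left(\mu_1\Sigma_{12} + (\mu_2 - 0.5)\Sigma_{22}\right)},
$$

and similarly for $P(y_1 = 0, y_2 = 1)$.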
Proof of Proposition 2. Trivial given Proposition 1, upon recognizing that $\Sigma_{ij} = 0$ for all $i \neq j$ and $\Sigma_{jj} = 1$, and multiplying the numerator and denominator by exp(0.5).
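Both propositions are easy to check numerically for J = 2. The sketch below evaluates the unsimplified ratio of kernels over the outcomes (1,0) and (0,1), the simplified expression from the proof of Proposition 1, and the logit probabilities of Proposition 2. The values chosen for µ and for the elements Σ11, Σ12, Σ22 of the matrix in the quadratic form are arbitrary illustrative numbers, and the function names are hypothetical.

```python
import math


def mdn_probs_full(mu, s11, s12, s22):
    """P(1,0) and P(0,1) from the unsimplified quadratic-form kernels."""
    def kernel(y):
        d1, d2 = y[0] - mu[0], y[1] - mu[1]
        return math.exp(-0.5 * (d1 * d1 * s11 + 2.0 * d1 * d2 * s12 + d2 * d2 * s22))

    k10, k01 = kernel((1, 0)), kernel((0, 1))
    return k10 / (k10 + k01), k01 / (k10 + k01)


def mdn_probs_simplified(mu, s11, s12, s22):
    """P(1,0) and P(0,1) after cancelling the common terms (Proposition 1)."""
    a = math.exp((mu[0] - 0.5) * s11 + mu[1] * s12)
    b = math.exp(mu[0] * s12 + (mu[1] - 0.5) * s22)
    return a / (a + b), b / (a + b)


def logit_probs(mu):
    """Binary logit probabilities with indices mu[0] and mu[1]."""
    e1, e2 = math.exp(mu[0]), math.exp(mu[1])
    return e1 / (e1 + e2), e2 / (e1 + e2)


# Illustrative values only; they are not estimates from the paper.
MU = (0.3, -0.2)
S11, S12, S22 = 1.0, 0.4, 1.7

full = mdn_probs_full(MU, S11, S12, S22)
simplified = mdn_probs_simplified(MU, S11, S12, S22)

# Proposition 2: zero off-diagonal and unit diagonal elements give the logit form.
identity_case = mdn_probs_simplified(MU, 1.0, 0.0, 1.0)
logit = logit_probs(MU)

print("full:         ", full)
print("simplified:   ", simplified)
print("identity case:", identity_case)
print("logit:        ", logit)
```

The first two lines of output coincide, as do the last two, which is what the two proofs assert for this two-alternative case.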